In this article, we are going to convert a PySpark DataFrame into a list of tuples.
Each row of the DataFrame becomes one tuple, and the tuples are stored together in a Python list. For the demonstration, we first create a DataFrame from a nested list.
Creating Dataframe for demonstration:
Python3
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students data
data = [["1", "sravan", "vignan", 67, 89],
        ["2", "ojaswi", "vvit", 78, 89],
        ["3", "rohith", "vvit", 100, 80],
        ["4", "sridevi", "vignan", 78, 80],
        ["1", "sravan", "vignan", 89, 98],
        ["5", "gnanesh", "iit", 94, 98]]

# specify column names
columns = ['student ID', 'student NAME', 'college', 'subject1', 'subject2']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# display
dataframe.show()
Output:
Method 1: Using collect() method
collect() returns all rows of the DataFrame as a list of Row objects. By converting each Row into a tuple and appending it to a list, we get the data as a list of tuples.
tuple(): It is used to convert an iterable, here a single Row, into a tuple
Syntax: tuple(row)
Example: Converting dataframe into a list of tuples.
Python3
# define a list
l = []

# collect data from the dataframe
for i in dataframe.collect():
    # convert to tuple and append to list
    l.append(tuple(i))

# print list of data
print(l)
Output:
[('1', 'sravan', 'vignan', 67, 89), ('2', 'ojaswi', 'vvit', 78, 89),
('3', 'rohith', 'vvit', 100, 80), ('4', 'sridevi', 'vignan', 78, 80),
('1', 'sravan', 'vignan', 89, 98), ('5', 'gnanesh', 'iit', 94, 98)]
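The loop above can also be written as a list comprehension. The sketch below is only an illustration: so that it runs without a Spark session, it uses collections.namedtuple as a hypothetical stand-in for the Row objects that collect() returns (PySpark's Row is likewise a subclass of tuple), and the field names are made up for the example. With a real DataFrame, the equivalent expression would be [tuple(row) for row in dataframe.collect()].

```python
from collections import namedtuple

# Hypothetical stand-in for the Row objects returned by dataframe.collect();
# the field names here are assumptions chosen for this illustration.
Row = namedtuple("Row", ["student_id", "student_name", "college", "subject1", "subject2"])

rows = [
    Row("1", "sravan", "vignan", 67, 89),
    Row("2", "ojaswi", "vvit", 78, 89),
]

# Same conversion as the explicit loop, written as a list comprehension
list_of_tuples = [tuple(row) for row in rows]

print(list_of_tuples)
```

The comprehension avoids the separate empty-list definition and the append() call, but the result is the same list of plain tuples.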
Method 2: Using tuple() with rdd
Convert the DataFrame to an RDD with dataframe.rdd, then apply tuple() to every row using the RDD's map() function. Calling collect() on the result returns the list of tuples.
Syntax: rdd.map(tuple)
Example: Using RDD
Python3
# convert dataframe to rdd
rdd = dataframe.rdd

# convert rdd to tuple
data = rdd.map(tuple)

# display data
data.collect()
Output:
[('1', 'sravan', 'vignan', 67, 89), ('2', 'ojaswi', 'vvit', 78, 89), ('3', 'rohith', 'vvit', 100, 80), ('4', 'sridevi', 'vignan', 78, 80), ('1', 'sravan', 'vignan', 89, 98), ('5', 'gnanesh', 'iit', 94, 98)]
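An RDD's map() is a lazy transformation: the tuples are only produced once an action such as collect() is called. As a rough local analogy, and not a description of how Spark executes the job, Python's built-in map() behaves similarly. The sketch below again uses a hypothetical namedtuple stand-in for Row so that it runs without Spark.

```python
from collections import namedtuple

# Hypothetical stand-in for PySpark Row objects (field names are assumptions)
Row = namedtuple("Row", ["student_id", "college", "subject1"])

rows = [Row("3", "vvit", 100), Row("5", "iit", 94)]

# Built-in map() is lazy, loosely like rdd.map(tuple): nothing is converted yet
lazy_tuples = map(tuple, rows)

# Materializing the iterator plays the role that collect() plays for an RDD
result = list(lazy_tuples)

print(result)
```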