In this article, we are going to convert an RDD of Row objects into an RDD of lists in PySpark.
Creating an RDD of Row objects for demonstration:
Python3
# import Row and SparkSession
from pyspark.sql import SparkSession, Row

# create sparksession
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

# create student data with Row function
data = [
    Row(name="sravan kumar", subjects=["Java", "python", "C++"], state="AP"),
    Row(name="Ojaswi", lang=["Spark", "Java", "C++"], state="Telangana"),
    Row(name="rohith", subjects=["DS", "PHP", ".net"], state="AP"),
    Row(name="bobby", lang=["Python", "C", "sql"], state="Delhi"),
    Row(name="rohith", lang=["CSharp", "VB"], state="Telangana"),
]

rdd = spark.sparkContext.parallelize(data)

# display actual rdd
rdd.collect()
Output:
[Row(name='sravan kumar', subjects=['Java', 'python', 'C++'], state='AP'),
 Row(name='Ojaswi', lang=['Spark', 'Java', 'C++'], state='Telangana'),
 Row(name='rohith', subjects=['DS', 'PHP', '.net'], state='AP'),
 Row(name='bobby', lang=['Python', 'C', 'sql'], state='Delhi'),
 Row(name='rohith', lang=['CSharp', 'VB'], state='Telangana')]
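The map() transformation applies a function to every element of an RDD. Because Row is a subclass of tuple, passing the built-in list function to map() converts each Row into a plain Python list of its values.
Syntax: rdd_data.map(list)
where rdd_data is an RDD of Row objects.
Finally, calling the collect() method returns the converted elements to the driver so that we can display the data in the list RDD.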
Python3
# convert rdd to list by using map() method
b = rdd.map(list)

# display the data in b with collect method
for i in b.collect():
    print(i)
Output:
['sravan kumar', ['Java', 'python', 'C++'], 'AP']
['Ojaswi', ['Spark', 'Java', 'C++'], 'Telangana']
['rohith', ['DS', 'PHP', '.net'], 'AP']
['bobby', ['Python', 'C', 'sql'], 'Delhi']
['rohith', ['CSharp', 'VB'], 'Telangana']
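To see why passing list to map() is enough, it may help to look at a plain-Python analogy that runs without Spark. The sketch below is only an illustration under stated assumptions: the Student namedtuple is a hypothetical stand-in for pyspark.sql.Row (both are tuple subclasses), and the built-in map() stands in for rdd.map().

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row: like Row, a namedtuple is a tuple subclass.
Student = namedtuple("Student", ["name", "subjects", "state"])

records = [
    Student("sravan kumar", ["Java", "python", "C++"], "AP"),
    Student("rohith", ["DS", "PHP", ".net"], "AP"),
]

# Built-in map() plays the role of rdd.map() here.
# list() iterates over each record's values in field order, dropping the field names.
as_lists = list(map(list, records))

for item in as_lists:
    print(item)
```

Note that the field names are lost in the conversion and only the values remain, in field order. If the names are needed in PySpark, Row also provides an asDict() method that returns them together with the values.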