In this article, we discuss how to create a PySpark dataframe from a dictionary. The spark.createDataFrame() method is used for this. It takes two arguments, data and columns: data holds the records (here, a list of dictionaries) and columns holds the list of column names, which is passed as the method's schema parameter. When the data is a list of dictionaries, the column names can be omitted, because they are inferred from the dictionary keys.
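As a minimal sketch of the two arguments described above, the dictionaries can be split into row tuples (the data) and a list of column names. The helper names dicts_to_rows and create_dataframe are hypothetical and used only for illustration, and spark is assumed to be an existing SparkSession.

```python
def dicts_to_rows(records, columns):
    """Turn a list of dictionaries into a list of tuples ordered by `columns`.

    Keys missing from a dictionary become None.
    """
    return [tuple(record.get(column) for column in columns) for record in records]


def create_dataframe(spark, records, columns):
    """Build a dataframe from row tuples plus an explicit list of column names.

    `spark` is assumed to be an existing SparkSession (hypothetical helper).
    """
    rows = dicts_to_rows(records, columns)
    # The second argument of createDataFrame() accepts a list of column names.
    return spark.createDataFrame(rows, columns)


data = [{'student_id': 12, 'name': 'sravan', 'address': 'kakumanu'}]
columns = ['student_id', 'name', 'address']

# Pure-Python part: the row tuples that would be handed to Spark.
rows = dicts_to_rows(data, columns)
print(rows)  # [(12, 'sravan', 'kakumanu')]
```

With a running session, create_dataframe(spark, data, columns).show() would display the same record with the columns in the order given by the list.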
Example 1: Python code to create student address details and convert them to a dataframe
Python3
# importing module
import pyspark

# importing sparksession from
# pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving
# an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of college data with dictionary
data = [{'student_id': 12, 'name': 'sravan', 'address': 'kakumanu'}]

# creating a dataframe
dataframe = spark.createDataFrame(data)

# show data frame
dataframe.show()
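Output:
+--------+------+----------+
| address|  name|student_id|
+--------+------+----------+
|kakumanu|sravan|        12|
+--------+------+----------+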
Example 2: Create three dictionaries and pass them to createDataFrame() in PySpark
Python3
# importing module
import pyspark

# importing sparksession from
# pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving
# an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of college data with dictionary
# with three dictionaries
data = [{'student_id': 12, 'name': 'sravan', 'address': 'kakumanu'},
        {'student_id': 14, 'name': 'jyothika', 'address': 'tenali'},
        {'student_id': 11, 'name': 'deepika', 'address': 'repalle'}]

# creating a dataframe
dataframe = spark.createDataFrame(data)

# show data frame
dataframe.show()
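Output:
+--------+--------+----------+
| address|    name|student_id|
+--------+--------+----------+
|kakumanu|  sravan|        12|
|  tenali|jyothika|        14|
| repalle| deepika|        11|
+--------+--------+----------+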
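When column names are inferred from dictionary keys, the resulting column order may not match the order in which the keys were written (in many PySpark versions the keys are sorted alphabetically). The sketch below shows one possible way to restore the written order with select(). The helper names ordered_columns and create_ordered_dataframe are hypothetical, and spark is assumed to be an existing SparkSession.

```python
def ordered_columns(records):
    """Collect dictionary keys in first-seen order across all records."""
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    return columns


def create_ordered_dataframe(spark, records):
    """Create a dataframe, then reorder its columns to match the dictionaries.

    `spark` is assumed to be an existing SparkSession (hypothetical helper).
    """
    dataframe = spark.createDataFrame(records)
    # select() returns a new dataframe with the columns in the given order.
    return dataframe.select(*ordered_columns(records))


data = [{'student_id': 12, 'name': 'sravan', 'address': 'kakumanu'},
        {'student_id': 14, 'name': 'jyothika', 'address': 'tenali'},
        {'student_id': 11, 'name': 'deepika', 'address': 'repalle'}]

# Pure-Python part: the column order that select() would receive.
print(ordered_columns(data))  # ['student_id', 'name', 'address']
```

With a running session, create_ordered_dataframe(spark, data).show() would list student_id first, followed by name and address.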