Convert Python Dictionary List to PySpark DataFrame

In this article, we will discuss how to convert a Python dictionary list to a PySpark DataFrame.

It can be done in the following ways:

  • Using an inferred schema
  • Using an explicit schema
  • Using a SQL expression

Method 1: Infer schema from the dictionary

We will pass the dictionary list directly to the createDataFrame() method and let Spark infer the schema from the data.

Syntax: spark.createDataFrame(data)

Example: Python code to create a PySpark DataFrame from a dictionary list using this method.

Python3




# import the modules
from pyspark.sql import SparkSession
  
# create a Spark session with app name
# "GFG" and master "local"
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()
  
# dictionary list of college data
data = [{"Name": 'sravan kumar',
         "ID": 1,
         "Percentage": 94.29},
        {"Name": 'sravani',
         "ID": 2,
         "Percentage": 84.29},
        {"Name": 'kumar',
         "ID": 3,
         "Percentage": 94.29}
        ]
  
# Create data frame from dictionary list
df = spark.createDataFrame(data)
  
# display
df.show()


Output:
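When the schema is inferred, it is worth checking which types Spark actually chose. The snippet below is a small sketch assuming the df created in the example above; with this sample data, ID is typically inferred as long, Name as string, and Percentage as double.

Python3

# print the schema Spark inferred from the dictionary list;
# with the sample data: ID -> long, Name -> string, Percentage -> double
df.printSchema()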

Method 2: Using Explicit schema

Here we are going to create a schema and pass it along with the data to the createDataFrame() method.

Schema structure:

schema = StructType([
    StructField('column_1', DataType(), nullable),
    StructField('column_2', DataType(), nullable)])

where column_1 and column_2 are the dictionary keys to include as columns in the PySpark DataFrame, DataType() is the Spark data type of that column (for example StringType() or IntegerType()), and nullable is a boolean indicating whether the column may contain null values.

Syntax: spark.createDataFrame(data, schema)

where:

  • data is the dictionary list
  • schema is the schema of the DataFrame

Example: Python program to create a PySpark DataFrame from a dictionary list using this method.

Python3




# import the modules
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructField, StructType,
                               StringType, IntegerType, FloatType)
  
  
# create a Spark session with app name
# "GFG" and master "local"
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()
  
# dictionary list of college data
data = [{"Name": 'sravan kumar',
         "ID": 1,
         "Percentage": 94.29},
        {"Name": 'sravani',
         "ID": 2,
         "Percentage": 84.29},
        {"Name": 'kumar',
         "ID": 3,
         "Percentage": 94.29}
        ]
  
# specify the schema; the third argument
# to StructField is the nullable flag
schema = StructType([
    StructField('Name', StringType(), False),
    StructField('ID', IntegerType(), False),
    StructField('Percentage', FloatType(), True)
])
  
# create the DataFrame from the
# dictionary list using the schema
df = spark.createDataFrame(data, schema)
  
# display
df.show()


Output:
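One advantage of an explicit schema is control over nullability. The sketch below assumes the spark session and schema defined in the example above; the record and the names extra and df2 are only for illustration. Because Percentage was declared nullable (True), a record may omit it, while Name and ID (declared False) must always be present.

Python3

# a record with a missing Percentage is accepted because that
# field was declared nullable (True) in the explicit schema
extra = [{"Name": 'ramya', "ID": 4, "Percentage": None}]
df2 = spark.createDataFrame(extra, schema)
df2.show()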

Method 3: Using SQL Expression

Here we use the Row class to convert the Python dictionary list to a PySpark DataFrame.

Syntax: spark.createDataFrame([Row(**iterator) for iterator in data])

where: 

  • createDataFrame() is the method that creates the DataFrame
  • Row(**iterator) unpacks each dictionary into a Row object (a short illustration of this unpacking appears after the example below)
  • data is the dictionary list

Example: Python code to convert the dictionary list to a PySpark DataFrame.

Python3




# import the modules
from pyspark.sql import SparkSession, Row
  
# create a Spark session with app name
# "GFG" and master "local"
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()
  
# dictionary list of college data
data = [{"Name": 'sravan kumar',
         "ID": 1,
         "Percentage": 94.29},
        {"Name": 'sravani',
         "ID": 2,
         "Percentage": 84.29},
        {"Name": 'kumar',
         "ID": 3,
         "Percentage": 94.29}
        ]
  
# create dataframe using sql expression
dataframe = spark.createDataFrame([Row(**variable) 
                                   for variable in data])
  
dataframe.show()


Output:
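To see what Row(**iterator) does, here is a short illustration (not part of the article's example): the ** operator unpacks each dictionary into keyword arguments, so every dictionary becomes one Row with named fields. Recent Spark releases keep the dictionary's key order; older 2.x releases sorted the fields alphabetically.

Python3

# unpack a single dictionary into a Row object;
# each key becomes a named field of the Row
from pyspark.sql import Row

d = {"Name": 'sravan kumar', "ID": 1, "Percentage": 94.29}
print(Row(**d))
# e.g. Row(Name='sravan kumar', ID=1, Percentage=94.29) on Spark 3.x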
