Convert Python Dictionary List to PySpark DataFrame

By Dominic Rubhabha-Wardslaus

26 July 2024

In this article, we will discuss how to convert a list of Python dictionaries to a PySpark DataFrame.

This can be done in three ways:

Using an inferred schema
Using an explicit schema
Using a SQL expression

Method 1: Infer schema from the dictionary

We pass the list of dictionaries directly to the createDataFrame() method, and Spark infers the column names and data types from the values.

Syntax: spark.createDataFrame(data)

Example: Python code to create a PySpark DataFrame from a dictionary list using this method.

Python3

# import the modules 
from pyspark.sql import SparkSession 
  
# Create Spark session app name 
# is GFG and master name is local 
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()
  
# dictionary list of college data 
data = [{"Name": 'sravan kumar', 
         "ID": 1, 
         "Percentage": 94.29}, 
        {"Name": 'sravani', 
         "ID": 2, 
         "Percentage": 84.29}, 
        {"Name": 'kumar', 
         "ID": 3, 
         "Percentage": 94.29} 
        ] 
  
# Create data frame from dictionary list 
df = spark.createDataFrame(data) 
  
# display 
df.show() 

Output:

Method 2: Using Explicit schema

Here we create a schema and pass it along with the data to the createDataFrame() method.

Schema structure:

schema = StructType([
    StructField('column_1', DataType(), False),
    StructField('column_2', DataType(), False)])

Here, column_1 and column_2 are the dictionary keys that become the columns of the PySpark DataFrame, DataType() is the data type of that column, and the third argument specifies whether the column may contain null values.

Syntax: spark.createDataFrame(data, schema)

Where,

data is the dictionary list

schema is the schema of the dataframe

Python program to create a PySpark DataFrame from a dictionary list using this method.

Python3

# import the modules 
from pyspark.sql import SparkSession 
from pyspark.sql.types import (StructField, StructType,
                               StringType, IntegerType, FloatType)
  
  
# Create Spark session app name is 
# GFG and master name is local 
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()
  
# dictionary list of college data 
data = [{"Name": 'sravan kumar', 
         "ID": 1, 
         "Percentage": 94.29}, 
        {"Name": 'sravani', 
         "ID": 2, 
         "Percentage": 84.29}, 
        {"Name": 'kumar', 
         "ID": 3, 
         "Percentage": 94.29} 
        ] 
  
# specify the schema 
schema = StructType([ 
    StructField('Name', StringType(), False), 
    StructField('ID', IntegerType(), False), 
    StructField('Percentage', FloatType(), True) 
]) 
  
# Create data frame from 
# dictionary list through the schema 
df = spark.createDataFrame(data, schema) 
  
# display 
df.show() 

Output:

Method 3: Using SQL Expression

Here we use the Row class from pyspark.sql to convert each dictionary in the list to a Row, and then build the PySpark DataFrame from the resulting list of rows.

Syntax: spark.createDataFrame([Row(**iterator) for iterator in data])

where:

createDataFrame() is the method to create the dataframe

Row(**iterator) unpacks one dictionary into the keyword arguments of a Row; the list comprehension applies this to every dictionary in the list.

data is the dictionary list

Python code to convert dictionary list to pyspark dataframe.

Python3

# import the modules 
from pyspark.sql import SparkSession, Row 
  
# Create Spark session app name 
# is GFG and master name is local 
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()
  
# dictionary list of college data 
data = [{"Name": 'sravan kumar', 
         "ID": 1, 
         "Percentage": 94.29}, 
        {"Name": 'sravani', 
         "ID": 2, 
         "Percentage": 84.29}, 
        {"Name": 'kumar', 
         "ID": 3, 
         "Percentage": 94.29} 
        ] 
  
# create dataframe by converting each dictionary to a Row
dataframe = spark.createDataFrame([Row(**variable)  
                                   for variable in data]) 
  
dataframe.show() 

Output:
