In this article, we discuss how to create a PySpark DataFrame from a nested dictionary.
We will use the createDataFrame() method from PySpark to build the DataFrame. For this, we iterate over the nested dictionary with its items() method, which yields each outer key together with its inner dictionary, and construct a Row from each pair:
[Row(**{'': k, **v}) for k, v in data.items()]
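Before involving Spark, it helps to see what the dictionary-unpacking expression `{'': k, **v}` produces on its own: the outer key lands under an empty-string column name, and the inner dictionary's key-value pairs are flattened alongside it. The sketch below uses plain dictionaries instead of Row objects so it runs without a Spark session; the sample values are illustrative.

```python
# Plain-Python illustration of the merge used in the Row comprehension:
# {'': k, **v} stores the outer key under the '' key and unpacks the
# inner dictionary's fields next to it.
data = {
    'student_1': {'student id': 7058, 'country': 'India'},
    'student_2': {'student id': 7059, 'country': 'Srilanka'},
}

rows = [{'': k, **v} for k, v in data.items()]
print(rows)
# [{'': 'student_1', 'student id': 7058, 'country': 'India'},
#  {'': 'student_2', 'student id': 7059, 'country': 'Srilanka'}]
```

Wrapping each merged dictionary as `Row(**d)` gives exactly the row data passed to createDataFrame() in the examples below.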
Example 1: Python program to create student records from a nested dictionary whose inner dictionaries hold address fields (country, state, district)
Python3
# importing module
import pyspark

# importing sparksession and Row from pyspark.sql module
from pyspark.sql import SparkSession
from pyspark.sql import Row

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# creating nested dictionary
data = {
    'student_1': {
        'student id': 7058,
        'country': 'India',
        'state': 'AP',
        'district': 'Guntur'
    },
    'student_2': {
        'student id': 7059,
        'country': 'Srilanka',
        'state': 'X',
        'district': 'Y'
    }
}

# taking row data: the outer key goes under an empty-string column name
rowdata = [Row(**{'': k, **v}) for k, v in data.items()]

# creating the pyspark dataframe
final = spark.createDataFrame(rowdata).select(
    'student id', 'country', 'state', 'district')

# display pyspark dataframe
final.show()
Output:
+----------+--------+-----+--------+
|student id| country|state|district|
+----------+--------+-----+--------+
|      7058|   India|   AP|  Guntur|
|      7059|Srilanka|    X|       Y|
+----------+--------+-----+--------+
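Note that the outer keys (student_1, student_2) travel along under the empty-string column name, which is why the select() above drops them. If you want to keep them, you can merge them in under a readable key instead; the sketch below does this in plain Python (the column name `name` is our choice, not part of the original example).

```python
data = {
    'student_1': {'student id': 7058, 'country': 'India'},
    'student_2': {'student id': 7059, 'country': 'Srilanka'},
}

# Use a real column name for the outer key instead of ''.
rowdata = [{'name': k, **v} for k, v in data.items()]
print(rowdata[0]['name'])  # student_1
```

Each of these dictionaries can then be wrapped as Row(**d) and passed to createDataFrame() exactly as before, giving a selectable `name` column.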
Example 2: Python program to create a DataFrame from a nested dictionary whose inner dictionaries hold 3 keys (3 columns)
Python3
# importing module
import pyspark

# importing sparksession and Row from pyspark.sql module
from pyspark.sql import SparkSession
from pyspark.sql import Row

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# creating nested dictionary
data = {
    'student_1': {
        'student id': 7058,
        'country': 'India',
        'state': 'AP'
    },
    'student_2': {
        'student id': 7059,
        'country': 'Srilanka',
        'state': 'X'
    }
}

# taking row data: the outer key goes under an empty-string column name
rowdata = [Row(**{'': k, **v}) for k, v in data.items()]

# creating the pyspark dataframe
final = spark.createDataFrame(rowdata).select(
    'student id', 'country', 'state')

# display pyspark dataframe
final.show()
Output:
+----------+--------+-----+
|student id| country|state|
+----------+--------+-----+
|      7058|   India|   AP|
|      7059|Srilanka|    X|
+----------+--------+-----+