In this article, we are going to convert JSON data into a DataFrame in PySpark.
Method 1: Using pandas.read_json()
pandas.read_json() reads a JSON file into a pandas DataFrame. We can then pass that pandas DataFrame to spark.createDataFrame() to convert it into a PySpark DataFrame.
Syntax: pandas.read_json("file_name.json")
Here we are going to use the following JSON file, student.json, for demonstration:
Code:
```python
# import pandas to read the JSON file
import pandas as pd

# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving it an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# creating a PySpark DataFrame from the JSON file named student.json
dataframe = spark.createDataFrame(pd.read_json('student.json'))

# display the PySpark DataFrame
dataframe.show()
```
Output:
Method 2: Using spark.read.json()
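spark.read.json() reads JSON data from a file directly into a PySpark DataFrame, without going through pandas. By default, it expects JSON Lines format, that is, one complete JSON object per line.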
Syntax: spark.read.json('file_name.json')
Here is the JSON file, college.json, used for demonstration:
Code:
```python
# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving it an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# read the JSON file into a PySpark DataFrame
data = spark.read.json('college.json')

# display the JSON data
data.show()
```
Output:

