In this article, we are going to convert JSON data into a DataFrame in PySpark.
Method 1: Using read_json()
We can read JSON files using pandas.read_json(), which loads the JSON data into a pandas DataFrame. Passing that pandas DataFrame to spark.createDataFrame() then converts it into a PySpark DataFrame.
Syntax: pandas.read_json("file_name.json")
For demonstration, we use a JSON file named student.json.
Code:
# import pandas to read the JSON file
import pandas as pd

# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving it an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# reading student.json with pandas, then converting the
# resulting pandas DataFrame into a PySpark DataFrame
dataframe = spark.createDataFrame(pd.read_json('student.json'))

# display the PySpark DataFrame
dataframe.show()
Output:
Method 2: Using spark.read.json()
spark.read.json() reads JSON data from a file directly into a PySpark DataFrame, inferring the schema from the data, so no pandas step is needed.
Syntax: spark.read.json('file_name.json')
For demonstration, we use a JSON file named college.json.
Code:
# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving it an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# reading college.json directly into a PySpark DataFrame
data = spark.read.json('college.json')

# display the JSON data
data.show()
Output: