In this article, we are going to see how to create an empty PySpark dataframe. An empty PySpark dataframe is a dataframe that contains no data; it may or may not specify a schema.
Creating an empty RDD without schema
We'll first create an empty RDD and then build a dataframe from it using an empty schema.
- emptyRDD() method creates an RDD without any data.
- createDataFrame() method creates a PySpark dataframe from the specified data and schema.
Code:
Python3
# Import necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType

# Create a spark session
spark = SparkSession.builder.appName('Empty_Dataframe').getOrCreate()

# Create an empty RDD
emp_RDD = spark.sparkContext.emptyRDD()

# Create an empty schema (no columns)
columns = StructType([])

# Create a dataframe from the empty RDD with the empty schema
data = spark.createDataFrame(data=emp_RDD, schema=columns)

# Print the dataframe
print('Dataframe :')
data.show()

# Print the schema
print('Schema :')
data.printSchema()
Output:
Dataframe :
++
||
++
++

Schema :
root
Creating an emptyRDD with schema
It is possible that no input file arrives for processing. Even then, we may still need a dataframe with the appropriate schema, so we create one manually.
- Specify the schema of the dataframe as a StructType with the columns 'Name', 'Age' and 'Gender'.
- Create a dataframe from the empty RDD with the expected schema.
Code:
Python3
# Import necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# Create a spark session
spark = SparkSession.builder.appName('Empty_Dataframe').getOrCreate()

# Create an empty RDD
emp_RDD = spark.sparkContext.emptyRDD()

# Create the expected schema
columns = StructType([StructField('Name', StringType(), True),
                      StructField('Age', StringType(), True),
                      StructField('Gender', StringType(), True)])

# Create a dataframe from the empty RDD with the expected schema
df = spark.createDataFrame(data=emp_RDD, schema=columns)

# Print the dataframe
print('Dataframe :')
df.show()

# Print the schema
print('Schema :')
df.printSchema()
Output:
Dataframe :
+----+---+------+
|Name|Age|Gender|
+----+---+------+
+----+---+------+

Schema :
root
 |-- Name: string (nullable = true)
 |-- Age: string (nullable = true)
 |-- Gender: string (nullable = true)
Creating an empty dataframe without schema
- Create an empty schema as columns.
- Specify data as an empty list ([]) and schema as columns in the createDataFrame() method.
Code:
Python3
# Import necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType

# Create a spark session
spark = SparkSession.builder.appName('Empty_Dataframe').getOrCreate()

# Create an empty schema
columns = StructType([])

# Create an empty dataframe with empty schema
df = spark.createDataFrame(data=[], schema=columns)

# Print the dataframe
print('Dataframe :')
df.show()

# Print the schema
print('Schema :')
df.printSchema()
Output:
Dataframe :
++
||
++
++

Schema :
root
Creating an empty dataframe with schema
- Specify the schema of the dataframe as a StructType with the columns 'Name', 'Age' and 'Gender'.
- Specify data as an empty list ([]) and schema as columns in the createDataFrame() method.
Code:
Python3
# Import necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# Create a spark session
spark = SparkSession.builder.appName('Empty_Dataframe').getOrCreate()

# Create the expected schema
columns = StructType([StructField('Name', StringType(), True),
                      StructField('Age', StringType(), True),
                      StructField('Gender', StringType(), True)])

# Create a dataframe with expected schema
df = spark.createDataFrame(data=[], schema=columns)

# Print the dataframe
print('Dataframe :')
df.show()

# Print the schema
print('Schema :')
df.printSchema()
Output:
Dataframe :
+----+---+------+
|Name|Age|Gender|
+----+---+------+
+----+---+------+

Schema :
root
 |-- Name: string (nullable = true)
 |-- Age: string (nullable = true)
 |-- Gender: string (nullable = true)