In this article, we will check whether a PySpark DataFrame or Dataset is empty or not.
First, let's create a DataFrame.
Python3
# import modules
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# defining schema
schema = StructType([
    StructField('COUNTRY', StringType(), True),
    StructField('CITY', StringType(), True),
    StructField('CAPITAL', StringType(), True)
])

# Create Spark session
spark = SparkSession.builder.appName("TestApp").getOrCreate()

# Create empty DataFrame with the schema
df = spark.createDataFrame([], schema)

# Show schema and data
df.printSchema()
df.show(truncate=False)
Output:
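root
 |-- COUNTRY: string (nullable = true)
 |-- CITY: string (nullable = true)
 |-- CAPITAL: string (nullable = true)

+-------+----+-------+
|COUNTRY|CITY|CAPITAL|
+-------+----+-------+
+-------+----+-------+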
Checking whether the DataFrame is empty or not
There are multiple ways to check this:
Method 1: isEmpty()
The isEmpty check returns True when the DataFrame or Dataset has no rows and False when it does. Be careful to distinguish an empty DataFrame from a null reference: if the DataFrame variable itself is null rather than empty, invoking isEmpty fails (with a NullPointerException in Scala/Java) instead of returning a value.
Note: in Scala, calling df.head() or df.first() on an empty DataFrame throws java.util.NoSuchElementException: next on empty iterator; in PySpark, df.head() and df.first() simply return None for an empty DataFrame.
Python3
# head(1) returns a (possibly empty) list of Row objects
print(len(df.head(1)) == 0)

# first() returns None when there are no rows
print(df.first() is None)

# rdd.isEmpty() works on any PySpark version
print(df.rdd.isEmpty())
Output:
True
True
True
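Recent PySpark releases (3.3 and later) also expose isEmpty() directly on the DataFrame, so you do not have to drop down to the RDD. A minimal sketch, assuming a PySpark 3.3+ runtime:
Python3
# DataFrame.isEmpty() is available from PySpark 3.3 onwards
# (assumption: a 3.3+ runtime); it returns True for our empty df
print(df.isEmpty())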
Method 2: count()
count() computes the number of rows across all partitions on all nodes, so it triggers a full job and can be expensive on large DataFrames. Comparing the result with 0 tells us whether the DataFrame is empty.
Code:
Python3
print(df.count() > 0)
print(df.count() == 0)
Output:
False
True
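Because count() scans every row, it can be slow on a big DataFrame when all you need to know is whether any row exists. One common workaround, shown as a sketch below, is to limit the scan to a single row before counting:
Python3
# Stop after at most one row, then count; this avoids a full scan
# on large DataFrames and still returns True for our empty df
print(df.limit(1).count() == 0)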

