Drop One or Multiple Columns From PySpark DataFrame

27 July 2024


In this article, we will discuss how to drop columns from a PySpark DataFrame.

In PySpark, the DataFrame drop() method removes columns. The related dataframe.na.drop() method removes rows that contain NULL values rather than columns, and has the following syntax.

Syntax: dataframe_name.na.drop(how="any", thresh=threshold_value, subset=["column_name_1", "column_name_2"])

how – Takes either 'any' or 'all'. With 'any', a row is dropped if it contains a NULL in any column; with 'all', a row is dropped only if every column is NULL. The default is 'any'.

thresh – Takes an integer and drops rows that have fewer than that many non-null values. When set, it overrides how. The default is None.

subset – Restricts the NULL check to the listed columns. The default is None, which checks all columns.

Python code to create an employee DataFrame with three columns:

Python3

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving it an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of employee data with 7 rows
data = [["1", "sravan", "company 1"],
        ["3", "bobby", "company 3"],
        ["2", "ojaswi", "company 2"],
        ["1", "sravan", "company 1"],
        ["3", "bobby", "company 3"],
        ["4", "rohith", "company 2"],
        ["5", "gnanesh", "company 1"]]

# specify column names
columns = ['Employee ID', 'Employee NAME', 'Company Name']

# creating a dataframe from the list of data
dataframe = spark.createDataFrame(data, columns)

dataframe.show()

Output:

+-----------+-------------+------------+
|Employee ID|Employee NAME|Company Name|
+-----------+-------------+------------+
|          1|       sravan|   company 1|
|          3|        bobby|   company 3|
|          2|       ojaswi|   company 2|
|          1|       sravan|   company 1|
|          3|        bobby|   company 3|
|          4|       rohith|   company 2|
|          5|      gnanesh|   company 1|
+-----------+-------------+------------+

Example 1: Delete a single column.

Here we are going to delete a single column from the dataframe.

Syntax: dataframe.drop('column name')

Code:

Python3

# delete single column 
dataframe = dataframe.drop('Employee ID') 
dataframe.show()

Output:

+-------------+------------+
|Employee NAME|Company Name|
+-------------+------------+
|       sravan|   company 1|
|        bobby|   company 3|
|       ojaswi|   company 2|
|       sravan|   company 1|
|        bobby|   company 3|
|       rohith|   company 2|
|      gnanesh|   company 1|
+-------------+------------+

Example 2: Delete multiple columns.

Here we will delete multiple columns from the dataframe.

Syntax: dataframe.drop(*('column 1', 'column 2', 'column n'))

Code:

Python3

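# delete two columns; 'Employee ID' was already removed in Example 1,
# and drop() silently ignores column names that are not present
dataframe = dataframe.drop(*('Employee NAME',
                             'Employee ID'))
dataframe.show()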

Output:

+------------+
|Company Name|
+------------+
|   company 1|
|   company 3|
|   company 2|
|   company 1|
|   company 3|
|   company 2|
|   company 1|
+------------+

Example 3: Delete all columns

Here we will delete all the columns from the dataframe. To do this, we put the column names in a list and unpack it into drop().

Python3

# avoid naming the variable `list`, which would shadow the Python built-in
columns_to_drop = ['Employee ID', 'Employee NAME', 'Company Name']

# delete all columns
dataframe = dataframe.drop(*columns_to_drop)
dataframe.show()

Output:

++
||
++
||
||
||
||
||
||
||
++
