In this article, we will drop multiple columns, given as a list, from a PySpark dataframe in Python.
For this, we will use the drop() function. It removes the specified columns and returns a new dataframe; the original dataframe is not modified.
Syntax: dataframe.drop(*['column 1', 'column 2', 'column n'])
Where,
- dataframe is the input dataframe
- column names are the names of the columns to drop, passed as a list that is unpacked with the * operator.
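The * in the syntax is ordinary Python argument unpacking: it spreads the list so that each name reaches drop() as a separate argument. The sketch below illustrates only that mechanism. It uses a stand-in function, drop_names, which is hypothetical and works on a plain list of column names rather than on a Spark dataframe.

```python
# Hypothetical stand-in for drop(): it accepts any number of names as
# separate positional arguments, in the same way dataframe.drop() does.
def drop_names(all_columns, *names):
    # names arrives as a tuple holding the individual arguments
    return [c for c in all_columns if c not in names]


all_columns = ['student ID', 'student NAME', 'college']
to_drop = ['student NAME', 'college']

# The star spreads the list, so these two calls are equivalent.
unpacked = drop_names(all_columns, *to_drop)
explicit = drop_names(all_columns, 'student NAME', 'college')

print(unpacked)
print(explicit)
```

Both calls leave only the 'student ID' entry, which is why a list of any length can be handed to drop() in a single call.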
Python code to create student dataframe with three columns:
Python3
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students data
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]

# specify column names
columns = ['student ID', 'student NAME', 'college']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

print("Actual data in dataframe")

# show dataframe
dataframe.show()
Output:
Actual data in dataframe
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         1|      sravan| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+
Example 1: Program to drop multiple columns whose names are given as a list.
Python3
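# names of the columns to drop (named columns_to_drop rather than
# "list", which would shadow the Python built-in)
columns_to_drop = ['student NAME', 'college']

# drop two columns; drop() returns a new dataframe, so the original
# dataframe stays intact for the following examples
dataframe.drop(*columns_to_drop).show()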
Output:
+----------+
|student ID|
+----------+
|         1|
|         2|
|         3|
|         4|
|         1|
|         5|
+----------+
Example 2: Program to drop a single column whose name is given as a one-element list.
Python3
# a one-element list works the same way
columns_to_drop = ['college']

# drop one column from the original dataframe
dataframe.drop(*columns_to_drop).show()
Output:
+----------+------------+
|student ID|student NAME|
+----------+------------+
|         1|      sravan|
|         2|      ojaswi|
|         3|      rohith|
|         4|     sridevi|
|         1|      sravan|
|         5|     gnanesh|
+----------+------------+
Example 3: Program to drop all columns by passing every column name as a list.
Python3
# list every column name in the dataframe
columns_to_drop = ['student ID', 'student NAME', 'college']

# drop all columns; the rows remain but carry no data
dataframe.drop(*columns_to_drop).show()
Output:
++
||
++
||
||
||
||
||
||
++
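When drop() is given string names, it ignores any name that is not in the dataframe, so a misspelled column name passes silently. A minimal sketch of a check to run before dropping is shown below. It works on a plain list that mirrors the student dataframe's columns (in PySpark the same list is available as dataframe.columns), and the misspelled name 'colege' is an assumption made up for illustration.

```python
# Column names of the student dataframe used in this article
# (in PySpark this list is available as dataframe.columns).
available = ['student ID', 'student NAME', 'college']

# Names we intend to drop; 'colege' is a deliberate typo for illustration.
requested = ['student NAME', 'colege']

# Split the request into names that exist and names that do not.
present = [name for name in requested if name in available]
missing = [name for name in requested if name not in available]

if missing:
    print("Not in dataframe:", missing)

# present can then be passed on as dataframe.drop(*present)
print("Safe to drop:", present)
```

Reporting the missing names makes the typo visible instead of leaving the intended column in place unnoticed.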