In this article, we will see how to rename a PySpark DataFrame column by index using Python. We can rename columns by index by combining the DataFrame.withColumnRenamed() method with the DataFrame.columns attribute. DataFrame.columns returns the list of column names, so indexing it (for example, dataframe.columns[0]) gives the name of the column at that position. We then pass that name, together with the new name, to withColumnRenamed(), which returns a new DataFrame with the column renamed and leaves the original DataFrame unchanged.
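The lookup-then-rename step described above can be wrapped in a small helper. The following is a minimal sketch, not part of the PySpark API: the function name rename_by_index and its renames parameter are hypothetical, and the only DataFrame members it relies on are columns and withColumnRenamed().

```python
def rename_by_index(df, renames):
    """Return a new DataFrame with columns renamed by position.

    `renames` maps a zero-based column index to the new column name.
    Hypothetical helper for illustration; it is not a PySpark function.
    """
    # Snapshot the names first so every index refers to the original
    # column layout, even after earlier renames in the loop are applied.
    original_names = list(df.columns)

    for index, new_name in renames.items():
        # An out-of-range index raises IndexError here, before Spark is called.
        df = df.withColumnRenamed(original_names[index], new_name)

    return df
```

For a DataFrame whose columns are ID, Name, Father Name, Age and Course, calling rename_by_index(dataframe, {1: "Student Name", 3: "Student Age"}) should be equivalent to chaining two withColumnRenamed() calls on dataframe.columns[1] and dataframe.columns[3].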
Example 1: The following program renames a single column by its index.
Python3
# importing required modules
import pyspark
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# demo data of college students
data = [["Mukul", 23, "BBA"],
        ["Robin", 21, "BCA"],
        ["Rohit", 24, "MBA"],
        ["Suraj", 25, "MBA"],
        ["Krish", 22, "BCA"]]

# giving column names of dataframe
columns = ["Name", "Age", "Course"]

# creating a dataframe
dataframe = spark.createDataFrame(data, columns)

# rename the column at index 0; this returns a new dataframe
df = dataframe.withColumnRenamed(dataframe.columns[0], "Student Name")

# original dataframe (unchanged by the rename)
print("Original Dataframe")
dataframe.show()

# dataframe after renaming the column
print("Dataframe after rename 0 index column")
df.show()
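Output:
Original Dataframe
+-----+---+------+
| Name|Age|Course|
+-----+---+------+
|Mukul| 23|   BBA|
|Robin| 21|   BCA|
|Rohit| 24|   MBA|
|Suraj| 25|   MBA|
|Krish| 22|   BCA|
+-----+---+------+

Dataframe after rename 0 index column
+------------+---+------+
|Student Name|Age|Course|
+------------+---+------+
|       Mukul| 23|   BBA|
|       Robin| 21|   BCA|
|       Rohit| 24|   MBA|
|       Suraj| 25|   MBA|
|       Krish| 22|   BCA|
+------------+---+------+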
Example 2: The following program renames multiple columns by their indexes.
Python3
# importing module
import pyspark

# importing sparksession from
# pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving
# an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students data
data = [[123, "Sagar", "Rajveer", 22, "BBA"],
        [124, "Rajeev", "Mukesh", 23, "BBA"],
        [125, "Harish", "Parveen", 25, "BBA"],
        [126, "Gagan", "Rohit", 24, "BBA"],
        [127, "Rakesh", "Mayank", 25, "BBA"],
        [128, "Gnanesh", "Dleep", 26, "BBA"]]

# specify column names
columns = ['ID', 'Name', 'Father Name', 'Age', "Course"]

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# display original dataframe
print('Actual data in dataframe')
dataframe.show()

# rename the columns at index 1 and index 3; both indexes are
# looked up on the original dataframe, so they refer to its layout
df = dataframe.withColumnRenamed(
    dataframe.columns[1], "Student Name"
).withColumnRenamed(
    dataframe.columns[3], "Student Age"
)

# display dataframe after renaming the columns
print('After rename 1 and 3 index column')
df.show()
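Output:
Actual data in dataframe
+---+-------+-----------+---+------+
| ID|   Name|Father Name|Age|Course|
+---+-------+-----------+---+------+
|123|  Sagar|    Rajveer| 22|   BBA|
|124| Rajeev|     Mukesh| 23|   BBA|
|125| Harish|    Parveen| 25|   BBA|
|126|  Gagan|      Rohit| 24|   BBA|
|127| Rakesh|     Mayank| 25|   BBA|
|128|Gnanesh|      Dleep| 26|   BBA|
+---+-------+-----------+---+------+

After rename 1 and 3 index column
+---+------------+-----------+-----------+------+
| ID|Student Name|Father Name|Student Age|Course|
+---+------------+-----------+-----------+------+
|123|       Sagar|    Rajveer|         22|   BBA|
|124|      Rajeev|     Mukesh|         23|   BBA|
|125|      Harish|    Parveen|         25|   BBA|
|126|       Gagan|      Rohit|         24|   BBA|
|127|      Rakesh|     Mayank|         25|   BBA|
|128|     Gnanesh|      Dleep|         26|   BBA|
+---+------------+-----------+-----------+------+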