Drop a column with same name using column index in PySpark

23 July 2024


In this article, we will learn how to drop duplicate columns that share the same name in PySpark, using each column's index to tell them apart.

PySpark provides the drop() function, which removes one or more columns from a DataFrame. Sometimes, though, a DataFrame has several columns with the same name, and you need to remove every duplicate while keeping the first one. Because drop() identifies columns by name, it cannot single out one of several identically named columns. Instead, you can find the index of every repeated column, rename those columns so they can be distinguished, and then remove them with drop().
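The index-finding step needs nothing from Spark, so it can be tried in plain Python on a list of column names. The helper name `duplicate_indices` is hypothetical. It keeps the first occurrence of each name and reports the positions of every later repeat, which is the same rule the examples below apply with `val in df_cols[:idx]`. Tracking already-seen names in a set avoids rescanning the list prefix for every column.

```python
def duplicate_indices(columns):
    """Return the positions of every column whose name already appeared earlier.

    Hypothetical helper: the first occurrence of each name is kept, so only
    later repeats are reported (indices count from 0).
    """
    seen = set()
    repeats = []
    for idx, name in enumerate(columns):
        if name in seen:
            repeats.append(idx)
        else:
            seen.add(name)
    return repeats


# Column names as df.columns would return them in Example 1.
print(duplicate_indices(['name', 'marks', 'marks', 'marks']))  # [2, 3]
```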

Example 1:

In this example, we create a DataFrame with four columns: ‘name’, ‘marks’, ‘marks’, and ‘marks’.


Next, we find the indices of the repeated columns, i.e., 2 and 3 (indices count from 0, so the first ‘marks’ column at index 1 is kept), and append the suffix ‘_duplicate’ to those names in a for loop. Finally, we drop the columns whose names end with ‘_duplicate’ and display the DataFrame.
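The rename-and-collect step can also be sketched on plain lists before running it on a DataFrame. The function `tag_duplicates` and its `tag` and `as_prefix` parameters are hypothetical. It returns the new column list (what would be passed to `df.toDF(*new_cols)`) and the names to pass to `df.drop(*to_drop)`, mirroring the code below. It assumes no original column already has a tagged name. If one did, that column would be dropped as well.

```python
def tag_duplicates(columns, tag='_duplicate', as_prefix=False):
    """Rename every repeated column name and list the renamed names to drop.

    Hypothetical helper. The first occurrence of each name is left unchanged;
    later repeats get `tag` added as a suffix (or as a prefix if as_prefix=True).
    Assumes no original column already carries a tagged name.
    """
    seen = set()
    new_cols = []
    to_drop = []
    for name in columns:
        if name in seen:
            tagged = tag + name if as_prefix else name + tag
            new_cols.append(tagged)
            to_drop.append(tagged)
        else:
            seen.add(name)
            new_cols.append(name)
    return new_cols, to_drop


new_cols, to_drop = tag_duplicates(['name', 'marks', 'marks', 'marks'])
print(new_cols)  # ['name', 'marks', 'marks_duplicate', 'marks_duplicate']
print(to_drop)   # ['marks_duplicate', 'marks_duplicate']
```

For the prefix variant used in Example 2, `tag_duplicates(cols, tag='day_', as_prefix=True)` turns the last three ‘temperature’ columns into ‘day_temperature’ and leaves the first one unchanged.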

Python3

# Python program to drop a column with same  
# name using column index in PySpark 
  
# Import the library SparkSession 
from pyspark.sql import SparkSession 
  
# Create a spark session using getOrCreate() function 
spark_session = SparkSession.builder.getOrCreate() 
  
# Create a data frame with duplicate column names 
df = spark_session.createDataFrame( 
  [('Arun',1,2,3),('Aniket',4,5,6), 
                  ('Ishita',7,8,9)], 
  ['name','marks','marks','marks']) 
  
# Store all the column names in the list 
df_cols = df.columns 
  
# Get index of the duplicate columns 
duplicate_col_index = [idx for idx, 
  val in enumerate(df_cols) if val in df_cols[:idx]] 
  
# Create a new list by renaming duplicate  
# columns by adding the suffix '_duplicate' 
for i in duplicate_col_index: 
    df_cols[i] = df_cols[i] + '_duplicate'
  
# Rename the duplicate columns in data frame 
df = df.toDF(*df_cols) 
  
# Collect the renamed (duplicate) columns to be removed 
cols_to_remove = [col for col in df_cols if col.endswith('_duplicate')] 
  
# Remove the columns with same name 
df.drop(*cols_to_remove).show()

Output:

+------+-----+
|  name|marks|
+------+-----+
|  Arun|    1|
|Aniket|    4|
|Ishita|    7|
+------+-----+

Example 2:

In this example, we create a DataFrame with five columns: ‘day’, ‘temperature’, ‘temperature’, ‘temperature’, and ‘temperature’.

Next, we find the indices of the repeated columns, i.e., 2, 3, and 4, and add the prefix ‘day_’ to those names in a for loop. Finally, we drop the columns whose names start with ‘day_’ and display the DataFrame.

Python3

# Python program to drop a column with same  
# name using column index in PySpark 
  
# Import the library SparkSession 
from pyspark.sql import SparkSession 
  
# Create a spark session using getOrCreate() function 
spark_session = SparkSession.builder.getOrCreate() 
  
# Create a data frame with duplicate column names 
df = spark_session.createDataFrame( 
  [('Monday',25,27,29,30),('Tuesday',40,38,36,34), 
   ('Wednesday',18,20,22,17),('Thursday',25,27,29,19)], 
  ['day','temperature','temperature','temperature', 
                                      'temperature']) 
  
# Store all the column names in the list 
df_cols = df.columns 
  
# Get index of the duplicate columns 
duplicate_col_index = [idx for idx, 
   val in enumerate(df_cols) if val in df_cols[:idx]] 
  
# Create a new list by renaming duplicate  
# columns by adding prefix 'day_' 
for i in duplicate_col_index: 
    df_cols[i] = 'day_'+ df_cols[i] 
  
# Rename the duplicate columns in data frame 
df = df.toDF(*df_cols) 
  
# Collect the renamed (duplicate) columns to be removed 
cols_to_remove = [col for col in df_cols if col.startswith('day_')] 
  
# Remove the columns with same name 
df.drop(*cols_to_remove).show()

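Output:

+---------+-----------+
|      day|temperature|
+---------+-----------+
|   Monday|         25|
|  Tuesday|         40|
|Wednesday|         18|
| Thursday|         25|
+---------+-----------+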
