Wednesday, July 3, 2024

How to Find & Drop duplicate columns in a Pandas DataFrame?

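Let’s discuss how to find and drop duplicate columns in a Pandas DataFrame. First, let’s create a simple DataFrame with the columns ‘Name’, ‘Age’, ‘Domicile’, and a fourth column that is named ‘Marks’ when we compare column contents or ‘Age’ when we need a repeated column label.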

Find duplicate columns from a DataFrame

To find duplicate columns, we iterate over every column of the DataFrame and, for each one, check whether any later column has the same contents. If it does, the name of the later column is added to a set of duplicate column names. At the end, the function returns that set as a list of column names.

Python3




import pandas as pd
 
def getDuplicateColumns(df):
 
    # Create an empty set
    duplicateColumnNames = set()
 
    # Iterate through all the columns
    # of dataframe
    for x in range(df.shape[1]):
 
        # Take column at xth index.
        col = df.iloc[:, x]
 
        # Iterate through all the columns in
        # DataFrame from (x + 1)th index to
        # last index
        for y in range(x + 1, df.shape[1]):
 
            # Take column at yth index.
            otherCol = df.iloc[:, y]
 
            # Check if two columns at x & y
            # index are equal or not,
            # if equal then adding
            # to the set
            if col.equals(otherCol):
                duplicateColumnNames.add(df.columns.values[y])
 
    # Return list of unique column names
    # whose contents are duplicates.
    return list(duplicateColumnNames)
 
 
# Driver code
if __name__ == "__main__":
 
    # List of Tuples
    students = [
        ('Ankit', 34, 'Uttar pradesh', 34),
        ('Riti', 30, 'Delhi', 30),
        ('Aadi', 16, 'Delhi', 16),
        ('Riti', 30, 'Delhi', 30),
        ('Riti', 30, 'Delhi', 30),
        ('Riti', 30, 'Mumbai', 30),
        ('Ankita', 40, 'Bihar', 40),
        ('Sachin', 30, 'Delhi', 30)
    ]
 
    # Create a DataFrame object
    df = pd.DataFrame(students, columns=['Name', 'Age', 'Domicile', 'Marks'])
 
    # Get list of duplicate columns
    duplicateColNames = getDuplicateColumns(df)
 
    for column in duplicateColNames:
        print('Column Name : ', column)


Output:

Column Name :  Marks
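
The nested loop above can also be written as a single mask. The following is a minimal sketch, not a drop-in replacement: it assumes the DataFrame is small enough to transpose cheaply, and it uses a shortened three-row version of the student data for illustration.

```python
import pandas as pd

# Shortened sample data (assumption: three rows are enough to illustrate)
students = [
    ("Ankit", 34, "Uttar pradesh", 34),
    ("Riti", 30, "Delhi", 30),
    ("Aadi", 16, "Delhi", 16),
]
df = pd.DataFrame(students, columns=["Name", "Age", "Domicile", "Marks"])

# Transposing turns columns into rows, so DataFrame.duplicated() flags
# every column whose values repeat an earlier column.
mask = df.T.duplicated().to_numpy()

# Pick the labels of the flagged columns.
duplicate_names = df.columns[mask].tolist()
print(duplicate_names)  # ['Marks']
```

Transposing a DataFrame with mixed types copies its values into object arrays, so the explicit loop may still be preferable for very wide or very long data.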

Remove duplicate columns from a DataFrame

Method 1: Drop duplicate columns from a DataFrame using drop_duplicates()

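The Pandas drop_duplicates() method removes duplicate rows from a DataFrame. To apply it to columns, transpose the DataFrame with .T so that columns become rows, drop the duplicates, and transpose back. This compares column contents, so ‘Marks’ is removed because it repeats ‘Age’. Transposing a DataFrame with mixed types converts its values to the object dtype, so the columns of the result are object columns.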

Python3




# Drop duplicate columns
df2 = df.T.drop_duplicates().T
print(df2)


Output:

     Name Age       Domicile
0   Ankit  34  Uttar pradesh
1    Riti  30          Delhi
2    Aadi  16          Delhi
3    Riti  30          Delhi
4    Riti  30          Delhi
5    Riti  30         Mumbai
6  Ankita  40          Bihar
7  Sachin  30          Delhi
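
Because the transpose round trip leaves every column with the object dtype, it can help to use the transpose only to build a mask and then select from the original DataFrame. A minimal sketch under that assumption, using a two-row sample for brevity:

```python
import pandas as pd

df = pd.DataFrame(
    [("Ankit", 34, "Uttar pradesh", 34), ("Riti", 30, "Delhi", 30)],
    columns=["Name", "Age", "Domicile", "Marks"],
)

# Round trip through the transpose: 'Age' comes back as an object column.
round_trip = df.T.drop_duplicates().T
print(round_trip.dtypes["Age"])  # object

# Use the transpose only to find duplicates, then select from the original,
# which keeps the original integer dtype of 'Age'.
mask = df.T.duplicated().to_numpy()
preserved = df.loc[:, ~mask]
print(preserved.dtypes["Age"])
```

Both results contain the same three columns; only the dtypes differ.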

Method 2: Remove duplicate columns from a DataFrame using df.loc[]

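The Pandas df.loc[] indexer selects rows and columns by label or by a boolean array. Here the boolean array comes from df.columns.duplicated(), which flags repeated column labels, not repeated contents, so this example uses a DataFrame whose fourth column is also named ‘Age’. The first ‘Age’ column is kept and the repeated one is removed.

Python3




# DataFrame whose last column repeats the label 'Age'
df = pd.DataFrame(students, columns=['Name', 'Age', 'Domicile', 'Age'])
# Remove duplicate columns pandas DataFrame
df2 = df.loc[:,~df.columns.duplicated()]
print(df2)


Output:

     Name  Age       Domicile
0   Ankit   34  Uttar pradesh
1    Riti   30          Delhi
2    Aadi   16          Delhi
3    Riti   30          Delhi
4    Riti   30          Delhi
5    Riti   30         Mumbai
6  Ankita   40          Bihar
7  Sachin   30          Delhi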
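
Which occurrence survives is controlled by the keep parameter of duplicated(). The sketch below is illustrative only: the two ‘Age’ columns are given different values (an assumption made for this example) so that it is visible which one is kept.

```python
import pandas as pd

df = pd.DataFrame(
    [("Ankit", 34, "Uttar pradesh", 35), ("Riti", 30, "Delhi", 31)],
    columns=["Name", "Age", "Domicile", "Age"],
)

first_mask = df.columns.duplicated()            # keep="first" is the default
last_mask = df.columns.duplicated(keep="last")  # flag all but the last occurrence
all_mask = df.columns.duplicated(keep=False)    # flag every repeated label

print(first_mask.tolist())  # [False, False, False, True]
print(last_mask.tolist())   # [False, True, False, False]
print(all_mask.tolist())    # [False, True, False, True]

# Keep the last 'Age' column instead of the first.
kept_last = df.loc[:, ~last_mask]
print(kept_last["Age"].tolist())  # [35, 31]
```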

Method 3: Remove duplicate columns from a DataFrame using df.columns.duplicated()

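Index.duplicated(), called here as df.columns.duplicated(), returns a boolean array that is True for every column label that repeats an earlier one. Indexing df.columns with that array gives the repeated labels, which we pass to drop(). Because drop() matches columns by label, every column carrying a repeated label is removed, including its first occurrence, so both ‘Age’ columns disappear in the output.

Python3




# DataFrame whose last column repeats the label 'Age'
df = pd.DataFrame(students, columns=['Name', 'Age', 'Domicile', 'Age'])

# Use DataFrame.columns.duplicated() to drop duplicate columns
duplicate_cols = df.columns[df.columns.duplicated()]
df.drop(columns=duplicate_cols, inplace=True)
print(df)


Output:

     Name       Domicile
0   Ankit  Uttar pradesh
1    Riti          Delhi
2    Aadi          Delhi
3    Riti          Delhi
4    Riti          Delhi
5    Riti         Mumbai
6  Ankita          Bihar
7  Sachin          Delhi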
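
The difference between dropping by label and selecting with a boolean mask is easy to miss, so here is a small side-by-side sketch. It uses a two-row sample with a repeated ‘Age’ label and avoids inplace=True so that the original DataFrame stays available for comparison.

```python
import pandas as pd

df = pd.DataFrame(
    [("Ankit", 34, "Uttar pradesh", 34), ("Riti", 30, "Delhi", 30)],
    columns=["Name", "Age", "Domicile", "Age"],
)

duplicate_cols = df.columns[df.columns.duplicated()]

# drop() matches by label, so both 'Age' columns are removed.
dropped = df.drop(columns=duplicate_cols)
print(list(dropped.columns))  # ['Name', 'Domicile']

# A boolean mask selects by position, so the first 'Age' column survives.
kept_first = df.loc[:, ~df.columns.duplicated()]
print(list(kept_first.columns))  # ['Name', 'Age', 'Domicile']
```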

Method 4: Drop duplicate columns in a DataFrame using df.drop

To remove the duplicate columns, we can pass the list of duplicate column names returned by our user-defined function getDuplicateColumns() to the DataFrame.drop() method.

Python3




# import pandas library
import pandas as pd
 
def getDuplicateColumns(df):
 
    # Create an empty set
    duplicateColumnNames = set()
     
    # Iterate through all the columns
    # of dataframe
    for x in range(df.shape[1]):
         
        # Take column at xth index.
        col = df.iloc[:, x]
         
        # Iterate through all the columns in
        # DataFrame from (x + 1)th index to
        # last index
        for y in range(x + 1, df.shape[1]):
             
            # Take column at yth index.
            otherCol = df.iloc[:, y]
             
            # Check if two columns at x & y
            # index are equal or not,
            # if equal then adding
            # to the set
            if col.equals(otherCol):
                duplicateColumnNames.add(df.columns.values[y])
                 
    # Return list of unique column names
    # whose contents are duplicates.
    return list(duplicateColumnNames)
 
# Driver code
if __name__ == "__main__" :
 
    # List of Tuples
    students = [
            ('Ankit', 34, 'Uttar pradesh', 34),
            ('Riti', 30, 'Delhi', 30),
            ('Aadi', 16, 'Delhi', 16),
            ('Riti', 30, 'Delhi', 30),
            ('Riti', 30, 'Delhi', 30),
            ('Riti', 30, 'Mumbai', 30),
            ('Ankita', 40, 'Bihar', 40),
            ('Sachin', 30, 'Delhi', 30)
        ]
 
    # Create a DataFrame object
    df = pd.DataFrame(students,
                        columns =['Name', 'Age', 'Domicile', 'Marks'])
 
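    # Dropping duplicate columns
    rslt_df = df.drop(columns = getDuplicateColumns(df))

    print("Resultant Dataframe :")

    # Show the dataframe
    print(rslt_df)


Output:

Resultant Dataframe :
     Name  Age       Domicile
0   Ankit   34  Uttar pradesh
1    Riti   30          Delhi
2    Aadi   16          Delhi
3    Riti   30          Delhi
4    Riti   30          Delhi
5    Riti   30         Mumbai
6  Ankita   40          Bihar
7  Sachin   30          Delhi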
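
Dropping by name relies on the column labels being unique. If a DataFrame can contain repeated labels as well as repeated contents, a position-based variant may be safer. The following is a sketch only: drop_duplicate_columns_by_position is a hypothetical helper written for this example and is not part of pandas.

```python
import pandas as pd

def drop_duplicate_columns_by_position(df):
    """Hypothetical helper: keep the first of each group of identical columns."""
    keep = []
    for x in range(df.shape[1]):
        col = df.iloc[:, x]
        # Keep this column only if it differs from every column kept so far.
        if not any(col.equals(df.iloc[:, k]) for k in keep):
            keep.append(x)
    # Select by position, so repeated labels cannot cause extra drops.
    return df.iloc[:, keep]

df = pd.DataFrame(
    [("Ankit", 34, 34), ("Riti", 30, 30)],
    columns=["Name", "Age", "Age"],
)
result = drop_duplicate_columns_by_position(df)
print(list(result.columns))  # ['Name', 'Age']
```

Like getDuplicateColumns(), this compares columns with Series.equals(), so it is still quadratic in the number of columns in the worst case.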

 

Dominic Rubhabha Wardslaus