Python | Pandas dataframe.drop_duplicates()

28 July 2024

1

Pandas drop_duplicates() method helps in removing duplicates from the Pandas Dataframe In Python.

Syntax of df.drop_duplicates()

Syntax: DataFrame.drop_duplicates(subset=None, keep=’first’, inplace=False)

Parameters:

subset: Subset takes a column or list of column label. It’s default value is none. After passing columns, it will consider them only for duplicates.

keep: keep is to control how to consider duplicate value. It has only three distinct value and default is ‘first’.

If ‘first‘, it considers first value as unique and rest of the same values as duplicate.

If ‘last‘, it considers last value as unique and rest of the same values as duplicate.

If False, it consider all of the same values as duplicates

inplace: Boolean values, removes rows with duplicates if True.

Return type: DataFrame with removed duplicate rows depending on Arguments passed.

Example:

As we can see one of the TeamA and team has been dropped due to duplicate value.

Python3

import pandas as pd
  
data = {
    "A": ["TeamA", "TeamB", "TeamB", "TeamC", "TeamA"],
    "B": [50, 40, 40, 30, 50],
    "C": [True, False, False, False, True]
}
  
df = pd.DataFrame(data)
  
display(df.drop_duplicates())

Output:

    A        B    C
0    TeamA    50    True
1    TeamB    40    False
3    TeamC    30    False

To download the CSV file used, Click Here.

Example 1: Removing rows with the same First Name

In the following example, rows having the same First Name are removed and a new data frame is returned.

Python3

# importing pandas package
import pandas as pd
  
# making data frame from csv file
data = pd.read_csv("employees.csv")
  
# sorting by first name
data.sort_values("First Name", inplace=True)
  
# dropping ALL duplicate values
data.drop_duplicates(subset="First Name",
                     keep=False, inplace=True)
  
# displaying data
data

Output:

As shown in the image, the rows with the same names were removed from a data frame.

Example 2: Removing rows with all duplicate values

In this example, rows having all values will be removed. Since the CSV file isn’t having such a row, a random row is duplicated and inserted into the data frame first.

Python3

# length before adding row
length1 = len(data)
  
# manually inserting duplicate of a row of row 440
data.loc[1001] = [data["First Name"][440],
                  data["Gender"][440],
                  data["Start Date"][440],
                  data["Last Login Time"][440],
                  data["Salary"][440],
                  data["Bonus %"][440],
                  data["Senior Management"][440],
                  data["Team"][440]]
  
# length after adding row
length2 = len(data)
  
# sorting by first name
data.sort_values("First Name", inplace=True)
  
# dropping duplicate values
data.drop_duplicates(keep=False, inplace=True)
  
# length after removing duplicates
length3 = len(data)
  
# printing all data frame lengths
print(length1, length2, length3)

Output:

As shown in the output image, the length after removing duplicates is 999. Since the keep parameter was set to False, all of the duplicate rows were removed.

Python | Pandas dataframe.drop_duplicates()

Syntax of df.drop_duplicates()

Example:

Python3

Example 1: Removing rows with the same First Name

Python3

Example 2: Removing rows with all duplicate values

Python3

Java Program for Longest Common Subsequence

Maximum height of Tree when any Node can be considered as Root

Print Fibonacci sequence using 2 variables

LEAVE A REPLY Cancel reply

Most Popular

5 Best VPNs for Brunei in 2025: Surf & Stream Privately by Raven Wu

NordVPN vs. Mullvad VPN 2025: Which VPN Is Better? by Gjurgjica Panova

Surfshark vs. Atlas VPN 2025: Which VPN Is Better? by Gjurgjica Panova

PureVPN vs. Private Internet Access 2025: Which Is Better? by Gjurgjica Panova

Recent Comments

EDITOR PICKS

5 Best VPNs for Brunei in 2025: Surf & Stream Privately by Raven Wu

NordVPN vs. Mullvad VPN 2025: Which VPN Is Better? by Gjurgjica Panova

Surfshark vs. Atlas VPN 2025: Which VPN Is Better? by Gjurgjica Panova

POPULAR POSTS

5 Best VPNs for Brunei in 2025: Surf & Stream Privately by Raven Wu

NordVPN vs. Mullvad VPN 2025: Which VPN Is Better? by Gjurgjica Panova

Surfshark vs. Atlas VPN 2025: Which VPN Is Better? by Gjurgjica Panova

POPULAR CATEGORY

ABOUT US

FOLLOW US