Python | Pandas DataFrame.duplicated()

25 June 2025

Python is a great language for data analysis, primarily because of its ecosystem of data-centric Python packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.
An important part of data analysis is finding duplicate values and removing them. The Pandas duplicated() method only identifies duplicates; it does not remove them. It returns a boolean Series that is True for each row identified as a duplicate and False otherwise.
Syntax:

DataFrame.duplicated(subset=None, keep='first')

Parameters:

subset: Takes a column label or a list of column labels. Its default value is None, in which case all columns are used. When columns are passed, only those columns are considered when identifying duplicates.
keep: Controls which occurrences are marked as duplicates. It accepts three values, ‘first’, ‘last’ and False, and the default is ‘first’.
–> If ‘first’, it treats the first occurrence as unique and the rest of the same values as duplicates.
–> If ‘last’, it treats the last occurrence as unique and the rest of the same values as duplicates.
–> If False, it treats all of the same values as duplicates.
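
The effect of subset and keep can be seen on a small frame. The following is a minimal sketch; the sample rows and the Team column are made up for illustration and are not taken from employees.csv.

```python
import pandas as pd

# Hypothetical sample data, not taken from employees.csv
df = pd.DataFrame({
    "First Name": ["Ann", "Bob", "Ann", "Ann"],
    "Team": ["Sales", "Sales", "Sales", "Legal"],
})

# subset=None (default): whole rows are compared
all_columns = df.duplicated()

# Only the "First Name" column is compared
first = df.duplicated(subset="First Name", keep="first")
last = df.duplicated(subset="First Name", keep="last")
every = df.duplicated(subset="First Name", keep=False)

print(all_columns.tolist())  # [False, False, True, False]
print(first.tolist())        # [False, False, True, True]
print(last.tolist())         # [True, False, True, False]
print(every.tolist())        # [True, False, True, True]
```

In every case True marks a row that is treated as a duplicate, so the three keep settings differ only in which occurrence, if any, is left unmarked.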

The examples below read a CSV file named employees.csv that contains a First Name column.
Example #1: Returning a boolean series
In the following example, a boolean series is returned on the basis of duplicate values in the First Name column.

Python

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("employees.csv")

# sorting by first name
data = data.sort_values("First Name")

# making a bool series: True for each repeated first name
bool_series = data["First Name"].duplicated()

# displaying the first rows of the sorted data
print(data.head())

# displaying only the rows flagged as duplicates
print(data[bool_series])

Output:
Because the keep parameter was left at its default of ‘first’, the first occurrence of each name is treated as unique and every later occurrence is flagged as a duplicate.
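
Since the CSV file is not reproduced here, the same steps can be tried on an inline frame. This is a sketch under stated assumptions: the names and the Salary values are hypothetical stand-ins for the real file.

```python
import pandas as pd

# Hypothetical rows standing in for employees.csv
data = pd.DataFrame({
    "First Name": ["Maria", "Aaron", "Maria", "Aaron", "Zoe"],
    "Salary": [100, 200, 300, 400, 500],
})

# Sort by first name, as in Example #1
data = data.sort_values("First Name")

# True for every repeat of a name after its first occurrence
bool_series = data["First Name"].duplicated()

# Rows flagged as duplicates: one "Aaron" row and one "Maria" row
repeats = data[bool_series]
print(repeats)

# Summing the boolean Series counts the flagged rows
print(bool_series.sum())
```

Each name that occurs twice contributes one flagged row, and "Zoe", which occurs once, contributes none.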

Example #2: Removing duplicates
In this example, the keep parameter is set to False, so every occurrence of a repeated name is marked as a duplicate. Filtering with the negated series then keeps only the rows whose First Name appears exactly once.

Python

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("employees.csv")

# sorting by first name
data = data.sort_values("First Name")

# making a bool series: True for every name that occurs more than once
bool_series = data["First Name"].duplicated(keep=False)

# displaying the bool series
print(bool_series)

# passing NOT of bool series to keep only names that occur once
data = data[~bool_series]

# displaying data
data.info()
print(data)

Output:
Since the duplicated() method returns True for duplicates, and keep=False marks every occurrence of a repeated name, the NOT of the series is taken to keep only the rows whose First Name is unique in the DataFrame.
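
The negated mask can be checked against drop_duplicates(), which accepts the same subset and keep arguments. The following is a minimal sketch on hypothetical inline data rather than the real employees.csv.

```python
import pandas as pd

# Hypothetical rows standing in for employees.csv
data = pd.DataFrame({
    "First Name": ["Maria", "Aaron", "Maria", "Zoe"],
    "Salary": [100, 200, 300, 400],
})

# True for both "Maria" rows, False for "Aaron" and "Zoe"
mask = data["First Name"].duplicated(keep=False)

# Keep only the rows whose first name occurs exactly once
unique_only = data[~mask]

# drop_duplicates with the same arguments removes the same rows
same_result = data.drop_duplicates(subset="First Name", keep=False)

print(unique_only["First Name"].tolist())  # ['Aaron', 'Zoe']
print(unique_only.equals(same_result))     # True
```

The mask form is useful when the flagged rows themselves need to be inspected before anything is dropped.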
