Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.
Pandas Index.duplicated() function indicates which values in an Index are duplicates. It does not remove anything: it returns a boolean array in which duplicated values are marked True. You can mark all duplicates, all except the first occurrence, or all except the last occurrence.
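To make the three marking modes concrete, here is a minimal pure-Python sketch of the same logic. The helper `mark_duplicates` is hypothetical (it is not part of pandas) and assumes hashable values without NaN; it only checks itself against `Index.duplicated()` to show that the semantics match on this input.

```python
import pandas as pd

def mark_duplicates(values, keep="first"):
    """Hypothetical helper mirroring the keep modes of Index.duplicated()."""
    if keep == "first":
        seen = set()
        result = []
        for v in values:
            result.append(v in seen)  # True if this value appeared earlier
            seen.add(v)
        return result
    if keep == "last":
        # Scan from the right with 'first' logic, then restore the original order
        return mark_duplicates(values[::-1], keep="first")[::-1]
    if keep is False:
        counts = {}
        for v in values:
            counts[v] = counts.get(v, 0) + 1
        return [counts[v] > 1 for v in values]  # every member of a repeated group
    raise ValueError("keep must be 'first', 'last' or False")

values = ["Labrador", "Beagle", "Labrador", "Lhasa", "Husky", "Beagle"]
idx = pd.Index(values)

for keep in ("first", "last", False):
    # Compare the sketch with the real pandas method
    assert mark_duplicates(values, keep) == idx.duplicated(keep=keep).tolist()
    print(keep, mark_duplicates(values, keep))
```

The `'last'` mode is simply the `'first'` mode applied to the reversed sequence, which is why only two real cases are needed.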
Syntax: Index.duplicated(keep='first')
Parameters :
keep : {'first', 'last', False}, default 'first'
Which occurrence(s) in each set of duplicates to mark as True.
-> 'first' : Mark duplicates as True except for the first occurrence.
-> 'last' : Mark duplicates as True except for the last occurrence.
-> False : Mark all duplicates as True.
Returns : numpy.ndarray of booleans, with one entry per element of the Index
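A quick way to see the effect of each `keep` option is to call the method three times on the same Index. This sketch reuses the dog-breed labels from Example #1 below and converts the boolean arrays with `.tolist()` only to make the printed output easier to read.

```python
import pandas as pd

idx = pd.Index(["Labrador", "Beagle", "Labrador", "Lhasa", "Husky", "Beagle"])

# Each call returns a boolean NumPy array with one entry per Index element
first = idx.duplicated(keep="first")  # later repeats are True
last = idx.duplicated(keep="last")    # earlier repeats are True
every = idx.duplicated(keep=False)    # every member of a duplicate group is True

print(first.tolist())  # [False, False, True, False, False, True]
print(last.tolist())   # [True, True, False, False, False, False]
print(every.tolist())  # [True, True, True, False, False, True]
```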
Example #1: Use Index.duplicated() function to mark every duplicated value in the Index except the first occurrence.
# importing pandas as pd
import pandas as pd

# Creating the Index
idx = pd.Index(['Labrador', 'Beagle', 'Labrador', 'Lhasa', 'Husky', 'Beagle'])

# Print the Index
idx
Output :
Let's check whether each value in the Index is a duplicate or unique.
# Identify the duplicated values except the first
idx.duplicated(keep='first')
Output :
array([False, False,  True, False, False,  True])
As we can see in the output, Index.duplicated() has marked every repeated occurrence of a value as True, except the first occurrence.
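Because the result is a boolean mask rather than a new Index, it is typically used for filtering. This sketch inverts the mask to keep only the first occurrence of each label and checks that the result matches `Index.drop_duplicates()`, the method that actually removes duplicates.

```python
import pandas as pd

idx = pd.Index(["Labrador", "Beagle", "Labrador", "Lhasa", "Husky", "Beagle"])

mask = idx.duplicated(keep="first")

unique_idx = idx[~mask]  # keep only the first occurrence of each label
repeats = idx[mask]      # the later, repeated occurrences

print(unique_idx.tolist())  # ['Labrador', 'Beagle', 'Lhasa', 'Husky']
print(repeats.tolist())     # ['Labrador', 'Beagle']

# drop_duplicates() produces the same Index in a single call
assert unique_idx.equals(idx.drop_duplicates(keep="first"))
```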
Example #2: Use Index.duplicated() function to identify all the duplicate values. Here, every occurrence of a duplicated value will be marked as True.
# importing pandas as pd
import pandas as pd

# Creating the Index
idx = pd.Index([100, 50, 45, 100, 12, 50, None])

# Print the Index
idx
Output :
Let’s identify all the duplicated values in the Index.
Note : The Index contains a NaN value (pandas stores the None as NaN).
# Identify all duplicated occurrences of values
idx.duplicated(keep=False)
Output :
array([ True,  True, False,  True, False,  True, False])
The function has marked every occurrence of a duplicated value as True. It treated the single NaN value as unique and marked it False.
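The single NaN above is unique only because it appears once. When an Index holds more than one NaN, pandas treats the NaN entries as equal to each other for duplicate detection, as this small sketch with an illustrative float Index shows.

```python
import numpy as np
import pandas as pd

# Illustrative float Index with two NaN entries
nan_idx = pd.Index([100.0, 50.0, np.nan, 45.0, np.nan])

# keep='first': only the second NaN is flagged
print(nan_idx.duplicated(keep="first").tolist())  # [False, False, False, False, True]

# keep=False: both NaN entries are flagged
print(nan_idx.duplicated(keep=False).tolist())    # [False, False, True, False, True]
```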