Prerequisite: Pandas
In this article, we will discuss various methods to obtain the unique values from multiple columns of a pandas DataFrame.
Method 1: Using pandas unique() and concat() methods
A pandas Series (a single DataFrame column) has a unique() method that returns the distinct values of that column in order of first appearance. In the example below, the first output shows only the unique FirstName values. To extend this to several columns, we use the pandas concat() function to stack the desired columns into a single Series and then call unique() on the result.
Python3
import pandas as pd

# Creating a custom dataframe.
df = pd.DataFrame({
    'FirstName': ['Arun', 'Navneet', 'Shilpa', 'Prateek', 'Pyare', 'Prateek'],
    'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla', 'Lal', 'Mishra'],
    'Age': [26, 25, 25, 27, 28, 30]})

# To get unique values in 1 series/column
print(f"Unique FN: {df['FirstName'].unique()}")

# Extending the idea from 1 column to multiple columns:
# stack the columns into one Series, then take its unique values
combined = pd.concat([df['FirstName'], df['LastName'], df['Age']])
print(f"Unique Values from 3 Columns:{combined.unique()}")
Output:
Unique FN: ['Arun' 'Navneet' 'Shilpa' 'Prateek' 'Pyare']
Unique Values from 3 Columns:['Arun' 'Navneet' 'Shilpa' 'Prateek' 'Pyare' 'Singh' 'Yadav' 'Shukla'
 'Lal' 'Mishra' 26 25 27 28 30]
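The concat-then-unique idea from Method 1 can be wrapped in a small helper so that it works for any list of columns. The sketch below is only illustrative: the function name unique_across_columns is a hypothetical helper introduced here, not part of pandas.

```python
import pandas as pd


def unique_across_columns(df, columns):
    """Hypothetical helper: unique values found in any of the given columns."""
    # Stack the selected columns into a single Series (column by column),
    # then let Series.unique() drop the repeated values.
    stacked = pd.concat([df[col] for col in columns], ignore_index=True)
    return stacked.unique()


df = pd.DataFrame({
    'FirstName': ['Arun', 'Navneet', 'Shilpa', 'Prateek', 'Pyare', 'Prateek'],
    'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla', 'Lal', 'Mishra'],
    'Age': [26, 25, 25, 27, 28, 30]})

result = unique_across_columns(df, ['FirstName', 'LastName', 'Age'])
print(result)
```

Because unique() keeps values in order of first appearance, the result lists the FirstName values first, then the LastName values, then the ages.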
Method 2: Using NumPy unique() method
The np.unique() function returns the sorted unique values of the array passed to it. A 2-D array, such as the one obtained from several DataFrame columns via .values, is flattened before the unique values are computed.
Note: This approach has one limitation: because np.unique() sorts the values, str and numerical columns cannot be combined, since comparing a str with a number raises a TypeError. If you need to combine columns of different datatypes, use Method 1.
Python3
import pandas as pd
import numpy as np

# Creating a custom dataframe.
df = pd.DataFrame({
    'FirstName': ['Arun', 'Navneet', 'Shilpa', 'Prateek', 'Pyare', 'Prateek'],
    'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla', 'Lal', 'Mishra'],
    'Age': [26, 25, 25, 27, 28, 30]})

print(np.unique(df[['LastName', 'FirstName']].values))

# Will throw error as Age is numerical datatype
# and LastName is str
# print(np.unique(df[['LastName','Age']].values))
Output:
['Arun' 'Lal' 'Mishra' 'Navneet' 'Prateek' 'Pyare' 'Shilpa' 'Shukla'
 'Singh' 'Yadav']
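The limitation noted for Method 2 can be seen directly by catching the error. The sketch below also shows one possible workaround, which is an assumption of this example rather than something the method requires: casting every column to str first, at the cost of the ages coming back as strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'FirstName': ['Arun', 'Navneet', 'Shilpa', 'Prateek', 'Pyare', 'Prateek'],
    'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla', 'Lal', 'Mishra'],
    'Age': [26, 25, 25, 27, 28, 30]})

# Mixing a str column with a numeric column gives an object array whose
# elements cannot be sorted against each other.
try:
    np.unique(df[['LastName', 'Age']].values)
    error_raised = False
except TypeError as exc:
    error_raised = True
    print(f"np.unique failed: {exc}")

# Possible workaround (assumption of this sketch): cast everything to str
# first. The ages are then returned as strings, not integers.
as_text = np.unique(df[['LastName', 'Age']].astype(str).values)
print(as_text)
```

If the numeric values must keep their original type, Method 1 is the simpler choice.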
Method 3: Using Sets in Python
A Python set contains only unique values, so we convert each Series into a set object and then take the union of the sets. Unlike Method 2, this works for all datatype combinations. Note that a set is unordered, so the order of the printed values may differ from run to run.
Python3
import pandas as pd

# Creating a custom dataframe.
df = pd.DataFrame({
    'FirstName': ['Arun', 'Navneet', 'Shilpa', 'Prateek', 'Pyare', 'Prateek'],
    'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla', 'Lal', 'Mishra'],
    'Age': [26, 25, 25, 27, 28, 30]})

# Typecasting pandas series into set and then
# taking set union (|)
print(set(df.FirstName) | set(df.LastName) | set(df.Age))
Output:
{'Singh', 'Pyare', 'Mishra', 27, 'Navneet', 'Arun', 'Lal', 'Shukla', 30, 25, 26, 'Yadav', 28, 'Shilpa', 'Prateek'}
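The set union in Method 3 can also be written for an arbitrary list of columns with set().union(). Since a set has no defined order, the sketch below additionally sorts the result to make the output reproducible. Sorting by the string form of each value (key=str) is an assumption of this example, chosen only because str and int values cannot be compared with each other directly.

```python
import pandas as pd

df = pd.DataFrame({
    'FirstName': ['Arun', 'Navneet', 'Shilpa', 'Prateek', 'Pyare', 'Prateek'],
    'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla', 'Lal', 'Mishra'],
    'Age': [26, 25, 25, 27, 28, 30]})

columns = ['FirstName', 'LastName', 'Age']

# set().union() accepts any number of iterables, so every column can be
# passed in at once instead of chaining the | operator by hand.
unique_values = set().union(*(df[col] for col in columns))

# Sort by string representation to get a stable, printable order.
ordered = sorted(unique_values, key=str)
print(ordered)
```

With this key the ages come first, because digits sort before letters, followed by the names in alphabetical order.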