Saturday, November 16, 2024
Google search engine
HomeLanguagesPandas – Find unique values from multiple columns

Pandas – Find unique values from multiple columns

Prerequisite: Pandas 

In this article, we will discuss various methods to obtain unique values from multiple columns of Pandas DataFrame.

Method 1: Using pandas Unique() and Concat() methods

Pandas series aka columns has a unique() method that filters out only unique values from a column. The first output shows only unique FirstNames. We can extend this method using pandas concat() method and concat all the desired columns into 1 single column and then find the unique of the resultant column.

Python3




import pandas as pd
import numpy as np
 
# Creating a custom dataframe.
df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                    
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                    
                   'Age': [26, 25, 25, 27, 28, 30]})
 
# To get unique values in 1 series/column
print(f"Unique FN: {df['FirstName'].unique()}")
 
# Extending the idea from 1 column to multiple columns
print(f"Unique Values from 3 Columns:\
{pd.concat([df['FirstName'],df['LastName'],df['Age']]).unique()}")


Output:

Unique FN: [‘Arun’ ‘Navneet’ ‘Shilpa’ ‘Prateek’ ‘Pyare’]

Unique Values from 3 Columns:[‘Arun’ ‘Navneet’ ‘Shilpa’ ‘Prateek’ ‘Pyare’ ‘Singh’ ‘Yadav’ ‘Shukla’

 ‘Lal’ ‘Mishra’ 26 25 27 28 30]

Method 2: Using Numpy.unique() method

With the help of np.unique() method, we can get the unique values from an array given as parameter in np.unique() method.

Note: This approach has one limitation i.e. we cannot combine str and numerical columns together, and therefore if such a situation arises where we need to club different datatypes columns together then go for Method 1.

Python3




import pandas as pd
import numpy as np
 
# Creating a custom dataframe.
df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                    
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                    
                   'Age': [26, 25, 25, 27, 28, 30]})
 
print(np.unique(df[['LastName', 'FirstName']].values))
 
# Will throw error as Age is numerical datatype
# and LastName is str
# print(np.unique(df[['LastName','Age']].values))


Output: 

[‘Arun’ ‘Lal’ ‘Mishra’ ‘Navneet’ ‘Prateek’ ‘Pyare’ ‘Shilpa’ ‘Shukla’

 ‘Singh’ ‘Yadav’]

Method 3: Using Sets in Python 

The Set has a property that only contains unique values and therefore we convert individual series into a Set object and then take the set union of them. Unlike Method 2 this also works for all datatype combinations.  

Python3




import pandas as pd
import numpy as np
 
 
# Creating a custom dataframe.
df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                    
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                    
                   'Age': [26, 25, 25, 27, 28, 30]})
 
# Typecasting pandas series into set and then
# taking set union (|)
print(set(df.FirstName) | set(df.LastName) | set(df.Age))


Output: 

{‘Singh’, ‘Pyare’, ‘Mishra’, 27, ‘Navneet’, ‘Arun’, ‘Lal’, ‘Shukla’, 30, 25, 26, ‘Yadav’, 28, ‘Shilpa’, ‘Prateek’}

 

RELATED ARTICLES

Most Popular

Recent Comments