Wednesday, December 25, 2024

How to Fix: Can only compare identically-labeled series objects

In this article, we look at how to fix the pandas error "Can only compare identically-labeled series objects" in Python.

Reason for Error

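Can only compare identically-labeled series objects: This is a ValueError that pandas raises when two Series or DataFrames (the pandas 2-D data structure) are compared with the == operator and their labels do not match. When DataFrames are compared, the message refers to DataFrame objects rather than Series objects, but the cause is the same: the row indexes or column labels of the two objects differ.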
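
The wording of the message comes from comparing two pandas Series, where the same label check applies. As a minimal sketch (the names `heights_a`, `heights_b`, and `error_message` are illustrative and not part of the original example), the following catches the exception to confirm that it is a ValueError:

```python
import pandas as pd

# Two Series with equal values but different index labels (illustrative names)
heights_a = pd.Series([150, 170, 160], index=[1, 2, 3])
heights_b = pd.Series([150, 170, 160], index=['A', 'B', 'C'])

try:
    heights_a == heights_b
    error_message = None
except ValueError as err:
    # pandas refuses to compare because the labels do not match
    error_message = str(err)

print(error_message)
```

The exact text of the message can vary between pandas versions, so the sketch prints it rather than assuming a fixed string.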

How to Reproduce the Error

Python3




# import necessary packages
import pandas as pd
 
# create 2 dataframes with different indexes
hostelCandidates1 = pd.DataFrame({'Height in CMs': [150, 170, 160],
                                  'Weight in KGs': [70, 55, 60]},
                                 index=[1, 2, 3])
 
hostelCandidates2 = pd.DataFrame({'Height in CMs': [150, 170, 160],
                                  'Weight in KGs': [70, 55, 60]},
                                 index=['A', 'B', 'C'])
 
# displaying 2 dataframes
print(hostelCandidates1)
print(hostelCandidates2)
 
# compare 2 dataframes with '=='
# (raises ValueError because the index labels differ)
hostelCandidates1 == hostelCandidates2


Output:

Even though the data in the 2 DataFrames is the same, their indexes are different. To check whether the data in the 2 DataFrames is the same, use one of the approaches below.

Method 1: With consideration of indexes

Here we compare the data together with the index labels to decide whether the DataFrames are the same. Instead of ‘==’, use the equals method for the comparison.

Python3




# import necessary packages
import pandas as pd
 
# create 2 dataframes with different indexes
hostelCandidates1 = pd.DataFrame({'Height in CMs':
                                  [150, 170, 160],
                                  'Weight in KGs':
                                  [70, 55, 60]},
                                 index=[1, 2, 3])
 
hostelCandidates2 = pd.DataFrame({'Height in CMs':
                                  [150, 170, 160],
                                  'Weight in KGs':
                                  [70, 55, 60]},
                                 index=['A', 'B', 'C'])
 
# displaying 2 dataframes
print(hostelCandidates1)
print(hostelCandidates2)
 
# compare 2 dataframes, including their index labels
print(hostelCandidates1.equals(hostelCandidates2))


Output:

The data is the same, but the index labels of the 2 DataFrames are different, so equals returns False instead of raising an error.
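
To make the behaviour of equals more concrete, here is a small sketch with three comparison cases. The variable names (`base`, `same_labels`, `other_labels`, `float_values`) are made up for illustration. Note that equals also expects matching column dtypes, which is an extra condition beyond the index labels discussed above.

```python
import pandas as pd

data = {'Height in CMs': [150, 170, 160], 'Weight in KGs': [70, 55, 60]}

base = pd.DataFrame(data, index=[1, 2, 3])
same_labels = pd.DataFrame(data, index=[1, 2, 3])
other_labels = pd.DataFrame(data, index=['A', 'B', 'C'])
float_values = base.astype(float)  # same numbers and labels, float dtype

print(base.equals(same_labels))    # True: same data, labels, and dtypes
print(base.equals(other_labels))   # False: index labels differ
print(base.equals(float_values))   # False: column dtypes differ
```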

Method 2: Without consideration of indexes

To drop the index of a DataFrame, use the reset_index method. With drop=True, the existing index labels are discarded and replaced by the default integer index, so the comparison checks only the data, irrespective of the original index values.

Syntax: dataframeName.reset_index(drop=True)
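
As a quick illustration of what the drop argument changes, the sketch below resets the index of a small frame both ways. The names `candidates`, `dropped`, and `kept` are illustrative only.

```python
import pandas as pd

candidates = pd.DataFrame({'Height in CMs': [150, 170, 160]},
                          index=['A', 'B', 'C'])

dropped = candidates.reset_index(drop=True)
kept = candidates.reset_index()  # drop defaults to False

print(list(dropped.index))    # [0, 1, 2]
print(list(dropped.columns))  # ['Height in CMs']
print(list(kept.columns))     # ['index', 'Height in CMs']
```

With drop=True the old labels are discarded. Without it they are kept as a new column, which would then take part in any comparison. In both cases reset_index returns a new DataFrame and leaves the original unchanged.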

There are 2 ways of comparing data:

  • Whole DataFrame
  • Row by Row

Example 1: Entire DataFrame Comparison

Python3




# import necessary packages
import pandas as pd
 
# create 2 dataframes with different indexes
hostelCandidates1 = pd.DataFrame({'Height in CMs':
                                  [150, 170, 160],
                                  'Weight in KGs':
                                  [70, 55, 60]},
                                 index=[1, 2, 3])
 
hostelCandidates2 = pd.DataFrame({'Height in CMs':
                                  [150, 170, 160],
                                  'Weight in KGs':
                                  [70, 55, 60]},
                                 index=['A', 'B', 'C'])
 
# displaying 2 dataframes
print(hostelCandidates1)
print(hostelCandidates2)
 
# compare 2 dataframes after dropping their indexes
print(hostelCandidates1.reset_index(drop=True).equals(
    hostelCandidates2.reset_index(drop=True)))


Output:

Here the data is the same. Even though the indexes are different, the index labels are dropped before the comparison, so it returns True.
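
If this check is needed in several places, it can be wrapped in a small helper. The function name `same_data_ignoring_index` and the sample frames below are hypothetical; this is a sketch of the approach above, not a pandas API.

```python
import pandas as pd


def same_data_ignoring_index(left, right):
    """Return True when both frames hold equal data, ignoring row labels."""
    return left.reset_index(drop=True).equals(right.reset_index(drop=True))


first = pd.DataFrame({'Weight in KGs': [70, 55, 60]}, index=[1, 2, 3])
second = pd.DataFrame({'Weight in KGs': [70, 55, 60]}, index=['A', 'B', 'C'])
third = pd.DataFrame({'Weight in KGs': [70, 55, 65]}, index=['A', 'B', 'C'])

print(same_data_ignoring_index(first, second))  # True
print(same_data_ignoring_index(first, third))   # False
```

The comparison is positional, so the rows must appear in the same order in both frames for the result to be True.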

Example 2: Row by Row Comparison

Python3




# import necessary packages
import pandas as pd
 
# create 2 dataframes with different indexes
hostelCandidates1 = pd.DataFrame({'Height in CMs':
                                  [150, 170, 160],
                                  'Weight in KGs':
                                  [70, 55, 60]},
                                 index=[1, 2, 3])
 
hostelCandidates2 = pd.DataFrame({'Height in CMs':
                                  [150, 170, 160],
                                  'Weight in KGs':
                                  [70, 55, 60]},
                                 index=['A', 'B', 'C'])
 
# displaying 2 dataframes
print(hostelCandidates1)
print(hostelCandidates2)
 
# compare 2 dataframes cell by cell after dropping their indexes
print(hostelCandidates1.reset_index(drop=True)
      == hostelCandidates2.reset_index(drop=True))


Output:

This approach shows where the 2 DataFrames differ: the result is a DataFrame of True/False values, one per cell. The index labels are not compared because they are dropped before the comparison.
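
Because the frames in the example above are identical, every cell comes out True. The sketch below uses a hypothetical second frame in which one weight is changed, and then reduces the True/False result to the rows and columns that contain a difference. The variable names are illustrative.

```python
import pandas as pd

first = pd.DataFrame({'Height in CMs': [150, 170, 160],
                      'Weight in KGs': [70, 55, 60]},
                     index=[1, 2, 3])
# Hypothetical second frame: the last weight differs (65 instead of 60)
second = pd.DataFrame({'Height in CMs': [150, 170, 160],
                       'Weight in KGs': [70, 55, 65]},
                      index=['A', 'B', 'C'])

matches = first.reset_index(drop=True) == second.reset_index(drop=True)

# True for each row / column in which every cell matches
row_ok = matches.all(axis=1)
col_ok = matches.all(axis=0)

# Keep only the rows / columns that contain at least one difference
rows_with_difference = row_ok[~row_ok].index.tolist()
columns_with_difference = col_ok[~col_ok].index.tolist()

print(rows_with_difference)     # [2]
print(columns_with_difference)  # ['Weight in KGs']
```

The row positions are zero-based because the original labels were replaced by the default integer index.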
