Monday, January 6, 2025
Google search engine
HomeLanguagesHow to LEFT ANTI join under some matching condition in Pandas

How to LEFT ANTI join under some matching condition in Pandas

LEFT ANTI Join is the opposite of semi-join. excluding the intersection, it returns the left table. It only returns the columns from the left table and not the right.

Method 1: Using isin()

On the created dataframes we perform left join and subset using isin() function to check if the part on which the datasets are merged is in the subset of the merged dataset.

Syntax:

DataFrame.isin(values)

Parameters

  • values: iterable, Series, DataFrame or dict

Returns

DataFrame

Example:

In the below code, we used the indicator to find the rows which are ‘Left_only’ and subset the merged dataset, and assign it to df. finally, we retrieve the part which is only in our first data frame df1. the output is antijoin of the two data frames.

Python3




# importing packages
import pandas as pd
  
# anti-join
# creating dataframes using pd.DataFrame() method.
df1 = pd.DataFrame({
    "city": ["new york", "chicago", "orlando", 'mumbai'],
    "temperature": [21, 14, 35, 30],
    "humidity": [65, 68, 75, 75],
})
df2 = pd.DataFrame({
    "city": ["chicago", "new york", "orlando"],
    "humidity": [67, 60, 70]
})
  
# carrying out anti join using merge method
df3 = df1.merge(df2, on='city', how='left', indicator=True)
  
df = df3.loc[df3['_merge'] == 'left_only', 'city']
  
d = df1[df1['city'].isin(df)]
  
print(d)


Output:

     city  temperature  humidity
3  mumbai           30        75

Method 2: Using semi join

We can use the ‘~’ operator on the semi-join. It results in anti-join. 

Semi-join: Similar to inner join, semi-join returns the intersection but it only returns the columns from the left table and not the right. it has no duplicate values. 

Syntax:

[~df1[‘column_name’].isin(df2[‘column_name’])]

where, 

  • df1 is the first dataframe
  • df2 is the second dataframe
  • column_name is the matching column in both the dataframes

Example:

In this example, we merge df1 and df2 on ‘city’ by default it is  ‘inner join’, after merging, We exclude the part of df1 which is in df3 and print out the resultant dataframe. 

Python3




# code
import pandas as pd
  
# inverse of semi-join:
# creating dataframes using pd.DataFrame() method.
df1 = pd.DataFrame({
    "city": ["new york", "chicago", "orlando", 'mumbai'],
    "temperature": [21, 14, 35, 30],
    "humidity": [65, 68, 75, 75],
})
df2 = pd.DataFrame({
    "city": ["chicago", "new york", "orlando"],
    "humidity": [67, 60, 70]
})
  
# carrying out anti join using merge method
df3 = df1.merge(df2, on='city')
  
df = df1[~df1['city'].isin(df3['city'])]
  
print(df)


Output:

     city  temperature  humidity
3  mumbai           30        75

RELATED ARTICLES

Most Popular

Recent Comments