In this article, let’s discuss how to filter pandas dataframe with multiple conditions. There are possibilities of filtering data from Pandas dataframe with multiple conditions during the entire software development.
Filter Pandas Dataframe with multiple conditions
The reason is dataframe may be having multiple columns and multiple rows. Selective display of columns with limited rows is always the expected view of users. To fulfill the user’s expectations and also help in machine deep learning scenarios, filtering of Pandas dataframe with multiple conditions is much necessary.
Let us see the different ways to do the same.
Creating a sample dataframe to proceed further
Python3
# import module import pandas as pd # assign data dataFrame = pd.DataFrame({ 'Name' : [ ' RACHEL ' , ' MONICA ' , ' PHOEBE ' , ' ROSS ' , 'CHANDLER' , ' JOEY ' ], 'Age' : [ 30 , 35 , 37 , 33 , 34 , 30 ], 'Salary' : [ 100000 , 93000 , 88000 , 120000 , 94000 , 95000 ], 'JOB' : [ 'DESIGNER' , 'CHEF' , 'MASUS' , 'PALENTOLOGY' , 'IT' , 'ARTIST' ]}) # display dataframe display(dataFrame) |
Output:
Filter Pandas Dataframe with multiple conditions Using loc
Here we will get all rows having Salary greater or equal to 100000 and Age < 40 and their JOB starts with ‘D’ from the dataframe. Print the details with Name and their JOB. For the above requirement, we can achieve this by using loc. It is used to access single or more rows and columns by label(s) or by a boolean array. loc works with column labels and indexes.
Python3
# import module import pandas as pd # assign data dataFrame = pd.DataFrame({ 'Name' : [ ' RACHEL ' , ' MONICA ' , ' PHOEBE ' , ' ROSS ' , 'CHANDLER' , ' JOEY ' ], 'Age' : [ 30 , 35 , 37 , 33 , 34 , 30 ], 'Salary' : [ 100000 , 93000 , 88000 , 120000 , 94000 , 95000 ], 'JOB' : [ 'DESIGNER' , 'CHEF' , 'MASUS' , 'PALENTOLOGY' , 'IT' , 'ARTIST' ]}) # filter dataframe display(dataFrame.loc[(dataFrame[ 'Salary' ]> = 100000 ) & (dataFrame[ 'Age' ]< 40 ) & (dataFrame[ 'JOB' ]. str .startswith( 'D' )), [ 'Name' , 'JOB' ]]) |
Output:
Output resolves for the given conditions and finally, we are going to show only 2 columns namely Name and JOB.
Filter Pandas Dataframe Using NumPy
Here will get all rows having Salary greater or equal to 100000 and Age < 40 and their JOB starts with ‘D’ from the data frame. We need to use NumPy.
Python3
# import module import pandas as pd import numpy as np # assign data dataFrame = pd.DataFrame({ 'Name' : [ ' RACHEL ' , ' MONICA ' , ' PHOEBE ' , ' ROSS ' , 'CHANDLER' , ' JOEY ' ], 'Age' : [ 30 , 35 , 37 , 33 , 34 , 30 ], 'Salary' : [ 100000 , 93000 , 88000 , 120000 , 94000 , 95000 ], 'JOB' : [ 'DESIGNER' , 'CHEF' , 'MASUS' , 'PALENTOLOGY' , 'IT' , 'ARTIST' ]}) # filter dataframe filtered_values = np.where((dataFrame[ 'Salary' ]> = 100000 ) & (dataFrame[ 'Age' ]< 40 ) & (dataFrame[ 'JOB' ]. str .startswith( 'D' ))) print (filtered_values) display(dataFrame.loc[filtered_values]) |
Output:
In the above example, print(filtered_values) will give the output as (array([0], dtype=int64),) which indicates the first row with index value 0 will be the output. After that output will have 1 row with all the columns and it is retrieved as per the given conditions.
Filter Pandas Dataframe Using Query (eval and query works only with columns)
In this approach, we get all rows having Salary lesser or equal to 100000 and Age < 40, and their JOB starts with ‘C’ from the dataframe. Its just query the columns of a DataFrame with a single or more Boolean expressions and if multiple, it is having & condition in the middle.
Python3
# import module import pandas as pd # assign data dataFrame = pd.DataFrame({ 'Name' : [ ' RACHEL ' , ' MONICA ' , ' PHOEBE ' , ' ROSS ' , 'CHANDLER' , ' JOEY ' ], 'Age' : [ 30 , 35 , 37 , 33 , 34 , 30 ], 'Salary' : [ 100000 , 93000 , 88000 , 120000 , 94000 , 95000 ], 'JOB' : [ 'DESIGNER' , 'CHEF' , 'MASUS' , 'PALENTOLOGY' , 'IT' , 'ARTIST' ]}) # filter dataframe display(dataFrame.query( 'Salary <= 100000 & Age < 40 & JOB.str.startswith("C").values' )) |
Output:
Pandas Boolean indexing multiple conditions standard way (“Boolean indexing” works with values in a column only)
In this approach, we get all rows having Salary lesser or equal to 100000 and Age < 40 and their JOB starts with ‘P’ from the dataframe. In order to select the subset of data using the values in the dataframe and applying Boolean conditions, we need to follow these ways
Python3
# import module import pandas as pd # assign data dataFrame = pd.DataFrame({ 'Name' : [ ' RACHEL ' , ' MONICA ' , ' PHOEBE ' , ' ROSS ' , 'CHANDLER' , ' JOEY ' ], 'Age' : [ 30 , 35 , 37 , 33 , 34 , 30 ], 'Salary' : [ 100000 , 93000 , 88000 , 120000 , 94000 , 95000 ], 'JOB' : [ 'DESIGNER' , 'CHEF' , 'MASUS' , 'PALENTOLOGY' , 'IT' , 'ARTIST' ]}) # filter dataframe display(dataFrame[(dataFrame[ 'Salary' ]> = 100000 ) & (dataFrame[ 'Age' ]< 40 ) & dataFrame[ 'JOB' ]. str .startswith( 'P' )][[ 'Name' , 'Age' , 'Salary' ]]) |
Output:
We are mentioning a list of columns that need to be retrieved along with the Boolean conditions and since many conditions, it is having ‘&’.
Eval multiple conditions (“eval” and “query” works only with columns )
Here, we get all rows having Salary lesser or equal to 100000 and Age < 40 and their JOB starts with ‘A’ from the dataframe.
Python3
# import module import pandas as pd # assign data dataFrame = pd.DataFrame({ 'Name' : [ ' RACHEL ' , ' MONICA ' , ' PHOEBE ' , ' ROSS ' , 'CHANDLER' , ' JOEY ' ], 'Age' : [ 30 , 35 , 37 , 33 , 34 , 30 ], 'Salary' : [ 100000 , 93000 , 88000 , 120000 , 94000 , 95000 ], 'JOB' : [ 'DESIGNER' , 'CHEF' , 'MASUS' , 'PALENTOLOGY' , 'IT' , 'ARTIST' ]}) # filter dataframe display(dataFrame[dataFrame. eval ( "Salary <=100000 & (Age <40) & JOB.str.startswith('A').values" )]) |
Output:
Dataframes are a very essential concept in Python and filtration of data is required can be performed based on various conditions. They can be achieved in any one of the above ways. Points to be noted:
- loc works with column labels and indexes.
- eval and query works only with columns.
- Boolean indexing works with values in a column only.