Wednesday, December 25, 2024
Google search engine
HomeLanguagesBoolean Indexing in Pandas

Boolean Indexing in Pandas

In boolean indexing, we will select subsets of data based on the actual values of the data in the DataFrame and not on their row/column labels or integer locations. In boolean indexing, we use a boolean vector to filter the data. 

Boolean indexing is a type of indexing that uses actual values of the data in the DataFrame. In boolean indexing, we can filter a data in four ways:

  • Accessing a DataFrame with a boolean index
  • Applying a boolean mask to a dataframe
  • Masking data based on column value
  • Masking data based on an index value

Accessing a DataFrame with a boolean index: 

In order to access a dataframe with a boolean index, we have to create a dataframe in which the index of dataframe contains a boolean value that is “True” or “False”.

Example  

Python3




# importing pandas as pd
import pandas as pd
  
# dictionary of lists
dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score':[90, 40, 80, 98]}
  
df = pd.DataFrame(dict, index = [True, False, True, False])
  
print(df)


Output: 
 

Now we have created a dataframe with the boolean index after that user can access a dataframe with the help of the boolean index. User can access a dataframe using three functions that is .loc[], .iloc[], .ix[] 

Accessing a Dataframe with a boolean index using .loc[]

In order to access a dataframe with a boolean index using .loc[], we simply pass a boolean value (True or False) in a .loc[] function. 

Python3




# importing pandas as pd
import pandas as pd
  
# dictionary of lists
dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score':[90, 40, 80, 98]}
 
# creating a dataframe with boolean index
df = pd.DataFrame(dict, index = [True, False, True, False])
 
# accessing a dataframe using .loc[] function
print(df.loc[True])


Output: 
 

Accessing a Dataframe with a boolean index using .iloc[]

In order to access a dataframe using .iloc[], we have to pass a boolean value (True or False)  but iloc[] function accepts only integer as an argument so it will throw an error so we can only access a dataframe when we pass an integer in iloc[] function 

Code #1:  

Python3




# importing pandas as pd
import pandas as pd
  
# dictionary of lists
dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score':[90, 40, 80, 98]}
 
# creating a dataframe with boolean index 
df = pd.DataFrame(dict, index = [True, False, True, False])
 
# accessing a dataframe using .iloc[] function
print(df.iloc[True])


Output: 

TypeError

Code #2:  

Python3




# importing pandas as pd
import pandas as pd
  
# dictionary of lists
dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score':[90, 40, 80, 98]}
 
# creating a dataframe with boolean index 
df = pd.DataFrame(dict, index = [True, False, True, False])
  
 
# accessing a dataframe using .iloc[] function
print(df.iloc[1])


Output:

 

Accessing a Dataframe with a boolean index using .ix[]

In order to access a dataframe using .ix[], we have to pass boolean value (True or False) and integer value to .ix[] function because as we know that .ix[] function is a hybrid of .loc[] and .iloc[] function. 

Code #1:  

Python3




# importing pandas as pd
import pandas as pd
  
# dictionary of lists
dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score':[90, 40, 80, 98]}
 
# creating a dataframe with boolean index
df = pd.DataFrame(dict, index = [True, False, True, False])
  
 
# accessing a dataframe using .ix[] function
print(df.ix[True])


Output: 


 

Code #2: 

Python




# importing pandas as pd
import pandas as pd
  
# dictionary of lists
dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score':[90, 40, 80, 98]}
 
# creating a dataframe with boolean index
df = pd.DataFrame(dict, index = [True, False, True, False])
  
 
# accessing a dataframe using .ix[] function
print(df.ix[1])


Output: 
 

Applying a boolean mask to a dataframe : 

In a dataframe, we can apply a boolean mask. In order to do that we can use __getitems__ or [] accessor. We can apply a boolean mask by giving a list of True and False of the same length as contain in a dataframe. When we apply a boolean mask it will print only that dataframe in which we pass a boolean value True. To download “nba1.1” CSV file click here.

Code #1: 

Python3




# importing pandas as pd
import pandas as pd
  
# dictionary of lists
dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score':[90, 40, 80, 98]}
  
df = pd.DataFrame(dict, index = [0, 1, 2, 3])
  
 
 
print(df[[True, False, True, False]])


Output: 
 

Code #2: 
 

Python3




# importing pandas package
import pandas as pd
  
# making data frame from csv file
data = pd.read_csv("nba1.1.csv")
  
df = pd.DataFrame(data, index = [0, 1, 2, 3, 4, 5, 6,
                                 7, 8, 9, 10, 11, 12])
 
  
print(df[[True, False, True, False, True,
    False, True, False, True, False,
                True, False, True]])


Output: 
 

Masking data based on column value: 

In a dataframe we can filter a data based on a column value.  In order to filter data, we can apply certain conditions on the dataframe using different operators like ==, >, <, <=, >=. When we apply these operators to the dataframe then it produces a Series of True and False. To download the “nba.csv” CSV, click here.

Code #1: 

Python




# importing pandas as pd
import pandas as pd
  
# dictionary of lists
dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["BCA", "BCA", "M.Tech", "BCA"],
        'score':[90, 40, 80, 98]}
 
# creating a dataframe
df = pd.DataFrame(dict)
  
# using a comparison operator for filtering of data
print(df['degree'] == 'BCA')


Output: 
 

Code #2: 

Python




# importing pandas package
import pandas as pd
  
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
  
# using greater than operator for filtering of data
print(data['Age'] > 25)


Output: 
 

Masking data based on index value : 

In a dataframe we can filter a data based on a column value. In order to filter data, we can create a mask based on the index values using different operators like ==, >, <, etc… . To download “nba1.1” CSV file click here.

Code #1:  

Python3




# importing pandas as pd
import pandas as pd
  
# dictionary of lists
dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["BCA", "BCA", "M.Tech", "BCA"],
        'score':[90, 40, 80, 98]}
  
 
df = pd.DataFrame(dict, index = [0, 1, 2, 3])
 
mask = df.index == 0
 
print(df[mask])


Output: 
 

Code #2: 
 

Python3




# importing pandas package
import pandas as pd
  
# making data frame from csv file
data = pd.read_csv("nba1.1.csv")
 
# giving a index to a dataframe
df = pd.DataFrame(data, index = [0, 1, 2, 3, 4, 5, 6,
                                 7, 8, 9, 10, 11, 12])
 
# filtering data on index value
mask = df.index > 7
 
print(df[mask])


Output: 
 

 

RELATED ARTICLES

Most Popular

Recent Comments