Sunday, November 17, 2024

Python | Data Comparison and Selection in Pandas

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages, and makes importing and analyzing data much easier.

One of the most common tasks in data analysis is comparing values and selecting data accordingly. The “==” operator works element-wise in Pandas: comparing a column with a single value returns a boolean Series with one True or False entry per row. The following two examples show how to compare and select data from a Pandas DataFrame.

To download the CSV file used, Click Here.

Example #1: Comparing Data
In the following example, a DataFrame is created from a CSV file. The Gender column contains only three kinds of values (“Male”, “Female” or NaN). Every row of the Gender column is compared to “Male”, and the result is returned as a boolean Series.




# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("employees.csv")

# storing boolean series in new (True where Gender is "Male")
new = data["Gender"] == "Male"

# inserting new series in data frame
data["New"] = new

# display (in a notebook; use print(data) in a script)
data


Output:
As shown in the output, the value in the New column is True where Gender is “Male” and False where it is “Female” or NaN.
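The comparison can also be tried without the CSV file. The following is a minimal sketch that assumes a small hand-made DataFrame as a stand-in for employees.csv; the Name column and its values are hypothetical and only the Gender column mirrors the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for employees.csv (not the real file contents)
data = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Gender": ["Male", "Female", np.nan],
})

# Element-wise comparison: one boolean per row
data["New"] = data["Gender"] == "Male"

# The row with a missing Gender compares as False
print(data["New"].tolist())  # [True, False, False]
```

Because the comparison is element-wise, the resulting Series has the same index as the original column, which is what allows it to be stored back as a new column.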


 
Example #2: Selecting Data
In the following example, the boolean Series is passed to the DataFrame as an index, and only the rows with Gender = “Male” are returned.




# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("employees.csv")

# storing boolean series in new (True only where Gender is "Male")
new = data["Gender"] == "Male"

# inserting new series in data frame
data["New"] = new

# display only the rows where the boolean series is True
data[new]

# OR
# data[data["Gender"] == "Male"]
# Both are the same


Output:
As shown in the output, only the rows with Gender = “Male” are returned.

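Note: When comparing with “==”, the boolean value for NaN is False, so rows with a missing Gender are not selected. The “!=” operator behaves the opposite way: it gives True for NaN, so data["Gender"] != "Female" would return the rows with a missing Gender as well as the “Male” rows.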
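How NaN is treated decides which rows a selection returns. The sketch below again assumes a hypothetical three-row DataFrame in place of employees.csv and compares selecting with “==” against selecting with “!=”. The notna() call is one possible way to exclude missing values explicitly and is not part of the original examples.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for employees.csv (not the real file contents)
data = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Gender": ["Male", "Female", np.nan],
})

# "==" is False for NaN, so the row with missing Gender is dropped
males = data[data["Gender"] == "Male"]

# "!=" is True for NaN, so the row with missing Gender is kept
not_female = data[data["Gender"] != "Female"]

# Combining with notna() excludes missing values explicitly
known_not_female = data[(data["Gender"] != "Female") & data["Gender"].notna()]

print(males["Name"].tolist())
print(not_female["Name"].tolist())
print(known_not_female["Name"].tolist())
```

The two conditions are combined with “&” rather than “and”, since the combination has to be evaluated row by row.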
