Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages, and makes importing and analyzing data much easier.
Comparing values and selecting rows based on the result is one of the most common tasks in data analysis. The “==” operator works element-wise on a pandas Series, so a whole column of a DataFrame can be compared with a value in a single expression. The following two examples show how to compare and select data from a pandas DataFrame.
Example #1: Comparing Data
In the following example, a data frame is created from a CSV file. The Gender column contains only three kinds of values (“Male”, “Female” or NaN). Every row of the Gender column is compared to “Male”, and a boolean series is returned.
    # importing pandas package
    import pandas as pd

    # making data frame from csv file
    data = pd.read_csv("employees.csv")

    # storing boolean series in new
    new = data["Gender"] == "Male"

    # inserting new series in data frame
    data["New"] = new

    # display
    print(data)
Output:
As shown in the output, the value in the New column is True where Gender is “Male” and False for “Female” and NaN values.
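To try the comparison without the CSV file, the sketch below builds a small data frame by hand. The column names and values are made up for illustration and are only assumed to resemble employees.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for employees.csv (values are made up)
data = pd.DataFrame({
    "First Name": ["Douglas", "Maria", "Jerry", "Angela"],
    "Gender": ["Male", "Female", np.nan, "Male"],
})

# Element-wise comparison: one boolean per row of the Gender column
is_male = data["Gender"] == "Male"

# Store the boolean series as a new column
data["New"] = is_male

print(is_male.tolist())  # [True, False, False, True]
print(data)
```

The comparison never modifies the Gender column itself; it only produces a new series with the same index.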
Example #2: Selecting Data
In the following example, the boolean series is used to index the data frame, and only the rows where Gender is “Male” are returned.
    # importing pandas package
    import pandas as pd

    # making data frame from csv file
    data = pd.read_csv("employees.csv")

    # storing boolean series in new
    # (== "Male" is used rather than != "Female", which would also keep NaN rows)
    new = data["Gender"] == "Male"

    # inserting new series in data frame
    data["New"] = new

    # display only the rows where the boolean series is True
    print(data[new])

    # OR, without the intermediate variable:
    # data[data["Gender"] == "Male"]
Output:
As shown in the output, only the rows of the data frame where Gender is “Male” are returned.
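The selection step can be sketched on a hand-made frame as well. The data and the Salary column are assumptions for illustration only. Besides plain indexing with the boolean series, the sketch shows `.loc`, which can also pick columns, and a mask that combines two conditions.

```python
import numpy as np
import pandas as pd

# Hypothetical data (made up for illustration)
data = pd.DataFrame({
    "First Name": ["Douglas", "Maria", "Jerry", "Angela"],
    "Gender": ["Male", "Female", np.nan, "Male"],
    "Salary": [97308, 130590, 138705, 54568],
})

# Boolean series used as a row filter
mask = data["Gender"] == "Male"
males = data[mask]

# .loc accepts the same mask and can select columns at the same time
male_names = data.loc[mask, ["First Name", "Gender"]]

# Conditions are combined with & (and) or | (or); each needs parentheses
well_paid_males = data[(data["Gender"] == "Male") & (data["Salary"] > 60000)]

print(males)
print(male_names)
print(well_paid_males)
```

The filtered frames keep the original index labels (0 and 3 for `males` here), which is worth remembering before doing any position-based access on the result.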
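Note: Comparing NaN with “==” always gives False, so rows with a missing Gender are left out of a selection such as data["Gender"] == "Male". The opposite holds for “!=”: NaN != "Female" evaluates to True, so a “!=” filter keeps the rows with missing values.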
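A minimal sketch of how missing values behave in comparisons, using a made-up three-element series. `isna()` and `notna()` are the pandas methods for testing missing values explicitly, since comparing against NaN with “==” does not work.

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing value
gender = pd.Series(["Male", "Female", np.nan], name="Gender")

eq_male = gender == "Male"      # the NaN row compares as False
ne_female = gender != "Female"  # the NaN row compares as True
missing = gender.isna()         # explicit test for missing values

# To filter with != and still drop missing values, add notna()
not_female_known = gender[(gender != "Female") & gender.notna()]

print(eq_male.tolist())           # [True, False, False]
print(ne_female.tolist())         # [True, False, True]
print(missing.tolist())           # [False, False, True]
print(not_female_known.tolist())  # ['Male']
```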