Getting More Value from the Pandas value_counts

23 August 2024


Data exploration is an important part of the machine learning pipeline. Before deciding which models to train, and how many, we need an idea of what our data contains. The Pandas library offers a number of useful functions for this purpose, and value_counts is one of them. It returns the counts of the unique values in a pandas Series, such as a single column of a dataframe. Most of the time, however, we end up using value_counts with its default parameters. In this brief article, I’ll show you how to get more out of it by changing those defaults.

value_counts()

The value_counts() method returns a Series containing the counts of unique values. This means, for any column in a dataframe, this method returns the count of unique entries in that column.
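As a minimal sketch of this behavior, the snippet below builds a small made-up Series and calls value_counts() on it. The port codes are illustrative values chosen for this example, not the Titanic data.

```python
import pandas as pd

# Hypothetical toy data, not taken from the Titanic dataset
ports = pd.Series(["S", "C", "S", "Q", "S", "C"], name="port")

counts = ports.value_counts()

# The result is itself a Series: the unique values form the index
# and their counts are the values.
print(type(counts).__name__)
print(counts)
```

Because the result is a regular Series, a single count can be looked up by label, for example counts["S"].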

Syntax

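Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)

Parameters

normalize (default False): if True, return relative frequencies instead of counts.
sort (default True): sort the result by frequency.
ascending (default False): sort in ascending order.
bins (default None): group numeric values into bins instead of counting each distinct value.
dropna (default True): exclude NaN from the counts.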

Basic usage

Let’s see the basic usage of this method on a dataset. I’ll be using the Titanic dataset for the demo. I have also published an accompanying notebook on Kaggle, in case you want to go directly to the code.

Importing the dataset

Let’s begin by importing the necessary libraries and the dataset. This is a fundamental step in every data analysis process.

# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Reading in the data
train = pd.read_csv('../input/titanic/train.csv')

Explore the first few rows of the dataset

train.head()

Calculating the number of null values

train.isnull().sum()

Thus, the Age, Cabin and Embarked columns have null values. With this, we have a basic idea of what our dataset looks like. Let’s now see how we can use value_counts() in five different ways to explore this data further.
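The null check can be tried without the Kaggle file. The sketch below uses a small hypothetical dataframe (its column names mirror the Titanic ones, but the values are invented) and keeps only the columns that contain missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few missing entries
df = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Fare": [7.25, 71.28, 8.05, 53.10],
    "Embarked": ["S", "C", None, "S"],
})

# Number of missing values per column
null_counts = df.isnull().sum()

# Keep only the columns that actually contain missing values
columns_with_nulls = null_counts[null_counts > 0]
print(columns_with_nulls)
```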

1. value_counts() with default parameters

Let’s call value_counts() on the Embarked column of the dataset. This returns the count of each unique value in this column.

train['Embarked'].value_counts()
-------------------------------------------------------------------
S      644
C      168
Q       77

The function returns the counts of all unique values in the given column in descending order, excluding null values. We can quickly see that most passengers embarked from Southampton, followed by Cherbourg and then Queenstown.
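Since the default output is sorted in descending order, the first entry is the most frequent value. A small sketch on made-up data (the variable names here are illustrative, not from the original notebook):

```python
import pandas as pd

ports = pd.Series(["S", "C", "S", "Q", "S", "C"])  # hypothetical toy data
counts = ports.value_counts()

# With the default descending sort, the first entry is the most frequent value
most_common_value = counts.index[0]
most_common_count = counts.iloc[0]
print(most_common_value, most_common_count)
```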

2. value_counts() with relative frequencies of the unique values.

Sometimes a percentage is a better criterion than the raw count. With normalize=True, the returned object contains the relative frequencies of the unique values. The normalize parameter is set to False by default.

train['Embarked'].value_counts(normalize=True)
-------------------------------------------------------------------
S    0.724409
C    0.188976
Q    0.086614

Knowing that about 72% of the passengers with a recorded port embarked from Southampton is often more informative than knowing that 644 passengers did.
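One detail worth checking is the denominator: with the default dropna=True, missing values are left out before the frequencies are computed. The sketch below illustrates this on a hypothetical Series with one missing entry and also converts the fractions to percentages.

```python
import pandas as pd

# Hypothetical toy data with one missing entry
ports = pd.Series(["S", "S", "S", "C", "Q", None])

# Default: frequencies are relative to the 5 non-null entries
shares = ports.value_counts(normalize=True)

# With dropna=False the missing entry is counted, so the denominator is 6
shares_with_nan = ports.value_counts(normalize=True, dropna=False)

# Scale by 100 to read the fractions as percentages
percentages = (shares * 100).round(1)
print(percentages)
print(shares_with_nan)
```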

3. value_counts() in ascending order

The Series returned by value_counts() is in descending order by default. We can reverse the order by setting the ascending parameter to True.

train['Embarked'].value_counts(ascending=True)
-------------------------------------------------------------------
Q     77
C    168
S    644
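Ascending order is handy when the rare categories are the interesting ones. A minimal sketch on made-up data:

```python
import pandas as pd

ports = pd.Series(["S", "C", "S", "Q", "S", "C"])  # hypothetical toy data

# Least frequent values come first
rarest_first = ports.value_counts(ascending=True)

# The two rarest categories
print(rarest_first.head(2))
```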

4. value_counts() displaying the NaN values

By default, null values are excluded from the result. They can be included easily by setting the dropna parameter to False.

train['Embarked'].value_counts(dropna=False)
-------------------------------------------------------------------
S      644
C      168
Q       77
NaN      2

We can easily see that there are two null values in the column.
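As a quick sanity check, the extra entry that appears with dropna=False should match the number of missing values reported by isna(). The sketch below verifies this on a hypothetical Series.

```python
import numpy as np
import pandas as pd

ports = pd.Series(["S", "C", np.nan, "S", np.nan])  # hypothetical toy data

with_nan = ports.value_counts(dropna=False)
without_nan = ports.value_counts()

# The difference between the two totals is the number of missing entries
missing = with_nan.sum() - without_nan.sum()
print(missing, ports.isna().sum())
```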

5. value_counts() to bin continuous data into discrete intervals

This is one of my favorite uses of the value_counts() function, and an underutilized one too. value_counts() can bin continuous data into discrete intervals with the help of the bins parameter. This option works only with numerical data. It is similar to the pd.cut function. Let’s see how it works using the Fare column.

# applying value_counts on a numerical column without the bins parameter
train['Fare'].value_counts()

This doesn’t convey much information, since the output contains a separate entry for every distinct value of Fare. Instead, let’s group the fares into seven bins.

train['Fare'].value_counts(bins=7)

Binning makes it easy to understand the idea being conveyed. We can easily see that most passengers paid less than 73.19 for their ticket. Also, the last two bins listed in the output contain no passengers, which suggests that fewer bins might serve our purpose just as well.
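The relationship to pd.cut can be sketched on a small example. The fares below are invented values, and passing sort=False keeps the bins in interval order rather than ordering them by count. The comparison assumes that bins=3 behaves like pd.cut with include_lowest=True, which matches my reading of the pandas behavior but is worth confirming against the documentation for your version.

```python
import pandas as pd

fares = pd.Series([5, 7, 8, 10, 12, 15, 30, 60, 100])  # hypothetical fares

# Three equal-width bins, reported in interval order rather than by count
binned = fares.value_counts(bins=3, sort=False)

# A comparable result via pd.cut followed by value_counts
via_cut = pd.cut(fares, bins=3, include_lowest=True).value_counts(sort=False)

print(binned)
print(via_cut)
```

Leaving sort at its default instead orders the bins by count, which is why empty bins end up at the bottom of the output.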

Thus, we can see that value_counts() is a handy tool, and we can do some interesting analysis with this single line of code.

References

pandas.Series.value_counts documentation

