When working with Pandas DataFrames, two commonly used tools are size and count(). While they might seem similar at first glance, they serve different purposes and produce different results. In this article, we’ll explore the differences between size and count() in Pandas and when to use each of them.
Difference between size and count in Pandas
Let’s see some of the key differences between size() and count() in Pandas.
Inclusion of NaN Values
- size() counts all elements, including NaN values.
- count() counts only non-null (valid) values, excluding NaN values.
Result Type
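- size returns a single integer representing the total number of elements (rows × columns for a DataFrame). On a Series or DataFrame it is an attribute, written without parentheses; on a groupby object, size() is a method that returns the number of rows in each group.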
- count() returns a Series with the count of non-null values for each column (if applied to a DataFrame) or a single integer (if applied to a Series).
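The result types described above can be checked with a minimal sketch. The DataFrame and variable names here are illustrative assumptions, not part of the original examples.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column A has one NaN, column B has two
df = pd.DataFrame({"A": [1, 2, np.nan], "B": [4, np.nan, np.nan]})

total_cells = df.size            # one integer: 3 rows * 2 columns
per_column = df.count()          # Series indexed by column name
single_column = df["A"].count()  # one integer for a single Series

print(total_cells)               # 6
print(per_column.tolist())       # [2, 1]
print(int(single_column))        # 2
```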
When to Use size() or count() Methods
Knowing when to use size() or count() depends on your specific data analysis needs:
- Use Pandas size() when you want to understand the overall size of your dataset, including missing values. For example, you might use it to calculate proportions or ratios involving missing data.
- Use Pandas count() when you need to know how many valid data points you have in a specific column, for example to check how much data remains once missing values are excluded from further analysis.
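As a rough illustration of the proportion use case mentioned above, the share of missing values can be derived by combining the two. This is a sketch on made-up data; the names missing_share and overall_missing_share are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: A has 1 missing value, B has 2
df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [np.nan, np.nan, 7, 8],
})

n_rows = len(df)                  # every row, NaN included
valid_per_column = df.count()     # non-null values per column

# Fraction of missing values in each column
missing_share = 1 - valid_per_column / n_rows

# Fraction of missing cells in the whole DataFrame
overall_missing_share = float(1 - valid_per_column.sum() / df.size)

print(missing_share.tolist())     # [0.25, 0.5]
print(overall_missing_share)      # 0.375
```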
What is size in Pandas?
size gives the total number of elements in a Series or DataFrame. When called as size() on a groupby object, it returns the number of rows in each group. In both cases it makes no distinction between NaN and non-null values.
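The per-group behaviour can be sketched as follows. The team and score columns are assumed purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each team has one missing score
df = pd.DataFrame({
    "team": ["x", "x", "y", "y", "y"],
    "score": [10, np.nan, 7, 8, np.nan],
})

# size() counts every row in each group, NaN or not
rows_per_team = df.groupby("team").size()

# count() counts only the non-null scores in each group
scores_per_team = df.groupby("team")["score"].count()

print(rows_per_team.tolist())     # [2, 3]
print(scores_per_team.tolist())   # [1, 2]
```

The gap between the two results is the number of missing scores per group.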
Using size() with a Pandas Series
In this example, the size attribute is read from a Series, and it counts all elements, including the NaN value. The result is 5.
Python3
import pandas as pd

data = {'A': [1, 2, 3, None, 5]}
series = pd.Series(data['A'])

total_size = series.size
print(total_size)
Output:
5
Using size() with filtering
In this example, we first filter the DataFrame to include only rows where column ‘A’ is not null. Then, we use size to count all elements in the filtered DataFrame. Because the DataFrame has a single column, the result equals the number of remaining rows, 4.
Python3
import pandas as pd

data = {'A': [1, 2, 3, None, 5]}
df = pd.DataFrame(data)

# Filtering rows where column 'A' is not null
filtered_df = df[df['A'].notnull()]
filtered_size = filtered_df.size

print(filtered_df)
print(filtered_size)
Output
A
0 1.0
1 2.0
2 3.0
4 5.0
4
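The result above matches the row count only because the DataFrame has a single column. A small sketch with a second, assumed column 'B' shows that size counts cells rather than rows, so len() is the safer choice when a row count is what you need.

```python
import pandas as pd

# Hypothetical two-column variant of the example data
df = pd.DataFrame({"A": [1, 2, 3, None, 5], "B": [6, 7, 8, 9, 10]})

# Keep only rows where column 'A' is not null
filtered_df = df[df["A"].notnull()]

n_rows = len(filtered_df)       # number of remaining rows
n_cells = filtered_df.size      # rows * columns

print(n_rows, n_cells)          # 4 8
```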
What is count in Pandas?
count() counts the non-null values in a Series or in each column of a DataFrame. Unlike size, it distinguishes NaN from non-NaN values and leaves NaN out of the result.
Using count() with a Pandas DataFrame
In this example, the count() method is applied to a DataFrame, and it counts the number of non-null values in each column. Both column ‘A’ and ‘B’ have 4 valid values.
Python3
import pandas as pd

data = {'A': [1, 2, 3, None, 5],
        'B': [6, 7, None, 9, 10]}
df = pd.DataFrame(data)

count_valid = df.count()
print(count_valid)
Output
A 4
B 4
dtype: int64
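count() can also work across rows instead of down columns through its axis parameter. The sketch below reuses the same data; the variable names valid_per_row and complete_rows are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, None, 5], "B": [6, 7, None, 9, 10]})

# axis=1 counts the non-null values in each row
valid_per_row = df.count(axis=1)

# Rows with no missing values have as many valid cells as there are columns
complete_rows = int((valid_per_row == df.shape[1]).sum())

print(valid_per_row.tolist())   # [2, 2, 1, 1, 2]
print(complete_rows)            # 3
```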
Using count() with filtering
Here, we filter the DataFrame to include only rows where column ‘A’ is not null, and then we use count() on that filtered column to count the valid values. The result is 4, which represents the count of non-null values in column ‘A’ after filtering.
Python3
import pandas as pd

data = {'A': [1, 2, 3, None, 5]}
df = pd.DataFrame(data)

# Filtering rows where column 'A' is not null
filtered_df = df[df['A'].notnull()]
count_filtered = filtered_df['A'].count()

print(filtered_df)
print(count_filtered)
Output
A
0 1.0
1 2.0
2 3.0
4 5.0
4
Difference between size() and count() in Pandas
The following code demonstrates the difference between size and count() in Pandas.
Python3
import pandas as pd

data = {'A': [1, 2, 3, None, 5],
        'B': [6, 7, 8, 9, 10]}
df = pd.DataFrame(data)

# Calculates the total number of elements in the DataFrame
total_size = df.size
# Output: 10 (5 elements in column 'A' + 5 elements in column 'B')
print(total_size)

# Counts non-null values in column 'A'
count_column_A = df['A'].count()
# Output: 4 (4 non-null values in column 'A')
print(count_column_A)
Output:
10
4