Pandas DataFrame describe() Method

28 July 2024

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

Pandas DataFrame describe()

Pandas describe() returns basic summary statistics, such as count, mean, standard deviation, and percentiles, for a DataFrame or a Series of numeric values. When it is applied to a Series of strings, it returns a different set of statistics, as the examples below show.

Syntax: DataFrame.describe(percentiles=None, include=None, exclude=None)

Parameters:

percentiles: List-like of numbers between 0 and 1 giving the percentiles to report. Default is None, which reports the 25th, 50th, and 75th percentiles

include: List of data types to include when describing the DataFrame. Default is None, which limits the result to numeric columns when the DataFrame has any

exclude: List of data types to exclude when describing the DataFrame. Default is None

Return type: Series or DataFrame containing the statistical summary.
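To make the parameters more concrete, here is a minimal sketch on a small made-up DataFrame (the column names and values are assumptions for illustration, not the NBA data used later). It shows a default call, a call with custom percentiles, and a call that excludes numeric columns.

```python
import pandas as pd

# Made-up toy data for illustration; not the NBA data set used in this article
df = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "age": [22, 25, 25, 30],
    "weight": [180.0, 205.0, 231.0, 240.0],
})

# Default call: only the numeric columns (age, weight) are summarized
numeric_summary = df.describe()

# Custom percentiles, given as fractions between 0 and 1
custom_summary = df.describe(percentiles=[0.1, 0.9])

# Exclude numeric dtypes, leaving only the string column
text_summary = df.describe(exclude=["number"])

print(numeric_summary)
print(custom_summary)
print(text_summary)
```

Passing exclude=["number"] is one way to get the string-column summary without naming the string dtype explicitly.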

Creating DataFrame for demonstration:

The following examples use a CSV file, nba.csv, that contains data on NBA players. Let's have a look at the data by importing it.

Python3

import pandas as pd
# reading and printing csv file
data = pd.read_csv('nba.csv')
print(data.head())

Output:

            Name            Team  Number Position   Age Height  Weight            College     Salary
0  Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0              Texas  7730337.0  
1    Jae Crowder  Boston Celtics    99.0       SF  25.0    6-6   235.0          Marquette  6796117.0 
2   John Holland  Boston Celtics    30.0       SG  27.0    6-5   205.0  Boston University        NaN 
3    R.J. Hunter  Boston Celtics    28.0       SG  22.0    6-5   185.0      Georgia State  1148640.0 
4  Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0                NaN  5000000.0

Using Describe function in Pandas

We can easily learn about several statistical measures, including mean, median, standard deviation, quartiles, and more, by using describe() on a DataFrame.

Python3

print(data.describe())

Output:

           Number         Age      Weight        Salary
count  457.000000  457.000000  457.000000  4.460000e+02
mean    17.678337   26.938731  221.522976  4.842684e+06
std     15.966090    4.404016   26.368343  5.229238e+06
min      0.000000   19.000000  161.000000  3.088800e+04
25%      5.000000   24.000000  200.000000  1.044792e+06
50%     13.000000   26.000000  220.000000  2.839073e+06
75%     25.000000   30.000000  240.000000  6.500000e+06
max     99.000000   40.000000  307.000000  2.500000e+07

Explanation of the description of numerical columns:

count: Number of non-null values
mean: Mean of the column values
std: Standard deviation of the column values
min: Minimum value in the column
25%: 25th percentile (first quartile)
50%: 50th percentile (median)
75%: 75th percentile (third quartile)
max: Maximum value in the column
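Each of these rows can be reproduced with an individual Series method. The sketch below, which uses a handful of made-up ages rather than the NBA file, places the describe() result next to the manually computed values so the correspondence is easy to check.

```python
import pandas as pd

# Illustrative values (not taken from nba.csv); None becomes NaN
ages = pd.Series([19, 24, 26, 30, 40, None], name="Age")

summary = ages.describe()

# Recompute each statistic with the corresponding Series method
manual = pd.Series({
    "count": ages.count(),         # non-null values only
    "mean": ages.mean(),
    "std": ages.std(),             # sample standard deviation (ddof=1)
    "min": ages.min(),
    "25%": ages.quantile(0.25),
    "50%": ages.quantile(0.50),    # same as ages.median()
    "75%": ages.quantile(0.75),
    "max": ages.max(),
})

print(pd.DataFrame({"describe": summary, "manual": manual}))
```

Note that the missing value is left out of count and does not affect the other statistics.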

Pandas describe() behavior for numeric dtypes

In this example, the list ['object', 'float', 'int'] is passed to the include parameter so that both the string (object) columns and the numeric columns are described. The list [.20, .40, .60, .80] is passed to the percentiles parameter to report those percentiles for the numeric columns.

Python3

import pandas as pd
data = pd.read_csv('nba.csv')
 
# dropping rows that contain missing values
data.dropna(inplace=True)
 
# percentile list
perc = [.20, .40, .60, .80]
 
# list of dtypes to include
include = ['object', 'float', 'int']
 
# calling describe method
desc = data.describe(percentiles=perc, include=include)
 
# display
print(desc)

Output:

                 Name                  Team      Number Position         Age  \
count             364                   364  364.000000      364  364.000000   
unique            364                    30         NaN        5         NaN   
top     Avery Bradley  New Orleans Pelicans         NaN       SG         NaN   
freq                1                    16         NaN       87         NaN   
mean              NaN                   NaN   16.829670      NaN   26.615385   
std               NaN                   NaN   14.994162      NaN    4.233591   
min               NaN                   NaN    0.000000      NaN   19.000000   
20%               NaN                   NaN    4.000000      NaN   23.000000   
40%               NaN                   NaN    9.000000      NaN   25.000000   
50%               NaN                   NaN   12.000000      NaN   26.000000   
60%               NaN                   NaN   17.000000      NaN   27.000000   
80%               NaN                   NaN   30.000000      NaN   30.000000   
max               NaN                   NaN   99.000000      NaN   40.000000  
       Height      Weight   College        Salary  
count     364  364.000000       364  3.640000e+02  
unique     17         NaN       115           NaN  
top       6-9         NaN  Kentucky           NaN  
freq       49         NaN        22           NaN  
mean      NaN  219.785714       NaN  4.620311e+06  
std       NaN   24.793099       NaN  5.119716e+06  
min       NaN  161.000000       NaN  5.572200e+04  
20%       NaN  195.000000       NaN  9.472760e+05  
40%       NaN  212.000000       NaN  1.638754e+06  
50%       NaN  220.000000       NaN  2.515440e+06  
60%       NaN  228.000000       NaN  3.429934e+06  
80%       NaN  242.400000       NaN  7.838202e+06  
max       NaN  279.000000       NaN  2.287500e+07

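As shown in the output, the statistical description of the DataFrame is returned with the requested percentiles. The 50% row also appears, even though 0.5 was not in the list. For the string columns, NaN is returned for the numeric statistics; for the numeric columns, NaN is returned for unique, top, and freq.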
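Because a mixed summary contains many NaN cells, it can help to pull out one column at a time and drop the rows that do not apply. The following is a small sketch on made-up data (the team and age columns are assumptions for illustration), using include="all" to describe every column.

```python
import pandas as pd

# Made-up toy data for illustration
df = pd.DataFrame({
    "team": ["X", "X", "Y", "Z"],
    "age": [22.0, 25.0, 25.0, 30.0],
})

# include="all" describes every column, whatever its dtype
desc = df.describe(include="all")

# Keep only the rows that apply to each column
team_stats = desc["team"].dropna()  # count, unique, top, freq
age_stats = desc["age"].dropna()    # count, mean, std, min, percentiles, max

print(team_stats)
print(age_stats)
```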

Describing series of strings

In this example, describe() is called on the Name column to see how it behaves with the object data type.

Python3

# importing pandas module
import pandas as pd
 
# making data frame
data = pd.read_csv("nba.csv")
 
# missing values are not dropped here; describe() skips them
 
# calling describe method
desc = data["Name"].describe()
 
# display
print(desc)

As the output below shows, describe() behaves differently on a Series of strings. It returns the count of values, the number of unique values, the most frequent value (top), and how often that value occurs (freq).

Output:

count               457
unique              457
top       Avery Bradley
freq                  1
Name: Name, dtype: object
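Since every name in the data set is unique, top and freq are not very informative here. The sketch below uses a short made-up list of positions with repeated values to show how top and freq relate to value_counts().

```python
import pandas as pd

# Made-up values with repeats, for illustration only
positions = pd.Series(["SG", "PG", "SG", "PF", "SG", "PG"], name="Position")

summary = positions.describe()
counts = positions.value_counts()  # sorted from most to least frequent

print(summary)
print(counts)

# top is the most frequent value and freq is how many times it occurs
most_common_value = counts.index[0]
most_common_count = counts.iloc[0]
```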
