Monday, November 18, 2024

Pandas DataFrame describe() Method

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier. 

Pandas DataFrame describe()

Pandas describe() returns basic summary statistics, such as count, mean, standard deviation, minimum, maximum, and percentiles, for a DataFrame or a Series of numeric values. When the method is applied to a Series of strings, it returns a different set of statistics, as shown in the examples below.

Syntax: DataFrame.describe(percentiles=None, include=None, exclude=None) 

Parameters: 

  • percentiles: List-like of numbers between 0 and 1 that selects which percentiles to return. Default is None, which returns the 25th, 50th, and 75th percentiles 
  • include: List of data types to include when describing the DataFrame. Default is None 
  • exclude: List of data types to exclude when describing the DataFrame. Default is None 

Return type: Series or DataFrame containing the statistical summary.
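
As a rough illustration of how the three parameters change the result, here is a minimal sketch. The DataFrame and its column names (player, age, weight) are made up for this example and are not part of the NBA data set used later.

```python
import pandas as pd

# Hypothetical toy data, not the NBA data set used in this article
df = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "age": [22, 25, 27, 30],
    "weight": [180.0, 205.0, 231.0, 235.0],
})

# Default call: only the numeric columns are summarized
default_summary = df.describe()

# percentiles: request the 10th and 90th percentiles
custom_summary = df.describe(percentiles=[0.1, 0.9])

# include="all": summarize every column regardless of dtype
full_summary = df.describe(include="all")

# exclude: leave the numeric columns out of the summary
text_summary = df.describe(exclude=["number"])

print(default_summary.columns.tolist())
print(custom_summary.index.tolist())
print(full_summary.columns.tolist())
print(text_summary.columns.tolist())
```

Comparing the printed column and index lists is usually the quickest way to see which columns and which rows each argument adds or removes.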

Creating DataFrame for demonstration:

The following examples use a CSV file, nba.csv, that contains data about some NBA players. Let’s have a look at the data by importing it.

Python3




import pandas as pd
# reading and printing csv file
data = pd.read_csv('nba.csv')
print(data.head())


Output: 

            Name            Team  Number Position   Age Height  Weight            College     Salary
0  Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0              Texas  7730337.0  
1    Jae Crowder  Boston Celtics    99.0       SF  25.0    6-6   235.0          Marquette  6796117.0 
2   John Holland  Boston Celtics    30.0       SG  27.0    6-5   205.0  Boston University        NaN 
3    R.J. Hunter  Boston Celtics    28.0       SG  22.0    6-5   185.0      Georgia State  1148640.0 
4  Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0                NaN  5000000.0 

Using the describe() function in Pandas

We can easily learn about several statistical measures, including mean, median, standard deviation, quartiles, and more, by using describe() on a DataFrame.

Python3




print(data.describe())


           Number         Age      Weight        Salary
count  457.000000  457.000000  457.000000  4.460000e+02
mean    17.678337   26.938731  221.522976  4.842684e+06
std     15.966090    4.404016   26.368343  5.229238e+06
min      0.000000   19.000000  161.000000  3.088800e+04
25%      5.000000   24.000000  200.000000  1.044792e+06
50%     13.000000   26.000000  220.000000  2.839073e+06
75%     25.000000   30.000000  240.000000  6.500000e+06
max     99.000000   40.000000  307.000000  2.500000e+07

Explanation of the description of numerical columns:

count: Number of non-missing (non-NaN) values in the column
mean: Mean of the column values
std: Standard deviation of the column values
min: Minimum value in the column
25%: 25th percentile (first quartile)
50%: 50th percentile (median)
75%: 75th percentile (third quartile)
max: Maximum value in the column
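
To make these rows concrete, the sketch below compares each of them against the corresponding Series method. The salary values are invented for illustration; the point is only that count ignores the missing entry and that the 50% row matches the median.

```python
import pandas as pd

# Hypothetical salaries with one missing value
salary = pd.Series([1000.0, 2000.0, None, 3000.0, 4000.0], name="salary")
summary = salary.describe()

# count reports only the non-missing values
print(summary["count"], salary.count())

# mean and std match the Series methods of the same name
print(summary["mean"], salary.mean())
print(summary["std"], salary.std())

# the percentile rows match quantile(); 50% is the median
print(summary["25%"], salary.quantile(0.25))
print(summary["50%"], salary.median())
print(summary["75%"], salary.quantile(0.75))
```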

Pandas describe() behavior for numeric and object dtypes

In this example, the data frame is described and ['object', 'float', 'int'] is passed to the include parameter so that the object columns are described alongside the numeric ones. [.20, .40, .60, .80] is passed to the percentiles parameter to view the respective percentiles of the numeric columns. 

Python3




import pandas as pd
data = pd.read_csv('nba.csv')
 
# dropping rows that contain null values
data.dropna(inplace=True)
 
# percentile list
perc = [.20, .40, .60, .80]
 
# list of dtypes to include
include = ['object', 'float', 'int']
 
# calling describe method
desc = data.describe(percentiles=perc, include=include)
 
# display
desc


Output:

                 Name                  Team      Number Position         Age  \
count             364                   364  364.000000      364  364.000000   
unique            364                    30         NaN        5         NaN   
top     Avery Bradley  New Orleans Pelicans         NaN       SG         NaN   
freq                1                    16         NaN       87         NaN   
mean              NaN                   NaN   16.829670      NaN   26.615385   
std               NaN                   NaN   14.994162      NaN    4.233591   
min               NaN                   NaN    0.000000      NaN   19.000000   
20%               NaN                   NaN    4.000000      NaN   23.000000   
40%               NaN                   NaN    9.000000      NaN   25.000000   
50%               NaN                   NaN   12.000000      NaN   26.000000   
60%               NaN                   NaN   17.000000      NaN   27.000000   
80%               NaN                   NaN   30.000000      NaN   30.000000   
max               NaN                   NaN   99.000000      NaN   40.000000  

       Height      Weight   College        Salary  
count     364  364.000000       364  3.640000e+02  
unique     17         NaN       115           NaN  
top       6-9         NaN  Kentucky           NaN  
freq       49         NaN        22           NaN  
mean      NaN  219.785714       NaN  4.620311e+06  
std       NaN   24.793099       NaN  5.119716e+06  
min       NaN  161.000000       NaN  5.572200e+04  
20%       NaN  195.000000       NaN  9.472760e+05  
40%       NaN  212.000000       NaN  1.638754e+06  
50%       NaN  220.000000       NaN  2.515440e+06  
60%       NaN  228.000000       NaN  3.429934e+06  
80%       NaN  242.400000       NaN  7.838202e+06  
max       NaN  279.000000       NaN  2.287500e+07  

As shown in the output, the statistical description of the DataFrame was returned with the requested percentiles (the 50% row also appears). For the string columns, NaN was returned for the numeric statistics, and for the numeric columns, NaN was returned for unique, top, and freq. 
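
The NaN pattern can be checked on a much smaller frame. This is a minimal sketch with invented team and age values; it uses include="all" rather than a list of dtype names so that every column is described.

```python
import pandas as pd

# Hypothetical data: one string column and one numeric column
df = pd.DataFrame({
    "team": ["Celtics", "Celtics", "Nets", "Nets", "Nets"],
    "age": [25, 25, 27, 22, 29],
})

summary = df.describe(percentiles=[0.2, 0.8], include="all")
print(summary)

# Numeric statistics do not apply to the string column
print(pd.isna(summary.loc["mean", "team"]))

# String statistics do not apply to the numeric column
print(pd.isna(summary.loc["unique", "age"]))
```

If the mixed table is hard to read, describing the numeric and string columns in two separate calls avoids the NaN padding altogether.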

Describing series of strings 

In this example, the describe() method is called on the Name column to see its behavior with the object data type. 

Python3




# importing pandas module
import pandas as pd
 
# making data frame
data = pd.read_csv("nba.csv")
 
# dropping rows that contain null values
data.dropna(inplace=True)
 
# calling describe method
desc = data["Name"].describe()
 
# display
desc


Output: As shown below, describe() behaves differently with a Series of strings. It returns the count of values, the number of unique values, the most frequent value (top), and how often that value occurs (freq). 

count               364
unique              364
top       Avery Bradley
freq                  1
Name: Name, dtype: object
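
Because every name in the data set is unique, top and freq are not very informative above. The sketch below uses a short, made-up Series of positions with repeated values to show what those two rows report, and cross-checks them with value_counts().

```python
import pandas as pd

# Hypothetical positions with repeated values
positions = pd.Series(["SG", "PG", "SG", "PF", "SG", "PG"], name="Position")

desc = positions.describe()
print(desc)

# Cross-check: the first row of value_counts() is the most frequent value
counts = positions.value_counts()
print(counts.index[0], counts.iloc[0])
```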
