In this article, we will see how to count NaN or missing values in a Pandas DataFrame using the isnull() and sum() methods of the DataFrame.
DataFrame.isnull() method
The Pandas isnull() function detects missing values in the given object. It returns a boolean object of the same size indicating whether each value is NA. Missing values are mapped to True and non-missing values are mapped to False.
Syntax: DataFrame.isnull()
Parameters: None
Return Type: DataFrame of Boolean values, True for NaN values and False otherwise.
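As a minimal sketch of this behaviour, the snippet below applies isnull() to a small, hypothetical DataFrame (the names `df` and `mask` and the sample values are assumptions made for illustration, not part of the dataset used later in this article).

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column
df = pd.DataFrame({"score": [1.0, np.nan, 3.0],
                   "city": ["Delhi", "Pune", None]})

# Boolean frame of the same shape: True where a value is missing
mask = df.isnull()
print(mask)

# isna() is an alias of isnull(), so both give the same result
same_as_isna = df.isna().equals(mask)
print(same_as_isna)
```

Both NaN and None are treated as missing, so each column of the mask contains exactly one True.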
DataFrame.sum() method
The Pandas sum() function returns the sum of the values over the requested axis. If the axis is the index axis (axis=0), it adds all the values in each column and returns a Series containing the sum for each column. It can also skip missing values while calculating the sum.
Syntax: DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
Parameters :
- axis : {index (0), columns (1)}
- skipna : Exclude NA/null values when computing the result.
- level : If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0.
- numeric_only : Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- min_count : The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
Returns : Series containing the sums, or DataFrame if level is specified.
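The effect of axis, skipna and min_count can be seen in a short sketch. The DataFrame `df` and the result names below are hypothetical and chosen only to illustrate the parameters described above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "b" contains only missing values
df = pd.DataFrame({"a": [1.0, 2.0, np.nan],
                   "b": [np.nan, np.nan, np.nan]})

# Default axis=0: one sum per column, missing values skipped
col_totals = df.sum()

# axis=1: one sum per row
row_totals = df.sum(axis=1)

# skipna=False: any missing value makes the result NaN
strict = df.sum(skipna=False)

# min_count=1: at least one valid value is required, otherwise NaN
need_one = df.sum(min_count=1)

print(col_totals, row_totals, strict, need_one, sep="\n\n")
```

With the defaults, an all-NaN column sums to 0.0; setting min_count=1 turns that result into NaN instead.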
Let’s create a pandas dataframe.
# import numpy library as np
import numpy as np

# import pandas library as pd
import pandas as pd

# List of tuples; np.nan marks a missing value
# (the np.NaN alias was removed in NumPy 2.0)
students = [('Ankit', 22, 'Up', 'Geu'),
            ('Ankita', np.nan, 'Delhi', np.nan),
            ('Rahul', 16, 'Tokyo', 'Abes'),
            ('Simran', 41, 'Delhi', 'Gehu'),
            ('Shaurya', np.nan, 'Delhi', 'Geu'),
            ('Shivangi', 35, 'Mumbai', np.nan),
            ('Swapnil', 35, np.nan, 'Geu'),
            (np.nan, 35, 'Uk', 'Geu'),
            ('Jeet', 35, 'Guj', 'Gehu'),
            (np.nan, np.nan, np.nan, np.nan)]

# Create a DataFrame object from
# list of tuples with columns
# and indices.
details = pd.DataFrame(students,
                       columns=['Name', 'Age', 'Place', 'College'],
                       index=['a', 'b', 'c', 'd', 'e',
                              'f', 'g', 'i', 'j', 'k'])

details
Output:
Example 1 : Count total NaN in each column of the DataFrame.
# import numpy library as np
import numpy as np

# import pandas library as pd
import pandas as pd

# List of tuples; np.nan marks a missing value
# (the np.NaN alias was removed in NumPy 2.0)
students = [('Ankit', 22, 'Up', 'Geu'),
            ('Ankita', np.nan, 'Delhi', np.nan),
            ('Rahul', 16, 'Tokyo', 'Abes'),
            ('Simran', 41, 'Delhi', 'Gehu'),
            ('Shaurya', np.nan, 'Delhi', 'Geu'),
            ('Shivangi', 35, 'Mumbai', np.nan),
            ('Swapnil', 35, np.nan, 'Geu'),
            (np.nan, 35, 'Uk', 'Geu'),
            ('Jeet', 35, 'Guj', 'Gehu'),
            (np.nan, np.nan, np.nan, np.nan)]

# Create a DataFrame object from list of tuples
# with columns and indices.
details = pd.DataFrame(students,
                       columns=['Name', 'Age', 'Place', 'College'],
                       index=['a', 'b', 'c', 'd', 'e',
                              'f', 'g', 'i', 'j', 'k'])

# show the boolean dataframe
print(" \nshow the boolean Dataframe : \n\n", details.isnull())

# Count total NaN at each column in a DataFrame
print(" \nCount total NaN at each column in a DataFrame : \n\n",
      details.isnull().sum())
Output:
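The column count works because True is treated as 1 and False as 0 when the boolean DataFrame is summed. The same idea applies to a single column, and the resulting Series can be filtered to list only the columns that contain missing values. The sketch below uses a smaller, hypothetical DataFrame `df` rather than the `details` DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "Name" is complete, the other columns have gaps
df = pd.DataFrame({"Name": ["Ankit", "Ankita", "Rahul", "Simran"],
                   "Age": [22, np.nan, 16, np.nan],
                   "Place": ["Up", "Delhi", np.nan, "Delhi"]})

# Count of missing values in one column
age_missing = df["Age"].isnull().sum()

# Count of missing values in every column
per_column = df.isnull().sum()

# Names of the columns that contain at least one missing value
cols_with_nan = per_column[per_column > 0].index.tolist()

print(age_missing)
print(per_column)
print(cols_with_nan)
```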
Example 2 : Count total NaN in each row of the DataFrame.
# import numpy library as np
import numpy as np

# import pandas library as pd
import pandas as pd

# List of tuples; np.nan marks a missing value
# (the np.NaN alias was removed in NumPy 2.0)
students = [('Ankit', 22, 'Up', 'Geu'),
            ('Ankita', np.nan, 'Delhi', np.nan),
            ('Rahul', 16, 'Tokyo', 'Abes'),
            ('Simran', 41, 'Delhi', 'Gehu'),
            ('Shaurya', np.nan, 'Delhi', 'Geu'),
            ('Shivangi', 35, 'Mumbai', np.nan),
            ('Swapnil', 35, np.nan, 'Geu'),
            (np.nan, 35, 'Uk', 'Geu'),
            ('Jeet', 35, 'Guj', 'Gehu'),
            (np.nan, np.nan, np.nan, np.nan)]

# Create a DataFrame object from
# list of tuples with columns
# and indices.
details = pd.DataFrame(students,
                       columns=['Name', 'Age', 'Place', 'College'],
                       index=['a', 'b', 'c', 'd', 'e',
                              'f', 'g', 'i', 'j', 'k'])

# show the boolean dataframe
print(" \nshow the boolean Dataframe : \n\n", details.isnull())

# len(details.index) gives the number of rows;
# iloc[i] selects the i-th row by position.
# Count total NaN at each row in a DataFrame
for i in range(len(details.index)):
    print(" Total NaN in row", i + 1, ":",
          details.iloc[i].isnull().sum())
Output:
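The row-by-row loop can also be written as a single call by passing axis=1 to sum(), which adds across the columns of each row. This is offered as an alternative sketch, shown on a small hypothetical DataFrame `df`; the result is a Series indexed by the row labels rather than printed lines numbered from 1.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 0, 1 and 3 missing values in rows a, b, c
df = pd.DataFrame({"Name": ["Ankit", "Ankita", np.nan],
                   "Age": [22, np.nan, np.nan],
                   "Place": ["Up", "Delhi", np.nan]},
                  index=["a", "b", "c"])

# axis=1 sums the boolean values across the columns of each row
row_nan = df.isnull().sum(axis=1)
print(row_nan)
```

Applied to the article's data, the equivalent call would be details.isnull().sum(axis=1).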
Example 3 : Count total NaN in the DataFrame.
# import numpy library as np
import numpy as np

# import pandas library as pd
import pandas as pd

# List of tuples; np.nan marks a missing value
# (the np.NaN alias was removed in NumPy 2.0)
students = [('Ankit', 22, 'Up', 'Geu'),
            ('Ankita', np.nan, 'Delhi', np.nan),
            ('Rahul', 16, 'Tokyo', 'Abes'),
            ('Simran', 41, 'Delhi', 'Gehu'),
            ('Shaurya', np.nan, 'Delhi', 'Geu'),
            ('Shivangi', 35, 'Mumbai', np.nan),
            ('Swapnil', 35, np.nan, 'Geu'),
            (np.nan, 35, 'Uk', 'Geu'),
            ('Jeet', 35, 'Guj', 'Gehu'),
            (np.nan, np.nan, np.nan, np.nan)]

# Create a DataFrame object from
# list of tuples with columns
# and indices.
details = pd.DataFrame(students,
                       columns=['Name', 'Age', 'Place', 'College'],
                       index=['a', 'b', 'c', 'd', 'e',
                              'f', 'g', 'i', 'j', 'k'])

# show the boolean dataframe
print(" \nshow the boolean Dataframe : \n\n", details.isnull())

# The first sum() counts NaN per column;
# the second sum() adds those counts together.
# Count total NaN in a DataFrame
print(" \nCount total NaN in a DataFrame : \n\n",
      details.isnull().sum().sum())
Output:
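The chained sum().sum() is one of several equivalent ways to get the overall count. The sketch below, on a hypothetical DataFrame `df`, compares it with two alternatives: summing the underlying NumPy array of the boolean DataFrame, and subtracting the number of non-missing cells (from count()) from the total number of cells (size). The variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with four missing cells out of nine
df = pd.DataFrame({"Name": ["Ankit", "Ankita", np.nan],
                   "Age": [22, np.nan, np.nan],
                   "Place": ["Up", "Delhi", np.nan]})

# Column counts first, then the sum of those counts
total_chained = df.isnull().sum().sum()

# Sum over the boolean NumPy array in one step
total_numpy = df.isnull().to_numpy().sum()

# Total cells minus non-missing cells
total_by_count = df.size - df.count().sum()

print(total_chained, total_numpy, total_by_count)
```

All three expressions give the same total, so the choice between them is mostly a matter of readability.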