NaN stands for Not a Number and is one of the most common ways to represent a missing value in data. It is a special floating-point value, so it can only be stored as a float: converting it to an integer, for example, raises an error. NaN values are one of the major problems in data analysis, and handling them is an essential step if you want reliable results. In this article, we discuss how to drop rows that contain NaN values.
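The short check below is not from the original article. It illustrates the claim that NaN is a float: `np.nan` is a plain Python float, it never equals itself (so you test for it with `pd.isna()`), and putting it into a list of integers makes pandas store the column as float64.

```python
import numpy as np
import pandas as pd

# np.nan is an ordinary Python float
print(type(np.nan))      # <class 'float'>

# NaN never compares equal to anything, including itself,
# so use pd.isna() to test for it
print(np.nan == np.nan)  # False
print(pd.isna(np.nan))   # True

# int64 has no way to represent NaN, so pandas falls back to float64
s = pd.Series([1, 2, np.nan])
print(s.dtype)           # float64

# Converting NaN itself to an integer fails
try:
    int(np.nan)
except ValueError as exc:
    print("ValueError:", exc)
```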
Pandas DataFrame dropna() Method
We can drop rows that contain NaN values from a pandas DataFrame with the dropna() method:
df.dropna()
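As a minimal sketch with made-up data (not from the original article): by default dropna() removes a row if any of its values is NaN, and it returns a new DataFrame rather than changing the original. The `how` parameter, which the article does not use, switches this to dropping a row only when all of its values are NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
})

# Default (how="any"): drop a row if ANY of its values is NaN
any_dropped = df.dropna()
print(list(any_dropped.index))  # [0]

# how="all": drop a row only if ALL of its values are NaN
all_dropped = df.dropna(how="all")
print(list(all_dropped.index))  # [0, 1]

# dropna() returned new DataFrames; df itself still has 3 rows
print(len(df))                  # 3
```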
To drop rows based on NaN values in particular columns only, pass those columns as subset:
df.dropna(subset=column_list, inplace=True)
Here column_list stands for your own list of column labels: a row is dropped only if it has NaN in one of those columns. With inplace=True, df is modified directly and dropna() returns None instead of a new DataFrame.
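A small illustration of subset and inplace together, using hypothetical columns `name`, `score` and `note` that are not part of the article's examples. Only NaN in `score` causes a row to be dropped, and because inplace=True is set, the call itself returns None.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "score": [1.0, np.nan, 3.0, 4.0],
    "note": [np.nan, "x", np.nan, "y"],
})

# Only NaN in "score" drops a row; NaN in "note" is ignored
result = df.dropna(subset=["score"], inplace=True)

print(result)          # None: inplace=True returns nothing
print(list(df.index))  # [0, 2, 3]
```

Rows 0 and 2 survive even though their `note` value is NaN, because `note` is not in the subset.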
Example 1:
In this example, we create our own DataFrame and remove the rows with NaN values to get clean data.
```python
# importing libraries
import pandas as pd
import numpy as np

num = {'Integers': [10, 15, 30, 40, 55, np.nan, 75, np.nan, 90, 150, np.nan]}

# Create the dataframe
df = pd.DataFrame(num, columns=['Integers'])

# dropping the rows having NaN values
df = df.dropna()

# printing the result
print(df)
```
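Output: the three NaN rows (index labels 5, 7 and 10) are removed, leaving the rows labelled 0, 1, 2, 3, 4, 6, 8 and 9. The values are displayed as floats (10.0, 15.0, ...) because a column that contained NaN is stored as float64.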
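The sketch below extends Example 1 with two checks that are not in the original article: counting missing values with `isna().sum()` before dropping, and casting the cleaned column back to int64 once no NaN is left (possible only because dropna() removed every NaN first).

```python
import numpy as np
import pandas as pd

num = {'Integers': [10, 15, 30, 40, 55, np.nan, 75, np.nan, 90, 150, np.nan]}
df = pd.DataFrame(num)

# Count missing values per column before dropping anything
missing = df.isna().sum()
print(missing['Integers'])    # 3

cleaned = df.dropna()

# The row count falls by exactly the number of NaN rows
print(len(df), len(cleaned))  # 11 8

# With NaN gone, the float64 column can be cast back to integers
cleaned = cleaned.astype({'Integers': 'int64'})
print(cleaned['Integers'].tolist())
```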
Example 2:
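In this example, we drop the rows that contain NaN values and then reset the index with reset_index(), so the remaining rows are renumbered from 0. Passing drop=True discards the old index instead of adding it back as a column: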
df = df.reset_index(drop=True)
```python
# importing libraries
import pandas as pd
import numpy as np

car = {
    'Year of Launch': [1999, np.nan, 1986, 2020, np.nan, 1991, 2007, 2011, 2001, 2017],
    'Engine Number': [np.nan, 15, 22, 43, 44, np.nan, 55, np.nan, 57, np.nan],
    'Chassis Unique Id': [4023, np.nan, 3115, 4522, 3643, 3774, 2955, np.nan, 3587, np.nan],
}

# Create the dataframe
df = pd.DataFrame(car, columns=['Year of Launch', 'Engine Number', 'Chassis Unique Id'])

# dropping the rows having NaN values
df = df.dropna()

# resetting the indices so the remaining rows are numbered from 0
df = df.reset_index(drop=True)

# printing the result
print(df)
```
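Output: only four rows contain no NaN at all (original index labels 2, 3, 6 and 8). After reset_index(drop=True) they are numbered 0 to 3, holding the values 1986/22/3115, 2020/43/4522, 2007/55/2955 and 2001/57/3587 (displayed as floats, e.g. 1986.0, because every column originally contained NaN).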
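To see why drop=True matters, here is a small comparison on made-up data (not from the original article). After dropna() the index keeps gaps where rows were removed; reset_index() without drop=True keeps those old labels as a new column named `index`, while drop=True simply renumbers the rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1.0, np.nan, 3.0, np.nan, 5.0]}).dropna()
print(list(df.index))          # [0, 2, 4]: gaps left by the dropped rows

# Default (drop=False): old labels become a new "index" column
kept = df.reset_index()
print(list(kept.columns))      # ['index', 'x']

# drop=True: old labels are discarded and rows are renumbered from 0
renumbered = df.reset_index(drop=True)
print(list(renumbered.index))  # [0, 1, 2]
```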
Example 3:
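In this example, we pass thresh=2 to df.dropna(), which keeps only the rows that have at least 2 non-missing values and drops the rest. Note that thresh counts non-missing values, not missing ones: in a four-column DataFrame, a row with two NaN/NaT values still has two real values and is kept.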
```python
# Importing libraries
import pandas as pd
import numpy as np

# Creating a dictionary; pd.NaT ("Not a Time") is pandas' missing-value
# marker for datetimes, and dropna() treats it as missing just like np.nan
dit = {
    'August': [10, np.nan, 34, 4.85, 71.2, 1.1],
    'September': [np.nan, 54, 68, 9.25, pd.NaT, 0.9],
    'October': [np.nan, 5.8, 8.52, np.nan, 1.6, 11],
    'November': [pd.NaT, 5.8, 50, 8.9, 77, pd.NaT],
}

# Converting it to a DataFrame
df = pd.DataFrame(data=dit)

# Keeping only the rows that have at least 2 non-missing values
df = df.dropna(thresh=2)

# Resetting the indices using df.reset_index()
df = df.reset_index(drop=True)
print(df)
```
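Output: only the first row is dropped, because it has just one non-missing value (10 under August). Every other row has at least three non-missing values, so the remaining five rows are kept and renumbered 0 to 4.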
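In the article's data, the rows dropped by thresh=2 happen to be the same ones that have several missing values, which can hide what thresh actually measures. This made-up four-column example (not from the original article) separates the two: row 1 has two NaN values but also two real values, so thresh=2 keeps it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1.0, 1.0, np.nan, np.nan],
    'B': [2.0, np.nan, np.nan, np.nan],
    'C': [3.0, np.nan, 3.0, np.nan],
    'D': [4.0, 4.0, np.nan, np.nan],
})

# Number of non-missing values in each row
print(df.notna().sum(axis=1).tolist())  # [4, 2, 1, 0]

# thresh=2 keeps rows with at least 2 non-missing values
print(list(df.dropna(thresh=2).index))  # [0, 1]
```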
Example 4:
In this example, we pass subset=['October'] to df.dropna(), so it removes every row that has a NaN/NaT value in the 'October' column; missing values in the other columns are ignored.
```python
# Importing libraries
import pandas as pd
import numpy as np

# Creating a dictionary
dit = {
    'August': [10, np.nan, 34, 4.85, 71.2, 1.1],
    'September': [np.nan, 54, 68, 9.25, pd.NaT, 0.9],
    'October': [np.nan, 5.8, 8.52, np.nan, 1.6, 11],
    'November': [pd.NaT, 5.8, 50, 8.9, 77, pd.NaT],
}

# Converting it to a DataFrame
df = pd.DataFrame(data=dit)

# Dropping the rows that have NaN/NaT values under the 'October' label
df = df.dropna(subset=['October'])

# Resetting the indices using df.reset_index()
df = df.reset_index(drop=True)
print(df)
```
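Output: the rows at index 0 and 3, which are NaN under 'October', are dropped; rows 1, 2, 4 and 5 remain and are renumbered 0 to 3. Two of the remaining rows still contain NaT (under 'September' and 'November' respectively), because only the 'October' column is checked.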
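subset also accepts several columns, and it combines with the how parameter. The sketch below uses made-up data loosely modelled on the article's month columns (it is not the article's dataset): with the default how="any", a row is dropped if any listed column is missing, while how="all" drops it only when all listed columns are missing. NaN in 'November' never matters, because 'November' is not in subset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'August':   [10.0, np.nan, 34.0, np.nan],
    'October':  [np.nan, 5.8, 8.0, np.nan],
    'November': [1.0, 2.0, np.nan, 4.0],
})

# Default how="any": drop the row if August OR October is missing.
# Row 2 survives despite its NaN in November, which is not in subset.
any_subset = df.dropna(subset=['August', 'October'])
print(list(any_subset.index))  # [2]

# how="all": drop the row only if August AND October are both missing
all_subset = df.dropna(subset=['August', 'October'], how='all')
print(list(all_subset.index))  # [0, 1, 2]
```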