How to fill NAN values with mean in Pandas?

28 July 2024


Missing values have to be handled before the data can be processed: many functions and models report an invalid-input error when the data contains NaN, and replacing every NaN by hand is impractical for all but the smallest datasets. We therefore preprocess the data with functions that replace each NaN with the mean of its column, so that the data is ready to be processed.

There are two main ways to replace NaN values with the mean:

Using DataFrame.fillna() from the pandas library.
Using SimpleImputer from sklearn.impute (this works on any 2D array-like data, such as a NumPy array or a pandas DataFrame, whether or not it was loaded from a CSV file).

Using DataFrame.fillna() from the pandas library

With DataFrame.fillna() from the pandas library, we can replace the NaN values in a DataFrame, or in a single column, with a value of our choice, such as the column mean.
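The same idea extends to every column at once. A minimal sketch on a small, made-up all-numeric DataFrame: df.mean() returns one mean per column, and passing that Series to fillna() fills each column with its own mean.

```python
import numpy as np
import pandas as pd

# Illustrative all-numeric data with gaps in columns A and B
df = pd.DataFrame({
    'A': [1.0, np.nan, 3.0],
    'B': [4.0, 5.0, np.nan],
    'C': [7.0, 8.0, 9.0],
})

# df.mean() is a Series of per-column means indexed by column name;
# fillna() matches it against the columns, so each column's NaNs are
# replaced with that column's own mean. A new DataFrame is returned.
filled = df.fillna(df.mean())
print(filled)
```

Because inplace is left at its default (False), the original df keeps its NaN values.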

Procedure:

Calculate the mean of the column with its mean() method.
Call fillna() on that column with the mean as the fill value, so that every NaN in the column is replaced by it.
Print the updated column.
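The three steps can be sketched on a single Series (the values mirror the G2 column used in Example 1). One detail worth knowing: mean() skips NaN by default (skipna=True), so the mean is taken over the non-missing values only.

```python
import numpy as np
import pandas as pd

s = pd.Series([25, np.nan, np.nan, 29])

# Step 1: the mean is (25 + 29) / 2, computed over the two
# non-missing values because skipna=True is the default
mean_value = s.mean()
print(mean_value)  # 27.0

# With skipna=False, any NaN makes the result NaN
print(s.mean(skipna=False))  # nan

# Steps 2 and 3: fill the gaps with the mean and print the result
filled = s.fillna(mean_value)
print(filled.tolist())  # [25.0, 27.0, 27.0, 29.0]
```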

Syntax: df.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)

Parameter:

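value : scalar, dict, Series, or DataFrame. Value to use to fill holes (e.g. 0), or a dict/Series/DataFrame that gives the value to use for each column.

method : {'backfill', 'bfill', 'pad', 'ffill', None}. Method to use for filling holes in reindexed Series: pad/ffill propagates the last valid value forward, backfill/bfill uses the next valid value. Deprecated since pandas 2.1 in favor of DataFrame.ffill() and DataFrame.bfill().

axis : {0 or 'index', 1 or 'columns'}. Axis along which to fill missing values.

inplace : If True, fill in place and return None; otherwise a new object is returned.

limit : If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled.

downcast : dict, default is None. A dict of item->dtype of what to downcast if possible (deprecated in recent pandas releases).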
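A short sketch of the value and limit parameters on a made-up DataFrame: value can be a dict that maps column names to fill values, and, when no method is given, limit caps how many NaNs are filled in each column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': [np.nan, np.nan, 3.0, np.nan],
    'y': [1.0, np.nan, 5.0, 6.0],
})

# value as a dict: column name -> fill value (here, each column's mean).
# Columns missing from the dict would be left unchanged.
means = {'x': df['x'].mean(), 'y': df['y'].mean()}
filled = df.fillna(value=means)

# limit=1 without method: at most one NaN per column is filled,
# starting from the top
partially_filled = df.fillna(value=means, limit=1)

print(filled)
print(partially_filled)
```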

Example 1:

Calculate the mean of the column that contains NaN with mean().
Then call fillna() with that mean to replace every NaN in the column, and print the updated DataFrame.

Python3

import numpy as np
import pandas as pd

# A dictionary with lists as values
GFG_dict = {'G1': [10, 20, 30, 40],
            'G2': [25, np.nan, np.nan, 29],
            'G3': [15, 14, 17, 11],
            'G4': [21, 22, 23, 25]}

# Create a DataFrame from the dictionary
gfg = pd.DataFrame(GFG_dict)

# Find the mean of the column that contains NaN (NaNs are skipped)
mean_value = gfg['G2'].mean()

# Replace the NaNs in column G2 with the mean of that column.
# Assigning the result back avoids chained inplace=True, which pandas 2.x
# warns about and which does not update gfg under Copy-on-Write.
gfg['G2'] = gfg['G2'].fillna(value=mean_value)
print('Updated Dataframe:')
print(gfg)

Output:

Updated Dataframe:
   G1    G2  G3  G4
0  10  25.0  15  21
1  20  27.0  14  22
2  30  27.0  17  23
3  40  29.0  11  25
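To check that the fill worked, one option is to count the missing values per column with isna().sum() before and after filling. This sketch repeats the Example 1 data and assigns the result back instead of using inplace=True.

```python
import numpy as np
import pandas as pd

gfg = pd.DataFrame({'G1': [10, 20, 30, 40],
                    'G2': [25, np.nan, np.nan, 29],
                    'G3': [15, 14, 17, 11],
                    'G4': [21, 22, 23, 25]})

# Number of missing values in each column before filling
before = gfg.isna().sum()

gfg['G2'] = gfg['G2'].fillna(gfg['G2'].mean())

# Number of missing values in each column after filling
after = gfg.isna().sum()

print(before.tolist())  # [0, 2, 0, 0]
print(after.tolist())   # [0, 0, 0, 0]
```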

Example 2:

Python3

import pandas as pd
import numpy as np
  
df = pd.DataFrame({
    'ID': [10, np.nan, 20, 30, np.nan, 50, np.nan,
           150, 200, 102, np.nan, 130],
      
    'Sale': [10, 20, np.nan, 11, 90, np.nan,
             55, 14, np.nan, 25, 75, 35],
      
    'Date': ['2020-10-05', '2020-09-10', np.nan,
             '2020-08-17', '2020-09-10', '2020-07-27', 
             '2020-09-10', '2020-10-10', '2020-10-10',
             '2020-06-27', '2020-08-17', '2020-04-25'],
})
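
# Fill the NaNs in 'Sale' with the column mean, truncated to an integer
# by int(); assigning the result back avoids chained inplace=True
df['Sale'] = df['Sale'].fillna(int(df['Sale'].mean()))
print(df)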

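Output:

       ID  Sale        Date
0    10.0  10.0  2020-10-05
1     NaN  20.0  2020-09-10
2    20.0  37.0         NaN
3    30.0  11.0  2020-08-17
4     NaN  90.0  2020-09-10
5    50.0  37.0  2020-07-27
6     NaN  55.0  2020-09-10
7   150.0  14.0  2020-10-10
8   200.0  37.0  2020-10-10
9   102.0  25.0  2020-06-27
10    NaN  75.0  2020-08-17
11  130.0  35.0  2020-04-25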
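Example 2 fills only the Sale column and truncates its mean with int(). If the goal is to fill every numeric column with its own untruncated mean while leaving a text column such as Date alone, a sketch like the following may help. It uses the first four rows of the Example 2 data and passes numeric_only=True to mean(), since in pandas 2.x taking the mean of a frame that contains a string column raises a TypeError.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [10, np.nan, 20, 30],
    'Sale': [10, 20, np.nan, 11],
    'Date': ['2020-10-05', '2020-09-10', np.nan, '2020-08-17'],
})

# numeric_only=True restricts the calculation to numeric columns,
# so col_means only has entries for 'ID' and 'Sale'
col_means = df.mean(numeric_only=True)

# fillna() only fills the columns present in col_means;
# the NaN in 'Date' is left as it is
df = df.fillna(col_means)
print(df)
```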

Using SimpleImputer() from sklearn.impute

SimpleImputer is an imputation transformer for completing missing values. It provides basic strategies for imputing them: missing values can be replaced with a provided constant or with a statistic (mean, median, or most frequent value) of the column in which they are located. The class also supports encodings of missing values other than NaN through the missing_values parameter.
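SimpleImputer is not tied to CSV files: it accepts any 2D array-like input. A minimal sketch with small, made-up NumPy arrays, comparing the 'mean' and 'median' strategies:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Any 2D array-like input works; no CSV file is involved
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [7.0, np.nan]])

# Each NaN is replaced by the mean of its own column
mean_imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
X_mean = mean_imputer.fit_transform(X)

# The median is less sensitive to the outlier 10.0 in this column
median_imputer = SimpleImputer(strategy='median')
X_median = median_imputer.fit_transform(np.array([[1.0], [2.0], [np.nan], [10.0]]))

print(X_mean)
print(X_median)
```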

Syntax: class sklearn.impute.SimpleImputer(*, missing_values=nan, strategy='mean', fill_value=None, verbose=0, copy=True, add_indicator=False)

Parameters:

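missing_values: int, float, str, np.nan or None, default=np.nan. The placeholder for the missing values; all occurrences of it are imputed.

strategy: str, default='mean'. The imputation strategy: 'mean', 'median', 'most_frequent' or 'constant'. 'mean' and 'median' work only with numeric data.

fill_value: str or numerical value, default=None. The value that replaces missing values when strategy='constant'. If None, 0 is used for numerical data and 'missing_value' for strings or object data.

verbose: int, default=0. Controls the verbosity of the imputer (deprecated in later scikit-learn releases).

copy: bool, default=True. If True, a copy of the input is created; if False, imputation is done in place where possible.

add_indicator: bool, default=False. If True, a MissingIndicator transform is stacked onto the output of the imputer, adding binary columns that mark which values were missing.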
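A brief sketch of fill_value and add_indicator on a made-up array: with strategy='constant' every NaN becomes fill_value, and add_indicator=True appends one 0/1 column for each feature that had missing values during fit.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan],
              [np.nan, 3.0],
              [5.0, 6.0]])

# strategy='constant' writes fill_value into every missing slot;
# add_indicator=True appends indicator columns (1 = was missing)
imputer = SimpleImputer(strategy='constant', fill_value=0, add_indicator=True)
Xt = imputer.fit_transform(X)

# The first two columns are the imputed data, the last two the indicators
print(Xt)
```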

Note: The examples below read their data from a CSV file named property data.csv.

Example 1 : (Computation on PID column)

Python3

import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer

Dataset = pd.read_csv("property data.csv")

# Select the first column (PID) as a 2D array of shape (n_samples, 1);
# iloc[:, 0] would return a 1D array, which SimpleImputer rejects
X = Dataset.iloc[:, [0]].values

# Replace missing values with the column mean
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X)

X = imputer.transform(X)
print(X)

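The example above calls fit() and then transform(). That split matters when new data arrives later: fit() stores the learned column means in the statistics_ attribute, and transform() reuses them instead of recomputing from the new data. A sketch with made-up training and new data:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[100.0], [np.nan], [300.0]])
X_new = np.array([[np.nan], [50.0]])

imputer = SimpleImputer(strategy='mean')
imputer.fit(X_train)          # learns the column mean from X_train only
print(imputer.statistics_)    # [200.]

# transform() fills NaNs in X_new with the mean learned during fit,
# not with the mean of X_new itself
print(imputer.transform(X_new))
```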

Example 2 : (Computation on ST_NUM column)

Python3

from sklearn.impute import SimpleImputer
import pandas as pd
import numpy as np

Dataset = pd.read_csv("property data.csv")

# Select the second column (ST_NUM) as a 2D array of shape (n_samples, 1);
# iloc[:, 1] would return a 1D array, which SimpleImputer rejects
X = Dataset.iloc[:, [1]].values

# Replace missing values with the column mean
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X)
X = imputer.transform(X)
print(X)

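Both SimpleImputer examples convert the column to a NumPy array with .values. As an alternative sketch, the imputed values can be written straight back into the DataFrame. The small DataFrame below is a hypothetical stand-in for property data.csv: the ST_NAME column and all values are made up, and only the ST_NUM column name comes from the example above.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data: a numeric column with a gap and a text column
# that should not be imputed with a mean
df = pd.DataFrame({
    'ST_NUM': [100.0, 200.0, np.nan, 300.0],
    'ST_NAME': ['Main', 'Oak', 'Pine', 'Elm'],
})

imputer = SimpleImputer(strategy='mean')

# Double brackets select a one-column DataFrame (2D input), and the
# (n_samples, 1) result is assigned back to the same column
df[['ST_NUM']] = imputer.fit_transform(df[['ST_NUM']])
print(df)
```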
