Imputing Missing Values Before Building an Estimator in Scikit Learn

20 July 2024

Missing values in a dataset can cause problems when building an estimator: many Scikit Learn estimators, including LinearRegression, reject input that contains NaN. Scikit Learn provides several ways to handle missing data, one of which is imputation. Imputing means filling in each missing entry with an estimated value derived from the data that is available, for example the mean of the observed values in the same column.
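
To make the idea concrete, here is a minimal sketch of mean imputation written with plain NumPy. The small array is hypothetical and chosen only for illustration; it is meant to show the arithmetic behind the technique, not how Scikit Learn implements it internally.

```python
import numpy as np

# Hypothetical toy data: np.nan marks a missing entry.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [5.0, np.nan]])

# Column means computed from the observed (non-NaN) entries only.
col_means = np.nanmean(X, axis=0)

# Replace each NaN with the mean of its own column.
X_filled = np.where(np.isnan(X), col_means, X)

print(col_means)
print(X_filled)
```

Each missing entry receives the mean of the values observed in its column, so the filled array contains no NaN and every observed value is left unchanged.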

Steps needed:

The following steps are required to impute missing values before building an estimator in Scikit Learn:

Import the required libraries: First, import the required libraries, including Scikit Learn and NumPy.
Load the dataset: Load the dataset that contains missing values.
Identify missing values: Locate the missing values in the dataset.
Impute missing values: Use Scikit Learn’s SimpleImputer class to fill in the missing values.
Build the estimator: Fit the estimator on the imputed data. This article uses linear regression.
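
For the identification step, a boolean mask is often easier to act on once it is summarized. The sketch below, which uses a hypothetical array, counts the missing entries per column and lists the rows that contain at least one missing value.

```python
import numpy as np

# Hypothetical feature matrix; np.nan marks a missing entry.
X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [np.nan, 8.0, 9.0],
              [10.0, 11.0, 12.0]])

mask = np.isnan(X)                                     # True where a value is missing
missing_per_column = mask.sum(axis=0)                  # number of NaNs in each column
rows_with_missing = np.flatnonzero(mask.any(axis=1))   # indices of incomplete rows

print(missing_per_column)
print(rows_with_missing)
```

A summary like this can help decide whether imputation is reasonable for a column or whether too much of it is missing.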

Examples

Let’s consider an example of a dataset containing missing values. The following code imputes missing values in the dataset using Scikit Learn’s SimpleImputer class:

Python

# Import the required libraries
from sklearn.impute import SimpleImputer
import numpy as np
 
# Load the dataset
X = np.array([[1, 2, np.nan],
              [3, np.nan, 4],
              [5, 6, np.nan],
              [7, 8, 9]])
Y = np.array([14, 20, 29, 40])
 
# Identify missing values
print('Check Null values \n',np.isnan(X))
 
# Impute missing values
imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(X)
 
# Build the estimator
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_imputed, Y)
 
print('\nCoefficient :',regressor.coef_)
print('Intercept :',regressor.intercept_)
 
# Prediction
Y_pred = X_imputed @ regressor.coef_ + regressor.intercept_
print("Prediction :",Y_pred )

Output:

Check Null values 
 [[False False  True]
 [False  True False]
 [False False  True]
 [False False False]]

Coefficient : [2.25 1.5  1.4 ]
Intercept : -0.3499999999999943
Prediction : [14. 20. 29. 40.]

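In the above example, we first loaded a dataset that contains missing values. We then identified the missing values with NumPy’s isnan function and used Scikit Learn’s SimpleImputer class with the mean strategy to replace each missing value with the mean of its column. Finally, we fitted a linear regression estimator on the imputed dataset. Because the data has four samples and the model has four parameters (three coefficients and an intercept), the fit reproduces the training targets exactly.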
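
One possible variation, not part of the example above, is to chain the imputer and the regressor with Scikit Learn’s make_pipeline so that the column means learned during fitting are reused automatically at prediction time. The sketch below assumes the same toy data; the new sample X_new is hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Same toy data as in the example above.
X = np.array([[1, 2, np.nan],
              [3, np.nan, 4],
              [5, 6, np.nan],
              [7, 8, 9]])
Y = np.array([14, 20, 29, 40])

# Imputation and regression combined into a single estimator.
model = make_pipeline(SimpleImputer(strategy='mean'), LinearRegression())
model.fit(X, Y)

# Column means the imputer learned from the training data.
learned_means = model.named_steps['simpleimputer'].statistics_
print('Learned means :', learned_means)

# Hypothetical new sample with a missing second feature; the pipeline
# fills it with the learned mean before predicting.
X_new = np.array([[2, np.nan, 5]])
print('Prediction :', model.predict(X_new))
```

With this arrangement the raw data, including its NaN entries, can be passed straight to fit and predict, and the imputation statistics always come from the data the pipeline was fitted on.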
