Python | Imputation using the KNNImputer()

27 July 2024


KNNImputer is a scikit-learn class used to fill in (impute) missing values in a dataset. Rather than the naive approach of replacing every missing value in a column with the column mean or median, it applies the idea behind the k-nearest neighbours (KNN) algorithm. We specify the number of neighbours to use, which is the K parameter (n_neighbors). Each missing value is then replaced by the mean of that feature over the K nearest rows that have a value for it, where nearness is measured on the features that both rows have present. It is implemented by the KNNImputer class, whose constructor takes the following arguments, among others:

n_neighbors: number of neighbouring samples to use for imputing each missing value. By default 5.

metric: the distance metric used to search for neighbours. Values: {nan_euclidean, callable}. By default nan_euclidean.

weights: how the neighbours' values are weighted when they are averaged. Values: {uniform, distance, callable}. By default uniform.
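To make the roles of n_neighbors, metric and weights concrete, the sketch below imputes a single value by hand and compares it with what KNNImputer returns. It reuses the marks table from the article's example; the variable names (row, col, k, donors, nearest) are illustrative only, and the 1 / distance weighting is an assumption about how weights="distance" behaves, checked here against the library's own result rather than taken as given.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.metrics.pairwise import nan_euclidean_distances

# Marks of four students: Maths, Chemistry, Physics, Biology.
X = np.array([
    [80.0, 60.0, np.nan, 78.0],
    [90.0, 65.0, 57.0, 83.0],
    [np.nan, 56.0, 80.0, 67.0],
    [95.0, np.nan, 78.0, np.nan],
])

row, col, k = 0, 2, 2  # impute Physics for the first student from 2 neighbours

# metric="nan_euclidean": distances skip missing coordinates
# and rescale the remaining ones.
dist = nan_euclidean_distances(X, X)

# Only rows that actually have a Physics mark can donate a value.
donors = np.flatnonzero(~np.isnan(X[:, col]))
nearest = donors[np.argsort(dist[row, donors])[:k]]

# weights="uniform": plain mean of the neighbours' values.
uniform_value = X[nearest, col].mean()

# weights="distance": closer neighbours count more (weight = 1 / distance).
inverse_dist = 1.0 / dist[row, nearest]
distance_value = np.average(X[nearest, col], weights=inverse_dist)

# Cross-check both hand-computed values against KNNImputer itself.
sk_uniform = KNNImputer(n_neighbors=k, weights="uniform").fit_transform(X)[row, col]
sk_distance = KNNImputer(n_neighbors=k, weights="distance").fit_transform(X)[row, col]

print("uniform :", uniform_value, sk_uniform)
print("distance:", distance_value, sk_distance)
```

With uniform weights the two nearest donors contribute equally; with distance weights the result shifts towards the closer donor's mark.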

Code: Python code to illustrate the KNNImputer class

python3

# import necessary libraries
import numpy as np
import pandas as pd

# import the KNNImputer class
from sklearn.impute import KNNImputer


# create a dataset of students' marks (one row per student);
# avoid naming it `dict`, which would shadow the built-in type
marks = {'Maths': [80, 90, np.nan, 95],
         'Chemistry': [60, 65, 56, np.nan],
         'Physics': [np.nan, 57, 80, 78],
         'Biology': [78, 83, 67, np.nan]}

# create a data frame from the dictionary
before_imputation = pd.DataFrame(marks)
# print dataset before imputation
print("Data Before performing imputation\n", before_imputation)

# create a KNNImputer object that averages the 2 nearest neighbours
imputer = KNNImputer(n_neighbors=2)
after_imputation = imputer.fit_transform(before_imputation)
# print dataset after performing the operation
print("\n\nAfter performing imputation\n", after_imputation)

Output:

Data Before performing imputation
    Maths  Chemistry  Physics  Biology
0   80.0       60.0      NaN     78.0
1   90.0       65.0     57.0     83.0
2    NaN       56.0     80.0     67.0
3   95.0        NaN     78.0      NaN


After performing imputation
 [[80.  60.  68.5 78. ]
 [90.  65.  57.  83. ]
 [87.5 56.  80.  67. ]
 [95.  58.  78.  72.5]]

Note: fit_transform() returns a NumPy array, so the column names and index of the original DataFrame are lost after imputation.
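If the labels are needed afterwards, one option is to wrap the returned array in a new DataFrame. The sketch below assumes that no column is entirely missing; by default KNNImputer drops such columns, so the array would then have fewer columns than the original frame. The names marks, imputed_array and imputed_df are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Same marks table as in the example above.
marks = pd.DataFrame({
    "Maths": [80, 90, np.nan, 95],
    "Chemistry": [60, 65, 56, np.nan],
    "Physics": [np.nan, 57, 80, 78],
    "Biology": [78, 83, 67, np.nan],
})

imputer = KNNImputer(n_neighbors=2)
imputed_array = imputer.fit_transform(marks)  # plain NumPy array, labels lost

# Restore the column names and row index from the original DataFrame.
imputed_df = pd.DataFrame(imputed_array, columns=marks.columns, index=marks.index)

print(type(imputed_array).__name__)
print(imputed_df)
```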
