Missing data imputation with fancyimpute

27 July 2024

August 2013 — Created by Jason Krieger. Available for download from www.kriegs.net.

In a real-world dataset, some data will almost always be missing, usually because of how the data was collected. Missing data matters when building a predictive model, because some algorithms do not perform well on datasets with missing values.

Fancyimpute

fancyimpute is a library of missing data imputation algorithms. It uses machine learning algorithms to impute missing values, and it draws on all the columns of the dataset to do so rather than only the column being filled. This article covers two ways to impute missing data with fancyimpute:

KNN or K-Nearest Neighbor
MICE or Multiple Imputation by Chained Equations

K-Nearest Neighbor

To fill in the missing values, KNN finds the data points most similar to the incomplete one, comparing them across all the available features. It then takes the average of those neighboring points to fill in each missing value.

Python3

import pandas as pd 
import numpy as np 
# importing the KNN from fancyimpute library 
from fancyimpute import KNN 
  
df = pd.DataFrame([[np.nan, 2, np.nan, 0], 
                   [3, 4, np.nan, 1], 
                   [np.nan, np.nan, np.nan, 5], 
                   [np.nan, 3, np.nan, 4], 
                   [5,      7,  8,     2], 
                   [2,      5,  7,     9]], 
                  columns = list('ABCD')) 
  
# printing the dataframe 
print(df) 
  
# creating the KNN imputer with the library's default settings
knn_imputer = KNN()
# imputing the missing values; fit_transform returns a NumPy array
imputed = knn_imputer.fit_transform(df)

# printing the imputed array
print(imputed)

Output:

    A    B    C  D
0  NaN  2.0  NaN  0
1  3.0  4.0  NaN  1
2  NaN  NaN  NaN  5
3  NaN  3.0  NaN  4
4  5.0  7.0  8.0  2
5  2.0  5.0  7.0  9
Imputing row 1/6 with 2 missing, elapsed time: 0.001
[[3.23556938 2.         7.75630267 0.]
 [3.         4.         7.825      1.]
 [3.67647071 3.46386587 7.64000033 5.]
 [3.35514006 3.         7.59183674 4.]
 [5.         7.         8.         2.]
 [2.         5.         7.         9.]]
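
The neighbor-averaging idea described in this section can also be sketched without fancyimpute. The version below is a simplified illustration, not the library's implementation: the function name knn_impute, the default k=2, the use of None for missing values, and the plain unweighted mean are all assumptions made for the example. Distances are computed only over the columns that both rows have observed.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Simplified sketch: unweighted mean, distances over shared observed columns.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbor must have column j observed to donate a value.
                if m == i or other[j] is None:
                    continue
                # Squared differences over columns observed in both rows.
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum(shared) / len(shared))
                candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda c: c[0])
                nearest = candidates[:k]
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Same values as the DataFrame above, with None in place of NaN.
rows = [[None, 2, None, 0],
        [3, 4, None, 1],
        [None, None, None, 5],
        [None, 3, None, 4],
        [5, 7, 8, 2],
        [2, 5, 7, 9]]

filled = knn_impute(rows, k=2)
for r in filled:
    print(r)
```

Because this sketch uses a different neighbor count and no distance weighting, its numbers will not match the fancyimpute output shown above; it is only meant to make the "find similar rows, then average" step concrete.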

Multiple Imputation by Chained Equations

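MICE uses multiple imputation instead of single imputation, which lets it account for the statistical uncertainty in the imputed values. It performs repeated regressions over the sample data, modeling each column with missing values from the other columns, and averages the results.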

Python3

import pandas as pd 
import numpy as np 
# importing IterativeImputer, which fancyimpute provides for MICE-style imputation
from fancyimpute import IterativeImputer 
  
df = pd.DataFrame([[np.nan, 2, np.nan, 0], 
                   [3, 4, np.nan, 1], 
                   [np.nan, np.nan, np.nan, 5], 
                   [np.nan, 3, np.nan, 4], 
                   [5,      7,  8,     2], 
                   [2,      5,  7,     9]], 
                  columns = list('ABCD')) 
  
# printing the dataframe 
print(df) 
  
# creating the MICE-style imputer with the library's default settings
mice_imputer = IterativeImputer()
# imputing the missing values; fit_transform returns a NumPy array
imputed = mice_imputer.fit_transform(df)

# printing the imputed array
print(imputed)

Output

    A    B    C   D
0  NaN  2.0  NaN  0
1  3.0  4.0  NaN  1
2  NaN  NaN  NaN  5
3  NaN  3.0  NaN  4
4  5.0  7.0  8.0  2
5  2.0  5.0  7.0  9
[[3.27262261 2.         7.9809332  0.]
 [3.         4.         7.9193547  1.]
 [2.91717117 4.35730239 7.47523962 5.]
 [2.77722048 3.         7.53760743 4.]
 [5.         7.         8.         2.]
 [2.         5.         7.         9.]]
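
The chained-equations idea can be sketched in a few lines of plain Python. This is a deliberately reduced illustration and not what IterativeImputer does internally: it handles only two columns, runs a single deterministic chain instead of producing several imputed datasets, and the names fit_line and chained_impute, the n_iter default, and the use of None for missing values are all assumptions made for the example. Each pass regresses one column on the other and overwrites the originally missing entries with the predictions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Predictor is constant: fall back to predicting the mean.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def chained_impute(pairs, n_iter=10):
    """Impute None entries in a list of (x, y) pairs by chained regressions."""
    missing = [(x is None, y is None) for x, y in pairs]
    obs_x = [x for x, _ in pairs if x is not None]
    obs_y = [y for _, y in pairs if y is not None]
    mean_x = sum(obs_x) / len(obs_x)
    mean_y = sum(obs_y) / len(obs_y)
    # Step 1: start from a simple column-mean fill.
    data = [[mean_x if x is None else x, mean_y if y is None else y]
            for x, y in pairs]
    # Step 2: cycle through the columns, re-predicting the missing entries.
    for _ in range(n_iter):
        for target in (0, 1):
            other = 1 - target
            # Train only on rows where the target column was truly observed.
            train = [r for r, m in zip(data, missing) if not m[target]]
            a, b = fit_line([r[other] for r in train],
                            [r[target] for r in train])
            for r, m in zip(data, missing):
                if m[target]:
                    r[target] = a + b * r[other]
    return data


# Toy data lying on the line y = 2x, with one value missing in each column.
pairs = [(1, 2), (2, 4), (3, None), (None, 8), (5, 10)]
completed = chained_impute(pairs)
for row in completed:
    print(row)
```

A full MICE procedure would add random draws to each prediction, repeat the whole chain several times, and pool the resulting datasets, which is where the treatment of uncertainty comes from.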
