Python | Replace NaN values with average of columns

27 July 2024


In machine learning and data analytics, data visualization is one of the most important steps, and the data usually has to be cleaned and arranged first. Data sets sometimes contain NaN (not a number) values, which cannot be used directly for visualization. One way to solve this problem is to replace each NaN value with the average of its column. A few methods for doing this are given below.
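To illustrate why NaN values need handling, here is a minimal sketch showing that an ordinary mean is spoiled by a single NaN, while np.nanmean skips it. The array `data` is a made-up example, not taken from the methods below.

```python
import numpy as np

# Hypothetical sample column with one missing value
data = np.array([1.0, np.nan, 3.0])

plain_mean = np.mean(data)      # NaN propagates through ordinary arithmetic
safe_mean = np.nanmean(data)    # NaN entries are ignored

print("np.mean:   ", plain_mean)
print("np.nanmean:", safe_mean)
```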

Method #1: Using np.nanmean and np.take

Python3

# Python code to demonstrate
# to replace nan values
# with an average of columns
 
import numpy as np
 
# Initialising numpy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan], 
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])
 
# printing initial array
print ("initial array", ini_array)
 
# column mean
col_mean = np.nanmean(ini_array, axis = 0)
 
# printing column mean
print ("columns mean", str(col_mean))
 
# find indices where nan value is present
inds = np.where(np.isnan(ini_array))
 
# replace inds with avg of column
ini_array[inds] = np.take(col_mean, inds[1])
 
# printing final array
print ("final array", ini_array)

Output:

initial array [[ 1.3  2.5  3.6  nan]
 [ 2.6  3.3  nan  5.5]
 [ 2.1  3.2  5.4  6.5]]
columns mean [ 2.   3.   4.5  6. ]
final array [[ 1.3  2.5  3.6  6. ]
 [ 2.6  3.3  4.5  5.5]
 [ 2.1  3.2  5.4  6.5]]
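Method #1 modifies ini_array in place. If the original data must be kept, the same steps can be wrapped in a function that works on a copy. This is only a sketch: the name `fill_nan_with_col_mean` and the small test array are assumptions made for illustration.

```python
import numpy as np

def fill_nan_with_col_mean(a):
    """Return a float copy of `a` with each NaN replaced by its column mean."""
    out = np.array(a, dtype=float)          # copy, so the caller's array is untouched
    col_mean = np.nanmean(out, axis=0)      # per-column mean, ignoring NaN
    rows, cols = np.where(np.isnan(out))    # positions of the NaN entries
    out[rows, cols] = col_mean[cols]        # same lookup as np.take(col_mean, cols)
    return out

original = np.array([[1.0, np.nan],
                     [3.0, 4.0]])
filled = fill_nan_with_col_mean(original)
print(filled)
```

After the call, `original` still contains its NaN, and `filled` holds the repaired copy.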

Method #2: Using np.ma and np.where

Python3

# Python code to demonstrate
# to replace nan values
# with average of columns
 
import numpy as np
 
# Initialising numpy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])
 
# printing initial array
print ("initial array", ini_array)
 
# replace nan with col means
res = np.where(np.isnan(ini_array), np.ma.array(ini_array,
               mask = np.isnan(ini_array)).mean(axis = 0), ini_array)   
 
# printing final array
print ("final array", res)

Output:

initial array [[ 1.3  2.5  3.6  nan]
 [ 2.6  3.3  nan  5.5]
 [ 2.1  3.2  5.4  6.5]]
final array [[ 1.3  2.5  3.6  6. ]
 [ 2.6  3.3  4.5  5.5]
 [ 2.1  3.2  5.4  6.5]]
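Method #2 packs two steps into one expression. The sketch below separates them on a small made-up array (the names `a`, `masked` and `col_mean` are illustrative): the masked array yields one mean per column, and np.where then broadcasts those means across the rows.

```python
import numpy as np

a = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [5.0, 8.0]])

masked = np.ma.array(a, mask=np.isnan(a))   # hide NaN entries behind a mask
col_mean = masked.mean(axis=0)              # one mean per column, masked values skipped
print(col_mean.shape)                       # (2,)

# np.where broadcasts the (2,) means across all 3 rows of the (3, 2) array
res = np.where(np.isnan(a), col_mean, a)
print(res)
```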

Method #3: Using a naive loop and zip

Python3

# Python code to demonstrate
# to replace nan values
# with average of columns
 
import numpy as np
 
# Initialising numpy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])
 
# printing initial array
print ("initial array", ini_array)
 
# indices where the array value is nan
indices = np.where(np.isnan(ini_array))
 
# Iterating over numpy array to replace nan with values
for row, col in zip(*indices):
    ini_array[row, col] = np.mean(ini_array[
           ~np.isnan(ini_array[:, col]), col])
 
# printing final array
print ("final array", ini_array)

Output:

initial array [[ 1.3  2.5  3.6  nan]
 [ 2.6  3.3  nan  5.5]
 [ 2.1  3.2  5.4  6.5]]
final array [[ 1.3  2.5  3.6  6. ]
 [ 2.6  3.3  4.5  5.5]
 [ 2.1  3.2  5.4  6.5]]
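The loop in Method #3 recomputes the column mean for every NaN it visits. A possible variation, sketched below on a made-up array, loops over columns instead, so each mean is computed once. Columns that are entirely NaN are left untouched here, which is an assumption of this sketch rather than something the method above specifies.

```python
import numpy as np

a = np.array([[1.0, np.nan],
              [np.nan, 4.0],
              [5.0, np.nan]])

for col in range(a.shape[1]):
    column = a[:, col]                  # a view: writes go through to `a`
    missing = np.isnan(column)
    # Fill only when there is something to fill and something to average
    if missing.any() and not missing.all():
        column[missing] = column[~missing].mean()

print(a)
```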

Method #4: Using list comprehension and built-in functions

This approach works on plain Python lists, with None marking the missing values instead of NaN. It first computes the column means using a list comprehension with the help of the filter and zip functions, skipping the None entries. Then it replaces each None with the corresponding column mean using another list comprehension with the help of the enumerate function. Finally, it returns the modified list.

Algorithm

1. Compute the mean of each column, ignoring None entries.
2. Replace the None values in the array with the corresponding column means using list comprehension and built-in functions.
3. Return the modified list.

Python3

def replace_nan_with_mean(arr):
    # Keep only the non-missing (non-None) values of each column
    cols = [list(filter(lambda x: x is not None, col)) for col in zip(*arr)]
    # Mean of each column over its non-missing values
    col_means = [sum(col) / len(col) for col in cols]
    # Replace each None, in place, with the mean of its column
    for i in range(len(arr)):
        arr[i] = [col_means[j] if x is None else x for j, x in enumerate(arr[i])]
    return arr


arr = [[1.3, 2.5, 3.6, None],
       [2.6, 3.3, None, 5.5],
       [2.1, 3.2, 5.4, 6.5]]
print(replace_nan_with_mean(arr))

Output

[[1.3, 2.5, 3.6, 6.0], [2.6, 3.3, 4.5, 5.5], [2.1, 3.2, 5.4, 6.5]]

Time complexity: O(n*m), where n is the number of rows and m is the number of columns.

Auxiliary Space: O(n*m).
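The list-based code above treats None as the missing marker. If the lists may also hold float NaN values, a variant along the following lines could be used. This is a sketch only: `is_missing` and `replace_missing_with_mean` are hypothetical helper names, and leaving an all-missing column as None is an assumption.

```python
import math

def is_missing(x):
    # Treat both None and float NaN as missing (an assumption of this sketch)
    return x is None or (isinstance(x, float) and math.isnan(x))

def replace_missing_with_mean(rows):
    col_means = []
    for col in zip(*rows):
        present = [x for x in col if not is_missing(x)]
        # A column with no usable values keeps None as its "mean"
        col_means.append(sum(present) / len(present) if present else None)
    # Build new rows instead of modifying the input
    return [[col_means[j] if is_missing(x) else x for j, x in enumerate(row)]
            for row in rows]

data = [[1.0, float("nan")],
        [3.0, 4.0],
        [None, 8.0]]
result = replace_missing_with_mean(data)
print(result)
```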

Method #5: Using zip(), map() and lambda

Compute the column means excluding None values using a loop over the transposed array zip(*arr). Replace None values with column means using map() and lambda functions.

Algorithm

1. Initialize an empty list means to store the column means.
2. Loop over the transposed array zip(*arr) to iterate over columns.
3. For each column, filter out None values and compute the mean of the remaining values. If there are no remaining values, set the mean to 0.
4. Append the mean to the means list.
5. Use map() and lambda functions to replace None values with the corresponding column mean in each row of the array arr.
6. Store the resulting list of rows back in arr.

Python3

# initial array
arr = [[1.3, 2.5, 3.6, None],
       [2.6, 3.3, None, 5.5],
       [2.1, 3.2, 5.4, 6.5]]
 
# compute column means
means = []
for col in zip(*arr):
    values = [x for x in col if x is not None]
    means.append(sum(values)/len(values) if values else 0)
 
# replace None values with column means
arr = list(map(lambda row: [means[j] if x is None else x for j,x in enumerate(row)], arr))
 
# print final array
print(arr)

Output

[[1.3, 2.5, 3.6, 6.0], [2.6, 3.3, 4.5, 5.5], [2.1, 3.2, 5.4, 6.5]]

Time Complexity: O(n), where n is the number of values in the array (rows times columns), since each value is visited a constant number of times.

Auxiliary Space: O(n), where n is the number of values in the array.
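The list-based and NumPy-based methods can be connected: converting a list that contains None to a float array turns each None into nan, after which the NumPy techniques apply. A minimal sketch on made-up data (the names `as_float` and `filled` are illustrative):

```python
import numpy as np

arr = [[1.0, None],
       [3.0, 4.0]]

as_float = np.array(arr, dtype=float)     # None entries become nan
col_mean = np.nanmean(as_float, axis=0)   # column means, ignoring nan
filled = np.where(np.isnan(as_float), col_mean, as_float)

print(filled.tolist())
```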
