In this article, we will cover how to normalize a NumPy array so the values range exactly between 0 and 1.
Normalization transforms data so that all records appear on the same scale. After min-max normalization, the minimum value in the data becomes 0, the maximum value becomes 1, and all other values fall between 0 and 1. Normalization is necessary when features are measured on different scales, because machine learning models may be over-influenced by features with larger values. There are different ways to normalize data. One standard procedure is the min-max approach.
Normalization using Min Max Values
Here the data is normalized by subtracting the minimum value from each value and dividing the result by the difference between the maximum and minimum values in the data. The code below makes this concrete.
Example
The maximum and minimum values in a NumPy array can be found with max() and min(). The formula for normalization using min-max values is given below:
Normalized data = (data - min(data)) / (max(data) - min(data))
Python3
# import necessary packages
import numpy as np

# create an array
data = np.array([[10, 20], [30, 40], [5, 15], [0, 10]])

normalizedData = (data - np.min(data)) / (np.max(data) - np.min(data))

# normalized data using min max value
print(normalizedData)
Output
[[0.25  0.5  ]
 [0.75  1.   ]
 [0.125 0.375]
 [0.    0.25 ]]
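The min-max formula divides by max(data) - min(data), which is zero when every value in the array is the same. The following is a minimal sketch of a helper that guards against that case. The function name `min_max_normalize` and the choice to return zeros for a constant array are assumptions made for illustration; they are not part of NumPy.

```python
import numpy as np


def min_max_normalize(data):
    """Scale all values of an array into [0, 1] using the global min and max.

    Hypothetical helper: returns an array of zeros when every value is the
    same, because max - min would be zero in that case.
    """
    data = np.asarray(data, dtype=float)
    lo = data.min()
    hi = data.max()
    if hi == lo:
        return np.zeros_like(data)
    return (data - lo) / (hi - lo)


data = np.array([[10, 20], [30, 40], [5, 15], [0, 10]])
normalized = min_max_normalize(data)
print(normalized)

# A constant array would otherwise trigger a division by zero
constant = min_max_normalize(np.array([7, 7, 7]))
print(constant)
```

Returning zeros is only one possible convention; depending on the application, raising an error or returning 0.5 everywhere may be more appropriate.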
The data can be normalized in other ways as well:
- Normalization using sklearn MinMaxScaler
- Normalization using numpy.linalg.norm
- Normalization using Maths formula
Normalization using sklearn MinMaxScaler
In Python, the sklearn module provides a class called MinMaxScaler that normalizes data using minimum and maximum values. Its fit_transform method scales each column (feature) of the data to the range 0 to 1, using that column's own minimum and maximum.
Python3
# import necessary packages
import numpy as np
from sklearn import preprocessing as p

# create an array
data = np.array([[10, 20], [30, 40], [5, 15], [0, 10]])

min_max_scaler = p.MinMaxScaler()
normalizedData = min_max_scaler.fit_transform(data)

# normalized data using MinMaxScaler
print(normalizedData)
Output
[[0.33333333 0.33333333]
 [1.         1.        ]
 [0.16666667 0.16666667]
 [0.         0.        ]]
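This output differs from the first example because the scaling is applied column by column rather than over the whole array. As a rough sketch, the same numbers can be reproduced in plain NumPy by taking the minimum and maximum along axis 0. The variable names below are illustrative, and the sketch assumes no column is constant (otherwise the denominator would be zero).

```python
import numpy as np

data = np.array([[10, 20], [30, 40], [5, 15], [0, 10]])

# Minimum and maximum of each column (axis=0 reduces over the rows)
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# Broadcasting applies each column's min and max to that column only
normalized_by_column = (data - col_min) / (col_max - col_min)
print(normalized_by_column)
```

Each column of the result now has its own minimum of 0 and maximum of 1.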
Normalization using numpy.linalg.norm
The NumPy library provides a function called norm that returns one of eight different matrix norms or one of an infinite number of vector norms, depending on its ord parameter. By default, it computes the Frobenius norm, which is the square root of the sum of the squares of all elements. The data here is normalized by dividing it by the value that norm returns. The resulting array has a norm of 1; unlike min-max normalization, its smallest and largest values are not forced to 0 and 1.
Python3
# import necessary packages
import numpy as np

# create an array
data = np.array([[10, 20], [30, 40], [5, 15], [0, 10]])

normalizedData = data / np.linalg.norm(data)

# normalized data using linalg.norm
print(normalizedData)
Output
[[0.17277369 0.34554737]
 [0.51832106 0.69109474]
 [0.08638684 0.25916053]
 [0.         0.17277369]]
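The example above divides the whole array by a single norm. A common variation is to scale each row to unit length instead. The sketch below uses the axis and keepdims parameters of np.linalg.norm for that purpose; it assumes no row consists entirely of zeros, since such a row would have a norm of zero.

```python
import numpy as np

data = np.array([[10, 20], [30, 40], [5, 15], [0, 10]])

# Euclidean (L2) norm of each row; keepdims=True keeps the shape (4, 1)
# so the division broadcasts row by row
row_norms = np.linalg.norm(data, axis=1, keepdims=True)

unit_rows = data / row_norms

# Each row of the result should now have a length of (approximately) 1
print(np.linalg.norm(unit_rows, axis=1))
```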
Normalization using Maths Formula
Here the data is normalized by dividing it by the square root of the sum of the squares of all its values. This is the same quantity that numpy.linalg.norm returns by default, so the result matches the previous example. NumPy's sqrt and sum functions keep the code short. The code below normalizes the data using the sum of squares:
Python3
# import necessary packages
import numpy as np

# create an array
data = np.array([[10, 20], [30, 40], [5, 15], [0, 10]])

normalizedData = data / np.sqrt(np.sum(data ** 2))

# normalized data using sum of squares
print(normalizedData)
Output
[[0.17277369 0.34554737]
 [0.51832106 0.69109474]
 [0.08638684 0.25916053]
 [0.         0.17277369]]
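The two outputs above are identical because the square root of the sum of squares is the default norm that np.linalg.norm computes. A quick check, with illustrative variable names, might look like this:

```python
import numpy as np

data = np.array([[10, 20], [30, 40], [5, 15], [0, 10]])

by_formula = data / np.sqrt(np.sum(data ** 2))
by_norm = data / np.linalg.norm(data)

# True when both approaches agree up to floating-point tolerance
print(np.allclose(by_formula, by_norm))

# The squares of the normalized values sum to (approximately) 1
print(np.sum(by_formula ** 2))
```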