Friday, December 27, 2024

Data Pre-Processing with Sklearn using Standard and Minmax scaler

Data scaling is a preprocessing step for numerical features. Many machine learning algorithms, such as gradient descent-based methods, the KNN algorithm, and linear and logistic regression (particularly when regularized), are sensitive to the relative magnitudes of features and tend to perform better when features are on comparable scales. Scikit-learn provides several scalers for this purpose. This article concentrates on the Standard Scaler and the Min-Max Scaler: what each one does and how to apply it with the built-in classes in sklearn.preprocessing.

Apart from the library imports, both scalers rely on the same three methods:

  • The fit(data) method computes the statistics each feature needs for scaling (the mean and standard deviation for StandardScaler, the minimum and maximum for MinMaxScaler) and stores them in the scaler.
  • The transform(data) method scales the data using the statistics learned by .fit().
  • The fit_transform(data) method performs fit and transform in a single call.
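The practical reason for keeping fit and transform separate is that the statistics should usually be learned from training data only and then reused on new data. The sketch below illustrates this with StandardScaler (covered in the next section); the split of the sample data into X_train and X_test is a hypothetical one chosen for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical split: the first three rows act as training data,
# the last row acts as unseen data.
X_train = np.array([[11, 2], [3, 7], [0, 10]], dtype=float)
X_test = np.array([[11, 8]], dtype=float)

scaler = StandardScaler()
scaler.fit(X_train)                      # learn per-column mean and std from X_train only
train_scaled = scaler.transform(X_train)
test_scaled = scaler.transform(X_test)   # reuses the statistics learned from X_train

# fit_transform() is a shortcut for fit() followed by transform() on the same data
shortcut = StandardScaler().fit_transform(X_train)
print(np.allclose(shortcut, train_scaled))  # True
print(test_scaled)
```

Calling fit_transform on the test set instead would compute new statistics from that set, so training and test values would no longer be on the same scale.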

Standard Scaler

Standard Scaler standardizes each feature to a zero mean and a standard deviation of one (unit variance). It does this by subtracting the feature's mean from each value and dividing the result by the feature's standard deviation.

The standard scaling is calculated as: 

z = (x - u) / s

Where,

  • z is the scaled value.
  • x is the value to be scaled.
  • u is the mean of the training samples.
  • s is the standard deviation of the training samples (scikit-learn uses the population standard deviation, i.e. ddof=0).
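To make the formula concrete, here is a minimal NumPy sketch that computes z = (x - u) / s column by column on the same sample data used in the example further down, and checks it against StandardScaler. The key assumption, matching scikit-learn's behavior, is that s is the population standard deviation (NumPy's default ddof=0).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[11, 2], [3, 7], [0, 10], [11, 8]], dtype=float)

u = data.mean(axis=0)   # per-feature mean of the training samples
s = data.std(axis=0)    # per-feature population standard deviation (ddof=0)
z = (data - u) / s      # z = (x - u) / s, applied to each column

print(np.allclose(z, StandardScaler().fit_transform(data)))  # True
```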

Sklearn's preprocessing module provides the StandardScaler class, which does this in just two or three steps.

Syntax: class sklearn.preprocessing.StandardScaler(*, copy=True, with_mean=True, with_std=True)

Parameters:

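  • copy: If True (default), a scaled copy of the data is returned. If False, scikit-learn tries to scale in place, but this is not guaranteed; for example, a copy is still made if the input is not a NumPy array or SciPy sparse CSR matrix.
  • with_mean: If True, the data is centered (the mean is subtracted) before scaling. Centering is not supported for sparse matrices.
  • with_std: If True, the data is scaled to unit variance.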
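The effect of with_mean and with_std can be seen by switching each one off separately. This sketch reuses the article's sample data and compares each result with the corresponding NumPy expression: with_mean=False only divides by the standard deviation, while with_std=False only subtracts the mean.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[11, 2], [3, 7], [0, 10], [11, 8]], dtype=float)

# with_mean=False: divide by the std but do not subtract the mean
no_center = StandardScaler(with_mean=False).fit_transform(data)

# with_std=False: subtract the mean but do not divide by the std
center_only = StandardScaler(with_std=False).fit_transform(data)

print(np.allclose(no_center, data / data.std(axis=0)))     # True
print(np.allclose(center_only, data - data.mean(axis=0)))  # True
```

Disabling with_mean is the usual choice for sparse input, since subtracting the mean would turn most zero entries into non-zero values.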

Approach:

  • Import module
  • Create data
  • Compute required values
  • Print processed data

Example:





# import module
from sklearn.preprocessing import StandardScaler
 
# create data
data = [[11, 2], [3, 7], [0, 10], [11, 8]]
 
# compute required values
scaler = StandardScaler()
model = scaler.fit(data)
scaled_data = model.transform(data)
 
# print scaled data
print(scaled_data)


Output:

[[ 0.97596444 -1.61155897]
 [-0.66776515  0.08481889]
 [-1.28416374  1.10264561]
 [ 0.97596444  0.42409446]]
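After fitting, the scaler keeps the learned statistics as attributes, which makes it easy to confirm that each scaled column now has mean 0 and standard deviation 1, and to map scaled values back to the original units. The sketch below uses the documented mean_ and scale_ attributes and the inverse_transform method on the same sample data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[11, 2], [3, 7], [0, 10], [11, 8]], dtype=float)
scaler = StandardScaler().fit(data)
scaled_data = scaler.transform(data)

print(scaler.mean_)              # learned per-feature means: [6.25 6.75]
print(scaler.scale_)             # learned per-feature standard deviations
print(scaled_data.mean(axis=0))  # approximately [0, 0]
print(scaled_data.std(axis=0))   # approximately [1, 1]

# inverse_transform() maps scaled values back to the original units
restored = scaler.inverse_transform(scaled_data)
print(np.allclose(restored, data))  # True
```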

MinMax Scaler

Another way to scale data is to map the minimum of each feature to zero and its maximum to one. The MinMax Scaler rescales each feature linearly into a given range, 0 to 1 by default. Because the transformation is linear, it changes the range of the values but not the shape of the original distribution.

The MinMax scaling is done using:

x_std = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

x_scaled = x_std * (max - min) + min

Where,

  • min, max = feature_range, the target range
  • x.min(axis=0): minimum value of each feature (column)
  • x.max(axis=0): maximum value of each feature (column)
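The two-step formula can be checked directly with NumPy. In this sketch the target range (-1, 1) is an arbitrary choice for illustration, stored in lo and hi to avoid shadowing Python's built-in min and max; the result is compared with MinMaxScaler(feature_range=(lo, hi)).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[11, 2], [3, 7], [0, 10], [11, 8]], dtype=float)
lo, hi = -1.0, 1.0  # target range (min, max), chosen for illustration

x_min = data.min(axis=0)                   # per-feature minimum
x_max = data.max(axis=0)                   # per-feature maximum
x_std = (data - x_min) / (x_max - x_min)   # step 1: map each column to [0, 1]
x_scaled = x_std * (hi - lo) + lo          # step 2: stretch and shift into [lo, hi]

sk_scaled = MinMaxScaler(feature_range=(lo, hi)).fit_transform(data)
print(np.allclose(x_scaled, sk_scaled))  # True
print(x_scaled)
```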

Sklearn's preprocessing module provides the MinMaxScaler class for this.

Syntax: class sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1), *, copy=True, clip=False)

Parameters:

  • feature_range: Desired range of the scaled data, given as a tuple (min, max). The default is (0, 1).
  • copy: If True (default), a scaled copy of the data is returned. If False, in-place scaling is attempted, but a copy may still be made (for example, when the input is not a NumPy array).
  • clip: If True, transformed values that fall outside feature_range (which can happen for data not seen during fit) are clipped to that range. This parameter was added in scikit-learn 0.24.
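Clipping only matters for data transformed after fitting, because the training data always lands inside feature_range. In the sketch below, X_new is a hypothetical row with values beyond the training minimum and maximum; without clip it maps outside [0, 1], and with clip=True (scikit-learn 0.24 or later) it is clamped to the range boundaries.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[11, 2], [3, 7], [0, 10], [11, 8]], dtype=float)
# Hypothetical new row: 22 is above the training max (11), 0 is below the training min (2)
X_new = np.array([[22, 0]], dtype=float)

plain = MinMaxScaler().fit(X_train)
clipped = MinMaxScaler(clip=True).fit(X_train)

print(plain.transform(X_new))    # values 2.0 and -0.25, outside [0, 1]
print(clipped.transform(X_new))  # values 1.0 and 0.0, clipped into [0, 1]
```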

Approach:

  • Import module
  • Create data
  • Scale data
  • Print scaled data

Example:





# import module
from sklearn.preprocessing import MinMaxScaler
 
# create data
data = [[11, 2], [3, 7], [0, 10], [11, 8]]
 
# scale features
scaler = MinMaxScaler()
model = scaler.fit(data)
scaled_data = model.transform(data)
 
# print scaled features
print(scaled_data)


Output:

[[1.         0.        ]
 [0.27272727 0.625     ]
 [0.         1.        ]
 [1.         0.75      ]]
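As with StandardScaler, the fitted MinMaxScaler exposes what it learned. The sketch below reads the documented data_min_ and data_max_ attributes and shows that transform amounts to X * scale_ + min_, where scale_ = (max - min) / (data_max_ - data_min_) and min_ = min - data_min_ * scale_ per the scikit-learn documentation; inverse_transform undoes the scaling.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[11, 2], [3, 7], [0, 10], [11, 8]], dtype=float)
scaler = MinMaxScaler().fit(data)

print(scaler.data_min_)  # per-feature minimum seen during fit (0 and 2 here)
print(scaler.data_max_)  # per-feature maximum seen during fit (11 and 10 here)

# transform() is equivalent to a per-feature multiply and shift
manual = data * scaler.scale_ + scaler.min_
print(np.allclose(manual, scaler.transform(data)))  # True

# inverse_transform() recovers the original values
print(np.allclose(scaler.inverse_transform(scaler.transform(data)), data))  # True
```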
