Normalize A Column In Pandas

28 July 2024

In this article, we will learn how to normalize a column in Pandas. Let’s discuss some concepts first:

Pandas: Pandas is an open-source Python library built on top of NumPy. It provides data structures and operations for manipulating numerical data and statistics, and it is popular mainly because it makes importing and analyzing data much easier. Pandas is also fast and well suited to high-performance work.
Data Normalization: Data normalization is a common practice in machine learning that consists of transforming numeric columns to a common scale. The values of some features can be many times larger than those of others, and the features with larger values tend to dominate the learning process.
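
To make the last point concrete, here is a minimal sketch with two made-up samples. The feature names and values are assumptions for illustration only, not data from this article. It shows how the feature with the larger scale accounts for almost all of the Euclidean distance between the samples.

```python
import numpy as np

# Two hypothetical samples: (income in dollars, age in years)
a = np.array([50_000.0, 25.0])
b = np.array([51_000.0, 60.0])

diff = a - b
distance = np.linalg.norm(diff)

# Fraction of the squared distance contributed by each feature
share = diff**2 / np.sum(diff**2)

print(distance)
print(share)
```

With these values, well over 99% of the squared distance comes from the first feature, even though the second feature differs far more in relative terms.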

Steps Needed

Here, we will apply some techniques to normalize the column values and discuss these with the help of examples. For this, let’s understand the steps needed for normalization with Pandas.

Import Library (Pandas)
Import / Load / Create data.
Use the technique to normalize the column.

Examples:

Here, we create a DataFrame with some sample values and apply several normalization techniques to one of its columns.

Python3

# importing packages
import pandas as pd
  
# create data
df = pd.DataFrame({'Column 1':[200,-4,90,13.9,5,
                               -90,20,300.7,30,-200,400],
                     
                   'Column 2':[20,30,23,45,19,38,
                               25,45,34,37,12]})
  
# view data
display(df)

Output:

The dataset consists of two columns. Column 1 has a wide spread of values (from -200 to 400), whereas Column 2 lies in a much narrower range (12 to 45). So we apply the normalization techniques to Column 1.

Python3

df['Column 1'].plot(kind = 'bar')

Output:

Using The maximum absolute scaling:

The maximum absolute scaling rescales each feature between -1 and 1 by dividing every observation by its maximum absolute value. We can apply the maximum absolute scaling in Pandas using the .max() and .abs() methods, as shown below.

Python3

# copy the data
df_max_scaled = df.copy()
  
# apply normalization techniques on Column 1
column = 'Column 1'
df_max_scaled[column] = df_max_scaled[column] / df_max_scaled[column].abs().max()
  
# view normalized data
display(df_max_scaled)

Output:
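
The same idea can be applied to every numeric column at once. The following is a minimal sketch, assuming the DataFrame from this article; the name `df_all_scaled` is introduced here for illustration. Dividing a DataFrame by the Series returned by `df.abs().max()` aligns on column labels, so each column is divided by its own maximum absolute value.

```python
import pandas as pd

df = pd.DataFrame({
    'Column 1': [200, -4, 90, 13.9, 5, -90, 20, 300.7, 30, -200, 400],
    'Column 2': [20, 30, 23, 45, 19, 38, 25, 45, 34, 37, 12],
})

# df.abs().max() holds one maximum absolute value per column;
# the division is aligned on column labels
df_all_scaled = df / df.abs().max()

# after scaling, the largest absolute value in each column is 1.0
print(df_all_scaled.abs().max())
```

Note that this approach would produce NaN for a column that contains only zeros, so such columns may need separate handling.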

Using The min-max feature scaling:

The min-max approach (often called normalization) rescales the feature to a fixed range of [0, 1] by subtracting the minimum value of the feature and then dividing by the range (the maximum minus the minimum). We can apply the min-max scaling in Pandas using the .min() and .max() methods.

Python3

# copy the data
df_min_max_scaled = df.copy()
  
# apply normalization techniques to Column 1
column = 'Column 1'
col_min = df_min_max_scaled[column].min()
col_max = df_min_max_scaled[column].max()
df_min_max_scaled[column] = (df_min_max_scaled[column] - col_min) / (col_max - col_min)
  
# view normalized data
display(df_min_max_scaled)

Output:

Let’s check with this plot.

Python3

df_min_max_scaled['Column 1'].plot(kind = 'bar')
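
One edge case worth keeping in mind is a constant column, where the range is zero and the division yields NaN. The sketch below wraps min-max scaling in a hypothetical helper, `min_max_scale` (not part of Pandas). Mapping a constant column to 0.0 is an assumption made here; other conventions are equally reasonable.

```python
import pandas as pd

def min_max_scale(series: pd.Series) -> pd.Series:
    """Rescale a Series to [0, 1] (hypothetical helper, not a Pandas API)."""
    col_min = series.min()
    col_range = series.max() - col_min
    if col_range == 0:
        # Assumption: a constant column is mapped to 0.0 instead of NaN
        return pd.Series(0.0, index=series.index, name=series.name)
    return (series - col_min) / col_range

values = pd.Series([200, -4, 90, 13.9, 5, -90, 20, 300.7, 30, -200, 400])
scaled = min_max_scale(values)
print(scaled.min(), scaled.max())  # 0.0 1.0

constant = min_max_scale(pd.Series([5, 5, 5]))
print(constant.tolist())  # [0.0, 0.0, 0.0]
```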

Using The z-score method:

The z-score method (often called standardization) transforms the data into a distribution with a mean of 0 and a standard deviation of 1. Each standardized value is computed by subtracting the mean of the corresponding feature and then dividing by the standard deviation.

Python3

# copy the data
df_z_scaled = df.copy()
  
# apply normalization technique to Column 1
column = 'Column 1'
df_z_scaled[column] = (df_z_scaled[column] - df_z_scaled[column].mean()) / df_z_scaled[column].std()    
  
# view normalized data  
display(df_z_scaled)

Output:

Let’s check with this plot.

Python3

df_z_scaled['Column 1'].plot(kind = 'bar')
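
A detail that is easy to miss: `Series.std()` in Pandas computes the sample standard deviation (`ddof=1`) by default, whereas scikit-learn's `StandardScaler` uses the population standard deviation (`ddof=0`). The following sketch, using the Column 1 values from this article, compares the two conventions. The variable names are introduced here for illustration.

```python
import pandas as pd

values = pd.Series([200, -4, 90, 13.9, 5, -90, 20, 300.7, 30, -200, 400])

# Pandas default: sample standard deviation (ddof=1)
z_sample = (values - values.mean()) / values.std()

# Population standard deviation (ddof=0)
z_population = (values - values.mean()) / values.std(ddof=0)

# Each result has unit standard deviation under its own convention
print(z_sample.std(), z_population.std(ddof=0))

# The two results differ by a constant factor that shrinks as the sample grows
print((z_population / z_sample).iloc[0])
```

For a column with many rows the difference is negligible, but on a small dataset such as this one the two sets of z-scores are visibly different.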

Using sklearn:

scikit-learn provides estimators that transform features by scaling each feature to a given range. Such an estimator scales and translates each feature individually so that it lies in the given range on the training set, e.g. between zero and one. Here, we will use MinMaxScaler.

Python3

from sklearn.preprocessing import MinMaxScaler
  
# copy the data
df_sklearn = df.copy()
  
# apply normalization techniques
column = 'Column 1'
# MinMaxScaler expects 2-D input, so select the column as a one-column DataFrame
df_sklearn[[column]] = MinMaxScaler().fit_transform(df_sklearn[[column]])
  
# view normalized data  
display(df_sklearn)

Output:

Let’s check with this plot:

Python3

df_sklearn['Column 1'].plot(kind = 'bar')
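
Because the scaler learns its minimum and maximum from the data it is fitted on, a common pattern is to fit it on the training data only and reuse it for new data. The sketch below assumes a hypothetical split of the Column 1 values into `train` and `test`; the split itself is not part of this article.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical split of the Column 1 values
train = pd.DataFrame({'Column 1': [200, -4, 90, 13.9, 5, -90, 20, 300.7]})
test = pd.DataFrame({'Column 1': [30, -200, 400]})

scaler = MinMaxScaler()

# Learn min and max from the training data, then scale it
train_scaled = scaler.fit_transform(train[['Column 1']])

# Reuse the same min and max for the new data
test_scaled = scaler.transform(test[['Column 1']])

print(train_scaled.min(), train_scaled.max())
print(test_scaled.ravel())
```

The scaled training values lie in [0, 1], but the test values can fall outside that range when they are smaller or larger than anything seen during fitting, as -200 and 400 are here.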
