Label Encoding in Python

28 July 2024

2

In machine learning projects, we usually deal with datasets having different categorical columns where some columns have their elements in the ordinal variable category for e.g a column income level having elements as low, medium, or high in this case we can replace these elements with 1,2,3. where 1 represents ‘low’ 2 ‘medium’ and 3′ high’. Through this type of encoding, we try to preserve the meaning of the element where higher weights are assigned to the elements having higher priority.

Label Encoding

Label Encoding is a technique that is used to convert categorical columns into numerical ones so that they can be fitted by machine learning models which only take numerical data. It is an important pre-processing step in a machine-learning project.

Example Of Label Encoding

Suppose we have a column Height in some dataset that has elements as Tall, Medium, and short. To convert this categorical column into a numerical column we will apply label encoding to this column. After applying label encoding, the Height column is converted into a numerical column having elements 0,1, and 2 where 0 is the label for tall, 1 is the label for medium, and 2 is the label for short height.

Height	Height
Tall	0
Medium	1
Short	2

Example of Label Encoding

We will apply Label Encoding on the iris dataset on the target column which is Species. It contains three species Iris-setosa, Iris-versicolor, Iris-virginica.

Python3

# Import libraries 
import numpy as np
import pandas as pd
  
# Import dataset
df = pd.read_csv('../../data/Iris.csv')
  
df['species'].unique()

Output:

array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)

After applying Label Encoding with LabelEncoder() our categorical value will replace with the numerical value[int].

Python3

# Import label encoder
from sklearn import preprocessing
  
# label_encoder object knows 
# how to understand word labels.
label_encoder = preprocessing.LabelEncoder()
  
# Encode labels in column 'species'.
df['species']= label_encoder.fit_transform(df['species'])
  
df['species'].unique()

Output:

array([0, 1, 2], dtype=int64)

Limitation of label Encoding

Label encoding converts the categorical data into numerical ones, but it assigns a unique number(starting from 0) to each class of data. This may lead to the generation of priority issues during model training of data sets. A label with a high value may be considered to have high priority than a label having a lower value.

Example For Limitation of Label Encoding

An attribute having output classes Mexico, Paris, Dubai. On Label Encoding, this column lets Mexico is replaced with 0, Paris is replaced with 1, and Dubai is replaced with 2.

With this, it can be interpreted that Dubai has high priority than Mexico and Paris while training the model, But actually, there is no such priority relation between these cities here.

Label Encoding in Python

Label Encoding

Example Of Label Encoding

Example of Label Encoding

Python3

Python3

Limitation of label Encoding

Example For Limitation of Label Encoding

Java Program for Longest Common Subsequence

Maximum height of Tree when any Node can be considered as Root

Print Fibonacci sequence using 2 variables

LEAVE A REPLY Cancel reply

Most Popular

One UI 7: Everything you need to know

Review: The Ulefone Armor Mini 20T Pro makes other rugged phones seem flimsy

Best midrange Android phones in 2024

I tried a Xiaomi mid-ranger for the first time in years, and I’m glad the Pixel 8a exists in the US

Recent Comments

EDITOR PICKS

One UI 7: Everything you need to know

Review: The Ulefone Armor Mini 20T Pro makes other rugged phones seem flimsy

Best midrange Android phones in 2024

POPULAR POSTS

One UI 7: Everything you need to know

Review: The Ulefone Armor Mini 20T Pro makes other rugged phones seem flimsy

Best midrange Android phones in 2024

POPULAR CATEGORY

ABOUT US

FOLLOW US