How to Convert Categorical Variable to Numeric in Pandas?

28 July 2024


In this article, we will learn how to convert a categorical variable into a numeric one using pandas.

When we look at categorical data, the first question that arises is how to handle it, because most machine learning algorithms expect numeric input and cannot work directly on text labels. So, to build predictive models, we have to convert categorical data into numeric form.

Method 1: Using replace() method

Replacing values is one way to convert categorical terms into numeric ones. For example, we will take a dataset of people’s salaries based on their level of education. Education level is an ordinal categorical variable, meaning its values have a natural order. We will convert the education levels into numbers.

Syntax:

replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')

Consider a dataset stored in data.csv that has an Education column:

Python3

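# import pandas
import pandas as pd

# read csv file
df = pd.read_csv('data.csv')

# replace each education level with a number and assign the result back;
# calling replace(..., inplace=True) on df['Education'] may not update df
df['Education'] = df['Education'].replace(['Under-Graduate', 'Diploma'],
                                          [0, 1])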

In the above program, “Under-Graduate” is replaced with 0 and “Diploma” with 1.
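A dictionary passed to map() is another way to express the same ordinal encoding. The following is a minimal sketch, not the article's original code: the sample rows, the "Graduate" level, and the Education_code column name are assumptions made up for illustration, since the CSV file itself is not shown.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV file used in the article.
df = pd.DataFrame({
    "Education": ["Under-Graduate", "Diploma", "Graduate", "Diploma"],
    "Salary": [15000, 20000, 30000, 22000],
})

# Assumed ordering of the education levels, lowest to highest.
education_order = {"Under-Graduate": 0, "Diploma": 1, "Graduate": 2}

# map() returns a new Series; any label missing from the dictionary becomes NaN,
# which makes unmapped categories easy to spot.
df["Education_code"] = df["Education"].map(education_order)

print(df)
```

Keeping the codes in a separate column leaves the original labels available for checking the result.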

Method 2: Using get_dummies()

Replacing values one by one is not the most convenient way to convert them, and the resulting numbers imply an order between the categories. Pandas provides a function called get_dummies() that returns dummy variable columns, one per category.

Syntax: pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

Stepwise Implementation

Step 1: Create dummy columns

The get_dummies() method is called with the column as its argument and returns the dummy variable columns. In this case, the column holds three categories, so three columns are returned.
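As a rough illustration of this step, the sketch below builds a small data frame in memory and creates the dummy columns. The sample rows and the "Graduate" category are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data with three education levels.
df = pd.DataFrame({
    "Education": ["Under-Graduate", "Diploma", "Graduate", "Diploma"],
})

# One indicator column per category, in sorted order of the labels.
# dtype=int asks for 0/1 values; recent pandas versions return booleans by default.
dummies = pd.get_dummies(df["Education"], dtype=int)

print(dummies)
```

Each row has a 1 in exactly one of the dummy columns, the one matching its original category.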

Step 2: Concatenate

Syntax: pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)

The next step is to concatenate the dummy columns to the data frame. Pandas has a concat() function that joins data frames. Pass it a list of the two data frames and the axis, and it returns the combined data frame.
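The concatenation can be sketched on a small in-memory example. The sample rows and the "Graduate" category below are assumptions for illustration, not the article's data.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Education": ["Under-Graduate", "Diploma", "Graduate", "Diploma"],
    "Salary": [15000, 20000, 30000, 22000],
})
dummies = pd.get_dummies(df["Education"], dtype=int)

# axis="columns" places the dummy columns next to the original ones;
# rows are aligned on the shared index.
merged = pd.concat([df, dummies], axis="columns")

print(merged)
```

Because both frames share the same index, the row count stays the same and only the number of columns grows.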

Step 3: Drop columns

We have to drop the original ‘Education’ column because the dummy variable columns now carry the same information and the text column is no longer needed. We should also drop one of the dummy variable columns to avoid the dummy variable trap: the dummy columns always sum to 1, so any one of them can be derived from the others, and this redundancy can distort some models. After dropping the columns, the desired dataframe is obtained.

We will implement this in code:
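The effect of dropping the columns can be seen in a small sketch. The sample rows and the "Graduate" category are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data, already combined with its dummy columns.
df = pd.DataFrame({
    "Education": ["Under-Graduate", "Diploma", "Graduate", "Diploma"],
    "Salary": [15000, 20000, 30000, 22000],
})
merged = pd.concat([df, pd.get_dummies(df["Education"], dtype=int)], axis="columns")

# drop() returns a new data frame, so the result must be assigned.
# "Under-Graduate" becomes the baseline: a row with 0 in both remaining
# dummy columns is an under-graduate.
final = merged.drop(columns=["Education", "Under-Graduate"])

print(final)
```

No information is lost by removing the one dummy column, because its value can be recovered from the remaining two.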

Python3

#import pandas
import pandas as pd
 
# read csv
df = pd.read_csv('salary.csv')
 
# get the dummies and store it in a variable
dummies = pd.get_dummies(df.Education)
 
# Concatenate the dummies to original dataframe
merged = pd.concat([df, dummies], axis='columns')
 
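# drop the original column and one dummy column; drop() returns a new
# dataframe, so assign the result back
merged = merged.drop(['Education', 'Under-Graduate'], axis='columns')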
 
# print the dataframe
print(merged)
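The three steps can also be collapsed into a single get_dummies() call by using its columns and drop_first parameters, which appear in the syntax above. This is an alternative sketch rather than the article's approach, and the sample rows and the "Graduate" category are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Education": ["Under-Graduate", "Diploma", "Graduate", "Diploma"],
    "Salary": [15000, 20000, 30000, 22000],
})

# columns=[...] encodes the listed column in place and removes the original;
# drop_first=True drops the first category in sorted order to avoid the
# dummy variable trap.
encoded = pd.get_dummies(df, columns=["Education"], drop_first=True, dtype=int)

print(encoded)
```

Note that drop_first removes the first category in sorted order ("Diploma" in this sample), so the baseline category differs from the manual version, which drops "Under-Graduate".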
