Most machine learning models cannot work directly with categorical features. Such features usually hold string values, while the models expect numbers, so we have to convert the string values to numbers first. This can be done by mapping each category to an integer, or by creating new features based on the categories and setting their values accordingly. In this article, we are going to see how to convert categorical features to numerical features in Python.
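Creating new features based on the categories is commonly called one-hot encoding. Here is a minimal sketch using pandas get_dummies() on a small DataFrame. The data is hypothetical and is not taken from the CSV file used later.

```python
import pandas as pd

# Hypothetical toy data; the category names mirror the "origin" values used later.
df = pd.DataFrame({"origin": ["usa", "japan", "europe", "usa"]})

# One-hot encoding: one new column per category, holding 1 where the row
# has that category and 0 elsewhere. dtype=int gives 0/1 integers instead
# of booleans (recent pandas versions default to bool).
dummies = pd.get_dummies(df["origin"], prefix="origin", dtype=int)
print(dummies)
```

Each row has exactly one 1, in the column of its category. The label encoding shown in the steps below instead keeps a single column and stores one integer per category.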
Stepwise Implementation
Step 1: Import the necessary packages and modules
Python3
# import packages and modules
import numpy as np
import pandas as pd
from sklearn import preprocessing
Step 2: Import the CSV file
We will use the pandas read_csv() function to load the CSV file into a DataFrame, and then print its first five rows with head().
Python3
# import the CSV file
df = pd.read_csv('cluster_mpg.csv')
print(df.head())
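If the CSV file is not at hand, you can try the same read_csv() call on an in-memory string. Here is a minimal sketch. The rows, values and car names below are hypothetical and were chosen only to resemble the columns discussed in this article.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for cluster_mpg.csv (only a few columns).
csv_text = """mpg,cylinders,origin,name
18.0,8,usa,car_a
24.0,4,japan,car_b
26.0,4,europe,car_c
"""

# read_csv() accepts any file-like object, so StringIO lets this run without a file.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```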
Step 3: Get all features with categorical values
We use df.info() to find the categorical features. Columns that hold strings are listed with the Dtype “object”.
Python3
df.info()
In this dataset, the columns “origin” and “name” are of object type.
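You can also list the object-typed columns programmatically. Here is a sketch using DataFrame.select_dtypes() on a small hypothetical DataFrame. The string columns are created with an explicit object dtype to match the “object” Dtype that df.info() reports in this article. Newer pandas releases may infer a dedicated string dtype for such columns instead.

```python
import pandas as pd

# Hypothetical sample; string columns get an explicit object dtype.
df = pd.DataFrame({
    "mpg": [18.0, 24.0, 26.0],
    "cylinders": [8, 4, 4],
    "origin": pd.Series(["usa", "japan", "europe"], dtype=object),
    "name": pd.Series(["car_a", "car_b", "car_c"], dtype=object),
})

# select_dtypes() keeps only the columns whose dtype matches "include".
categorical_cols = df.select_dtypes(include="object").columns.tolist()
print(categorical_cols)  # ['origin', 'name']
```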
Step 4: Convert the string values of the origin column to numerical values
We will create a LabelEncoder from sklearn.preprocessing and fit it on the “origin” column with its fit() method. Fitting learns the distinct labels that appear in the column.
Python3
# create the encoder and learn the labels of the "origin" column
label_encoder = preprocessing.LabelEncoder()
label_encoder.fit(df["origin"])
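Here is a short sketch of what fit() does, using hypothetical values in place of the real column. fit() learns the distinct labels and returns the encoder itself. fit_transform() combines fitting and transforming in a single call.

```python
import pandas as pd
from sklearn import preprocessing

origin = pd.Series(["usa", "japan", "europe", "usa"])  # hypothetical values

# fit() learns the distinct labels and returns the same encoder object.
label_encoder = preprocessing.LabelEncoder()
fitted = label_encoder.fit(origin)
print(fitted is label_encoder)  # True

# fit_transform() fits a fresh encoder and encodes the values in one step.
codes = preprocessing.LabelEncoder().fit_transform(origin)
print(codes)  # [2 1 0 2]
```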
Step 5: Get the unique values of the categorical feature
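We will use the label_encoder.classes_ attribute for this purpose. The scikit-learn documentation describes it as follows:
classes_ : ndarray of shape (n_classes,)
Holds the label for each class.
The labels are stored in sorted order.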
Python3
# finding the unique classes
print(list(label_encoder.classes_))
print()
Output:
['europe', 'japan', 'usa']
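Because classes_ is sorted, each label's position in it is the integer the encoder assigns to that label. Here is a sketch that makes this mapping explicit. The data is hypothetical, and the codes follow the ['europe', 'japan', 'usa'] order shown above.

```python
import pandas as pd
from sklearn import preprocessing

origin = pd.Series(["usa", "japan", "europe", "usa"])  # hypothetical values
label_encoder = preprocessing.LabelEncoder().fit(origin)

# A label's index in the sorted classes_ array is its integer code.
mapping = {str(label): code for code, label in enumerate(label_encoder.classes_)}
print(mapping)  # {'europe': 0, 'japan': 1, 'usa': 2}
```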
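Step 6: Transform the categorical values
The transform() method replaces each label with its integer code, which is the label's index in classes_.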
Python3
# values after transforming the categorical column
print(label_encoder.transform(df["origin"]))
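To use the encoded values in a model, they are usually stored back in the DataFrame. The sketch below uses a hypothetical stand-in DataFrame and stores the codes in a new column (origin_code is an illustrative name). It then recovers the labels with inverse_transform() and shows that labels unseen during fitting raise a ValueError.

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical stand-in for the article's DataFrame.
df = pd.DataFrame({"origin": ["usa", "japan", "europe", "usa"]})

label_encoder = preprocessing.LabelEncoder()

# Store the integer codes in a new (illustrative) column next to the labels.
df["origin_code"] = label_encoder.fit_transform(df["origin"])
print(df)

# inverse_transform() maps integer codes back to the original labels.
restored = label_encoder.inverse_transform(df["origin_code"])
print(list(restored))

# Labels not seen during fitting cannot be encoded: transform() raises ValueError.
try:
    label_encoder.transform(["canada"])
except ValueError as err:
    print("Unseen label:", err)
```

The scikit-learn documentation states that LabelEncoder is meant for encoding target values (y). For input features, OrdinalEncoder and OneHotEncoder in sklearn.preprocessing fill the same role.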