Python – Removing Constant Features From the Dataset

26 July 2024

Features that hold the same value for every observation in the dataset are known as constant features. Because they never vary, they carry no information about the target feature and are redundant. Their presence has no effect on the target, so it is good practice to remove them from the dataset. Removing redundant features and keeping only the necessary ones falls under the filter methods of feature selection.
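To make the definition concrete, here is a minimal sketch that flags constant columns by counting distinct values with pandas. The toy DataFrame and its column names (`source`, `score`) are made up for illustration and are not part of the article's dataset.

```python
import pandas as pd

# Hypothetical toy data: "source" never changes, "score" does.
toy = pd.DataFrame({
    "source": ["web", "web", "web", "web"],
    "score": [3, 7, 7, 1],
})

# A column is constant when it has exactly one distinct value.
distinct_counts = toy.nunique()
constant_columns = distinct_counts[distinct_counts == 1].index.tolist()

# Drop the constant columns and keep the rest.
reduced = toy.drop(columns=constant_columns)

print(constant_columns)
print(reduced.columns.tolist())
```

This check works directly on text columns, so it can serve as a quick sanity test before any encoding step.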

Now let's see how to remove constant features in Python using scikit-learn's VarianceThreshold.

Consider the following self-created dataset:

Portal	Article’s_category	Views
Lazyroar	Python	545
Lazyroar	Data Science	1505
Lazyroar	Data Science	1157
Lazyroar	Data Science	2541
Lazyroar	Mathematics	5726
Lazyroar	Python	3125
Lazyroar	Data Science	3131
Lazyroar	Mathematics	6525
Lazyroar	Mathematics	15000

Code: Create DataFrame of the above data

# Import pandas to create DataFrame 
import pandas as pd 
  
# Make DataFrame of the given data 
data = pd.DataFrame({"Portal":['Lazyroar', 'Lazyroar', 'Lazyroar', 'Lazyroar', 'Lazyroar',  
                               'Lazyroar', 'Lazyroar', 'Lazyroar', 'Lazyroar'], 
                    "Article's_category":['Python', 'Data Science', 'Data Science', 'Data Science', 'Mathematics',  
                                          'Python', 'Data Science', 'Mathematics', 'Mathematics'], 
                    "Views":[545, 1505, 1157, 2541, 5726, 3125, 3131, 6525, 15000]}) 

Code: Convert the categorical data to numerical data

# import ordinal encoder from sklearn 
from sklearn.preprocessing import OrdinalEncoder 
ord_enc = OrdinalEncoder() 
  
# Transform the data 
data[["Portal","Article's_category"]] = ord_enc.fit_transform(data[["Portal","Article's_category"]]) 
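It can help to see which number each category was mapped to. The sketch below assumes the default behaviour of `OrdinalEncoder`, which sorts the categories of each column and numbers them from 0, and reads the mapping back from the fitted encoder's `categories_` attribute. The shortened sample column is for illustration only.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# A shortened sample of the category column, for illustration only.
sample = pd.DataFrame(
    {"Article's_category": ["Python", "Data Science", "Mathematics", "Data Science"]}
)

enc = OrdinalEncoder()
codes = enc.fit_transform(sample)

# categories_ holds one sorted array of labels per encoded column;
# the position of a label in that array is its numeric code.
mapping = {label: code for code, label in enumerate(enc.categories_[0])}

print(mapping)
print(codes.ravel().tolist())
```

The numeric codes impose an arbitrary order on the categories, but that does not matter here: a column with a single category still encodes to a single number, so its variance stays 0.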

Code: Fit the data to VarianceThreshold.

# import VarianceThreshold 
from sklearn.feature_selection import VarianceThreshold 
var_threshold = VarianceThreshold(threshold=0)   # threshold = 0 for constant 
  
# fit the data 
var_threshold.fit(data) 
  
# Inspect the variance computed for each feature 
print(var_threshold.variances_) 

Output: Variance of different features:

[0.00000000e+00 6.17283951e-01 1.76746269e+07]
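The printed values agree with the population variance (dividing by n, not n - 1) of each encoded column. As a cross-check, the sketch below recomputes them with NumPy; the arrays are typed in by hand from the encoded data, with the category codes taken from the transformed output shown later in the article.

```python
import numpy as np

# Encoded columns, copied by hand from the article's data.
portal = np.zeros(9)  # a single category, so every code is 0
category = np.array([2, 0, 0, 0, 1, 2, 0, 1, 1], dtype=float)
views = np.array([545, 1505, 1157, 2541, 5726, 3125, 3131, 6525, 15000], dtype=float)

# np.var uses ddof=0 by default, i.e. the population variance.
variances = np.array([np.var(col) for col in (portal, category, views)])

print(variances)
```

Only the first column has a variance of exactly 0, which is what `threshold=0` is meant to catch.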

Code: Transform the data

print(var_threshold.transform(data)) 
print('*' * 10,"Separator",'*' * 10) 
  
# Shape of the data before and after the transformation 
print("Earlier shape of data: ", data.shape) 
print("Shape after transformation: ", var_threshold.transform(data).shape) 

Output:

[[2.000e+00 5.450e+02]
 [0.000e+00 1.505e+03]
 [0.000e+00 1.157e+03]
 [0.000e+00 2.541e+03]
 [1.000e+00 5.726e+03]
 [2.000e+00 3.125e+03]
 [0.000e+00 3.131e+03]
 [1.000e+00 6.525e+03]
 [1.000e+00 1.500e+04]]
********** Separator **********
Earlier shape of data:  (9, 3)
Shape after transformation:  (9, 2)

As you can see, we started with 9 observations and 3 features.
After the transformation we have 9 observations and 2 features. The removed feature is 'Portal', the only column with a variance of 0.
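Because `transform` returns a plain array, the column names are lost. One possible way to recover them is the selector's `get_support()` method, which returns a boolean mask over the input columns. The sketch below rebuilds the article's data so that it runs on its own; the variable names `selector`, `kept`, `removed` and `reduced` are chosen here for illustration.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import OrdinalEncoder

# Rebuild the article's dataset.
data = pd.DataFrame({
    "Portal": ["Lazyroar"] * 9,
    "Article's_category": ["Python", "Data Science", "Data Science",
                           "Data Science", "Mathematics", "Python",
                           "Data Science", "Mathematics", "Mathematics"],
    "Views": [545, 1505, 1157, 2541, 5726, 3125, 3131, 6525, 15000],
})

# Encode the text columns as numbers, as in the article.
cat_cols = ["Portal", "Article's_category"]
data[cat_cols] = OrdinalEncoder().fit_transform(data[cat_cols])

selector = VarianceThreshold(threshold=0)
selector.fit(data)

# get_support() returns True for every column that is kept.
mask = selector.get_support()
kept = data.columns[mask].tolist()
removed = data.columns[~mask].tolist()

# Select the kept columns while preserving the DataFrame and its labels.
reduced = data.loc[:, mask]

print("Kept:", kept)
print("Removed:", removed)
```

Indexing the DataFrame with the mask, instead of calling `transform`, keeps the result as a labelled DataFrame.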
