Monday, September 29, 2025
HomeLanguagesML | Chi-square Test for feature selection

ML | Chi-square Test for feature selection

Feature selection is also known as attribute selection is a process of extracting the most relevant features from the dataset and then applying machine learning algorithms for the better performance of the model. A large number of irrelevant features increases the training time exponentially and increase the risk of overfitting.

Chi-square Test for Feature Extraction:
Chi-square test is used for categorical features in a dataset. We calculate Chi-square between each feature and the target and select the desired number of features with best Chi-square scores. It determines if the association between two categorical variables of the sample would reflect their real association in the population.
Chi- square score is given by :

where –

Observed frequency = No. of observations of class
Expected frequency = No. of expected observations of class if there was no relationship between the feature and the target.

Python Implementation of Chi-Square feature selection:




# Load libraries
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
  
# Load iris data
iris_dataset = load_iris()
  
# Create features and target
X = iris_dataset.data
y = iris_dataset.target
  
# Convert to categorical data by converting data to integers
X = X.astype(int)
  
# Two features with highest chi-squared statistics are selected
chi2_features = SelectKBest(chi2, k = 2)
X_kbest_features = chi2_features.fit_transform(X, y)
  
# Reduced features
print('Original feature number:', X.shape[1])
print('Reduced feature number:', X_kbest.shape[1])


Output:

Original feature number: 4
Reduced feature number : 2
RELATED ARTICLES

Most Popular

Dominic
32324 POSTS0 COMMENTS
Milvus
84 POSTS0 COMMENTS
Nango Kala
6695 POSTS0 COMMENTS
Nicole Veronica
11860 POSTS0 COMMENTS
Nokonwaba Nkukhwana
11918 POSTS0 COMMENTS
Shaida Kate Naidoo
6807 POSTS0 COMMENTS
Ted Musemwa
7073 POSTS0 COMMENTS
Thapelo Manthata
6763 POSTS0 COMMENTS
Umr Jansen
6771 POSTS0 COMMENTS