
ML – Nearest Centroid Classifier

The Nearest Centroid (NC) Classifier is one of the most underrated and underutilised classifiers in Machine Learning, yet it is quite powerful and highly efficient for certain classification tasks. It is somewhat similar to the K-Nearest Neighbours classifier; to know more about the K-Nearest Neighbours (KNN) classifier, you can refer to the link below: 
K-Nearest-Neighbours/ 
An often-overlooked principle in Machine Learning is to build simple algorithms on simple yet meaningful data that perform specific tasks efficiently, instead of reaching for complex models; in statistics, this is known as the principle of sufficiency. The Nearest Centroid classifier is arguably the simplest classification algorithm in Machine Learning, and it works on a simple principle: given a data point (observation), it assigns that point the label (class) whose centroid, i.e. the mean of the class's training samples, is closest to it. 
When applied to text classification, the Nearest Centroid classifier is also called the Rocchio classifier. The scikit-learn library in Python offers a simple class, NearestCentroid, to implement it. 
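As a quick illustration of this Rocchio-style use, the sketch below feeds TF-IDF vectors into scikit-learn's NearestCentroid. The tiny corpus and its labels are made up purely for demonstration.

# A minimal Rocchio-style sketch: TF-IDF features + NearestCentroid.
# The corpus and labels below are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestCentroid

docs = ["the match ended in a draw",
        "the striker scored twice",
        "parliament passed the budget",
        "the senate debated the bill"]
labels = ["sports", "sports", "politics", "politics"]

vectorizer = TfidfVectorizer()
X_text = vectorizer.fit_transform(docs)

rocchio = NearestCentroid()
rocchio.fit(X_text, labels)

# A new document gets the class with the nearest TF-IDF centroid
print(rocchio.predict(vectorizer.transform(["the goalkeeper saved the match"])))
# Likely output: ['sports']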
How does the Nearest Centroid classifier work? 
Essentially, the Nearest Centroid classifier works in three steps:

  • During training, the centroid of each target class is computed.
  • After training, given any new point, say ‘X’, the distances between X and each class’s centroid are calculated.
  • The minimum of these distances is picked, and X is assigned the class of the centroid it is closest to.

The Nearest Centroid classifier is quite easy to understand and is one of the simplest classification algorithms. 
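To make the three steps concrete, here is a minimal from-scratch sketch using NumPy. The helper names fit_centroids and nearest_centroid_predict are our own, chosen for illustration.

import numpy as np

def fit_centroids(X, y):
    # Step 1: compute the centroid (feature-wise mean) of each class
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(X, classes, centroids):
    # Step 2: Euclidean distance from every point to every centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    # Step 3: assign each point the class of its closest centroid
    return classes[np.argmin(dists, axis=1)]

# Tiny made-up example: two well-separated 2-D classes
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y = np.array([0, 0, 1, 1])
classes, centroids = fit_centroids(X, y)
print(nearest_centroid_predict(np.array([[1.0, 1.0]]), classes, centroids))  # [0]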
Implementation of Nearest Centroid Classifier in Python: 
For this example, we will use the popular ‘iris’ dataset available in the scikit-learn library. After training the classifier, we will print its accuracy on the training and test sets, followed by the classification report. 
Code: Python implementation of the NearestCentroid classifier 
# Importing the required libraries
from sklearn.neighbors import NearestCentroid
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
import pandas as pd
 
# Loading the dataset
dataset = load_iris()
 
# Separating data and target labels
X = pd.DataFrame(dataset.data)
y = pd.DataFrame(dataset.target)
 
# Splitting training and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=0)
 
# Creating the Nearest Centroid Classifier
model = NearestCentroid()
 
# Training the classifier
model.fit(X_train, y_train.values.ravel())
 
# Printing Accuracy on Training and Test sets
print(f"Training Set Score : {model.score(X_train, y_train) * 100} %")
print(f"Test Set Score : {model.score(X_test, y_test) * 100} %")
 
# Printing the classification report of the classifier on the test set data
print(f"Model Classification Report : \n{classification_report(y_test, model.predict(X_test))}")


Output: 
 

Training Set Score : 94.16666666666667 %
Test Set Score : 90.0 %
Model Classification Report : 
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        11
           1       0.86      0.92      0.89        13
           2       0.80      0.67      0.73         6

    accuracy                           0.90        30
   macro avg       0.89      0.86      0.87        30
weighted avg       0.90      0.90      0.90        30

So, we have achieved an accuracy of 94.17% on the training set and 90% on the test set. 
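Once fitted, the scikit-learn model exposes the learned class labels and centroids, and new points can be classified directly. The sample measurements below are made up for illustration.

# Inspecting the fitted model and classifying a new observation
print(model.classes_)      # the class labels, here array([0, 1, 2])
print(model.centroids_)    # one centroid per class, shape (n_classes, n_features)

# A made-up iris measurement: sepal length/width, petal length/width (cm)
sample = [[5.0, 3.5, 1.5, 0.2]]
print(model.predict(sample))   # expected: class 0 (setosa), whose centroid is nearest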
Conclusion: 
Now that you know what a Nearest Centroid classifier is and how to implement it, consider using it the next time you have a simple classification task that calls for a lightweight classifier. 
 
