In this article, we shall look at different approaches to evaluate clustering algorithms using the Scikit-Learn Python machine learning library. Clustering is an unsupervised machine learning technique that groups similar data points of a dataset together. Clustering is widely used for segmentation, pattern finding, search engines, and so on.
Let’s consider an example to perform Clustering on a dataset and look at different performance evaluation metrics to evaluate the model.
Python3
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# generate a toy dataset of 500 samples drawn from 5 blobs
feature, target = make_blobs(n_samples=500, centers=5,
                             random_state=42, shuffle=False)
plt.scatter(feature[:, 0], feature[:, 1])
Output:
On this dataset, we shall use the KMeans clustering algorithm, which is a centroid-based clustering algorithm.
Python3
from sklearn.cluster import KMeans

# note: we fit 4 clusters although the data was generated with 5 centers
model = KMeans(n_clusters=4)
model.fit(feature)

# plot the samples in red and every cluster center as a black star
plt.scatter(feature[:, 0], feature[:, 1], color="r")
plt.scatter(model.cluster_centers_[:, 0], model.cluster_centers_[:, 1],
            color="k", marker="*")
Output:
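The fitted model stores the cluster assigned to each sample in the model.labels_ attribute; the metrics below compare these labels against the known target values (or against the feature geometry itself). A minimal sketch of inspecting the fitted model:

Python3

# each sample is assigned an integer cluster id in the range [0, n_clusters)
print(model.labels_[:10])

# one 2-D center per cluster, shape (4, 2)
print(model.cluster_centers_.shape)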
Performance Evaluation Metrics
Once we build a model, we usually make some predictions. But how do we verify the results, and on what basis do we come to a conclusion? That is where evaluation metrics come into the picture. Evaluation metrics are a critical step in any machine learning implementation; they are used to measure how well the model performs on inference or testing data compared to the actual data.
Now let us see some common Clustering Performance Evaluations in Scikit Learn.
5 Commonly used Clustering Performance Evaluation Metrics
Adjusted Rand Index
The adjusted Rand index is an evaluation metric that measures the similarity between two clusterings. It considers all pairs of the n_samples, counts the pairs that are assigned to the same or to different clusters in both the actual and the predicted clustering, and then corrects this count for chance agreement.
The adjusted rand index score is defined as:
ARI = (RI - Expected_RI) / (max(RI) - Expected_RI)
Python3
from sklearn.metrics import adjusted_rand_score

ari = adjusted_rand_score(target, model.labels_)
print(ari)
Output:
0.7812362998684788
A score above 0.7 is considered to be a good match.
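Because the ARI is corrected for chance, a random labeling scores close to 0 regardless of the number of clusters, and the score is invariant to renaming the cluster labels. A minimal sketch illustrating both properties (the random labeling here is only for demonstration):

Python3

import numpy as np
from sklearn.metrics import adjusted_rand_score

# renaming the clusters does not change the ARI: a perfect match scores 1.0
print(adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0]))

# a random labeling scores near 0 because the ARI is chance-corrected
rng = np.random.default_rng(0)
print(adjusted_rand_score(target, rng.integers(0, 5, size=len(target))))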
Rand Index
The Rand index differs from the adjusted Rand index in that it is not corrected for chance. It also measures the similarity between two clusterings by considering all pairs of the n_samples, but it ranges from 0 to 1, whereas the ARI ranges from -1 to 1.
The rand index is defined as:
RI = (number of agreeing pairs) / (number of pairs)
Python3
from sklearn.metrics import rand_score

ris = rand_score(target, model.labels_)
print(ris)
Output:
0.9198396793587175
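To see the pair counting concretely, consider four samples: they form C(4, 2) = 6 pairs. In the small worked example below, only the pair formed by the last two samples is grouped together in the first labeling but split in the second, so 5 of the 6 pairs agree:

Python3

from sklearn.metrics import rand_score

# 5 of 6 sample pairs agree, so RI = 5 / 6 ≈ 0.8333
print(rand_score([0, 0, 1, 1], [0, 0, 1, 2]))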
Silhouette Score aka Silhouette Coefficient
The silhouette score, also known as the silhouette coefficient, is an evaluation metric whose value ranges from -1 to 1. A score near 1 is the best case: the data point is very compact within the cluster to which it belongs and far away from the other clusters. A score near -1 is the worst case: the data point is likely assigned to the wrong cluster. A score near 0 signifies overlapping clusters.
Python3
from sklearn.metrics import silhouette_score

ss = silhouette_score(feature, model.labels_)
print(ss)
Output:
0.7328381899726921
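For each sample, the silhouette value is computed as s = (b - a) / max(a, b), where a is the mean distance to the other points in the same cluster and b is the mean distance to the points in the nearest other cluster; silhouette_score is the mean of these per-sample values. The per-sample values can be inspected directly with silhouette_samples, as sketched below:

Python3

from sklearn.metrics import silhouette_samples

# one silhouette value per sample; their mean equals silhouette_score
sample_values = silhouette_samples(feature, model.labels_)
print(sample_values.mean())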
Davies-Bouldin Index
Davies-Bouldin Index score is defined as the average similarity measure of each cluster with its most similar cluster, where similarity is the ratio of within-cluster distances to between-cluster distances. Thus, clusters that are farther apart and less dispersed will result in a better score. The minimum score is 0, with lower values indicating better clustering.
Python3
from sklearn.metrics import davies_bouldin_score

dbs = davies_bouldin_score(feature, model.labels_)
print(dbs)
Output:
0.3389800864889033
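Since lower values are better, the Davies-Bouldin index is often used to compare candidate cluster counts. A minimal sketch (the range of k values is an arbitrary choice for illustration):

Python3

from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

# fit KMeans for several cluster counts and report the index for each
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(feature)
    print(k, davies_bouldin_score(feature, labels))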
Mutual Information
Mutual information is a measure of the similarity between two labelings of the same data. That is, it is used to check the mutual information between the actual labels (target) and the labels predicted by the model (model.labels_).
Python3
from sklearn.metrics import mutual_info_score

mis = mutual_info_score(target, model.labels_)
print(mis)
Output:
1.3321790402101235
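Note that raw mutual information is not bounded above by 1, which makes scores hard to compare across datasets. Scikit-Learn also provides normalized_mutual_info_score, which rescales the value to the range 0 to 1, and adjusted_mutual_info_score, which additionally corrects for chance; a short sketch:

Python3

from sklearn.metrics import normalized_mutual_info_score, adjusted_mutual_info_score

# NMI rescales MI to [0, 1]; AMI also corrects for chance agreement
print(normalized_mutual_info_score(target, model.labels_))
print(adjusted_mutual_info_score(target, model.labels_))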