Tuesday, October 7, 2025
HomeLanguagesAnalysis of test data using K-Means Clustering in Python

Analysis of test data using K-Means Clustering in Python

This article demonstrates an illustration of K-means clustering on a sample random data using open-cv library.

Pre-requisites: Numpy, OpenCV, matplot-lib
Let’s first visualize test data with Multiple Features using matplot-lib tool.




# importing required tools
import numpy as np
from matplotlib import pyplot as plt
  
# creating two test data
X = np.random.randint(10,35,(25,2))
Y = np.random.randint(55,70,(25,2))
Z = np.vstack((X,Y))
Z = Z.reshape((50,2))
  
# convert to np.float32
Z = np.float32(Z)
  
plt.xlabel('Test Data')
plt.ylabel('Z samples')
  
plt.hist(Z,256,[0,256])
  
plt.show()


Here ‘Z’ is an array of size 100, and values ranging from 0 to 255. Now, reshaped ‘z’ to a column vector. It will be more useful when more than one features are present. Then change the data to np.float32 type.

Output:

Now, apply the k-Means clustering algorithm to the same example as in the above test data and see its behavior.
Steps Involved:
1) First we need to set a test data.
2) Define criteria and apply kmeans().
3) Now separate the data.
4) Finally Plot the data.




import numpy as np
import cv2
from matplotlib import pyplot as plt
  
X = np.random.randint(10,45,(25,2))
Y = np.random.randint(55,70,(25,2))
Z = np.vstack((X,Y))
  
# convert to np.float32
Z = np.float32(Z)
  
# define criteria and apply kmeans()
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret,label,center = cv2.kmeans(Z,2,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)
  
# Now separate the data
A = Z[label.ravel()==0]
B = Z[label.ravel()==1]
  
# Plot the data
plt.scatter(A[:,0],A[:,1])
plt.scatter(B[:,0],B[:,1],c = 'r')
plt.scatter(center[:,0],center[:,1],s = 80,c = 'y', marker = 's')
plt.xlabel('Test Data'),plt.ylabel('Z samples')
plt.show()


Output:

This example is meant to illustrate where k-means will produce intuitively possible clusters.

Applications:
1) Identifying Cancerous Data.
2) Prediction of Students’ Academic Performance.
3) Drug Activity Prediction.

RELATED ARTICLES

Most Popular

Dominic
32340 POSTS0 COMMENTS
Milvus
86 POSTS0 COMMENTS
Nango Kala
6708 POSTS0 COMMENTS
Nicole Veronica
11872 POSTS0 COMMENTS
Nokonwaba Nkukhwana
11936 POSTS0 COMMENTS
Shaida Kate Naidoo
6829 POSTS0 COMMENTS
Ted Musemwa
7090 POSTS0 COMMENTS
Thapelo Manthata
6780 POSTS0 COMMENTS
Umr Jansen
6784 POSTS0 COMMENTS