
Consensus Clustering

30 July 2024


Clustering:

Before learning Consensus Clustering, we must know what Clustering is. In Machine Learning, Clustering is a technique for grouping objects into clusters according to their similarity: similar objects end up in the same cluster, and dissimilar objects end up in different clusters. It is an unsupervised learning method. A few frequently used Clustering algorithms are K-means, K-prototypes and DBSCAN.
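As a rough illustration of grouping by similarity, here is a minimal K-means sketch in plain Python. The function name `kmeans`, its parameters and the sample points are made up for this example; a real project would normally use a library implementation instead.

```python
import random

def kmeans(points, k, seed=0, n_iter=20):
    """Tiny K-means (Lloyd's algorithm) on points given as tuples of floats."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

# Two well-separated groups of three points each (made-up data).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0), (10.0, 1.0)]
labels = kmeans(points, k=2)
print(labels)
```

The first three points receive one label and the last three receive the other. Which group is called 0 and which is called 1 depends on the random initial centers, so only the grouping is meaningful, not the label values.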


Consensus Clustering:

The normal clustering process has a few drawbacks. Algorithms like K-means or K-prototypes start from a random initialization, so different runs of the same algorithm can produce different clusters. The value of K must also be fixed in advance, and it is generally chosen by the Elbow method. The result therefore depends heavily on these choices, and the clusters it produces can be biased and unstable. To reduce these drawbacks, we follow a different clustering approach, which is Consensus Clustering.
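The dependence on initialization can be reproduced on a tiny example. The sketch below is an assumption-laden illustration, not a library API: `sq_dist` and `lloyd` are hypothetical helpers, and the initial centers are passed in explicitly (instead of being drawn at random) so that the effect is repeatable. The four points sit on the corners of a 4 x 1 rectangle.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def lloyd(points, centers, n_iter=10):
    """K-means iterations starting from the given initial centers."""
    centers = list(centers)
    labels = []
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    # Within-cluster sum of squared distances (lower is better).
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, inertia

points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start with one center on each short side: finds the left/right split.
good = lloyd(points, [(0.0, 0.0), (4.0, 0.0)])
# Start with both centers on the same short side: stuck in a top/bottom split.
bad = lloyd(points, [(0.0, 0.0), (0.0, 1.0)])

print(good)  # ([0, 0, 1, 1], 1.0)
print(bad)   # ([0, 1, 0, 1], 16.0)
```

Both runs converge, but to different partitions with very different within-cluster sums of squares. This sum is also the quantity that the Elbow method plots against K, so an unlucky initialization can distort the choice of K as well.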

The word ‘Consensus’ comes from Latin and means ‘General agreement’. Consensus Clustering is a technique for combining multiple clusterings (partitions) of the same data into a single, more stable clustering that is typically better than the individual inputs. The input partitions are merged iteratively, and a Consensus Matrix, which records how often each pair of objects is placed in the same cluster, is generated at each level.
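A minimal sketch of a consensus matrix, assuming every input partition labels every object (implementations that cluster subsamples divide by the number of runs in which both objects were sampled instead). The helper names `consensus_matrix` and `consensus_labels`, the 0.5 threshold and the four example partitions are illustrative assumptions. Extracting the final clusters by linking pairs above a threshold is only one simple choice among several.

```python
def consensus_matrix(partitions):
    """Entry [i][j] is the fraction of partitions that put items i and j together."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]

def consensus_labels(matrix, threshold=0.5):
    """Link pairs whose consensus exceeds the threshold; return connected components."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Four clusterings of the same four items (made-up label vectors).
partitions = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same grouping as the first run, label names swapped
    [0, 0, 0, 1],
    [0, 1, 1, 1],
]
matrix = consensus_matrix(partitions)
final = consensus_labels(matrix)
print(matrix[0][1], matrix[1][2], matrix[0][3])  # 0.75 0.5 0.0
print(final)                                     # [0, 0, 1, 1]
```

Because the matrix only asks whether two items share a cluster, the swapped label names in the second run do not matter.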

Advantages of Consensus Clustering:

Better quality and robustness of the clusters.
Helping to identify an appropriate number of clusters.
Better handling of missing data.
Individual partitions can be obtained independently.


Process of Consensus Clustering:

Consensus Clustering proceeds in two phases:

Partition Generation: In this stage, different partitions of the data objects are created, for example by using different subsets of the data attributes, applying different clustering algorithms with different biases, using different parameter values, or clustering different random subsamples of the whole dataset. Once the initial partitions have been generated, we build a consensus among them and can then generate new partitions based on the previous consensus.
Consensus Generation: The consensus among the data partitions is computed by a Consensus Function, which generally follows one of these approaches:
- Median Partitioning based approach: Here we search for the median partition, i.e. the partition that has the highest total similarity to the input partitions. The similarity between two partitions depends on how far they agree and disagree on the grouping of the data points, and it is measured by indices such as the F-measure or the Rand index.
- Co-occurrence based approach: In this approach, there are 3 methods we can use: 1. Relabeling/Voting based method, 2. Co-association matrix-based method, 3. Graph-based method. The Relabeling/Voting based method first relabels each partition so that its cluster labels correspond to those of the current consensus. Each instance then receives a vote from each of its cluster assignments, and the consensus and the cluster assignments are updated accordingly. The Co-association matrix-based method builds a co-association matrix that records how often each pair of data points falls in the same cluster, and generates the new clusters by treating these values as similarities between data points. The Graph-based method builds a weighted graph to represent the multiple clusterings and finds the optimal partition by minimizing the graph cut.
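The Rand index and the Relabeling/Voting idea above can be sketched together in a few lines. This is a simplified, single-pass version under stated assumptions: the first partition serves as the reference instead of an iteratively updated consensus, labels are integers in `range(k)`, and the best relabeling is found by brute force over all permutations, which is only practical for small k. The names `rand_index`, `relabel` and `vote` are hypothetical.

```python
from itertools import permutations

def rand_index(a, b):
    """Fraction of item pairs that both partitions treat the same way."""
    n = len(a)
    agree = 0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            pairs += 1
            # Agreement: together in both partitions, or apart in both.
            if (a[i] == a[j]) == (b[i] == b[j]):
                agree += 1
    return agree / pairs

def relabel(labels, reference, k):
    """Rename cluster ids in `labels` so they match `reference` as often as possible."""
    best = max(
        permutations(range(k)),
        key=lambda perm: sum(perm[l] == r for l, r in zip(labels, reference)),
    )
    return [best[l] for l in labels]

def vote(partitions, k):
    """Align every partition to the first one, then take a per-item majority vote."""
    reference = partitions[0]
    aligned = [relabel(p, reference, k) for p in partitions]
    consensus = []
    for votes in zip(*aligned):
        counts = [votes.count(c) for c in range(k)]
        consensus.append(counts.index(max(counts)))  # ties go to the smallest id
    return consensus

# Three clusterings of six items (made-up label vectors).
p1 = [0, 0, 0, 1, 1, 1]
p2 = [1, 1, 1, 0, 0, 0]  # same grouping as p1 with the label names swapped
p3 = [0, 0, 1, 1, 1, 1]  # disagrees with p1 about the third item

consensus = vote([p1, p2, p3], k=2)
print(consensus)                  # [0, 0, 0, 1, 1, 1]
print(rand_index(consensus, p2))  # 1.0
print(rand_index(consensus, p3))
```

The Rand index ignores label names, so the consensus agrees perfectly with `p2`. Against `p3` it agrees on 10 of the 15 item pairs, because the five pairs involving the third item are treated differently.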


There are many different Consensus Clustering algorithms, based on different ways of constructing the consensus function, and research on improving the existing models is still ongoing.
