This paper introduces a novel clustering algorithm utilizing a new similarity measurement called "cohesion" to enhance clustering performance while effectively managing outliers. The algorithm, referred to as CSM (Cohesion-Based Self-Merging), combines hierarchical and partitional clustering techniques to yield robust results with reduced execution time. Through comprehensive performance studies, the algorithm demonstrates its capacity to resist outliers and achieve clustering outcomes comparable to traditional methods like CURE. The findings underscore its potential in various applications requiring efficient data grouping.
A Robust and Efficient Clustering Algorithm Based on Cohesion Self-Merging • Advisor: Dr. Hsu • Graduate: Sheng-Hsuan Wang • Authors: Cheng-Ru Lin, Ming-Syan Chen
Outline • Motivation • Objective • Introduction • Preliminaries • Cohesion-Based Self-Merging Algorithm • Performance Studies • Conclusion • Personal Opinion
Motivation • The dissimilarity measured between two clusters is vulnerable to outliers, and removing outliers precisely is itself a difficult task.
Objective • We propose a new similarity measurement, referred to as “cohesion”, to measure the inter-cluster distances.
Introduction • Hierarchical clustering algorithms. • Good clustering quality. • Partitional clustering algorithms. • Low execution time and space requirements. • Hybrid clustering algorithms. • Combine the features of partitional and hierarchical clustering methods.
Preliminaries • Hierarchical Clustering Algorithms. • Hierarchical Clustering Algorithm. • Single-link and Complete-link. • Algorithm CURE.
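As a reminder of how the two classic linkage criteria differ, here is a minimal sketch using their textbook definitions (single-link takes the smallest pairwise distance between two clusters, complete-link the largest). The sample points and function names are illustrative assumptions and do not come from the slides.

```python
from itertools import product
from math import dist


def single_link(a, b):
    """Smallest Euclidean distance between any point of a and any point of b."""
    return min(dist(p, q) for p, q in product(a, b))


def complete_link(a, b):
    """Largest Euclidean distance between any point of a and any point of b."""
    return max(dist(p, q) for p, q in product(a, b))


# Hypothetical data: (10, 0) plays the role of an outlier in cluster b.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(3.0, 0.0), (10.0, 0.0)]

print(single_link(a, b))    # 2.0
print(complete_link(a, b))  # 10.0
```

A single stray point is enough to move the complete-link value from 3.0 to 10.0 in this toy case, which is the kind of sensitivity to outliers that the Motivation slide refers to.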
Preliminaries • Partitional Clustering Algorithms. • The K-means algorithm. • Algorithm CLARA and CLARANS.
Preliminaries • Hybrid Clustering Algorithms. • Phase 1: Partition. • Phase 2: Merge. • Algorithm BIRCH.
Cohesion-Based Self-Merging Algorithm • We propose a new similarity measurement, namely cohesion, based on the joinability of a data point to another cluster.
Cohesion-Based Self-Merging Algorithm • Definition 1: • Given a cluster Cl consisting of n data points, p1,p2,…,pn, the radius r of Cl is defined as
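The formula for Definition 1 is not present in this text. The sketch below assumes one common notion of cluster radius, the root-mean-square distance of the points to their centroid; this may differ from the paper's exact definition, and the helper names are hypothetical.

```python
from math import sqrt


def centroid(points):
    """Coordinate-wise mean of the points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def radius(points):
    """ASSUMED definition: RMS Euclidean distance of the points to the centroid."""
    c = centroid(points)
    squared = sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)
    return sqrt(squared / len(points))


# Hypothetical cluster: the four corners of a 2 x 2 square.
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(centroid(cluster))  # (1.0, 1.0)
print(radius(cluster))    # about 1.414, each corner is sqrt(2) from the centre
```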
Cohesion-Based Self-Merging Algorithm • Definition 2: • Given a data point p of a cluster Ci and another cluster Cj, the joinability of p to Cj is defined as
Cohesion-Based Self-Merging Algorithm • Definition 3: • The cohesion of two clusters Ci and Cj is defined as
Cohesion-Based Self-Merging Algorithm • Algorithm CSM • Input: • The input data set, n. • The number of subclusters, m. • The desired number of clusters, k. • Output: • The hierarchical structure of the k clusters.
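The inputs and output above suggest a two-phase shape: partition the data into m subclusters, then merge the most similar pair repeatedly until k clusters remain. The sketch below illustrates only that shape. It assumes a K-means-style partition for the first phase and, because the cohesion formula is not given in this text, uses a stand-in similarity based on centroid distance. All function names are hypothetical, and this is not the authors' implementation.

```python
from math import dist


def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def partition(points, m, max_iter=100):
    """Phase 1 (assumed): Lloyd's K-means, seeded with the first m points."""
    centers = [tuple(p) for p in points[:m]]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            groups[nearest].append(p)
        updated = [centroid(g) if g else c for c, g in zip(centers, groups)]
        if updated == centers:
            break
        centers = updated
    return [g for g in groups if g]  # drop empty subclusters


def centroid_similarity(a, b):
    """Stand-in similarity, NOT cohesion: larger when centroids are closer."""
    return -dist(centroid(a), centroid(b))


def two_phase_cluster(points, m, k, similarity=centroid_similarity):
    """Phase 2: merge the most similar pair until k clusters remain."""
    clusters = partition(points, m)
    merges = []  # merge history, a flat record of the hierarchy
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = max(pairs, key=lambda ij: similarity(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters, merges


# Hypothetical data: two well-separated 2 x 2 squares of points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
clusters, merges = two_phase_cluster(points, m=4, k=2)
print(sorted(sorted(c) for c in clusters))
```

Passing a different `similarity` function is where a cohesion measure would plug in; the merge loop itself does not depend on which measure is used.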
Performance Studies • Experiment 1: Clustering Quality of Algorithm CSM.
Performance Studies • Experiment 2: Efficiency of Algorithm CSM.
Conclusion • Algorithm CSM not only resists outliers but also produces clustering results similar to those of algorithm CURE, with a much shorter execution time.
Personal Opinion • The paper includes examples that make the method easier to understand. • Cohesion: a good measure for resisting outliers. • Weakness: how should the number of subclusters, m, and the desired number of clusters, k, be chosen?