Hierarchical Clustering

Dr. Bernard Chen, Assistant Professor

Outline
• Hierarchical Clustering
• Hybrid Hierarchical K-means Clustering
• DBSCAN

Hierarchical Clustering
[Figure: Venn diagram of clustered data and the corresponding dendrogram.] From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

Nearest Neighbor, Level 2, k = 7 clusters. From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

Nearest Neighbor, Level 3, k = 6 clusters.

Nearest Neighbor, Level 8, k = 1 cluster.
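
The level-by-level progression can be sketched as a small nearest-neighbor (single-link) agglomerative loop. This is a minimal illustration, not the code behind the slides: the eight 2-D points and the function names `single_link` and `agglomerate` are assumptions made for the example. Every point starts in its own cluster at level 1, and each merge removes one cluster, so eight points reach a single cluster at level 8.

```python
import math


def single_link(a, b):
    """Smallest pairwise Euclidean distance between clusters a and b."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerate(points):
    """Yield (level, clusters) after each nearest-neighbor merge.

    Level 1 is the starting state with every point in its own cluster.
    """
    clusters = [[p] for p in points]
    level = 1
    yield level, [list(c) for c in clusters]
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest single-link distance.
        i, j = min(
            ((i, j)
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        # j > i, so removing cluster j leaves index i valid.
        merged = clusters.pop(j)
        clusters[i].extend(merged)
        level += 1
        yield level, [list(c) for c in clusters]


# Hypothetical data: eight 2-D points (not the points shown in the slides).
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (10, 0), (10, 1)]
levels = list(agglomerate(points))
for level, clusters in levels:
    print(f"Level {level}: k = {len(clusters)}")
```

Stopping the loop at a chosen level is the same as cutting the dendrogram at that height.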

Typical Alternatives to Calculate the Distance between Clusters
• Single link: the smallest distance between an element in one cluster and an element in the other, i.e., dis(K_i, K_j) = min(t_ip, t_jq)
• Complete link: the largest distance between an element in one cluster and an element in the other, i.e., dis(K_i, K_j) = max(t_ip, t_jq)
• Average: the average distance between an element in one cluster and an element in the other, i.e., dis(K_i, K_j) = avg(t_ip, t_jq)
Here t_ip is an element of cluster K_i and t_jq is an element of cluster K_j; min, max, and avg are taken over the distances of all such pairs.
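
The three inter-cluster distances can be written directly from their definitions. The sketch below assumes Euclidean distance between elements and uses two small hypothetical clusters `ki` and `kj`; the function names are illustrative and not taken from any particular library.

```python
import math
from itertools import product


def pairwise_distances(ki, kj):
    """Euclidean distance for every (element of ki, element of kj) pair."""
    return [math.dist(p, q) for p, q in product(ki, kj)]


def single_link(ki, kj):
    """Smallest distance between an element of ki and an element of kj."""
    return min(pairwise_distances(ki, kj))


def complete_link(ki, kj):
    """Largest distance between an element of ki and an element of kj."""
    return max(pairwise_distances(ki, kj))


def average_link(ki, kj):
    """Average distance over all cross-cluster pairs."""
    d = pairwise_distances(ki, kj)
    return sum(d) / len(d)


# Hypothetical clusters, chosen only for illustration.
ki = [(0.0, 0.0), (0.0, 1.0)]
kj = [(3.0, 0.0), (4.0, 0.0)]

print("single  :", single_link(ki, kj))
print("complete:", complete_link(ki, kj))
print("average :", average_link(ki, kj))
```

For any pair of clusters the single-link value is never larger than the average, and the average is never larger than the complete-link value.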

Two-way clustering
[Figure: functionally significant gene clusters; axis labels: sample clusters, gene clusters.]


Motivation
• Among clustering algorithms, hierarchical and K-means clustering are the two most popular and classic methods. However, both have inherent disadvantages.
• K-means clustering requires the number of clusters to be specified in advance and chooses its initial centroids randomly; in other words, you don’t know how to start.
• With hierarchical clustering, it is hard to decide where to cut the dendrogram.

Hybrid Hierarchical K-means Clustering (HHK) Algorithm
• In brief, we cluster roughly half of the data with hierarchical clustering and then continue with K-means on the remainder.
• In order to generate super-rules, we let the hierarchical stage terminate when it generates the largest number of clusters.
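
One way to read the idea above is sketched below. It is not the HHK algorithm as published, and several choices are assumptions made for illustration: every other point is taken as the "half" given to the hierarchical stage, that stage uses single-link merging and stops at a hypothetical distance threshold `max_merge_dist` (the slides' own stopping rule is not reproduced), and K-means then runs over all points starting from the centroids of the hierarchical clusters. The point of the sketch is that the hierarchical stage supplies both the number of clusters and the initial centroids, so K-means no longer starts randomly.

```python
import math


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def hierarchical(points, max_merge_dist):
    """Single-link agglomerative clustering that stops once the closest
    pair of clusters is farther apart than max_merge_dist (assumed rule)."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        d, i, j = min(
            (min(math.dist(p, q) for p in clusters[i] for q in clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > max_merge_dist:
            break
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


def assign(points, centroids):
    """Group each point with its nearest centroid."""
    groups = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c]))
        groups[nearest].append(p)
    return groups


def kmeans(points, centroids, iterations=20):
    """Lloyd's iterations starting from the given centroids."""
    for _ in range(iterations):
        groups = assign(points, centroids)
        # An empty group keeps its previous centroid.
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def hhk_sketch(points, max_merge_dist):
    """Hierarchical stage on half of the data, then K-means on all of it."""
    half = points[::2]  # arbitrary deterministic choice of "half"
    seeds = [mean(c) for c in hierarchical(half, max_merge_dist)]
    return kmeans(points, seeds)


# Hypothetical data: two well-separated groups of four points each.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8), (8, 9), (9, 8), (9, 9)]
centroids, groups = hhk_sketch(data, max_merge_dist=2.0)
print(centroids)
print([len(g) for g in groups])
```

The threshold plays the role of the cut point in this sketch, so choosing it well is still required; the hybrid only moves that decision to a smaller sample of the data.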


Hybrid Hierarchical K-means Clustering (HHK) Algorithm Example
