Hierarchical Clustering


# Hierarchical Clustering


Presentation Transcript
1. Hierarchical Clustering. Dr. Bernard Chen, Assistant Professor

2. Outline • Hierarchical Clustering • Hybrid Hierarchical K-means clustering • DBSCAN

3. Hierarchical Clustering • Venn Diagram of Clustered Data • Dendrogram. From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

4. Nearest Neighbor, Level 2, k = 7 clusters. From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

5. Nearest Neighbor, Level 3, k = 6 clusters.

6. Nearest Neighbor, Level 4, k = 5 clusters.

7. Nearest Neighbor, Level 5, k = 4 clusters.

8. Nearest Neighbor, Level 6, k = 3 clusters.

9. Nearest Neighbor, Level 7, k = 2 clusters.

10. Nearest Neighbor, Level 8, k = 1 cluster.
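The level-by-level slides above follow nearest-neighbor (single-link) agglomerative merging: each level merges the two closest clusters, so the cluster count drops by one per level until a single cluster remains. A minimal sketch of that process follows; the eight sample points and the function names `single_link` and `agglomerate` are illustrative assumptions, not taken from the slides.

```python
from itertools import combinations
from math import dist


def single_link(a, b):
    """Smallest pairwise distance between two clusters of points."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points):
    """Merge the two closest clusters per level; return the clusters at every level."""
    clusters = [[p] for p in points]           # level 1: every point is its own cluster
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-link distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
        levels.append([list(c) for c in clusters])
    return levels


# Hypothetical 2-D sample data (eight points, as in an eight-level example).
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.5),
          (7.0, 0.0), (7.0, 2.0), (12.0, 0.0), (12.0, 3.0)]
levels = agglomerate(points)
for level, clusters in enumerate(levels, start=1):
    print(f"Level {level}, k = {len(clusters)} clusters")
```

Cutting the resulting hierarchy at a given level yields that level's k clusters, which is what a horizontal cut through the dendrogram represents.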

11. Typical Alternatives to Calculate the Distance between Clusters • Single link: smallest distance between an element in one cluster and an element in the other, i.e., $\mathrm{dis}(K_i, K_j) = \min_{p,q} \mathrm{dis}(t_{ip}, t_{jq})$ • Complete link: largest distance between an element in one cluster and an element in the other, i.e., $\mathrm{dis}(K_i, K_j) = \max_{p,q} \mathrm{dis}(t_{ip}, t_{jq})$ • Average: average distance between an element in one cluster and an element in the other, i.e., $\mathrm{dis}(K_i, K_j) = \mathrm{avg}_{p,q}\, \mathrm{dis}(t_{ip}, t_{jq})$
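The three inter-cluster distances listed above differ only in how they aggregate the pairwise element distances. A small sketch, assuming Euclidean distance between elements (the slides do not fix a metric) and using hypothetical clusters `K_i` and `K_j`:

```python
from math import dist


def pairwise(a, b):
    """All element-to-element distances between clusters a and b."""
    return [dist(p, q) for p in a for q in b]


def single_link(a, b):
    """Smallest distance between an element of a and an element of b."""
    return min(pairwise(a, b))


def complete_link(a, b):
    """Largest distance between an element of a and an element of b."""
    return max(pairwise(a, b))


def average_link(a, b):
    """Mean distance over all element pairs (one from a, one from b)."""
    d = pairwise(a, b)
    return sum(d) / len(d)


# Hypothetical clusters on a line; pairwise distances are 4, 6, 3 and 5.
K_i = [(0.0, 0.0), (1.0, 0.0)]
K_j = [(4.0, 0.0), (6.0, 0.0)]

print(single_link(K_i, K_j))    # 3.0
print(complete_link(K_i, K_j))  # 6.0
print(average_link(K_i, K_j))   # 4.5
```

Single link tends to chain nearby clusters together, while complete link favors compact clusters; average link sits between the two.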

12. Functionally significant gene clusters • Two-way clustering • Sample clusters • Gene clusters

13. Outline • Hierarchical Clustering • Hybrid Hierarchical K-means clustering • DBSCAN

14. Motivation • Hierarchical and K-means clustering are the two most popular classic clustering algorithms, but each has an inherent disadvantage. • K-means clustering requires the number of clusters to be specified in advance and chooses its initial centroids randomly; in other words, you don't know how to start. • With Hierarchical clustering, it is hard to decide where to cut the dendrogram.

15. Hybrid Hierarchical K-means Clustering (HHK) Algorithm • The basic idea is to cluster about half of the data with Hierarchical clustering and then continue with K-means for the remainder. • To generate super-rules, Hierarchical clustering terminates when it has generated the largest number of clusters.
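The slide gives only the outline of HHK, so the following is one possible reading rather than the author's implementation. It assumes that "the largest number of clusters" means the level at which the count of multi-point clusters peaks, that single link drives the merging, and that the centroids of those clusters seed K-means. All function names (`hierarchical_seeds`, `kmeans`, `hhk`) and the sample data are hypothetical.

```python
from itertools import combinations
from math import dist


def centroid(cluster):
    """Coordinate-wise mean of a cluster of points."""
    return tuple(sum(x) / len(cluster) for x in zip(*cluster))


def single_link(a, b):
    return min(dist(p, q) for p in a for q in b)


def hierarchical_seeds(points):
    """Agglomerate with single link; return the multi-point clusters from the
    level where their count is largest (earliest such level on ties)."""
    clusters = [[p] for p in points]
    best = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
        grown = [c for c in clusters if len(c) > 1]   # ignore singletons
        if len(grown) > len(best):
            best = [list(c) for c in grown]
    return best


def kmeans(points, centroids, max_iter=100):
    """Plain K-means started from the given centroids (no random initialization)."""
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            groups[nearest].append(p)
        # Keep the old centroid if a group ends up empty.
        updated = [centroid(g) if g else centroids[n] for n, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, groups


def hhk(points):
    """Hierarchical phase picks k and the seeds; K-means assigns every point."""
    seeds = hierarchical_seeds(points)
    return kmeans(points, [centroid(c) for c in seeds])


# Hypothetical data: three well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
        (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]
centroids, groups = hhk(data)
print(len(centroids), [len(g) for g in groups])
```

Under these assumptions the hierarchical phase answers both questions raised in the motivation: it supplies the number of clusters and the starting centroids, so K-means no longer depends on a random start.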

16. Hybrid Hierarchical K-means Clustering (HHK) Algorithm