Cluster Analysis part2

Cluster Analysis part2. Dr. Bernard Chen, Ph.D., Assistant Professor, Department of Computer Science, University of Central Arkansas, Fall 2010. Outline: Hierarchical Clustering; Hybrid Hierarchical K-means Clustering; DBSCAN.

Presentation Transcript


  1. Cluster Analysis part2 • Dr. Bernard Chen, Ph.D. • Assistant Professor, Department of Computer Science, University of Central Arkansas • Fall 2010

  2. Outline • Hierarchical Clustering • Hybrid Hierarchical K-means Clustering • DBSCAN

  3. Hierarchical Clustering • Venn Diagram of Clustered Data • Dendrogram • From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

  4. Nearest Neighbor, Level 2, k = 7 clusters. From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

  5. Nearest Neighbor, Level 3, k = 6 clusters.

  6. Nearest Neighbor, Level 4, k = 5 clusters.

  7. Nearest Neighbor, Level 5, k = 4 clusters.

  8. Nearest Neighbor, Level 6, k = 3 clusters.

  9. Nearest Neighbor, Level 7, k = 2 clusters.

  10. Nearest Neighbor, Level 8, k = 1 cluster.
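The nearest-neighbor (single-linkage) levels on slides 4 to 10 can be reproduced with a short sketch. The eight sample points and the function name `single_linkage_levels` are assumptions made for illustration; they are not taken from the slides, whose figures did not survive in this transcript.

```python
from itertools import combinations
from math import dist

def single_linkage_levels(points):
    """Repeatedly merge the two closest clusters; return the clustering at every level."""
    clusters = [[i] for i in range(len(points))]  # level 1: each point is its own cluster
    levels = [sorted(sorted(c) for c in clusters)]
    while len(clusters) > 1:
        # Nearest-neighbor distance between two clusters: their closest pair of points.
        def gap(pair):
            a, b = pair
            return min(dist(points[i], points[j])
                       for i in clusters[a] for j in clusters[b])
        a, b = min(combinations(range(len(clusters)), 2), key=gap)
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        levels.append(sorted(sorted(c) for c in clusters))
    return levels

# Hypothetical data: 8 points, so the levels run from k = 8 down to k = 1.
points = [(0.0,), (1.0,), (3.0,), (6.0,), (10.0,), (15.0,), (21.0,), (28.0,)]
levels = single_linkage_levels(points)
for level, clustering in enumerate(levels, start=1):
    print(f"Level {level}, k = {len(clustering)} clusters: {clustering}")
```

Each pass through the loop corresponds to one level of the dendrogram: the number of clusters drops by one until a single cluster remains.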

  11. Functionally significant gene clusters • Two-way clustering • Sample clusters • Gene clusters

  12. Outline • Hierarchical Clustering • Hybrid Hierarchical K-means Clustering • DBSCAN

  13. Motivation • Among clustering algorithms, hierarchical and K-means clustering are the two most popular and classic methods. However, both have innate disadvantages. • K-means clustering requires the number of clusters to be specified in advance and chooses its initial centroids randomly; in other words, you don't know how to start • With hierarchical clustering, it is hard to decide where to cut the dendrogram

  14. Hybrid Hierarchical K-means Clustering (HHK) Algorithm • In brief, we cluster around half of the data with hierarchical clustering and follow up with K-means for the remainder • In order to generate super-rules, we let the hierarchical stage terminate when it generates the largest number of clusters
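The slide gives only the outline of HHK, so the following is one possible reading of the hybrid idea, not the author's algorithm: run single-linkage merging until roughly half of the points sit in multi-point clusters, then use the centroids of those clusters as the starting centroids for K-means over all points. The stopping rule, the `fraction` parameter, the helper names, and the sample data are all assumptions.

```python
from itertools import combinations
from math import dist

def centroid(points, members):
    """Mean position of the points whose indices are in `members`."""
    dims = len(points[0])
    return tuple(sum(points[i][d] for i in members) / len(members) for d in range(dims))

def hierarchical_seed(points, fraction=0.5):
    """Assumed stage 1: single-linkage merging until `fraction` of the points are grouped."""
    clusters = [[i] for i in range(len(points))]
    def grouped():
        return sum(len(c) for c in clusters if len(c) > 1)
    while grouped() < fraction * len(points) and len(clusters) > 1:
        def gap(pair):
            a, b = pair
            return min(dist(points[i], points[j])
                       for i in clusters[a] for j in clusters[b])
        a, b = min(combinations(range(len(clusters)), 2), key=gap)
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    # Only the multi-point clusters supply starting centroids.
    return [centroid(points, c) for c in clusters if len(c) > 1]

def kmeans(points, centroids, iterations=20):
    """Assumed stage 2: plain K-means started from the given centroids."""
    for _ in range(iterations):
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                  for p in points]
        updated = []
        for k in range(len(centroids)):
            members = [i for i, label in enumerate(labels) if label == k]
            updated.append(centroid(points, members) if members else centroids[k])
        centroids = updated
    return labels, centroids

# Hypothetical data: two loose groups on a line.
data = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (10.0, 0.0), (10.6, 0.0), (13.0, 0.0)]
seeds = hierarchical_seed(data)
labels, centers = kmeans(data, seeds)
print(seeds, labels)
```

Under this reading, the hierarchical stage answers both complaints from the motivation slide: it fixes the number of clusters and supplies non-random starting centroids for K-means.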

  15. Hybrid Hierarchical K-means Clustering (HHK) Algorithm

  16. Hybrid Hierarchical K-means Clustering (HHK) Algorithm Example

  17. Hybrid Hierarchical K-means Clustering (HHK) Algorithm Example

  18. Hybrid Hierarchical K-means Clustering (HHK) Algorithm Example

  19. Hybrid Hierarchical K-means Clustering (HHK) Algorithm Example

  20. Hybrid Hierarchical K-means Clustering (HHK) Algorithm Example

  21. Hybrid Hierarchical K-means Clustering (HHK) Algorithm Example

  22. Hybrid Hierarchical K-means Clustering (HHK) Algorithm Example

  23. Outline • Hierarchical Clustering • Hybrid Hierarchical K-means Clustering • DBSCAN

  24. Density-Based Clustering Methods • Clustering based on density (local cluster criterion), such as density-connected points • Major features: • Discover clusters of arbitrary shape • Handle noise • One scan • Need density parameters as termination condition

  25. DBSCAN • Two parameters: • Eps: Maximum radius of the neighbourhood of a point • MinPts: Minimum number of points required in the Eps-neighbourhood of that point

  26. DBSCAN • Directly density-reachable: A point p is directly density-reachable from a point q w.r.t. Eps, MinPts if • p belongs to N_Eps(q) • core point condition: |N_Eps(q)| >= MinPts
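The two conditions for direct density-reachability can be checked with a minimal sketch. The function names, the sample points, and the choice of Euclidean distance are assumptions for illustration; the neighbourhood here includes the point itself.

```python
from math import dist

def eps_neighborhood(points, q, eps):
    """Indices of all points within distance eps of points[q], q itself included."""
    return [i for i, p in enumerate(points) if dist(p, points[q]) <= eps]

def is_core(points, q, eps, min_pts):
    """Core point condition: |N_Eps(q)| >= MinPts."""
    return len(eps_neighborhood(points, q, eps)) >= min_pts

def directly_density_reachable(points, p, q, eps, min_pts):
    """True if p lies in N_Eps(q) and q satisfies the core point condition."""
    return p in eps_neighborhood(points, q, eps) and is_core(points, q, eps, min_pts)

# Hypothetical data: a dense square, one point on its edge, one far away.
pts = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (1.2, 0.5), (5.0, 5.0)]
eps, min_pts = 1.0, 4

print(directly_density_reachable(pts, 4, 3, eps, min_pts))  # point 4 from core point 3
print(directly_density_reachable(pts, 3, 4, eps, min_pts))  # point 3 from non-core point 4
```

The relation is not symmetric: in this data, point 4 is reachable from the core point 3, but point 3 is not reachable from point 4, because point 4 has too few neighbours to be a core point.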

  27. DBSCAN: Density Based Spatial Clustering of Applications with Noise • Relies on a density-based notion of cluster: A cluster is defined as a maximal set of density-connected points • Discovers clusters of arbitrary shape in spatial databases with noise • (Figure labels: Core, Border, Outlier; Eps = 1cm, MinPts = 5)

  28. DBSCAN • Arbitrarily select a point p • Retrieve all points density-reachable from p w.r.t. Eps and MinPts. • If p is a core point, a cluster is formed. • If p is a border point, no points are density-reachable from p and DBSCAN visits the next point of the database. • Continue the process until all of the points have been processed.
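The steps above can be sketched as a small pure-Python routine. This is a simplified illustration with a brute-force neighbour search, not a reference implementation; the `NOISE` label, the function name, and the sample data are assumptions.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    def neighbors(q):
        return [i for i, p in enumerate(points) if dist(p, points[q]) <= eps]

    labels = [None] * len(points)      # None means "not yet processed"
    cluster_id = -1
    for p in range(len(points)):       # "arbitrary" selection: simply scan in order
        if labels[p] is not None:
            continue
        seeds = neighbors(p)
        if len(seeds) < min_pts:
            labels[p] = NOISE          # not a core point; may become a border point later
            continue
        cluster_id += 1                # p is a core point, so a cluster is formed
        labels[p] = cluster_id
        queue = [i for i in seeds if i != p]
        while queue:                   # collect everything density-reachable from p
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster_id # previously rejected point is a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster_id
            q_neighbors = neighbors(q)
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)  # q is a core point too; keep expanding
    return labels

# Hypothetical data: two dense groups and one isolated point.
sample = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (1.2, 0.5),
          (5.0, 5.0), (5.5, 5.0), (5.0, 5.5), (5.5, 5.5), (10.0, 0.0)]
result = dbscan(sample, eps=1.0, min_pts=4)
print(result)
```

A point first marked as noise is relabeled as a border point when a later core point reaches it, which is why the noise label is provisional until the scan finishes.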

  29. DBSCAN: Sensitive to Parameters
