
  1. KI2 - 7: Clustering Algorithms. Johan Everts. Kunstmatige Intelligentie / RuG

  2. What is Clustering? Find K clusters (or a classification that consists of K clusters) so that the objects of one cluster are similar to each other whereas objects of different clusters are dissimilar. (Bacher 1996)

  3. The Goals of Clustering
  • Determine the intrinsic grouping in a set of unlabeled data.
  • What constitutes a good clustering?
  • All clustering algorithms will produce clusters, regardless of whether the data actually contains them.
  • There is no gold standard; what counts as a good clustering depends on the goal:
    • data reduction
    • "natural" clusters
    • "useful" clusters
    • outlier detection

  4. Stages in Clustering

  5. Taxonomy of Clustering Approaches

  6. Hierarchical Clustering Agglomerative clustering treats each data point as a singleton cluster, and then successively merges clusters until all points have been merged into a single remaining cluster. Divisive clustering works the other way around: it starts with all points in one cluster and recursively splits it.

  7. Agglomerative Clustering Single link In single-link hierarchical clustering, we merge in each step the two clusters whose two closest members have the smallest distance.

  8. Agglomerative Clustering Complete link In complete-link hierarchical clustering, we merge in each step the two clusters whose merger has the smallest diameter.
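
As an illustration of both linkage criteria, here is a minimal sketch using SciPy's hierarchical clustering routines (the toy data is made up for the example):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Toy 2-D data: two loose groups (values invented for illustration).
    X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                  [5.0, 5.0], [5.2, 4.9], [4.9, 5.3]])

    # Single link: merge the two clusters whose closest members are nearest.
    Z_single = linkage(X, method='single')

    # Complete link: merge the two clusters whose merger has the smallest
    # diameter, i.e. whose farthest members are nearest.
    Z_complete = linkage(X, method='complete')

    # Cut each dendrogram into K = 2 flat clusters.
    print(fcluster(Z_single, t=2, criterion='maxclust'))
    print(fcluster(Z_complete, t=2, criterion='maxclust'))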

  9.–19. Example – Single Link AC [eleven figure slides stepping through a single-link agglomerative clustering example; only the slide titles survive in the transcript]

  20. Taxonomy of Clustering Approaches

  21. Square error
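
The formula itself did not survive the transcript; the standard squared-error criterion this slide title refers to, for a partition of the data into clusters C_1, ..., C_K with centroids mu_k, is (in LaTeX):

    E = \sum_{k=1}^{K} \sum_{\mathbf{x} \in C_k} \lVert \mathbf{x} - \boldsymbol{\mu}_k \rVert^2,
    \qquad
    \boldsymbol{\mu}_k = \frac{1}{\lvert C_k \rvert} \sum_{\mathbf{x} \in C_k} \mathbf{x}

K-means, introduced on the next slide, is an iterative procedure for driving E down.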

  22. K-Means
  • Step 0: Start with a random partition into K clusters.
  • Step 1: Generate a new partition by assigning each pattern to its closest cluster center.
  • Step 2: Compute new cluster centers as the centroids of the clusters.
  • Step 3: Repeat Steps 1 and 2 until the memberships no longer change (the cluster centers then remain unchanged as well).
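
A minimal NumPy sketch of these four steps (the function name and the initialization detail are mine; picking k random data points as initial centers is a common stand-in for the slide's random initial partition):

    import numpy as np

    def kmeans(X, k, max_iter=100, seed=0):
        """Lloyd's algorithm, following the four steps on the slide."""
        rng = np.random.default_rng(seed)
        # Step 0: k distinct data points serve as the initial cluster centers.
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        labels = None
        for _ in range(max_iter):
            # Step 1: assign each pattern to its closest cluster center.
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            new_labels = dists.argmin(axis=1)
            # Step 3: stop when no membership changes (centers are then stable too).
            if labels is not None and np.array_equal(new_labels, labels):
                break
            labels = new_labels
            # Step 2: recompute each center as the centroid of its cluster.
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(axis=0)
        return labels, centers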

  23. K-Means

  24.–25. K-Means – How many K's? [two figure slides; only the titles survive in the transcript]

  26. Locating the ‘knee’ The knee of a curve is defined as the point of maximum curvature.
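
One way to make "maximum curvature" operational: compute the square error for K = 1, 2, ... and take the K where the discrete curvature kappa = |y''| / (1 + y'^2)^(3/2) of that curve peaks. A sketch, assuming errors[i] holds the square error for K = i + 1 (the helper name and example numbers are invented):

    import numpy as np

    def knee_point(errors):
        """Return the K (1-based) of maximum curvature of the error curve."""
        y = np.asarray(errors, dtype=float)
        dy = np.gradient(y)      # y'  (step size 1 between successive K)
        d2y = np.gradient(dy)    # y''
        kappa = np.abs(d2y) / (1.0 + dy**2) ** 1.5
        return int(np.argmax(kappa)) + 1   # +1 because K starts at 1

    # Made-up error curve that flattens out:
    print(knee_point([100, 40, 12, 10, 9, 8.5, 8.2]))

In practice the curve is usually rescaled first, since curvature is not invariant to the units of the error axis.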

  27. Leader-Follower
  • Online algorithm: instances arrive one at a time.
  • Specify a threshold distance.
  • Find the closest cluster center.
  • Distance above the threshold? Create a new cluster.
  • Otherwise, add the instance to that cluster (a code sketch follows after the figure slides below).

  28.–31. Leader-Follower [figure slides stepping through the loop above: when the distance to the closest center is below the threshold, the instance joins that cluster and its center is updated; when it is above the threshold, a new cluster is created]
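
A minimal sketch of the complete leader-follower loop; the learning rate eta and the exact center-update rule are assumptions on my part, since the slides only say "update cluster center":

    import numpy as np

    def leader_follower(stream, threshold, eta=0.1):
        """Online leader-follower clustering (slides 27-31)."""
        centers = []
        for x in stream:
            x = np.asarray(x, dtype=float)
            if not centers:
                centers.append(x.copy())   # first instance founds the first cluster
                continue
            # Find the closest cluster center.
            d = [np.linalg.norm(x - c) for c in centers]
            j = int(np.argmin(d))
            if d[j] > threshold:
                centers.append(x.copy())   # distance above threshold: new cluster
            else:
                # Otherwise add the instance: nudge the winning center toward it
                # (assumed update rule; a running mean would also be reasonable).
                centers[j] += eta * (x - centers[j])
        return centers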

  32. Kohonen SOMs The Self-Organizing Map (SOM) is an unsupervised artificial neural network algorithm. It is a compromise between biological modeling and statistical data processing.

  33. Kohonen SOMs
  • Each weight vector is representative of a certain input.
  • Input patterns are shown to all neurons simultaneously.
  • Competitive learning: the neuron with the largest response is chosen.

  34. Kohonen SOMs
  • Initialize the weights.
  • Repeat until convergence:
    • Select the next input pattern.
    • Find the Best Matching Unit.
    • Update the weights of the winner and its neighbours.
    • Decrease the learning rate and the neighbourhood size.
  [Figure: learning rate and neighbourhood size decaying over time.]
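
A minimal NumPy sketch of this training loop; the linear decay schedules and the Gaussian neighbourhood function are common choices that the transcript does not specify, so treat them as assumptions:

    import numpy as np

    def train_som(X, grid=(10, 10), epochs=50, lr0=0.5, sigma0=3.0, seed=0):
        """Kohonen SOM trained with the loop on the slide."""
        rng = np.random.default_rng(seed)
        rows, cols, dim = grid[0], grid[1], X.shape[1]
        weights = rng.random((rows, cols, dim))            # initialize weights
        # Grid coordinate of each neuron, for the neighbourhood function.
        coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                      indexing='ij'), axis=-1)
        t, n_steps = 0, epochs * len(X)
        for _ in range(epochs):
            for x in X[rng.permutation(len(X))]:           # select next input pattern
                # Find the Best Matching Unit (closest weight vector).
                dists = np.linalg.norm(weights - x, axis=2)
                bmu = np.unravel_index(np.argmin(dists), dists.shape)
                # Decrease learning rate and neighbourhood size over time.
                frac = t / n_steps
                lr = lr0 * (1.0 - frac)
                sigma = sigma0 * (1.0 - frac) + 1e-2
                # Gaussian neighbourhood around the winner
                # ("distance-related learning", slide 35).
                grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=2)
                h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
                # Update weights of winner and neighbours toward the input.
                weights += lr * h[..., None] * (x - weights)
                t += 1
        return weights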

  35. Kohonen SOMs – Distance-related learning

  36. Kohonen SOMs

  37. Some nice illustrations

  38. Kohonen SOMs • Kohonen SOM Demo (from ai-junkie.com): mapping a 3D color space onto a 2D Kohonen map

  39. Performance Analysis
  • K-Means
    • Depends heavily on a priori knowledge (the choice of K).
    • Very stable.
  • Leader-Follower
    • Depends heavily on a priori knowledge (the choice of threshold).
    • Faster, but unstable.

  40. Performance Analysis
  • Self-Organizing Map
    • Stability and convergence assured.
    • Principle of self-ordering.
    • Slow: many iterations needed for convergence.
    • Computationally intensive.

  41. Conclusion
  • No Free Lunch theorem: any elevated performance over one class of problems is exactly paid for in performance over another class.
  • Ensemble clustering? Use a SOM and the basic Leader-Follower to identify clusters, and then use k-means clustering to refine them.

  42. Any Questions?
