
KI2 - 7. Clustering Algorithms. Johan Everts. Kunstmatige Intelligentie / RuG. What is Clustering? .



Find K clusters (or a classification that consists of K clusters) so that the objects of one cluster are similar to each other whereas objects of different clusters are dissimilar. (Bacher 1996)

- Determine the intrinsic grouping in a set of unlabeled data.
- What constitutes a good clustering?
- All clustering algorithms will produce clusters, regardless of whether the data contains them.
- There is no gold standard; what counts as a good clustering depends on the goal:
  - data reduction
  - “natural clusters”
  - “useful” clusters
  - outlier detection

Agglomerative clustering treats each data point as a singleton cluster, and then successively merges clusters until all points have been merged into a single remaining cluster. Divisive clustering works the other way around.

Single link

In single-link hierarchical clustering, we merge in each step the two clusters whose two closest members have the smallest distance.

Complete link

In complete-link hierarchical clustering, we merge in each step the two clusters whose merger has the smallest diameter.
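
The two merge criteria above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration rather than code from the slides: the names `single_link`, `merged_diameter` and `agglomerative`, the Euclidean metric, and stopping once `k` clusters remain are all assumptions made for the example.

```python
from itertools import combinations
from math import dist  # Euclidean distance between two tuples (Python 3.8+)


def single_link(a, b):
    """Distance between the two closest members of clusters a and b."""
    return min(dist(p, q) for p in a for q in b)


def merged_diameter(a, b):
    """Complete link: largest pairwise distance inside the merged cluster."""
    return max(dist(p, q) for p, q in combinations(a + b, 2))


def agglomerative(points, k, criterion):
    """Start from singleton clusters and merge until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest criterion value.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: criterion(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical 1-D data with slowly growing gaps between neighbours.
pts = [(0.0,), (1.0,), (2.1,), (3.3,), (4.6,)]
print(agglomerative(pts, 2, single_link))
print(agglomerative(pts, 2, merged_diameter))
```

On this toy data single link chains the first four points into one cluster and leaves the last point alone, while the diameter criterion splits the points into a group of two and a group of three.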

K-Means

- Step 0: Start with a random partition into K clusters and take the centroid of each cluster as its initial center.
- Step 1: Generate a new partition by assigning each pattern to its closest cluster center.
- Step 2: Compute new cluster centers as the centroids of the clusters.
- Step 3: Repeat steps 1 and 2 until the membership no longer changes (the cluster centers then remain the same as well).
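
The K-Means steps above could look roughly like the following Python sketch. The function names, the `seed` and `max_iter` parameters, and the handling of empty clusters (re-seeding the center with a random data point) are assumptions for illustration; the slides do not specify them.

```python
import random
from math import dist  # Euclidean distance between two tuples (Python 3.8+)


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 0: random partition into K clusters.
    labels = [rng.randrange(k) for _ in points]
    centers = []
    for _ in range(max_iter):
        # Step 2: cluster centers are the centroids of the current clusters.
        centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Assumption: an empty cluster is re-seeded with a random point.
            centers.append(centroid(members) if members else rng.choice(points))
        # Step 1: assign each pattern to its closest cluster center.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centers[c])) for p in points
        ]
        # Step 3: stop once the membership no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


# Hypothetical data: two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_means(data, k=2)
print(labels, centers)
```

Which group receives label 0 depends on the random start, so only the grouping itself is meaningful, not the label values.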

The knee of a curve is defined as the point of maximum curvature.
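
As a rough illustration of "maximum curvature", the sketch below estimates the curvature of a sampled curve with finite differences and returns the interior point where it is largest. The slides do not say which curve is meant; using it to pick K from an error-versus-K plot is an assumption here, as are the function name `knee_index`, the uniform spacing of `xs`, and the example numbers.

```python
def knee_index(xs, ys):
    """Index of the interior sample with the largest discrete curvature.

    Uses kappa = |y''| / (1 + y'^2)^(3/2) with central differences and
    assumes the x values are evenly spaced.
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three (x, y) samples")
    h = xs[1] - xs[0]
    best_i, best_kappa = 1, -1.0
    for i in range(1, len(xs) - 1):
        d1 = (ys[i + 1] - ys[i - 1]) / (2 * h)            # first derivative
        d2 = (ys[i + 1] - 2 * ys[i] + ys[i - 1]) / h**2   # second derivative
        kappa = abs(d2) / (1 + d1 * d1) ** 1.5
        if kappa > best_kappa:
            best_i, best_kappa = i, kappa
    return best_i


# Hypothetical clustering error for K = 1..6 (made-up values).
ks = [1, 2, 3, 4, 5, 6]
errors = [10.0, 5.0, 1.0, 0.8, 0.6, 0.5]
print("knee at K =", ks[knee_index(ks, errors)])
```

Curvature is not invariant to the scaling of the axes, so rescaling the error values can move the detected knee; in practice both axes are often normalized before this kind of estimate.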

Leader Follower

- Online: instances are processed one at a time
- Specify a threshold distance
- For each new instance:
  - Find the closest cluster center
  - If the distance is above the threshold, create a new cluster
  - Otherwise, add the instance to that cluster and update the cluster center

Distance < Threshold: the instance joins the closest cluster.

Distance > Threshold: the instance starts a new cluster.
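
A minimal Python sketch of the online threshold procedure described above. The slides only say "update cluster center"; moving the center to the running mean of its members is an assumption, as are the function name `leader_follower` and the example stream.

```python
from math import dist  # Euclidean distance between two tuples (Python 3.8+)


def leader_follower(stream, threshold):
    """Cluster instances one at a time using a distance threshold."""
    centers, counts, labels = [], [], []
    for x in stream:
        j = None
        if centers:
            # Find the closest cluster center.
            j = min(range(len(centers)), key=lambda c: dist(x, centers[c]))
        if j is None or dist(x, centers[j]) > threshold:
            # Distance above threshold (or no clusters yet): new cluster.
            centers.append(tuple(x))
            counts.append(1)
            labels.append(len(centers) - 1)
        else:
            # Add the instance and update the center.
            # Assumption: the center becomes the running mean of its members.
            counts[j] += 1
            centers[j] = tuple(
                c + (xi - c) / counts[j] for c, xi in zip(centers[j], x)
            )
            labels.append(j)
    return centers, labels


# Hypothetical stream of 2-D instances.
stream = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (0.1, 0.1), (5.2, 5.0)]
centers, labels = leader_follower(stream, threshold=1.0)
print(labels, centers)
```

Because instances are handled in arrival order, a different ordering of the same stream can produce different clusters.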

The Self-Organizing Map (SOM) is an unsupervised artificial neural network algorithm. It is a compromise between biological modeling and statistical data processing.

- Each neuron's weight vector is representative of a certain input.
- Input patterns are shown to all neurons simultaneously.
- Competitive learning: the neuron with the largest response is chosen.

- Initialize weights
- Repeat until convergence:
  - Select next input pattern
  - Find Best Matching Unit
  - Update weights of winner and neighbours
  - Decrease learning rate & neighbourhood size
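
The training loop above might be sketched as follows. Several details are assumptions not taken from the slides: a fixed number of epochs stands in for "until convergence", the Best Matching Unit is the neuron with the smallest Euclidean distance to the input, the neighbourhood is Gaussian on the grid, and both learning rate and neighbourhood size decay linearly. The names `train_som` and `sq_dist` are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2  # initial neighbourhood radius (assumption)
    # Initialize weights: one random vector per grid position.
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        # Decrease learning rate and neighbourhood size (linear decay).
        frac = epoch / epochs
        lr = lr0 * (1 - frac)
        sigma = max(sigma0 * (1 - frac), 0.5)
        # Select the input patterns in random order.
        for x in rng.sample(data, len(data)):
            # Find the Best Matching Unit.
            bmu = min(weights, key=lambda n: sq_dist(weights[n], x))
            # Update the winner and its neighbours, weighted by grid distance.
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma**2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


# Hypothetical input: a few RGB colours mapped onto a 4 x 4 grid.
colors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 0.0)]
som = train_som(colors, rows=4, cols=4, epochs=20)
print(som[(0, 0)])
```

Neurons close to the winner on the grid receive almost the full update, while distant ones barely move; this is what lets neighbouring grid positions end up with similar weight vectors.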

Learning rate & neighbourhood size

Distance related learning

- Kohonen SOM Demo (from ai-junkie.com): mapping a 3D colorspace on a 2D Kohonen map

- K-Means
  - Depends heavily on a priori knowledge (K)
  - Very stable

- Leader Follower
  - Depends heavily on a priori knowledge (threshold)
  - Faster but unstable

- Self-Organizing Map
  - Stability and convergence assured
  - Principle of self-ordering
  - Slow: many iterations needed for convergence
  - Computationally intensive

- No Free Lunch theorem
  - Any elevated performance over one class of problems is exactly paid for in performance over another class

- Ensemble clustering?
  - Use SOM and Basic Leader Follower to identify clusters, then use K-Means clustering to refine them.
