
# Kunstmatige Intelligentie / RuG





## Presentation Transcript

KI2 - 7

Clustering Algorithms

Johan Everts

Kunstmatige Intelligentie / RuG

What is Clustering?

Find K clusters (or a classification that consists of K clusters) so that the objects of one cluster are similar to each other whereas objects of different clusters are dissimilar. (Bacher 1996)

The Goals of Clustering
• Determine the intrinsic grouping in a set of unlabeled data.
• What constitutes a good clustering?
• All clustering algorithms will produce clusters, regardless of whether the data actually contains them
• There is no gold standard; what counts as a good clustering depends on the goal:
• data reduction
• “natural clusters”
• “useful” clusters
• outlier detection

Hierarchical Clustering

Agglomerative clustering treats each data point as a singleton cluster, and then successively merges clusters until all points have been merged into a single remaining cluster. Divisive clustering works the other way around.

Agglomerative Clustering

In single-link hierarchical clustering, we merge in each step the two clusters whose two closest members have the smallest distance.

Agglomerative Clustering

In complete-link hierarchical clustering, we merge in each step the two clusters whose merger has the smallest diameter.
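The two linkage criteria above differ only in how the distance between clusters is measured: single-link uses the closest pair of members, complete-link the farthest (the diameter of the merger). A minimal sketch, assuming Euclidean distance and a naive O(n³) merge loop (the function name `agglomerative` and the stopping count `k` are illustrative, not from the slides):

```python
import math

def agglomerative(points, k, linkage="single"):
    """Naive agglomerative clustering down to k clusters.
    linkage='single'  : merge the pair whose closest members are nearest
    linkage='complete': merge the pair whose merger has the smallest diameter"""
    clusters = [[p] for p in points]          # start: each point is a singleton
    agg = min if linkage == "single" else max
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # inter-cluster distance under the chosen linkage
                d = agg(math.dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)        # merge the chosen pair
    return clusters
```

Stopping at `k` clusters is one convenient cut; a full run down to a single cluster yields the complete dendrogram described above.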

K-Means
• Step 0: Start with a random partition into K clusters
• Step 1: Generate a new partition by assigning each pattern to its closest cluster center
• Step 2: Compute new cluster centers as the centroids of the clusters.
• Step 3: Repeat Steps 1 and 2 until cluster membership no longer changes (the cluster centers then also remain the same)
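Steps 0 to 3 can be sketched directly in plain Python; this is a minimal illustration (Euclidean distance, random points as initial centers), not an optimized implementation:

```python
import random

def k_means(points, k, max_iter=100, seed=0):
    """Steps 0-3 above: random initial centers, assign, recompute,
    repeat until the assignment stops changing."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(points, k)]       # Step 0
    assignment = None
    for _ in range(max_iter):
        # Step 1: assign each pattern to its closest cluster center
        new_assignment = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])))
            for pt in points
        ]
        if new_assignment == assignment:                     # Step 3: converged
            break
        assignment = new_assignment
        # Step 2: recompute centers as the centroids of the clusters
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centers, assignment
```

Note that the result depends on the random initial partition, which is exactly the sensitivity discussed under Performance Analysis below.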
Locating the ‘knee’

The knee of a curve is defined as the point of maximum curvature.
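In practice (e.g. when choosing K from a plot of within-cluster error against K), the maximum-curvature point is often approximated discretely by the largest second difference of the curve. A small sketch; the function name `knee_index` and the SSE values are hypothetical:

```python
def knee_index(values):
    """Discrete stand-in for maximum curvature: the index where the
    second difference of the curve is largest."""
    second_diff = [values[i - 1] - 2 * values[i] + values[i + 1]
                   for i in range(1, len(values) - 1)]
    return 1 + max(range(len(second_diff)), key=second_diff.__getitem__)

# Hypothetical within-cluster SSE for K = 1..7; the sharpest bend is at K = 3
sse = [100, 55, 20, 12, 10, 9, 8.5]
```

Here `knee_index(sse)` returns index 2, i.e. K = 3, where the error curve flattens out.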

• Online
• Specify a threshold distance
• For each incoming instance:
• Find the closest cluster center
• Distance above threshold? Create a new cluster
• Or else, add the instance to that cluster and update the cluster center
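This online (leader-style) procedure needs only one pass and no preset K. A minimal sketch, assuming Euclidean distance and an incremental running-mean update of the center (the function name `leader_cluster` is illustrative):

```python
import math

def leader_cluster(instances, threshold):
    """Online clustering: assign each instance as it arrives."""
    centers, counts, labels = [], [], []
    for x in instances:
        if centers:
            # find the closest existing cluster center
            i = min(range(len(centers)), key=lambda c: math.dist(x, centers[c]))
            d = math.dist(x, centers[i])
        else:
            i, d = -1, float("inf")
        if d > threshold:                      # distance above threshold: new cluster
            centers.append(list(x))
            counts.append(1)
            labels.append(len(centers) - 1)
        else:                                  # or else: absorb and update the center
            counts[i] += 1
            centers[i] = [m + (xi - m) / counts[i] for m, xi in zip(centers[i], x)]
            labels.append(i)
    return centers, labels
```

Because assignments are never revisited, the result depends on both the threshold and the presentation order of the instances.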

Kohonen SOMs

The Self-Organizing Map (SOM) is an unsupervised artificial neural network algorithm. It is a compromise between biological modeling and statistical data processing.

Kohonen SOMs
• Each weight is representative of a certain input.
• Input patterns are shown to all neurons simultaneously.
• Competitive learning: the neuron with the largest response is chosen.
Kohonen SOMs
• Initialize weights
• Repeat until convergence
• Select next input pattern
• Find Best Matching Unit
• Update weights of winner and neighbours
• Decrease learning rate & neighbourhood size
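The training loop above can be sketched in plain Python. This is a minimal illustration, not Kohonen's reference implementation: the grid size, epoch count, and the Gaussian neighbourhood function are assumptions, and the learning rate and neighbourhood radius simply decay linearly:

```python
import math
import random

def train_som(data, grid_w=8, grid_h=8, dim=3, epochs=2000,
              lr0=0.5, sigma0=3.0, seed=0):
    """SOM loop: find the Best Matching Unit, pull it and its grid
    neighbours toward the input, and shrink learning rate and
    neighbourhood size over time."""
    rng = random.Random(seed)
    # one weight vector per grid node (initialize weights)
    weights = {(x, y): [rng.random() for _ in range(dim)]
               for x in range(grid_w) for y in range(grid_h)}
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                  # decreasing learning rate
        sigma = sigma0 * (1 - frac) + 0.5      # shrinking neighbourhood size
        v = rng.choice(data)                   # select next input pattern
        # Best Matching Unit: the node whose weights are closest to the input
        bmu = min(weights,
                  key=lambda n: sum((w - x) ** 2 for w, x in zip(weights[n], v)))
        for node, w in weights.items():
            g = math.dist(node, bmu)           # distance on the 2D grid
            if g < sigma:
                # distance-related learning: nearer neighbours move more
                h = math.exp(-g * g / (2 * sigma * sigma))
                weights[node] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, v)]
    return weights
```

Trained on RGB color vectors, this is exactly the 3D-colorspace-to-2D-map setup of the ai-junkie demo mentioned below: nearby grid nodes end up with similar colors.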

Learning rate & neighbourhood size

Kohonen SOMs

Distance related learning

Kohonen SOMs
• Kohonen SOM Demo (from ai-junkie.com):

mapping a 3D colorspace onto a 2D Kohonen map

Performance Analysis
• K-Means
• Depends heavily on a priori knowledge (the number of clusters K)
• Very stable