Clustering and Visual Data Analysis

1 / 24

# Clustering and Visual Data Analysis - PowerPoint PPT Presentation

Clustering and Visual Data Analysis. Ata Kaban The University of Birmingham 2005. The Clustering Problem. Unsupervised Learning. Data (input). ‘Interesting’ structure (output). Interesting: contains essential characteristics discards unessential details

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about 'Clustering and Visual Data Analysis' - lucine

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Clustering and Visual Data Analysis

Ata Kaban

The University of Birmingham

2005

The Clustering Problem

Unsupervised Learning

Data (input)

‘Interesting’ structure (output)

• Interesting:
• contains essential characteristics
• provides a summary the data (e.g. to visualise on the screen)
• compact
• interpretable for humans
• etc.

Objective function that expresses our notion of interestingness for this data

One reason for clustering of data
• Here is some data
• Assume you transmit the coordinates of points drawn randomly from this data set
• You are only allowed to send a small (say 2 or 3) bits per point
• So it will be a lossy transmission
• Loss = sum of squared errors between the original and the decoded coordinates
• What encoder / decoder will loose the least information?
Formalising
• What objective does K-means optimise?
• Given an encoder function ENC:RT{1..K}

(T is dimension of data, K is number of clusters)

• Given a decoder function DEC:{1..K}RT
• DISTORTION:=sum n{xn-DEC[ENC(xn)]}2

where DEC(k)=μk are centers of clusters, k=1..K

So, DISTORTION=sumn{xn-μENC(xn)}2, where n goes from 1 to N, the number of points

The minimal distortion

DISTORTION=sumn{xn-μENC(x_n)}2

This is minimised.

What properties do μ1,…μK satisfy for that?

1) each point xn must be encoded by its nearest center, otherwise DISTORTION could be reduced by replacing ENC(xn) with the nearest center of xn.

2) each μk must be the centroid of its own points

If N is the known number of points and K the desired number of clusters, the K-means algorithm is:

Begin

initialize 1, 2, …,K (randomly selected)

do classify n samples according to nearest i

recompute i

until no change in i

return 1, 2, …, K

End

Other forms of clustering
• Many times, clusters are not disjoint, but a cluster may have subclusters, in turn having sub-subclusters.
• Hierarchical clustering
Given any two samples x and x’, they will be grouped together at some level, and if they are grouped a level k, they remain grouped for all higher levels
• Hierarchical clustering  tree representation called dendrogram
The similarity values may help to determine if the grouping are natural or forced, but if they are evenly distributed no information can be gained
• Another representation is based on set, e.g., on the Venn diagrams
• Agglomerative (bottom up, clumping): start with n singleton cluster and form the sequence by merging clusters
• Divisive (top down, splitting): start with all of the samples in one cluster and form the sequence by successively splitting clusters
Agglomerative hierarchical clustering
• The procedure terminates when the specified number of cluster has been obtained, and returns the cluster as sets of points, rather than the mean or a representative vector for each cluster
The problem of the number of clusters
• Typically, the number of clusters is known.
• When it’s not, that is a hard problem called model selection. There are several ways of proceed.
• A common approach is to repeat the clustering with c=1, c=2, c=3, etc.
What did we learn today?
• Data clustering
• K-means algorithm in detail
• How K-means can get stuck and how to take care of that
• The outline of Hierarchical clustering methods

Pattern ClassificationFind out more here!Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000

Pattern ClassificationFind out more here!Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000