4. Ad-hoc I: Hierarchical clustering


Hierarchical versus Flat

  • Flat methods generate a single partition into k clusters. The number k of clusters has to be determined by the user ahead of time.

  • Hierarchical methods generate a hierarchy of partitions, i.e.

  • a partition P1 into one cluster (the entire collection)

  • a partition P2 into 2 clusters

  • …

  • a partition Pn into n clusters (each object forms its own cluster)

  • It is then up to the user to decide which of the partitions reflects actual sub-populations in the data.
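To make the contrast concrete, here is a minimal SciPy sketch (the toy data and parameter choices are my own, not from the slides): a single hierarchical clustering run encodes the whole sequence of partitions P1, …, Pn, and the number of clusters is chosen only afterwards, whereas a flat method would need k fixed before running.

```python
# Sketch: one linkage computation encodes all partitions P1, ..., Pn.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2))
               for c in ([0.0, 0.0], [3.0, 0.0], [0.0, 3.0])])

Z = linkage(X, method="average")       # the whole hierarchy, computed once
for k in (1, 2, 3, 6):                 # extract several partitions from it
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(k, np.bincount(labels)[1:])  # cluster sizes in the k-partition
```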



[Figure: two sequences of partitions P1, P2, P3, P4 over the same point set]

Note: A sequence of partitions is called "hierarchical" if each cluster in a given partition is the union of clusters in the next finer partition (the partition with one more cluster).

Top: hierarchical sequence of partitions. Bottom: non-hierarchical sequence.



  • Hierarchical methods again come in two varieties, agglomerative and divisive.

  • Agglomerative methods:

  • Start with partition Pn, where each object forms its own cluster.

  • Merge the two closest clusters, obtaining Pn-1.

  • Repeat merging until only one cluster is left (see the sketch after this list).

  • Divisive methods:

  • Start with P1.

  • Split the collection into two clusters that are as homogeneous (and as different from each other) as possible.

  • Apply splitting procedure recursively to the clusters.
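As a purely illustrative sketch of the agglomerative scheme just described (single linkage is assumed here as the between-cluster distance; the function name and the naive O(n³) loop are mine, chosen for readability rather than efficiency):

```python
import numpy as np

def agglomerate(X):
    """Record the merge sequence Pn -> Pn-1 -> ... -> P1 for data rows X."""
    clusters = [[i] for i in range(len(X))]      # Pn: each object on its own
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):           # find the two closest clusters
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((list(clusters[a]), list(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]  # merge: next coarser partition
        del clusters[b]
    return merges
```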



Note:

Agglomerative methods require a rule to decide which clusters to merge. Typically one defines a distance between clusters and then merges the two clusters that are closest.

Divisive methods require a rule for splitting a cluster.



4.1 Hierarchical agglomerative clustering

Need to define a distance d(P,Q) between groups, given a distance measure d(x,y) between observations.

Commonly used distance measures:

1. d1(P,Q) = min d(x,y), for x in P, y in Q (single linkage)

2. d2(P,Q) = ave d(x,y), for x in P, y in Q (average linkage)

3. d3(P,Q) = max d(x,y), for x in P, y in Q (complete linkage)

4. d4(P,Q) = d(mean(P), mean(Q)), the distance between the cluster means (centroid method)

5. d5(P,Q) = (|P| |Q| / (|P| + |Q|)) ||mean(P) − mean(Q)||² (Ward's method)

d5 is called Ward's distance.
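Written out directly in code (a sketch assuming Euclidean distance between observations; P and Q are arrays with one observation per row, and the function name and dictionary keys are my own):

```python
import numpy as np
from scipy.spatial.distance import cdist

def group_distances(P, Q):
    D = cdist(P, Q)                              # all pairwise d(x, y)
    mp, mq = P.mean(axis=0), Q.mean(axis=0)      # cluster means
    return {
        "single":   D.min(),                     # d1: closest pair
        "average":  D.mean(),                    # d2: average over all pairs
        "complete": D.max(),                     # d3: farthest pair
        "centroid": np.linalg.norm(mp - mq),     # d4: distance between means
        "ward": (len(P) * len(Q)) / (len(P) + len(Q))
                * np.linalg.norm(mp - mq) ** 2,  # d5: Ward's distance
    }
```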



  • Motivation for Ward’s distance:

  • Let P(k) = {P1, …, Pk} be a partition of the observations into k groups.

  • Measure goodness of a partition by the sum of squared distances of observations from their cluster means: RSS(P(k)) = Σ_{i=1..k} Σ_{x in Pi} ||x − mean(Pi)||².

  • Consider all possible (k−1)-partitions obtainable from P(k) by merging two of its clusters.

  • Merging the two clusters with the smallest Ward's distance yields the (k−1)-partition with the smallest RSS.
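The reason is a standard identity (not spelled out on the slide): merging P and Q increases the RSS by exactly Ward's distance between them, so among all possible merges, the pair with the smallest Ward's distance gives the best new partition.

```latex
\mathrm{RSS}(P \cup Q) - \mathrm{RSS}(P) - \mathrm{RSS}(Q)
  = \frac{|P|\,|Q|}{|P|+|Q|}\,\lVert \bar{x}_P - \bar{x}_Q \rVert^{2}
  = d_5(P, Q)
```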



  • 4.2 Hierarchical divisive clustering

  • There are divisive versions of single linkage, average linkage, and Ward’s method.

  • Divisive version of single linkage (sketched in code after this list):

  • Compute the minimal spanning tree (the graph connecting all the objects with the smallest total edge length).

  • Break longest edge to obtain 2 subtrees, and a corresponding partition of the objects.

  • Apply process recursively to the subtrees.

  • Agglomerative and divisive versions of single linkage give identical results (more later).
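A SciPy-based sketch of one such split (illustrative; the function name is mine):

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def split_longest_edge(X):
    """One divisive single-linkage step: 0/1 labels for the two subtrees."""
    D = squareform(pdist(X))                  # pairwise distance matrix
    mst = minimum_spanning_tree(D).tocoo()    # n-1 edges, minimal total length
    keep = np.arange(mst.nnz) != np.argmax(mst.data)
    pruned = coo_matrix((mst.data[keep], (mst.row[keep], mst.col[keep])),
                        shape=mst.shape)      # MST minus its longest edge
    _, labels = connected_components(pruned, directed=False)
    return labels
```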



Divisive version of Ward’s method.

Given cluster R.

Need to find the split of R into two groups P, Q that minimizes

RSS(P,Q) = Σ_{x in P} ||x − mean(P)||² + Σ_{x in Q} ||x − mean(Q)||²

or, equivalently, that maximizes Ward's distance between P and Q.

Note: There is no computationally feasible method to find the optimal P, Q for large |R| (a set of m objects admits 2^(m−1) − 1 splits into two nonempty groups), so an approximation has to be used.



  • Iterative algorithm to search for the optimal Ward's split (sketched in code after this list):

  • Project the observations in R onto the largest principal component.

  • Split at median to obtain initial clusters P, Q.

  • Repeat {

  • Assign each observation to cluster with closest mean

  • Re-compute cluster means

  • } Until convergence

  • Note:

  • Each step reduces RSS(P, Q)

  • There is no guarantee of finding the optimal partition.
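In NumPy the heuristic might look as follows (a sketch under my own naming and convergence choices, not the slides' code):

```python
import numpy as np

def ward_split(R, max_iter=100):
    """Heuristic 2-way Ward split of the rows of R; returns 0/1 labels."""
    Xc = R - R.mean(axis=0)
    v = np.linalg.svd(Xc, full_matrices=False)[2][0]  # largest PC direction
    s = Xc @ v
    labels = (s > np.median(s)).astype(int)           # initial split at median
    for _ in range(max_iter):
        means = np.array([R[labels == g].mean(axis=0) for g in (0, 1)])
        new = ((R[:, None, :] - means[None]) ** 2).sum(-1).argmin(axis=1)
        if np.array_equal(new, labels):               # assignments stable
            break
        if np.bincount(new, minlength=2).min() == 0:  # avoid an empty group
            break
        labels = new                                  # each pass reduces RSS
    return labels
```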



Divisive version of average linkage

Algorithm DIANA; see Struyf, Hubert, and Rousseeuw, p. 22.



  • 4.3 Dendrograms

  • The result of hierarchical clustering can be represented as a binary tree:

  • Root of tree represents entire collection

  • Terminal nodes represent observations

  • Each interior node represents a cluster

  • Each subtree represents a partition

  • Note: The tree defines many more partitions than the n−2 nontrivial ones constructed during the merge (or split) process.

  • Note: For HAC methods, the merge order defines a sequence of n subtrees of the full tree. For HDC methods a sequence of subtrees can be defined if there is a figure of merit for each split.



If the distance between daughter clusters increases monotonically as we move up the tree, we can draw a dendrogram:

y-coordinate of vertex = distance between daughter clusters.

[Figure: point set and the corresponding single-linkage dendrogram]
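For illustration (my own toy data, not the slides' figure), SciPy can draw such a dendrogram directly; each merge becomes a vertex whose height is the distance between its daughter clusters:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(15, 2))
               for c in ([0.0, 0.0], [4.0, 0.0])])

Z = linkage(X, method="single")  # single linkage merge heights
dendrogram(Z)                    # vertex y-coordinate = merge distance
plt.show()
```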



  • Standard method to extract clusters from a dendrogram:

  • Pick number of clusters k.

  • Cut the dendrogram at a level that results in k subtrees (see the sketch below).
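With SciPy this cut is a one-liner (illustrative data; cut_tree, or fcluster with criterion="maxclust", implements it):

```python
import numpy as np
from scipy.cluster.hierarchy import cut_tree, linkage

X = np.random.default_rng(2).normal(size=(30, 2))
Z = linkage(X, method="average")
labels = cut_tree(Z, n_clusters=4).ravel()  # one label per observation
print(np.bincount(labels))                  # sizes of the k = 4 clusters
```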



  • 4.4 Experiment

  • Try hierarchical methods on unimodal 2D datasets.

  • Experiments suggest:

  • Except in completely clear-cut situations, tree cutting ("cutree") is useless for extracting clusters from a dendrogram.

  • Complete linkage fails completely for elongated clusters.



  • Needed:

  • Diagnostics to decide whether the daughters of a dendrogram node really correspond to spatially separated clusters.

  • Automatic and manual methods for dendrogram pruning.

  • Methods for assigning observations in pruned subtrees to clusters.

