1 / 15

# 4. Ad-hoc I: Hierarchical clustering Hierarchical versus Flat - PowerPoint PPT Presentation

4. Ad-hoc I: Hierarchical clustering Hierarchical versus Flat Flat methods generate a single partition into k clusters. The number k of clusters has to be determined by the user ahead of time. Hierarchical methods generate a hierarchy of partitions, i.e.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about ' 4. Ad-hoc I: Hierarchical clustering Hierarchical versus Flat' - porter-christensen

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

• 4. Ad-hoc I: Hierarchical clustering

• Hierarchical versus Flat

• Flat methods generate a single partition into k clusters. The number k of clusters has to be determined by the user ahead of time.

• Hierarchical methods generate a hierarchy of partitions, i.e.

• a partition P1 into 1 clusters (the entire collection)

• a partition P2 into 2 clusters

• a partition Pn into n clusters (each object forms its own cluster)

• It is then up to the user to decide which of the partitions reflects actual sub-populations in the data.

P3

P2

P1

Note: A sequence of partitions is called "hierarchical" if each cluster in a given partition is the union of clusters in the next larger partition.

Top: hierarchical sequence of partitionsBottom: non hierarchical sequence

• Hierarchical methods again come in two varieties, agglomerative and divisive.

• Agglomerative methods:

• Start with partition Pn, where each object forms its own cluster.

• Merge the two closest clusters, obtaining Pn-1.

• Repeat merge until only one cluster is left.

• Divisive methods

• Split the collection into two clusters that are as homogenous (and as different from each other) as possible.

• Apply splitting procedure recursively to the clusters.

Note:

Agglomerative methods require a rule to decide which clusters to merge. Typically one defines a distance between clusters and then merges the two clusters that are closest.

Divisive methods require a rule for splitting a cluster.

Need to define a distance d(P,Q) between groups, given a distance measure d(x,y) between observations.

Commonly used distance measures:

1. d1(P,Q) = min d(x,y), for x in P, y in Q ( single linkage )

2. d2(P,Q) = ave d(x,y), for x in P, y in Q ( average linkage )

3. d3(P,Q) = max d(x,y), for x in P, y in Q ( complete linkage )

4. ( centroid method )

5. ( Ward’s method )

d5 is called Ward’s distance.

• Motivation for Ward’s distance:

• Let Pk = P1 ,…, Pk be a partition of the observations into k groups.

• Measure goodness of a partition by the sum of squared distances of observations from their cluster means:

• Consider all possible (k-1)-partitions obtainable from Pk by a merge

• Merging two clusters with smallest Ward’s distance optimizes goodness of new partition.

• 4.2 Hierarchical divisive clustering

• There are divisive versions of single linkage, average linkage, and Ward’s method.

• Divisive version of single linkage:

• Compute minimal spanning tree (graph connecting all the objects with smallest total edge length.

• Break longest edge to obtain 2 subtrees, and a corresponding partition of the objects.

• Apply process recursively to the subtrees.

• Agglomerative and divisive versions of single linkage give identical results (more later).

Given cluster R.

Need to find split of R into 2 groups P,Q to minimize

or, equivalently, to maximize Ward’s distance between P and Q.

Note: No computationally feasible method to find optimal P, Q for large |R|. Have to use approximation.

• Iterative algorithm to search for the optimal Ward’s split

• Project observations in R on largest principal component.

• Split at median to obtain initial clusters P, Q.

• Repeat {

• Assign each observation to cluster with closest mean

• Re-compute cluster means

• } Until convergence

• Note:

• Each step reduces RSS(P, Q)

• No guarantee to find optimal partition.

Algorithm Diana, Struyf, Hubert, and Rousseuw, pp. 22

• 4.3 Dendograms

• Result of hierarchical clustering can be represented as binary tree:

• Root of tree represents entire collection

• Terminal nodes represent observations

• Each interior node represents a cluster

• Each subtree represents a partition

• Note:The tree defines many more partitions than the n-2 nontrivial ones constructed during the merge (or split) process.

• Note: For HAC methods, the merge order defines a sequence of n subtrees of the full tree. For HDC methods a sequence of subtrees can be defined if there is a figure of merit for each split.

If distance between daughter clusters is monotonically increasing as we move up the tree, we can draw dendogram:

y-coordinate of vertex = distance between daughter clusters.

Point set and corresponding single linkage dendogram

• 4.4 Experiment increasing as we move up the tree, we can draw dendogram:

• Try hierarchical method on unimodal 2D datasets.

• Experiments suggest:

• Except in completely clear-cut situations, tree cutting (“cutree”) is useless for extracting clusters from a dendogram.

• Complete linkage fails completely for elongated clusters.

• Needed: increasing as we move up the tree, we can draw dendogram:

• Diagnostics to decide whether the daughters of a dendogramnode really correspond to spatially separated clusters.

• Automatic and manual methods for dendogram pruning.

• Methods for assigning observations in pruned subtrees to clusters.