
Lecture 21 Clustering (2)


Outline

  • Similarity (Distance) Measures

  • Distortion Criteria

  • Scattering Criterion

  • Hierarchical Clustering and other clustering methods

(C) 2001-2003 by Yu Hen Hu


Distance Measure

  • Distance measure: what does "similar" mean?

    • Norm:

    • Mahalanobis distance:

      $d(x,y) = (x - y)^T S_{xy}^{-1} (x - y)$

    • Angle: $d(x,y) = x^T y / (\|x\| \cdot \|y\|)$

      Binary and symbolic features ($x$, $y$ contain only 0 and 1):

    • Tanimoto coefficient:
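
The measures listed on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code. Two points are assumptions: the norm is taken to be the general $L_p$ (Minkowski) norm, and the Tanimoto coefficient uses its usual definition $x^T y / (x^T x + y^T y - x^T y)$. The function names and the `s_inv` argument (an inverse matrix supplied by the caller) are hypothetical.

```python
import math

def minkowski(x, y, p=2):
    """L_p norm of x - y (assumed form of the norm); p=2 is Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def mahalanobis(x, y, s_inv):
    """(x - y)^T S^{-1} (x - y); s_inv is the inverse matrix as nested lists."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * s_inv[i][j] * d[j] for i in range(n) for j in range(n))

def angle(x, y):
    """x^T y / (|x| |y|), the cosine of the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def tanimoto(x, y):
    """Usual Tanimoto coefficient (assumed definition) for 0/1 feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sum(a * a for a in x) + sum(b * b for b in y) - dot)

if __name__ == "__main__":
    print(minkowski([0, 0], [3, 4]))             # Euclidean distance
    print(tanimoto([1, 1, 0, 1], [1, 0, 1, 1]))  # shared 1s / union of 1s
```

Note that the angle and Tanimoto measures grow with similarity, while the norm and the Mahalanobis form grow with dissimilarity.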

Clustering Criteria

  • Is the current clustering assignment good enough? The most popular criterion is the mean-square error distortion measure.

  • Other distortion measures can also be used:
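
As an illustration of the mean-square error distortion mentioned in the first bullet, the sketch below averages the squared Euclidean distance from each sample to the center of its assigned cluster. The normalization by the number of samples and the names `mse_distortion`, `points`, `labels`, and `centers` are assumptions made for this example.

```python
def mse_distortion(points, labels, centers):
    """Average squared Euclidean distance from each point to its assigned center.

    labels[k] is the index of the cluster that points[k] is assigned to.
    """
    total = 0.0
    for x, i in zip(points, labels):
        total += sum((a - b) ** 2 for a, b in zip(x, centers[i]))
    return total / len(points)

if __name__ == "__main__":
    points = [(0, 0), (0, 2), (10, 0), (10, 2)]
    centers = [(0, 1), (10, 1)]
    print(mse_distortion(points, [0, 0, 1, 1], centers))  # natural assignment
    print(mse_distortion(points, [0, 1, 0, 1], centers))  # poor assignment
```

A lower value indicates that samples sit closer to their assigned centers, so the natural assignment scores far better than the poor one.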



Scatter Matrices

Scatter matrices are defined in the context of analysis of variance in statistics.

They are used in linear discriminant analysis.

However, they can also be used to gauge the fitness of a particular clustering assignment.

Mean vector for i-th cluster:

Total mean vector

Scatter matrix for i-th cluster:

Within-cluster scatter matrix

Between-cluster scatter matrix
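
The quantities named above can be computed as in the following sketch. It assumes the standard definitions from discriminant analysis, since the formulas themselves are not shown here: $S_i = \sum_{x \in C_i} (x - m_i)(x - m_i)^T$, $S_W = \sum_i S_i$, and $S_B = \sum_i N_i (m_i - m)(m_i - m)^T$, where $N_i$ is the number of samples in cluster $i$. All function names are hypothetical.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def outer_sum(vectors, center, weight=1.0):
    """Sum over vectors of weight * (v - center)(v - center)^T."""
    dim = len(center)
    s = [[0.0] * dim for _ in range(dim)]
    for v in vectors:
        diff = [v[d] - center[d] for d in range(dim)]
        for r in range(dim):
            for c in range(dim):
                s[r][c] += weight * diff[r] * diff[c]
    return s

def add(a, b):
    """Element-wise sum of two square matrices."""
    return [[a[r][c] + b[r][c] for c in range(len(a))] for r in range(len(a))]

def scatter_matrices(points, labels):
    """Return (S_W, S_B, S_T) for the given cluster assignment."""
    dim = len(points[0])
    m = mean(points)                      # total mean vector
    s_w = [[0.0] * dim for _ in range(dim)]
    s_b = [[0.0] * dim for _ in range(dim)]
    for i in sorted(set(labels)):
        cluster = [x for x, lab in zip(points, labels) if lab == i]
        m_i = mean(cluster)               # mean vector of the i-th cluster
        s_w = add(s_w, outer_sum(cluster, m_i))                    # adds S_i
        s_b = add(s_b, outer_sum([m_i], m, weight=len(cluster)))   # N_i term
    s_t = outer_sum(points, m)            # total scatter about the total mean
    return s_w, s_b, s_t

if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
    s_w, s_b, s_t = scatter_matrices(pts, [0, 0, 1, 1])
    print(s_w, s_b, s_t)
```

Under these definitions the total scatter equals the sum of the within-cluster and between-cluster scatter, which the example data can be used to check.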



Scattering Criteria

Total scatter matrix: $S_T = S_W + S_B$

Note that the total scatter matrix is independent of the assignment $I(x_k, i)$, but $S_W$ and $S_B$ both depend on $I(x_k, i)$!

Desired clustering property

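$S_W$ small

$S_B$ large

How can we gauge whether $S_W$ is small or $S_B$ is large?

There are several ways.

$\mathrm{tr}(S_W)$ (trace of $S_W$): Let $S_W = V \Lambda V^T$ be the eigenvalue decomposition of $S_W$; then $\mathrm{tr}(S_W) = \mathrm{tr}(\Lambda) = \sum_i \lambda_i$.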
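
The trace criterion can be tried on a toy data set. The sketch below assumes the standard scatter definitions, under which the trace of the within-cluster scatter is the sum of squared distances from each sample to its own cluster mean, and the trace of the between-cluster scatter is the sum over clusters of the cluster size times the squared distance from the cluster mean to the total mean. The function names are hypothetical.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def clusters_of(points, labels):
    """Group points by label."""
    return [[x for x, lab in zip(points, labels) if lab == i]
            for i in sorted(set(labels))]

def trace_within(points, labels):
    """tr(S_W): squared distances of points to their own cluster mean."""
    return sum(sq_dist(x, centroid(c))
               for c in clusters_of(points, labels) for x in c)

def trace_between(points, labels):
    """tr(S_B): cluster size times squared distance of cluster mean to total mean."""
    m = centroid(points)
    return sum(len(c) * sq_dist(centroid(c), m)
               for c in clusters_of(points, labels))

if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
    for labels in ([0, 0, 1, 1], [0, 1, 0, 1]):
        print(labels, trace_within(pts, labels), trace_between(pts, labels))
```

For both assignments the two traces add up to the same total, but the assignment that groups nearby points has the much smaller within-cluster trace and the larger between-cluster trace.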



Cluster Separating Measure (CSM)

Similar to scattering criteria.

$\mathrm{csm} = (m_i - m_j) / (\sigma_i + \sigma_j)$

The larger its value, the more separable the two clusters.

Assumes the underlying data distribution is Gaussian.
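
A small sketch of the measure for one-dimensional features follows. It assumes that $m_i$ is the cluster mean and $\sigma_i$ the cluster standard deviation, and it takes the absolute value of the mean difference so that the result does not depend on the order of the two clusters. Both choices, and the function name `csm`, are assumptions for illustration.

```python
import statistics

def csm(cluster_i, cluster_j):
    """|m_i - m_j| / (sigma_i + sigma_j) for two clusters of scalar samples."""
    m_i, m_j = statistics.mean(cluster_i), statistics.mean(cluster_j)
    # Population standard deviation of each cluster (an assumed choice).
    s_i, s_j = statistics.pstdev(cluster_i), statistics.pstdev(cluster_j)
    return abs(m_i - m_j) / (s_i + s_j)

if __name__ == "__main__":
    print(csm([0, 2], [9, 11]))  # well separated clusters
    print(csm([0, 4], [2, 6]))   # overlapping clusters
```

The well separated pair yields the larger value, matching the statement that a larger measure means more separable clusters.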



Hierarchical Clustering

  • Merge Method:

    Initially, each $x_k$ is its own cluster. In each iteration, the nearest pair of distinct clusters is merged, until the number of clusters is reduced to 1.

  • How to measure distance between two clusters:

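    $d_{\min}(C(i), C(j)) = \min d(x,y)$; $x \in C(i)$, $y \in C(j)$

    ⇒ leads to a minimum spanning tree

    $d_{\max}(C(i), C(j)) = \max d(x,y)$; $x \in C(i)$, $y \in C(j)$

    $d_{avg}(C(i), C(j)) = \frac{1}{|C(i)|\,|C(j)|} \sum_{x \in C(i)} \sum_{y \in C(j)} d(x,y)$

    $d_{mean}(C(i), C(j)) = \|m_i - m_j\|$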
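
The merge method and the four cluster distances can be sketched as follows. This is a minimal illustration that assumes Euclidean distance for $d(x,y)$. The stopping parameter `k` is a hypothetical addition so that intermediate results can be inspected; `k=1` reproduces the procedure as described, which merges down to a single cluster.

```python
import math
from itertools import product

def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(p[d] for p in cluster) / n for d in range(len(cluster[0])))

def d_min(a, b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(math.dist(x, y) for x, y in product(a, b))

def d_max(a, b):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(math.dist(x, y) for x, y in product(a, b))

def d_avg(a, b):
    """Average distance over all cross-cluster pairs."""
    return sum(math.dist(x, y) for x, y in product(a, b)) / (len(a) * len(b))

def d_mean(a, b):
    """Distance between the two cluster means."""
    return math.dist(centroid(a), centroid(b))

def agglomerate(points, k=1, linkage=d_min):
    """Merge the nearest pair of clusters until only k clusters remain."""
    clusters = [[p] for p in points]      # initially each point is a cluster
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]                   # j > i, so index i stays valid
    return clusters

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
    print(agglomerate(pts, k=2))
    print(agglomerate(pts, k=2, linkage=d_max))
```

Passing a different `linkage` function changes which pair counts as "nearest" and can therefore change the resulting hierarchy on less clear-cut data.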



Hierarchical Clustering (II)

Split method:

  • Initially, there is only one cluster. Iteratively, a cluster is split into two or more clusters, until the total number of clusters reaches a predefined goal.

  • The scattering criterion can be used to decide how to split a given cluster into two or more clusters.

  • Another way is to perform an m-way clustering, using, say, the k-means algorithm to split a cluster into m smaller clusters.
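
One possible reading of the split method is sketched below with m = 2. Several choices are assumptions made for this example rather than part of the method as stated: the cluster chosen for splitting is the one with the largest sum of squared distances to its mean, the 2-means step is seeded deterministically with the first point and the point farthest from it, and all names are hypothetical.

```python
import math

def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(p[d] for p in cluster) / n for d in range(len(cluster[0])))

def sse(cluster):
    """Sum of squared distances to the cluster mean (trace of its scatter)."""
    c = centroid(cluster)
    return sum(math.dist(p, c) ** 2 for p in cluster)

def two_means(cluster, iterations=20):
    """Split one cluster in two with a small k-means loop (k = 2)."""
    c0 = cluster[0]                                    # assumed seeding
    c1 = max(cluster, key=lambda p: math.dist(p, c0))  # farthest from c0
    left, right = [], []
    for _ in range(iterations):
        new_left = [p for p in cluster if math.dist(p, c0) <= math.dist(p, c1)]
        new_right = [p for p in cluster if math.dist(p, c0) > math.dist(p, c1)]
        if not new_left or not new_right or (new_left, new_right) == (left, right):
            break                                      # degenerate or converged
        left, right = new_left, new_right
        c0, c1 = centroid(left), centroid(right)
    return left, right

def split_until(points, goal):
    """Start with one cluster and split until `goal` clusters exist."""
    clusters = [list(points)]
    while len(clusters) < goal:
        worst = max(clusters, key=sse)                 # assumed selection rule
        if sse(worst) == 0:
            break                                      # nothing left to split
        left, right = two_means(worst)
        clusters.remove(worst)
        clusters += [left, right]
    return clusters

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
    print(split_until(pts, 3))
```

Unlike the merge method, this procedure stops as soon as the predefined number of clusters is reached, so the full hierarchy is never built.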
