The stability of a good clustering

The Stability of a Good Clustering

Marina Meila

University of Washington

[email protected]

Optimizing these criteria is NP-hard

worst case

  • Data

  • Objective

  • Algorithm


Spectral clustering

...but “spectral clustering, K-means work well when good clustering exists”

interesting case

This talk: If a “good” clustering exists, it is “unique”

If a “good” clustering is found, it is provably good

Results summary

  • Given

    • objective = NCut or the K-means distortion

    • data

    • clustering Y with K clusters

  • Spectral lower bound on the distortion

  • If the gap between the distortion of Y and the lower bound is small

  • Then d(Y, Yopt) is small

    where Yopt = the best clustering with K clusters

A graphical view

[figure]

  • Introduction

    • Matrix representations for clusterings

    • Quadratic representation for clustering cost

    • The misclassification error distance

  • Results for NCut (easier)

  • Results for K-means distortion (harder)

  • Discussion

Clusterings as matrices

  • Clustering of { 1, 2, ..., n } with K clusters (C1, C2, ..., CK)

  • Represented by an n x K matrix X

    • unnormalized: Xik = 1 if point i is in Ck, 0 otherwise

    • normalized: each column divided by the square root of its cluster size, Xik = 1/√|Ck| for i in Ck

  • All these matrices have orthogonal columns
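The two representations can be sketched in a few lines of NumPy (the 5-point labeling is a hypothetical example):

```python
import numpy as np

# Hypothetical clustering of n = 5 points into K = 2 clusters,
# given as 0-based labels.
labels = np.array([0, 0, 1, 1, 1])
n, K = len(labels), labels.max() + 1

# Unnormalized indicator matrix: X[i, k] = 1 iff point i is in cluster k.
X = np.zeros((n, K))
X[np.arange(n), labels] = 1.0

# Normalized version: each column scaled by 1/sqrt(cluster size),
# which makes the columns orthonormal.
Xn = X / np.sqrt(X.sum(axis=0))

print(X.T @ X)    # diagonal matrix of cluster sizes -> orthogonal columns
print(Xn.T @ Xn)  # identity matrix -> orthonormal columns
```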

Distortion is quadratic in X

  • Both the NCut and the K-means distortion can be written as quadratic functions trace(XTAX) of the normalized clustering matrix X, for a suitable similarity / Gram matrix A
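A minimal numerical sketch of the quadratic form, with a hypothetical random symmetric A standing in for the similarity matrix: trace(XᵀAX) with normalized X equals the sum over clusters of the average within-cluster similarity mass.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
labels = np.array([0, 0, 0, 1, 1, 1])

# Hypothetical symmetric "similarity" matrix A.
A = rng.random((n, n))
A = (A + A.T) / 2

# Normalized clustering matrix X.
K = labels.max() + 1
X = np.zeros((n, K))
X[np.arange(n), labels] = 1.0
X /= np.sqrt(X.sum(axis=0))

# The clustering cost is quadratic in X: trace(X^T A X).
quad = np.trace(X.T @ A @ X)

# Equivalent direct form: sum over clusters of the within-cluster
# similarity mass divided by the cluster size.
direct = sum(A[np.ix_(labels == k, labels == k)].sum() / (labels == k).sum()
             for k in range(K))
print(np.isclose(quad, direct))   # True
```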

The Confusion Matrix

Two clusterings of the same n points

  • (C1, C2, ..., CK) with cluster sizes n1, ..., nK

  • (C’1, C’2, ..., C’K’) with cluster sizes n’1, ..., n’K’

  • Confusion matrix M (K x K’) with entries Mkk’ = |Ck ∩ C’k’|
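The confusion matrix can be computed with a single pass over the points (the two label vectors below are hypothetical):

```python
import numpy as np

# Two hypothetical clusterings of the same n = 6 points, as label vectors.
C_labels  = np.array([0, 0, 1, 1, 2, 2])   # K  = 3 clusters
Cp_labels = np.array([0, 0, 0, 1, 1, 1])   # K' = 2 clusters

K, Kp = C_labels.max() + 1, Cp_labels.max() + 1

# Confusion matrix (K x K'): entry [k, k'] counts the points that lie
# both in cluster k of the first clustering and cluster k' of the second.
M = np.zeros((K, Kp), dtype=int)
for k, kp in zip(C_labels, Cp_labels):
    M[k, kp] += 1

print(M)   # [[2 0], [1 1], [0 2]]
```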

The Misclassification Error distance

  • d(C, C’) = 1 − (1/n) maxπ Σk Mk,π(k), maximizing over one-to-one matchings π of the clusters

  • computed by the maximal bipartite matching algorithm between clusters, applied to the confusion matrix

[figure: confusion matrix, classification error]
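The distance can be sketched as follows; a brute-force search over cluster permutations stands in for the maximal bipartite matching algorithm (fine for small K, and the function name and labels are illustrative):

```python
from itertools import permutations
import numpy as np

def misclassification_error(C, Cp):
    """1 - (best matched mass of the confusion matrix) / n, where the
    matching pairs each cluster of C with one cluster of Cp."""
    n = len(C)
    K = max(C.max(), Cp.max()) + 1
    M = np.zeros((K, K), dtype=int)          # confusion matrix
    for k, kp in zip(C, Cp):
        M[k, kp] += 1
    # Brute-force maximal matching; real implementations use the
    # Hungarian / maximal bipartite matching algorithm.
    best = max(sum(M[k, pi[k]] for k in range(K))
               for pi in permutations(range(K)))
    return 1.0 - best / n

C = np.array([0, 0, 0, 1, 1, 1])
print(misclassification_error(C, np.array([1, 1, 1, 0, 0, 0])))  # 0.0: same partition, labels permuted
print(misclassification_error(C, np.array([0, 0, 1, 1, 1, 1])))  # ~0.167: 1 of 6 points moved
```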


Results for NCut

  • given

    • data A (n x n)

    • clustering X (n x K)

  • Lower bound for NCut (M02, YS03, BJ03): NCut(X) ≥ K − (λ1 + ... + λK), the λk being the K largest e-values of A

  • Upper bound for the misclassification error d(X, Xopt) (MSX’05)

Relaxed minimization of the quadratic cost

s.t. X = n x K orthogonal matrix

  • solution: X* = K principal e-vectors of A

  • gap small w.r.t. the eigengap λK − λK+1 ⇒ X close to X* (convexity proof)

  • Two clusterings X, X’ close to X* ⇒ trace XTX’ large ⇒ X, X’ close to each other
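The relaxed problem can be illustrated numerically (with a hypothetical random symmetric A): over all n x K matrices with orthonormal columns, trace(XᵀAX) is maximized by the K principal eigenvectors, and no other orthonormal X does better than the sum of the K largest eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 8, 3

# Hypothetical symmetric matrix A in place of the normalized similarities.
A = rng.random((n, n))
A = (A + A.T) / 2

evals, evecs = np.linalg.eigh(A)       # eigenvalues in ascending order
X_star = evecs[:, -K:]                 # K principal e-vectors: the relaxed optimum
opt = np.trace(X_star.T @ A @ X_star)  # equals the sum of the K largest e-values

# Any other orthonormal X (here a random one via QR) scores no higher.
Q, _ = np.linalg.qr(rng.standard_normal((n, K)))
print(np.isclose(opt, evals[-K:].sum()))    # True
print(np.trace(Q.T @ A @ Q) <= opt + 1e-9)  # True
```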


Why the eigengap matters

  • Example

    • A has 3 identical diagonal blocks

    • K = 2

    • gap( C ) = gap( C’ ) = 0 but C, C’ not close: any grouping of the 3 blocks into 2 clusters cuts no edges, so several very different clusterings are all optimal; here λ2 = λ3, the eigengap vanishes, and the theorem does not apply

Remarks on stability results

  • No explicit conditions on S

    • Different flavor from other stability results, e.g. Kannan et al. ’00, Ng et al. ’01, which assume S is “almost” block diagonal

    • But… the results apply only if a good clustering is found

  • There are S matrices for which no clustering satisfies the theorem

  • Bound depends only on aggregate quantities such as

    • K

    • cluster sizes (= probabilities)

  • Points are weighted by their volumes (degrees)

    • good in some applications

    • bounds for unweighted distances can be obtained

Is the bound ever informative?

  • An experiment: S = perfect + additive noise

[figure: results for the K-means distortion, K = 4, dim = 30]

K-means distortion

  • We can do the same ...

  • ...but the K-th principal subspace is typically not stable

New approach: Use K-1 vectors

  • Non-redundant representation Y

  • Distortion – new expression

    • ...and new (relaxed) optimization problem

Solution of the new problem

  • Relaxed optimization problem

  • Solution

    • U = K-1 principal e-vectors of A

    • W = K x K orthogonal matrix

      • with … on the first row

  • Y close to Y*

  • Clusterings Y, Y’ close to Y* ⇒ ||YTY’||F large ⇒ Y, Y’ close to each other


Solve the relaxed minimization

  • Theorem

For any two clusterings Y, Y’ with … > 0


    Corollary: Bound for d(Y,Yopt)


[figure: bound vs. true error, K = 4, dim = 30, 20 replicates]


Conclusions

  • First (?) distribution-independent bounds on the clustering error

    • data dependent

      • hold when the data is well clustered (this is the case of interest)

  • Tight? – not yet...

  • In addition

    • Improved variational bound for the K-means cost

    • Showed local equivalence between the “misclassification error” distance and the “Frobenius norm” distance (also known as the χ2 distance)

  • Related work

    • Bounds for mixtures of Gaussians (Dasgupta, Vempala)

    • Nearest K-flat to n points (Tseng)

    • Variational bounds for sparse PCA (Moghaddam)