The Stability of a Good Clustering

Marina Meila

University of Washington

[email protected]


Optimizing these criteria is NP-hard (worst case)

  • Data (similarities)
  • Objective
  • Algorithm

K-means, spectral clustering

...but "spectral clustering, K-means work well when a good clustering exists" (the interesting case)

This talk:

  • If a "good" clustering exists, it is "unique"
  • If a "good" clustering is found, it is provably good


Results summary

  • Given
    • objective = NCut or K-means distortion
    • data
    • clustering Y with K clusters
  • Spectral lower bound on the distortion
  • If distortion(Y) - (lower bound) is small
  • Then d(Y, Yopt) is small,
    where Yopt = the best clustering with K clusters

A graphical view

[Figure: distortion and its spectral lower bound, plotted over the space of clusterings]


Overview

  • Introduction

    • Matrix representations for clusterings

    • Quadratic representation for clustering cost

    • The misclassification error distance

  • Results for NCut (easier)

  • Results for K-means distortion (harder)

  • Discussion


Clusterings as matrices

  • Clustering of { 1, 2, ..., n } with K clusters (C1, C2, ..., CK)
  • Represented by an n x K indicator matrix
    • unnormalized: entry (i, k) is 1 if point i is in Ck, 0 otherwise
    • normalized: each column scaled to unit length
  • All these matrices have orthogonal columns (see the sketch below)
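A minimal numpy sketch of the two indicator matrices (function and variable names are my own, not from the slides):

```python
import numpy as np

def indicator_matrices(labels, K):
    """Return the unnormalized and normalized n x K cluster indicator matrices."""
    labels = np.asarray(labels)
    n = labels.shape[0]
    X = np.zeros((n, K))
    X[np.arange(n), labels] = 1.0          # unnormalized: 0/1 entries
    sizes = X.sum(axis=0)                  # cluster sizes n_k
    X_norm = X / np.sqrt(sizes)            # each column scaled to unit norm
    return X, X_norm

X, Xn = indicator_matrices([0, 0, 1, 2, 2, 2], K=3)
print(np.allclose(Xn.T @ Xn, np.eye(3)))   # True: orthonormal columns
```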


Distortion is quadratic in X

  • NCut: with A = D^(-1/2) S D^(-1/2) (S = similarities, D = diagonal matrix of row sums) and X the degree-normalized indicator matrix, NCut(X) = K - trace(X^T A X)
  • K-means: with G the Gram matrix of the data and X the size-normalized indicator matrix, distortion(X) = trace(G) - trace(X^T G X)
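These two quadratic forms can be checked numerically; a hedged sketch (names mine), using the degree-normalized indicator for NCut and the size-normalized indicator for K-means:

```python
import numpy as np

def ncut_quadratic(S, labels, K):
    """NCut of a clustering as K - trace(X^T A X), with A = D^{-1/2} S D^{-1/2}."""
    labels = np.asarray(labels)
    d = S.sum(axis=1)                                  # degrees / volumes
    A = S / np.sqrt(np.outer(d, d))
    X = np.zeros((S.shape[0], K))
    for k in range(K):
        X[labels == k, k] = np.sqrt(d[labels == k])    # D^{1/2} 1_{C_k}
        X[:, k] /= np.linalg.norm(X[:, k])             # unit-norm column
    return K - np.trace(X.T @ A @ X)

def kmeans_quadratic(data, labels, K):
    """K-means distortion as trace(G) - trace(X^T G X), with G the Gram matrix."""
    labels = np.asarray(labels)
    G = data @ data.T
    X = np.zeros((data.shape[0], K))
    X[np.arange(data.shape[0]), labels] = 1.0
    X /= np.sqrt(X.sum(axis=0))                        # unit-norm columns
    return np.trace(G) - np.trace(X.T @ G @ X)

# Quick check of the K-means identity against the direct sum of squared distances.
data = np.array([[0.0, 0], [0, 1], [5, 5], [5, 6], [6, 5]])
labels = np.array([0, 0, 1, 1, 1])
direct = sum(np.sum((data[labels == k] - data[labels == k].mean(axis=0)) ** 2)
             for k in range(2))
print(np.isclose(kmeans_quadratic(data, labels, 2), direct))   # True
```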


The Confusion Matrix

  • Two clusterings of the same n points
    • (C1, C2, ..., CK)
    • (C'1, C'2, ..., C'K')
  • Confusion matrix M (K x K') with entries mkk' = |Ck ∩ C'k'|
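A small illustrative helper for the confusion matrix (names are my own):

```python
import numpy as np

def confusion_matrix(labels1, labels2, K, K2):
    """Entry (k, k') counts the points shared by cluster k of the first
    clustering and cluster k' of the second."""
    M = np.zeros((K, K2), dtype=int)
    for a, b in zip(labels1, labels2):
        M[a, b] += 1
    return M

print(confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], K=3, K2=3))
```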


The Misclassification Error distance

  • d(C, C') = 1 - (1/n) max over one-to-one matchings of clusters k -> k' of the sum of matched confusion-matrix entries mkk'
  • computed by the maximal bipartite matching algorithm between clusters, applied to the confusion matrix
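A sketch of this computation using scipy's Hungarian matcher (my implementation choice, not necessarily the authors'); it assumes both clusterings use K clusters:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def misclassification_error(labels1, labels2, K):
    """1 - (matched mass)/n, maximized over one-to-one cluster matchings."""
    M = np.zeros((K, K), dtype=int)                 # confusion matrix
    for a, b in zip(labels1, labels2):
        M[a, b] += 1
    rows, cols = linear_sum_assignment(-M)          # maximum-weight matching
    return 1.0 - M[rows, cols].sum() / len(labels1)

# Relabeled but identical clusterings are at distance 0.
print(misclassification_error([0, 0, 1, 1, 2], [2, 2, 0, 0, 1], K=3))   # 0.0
```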


Results for NCut

  • Given
    • data A (n x n)
    • clustering X (n x K)
  • Lower bound for NCut (M02, YS03, BJ03): NCut(X) >= K - (λ1 + ... + λK), where λ1 >= ... >= λK are the K largest eigenvalues of A
  • Upper bound for d(X, Xopt) (MSX'05): holds whenever the gap δ = NCut(X) - (K - λ1 - ... - λK) is small relative to the eigengap λK - λK+1
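The lower bound is cheap to compute; a hedged numpy sketch (names mine) of the K-minus-top-K-eigenvalues form above:

```python
import numpy as np

def ncut_lower_bound(S, K):
    """K minus the sum of the K largest eigenvalues of A = D^{-1/2} S D^{-1/2}."""
    d = S.sum(axis=1)
    A = S / np.sqrt(np.outer(d, d))
    lam = np.linalg.eigvalsh(A)            # eigenvalues in ascending order
    return K - lam[-K:].sum()              # NCut(X) >= this, for every clustering X
```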


Relaxed minimization for NCut

  • Relaxation: maximize trace(X^T A X) s.t. X = n x K orthogonal matrix
  • Solution: X* = the K principal eigenvectors of A
  • δ small w.r.t. the eigengap λK - λK+1 ⇒ X close to X* (convexity proof)
  • Two clusterings X, X' close to X* ⇒ trace(X^T X') large
  • trace(X^T X') large ⇒ d(X, X') small
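A minimal sketch of the relaxed solution X* (assumed names; A would be the normalized similarity matrix from the earlier sketches):

```python
import numpy as np

def relaxed_solution(A, K):
    """An orthonormal basis of the relaxed optimum: the K principal eigenvectors of A."""
    eigvals, eigvecs = np.linalg.eigh(A)   # ascending eigenvalue order
    return eigvecs[:, -K:]                 # columns span the optimal K-dim subspace
```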


Why the eigengap matters

  • Example (see the sketch below)
    • A has 3 diagonal blocks
    • K = 2
    • C, C' = two different ways of grouping the 3 blocks into 2 clusters
    • gap( C ) = gap( C' ) = 0, but C, C' are not close
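A toy illustration (my own construction) of why the bound needs the eigengap: with 3 identical diagonal blocks, the eigenvalue 1 of A has multiplicity 3, so for K = 2 the eigengap λ2 - λ3 is zero and the bound says nothing about uniqueness of the 2-clustering:

```python
import numpy as np
from scipy.linalg import block_diag

block = np.ones((5, 5))                          # one fully connected block
S = block_diag(block, block, block)              # 3 identical diagonal blocks
d = S.sum(axis=1)
A = S / np.sqrt(np.outer(d, d))                  # D^{-1/2} S D^{-1/2}
lam = np.sort(np.linalg.eigvalsh(A))[::-1]       # descending eigenvalues
print(lam[:4])                                   # approx [1, 1, 1, 0]: eigengap λ2 - λ3 = 0
```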


Remarks on stability results

  • No explicit conditions on S

    • Different flavor from other stability results, e.g. Kannan et al. '00, Ng et al. '01, which assume S is "almost" block diagonal

    • But…results apply only if a good clustering is found

  • There are S matrices for which no clustering satisfies the theorem

  • Bound depends on aggregate quantities like

    • K

    • cluster sizes (=probabilities)

  • Points are weighted by their volumes (degrees)

    • good in some applications

    • bounds for unweighted distances can be obtained


Is the bound ever informative?

  • An experiment: S perfect + additive noise
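A rough, self-contained reconstruction of this kind of experiment (block sizes, noise level, and the use of the NCut objective are my assumptions, not the slide's actual settings):

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
K, size = 4, 20
S = block_diag(*[np.ones((size, size))] * K)     # "perfect" block-diagonal S
E = rng.uniform(0.0, 0.2, S.shape)
S = S + (E + E.T) / 2                            # additive symmetric noise
labels = np.repeat(np.arange(K), size)

d = S.sum(axis=1)
A = S / np.sqrt(np.outer(d, d))                  # D^{-1/2} S D^{-1/2}
X = np.zeros((len(labels), K))
for k in range(K):
    X[labels == k, k] = np.sqrt(d[labels == k])
    X[:, k] /= np.linalg.norm(X[:, k])
ncut = K - np.trace(X.T @ A @ X)                 # cost of the planted clustering
bound = K - np.sort(np.linalg.eigvalsh(A))[-K:].sum()   # spectral lower bound
print(ncut - bound)                              # small gap: the bound is informative
```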


[Figure: experiment results, K = 4, dim = 30]

K-means distortion

  • We can do the same as for NCut...
  • ...but the K-th principal subspace is typically not stable (see the toy sketch below)
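A toy illustration (entirely my own example) of that instability: when λK is nearly tied with λK+1, a tiny symmetric perturbation swaps the K-th and (K+1)-th eigenvectors, so the top-K subspace changes abruptly:

```python
import numpy as np

K = 3
A = np.diag([3.0, 2.0, 1.0001, 1.0])       # λ3 almost equal to λ4
E = np.zeros((4, 4))
E[2, 2], E[3, 3] = -0.001, 0.001           # tiny symmetric perturbation
for M in (A, A + E):
    vals, vecs = np.linalg.eigh(M)
    top = vecs[:, -K:]                     # basis of the top-K principal subspace
    print(np.round(np.abs(top.T), 2))      # 3rd principal direction jumps from e3 to e4
```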


New approach: Use K-1 vectors

  • Non-redundant representation Y

  • Distortion – new expression

    • ...and new (relaxed) optimization problem


Solution of the new problem

  • Relaxed optimization problem
  • Solution
    • U = the K-1 principal eigenvectors of A
    • W = K x K orthogonal matrix
      • with (√p1, ..., √pK) on the first row (pk = cluster sizes as probabilities)
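A sketch of constructing such a W (my own helper, under the assumption that the first row is the vector of square roots of the cluster proportions pk): complete that unit vector to an orthonormal basis with a QR factorization.

```python
import numpy as np

def rotation_with_first_row(p):
    """K x K orthogonal matrix whose first row is sqrt(p), for p summing to 1."""
    sqrt_p = np.sqrt(np.asarray(p, dtype=float))
    K = sqrt_p.shape[0]
    B = np.column_stack([sqrt_p, np.eye(K)[:, 1:]])   # any basis starting with sqrt(p)
    Q, _ = np.linalg.qr(B)                            # first column is +/- sqrt(p)
    if Q[0, 0] * sqrt_p[0] < 0:                       # fix QR's sign ambiguity
        Q[:, 0] = -Q[:, 0]
    return Q.T

W = rotation_with_first_row([0.5, 0.3, 0.2])
print(np.allclose(W @ W.T, np.eye(3)),
      np.allclose(W[0], np.sqrt([0.5, 0.3, 0.2])))    # True True
```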


  • Solve the relaxed minimization: δ(Y) small ⇒ Y close to Y*
  • Clusterings Y, Y' close to Y* ⇒ ||Y^T Y'||_F large
  • ||Y^T Y'||_F large ⇒ d(Y, Y') small


  • Theorem: for any two clusterings Y, Y', an upper bound on d(Y, Y') holds whenever the distortion of each is close enough to the spectral lower bound
  • Corollary: bound for d(Y, Yopt)


Experiments

[Figure: bound vs. true misclassification error as a function of pmin; K = 4, dim = 30, 20 replicates]



Conclusions

  • First (?) distribution-independent bounds on the clustering error
    • data-dependent
      • hold when the data is well clustered (the case of interest)

  • Tight? – not yet...

  • In addition

    • Improved variational bound for the K-means cost

    • Showed local equivalence between the "misclassification error" distance and the "Frobenius norm" distance (also known as the χ² distance)

  • Related work

    • Bounds for mixtures of Gaussians (Dasgupta, Vempala)

    • Nearest K-flat to n points (Tseng)

    • Variational bounds for sparse PCA (Moghaddam)

