
The Stability of a Good Clustering

Marina Meila

University of Washington

[email protected]

Optimizing these criteria is NP-hard

  • Setting: data (similarities), objective, algorithm
  • Worst case: optimizing these criteria (K-means, spectral clustering) is NP-hard
  • ...but “spectral clustering, K-means work well when good clustering exists”
  • The interesting case, and this talk:
    • If a “good” clustering exists, it is “unique”
    • If a “good” clustering is found, it is provably good

results summary
Results summary
  • Given
    • objective = NCut or K-means distortion
    • data
    • clustering Y with K clusters
  • Spectral lower bound on the distortion
  • If (distortion - lower bound) is small
  • Then d(Y, Yopt) is small

where Yopt = best clustering with K clusters

A graphical view

[Figure: the distortion and its lower bound plotted over the space of clusterings]

Overview
  • Introduction
    • Matrix representations for clusterings
    • Quadratic representation for clustering cost
    • The misclassification error distance
  • Results for NCut (easier)
  • Results for K-means distortion (harder)
  • Discussion
Clusterings as matrices
  • Clustering of { 1, 2, ..., n } with K clusters (C1, C2, ... CK)
  • Represented by an n x K matrix
    • unnormalized: Xik = 1 if i ∈ Ck, 0 otherwise
    • normalized: columns rescaled to unit length, e.g. Xik = 1/√|Ck| for i ∈ Ck
  • All these matrices have orthogonal columns
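
A minimal sketch of these two representations, assuming the unweighted case and integer labels in {0, ..., K-1} (the helper name cluster_matrices is ours, for illustration only):

```python
import numpy as np

def cluster_matrices(labels, K):
    """Unnormalized and normalized n x K cluster indicator matrices."""
    n = len(labels)
    X_unnorm = np.zeros((n, K))
    X_unnorm[np.arange(n), labels] = 1.0       # X[i, k] = 1 iff point i is in cluster C_k
    col_norms = np.sqrt(X_unnorm.sum(axis=0))  # sqrt of the cluster sizes
    X_norm = X_unnorm / col_norms              # each column rescaled to unit length
    return X_unnorm, X_norm

labels = np.array([0, 0, 1, 1, 2])
X_unnorm, X_norm = cluster_matrices(labels, K=3)
print(X_norm.T @ X_norm)   # identity matrix: the normalized columns are orthonormal
```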
The Confusion Matrix

Two clusterings of the same n points:

  • (C1, C2, ... CK) with K clusters
  • (C'1, C'2, ... C'K') with K' clusters
  • Confusion matrix M (K x K') with entries mkk' = | Ck ∩ C'k' |
The Misclassification Error distance
  • d(C, C') = 1 - (1/n) max over matchings π of Σk mk,π(k)
  • computed by the maximal bipartite matching algorithm between clusters, applied to the confusion matrix
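
A minimal sketch of both objects, assuming integer cluster labels and using SciPy's linear_sum_assignment for the maximal bipartite matching (the helper names are ours):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def confusion_matrix(labels1, labels2, K, Kp):
    """M[k, k'] = number of points in cluster k of the first clustering
    and in cluster k' of the second."""
    M = np.zeros((K, Kp), dtype=int)
    for a, b in zip(labels1, labels2):
        M[a, b] += 1
    return M

def misclassification_error(labels1, labels2, K, Kp):
    """d(C, C') = 1 - (best matched overlap) / n, via maximal bipartite matching."""
    M = confusion_matrix(labels1, labels2, K, Kp)
    rows, cols = linear_sum_assignment(-M)    # negate to maximize the matched mass
    return 1.0 - M[rows, cols].sum() / len(labels1)

c1 = np.array([0, 0, 0, 1, 1, 2, 2, 2])
c2 = np.array([1, 1, 1, 0, 0, 2, 2, 0])       # same clustering up to relabeling, one point moved
print(misclassification_error(c1, c2, 3, 3))  # 0.125
```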

Results for NCut
  • Given
    • data A (n x n)
    • clustering X (n x K)
  • Lower bound for NCut (M02, YS03, BJ03): NCut(X) ≥ K - (λ1 + ... + λK)
  • Upper bound for the misclassification error d(X, Xopt) (MSX'05)

whenever δ = NCut(X) - (K - λ1 - ... - λK) is small compared with the eigengap λK - λK+1,

where λ1 ≥ λ2 ≥ ... are the largest e-values of A
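
A rough numerical sketch of this bound, under the assumption that the spectrum used is that of the normalized similarity matrix D^{-1/2} S D^{-1/2} (same eigenvalues as the random-walk matrix D^{-1} S); the helper names are ours and this is an illustration, not the paper's exact setup:

```python
import numpy as np

def ncut(S, labels, K):
    """Normalized cut of a clustering: sum_k cut(C_k, rest) / vol(C_k)."""
    d = S.sum(axis=1)                         # node degrees (volumes)
    total = 0.0
    for k in range(K):
        mask = labels == k
        vol = d[mask].sum()
        within = S[np.ix_(mask, mask)].sum()
        total += (vol - within) / vol         # cut(C_k, rest) = vol(C_k) - assoc(C_k, C_k)
    return total

def spectral_lower_bound_gap(S, labels, K):
    """delta = NCut(X) - (K - sum of the K largest eigenvalues)."""
    d = S.sum(axis=1)
    L = S / np.sqrt(np.outer(d, d))           # normalized similarity matrix
    lam = np.sort(np.linalg.eigvalsh(L))[::-1]
    return ncut(S, labels, K) - (K - lam[:K].sum())

# toy example: two noisy blocks
rng = np.random.default_rng(0)
S = np.kron(np.eye(2), np.ones((5, 5))) + 0.05 * rng.random((10, 10))
S = (S + S.T) / 2
labels = np.repeat([0, 1], 5)
print(spectral_lower_bound_gap(S, labels, K=2))   # small when the clustering is good
```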

Relaxed minimization for NCut:

maximize trace XTAX   s.t. X = n x K orthogonal matrix

Solution:

X* = K principal e-vectors of A

δ small w.r.t. the eigengap λK - λK+1  ⟹  X close to X*

Convexity proof

Two clusterings X, X' close to X*  ⟹  trace XTX' large  ⟹  d(X, X') small

Why the eigengap matters
  • Example
    • A has 3 diagonal blocks
    • K = 2
    • gap( C ) = gap( C' ) = 0 but C, C' not close

[Figure: the two different 2-clusterings C and C' of the three blocks]
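
A small numerical sketch of this example, assuming (as in the sketch above) that the relevant spectrum is that of the normalized similarity matrix:

```python
import numpy as np

# Three identical diagonal blocks: the top eigenvalues of the normalized matrix
# are 1, 1, 1, so the eigengap lambda_2 - lambda_3 used for K = 2 is exactly zero.
A = np.kron(np.eye(3), np.ones((4, 4)))    # 3 diagonal blocks, no noise
d = A.sum(axis=1)
L = A / np.sqrt(np.outer(d, d))            # normalized similarity
lam = np.sort(np.linalg.eigvalsh(L))[::-1]
print(lam[:4])                             # [1. 1. 1. 0.] -> zero eigengap at K = 2

# Two 2-clusterings with equally good cost, yet far from each other:
C  = np.repeat([0, 0, 1], 4)               # blocks {1, 2} together vs block {3}
Cp = np.repeat([0, 1, 1], 4)               # block {1} vs blocks {2, 3} together
```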

Remarks on stability results
  • No explicit conditions on S
    • Different flavor from other stability results, e.g. Kannan et al. ’00, Ng et al. ’01, which assume S is “almost” block diagonal
    • But... the results apply only if a good clustering is found
  • There are S matrices for which no clustering satisfies the theorem
  • Bound depends on aggregate quantities like
    • K
    • cluster sizes (=probabilities)
  • Points are weighted by their volumes (degrees)
    • good in some applications
    • bounds for unweighted distances can be obtained
Is the bound ever informative?
  • An experiment: S perfect + additive noise
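
A sketch of such an experiment, reusing the hypothetical spectral_lower_bound_gap helper from the NCut sketch above; the block-plus-uniform-noise model here is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
K, block_size, noise = 4, 25, 0.1
S_perfect = np.kron(np.eye(K), np.ones((block_size, block_size)))
E = noise * rng.random((K * block_size, K * block_size))
S = S_perfect + (E + E.T) / 2                  # perfect S plus symmetric additive noise
labels = np.repeat(np.arange(K), block_size)   # the planted clustering

delta = spectral_lower_bound_gap(S, labels, K) # gap between NCut and its lower bound
print(f"delta = {delta:.4f}")                  # the bound is informative when delta is small
```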
[Figure: experiment results, K = 4, dim = 30]

K-means distortion
  • We can do the same...
  • ...but the K-th principal subspace is typically not stable
New approach: Use K-1 vectors
  • Non-redundant representation Y
  • Distortion – new expression
    • ...and new (relaxed) optimization problem
Solution of the new problem
  • Relaxed optimization problem (for the given data A)
  • Solution
    • U = K-1 principal e-vectors of A
    • W = K x K orthogonal matrix
        • with (√p1, ..., √pK) on the first row
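
Assuming (as reconstructed above) that the prescribed first row is (√p1, ..., √pK), one standard way to build such an orthogonal W is a Householder reflection; a sketch, with helper names of our own choosing:

```python
import numpy as np

def orthogonal_with_first_row(v):
    """K x K orthogonal matrix whose first row is the unit vector v,
    built from the Householder reflection that maps e_1 to v."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    K = len(v)
    e1 = np.zeros(K); e1[0] = 1.0
    u = v - e1
    if np.linalg.norm(u) < 1e-12:            # v is already e_1
        return np.eye(K)
    u = u / np.linalg.norm(u)
    return np.eye(K) - 2.0 * np.outer(u, u)  # symmetric orthogonal, first row equals v

p = np.array([0.4, 0.3, 0.2, 0.1])           # cluster probabilities, sum to 1
W = orthogonal_with_first_row(np.sqrt(p))
print(np.allclose(W @ W.T, np.eye(4)), np.allclose(W[0], np.sqrt(p)))   # True True
```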
Solve relaxed minimization:

δ small  ⟹  Y close to Y*

Clusterings Y, Y' close to Y*  ⟹  ||YTY'||F large  ⟹  d(Y, Y') small

Theorem

For any two clusterings Y, Y' with δY, δY' > 0: [bound on d(Y, Y') in terms of the δ's], whenever [condition on δY, δY']

Corollary: Bound for d(Y,Yopt)

Experiments
  • K = 4, dim = 30, 20 replicates

[Figure: bound and true error plotted against pmin]

Conclusions
  • First (?) distribution-independent bounds on the clustering error
    • data-dependent
      • hold when the data is well clustered (this is the case of interest)
  • Tight? – not yet...
  • In addition
    • Improved variational bound for the K-means cost
    • Showed local equivalence between the “misclassification error” distance and the “Frobenius norm distance” (also known as the χ2 distance)
  • Related work
    • Bounds for mixtures of Gaussians (Dasgupta, Vempala)
    • Nearest K-flat to n points (Tseng)
    • Variational bounds for sparse PCA (Moghaddam)