The Stability of a Good Clustering


Presentation Transcript

Optimizing these criteria is NP-hard (worst case), for any choice of

- Data
- Objective
- Algorithm: K-means, spectral clustering

...but "spectral clustering and K-means work well when a good clustering exists" (the interesting case).

This talk: if a "good" clustering exists, it is "unique"; if a "good" clustering is found, it is provably good.

Results summary

- Given
- objective = NCut or K-means distortion
- data
- clustering Y with K clusters

- We derive a spectral lower bound on the distortion
- If the gap between the achieved distortion and the lower bound is small
- then d(Y, Y_opt) is small, where Y_opt = the best clustering with K clusters

Overview

- Introduction
- Matrix representations for clusterings
- Quadratic representation for clustering cost
- The misclassification error distance

- Results for NCut (easier)
- Results for K-means distortion (harder)
- Discussion

Clusterings as matrices

- A clustering of { 1, 2, ..., n } with K clusters (C1, C2, ..., CK)
- is represented by an n x K indicator matrix X
- unnormalized: X_ik = 1 if i is in C_k, 0 otherwise
- normalized: each column scaled to unit length, X_k / sqrt(|C_k|)

- All these matrices have orthogonal columns
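A concrete sketch of the two representations (assuming NumPy; the function name `indicator_matrices` and the label-vector encoding are illustrative, not from the talk):

```python
import numpy as np

def indicator_matrices(labels, K):
    """Build the unnormalized and normalized n x K cluster indicator
    matrices for a clustering given as a label vector with values 0..K-1."""
    n = len(labels)
    X = np.zeros((n, K))
    X[np.arange(n), labels] = 1.0      # X[i, k] = 1 iff point i is in cluster k
    sizes = X.sum(axis=0)              # |C_k| for each cluster
    X_norm = X / np.sqrt(sizes)        # columns scaled to unit length
    return X, X_norm

labels = np.array([0, 0, 1, 2, 2, 2])
X, X_norm = indicator_matrices(labels, K=3)

# Columns of the normalized matrix are orthonormal: X_norm^T X_norm = I_K
print(np.allclose(X_norm.T @ X_norm, np.eye(3)))  # True
```

The orthonormality of the normalized columns is what makes the spectral relaxations below possible.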

The Confusion Matrix

- Two clusterings of the same n points:
- (C1, C2, ..., CK) with K clusters
- (C'1, C'2, ..., C'K') with K' clusters
- Confusion matrix M (K x K') with entries m_kk' = |C_k ∩ C'_k'|

The Misclassification Error distance

- d(C, C') = 1 - (1/n) max over matchings π of Σ_k m_k,π(k)
- computed by the maximal bipartite matching algorithm between the clusters of the two clusterings, with the confusion matrix entries as edge weights
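A minimal sketch of both constructions in pure Python (the helper names are illustrative; matching is done here by brute force over permutations, which is fine for small K, while the talk's maximal bipartite matching algorithm is the scalable choice):

```python
from itertools import permutations

def confusion_matrix(C, Cp, K, Kp):
    """M[k][k'] = |C_k ∩ C'_{k'}| for two label lists over the same n points."""
    M = [[0] * Kp for _ in range(K)]
    for a, b in zip(C, Cp):
        M[a][b] += 1
    return M

def misclassification_error(C, Cp, K):
    """d(C, C') = 1 - (1/n) * max over matchings of sum_k M[k][pi(k)]."""
    n = len(C)
    M = confusion_matrix(C, Cp, K, K)
    best = max(sum(M[k][pi[k]] for k in range(K)) for pi in permutations(range(K)))
    return 1.0 - best / n

C  = [0, 0, 0, 1, 1, 2, 2, 2]
Cp = [1, 1, 1, 2, 2, 0, 0, 1]   # same clustering relabeled, one point moved
print(misclassification_error(C, Cp, 3))  # 0.125 (1 of 8 points mismatched)
```

Note that the distance is invariant to relabeling the clusters, which is exactly why the matching step is needed.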

Results for NCut

- Given
- data: similarity matrix A (n x n)
- clustering X (n x K)

- Lower bound for NCut (M02, YS03, BJ03), in terms of the largest eigenvalues of A
- Upper bound for the distance to the optimal clustering (MSX'05), whenever the gap to the lower bound is small

Relaxed problem: maximize trace(X^T A X) s.t. X = n x K orthogonal matrix

Solution: X* = K principal e-vectors of A

- gap small w.r.t. the eigengap lambda_K - lambda_{K+1} implies X close to X* (convexity proof)
- two clusterings X, X' both close to X* implies trace(X^T X') is large, which implies d(X, X') is small
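The relaxation can be checked numerically (a sketch assuming NumPy; the random symmetric matrix stands in for a similarity matrix A): the trace attained by the K principal eigenvectors equals lambda_1 + ... + lambda_K, and any other orthonormal X does no better.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 12, 3
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                      # symmetric "similarity" matrix

evals, evecs = np.linalg.eigh(A)       # eigenvalues in ascending order
top_sum = evals[-K:].sum()             # lambda_1 + ... + lambda_K
X_star = evecs[:, -K:]                 # K principal eigenvectors

# The relaxed optimum attains the sum of the top-K eigenvalues...
assert np.isclose(np.trace(X_star.T @ A @ X_star), top_sum)

# ...and any other n x K orthonormal X can only do worse.
Q, _ = np.linalg.qr(rng.standard_normal((n, K)))
assert np.trace(Q.T @ A @ Q) <= top_sum + 1e-9
print("relaxation optimum:", top_sum)
```

Since the NCut of any true clustering X corresponds to a feasible point of this relaxed problem, the relaxed optimum yields the spectral lower bound on the cut cost.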

Why the eigengap matters

- Example
- A has 3 diagonal blocks
- K = 2
- gap( C ) = gap( C' ) = 0 but C, C' not close: C and C' merge different pairs of blocks into one cluster, yet both attain the optimum
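A numerical sketch of this example (assuming NumPy, with a hypothetical block size m = 5): with three identical blocks the top three eigenvalues coincide, so for K = 2 the eigengap lambda_K - lambda_{K+1} vanishes and the bound carries no information.

```python
import numpy as np

m = 5                                   # points per block (hypothetical)
J = np.ones((m, m))
Z = np.zeros((m, m))
# Similarity matrix with 3 identical diagonal blocks
A = np.block([[J, Z, Z], [Z, J, Z], [Z, Z, J]])

evals = np.sort(np.linalg.eigvalsh(A))[::-1]   # descending order
K = 2
eigengap = evals[K - 1] - evals[K]             # lambda_K - lambda_{K+1}
print(evals[:4])          # top three eigenvalues all equal m
print(eigengap)           # ~0: no eigengap at K = 2
```

With a zero eigengap the K = 2 principal subspace is not unique, which is exactly the instability the example illustrates.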

Remarks on stability results

- No explicit conditions on S
- Different flavor from other stability results, e.g. Kannan et al. 2000, Ng et al. 2001, which assume S is "almost" block diagonal
- But... the results apply only if a good clustering is found

- There are S matrices for which no clustering satisfies the theorem
- The bound depends on aggregate quantities like
- K
- cluster sizes (= probabilities)

- Points are weighted by their volumes (degrees)
- good in some applications
- bounds for unweighted distances can be obtained

Is the bound ever informative?

- An experiment: S = perfect (block diagonal) + additive noise
- (Figure: results for dim = 30.)
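A small sketch of such an experiment (assuming NumPy; the block sizes, noise model, and seed are hypothetical stand-ins for the slide's setup): start from a perfect block-diagonal S, add symmetric uniform noise of increasing strength, and track the eigengap that controls the bound.

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = [10, 10, 10]                   # hypothetical cluster sizes
K, n = len(sizes), sum(sizes)
S0 = np.zeros((n, n))                  # "perfect" block-diagonal similarity
start = 0
for m in sizes:
    S0[start:start + m, start:start + m] = 1.0
    start += m

gaps = []
for eps in [0.0, 0.1, 0.3]:
    N = rng.uniform(0, 1, (n, n))
    S = S0 + eps * (N + N.T) / 2       # perfect S plus symmetric additive noise
    evals = np.sort(np.linalg.eigvalsh(S))[::-1]
    gaps.append(evals[K - 1] - evals[K])   # eigengap lambda_K - lambda_{K+1}

for eps, g in zip([0.0, 0.1, 0.3], gaps):
    print(f"eps={eps:.1f}  eigengap={g:.3f}")
```

As long as the noise level keeps the eigengap bounded away from zero, the bound has a chance of being informative.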

K-means distortion

- We can do the same...
- ...but the K-th principal subspace is typically not stable

New approach: use K-1 vectors

- Non-redundant representation Y
- New expression for the distortion
- ...and a new (relaxed) optimization problem

Solution of the new problem

- Relaxed optimization problem, given the data

- Solution
- U = the K-1 principal e-vectors of A
- W = a K x K orthogonal matrix with a prescribed first row

Y close to Y*

- Clusterings Y, Y' close to Y* implies ||Y^T Y'||_F large, which implies d(Y, Y') small

Solve the relaxed minimization

- Theorem: for any two clusterings Y, Y' with positive gap to the relaxed optimum, d(Y, Y') is small whenever both gaps are small enough
- Corollary: bound for d(Y, Y_opt)
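The talk's exact K-1-vector relaxation is not reproduced here; the following sketch (assuming NumPy, with hypothetical data) illustrates the simpler classical variational bound: the K-means distortion of any clustering is at least trace(G) minus the sum of the top K eigenvalues of the Gram matrix G.

```python
import numpy as np

def kmeans_distortion(points, labels, K):
    """Sum of squared distances of each point to its cluster centroid."""
    d = 0.0
    for k in range(K):
        pts = points[labels == k]
        d += ((pts - pts.mean(axis=0)) ** 2).sum()
    return d

rng = np.random.default_rng(2)
# Hypothetical data: 3 well-separated blobs in 2-D
centers = np.array([[0, 0], [6, 0], [0, 6]])
points = np.vstack([c + 0.5 * rng.standard_normal((20, 2)) for c in centers])
labels = np.repeat(np.arange(3), 20)

K = 3
G = points @ points.T                         # Gram matrix of the data
evals = np.sort(np.linalg.eigvalsh(G))[::-1]
lower_bound = np.trace(G) - evals[:K].sum()   # trace(G) - (lambda_1+...+lambda_K)

d = kmeans_distortion(points, labels, K)
print(d, ">=", lower_bound)
```

The bound follows because the distortion equals trace(G) minus a trace over an orthonormal K-column indicator matrix, and that trace is at most the sum of the top K eigenvalues of G.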

Conclusions

- First (?) distribution-independent bounds on the clustering error
- data dependent
- hold when the data is well clustered (this is the case of interest)

- Tight? Not yet...
- In addition
- Improved variational bound for the K-means cost
- Showed local equivalence between the "misclassification error" distance and the "Frobenius norm" distance (also known as the χ² distance)

- Related work
- Bounds for mixtures of Gaussians (Dasgupta, Vempala)
- Nearest K-flat to n points (Tseng)
- Variational bounds for sparse PCA (Moghaddam)
