
The Stability of a Good Clustering


Presentation Transcript


  1. The Stability of a Good Clustering. Marina Meila, University of Washington, mmp@stat.washington.edu

  2. Data (pairwise similarities) • Objective • Algorithm: K-means, spectral clustering • Optimizing these criteria is NP-hard in the worst case • ...but “spectral clustering and K-means work well when a good clustering exists”, and this is the interesting case • This talk: if a “good” clustering exists, it is “unique”; if a “good” clustering is found, it is provably good

  3. Results summary • Given • objective = NCut or K-means distortion • data • a clustering Y with K clusters • A spectral lower bound on the distortion • If the gap between the distortion of Y and this lower bound is small • Then the distance d(Y, Y_opt) is small, where Y_opt = the best clustering with K clusters
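
  In symbols, the shape of the result can be sketched as follows; the notation (F for the cost, F_LB for its spectral lower bound, delta, epsilon) is assumed here rather than taken from the slides.

    % Schematic of the stability result (notation assumed, not taken from the slides):
    %   F      = clustering cost (NCut or K-means distortion)
    %   F_LB   = its spectral lower bound, computed from the data
    %   Y^opt  = the cost-minimizing clustering with K clusters
    %   d(.,.) = misclassification error distance
    \[
      \mathcal{F}(Y) - \mathcal{F}_{\mathrm{LB}} \le \delta
      \quad\Longrightarrow\quad
      d\bigl(Y, Y^{\mathrm{opt}}\bigr) \le \varepsilon(\delta),
      \qquad \varepsilon(\delta)\ \text{small whenever}\ \delta\ \text{is small.}
    \]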

  4. A graphical view • Figure: distortion vs. clusterings, with the lower bound marked

  5. Overview • Introduction • Matrix representations for clusterings • Quadratic representation for clustering cost • The misclassification error distance • Results for NCut (easier) • Results for K-means distortion (harder) • Discussion

  6. Clusterings as matrices • A clustering of { 1, 2, ..., n } with K clusters (C1, C2, ..., CK) • Represented by an n x K matrix, either unnormalized or normalized • Both representations have orthogonal columns
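
  As an illustration, a minimal Python sketch of the two representations (the conversion from labels and the variable names are mine, not from the talk):

    import numpy as np

    def indicator_matrices(labels, K):
        """Unnormalized and normalized n x K cluster indicator matrices."""
        n = len(labels)
        X_u = np.zeros((n, K))               # unnormalized: X_u[i, k] = 1 iff point i is in cluster k
        X_u[np.arange(n), labels] = 1.0
        sizes = X_u.sum(axis=0)              # cluster sizes n_1, ..., n_K
        X_n = X_u / np.sqrt(sizes)           # normalized: each column rescaled to unit length
        return X_u, X_n

    labels = np.array([0, 0, 1, 2, 2, 2])    # clustering of n = 6 points into K = 3 clusters
    X_u, X_n = indicator_matrices(labels, K=3)
    print(np.allclose(X_n.T @ X_n, np.eye(3)))   # True: the normalized columns are orthonormal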

  7. Distortion is quadratic in X • Given the similarities, both costs can be written as quadratic functions of the indicator matrix X • NCut • K-means
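
  The two quadratic forms are not legible in this transcript; the standard expressions, reconstructed here under the assumption that S is the similarity matrix, D = diag(S1), Z the data matrix with Gram matrix G = ZZ^T, and X the corresponding normalized indicator matrix, are:

    % NCut, with degree-weighted indicator X_{ik} = sqrt(d_i / vol(C_k)) for i in C_k:
    \[
      \mathrm{NCut}(X) = K - \operatorname{tr}\bigl(X^{\top} D^{-1/2} S D^{-1/2} X\bigr)
    \]
    % K-means distortion, with X_{ik} = 1/\sqrt{|C_k|} for i in C_k:
    \[
      \mathrm{dist}(X) = \operatorname{tr}(G) - \operatorname{tr}\bigl(X^{\top} G X\bigr)
    \]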

  8. The Confusion Matrix • Two clusterings of the same points: (C1, C2, ..., CK) and (C’1, C’2, ..., C’K’) • Confusion matrix (K x K’) with entries m_kk’ counting the points that fall in both C_k and C’_k’
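
  A minimal sketch of the computation (function and variable names are mine):

    import numpy as np

    def confusion_matrix(labels, labels_prime, K, K_prime):
        """M[k, k'] counts the points placed in cluster k by one clustering
        and in cluster k' by the other."""
        M = np.zeros((K, K_prime), dtype=int)
        for k, k_prime in zip(labels, labels_prime):
            M[k, k_prime] += 1
        return M

    labels       = np.array([0, 0, 1, 1, 2, 2])
    labels_prime = np.array([1, 1, 0, 0, 2, 2])
    print(confusion_matrix(labels, labels_prime, K=3, K_prime=3))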

  9. The Misclassification Error distance • Defined via the confusion matrix between the clusters of the two clusterings • Computed by the maximal bipartite matching algorithm between clusters
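
  A sketch of this distance, assuming the usual definition d(C, C') = 1 - (1/n) max over one-to-one cluster matchings of the matched confusion-matrix mass, with the matching computed by scipy's Hungarian-algorithm routine:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def misclassification_error(labels, labels_prime, K):
        """1 - (1/n) * (largest confusion-matrix mass achievable by matching clusters)."""
        n = len(labels)
        M = np.zeros((K, K))
        for k, k_prime in zip(labels, labels_prime):
            M[k, k_prime] += 1                       # confusion matrix
        rows, cols = linear_sum_assignment(-M)       # maximal bipartite matching
        return 1.0 - M[rows, cols].sum() / n

    labels       = np.array([0, 0, 1, 1, 2, 2])
    labels_prime = np.array([1, 1, 1, 0, 2, 2])      # one point disagrees after relabeling
    print(misclassification_error(labels, labels_prime, K=3))   # 1/6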

  10. Results for NCut • given • data A (n x n) • clustering X (n x K) • Lower bound for NCut (M02, YS03, BJ03) • Upper bound on the misclassification error (MSX’05), valid whenever a condition involving the largest e-values of A holds
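
  The cited lower bound itself is not rendered in the transcript. The standard statement, reconstructed here assuming A is the normalized similarity matrix from the quadratic form above, with eigenvalues lambda_1 >= ... >= lambda_n, reads:

    % Spectral lower bound: for every clustering X with K clusters,
    \[
      \mathrm{NCut}(X) \;\ge\; K - \sum_{k=1}^{K} \lambda_k
    \]
    % (it follows from NCut(X) = K - tr(X^T A X) and Ky Fan's bound on tr(X^T A X)
    %  over matrices X with K orthonormal columns).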

  11. Relaxed minimization • Relax: minimize over X = an n x K orthogonal matrix • Solution: X* = the K principal e-vectors of A • Chain of implications: gap small w.r.t. the eigengap λ_K - λ_{K+1} ⇒ X close to X* (convexity proof) • two clusterings X, X’ close to X* ⇒ trace X^T X’ large • trace X^T X’ large ⇒ misclassification error small
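
  A small Python sketch of the relaxed solution and of the eigengap it depends on (assuming A is a symmetric normalized similarity matrix; names are mine):

    import numpy as np

    def relaxed_solution(A, K):
        """X* = the K principal eigenvectors of A, plus the eigengap lambda_K - lambda_{K+1}."""
        eigvals, eigvecs = np.linalg.eigh(A)                  # ascending order for symmetric A
        eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]    # reorder to descending
        X_star = eigvecs[:, :K]                               # n x K, orthonormal columns
        return X_star, eigvals[K - 1] - eigvals[K]

    rng = np.random.default_rng(0)
    S = rng.random((10, 10)); S = (S + S.T) / 2               # toy symmetric similarities
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    A = S * np.outer(d_inv_sqrt, d_inv_sqrt)                  # D^{-1/2} S D^{-1/2}
    X_star, gap = relaxed_solution(A, K=3)
    print(X_star.shape, gap)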

  12. Why the eigengap matters • Example • A has 3 diagonal blocks • K = 2 • gap( C ) = gap( C’ ) = 0 but C, C’ not close • Figure: the two clusterings C and C’
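
  A tiny numeric check of this example (the block contents and sizes are my choice):

    import numpy as np
    from scipy.linalg import block_diag

    B = np.ones((5, 5))                          # one "perfect" cluster block
    A = block_diag(B, B, B)                      # similarity with 3 identical diagonal blocks

    eigvals = np.sort(np.linalg.eigvalsh(A))[::-1]
    print(eigvals[:4])                           # [5, 5, 5, 0]: lambda_2 = lambda_3, so for K = 2
                                                 # the eigengap is zero and the bound is uninformative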

  13. Remarks on stability results • No explicit conditions on S • Different flavor from other stability results, e.g. Kannan et al. ’00, Ng et al. ’01, which assume S is “almost” block diagonal • But… results apply only if a good clustering is found • There are S matrices for which no clustering satisfies the theorem • Bound depends on aggregate quantities like • K • cluster sizes (= probabilities) • Points are weighted by their volumes (degrees) • good in some applications • bounds for unweighted distances can be obtained

  14. Is the bound ever informative? • An experiment: S perfect + additive noise
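
  A minimal sketch of such an experiment; the noise model, the cluster sizes, and the use of scikit-learn's spectral clustering are my choices rather than the exact setup of the talk:

    import numpy as np
    from sklearn.cluster import SpectralClustering

    rng = np.random.default_rng(0)
    K, block, noise_level = 4, 30, 0.3
    labels_true = np.repeat(np.arange(K), block)
    n = labels_true.size

    # "Perfect" similarity (1 within a cluster, 0 across) plus symmetric additive noise.
    S = (labels_true[:, None] == labels_true[None, :]).astype(float)
    noise = rng.random((n, n)) * noise_level
    S_noisy = np.clip(S + (noise + noise.T) / 2, 0.0, 1.0)

    labels_found = SpectralClustering(n_clusters=K, affinity="precomputed",
                                      random_state=0).fit_predict(S_noisy)
    # The misclassification error between labels_found and labels_true can then be
    # compared with the spectral bound computed from S_noisy.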

  15. K-means distortion • (figure: K = 4, dim = 30) • We can do the same ... • ...but the K-th principal subspace is typically not stable

  16. New approach: Use K-1 vectors • Non-redundant representation Y • Distortion – new expression • ...and new (relaxed) optimization problem

  17. Solution of the new problem • The relaxed optimization problem, given the data • Solution • U = the K-1 principal e-vectors of A • W = a K x K orthogonal matrix • with a prescribed vector on its first row
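
  The entries of that first row are not legible in the transcript. One standard construction consistent with slides 16-17 (a reconstruction, not a quote of the talk) uses the cluster proportions: the normalized indicator X always satisfies X w = n^{-1/2} 1 when w is built from its own cluster sizes, so rotating by an orthogonal W whose first row is w^T isolates this fixed direction, and the remaining K-1 columns form the non-redundant Y.

    % Reconstruction (not a quote of the slides). With cluster sizes n_1, ..., n_K and
    % w = (\sqrt{n_1/n}, \dots, \sqrt{n_K/n})^{\top} built from the clustering X itself:
    \[
      X w = n^{-1/2}\,\mathbf{1},
      \qquad
      X W^{\top} = \bigl[\, n^{-1/2}\mathbf{1} \;\; Y \,\bigr],
      \qquad Y \in \mathbb{R}^{n \times (K-1)},
    \]
    % for any orthogonal W whose first row is w^T; Y is the non-redundant representation,
    % and U (the K-1 principal e-vectors of A) solves the relaxed problem in this representation.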

  18. Solve the relaxed minimization • gap small ⇒ Y close to Y* • clusterings Y, Y’ close to Y* ⇒ ||Y^T Y’||_F large • ||Y^T Y’||_F large ⇒ misclassification error small

  19. Theorem • For any two clusterings Y, Y’ satisfying the positivity condition, the bound on d(Y, Y’) holds whenever the stated condition is met • Corollary: a bound for d(Y, Y_opt)

  20. Experiments • K = 4, dim = 30, 20 replicates • Figure: bound and true error (plotted against p_min)

  21. (figure)

  22. Conclusions • First (?) distribution-independent bounds on the clustering error • data-dependent • hold when the data are well clustered (this is the case of interest) • Tight? – not yet... • In addition • Improved variational bound for the K-means cost • Showed local equivalence between the “misclassification error” distance and the “Frobenius norm distance” (also known as the χ² distance) • Related work • Bounds for mixtures of Gaussians (Dasgupta, Vempala) • Nearest K-flat to n points (Tseng) • Variational bounds for sparse PCA (Moghaddam)
