
Robust Information-theoretic Clustering

Robust Information-theoretic Clustering, by C. Bohm, C. Faloutsos, J-Y. Pan, and C. Plant. Presenter: Niyati Parikh. Objective: find a natural clustering in a dataset. Two questions: how to measure the goodness of a clustering, and how to find a good clustering efficiently. Define “goodness”.


Presentation Transcript


  1. Robust Information-theoretic Clustering By C. Bohm, C. Faloutsos, J-Y. Pan, and C. Plant Presenter: Niyati Parikh

  2. Objective • Find natural clustering in a dataset • Two questions: • Goodness of a clustering • Efficient algorithm for good clustering

  3. Define “goodness” • Ability to describe the clusters succinctly • Adopt VAC (Volume after Compression) • Record #bytes for the number of clusters k • Record #bytes for the type of each cluster (Gaussian, uniform, ...) • Record the compressed location of each point

  4. VAC • Tells which of two groupings is better • Lower VAC => better grouping • Formula uses the decorrelation matrix • Decorrelation matrix = matrix of eigenvectors of the cluster's covariance matrix

  5. Computing VAC • Steps: • Compute covariance matrix of cluster C • Compute PCA and obtain eigenvector matrix • Compute VAC from the matrix
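The three steps on this slide can be sketched in a few lines. The slides do not reproduce the VAC formula itself, so the sketch below substitutes an assumed stand-in: the number of bits needed to code each point with an independent Gaussian along each principal axis. It is restricted to 2-D points so that the eigenvalues have a closed form, and the names `covariance_2d`, `eigenvalues_2d`, `vac_proxy`, and the `floor` parameter are all hypothetical.

```python
import math

def covariance_2d(points):
    """Step 1: covariance entries (sxx, sxy, syy) of a cluster of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return sxx, sxy, syy

def eigenvalues_2d(sxx, sxy, syy):
    """Step 2: PCA variances, i.e. eigenvalues of the symmetric 2x2 covariance."""
    mean = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    return mean + radius, mean - radius

def vac_proxy(points, floor=1e-6):
    """Step 3 (assumed stand-in, not the paper's formula): Gaussian coding cost.

    Uses differential entropy, so the value is only meaningful for comparing
    clusters and can be negative for very tight clusters.
    """
    lams = eigenvalues_2d(*covariance_2d(points))
    bits_per_point = sum(
        0.5 * math.log2(2 * math.pi * math.e * max(lam, floor)) for lam in lams
    )
    return len(points) * bits_per_point
```

Under this stand-in, a tight cluster scores lower than a widely spread cluster of the same size, which matches the "lower VAC => better grouping" reading. For perfectly correlated points the second eigenvalue collapses to zero, which is why the sketch clamps variances at `floor`.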

  6. Efficient algorithm • Take an initial clustering produced by any algorithm • Refine that clustering to remove outliers/noise • Output a better clustering through post-processing

  7. Refining Clusters • Use VAC to refine existing clusters • Remove outliers from the given cluster C • Define Core and Out as the sets of core points and outliers in C • Initially Out contains all points in C and Core is empty • Sort the points in ascending order of their distance from the cluster center • Compute VAC • Move the closest point from Out to Core • Compute the new VAC • If the new VAC increases then stop, else pick the next closest point and repeat
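The loop on this slide can be sketched as follows. The cost function is an assumed stand-in for VAC (a per-axis Gaussian code for Core plus a flat cost per point in Out), the center is taken as the coordinate-wise median, and `coding_cost`, `refine_cluster`, `out_bits`, and `floor` are hypothetical names and parameters rather than anything from the paper.

```python
import math
from statistics import median, pvariance

def coding_cost(core, out, out_bits=16.0, floor=0.01):
    """Assumed stand-in for VAC of a Core/Out split (not the paper's formula)."""
    cost = len(out) * out_bits  # flat cost for every point still in Out
    if core:
        dims = len(core[0])
        bits = sum(
            0.5 * math.log2(
                2 * math.pi * math.e * max(pvariance([p[d] for p in core]), floor)
            )
            for d in range(dims)
        )
        cost += len(core) * bits  # Gaussian code for the Core points
    return cost

def refine_cluster(points, cost=coding_cost):
    """Move points from Out to Core, nearest first, until the cost increases."""
    dims = len(points[0])
    center = [median(p[d] for p in points) for d in range(dims)]
    out = sorted(points, key=lambda p: math.dist(p, center))  # ascending distance
    core = []
    best = cost(core, out)
    while out:
        candidate = cost(core + [out[0]], out[1:])
        if candidate > best:  # cost went up: stop, as on the slide
            break
        core.append(out.pop(0))
        best = candidate
    return core, out
```

Called on a small cluster with one distant point, for example seven points within distance 1.5 of the origin plus `(40, 40)`, this sketch keeps the seven nearby points in Core and leaves the distant point in Out. Stopping at the first increase follows the slide; it is a greedy rule and can stop early on clusters whose cost is not monotone.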

  8. VAC and Robust estimation • Conventional estimation: covariance matrix uses Mean • Robust estimation: covariance matrix uses Median • Median is less affected by outliers than Mean
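A small sketch of the difference between the two estimates. The slide does not say which averaging steps are replaced, so this assumes one plausible reading: the same averaging function is used both for the center and for the products of deviations, with the mean giving the conventional covariance and the median giving a robust variant. The function name `covariance` and its `average` parameter are hypothetical.

```python
from statistics import mean, median

def covariance(points, average):
    """Covariance-style matrix where `average` is either mean or median.

    Assumption: `average` replaces both the centering step and the
    averaging of the deviation products.
    """
    dims = len(points[0])
    center = [average([p[d] for p in points]) for d in range(dims)]
    matrix = [
        [
            average([(p[i] - center[i]) * (p[j] - center[j]) for p in points])
            for j in range(dims)
        ]
        for i in range(dims)
    ]
    return center, matrix

# Four points on a line plus one outlier far along the x axis.
points = [(1, 1), (2, 2), (3, 3), (4, 4), (100, 0)]
conventional_center, conventional = covariance(points, mean)
robust_center, robust = covariance(points, median)
```

With this data the mean-based center is dragged toward the outlier along x and the x variance is inflated by it, while the median-based center and variance stay at the scale of the four inlying points.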

  9. Sample result • Imperfect clusters formed by K-Means affect the purifying process • May result in redundant clusters that could be merged

  10. Cluster Merging • Merge Ci and Cj only if the combined VAC decreases • savedCost(Ci, Cj) = VAC(Ci) + VAC(Cj) - VAC(Ci ∪ Cj) • If savedCost > 0, then merge Ci and Cj • Greedy search to maximize savedCost, hence minimize VAC
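The merging rule can be sketched as a greedy loop that repeatedly merges the pair with the largest positive savedCost. To keep it short, the sketch assumes 1-D clusters and a stand-in VAC made of a fixed per-cluster model cost plus a Gaussian coding cost per point. `vac_proxy`, `saved_cost`, `greedy_merge`, `model_bits`, and `floor` are hypothetical names, not the paper's definitions.

```python
import math
from itertools import combinations
from statistics import pvariance

def vac_proxy(cluster, model_bits=32.0, floor=0.01):
    """Assumed stand-in for VAC: model cost plus Gaussian coding cost (1-D)."""
    variance = max(pvariance(cluster), floor)
    bits_per_point = 0.5 * math.log2(2 * math.pi * math.e * variance)
    return model_bits + len(cluster) * bits_per_point

def saved_cost(ci, cj, vac=vac_proxy):
    """savedCost(Ci, Cj) = VAC(Ci) + VAC(Cj) - VAC(Ci U Cj)."""
    return vac(ci) + vac(cj) - vac(ci + cj)

def greedy_merge(clusters, vac=vac_proxy):
    """Merge the best pair while its savedCost is positive."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: saved_cost(clusters[ij[0]], clusters[ij[1]], vac),
        )
        if saved_cost(clusters[i], clusters[j], vac) <= 0:
            break  # no merge lowers the total cost any further
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

For example, `greedy_merge([[0.0, 1.0, 2.0, 3.0], [0.5, 1.5, 2.5, 3.5], [100.0, 101.0, 102.0, 103.0]])` merges the two overlapping clusters and leaves the distant one alone. Without the per-cluster `model_bits` term this stand-in would never reward a merge, so the model cost is what makes redundant clusters worth combining here.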

  11. Final Result

  12. Experiment results

  13. Example

  14. Thank You • Questions?
