
DENCLUE 2.0: Fast Clustering based on Kernel Density Estimation


Presentation Transcript


  1. DENCLUE 2.0: Fast Clustering based on Kernel Density Estimation Alexander Hinneburg, Martin-Luther-University Halle-Wittenberg, Germany; Hans-Henning Gabriel, 101tec GmbH, Halle, Germany

  2. Overview • Density-based clustering and DENCLUE 1.0 • Hill climbing as EM-algorithm • Identification of local maxima • Applications of general EM-acceleration • Experiments

  3. Density-Based Clustering • Assumption • clusters are regions of high density in the data space • How to estimate density? • parametric models • mixture models • non-parametric models • histogram • kernel density estimation

  4. Kernel Density Estimation • Idea • influence of a data point is modeled by a kernel • density is the normalized sum of all kernels • smoothing parameter h • [slide shows the Gaussian kernel and the resulting density estimate as formulas; see below]
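The formulas on this slide are not reproduced in the transcript. For reference, the standard Gaussian kernel density estimate they correspond to is (notation here is illustrative):

\hat{f}(x) = \frac{1}{N h^d} \sum_{t=1}^{N} K\!\left(\frac{x - x_t}{h}\right), \qquad K(u) = (2\pi)^{-d/2} \exp\!\left(-\tfrac{1}{2}\|u\|^2\right)

where x_1, ..., x_N are the data points, d is the dimensionality, and h is the smoothing parameter.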

  5. DENCLUE 1.0 Framework • Clusters are defined by local maxima of the density estimate • find all maxima by hill climbing • Problem • constant step size • [slide shows the gradient hill climbing update with constant step size; see below]
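The DENCLUE 1.0 hill climbing referred to here is normalized gradient ascent with a fixed step size; a standard formulation (symbols are illustrative) is

x^{(\ell+1)} = x^{(\ell)} + \delta \, \frac{\nabla \hat{f}(x^{(\ell)})}{\|\nabla \hat{f}(x^{(\ell)})\|}

where \delta is the constant step size that causes the problems discussed on the next slide.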

  6. Problem of const. Step Size • Not efficient • many unnecessary small steps • Not effective • does not converge to a local maximum, it only comes close • Example

  7. New Hill Climbing Approach • General approach • differentiate the density estimate and set the gradient to zero • no closed-form solution, but the resulting equation can be used as a fixed-point iteration (see below)
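For the Gaussian kernel, setting \nabla \hat{f}(x) = 0 has no closed-form solution in x, but rearranging the equation gives a fixed-point update (a standard derivation; symbols are illustrative):

x^{(\ell+1)} = \frac{\sum_{t=1}^{N} K\!\left((x^{(\ell)} - x_t)/h\right) x_t}{\sum_{t=1}^{N} K\!\left((x^{(\ell)} - x_t)/h\right)}

Each iterate is a kernel-weighted mean of the data points, so the effective step size adapts to the local density without a tuning parameter.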

  8. New DENCLUE 2.0 Hill Climbing • Efficient • automatically adjusted step size at no extra cost • Effective • converges to a local maximum (proof follows) • Example (see the sketch below)
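A minimal NumPy sketch of this hill climbing, assuming the Gaussian-kernel fixed-point update above; the function and parameter names (hill_climb, eps, max_iter) are illustrative, not the authors' code:

import numpy as np

def hill_climb(x, data, h, eps=1e-4, max_iter=100):
    """Fixed-point hill climbing on a Gaussian kernel density estimate.
    x: start point, shape (d,); data: points, shape (N, d); h: smoothing parameter."""
    for _ in range(max_iter):
        diff = (data - x) / h                         # scaled offsets to all data points
        w = np.exp(-0.5 * np.sum(diff ** 2, axis=1))  # unnormalized Gaussian kernel values
        x_new = w @ data / w.sum()                    # kernel-weighted mean = next iterate
        if np.linalg.norm(x_new - x) < eps:           # step size shrinks near a maximum
            return x_new
        x = x_new
    return x

Each start point (typically every data point) is climbed with this routine and assigned to the local maximum it converges to.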

  9. Proof of Convergence • Cast the problem of maximizing the kernel density as maximizing the likelihood of a mixture model • Introduce a hidden variable
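In this reformulation (sketched here in standard notation since the slide's formulas are missing from the transcript), the kernel density estimate at x is read as the likelihood of a mixture of N Gaussian components with fixed means x_t, shared covariance h^2 I, and equal weights 1/N:

\hat{f}(x) = \sum_{t=1}^{N} \frac{1}{N}\, \mathcal{N}\!\left(x \mid x_t, h^2 I\right)

The hidden variable indicates which component is considered to have generated x, and maximizing \hat{f} over x becomes a likelihood maximization to which EM applies.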

  10. Proof of Convergence • Complete likelihood is maximized by the EM algorithm • this also maximizes the original likelihood, which is the kernel density estimate • when the EM is started at the point being climbed, the EM iterations perform exactly the hill climbing • E-step and M-step (given below)
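The E-step and M-step labels on the slide correspond to the standard EM updates for this mixture (written with illustrative symbols): the E-step computes the posterior responsibility of each data point for the current iterate, and the M-step moves the iterate to the responsibility-weighted mean.

E-step: r_t^{(\ell)} = \frac{K\!\left((x^{(\ell)} - x_t)/h\right)}{\sum_{s=1}^{N} K\!\left((x^{(\ell)} - x_s)/h\right)}
M-step: x^{(\ell+1)} = \sum_{t=1}^{N} r_t^{(\ell)} x_t

Substituting the E-step into the M-step recovers the fixed-point update of slide 7, so EM's monotone-convergence guarantee carries over to the hill climbing.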

  11. Identification of local Maxima • EM algorithm iterates until the end point is reached, i.e. the sum of the k last step sizes falls below a threshold • Assumption • the true local maximum lies within a small ball around the end point • points whose end points are closer than this scale belong to the same maximum • in case of a non-unique assignment, do a few extra EM iterations
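One way to formalize the stopping criterion (the slide's own symbols are missing from the transcript, so the threshold \epsilon below is illustrative): stop at iteration \ell when

\sum_{j=\ell-k+1}^{\ell} \left\| x^{(j)} - x^{(j-1)} \right\| \le \epsilon

and take the true local maximum to lie within an \epsilon-ball around the resulting end point; end points that are close on this scale are then assigned to the same maximum.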

  12. Acceleration • Sparse EM • update only the p% of points with the largest posterior • skips the kernel computations of the remaining (1-p)% after the first iteration • Data Reduction • use only p% of the data as representative points (see the sampling sketch below) • random sampling • kMeans
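A minimal sketch of the data-reduction variant via random sampling, assuming the hill_climb function from the slide-8 sketch; the helper name and the fraction p are illustrative:

import numpy as np

def reduce_by_sampling(data, p, seed=0):
    # Keep a random p-fraction of the data as representative points;
    # k-means centroids could be used instead of a random sample.
    rng = np.random.default_rng(seed)
    size = max(1, int(p * len(data)))
    idx = rng.choice(len(data), size=size, replace=False)
    return data[idx]

# The hill climbing then runs against the reduced set instead of the full data:
# maximum = hill_climb(x_start, reduce_by_sampling(data, p=0.2), h)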

  13. Experiments • Comparison of DENCLUE 1.0 (FS) vs. 2.0 (SSA) • 16-dim. artificial data • both methods are tuned to find the correct clustering

  14. Experiments • Comparison of acceleration methods

  15. Experiments • Clustering quality (normalized mutual information, NMI) vs. sample size (RS)

  16. Experiments • Cluster quality (NMI) of DENCLUE 2.0 (SSA), the acceleration methods, and k-Means on real data • sample sizes 0.8, 0.4, 0.2

  17. Conclusion • New hill climbing for DENCLUE • Automatic step size adjustment • Convergence proof by reduction to EM • Allows the application of general EM accelerations • Future work • automatic setting of the smoothing parameter h (so far tuned manually)

  18. Thank you for your attention!
