
Density Biased Sampling: An Improved Method for Clustering

By: Mesbah, Seyedsadra, Department of Computer Science, Lakehead University, December 2013.


Presentation Transcript


  1. Density Biased Sampling: An Improved Method for Clustering. By: Mesbah, Seyedsadra, Department of Computer Science, Lakehead University, December 2013

  2. Table of Contents • Abstract • Introduction • Density Biased Sampling • Related Works • Approximating Density Biased Sampling • Experiments • Methodology • Evaluation Metrics • Data Generation • Results • Conclusion • References

  3. Abstract • Purpose • Problem with Uniform Random Sampling • Undersampling / Oversampling • Weighted Sample • Memory Efficient

  4. Introduction • Uniform Sampling / No Value Consideration • Sets of Equivalent Records • Clustering in General • Reduce the Data Size • P-Uniform • Example • Density Biased Sample / Weighted Sample
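
The under/oversampling problem with uniform sampling can be quantified with a short calculation. This is a minimal sketch that assumes Bernoulli sampling (each record kept independently with rate M/N); the function name `uniform_sample_stats` and the example cluster sizes are hypothetical, chosen only to show how a small cluster is almost certainly missed.

```python
def uniform_sample_stats(cluster_sizes, sample_size):
    """Assumes Bernoulli uniform sampling at rate M/N (hypothetical helper).

    Returns, per cluster, (expected number of sampled points,
    probability that the cluster contributes no points at all).
    """
    total = sum(cluster_sizes)
    rate = sample_size / total
    stats = []
    for n in cluster_sizes:
        expected = n * rate              # expected sampled points from this cluster
        p_missed = (1.0 - rate) ** n     # every one of its n points is skipped
        stats.append((expected, p_missed))
    return stats


sizes = [100_000, 10_000, 1_000, 10]
for n, (exp_count, p_miss) in zip(sizes, uniform_sample_stats(sizes, 1_000)):
    print(f"cluster of {n:>6}: expect {exp_count:8.2f} points, P(missed) = {p_miss:.3f}")
```

The sample size is proportional to cluster size, so the 10-point cluster gets well under one point on average and is usually absent from the sample entirely.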

  5. Density Biased Sampling • Basic Definition • Constraints • Uniform Selection • Density Preserving Sample • Biased by Group Size / Sample Size M • Observations
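
The idea of a sample "biased by group size" with a target sample size M can be sketched as follows. This assumes the inclusion probability for a point in a group of size n_g has the form p_g = c · n_g^a, with c chosen so the expected sample size is M. Here a = 0 reduces to uniform sampling, a = -1 gives every group the same expected count, and values in between undersample dense groups. Each kept point carries weight 1/p_g so that density estimates from the sample stay unbiased. The names `density_biased_probs`, `density_biased_sample`, `group_of` and the default `a=-0.5` are illustrative assumptions, not the presentation's notation.

```python
import random
from collections import Counter


def density_biased_probs(group_sizes, sample_size, a=-0.5):
    """Per-group inclusion probability p_g = c * n_g**a (assumed form).

    c is chosen so that sum_g n_g * p_g == sample_size; clipping at 1.0
    can make the realized expected size slightly smaller.
    """
    c = sample_size / sum(n ** (1 + a) for n in group_sizes.values())
    return {g: min(1.0, c * n ** a) for g, n in group_sizes.items()}


def density_biased_sample(points, group_of, sample_size, a=-0.5, seed=0):
    """Return (point, weight) pairs; weight = 1 / inclusion probability."""
    rng = random.Random(seed)
    sizes = Counter(group_of(p) for p in points)
    probs = density_biased_probs(sizes, sample_size, a)
    sample = []
    for p in points:
        prob = probs[group_of(p)]
        if rng.random() < prob:
            sample.append((p, 1.0 / prob))
    return sample


def group_of(p):
    # Hypothetical grouping: a dense group of 1000 points, a sparse one of 100.
    return "dense" if p[0] < 1000 else "sparse"


points = [(i,) for i in range(1100)]
sample = density_biased_sample(points, group_of, sample_size=50, a=-0.5)
```

With a = -0.5 the sparse group's expected share rises to about 12 of the 50 points, versus roughly 4.5 under uniform sampling.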

  6. Related Works • Some Related Works • BIRCH Algorithm • Uniform Sampling vs. CF-Tree • DBS vs. PPS
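
BIRCH's CF-tree summarizes each subcluster by a clustering feature (N, LS, SS): the point count, the linear sum of the points, and the sum of their squared norms. This triple is additive, which is what lets BIRCH merge subclusters in one pass. The sketch below shows only the CF arithmetic (the centroid and the radius R = sqrt(SS/N - ||LS/N||^2)), not the tree itself; the class name `CF` is illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class CF:
    """BIRCH clustering feature: count, linear sum, sum of squared norms."""
    n: int
    ls: list
    ss: float

    @classmethod
    def of_point(cls, x):
        return cls(1, list(x), sum(v * v for v in x))

    def __add__(self, other):
        # CFs are additive: merging two subclusters just adds their triples.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        # Average distance of member points from the centroid, from (N, LS, SS) alone.
        c = self.centroid()
        return math.sqrt(max(0.0, self.ss / self.n - sum(v * v for v in c)))


cf = CF.of_point((0, 0)) + CF.of_point((2, 0))
```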

  7. Approximating DBS • Data Must Be Partitioned into Groups • Limited Memory • Two-Pass Algorithm • Sample of the First j Items • Conversion to a One-Pass Algorithm
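
A two-pass, memory-bounded approximation can be sketched by hashing grid cells into a fixed-size table of counters. Pass 1 counts points per bucket, allowing collisions between cells. Pass 2 keeps each point with a probability that depends on its bucket count. The grid binning, bin `width`, `table_size`, the exponent form c · n^a, and the helper names are all assumptions made for illustration. The one-pass conversion mentioned on the slide is not shown.

```python
import random


def grid_bin(point, width):
    """Map a point to its grid cell (hypothetical binning choice)."""
    return tuple(int(v // width) for v in point)


def hash_dbs_two_pass(points, sample_size, width=1.0, table_size=1024, a=-0.5, seed=0):
    """Two-pass hash-based density biased sample (illustrative sketch)."""
    # Pass 1: count points per hash bucket; distinct cells may collide.
    counts = [0] * table_size
    for p in points:
        counts[hash(grid_bin(p, width)) % table_size] += 1
    # Normalize so the expected sample size is sample_size (before clipping).
    c = sample_size / sum(n ** (1 + a) for n in counts if n > 0)
    # Pass 2: keep each point with probability c * n**a for its bucket count n.
    rng = random.Random(seed)
    sample = []
    for p in points:
        n = counts[hash(grid_bin(p, width)) % table_size]
        prob = min(1.0, c * n ** a)
        if rng.random() < prob:
            sample.append((p, 1.0 / prob))   # weight preserves density estimates
    return sample


pts = [(i % 50 / 10.0, i // 50 / 10.0) for i in range(2500)]
s = hash_dbs_two_pass(pts, sample_size=200)
```

Memory is bounded by `table_size` counters rather than by the number of distinct groups, at the cost of occasionally merging the counts of two cells that hash to the same bucket.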

  8. Experiments • Aim • Conditions

  9. Methodology • Experiment Specifications • BIRCH Summarization • Uniform Random Sampling • Hash-Based Approximation • Exact Density Biased Sampling

  10. Evaluation Metrics • RMS • RMS Error • Number of Clusters Found (NC)
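
The two metrics can be sketched under one plausible reading, which is an assumption since the slide gives only the names. RMS error is taken as the root-mean-square distance from each true cluster center to its nearest found center. NC is taken as the number of true clusters that have a found center within a tolerance `tol`. The function names and `tol` are hypothetical.

```python
import math


def rms_center_error(true_centers, found_centers):
    """RMS of each true center's distance to its nearest found center (assumed definition)."""
    sq = []
    for t in true_centers:
        d2 = min(sum((a - b) ** 2 for a, b in zip(t, f)) for f in found_centers)
        sq.append(d2)
    return math.sqrt(sum(sq) / len(sq))


def clusters_found(true_centers, found_centers, tol):
    """Count true clusters matched by some found center within tol (assumed definition of NC)."""
    return sum(1 for t in true_centers
               if min(math.dist(t, f) for f in found_centers) <= tol)


true_c = [(0, 0), (10, 0)]
found_c = [(0, 0), (10, 3)]
```

For these centers the RMS error is sqrt((0 + 9) / 2), and only one of the two true clusters counts as found with `tol=1.0`.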

  11. Data Generation • Based on a Mixture Model • Discard Noise • Cluster Membership Distributions • Example
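
A mixture-model generator with skewed cluster membership can be sketched as follows. Cluster sizes are assumed to follow a Zipf-like law, with size proportional to 1/i^z, and each cluster is assumed to be Gaussian around a random center. No noise points are generated. The parameters `z`, `spread`, `box` and the function names are hypothetical choices for illustration.

```python
import random


def zipf_cluster_sizes(total, k, z=1.0):
    """Split `total` points over k clusters with sizes proportional to 1 / i**z."""
    weights = [1.0 / (i ** z) for i in range(1, k + 1)]
    s = sum(weights)
    sizes = [int(total * w / s) for w in weights]
    sizes[0] += total - sum(sizes)   # give the rounding remainder to the largest cluster
    return sizes


def generate_mixture(total, k, dim=2, z=1.0, spread=1.0, box=100.0, seed=0):
    """Gaussian clusters around uniform random centers in [0, box)^dim; no noise points."""
    rng = random.Random(seed)
    centers = [tuple(rng.uniform(0, box) for _ in range(dim)) for _ in range(k)]
    points = []
    for center, n in zip(centers, zipf_cluster_sizes(total, k, z)):
        for _ in range(n):
            points.append(tuple(rng.gauss(c, spread) for c in center))
    return centers, points


centers, pts = generate_mixture(500, 3)
```

Larger `z` makes the size distribution more skewed, which is the setting where uniform sampling is most likely to miss the small clusters.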

  12. Results (1) • BIRCH performs quite poorly

  13. Results (2) • IBS and IRBS Find More Clusters

  14. Results (3) • In Average Case, IBS and IRBS Are Better

  15. Results (4) • Binning is ideal for IRBS

  16. Results (5) • Hash Collisions Have No Effect on Clustering

  17. Applications • Improve Summarizations • Statistical Models

  18. Conclusion • General Summary • Hash-Based Approximation • Appropriate Binning • Problem with Uniform Sampling • Using a Zipf Distribution

  19. References 1. Christopher R. Palmer, Christos Faloutsos, "Density Biased Sampling: An Improved Method for Data Mining and Clustering." 2. A. K. Jain et al., "Survey of Recent Clustering Techniques in Data Mining," International Journal of Computer Science and Management Research, Vol. 1, Issue 1, Aug. 2012, ISSN 2278-733X, 72.

  20. Thank you
