
Data Clustering: 50 years beyond K-means

Presenter: Jiang-Shan Wang. Author: Anil K. Jain. 國立雲林科技大學 National Yunlin University of Science and Technology. Pattern Recognition Letters (PRL), 2010.


Presentation Transcript


  1. Data Clustering: 50 years beyond K-means • Presenter: Jiang-Shan Wang • Author: Anil K. Jain • 國立雲林科技大學 National Yunlin University of Science and Technology • PRL 2010

  2. Outline • Motivation • Objective • Data clustering • User’s dilemma • K-means • Extensions of K-means • Trends in data clustering • Summary • Comments

  3. Motivation • To provide a brief overview of clustering and to point out some emerging and useful research directions.

  4. Objective • Summarize well-known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some emerging and useful research directions.

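  5. Data clustering • Three main purposes: • Underlying structure: gain insight into the data, generate hypotheses, detect anomalies • Natural classification: identify the degree of similarity among forms or organisms • Compression: organize and summarize the data through cluster prototypes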

  6. K-means • Three user-specified parameters: • Number of clusters K • Cluster initialization • Distance metric
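The three user choices listed on the K-means slide can be seen in a minimal Lloyd-style sketch. It assumes points stored as tuples of floats, squared Euclidean distance as the default metric, and random sampling of k data points as the default initialization; the names `kmeans` and `sq_euclidean` are illustrative, not taken from the slides.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, the metric that standard K-means minimizes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int,
           init: Optional[List[Point]] = None,
           dist: Callable[[Sequence[float], Sequence[float]], float] = sq_euclidean,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Lloyd-style K-means exposing the three user choices: k, init, dist."""
    # Cluster initialization: use the given seeds, else sample k distinct data points.
    centroids = list(init) if init is not None else random.Random(seed).sample(points, k)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid under `dist`.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (the mean is the optimal center only for squared Euclidean distance).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(data, k=2, init=[(0.0, 0.0), (10.0, 10.0)])
print(labels, centroids)  # [0, 0, 1, 1] [(0.0, 0.5), (10.0, 10.5)]
```

Because the update step uses the mean, passing a non-Euclidean `dist` makes this a heuristic rather than true K-means; K-medoids handles that case by restricting centers to actual data points.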

  7. Extensions of K-means • Fuzzy C-means • Bisecting K-means • X-means • K-medoids • Kernel K-means
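Of the extensions listed, Fuzzy C-means is the most direct generalization: each point receives a membership degree in every cluster instead of a single hard label. The sketch below follows the standard Bezdek-style alternating updates and assumes a fuzzifier m = 2 (any m > 1 works); the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def memberships(points: Sequence[Point], centers: Sequence[Point], m: float) -> List[List[float]]:
    """u[i][j] = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1)); each row sums to 1."""
    U = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        zeros = d.count(0.0)
        if zeros:  # point coincides with a center: split membership among those centers
            U.append([1.0 / zeros if dj == 0.0 else 0.0 for dj in d])
        else:
            U.append([1.0 / sum((dj / dl) ** (2.0 / (m - 1.0)) for dl in d) for dj in d])
    return U


def weighted_centers(points: Sequence[Point], U: List[List[float]], m: float) -> List[Point]:
    """Each center is the mean of all points weighted by membership ** m."""
    centers = []
    for j in range(len(U[0])):
        w = [row[j] ** m for row in U]
        total = sum(w)
        centers.append(tuple(sum(wi * p[k] for wi, p in zip(w, points)) / total
                             for k in range(len(points[0]))))
    return centers


def fuzzy_c_means(points, init_centers, m=2.0, max_iter=100, tol=1e-9):
    """Alternate membership and center updates until the centers stop moving."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        U = memberships(points, centers, m)
        new_centers = weighted_centers(points, U, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return memberships(points, centers, m), centers


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
U, centers = fuzzy_c_means(data, init_centers=[(0.0, 0.0), (10.0, 10.0)])
for row in U:
    print([round(u, 3) for u in row])  # soft memberships, one row per point
```

Points far from both centers get split memberships, which is the behavior hard K-means cannot express.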

  8. User’s dilemma • Representation

  9. User’s dilemma • Purpose of grouping

  10. User’s dilemma • Number of clusters

  11. User’s dilemma • Cluster validity
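The figures for the validity slides did not survive extraction. As one illustration of an internal validity index (a hypothetical choice here, not necessarily the one the slides showed), the silhouette coefficient compares each point's average distance to its own cluster (a) with its average distance to the nearest other cluster (b). Comparing its value across candidate partitions is also one common heuristic for the number-of-clusters question; the function name is illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette s(i) = (b - a) / max(a, b) over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def mean_dist(i: int, c: int) -> float:
        # Average distance from point i to the other members of cluster c.
        d = [math.dist(points[i], q) for j, q in enumerate(points) if labels[j] == c and j != i]
        return sum(d) / len(d) if d else 0.0

    total = 0.0
    for i in range(len(points)):
        if labels.count(labels[i]) == 1:
            continue  # convention: a singleton cluster contributes s(i) = 0
        a = mean_dist(i, labels[i])  # cohesion within the point's own cluster
        b = min(mean_dist(i, c) for c in clusters if c != labels[i])  # nearest other cluster
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(silhouette(data, [0, 0, 1, 1]))  # close to 1: compact, well-separated clusters
print(silhouette(data, [0, 1, 0, 1]))  # negative: points lie closer to the other cluster
```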

  12. User’s dilemma • Comparing clustering algorithms

  13. User’s dilemma • Comparing clustering algorithms
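One simple way to quantify how similar the partitions produced by two algorithms are is the Rand index: the fraction of point pairs on which the two partitions agree. This is an illustrative choice, not necessarily the measure behind the slides' figures; in practice the adjusted Rand index, which corrects for chance agreement, is often preferred. The function name is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of point pairs that both partitions treat the same way
    (both 'same cluster' or both 'different clusters'); label names do not matter."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two labelings of the same n >= 2 points")
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: identical partitions, renamed labels
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # about 0.333: agreement on only 2 of 6 pairs
```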

  14. User’s dilemma • Admissibility analysis of clustering algorithms • Fisher and Van Ness’s criteria: convex, cluster proportion, cluster omission, monotone • Kleinberg’s criteria: scale invariance, richness, consistency
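Kleinberg's scale-invariance criterion says that multiplying all pairwise distances by a constant α > 0 must not change the output partition. The sketch below checks it on one small instance for two single-linkage variants: stopping at k clusters (which Kleinberg showed to be scale-invariant) and stopping once the next merge would exceed a fixed distance r (which is not). A single instance is an illustration, not a proof; all helper names are hypothetical.

```python
import itertools
import math
from typing import Callable, FrozenSet, Sequence, Set, Tuple

Point = Tuple[float, ...]
Partition = Set[FrozenSet[int]]


def single_link(points: Sequence[Point], stop: Callable[[int, float], bool]) -> Partition:
    """Kruskal-style single linkage: merge closest pairs until stop(n_clusters, edge_len)."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in itertools.combinations(range(len(points)), 2))
    n_clusters = len(points)
    for d, i, j in edges:
        if stop(n_clusters, d):
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            n_clusters -= 1
    groups: dict = {}
    for i in range(len(points)):
        groups.setdefault(find(i), set()).add(i)
    return {frozenset(g) for g in groups.values()}


def is_scale_invariant_on(points, stop, alpha: float) -> bool:
    """Scaling every coordinate by alpha scales every Euclidean distance by alpha."""
    scaled = [tuple(alpha * x for x in p) for p in points]
    return single_link(points, stop) == single_link(scaled, stop)


def stop_at_two_clusters(n_clusters: int, next_dist: float) -> bool:
    return n_clusters <= 2  # k-cluster stopping rule with k = 2


def stop_beyond_r2(n_clusters: int, next_dist: float) -> bool:
    return next_dist > 2.0  # distance-r stopping rule with r = 2


data = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
print(is_scale_invariant_on(data, stop_at_two_clusters, 10.0))  # True
print(is_scale_invariant_on(data, stop_beyond_r2, 10.0))        # False
```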

  15. Trends in data clustering • Clustering ensembles
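A clustering ensemble combines several partitions of the same data into one consensus partition. One well-known combination scheme is evidence accumulation (Fred and Jain), which records how often each pair of points is placed in the same cluster. The sketch below assumes the ensemble is given as label vectors and cuts the co-association graph at 0.5, which corresponds to a single-link cut on the distance 1 - co-association; the names and the threshold are illustrative.

```python
from typing import List, Sequence


def co_association(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """C[i][j] = fraction of ensemble members that put points i and j in the same cluster."""
    n = len(partitions[0])
    C = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    C[i][j] += 1.0 / len(partitions)
    return C


def consensus(partitions: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Link i and j when C[i][j] > threshold and return connected components as labels."""
    C = co_association(partitions)
    n = len(C)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]  # depth-first search over the thresholded co-association graph
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and C[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


ensemble = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]]
print(consensus(ensemble))  # [0, 0, 1, 1, 1]
```

Because only co-membership is counted, the members of the ensemble may use arbitrary label names and even different numbers of clusters.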

  16. Trends in data clustering • Semi-supervised clustering
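Semi-supervised clustering uses side information, most commonly pairwise must-link and cannot-link constraints. The sketch below follows the idea of COP-KMeans (Wagstaff et al.): each point goes to the nearest centroid that does not violate a constraint with points already assigned, and the run reports failure if no such centroid exists. It is a minimal illustration; the greedy assignment depends on point order, and the names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, ...]
Pair = Tuple[int, int]


def violates(i: int, c: int, assign: Dict[int, int],
             must: Sequence[Pair], cannot: Sequence[Pair]) -> bool:
    """True if putting point i in cluster c breaks a constraint with an already-assigned point."""
    for a, b in must:
        other = b if a == i else a if b == i else None
        if other is not None and other in assign and assign[other] != c:
            return True  # must-link partner sits in a different cluster
    for a, b in cannot:
        other = b if a == i else a if b == i else None
        if other is not None and assign.get(other) == c:
            return True  # cannot-link partner already occupies cluster c
    return False


def cop_kmeans(points, init, must=(), cannot=(), max_iter=50):
    """Constrained greedy assignment, then the ordinary K-means mean update (max_iter >= 1)."""
    centroids = [tuple(c) for c in init]
    assign: Dict[int, int] = {}
    for _ in range(max_iter):
        assign = {}
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest and take the first feasible one.
            for c in sorted(range(len(centroids)), key=lambda j: math.dist(p, centroids[j])):
                if not violates(i, c, assign, must, cannot):
                    assign[i] = c
                    break
            else:
                return None  # no feasible cluster for point i: report failure
        new = []
        for c in range(len(centroids)):
            members = [points[i] for i in assign if assign[i] == c]
            new.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else centroids[c])
        if new == centroids:
            break  # centroids stopped moving
        centroids = new
    return [assign[i] for i in range(len(points))], centroids


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, _ = cop_kmeans(pts, [(0.0, 0.0), (10.0, 10.0)], must=[(1, 2)], cannot=[(0, 1)])
print(labels)  # [0, 1, 1, 1]: the constraints override pure proximity for point 1
```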

  17. Trends in data clustering • Large-scale clustering approaches: • Efficient nearest-neighbor (NN) search • Data summarization • Distributed computing • Incremental clustering • Sampling-based methods
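Of the large-scale approaches listed, incremental clustering is the easiest to show in a few lines: a sequential (MacQueen-style) K-means that sees each point once and updates only its nearest centroid. This sketch assumes the k seeds are given; once a cluster absorbs its first point the seed is replaced, and afterwards each centroid is the running mean of the points it has absorbed. The class and method names are illustrative and do not refer to any particular library.

```python
import math
from typing import List, Sequence


class SequentialKMeans:
    """Single-pass K-means: each arriving point moves only its nearest centroid."""

    def __init__(self, init_centroids: Sequence[Sequence[float]]):
        self.centroids: List[List[float]] = [list(map(float, c)) for c in init_centroids]
        self.counts = [0] * len(self.centroids)  # points absorbed by each centroid so far

    def update(self, x: Sequence[float]) -> int:
        """Absorb one point and return the index of the cluster it joined."""
        j = min(range(len(self.centroids)), key=lambda c: math.dist(x, self.centroids[c]))
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]  # step size 1/n keeps the centroid a running mean
        self.centroids[j] = [c + eta * (xi - c) for c, xi in zip(self.centroids[j], x)]
        return j


model = SequentialKMeans([(0.0, 0.0), (10.0, 10.0)])
stream = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print([model.update(x) for x in stream], model.centroids)
# [0, 0, 1, 1] [[0.0, 0.5], [10.0, 10.5]]
```

Memory stays at k centroids and k counters regardless of stream length, which is the property that makes incremental methods attractive for large data; the price is sensitivity to the order in which points arrive.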

  18. Trends in data clustering • Multi-way clustering • Heterogeneous data: rank data, dynamic data, graph data, relational data

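  19. Summary • A suite of benchmark data (with ground truth) is needed to evaluate clustering methods. • Clustering algorithms need tighter integration with the needs of the application. • Most clustering methods end up as optimization problems, so computational issues are critical. • Stability or consistency: a good clustering principle should produce partitions that are stable under perturbations of the data. • Choose clustering principles according to how well they satisfy the stated axioms. • Develop semi-supervised clustering techniques.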

  20. Comments • Advantage: many figures aid understanding. • Drawback: … • Application: clustering.
