Determining the best K for clustering transactional datasets – A coverage density-based approach

Presenter: Lin, Shu-Han. Authors: Hua Yan, Keke Chen, Ling Liu, Joonsoo Bae. Data & Knowledge Engineering (DKE) 68 (2009) 28–48.


Presentation Transcript


  1. Determining the best K for clustering transactional datasets – A coverage density-based approach Presenter: Lin, Shu-Han Authors: Hua Yan, Keke Chen, Ling Liu, Joonsoo Bae Data & Knowledge Engineering (DKE) 68 (2009) 28–48

  2. Outline • Motivation • Objective • Methodology • Experiments • Conclusion • Comments

  3. Motivation • Boolean values • Cluster the transactional datasets, a kind of special categorical data • Time complexity: O(dmN² log N)

  4. Objectives • To design a method, ACTD (Agglomerative Clustering algorithm with Transactional-cluster-modes Dissimilarity), especially for transactional data • Instead of ACE (Agglomerative Categorical clustering with Entropy criterion) • Find best-K • More efficiently

  5. Methodology – Overview of SCALE (Sampling, Clustering structure Assessment, cLustering & domain-specific Evaluation) Agglomerative ACE ACTD BKPlot DMDI

  6. Methodology – ACTD Intra-cluster similarity N_k M_k in this case, only c is the transactional-cluster-mode • Coverage Density • Transactional-cluster-mode • A subset of items

  7. Methodology – ACTD Inter-cluster similarity • [0, .5] Transactional-cluster-mode dissimilarity Time complexity: O(dmN² log N) O(MN² log N)

  8. Methodology – DMDI Valleys, change dramatically

  9. Experiments – Performance

  10. Experiments – Quality

  11. Experiments – Quality on sample dataset With noise

  12. Conclusions The ACTD • The Coverage Density-based method is promising for transactional datasets • Faster • More stable than entropy-based method • The Agglomerative Hierarchical clustering algorithm and DMDI can help to find best-K

  13. Comments • Advantage • … • Drawback • … • Application • …
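Slide 6 names coverage density together with N_k and M_k, but the formula itself did not survive in the transcript. The sketch below is an illustration only. It assumes N_k is the number of transactions in cluster k, M_k is the number of distinct items in that cluster, and coverage density is the fraction of filled cells in the resulting N_k × M_k Boolean table. The function name `coverage_density` and the sample items are hypothetical and are not taken from the slides.

```python
from typing import Iterable, Set


def coverage_density(cluster: Iterable[Set[str]]) -> float:
    """Fraction of filled cells in the cluster's transaction-by-item table.

    Assumed definition (not stated on the slides):
        CD = S_k / (N_k * M_k)
    where S_k is the total number of item occurrences in the cluster.
    """
    transactions = [set(t) for t in cluster]
    n_k = len(transactions)                    # N_k: number of transactions
    m_k = len(set().union(*transactions))      # M_k: number of distinct items
    if n_k == 0 or m_k == 0:
        return 0.0                             # empty cluster: define CD as 0
    s_k = sum(len(t) for t in transactions)    # S_k: filled cells
    return s_k / (n_k * m_k)


# Hypothetical three-transaction cluster over items a, b, c, d.
sample_cluster = [{"a", "b", "c"}, {"a", "c"}, {"c", "d"}]
sample_cd = coverage_density(sample_cluster)   # 7 filled cells out of 3 * 4
```

Under this assumed definition, a cluster whose transactions all contain exactly the same items has coverage density 1, and the value falls as the transactions share fewer items.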
