Mining Favorable Facets

Presentation Transcript

  1. Mining Favorable Facets • Presenter: Wei-Hao Huang • Authors: Raymond Chi-Wing Wong, Jian Pei, Ada Wai-Chee Fu, Ke Wang • SIGKDD, 2008

  2. Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments

  3. Motivation • Dominance and skyline analysis are important in multi-criteria decision-making applications. • Fixed order vs. individual preferences: different customers may have different preferences on nominal attributes. • Goal: finding favorable facets.

  4. Objectives • Propose the minimal disqualifying condition (MDC), which summarizes favorable facets and is meaningful to the user. • Develop two algorithms: • Computing MDC On-the-fly (MDC-O) • A Materialization Method (MDC-M) • Use real and synthetic data sets to verify effectiveness and efficiency.

  5. Methodology • Skyline analysis • Naïve Method • Minimal Disqualifying Conditions(MDC) • MDC On-the-fly (MDC-O) • A Materialization Method (MDC-M)

  6. Skyline analysis
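
As a minimal sketch of the skyline analysis named on this slide, the following Python computes the skyline over totally ordered (numeric) attributes. It assumes smaller values are better on every attribute; the function names and the sample points are hypothetical and do not come from the presentation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(q: Sequence[float], p: Sequence[float]) -> bool:
    """q dominates p: no worse on every attribute, strictly better on at least one.

    Assumption: smaller values are better on every attribute.
    """
    no_worse = all(a <= b for a, b in zip(q, p))
    strictly_better = any(a < b for a, b in zip(q, p))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates (quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (price, distance) pairs, for illustration only.
hotels = [(100, 5), (80, 7), (120, 3), (110, 6), (80, 9)]
print(skyline(hotels))
```

The quadratic scan is chosen for readability only; it is not one of the algorithms discussed in the presentation.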

  7. Naïve Method: Lattice Search

  8. Minimal Disqualifying Conditions • Used to summarize favorable facets effectively. • R’ = {(T, M)}, R’’ = {(H, M)}, MDC(f) = {(T, M), (H, M)}
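
One way to read the example on this slide is sketched below in Python. It rests on several assumptions that the slide itself does not spell out: each pair (x, y) is read as "x is preferred to y", each pair listed in MDC(f) is treated as its own single-pair condition, and a point is disqualified under a user preference when at least one of its conditions is fully contained in that preference. The preference is assumed to be given as a transitively closed set of pairs, and the function name is hypothetical.

```python
from typing import FrozenSet, Set, Tuple

Pair = Tuple[str, str]        # (x, y): "x is preferred to y" (assumed reading)
Condition = FrozenSet[Pair]   # one disqualifying condition = a set of pairs


def is_disqualified(mdc: Set[Condition], preference: Set[Pair]) -> bool:
    """True if some disqualifying condition is a subset of the preference.

    Assumption: `preference` is already transitively closed.
    """
    return any(cond <= preference for cond in mdc)


# MDC(f) from the slide, under the reading described above.
mdc_f = {frozenset({("T", "M")}), frozenset({("H", "M")})}

print(is_disqualified(mdc_f, {("T", "M")}))   # user prefers T over M
print(is_disqualified(mdc_f, {("M", "T")}))   # user prefers M over T
```

Under this reading, the favorable facets of a point are exactly the preferences that contain none of its conditions, which is why a small set of conditions can summarize them.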

  9. MDC-O: Computing MDC On-the-fly • Point: P • Data Set: D • Template: R • Process • MDC(P)
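
The slide gives only the inputs and the output of MDC-O, so the following Python is a rough sketch of what computing MDC(P) for a single point might look like, not the authors' algorithm. The assumptions are: smaller numeric values are better; a point q "quasi-dominates" p when q is no worse than p on every numeric attribute; the condition contributed by q is the set of nominal preferences that would let q dominate p; the template R is empty; and nominal values are distinct across attributes. All names and the sample data are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Pair = Tuple[str, str]                            # (x, y): "x is preferred to y"
Condition = FrozenSet[Pair]
Row = Tuple[Tuple[float, ...], Tuple[str, ...]]   # (numeric values, nominal values)


def quasi_dominates(q: Row, p: Row) -> bool:
    """Assumed definition: q is no worse than p on every numeric attribute."""
    return all(a <= b for a, b in zip(q[0], p[0]))


def mdc_on_the_fly(p: Row, data: List[Row]) -> Set[Condition]:
    """Collect one condition per quasi-dominating point, then keep the minimal ones."""
    candidates: Set[Condition] = set()
    for q in data:
        if q == p or not quasi_dominates(q, p):
            continue
        # Preferences needed on the nominal attributes for q to dominate p.
        cond = frozenset((qv, pv) for qv, pv in zip(q[1], p[1]) if qv != pv)
        candidates.add(cond)
    # A condition is minimal if no other candidate is a proper subset of it.
    return {c for c in candidates if not any(o < c for o in candidates)}


# Hypothetical packages: ((price, stops), (hotel group,)).
a = ((1600, 1), ("T",))
b = ((2400, 0), ("H",))
c = ((3600, 0), ("M",))
f = ((3000, 1), ("M",))
g = ((3500, 1), ("M",))
data = [a, b, c, f, g]

print(mdc_on_the_fly(f, data))
print(mdc_on_the_fly(g, data))
```

In this sketch an empty condition means the point is dominated regardless of the user's nominal preferences, so it makes every other condition redundant.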

  10. MDC-M: A Materialization Method • Data Set: D • Template: R • Process • SKY(R) • MDC

  11. Indexing for Speed-up • Use an R-tree index structure. • An R-tree can be built on the totally ordered attributes T. • To find the points that quasi-dominate p, a range search is conducted on the R-tree.
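
To illustrate what the range query on this slide asks for, the Python sketch below uses a plain linear scan as a stand-in for the R-tree; it shows the query semantics only and has none of the index's speed-up. It assumes smaller values are better, that "quasi-dominates" means "no worse on every totally ordered attribute", and that the query range is therefore the axis-aligned box from negative infinity up to p. The function names and points are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def range_search(points: List[Point],
                 lower: Sequence[float],
                 upper: Sequence[float]) -> List[Point]:
    """Return all points inside the box [lower, upper] (linear scan, not an R-tree)."""
    return [
        q for q in points
        if all(lo <= v <= hi for lo, v, hi in zip(lower, q, upper))
    ]


def quasi_dominators(points: List[Point], p: Point) -> List[Point]:
    """Points no worse than p on every totally ordered attribute (p itself excluded)."""
    lower = tuple(float("-inf") for _ in p)
    return [q for q in range_search(points, lower, p) if q != p]


# Hypothetical (price, stops) values on the totally ordered attributes.
pts = [(1600, 1), (2400, 0), (3600, 0), (3000, 1), (3500, 1)]
print(quasi_dominators(pts, (3000, 1)))
```

An actual R-tree would answer the same box query while pruning subtrees whose bounding rectangles fall outside the box.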

  12. Experiments • Synthetic Data Set • Dimension • Numeric attributes • Nominal attributes • Tuples • Template Size • Cardinality of Nominal Attributes • Zipfian Parameter • Real Data Set • Nursery • Automobile
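
The slide lists the Zipfian parameter and the cardinality of nominal attributes as experimental factors but does not describe the data generator. As a hedged illustration of how those two factors could interact, the Python sketch below draws nominal values with probability proportional to 1 / rank^theta. The function names, the seed, and the value labels are hypothetical and are not taken from the presentation.

```python
import random
from typing import List, Sequence


def zipf_weights(cardinality: int, theta: float) -> List[float]:
    """Normalized weights proportional to 1 / rank**theta for ranks 1..cardinality."""
    raw = [1.0 / (rank ** theta) for rank in range(1, cardinality + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_nominal(values: Sequence[str], theta: float, n: int, seed: int = 0) -> List[str]:
    """Draw n nominal values; earlier entries in `values` are more frequent when theta > 0."""
    rng = random.Random(seed)
    return rng.choices(list(values), weights=zipf_weights(len(values), theta), k=n)


groups = ["T", "H", "M"]            # hypothetical nominal domain
print(zipf_weights(len(groups), 1.0))
print(sample_nominal(groups, 1.0, 10))
```

With theta = 0 the weights are uniform; larger theta skews the data toward the first few values.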

  13. Synthetic Data Set: Dimension (numeric attributes)

  14. Synthetic Data Set: Dimension (nominal attributes)

  15. Synthetic Data Set: Tuples 500k -> 1000k

  16. Synthetic Data Set: Template Size

  17. Synthetic Data Set: Cardinality of Nominal Attributes

  18. Real Data Set • Nursery Data Set • There are 12,960 instances and 8 attributes. • The performance results are similar to those on the synthetic data sets. • Automobile Data Set • Computation times were negligibly small. • Honda, Mitsubishi and Toyota.

  19. Conclusions • MDC is effective in summarizing the favorable facets. • The experimental results show that the proposed methods are efficacious. • Extending the approach to dynamic data and ordering is an interesting topic for future work.

  20. Comments • Advantages • Finding favorable facets is a problem that has not been studied before. • The mining is effective and efficient. • Applications • Information retrieval