Mining Favorable Facets

##### Presentation Transcript

1. Mining Favorable Facets • Presenter: Wei-Hao Huang • Authors: Raymond Chi-Wing Wong, Jian Pei, Ada Wai-Chee Fu, Ke Wang • SIGKDD 2008

2. Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments

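3. Motivation • Dominance and skyline analysis are important in multi-criteria decision-making applications. • Skyline analysis usually assumes a fixed order on each attribute, but different customers may have different preferences on nominal attributes. • Goal: mine favorable facets, i.e., the preferences on nominal attributes under which a given object is in the skyline.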

4. Objectives • Propose the minimal disqualifying condition (MDC), which summarizes favorable facets in a form that is meaningful to the user. • Develop two algorithms: computing MDC on-the-fly (MDC-O) and a materialization method (MDC-M). • Use real and synthetic data sets to verify effectiveness and efficiency.

5. Methodology • Skyline Analysis • Naïve Method • Minimal Disqualifying Conditions (MDC) • MDC On-the-fly (MDC-O) • A Materialization Method (MDC-M)

6. Skyline Analysis
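The slide itself is a figure. As a reminder of the underlying notions, here is a minimal Python sketch, assuming smaller numeric values are better and that a customer's preference on a nominal attribute is a transitively closed set of (attribute index, better value, worse value) triples. Point q dominates p if q is no worse on every attribute and strictly better on at least one; the skyline keeps the points nothing dominates. The toy cars (one numeric price, one nominal make), the type aliases, and the function names are hypothetical, and the quadratic nested loop is for clarity only.

```python
from typing import FrozenSet, List, Tuple

# A point: (name, numeric values, nominal values); smaller numeric values are better.
Point = Tuple[str, Tuple[float, ...], Tuple[str, ...]]
# A preference: (nominal attribute index, better value, worse value) triples,
# assumed transitively closed so that it is a strict partial order per attribute.
Preference = FrozenSet[Tuple[int, str, str]]


def dominates(q: Point, p: Point, pref: Preference) -> bool:
    """True if q is no worse than p on every attribute and strictly better on one."""
    _, q_num, q_nom = q
    _, p_num, p_nom = p
    strictly_better = False
    for a, b in zip(q_num, p_num):
        if a > b:
            return False
        if a < b:
            strictly_better = True
    for i, (a, b) in enumerate(zip(q_nom, p_nom)):
        if a == b:
            continue
        if (i, a, b) not in pref:
            return False  # q's value is not preferred: incomparable or worse
        strictly_better = True
    return strictly_better


def skyline(points: List[Point], pref: Preference) -> List[str]:
    """Names of the points no point dominates (a point never dominates itself)."""
    return [p[0] for p in points if not any(dominates(q, p, pref) for q in points)]


cars: List[Point] = [
    ("a", (18,), ("T",)),
    ("b", (19,), ("H",)),
    ("f", (20,), ("M",)),
    ("c", (25,), ("M",)),
]
print(skyline(cars, frozenset()))                 # ['a', 'b', 'f']
print(skyline(cars, frozenset({(0, "T", "M")})))  # ['a', 'b']
```

With no preferences the makes are incomparable, so only c (same make as f, but pricier) is dominated. Adding the single preference "T over M" lets a dominate f, so f drops out.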

7. Naïve Method: Lattice Search
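The slide names the method only. A minimal sketch of a lattice search, assuming a single nominal attribute whose possible preferences are the strict partial orders on its values, ordered by set inclusion. Every node is tested, and the nodes where p stays in the skyline are kept as its favorable facets. The toy cars, the makes H, M, T, and all names are hypothetical. The number of partial orders grows very quickly with the number of values, which is what makes this naïve approach expensive.

```python
from itertools import chain, combinations, permutations
from typing import FrozenSet, Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]        # (better make, worse make)
Car = Tuple[str, float, str]  # (name, price, make); lower price is better

CARS: List[Car] = [("a", 18, "T"), ("b", 19, "H"), ("f", 20, "M"), ("c", 25, "M")]
MAKES = ["H", "M", "T"]


def closure(pairs: Iterable[Pair]) -> FrozenSet[Pair]:
    """Transitive closure of a set of preference pairs."""
    result = set(pairs)
    while True:
        new = {(x, z) for (x, y) in result for (y2, z) in result if y == y2}
        if new <= result:
            return frozenset(result)
        result |= new


def partial_orders(values: List[str]) -> Iterator[FrozenSet[Pair]]:
    """Every strict partial order on `values`: the nodes of the search lattice."""
    pairs = list(permutations(values, 2))
    for subset in chain.from_iterable(combinations(pairs, k) for k in range(len(pairs) + 1)):
        s = frozenset(subset)
        if closure(s) == s:  # transitively closed; a cycle would add some (x, x)
            yield s


def in_skyline(p: Car, data: List[Car], pref: FrozenSet[Pair]) -> bool:
    """True if no car in `data` dominates p under `pref`."""
    _, price, make = p
    for _, q_price, q_make in data:
        preferred = (q_make, make) in pref
        if q_price <= price and (q_make == make or preferred) and (q_price < price or preferred):
            return False  # q dominates p
    return True


def favorable_facets(p: Car, data: List[Car], values: List[str]) -> List[FrozenSet[Pair]]:
    """Naïve search: test p against every node of the lattice."""
    return [pref for pref in partial_orders(values) if in_skyline(p, data, pref)]


facets = favorable_facets(CARS[2], CARS, MAKES)
print(len(list(partial_orders(MAKES))), len(facets))  # 19 10
```

Even with three values there are 19 candidate preferences to test; here f is favorable under exactly those 10 that prefer neither T nor H to M.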

8. Minimal Disqualifying Conditions • Used to summarize favorable facets effectively. • Example: R′ = {(T, M)}, R″ = {(H, M)}, MDC(f) = {R′, R″}
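A short sketch of why MDCs can stand in for the full list of favorable facets. It assumes a pair (x, y) reads "x is preferred to y" and that dominance is monotone: adding preferences can only create new dominance, so every refinement of a disqualifying condition is also disqualifying. Under those assumptions, a preference is favorable for f exactly when its transitive closure contains none of f's MDCs. `MDC_F` transcribes the slide's R′ and R″. The function names and the fourth make "X" are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

Pair = Tuple[str, str]  # (better value, worse value)


def closure(pairs: Iterable[Pair]) -> FrozenSet[Pair]:
    """Transitive closure of a set of preference pairs."""
    result = set(pairs)
    while True:
        new = {(x, z) for (x, y) in result for (y2, z) in result if y == y2}
        if new <= result:
            return frozenset(result)
        result |= new


# The slide's example: R' = {(T, M)} and R'' = {(H, M)}.
MDC_F: List[FrozenSet[Pair]] = [frozenset({("T", "M")}), frozenset({("H", "M")})]


def is_favorable(pref: Iterable[Pair], mdcs: List[FrozenSet[Pair]]) -> bool:
    """pref keeps f in the skyline iff no MDC is contained in its closure."""
    full = closure(pref)
    return not any(cond <= full for cond in mdcs)


print(is_favorable({("M", "T"), ("M", "H")}, MDC_F))  # True
# "X" is a hypothetical fourth make; the closure implies T is preferred to M.
print(is_favorable({("T", "X"), ("X", "M")}, MDC_F))  # False
```

Two small conditions thus answer the question for every possible preference, including ones that only imply "T over M" through a chain.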

9. MDC-O: Computing MDC On-the-fly • Input: point p, data set D, template R • Output: MDC(p)
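The slide shows only the inputs and the output. Below is one way to compute MDC(p) on the fly. It is a sketch under our own assumptions, not the paper's exact MDC-O procedure:

- Every point q that quasi-dominates p on the totally ordered attributes contributes the set of nominal preferences q would need in order to dominate p.
- Pairs already in the template R are dropped from that set.
- q is skipped if the template already favors p's value on some attribute.
- MDC(p) keeps the minimal sets.

The template is assumed transitively closed, and the data and names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Point = Tuple[Tuple[float, ...], Tuple[str, ...]]  # (numeric values, nominal values)
Cond = FrozenSet[Tuple[int, str, str]]            # (attribute, better value, worse value)


def mdc_on_the_fly(p: Point, data: List[Point], template: Cond) -> List[Cond]:
    """Minimal conditions (beyond the template) under which some q dominates p.
    Smaller numeric values are better; the template is assumed transitively closed."""
    p_num, p_nom = p
    candidates = set()
    for q_num, q_nom in data:
        if any(a > b for a, b in zip(q_num, p_num)):
            continue  # q does not quasi-dominate p on the numeric attributes
        strict = any(a < b for a, b in zip(q_num, p_num))
        cond, blocked = set(), False
        for i, (a, b) in enumerate(zip(q_nom, p_nom)):
            if a == b:
                continue
            if (i, a, b) in template:
                strict = True        # template already prefers q's value
            elif (i, b, a) in template:
                blocked = True       # template prefers p's value: q can never dominate
                break
            else:
                cond.add((i, a, b))  # q needs "a preferred to b" on attribute i
        if blocked or not (strict or cond):
            continue                 # q cannot dominate p (or q ties p everywhere)
        candidates.add(frozenset(cond))
    # Keep only minimal conditions; an empty condition means p is never in the skyline.
    return [c for c in candidates if not any(o < c for o in candidates)]


data = [((18.0,), ("T",)), ((19.0,), ("H",)), ((20.0,), ("M",)), ((25.0,), ("M",))]
f = data[2]
print(sorted(sorted(c) for c in mdc_on_the_fly(f, data, frozenset())))
# [[(0, 'H', 'M')], [(0, 'T', 'M')]]
```

Only points inside p's quasi-dominance region are examined, so no lattice is enumerated. A template such as `{(0, "M", "T")}` would remove the T-based condition, because the toy T car could then never dominate f.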

10. MDC-M: A Materialization Method • Input: data set D, template R • Materialized: SKY(R) and the MDCs of its points
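The slide lists D, R, SKY(R), and MDC. Below is a sketch of the materialization idea as we read it, with the template taken to be empty for brevity. This is not necessarily how the paper organizes MDC-M.

- **Offline step:** keep only the points of SKY(R) and store each one's MDC. A point dominated under the template stays dominated under every refinement.
- **Online step:** answer a customer's preference by checking which stored MDCs it contains, without scanning D again.

The single nominal attribute, the data, and every name are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Car = Tuple[str, float, str]       # (name, price, make); lower price is better
Cond = FrozenSet[Tuple[str, str]]  # (better make, worse make)


def mdc(p: Car, data: List[Car]) -> List[Cond]:
    """MDC(p) for a single nominal attribute and an empty template."""
    _, price, make = p
    cands = set()
    for _, q_price, q_make in data:
        if q_price > price or (q_price == price and q_make == make):
            continue  # q is more expensive, or ties p everywhere
        # Same make: q already dominates; otherwise q needs q_make preferred to make.
        cands.add(frozenset() if q_make == make else frozenset({(q_make, make)}))
    return [c for c in cands if not any(o < c for o in cands)]


def materialize(data: List[Car]) -> Dict[str, List[Cond]]:
    """Offline: keep SKY(R) (here R is empty) and store each skyline point's MDC."""
    table = {p[0]: mdc(p, data) for p in data}
    # An empty condition means p is dominated under every preference: drop it.
    return {name: conds for name, conds in table.items() if frozenset() not in conds}


def query(table: Dict[str, List[Cond]], pref: Cond) -> List[str]:
    """Online: skyline under a transitively closed preference, by lookup only."""
    return sorted(name for name, conds in table.items()
                  if not any(c <= pref for c in conds))


CARS: List[Car] = [("a", 18, "T"), ("b", 19, "H"), ("f", 20, "M"), ("c", 25, "M")]
table = materialize(CARS)
print(query(table, frozenset()))                                      # ['a', 'b', 'f']
print(query(table, frozenset({("T", "H"), ("H", "M"), ("T", "M")})))  # ['a']
```

The trade-off is the usual one for materialization. The offline pass pays once for every skyline point, and each later preference is answered with subset tests on small stored conditions.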

11. Indexing for Speed-up • Use an R-tree index structure. • An R-tree can be built on the totally ordered attributes T. • To find the points that quasi-dominate p, a range search is conducted on the R-tree.
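A minimal sketch of that range search. It assumes "q quasi-dominates p" means q is no worse than p on every totally ordered attribute in T, with smaller values better. The quasi-dominators of p are then exactly the points inside the box from minus infinity to p, so the index only needs to visit nodes whose bounding rectangles intersect that box. `PackedRTree` is a hypothetical, single-level, bulk-loaded simplification, not a full R-tree with insertion and node splitting.

```python
from typing import List, Sequence, Tuple

Coords = Tuple[float, ...]
Box = Tuple[Coords, Coords]  # (lower corner, upper corner)


class PackedRTree:
    """One-level R-tree-like index: leaves of at most `capacity` points,
    each summarized by its minimum bounding rectangle (MBR)."""

    def __init__(self, points: Sequence[Coords], capacity: int = 4):
        pts = sorted(points)  # crude packing by the first coordinate
        self.leaves: List[Tuple[Box, List[Coords]]] = []
        for i in range(0, len(pts), capacity):
            chunk = pts[i:i + capacity]
            low = tuple(min(c) for c in zip(*chunk))
            high = tuple(max(c) for c in zip(*chunk))
            self.leaves.append(((low, high), chunk))

    def range_search(self, low: Coords, high: Coords) -> List[Coords]:
        """All points inside [low, high]; leaves whose MBR misses the box are skipped."""
        hits: List[Coords] = []
        for (m_low, m_high), chunk in self.leaves:
            if any(mh < l or ml > h for ml, mh, l, h in zip(m_low, m_high, low, high)):
                continue  # MBR and query box are disjoint: prune the whole leaf
            hits.extend(q for q in chunk
                        if all(l <= x <= h for x, l, h in zip(q, low, high)))
        return hits


def quasi_dominators(tree: PackedRTree, p: Coords) -> List[Coords]:
    """Points no worse than p on every totally ordered attribute (p itself included)."""
    low = tuple(float("-inf") for _ in p)
    return tree.range_search(low, p)


points = [(18, 3), (19, 1), (20, 2), (25, 2), (17, 5), (30, 1)]
tree = PackedRTree(points, capacity=2)
print(quasi_dominators(tree, (20, 2)))  # [(19, 1), (20, 2)]
```

Two of the three leaves are pruned by their MBRs alone. The result still contains p and any point tied with p on T, so a caller would compare the nominal attributes before treating a hit as a dominator.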

12. Experiments • Synthetic data set, parameters: dimension (numeric and nominal attributes), number of tuples, template size, cardinality of nominal attributes, Zipfian parameter • Real data sets: Nursery, Automobile
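The slides name a Zipfian parameter but do not describe the data generator. A hypothetical sketch of what such a parameter usually controls: nominal values are drawn with probability proportional to 1/k^s. With s = 0 the values are uniform, and larger s concentrates tuples on a few values. The uniform numeric attributes, the value encoding as integers, and all names are assumptions, not the authors' generator.

```python
import random
from collections import Counter


def zipf_values(cardinality: int, s: float, n: int, rng: random.Random) -> list:
    """Draw n nominal values 0..cardinality-1 with P(k) proportional to 1/(k+1)**s."""
    weights = [1.0 / (k + 1) ** s for k in range(cardinality)]
    return rng.choices(range(cardinality), weights=weights, k=n)


def synthetic_tuples(n: int, num_numeric: int, num_nominal: int,
                     cardinality: int, s: float, seed: int = 0) -> list:
    """Hypothetical generator: uniform numeric attributes in [0, 1), Zipf-skewed nominal ones."""
    rng = random.Random(seed)
    nominal_cols = [zipf_values(cardinality, s, n, rng) for _ in range(num_nominal)]
    return [
        (tuple(rng.random() for _ in range(num_numeric)),
         tuple(col[i] for col in nominal_cols))
        for i in range(n)
    ]


rows = synthetic_tuples(10_000, num_numeric=2, num_nominal=1, cardinality=5, s=1.0)
print(Counter(r[1][0] for r in rows).most_common(2))
```

With s = 1 and cardinality 5, value 0 should appear in roughly 44% of the tuples.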

13. Synthetic Data Set: Dimension (Numeric Attributes)

14. Synthetic Data Set: Dimension (Nominal Attributes)

15. Synthetic Data Set: Number of Tuples (500k to 1000k)

16. Synthetic Data Set: Template Size

17. Synthetic Data Set: Cardinality of Nominal Attributes

18. Real Data Sets • Nursery data set: 12,960 instances and 8 attributes; the performance results are similar to those on the synthetic data sets. • Automobile data set: computation times were negligibly small. • Car makes: Honda, Mitsubishi, and Toyota.

19. Conclusions • MDC is effective in summarizing favorable facets. • The experimental results show that the proposed methods are effective and efficient. • Future work: extending the approach to dynamic data and orderings is an interesting topic.

20. Comments • Advantages: the paper mines favorable facets, a problem that had not been studied before, and the mining is both effective and efficient. • Applications: information retrieval.