exploratory mining and pruning optimization of constrained associations rules n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Exploratory Mining and Pruning Optimization of Constrained Associations Rules PowerPoint Presentation
Download Presentation
Exploratory Mining and Pruning Optimization of Constrained Associations Rules

Loading in 2 Seconds...

  share
play fullscreen
1 / 22
angus

Exploratory Mining and Pruning Optimization of Constrained Associations Rules - PowerPoint PPT Presentation

109 Views
Download Presentation
Exploratory Mining and Pruning Optimization of Constrained Associations Rules
An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Exploratory Mining and Pruning Optimization of Constrained Associations Rules Data Engineering Lab 성 유진

  2. Abstract • Standpoint of supporting human-centered discovery of Knowledge • lack of user exploration and control • lack of focus • rigid notion of relationship • Constrained association queries • pruning using monotonicity, succinctness Data Engineering Lab 성 유진

  3. Introduction • Problem1 (Lack of User Exploration and Control) • Mining Process => Black Box • (user can’t preempt and needs to wait for hours) • establish clear breakpoints to allow user feedback • Problem2 (Lack of Focus) • on which to focus the mining  to find association between sets of items whose types do not overlap Data Engineering Lab 성 유진

  4.  associations from item sets whose total price is at least $1,000 • provide a rich interface for the user to express focus (CAQ) • Problem3 (Rigid notion of Relationship) • significance metrics : • separate criteria for selecting candidates for the antecedent and consequent:  association from items to sets of types pepsi => snacks Data Engineering Lab 성 유진

  5. Data Engineering Lab 성 유진

  6. Architecture • Phase 1 • user initially specifies CAQ • includes a set of constraints C • C is applicable to the antecedent and consequent • output: • pairs of candidates(Sa, Sc) • Sa, Sc have support over thresholds • user can add, delete, of modify the constraints as many times as desired Data Engineering Lab 성 유진

  7. Phase 2 • significance metric • a threshold for the metric • whatever further conditions to be imposed ont the antecedent and consequent  classical association mining - confidence (as significance metric) - confidence threshold - require ( SaSc) be frequent Data Engineering Lab 성 유진

  8. Data Engineering Lab 성 유진

  9. Constrained Association Queries • CAQ • S  Item : S is a set variable on the Item domain • {(S1, S2) |C}, C is a set of constraints on S1, S2 • frequent constraints freq(Si) • trans(TID, Itemset), iteminfo(Item, Type, Price) • S.price  100 : all items in S are of price less than of equal to $100 • {snacks, sodas}  S.Type Data Engineering Lab 성 유진

  10. CAQ Examples • {(S1, S2) | S1 Item & S2 Item & count(S1) = 1 & count(S2) = 1 & freq(S1) & freq(S2)} • S1.Type  S2.Type   and max(S1.Price) avg(S2.Price) • {(S1, S2) | agg1(S1.Price)  100& agg2(S2.Price 1000} • {(S1, S2) | S1.Type  {Snacks} & S2.Type  {beers} & max(S1.Price) min(S2.Price) • Sound/Complete • algorithm is sound if it only finds frequent sets that satisfy the given constraints • algorithm is complete if all frequent sets satisfying the given constraints are found Data Engineering Lab 성 유진

  11. Goal • to push the constraints as deeply as possible inside the computation of frequent set • classical algorithm + test them for constraint satisfaction => too inefficient • sound/complete : anti-monotone, succinctness Data Engineering Lab 성 유진

  12. Anti-Monotone Constraints • Find constraints which satisfy anti-monotone • prune away a significant num of candidates • Definition • A 1-var constraint C is anti-monotone iff for all sets S, S’: • S  S’ & S satisfies C  S’ satisfies C • Identify which constraints are anti-monotone • Fig3 • min(S)  v (anti-monotone) , min(S)  v (not ) Data Engineering Lab 성 유진

  13. Data Engineering Lab 성 유진

  14. Succinct Constraints • once-and-for-all (before any iteration takes place) • not generate and test paradigm • how to • succinctness • member generating functions • definition • SATc(Item) : the set of item sets satisfying C , pruned space • C1 S.Price 100 , pruned space for C1 contains only item sets such that each item in the set has a price at least $100 • selection predicate, p Data Engineering Lab 성 유진

  15. Data Engineering Lab 성 유진

  16. Example C1 S.Price 100 , let Item1 = price 100 (Item): • C1 is succinct because its pruned space SATc1(Item) is simply 2item1  C2 {snacks, sodas}  S.Type : Let Item2, Item3 , Item4 be the sets type = ‘snacks’(Item),type = ‘sodas’(Item) , type  ‘snacks’  type  ‘sodas’ (Item) • C2 is succint SATC2(Item) can be expressed as 2item - 2item2 - 2item3 - 2item4 - 2item2  item4 - 2item3  item4 Data Engineering Lab 성 유진

  17. Example • C1 S.Price 100, MGF = {X |X  Item1 & C   } • C2 {snacks, sodas}  S.Type, MGF = {X1 X2  X3| • X1  Item2 & X1   & X2 Item3 & X2   & X3 Item4} Data Engineering Lab 성 유진

  18. Algorithms • Algorithm Apriori+ • computes the frequent set => among frequent set, those which satisfy constraints become answer set • Algorithm Hybrid(m) • in case (C - Cfreq ) is more selective , apriori+ is inefficient • First check Cfreq for m iterations • to reduce the remaining I/O cost, it switches to checking (C- Cfreq) Data Engineering Lab 성 유진

  19. Data Engineering Lab 성 유진

  20. CAP algorithm • 4 Cases  succinct and Anti-monotone • Replace C1 in the Apriori Algorithm by C1c  succinct but not anti-monotone Data Engineering Lab 성 유진

  21.  Anti-monotone but Non-succinct • Define Ck as in apriori algorithm, drop the candidates S if S fails C • constraint satisfaction is tested before counting is done  neither • Induce any weaker constraint C’ from C, depending on whether C’ is anti-monotone and /or sucinct, use the above strategies • Once all frequent sets are generated, test them for satisfaction of C Data Engineering Lab 성 유진

  22. Data Engineering Lab 성 유진