Mining Association Rules with Constraints

Wei Ning Joon Wong COSC 6412 Presentation.

Presentation Transcript


  1. Mining Association Rules with Constraints Wei Ning Joon Wong COSC 6412 Presentation

  2. Outline • Introduction • Summary of Approach • Algorithm CAP • Performance Analysis • Conclusion • References

  4. Introduction • Recall mining association rules • Association rule mining finds interesting associations or correlations among a large set of data items.

  5. Some problems met when mining association rules • Overwhelming results? • Not what you want? • Long waits? • Lack of focus

  6. Introduction (cont.) • Example in Walmart • Suppose a manager wants to find the most popular shoes in winter.

  7. Outline • Introduction • Summary of Approach • Algorithm CAP • Performance Analysis • Conclusion • References

  8. Mining frequent itemsets vs. Mining association rules • Mining frequent itemsets is the core step of mining association rules: once the frequent itemsets are known, the rules can be generated from them directly.

  9. Constrained Mining • A naive solution • First find all frequent sets, and then test them for constraint satisfaction • Our approach: • Analyze the properties of constraints comprehensively • Push them as deeply as possible inside the frequent pattern computation.

  10. Frequent Itemsets & Constraints [Table: TDB, min_sup = 2] • Given a transaction database • Frequent itemset: a subset of items that appears frequently in transactions, e.g. {a, c} • Constraint: a predicate over itemsets • C(I): sum(I) > 50 • C(abd) = true
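
The two notions on this slide can be sketched in a few lines of Python. The slide's TDB table is not part of the transcript, so the database `tdb` and the item values in `value` below are hypothetical, chosen only so that {a, c} is frequent at min_sup = 2 and sum({a, b, d}) exceeds 50.

```python
# Hypothetical data: the slide's TDB table is not reproduced in the transcript.
tdb = [
    {"a", "b", "c", "d"},
    {"a", "c", "e"},
    {"b", "d"},
    {"a", "b", "d"},
]
value = {"a": 30, "b": 15, "c": 5, "d": 10, "e": 20}  # assumed item values
MIN_SUP = 2


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_frequent(itemset, transactions, min_sup=MIN_SUP):
    """An itemset is frequent if its support count reaches min_sup."""
    return support(itemset, transactions) >= min_sup


def c_sum(itemset):
    """Constraint C(I): sum(I) > 50, a predicate over itemsets."""
    return sum(value[i] for i in itemset) > 50


ac = frozenset("ac")
abd = frozenset("abd")
print(is_frequent(ac, tdb), c_sum(abd), c_sum(ac))
```

Frequency depends on the database, while the constraint depends only on the itemset and the item attributes, which is why the two can be checked independently.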

  11. Mining Frequent Itemsets With Constraints • Given • A transaction database TDB • A support threshold min_sup • A constraint C • Find the complete set of frequent itemsets satisfying the constraint • Use constraint to • Express user’s focus • Improve both effectiveness and efficiency

  12. Classification of Constraints • We have the following classification of constraints • Anti-monotone • Monotone • Succinct • Convertible • Convertible anti-monotone • Convertible monotone • Strongly convertible • Inconvertible

  13. Anti-Monotone • Definition 1 (Anti-Monotone): A 1-var constraint C is anti-monotone if for all sets S, S’: S ⊇ S’ and S satisfies C ⟹ S’ satisfies C. • Simply put, when an itemset S violates the constraint, so does every superset of S.

  14. Is min(S) ≥ v anti-monotone? • S = {5, 10, 14}, v = 7 • Constraint: min(S) ≥ 7 • {5} violates it. • Supersets of {5}: {5, 10}, {5, 14}, {5, 10, 14} • So do {5, 10}, {5, 14}, {5, 10, 14}, since each still has minimum 5. • min(S) ≥ v is anti-monotone
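
The check on this slide can be run exhaustively over the three items. The sketch below assumes the constraint is read as min(S) ≥ v (the reading under which {5} violates it for v = 7); the helper name `violates_anti_monotonicity` is made up for illustration. Passing the check on one small domain illustrates the property but does not prove it.

```python
from itertools import combinations


def min_at_least(s, v):
    """Constraint min(S) >= v."""
    return min(s) >= v


def violates_anti_monotonicity(items, constraint):
    """Search for a pair (small, big) with small a proper subset of big,
    where small violates the constraint but big satisfies it.
    Returns the pair if one exists, otherwise None."""
    subsets = [frozenset(c)
               for r in range(1, len(items) + 1)
               for c in combinations(items, r)]
    for small in subsets:
        for big in subsets:
            if small < big and not constraint(small) and constraint(big):
                return small, big
    return None


items = [5, 10, 14]
counterexample = violates_anti_monotonicity(items, lambda s: min_at_least(s, 7))
print(counterexample)  # no violating pair is found for min(S) >= 7

# For contrast, min(S) <= 7 is not anti-monotone on the same items.
reverse_case = violates_anti_monotonicity(items, lambda s: min(s) <= 7)
print(reverse_case is not None)
```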

  15. Succinct • Definition 2 (Succinct) • I ⊆ Item is a succinct set if it can be expressed as σ_p(Item) for some selection predicate p. • SP ⊆ 2^Item is a succinct powerset if there is a fixed number of succinct sets Item_1, …, Item_k ⊆ Item such that SP can be expressed in terms of the strict powersets of Item_1, …, Item_k, using union and minus. • Finally, a 1-var constraint C is succinct provided SAT_C(Item) is a succinct powerset.

  16. Succinct • General idea: we can enumerate all and only those sets that are guaranteed to satisfy the constraint. • If a constraint is succinct, we can directly generate precisely the sets that satisfy it.

  17. Succinct example • Itemset containing a or b • Itemset containing some item with value more than 30

  18. Succinct example • C1 ≡ Item.Price ≥ 100 • Item_1 = σ_{Item.Price ≥ 100}(Item) = {a, b} • 2^{Item_1} = {{a}, {b}, {a, b}} • SAT_{C1} = {{a}, {b}, {a, b}} • SAT_{C1} = 2^{Item_1} • C1 is succinct
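
A small sketch of why succinctness helps: the satisfying sets can be generated directly from a selection on the items, without testing every itemset. It assumes C1 is read as "every item in the set has a price of at least 100"; the price table is hypothetical and only arranged so that a and b are the selected items, as on the slide.

```python
from itertools import combinations

# Hypothetical prices; only a and b pass the selection, matching the slide.
price = {"a": 120, "b": 150, "c": 40, "d": 80}


def select(items, predicate):
    """Selection sigma_p(Item): the items that satisfy predicate p."""
    return {i for i in items if predicate(i)}


def strict_powerset(items):
    """All non-empty subsets of `items`."""
    items = sorted(items)
    return {frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)}


def c1(itemset):
    """Assumed reading of C1: every item in the set has price >= 100."""
    return all(price[i] >= 100 for i in itemset)


item1 = select(price, lambda i: price[i] >= 100)
sat_c1 = strict_powerset(item1)  # generated directly, no testing needed

# Generate-and-test over all itemsets, for comparison only.
brute_force = {s for s in strict_powerset(price) if c1(s)}
print(sat_c1 == brute_force)
```

The direct route enumerates 3 sets here, while generate-and-test has to examine all 15 non-empty subsets of the four items.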

  19. Convertible • Convert tough constraints into anti-monotone or monotone ones by properly ordering the items

  20. Convertible • Definition: • R is an order of items • Convertible anti-monotone: whenever an itemset X satisfies the constraint, so does every prefix of X w.r.t. R

  21. Convertible example • Constraint C: avg(X) ≥ 25 • Order items in value-descending order: <a, f, g, d, b, h, c, e> • Itemset afd satisfies C • So do its prefixes a and af • Thus, under this order, C becomes anti-monotone!
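
The prefix argument can be checked numerically. The item values are not given in the transcript, so the `value` table below is an assumption, picked so that sorting by descending value reproduces the order <a, f, g, d, b, h, c, e> from the slide.

```python
# Assumed item values (not in the transcript); they reproduce the slide's order.
value = {"a": 40, "b": 0, "c": -20, "d": 10,
         "e": -30, "f": 30, "g": 20, "h": -10}

# R: items in value-descending order.
order = sorted(value, key=lambda i: value[i], reverse=True)


def avg_at_least(items, threshold=25):
    """Constraint avg(X) >= threshold."""
    return sum(value[i] for i in items) / len(items) >= threshold


def prefixes(itemset):
    """Non-empty prefixes of `itemset` when its items are listed in order R."""
    ordered = [i for i in order if i in itemset]
    return [ordered[:k] for k in range(1, len(ordered) + 1)]


afd_prefixes = prefixes({"a", "f", "d"})
print(order)
print([(p, avg_at_least(p)) for p in afd_prefixes])

# Without the order the constraint is not anti-monotone:
# {d} violates it although its superset {a, f, d} satisfies it.
print(avg_at_least(["d"]), avg_at_least(["a", "f", "d"]))
```

Because each prefix drops the smallest remaining values first, the average of a prefix can never fall below the average of the full itemset.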

  22. Commonly Used Constraints— A General Picture

  23. Optional Proof that min(S) ≥ v is Anti-monotone • According to the table, min(S) ≥ v is both anti-monotone and succinct. • I only prove anti-monotonicity here due to time limitations. • Something special…

  24. Constraint Classification [Figure: diagram of the constraint classes] • Monotone • Anti-monotone • Strongly convertible • Succinct • Convertible anti-monotone • Convertible monotone • Inconvertible

  25. Summary of Approach: Recapitulation • Covered the basic idea of mining frequent itemsets with constraints. • Introduced several important classes of constraints.

  26. Outline • Introduction • Summary of Approach • Algorithm CAP • Performance Analysis • Conclusion • References

  27. Algorithms • There are many algorithms for constraint-based association rule mining. • Algorithm Direct • Algorithm MultiJoins & Reorder • Algorithm Apriori† • Algorithm Hybrid(m) • Algorithm CAP (Main Focus)

  28. Design of Algorithm • Sound • An algorithm is sound provided it only finds frequent sets that satisfy the given constraints. • Complete • An algorithm is complete provided all frequent sets satisfying the given constraints are found.

  29. Algorithm Apriori† • Main idea : Use Apriori Algorithm to get the frequent item sets. Then apply the constraints on the item sets found. • Step 1) Apriori with Cfreq • Step 2) Apply C – Cfreq to get final Ans

  30. Algorithm Apriori† (Pseudocode)
      1. C1 consists of sets of size 1; k = 1; Ans = ∅;
      2. While (Ck not empty) {
         2.1 conduct db scan to form Lk from Ck;
         2.2 form Ck+1 from Lk based on Cfreq; k++; }
      3. For each set S in some Lk: Add S to Ans if S satisfies (C – Cfreq).
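
The pseudocode above can be turned into a short runnable sketch. This is one possible reading, not the presenters' implementation: the function name `apriori_dagger`, the four-transaction database `tdb`, and the constraint "S is a subset of {A, C, E}" are assumptions made for illustration.

```python
from itertools import combinations


def apriori_dagger(transactions, min_sup, constraint):
    """Run plain Apriori (frequency only), then filter by the constraint."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]       # step 1: C1
    levels = []                                        # L1, L2, ...
    while candidates:                                  # step 2
        k = len(candidates[0])
        # 2.1: one database scan to count the candidates of size k
        frequent = [c for c in candidates
                    if sum(1 for t in transactions if c <= t) >= min_sup]
        levels.append(frequent)
        # 2.2: join Lk with itself, keep sets whose k-subsets are all frequent
        freq_set = set(frequent)
        joined = {a | b for a in frequent for b in frequent
                  if len(a | b) == k + 1}
        candidates = [c for c in joined
                      if all(frozenset(s) in freq_set
                             for s in combinations(c, k))]
    # step 3: apply the remaining constraint only after mining is finished
    return [s for level in levels for s in level if constraint(s)]


# Hypothetical database and constraint (the slide's tables are not in the text).
tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
answer = apriori_dagger(tdb, 2, lambda s: s <= {"A", "C", "E"})
print(sorted("".join(sorted(s)) for s in answer))
```

All frequent sets are mined first, including ones such as {B, E} that can never satisfy the constraint, which is the wasted work that pushing constraints into the mining loop avoids.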

  31. The Apriori† Algorithm: An Example [Figure: Database TDB → 1st scan → C1 → L1 → C2 → 2nd scan → L2 → C3 → 3rd scan → L3]

  32. The Apriori† Algorithm: An Example (cont.) • Constraint: {A, C, E} ⊇ T.Item [Figure: Database TDB with L1, L2, L3]

  33. Algorithm CAP • Succinct and anti-monotone • Strategy I: Replace C1 in the Apriori algorithm by C1^C, the candidate sets of size 1 that satisfy C. • Anti-monotone but non-succinct • Strategy II: Define Ck as in the Apriori algorithm. Drop a set S ∈ Ck from counting if S fails C, i.e., constraint satisfaction is tested before counting is done.

  34. Algorithm CAP (cont.) • Succinct but non-anti-monotone • Strategy III: Too Complicated. To be discussed later… • Non-succinct & non-anti-monotone • Strategy IV: Induce any weaker constraint C1 from C. Depending on whether C1 is anti-monotone and/or succinct, use one of the strategies I-III above for the generation of frequent set.

  35. Algorithm CAP (Pseudocode)
      1. if Csam ∪ Csuc ∪ Cnone is non-empty, prepare C1 as indicated in Strategies I, III, and IV; k = 1;
      2. if Csuc is non-empty {
         2.1 conduct db scan to form L1 as indicated in Strategy III;
         2.2 form C2 as indicated in Strategy III; k = 2; }
      3. while (Ck not empty) {
         3.1 conduct db scan to form Lk from Ck;
         3.2 form Ck+1 from Lk based on Strategy III if Csuc is non-empty, and Strategy II for constraints in Cam; }
      4. if Cnone is empty, Ans = ∪ Lk. Otherwise, for each set S in some Lk, add S to Ans iff S satisfies Cnone.
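
A partial sketch of CAP, restricted to the two anti-monotone cases (Strategies I and II), is given below. Strategies III and IV are left out, and the function name `cap_anti_monotone`, its parameters `item_ok` and `set_ok`, and the database `tdb` are all hypothetical. The function also returns how many candidate sets were counted against the database, to make the saving visible.

```python
from itertools import combinations


def cap_anti_monotone(transactions, min_sup, item_ok=None, set_ok=None):
    """Sketch of CAP for anti-monotone constraints only.

    item_ok: per-item test for a succinct, anti-monotone constraint
             (Strategy I restricts C1 with it).
    set_ok:  itemset test for an anti-monotone, non-succinct constraint
             (Strategy II drops failing candidates before counting).
    Returns (answer, number of candidate sets counted).
    """
    item_ok = item_ok or (lambda i: True)
    set_ok = set_ok or (lambda s: True)
    items = sorted({i for t in transactions for i in t})
    # Strategy I: C1 contains only the items that satisfy the constraint.
    candidates = [frozenset([i]) for i in items if item_ok(i)]
    answer, counted = [], 0
    while candidates:
        k = len(candidates[0])
        # Strategy II: test the constraint before any counting is done.
        candidates = [c for c in candidates if set_ok(c)]
        counted += len(candidates)
        frequent = [c for c in candidates
                    if sum(1 for t in transactions if c <= t) >= min_sup]
        answer.extend(frequent)
        # Standard Apriori candidate generation for size k + 1.
        freq_set = set(frequent)
        joined = {a | b for a in frequent for b in frequent
                  if len(a | b) == k + 1}
        candidates = [c for c in joined
                      if all(frozenset(s) in freq_set
                             for s in combinations(c, k))]
    return answer, counted


# Hypothetical database; the constraint is "S is a subset of {A, C, E}".
tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
answer, counted = cap_anti_monotone(tdb, 2, item_ok=lambda i: i in {"A", "C", "E"})
_, counted_plain = cap_anti_monotone(tdb, 2)
print(sorted("".join(sorted(s)) for s in answer), counted, counted_plain)
```

The result is only sound and complete when the supplied constraint really is anti-monotone; for other constraints, pruning a failing candidate can lose supersets that would satisfy it.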

  36. The Algorithm CAP: An Example • Constraint: {A, C, E} ⊇ T.Item, with min support count = 2 • Question: Which strategy should we apply? [Table: Database TDB]

  37. The Algorithm CAP: An Example (Cont.) • Apply Strategy I!!! [Figure: Database TDB → 1st scan → C1 → L1 → C2 → 2nd scan → L2 → C3] • Note attached to C3: because {A, E} is pruned earlier

  38. Case 3: Succinct but not anti-monotone. Revisit… • Constraint: min(S) < 5 • Items: {1} {2} {3} {4} {5} {6} {7} {8} {9} {10}; items satisfying the constraint: {1} {2} {3} {4} • Apriori on {1} {2} {3} {4} generates {1,2}, {2,3}, …, {3,4}, …, {1,2,3,4} • Some possible frequent sets may be lost, e.g. {1,8}, {1,2,10} • **Information extracted from past presentation.

  39. Case 3: Succinct but not anti-monotone. Continue… • Algorithm Direct • Idea: Play it safe. Generate C^c_{k+1} from L^c_k × F, where F is the set of all frequent items. • Algorithm MultiJoins • Algorithm Reorder
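
The "play it safe" idea can be sketched as follows: every frequent set that already satisfies the constraint is extended by every frequent item, so sets such as {1, 8} are not lost. This is a minimal illustration of the idea rather than the full Algorithm Direct; the function name `direct`, the database `tdb`, and the choice of min(S) < 5 as the constraint are assumptions, and the sketch relies on the constraint being monotone (supersets of a satisfying set also satisfy it).

```python
def direct(transactions, min_sup, constraint):
    """Extend each frequent, constraint-satisfying k-set with frequent items."""
    def sup(s):
        return sum(1 for t in transactions if s <= t)

    items = {i for t in transactions for i in t}
    # F: all frequent items, whether or not they satisfy the constraint.
    frequent_items = {i for i in items if sup(frozenset([i])) >= min_sup}
    # L^c_1: frequent single items that satisfy the constraint.
    level = {frozenset([i]) for i in frequent_items
             if constraint(frozenset([i]))}
    answer = []
    while level:
        answer.extend(level)
        # C^c_{k+1} = L^c_k x F
        candidates = {s | {f} for s in level
                      for f in frequent_items if f not in s}
        level = {c for c in candidates
                 if constraint(c) and sup(c) >= min_sup}
    return answer


# Hypothetical database with numeric items; constraint min(S) < 5.
tdb = [{1, 8}, {1, 2, 8}, {1, 2, 10}, {1, 2, 8, 10}]
answer = direct(tdb, 2, lambda s: min(s) < 5)
print(sorted(sorted(s) for s in answer))
```

The price of safety is a larger candidate set at each level than Apriori's self-join would produce, since the join partner is the whole of F.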

  40. Outline • Introduction • Summary of Approach • Algorithm CAP • Performance Analysis • Conclusion • References

  41. Performance Analysis (Specification) • Programs written in C • Generate transactional databases using program from IBM Almaden Research Center • 100,000 records, domain of 1,000 items • Page size 4KB • SPARC-10 environment

  42. Performance Analysis (Terminology) • Speedup • Comparison of execution time between two algorithms. • Item Selectivity • x% of the items satisfy the constraints. • Support Threshold • *A low support threshold means more frequent sets to process.

  43. Performance Analysis • Note: Support threshold set at 0.5%. • For 10% selectivity, CAP runs 80 times faster than Apriori†! • For 30% selectivity, the speedup is about 10 times.

  44. Performance Analysis • Note: Item selectivity fixed at 30%. • As the support threshold goes up, the number of frequent itemsets goes down and Apriori† improves. • CAP is still at least 8 times faster.

  45. Performance Analysis • Each entry is of the form a/b • a is the number of frequent sets satisfying the constraint. • b is the total number of frequent sets. • For L4 with a support of 0.2%, Apriori† finds 1250 frequent sets, of which only the 8 that satisfy the constraint are found by CAP.

  46. Conclusion • The ideas of anti-monotonicity, succinctness, and convertibility are introduced in the paper. • Sound, complete, and efficient algorithms are introduced for constraint-based association rule mining.

  47. Reference • R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. KDD’97. • R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. SIGMOD’98. • J. Pei and J. Han. Can we push more constraints into frequent pattern mining? KDD’00.
