Fast Algorithms for Mining Association Rules

Presentation Transcript


  1. Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant, VLDB '94. Presented by Kurt Partridge, CSE 590DB, Oct 4, 1999

  2. DB of "Basket Data"

     TID   Items
     100   1 3 4
     200   2 3 5
     300   1 2 3 5
     400   2 5

     Association rules: {1} => {3}, {2,3} => {5}, {2,5} => {3}

     Mining Association Rules -- association rule metrics: support (the fraction of transactions containing all of the rule's items) and confidence (the fraction of transactions containing the antecedent that also contain the consequent)
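
To make these metrics concrete, here is a small Python sketch (my own illustration, not code from the paper or the slides) that computes support and confidence for the three rules on this basket database:

```python
# The slide's example basket data: TID -> set of items.
TRANSACTIONS = {
    100: {1, 3, 4},
    200: {2, 3, 5},
    300: {1, 2, 3, 5},
    400: {2, 5},
}

def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    hits = sum(1 for items in TRANSACTIONS.values() if itemset <= items)
    return hits / len(TRANSACTIONS)

def confidence(antecedent, consequent):
    """support(antecedent ∪ consequent) / support(antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

for lhs, rhs in [({1}, {3}), ({2, 3}, {5}), ({2, 5}, {3})]:
    print(lhs, "=>", rhs,
          "support:", support(lhs | rhs),
          "confidence:", round(confidence(lhs, rhs), 2))
# {1} => {3} and {2,3} => {5} hold with 100% confidence at 50% support;
# {2,5} => {3} has 50% support and ~67% confidence.
```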

  3. General Strategy • Step I: Find all itemsets with minimum support (minsup) • Step II: Generate rules from minsup'ed itemsets

  4. Step I: Finding Minsup Itemsets • Key fact: Adding items to an itemset never increases its support • General Strategy: Proceed inductively on itemset size • Apriori Algorithm: 1. Base case: Begin with all minsup itemsets of size 1 (L1) 2. Without peeking at the DB, generate candidate itemsets of size k (Ck) from Lk-1 3. Remove candidate itemsets that contain unsupported subsets 4. Further refine Ck using the database to produce Lk (repeat steps 2-4 for increasing k until no frequent itemsets remain)
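
A minimal Python sketch of this level-wise loop, under two assumptions of my own: minsup is an absolute transaction count, and candidate generation simply unions pairs of frequent (k-1)-itemsets (a superset of the paper's prefix join); the prune step and the per-level database pass match the outline above:

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return every itemset supported by at least `minsup` transactions.
    `transactions` is a list of item sets; itemsets are frozensets."""
    # Base case: frequent 1-itemsets (L1).
    items = {i for t in transactions for i in t}
    frequent = {frozenset([i]) for i in items
                if sum(1 for t in transactions if i in t) >= minsup}
    all_frequent = set(frequent)
    k = 2
    while frequent:
        # Candidate generation without looking at the DB (simplified join).
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune candidates that contain an unsupported (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        # One database pass to count the survivors and form Lk.
        frequent = {c for c in candidates
                    if sum(1 for t in transactions if c <= t) >= minsup}
        all_frequent |= frequent
        k += 1
    return all_frequent

# e.g. apriori([{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}], minsup=2)
# yields, among others, the frequent 3-itemset {2, 3, 5}
```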

  5. Algorithm to Guess Itemsets • Naïve way: • Extend all itemsets with all possible items • More sophisticated: • Join Lk-1 with itself, adding only a single, final item • e.g.: {1 2 3}, {1 2 4}, {1 3 4}, {1 3 5}, {2 3 4} produces {1 2 3 4} and {1 3 4 5} • Remove itemsets with an unsupported subset • e.g.: {1 3 4 5} has an unsupported subset: {1 4 5} if minsup = 50% • Use the database to further refine Ck
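
A sketch of that join-and-prune step in Python (the name apriori_gen follows the paper's terminology; representing itemsets as sorted tuples is my own choice), run on the slide's example:

```python
from itertools import combinations

def apriori_gen(prev_frequent):
    """Candidate generation: join L(k-1) with itself on the first k-2 items,
    then prune candidates that have an infrequent (k-1)-subset."""
    prev = sorted(tuple(sorted(s)) for s in prev_frequent)
    prev_set = set(prev)
    k = len(prev[0]) + 1
    candidates = set()
    for a in prev:
        for b in prev:
            # Join: same first k-2 items, last item of b strictly larger.
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                c = a + (b[-1],)
                # Prune: every (k-1)-subset of c must already be frequent.
                if all(sub in prev_set for sub in combinations(c, k - 1)):
                    candidates.add(c)
    return candidates

# The slide's example L3:
L3 = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)]
print(apriori_gen(L3))   # -> {(1, 2, 3, 4)}
# The join also produced {1 3 4 5}, but it is pruned because its
# 3-subset {1 4 5} is not frequent (not in L3).
```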

  6. Example
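
The worked example itself did not survive in the transcript. As a stand-in (not the slide's content), a brute-force count over the basket data shows the frequent itemsets that an Apriori run with minsup = 50% would find level by level:

```python
from itertools import combinations

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
minsup = 2                                # 50% of 4 transactions
items = sorted(set().union(*transactions))

for k in range(1, len(items) + 1):
    level = [c for c in combinations(items, k)
             if sum(1 for t in transactions if set(c) <= t) >= minsup]
    if not level:
        break
    print(f"L{k}:", level)
# L1: [(1,), (2,), (3,), (5,)]
# L2: [(1, 3), (2, 3), (2, 5), (3, 5)]
# L3: [(2, 3, 5)]
```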

  7. Part II: Generating Rules • Key fact: Moving items from the antecedent to the consequent never changes support, and never increases confidence • Algorithm • For each itemset IS with minsup: • Find all minconf rules with a single consequent of the form (IS - L1 => L1) • Guess candidate consequents Ck by appending items from IS - Lk-1 to Lk-1 • Verify confidence of each rule IS - Ck => Ck using known itemset support values • Repeat, growing the consequents, until no candidates remain
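
A rough sketch of this consequent-growing step, assuming a precomputed table of itemset supports. Unlike the paper's exact procedure it extends every consequent that meets minconf individually rather than joining passing consequents, which prunes less but yields the same rules:

```python
def gen_rules(itemset, support, minconf):
    """Emit rules (antecedent, consequent, confidence) with X ∪ Y = itemset,
    growing consequents level by level: enlarging the consequent can only
    lower confidence, so failed consequents are never extended."""
    itemset = frozenset(itemset)
    rules = []
    consequents = {frozenset([i]) for i in itemset}   # single-item consequents
    while consequents:
        next_level = set()
        for c in consequents:
            antecedent = itemset - c
            if not antecedent:
                continue
            conf = support[itemset] / support[antecedent]
            if conf >= minconf:
                rules.append((antecedent, c, conf))
                # Only consequents that passed are worth extending.
                next_level.update(c | {i} for i in antecedent)
        consequents = next_level
    return rules

# Supports from the example DB (fractions of the 4 transactions):
support = {frozenset(k): v for k, v in {
    (2,): 0.75, (3,): 0.75, (5,): 0.75,
    (2, 3): 0.5, (2, 5): 0.75, (3, 5): 0.5,
    (2, 3, 5): 0.5,
}.items()}
for lhs, rhs, conf in gen_rules({2, 3, 5}, support, minconf=1.0):
    print(set(lhs), "=>", set(rhs), conf)
# finds the two 100%-confidence rules: {3, 5} => {2} and {2, 3} => {5}
```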

  8. Other Details • Itemset hash trees for subset testing • Buffering • Variations • Fewer database passes, itemsets from multiple iterations • AprioriTID -- exclude unnecessary database records • AprioriHybrid -- use either Apriori or AprioriTID • Future Work: • Multiple ISA Taxonomies • Constraints on rules (e.g. # of items)
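
The slides do not show the hash tree itself, so here is a rough sketch of the idea (bucket count, leaf capacity, and all names are arbitrary choices of mine, not the paper's): candidate itemsets live at leaves reached by hashing successive items, and a transaction only descends branches that its own items hash to.

```python
NUM_BUCKETS = 3       # arbitrary; the paper leaves the hash function open
LEAF_CAPACITY = 4     # split a leaf once it holds more candidates than this

class Node:
    def __init__(self):
        self.children = {}   # bucket -> Node, when this is an interior node
        self.itemsets = []   # candidate itemsets, when this is a leaf

def insert(node, itemset, depth=0):
    """Insert a sorted candidate itemset (a tuple), splitting full leaves."""
    if node.children:                               # interior: hash and descend
        child = node.children.setdefault(itemset[depth] % NUM_BUCKETS, Node())
        insert(child, itemset, depth + 1)
        return
    node.itemsets.append(itemset)                   # leaf: just store it
    if len(node.itemsets) > LEAF_CAPACITY and depth < len(itemset):
        stored, node.itemsets = node.itemsets, []   # overflow: push down a level
        for s in stored:
            child = node.children.setdefault(s[depth] % NUM_BUCKETS, Node())
            insert(child, s, depth + 1)

def contained(node, transaction, start=0, found=None):
    """Collect candidates in the tree that are subsets of `transaction`
    (a sorted tuple), visiting only branches the transaction hashes into."""
    if found is None:
        found = set()
    if not node.children:                           # leaf: test each candidate
        t = set(transaction)
        found.update(s for s in node.itemsets if set(s) <= t)
        return found
    for i in range(start, len(transaction)):        # interior: hash each item
        child = node.children.get(transaction[i] % NUM_BUCKETS)
        if child is not None:
            contained(child, transaction, i + 1, found)
    return found

# e.g. store the size-2 candidates of the example DB and probe one transaction:
root = Node()
for cand in [(1, 2), (1, 3), (1, 5), (2, 3), (2, 5), (3, 5)]:
    insert(root, cand)
print(contained(root, (2, 3, 5)))   # finds (2, 3), (2, 5) and (3, 5)
```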

  9. Subsequent Papers • Mining sequenced rules • Finding "interesting" rules • Efficiently handling long itemsets • Integration with query optimizers • Adjustments to handle dense/relational databases • Apply constraints to further filter association rules

  10. Questions • How are rules ranked? Do the minsup and minconf find interesting rules? Do they omit any interesting rules? • What about maximum support? • How well will this approach work for other problems (e.g. clustering, classification)?

  11. Apriori

  12. Apriori • Join operation • Subset filtering
