Fast Algorithms for Mining Association Rules

Presentation Transcript


  1. Fast Algorithms for Mining Association Rules Rakesh Agrawal Ramakrishnan Srikant Slides from Ofer Pasternak

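  2. Data Mining Seminar 2003 Introduction • Bar-code technology makes it possible to collect and store large amounts of sales (basket) data. • Mining association rules over basket data (1993) • Example rule: tires ∧ accessories ⇒ automotive service • Applications: cross-marketing, attached mailing. • The databases involved are very large.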

  3. Notation • Items: I = {i1, i2, …, im} • Transaction: a set of items T ⊆ I • Items in each transaction are kept sorted in lexicographic order • TID: a unique identifier for each transaction

  4. Notation • Association rule: X ⇒ Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅

  5. Confidence and Support • The association rule X ⇒ Y has confidence c if c% of the transactions in D that contain X also contain Y. • The association rule X ⇒ Y has support s if s% of the transactions in D contain both X and Y (that is, X ∪ Y).
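A minimal Python sketch of the two definitions, assuming each transaction is a Python `frozenset` of items. The toy database `D` and the helper names `support` and `confidence` are hypothetical, and both functions return fractions rather than percentages.

```python
from typing import FrozenSet, List

# Hypothetical toy basket database D; each transaction is a set of items.
D: List[FrozenSet[str]] = [
    frozenset({"tires", "accessories", "service"}),
    frozenset({"tires", "accessories"}),
    frozenset({"tires", "service"}),
    frozenset({"accessories"}),
]

def support(itemset: FrozenSet[str], db: List[FrozenSet[str]]) -> float:
    """Fraction of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(x: FrozenSet[str], y: FrozenSet[str], db: List[FrozenSet[str]]) -> float:
    """Fraction of the transactions containing X that also contain Y."""
    sup_x = support(x, db)
    if sup_x == 0:
        return 0.0  # no transaction contains X, so the rule never applies
    return support(x | y, db) / sup_x

x = frozenset({"tires", "accessories"})
y = frozenset({"service"})
print(support(x | y, D))    # 0.25
print(confidence(x, y, D))  # 0.5
```

Here only one of the four transactions contains all three items (support 0.25), while two contain both tires and accessories, so the confidence of the rule is 0.25 / 0.5 = 0.5.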

  6. Define the Problem Given a set of transactions D, generate all association rules whose support and confidence are greater than the user-specified minimum support (minsup) and minimum confidence (minconf).

  7. Discovering all Association Rules • Find all large itemsets, i.e., itemsets with support above the minimum support. • Use the large itemsets to generate the rules.

  8. General idea • Say ABCD and AB are large itemsets. • Compute conf = support(ABCD) / support(AB). • If conf >= minconf, the rule AB ⇒ CD holds.
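The idea above can be sketched for a single large itemset as follows. This is a straightforward enumeration of every non-empty proper subset X of the itemset, not the paper's optimized rule-generation procedure. The function name `rules_from_itemset` and the toy transactions are illustrative assumptions; support counts are used instead of percentages, which leaves the confidence ratio unchanged.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]

def rules_from_itemset(large: Itemset, sup: Dict[Itemset, int],
                       minconf: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Return every rule X => Y with X ∪ Y == large, X and Y non-empty,
    whose confidence sup[large] / sup[X] is at least minconf."""
    rules = []
    for r in range(1, len(large)):  # r = size of the antecedent X
        for lhs in combinations(sorted(large), r):
            x = frozenset(lhs)
            conf = sup[large] / sup[x]
            if conf >= minconf:
                rules.append((x, large - x, conf))
    return rules

# Hypothetical toy transactions; support counts are computed for ABCD and all
# of its non-empty subsets (every subset of a large itemset is large too).
transactions = [set("ABCD"), set("ABCD"), set("AB"), set("ABC"), set("CD")]
abcd = frozenset("ABCD")
sup = {frozenset(s): sum(1 for t in transactions if set(s) <= t)
       for r in range(1, 5) for s in combinations(sorted(abcd), r)}

rules = rules_from_itemset(abcd, sup, minconf=0.5)
print([(sorted(x), sorted(y), c) for x, y, c in rules if x == frozenset("AB")])
# [(['A', 'B'], ['C', 'D'], 0.5)]
```

With these counts, support(ABCD) = 2 and support(AB) = 4, so AB ⇒ CD has confidence 0.5 and is kept at minconf = 0.5 but dropped at minconf = 0.6.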

  9. Discovering Large Itemsets • Multiple passes over the data • First pass: count the support of individual items. • Each subsequent pass: • Generate candidates using the large itemsets found in the previous pass. • Go over the data and count the actual support of the candidates. • Stop when no new large itemsets are found.

  10. The Trick • Any subset of a large itemset is large. • Therefore, to find the large k-itemsets: • Create candidates by combining large (k-1)-itemsets. • Delete those that contain any (k-1)-subset that is not large.
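The property behind the trick can be checked by brute force on a small database. This hypothetical sketch enumerates every itemset (exponential, so only usable on toy data), keeps the large ones, and verifies that every (k-1)-subset of each large k-itemset is also large. Checking only the (k-1)-subsets suffices, because by induction that covers all smaller subsets. The function names and the database are illustrative assumptions, and `minsup_count` is an absolute transaction count.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

def support_count(itemset: FrozenSet[int], db: List[FrozenSet[int]]) -> int:
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t)

def large_itemsets_brute_force(db: List[FrozenSet[int]],
                               minsup_count: int) -> Set[FrozenSet[int]]:
    """Enumerate every itemset over the database's items (toy sizes only)."""
    items = sorted(set().union(*db))
    return {frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)
            if support_count(frozenset(c), db) >= minsup_count}

def is_downward_closed(large: Set[FrozenSet[int]]) -> bool:
    """True if every (k-1)-subset of every large k-itemset is itself large."""
    return all(frozenset(sub) in large
               for s in large if len(s) > 1
               for sub in combinations(s, len(s) - 1))

db = [frozenset(t) for t in ([1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5])]
large = large_itemsets_brute_force(db, minsup_count=2)
print(is_downward_closed(large))  # True
```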

  11. Algorithm Apriori • Count item occurrences to find the large 1-itemsets (L1). • Generate the new candidate k-itemsets (Ck) from Lk-1. • Find the support of all the candidates. • Take only those with support over minsup (Lk), and repeat until no new large itemsets are found.
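A compact Python sketch of the level-wise loop, assuming itemsets are sorted tuples of integers and `minsup_count` is an absolute transaction count rather than a percentage. Candidate counting uses a naive scan of every candidate per transaction, standing in for the hash-tree subset function; the names `apriori_gen` and `apriori` borrow the paper's terminology, but the code is only an illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Itemset = Tuple[int, ...]  # items kept in sorted (lexicographic) order

def apriori_gen(prev_large: Set[Itemset], k: int) -> Set[Itemset]:
    """Candidate k-itemsets: join L_{k-1} with itself, then prune."""
    candidates = set()
    for p in prev_large:
        for q in prev_large:
            # Join: same first k-2 items, and p's last item < q's last item.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                c = p + (q[-1],)
                # Prune: every (k-1)-subset of c must be in L_{k-1}.
                if all(s in prev_large for s in combinations(c, k - 1)):
                    candidates.add(c)
    return candidates

def apriori(transactions: Iterable[Iterable[int]], minsup_count: int) -> Dict[Itemset, int]:
    """Return every large itemset together with its support count."""
    db = [tuple(sorted(set(t))) for t in transactions]
    # Pass 1: count individual item occurrences.
    counts: Dict[Itemset, int] = defaultdict(int)
    for t in db:
        for item in t:
            counts[(item,)] += 1
    large = {c: n for c, n in counts.items() if n >= minsup_count}
    result = dict(large)
    k = 2
    while large:  # stop when no new large itemsets are found
        candidates = apriori_gen(set(large), k)
        counts = defaultdict(int)
        for t in db:
            t_items = set(t)
            # Naive subset test (the slides use a hash tree instead).
            for c in candidates:
                if t_items.issuperset(c):
                    counts[c] += 1
        large = {c: n for c, n in counts.items() if n >= minsup_count}
        result.update(large)
        k += 1
    return result

db = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
result = apriori(db, minsup_count=2)
print({c: n for c, n in result.items() if len(c) == 3})  # {(2, 3, 5): 2}
```

On this toy database, item 4 is dropped after the first pass, four 2-itemsets survive the second, and the single 3-itemset {2 3 5} ends the loop because no 4-candidates can be joined from it.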

  12. Candidate generation • Join step: p and q are two large (k-1)-itemsets that are identical in their first k-2 items, and p's last item is smaller than q's. Join them by adding the last item of q to p. • Prune step: check all the (k-1)-subsets of each candidate, and remove any candidate that has a “small” (not large) subset.

  13. Example • L3 = { {1 2 3}, {1 2 4}, {1 3 4}, {1 3 5}, {2 3 4} } • After joining: { {1 2 3 4}, {1 3 4 5} } • After pruning: { {1 2 3 4} }, since {1 4 5} and {3 4 5} are not in L3, so {1 3 4 5} is deleted.
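The example can be reproduced step by step with a short sketch that keeps the join and prune steps separate, so both intermediate results from the slide are visible. The helper names `join` and `prune` are hypothetical, and itemsets are assumed to be sorted tuples.

```python
from itertools import combinations
from typing import Set, Tuple

Itemset = Tuple[int, ...]  # items in sorted order

def join(prev_large: Set[Itemset]) -> Set[Itemset]:
    """Join step: p, q agree on their first k-2 items and p's last item < q's."""
    return {p + (q[-1],)
            for p in prev_large for q in prev_large
            if p[:-1] == q[:-1] and p[-1] < q[-1]}

def prune(candidates: Set[Itemset], prev_large: Set[Itemset]) -> Set[Itemset]:
    """Prune step: drop candidates that have a (k-1)-subset outside L_{k-1}."""
    return {c for c in candidates
            if all(s in prev_large for s in combinations(c, len(c) - 1))}

L3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}
joined = join(L3)
print(sorted(joined))             # [(1, 2, 3, 4), (1, 3, 4, 5)]
print(sorted(prune(joined, L3)))  # [(1, 2, 3, 4)]
```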

  14. Correctness Show that Ck ⊇ Lk: • Any subset of a large itemset must also be large. • The join is equivalent to extending Lk-1 with all items and removing those whose (k-1)-subsets are not in Lk-1. • The join condition (p's last item is smaller than q's) prevents duplicates.
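The equivalence stated above can be spot-checked on the L3 from the previous example; this is a check on one input, not a proof. In this hypothetical sketch, `extend_and_filter` extends every itemset in Lk-1 with every other item and keeps the extensions whose (k-1)-subsets are all in Lk-1, while `join_list` builds the join as a list so that any duplicate candidates would remain visible. The item universe 1 to 6 is an assumption for illustration.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

Itemset = Tuple[int, ...]  # items in sorted order

def has_only_large_subsets(c: Itemset, prev_large: Set[Itemset]) -> bool:
    """True if every (k-1)-subset of candidate c is in L_{k-1}."""
    return all(s in prev_large for s in combinations(c, len(c) - 1))

def join_list(prev_large: Set[Itemset]) -> List[Itemset]:
    """Join step, returned as a list so duplicate candidates would stay visible."""
    return [p + (q[-1],)
            for p in sorted(prev_large) for q in sorted(prev_large)
            if p[:-1] == q[:-1] and p[-1] < q[-1]]

def extend_and_filter(prev_large: Set[Itemset], items: Iterable[int]) -> Set[Itemset]:
    """Extend each itemset in L_{k-1} with every other item, then filter."""
    item_list = list(items)
    extended = {tuple(sorted(p + (i,))) for p in prev_large for i in item_list if i not in p}
    return {c for c in extended if has_only_large_subsets(c, prev_large)}

L3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}
joined = join_list(L3)
join_then_prune = {c for c in joined if has_only_large_subsets(c, L3)}

print(len(joined) == len(set(joined)))                        # True: no duplicates
print(join_then_prune == extend_and_filter(L3, range(1, 7)))  # True
```

Extending by every item generates {1 2 3 4} several times (from {1 2 3}, {1 2 4}, {1 3 4}, and {2 3 4}) and relies on a set to collapse them, whereas the ordered join condition produces each candidate once.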

  15. Subset Function • Candidate itemsets Ck are stored in a hash-tree. • It finds in O(k) time whether a candidate itemset of size k is contained in transaction t. • Total time: O(max(k, size(t)))
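A minimal hash-tree sketch in the spirit of this slide, under stated assumptions: an interior node at depth d hashes on the item at position d of a candidate (counting from 0), leaves hold a short list of candidates and split when they overflow, and the subset lookup hashes on each remaining item of the sorted transaction. The class names `Node` and `HashTree` and the `n_buckets` and `leaf_size` parameters are illustrative choices, and the sketch does not try to match the complexity bounds quoted above.

```python
from typing import Dict, List, Optional, Set, Tuple

Itemset = Tuple[int, ...]  # candidate items in sorted order


class Node:
    def __init__(self, depth: int) -> None:
        self.depth = depth  # interior nodes hash on item position `depth`
        self.children: Optional[Dict[int, "Node"]] = None  # None means leaf
        self.itemsets: List[Itemset] = []  # candidates stored at a leaf


class HashTree:
    def __init__(self, k: int, n_buckets: int = 7, leaf_size: int = 3) -> None:
        self.k = k                  # size of every stored candidate
        self.n_buckets = n_buckets  # hypothetical fan-out of interior nodes
        self.leaf_size = leaf_size  # hypothetical split threshold for leaves
        self.root = Node(0)

    def _bucket(self, item: int) -> int:
        return hash(item) % self.n_buckets

    def insert(self, c: Itemset) -> None:
        self._insert(self.root, c)

    def _insert(self, node: Node, c: Itemset) -> None:
        # Walk down interior nodes, hashing on the item at each node's depth.
        while node.children is not None:
            b = self._bucket(c[node.depth])
            if b not in node.children:
                node.children[b] = Node(node.depth + 1)
            node = node.children[b]
        node.itemsets.append(c)
        # Split an overflowing leaf, unless all k item positions are used up.
        if len(node.itemsets) > self.leaf_size and node.depth < self.k:
            pending, node.itemsets, node.children = node.itemsets, [], {}
            for it in pending:
                self._insert(node, it)

    def subset(self, t: Itemset) -> Set[Itemset]:
        """Return the stored candidates contained in transaction t (sorted)."""
        found: Set[Itemset] = set()
        self._subset(self.root, t, 0, found)
        return found

    def _subset(self, node: Node, t: Itemset, start: int, found: Set[Itemset]) -> None:
        if node.children is None:
            # Leaf: test each stored candidate explicitly.
            t_items = set(t)
            found.update(c for c in node.itemsets if t_items.issuperset(c))
            return
        # Interior node: hash on every item of t from position `start` onward.
        for i in range(start, len(t)):
            child = node.children.get(self._bucket(t[i]))
            if child is not None:
                self._subset(child, t, i + 1, found)


tree = HashTree(k=3, leaf_size=2)
for c in [(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)]:
    tree.insert(c)
print(sorted(tree.subset((1, 2, 3, 4))))  # [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
```

Because a leaf can be reached along more than one path when items collide in the same bucket, the lookup collects results in a set; the explicit containment test at each leaf keeps the answer exact regardless of the hash function.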
