
A Classical Apriori Algorithm for Mining Association Rules



  1. A Classical Apriori Algorithm for Mining Association Rules

  2. What is an Association Rule? • Given a set of transactions {t1, t2, ..., tn} • where each transaction ti is a set of items {Xi1, …, Xim} • An association rule is an expression: • A ==> B • where A and B are sets of items, and A ∩ B = ∅ • Meaning: transactions which contain A also tend to contain B

  3. Two Thresholds • Measurement of rule strength in a relational transaction database: • A ==> B [support, confidence] • support(A ==> B) = P(A ∪ B) = (number of transactions containing A ∪ B) / (total number of transactions) • confidence(A ==> B) = P(B | A) = support(A ∪ B) / support(A)
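As a concrete illustration of these two measures, here is a minimal Python sketch; the toy transactions and the rule are made up for the example, not taken from the lecture:

```python
# Toy transactions, each a set of items (illustrative only).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"beer", "diapers"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(A, B, transactions):
    """confidence(A ==> B) = support(A ∪ B) / support(A)."""
    return support(A | B, transactions) / support(A, transactions)

A, B = {"bread", "butter"}, {"milk"}
print(support(A | B, transactions))    # 0.25 (1 of 4 transactions)
print(confidence(A, B, transactions))  # 0.5  (0.25 / 0.5)
```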

  4. Strong Rules • We are interested in strong associations, i.e., • support ≥ min_sup • & confidence ≥ min_conf • Examples: • bread & butter ==> milk [support=5%, confidence=60%] • beer ==> diapers [support=10%, confidence=80%]

  5. Mining Association Rules • Mining association rules from a large dataset of items can improve the quality of business decisions • For a supermarket with a large collection of items, typical business decisions include: • what to put on sale, • how to design coupons, • how to place merchandise on shelves to maximize profit, etc.

  6. Mining Association Rules (2) • There are two main steps in mining association rules: • 1. Find all combinations of items whose transaction support is above the minimum support (the frequent itemsets) • 2. Generate association rules from the frequent itemsets • Most existing algorithms focus on the first step because it requires a great deal of computation, memory, and I/O, and has a significant impact on overall performance
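Since the slides concentrate on step 1, here is a minimal sketch of step 2, assuming the frequent itemsets and their supports have already been collected into a dict; the function and parameter names are illustrative, not from the lecture:

```python
from itertools import combinations

def generate_rules(freq_supports, min_conf):
    """Step 2 sketch: derive rules A ==> B from frequent itemsets.

    freq_supports: dict mapping each frequent itemset (a frozenset)
    to its support. Every non-empty proper subset of an itemset is
    tried as the antecedent A; the remainder is the consequent B.
    """
    rules = []
    for itemset, supp in freq_supports.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                A = frozenset(antecedent)
                # confidence(A ==> B) = support(A ∪ B) / support(A)
                conf = supp / freq_supports[A]
                if conf >= min_conf:
                    rules.append((set(A), set(itemset - A), supp, conf))
    return rules
```

By the downward-closure property every antecedent A is itself frequent, so its support is guaranteed to be present in freq_supports. For example, generate_rules({frozenset("A"): 0.7, frozenset("B"): 0.9, frozenset("AB"): 0.6}, 0.8) keeps A ==> B (confidence ≈ 0.86) and drops B ==> A (confidence ≈ 0.67).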

  7. The Classical Mining Algorithm Apriori (Agrawal, et al. '94) • At the first iteration, scan all the transactions and count the number of occurrences of each item. This derives the frequent 1-itemsets, L1 • At the k-th iteration, the candidate set Ck is formed from the k-itemsets whose every (k-1)-item subset is in Lk-1 • Scan the database and count the number of occurrences of each candidate k-itemset • In total, it needs x database scans for x levels

  8. Moving 1 level at a time (Apriori) through an itemset lattice • (Figure: the itemset lattice drawn as levels 1, 2, 3, …, k, k+1, …, x; Apriori climbs it one level per database scan.)

  9. The Algorithm Apriori
  L1 = {frequent 1-itemsets}
  for (k = 2; Lk-1 ≠ ∅; k++) {
      Ck = Apriori_gen(Lk-1);
      for all transactions t in D do
          for all candidates c in Ck contained in t do
              c.count++;
      Lk = {c in Ck | c.count >= minimum support}
  }
  Result = ∪k Lk
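A runnable Python sketch of this level-wise loop follows; the function and variable names are my own, not from the slides, and the candidate generation of slides 10-11 is folded into a naive helper for brevity:

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise Apriori sketch: transactions are sets of items,
    min_count is the absolute minimum support count."""
    # Level 1: one scan to count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {c: n for c, n in counts.items() if n >= min_count}
    result = dict(L)

    k = 2
    while L:                          # one database scan per level
        Ck = candidates(set(L), k)
        counts = {c: 0 for c in Ck}
        for t in transactions:
            for c in Ck:
                if c <= t:
                    counts[c] += 1
        L = {c: n for c, n in counts.items() if n >= min_count}
        result.update(L)
        k += 1
    return result                     # union of all Lk, with counts

def candidates(Lk_minus_1, k):
    """Naive stand-in for Apriori_gen (see slides 10-11): keep every
    k-set whose (k-1)-subsets are all frequent."""
    items = sorted({i for s in Lk_minus_1 for i in s})
    return [frozenset(c) for c in combinations(items, k)
            if all(frozenset(s) in Lk_minus_1
                   for s in combinations(c, k - 1))]
```

Calling apriori with the slide-12 dataset and min_count = 2 reproduces L1 through L5 of the worked example below.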

  10. The Algorithm Apriori_gen
  Pre: all itemsets in Lk-1 • Post: candidate itemsets in Ck
  insert into Ck
  select p.item1, p.item2, …, p.itemk-1, q.itemk-1
  from Lk-1 p, Lk-1 q
  where p.item1 = q.item1, …, p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1
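In Python terms, this SQL-style join can be sketched as below, representing itemsets as sorted tuples; the function name is mine, not the slides':

```python
def apriori_join(Lk_minus_1):
    """Merge two frequent (k-1)-itemsets that agree on their first
    k-2 items, producing a k-itemset candidate. Itemsets are sorted
    tuples, e.g. ('A', 'B') for the 2-itemset AB."""
    Ck = set()
    prev = sorted(Lk_minus_1)
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            # p.item1 = q.item1, ..., p.item(k-2) = q.item(k-2),
            # and p.item(k-1) < q.item(k-1)
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                Ck.add(p + (q[-1],))
    return Ck
```

The strict < in the last condition ensures each candidate is generated exactly once rather than twice in both orders.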

  11. The prune step
  Pre: itemsets in Ck and Lk-1 • Post: itemsets in Ck such that every candidate c with some (k-1)-subset not in Lk-1 is deleted
  forall itemsets c ∈ Ck do
      forall (k-1)-subsets s of c do
          if (s ∉ Lk-1) then delete c from Ck
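The same step as a short Python sketch, using the sorted-tuple representation from the join sketch above:

```python
from itertools import combinations

def prune(Ck, Lk_minus_1, k):
    """Drop every candidate that has at least one infrequent
    (k-1)-subset. combinations() of a sorted tuple yields sorted
    subtuples, so membership tests against Lk-1 line up."""
    frequent = set(Lk_minus_1)
    return {c for c in Ck
            if all(s in frequent for s in combinations(c, k - 1))}
```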

  12. An Example — Input Dataset

  Tid | items
    1 | A B C
    2 | B C E
    3 | A B C E F
    4 | A B C D
    5 | A B C E
    6 | A B C E F
    7 | B C D E F
    8 | A B C
    9 | A C D E
   10 | B C E F

  minsup = 20% (a count of at least 2 of the 10 transactions) ==> L1 = {A, B, C, D, E, F}
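The same dataset in Python, together with the first scan that derives L1; variable names are illustrative:

```python
# The slide's dataset as Python sets, plus the L1 computation.
dataset = [
    {"A", "B", "C"}, {"B", "C", "E"}, {"A", "B", "C", "E", "F"},
    {"A", "B", "C", "D"}, {"A", "B", "C", "E"}, {"A", "B", "C", "E", "F"},
    {"B", "C", "D", "E", "F"}, {"A", "B", "C"}, {"A", "C", "D", "E"},
    {"B", "C", "E", "F"},
]
min_count = 2  # minsup = 20% of 10 transactions

counts = {}
for t in dataset:
    for item in t:
        counts[item] = counts.get(item, 0) + 1
L1 = sorted(i for i, n in counts.items() if n >= min_count)
print(L1)  # ['A', 'B', 'C', 'D', 'E', 'F']
```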

  13. An Example (2)
  C2 = {AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF, EF}
  After counting: C2 = {AB(6), AC(7), AD(2), AE(4), AF(2), BC(9), BD(2), BE(6), BF(4), CD(3), CE(7), CF(4), DE(2), DF(1), EF(4)}
  L2 = {AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, EF} (DF is dropped: its count of 1 is below the minimum of 2)

  14. An Example (3)
  C3 = {ABC, ABD, ABE, ABF, ACD, ACE, ACF, ADE, ADF, AEF, BCD, BCE, BCF, BDE, BDF, BEF, CDE, CDF, CEF}
  After pruning: C3 = {ABC, ABD, ABE, ABF, ACD, ACE, ACF, ADE, AEF, BCD, BCE, BCF, BDE, BEF, CDE, CEF} (ADF, BDF, CDF are deleted because their subset DF is not in L2)
  After counting: C3 = {ABC(6), ABD(1), ABE(3), ABF(2), ACD(2), ACE(4), ACF(2), ADE(1), AEF(2), BCD(2), BCE(6), BCF(4), BDE(1), BEF(4), CDE(2), CEF(4)}
  L3 = {ABC, ABE, ABF, ACD, ACE, ACF, AEF, BCD, BCE, BCF, BEF, CDE, CEF}
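This level can be reproduced with the hypothetical apriori_join() and prune() sketches given after slides 10-11, assuming those functions are in scope:

```python
# Reproduce C3 after pruning from L2 (itemsets as sorted tuples).
L2 = [tuple(s) for s in ["AB", "AC", "AD", "AE", "AF", "BC", "BD",
                         "BE", "BF", "CD", "CE", "CF", "DE", "EF"]]
C3 = prune(apriori_join(L2), L2, 3)
print(sorted("".join(c) for c in C3))
# 16 candidates remain: ADF, BDF, CDF were pruned since DF is not in L2.
```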

  15. An Example (4)
  C4 = {ABCE, ABCF, ABEF, ACDE, ACDF, ACEF, BCDE, BCDF, BCEF}
  After pruning: C4 = {ABCE, ABCF, ABEF, ACEF, BCEF} (ACDE, ACDF, BCDE, BCDF each contain an infrequent 3-subset)
  After counting: C4 = {ABCE(3), ABCF(2), ABEF(2), ACEF(2), BCEF(4)}
  L4 = {ABCE, ABCF, ABEF, ACEF, BCEF}

  16. An Example (5)
  C5 = {ABCEF}
  After counting: C5 = {ABCEF(2)}
  L5 = {ABCEF}
  The join on L5 yields no 6-itemset candidates, so the algorithm terminates here.
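The whole worked example can be cross-checked by brute force, assuming `dataset` and `min_count` from the slide-12 sketch are in scope; this enumerates every k-subset and keeps those appearing in at least min_count transactions:

```python
from itertools import combinations

# Brute-force verification of L1 .. L5 (slow, but independent of the
# Apriori machinery, so it makes a useful sanity check).
items = sorted(set().union(*dataset))
for k in range(1, len(items) + 1):
    Lk = ["".join(c) for c in combinations(items, k)
          if sum(set(c) <= t for t in dataset) >= min_count]
    if not Lk:
        break
    print(f"L{k}:", Lk)
```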

  17. Assignment 1 • Work: • Write a program that follows the Apriori algorithm to generate the frequent itemsets at each level of the itemset lattice • Data sets: • can be downloaded from the machine “angsila/~nuansri/310214” • Run with the following minimum support values: • xt10.data ==> minsup = 20%, 15%, and 10% • tr2000.data ==> minsup = 10%, 8%, and 5%

  18. Assignment 1 (2) • Due: • Monday, 15 September 2003 • Demonstrate the program and submit its documentation at room SD417 • Note: • For the same data set, the frequent itemsets at every level of the itemset lattice must be identical, regardless of who wrote the program or which data structures it uses • Therefore, every student can check the correctness of the number and values of their frequent itemsets on the same data sets against classmates
