
Association Rule Mining



Presentation Transcript


  1. (Some material adapted from: Mining Sequential Patterns by Karuna Pande Joshi) Association Rule Mining

  2. An Example

  3. Terminology Transaction Item Itemset

  4. Association Rules Let U be a set of items and let X, Y ⊆ U, with X ∩ Y = ∅. An association rule is an expression of the form X → Y, whose meaning is: if the elements of X occur in some context, then so do the elements of Y.

  5. Quality Measures Let T be the set of all transactions. The following statistical quantities are relevant to association rule mining:
  • support(X) = |{t ∈ T : X ⊆ t}| / |T|, the percentage of all transactions containing itemset X
  • support(X → Y) = |{t ∈ T : X ∪ Y ⊆ t}| / |T|, the percentage of all transactions containing both itemsets X and Y
  • confidence(X → Y) = |{t ∈ T : X ∪ Y ⊆ t}| / |{t ∈ T : X ⊆ t}|, the percentage of transactions containing X that also contain Y, i.e., how good itemset X is at predicting itemset Y
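The two measures defined above can be computed directly. A minimal sketch in Python, using a small hypothetical transaction database (the data is illustrative, not from the slides):

```python
# Hypothetical transaction database: each transaction is a set of items.
transactions = [
    {"a", "d", "e"},
    {"b", "d"},
    {"a", "d", "e"},
    {"a", "c"},
    {"b", "c", "d"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """support(X ∪ Y) / support(X): how well X predicts Y."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

print(support({"a", "d"}, transactions))        # → 0.4 (2 of 5 transactions)
print(confidence({"a", "d"}, {"e"}, transactions))  # → 1.0 (every {a,d} also has e)
```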

  6. Learning Associations The purpose of association rule learning is to find “interesting” rules, i.e., rules that meet the following two user-defined conditions: support(X → Y) ≥ MinSupport and confidence(X → Y) ≥ MinConfidence.

  7. Itemsets Frequent itemset: an itemset whose support is greater than MinSupport, i.e., a high percentage of transactions contain the full itemset (denoted Lk, where k is the size of the itemset). Candidate itemset: a potentially frequent itemset (denoted Ck, where k is the size of the itemset).

  8. Basic Idea Generate all frequent itemsets satisfying the condition on minimum support. Build all possible rules from these itemsets and check them against the condition on minimum confidence. All the rules above the minimum confidence threshold are returned for further evaluation.

  9. AprioriAll (I)
  L1 ← ∅
  For each item Ij ∈ I
    count({Ij}) ← |{Ti : Ij ∈ Ti}|            (the number of transactions containing item Ij)
    If count({Ij}) ≥ MinSupport × m           (if this count is big enough, add the itemset and its count to L1)
      L1 ← L1 ∪ {({Ij}, count({Ij}))}
  k ← 2
  While Lk-1 ≠ ∅
    Lk ← ∅
    For each (l1, count(l1)) ∈ Lk-1
      For each (l2, count(l2)) ∈ Lk-1
        If l1 = {j1, …, jk-2, x} ∧ l2 = {j1, …, jk-2, y} ∧ x ≠ y
          l ← {j1, …, jk-2, x, y}
          count(l) ← |{Ti : l ⊆ Ti}|
          If count(l) ≥ MinSupport × m
            Lk ← Lk ∪ {(l, count(l))}
    k ← k + 1
  Return L1 ∪ L2 ∪ … ∪ Lk-1
  (Here m is the number of transactions, so MinSupport × m converts the support fraction into an absolute count.)
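The pseudocode above can be rendered in Python roughly as follows. This is a sketch, not the original authors' code: it represents itemsets as sorted tuples so the {j1, …, jk-2, x} / {j1, …, jk-2, y} join becomes a shared-prefix test, and it omits the refinement of pruning candidates whose (k-1)-subsets are infrequent:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Frequent itemsets with counts; min_support is a fraction of the m transactions."""
    tsets = [set(t) for t in transactions]
    min_count = min_support * len(tsets)   # MinSupport x m from the pseudocode

    # L1: frequent 1-itemsets, stored as 1-tuples.
    counts = {}
    for item in sorted({i for t in tsets for i in t}):
        c = sum(item in t for t in tsets)
        if c >= min_count:
            counts[(item,)] = c

    frequent = dict(counts)
    prev = sorted(counts)                  # L(k-1), as sorted tuples
    while prev:
        cur = {}
        for l1, l2 in combinations(prev, 2):
            # Join step: merge two (k-1)-itemsets sharing their first k-2 items.
            if l1[:-1] == l2[:-1]:
                cand = l1 + (l2[-1],)
                c = sum(set(cand) <= t for t in tsets)
                if c >= min_count:
                    cur[cand] = c
        frequent.update(cur)
        prev = sorted(cur)
    return frequent
```

For example, on five transactions with min_support = 0.4 (so min_count = 2), the run collects L1, L2, and L3 in one dictionary keyed by tuple.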

  10. Rule Generation • Look at the set {a,d,e} • It has six candidate association rules: • {a} → {d,e} confidence: {a,d,e} / {a} = 0.571 • {d,e} → {a} confidence: {a,d,e} / {d,e} = 1.000 • {d} → {a,e} confidence: {a,d,e} / {d} = 0.667 • {a,e} → {d} confidence: {a,d,e} / {a,e} = 0.667 • {e} → {a,d} confidence: {a,d,e} / {e} = 0.571 • {a,d} → {e} confidence: {a,d,e} / {a,d} = 0.800
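Enumerating all X → Y splits of a frequent itemset might look like this in Python. The support counts below are hypothetical values chosen to reproduce the slide's confidences (e.g. σ({a,d,e}) = 4 and σ({a}) = 7 give 4/7 ≈ 0.571); the slide does not state the underlying database:

```python
from itertools import combinations

def candidate_rules(itemset, support_counts):
    """All X → Y splits of `itemset` with their confidences.

    `support_counts` maps frozensets to support counts, assumed
    precomputed by a frequent-itemset pass such as Apriori."""
    items = frozenset(itemset)
    rules = []
    for r in range(1, len(items)):
        for x in combinations(sorted(items), r):
            x = frozenset(x)
            y = items - x
            rules.append((x, y, support_counts[items] / support_counts[x]))
    return rules

# Hypothetical support counts matching the slide's numbers.
counts = {
    frozenset("ade"): 4, frozenset("a"): 7, frozenset("d"): 6,
    frozenset("e"): 7, frozenset("ad"): 5, frozenset("ae"): 6,
    frozenset("de"): 4,
}
for x, y, conf in candidate_rules("ade", counts):
    print(sorted(x), "->", sorted(y), round(conf, 3))
```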

  11. Confidence-Based Pruning

  12. Rule Generation • Look at the set {a,d,e} again. Let MinConfidence = 0.800 • Candidate association rules to check: • {d,e} → {a} confidence: {a,d,e} / {d,e} = 1.000 • {a,e} → {d} confidence: {a,d,e} / {a,e} = 0.667 • {a,d} → {e} confidence: {a,d,e} / {a,d} = 0.800 • {d} → {a,e} confidence: {a,d,e} / {d} = 0.667 • Selected rules: • {d,e} → {a} and {a,d} → {e}
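The selection step on this slide is a threshold filter over the candidate rules. A small self-contained sketch, again using hypothetical support counts consistent with the slide's confidences:

```python
# Hypothetical support counts for the {a,d,e} example.
counts = {
    frozenset("ade"): 4, frozenset("de"): 4, frozenset("ae"): 6,
    frozenset("ad"): 5, frozenset("d"): 6,
}
MIN_CONFIDENCE = 0.800

# Candidate rules as (antecedent X, consequent Y) pairs.
candidates = [
    (frozenset("de"), frozenset("a")),
    (frozenset("ae"), frozenset("d")),
    (frozenset("ad"), frozenset("e")),
    (frozenset("d"), frozenset("ae")),
]

# Keep only rules with confidence(X -> Y) >= MinConfidence.
selected = [
    (x, y) for x, y in candidates
    if counts[x | y] / counts[x] >= MIN_CONFIDENCE
]
for x, y in selected:
    print(sorted(x), "->", sorted(y))   # {d,e} -> {a} and {a,d} -> {e}
```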

  13. Summary Apriori is a rather simple algorithm that discovers useful and interesting patterns. It is widely used, and it has been extended to create collaborative filtering algorithms that provide recommendations.

  14. References R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. 20th Int. Conf. on Very Large Data Bases (VLDB), 1994. R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules between Sets of Items in Large Databases,” Proc. 1993 ACM SIGMOD Int. Conf. on Management of Data. P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Pearson Education Inc., 2006, Chapter 6.
