Association Rules (market basket analysis)

Presentation Transcript
1. Retail shops are often interested in associations between different items that people buy. Someone who buys bread is quite likely also to buy milk. A person who bought the book Database System Concepts is quite likely also to buy the book Operating System Concepts. Association information can be used in several ways; e.g., when a customer buys a particular book, an online shop may suggest associated books. Association rules: $\text{bread} \Rightarrow \text{milk}$; $\text{DB-Concepts}, \text{OS-Concepts} \Rightarrow \text{Networks}$. Left-hand side: antecedent; right-hand side: consequent. An association rule must have an associated population; the population consists of a set of instances. E.g., each transaction (sale) at a shop is an instance, and the set of all transactions is the population. Association Rules (market basket analysis)

2. Set of items: $I = \{I_1, I_2, \dots, I_m\}$. Transactions: $D = \{t_1, t_2, \dots, t_n\}$, $t_j \subseteq I$. Itemset: $\{I_{i_1}, I_{i_2}, \dots, I_{i_k}\} \subseteq I$. Support of an itemset: percentage of transactions that contain that itemset. Large (frequent) itemset: an itemset whose number of occurrences is above a threshold. Association Rule Definitions
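A minimal Python sketch of the two definitions on this slide, assuming each transaction is stored as a `frozenset` of item names (an illustrative choice; the slides do not specify a representation). `occurrences`, `support` and `is_large` are hypothetical helper names. Following the slide's wording, "large" compares the raw number of occurrences against the threshold and reads "above" as strictly greater than.

```python
from typing import FrozenSet, Iterable, List

def occurrences(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """Count the transactions t_j that contain every item of the itemset."""
    target = frozenset(itemset)
    return sum(1 for t in transactions if target <= t)

def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Support as a percentage of all transactions (0.0 for an empty database)."""
    if not transactions:
        return 0.0
    return 100.0 * occurrences(itemset, transactions) / len(transactions)

def is_large(itemset: Iterable[str], transactions: List[FrozenSet[str]], threshold: int) -> bool:
    """Large (frequent): number of occurrences strictly above `threshold`."""
    return occurrences(itemset, transactions) > threshold

# Hypothetical toy database D, for illustration only.
D = [frozenset({"bread", "milk"}), frozenset({"bread"}), frozenset({"milk", "beer"})]
print(support({"bread", "milk"}, D))
print(is_large({"bread"}, D, threshold=1))
```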

3. Association Rules Example: $I = \{\text{Beer}, \text{Bread}, \text{Jelly}, \text{Milk}, \text{PeanutButter}\}$

4. Association Rule (AR): an implication $X \Rightarrow Y$ where $X, Y \subseteq I$ and $X \cap Y = \emptyset$. Support of AR ($s$) $X \Rightarrow Y$: percentage of transactions that contain $X \cup Y$. Confidence of AR ($\alpha$) $X \Rightarrow Y$: ratio of the number of transactions that contain $X \cup Y$ to the number that contain $X$. Association Rule Definitions
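Both measures follow directly from the definitions. This sketch assumes transactions are Python `frozenset`s; `rule_support` and `rule_confidence` are hypothetical names, and they return fractions in [0, 1] rather than percentages. They raise an error when the rule violates $X \cap Y = \emptyset$, or when confidence is undefined because $X$ never occurs.

```python
from typing import FrozenSet, List, Set

Transactions = List[FrozenSet[str]]

def count_containing(items: Set[str], transactions: Transactions) -> int:
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)

def rule_support(x: Set[str], y: Set[str], transactions: Transactions) -> float:
    """s(X => Y): fraction of transactions that contain X ∪ Y."""
    if x & y:
        raise ValueError("X and Y must be disjoint")
    return count_containing(x | y, transactions) / len(transactions)

def rule_confidence(x: Set[str], y: Set[str], transactions: Transactions) -> float:
    """alpha(X => Y): #transactions containing X ∪ Y divided by #containing X."""
    if x & y:
        raise ValueError("X and Y must be disjoint")
    n_x = count_containing(x, transactions)
    if n_x == 0:
        raise ValueError("confidence is undefined: X occurs in no transaction")
    return count_containing(x | y, transactions) / n_x
```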

5. Association Rules Ex (cont’d)

6. Association Rules Ex (cont’d): Of the 5 transactions, 3 involve both Bread and PeanutButter, so the support of $\text{Bread} \Rightarrow \text{PeanutButter}$ is 3/5 = 60%. Of the 4 transactions that involve Bread, 3 also involve PeanutButter, so its confidence is 3/4 = 75%.
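The arithmetic can be reproduced with exact fractions. The transaction table from the original slides did not survive extraction, so the five transactions below are hypothetical: they are chosen only so that the counts match this slide (5 transactions, 4 containing Bread, 3 containing both Bread and PeanutButter).

```python
from fractions import Fraction
from typing import List, Set

# Hypothetical transactions (NOT the original slide's table); they only
# reproduce the counts quoted above.
D: List[Set[str]] = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]

def count(items: Set[str], transactions: List[Set[str]]) -> int:
    """Number of transactions containing all of `items`."""
    return sum(1 for t in transactions if items <= t)

x, y = {"Bread"}, {"PeanutButter"}
support = Fraction(count(x | y, D), len(D))          # 3/5
confidence = Fraction(count(x | y, D), count(x, D))  # 3/4
print(f"support(Bread => PeanutButter)    = {support} = {float(support):.0%}")
print(f"confidence(Bread => PeanutButter) = {confidence} = {float(confidence):.0%}")
```

Using `Fraction` keeps the 3/5 and 3/4 exact, so the printed percentages are 60% and 75%.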

7. Given a set of items $I = \{I_1, I_2, \dots, I_m\}$ and a database of transactions $D = \{t_1, t_2, \dots, t_n\}$ where $t_i = \{I_{i_1}, I_{i_2}, \dots, I_{i_k}\}$ and $I_{i_j} \in I$, the Association Rule Problem is to identify all association rules $X \Rightarrow Y$ with a minimum support and confidence (supplied by the user). NOTE: the support of $X \Rightarrow Y$ is the same as the support of $X \cup Y$. Association Rule Problem

8. Find large itemsets. Generate rules from frequent itemsets. Association Rule Algorithm (Basic Idea). This is the simple naïve algorithm; better algorithms exist.

9. Association Rule Algorithm • We are generally only interested in association rules with reasonably high support (e.g., support of 2% or greater) • Naïve algorithm • Consider all possible sets of relevant items. • For each set, find its support (i.e., count how many transactions purchase all items in the set). • Large itemsets: sets with sufficiently high support • Use large itemsets to generate association rules. • From itemset $A$ generate the rule $A - \{b\} \Rightarrow b$ for each $b \in A$. • Support of rule = $\text{support}(A)$. • Confidence of rule = $\text{support}(A) / \text{support}(A - \{b\})$
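A minimal sketch of the naïve algorithm as listed above, assuming `frozenset` transactions and thresholds given as fractions in [0, 1] (both assumptions; `naive_association_rules` is a hypothetical name). It enumerates all $2^m - 1$ nonempty itemsets, so it is only practical for a handful of items. Rules are generated only from large itemsets with at least two items, because for a singleton $A$ the antecedent $A - \{b\}$ would be empty; the slides do not say how to treat that case, so skipping it is an assumption.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, str, float, float]  # (A - {b}, b, support, confidence)

def naive_association_rules(
    transactions: List[Itemset], min_support: float, min_confidence: float
) -> Tuple[Dict[Itemset, float], List[Rule]]:
    n = len(transactions)
    if n == 0:
        return {}, []
    items = sorted(set().union(*transactions))

    # Consider every nonempty set of items and count the transactions containing it.
    counts: Dict[Itemset, int] = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            counts[s] = sum(1 for t in transactions if s <= t)

    # Large itemsets: support (fraction of n) at or above min_support.
    large = {s: c / n for s, c in counts.items() if c / n >= min_support}

    # From each large itemset A, generate A - {b} => b for each b in A.
    rules: List[Rule] = []
    for a, supp_a in large.items():
        if len(a) < 2:
            continue  # assumption: skip rules with an empty antecedent
        for b in sorted(a):
            lhs = a - {b}
            if counts[lhs] == 0:
                continue
            conf = counts[a] / counts[lhs]  # = support(A) / support(A - {b})
            if conf >= min_confidence:
                rules.append((lhs, b, supp_a, conf))
    return large, rules

# Hypothetical toy data, for illustration only.
D = [frozenset(t) for t in (
    {"Bread", "Jelly", "PeanutButter"}, {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"}, {"Beer", "Bread"}, {"Beer", "Milk"})]
large, rules = naive_association_rules(D, min_support=0.5, min_confidence=0.7)
for lhs, rhs, s, c in rules:
    print(sorted(lhs), "=>", rhs, f"support={s:.0%} confidence={c:.0%}")
```

Confidence is computed from integer counts rather than from the two support fractions; the ratio is the same, but this avoids dividing one rounded float by another.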

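10. From itemset $A$ generate the rule $A - \{b\} \Rightarrow b$ for each $b \in A$. • Support of rule = $\text{support}(A)$. • Confidence of rule = $\text{support}(A) / \text{support}(A - \{b\})$. Let's say itemset $A = \{\text{Bread}, \text{Butter}, \text{Milk}\}$. Then $A - \{b\} \Rightarrow b$ for each $b \in A$ includes 3 possibilities: $\{\text{Bread}, \text{Butter}\} \Rightarrow \text{Milk}$, $\{\text{Bread}, \text{Milk}\} \Rightarrow \text{Butter}$, $\{\text{Butter}, \text{Milk}\} \Rightarrow \text{Bread}$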
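Enumerating the candidate rules for this example is a short loop. A Python sketch; `rules_from_itemset` is a hypothetical helper name, and it only lists the rules, since computing their support and confidence needs a transaction database that this slide does not give.

```python
from typing import FrozenSet, Iterable, Iterator, Tuple

def rules_from_itemset(a: Iterable[str]) -> Iterator[Tuple[FrozenSet[str], str]]:
    """Yield (A - {b}, b) for each b in A, i.e. the rules A - {b} => b."""
    a = frozenset(a)
    for b in sorted(a):
        yield a - {b}, b

A = {"Bread", "Butter", "Milk"}
for lhs, rhs in rules_from_itemset(A):
    print(f"{{{', '.join(sorted(lhs))}}} => {rhs}")
```

This prints the three rules from the slide: `{Butter, Milk} => Bread`, `{Bread, Milk} => Butter` and `{Bread, Butter} => Milk`.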

11. Large Itemset Property: Any subset of a large itemset is large. Contrapositive: If an itemset is not large, none of its supersets are large. Apriori
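One common way to exploit this property is the level-wise Apriori search: build candidate k-itemsets only from frequent (k−1)-itemsets, and discard any candidate that has an infrequent (k−1)-subset before counting it. A minimal sketch, assuming `frozenset` transactions and an absolute occurrence threshold `min_count` (a hypothetical parameter, read here as "at least"). The join step simply unions pairs of frequent (k−1)-itemsets, which is simpler but less efficient than the prefix-based join usually described for Apriori.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]

def apriori(transactions: List[Itemset], min_count: int) -> Dict[Itemset, int]:
    """Return every itemset that occurs in at least `min_count` transactions."""

    def count(candidate: Itemset) -> int:
        return sum(1 for t in transactions if candidate <= t)

    # Level 1: frequent single items.
    level: Dict[Itemset, int] = {}
    for item in set().union(*transactions):
        n = count(frozenset([item]))
        if n >= min_count:
            level[frozenset([item])] = n
    frequent = dict(level)

    k = 2
    while level:
        prev = list(level)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune (the contrapositive): drop candidates with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        # Count the surviving candidates and keep the frequent ones.
        level = {}
        for c in candidates:
            n = count(c)
            if n >= min_count:
                level[c] = n
        frequent.update(level)
        k += 1
    return frequent
```

Only candidates that survive pruning are counted against the database, which is where the savings over the naïve algorithm come from.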

12. Large Itemset Property

13. Large Itemset Property: If B is not frequent, then none of the supersets of B can be frequent. If {ACD} is frequent, then all of its 2-item subsets ({AC}, {AD}, {CD}) must be frequent. If {ACD} is frequent, then all of its 1-item subsets ({A}, {C}, {D}) must be frequent.
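The subsets named on this slide can be listed mechanically, which is a quick way to double-check examples like this one: if {ACD} is frequent, every itemset in the list must be frequent as well. A small Python sketch; `proper_subsets` is a hypothetical helper that returns every nonempty proper subset, largest first, as sorted tuples of item names.

```python
from itertools import combinations
from typing import Iterable, List, Tuple

def proper_subsets(itemset: Iterable[str]) -> List[Tuple[str, ...]]:
    """All nonempty proper subsets of `itemset`, largest first."""
    items = sorted(itemset)
    return [c for k in range(len(items) - 1, 0, -1) for c in combinations(items, k)]

subsets = proper_subsets({"A", "C", "D"})
print(["".join(s) for s in subsets])  # ['AC', 'AD', 'CD', 'A', 'C', 'D']
```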

14. My Personal View of Association Rules: a vastly overstudied problem, of dubious utility

15. Student Presentations: Starting next week, students will be giving presentations. The presentation can be on the student project or on a paper chosen by the student (with my approval). The presentation should last 8 to 15 minutes. You need to tell me in advance how long the talk will be. You must email me the slides by midnight before the talk. There will be a signup sheet (topic and date) on my door tomorrow.

16. Tips for Giving a Good Talk. Winter 2003. Dr Eamonn Keogh, Computer Science & Engineering Department, University of California, Riverside, Riverside, CA 92521, eamonn@cs.ucr.edu. Modified from the notes of Edward R. Tufte, Craig S. Kaplan, Eamonn Keogh and others

17. Outline Advice on giving talks • General advice • Organization • Making clear overheads • Avoiding common pitfalls Conclusion

18. General Advice I • Show up early. You may have a chance to head off some technical or ergonomic problem. • Have a backup plan. If your lecture is based on a PowerPoint presentation, have overhead backups of each page. • Check out the room ahead of time. Before your talk, check out the room, and make sure it has everything you need.

19. General Advice II • Never apologize. Most people wouldn’t have noticed the issues for which you’re apologizing—and it just sounds lame. • Invest in a laser pointer. They are inexpensive, and are extremely useful. • Rehearse timing. This is the most common sin!!!

20. Overheads I • Use large fonts. Use the biggest fonts realistically possible; small fonts are hard to read. • Use highly contrasting colors. • Avoid busy backgrounds. Too much in the background makes the text hard to read.

21. Overheads II • Avoid using red text. Red text is often hard to read. • AVOID ALL CAPS! All caps look like you're shouting. …Include a good combination of words, pictures, and graphics. A variety keeps the presentation interesting

22. Overheads III • Be terse: write "Sales are up." rather than "The sales forecasts show an increase on the horizon." • Use bullets or numbered items appropriately, e.g. "Outline of our method: • Design • Implementation • Testing" and "Goals: • Ease of use • Reusability • Reliability"

23. Overheads IV • Begin with an introduction slide (who you are, why you are giving the talk, the title of the talk) • Next, give an outline ("roadmap"). For a short talk, you might want to combine this with the above • State your point (one simple slide) • Demonstrate your point (a few slides) • Review your point (one simple slide)

24. Overheads V • End with a slide that reviews the entire talk… • We introduced the TSP problem • We explained why it is an important problem • We explained why it is a hard problem • We introduced a new heuristic to solve TSP • We empirically demonstrated the utility of our approach • End “cleanly”, don’t fade away.

25. Overheads VI • Avoid using "standard" clipart, backgrounds, etc. I have seen this at least 20 times in conference presentations.

26. Overheads VII • Be careful with acronyms… [Figure: an example slide cluttered with undefined symbols and abbreviations such as Range_i, Diameter_i, C_max, C_min, R1, D1, R2, D2 and "Neighboring Unlabeled Token"]

27. Annoying Personal Habits I (This means you) • Playing with jewelry • Licking and/or biting your lips • Constantly adjusting your glasses • Popping the top of a pen • Playing with facial hair (men) • Playing with/twirling your hair (women)

28. Annoying Personal Habits II (This means you) • Jingling change in your pocket • Leaning against anything for support • Fillers: "ah", "um", and "and" • Starting every sentence with the same word • Sticky floor syndrome • Avoiding eye contact • Lack of enthusiasm. "Basically" and "essentially" seem to be the current favorites.

29. Conclusion • We have motivated the need for a high quality talk • We have seen various tips on creating high quality overheads • We have seen various hints on avoiding common pitfalls

30. Questions? Dr Eamonn Keogh, Computer Science & Engineering Department, University of California, Riverside, Riverside, CA 92521, eamonn@cs.ucr.edu