
Mining Generalized Association Rules


Presentation Transcript


  1. Mining Generalized Association Rules R. Srikant & R. Agrawal (IBM) Presentation by: Colin Cherry

  2. Objectives • What are generalized association rules? • Why do we care? • How can we get them efficiently? • How can we reduce rule redundancy? • Is the efficient method any good?

  3. Motivation • Association rules find rules of the form: • X → Y, where X and Y are sets of items • What if there is structure over your items? • Structure can be used to generalize

  4. Hierarchy Example • Beverage → Soft Drink → Cola → {Pepsi, Coke, …} (other branches elided)

  5. Hierarchy Example • A second hierarchy over the same items: On Sale vs. Not On Sale, each with its own branches • Goal of this paper: • Given hierarchies over items: • Capture interesting rules at all levels of multiple hierarchies

  6. Simple Fix • Just add all ancestors of each item to each transaction. • {Coke, 7-up, ranch Doritos, bananas} would become: {Coke, 7-up, ranch Doritos, bananas, Doritos, cola, clear pop, soft drink, chips, junk food, fruit, produce}
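A minimal Python sketch of this simple fix, assuming the hierarchy is stored as a child-to-parent dictionary; the names parent, ancestors, and expand_transaction are illustrative, not from the paper:

```python
# Toy taxonomy as a child -> parent map (illustrative names from the slides).
parent = {
    "Coke": "cola", "Pepsi": "cola", "cola": "soft drink",
    "7-up": "clear pop", "clear pop": "soft drink",
    "ranch Doritos": "Doritos", "Doritos": "chips", "chips": "junk food",
    "bananas": "fruit", "fruit": "produce",
}

def ancestors(item):
    """Walk up the taxonomy and collect every ancestor of the item."""
    result = []
    while item in parent:
        item = parent[item]
        result.append(item)
    return result

def expand_transaction(transaction):
    """The 'simple fix': add all ancestors so plain Apriori can count them."""
    expanded = set(transaction)
    for item in transaction:
        expanded.update(ancestors(item))
    return expanded

print(sorted(expand_transaction({"Coke", "7-up", "ranch Doritos", "bananas"})))
```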

  7. Fix Cont’d • Run Apriori on expanded database • Redefine association rules. For X → Y, make sure: • X ∩ Y = {} • Y contains no ancestors of any item in X
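Continuing the sketch above (reusing the hypothetical ancestors helper), the redefined rule conditions might be checked like this:

```python
def is_valid_rule(X, Y):
    """Keep X -> Y only if X and Y are disjoint and Y contains no ancestor
    of any item in X (such a rule would be trivially true)."""
    if not X or not Y or (X & Y):
        return False
    ancestors_of_X = set()
    for item in X:
        ancestors_of_X.update(ancestors(item))
    return not (Y & ancestors_of_X)

print(is_valid_rule({"Coke"}, {"soft drink"}))  # False: soft drink is an ancestor of Coke
print(is_valid_rule({"Coke"}, {"chips"}))       # True
```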

  8. Problems with the fix • Counting may slow down • Total number of items & average transaction size will grow • Could get a lot of redundant rules • Milk → Cereal (70%) • Skim Milk → Cereal (70%) Do we care?

  9. An Efficient Algorithm • “Cumulate” • Filtering ancestors added to transactions • Hierarchy-aware itemset pruning • For more complicated, speculative algorithms, see paper

  10. Filtering Ancestors • Only add ancestors that are in at least one of the candidate itemsets • Not counting soft drink? Don’t add it. • Delete any items we are not counting • Not counting Doritos? Replace it with chips • Each iteration: pre-compute the ancestors for each item
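A rough sketch of these two optimizations, reusing the parent map and ancestors helper from the earlier sketch; candidate_items stands for the set of items that appear in some candidate itemset in the current pass, and all names are illustrative:

```python
def precompute_ancestors(all_items, candidate_items):
    """Once per pass: for each item, keep only ancestors that appear in
    at least one candidate itemset (no point adding 'soft drink' otherwise)."""
    return {item: [a for a in ancestors(item) if a in candidate_items]
            for item in all_items}

def transform(transaction, candidate_items, filtered_ancestors):
    """Drop items we are not counting, but keep their useful ancestors
    (e.g. 'ranch Doritos' not counted -> effectively replaced by 'chips')."""
    out = set()
    for item in transaction:
        if item in candidate_items:
            out.add(item)
        out.update(filtered_ancestors.get(item, []))
    return out

cand = {"Coke", "chips", "fruit"}
fa = precompute_ancestors(set(parent) | set(parent.values()), cand)
print(transform({"Coke", "ranch Doritos", "bananas"}, cand, fa))
# {'Coke', 'chips', 'fruit'}
```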

  11. Itemset Pruning • No sense counting both {coke, cola, chips} and {coke, chips}: their supports will always be the same • Prune {coke, cola} when counting candidates of size 2, and you’ll never have to deal with it again
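A sketch of this pruning step (again reusing the hypothetical ancestors helper): any candidate that contains both an item and one of its ancestors can be dropped once, during candidate generation:

```python
def prune_candidates(candidates):
    """Drop candidates that contain an item together with one of its ancestors;
    such itemsets always have the same support as the version without the ancestor."""
    return [c for c in candidates
            if not any(a in c for item in c for a in ancestors(item))]

print(prune_candidates([{"Coke", "cola"}, {"Coke", "chips"}]))
# [{'Coke', 'chips'}]
```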

  12. Reducing Redundancy • Milk → Cereal (8% sup, 70% conf) • Skim Milk → Cereal (2% sup, 70% conf) • If Skim Milk accounts for 1/4 of Milk sales, then the 2nd rule is redundant • Expected support and confidence (w.r.t. the hierarchy) will define interesting

  13. Close Ancestors • An itemset Z’ is an ancestor of Z if: • Z’ = Z with some items replaced by their ancestors • Z’ has the same number of items as Z • Z’ is a close ancestor of Z if: • No ancestor of Z has Z’ as an ancestor • Example: take {coke, bananas} as Z • Z’ = {cola, bananas} is a close ancestor • Z’ = {soft drink, bananas} is not close • Z’ = {cola, fruit} is not close

  14. Interestingness • A rule X → Y is interesting if, for all interesting close ancestors X’ → Y’: Sup(X ∪ Y) > R × ExpSup(X ∪ Y | X’ ∪ Y’) or: Conf(X → Y) > R × ExpConf(X → Y | X’ → Y’) • R is defined by the user
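A worked sketch of the support test, assuming the support of the close ancestor rule and the specialization ratios (e.g. Sup(Skim Milk)/Sup(Milk)) are already known; the numbers come from the Milk/Cereal example on slide 12, and the function names are illustrative:

```python
def expected_support(ancestor_support, specialization_ratios):
    """ExpSup(Z | Z'): scale Sup(Z') by Sup(item)/Sup(ancestor item) for each
    item of Z' that was specialized to get Z."""
    exp = ancestor_support
    for ratio in specialization_ratios:
        exp *= ratio
    return exp

def support_interesting(actual_support, ancestor_support, specialization_ratios, R):
    """Support is interesting if it exceeds R times the expected support."""
    return actual_support > R * expected_support(ancestor_support, specialization_ratios)

# Milk -> Cereal has 8% support; Skim Milk is 1/4 of Milk sales.
print(expected_support(0.08, [0.25]))                   # 0.02 expected for Skim Milk -> Cereal
print(support_interesting(0.02, 0.08, [0.25], R=1.1))   # False: exactly as expected, prune it
```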

  15. Putting it all together • #1 is interesting - has no ancestor • #2 is interesting - twice expected support • #3 is not interesting • Has exactly expected support according to closest ancestor (#2)

  16. Experiments • Lots of experiments on artificial data in paper. • We’ll look at the results of using Cumulate on real data • Compare to the quick fix - just adding in ancestors to transactions

  17. Supermarket

  18. Department Store

  19. Interestingness Results • Hierarchical interestingness pruning: • R = 25% resulted in pruning roughly 40% of the rules • R = 50% resulted in pruning roughly 50% of the rules • Pruning had a significant impact!

  20. Objectives Revisited • What are generalized association rules? • Rules aware of hierarchies over items • Why do we care? • Support can be low for individual items • How can we get them efficiently? • Cumulate algorithm - hierarchy aware counting • How can we reduce rule redundancy? • Check surprise with respect to ancestors • Is the efficient method any good? • Yep!

  21. Questions?

  22. Hierarchy Example • Impulse Fridge → Beverage → {Cans, Bottles, …} (other branches elided)

  23. Pros • Rules over items low in the tree may not have minimum support • Can raise min support • Shoot for fewer, more general rules • BUT: You can catch rules at any level of the hierarchy

  24. Data Sets • Supermarket: • 500,000 items • 1.5 million transactions • Hierarchy has 4 levels, 118 roots • Department Store: • 200,000 items • 500,000 transactions • Hierarchy has 7 levels, 89 roots

  25. Summary • Nothing ground-breaking in this paper • But, it provides a solid, efficient method for working with hierarchies • Generalization is a powerful tool to have available in association rules
