Market Basket Analysis and Advanced Data Mining

1. Market Basket Analysis and Advanced Data Mining Professor Amit Basu abasu@smu.edu

3. Examples Rule form: LHS → RHS. IF a customer buys diapers, THEN they also buy beer: diapers → beer. "Transactions that purchase bread and butter also purchase milk": bread ∧ butter → milk. Customers who purchase maintenance agreements are very likely to purchase large appliances. When a new hardware store opens, one of the most commonly sold items is toilet bowl cleaners.

4. Representations What's the difference between these patterns? (a) Risk = 0.3 * sin(numcards * dem1^0.25) + 0.83 * (pastdef - dem2) * cos(employed + dem1)^2 (b) Risk = 0.93 * priordefault + 0.23 * num_cards - 1.3 * employed - 0.734 (c) IF person has a good credit rating THEN they have fewer accidents

5. Evaluation Support: a measure of how often the items in an association occur together, as a percentage of all transactions. Example: in 2% of the purchases at a hardware store, both a pick and a shovel were bought. support = #tuples(LHS, RHS) / N. Confidence: the confidence of the rule "B given A" is a measure of how likely it is that B occurs when A has occurred; 100% means that B always occurs if A has occurred. confidence = #tuples(LHS, RHS) / #tuples(LHS). Example: bread and butter → milk [90%, 1%]. Rules originating from the same itemset have identical support but can have different confidence.
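The two formulas above can be sketched in a few lines of Python. The transactions and item names below are a hypothetical toy dataset, not data from the slides, and the helper names are illustrative only.

```python
# Hypothetical toy transactions; each transaction is a set of items.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"beer", "diapers"},
    {"bread", "butter", "milk", "beer"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(lhs, rhs, transactions):
    """support = #tuples(LHS, RHS) / N"""
    return count(lhs | rhs, transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """confidence = #tuples(LHS, RHS) / #tuples(LHS)"""
    return count(lhs | rhs, transactions) / count(lhs, transactions)

# Two rules built from the same itemset {bread, butter, milk}.
s1 = support({"bread", "butter"}, {"milk"}, transactions)
c1 = confidence({"bread", "butter"}, {"milk"}, transactions)
s2 = support({"bread"}, {"butter", "milk"}, transactions)
c2 = confidence({"bread"}, {"butter", "milk"}, transactions)
print(s1, c1)
print(s2, c2)
```

Both rules come from the same itemset, so their support is the same, while their confidence differs because the LHS counts differ.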

6. The association rules mining problem Generate all association rules from the given dataset that have support greater than a specified minimum and confidence greater than a specified minimum

7. Examples Rule form: LHS → RHS [confidence, support]. diapers → beer [60%, 0.5%]. "90% of transactions that purchase bread and butter also purchase milk": bread and butter → milk [90%, 1%]

8. Example

9. How Good is an Association Rule? Are support and confidence enough? Lift (improvement) tells us how much better a rule is at predicting the result than just assuming the result in the first place. Lift = P(LHS ∧ RHS) / (P(LHS) · P(RHS)). When lift > 1, the rule is better at predicting the result than guessing. When lift < 1, the rule is doing worse than informed guessing, and using the negative rule produces a better rule than guessing.
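As a rough illustration of the lift formula, the sketch below computes it from transaction counts. The toy transactions and the function name `lift` are assumptions made for this example.

```python
# Hypothetical toy transactions; item names are illustrative only.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"beer", "diapers"},
    {"bread", "butter", "milk", "beer"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def lift(lhs, rhs, transactions):
    """Lift = P(LHS and RHS) / (P(LHS) * P(RHS)), computed from counts."""
    n = len(transactions)
    n_both = count(lhs | rhs, transactions)
    n_lhs = count(lhs, transactions)
    n_rhs = count(rhs, transactions)
    # (n_both / n) / ((n_lhs / n) * (n_rhs / n)) simplifies to this ratio.
    return n_both * n / (n_lhs * n_rhs)

lift_good = lift({"bread", "butter"}, {"milk"}, transactions)  # above 1
lift_poor = lift({"bread"}, {"beer"}, transactions)            # below 1
print(lift_good, lift_poor)
```

In this toy data the first rule has lift above 1 and the second has lift below 1, matching the two cases described on the slide.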

10. Computational Complexity Given d unique items: Total number of itemsets = 2^d. Total number of possible association rules: 3^d - 2^(d+1) + 1
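The growth in the number of itemsets and rules can be checked by brute force for small d. The sketch below assumes a rule is any pair of non-empty, disjoint item sets (LHS, RHS); under that assumption the standard closed form for the rule count is 3^d - 2^(d+1) + 1. The function names are illustrative.

```python
from itertools import product

def count_itemsets(d):
    """All subsets of d items, including the empty set."""
    return 2 ** d

def count_rules_brute_force(d):
    """Assign each item to LHS (0), RHS (1), or neither (2) and keep
    assignments where both LHS and RHS are non-empty."""
    total = 0
    for assignment in product(range(3), repeat=d):
        if 0 in assignment and 1 in assignment:
            total += 1
    return total

def count_rules_closed_form(d):
    return 3 ** d - 2 ** (d + 1) + 1

for d in range(1, 7):
    print(d, count_itemsets(d), count_rules_brute_force(d), count_rules_closed_form(d))
```

For d = 6 both counts give 602 rules, which shows how quickly exhaustive enumeration becomes impractical as d grows.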

11. The Problem of Lots of Data Fast food restaurant: could have 100 items on its menu. How many combinations are there with 3 different menu items? 161,700! Supermarket: 10,000 or more unique items; 50 million 2-item combinations; more than 100 billion 3-item combinations (about 167 billion for exactly 10,000 items). Use of product hierarchies (groupings) helps address this common issue. Also, the number of transactions in a given time period could be huge (hence expensive to analyze).
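The combination counts on this slide are binomial coefficients, so the exact values behind the rounded figures can be reproduced with Python's standard library. The variable names are illustrative.

```python
from math import comb

# Number of ways to choose k distinct items out of n.
menu_triples = comb(100, 3)            # 3-item combinations from 100 menu items
supermarket_pairs = comb(10_000, 2)    # 2-item combinations from 10,000 items
supermarket_triples = comb(10_000, 3)  # 3-item combinations from 10,000 items

print(f"{menu_triples:,}")
print(f"{supermarket_pairs:,}")
print(f"{supermarket_triples:,}")
```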

12. Preparing Data for MBA Determining scope of dataset (one or many stores, what period, etc) Converting transaction data to itemsets Generalizing items to appropriate level Depends on objective of model Rolling up rare items to get adequate support
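A minimal sketch of two of these preparation steps, converting transaction rows to itemsets and generalizing items to a coarser level, might look like this. The row layout (transaction id, item), the product names, and the `hierarchy` mapping are all hypothetical.

```python
from collections import defaultdict

# Hypothetical point-of-sale rows: (transaction_id, item).
rows = [
    (1, "skim milk 1L"),
    (1, "wheat bread"),
    (2, "whole milk 2L"),
    (2, "rye bread"),
    (3, "skim milk 1L"),
]

# Hypothetical product hierarchy: specific item -> category.
hierarchy = {
    "skim milk 1L": "milk",
    "whole milk 2L": "milk",
    "wheat bread": "bread",
    "rye bread": "bread",
}

def to_itemsets(rows, hierarchy=None):
    """Group rows into one itemset per transaction, optionally rolling
    each item up to its category (items without a mapping are kept)."""
    baskets = defaultdict(set)
    for tid, item in rows:
        if hierarchy is not None:
            item = hierarchy.get(item, item)
        baskets[tid].add(item)
    return dict(baskets)

raw = to_itemsets(rows)
rolled_up = to_itemsets(rows, hierarchy)
print(raw)
print(rolled_up)
```

At the raw level no pair of items appears in more than one basket, while after rolling up, milk and bread co-occur in two of the three baskets. That is the effect the slide describes as rolling up rare items to get adequate support.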

13. Search Approach Two sub-problems in discovering all association rules: (1) Find all sets of items (itemsets) that have transaction support above minimum support. Itemsets that qualify are called large itemsets, and all others small itemsets. (2) Generate from each large itemset the rules that use items from the large itemset. Given a large itemset Y and a subset X of Y, take the support of Y and divide it by the support of X. If the ratio c is at least minconf, then X → (Y - X) is satisfied with confidence factor c.
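The second sub-problem, generating rules from one large itemset, can be sketched as follows. The sketch assumes the support counts of the itemset and of all its subsets are already known; the counts in `support_count` are hypothetical, and a ratio of counts is used because it equals the ratio of supports.

```python
from itertools import combinations

def rules_from_itemset(itemset, support_count, minconf):
    """For a large itemset Y, emit every rule X -> (Y - X) whose
    confidence c = support(Y) / support(X) is at least minconf."""
    y = frozenset(itemset)
    rules = []
    for size in range(1, len(y)):
        for lhs in combinations(sorted(y), size):
            x = frozenset(lhs)
            c = support_count[y] / support_count[x]
            if c >= minconf:
                rules.append((x, y - x, c))
    return rules

# Hypothetical support counts (number of transactions containing each set).
support_count = {
    frozenset({"bread"}): 4,
    frozenset({"butter"}): 3,
    frozenset({"bread", "butter"}): 3,
}

rules = rules_from_itemset({"bread", "butter"}, support_count, minconf=0.9)
for x, rest, c in rules:
    print(sorted(x), "->", sorted(rest), c)
```

With these counts, butter → bread has confidence 1.0 and passes, while bread → butter has confidence 0.75 and is dropped.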

14. Reducing Number of Candidates Apriori principle: If an itemset is large, then all of its subsets must also be large Support of an itemset never exceeds the support of its subsets

15. The Apriori Algorithm Progressively identifies large itemsets of different sizes Exploits the property that any subset of a large itemset is also a large itemset Also, any superset of a small itemset is also small
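A compact, unoptimized sketch of this level-wise idea is shown below. It is an illustration rather than a reference implementation. It assumes the minimum support is given as a transaction count and treats an itemset as large when its count is at least that threshold; the toy transactions are hypothetical.

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Return {itemset: count} for every itemset whose count is at
    least minsup_count, found level by level (size 1, 2, 3, ...)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= minsup_count}
    large = dict(current)

    k = 2
    while current:
        prev = list(current)
        candidates = set()
        # Join pairs of large (k-1)-itemsets into k-itemsets, keeping a
        # candidate only if all of its (k-1)-subsets are large.
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in current
                    for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count the surviving candidates against the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= minsup_count}
        large.update(current)
        k += 1
    return large

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"beer", "diapers"},
    {"bread", "butter", "milk", "beer"},
]
large = apriori(transactions, minsup_count=2)
for itemset, c in sorted(large.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), c)
```

Because diapers is small at level 1, no candidate containing diapers is ever generated or counted, which is where the pruning saves work.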

16. Extending MBA Dissociation rules Combining transaction data with complementary data Shopper characteristics Store characteristics Seasonal factors Analyzing patterns over time Patterns that span multiple occasions Need to "sessionize" data Need to recognize shoppers across sessions
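One simple way to sessionize event data is to split each shopper's events wherever the idle time exceeds a threshold. The sketch below assumes events of the form (shopper id, timestamp in seconds, item) and a 30-minute gap; both the event layout and the threshold are assumptions for illustration, not something the slides specify.

```python
def sessionize(events, gap=1800):
    """Group (shopper_id, timestamp_seconds, item) events into sessions.
    A new session starts when a shopper has been idle for more than
    gap seconds. Returns a list of (shopper_id, set_of_items)."""
    sessions = []
    last_seen = {}  # shopper_id -> (last timestamp, index of open session)
    for shopper, ts, item in sorted(events, key=lambda e: (e[0], e[1])):
        prev = last_seen.get(shopper)
        if prev is None or ts - prev[0] > gap:
            sessions.append((shopper, {item}))
            idx = len(sessions) - 1
        else:
            idx = prev[1]
            sessions[idx][1].add(item)
        last_seen[shopper] = (ts, idx)
    return sessions

# Hypothetical events for two shoppers.
events = [
    ("a", 0, "bread"),
    ("a", 600, "milk"),
    ("b", 100, "beer"),
    ("a", 5000, "diapers"),
]
sessions = sessionize(events)
for shopper, items in sessions:
    print(shopper, sorted(items))
```

Each resulting session can then be treated as one itemset, and keeping the shopper id on every session is what allows patterns to be traced across sessions.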

17. Usability of Association Rules

18. Advanced Data Mining Text Mining Mining non-textual data Image and video data (Multimedia) Spatial data GIS Temporal data Time series Behavioral patterns Web Mining Web usage Web content

19. Mining Image Data Traditional pattern recognition Neural networks Supervised learning Discovering patterns Unsupervised learning Clustering

20. Mining Spatial Data Spatial databases typically use special data structures Extensions of tree-structured indexes Quad trees, R-trees, k-D trees, etc. Relationships based on spatial descriptors Overlapping, disjoint, contains, etc. Distance-based clustering Feature extraction Association rules If location is near lake, pollution is low

21. Web Mining Mining data that is obtained from the Web Web Content mining Web Usage mining

22. Web Content Mining Search engines Spiders and Crawlers Metacrawlers A major challenge is the unstructured form of the data Lack of high-level standards Abuse of descriptors (meta-information)

23. Web Usage Mining Mining Web logs Data is relatively structured Data is highly dynamic Problems with identification and location The inherently non-linear aspects of Web usage behavior Tracking both forward and backward links Dynamic personalization

24. Issues and Trends Mining across multiple data sources and sets Online mining: what are the patterns right now? Concerns about privacy and other ethical questions Property Accuracy
