Market Basket Analysis and Association Rules

Presentation Transcript


  1. Market Basket Analysis and Association Rules, Chapter 14

  2. Customers Tend to Buy Things Together…

  3. Market Basket Analysis (MBA)? • Relationships through associations • Examples of association rules: • {Bread} → {Milk} • {Diaper} → {?} • {Milk, Bread} → {?} • In summary, we want to know, “what product in a shopping basket is likely to go with what other product”

  4. Similar Ideas Apply to Many Industries (not just shopping baskets) • Retailer’s point-of-sale (POS) data • Credit card data (possibly cross-merchant purchases) • Services ordered by telecom customers • Banking services ordered • Records of insurance claims (for fraud detection) • Medical records • … • Generalizing MBA to association rules (also called affinity analysis): we want to know “what goes with what”

  5. Why Do We Care? • Product placement • Whole Foods: next to flowers are birthday cards • Wal-Mart customers who purchase Barbie dolls have a 60% likelihood of also purchasing one of three types of candy bars [Forbes, Sept 8, 1997] • Recommendations • Amazon.com: as you are looking at HDTVs, you might also want HDMI cables • Bundling • E.g., travel “packages” – flight, hotel, car • Other Applications • Price discrimination • Website / catalog design • Fraud detection (multiple suspicious insurance claims) • Medical complications (based on combinations of treatments)

  6. Example: Recommendations in Amazon.com

  7. Association Rules – A Definition • Given a transactional database (set of transactions), • find rules that • predict the occurrence of an item • based on the occurrences of other items in the database. • Implication means co-occurrence, not causality

  8. Rule Format • IF {set of items} THEN {set of items} • Example: IF {diapers} THEN {beer} • “IF” part: antecedent • “THEN” part: consequent • “Itemset” = the set of items (e.g., products) comprising the antecedent and consequent • The antecedent and consequent are disjoint (i.e., have no items in common)

  9. Many Rules are Possible • Consider the example table to the right: Transaction 2 supports several rules, such as • “If bread, then diapers” • “If beer, then diapers” • “If bread and beer, then eggs” • + many more … • Given n items, the total number of itemsets with at least two items (i.e., itemsets that can be split into a rule) is 2^n − n − 1!
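
For concreteness, here is a short worked instance of that count (illustrative n, not taken from the slides):

```latex
% With n = 5 distinct items, the number of itemsets containing at least
% two items (i.e., itemsets that can be split into a rule) is
\[
2^{n} - n - 1 \;=\; 2^{5} - 5 - 1 \;=\; 32 - 5 - 1 \;=\; 26.
\]
```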

  10. Frequent “Itemsets” • Ideally, we want to create all possible combinations of items • Problem: computation time grows exponentially as # of items increases • Solution: consider only “frequent itemsets” • Criterion for “frequent”: support

  11. Support (measures the relevance of a rule) • Support for {bread} → {diapers} is 2/4 • In other words, 50% of transactions include this pair of items • Support quantifies the significance of the co-occurrence of the items involved in a rule • In practice, we only care about itemsets with strong enough support
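
A minimal sketch of how support could be computed, using a hypothetical four-transaction list reconstructed to match the numbers quoted on the slides (the actual table appears only as an image in the deck):

```python
# Support of an itemset = fraction of transactions that contain
# every item in the itemset.
def support(itemset, transactions):
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

# Hypothetical transactions, chosen so that {bread, diapers} appears in 2 of 4.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
]

print(support({"bread", "diapers"}, transactions))  # 0.5, i.e., 2 out of 4
```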

  12. Exercise on Support: Phone Faceplates • What is the support of {white}? • What is the support of {red, white}? • If we only care about itemsets with minimum support 50%, do we care about {white}? Do we care about {red, white}?

  13. Confidence (measures the strength of a rule) • Confidence for {bread} → {diapers} is 2/3 • In other words: conditional on the fact that a basket contains bread, the probability that the same basket also contains diapers is 2/3
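
A companion sketch for confidence, reusing the same hypothetical transaction list (again an assumed reconstruction of the slide’s table):

```python
# Confidence of X -> Y: among transactions containing X,
# the fraction that also contain Y.
def confidence(antecedent, consequent, transactions):
    antecedent, consequent = set(antecedent), set(consequent)
    with_x = [t for t in transactions if antecedent <= set(t)]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if consequent <= set(t)) / len(with_x)

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
]

print(confidence({"bread"}, {"diapers"}, transactions))  # 2/3 ≈ 0.667
print(confidence({"diapers"}, {"bread"}, transactions))  # 2/2 = 1.0
```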

  14. Exercise on Confidence • Confidence for {diapers} → {bread} is? • Confidence for {Milk, Eggs} → {Shoes} is? • Confidence for {red, white} → {green} is?

  15. Valid Association Rules • A valid rule is one that meets both a minimum support threshold and a minimum confidence threshold. • Both thresholds are determined by the decision maker. • Why do we need both thresholds? • A strong rule (high confidence) is not necessarily relevant (high support) • Example of a rule with high confidence but low support: • A cell phone company database contains all call destinations for each account • {Germany} → {France, Belgium} with confidence = 100% • Support is 1 out of 100K accounts.
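
A small sketch of the two-threshold check, phrased as a single hypothetical helper over the same assumed transaction list used above:

```python
# A rule X -> Y is "valid" only if it clears BOTH thresholds.
def is_valid_rule(antecedent, consequent, transactions, min_sup, min_conf):
    antecedent, consequent = set(antecedent), set(consequent)
    with_x = [t for t in transactions if antecedent <= set(t)]
    with_both = [t for t in with_x if consequent <= set(t)]
    sup = len(with_both) / len(transactions)               # support of the full itemset
    conf = len(with_both) / len(with_x) if with_x else 0.0
    return sup >= min_sup and conf >= min_conf

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
]
print(is_valid_rule({"bread"}, {"diapers"}, transactions, 0.5, 0.5))  # True
print(is_valid_rule({"bread"}, {"diapers"}, transactions, 0.5, 0.8))  # False: confidence is only 2/3
```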

  16. Valid Association Rules • Suppose we use: Minimum support: 50% • Minimum confidence: 50% • Check support: • Check rules – only two survive: • Bread → Diapers (Support = 50%, Confidence = 66.67%) • Diapers → Bread (Support = 50%, Confidence = 100%)

  17. Is a “Valid Rule” Always a “Good Rule”? • Consider: • Tea → Coffee • Confidence = #(Coffee and Tea)/#(Tea) = 15/20 = 75% • i.e., the probability that someone who has bought tea will also buy coffee is 75%. • Seems good?

  18. Caveat About Confidence • Tea → Coffee • Recall that Confidence = #(Coffee and Tea)/#(Tea) = 15/20 = 75% • But P(Coffee) = #(Coffee)/100 = 90/100 = 90% • i.e., the probability that someone would have bought coffee (regardless of tea) is 90% • So, given that tea has been bought, the probability of buying coffee has actually dropped. • Although confidence is high, the rule is misleading! • In fact, the confidence of “NOT Tea → Coffee” is 75/80 = 93.75%

  19. Statistical (In)Dependence • Population of 1000 students • 600 students know how to swim (S) • 700 students know how to bike (B) • 300 students know how to swim and bike (S, B) • P(S and B) = 300/1000 = 0.30 • P(S) × P(B) = 0.6 × 0.7 = 0.42 • P(S and B) = P(S) × P(B): statistical independence • P(S and B) > P(S) × P(B): positively correlated • P(S and B) < P(S) × P(B): negatively correlated • P(Coffee and Tea) = 15/100 = 0.15 • P(Coffee) × P(Tea) = 0.9 × 0.2 = 0.18 > 0.15: negatively correlated

  20. Another Performance Measure: Lift • The lift of a rule measures how much more likely the consequent is, given the antecedent, relative to its overall frequency • Tea → Coffee • Confidence is 75% • Support of Coffee (the consequent) is 90% • Lift = 0.75/0.9 = 0.833 < 1, so this rule is worse than not having any rule.
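
The same lift calculation in a few lines, using the tea/coffee counts from the slides (100 transactions, 20 containing tea, 90 containing coffee, 15 containing both):

```python
# Lift of X -> Y = confidence(X -> Y) / support(Y).
# Lift < 1 means the antecedent makes the consequent LESS likely than its baseline.
n_total, n_tea, n_coffee, n_both = 100, 20, 90, 15

confidence_tea_coffee = n_both / n_tea          # 15/20 = 0.75
support_coffee = n_coffee / n_total             # 90/100 = 0.90
lift = confidence_tea_coffee / support_coffee   # ≈ 0.833

print(lift)  # 0.833... < 1, so the rule is worse than no rule at all
```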

  21. More on Lift • Another example: {diapers} → {beer} • # of customers in database: 1000 • # of customers buying diapers: 200 • # of customers buying beer: 50 • # of customers buying diapers & beer: 20 • Confidence: 20/200 = 0.1 (or 10%) • Support of consequent: 50/1000 = 0.05 (or 5%) • Lift: 0.1/0.05 = 2 > 1 • i.e., diapers and beer are positively correlated – WHY?

  22. Exercise on Lift • Lift for {red, white} → {green} is ?

  23. More About Performance Measures • Are support, confidence, and lift together enough? • Example: • {maternity ward} → {patient is woman} • Confidence 100%, lift >> 1, but obvious and uninteresting • How do we screen for rules that are of particular interest and significance? • Use domain-specific conditions to filter generated rules. • Some thoughts: • “Actionability”: keep only rules that can be acted upon. • “Interestingness”: various measures of how unexpected a rule is. • E.g., a rule is interesting if it contradicts what is currently known.

  24. Other Evaluation Criteria (Optional) • Many measures have been proposed in the literature • Some measures are good for some applications, but not for others • What criteria should determine whether a measure is good or bad? • Piatetsky-Shapiro suggests 3 properties of a good measure M: • M(A, B) = 0 if A and B are statistically independent • M(A, B) increases monotonically with P(A, B) when P(A) and P(B) remain unchanged • M(A, B) decreases monotonically with P(A) [or P(B)] when P(A, B) and P(B) [or P(A)] remain unchanged • Support and lift are symmetric measures, i.e., M(A, B) = M(B, A) • Confidence is an asymmetric measure, i.e., M(A, B) ≠ M(B, A) • Piatetsky-Shapiro, G. and Frawley, W. J., eds. (1991), Knowledge Discovery in Databases, AAAI/MIT Press

  25. Generating Association Rules • Standard approach: Apriori • Developed by Agrawal et al. (1994) • The problem was defined as: • Generate all association rules that have • support greater than the user-specified support threshold min_sup (minimum support), and • confidence greater than the user-specified confidence threshold min_conf (minimum confidence) • The algorithm performs a (relatively) efficient search over the data to find all such rules.

  26. Generating Association Rules (Cont.) • The problem is decomposed into two sub-problems: • Find all sets of items (itemsets) with support above min_sup • Itemsets with support ≥ min_sup are called frequent itemsets. • From each frequent itemset, generate rules that use items from that frequent itemset. • Given a frequent itemset Y and a subset X of Y: • Take the support of Y and divide it by the support of X • This gives the confidence c of the rule X → (Y \ X) • If c ≥ min_conf, then X → (Y \ X) is a valid association rule
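
Written as a formula (restating the step above in symbols):

```latex
\[
c\bigl(X \Rightarrow (Y \setminus X)\bigr)
  = \frac{\mathrm{support}(Y)}{\mathrm{support}(X)},
\qquad
\text{keep the rule if } c \ge \texttt{min\_conf}.
\]
```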

  27. Phase 1: Finding Frequent Itemsets • Subsets of frequent itemsets must also be frequent • If a frequent itemset has size n, all of its subsets of size (n−1) are also frequent • If {diaper, beer} is frequent, then {diaper} and {beer} are also frequent • Therefore, if an itemset is not frequent, then no itemset that includes it can be frequent. • If {wine} is not frequent, then {wine, beer} cannot be frequent. • We start by finding all itemsets of size 1 that are frequent. • We then try to “expand” these by counting the frequency of only those itemsets of size 2 that are built from frequent itemsets of size 1. • Next, we take itemsets of size 2 that are frequent and try expanding them to itemsets of size 3. • We continue this process until further expansion is not possible (as sketched below).
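
A minimal sketch of this level-wise search (a simplified Apriori, not the textbook’s exact pseudocode), again using the hypothetical transaction list introduced earlier:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_sup):
    """Level-wise search: only candidates built from frequent itemsets are counted."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})

    def sup(itemset):
        s = set(itemset)
        return sum(1 for t in transactions if s <= t) / n

    # Level 1: frequent single items.
    current = [frozenset([i]) for i in items if sup([i]) >= min_sup]
    frequent = {fs: sup(fs) for fs in current}

    # Expand level by level; prune candidates that have an infrequent subset.
    while current:
        candidates = {a | b for a in current for b in current if len(a | b) == len(a) + 1}
        current = [c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, len(c) - 1))
                   and sup(c) >= min_sup]
        frequent.update({c: sup(c) for c in current})
    return frequent

# Hypothetical transactions (assumed, not the textbook's table).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
]
for itemset, s in frequent_itemsets(transactions, min_sup=0.5).items():
    print(sorted(itemset), s)
```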

  28. Exercise on Phase 1 • Requirement: • Minimum support: 40% • Minimum confidence: 80% • Find all 1-item itemsets that meet the minimum support • What are the 2-item itemsets that you need to investigate? • Find all 2-item itemsets that meet the minimum support • Do you need to investigate any 3-item itemsets?

  29. Phase 2: Finding Association Rules • For each frequent itemset, find all possible rules of the form • Antecedent → Consequent • using items contained in the itemset • Only keep the rules that meet min_conf (minimum confidence). • Example: • Suppose {Milk, Bread, Butter} is a frequent itemset. • Does {Milk} → {Bread, Butter} have the minimum confidence? • Similarly: {Bread} → {Milk, Butter}, {Butter} → {Milk, Bread}, {Bread, Butter} → {Milk}, {Milk, Butter} → {Bread}, {Milk, Bread} → {Butter} • The confidence of the rule {Milk} → {Bread, Butter} is calculated as support({Milk, Bread, Butter}) / support({Milk}), as in the sketch below.
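
A sketch of this enumeration for a single frequent itemset, splitting it into every antecedent/consequent pair and keeping only rules that meet min_conf. The support values below are hypothetical, in the spirit of the {Milk, Bread, Butter} example:

```python
from itertools import combinations

# Hypothetical supports for the frequent itemset and all of its subsets
# (illustrative numbers only; in practice these come from Phase 1).
supports = {
    frozenset({"Milk", "Bread", "Butter"}): 0.10,
    frozenset({"Milk", "Bread"}): 0.20,
    frozenset({"Milk", "Butter"}): 0.15,
    frozenset({"Bread", "Butter"}): 0.12,
    frozenset({"Milk"}): 0.25,
    frozenset({"Bread"}): 0.40,
    frozenset({"Butter"}): 0.20,
}

def rules_from_itemset(itemset, supports, min_conf):
    """Generate X -> (Y \\ X) for every non-empty proper subset X of Y."""
    itemset = frozenset(itemset)
    rules = []
    for k in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, k)):
            conf = supports[itemset] / supports[antecedent]
            if conf >= min_conf:
                rules.append((sorted(antecedent), sorted(itemset - antecedent), conf))
    return rules

for x, y, conf in rules_from_itemset({"Milk", "Bread", "Butter"}, supports, min_conf=0.5):
    print(f"{x} -> {y}  (confidence = {conf:.2f})")
```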

  30. Exercise on Phase 2 • Requirement: • Minimum support: 40% • Minimum confidence: 80% • Recall that we have already found the following frequent itemsets: • {Beer} (support = 0.8), {Diapers} (support = 0.6), {Chocolates} (support = 0.4) and {Beer, Diapers} (support = 0.6) • Do we need to consider any 1-item frequent itemsets? • For each multi-item itemset, list all possible association rules and calculate their confidence. Then identify all valid association rules.

  31. On the Symmetry/Asymmetry of Metrics • Consider these two rules: • R1: {bread} → {diapers} • R2: {diapers} → {bread} • Which of these statements is (are) correct? (1) Support of R1 > Support of R2 (2) Confidence of R1 > Confidence of R2 (3) Lift of R1 > Lift of R2

  32. “Confidence” Is Not Symmetric • If A → B meets the minimum confidence threshold, B → A does NOT necessarily meet it! • Example: • Support of {Yogurt} is 0.2 (20%) • Support of {Yogurt, Bread, Butter} is 0.1 (10%) • Support of {Bread, Butter} is 0.5 (50%) • Confidence of {Yogurt} → {Bread, Butter} is 0.1/0.2 = 0.5 (50%) • Confidence of {Bread, Butter} → {Yogurt} is 0.1/0.5 = 0.2 (20%)
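
The same asymmetry, checked directly with the numbers on this slide:

```python
# Supports taken from the slide above.
sup_yogurt = 0.2
sup_bread_butter = 0.5
sup_all_three = 0.1   # support of {Yogurt, Bread, Butter}

conf_forward = sup_all_three / sup_yogurt         # {Yogurt} -> {Bread, Butter}
conf_backward = sup_all_three / sup_bread_butter  # {Bread, Butter} -> {Yogurt}

print(conf_forward, conf_backward)  # 0.5 vs. 0.2: one direction may pass min_conf while the other does not
```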

  33. Think Wildly -- Applications of Association Rules (Revisiting “Why Do We Care?”) • Product placement • Should a store put associated items together? • Recommendations • What if there are competing products to recommend? • Fraud detection • Finding in insurance data that a certain doctor often works with a certain lawyer may indicate potential fraudulent activity • Is it useful for website / catalog design? • Is dissociation important? • If A and NOT B → C • “Database” and NOT “Systems Analysis” → “Business Intelligence”

  34. Variants • Multiple-level Association Rules • Items often form a hierarchy, e.g., Milk [support = 10%] at Level 1, with 2% Milk [support = 6%] and Skim Milk [support = 4%] at Level 2 • Flexible support settings: uniform support (e.g., min_sup = 5% at both levels) vs. reduced support (e.g., min_sup = 5% at Level 1, min_sup = 3% at Level 2) • Items at the lower level are expected to have lower support.

  35. Variants • Analyzing sequential patterns • Given a set of sequences and a support threshold, find the complete set of frequent subsequences • Transaction databases vs. sequence databases • Example: customer shopping sequences: • More examples: • Medical treatment, natural disasters (e.g., earthquakes), telephone calling patterns, Weblog click streams, …

  36. Variants • Continuous attributes, or categorical attributes • Spatial and Multi-Media Association • Constraint-based Data Mining • Knowledge type constraint: • Classification, association, etc. • Data constraint — using SQL-like queries • Find product pairs sold together in stores in Richardson in May 2010. • Dimension/level constraint • Region, price, brand, customer category • Rule (or pattern) constraint • Small sales (price < $10) triggers big sales (sum > $200)

  37. Chapter 14 • Read Chapter 14 for details on this topic • Only Section 14.1
