
Probabilistic Methods for Targeted Advertising

Max Chickering, Microsoft Research. Outline: (1) Targeted Mailing: to whom should you send a solicitation? (2) Targeted Advertising on the Web: how should you display banner ads to maximize click-through?


Presentation Transcript


  1. Probabilistic Methods for Targeted Advertising. Max Chickering, Microsoft Research

  2. Outline
     • Targeted Mailing: To whom should you send a solicitation?
     • Targeted Advertising on the Web: How should you display banner ads to maximize click-through?

  3. Targeted Mailing
     • Given a population of potential customers:

       Person   X1   X2    …   Xn
       1        0    0     …   red
       2        0    3.4   …   blue
       ⋮
       m        1    7     …   green

     • Sending an advertisement costs money:
       - Postage
       - Possible discount
     Which potential customers do you solicit?

  4. Motivating Application
     • Advertisement: MSN subscription
     • Potential customers: people who registered Windows 95
     • Known variables: from questionnaire (e.g. gender, RAM size)

  5. Naïve Solutions
     • Mail to those customers most likely to subscribe to MSN
       - Can waste money by targeting customers who would subscribe anyway
     • Mail to everyone
       - Even worse!

  6. Response Behaviors
     Will the potential customer buy the product?

       Behavior            Mail   Don’t Mail
       Always buyer        Yes    Yes
       Persuadable         Yes    No
       Anti-persuadable    No     Yes
       Never buyer         No     No

     We only make money from mailing to the persuadable potential customers

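  7. Expected Profit for a Population
     Population of N potential customers: Nalw, Nper, Nanti, Nnev (counts of always buyers, persuadables, anti-persuadables, and never buyers)
     Cost of mailing: c
     Solicited and unsolicited revenue: r
     Expected profit from mailing: (Nalw + Nper) r - N c
     Profit from not mailing: (Nalw + Nanti) r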

  8. Lift in Profit From Mailing
     Lift = Profit from mailing - Profit from not mailing = (Nper - Nanti) r - N c
     For any set of potential customers, we should only mail if the lift is positive.
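The population-level lift can be sketched in a few lines. This is a minimal illustration, assuming the response table on slide 6 (always buyers and persuadables buy when mailed, always buyers and anti-persuadables buy when not mailed), a cost c charged for every person mailed, and the same revenue r per buyer in both cases. The function names and the example counts are hypothetical.

```python
def profit_from_mailing(n_alw, n_per, n_anti, n_nev, r, c):
    """Mail everyone: always buyers and persuadables buy; all N mailings cost c."""
    n = n_alw + n_per + n_anti + n_nev
    return (n_alw + n_per) * r - n * c


def profit_from_not_mailing(n_alw, n_per, n_anti, n_nev, r):
    """Mail no one: always buyers and anti-persuadables buy; nothing is spent."""
    return (n_alw + n_anti) * r


def lift(n_alw, n_per, n_anti, n_nev, r, c):
    """Profit from mailing minus profit from not mailing."""
    return (profit_from_mailing(n_alw, n_per, n_anti, n_nev, r, c)
            - profit_from_not_mailing(n_alw, n_per, n_anti, n_nev, r))


# Hypothetical population: 100 always, 50 persuadable, 10 anti, 840 never.
example_lift = lift(100, 50, 10, 840, r=9.0, c=0.5)
mail_everyone = example_lift > 0  # mail only when the lift is positive
```

With these made-up counts the persuadables do not cover the postage for the whole population, so the rule says not to mail this group as a whole.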

  9. Learning Expected Lift
     S ∈ {s0, s1} (did not subscribe, did subscribe)
     M ∈ {m0, m1} (did not mail, did mail)
     Lift: -c + [ p(S=s1|M=m1) – p(S=s1|M=m0) ] r
     Identifiable if S and M are known in the training data

  10. Controlled Experiment: Identify Profitable Sub-Populations
      • Choose a small sample of the potential customers
      • Randomly divide those customers into a “treatment group” (M = m1) and a “control group” (M = m0)
      • Wait a specified period of time, and record S = s0 or S = s1 for each

  11. Controlled Experiment

        Person   X1   X2    …   Xn      M    S
        1        0    0     …   red     m1   s0
        2        0    3.4   …   blue    m0   s1
        ⋮
        m        1    7     …   green   m1   s1

      Use machine-learning techniques to identify sub-populations with high positive lift, and then target those customers
      Lift (sub-population corresponding to Xn=blue) = -c + [ p(S=s1|M=m1, Xn=blue) – p(S=s1|M=m0, Xn=blue) ] r
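As a rough illustration of the sub-population lift formula, the two conditional probabilities can be estimated by simple counting over the experiment records. This sketch assumes raw relative frequencies (the slides go on to use decision trees instead); the record layout `(x, mailed, subscribed)` and the function name are hypothetical.

```python
def estimated_lift(records, in_subpop, r, c):
    """Estimate -c + [p(s1 | m1, subpop) - p(s1 | m0, subpop)] * r by counting.

    records:   iterable of (x, mailed, subscribed) with boolean mailed/subscribed
    in_subpop: predicate on x selecting the sub-population
    Returns None when either experiment group is empty for the sub-population.
    """
    # mailed flag -> [number of people, number who subscribed]
    counts = {True: [0, 0], False: [0, 0]}
    for x, mailed, subscribed in records:
        if in_subpop(x):
            counts[mailed][0] += 1
            counts[mailed][1] += int(subscribed)
    if counts[True][0] == 0 or counts[False][0] == 0:
        return None
    p_mailed = counts[True][1] / counts[True][0]
    p_not_mailed = counts[False][1] / counts[False][0]
    return -c + (p_mailed - p_not_mailed) * r
```

A call such as `estimated_lift(records, lambda x: x["color"] == "blue", r=9.0, c=0.5)` would give the estimate for the Xn=blue sub-population, assuming each `x` is a dict with a `"color"` key.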

  12. Identify Profitable Sub-Populations
      Known distinctions in our data: X = {X1, …, Xn}, S, M
      Partitions of X define sub-populations, and a statistical model for p(S|M,X) defines the lift:
        Lift 1: X1 < 10, X12 = false
        Lift 2: X1 > 10, X4 ≠ 2
        Lift 3: X1 > 10, X4 = 2
        Lift 4: X1 < 10, X12 = true
      Approach: Use Decision Trees

  13. Probabilistic Decision Trees p(S | M, X1, X2) p(S | M=m0, X1=1, X2=2)

  14. Calculating Lift
      [Figure: probabilistic decision tree with splits on X2, X1, and M; each leaf gives p(S=subscribed) / p(S=not subscribed). Leaf values shown: 0.6/0.4, 0.5/0.5, 0.7/0.3, 0.4/0.6, 0.2/0.8, 0.3/0.7.]
      Potential customer with {X1=1, X2=2}. Assume c = 0.50, r = 9
      Lift = -0.5 + (0.4 – 0.2) × 9 = 1.3
      Mail to this person!

  15. Traditional Learning Algorithm
      [Figure: a decision tree being grown; each candidate split (X1, X2, …, Xn) is scored as Score1(Data), Score2(Data), …, Scoren(Data).]

  16. Lift-Aware Learning Algorithm
      Traditional learning algorithm: identify a tree that represents p(S|M,X) well
      Lift-aware: would like the tree to be good at modeling the difference p(S=s1|M=m1,X=x) - p(S=s1|M=m0,X=x)

  17. A Heuristic
      Only consider decision trees (for S) with the last split on M
      [Figure: candidate splits on X1, X2, …, Xn are scored (Score1(Data), Score2(Data), …, Scoren(Data)) with a split on M placed below every branch.]
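One way to picture the heuristic: when scoring a candidate split on some Xi, evaluate the tree that results after a split on M has been appended under every branch. The sketch below does this for a single level. The slides do not say which scoring function is used; training-data log-likelihood is an assumption made here for illustration, and the record layout and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def forced_m_score(records, feature):
    """Log-likelihood of S under a tree that splits on `feature` and then,
    in every branch, on M (the forced last split).

    records: iterable of (x, mailed, subscribed), where x is a dict of features.
    """
    leaves = defaultdict(Counter)  # (feature value, mailed) -> counts of S
    for x, mailed, subscribed in records:
        leaves[(x[feature], mailed)][subscribed] += 1
    score = 0.0
    for counts in leaves.values():
        total = sum(counts.values())
        for k in counts.values():  # only outcomes that occur, so k > 0
            score += k * math.log(k / total)
    return score


def best_feature(records, features):
    """Pick the candidate split whose M-extended tree scores highest."""
    return max(features, key=lambda f: forced_m_score(records, f))
```

Because every leaf is conditioned on M, a feature is rewarded when the mailed and not-mailed response rates are well separated within its branches, which is the quantity the lift depends on. A practical learner would also need a complexity penalty or held-out data, which this sketch omits.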

  18. Experiment: Real-world Dataset
      • Product of interest: MSN subscription
      • Potential customers: Windows 95 registrants
      • Known variables (X): 15 from questionnaire (e.g. gender, RAM size)
      • Cost to mail: 42 cents
      • Subscription revenue: varied from 1 to 15 dollars
      • Data: sample of ~110,000 potential customers (70% train, 30% test)
      Compared our algorithm (FORCE) with unconstrained greedy algorithm (NORMAL) for various revenues

  19. Results on Test Data: Per-person improvement over Mail-to-All

  20. Conclusions / Future Work
      • Marginal improvement over standard decision-tree algorithm: almost every path in the “standard” trees contained a split on M. We expect a larger difference for other domains.
      • Algorithm works for discounted prices: [equations for expected profit from mailing and profit from not mailing]

  21. Part II: Targeted Advertising on the Web
      Given information about a visitor, how do you choose which advertisement to display?

  22. Goals of Targeted Advertising • Maximize $$$ • Maximize Clicks • Brand Presence

  23. Naïve Targeting Scheme
      Step 1: cluster / segment users (Cluster 1, …, Cluster m)
      Possible cluster attributes:
      • Current page category
      • Pages the user has visited on the site
      • Known demographics
      • Inferred demographics
      • Previous advertisement clicks

  24. Naïve Targeting Scheme
      Step 2: Advertiser books ads into clusters
      Step 3: Measure click probabilities
      Step 4: Show best ad to each cluster
      Problems (inventory management):
      • Ad quotas
      • Cluster overbooking

  25. Advertisement Allocation

                Cluster 1   Cluster 2   …   Cluster m
        Ad 1    x11         x12         …   x1m
        Ad 2    x21         x22         …   x2m
        ⋮
        Ad n    xn1         xn2         …   xnm

      xij = number of times to show advertisement i to user cluster j

  26. Maximize Expected Clicks

                Cluster 1   Cluster 2   …   Cluster m
        Ad 1    p11 x11     p12 x12     …   p1m x1m
        Ad 2    p21 x21     p22 x22     …   p2m x2m
        ⋮
        Ad n    pn1 xn1     pn2 xn2     …   pnm xnm
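The quantity being maximized is the sum of all cells in this table. A minimal sketch, assuming `p` and `x` are lists of rows (one row per ad, one column per cluster); the function name is hypothetical.

```python
def expected_clicks(p, x):
    """Sum of p[i][j] * x[i][j] over all ads i and clusters j."""
    return sum(
        p_ij * x_ij
        for p_row, x_row in zip(p, x)
        for p_ij, x_ij in zip(p_row, x_row)
    )
```

For instance, with made-up click probabilities `[[0.1, 0.2], [0.3, 0.4]]` and schedule `[[10, 0], [5, 5]]`, the four cells contribute 1.0, 0, 1.5, and 2.0 expected clicks.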

  27. Inventory-Management Constraints
      [Figure: row i (Ad i) and column j (Cluster j) of the allocation matrix highlighted.]

  28. Linear Program
      Find the schedule X that maximizes: Σi Σj pij xij
      Subject to: the inventory-management constraints
      Solve using (e.g.) the simplex algorithm
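The transcript does not preserve the exact constraints, so the sketch below assumes the simplest form: each ad i is shown exactly `quotas[i]` times, each cluster j supplies exactly `capacities[j]` impressions, and the two totals match. Under those assumptions the linear program is a transportation problem, and instead of the simplex algorithm the slide mentions, this dependency-free sketch swaps in successive shortest paths (Bellman-Ford on the residual graph). All names are hypothetical.

```python
def solve_schedule(p, quotas, capacities):
    """Maximize sum p[i][j] * x[i][j] with row sums == quotas and
    column sums == capacities (integer inputs, equality constraints assumed)."""
    n, m = len(quotas), len(capacities)
    if sum(quotas) != sum(capacities):
        raise ValueError("this sketch assumes total quota == total capacity")
    x = [[0] * m for _ in range(n)]
    supply, demand = list(quotas), list(capacities)
    inf, eps = float("inf"), 1e-12
    while any(s > 0 for s in supply):
        # Nodes 0..n-1 are ads, n..n+m-1 are clusters. Showing ad i in
        # cluster j costs -p[i][j]; undoing a scheduled impression costs +p[i][j].
        dist = [0.0 if v < n and supply[v] > 0 else inf for v in range(n + m)]
        prev = [None] * (n + m)
        for _ in range(n + m):
            changed = False
            for i in range(n):
                for j in range(m):
                    if dist[i] - p[i][j] < dist[n + j] - eps:
                        dist[n + j], prev[n + j] = dist[i] - p[i][j], i
                        changed = True
                    if x[i][j] > 0 and dist[n + j] + p[i][j] < dist[i] - eps:
                        dist[i], prev[i] = dist[n + j] + p[i][j], n + j
                        changed = True
            if not changed:
                break
        # Cheapest cluster that still has unfilled impressions.
        target = min((n + j for j in range(m) if demand[j] > 0),
                     key=lambda v: dist[v])
        path, v = [], target
        while prev[v] is not None:
            path.append((prev[v], v))
            v = prev[v]
        source = v  # an ad with remaining quota
        amount = min(supply[source], demand[target - n])
        for u, w in path:
            if u >= n:  # backward edge: cluster u gives back impressions of ad w
                amount = min(amount, x[w][u - n])
        for u, w in path:
            if u < n:
                x[u][w - n] += amount
            else:
                x[w][u - n] -= amount
        supply[source] -= amount
        demand[target - n] -= amount
    return x
```

For production-sized problems an off-the-shelf LP solver would be the more natural choice; the point here is only to show the structure of the optimization on small inputs.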

  29. A Simple Targeting System • Estimate probabilities • Find the optimal schedule • Serve ads to cluster j via
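The serving rule itself is not preserved in the transcript. One natural reading, assumed here, is to show ad i to a visitor from cluster j with probability proportional to the scheduled count xij. The sketch below implements that assumption; the function names are hypothetical.

```python
import random


def serving_distribution(x, j):
    """Probability of each ad for cluster j, proportional to x[i][j]."""
    column = [row[j] for row in x]
    total = sum(column)
    if total == 0:
        raise ValueError("no impressions scheduled for this cluster")
    return [x_ij / total for x_ij in column]


def choose_ad(x, j, rng=random):
    """Sample the index of the ad to show to a visitor from cluster j."""
    weights = serving_distribution(x, j)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```

Sampling keeps the delivered counts close to the schedule in expectation without the server having to track how many impressions of each ad remain.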

  30. Sensitivity to Estimates
      Probabilities:

                Cluster 1   Cluster 2
        Ad 1    0.49        0.51
        Ad 2    0.51        0.49

      q1 = q2 = c1 = c2 = k
      Optimal schedule: each ad gets all k of its impressions in the cluster where its estimated probability is 0.51, and 0 in the other
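The sensitivity is easy to reproduce. Assuming the constraints are equalities (each ad shown exactly k times, each cluster receiving exactly k impressions), a 2×2 schedule has a single free parameter t, so it can be enumerated directly. The function name is hypothetical.

```python
def best_two_by_two(p, k):
    """Enumerate schedules [[t, k - t], [k - t, t]] and return the one
    with the most expected clicks."""
    best = None
    for t in range(k + 1):
        x = [[t, k - t], [k - t, t]]
        clicks = sum(p[i][j] * x[i][j] for i in range(2) for j in range(2))
        if best is None or clicks > best[0]:
            best = (clicks, x)
    return best[1]


# A 0.02 swing in the estimates flips the schedule from one extreme to the other.
schedule_a = best_two_by_two([[0.49, 0.51], [0.51, 0.49]], 100)
schedule_b = best_two_by_two([[0.51, 0.49], [0.49, 0.51]], 100)
```

Because the objective is linear in t, the optimum always sits at an endpoint, so estimates that differ only by noise still produce an all-or-nothing schedule.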

  31. Solution: Buckets
      Probabilities:

                Cluster 1   Cluster 2
        Ad 1    0.5         0.5
        Ad 2    0.5         0.5

      q1 = q2 = c1 = c2 = k
      Optimal schedule:

                Cluster 1   Cluster 2
        Ad 1    a           b
        Ad 2    c           d

      with a+b+c+d = 2k
      Secondary (linear) optimization: ads are shown as close to uniform across all clusters
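A small sketch of the bucket idea for the same 2×2 setting: estimates are rounded to a common bucket, and among the schedules that tie on expected clicks, the one closest to uniform is kept. The bucket width of 0.05, the equality constraints, and the use of total absolute deviation from k/2 as the "closeness to uniform" measure are all assumptions made for illustration.

```python
def to_bucket(p, width=0.05):
    """Round a probability estimate to the nearest multiple of `width`."""
    return round(p / width) * width


def bucketed_schedule(p, k, width=0.05):
    """Maximize expected clicks on bucketed probabilities, then break ties
    by choosing the schedule closest to uniform."""
    b = [[to_bucket(v, width) for v in row] for row in p]
    candidates = []
    for t in range(k + 1):
        x = [[t, k - t], [k - t, t]]
        clicks = sum(b[i][j] * x[i][j] for i in range(2) for j in range(2))
        spread = sum(abs(x[i][j] - k / 2) for i in range(2) for j in range(2))
        candidates.append((clicks, spread, x))
    best_clicks = max(c for c, _, _ in candidates)
    tied = [(s, x) for c, s, x in candidates if best_clicks - c < 1e-9]
    return min(tied, key=lambda item: item[0])[1]
```

With the 0.49 / 0.51 estimates from the previous slide, both values fall into the 0.5 bucket, every schedule ties, and the tie-break returns the even split. Estimates that are genuinely different land in different buckets and still drive the schedule.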

  32. Passive Experiment: MSNBC (December 1998)
      Clusters defined by the current page group: Sports, News, Health, Opinion, …
      Manual approach: advertisers buy impressions on page groups

  33. Passive Experiment: MSNBC (December 1998)
      • ~20 clusters
      • ~500 advertisements
      • ~1.6 million impressions / day
      Data from day 1:
      • Estimate pij (average ~4K data points per probability)
      • Find optimal schedule (less than 1 minute, no buckets)
      Data from day 2:
      • Re-estimate pij
      • Evaluate schedule
      Result: 20–30% increase over manual schedule

  34. Active Experiment on MSNBC (May 1999)
      Particular advertiser: 5 ads
      Data from weekend 1:
      • Estimate pij (~15K data points per probability)
      • Find optimal schedule (less than 1 second using buckets)
      • Rearrange advertisements for weekend 2
      Data from weekend 2:
      • Count the number of clicks and compare to weekend 1

  35. Active Experiment Results
      [Figure: clicks in Weekend 1 (pre target) vs. Weekend 2 (post target), for the advertiser and for the control.]
      30% increase for the advertiser, negligible increase for others
      Predicted a 20% increase on MSNBC

  36. Extensions
      Problem: Increasing total expected clicks across the site may decrease clicks for a particular advertiser
      Solution: Add a (linear) constraint that expected clicks cannot decrease
      Passive experiment: MSNBC overall increase still ~20%

  37. Extensions
      Focus of talk: pij = expected #clicks from showing ad i to user j
      In general: uij = expected utility from showing ad i to user j
      Expected utility of X = Σi Σj uij xij
      Alternative uij choices:
      • Weighted probabilities: wi pij
      • Probability of purchase
      • Increase in brand awareness
      • Expected revenue

  38. My Home Page http://research.microsoft.com/~dmax/

  39. Results on Test Data: Per-person improvement over Mail-to-All
      To evaluate a test case given a model:
      • Evaluate the lift given X (ignoring M and S)
      • Recommend Mail if and only if Lift > 0
      • If the recommendation matches M from the test case, add r to the total revenue. Otherwise, ignore.
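The evaluation procedure can be sketched as follows. The slide does not spell out how S and the mailing cost enter the total, so this sketch assumes that revenue r is counted only when the matched test case actually subscribed, and that c is subtracted whenever the matched case was mailed. The record layout, `lift_of`, and the function name are hypothetical.

```python
def evaluate_policy(test_cases, lift_of, r, c):
    """Score a lift model on held-out experiment data.

    test_cases: iterable of (x, mailed, subscribed) with boolean flags
    lift_of:    function mapping x to the model's estimated lift
    Returns (total profit over matched cases, number of matched cases).
    """
    total, matched = 0.0, 0
    for x, mailed, subscribed in test_cases:
        recommend_mail = lift_of(x) > 0
        if recommend_mail != mailed:
            # The outcome under the recommended action was never observed.
            continue
        matched += 1
        if subscribed:  # assumption: revenue only from actual subscribers
            total += r
        if mailed:      # assumption: mailing cost charged when mail was sent
            total -= c
    return total, matched
```

Because mailing was randomized in the experiment, the matched cases form an unbiased sample of what the policy would have earned, which is what makes ignoring the mismatched cases reasonable.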
