
Probabilistic Methods for Targeted Advertising

Max Chickering

Microsoft Research

Outline
  • Targeted Mailing

To whom should you send a solicitation?

  • Targeted Advertising on the Web

How should you display banner ads to maximize click-through?

Targeted Mailing
  • Given a population of potential customers.

Person  X1  X2   …  Xn
1       0   0    …  red
2       0   3.4  …  blue
⋮       ⋮   ⋮       ⋮
m       1   7    …  green

  • Sending an advertisement costs money:
    • Postage
    • Possible discount

Which potential customers do you solicit?

Motivating Application
  • Advertisement:
    • MSN subscription
  • Potential customers:
    • People who registered Windows 95
  • Known variables:
    • from questionnaire (e.g. gender, RAM size)
Naïve Solutions
  • Mail to those customers most likely to subscribe to MSN
    • Can waste money by targeting customers who would subscribe anyway
  • Mail to everyone
    • Even worse!
Response Behaviors

Will the potential customer buy the product?

                    Mail   Don’t Mail
Always buyer        Yes    Yes
Persuadable         Yes    No
Anti-persuadable    No     Yes
Never buyer         No     No

We only make money from mailing to the persuadable potential customers.

Expected Profit for a Population

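Population of N potential customers: $N_{alw}$ always buyers, $N_{per}$ persuadables, $N_{anti}$ anti-persuadables and $N_{nev}$ never buyers ($N = N_{alw} + N_{per} + N_{anti} + N_{nev}$)

Cost of mailing: c

Revenue r per subscriber, whether solicited or unsolicited

Expected profit from mailing: $r\,(N_{alw} + N_{per}) - c\,N$

Profit from not mailing: $r\,(N_{alw} + N_{anti})$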

Lift in Profit From Mailing

Profit from mailing - Profit from not mailing

For any set of potential customers, we should mail only if the lift is positive.
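The profit bookkeeping on these slides can be checked numerically. A minimal sketch, assuming every customer behaves as in the Response Behaviors table, with revenue r per subscriber and cost c per mailing; the function names and segment counts are illustrative, not from the talk.

```python
# Minimal sketch: profit from mailing vs. not mailing for a population split
# into the four response types (illustrative numbers, not from the talk).

def profit_mail(n_alw, n_per, n_anti, n_nev, c, r):
    """Mail everyone: always-buyers and persuadables buy; every mailing costs c."""
    n = n_alw + n_per + n_anti + n_nev
    return r * (n_alw + n_per) - c * n

def profit_no_mail(n_alw, n_per, n_anti, n_nev, c, r):
    """Mail no one: always-buyers and anti-persuadables buy; no mailing cost."""
    return r * (n_alw + n_anti)

def lift(n_alw, n_per, n_anti, n_nev, c, r):
    """Lift in profit from mailing; algebraically r * (n_per - n_anti) - c * n."""
    args = (n_alw, n_per, n_anti, n_nev, c, r)
    return profit_mail(*args) - profit_no_mail(*args)

value = lift(n_alw=100, n_per=50, n_anti=10, n_nev=840, c=0.5, r=9)
print(value)                                  # -140.0
print("mail" if value > 0 else "don't mail")  # don't mail
```

Only the persuadable and anti-persuadable counts survive the subtraction, which is why the always-buyers, who generate revenue either way, do not justify mailing.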

Learning Expected Lift

S{s0, s1} (did not subscribe, did subscribe)

M{m0, m1} (did not mail, did mail)

Identifiable if

S, M known

in training data

Lift : -c + [ p(S=s1|M=m1) – p(S=s1|M=m0) ] r

Controlled Experiment: Identify Profitable Sub-Populations
  • Choose a small sample of the potential customers
  • Randomly divide those customers into a “treatment group” (M = m1) and a “control group” (M = m0)
  • Wait a specified period of time, and record S = s0 or S = s1 for each customer
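A minimal sketch of the random assignment step, using Python's standard `random` module. The helper name and the even 50/50 split are assumptions, not details from the talk.

```python
import random

def split_treatment_control(customer_ids, sample_size, seed=0):
    """Hypothetical helper: sample customers and split them into two groups.

    Returns (treatment, control): treatment customers get M = m1 (mailed),
    control customers get M = m0 (not mailed). The 50/50 split is assumed.
    """
    rng = random.Random(seed)                        # seeded for a reproducible split
    sample = rng.sample(list(customer_ids), sample_size)  # already in random order
    half = sample_size // 2
    return sample[:half], sample[half:]

treatment, control = split_treatment_control(range(100_000), 1_000)
print(len(treatment), len(control))  # 500 500
```

Because the sample is drawn in random order, slicing it in half gives a random assignment; any difference in subscription rates between the groups can then be attributed to the mailing.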
Controlled Experiment

Person  X1  X2   …  Xn     M   S
1       0   0    …  red    m1  s0
2       0   3.4  …  blue   m0  s1
⋮       ⋮   ⋮       ⋮      ⋮   ⋮
m       1   7    …  green  m1  s1

Use machine-learning techniques to identify sub-populations with high positive lift, then target those customers.

$\text{Lift}(\text{sub-population with } X_n = \text{blue}) = -c + [\,p(S=s_1 \mid M=m_1, X_n=\text{blue}) - p(S=s_1 \mid M=m_0, X_n=\text{blue})\,]\,r$
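A minimal sketch of estimating this lift from controlled-experiment records by grouping on a single feature. The record layout (feature dict, mailed flag, subscribed flag) and the helper name are assumptions for illustration; the talk uses decision trees, not a one-feature grouping, to find sub-populations.

```python
from collections import defaultdict

def estimate_lift_by_group(records, key, c, r):
    """Estimate the lift of each sub-population defined by one feature.

    records: iterable of (x, mailed, subscribed), where x is a feature dict,
             mailed is True for M = m1 and subscribed is True for S = s1.
    key:     feature name that defines the sub-populations (e.g. "Xn").
    """
    # counts[value][mailed] = [number of subscribers, number of customers]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for x, mailed, subscribed in records:
        cell = counts[x[key]][mailed]
        cell[0] += int(subscribed)
        cell[1] += 1
    lifts = {}
    for value, cells in counts.items():
        (s_m1, n_m1), (s_m0, n_m0) = cells[True], cells[False]
        if n_m1 == 0 or n_m0 == 0:
            continue  # need both treatment and control customers to estimate lift
        p_m1 = s_m1 / n_m1  # estimate of p(S=s1 | M=m1, key=value)
        p_m0 = s_m0 / n_m0  # estimate of p(S=s1 | M=m0, key=value)
        lifts[value] = -c + (p_m1 - p_m0) * r
    return lifts

records = [
    ({"Xn": "blue"}, True, True), ({"Xn": "blue"}, True, False),
    ({"Xn": "blue"}, False, True), ({"Xn": "blue"}, False, False),
    ({"Xn": "blue"}, False, False), ({"Xn": "blue"}, False, False),
    ({"Xn": "red"}, True, False), ({"Xn": "red"}, False, False),
]
print(estimate_lift_by_group(records, "Xn", c=0.5, r=9))
# {'blue': 1.75, 'red': -0.5}
```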

Identify Profitable Sub-Populations

Known distinctions in our data: $X = \{X_1, \dots, X_n\}$, S and M

Partitions of X define sub-populations, and a statistical model for $p(S \mid M, X)$ defines the lift of each sub-population.

[Figure: X partitioned into four sub-populations, each with its own lift (Lift 1 to Lift 4): X1 < 10 and X12 = false; X1 < 10 and X12 = true; X1 > 10 and X4 ≠ 2; X1 > 10 and X4 = 2.]

Approach: Use Decision Trees

Probabilistic Decision Trees

A decision tree represents $p(S \mid M, X_1, X_2)$; each leaf holds a distribution such as $p(S \mid M=m_0, X_1=1, X_2=2)$.

Calculating Lift

[Figure: probabilistic decision tree for p(S | M, X1, X2) with splits on X1, X2 and M (mailed / not mailed); leaf distributions p(S = subscribed) / p(S = not subscribed): 0.6/0.4, 0.5/0.5, 0.7/0.3, 0.4/0.6, 0.2/0.8, 0.3/0.7.]

Potential customer with {X1 = 1, X2 = 2}; assume c = 0.50 and r = 9:

$\text{Lift} = -0.5 + (0.4 - 0.2) \times 9 = 1.3$

Mail to this person!
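A sketch of the lookup-then-subtract step: follow the customer's features down the tree, read p(S = s1) for the mailed and not-mailed branches, and apply the lift formula. The tree structure here is hypothetical; only the leaf for {X1 = 1, X2 = 2} (0.4 if mailed, 0.2 if not) comes from the slide, and the other leaves are placeholders.

```python
# Hypothetical tree for p(S = s1 | M, X1, X2). Internal nodes are
# ("split", variable, {value: child}, default_child); every path ends in a
# leaf that splits on M, storing p(S = s1 | M = m1) and p(S = s1 | M = m0).
TREE = ("split", "X2",
        {2: ("split", "X1",
             {1: ("leaf", {"m1": 0.4, "m0": 0.2})},   # leaf used on the slide
             ("leaf", {"m1": 0.3, "m0": 0.3}))},      # placeholder leaf
        ("leaf", {"m1": 0.6, "m0": 0.5}))             # placeholder leaf

def leaf_probs(tree, x):
    """Follow the splits using the customer's features x (a dict) down to a leaf."""
    while tree[0] == "split":
        _, var, children, default = tree
        tree = children.get(x[var], default)
    return tree[1]

def tree_lift(tree, x, c, r):
    """Lift = -c + [p(S=s1 | M=m1, x) - p(S=s1 | M=m0, x)] * r."""
    p = leaf_probs(tree, x)
    return -c + (p["m1"] - p["m0"]) * r

value = tree_lift(TREE, {"X1": 1, "X2": 2}, c=0.50, r=9)
print(round(value, 2), "mail" if value > 0 else "don't mail")  # 1.3 mail
```

Storing both M branches in one leaf mirrors the idea that the split on M sits at the bottom of every path, so the lift for any customer is a single subtraction at one leaf.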

Traditional Learning Algorithm

[Figure: candidate trees splitting on X1, X2, …, Xn, each scored by Score1(Data), Score2(Data), …, Scoren(Data).]
Lift-Aware Learning Algorithm

Traditional learning algorithm: identify a tree that represents $p(S \mid M, X)$ well.

Lift-aware: we would like the tree to be good at modeling the difference

$p(S=s_1 \mid M=m_1, X=x) - p(S=s_1 \mid M=m_0, X=x)$

A Heuristic

Only consider decision trees (for S) with the last split on M.

[Figure: candidate trees splitting on X1, X2, …, Xn, with every path ending in a split on M, scored by Score1(Data), Score2(Data), …, Scoren(Data).]

Experiment: Real-world Dataset

Product of interest: MSN subscription

Potential customers: Windows 95 registrants

Known variables (X): 15 from questionnaire (e.g. gender, RAM size)

Cost to Mail: 42 cents

Subscription revenue: varied from 1 to 15 dollars

Data: sample of ~110,000 potential customers (70% train, 30% test)

Compared our algorithm (FORCE) with the unconstrained greedy algorithm (NORMAL) for various revenues

Conclusions / Future Work

Marginal improvement over the standard decision-tree algorithm: almost every path in the “standard” trees already contained a split on M. We expect a larger difference in other domains.

Algorithm works for discounted prices:

Expected Profit from mailing

Profit from not mailing

Part II: Targeted Advertising on the Web


Given information about a visitor, how do you choose which advertisement to display?

Goals of Targeted Advertising
  • Maximize $$$
    • Maximize Clicks
    • Brand Presence
Naïve Targeting Scheme

Step 1: Cluster / segment users

Possible cluster attributes:

  • Current page category
  • Pages the user has visited on the site
  • Known demographics
  • Inferred demographics
  • Previous advertisement clicks

[Figure: users grouped into Cluster 1, …, Cluster m.]

Naïve Targeting Scheme

Step 2: Advertiser books ads into clusters

Step 3: Measure click probabilities

Step 4: Show best ad to each cluster

Problems (inventory management):
  • Ad quotas
  • Cluster overbooking

Advertisement Allocation

        Cluster 1  Cluster 2  …  Cluster m
Ad 1    x11        x12        …  x1m
Ad 2    x21        x22        …  x2m
⋮       ⋮          ⋮             ⋮
Ad n    xn1        xn2        …  xnm

xij = number of times to show advertisement i to user cluster j

Maximize Expected Clicks

        Cluster 1  Cluster 2  …  Cluster m
Ad 1    p11 x11    p12 x12    …  p1m x1m
Ad 2    p21 x21    p22 x22    …  p2m x2m
⋮       ⋮          ⋮             ⋮
Ad n    pn1 xn1    pn2 xn2    …  pnm xnm
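The quantity being maximized is the sum of these cells. A minimal sketch, with the probability and schedule matrices as plain nested lists indexed [ad][cluster]; the numbers are illustrative only.

```python
def expected_clicks(p, x):
    """Total expected clicks: sum over ads i and clusters j of p[i][j] * x[i][j].

    p[i][j]: estimated click probability of ad i in user cluster j
    x[i][j]: number of times ad i is shown to cluster j
    """
    return sum(p_ij * x_ij
               for p_row, x_row in zip(p, x)
               for p_ij, x_ij in zip(p_row, x_row))

p = [[0.02, 0.01], [0.01, 0.03]]   # illustrative click probabilities
x = [[1000, 0], [0, 500]]          # illustrative schedule
print(expected_clicks(p, x))       # about 35 expected clicks
```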

Linear Program

Find the schedule X that maximizes:

Subject to:

Solve using (e.g.) the simplex algorithm
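One plausible form of this linear program, written as a sketch under assumptions: each ad i has an impression quota $q_i$ that must be met exactly, and cluster j can absorb at most $c_j$ impressions (the symbols q and c follow the sensitivity example; the exact constraint form is our assumption).

```latex
\begin{aligned}
\max_{X}\quad & \sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij}\, x_{ij} \\
\text{subject to}\quad & \sum_{j=1}^{m} x_{ij} = q_i, \qquad i = 1, \dots, n \quad \text{(ad quotas)} \\
& \sum_{i=1}^{n} x_{ij} \le c_j, \qquad j = 1, \dots, m \quad \text{(no cluster overbooking)} \\
& x_{ij} \ge 0 \quad \text{for all } i, j.
\end{aligned}
```

The objective is linear in the $x_{ij}$ and every constraint is linear, which is what makes the simplex algorithm applicable.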

A Simple Targeting System
  • Estimate probabilities
  • Find the optimal schedule
  • Serve ads to cluster j via
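A minimal sketch of the serving step. It assumes a visitor in cluster j is shown ad i with probability x_ij / Σ_i x_ij, so that over many visits each ad's share of cluster j follows the schedule; this proportional rule and the helper name are assumptions.

```python
import random

def serve_ad(x, j, rng=random):
    """Choose which ad to show a visitor from cluster j.

    Assumed rule: ad i is picked with probability x[i][j] / sum_i x[i][j],
    so each ad's share of cluster j follows the schedule x.
    """
    weights = [row[j] for row in x]
    if sum(weights) == 0:
        raise ValueError(f"no impressions scheduled for cluster {j}")
    return rng.choices(range(len(x)), weights=weights, k=1)[0]

x = [[300, 0], [100, 600]]          # x[ad][cluster], illustrative schedule
rng = random.Random(42)
shown = [serve_ad(x, 0, rng) for _ in range(10_000)]
print(shown.count(0) / len(shown))  # roughly 0.75
```

Randomized serving avoids having to track exact per-cluster counters; the realized counts only match the schedule on average.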
Sensitivity to Estimates

Probabilities (with q1 = q2 = c1 = c2 = k):

        Cluster 1  Cluster 2
Ad 1    0.49       0.51
Ad 2    0.51       0.49

Optimal Schedule:

        Cluster 1  Cluster 2
Ad 1    0          k
Ad 2    k          0
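A two-ad, two-cluster sketch of this all-or-nothing behavior, assuming the quotas must be met exactly with q1 = q2 = c1 = c2 = k. Under that assumption every feasible schedule has x11 = x22 = t and x12 = x21 = k - t, and expected clicks are linear in t, so a small change in the estimates flips the whole schedule. The function name is hypothetical.

```python
def best_2x2_schedule(p, k):
    """Optimal schedule for 2 ads x 2 clusters when q1 = q2 = c1 = c2 = k.

    Feasible schedules: x11 = x22 = t, x12 = x21 = k - t, with 0 <= t <= k.
    Expected clicks t*(p11 + p22) + (k - t)*(p12 + p21) are linear in t,
    so the optimum is at an endpoint (ties are broken toward t = 0).
    """
    diagonal = p[0][0] + p[1][1]
    off_diagonal = p[0][1] + p[1][0]
    t = k if diagonal > off_diagonal else 0
    return [[t, k - t], [k - t, t]]

k = 1000
print(best_2x2_schedule([[0.49, 0.51], [0.51, 0.49]], k))  # [[0, 1000], [1000, 0]]
print(best_2x2_schedule([[0.51, 0.49], [0.49, 0.51]], k))  # [[1000, 0], [0, 1000]]
```

A 0.02 swap in the estimates, well within estimation noise, moves every impression.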

Solution: Buckets

Probabilities (with q1 = q2 = c1 = c2 = k):

        Cluster 1  Cluster 2
Ad 1    0.5        0.5
Ad 2    0.5        0.5

Optimal Schedule:

        Cluster 1  Cluster 2
Ad 1    a          b
Ad 2    c          d

a + b + c + d = 2k

Secondary (linear) optimization: ads are shown as close to uniformly as possible across all clusters
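A sketch of the bucketing idea on the same two-ad example. Snapping estimates to buckets of a hypothetical width (0.1 here) turns both 0.49 and 0.51 into 0.5; every schedule then ties on expected clicks, and a simple stand-in for the secondary optimization picks the most uniform one. The bucket width, function names and tie-breaking rule are assumptions, not the talk's exact method.

```python
def bucket(p, width=0.1):
    """Snap a probability estimate to the nearest multiple of `width` (hypothetical)."""
    return round(round(p / width) * width, 10)

def bucketed_2x2_schedule(p, k, width=0.1):
    """2 ads x 2 clusters with q1 = q2 = c1 = c2 = k, using bucketed estimates.

    Feasible schedules are x11 = x22 = t, x12 = x21 = k - t. If bucketing
    makes both totals tie, every t is optimal, so the most uniform schedule
    (t = k / 2) is chosen; otherwise the optimum is all-or-nothing.
    """
    b = [[bucket(v, width) for v in row] for row in p]
    diagonal = b[0][0] + b[1][1]
    off_diagonal = b[0][1] + b[1][0]
    if abs(diagonal - off_diagonal) < 1e-9:
        t = k / 2
    else:
        t = k if diagonal > off_diagonal else 0
    return [[t, k - t], [k - t, t]]

print(bucketed_2x2_schedule([[0.49, 0.51], [0.51, 0.49]], 1000))
# [[500.0, 500.0], [500.0, 500.0]]
```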

Passive Experiment: MSNBC (December 1998)

Clusters defined by the current page group: Sports, News, Health, Opinion, …

Manual approach: advertisers buy impressions on page groups

Passive Experiment: MSNBC (December 1998)

~20 clusters

~500 advertisements

~1.6 million impressions / day

Data from day 1:

Estimate pij (on average ~4K data points per probability)

Find optimal schedule (less than 1 minute – no buckets)

Data from day 2:

Re-estimate pij

Evaluate schedule:

Result: 20–30% increase over the manual schedule
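A minimal sketch of the evaluation step, assuming click and impression counts are available per (ad, cluster) pair. The helper names are hypothetical, the optimized schedule would come from solving the linear program on day-1 estimates (not shown), and all counts are illustrative, not MSNBC data.

```python
def estimate_p(clicks, impressions):
    """Estimate p[i][j] = clicks / impressions for ad i in cluster j (0.0 if never shown)."""
    return [[c / n if n else 0.0 for c, n in zip(c_row, n_row)]
            for c_row, n_row in zip(clicks, impressions)]

def expected_clicks(p, x):
    """Sum of p[i][j] * x[i][j] over all ads i and clusters j."""
    return sum(pij * xij
               for p_row, x_row in zip(p, x)
               for pij, xij in zip(p_row, x_row))

def percent_increase(p_eval, optimized, manual):
    """Gain of the optimized schedule over the manual one, scored with p_eval."""
    base = expected_clicks(p_eval, manual)
    return 100.0 * (expected_clicks(p_eval, optimized) - base) / base

# Day-2 counts re-estimate p; both schedules are scored with those estimates.
p_day2 = estimate_p(clicks=[[20, 10], [10, 30]],
                    impressions=[[1000, 1000], [1000, 1000]])
manual = [[500, 500], [500, 500]]
optimized = [[1000, 0], [0, 1000]]
print(round(percent_increase(p_day2, optimized, manual), 1))  # 42.9
```

Scoring with day-2 estimates rather than the day-1 estimates used for optimization keeps the evaluation from rewarding the schedule for fitting noise in its own training data.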

Active Experiment on MSNBC (May 1999)

Particular advertiser: 5 ads

Data from weekend 1:

Estimate pij (~15K data points per probability)

Find optimal schedule (less than 1 second using buckets)

Rearrange advertisements for weekend 2

Data from weekend 2:

Count the number of clicks and compare to weekend 1

Active Experiment Results

[Chart: clicks in Weekend 1 (pre-target) and Weekend 2 (post-target), for the advertiser and for a control group.]

30% increase for the advertiser, negligible increase for others

Predicted a 20% increase on MSNBC

Extensions

Problem: increasing total expected clicks across the site may decrease clicks for a particular advertiser

Solution: add a (linear) constraint that the advertiser's expected clicks cannot decrease

Passive experiment: MSNBC overall increase still ~20%

Extensions

Focus of talk: pij = expected #clicks from showing ad i to user j

In general: uij = expected utility from showing ad i to user j

Expected utility of X = $\sum_i \sum_j u_{ij}\, x_{ij}$

Alternative choices for uij:

Weighted probabilities: wi pij

Probability of purchase

Increase in brand awareness

Expected revenue

My Home Page

http://research.microsoft.com/~dmax/

Results on Test Data: Per-person improvement over Mail-to-All
  • To evaluate a test case given a model:
    • Evaluate the lift given X (ignoring M and S)
    • Recommend Mail if and only if Lift > 0
    • If the recommendation matches M from the test case, add r to the total revenue; otherwise, ignore the case