Challenges in Computational Advertising

Deepayan Chakrabarti ([email protected])

Presentation Transcript
Online Advertising Overview

[Diagram: Advertisers supply Ads to an Ad Network, which picks ads to show alongside a Content Provider's content to the User. Examples of ad networks: Yahoo, Google, MSN, RightMedia, …]

Advertising Setting

Sponsored Search

Display

Content Match

Advertising Setting

Sponsored Search

Display

Content Match

Pick ads

Advertising Setting
  • Graphical display ads
  • Mostly for brand awareness
  • Revenue based on number of impressions (not clicks)

Sponsored Search

Display

Content Match

Advertising Setting

Sponsored Search

Display

Content Match

Content match ad

Advertising Setting

Sponsored Search

Display

Content Match

Text ads

Pick ads

Match ads to the content

Advertising Setting
  • The user intent is unclear
  • Revenue depends on number of clicks
  • Query (webpage) is long and noisy

Sponsored Search

Display

Content Match

Advertising Setting

Sponsored Search

Display

Content Match

Search Query

Sponsored Search Ads

This presentation
  • Content Match [KDD 2007]:
    • How can we estimate the click-through rate (CTR) of an ad on a page?

CTR for ad j on page i

~10^9 pages

~10^6 ads

This presentation
  • Estimating CTR for Content Match [KDD ‘07]
  • Traffic Shaping for Display Advertising [EC ‘12]

[Figure: a content page showing an article summary (with alternate articles) and display ads; the user may click the summary to reach the full article]

This presentation
  • Estimating CTR for Content Match [KDD ‘07]
  • Traffic Shaping for Display Advertising [EC ‘12]
    • Recommend articles (not ads)
    • need high CTR on article summaries
    • + prefer articles on which under-delivering ads can be shown
This presentation
  • Estimating CTR for Content Match [KDD ‘07]
  • Traffic Shaping for Display Advertising [EC ‘12]
  • Theoretical underpinnings [COLT ‘10 best student paper]
    • Represent relationships as a graph
    • Recommendation = Link Prediction
    • Many useful heuristics exist
    • Why do these heuristics work?

Goal: Suggest friends

Estimating CTR for Content Match
  • Contextual Advertising
    • Show an ad on a webpage (“impression”)
    • Revenue is generated if a user clicks
    • Problem: Estimate the click-through rate (CTR) of an ad on a page

CTR for ad j on page i

~10^9 pages

~10^6 ads

Estimating CTR for Content Match
  • Why not use the MLE?
    • Few (page, ad) pairs have N>0
    • Very few have c>0 as well
    • MLE does not differentiate between 0/10 and 0/100
    • We have additional information: hierarchies
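To make this concrete, here is a tiny stand-alone illustration of why the raw MLE is uninformative for sparse regions and how shrinking toward a coarser (parent) rate helps. The Beta-style smoothing and the parent CTR value are hypothetical stand-ins for the hierarchical model described next, not the paper's estimator:

```python
def mle_ctr(clicks, impressions):
    """Maximum-likelihood CTR: cannot tell 0/10 apart from 0/100."""
    return clicks / impressions if impressions else 0.0

def smoothed_ctr(clicks, impressions, parent_ctr, strength=100.0):
    """Shrink toward a parent (coarser-resolution) rate; hypothetical Beta-prior smoothing."""
    return (clicks + strength * parent_ctr) / (impressions + strength)

parent = 0.02                       # CTR of the parent region, assumed known here
for n in (10, 100, 10_000):
    print(n, mle_ctr(0, n), round(smoothed_ctr(0, n, parent), 4))
# The MLE is 0 in all three cases; the smoothed estimate stays near the parent rate
# for small N and approaches 0 only as the evidence (N) grows.
```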
Estimating CTR for Content Match
  • Use an existing, well-understood hierarchy
    • Categorize ads and webpages to leaves of the hierarchy
    • CTR estimates of siblings are correlated
    • The hierarchy allows us to aggregate data
  • Coarser resolutions
    • provide reliable estimates for rare events
    • which then influence estimation at finer resolutions
Estimating CTR for Content Match


  • Region= (page node, ad node)
  • Region Hierarchy
    • A cross-product of the page hierarchy and the ad hierarchy

[Figure: the region hierarchy is the cross-product of the page hierarchy and the ad hierarchy; a region at level i pairs a page class with an ad class]
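As a concrete (toy) illustration of the cross-product construction, with made-up class names and only two levels per taxonomy:

```python
# Toy page and ad taxonomies: child class -> parent class (root maps to None).
page_parent = {"Sports/Tennis": "Sports", "Sports/Golf": "Sports", "Sports": None}
ad_parent   = {"Travel/Hotels": "Travel", "Travel/Air": "Travel", "Travel": None}

def region_parent(region):
    """A region is a (page class, ad class) pair; its parent pairs the two parents."""
    page, ad = region
    if page_parent[page] is None or ad_parent[ad] is None:
        return None  # root region of the cross-product hierarchy
    return (page_parent[page], ad_parent[ad])

r = ("Sports/Tennis", "Travel/Hotels")
print(region_parent(r))                  # ('Sports', 'Travel')
print(region_parent(region_parent(r)))   # None: the root region
```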

Estimating CTR for Content Match
  • Our Approach
    • Data Transformation
    • Model
    • Model Fitting
Data Transformation
  • Problem: the variance of the raw (MLE) CTR estimate depends on the unknown CTR itself
  • Solution: Freeman-Tukey transform
    • Differentiates regions with 0 clicks
    • Variance stabilization: Var(y_r) is approximately proportional to 1/N_r, independent of the true CTR
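A minimal sketch of the transform as described here, using the standard Freeman-Tukey form for rates (the paper's exact scaling constants may differ):

```python
import numpy as np

def freeman_tukey(clicks, impressions):
    """Freeman-Tukey transform of a click rate: distinguishes 0/10 from 0/100
    and roughly stabilizes the variance at ~1/N."""
    c = np.asarray(clicks, dtype=float)
    n = np.asarray(impressions, dtype=float)
    return np.sqrt(c / n) + np.sqrt((c + 1) / n)

print(freeman_tukey([0, 0, 2], [10, 100, 100]))  # 0/10 and 0/100 now differ
```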
Model
  • Goal: Smoothing across siblings in the hierarchy [Huang & Cressie, 2000]

  • Each region r has a latent state S_r
  • y_r is independent of the hierarchy given S_r
  • S_r is drawn from its parent's state S_pa(r)

[Figure: two levels of the hierarchy (level i and level i+1); latent states S_1, …, S_4 are drawn from the parent state S_parent, and each generates an observable y_r]

Model

[Figure: S_pa(r) generates S_r with transition variance W_r; S_r together with covariates u_r and coefficients β_r generates the observation y_r with variance V_r (and similarly at the parent, with W_pa(r), V_pa(r), β_pa(r), u_pa(r), y_pa(r))]

Model
  • However, learning W_r, V_r, and β_r for each region is clearly infeasible
  • Assumptions:
    • All regions at the same level ℓ share the same W(ℓ) and β(ℓ)
    • V_r = V/N_r for some constant V, since the Freeman-Tukey transform gives Var(y_r) approximately proportional to 1/N_r
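Putting these assumptions together, my reading of the generative model in the slides' notation; this is a sketch, and in particular the covariate term β(ℓ)ᵀu_r may enter the state equation rather than the observation equation:

```latex
S_r \mid S_{pa(r)} \;\sim\; \mathcal{N}\!\bigl(S_{pa(r)},\, W^{(\ell)}\bigr),
\qquad
y_r \mid S_r \;\sim\; \mathcal{N}\!\bigl(S_r + \beta^{(\ell)\top} u_r,\; V/N_r\bigr).
```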


Model
  • Implications:
    • W(ℓ) determines the degree of smoothing
    • As W(ℓ) → ∞:
      • S_r varies greatly from S_pa(r)
      • Each region learns its own S_r
      • No smoothing
    • As W(ℓ) → 0:
      • All S_r are identical
      • A regression model on the features u_r is learnt
      • Maximum smoothing


Model
  • Implications:
    • W(ℓ) determines the degree of smoothing
    • Var(S_r) increases from root to leaf
      • Better estimates at coarser resolutions


Model
  • Implications:
    • W(ℓ) determines the degree of smoothing
    • Var(S_r) increases from root to leaf
    • Correlations among siblings at level ℓ:
      • Depend only on the level of their least common ancestor

[Figure: Corr(y_r, y_r′) is larger for sibling pairs whose least common ancestor is deeper in the hierarchy]

Estimating CTR for Content Match
  • Our Approach
    • Data Transformation (Freeman-Tukey)
    • Model (Tree-structured Markov Chain)
    • Model Fitting
Model Fitting
  • Fitting using a Kalman filtering algorithm
    • Filtering: Recursively aggregate data from leaves to root
    • Smoothing: Propagate information from root to leaves
  • Complexity: linear in the number of regions, for both time and space
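As a concrete (and much simplified) illustration of the leaves-to-root / root-to-leaves idea, here is a scalar Gaussian message-passing sketch on a toy tree, under the model form sketched earlier. The Node class, parameter values, and prior are my own assumptions; the paper's actual Kalman-filter recursions (and the EM step for β, V, W) are more involved:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    y: float = None                 # transformed observation y_r (None if unobserved)
    V: float = 1.0                  # observation variance V_r
    W: float = 1.0                  # variance of S_r around its parent's state
    children: list = field(default_factory=list)
    up_prec: float = 0.0            # precision of the subtree's evidence about S_r
    up_mp: float = 0.0              # precision * mean of that evidence
    msg_prec: float = 0.0           # upward message to the parent (about S_pa(r))
    msg_mp: float = 0.0
    post_mean: float = None         # smoothed posterior mean of S_r
    post_var: float = None

def filter_up(node):
    """Filtering: recursively aggregate evidence from the leaves to the root."""
    prec, mp = 0.0, 0.0
    if node.y is not None:
        prec += 1.0 / node.V
        mp += node.y / node.V
    for c in node.children:
        filter_up(c)
        prec += c.msg_prec
        mp += c.msg_mp
    node.up_prec, node.up_mp = prec, mp
    if prec > 0.0:                  # integrate out S_r to get a message about the parent
        var_msg = 1.0 / prec + node.W
        node.msg_prec = 1.0 / var_msg
        node.msg_mp = (mp / prec) * node.msg_prec

def smooth_down(node, down_prec=1e-6, down_mp=0.0):
    """Smoothing: propagate information from the root back to the leaves."""
    post_prec = down_prec + node.up_prec
    node.post_var = 1.0 / post_prec
    node.post_mean = (down_mp + node.up_mp) / post_prec
    for c in node.children:
        ex_prec = post_prec - c.msg_prec          # all information except c's own subtree
        ex_mean = (down_mp + node.up_mp - c.msg_mp) / ex_prec
        var_c = 1.0 / ex_prec + c.W               # push that context down the edge to c
        smooth_down(c, down_prec=1.0 / var_c, down_mp=ex_mean / var_c)

# Toy example: one parent region with a sparse leaf and a well-observed sibling.
noisy = Node(y=0.32, V=0.10, W=0.05)   # few impressions -> high observation variance
solid = Node(y=0.10, V=0.01, W=0.05)
root = Node(children=[noisy, solid])
filter_up(root)
smooth_down(root)
print(round(noisy.post_mean, 3), round(solid.post_mean, 3))
# The sparse region is shrunk toward its sibling/parent; the well-observed one barely moves.
```

Each node is visited once on the way up and once on the way down, which is the sense in which the fitting is linear in the number of regions.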


Model Fitting
  • Fitting using a Kalman filtering algorithm
    • Filtering: Recursively aggregate data from leaves to root
    • Smoothing: Propagate information from root to leaves
  • Kalman filter requires knowledge of β, V, and W
    • EM wrapped around the Kalman filter


Experiments
  • 503M impressions
  • A 7-level hierarchy, of which the top 3 levels were used
  • Zero clicks in
    • 76% of the regions at level 2
    • 95% of the regions at level 3
  • Full dataset D_FULL, and a 2/3 sample D_SAMPLE
Experiments
  • Estimate CTRs for all regions R in level 3 with zero clicks in D_SAMPLE
  • Some of these regions, call them R_{>0}, get clicks in D_FULL
  • A good model should predict higher CTRs for R_{>0} than for the other regions in R
Experiments
  • We compared 4 models
    • TS: our tree-structured model
    • LM (level-mean): each level smoothed independently
    • NS (no smoothing): CTR proportional to 1/N_r
    • Random: Assuming |R_{>0}| is given, randomly predict the membership of R_{>0} out of R
Experiments

[Plot: comparison of the four models on the R_{>0} prediction task, with curves for TS, Random, and LM/NS]

Experiments
  • MLE=0 everywhere, since 0 clicks were observed
  • What about estimated CTR?

[Scatter plots of estimated CTR vs. impressions for No Smoothing (NS) and for our model (TS): the TS estimates inherit variability from coarser resolutions at small N and come close to the MLE for large N]

Estimating CTR for Content Match
  • We presented a method to estimate
    • rates of extremely rare events
    • at multiple resolutions
    • under severe sparsity constraints
  • Key points:
    • Tree-structured generative model
    • Extremely fast parameter fitting
Traffic Shaping
  • Estimating CTR for Content Match [KDD ‘07]
  • Traffic Shaping for Display Advertising [EC ‘12]
  • Theoretical underpinnings [COLT ‘10 best student paper]
Traffic Shaping

Which article summary should be picked?

Ans: The one with the highest expected CTR

Which ad should be displayed?

Ans: The ad that minimizes underdelivery

Article pool

Underdelivery
  • Advertisers are guaranteed some impressions (say, 1M) over some time (say, 2 months)
    • only to users matching their specs
    • only when they visit certain types of pages
    • only on certain positions on the page
  • An underdelivering ad is one that is likely to miss its guarantee
Underdelivery
  • How can underdelivery be computed?
    • Need user traffic forecasts
    • Depends on other ads in the system
  • An ad-serving system will try to minimize under-delivery on this graph

[Figure: bipartite graph between forecasted impressions (user, article, position) with supply s_ℓ and the ad inventory with demand d_j]

Traffic Shaping

Which article summary should be picked?

Ans: The one with the highest expected CTR

Which ad should be displayed?

Ans: The ad that minimizes underdelivery

Goal: Combine the two

Traffic Shaping
  • Goal: Bias the article summary selection to
    • reduce under-delivery
    • but with an insignificant drop in CTR
    • AND do this in real-time
Outline
  • Formulation as an optimization problem
  • Real-time solution
  • Empirical results
Formulation

  • Indices: k = user; i = (user, article); ℓ = (user, article, position), a “fully qualified impression”; j = ad
  • Quantities: supply s_k, traffic shaping fraction w_ki, CTR c_ki, ad delivery fraction φ_ℓj, demand d_j
  • Goal: Infer the traffic shaping fractions w_ki

[Figure: the layered graph k → i → ℓ → j, with w_ki on the user-to-article edges, c_ki on the article summaries, and φ_ℓj on the impression-to-ad edges]

Formulation

  • Full traffic shaping graph:
    • All forecasted user traffic × all available articles
    • arriving at the homepage,
    • or directly on an article page
  • Goal: Infer w_ki
    • But forced to infer φ_ℓj as well

[Figure: the full traffic shaping graph, annotated with the traffic shaping fractions w_ki, CTRs c_ki, and ad delivery fractions φ_ℓj]

Formulation

  • Objective: minimize the total underdelivery
  • Demand constraints: for each ad j, relate its demand d_j to the total user traffic flowing to j (accounting for CTR loss), built from the supply s_k, the shaping fractions w_ki, and the CTRs c_ki

Formulation

  • Constraints:
    • Satisfy the demand constraints
    • Bounds on the traffic shaping fractions
    • Shape only the available traffic
    • Ad delivery fractions

Key Transformation
  • This allows a reformulation solely in terms of new variables z_ℓj
    • z_ℓj = the fraction of supply that is shown ad j, assuming the user always clicks the article summary
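The transformation itself is not shown in the transcript; one natural reading (an assumption on my part, not stated on the slide) is that z_ℓj absorbs the bilinear product of the shaping and delivery fractions, which is what would make the resulting program convex:

```latex
z_{\ell j} \;=\; w_{ki}\,\phi_{\ell j}, \qquad \ell \in i,\; i \in k .
```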
Formulation
  • Convex program ⇒ can be solved optimally
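To illustrate the structure (not the paper's exact program), here is a toy convex program in the z-variables using cvxpy; the indices are flattened to a single impression axis, the per-(user, article) coupling and the shaping-fraction bounds are omitted, and all numbers are made up:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
L, J = 40, 5                          # fully-qualified impression types and ads (toy sizes)
s = rng.uniform(1e3, 1e4, size=L)     # forecasted supply per impression type
c = rng.uniform(0.01, 0.10, size=L)   # article-summary CTR per impression type
d = rng.uniform(1e2, 1e3, size=J)     # demand (guaranteed impressions) per ad

z = cp.Variable((L, J), nonneg=True)          # allocation, assuming the summary is clicked
delivered = (s * c) @ z                       # expected impressions delivered to each ad
underdelivery = cp.sum(cp.pos(d - delivered)) # total shortfall across ads
constraints = [cp.sum(z, axis=1) <= 1]        # cannot allocate more than the available traffic
prob = cp.Problem(cp.Minimize(underdelivery), constraints)
prob.solve()
print("total underdelivery:", prob.value)
```

Because the objective is a sum of convex hinge terms and the constraints are linear in z, an off-the-shelf solver returns the global optimum.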
Formulation
  • But we have another problem
    • At runtime, we must shape every incoming user without looking at the entire graph
  • Solution:
    • Periodically solve the convex problem offline
    • Store a cache derived from this solution
    • Reconstruct the optimal solution for each user at runtime, using only the cache
Outline
  • Formulation as an optimization problem
  • Real-time solution
  • Empirical results
Real-time solution

[Figure: the offline solution is split into quantities to cache and quantities to reconstruct from them at runtime]

All constraints can be expressed as constraints on σ_ℓ

Real-time solution

[Figure: the per-user subgraph over articles i and impressions ℓ, with bounds L_i and U_i on the shaping fractions]

Three KKT conditions:
  • (1) Σ_j z_ℓj takes a shape that depends on the cached duals α_j
  • (2) σ_ℓ = 0 unless Σ_j z_ℓj = max_ℓ′ Σ_j z_ℓ′j
  • (3) Σ_ℓ σ_ℓ is constant for all i connected to k

Real-time solution

  • Algorithm:
    • Initialize σ_ℓ = 0
    • Compute Σ_j z_ℓj from condition (1)
    • If constraints are unsatisfied, increase σ_ℓ while maintaining conditions (2) and (3)
    • Repeat
    • Extract w_ki from z_ℓj

[Figure: the same per-user subgraph and KKT conditions as on the previous slide]

Results
  • Data:
    • Historical traffic logs from April, 2011
    • 25K user nodes
      • Total supply weight > 50B impressions
    • 100K ads
  • We compare our model to a scheme that
    • picks articles to maximize expected CTR, and
    • picks ads to display via a separate greedy method
Lift in impressions

[Plot: lift in impressions delivered to under-delivering ads vs. the fraction of traffic that is not shaped; nearly a threefold improvement via traffic shaping]

Average CTR

[Plot: average CTR (as a percentage of the maximum CTR) vs. the fraction of traffic that is not shaped; the CTR drop is under 10%]

Summary
  • 3x underdelivery reduction with <10% CTR drop
  • 2.6x reduction with 4% CTR drop
  • Runtime application needs only a small cache
Traffic Shaping
  • Estimating CTR for Content Match [KDD ‘07]
  • Traffic Shaping for Display Advertising [EC ‘12]
  • Theoretical underpinnings [COLT ‘10 best student paper]
Link Prediction
  • Which pair of nodes {i, j} should be connected?

[Figure: a bipartite user-movie graph with users Alice, Bob, and Charlie]

Goal: Recommend a movie

Link Prediction
  • Which pair of nodes {i, j} should be connected?

Goal: Suggest friends

Previous Empirical Studies*

[Bar chart: link prediction accuracy* for Random, Shortest Path, Common Neighbors, Adamic/Adar, and an ensemble of short paths, with the annotation “especially if the graph is sparse”]

How do we justify these observations?

*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

Link Prediction – Generative Model

Unit volume universe

Model:

  • Nodes are uniformly distributed points in a latent space
  • This space has a distance metric
  • Points close to each other are likely to be connected in the graph
  • Logistic distance function (Raftery et al., 2002)

Link Prediction – Generative Model

[Plot: link probability as a function of distance; it equals ½ at the radius r, α determines the steepness, and closer pairs have a higher probability of linking]

Model (as before): nodes are uniformly distributed points in a latent space with a distance metric, and points close to each other are likely to be connected in the graph.

  • Link prediction ≈ find nearest neighbor who is not currently linked to the node.
    • Equivalent to inferring distances in the latent space
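Written out, the logistic distance function suggested by the plot (a standard latent-space form; the talk's exact parameterization may differ):

```latex
P(i \sim j \mid d_{ij}) \;=\; \frac{1}{1 + e^{\alpha\,(d_{ij} - r)}} ,
```

so the link probability is ½ at distance r and falls off at a rate controlled by α.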
Common Neighbors
  • Pr_2(i, j) = Pr(common neighbor | d_ij)
    • A product of two logistic probabilities, integrated over a volume determined by d_ij

[Figure: nodes i and j, and the region in which a common neighbor can lie]

Common Neighbors
  • OPT = node closest to i
  • MAX = node with max common neighbors with i
  • Theorem: w.h.p., d_OPT ≤ d_MAX ≤ d_OPT + 2[ε/V(1)]^{1/D}

Link prediction by common neighbors is asymptotically optimal

Common Neighbors: Distinct Radii

  • Node k has radius r_k
    • i→k if d_ik ≤ r_k (directed graph)
      • r_k captures the popularity of node k
    • “Weighted” common neighbors:
      • Predict the (i, j) pairs with the highest Σ_r w(r) η(r), where η(r) is the number of common neighbors of radius r and w(r) is the weight given to nodes of radius r

[Figure: nodes i, j, and common neighbors such as k (with radius r_k) and m]

Type 2 common neighbors

[Figure: how the radius r affects informativeness. For small r, the presence of a common neighbor is very informative; when r is close to the maximum radius, its absence is very informative. An Adamic/Adar-style 1/r weighting favors small radii; the slide notes that real-world graphs generally fall in this range]
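For reference, a small self-contained sketch of the two heuristics being analyzed, using the standard degree-based Adamic/Adar weighting as a stand-in for the 1/r-style weighting discussed here; the graph is a made-up toy:

```python
import math

def common_neighbors_score(adj, i, j):
    """Number of 2-hop paths between i and j."""
    return len(adj[i] & adj[j])

def adamic_adar_score(adj, i, j):
    """Common neighbors weighted by 1/log(degree): low-degree neighbors count more."""
    return sum(1.0 / math.log(len(adj[k])) for k in adj[i] & adj[j] if len(adj[k]) > 1)

# Toy undirected graph as adjacency sets.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 4}, 3: {0, 4}, 4: {2, 3}}
candidates = [(i, j) for i in adj for j in adj if i < j and j not in adj[i]]
ranked = sorted(candidates, key=lambda p: adamic_adar_score(adj, *p), reverse=True)
print(ranked[:3])   # the highest-scoring non-edges are the suggested links
```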

ℓ-hop Paths
  • Common neighbors = 2-hop paths
  • For longer paths, the bounds are weaker
    • For ℓ′ ≥ ℓ we need η_ℓ′ ≫ η_ℓ to obtain similar bounds
    • ⇒ justifies the exponentially decaying weight given to longer paths by the Katz measure
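For reference, the Katz measure mentioned here is the standard path-counting score with an exponentially decaying weight β per extra hop:

```latex
\mathrm{Katz}(i, j) \;=\; \sum_{\ell = 1}^{\infty} \beta^{\ell}\, \bigl|\mathrm{paths}_{\ell}(i, j)\bigr| ,
\qquad 0 < \beta < 1 .
```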
Summary
  • Three key ingredients
    • Closer points are likelier to be linked.

Small World Model (Watts & Strogatz, 1998; Kleinberg, 2001)

    • Triangle inequality holds

necessary to extend to ℓ-hop paths

    • Points are spread uniformly at random

⇒ Otherwise, properties will depend on location as well as distance

Summary

In sparse graphs, paths of length 3 or more help in prediction.

Differentiating between different degrees is important.

For large dense graphs, common neighbors are enough.

The number of paths matters, not the length.

[Bar chart: link prediction accuracy* for Random, Shortest Path, Common Neighbors, Adamic/Adar, and an ensemble of short paths, annotated with the observations above]

*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

Conclusions
  • Discussed three problems
    • Estimating CTR for Content Match
      • Combat sparsity by hierarchical smoothing
    • Traffic Shaping for Display Advertising
      • Joint optimization of CTR and underdelivery-reduction
      • Optimal traffic shaping at runtime using cached duals
    • Theoretical underpinnings
      • Latent space model
      • Link prediction ≈ finding nearest neighbors in this space
Other Work
  • Computational Advertising
    • Combining IR with click feedback
    • Multi-armed bandits using hierarchies
    • Online learning under finite ad lifetimes
  • Web Search
    • Finding Quicklinks
    • Titles for Quicklinks
    • Incorporating tweets into search results
    • Website clustering
    • Webpage segmentation
    • Template detection
    • Finding hidden query aspects
  • Graph Mining
    • Epidemic thresholds
    • Non-parametric prediction in dynamic graphs
    • Graph sampling
    • Graph generation models
    • Community detection
Model
  • Goal: Smoothing across siblings in hierarchy
  • Our approach:
    • Each region has a latent state Sr
    • yr is independent of hierarchy given Sr
    • Sr is drawn from the parent region Spa(r)


Data Transformation

[Plots: N·Var(MLE) against the MLE CTR, and N·Var(y_r) against the mean y_r, illustrating that the transform stabilizes the variance]

  • Problem: the variance of the raw (MLE) CTR estimate depends on the unknown CTR itself
  • Solution: Freeman-Tukey transform
    • Differentiates regions with 0 clicks
    • Variance stabilization: Var(y_r) is approximately proportional to 1/N_r, independent of the true CTR