Challenges in Computational Advertising

Deepayan Chakrabarti ([email protected])

Presentation Transcript
Online Advertising Overview

[Diagram: Advertisers supply Ads to an Ad Network, which picks ads to show alongside a Content Provider's content to the User. Examples of ad networks: Yahoo, Google, MSN, RightMedia, …]

Advertising Setting

Sponsored Search

Display

Content Match

Advertising Setting

Sponsored Search

Display

Content Match

Pick ads

Advertising Setting
  • Graphical display ads
  • Mostly for brand awareness
  • Revenue based on number of impressions (not clicks)

Sponsored Search

Display

Content Match

Advertising Setting

Sponsored Search

Display

Content Match

Content match ad

Advertising Setting

Sponsored Search

Display

Content Match

Text ads

Pick ads

Match ads to the content

Advertising Setting
  • The user intent is unclear
  • Revenue depends on number of clicks
  • Query (webpage) is long and noisy

Sponsored Search

Display

Content Match

Advertising Setting

Sponsored Search

Display

Content Match

Search Query

Sponsored Search Ads

This presentation
  • Content Match [KDD 2007]:
    • How can we estimate the click-through rate (CTR) of an ad on a page?

CTR for ad j on page i

~10^9 pages

~10^6 ads

This presentation
  • Estimating CTR for Content Match [KDD ‘07]
  • Traffic Shaping for Display Advertising [EC ‘12]

[Figure: a content page showing an article summary (with alternate articles) and display ads; the user may click the summary to reach the full article]

This presentation
  • Estimating CTR for Content Match [KDD ‘07]
  • Traffic Shaping for Display Advertising [EC ‘12]
    • Recommend articles (not ads)
    • need high CTR on article summaries
    • + prefer articles on which under-delivering ads can be shown
This presentation
  • Estimating CTR for Content Match [KDD ‘07]
  • Traffic Shaping for Display Advertising [EC ‘12]
  • Theoretical underpinnings [COLT ‘10 best student paper]
    • Represent relationships as a graph
    • Recommendation = Link Prediction
    • Many useful heuristics exist
    • Why do these heuristics work?

Goal: Suggest friends

Estimating CTR for Content Match
  • Contextual Advertising
    • Show an ad on a webpage (“impression”)
    • Revenue is generated if a user clicks
    • Problem: Estimate the click-through rate (CTR) of an ad on a page

CTR for ad j on page i

~10^9 pages

~10^6 ads

Estimating CTR for Content Match
  • Why not use the MLE?
    • Few (page, ad) pairs have N>0
    • Very few have c>0 as well
    • MLE does not differentiate between 0/10 and 0/100
    • We have additional information: hierarchies
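To make this concrete, here is a tiny stand-alone illustration of why the raw MLE is uninformative for sparse regions and how shrinking toward a coarser (parent) rate helps. The Beta-style smoothing and the parent CTR value are hypothetical stand-ins for the hierarchical model described next, not the paper's estimator:

```python
def mle_ctr(clicks, impressions):
    """Maximum-likelihood CTR: cannot tell 0/10 apart from 0/100."""
    return clicks / impressions if impressions else 0.0

def smoothed_ctr(clicks, impressions, parent_ctr, strength=100.0):
    """Shrink toward a parent (coarser-resolution) rate; hypothetical Beta-prior smoothing."""
    return (clicks + strength * parent_ctr) / (impressions + strength)

parent = 0.02                       # CTR of the parent region, assumed known here
for n in (10, 100, 10_000):
    print(n, mle_ctr(0, n), round(smoothed_ctr(0, n, parent), 4))
# The MLE is 0 in all three cases; the smoothed estimate stays near the parent rate
# for small N and approaches 0 only as the evidence (N) grows.
```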
Estimating CTR for Content Match
  • Use an existing, well-understood hierarchy
    • Categorize ads and webpages to leaves of the hierarchy
    • CTR estimates of siblings are correlated
    • The hierarchy allows us to aggregate data
  • Coarser resolutions
    • provide reliable estimates for rare events
    • which then influence estimation at finer resolutions
Estimating CTR for Content Match


  • Region= (page node, ad node)
  • Region Hierarchy
    • A cross-product of the page hierarchy and the ad hierarchy

[Figure: the region hierarchy is the cross-product of the page hierarchy and the ad hierarchy; a region at level i pairs a page class with an ad class]
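As a concrete (toy) illustration of the cross-product construction, with made-up class names and only two levels per taxonomy:

```python
# Toy page and ad taxonomies: child class -> parent class (root maps to None).
page_parent = {"Sports/Tennis": "Sports", "Sports/Golf": "Sports", "Sports": None}
ad_parent   = {"Travel/Hotels": "Travel", "Travel/Air": "Travel", "Travel": None}

def region_parent(region):
    """A region is a (page class, ad class) pair; its parent pairs the two parents."""
    page, ad = region
    if page_parent[page] is None or ad_parent[ad] is None:
        return None  # root region of the cross-product hierarchy
    return (page_parent[page], ad_parent[ad])

r = ("Sports/Tennis", "Travel/Hotels")
print(region_parent(r))                  # ('Sports', 'Travel')
print(region_parent(region_parent(r)))   # None: the root region
```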

Estimating CTR for Content Match
  • Our Approach
    • Data Transformation
    • Model
    • Model Fitting
Data Transformation
  • Problem: the variance of the raw (MLE) CTR estimate depends on the unknown CTR itself
  • Solution: Freeman-Tukey transform
    • Differentiates regions with 0 clicks
    • Variance stabilization: Var(y_r) is approximately proportional to 1/N_r, independent of the true CTR
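A minimal sketch of the transform as described here, using the standard Freeman-Tukey form for rates (the paper's exact scaling constants may differ):

```python
import numpy as np

def freeman_tukey(clicks, impressions):
    """Freeman-Tukey transform of a click rate: distinguishes 0/10 from 0/100
    and roughly stabilizes the variance at ~1/N."""
    c = np.asarray(clicks, dtype=float)
    n = np.asarray(impressions, dtype=float)
    return np.sqrt(c / n) + np.sqrt((c + 1) / n)

print(freeman_tukey([0, 0, 2], [10, 100, 100]))  # 0/10 and 0/100 now differ
```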
Model
  • Goal: Smoothing across siblings in the hierarchy [Huang & Cressie, 2000]

  • Each region r has a latent state S_r
  • y_r is independent of the hierarchy given S_r
  • S_r is drawn from its parent's state S_pa(r)

[Figure: two levels of the hierarchy (level i and level i+1); latent states S_1, …, S_4 are drawn from the parent state S_parent, and each generates an observable y_r]

Model

[Figure: S_pa(r) generates S_r with transition variance W_r; S_r together with covariates u_r and coefficients β_r generates the observation y_r with variance V_r (and similarly at the parent, with W_pa(r), V_pa(r), β_pa(r), u_pa(r), y_pa(r))]

Model
  • However, learning W_r, V_r, and β_r for each region is clearly infeasible
  • Assumptions:
    • All regions at the same level ℓ share the same W(ℓ) and β(ℓ)
    • V_r = V/N_r for some constant V, since the Freeman-Tukey transform gives Var(y_r) approximately proportional to 1/N_r
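Putting these assumptions together, my reading of the generative model in the slides' notation; this is a sketch, and in particular the covariate term β(ℓ)ᵀu_r may enter the state equation rather than the observation equation:

```latex
S_r \mid S_{pa(r)} \;\sim\; \mathcal{N}\!\bigl(S_{pa(r)},\, W^{(\ell)}\bigr),
\qquad
y_r \mid S_r \;\sim\; \mathcal{N}\!\bigl(S_r + \beta^{(\ell)\top} u_r,\; V/N_r\bigr).
```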


Model
  • Implications:
    • W(ℓ) determines the degree of smoothing
    • As W(ℓ) → ∞:
      • S_r varies greatly from S_pa(r)
      • Each region learns its own S_r
      • No smoothing
    • As W(ℓ) → 0:
      • All S_r are identical
      • A regression model on the features u_r is learnt
      • Maximum smoothing


Model
  • Implications:
    • W(ℓ) determines the degree of smoothing
    • Var(S_r) increases from root to leaf
      • Better estimates at coarser resolutions


Model
  • Implications:
    • W(ℓ) determines the degree of smoothing
    • Var(S_r) increases from root to leaf
    • Correlations among siblings at level ℓ:
      • Depend only on the level of their least common ancestor

[Figure: Corr(y_r, y_r′) is larger for sibling pairs whose least common ancestor is deeper in the hierarchy]

Estimating CTR for Content Match
  • Our Approach
    • Data Transformation (Freeman-Tukey)
    • Model (Tree-structured Markov Chain)
    • Model Fitting
Model Fitting
  • Fitting using a Kalman filtering algorithm
    • Filtering: Recursively aggregate data from leaves to root
    • Smoothing: Propagate information from root to leaves
  • Complexity: linear in the number of regions, for both time and space
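As a concrete (and much simplified) illustration of the leaves-to-root / root-to-leaves idea, here is a scalar Gaussian message-passing sketch on a toy tree, under the model form sketched earlier. The Node class, parameter values, and prior are my own assumptions; the paper's actual Kalman-filter recursions (and the EM step for β, V, W) are more involved:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    y: float = None                 # transformed observation y_r (None if unobserved)
    V: float = 1.0                  # observation variance V_r
    W: float = 1.0                  # variance of S_r around its parent's state
    children: list = field(default_factory=list)
    up_prec: float = 0.0            # precision of the subtree's evidence about S_r
    up_mp: float = 0.0              # precision * mean of that evidence
    msg_prec: float = 0.0           # upward message to the parent (about S_pa(r))
    msg_mp: float = 0.0
    post_mean: float = None         # smoothed posterior mean of S_r
    post_var: float = None

def filter_up(node):
    """Filtering: recursively aggregate evidence from the leaves to the root."""
    prec, mp = 0.0, 0.0
    if node.y is not None:
        prec += 1.0 / node.V
        mp += node.y / node.V
    for c in node.children:
        filter_up(c)
        prec += c.msg_prec
        mp += c.msg_mp
    node.up_prec, node.up_mp = prec, mp
    if prec > 0.0:                  # integrate out S_r to get a message about the parent
        var_msg = 1.0 / prec + node.W
        node.msg_prec = 1.0 / var_msg
        node.msg_mp = (mp / prec) * node.msg_prec

def smooth_down(node, down_prec=1e-6, down_mp=0.0):
    """Smoothing: propagate information from the root back to the leaves."""
    post_prec = down_prec + node.up_prec
    node.post_var = 1.0 / post_prec
    node.post_mean = (down_mp + node.up_mp) / post_prec
    for c in node.children:
        ex_prec = post_prec - c.msg_prec          # all information except c's own subtree
        ex_mean = (down_mp + node.up_mp - c.msg_mp) / ex_prec
        var_c = 1.0 / ex_prec + c.W               # push that context down the edge to c
        smooth_down(c, down_prec=1.0 / var_c, down_mp=ex_mean / var_c)

# Toy example: one parent region with a sparse leaf and a well-observed sibling.
noisy = Node(y=0.32, V=0.10, W=0.05)   # few impressions -> high observation variance
solid = Node(y=0.10, V=0.01, W=0.05)
root = Node(children=[noisy, solid])
filter_up(root)
smooth_down(root)
print(round(noisy.post_mean, 3), round(solid.post_mean, 3))
# The sparse region is shrunk toward its sibling/parent; the well-observed one barely moves.
```

Each node is visited once on the way up and once on the way down, which is the sense in which the fitting is linear in the number of regions.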


Model Fitting
  • Fitting using a Kalman filtering algorithm
    • Filtering: Recursively aggregate data from leaves to root
    • Smoothing: Propagate information from root to leaves
  • Kalman filter requires knowledge of β, V, and W
    • EM wrapped around the Kalman filter


Experiments
  • 503M impressions
  • A 7-level hierarchy, of which the top 3 levels were used
  • Zero clicks in
    • 76% of the regions at level 2
    • 95% of the regions at level 3
  • Full dataset D_FULL, and a 2/3 sample D_SAMPLE
Experiments
  • Estimate CTRs for all regions R in level 3 with zero clicks in D_SAMPLE
  • Some of these regions, call them R_{>0}, get clicks in D_FULL
  • A good model should predict higher CTRs for R_{>0} than for the other regions in R
Experiments
  • We compared 4 models
    • TS: our tree-structured model
    • LM (level-mean): each level smoothed independently
    • NS (no smoothing): CTR proportional to 1/N_r
    • Random: Assuming |R_{>0}| is given, randomly predict the membership of R_{>0} out of R
Experiments

[Plot: comparison of the four models on the R_{>0} prediction task, with curves for TS, Random, and LM/NS]

Experiments
  • MLE=0 everywhere, since 0 clicks were observed
  • What about estimated CTR?

[Scatter plots of estimated CTR vs. impressions for No Smoothing (NS) and for our model (TS): the TS estimates inherit variability from coarser resolutions at small N and come close to the MLE for large N]

Estimating CTR for Content Match
  • We presented a method to estimate
    • rates of extremely rare events
    • at multiple resolutions
    • under severe sparsity constraints
  • Key points:
    • Tree-structured generative model
    • Extremely fast parameter fitting
Traffic Shaping
  • Estimating CTR for Content Match [KDD ‘07]
  • Traffic Shaping for Display Advertising [EC ‘12]
  • Theoretical underpinnings [COLT ‘10 best student paper]
Traffic Shaping

Which article summary should be picked?

Ans: The one with the highest expected CTR

Which ad should be displayed?

Ans: The ad that minimizes underdelivery

Article pool

Underdelivery
  • Advertisers are guaranteed some impressions (say, 1M) over some time (say, 2 months)
    • only to users matching their specs
    • only when they visit certain types of pages
    • only on certain positions on the page
  • An underdelivering ad is one that is likely to miss its guarantee
Underdelivery
  • How can underdelivery be computed?
    • Need user traffic forecasts
    • Depends on other ads in the system
  • An ad-serving system will try to minimize under-delivery on this graph

[Figure: bipartite graph between forecasted impressions (user, article, position) with supply s_ℓ and the ad inventory with demand d_j]

Traffic Shaping

Which article summary should be picked?

Ans: The one with the highest expected CTR

Which ad should be displayed?

Ans: The ad that minimizes underdelivery

Goal: Combine the two

Traffic Shaping
  • Goal: Bias the article summary selection to
    • reduce under-delivery
    • but with an insignificant drop in CTR
    • AND do this in real-time
Outline
  • Formulation as an optimization problem
  • Real-time solution
  • Empirical results
Formulation

  • Indices: k = user; i = (user, article); ℓ = (user, article, position), a “fully qualified impression”; j = ad
  • Quantities: supply s_k, traffic shaping fraction w_ki, CTR c_ki, ad delivery fraction φ_ℓj, demand d_j
  • Goal: Infer the traffic shaping fractions w_ki

[Figure: the layered graph k → i → ℓ → j, with w_ki on the user-to-article edges, c_ki on the article summaries, and φ_ℓj on the impression-to-ad edges]

Formulation

  • Full traffic shaping graph:
    • All forecasted user traffic × all available articles
    • arriving at the homepage,
    • or directly on an article page
  • Goal: Infer w_ki
    • But forced to infer φ_ℓj as well

[Figure: the full traffic shaping graph, annotated with the traffic shaping fractions w_ki, CTRs c_ki, and ad delivery fractions φ_ℓj]

Formulation

  • Objective: minimize the total underdelivery
  • Demand constraints: for each ad j, relate its demand d_j to the total user traffic flowing to j (accounting for CTR loss), built from the supply s_k, the shaping fractions w_ki, and the CTRs c_ki

Formulation

  • Constraints:
    • Satisfy the demand constraints
    • Bounds on the traffic shaping fractions
    • Shape only the available traffic
    • Ad delivery fractions

Key Transformation
  • This allows a reformulation solely in terms of new variables z_ℓj
    • z_ℓj = the fraction of supply that is shown ad j, assuming the user always clicks the article summary
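The transformation itself is not shown in the transcript; one natural reading (an assumption on my part, not stated on the slide) is that z_ℓj absorbs the bilinear product of the shaping and delivery fractions, which is what would make the resulting program convex:

```latex
z_{\ell j} \;=\; w_{ki}\,\phi_{\ell j}, \qquad \ell \in i,\; i \in k .
```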
Formulation
  • Convex program ⇒ can be solved optimally
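To illustrate the structure (not the paper's exact program), here is a toy convex program in the z-variables using cvxpy; the indices are flattened to a single impression axis, the per-(user, article) coupling and the shaping-fraction bounds are omitted, and all numbers are made up:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
L, J = 40, 5                          # fully-qualified impression types and ads (toy sizes)
s = rng.uniform(1e3, 1e4, size=L)     # forecasted supply per impression type
c = rng.uniform(0.01, 0.10, size=L)   # article-summary CTR per impression type
d = rng.uniform(1e2, 1e3, size=J)     # demand (guaranteed impressions) per ad

z = cp.Variable((L, J), nonneg=True)          # allocation, assuming the summary is clicked
delivered = (s * c) @ z                       # expected impressions delivered to each ad
underdelivery = cp.sum(cp.pos(d - delivered)) # total shortfall across ads
constraints = [cp.sum(z, axis=1) <= 1]        # cannot allocate more than the available traffic
prob = cp.Problem(cp.Minimize(underdelivery), constraints)
prob.solve()
print("total underdelivery:", prob.value)
```

Because the objective is a sum of convex hinge terms and the constraints are linear in z, an off-the-shelf solver returns the global optimum.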
Formulation
  • But we have another problem
    • At runtime, we must shape every incoming user without looking at the entire graph
  • Solution:
    • Periodically solve the convex problem offline
    • Store a cache derived from this solution
    • Reconstruct the optimal solution for each user at runtime, using only the cache
Outline
  • Formulation as an optimization problem
  • Real-time solution
  • Empirical results
Real-time solution

[Figure: the offline solution is split into quantities to cache and quantities to reconstruct from them at runtime]

All constraints can be expressed as constraints on σ_ℓ

Real-time solution

[Figure: the per-user subgraph over articles i and impressions ℓ, with bounds L_i and U_i on the shaping fractions]

Three KKT conditions:
  • (1) Σ_j z_ℓj takes a shape that depends on the cached duals α_j
  • (2) σ_ℓ = 0 unless Σ_j z_ℓj = max_ℓ′ Σ_j z_ℓ′j
  • (3) Σ_ℓ σ_ℓ is constant for all i connected to k

Real-time solution

  • Algorithm:
    • Initialize σ_ℓ = 0
    • Compute Σ_j z_ℓj from condition (1)
    • If constraints are unsatisfied, increase σ_ℓ while maintaining conditions (2) and (3)
    • Repeat
    • Extract w_ki from z_ℓj

[Figure: the same per-user subgraph and KKT conditions as on the previous slide]

Results
  • Data:
    • Historical traffic logs from April, 2011
    • 25K user nodes
      • Total supply weight > 50B impressions
    • 100K ads
  • We compare our model to a scheme that
    • picks articles to maximize expected CTR, and
    • picks ads to display via a separate greedy method
Lift in impressions

[Plot: lift in impressions delivered to under-delivering ads vs. the fraction of traffic that is not shaped; nearly a threefold improvement via traffic shaping]

Average CTR

[Plot: average CTR (as a percentage of the maximum CTR) vs. the fraction of traffic that is not shaped; the CTR drop is under 10%]

Summary
  • 3x underdelivery reduction with <10% CTR drop
  • 2.6x reduction with 4% CTR drop
  • Runtime application needs only a small cache
Traffic Shaping
  • Estimating CTR for Content Match [KDD ‘07]
  • Traffic Shaping for Display Advertising [EC ‘12]
  • Theoretical underpinnings [COLT ‘10 best student paper]
Link Prediction
  • Which pair of nodes {i, j} should be connected?

[Figure: a bipartite user-movie graph with users Alice, Bob, and Charlie]

Goal: Recommend a movie

Link Prediction
  • Which pair of nodes {i, j} should be connected?

Goal: Suggest friends

Previous Empirical Studies*

[Bar chart: link prediction accuracy* for Random, Shortest Path, Common Neighbors, Adamic/Adar, and an ensemble of short paths, with the annotation “especially if the graph is sparse”]

How do we justify these observations?

*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

Link Prediction – Generative Model

Unit volume universe

Model:

  • Nodes are uniformly distributed points in a latent space
  • This space has a distance metric
  • Points close to each other are likely to be connected in the graph
  • Logistic distance function (Raftery et al., 2002)

Link Prediction – Generative Model

[Plot: link probability as a function of distance; it equals ½ at the radius r, α determines the steepness, and closer pairs have a higher probability of linking]

Model (as before): nodes are uniformly distributed points in a latent space with a distance metric, and points close to each other are likely to be connected in the graph.

  • Link prediction ≈ find nearest neighbor who is not currently linked to the node.
    • Equivalent to inferring distances in the latent space
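Written out, the logistic distance function suggested by the plot (a standard latent-space form; the talk's exact parameterization may differ):

```latex
P(i \sim j \mid d_{ij}) \;=\; \frac{1}{1 + e^{\alpha\,(d_{ij} - r)}} ,
```

so the link probability is ½ at distance r and falls off at a rate controlled by α.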
Common Neighbors
  • Pr_2(i, j) = Pr(common neighbor | d_ij)
    • A product of two logistic probabilities, integrated over a volume determined by d_ij

[Figure: nodes i and j, and the region in which a common neighbor can lie]

Common Neighbors
  • OPT = node closest to i
  • MAX = node with max common neighbors with i
  • Theorem: w.h.p., d_OPT ≤ d_MAX ≤ d_OPT + 2[ε/V(1)]^{1/D}

Link prediction by common neighbors is asymptotically optimal

Common Neighbors: Distinct Radii

  • Node k has radius r_k
    • i→k if d_ik ≤ r_k (directed graph)
      • r_k captures the popularity of node k
    • “Weighted” common neighbors:
      • Predict the (i, j) pairs with the highest Σ_r w(r) η(r), where η(r) is the number of common neighbors of radius r and w(r) is the weight given to nodes of radius r

[Figure: nodes i, j, and common neighbors such as k (with radius r_k) and m]

Type 2 common neighbors

[Figure: how the radius r affects informativeness. For small r, the presence of a common neighbor is very informative; when r is close to the maximum radius, its absence is very informative. An Adamic/Adar-style 1/r weighting favors small radii; the slide notes that real-world graphs generally fall in this range]
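For reference, a small self-contained sketch of the two heuristics being analyzed, using the standard degree-based Adamic/Adar weighting as a stand-in for the 1/r-style weighting discussed here; the graph is a made-up toy:

```python
import math

def common_neighbors_score(adj, i, j):
    """Number of 2-hop paths between i and j."""
    return len(adj[i] & adj[j])

def adamic_adar_score(adj, i, j):
    """Common neighbors weighted by 1/log(degree): low-degree neighbors count more."""
    return sum(1.0 / math.log(len(adj[k])) for k in adj[i] & adj[j] if len(adj[k]) > 1)

# Toy undirected graph as adjacency sets.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 4}, 3: {0, 4}, 4: {2, 3}}
candidates = [(i, j) for i in adj for j in adj if i < j and j not in adj[i]]
ranked = sorted(candidates, key=lambda p: adamic_adar_score(adj, *p), reverse=True)
print(ranked[:3])   # the highest-scoring non-edges are the suggested links
```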

ℓ-hop Paths
  • Common neighbors = 2-hop paths
  • For longer paths, the bounds are weaker
    • For ℓ′ ≥ ℓ we need η_ℓ′ ≫ η_ℓ to obtain similar bounds
    • ⇒ justifies the exponentially decaying weight given to longer paths by the Katz measure
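For reference, the Katz measure mentioned here is the standard path-counting score with an exponentially decaying weight β per extra hop:

```latex
\mathrm{Katz}(i, j) \;=\; \sum_{\ell = 1}^{\infty} \beta^{\ell}\, \bigl|\mathrm{paths}_{\ell}(i, j)\bigr| ,
\qquad 0 < \beta < 1 .
```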
Summary
  • Three key ingredients
    • Closer points are likelier to be linked.

Small World Model (Watts & Strogatz, 1998; Kleinberg, 2001)

    • Triangle inequality holds

necessary to extend to ℓ-hop paths

    • Points are spread uniformly at random

⇒ Otherwise, properties will depend on location as well as distance

Summary

In sparse graphs, paths of length 3 or more help in prediction.

Differentiating between different degrees is important.

For large dense graphs, common neighbors are enough.

The number of paths matters, not the length.

[Bar chart: link prediction accuracy* for Random, Shortest Path, Common Neighbors, Adamic/Adar, and an ensemble of short paths, annotated with the observations above]

*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

Conclusions
  • Discussed three problems
    • Estimating CTR for Content Match
      • Combat sparsity by hierarchical smoothing
    • Traffic Shaping for Display Advertising
      • Joint optimization of CTR and underdelivery-reduction
      • Optimal traffic shaping at runtime using cached duals
    • Theoretical underpinnings
      • Latent space model
      • Link prediction ≈ finding nearest neighbors in this space
Other Work
  • Computational Advertising
    • Combining IR with click feedback
    • Multi-armed bandits using hierarchies
    • Online learning under finite ad lifetimes
  • Web Search
    • Finding Quicklinks
    • Titles for Quicklinks
    • Incorporating tweets into search results
    • Website clustering
    • Webpage segmentation
    • Template detection
    • Finding hidden query aspects
  • Graph Mining
    • Epidemic thresholds
    • Non-parametric prediction in dynamic graphs
    • Graph sampling
    • Graph generation models
    • Community detection
Model
  • Goal: Smoothing across siblings in hierarchy
  • Our approach:
    • Each region has a latent state Sr
    • yr is independent of hierarchy given Sr
    • Sr is drawn from the parent region Spa(r)


Data Transformation

[Plots: N·Var(MLE) against the MLE CTR, and N·Var(y_r) against the mean y_r, illustrating that the transform stabilizes the variance]

  • Problem: the variance of the raw (MLE) CTR estimate depends on the unknown CTR itself
  • Solution: Freeman-Tukey transform
    • Differentiates regions with 0 clicks
    • Variance stabilization: Var(y_r) is approximately proportional to 1/N_r, independent of the true CTR