## Challenges in Computational Advertising


Deepayan Chakrabarti (deepay@yahoo-inc.com)

**Online Advertising Overview**
- Advertisers supply ads to an ad network (examples: Yahoo, Google, MSN, RightMedia, …)
- The ad network picks ads to show alongside a content provider's content to the user

**Advertising Setting**
Three settings: Sponsored Search, Display, and Content Match.
- Display
  - Graphical display ads, mostly for brand awareness
  - Revenue based on the number of impressions (not clicks)
- Content Match
  - Text ads, matched to the content of the webpage
  - The user intent is unclear
  - Revenue depends on the number of clicks
  - The "query" (the webpage) is long and noisy
- Sponsored Search
  - Ads matched to the search query

**This presentation**
- Estimating CTR for Content Match [KDD '07]
  - How can we estimate the click-through rate (CTR) of ad j on page i?
  - ~10^9 pages, ~10^6 ads
- Traffic Shaping for Display Advertising [EC '12]
  - [Figure: a page with display ads and an article summary; clicking the summary leads to the article; alternate summaries are available]
  - Recommend articles (not ads): we need high CTR on the article summaries
  - In addition, prefer articles on which under-delivering ads can be shown
- Theoretical underpinnings [COLT '10 best student paper]
  - Represent relationships as a graph; recommendation = link prediction
  - Many useful heuristics exist; why do these heuristics work?
- Goal: suggest friends

**Estimating CTR for Content Match**
- Contextual advertising:
  - Show an ad on a webpage (an "impression")
  - Revenue is generated if a user clicks
- Problem: estimate the click-through rate (CTR) of ad j on page i
  - ~10^9 pages, ~10^6 ads

**Estimating CTR for Content Match**
- Why not use the MLE?
  - Few (page, ad) pairs have N > 0 impressions, and very few have c > 0 clicks as well
  - The MLE does not differentiate between 0/10 and 0/100
- We have additional information: hierarchies

**Estimating CTR for Content Match**
- Use an existing, well-understood hierarchy
- Categorize ads and webpages to leaves of the hierarchy
- CTR estimates of siblings are correlated
- The hierarchy allows us to aggregate data: coarser resolutions provide reliable estimates for rare events, which then influence estimation at finer resolutions

**Estimating CTR for Content Match**
- Region = (page node, ad node)
- Region hierarchy: the cross-product of the page hierarchy and the ad hierarchy; level i of the region hierarchy pairs level-i page classes with level-i ad classes

**Estimating CTR for Content Match**
- Our approach:
  1. Data transformation
  2. Model
  3. Model fitting

**Data Transformation**
- Problem: raw click counts are extremely sparse, and the variance of the raw CTR estimate depends on the unknown CTR itself
- Solution: the Freeman-Tukey transform
  - Differentiates regions with 0 clicks
  - Variance stabilization: the variance of the transformed value depends (approximately) only on the number of impressions, not on the CTR

**Model**
- Goal: smoothing across siblings in the hierarchy [Huang and Cressie, 2000]
- Each region r has a latent state S_r
- The observation y_r is independent of the hierarchy given S_r
- S_r is drawn from its parent's state S_pa(r)
- [Figure: two levels of the tree, with a latent parent state at level i and latent child states S_1, …, S_4 at level i+1 with observables y_1, y_2, y_4]
- [Figure: generative model; S_r is drawn from S_pa(r) with variance W_r, and y_r is generated from S_r and features u_r with coefficients β_r and variance V_r]
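The pieces above can be sketched in code (a minimal sketch under assumed forms: a Freeman-Tukey-style transform of the counts, and S_r drawn around S_pa(r) with a per-level variance W; the exact formulas in the KDD '07 paper may differ):

```python
import math
import random

def freeman_tukey(clicks, impressions):
    """Freeman-Tukey-style variance-stabilizing transform of a CTR.

    A common form for binomial counts; the exact transform used in
    the paper may differ in detail. Unlike the MLE, it distinguishes
    0 clicks out of 10 from 0 clicks out of 100.
    """
    return math.sqrt(clicks / impressions) + math.sqrt((clicks + 1) / impressions)

def sample_states(node, parent_state, W, rng):
    """Sample latent states down a toy region hierarchy.

    Each node's state S_r is drawn around its parent's state S_pa(r)
    with (assumed) per-level variance W; a node is (name, children).
    """
    name, children = node
    s = rng.gauss(parent_state, math.sqrt(W))
    states = {name: s}
    for child in children:
        states.update(sample_states(child, s, W, rng))
    return states

rng = random.Random(0)
tree = ("root", [("electronics", [("cameras", []), ("phones", [])]),
                 ("travel", [("hotels", []), ("flights", [])])])
states = sample_states(tree, parent_state=0.0, W=0.05, rng=rng)

# The transform separates 0/10 from 0/100, which the MLE cannot:
print(freeman_tukey(0, 10))   # larger
print(freeman_tukey(0, 100))  # smaller
```

Small W ties children tightly to their parents (heavy smoothing); large W lets each region drift to its own state, which matches the implications discussed on the next slides.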
**Model**
- However, learning W_r, V_r, and β_r for each region separately is clearly infeasible
- Assumptions:
  - All regions at the same level ℓ share the same W(ℓ) and β(ℓ)
  - V_r = V/N_r for some constant V

**Model**
- Implications: W(ℓ) determines the degree of smoothing
  - When W(ℓ) is large: S_r varies greatly from S_pa(r), each region learns its own S_r, and there is no smoothing
  - When W(ℓ) approaches zero: all S_r are identical, a regression model on the features u_r is learnt, and smoothing is maximal
- Var(S_r) increases from root to leaf, so estimates are better at coarser resolutions
- Correlations among siblings at level ℓ depend only on the level of their least common ancestor: siblings with a deeper least common ancestor are more strongly correlated

**Estimating CTR for Content Match**
- Our approach:
  1. Data transformation (Freeman-Tukey)
  2. Model (tree-structured Markov chain)
  3. Model fitting

**Model Fitting**
- Fitting uses a Kalman filtering algorithm
  - Filtering: recursively aggregate data from the leaves to the root
  - Smoothing: propagate information from the root back to the leaves
- Complexity: linear in the number of regions, in both time and space
- The Kalman filter requires knowledge of β, V, and W, so EM is wrapped around the Kalman filter

**Experiments**
- 503M impressions
- A 7-level hierarchy, of which the top 3 levels were used
- Zero clicks in 76% of the regions in level 2, and in 95% of the regions in level 3
- Full dataset D_FULL, and a 2/3 sample D_SAMPLE

**Experiments**
- Estimate CTRs for all regions R in level 3 with zero clicks in D_SAMPLE
- Some of these regions (R>0) get clicks in D_FULL
- A good model should predict higher CTRs for R>0 than for the other regions in R
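The evaluation task above can be made concrete with a small ranking check (an illustrative sketch; the function and region names are invented, not from the paper): rank the zero-click regions by estimated CTR and measure what fraction of the top |R>0| slots are filled by regions that truly received clicks in D_FULL.

```python
def precision_at_k(estimated_ctr, future_clickers):
    """Score CTR predictions for zero-click regions.

    estimated_ctr:   dict region -> CTR estimated from the sample
    future_clickers: set of regions that got clicks in the full data (R>0)

    Ranks regions by estimated CTR (descending) and returns the fraction
    of the top |R>0| positions occupied by true members of R>0.
    """
    k = len(future_clickers)
    ranked = sorted(estimated_ctr, key=estimated_ctr.get, reverse=True)
    hits = sum(1 for region in ranked[:k] if region in future_clickers)
    return hits / k

# Toy example: a good model places the future clickers at the top.
est = {"r1": 0.009, "r2": 0.001, "r3": 0.007, "r4": 0.002}
print(precision_at_k(est, {"r1", "r3"}))  # 1.0: both future clickers ranked first
```

A model no better than chance would score about |R>0| / |R| on this metric, which is what the Random baseline below corresponds to.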
**Experiments**
- We compared 4 models:
  - TS: our tree-structured model
  - LM (level-mean): each level smoothed independently
  - NS (no smoothing): CTR proportional to 1/N_r
  - Random: assuming |R>0| is given, randomly predict the membership of R>0 out of R
- [Figure: lift curves for TS, Random, LM, and NS; TS performs best]
- MLE = 0 everywhere, since 0 clicks were observed; but what about the estimated CTRs?
- [Figure: estimated CTR vs. impressions for No Smoothing (NS) and Our Model (TS); the TS estimates show variability inherited from coarser resolutions and come close to the MLE for large N]

**Estimating CTR for Content Match**
- We presented a method to estimate rates of extremely rare events, at multiple resolutions, under severe sparsity constraints
- Key points:
  - A tree-structured generative model
  - Extremely fast parameter fitting

**Traffic Shaping**
- Estimating CTR for Content Match [KDD '07]
- Traffic Shaping for Display Advertising [EC '12]
- Theoretical underpinnings [COLT '10 best student paper]

**Traffic Shaping**
- Which article summary should be picked from the article pool? Answer: the one with the highest expected CTR
- Which ad should be displayed? Answer: the ad that minimizes under-delivery

**Underdelivery**
- Advertisers are guaranteed some number of impressions (say, 1M) over some time period (say, 2 months)
  - only to users matching their specs
  - only when those users visit certain types of pages
  - only on certain positions on the page
- An under-delivering ad is one that is likely to miss its guarantee

**Underdelivery**
- How can under-delivery be computed?
  - It requires user traffic forecasts
  - It depends on the other ads in the system
- [Figure: bipartite graph between the ad inventory (ad j with demand d_j) and forecasted impressions (user, article, position), each with supply s_ℓ]
- An ad-serving system will try to minimize under-delivery on this graph

**Traffic Shaping**
- Which article summary should be picked? Answer: the one with the highest expected CTR
- Which ad should be displayed?
- Answer: the ad that minimizes under-delivery
- Goal: combine the two

**Traffic Shaping**
- Goal: bias the article summary selection to
  - reduce under-delivery,
  - with an insignificant drop in CTR,
  - and do this in real time

**Outline**
1. Formulation as an optimization problem
2. Real-time solution
3. Empirical results

**Formulation**
- The graph has four layers:
  - k: users, with supply s_k
  - i: (user, article) pairs, reached via traffic shaping fractions w_ki, with CTRs c_ki
  - ℓ: (user, article, position) triples ("fully qualified impressions")
  - j: ads, with demands d_j, reached via ad delivery fractions φ_ℓj
- Goal: infer the traffic shaping fractions w_ki

**Formulation**
- [Figure: full traffic shaping graph]
- The full traffic shaping graph connects all forecasted user traffic to all available articles, whether the user arrives at the homepage or directly on an article page
- Goal: infer the w_ki, but we are forced to infer the φ_ℓj as well

**Formulation**
- Objective: minimize total under-delivery, i.e., the shortfall between each ad's demand d_j and the total user traffic flowing to j (accounting for CTR loss)
- Constraints:
  - satisfy the demand constraints
  - bounds on the traffic shaping fractions
  - shape only available traffic
  - consistency of the ad delivery fractions

**Key Transformation**
- Define new variables z_ℓj: the fraction of supply that is shown ad j, assuming the user always clicks the article
- This allows a reformulation solely in terms of the z_ℓj

**Formulation**
- The resulting convex program can be solved optimally
- But we have another problem: at runtime, we must shape every incoming user without looking at the entire graph
- Solution:
  - Periodically solve the convex problem offline
  - Store a cache derived from this solution
  - Reconstruct the optimal solution for each user at runtime, using only the cache

**Outline**
1. Formulation as an optimization problem
2. Real-time solution
3. Empirical results

**Real-time solution**
- Cache a small set of quantities from the offline solution, and reconstruct the optimal shaping for each user from them at runtime
- All constraints can be expressed as constraints on σ_ℓ
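The deck solves a convex program offline; as a much simpler illustration of the under-delivery objective itself (a greedy sketch on a toy supply/demand graph, not the EC '12 algorithm; all names are made up):

```python
def greedy_allocate(supply, demand, eligible):
    """Greedily route impressions and report total under-delivery.

    supply:   dict impression_type -> number of forecasted impressions s_l
    demand:   dict ad -> guaranteed impressions d_j
    eligible: dict impression_type -> ads that may be shown there

    Each impression goes to its most under-delivered eligible ad;
    returns sum_j max(d_j - delivered_j, 0), the total under-delivery.
    """
    delivered = {ad: 0.0 for ad in demand}
    for imp_type, s in supply.items():
        for _ in range(int(s)):
            ads = eligible.get(imp_type, [])
            if not ads:
                continue
            # The most under-delivered eligible ad gets this impression.
            ad = max(ads, key=lambda a: demand[a] - delivered[a])
            delivered[ad] += 1
    return sum(max(d - delivered[ad], 0.0) for ad, d in demand.items())

# Toy graph: adA can only be shown on sports pages, adB anywhere.
supply = {"sports_top": 3, "finance_side": 2}
demand = {"adA": 2, "adB": 4}
eligible = {"sports_top": ["adA", "adB"], "finance_side": ["adB"]}
print(greedy_allocate(supply, demand, eligible))  # 1.0: one guaranteed impression is missed
```

A real ad server would solve the convex program instead; the greedy rule here only illustrates how under-delivery responds to where impressions are routed, which is exactly the quantity traffic shaping tries to reduce.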