Challenges in Computational Advertising

Challenges in Computational Advertising. Deepayan Chakrabarti (deepay@yahoo-inc.com). Online Advertising Overview. Pick ads. Ads. Advertisers. Ad Network. Content. User. Examples: Yahoo, Google, MSN, RightMedia , …. Content Provider. Advertising Setting. Sponsored Search. Display.

Presentation Transcript


  1. Challenges in Computational Advertising Deepayan Chakrabarti (deepay@yahoo-inc.com)

  2. Online Advertising Overview Pick ads Ads Advertisers Ad Network Content User Examples: Yahoo, Google, MSN, RightMedia, … Content Provider

  3. Advertising Setting Sponsored Search Display Content Match

  4. Advertising Setting Sponsored Search Display Content Match Pick ads

  5. Advertising Setting • Graphical display ads • Mostly for brand awareness • Revenue based on number of impressions (not clicks) Sponsored Search Display Content Match

  6. Advertising Setting Sponsored Search Display Content Match Content match ad

  7. Advertising Setting Sponsored Search Display Content Match Text ads Pick ads Match ads to the content

  8. Advertising Setting • The user intent is unclear • Revenue depends on number of clicks • Query (webpage) is long and noisy Sponsored Search Display Content Match

  9. Advertising Setting Sponsored Search Display Content Match Search Query Sponsored Search Ads

  10. This presentation • Content Match [KDD 2007]: How can we estimate the click-through rate (CTR) of an ad on a page? • CTR for ad j on page i • ~10^9 pages, ~10^6 ads

  11. This presentation • Estimating CTR for Content Match [KDD ‘07] • Traffic Shaping for Display Advertising [EC ‘12] Display ads Article summary click Alternates

  12. This presentation • Estimating CTR for Content Match [KDD ‘07] • Traffic Shaping for Display Advertising [EC ‘12] • Recommend articles (not ads) • need high CTR on article summaries • + prefer articles on which under-delivering ads can be shown

  13. This presentation • Estimating CTR for Content Match [KDD ‘07] • Traffic Shaping for Display Advertising [EC ‘12] • Theoretical underpinnings [COLT ‘10 best student paper] • Represent relationships as a graph • Recommendation = Link Prediction • Many useful heuristics exist • Why do these heuristics work? • Goal: Suggest friends

  14. Estimating CTR for Content Match • Contextual Advertising • Show an ad on a webpage (“impression”) • Revenue is generated if a user clicks • Problem: Estimate the click-through rate (CTR) of an ad on a page • CTR for ad j on page i • ~10^9 pages, ~10^6 ads

  15. Estimating CTR for Content Match • Why not use the MLE? • Few (page, ad) pairs have N>0 • Very few have c>0 as well • MLE does not differentiate between 0/10 and 0/100 • We have additional information: hierarchies
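
To make the objection to the MLE concrete, here is a minimal sketch. It assumes the MLE in question is the usual ratio of clicks c to impressions N; the function name `mle_ctr` is made up for illustration.

```python
def mle_ctr(clicks: int, impressions: int) -> float:
    """Maximum-likelihood CTR estimate: observed clicks over impressions."""
    if impressions <= 0:
        # With N = 0 the ratio is undefined, which is the common case
        # for (page, ad) pairs according to the slide.
        raise ValueError("MLE is undefined without impressions")
    return clicks / impressions


# Two regions with very different amounts of evidence get the same estimate.
few_impressions = mle_ctr(0, 10)
many_impressions = mle_ctr(0, 100)
print(few_impressions, many_impressions)
```

Both calls return the same value, even though 0 clicks in 100 impressions is stronger evidence of a low CTR than 0 clicks in 10.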

  16. Estimating CTR for Content Match • Use an existing, well-understood hierarchy • Categorize ads and webpages to leaves of the hierarchy • CTR estimates of siblings are correlated • The hierarchy allows us to aggregate data • Coarser resolutions • provide reliable estimates for rare events • which then influences estimation at finer resolutions

  17. Estimating CTR for Content Match Level 0 • Region= (page node, ad node) • Region Hierarchy • A cross-product of the page hierarchy and the ad hierarchy Level i Region Ad classes Page classes Page hierarchy Ad hierarchy
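
A small sketch of the region hierarchy as a cross-product. The two toy hierarchies, the node names, and the rule that a region's parent is the pair of its components' parents are assumptions made for illustration; the slide only states that regions are (page node, ad node) pairs.

```python
from itertools import product

# Hypothetical two-level hierarchies, stored as child -> parent (None = root).
page_parent = {"pages": None, "sports": "pages", "finance": "pages"}
ad_parent = {"ads": None, "apparel": "ads", "loans": "ads"}


def nodes_at_depth(parent, depth):
    """Return the nodes of a hierarchy that sit `depth` steps below the root."""
    def node_depth(node):
        steps = 0
        while parent[node] is not None:
            node = parent[node]
            steps += 1
        return steps
    return sorted(n for n in parent if node_depth(n) == depth)


def regions_at_level(level):
    """Regions at a level: every (page node, ad node) pair at that depth."""
    return list(product(nodes_at_depth(page_parent, level),
                        nodes_at_depth(ad_parent, level)))


def region_parent(region):
    """Assumed parent of a region: the pair of its components' parents."""
    page, ad = region
    return (page_parent[page], ad_parent[ad])


print(regions_at_level(1))
print(region_parent(("sports", "loans")))
```

The number of regions at a level is the product of the two class counts, which is why most fine-grained regions see few or no impressions.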

  18. Estimating CTR for Content Match • Our Approach • Data Transformation • Model • Model Fitting

  19. Data Transformation • Problem: • Solution: Freeman-Tukey transform • Differentiates regions with 0 clicks • Variance stabilization:
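
The formulas on this slide did not survive in the transcript. As a hedged illustration, the sketch below uses one standard form of the Freeman-Tukey square-root transform applied to a rate; whether the talk used exactly this form is an assumption, and `freeman_tukey` is a made-up name.

```python
import math


def freeman_tukey(clicks: int, impressions: int) -> float:
    """One common Freeman-Tukey form: average of sqrt(c/N) and sqrt((c+1)/N)."""
    if impressions <= 0:
        raise ValueError("need at least one impression")
    return 0.5 * (math.sqrt(clicks / impressions)
                  + math.sqrt((clicks + 1) / impressions))


# Unlike the raw ratio c/N, zero-click regions are no longer tied:
# more impressions with no clicks gives a smaller transformed value.
print(freeman_tukey(0, 10), freeman_tukey(0, 100))
```

Under this form the transformed value for a zero-click region shrinks as impressions grow, which matches the slide's claim that the transform differentiates regions with 0 clicks.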

  20. Model • Goal: Smoothing across siblings in hierarchy [Huang+Cressie/2000] • Each region has a latent state Sr • yr is independent of the hierarchy given Sr • Sr is drawn from its parent Spa(r) [Figure: tree with a latent parent state at level i, latent child states S1 to S4 at level i+1, and observables y1, y2, y4]

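  21. Model [Figure: graphical model. Parent state Spa(r) with variance wpa(r), observation ypa(r), variance Vpa(r), coeff. βpa(r), covariates upa(r); child state Sr with variance wr, observation yr, variance Vr, coeff. βr, covariates ur]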

  22. Model • However, learning Wr, Vr and βr for each region is clearly infeasible • Assumptions: • All regions at the same level ℓ share the same W(ℓ) and β(ℓ) • Vr = V/Nr for some constant V, since the variance of the transformed yr is proportional to 1/Nr
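
One way to read the model slides as equations is given below. This is a sketch, not a transcription: the additive form and the Gaussian noise are assumptions, chosen because they are consistent with the diagram labels (state variance W, observation variance V, coefficients β, covariates u) and with the Kalman-filter fitting described later.

```latex
\begin{align*}
y_r &= u_r^{\top}\beta^{(\ell)} + S_r + \epsilon_r,
  & \epsilon_r &\sim \mathcal{N}\!\left(0,\; V / N_r\right), \\
S_r &= S_{pa(r)} + w_r,
  & w_r &\sim \mathcal{N}\!\left(0,\; W^{(\ell)}\right).
\end{align*}
```

Here $\ell$ is the level of region $r$, $N_r$ is its number of impressions, and $pa(r)$ is its parent region.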

  23. Model • Implications: • W(ℓ) determines degree of smoothing • Large W(ℓ): • Sr varies greatly from Spa(r) • Each region learns its own Sr • No smoothing • W(ℓ) → 0: • All Sr are identical • A regression model on features ur is learnt • Maximum Smoothing

  24. Model • Implications: • W(ℓ) determines degree of smoothing • Var(Sr) increases from root to leaf • Better estimates at coarser resolutions

  25. Model • Implications: • W(ℓ) determines degree of smoothing • Var(Sr) increases from root to leaf • Correlations among siblings at level ℓ: • Depends only on level of least common ancestor [Figure: Corr( , ) > Corr( , ) for two pairs of regions]
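
The two variance claims can be checked numerically. The sketch assumes each state is its parent's state plus independent noise, with one noise variance per level and index 0 holding the root's own variance; `level_vars` and both function names are hypothetical.

```python
from itertools import accumulate


def state_variances(level_vars):
    """Var(S) at each level: the running sum of per-level noise variances."""
    return list(accumulate(level_vars))


def correlation_at_level(level_vars, level, lca_level):
    """Correlation of two states at `level` whose least common ancestor
    sits at `lca_level`. Their covariance is the ancestor's variance."""
    variances = state_variances(level_vars)
    return variances[lca_level] / variances[level]


level_vars = [1.0, 0.5, 0.25]  # hypothetical W values for levels 0, 1, 2
print(state_variances(level_vars))
print(correlation_at_level(level_vars, level=2, lca_level=1))
print(correlation_at_level(level_vars, level=2, lca_level=0))
```

Under these assumptions the variance grows with depth, and two regions at the same level are more strongly correlated when their least common ancestor is deeper in the tree.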

  26. Estimating CTR for Content Match • Our Approach • Data Transformation (Freeman-Tukey) • Model (Tree-structured Markov Chain) • Model Fitting

  27. Model Fitting • Fitting using a Kalman filtering algorithm • Filtering: Recursively aggregate data from leaves to root • Smoothing: Propagate information from root to leaves • Complexity: linear in the number of regions, for both time and space [Figure: filtering and smoothing passes over the tree]

  28. Model Fitting • Fitting using a Kalman filtering algorithm • Filtering: Recursively aggregate data from leaves to root • Smoothing: Propagate information from root to leaves • Kalman filter requires knowledge of β, V, and W • EM wrapped around the Kalman filter [Figure: filtering and smoothing passes over the tree]
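
The two passes can be illustrated on the smallest possible tree: one parent and its children. This is a simplified sketch, not the talk's algorithm. It assumes Gaussian noise, no covariates (β = 0), known W and V (so no EM step), and a vague prior on the parent; `smooth_two_level` is a made-up name.

```python
def smooth_two_level(y, n, W, V, prior_mean=0.0, prior_var=1e6):
    """Upward then downward pass for one parent with len(y) children.

    Assumed model: child state = parent state + noise with variance W;
    observation y[r] = child state + noise with variance V / n[r].
    """
    # Filtering (leaves to root): each y[r] is a noisy view of the parent
    # with variance W + V / n[r]; combine the views by precision weighting.
    obs_var = [W + V / n_r for n_r in n]
    precision = 1.0 / prior_var + sum(1.0 / v for v in obs_var)
    parent_mean = (prior_mean / prior_var
                   + sum(y_r / v for y_r, v in zip(y, obs_var))) / precision

    # Smoothing (root to leaves): each child moves from the parent estimate
    # toward its own observation, by a gain that grows with n[r].
    child_means = [parent_mean + (W / v) * (y_r - parent_mean)
                   for y_r, v in zip(y, obs_var)]
    return parent_mean, child_means


parent, children = smooth_two_level(y=[0.05, 0.20], n=[100, 10], W=0.01, V=0.25)
print(parent, children)
```

Each pass touches every region once, in line with the linear complexity stated on the slide. A child with many impressions stays close to its own observation, while a sparse child is pulled toward the parent.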

  29. Experiments • 503M impressions • 7-level hierarchy of which the top 3 levels were used • Zero clicks in • 76% of regions in level 2 • 95% of regions in level 3 • Full dataset D_FULL, and a 2/3 sample D_SAMPLE

  30. Experiments • Estimate CTRs for all regions R in level 3 with zero clicks in D_SAMPLE • Some of these regions, R_{>0}, get clicks in D_FULL • A good model should predict higher CTRs for R_{>0} than for the other regions in R

  31. Experiments • We compared 4 models • TS: our tree-structured model • LM (level-mean): each level smoothed independently • NS (no smoothing): CTR proportional to 1/Nr • Random: Assuming |R_{>0}| is given, randomly predict the membership of R_{>0} out of R

  32. Experiments [Figure: comparison of TS, Random, LM, and NS]

  33. Experiments • MLE=0 everywhere, since 0 clicks were observed • What about estimated CTR? [Figure: two panels, “No Smoothing (NS)” and “Our Model (TS)”, each plotting Estimated CTR against Impressions; annotations: “Variability from coarser resolutions”, “Close to MLE for large N”]

  34. Estimating CTR for Content Match • We presented a method to estimate • rates of extremely rare events • at multiple resolutions • under severe sparsity constraints • Key points: • Tree-structured generative model • Extremely fast parameter fitting

  35. Traffic Shaping • Estimating CTR for Content Match [KDD ‘07] • Traffic Shaping for Display Advertising [EC ‘12] • Theoretical underpinnings [COLT ‘10 best student paper]

  36. Traffic Shaping • Which article summary should be picked? Ans: The one with highest expected CTR • Which ad should be displayed? Ans: The ad that minimizes underdelivery [Figure: article pool]

  37. Underdelivery • Advertisers are guaranteed some impressions (say, 1M) over some time (say, 2 months) • only to users matching their specs • only when they visit certain types of pages • only on certain positions on the page • An underdelivering ad is one that is likely to miss its guarantee

  38. Underdelivery • How can underdelivery be computed? • Need user traffic forecasts • Depends on other ads in the system • An ad-serving system will try to minimize under-delivery on this graph [Figure: bipartite graph between supply sℓ (forecasted impressions ℓ: user, article, position) and demand dj (ad inventory j)]

  39. Traffic Shaping • Which article summary should be picked? Ans: The one with highest expected CTR • Which ad should be displayed? Ans: The ad that minimizes underdelivery • Goal: Combine the two

  40. Traffic Shaping • Goal: Bias the article summary selection to • reduce under-delivery • with only an insignificant drop in CTR • AND do this in real-time

  41. Outline • Formulation as an optimization problem • Real-time solution • Empirical results

  42. Formulation • k: (user), with supply sk • i: (user, article), reached with traffic shaping fraction wki and CTR cki • ℓ: (user, article, position), the “Fully Qualified Impression” • j: (ads), with demand dj, served with ad delivery fraction φℓj • Goal: Infer traffic shaping fractions wki

  43. Formulation • Full traffic shaping graph: • All forecasted user traffic × all available articles • arriving at the homepage, • or directly on article page • Goal: Infer wki • But forced to infer φℓj as well [Figure: Full Traffic Shaping Graph, with edges labeled by traffic shaping fraction wki, CTR cki, and ad delivery fraction φℓj]

  44. Formulation [Optimization problem over the graph k → i → ℓ → j with labels sk, wki, cki; annotations: underdelivery; demand; total user traffic flowing to j (accounting for CTR loss); (Satisfy demand constraints)]
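
The slide's equations are not in the transcript, so the following is only a sketch of how the annotated quantities could fit together. It assumes traffic to an ad is user supply, times shaping fraction, times CTR, times delivery fraction, summed over all paths, and that underdelivery is the unmet part of demand. All names and numbers are hypothetical.

```python
def ad_traffic(supply, shaping, ctr, positions, delivery, ad):
    """Expected impressions reaching `ad` under the assumed product rule.

    supply[k]           forecasted visits by user k             (s_k)
    shaping[(k, i)]     fraction of k's visits shown article i  (w_ki)
    ctr[(k, i)]         probability that k clicks article i     (c_ki)
    positions[(k, i)]   ad slots on the article page            (the l's)
    delivery[(l, ad)]   fraction of slot l given to `ad`        (phi_lj)
    """
    total = 0.0
    for (k, i), w in shaping.items():
        # Only clicked summaries turn into article-page views.
        clicked = supply[k] * w * ctr[(k, i)]
        for slot in positions[(k, i)]:
            total += clicked * delivery.get((slot, ad), 0.0)
    return total


def underdelivery(demand, traffic):
    """Unmet part of the guarantee; zero once demand is satisfied."""
    return max(0.0, demand - traffic)


supply = {"u1": 1000}
shaping = {("u1", "a1"): 0.6, ("u1", "a2"): 0.4}
ctr = {("u1", "a1"): 0.10, ("u1", "a2"): 0.05}
positions = {("u1", "a1"): ["p1"], ("u1", "a2"): ["p2"]}
delivery = {("p1", "adX"): 1.0, ("p2", "adX"): 0.5}

traffic = ad_traffic(supply, shaping, ctr, positions, delivery, "adX")
print(traffic, underdelivery(100, traffic))
```

In this reading, raising a shaping fraction toward an article whose slots serve an underdelivering ad increases that ad's traffic, at the cost of CTR whenever the article is not the most clickable one.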

  45. Formulation [Optimization problem over the graph k → i → ℓ → j; constraint annotations: (Satisfy demand constraints), (Bounds on traffic shaping fractions), (Shape only available traffic), (Ad delivery fractions)]

  46. Key Transformation • This allows a reformulation solely in terms of new variables zℓj • zℓj = fraction of supply that is shown ad j, assuming user always clicks article

  47. Formulation • Convex program ⇒ can be solved optimally

  48. Formulation • But we have another problem • At runtime, we must shape every incoming user without looking at the entire graph • Solution: • Periodically solve the convex problem offline • Store a cache derived from this solution • Reconstruct the optimal solution for each user at runtime, using only the cache

  49. Outline • Formulation as an optimization problem • Real-time solution • Empirical results

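  50. Real-time solution • All constraints can be expressed as constraints on σℓ [Figure: annotated equations marking which quantities to cache (“Cache these”) and which to rebuild from them (“Reconstruct using these”)]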
