Catching the Drift: Learning Broad Matches from Clickthrough Data


Catching the Drift: Learning Broad Matches from Clickthrough Data. Sonal Gupta, Mikhail Bilenko, Matthew Richardson. University of Texas at Austin, Microsoft Research.

Presentation Transcript

Catching the Drift: Learning Broad Matches from Clickthrough Data

Sonal Gupta, Mikhail Bilenko, Matthew Richardson

University of Texas at Austin, Microsoft Research



















[Figure: Web Page → Extracted Keywords → Expanded Keywords]


  • Keyword-based online advertising: bidded keywords are extracted from context
    • Context: query (search ads) or page (content ads)
  • Broad matching: expanding keywords via keyword-to-keywords mapping
    • Example: electric cars → tesla, hybrids, toyota prius, golf carts
  • Broad matching benefits advertisers (increased reach, less campaign tuning), users (more relevant ads), ad platform (higher monetization)

[Figure label: Selected Ads]

Identifying Broad Matches
  • Good keyword mappings retrieve relevant ads that users click
  • How to measure what is relevant and likely to be clicked?
    • Human judgments: expensive, hard to scale
    • Past user clicks: provide click data for kw → kw' when the user was shown ad(kw') in the context of kw
      • Highly available, less trustworthy
  • What similarity functions may indicate relevance of kw → kw' ?
    • Syntactic (edit distance, TF-IDF cosine, string kernels, …)
    • Co-occurrence (in documents, query sessions, bid campaigns, …)
    • Expanded representation (search result snippets, category bags, …)
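
To make the idea of a similarity function concrete, here is a minimal sketch of two syntactic measures. The function names (`edit_distance`, `token_cosine`, `phi`) are illustrative and not from the slides, and the cosine uses raw term counts rather than the TF-IDF weighting mentioned above, since no corpus statistics are available here.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def token_cosine(a: str, b: str) -> float:
    """Cosine similarity of raw term-count vectors (no IDF weighting)."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def phi(kw: str, kw_prime: str) -> list:
    """Hypothetical two-dimensional feature vector for a mapping kw -> kw'."""
    max_len = max(len(kw), len(kw_prime), 1)
    return [1.0 - edit_distance(kw, kw_prime) / max_len,
            token_cosine(kw, kw_prime)]
```

Co-occurrence and expanded-representation features would need external data (sessions, campaigns, search snippets) and are not sketched here.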
  • Task: train a learner to estimate p(click | kw → kw') for any kw → kw'
  • Data
    • Triples from clickthrough logs, where kw → kw' was suggested by previous broad match mappings
  • Features
    • Convert each pair to a feature vector capturing similarities etc.:
      (kw → kw') → [ϕ1(kw, kw'), …, ϕn(kw, kw')], where ϕi(kw, kw') can be any function of kw, kw' or both
    • For each triple, create an instance: (ϕ(kw, kw'), click)
  • Learner: max-margin averaged perceptron (strong theory, very efficient)
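
The learner can be sketched roughly as follows. This is a generic averaged perceptron with a margin-based update, written under the assumption that "max-margin" means updating whenever an example is scored within the margin; the slides do not give the exact update rule, and the names and defaults (`train_averaged_perceptron`, `margin`, `lr`, `epochs`) are illustrative.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_averaged_perceptron(instances, epochs=5, margin=1.0, lr=1.0):
    """instances: list of (feature_vector, click) pairs with click in {0, 1}.

    Returns the average of all intermediate weight vectors and biases.
    """
    n = len(instances[0][0])
    w, b = [0.0] * n, 0.0
    w_sum, b_sum, steps = [0.0] * n, 0.0, 0
    for _ in range(epochs):
        for x, click in instances:
            y = 1.0 if click else -1.0
            # Update when the example is misclassified or inside the margin.
            if y * (dot(w, x) + b) <= margin:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
            # Accumulate the current hypothesis after every example.
            w_sum = [ws + wi for ws, wi in zip(w_sum, w)]
            b_sum += b
            steps += 1
    return [ws / steps for ws in w_sum], b_sum / steps


def score(w, b, x):
    """Raw score; higher means a click is judged more likely."""
    return dot(w, x) + b
```

The raw score is not a probability. Turning it into an estimate of p(click | kw → kw') would need an extra calibration step, which is not shown.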

Example: Creating an Instance
  • Historical broad match clickthrough data (kw | kw' | ad(kw') | click event):
    • digital slr | canon rebel | Canon Rebel Kit for $499 | click
    • seattle baseball | mariners tickets | Mariners season tickets | no click
  • Feature functions
  • Instances
    • [0.78 0.001 0.9], 1
    • [0.05 0.02 0.2], 0
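
As a rough illustration of this step, the sketch below turns log rows into (feature vector, label) instances. The two feature functions (`token_overlap`, `length_ratio`) are hypothetical stand-ins, so the resulting numbers will not match the vectors shown on the slide, whose actual feature functions are not given.

```python
def token_overlap(kw: str, kw_prime: str) -> float:
    """Jaccard overlap between the token sets of the two keywords."""
    a, b = set(kw.split()), set(kw_prime.split())
    return len(a & b) / len(a | b) if a | b else 0.0


def length_ratio(kw: str, kw_prime: str) -> float:
    """Ratio of the shorter keyword length to the longer one, in characters."""
    shorter, longer = sorted((len(kw), len(kw_prime)))
    return shorter / longer if longer else 0.0


def make_instance(row, feature_fns):
    """row is (kw, kw', ad text, clicked); the ad text is not used as a feature."""
    kw, kw_prime, _ad, clicked = row
    return [f(kw, kw_prime) for f in feature_fns], int(clicked)


# The two log rows from the slide.
log = [
    ("digital slr", "canon rebel", "Canon Rebel Kit for $499", True),
    ("seattle baseball", "mariners tickets", "Mariners season tickets", False),
]
instances = [make_instance(row, [token_overlap, length_ratio]) for row in log]
```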
  • Data
    • 2 months of previous broad match ads from Microsoft Content Ads logs
      • 1 month for training, 1 month for testing
    • 68 features (syntactic, co-occurrence based, etc.); greedy feature selection
  • Metrics
    • LogLoss:
    • LogLoss Lift: difference between the obtained LogLoss and that of an oracle with access to the empirical p(click | kw → kw') in the test set.
    • CTR and revenue results in live test with users
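
The LogLoss formula did not survive in this transcript. The sketch below assumes the standard definition (mean negative log-likelihood of the observed clicks) and computes the lift as the model's LogLoss minus the oracle's; both the definition and the sign convention are assumptions, and the function names are illustrative.

```python
import math
from collections import defaultdict


def log_loss(labels, probs, eps=1e-12):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def oracle_probs(pairs, labels):
    """Empirical click rate of each kw -> kw' pair, measured on the test set itself."""
    clicks, shows = defaultdict(int), defaultdict(int)
    for pair, y in zip(pairs, labels):
        shows[pair] += 1
        clicks[pair] += y
    return [clicks[p] / shows[p] for p in pairs]


def log_loss_lift(pairs, labels, probs):
    """Gap between the model's LogLoss and the oracle's (assumed sign: model minus oracle)."""
    return log_loss(labels, probs) - log_loss(labels, oracle_probs(pairs, labels))
```

Under this convention a smaller value means the model is closer to the oracle.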
Live Test Results
  • Use CTR prediction to maximize expected revenue
  • Re-rank mappings to incorporate revenue
  • +18% revenue, -2% CTR
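
One plausible reading of the re-ranking step is to sort candidate mappings by predicted click probability times bid. The slides do not state the exact revenue model, so that product, the function name, and all numbers in the example are assumptions made for illustration.

```python
def rank_by_expected_revenue(candidates):
    """candidates: list of (kw_prime, p_click, bid) tuples.

    Sorts by p_click * bid, highest first (assumed expected revenue per impression).
    """
    return sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)


# Hypothetical candidates for the keyword "electric cars".
candidates = [
    ("tesla", 0.04, 0.50),
    ("golf carts", 0.02, 2.00),
    ("hybrids", 0.03, 0.40),
]
ranked = rank_by_expected_revenue(candidates)
```

In this made-up example the mapping with the highest click probability ("tesla") is no longer ranked first, which shows how revenue can rise while CTR drops slightly.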
Online Learning with Amnesia
  • Advertisers, campaigns, bidded keywords and delivery contexts change very rapidly: high concept drift
  • Recent data is more informative
    • Goal: utilize older data while capturing changes in distributions
  • Averaged Perceptron doesn’t capture drift
  • Solution: Amnesiac Averaged Perceptron
    • Exponential weight decay when averaging hypotheses
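
The idea can be sketched as replacing the uniform average over hypotheses with an exponentially weighted one. The update `w_avg = (1 - decay) * w_avg + decay * w` below is an assumed form of the decay, not taken from the slides, and the class and parameter names are illustrative. The bias term is omitted for brevity.

```python
class AmnesiacAveragedPerceptron:
    """Perceptron whose averaged hypothesis forgets old weights exponentially."""

    def __init__(self, n_features, decay=0.05, margin=1.0):
        self.w = [0.0] * n_features      # current hypothesis
        self.w_avg = [0.0] * n_features  # exponentially weighted average
        self.decay = decay               # assumed forgetting rate in (0, 1]
        self.margin = margin

    @staticmethod
    def _dot(w, x):
        return sum(wi * xi for wi, xi in zip(w, x))

    def update(self, x, click):
        y = 1.0 if click else -1.0
        # Margin-based perceptron update on the current hypothesis.
        if y * self._dot(self.w, x) <= self.margin:
            self.w = [wi + y * xi for wi, xi in zip(self.w, x)]
        # Older hypotheses are down-weighted by (1 - decay) at every step.
        a = self.decay
        self.w_avg = [(1 - a) * wa + a * wi
                      for wa, wi in zip(self.w_avg, self.w)]

    def score(self, x):
        return self._dot(self.w_avg, x)
```

For example, feeding a stream in which a feature predicts clicks for a while and then stops doing so flips the sign of `score` after the change, whereas a uniform average over all hypotheses would react much more slowly.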
Contributions and Conclusions

  • Learning broad matches from implicit feedback

  • Combining arbitrary similarity measures/features
  • Using clickthrough logs as implicit feedback
  • Amnesiac Averaged Perceptron
    • Exponentially weighted averaging: distant examples “fade out”
    • Online learning adapts to market dynamics
Features and Feature Selection

Co-occurrence feature examples:

User search sessions: keywords searched within 10 mins

Advertiser campaigns: keywords co-bidded by the same advertiser

Past clickthrough rates of original and broad matched keywords

Various syntactic similarities

Various existing broad matching lists

and so on…

Feature Selection:

A total of 68 features

Greedy feature selection
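
The slides do not say whether the greedy procedure was forward or backward. The sketch below assumes forward selection: start with no features and repeatedly add the one that most reduces a validation loss. The `evaluate` callback (for example, held-out LogLoss of a model trained on the given feature subset) is a hypothetical interface.

```python
def greedy_forward_selection(feature_names, evaluate, max_features=None):
    """Greedily add the feature that lowers evaluate(subset) the most.

    evaluate: callable taking a list of feature names and returning a loss
    (lower is better). Stops when no remaining feature improves the loss.
    """
    selected = []
    best_loss = evaluate(selected)
    remaining = list(feature_names)
    while remaining and (max_features is None or len(selected) < max_features):
        cand_loss, cand = min((evaluate(selected + [f]), f) for f in remaining)
        if cand_loss >= best_loss:
            break  # no single feature helps any more
        selected.append(cand)
        remaining.remove(cand)
        best_loss = cand_loss
    return selected, best_loss
```

Each round calls `evaluate` once per remaining feature, so with 68 features the cost is dominated by how expensive one training and evaluation run is.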


Additional Information
  • Estimation of expected value of click over all the ads shown for a broad match mapping E(p(click(ad(kw))|q))
  • Query Expansion vs. Broad Matching
    • Our broad matching algorithm can be extended for query expansion
    • But, broad matching is for a fixed set of bidded keywords
  • Forgetron vs. Amnesiac Averaged Perceptron
    • Forgetron maintains a budgeted set of support vectors: it stores examples explicitly and does not take all the data into account
    • AAP: weighted average over all the examples, no need to store examples explicitly