Catching the Drift: Learning Broad Matches from Clickthrough Data

Sonal Gupta, Mikhail Bilenko, Matthew Richardson

University of Texas at Austin, Microsoft Research

Introduction

[Figure: Web Page → Extracted Keywords → Expanded Keywords → Ad Selection and Ranking → Selected Ads]

  • Keyword-based online advertising: bidded keywords are extracted from context

    • Context: query (search ads) or page (content ads)

  • Broad matching: expanding keywords via keyword-to-keywords mapping

    • Example: electric cars → tesla, hybrids, toyota prius, golf carts

  • Broad matching benefits advertisers (increased reach, less campaign tuning), users (more relevant ads), ad platform (higher monetization)
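To make the keyword-to-keywords mapping concrete, here is a minimal sketch of the expansion step. The `BROAD_MATCHES` table and the `expand_keywords` helper are hypothetical names used only for illustration; the single entry reuses the example from the slide, and keeping the original keyword in the output is an assumption.

```python
from typing import Dict, List

# Hypothetical broad match table; the only entry is the slide's example.
BROAD_MATCHES: Dict[str, List[str]] = {
    "electric cars": ["tesla", "hybrids", "toyota prius", "golf carts"],
}


def expand_keywords(extracted: List[str], mapping: Dict[str, List[str]]) -> List[str]:
    """Return the extracted keywords followed by their broad matches, without duplicates."""
    expanded: List[str] = []
    seen = set()
    for kw in extracted:
        # Assumption: the original keyword stays eligible alongside its expansions.
        for candidate in [kw] + mapping.get(kw, []):
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded
```

Keywords with no entry in the table pass through unchanged, so the expansion never shrinks the set of keywords used for ad selection.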


Identifying Broad Matches

  • Good keyword mappings retrieve relevant ads that users click

  • How to measure what is relevant and likely to be clicked?

    • Human judgments: expensive, hard to scale

    • Past user clicks: provide click data for kw → kw' when a user was shown ad(kw') in the context of kw

      • Highly available, less trustworthy

  • What similarity functions may indicate relevance of kw → kw' ?

    • Syntactic (edit distance, TF-IDF cosine, string kernels, …)

    • Co-occurrence (in documents, query sessions, bid campaigns, …)

    • Expanded representation (search result snippets, category bags, …)
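Two of the syntactic similarities named above can be sketched in a few lines. This is an illustrative sketch, not the feature set used in the talk: `edit_distance` is the standard Levenshtein dynamic program, and `token_cosine` is a simplified stand-in for TF-IDF cosine that uses raw term counts with no IDF weighting. Both function names are assumptions.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def token_cosine(kw: str, kw_prime: str) -> float:
    """Cosine similarity of term-count vectors (simplification: no IDF weights)."""
    u = Counter(kw.lower().split())
    v = Counter(kw_prime.lower().split())
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0
```

Purely syntactic measures score a pair such as electric cars → golf carts at zero token overlap, which is one reason to combine them with co-occurrence and expanded-representation features.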

Approach

  • Task: train a learner to estimate p(click | kw → kw') for any kw → kw'

  • Data

    • <kw, ad(kw'), click> triples from clickthrough logs, where kw → kw' was suggested by previous broad match mappings

  • Features

    • Convert each pair to a feature vector capturing similarities etc.

      (kw → kw') → [ϕ1(kw, kw'), …, ϕn(kw, kw')], where ϕi(kw, kw') can be any function of kw, kw' or both

    • For each triple <kw, ad(kw'), click>, create an instance: (ϕ(kw, kw'), click)

  • Learner: max-margin averaged perceptron (strong theory, very efficient)

Example: Creating an Instance

  • Historical broad match clickthrough data:

    kw | kw' | ad(kw') | click event
    digital slr | canon rebel | Canon Rebel Kit for $499 | click
    seattle baseball | mariners tickets | Mariners season tickets | no click

  • Feature functions

  • Instances

  • [0.78 0.001 0.9], 1

  • [0.05 0.02 0.2], 0
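The conversion from logged triples to training instances can be sketched as follows. The two feature functions here (`token_jaccard`, `length_ratio`) are hypothetical placeholders chosen for brevity, so the resulting vectors will not reproduce the numbers shown on the slide, whose underlying features are not listed in this transcript.

```python
from typing import Callable, List, Sequence, Tuple

# A feature function maps (kw, kw') to a real value.
FeatureFn = Callable[[str, str], float]


def token_jaccard(kw: str, kw_prime: str) -> float:
    """Fraction of distinct tokens shared by the two keywords."""
    a, b = set(kw.lower().split()), set(kw_prime.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def length_ratio(kw: str, kw_prime: str) -> float:
    """Character length of the shorter keyword divided by that of the longer."""
    shorter, longer = sorted((len(kw), len(kw_prime)))
    return shorter / longer if longer else 0.0


def make_instance(kw: str, kw_prime: str, clicked: bool,
                  feature_fns: Sequence[FeatureFn]) -> Tuple[List[float], int]:
    """Build (phi(kw, kw'), click) from one logged <kw, ad(kw'), click> triple."""
    return [fn(kw, kw_prime) for fn in feature_fns], int(clicked)


# The two rows from the slide's example log (ad text is not used by the features).
LOG = [
    ("digital slr", "canon rebel", True),
    ("seattle baseball", "mariners tickets", False),
]
instances = [make_instance(kw, kwp, c, [token_jaccard, length_ratio]) for kw, kwp, c in LOG]
```

Because each ϕi only needs the pair (kw, kw'), new similarity measures can be added to `feature_fns` without changing the instance format.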

Experiments

  • Data

    • 2 months of previous broad match ads from Microsoft Content Ads logs

      • 1 month for training, 1 month for testing

    • 68 features (syntactic, co-occurrence based, etc.); greedy feature selection

  • Metrics

    • LogLoss:

    • LogLoss Lift: difference between the LogLoss obtained by the model and the LogLoss of an oracle that has access to the empirical p(click | kw → kw') in the test set.

    • CTR and revenue results in live test with users
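The LogLoss formula itself did not survive in this transcript. The sketch below therefore assumes the standard definition (average negative log-likelihood of the observed clicks) and builds the oracle by predicting each mapping's empirical click rate in the test set. The function names and the clipping constant `eps` are assumptions, and the sign convention of the lift is not specified here.

```python
import math
from collections import defaultdict
from typing import Hashable, List, Sequence


def log_loss(clicks: Sequence[int], probs: Sequence[float], eps: float = 1e-12) -> float:
    """Average negative log-likelihood of binary clicks under predicted probabilities."""
    total = 0.0
    for y, p in zip(clicks, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(clicks)


def oracle_probs(mappings: Sequence[Hashable], clicks: Sequence[int]) -> List[float]:
    """Oracle prediction: the empirical click rate of each kw -> kw' mapping in the test set."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for m, y in zip(mappings, clicks):
        shown[m] += 1
        clicked[m] += y
    return [clicked[m] / shown[m] for m in mappings]
```

Comparing a model's `log_loss` with the value obtained from `oracle_probs` on the same test set gives the difference that the slide calls LogLoss Lift.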

Live Test Results

Use CTR prediction to maximize expected revenue

Re-rank mappings to incorporate revenue

+18% revenue, -2% CTR
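One simple way to fold revenue into the ranking is to sort candidate mappings by predicted click probability times revenue per click. The slides do not give the exact scoring rule, so this product form, the `Candidate` record, and the numbers below are all assumptions for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    kw_prime: str             # broad-matched keyword
    p_click: float            # predicted p(click | kw -> kw')
    revenue_per_click: float  # hypothetical value, e.g. derived from the bid


def rank_by_expected_revenue(candidates: List[Candidate]) -> List[Candidate]:
    """Sort candidates by p_click * revenue_per_click, highest first."""
    return sorted(candidates, key=lambda c: c.p_click * c.revenue_per_click, reverse=True)


# Made-up numbers: the second candidate is clicked less often but pays more per click.
candidates = [
    Candidate("tesla", p_click=0.020, revenue_per_click=1.00),
    Candidate("golf carts", p_click=0.015, revenue_per_click=2.00),
]
ranked = rank_by_expected_revenue(candidates)
```

Under such a rule a mapping with a lower click probability can outrank one with a higher click probability, which is consistent with revenue rising while CTR dips slightly.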


Online Learning with Amnesia

  • Advertisers, campaigns, bidded keywords and delivery contexts change very rapidly: high concept drift

  • Recent data is more informative

    • Goal: utilize older data while capturing changes in distributions

  • Averaged Perceptron doesn’t capture drift

  • Solution: Amnesiac Averaged Perceptron

    • Exponential weight decay when averaging hypotheses
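The idea of exponentially decayed averaging can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' exact algorithm: the update rule (a margin perceptron step with unit learning rate) and the parameter names `decay` and `margin` are illustrative choices. A standard averaged perceptron would weight every past hypothesis equally; here each step blends the current hypothesis into the running average, so older hypotheses fade geometrically.

```python
from typing import List, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


class AmnesiacAveragedPerceptron:
    def __init__(self, n_features: int, decay: float = 0.05, margin: float = 1.0):
        self.w: List[float] = [0.0] * n_features      # current hypothesis
        self.w_avg: List[float] = [0.0] * n_features  # exponentially weighted average
        self.decay = decay
        self.margin = margin

    def update(self, x: Sequence[float], clicked: bool) -> None:
        y = 1.0 if clicked else -1.0
        # Perceptron step whenever the example falls inside the margin.
        if y * dot(self.w, x) < self.margin:
            self.w = [wi + y * xi for wi, xi in zip(self.w, x)]
        # A hypothesis from t steps ago ends up with weight decay * (1 - decay)**t.
        self.w_avg = [(1.0 - self.decay) * a + self.decay * wi
                      for a, wi in zip(self.w_avg, self.w)]

    def score(self, x: Sequence[float]) -> float:
        """Raw score from the averaged weights (not a calibrated probability)."""
        return dot(self.w_avg, x)


# Toy drift: the same feature first signals clicks, then signals no clicks.
model = AmnesiacAveragedPerceptron(n_features=1)
for _ in range(200):
    model.update([1.0], clicked=True)
score_before_drift = model.score([1.0])
for _ in range(200):
    model.update([1.0], clicked=False)
score_after_drift = model.score([1.0])
```

In this toy stream the averaged weights follow the change in the click behaviour, whereas a uniform average over all 400 hypotheses would end up near zero.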

Contributions and Conclusions

  • Learning broad matches from implicit feedback

    • Combining arbitrary similarity measures/features

    • Using clickthrough logs as implicit feedback

  • Amnesiac Averaged Perceptron

    • Exponentially weighted averaging: distant examples “fade out”

    • Online learning adapts to market dynamics

Features and Feature Selection

  • Co-occurrence feature examples:

    • User search sessions: keywords searched within 10 mins

    • Advertiser campaigns: keywords co-bidded by the same advertiser

  • Past clickthrough rates of original and broad matched keywords

  • Various syntactic similarities

  • Various existing broad matching lists

  • and so on…

Feature Selection:

A total of 68 features

Greedy feature selection
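The slides say only that feature selection was greedy. A common variant is forward selection, sketched here under that assumption: starting from the empty set, repeatedly add the single feature that most lowers a validation loss and stop when nothing helps. The `evaluate` callback is hypothetical; in practice it would train the learner on the chosen features and return, for example, held-out LogLoss.

```python
from typing import Callable, FrozenSet, List


def greedy_forward_selection(n_features: int,
                             evaluate: Callable[[FrozenSet[int]], float]) -> List[int]:
    """Greedily add the feature index that most reduces evaluate(); lower is better."""
    selected: List[int] = []
    best_loss = evaluate(frozenset())
    remaining = set(range(n_features))
    while remaining:
        # Score every one-feature extension of the current set (sorted for determinism).
        trial = {f: evaluate(frozenset(selected + [f])) for f in sorted(remaining)}
        f_best = min(trial, key=trial.get)
        if trial[f_best] >= best_loss:
            break  # no remaining feature improves the loss
        selected.append(f_best)
        remaining.remove(f_best)
        best_loss = trial[f_best]
    return selected


def toy_loss(features: FrozenSet[int]) -> float:
    """Made-up loss: features 0 and 2 help, feature 1 hurts."""
    return 1.0 - 0.5 * (0 in features) - 0.3 * (2 in features) + 0.1 * (1 in features)
```

With 68 candidate features, each round of this procedure evaluates at most 68 extensions, so the cost is dominated by how expensive a single call to `evaluate` is.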


Additional Information

  • Estimation of expected value of click over all the ads shown for a broad match mapping E(p(click(ad(kw))|q))

  • Query Expansion vs. Broad Matching

    • Our broad matching algorithm can be extended for query expansion

    • But, broad matching is for a fixed set of bidded keywords

  • Forgetron vs. Amnesiac Averaged Perceptron

    • Forgetron maintains a fixed-budget set of support vectors: it stores examples explicitly and does not take all the data into account

    • AAP: weighted average over all the examples, no need to store examples explicitly