
Presentation Transcript



Catching the Drift: Learning Broad Matches from Clickthrough Data

Sonal Gupta, Mikhail Bilenko, Matthew Richardson

University of Texas at Austin, Microsoft Research



Introduction

[Figure: ad-serving pipeline. A Query or Web Page goes through Keyword Extraction to produce Extracted Keywords (kw1, kw2, …, kwn); Broad Match Expansion maps these to Expanded Keywords (kw11, kw12, …, kwn1, kwn2); Ad Selection and Ranking then produces the Selected Ads (Ad1, Ad2, …, Adk).]

  • Keyword-based online advertising: bidded keywords are extracted from context

    • Context: query (search ads) or page (content ads)

  • Broad matching: expanding keywords via a keyword-to-keywords mapping (a minimal sketch follows this list)

    • Example: electric cars → tesla, hybrids, toyota prius, golf carts

  • Broad matching benefits advertisers (increased reach, less campaign tuning), users (more relevant ads), and the ad platform (higher monetization)
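To make the keyword-to-keywords mapping concrete, here is a minimal sketch of broad match expansion in Python, using the "electric cars" example above. The mapping table is hard-coded purely for illustration; in the paper such mappings are learned from clickthrough data.

BROAD_MATCH = {
    "electric cars": ["tesla", "hybrids", "toyota prius", "golf carts"],
}

def expand_keywords(extracted_keywords):
    """Return the extracted keywords plus their broad-match expansions."""
    expanded = list(extracted_keywords)
    for kw in extracted_keywords:
        expanded.extend(BROAD_MATCH.get(kw, []))
    return expanded

print(expand_keywords(["electric cars"]))
# ['electric cars', 'tesla', 'hybrids', 'toyota prius', 'golf carts']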




Identifying Broad Matches

  • Good keyword mappings retrieve relevant ads that users click

  • How to measure what is relevant and likely to be clicked?

    • Human judgments: expensive, hard to scale

    • Past user clicks: clickthrough logs provide click data for kw → kw' whenever the user was shown ad(kw') in the context of kw

      • Highly available, less trustworthy

  • What similarity functions may indicate the relevance of kw → kw'? (two candidates are sketched after this list)

    • Syntactic (edit distance, TF-IDF cosine, string kernels, …)

    • Co-occurrence (in documents, query sessions, bid campaigns, …)

    • Expanded representation (search result snippets, category bags, …)
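As a concrete illustration of the similarity-function families listed above, the sketch below computes two candidate features ϕi(kw, kw'): a normalized edit-distance similarity and a bag-of-words cosine. Both are generic similarity measures, not the paper's exact feature definitions; real TF-IDF weighting would need corpus statistics that are omitted here.

from collections import Counter
import math

def edit_distance_sim(a, b):
    """1 minus normalized Levenshtein distance between two keyword strings."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                       # deletion
                        dp[j - 1] + 1,                   # insertion
                        prev + (a[i - 1] != b[j - 1]))   # substitution
            prev = cur
    return 1.0 - dp[n] / max(m, n, 1)

def cosine_sim(a, b):
    """Cosine similarity between term-frequency vectors of the two keywords."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

print(edit_distance_sim("digital slr", "canon rebel"))      # syntactic feature
print(cosine_sim("electric cars", "electric golf carts"))   # word-overlap feature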




Approach

  • Task: train a learner to estimate p(click | kw → kw') for any kw → kw'

  • Data

    • <kw, ad(kw'), click> triples from clickthrough logs, where kw → kw' was suggested by previous broad match mappings

  • Features

    • Convert each pair to a feature vector capturing similarities etc.

      (kw → kw') → [ϕ1(kw, kw'), ϕ2(kw, kw'), …, ϕn(kw, kw')], where each ϕi(kw, kw') can be any function of kw, kw', or both

    • For each triple <kw, ad(kw'), click>, create an instance: (ϕ(kw, kw'), click)

  • Learner: max-margin averaged perceptron (strong theory, very efficient); a sketch follows
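Below is a minimal sketch of training an averaged perceptron on (ϕ(kw, kw'), click) instances and scoring new pairs. The margin threshold, epoch count, and the sigmoid used to map scores to click probabilities are illustrative assumptions; the paper's max-margin variant and calibration are not reproduced here.

import numpy as np

def train_averaged_perceptron(X, y, epochs=5, margin=0.1):
    """X: (n_instances, n_features) matrix of phi(kw, kw') vectors.
       y: click labels in {0, 1}. Returns the averaged weight vector."""
    y_signed = 2 * np.asarray(y, dtype=float) - 1.0   # map {0,1} -> {-1,+1}
    w = np.zeros(X.shape[1])
    w_sum = np.zeros_like(w)                          # running sum of hypotheses
    for _ in range(epochs):
        for x_i, y_i in zip(X, y_signed):
            if y_i * np.dot(w, x_i) <= margin:        # mistake or small margin
                w = w + y_i * x_i                     # perceptron update
            w_sum += w                                # accumulate after every example
    return w_sum / (epochs * len(X))

# Toy usage with the two instances from the next slide:
X = np.array([[0.78, 0.001, 0.9], [0.05, 0.02, 0.2]])
y = np.array([1, 0])
w_avg = train_averaged_perceptron(X, y)
p_click = 1.0 / (1.0 + np.exp(-X.dot(w_avg)))         # sigmoid link is an assumption
print(p_click)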



Example: Creating an Instance

  • Historical broad match clickthrough data:

      kw                 kw'                ad(kw')                     click event
      digital slr        canon rebel        Canon Rebel Kit for $499    click
      seattle baseball   mariners tickets   Mariners season tickets     no click

  • Feature functions

  • Instances (feature vector, click label):

  • [0.78 0.001 0.9], 1

  • [0.05 0.02 0.2], 0



Experiments

  • Data

    • 2 months of previous broad match ads from Microsoft Content Ads logs

      • 1 month for training, 1 month for testing

    • 68 features (syntactic, co-occurrence based, etc.); greedy feature selection

  • Metrics

    • LogLoss: average negative log-likelihood of the observed clicks under the predicted click probabilities (formula below)

    • LogLoss Lift: difference between the obtained LogLoss and that of an oracle with access to the empirical p(click | kw → kw') in the test set

    • CTR and revenue results in live test with users
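The LogLoss formula itself did not survive the transcript; below is the standard definition (written in LaTeX), with c_i in {0, 1} the observed click for instance i and p̂_i the predicted p(click | kw → kw'). The lift is then the gap to an oracle that predicts the empirical click rate of each mapping in the test set.

\text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ c_i \log \hat{p}_i + (1 - c_i) \log (1 - \hat{p}_i) \right]

\text{LogLoss Lift} = \text{LogLoss}_{\text{model}} - \text{LogLoss}_{\text{oracle}}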



Results



Live Test Results

  • Use CTR prediction to maximize expected revenue

  • Re-rank mappings to incorporate revenue (one possible formulation is sketched below)

  • +18% revenue, -2% CTR
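One possible reading of "use CTR prediction to maximize expected revenue" is to rank candidate mappings by predicted click probability times the price paid per click. The sketch below shows only that assumed ranking rule, not the deployed system's actual scoring.

def rerank_by_expected_revenue(candidates):
    """candidates: list of (kw_prime, p_click, cost_per_click) tuples.
       Expected revenue per impression is approximated as p_click * cost_per_click."""
    return sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)

# Hypothetical candidates for the keyword "electric cars"
candidates = [("tesla", 0.031, 1.20), ("golf carts", 0.012, 2.50), ("hybrids", 0.020, 0.90)]
print(rerank_by_expected_revenue(candidates))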



Online Learning with Amnesia

  • Advertisers, campaigns, bidded keywords and delivery contexts change very rapidly: high concept drift

  • Recent data is more informative

    • Goal: utilize older data while capturing changes in distributions

  • Averaged Perceptron doesn’t capture drift

  • Solution: Amnesiac Averaged Perceptron

    • Exponential weight decay when averaging hypotheses (a sketch follows)
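A minimal sketch of the "amnesiac" idea, assuming exponential decay is applied to the running hypothesis average at every step so that older hypotheses fade out. The decay constant, margin, and single-pass online protocol are illustrative choices, not the paper's exact formulation.

import numpy as np

def train_amnesiac_averaged_perceptron(X, y, decay=0.999, margin=0.1):
    """Online averaged perceptron whose hypothesis average exponentially
       down-weights older hypotheses, so the model can track concept drift."""
    y_signed = 2 * np.asarray(y, dtype=float) - 1.0
    w = np.zeros(X.shape[1])
    w_avg = np.zeros_like(w)      # exponentially decayed sum of hypotheses
    norm = 0.0                    # matching sum of decayed weights
    for x_i, y_i in zip(X, y_signed):
        if y_i * np.dot(w, x_i) <= margin:
            w = w + y_i * x_i
        w_avg = decay * w_avg + w     # older hypotheses fade out geometrically
        norm = decay * norm + 1.0
    return w_avg / norm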



Results



Contributions and Conclusions

Learning broad matches from implicit feedback:

  • Combining arbitrary similarity measures/features

  • Using clickthrough logs as implicit feedback

  • Amnesiac Averaged Perceptron

    • Exponentially weighted averaging: distant examples “fade out”

    • Online learning adapts to market dynamics



Thank You!



Features and Feature Selection

  • Co-occurrence feature examples:

    • User search sessions: keywords searched within 10 minutes of each other

    • Advertiser campaigns: keywords co-bidded by the same advertiser

  • Past clickthrough rates of the original and broad-matched keywords

  • Various syntactic similarities

  • Various existing broad matching lists

  • and so on…

Feature Selection:

  • A total of 68 features

  • Greedy feature selection (a sketch follows)
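A minimal sketch of greedy forward feature selection over the candidate features, assuming the criterion is LogLoss on a held-out set. train_and_logloss is a hypothetical helper (not from the paper) that trains the click model on a given feature subset and returns its held-out LogLoss.

def greedy_feature_selection(all_features, train_and_logloss):
    """Repeatedly add the single feature that most improves held-out LogLoss;
       stop when no remaining feature helps."""
    selected, best_loss = [], float("inf")
    while True:
        remaining = [f for f in all_features if f not in selected]
        if not remaining:
            break
        losses = {f: train_and_logloss(selected + [f]) for f in remaining}
        best_f = min(losses, key=losses.get)
        if losses[best_f] >= best_loss:
            break                              # no improvement: stop
        selected.append(best_f)
        best_loss = losses[best_f]
    return selected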




Additional Information

  • Estimation of the expected value of a click over all the ads shown for a broad match mapping: E[p(click(ad(kw)) | q)]

  • Query Expansion vs. Broad Matching

    • Our broad matching algorithm can be extended for query expansion

    • But, broad matching is for a fixed set of bidded keywords

  • Forgetron vs. Amnesiac Averaged Perceptron

    • Forgetron maintains a budgeted set of support vectors: it stores examples explicitly and does not take all of the data into account

    • AAP: weighted average over all the examples, no need to store examples explicitly



Results



Amnesiac Averaged Perceptron

