Catching the Drift: Learning Broad Matches from Clickthrough Data

Sonal Gupta, Misha Bilenko, Matt Richardson

Presentation Transcript


  1. Catching the Drift: Learning Broad Matches from Clickthrough Data Sonal Gupta, Misha Bilenko, Matt Richardson

  2. Introduction
     [Diagram: Query or Web Page → Keyword Extraction → Extracted Keywords (kw1, kw2, …, kwn) → Broad Match Expansion → Expanded Keywords (kw1: kw11, kw12, …; kwn: kwn1, kwn2, …) → Ad Selection and Ranking → Selected Ads (Ad1, Ad2, …, Adk)]
     • Keyword-based online advertising: bidded keywords are extracted from context
     • Context: query (search ads) or page (content ads)
     • Broad matching: expanding keywords via keyword-to-keywords mapping
     • Example: electric cars → tesla, hybrids, toyota prius, golf carts
     • Broad matching benefits advertisers (increased reach, less campaign tuning), users (more relevant ads), ad platform (higher monetization)

  3. Identifying Broad Matches
     • Good keyword mappings retrieve relevant ads that users click
     • How to measure what is relevant and likely to be clicked?
         • Human judgments: expensive, hard to scale
         • Past user clicks: provide data for kw → kw' when user was shown ad(kw') in context of kw
             • Highly available, less trustworthy
     • What similarity functions may indicate relevance of kw → kw'?
         • Syntactic (edit distance, TF-IDF cosine, string kernels, …)
         • Co-occurrence (in documents, query sessions, bid campaigns, …)
         • Expanded representation (search result snippets, category bags, …)
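As a rough illustration of the "syntactic" family of similarity functions listed above, the following sketch computes two simple scores for a keyword pair. The function names, the normalization, and the choice of token Jaccard overlap are assumptions made for this example; the slides do not specify how the authors computed their features.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def edit_similarity(kw: str, kw2: str) -> float:
    """Edit distance rescaled to [0, 1]; 1.0 means identical strings."""
    longest = max(len(kw), len(kw2))
    return 1.0 if longest == 0 else 1.0 - edit_distance(kw, kw2) / longest


def token_jaccard(kw: str, kw2: str) -> float:
    """Word overlap |A ∩ B| / |A ∪ B| over whitespace-separated tokens."""
    a, b = set(kw.lower().split()), set(kw2.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0


def syntactic_features(kw: str, kw2: str) -> list:
    """Hypothetical two-element slice of a feature vector for kw -> kw'."""
    return [edit_similarity(kw, kw2), token_jaccard(kw, kw2)]
```

Purely syntactic scores like these are low for pairs such as "electric cars" and "tesla", which is one reason the slides also list co-occurrence and expanded-representation features.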

  4. Approach
     • Task: train a learner to estimate p(click | kw → kw') for any kw → kw'
     • Data
         • <kw, ad(kw'), click> triples from clickthrough logs, where kw → kw' was suggested by previous broad match mappings
     • Features
         • Convert each pair to a feature vector capturing similarities etc.: (kw → kw') → [ϕ1(kw, kw'), …, ϕn(kw, kw')], where ϕi(kw, kw') can be any function of kw, kw' or both
         • For each triple <kw, ad(kw'), click>, create an instance: (ϕ(kw, kw'), click)
     • Learner: max-margin averaged perceptron (strong theory, very efficient)
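The learner named above can be sketched as a textbook averaged perceptron with an optional margin. This is a minimal sketch, not the authors' implementation: the function name, the `epochs` and `margin` parameters, the bias term, and the mapping of clicks to ±1 labels are all assumptions. It also returns a raw linear score, so turning that score into an estimate of p(click | kw → kw') would need a calibration step that is not shown here.

```python
import numpy as np


def train_averaged_perceptron(X, y, epochs=10, margin=0.0):
    """X: (n, d) array of feature vectors phi(kw, kw'); y: clicks in {0, 1}.

    Returns the average of all intermediate weight vectors and biases.
    """
    X = np.asarray(X, dtype=float)
    signs = np.where(np.asarray(y) > 0, 1.0, -1.0)  # click -> +1, no click -> -1
    w, b = np.zeros(X.shape[1]), 0.0
    w_sum, b_sum, steps = np.zeros_like(w), 0.0, 0
    for _ in range(epochs):
        for x, s in zip(X, signs):
            if s * (w @ x + b) <= margin:  # mistake, or inside the margin
                w = w + s * x
                b = b + s
            # Accumulate after every example, not only after updates.
            w_sum += w
            b_sum += b
            steps += 1
    return w_sum / steps, b_sum / steps


# The two instances from the "Creating an Instance" slide.
X = [[0.78, 0.001, 0.9], [0.05, 0.02, 0.2]]
y = [1, 0]
w_avg, b_avg = train_averaged_perceptron(X, y)
scores = np.asarray(X) @ w_avg + b_avg
```

Averaging the hypotheses rather than keeping only the last one tends to make the final predictor less sensitive to the order of the training examples.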

  5. Example: Creating an Instance
     • Historical broad match clickthrough data:
         kw                 kw'                ad(kw')                    click event
         digital slr        canon rebel        Canon Rebel Kit for $499   click
         seattle baseball   mariners tickets   Mariners season tickets    no click
     • Feature functions
     • Instances
         • [0.78 0.001 0.9], 1
         • [0.05 0.02 0.2], 0

  6. Experiments
     • Data
         • 2 months of previous broad match ads from Microsoft Content Ads logs
         • 1 month for training, 1 month for testing
         • 68 features (syntactic, co-occurrence based, etc.); greedy feature selection
     • Metrics
         • LogLoss of the predicted click probabilities
         • LogLoss Lift: difference between the obtained LogLoss and the LogLoss of an oracle that has access to the empirical p(click | kw → kw') in the test set
         • CTR and revenue improvements in a live test with users
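The LogLoss formula did not survive in this transcript, so the sketch below uses the standard mean negative log-likelihood as a stand-in. The averaging over instances, the natural logarithm, the clipping constant `eps`, and the sign convention of the lift (model minus oracle) are assumptions, as are all function names.

```python
import math
from collections import defaultdict


def log_loss(probs, clicks, eps=1e-12):
    """Mean negative log-likelihood of binary click labels."""
    total = 0.0
    for p, y in zip(probs, clicks):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(clicks)


def oracle_probs(mappings, clicks):
    """Empirical p(click | kw -> kw') per mapping, computed on the test set itself."""
    shown, clicked = defaultdict(int), defaultdict(int)
    for m, y in zip(mappings, clicks):
        shown[m] += 1
        clicked[m] += y
    return [clicked[m] / shown[m] for m in mappings]


def log_loss_lift(probs, mappings, clicks):
    """Gap between the model's LogLoss and the oracle's (assumed sign: model - oracle)."""
    return log_loss(probs, clicks) - log_loss(oracle_probs(mappings, clicks), clicks)
```

Under this convention, a smaller lift means the model is closer to the best loss achievable by any predictor that outputs one probability per mapping.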

  7. Results

  8. Live Test Results
     • Use CTR prediction to maximize expected revenue
     • Re-rank mappings to incorporate revenue
     • +18% revenue, -2% CTR
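One simple way to read "re-rank mappings to incorporate revenue" is to sort candidate mappings by predicted click probability times revenue per click. The slides do not give the scoring rule that was actually used, so the product below, the `Mapping` type, its field names, and every number in the example are illustrative assumptions.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    kw: str
    kw_expanded: str
    p_click: float  # predicted p(click | kw -> kw')
    cpc: float      # assumed revenue per click for ads on kw'


def expected_revenue(m: Mapping) -> float:
    """Assumed scoring rule: click probability times revenue per click."""
    return m.p_click * m.cpc


def rerank_by_expected_revenue(mappings):
    """Return mappings sorted from highest to lowest expected revenue."""
    return sorted(mappings, key=expected_revenue, reverse=True)


# Made-up numbers, for illustration only.
candidates = [
    Mapping("electric cars", "tesla", p_click=0.04, cpc=0.50),
    Mapping("electric cars", "hybrids", p_click=0.03, cpc=0.60),
    Mapping("electric cars", "golf carts", p_click=0.01, cpc=3.00),
]
ranked = rerank_by_expected_revenue(candidates)
```

With these made-up numbers the mapping with the lowest predicted click probability ranks first, which shows how a revenue-aware ranking can differ from a pure CTR ranking.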

  9. Online Learning with Amnesia
     • Advertisers, campaigns, bidded keywords and delivery contexts change very rapidly: high concept drift
     • Recent data is more informative
     • Goal: utilize older data while capturing changes in distributions
     • Averaged Perceptron doesn't capture drift
     • Solution: Amnesiac Averaged Perceptron
         • Exponential weight decay when averaging hypotheses
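The idea of "exponential weight decay when averaging hypotheses" can be sketched by replacing the uniform average of perceptron hypotheses with an exponentially weighted moving average. The exact update used by the authors is not given in this transcript, so the class below, its `decay` parameter, and the moving-average form are assumptions.

```python
import numpy as np


class AmnesiacAveragedPerceptron:
    """Perceptron whose averaged hypothesis down-weights older hypotheses."""

    def __init__(self, n_features, decay=0.1):
        self.w = np.zeros(n_features)      # current hypothesis
        self.w_avg = np.zeros(n_features)  # exponentially weighted average
        self.decay = decay                 # higher value forgets faster

    def score(self, x):
        """Linear score of x under the averaged hypothesis."""
        return float(self.w_avg @ np.asarray(x, dtype=float))

    def update(self, x, click):
        x = np.asarray(x, dtype=float)
        s = 1.0 if click else -1.0
        if s * (self.w @ x) <= 0:  # standard perceptron mistake-driven update
            self.w = self.w + s * x
        # A hypothesis seen t steps ago contributes with weight
        # proportional to (1 - decay) ** t.
        self.w_avg = (1 - self.decay) * self.w_avg + self.decay * self.w


# Simulated drift: the same mapping is clicked at first, then stops being clicked.
model = AmnesiacAveragedPerceptron(n_features=2, decay=0.1)
for _ in range(60):
    model.update([1.0, 0.0], click=1)
score_before_drift = model.score([1.0, 0.0])
for _ in range(40):
    model.update([1.0, 0.0], click=0)
score_after_drift = model.score([1.0, 0.0])
```

In this toy run a uniform average over all 100 hypotheses would still come out positive after the drift, whereas the exponentially weighted average follows the recent data.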

  10. Results

  11. Contributions and Conclusions
      • Learning broad matches from implicit feedback
          • Combining arbitrary similarity measures/features
          • Using clickthrough logs as implicit feedback
      • Amnesiac Averaged Perceptron
          • Exponentially weighted averaging: distant examples "fade out"
          • Online learning adapts to market dynamics

  12. Thank You!

  13. Features and Feature Selection
      • Co-occurrence feature examples:
          • User search sessions: keywords searched within 10 minutes
          • Advertiser campaigns: keywords co-bidded by the same advertiser
      • Past clickthrough rates of original and broad matched keywords
      • Various syntactic similarities
      • Various existing broad matching lists, and so on
      • Feature Selection:
          • A total of 68 features
          • Greedy feature selection
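The search-session co-occurrence feature mentioned above could be computed along the following lines. This is a sketch under assumptions: the log format `(user_id, timestamp_seconds, keyword)`, the function name, and the choice to count unordered keyword pairs are all hypothetical; only the 10-minute window comes from the slide.

```python
from collections import Counter
from itertools import combinations

WINDOW_SECONDS = 10 * 60  # "keywords searched within 10 minutes"


def session_cooccurrence(log):
    """Count unordered keyword pairs searched by the same user within the window.

    log: iterable of (user_id, timestamp_seconds, keyword) tuples.
    """
    by_user = {}
    for user, ts, kw in log:
        by_user.setdefault(user, []).append((ts, kw))

    counts = Counter()
    for events in by_user.values():
        events.sort()  # chronological order, so t2 >= t1 below
        for (t1, k1), (t2, k2) in combinations(events, 2):
            if k1 != k2 and t2 - t1 <= WINDOW_SECONDS:
                counts[frozenset((k1, k2))] += 1
    return counts


# Made-up log entries, for illustration only.
log = [
    ("u1", 0, "electric cars"),
    ("u1", 300, "tesla"),
    ("u1", 2000, "golf carts"),
    ("u2", 10, "tesla"),
    ("u2", 20, "electric cars"),
]
counts = session_cooccurrence(log)
```

The raw count for a pair (or a normalized version of it) could then serve as one ϕi(kw, kw') in the feature vector.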

  14. Additional Information
      • Estimation of the expected value of a click over all the ads shown for a broad match mapping: E(p(click(ad(kw))|q))
      • Query Expansion vs. Broad Matching
          • Our broad matching algorithm can be extended to query expansion
          • But broad matching is restricted to a fixed set of bidded keywords
      • Forgetron vs. Amnesiac Averaged Perceptron
          • Forgetron maintains a budgeted set of support vectors: it stores examples explicitly and does not take all the data into account
          • AAP: weighted average over all the examples, no need to store examples explicitly

  15. Results

  16. Amnesiac Averaged Perceptron
