Catching the Drift: Learning Broad Matches from Clickthrough Data

Sonal Gupta, Misha Bilenko, Matt Richardson

Introduction
• Keyword-based online advertising: bidded keywords are extracted from context
• Context: query (search ads) or page (content ads)
• Broad matching: expanding keywords via keyword-to-keywords mapping
• Example: electric cars → tesla, hybrids, toyota prius, golf carts
• Broad matching benefits advertisers (increased reach, less campaign tuning), users (more relevant ads), ad platform (higher monetization)
[Figure: Query or Web Page → Keyword Extraction → Extracted Keywords (kw1, kw2, …, kwn) → Broad Match Expansion → Expanded Keywords (kw1: kw11, kw12, …; kwn: kwn1, kwn2, …) → Ad Selection and Ranking → Selected Ads (Ad1, Ad2, …, Adk)]

Identifying Broad Matches
• Good keyword mappings retrieve relevant ads that users click
• How to measure what is relevant and likely to be clicked?
  • Human judgments: expensive, hard to scale
  • Past user clicks: provide data for kw → kw' when user was shown ad(kw') in context of kw
    • Highly available, less trustworthy
• What similarity functions may indicate relevance of kw → kw'?
  • Syntactic (edit distance, TF-IDF cosine, string kernels, …)
  • Co-occurrence (in documents, query sessions, bid campaigns, …)
  • Expanded representation (search result snippets, category bags, …)

Approach
• Task: train a learner to estimate p(click | kw → kw') for any kw → kw'
• Data
  • <kw, ad(kw'), click> triples from clickthrough logs, where kw → kw' was suggested by previous broad match mappings
• Features
  • Convert each pair to a feature vector capturing similarities etc.: (kw → kw') → [ϕ1(kw, kw'), …, ϕn(kw, kw')], where ϕi(kw, kw') can be any function of kw, kw' or both
  • For each triple <kw, ad(kw'), click>, create an instance: (ϕ(kw, kw'), click)
• Learner: max-margin averaged perceptron (strong theory, very efficient)
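The learner step can be sketched as follows. This is a textbook mistake-driven averaged perceptron, not necessarily the max-margin variant the slides refer to, whose exact update is not given. It assumes NumPy; the function names `train_averaged_perceptron` and `predict_click_score` are illustrative, and the sigmoid that maps the score into (0, 1) is an assumed calibration step.

```python
import numpy as np

def train_averaged_perceptron(X, y, epochs=5):
    """Textbook averaged perceptron; y holds 1 (click) or 0 (no click)."""
    X = np.asarray(X, dtype=float)
    signs = np.where(np.asarray(y) > 0, 1.0, -1.0)  # map {0, 1} to {-1, +1}
    w = np.zeros(X.shape[1])   # current hypothesis
    b = 0.0
    w_sum = np.zeros_like(w)   # running sum of every hypothesis seen so far
    b_sum = 0.0
    steps = 0
    for _ in range(epochs):
        for x_i, s_i in zip(X, signs):
            if s_i * (w @ x_i + b) <= 0:  # mistake: move toward the example
                w = w + s_i * x_i
                b = b + s_i
            w_sum += w
            b_sum += b
            steps += 1
    # The averaged hypothesis weights all past hypotheses equally.
    return w_sum / steps, b_sum / steps

def predict_click_score(w, b, x):
    """Squash the averaged score into (0, 1); an assumed calibration step."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))
```

Because every intermediate hypothesis contributes equally to the average, old and recent examples carry the same weight in this sketch.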

Example: Creating an Instance
• Historical broad match clickthrough data:
  kw                kw'               ad(kw')                   click event
  digital slr       canon rebel       Canon Rebel Kit for $499  click
  seattle baseball  mariners tickets  Mariners season tickets   no click
• Feature functions
• Instances
  • [0.78 0.001 0.9], 1
  • [0.05 0.02 0.2], 0
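A minimal sketch of turning log rows into instances. The slides do not name their feature functions, so the three below (`token_jaccard`, `char_similarity`, `length_ratio`) are hypothetical stand-ins using only the Python standard library, and they will not reproduce the feature values shown on the slide.

```python
from difflib import SequenceMatcher

def token_jaccard(kw, kw_prime):
    """Word-level overlap between the two keywords."""
    a, b = set(kw.lower().split()), set(kw_prime.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def char_similarity(kw, kw_prime):
    """Character-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, kw.lower(), kw_prime.lower()).ratio()

def length_ratio(kw, kw_prime):
    """Shorter length divided by longer length."""
    longest = max(len(kw), len(kw_prime))
    return min(len(kw), len(kw_prime)) / longest if longest else 0.0

# Hypothetical feature set; any function of kw, kw' or both could be added.
FEATURES = [token_jaccard, char_similarity, length_ratio]

def make_instance(kw, kw_prime, clicked):
    """Map one <kw, ad(kw'), click> row to (phi(kw, kw'), click)."""
    phi = [f(kw, kw_prime) for f in FEATURES]
    return phi, 1 if clicked else 0

# The two rows from the slide (the ad text is not used by these features).
log = [
    ("digital slr", "canon rebel", True),
    ("seattle baseball", "mariners tickets", False),
]
instances = [make_instance(*row) for row in log]
```

Purely syntactic features score both of these pairs low, which suggests why the slides also list co-occurrence and expanded-representation features.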

Experiments
• Data
  • 2 months of previous broad match ads from Microsoft Content Ads logs
  • 1 month for training, 1 month for testing
  • 68 features (syntactic, co-occurrence based, etc.); greedy feature selection
• Metrics
  • LogLoss
  • LogLoss Lift: difference between the obtained LogLoss and that of an oracle that has access to the empirical p(click | kw → kw') in the test set
  • CTR and revenue improvements in live test with users
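The LogLoss formula from the slide did not survive, so the sketch below uses the standard mean negative log-likelihood for binary labels. It assumes the oracle predicts the empirical click rate of each kw → kw' mapping in the test set, and that the lift is the model's LogLoss minus the oracle's; the sign convention and the function names are assumptions.

```python
import math
from collections import defaultdict

def log_loss(labels, probs, eps=1e-12):
    """Mean negative log-likelihood of binary labels under predicted probs."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def oracle_probs(mappings, labels):
    """Empirical click rate of each mapping, computed on the test set itself."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for m, y in zip(mappings, labels):
        shown[m] += 1
        clicked[m] += y
    return [clicked[m] / shown[m] for m in mappings]

def log_loss_lift(mappings, labels, probs):
    """Model LogLoss minus oracle LogLoss (assumed sign convention)."""
    oracle = oracle_probs(mappings, labels)
    return log_loss(labels, probs) - log_loss(labels, oracle)
```

Under these assumptions a value near zero means the model is close to the best that any predictor assigning one probability per mapping could achieve on the test set.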

Results

Live Test Results
• Use CTR prediction to maximize expected revenue
• Re-rank mappings to incorporate revenue
• +18% revenue, -2% CTR
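One way to read the re-ranking step, as a sketch: score each candidate mapping by predicted click probability times a per-click bid, and sort by that product. The slides do not state the formula, so both the product and the function name `rank_by_expected_revenue` are assumptions, and the probabilities and bids below are made-up illustration values.

```python
def rank_by_expected_revenue(candidates):
    """Sort (kw_prime, p_click, bid) tuples by p_click * bid, highest first."""
    return sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)

# Made-up numbers for illustration only.
candidates = [
    ("tesla", 0.030, 1.00),       # highest predicted CTR, low bid
    ("hybrids", 0.020, 2.50),     # lower CTR, higher bid
    ("golf carts", 0.005, 4.00),  # lowest CTR, highest bid
]
ranked = rank_by_expected_revenue(candidates)
top_by_ctr = max(candidates, key=lambda c: c[1])[0]
top_by_revenue = ranked[0][0]
```

Here the top mapping by CTR alone (`tesla`) differs from the top mapping by expected revenue (`hybrids`), which is consistent with a result where revenue rises while CTR dips slightly.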

Online Learning with Amnesia
• Advertisers, campaigns, bidded keywords and delivery contexts change very rapidly: high concept drift
• Recent data is more informative
• Goal: utilize older data while capturing changes in distributions
• Averaged Perceptron doesn’t capture drift
• Solution: Amnesiac Averaged Perceptron
  • Exponential weight decay when averaging hypotheses
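The decay idea can be sketched as below. The slides only say that hypotheses are averaged with exponential weight decay, so the specific update `w_avg = (1 - alpha) * w_avg + alpha * w`, the parameter name `alpha`, and the class layout are assumptions rather than the authors' exact algorithm. It assumes NumPy.

```python
import numpy as np

class AmnesiacAveragedPerceptron:
    """Perceptron whose averaged hypothesis forgets old hypotheses exponentially."""

    def __init__(self, n_features, alpha=0.01):
        self.w = np.zeros(n_features)      # current hypothesis
        self.w_avg = np.zeros(n_features)  # exponentially weighted average
        self.alpha = alpha                 # assumed decay rate, 0 < alpha <= 1

    def update(self, x, clicked):
        x = np.asarray(x, dtype=float)
        s = 1.0 if clicked else -1.0
        if s * (self.w @ x) <= 0:          # mistake-driven perceptron step
            self.w = self.w + s * x
        # A hypothesis from k steps ago keeps weight proportional to (1 - alpha)^k.
        self.w_avg = (1.0 - self.alpha) * self.w_avg + self.alpha * self.w

    def score(self, x):
        return float(self.w_avg @ np.asarray(x, dtype=float))

# Simulated drift: feature 0 signals clicks at first, then feature 1 does.
model = AmnesiacAveragedPerceptron(n_features=2, alpha=0.1)
for _ in range(100):
    model.update([1, 0], clicked=True)
    model.update([0, 1], clicked=False)
before_drift = (model.score([1, 0]), model.score([0, 1]))
for _ in range(100):
    model.update([1, 0], clicked=False)
    model.update([0, 1], clicked=True)
after_drift = (model.score([1, 0]), model.score([0, 1]))
```

A larger `alpha` forgets faster; a small `alpha` forgets slowly and behaves more like a long-run average of hypotheses.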

Results

Contributions and Conclusions
• Learning broad matches from implicit feedback
  • Combining arbitrary similarity measures/features
  • Using clickthrough logs as implicit feedback
• Amnesiac Averaged Perceptron
  • Exponentially weighted averaging: distant examples “fade out”
  • Online learning adapts to market dynamics

Thank You!

Features and Feature Selection
• Co-occurrence feature examples:
  • User search sessions: keywords searched within 10 mins
  • Advertiser campaigns: keywords co-bidded by the same advertiser
• Past clickthrough rates of original and broad matched keywords
• Various syntactic similarities
• Various existing broad matching lists
• and so on…
• Feature Selection:
  • A total of 68 features
  • Greedy feature selection
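Greedy feature selection could look like the forward-selection sketch below. The slides do not say whether selection was forward or backward, nor which criterion was used, so this assumes forward selection against a caller-supplied `evaluate` function that returns a loss (lower is better). The feature names and gains in the usage lines are hypothetical.

```python
def greedy_forward_selection(feature_names, evaluate, max_features=None):
    """Repeatedly add the feature that lowers the loss most; stop when none helps."""
    selected = []
    best_loss = evaluate(selected)
    remaining = list(feature_names)
    while remaining and (max_features is None or len(selected) < max_features):
        # Score every candidate extension of the current subset.
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        loss, feature = min(scored)
        if loss >= best_loss:  # no remaining feature improves the loss
            break
        selected.append(feature)
        remaining.remove(feature)
        best_loss = loss
    return selected, best_loss

# Toy loss for illustration: each feature contributes a fixed, made-up gain.
gains = {"session_cooccurrence": 0.3, "edit_distance": 0.2, "noise": -0.05}

def toy_loss(subset):
    return 1.0 - sum(gains[f] for f in subset)

chosen, final_loss = greedy_forward_selection(list(gains), toy_loss)
```

In practice `evaluate` would train the learner on the candidate subset and return a held-out metric such as LogLoss, which costs one training run per candidate per round.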

Additional Information
• Estimation of expected value of click over all the ads shown for a broad match mapping: E(p(click(ad(kw))|q))
• Query Expansion vs. Broad Matching
  • Our broad matching algorithm can be extended for query expansion
  • But broad matching is for a fixed set of bidded keywords
• Forgetron vs. Amnesiac Averaged Perceptron
  • Forgetron maintains a budgeted set of support vectors: it stores examples explicitly and does not take into account all the data
  • AAP: weighted average over all the examples, no need to store examples explicitly

Results

Amnesiac Averaged Perceptron
