
Fast Top-k Retrieval for Model Based Recommendation


Presentation Transcript


  1. Fast Top-k Retrieval for Model Based Recommendation Deepak Agarwal (Yahoo! Research) Maxim Gurevich (Google) Presented by Guang LING

  2. Outline • Motivation • Problem definition • The approach • Binary classification • L2-regression of scores • Experiments • Conclusion

  3. Motivation • Suppose that we • Are a search engine company (Google, say) • Want to display ads given a query • Have an ML score for each ad given a query • Given a query • How do we select the top-k ads to display • In a very short amount of time?

  4. Motivation • (Figure: an incoming request with a user profile is matched against pages, news, and ads) • Challenges: • Many users/requests • Many content items • Strict latency constraints • Increasingly complex matching logic

  5. Traditional IR solutions • Exploit content-overlap matching functions (tf-idf / cosine similarity) • Queries and documents “live” in the same high-dimensional space • Allows effectively reducing the query result space • Highly optimized inverted index architecture • Joins the inverted lists of the query terms • Returns a shortlist of result candidates • Few candidates undergo complex re-ranking

  6. Inverted index architecture • Bag-of-words representation of documents • Now given a query “canon camera”

  7. Inverted index architecture (continued) • Bag-of-words representation of documents • Now given a query “canon camera”
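To make the lookup on the two slides above concrete, here is a minimal sketch of an inverted index in Python; the toy documents are hypothetical, and the AND-style join of posting lists is only one of several possible conventions.

```python
from collections import defaultdict

# Toy bag-of-words documents (hypothetical, for illustration only).
docs = {
    "d1": "canon camera lens",
    "d2": "canon camera bag",
    "d3": "nikon camera body",
}

# Build the inverted index: term -> posting list (set of document ids).
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def candidates(query):
    """Intersect the posting lists of the query terms (AND semantics)."""
    postings = [index.get(t, set()) for t in query.split()]
    return set.intersection(*postings) if postings else set()

print(candidates("canon camera"))  # -> {'d1', 'd2'}
```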

  8. Index-based pre-filtering • Without an index: the expensive ML model scores every (query, ad) pair and the top-2 ads are kept, which does not scale to many ads • With pre-filtering: the query first goes to an inverted index, which returns K candidates with cheap approximate scores; only those candidates are re-scored by the ML model, and the top-2 are returned
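A minimal sketch of the two-stage flow this slide depicts, assuming a black-box scoring function and an index lookup that returns approximately scored candidates; the function names, K, and k here are illustrative, not from the paper.

```python
import heapq

def retrieve_top_k(query, index_lookup, ml_score, K=100, k=2):
    """Two-stage retrieval: cheap index lookup, then expensive ML re-scoring.

    index_lookup(query) -> iterable of (doc_id, approx_score) pairs
    ml_score(doc_id, query) -> true black-box model score scr(d, q)
    """
    # Stage 1: keep the K best candidates by approximate score.
    candidates = heapq.nlargest(K, index_lookup(query), key=lambda p: p[1])
    # Stage 2: re-score only those candidates with the true ML model.
    rescored = [(doc_id, ml_score(doc_id, query)) for doc_id, _ in candidates]
    return heapq.nlargest(k, rescored, key=lambda p: p[1])
```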

  9. Problem definition • Terminology: queries and documents • scr(d,q) – the (black-box) ML score of d on q • Goal: given q, find k items from D with highest scr(d,q) • Reduce to an inverted index query • Leverage extensive work on efficient inverted indexing • Challenges • How to construct the index • How to query it

  10. Prior work • Learning to rank • A different problem: second-stage reranking of the few documents retrieved by the first stage • We are building the first stage given the second stage • S. Goel, J. Langford, and A. Strehl. Predictive indexing for fast search [NIPS08] • A heuristic for building the index given an ML function and a query log • Fast and simple index building and retrieval • Not the standard dot-product scoring • Does not support standard docId-sorted indices, so it is harder to integrate into existing systems • Lower accuracy

  11. The approach • Let ascr(d,q) be from a class of functions amenable to indexing: the vector dot product ascr(d,q) = q'd • q is the original (sparse) query vector • The document vector d is not known directly • For each document: find d such that q'd ≈ scr(d,q) • Index the d-s • Given q, query the index and retrieve the top-K candidates according to ascr • Compute the true ML scores of the candidates and return the top-k

  12. Constructing the index: an optimization problem • Objective: find D = {d1, d2, …, dn} minimizing the score loss on a representative query load Q • Sparsification • The d vectors are high dimensional • Dense d vectors would result in a prohibitive index size • Add an index size constraint (a reconstruction of the formulation is sketched below)
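The slide's formula did not survive extraction; below is a hedged reconstruction from the surrounding definitions. The squared-error loss and the budget symbol B are my assumptions, and the paper's objective may weight queries differently or restrict the loss to the top-k documents.

```latex
\min_{d_1,\dots,d_n} \; \sum_{q \in Q} \sum_{i=1}^{n}
  \bigl( \mathrm{scr}(d_i, q) - q' d_i \bigr)^2
\qquad \text{subject to} \qquad
  \sum_{i=1}^{n} \lVert d_i \rVert_0 \le B
```

Here ||d_i||_0 counts the non-zero entries of d_i, i.e., the number of postings that document i contributes to the index; this is the L0 constraint that the next slide relaxes to L1.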

  13. Relaxing the problem • Do not know how to optimize directly • Relax the L0 index size constraint to L1 • Relax the objective function • Binary classification of being in top-k • L2 regression of ML scores

  14. Binary classification • For each document d • Learn a vector d that predicts whether d is among the top-k on q ∈ Q • Predict by a simple thresholding operator: q'd > θ • Let y(q,d) be a (-1, +1) indicator of whether d is among the top-k on q • Efficiently solvable [Liblinear]
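A per-document sketch of this formulation, assuming a matrix Q_mat of training query vectors and labels y in {-1, +1} marking whether the document was in the true top-k. Using scikit-learn's liblinear-backed LinearSVC with an L1 penalty (to make the learned vector sparse) is my choice, consistent with the slide's [Liblinear] pointer but not taken verbatim from the paper.

```python
from sklearn.svm import LinearSVC

def fit_document_classifier(Q_mat, y, C=1.0):
    """Fit a sparse linear vector d for a single document.

    Q_mat : (num_queries, vocab_size) matrix of query vectors (may be sparse).
    y     : (num_queries,) labels in {-1, +1}; +1 if the document is in the
            true top-k for that query.
    Returns a vocab_size-dimensional vector whose non-zero entries become
    this document's postings in the index.
    """
    clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=C)
    clf.fit(Q_mat, y)
    return clf.coef_.ravel()
```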

  15. L2 regression of scores • For all pairs (q,d): minimize the discrepancy between true and approximate scores • Again, decomposable by documents • Efficiently solvable by a coordinate descent algorithm
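A per-document sketch of the regression formulation, assuming squared-error loss plus an L1 penalty for sparsity; scikit-learn's Lasso, which is fit by coordinate descent as the slide mentions, stands in for the paper's exact solver, and alpha is an assumed hyperparameter.

```python
from sklearn.linear_model import Lasso

def fit_document_regressor(Q_mat, scores, alpha=0.01):
    """Fit a sparse linear vector d for a single document.

    Q_mat  : (num_queries, vocab_size) matrix of query vectors (may be sparse).
    scores : (num_queries,) true ML scores scr(d, q) for this document.
    Minimizes the squared discrepancy between true and approximate scores
    (scaled by sklearn's 1/(2n) convention) plus alpha * ||d||_1,
    solved by coordinate descent.
    """
    reg = Lasso(alpha=alpha, fit_intercept=False, max_iter=10_000)
    reg.fit(Q_mat, scores)
    return reg.coef_  # non-zero entries become this document's postings
```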

  16. Practical issues • Vectors d contain negative values • This makes retrieval less efficient • The solution is independent for each document • Easy to parallelize • Easy to add new documents

  17. Experiments • Experiment setup • Synthetic model – simple • 10K documents, 10K terms (words), 12K queries • For each term, generate a random permutation of the 10K documents and assign weight (1 - 1/100)^i to that term for the document at position i • Queries are 3 terms long, generated from a power-law distribution • The final score is the sum of the individual per-term scores

  18. Experiments • Experiment setup • Synthetic model – complex • 10K documents, 10K terms (words), 12K queries • For each term, generate a random permutation of the 10K documents and assign weight (1 - 1/100)^i to that term for the document at position i • Queries are 5 terms long, generated from a power-law distribution • In addition, each pair and triplet of terms is associated with a random permutation of documents and induced scores • The final score is the sum of the individual scores for each term, pair, and triplet
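A sketch of how the simple synthetic scores on slides 17–18 could be generated; the seed, the reduced sizes, and the Zipf parameter are mine, and the paper's exact query-sampling details may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_DOCS, NUM_TERMS = 1_000, 1_000  # the paper uses 10K x 10K; smaller here

# For each term, a random permutation of documents; doc_rank[t, d] is the
# position of document d in term t's permutation.
doc_rank = np.empty((NUM_TERMS, NUM_DOCS), dtype=np.int32)
for t in range(NUM_TERMS):
    doc_rank[t, rng.permutation(NUM_DOCS)] = np.arange(NUM_DOCS)

def score(doc, query_terms):
    # Simple model: the final score is the sum of per-term weights
    # (1 - 1/100) ** i, where i is the document's position for that term.
    return sum((1 - 1 / 100) ** doc_rank[t, doc] for t in query_terms)

def sample_query(length=3, zipf_a=1.5):
    # Query terms drawn from a heavy-tailed (power-law-like) distribution.
    terms = rng.zipf(zipf_a, size=length) - 1
    return np.clip(terms, 0, NUM_TERMS - 1).tolist()

q = sample_query()
print(q, score(0, q))
```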

  19. Experiments • Experiment setup • CTR model • Computational advertising dataset • Logistic regression model • 50K documents (ads), sampled 50K queries • Trained on a day’s live traffic

  20. Experiments • Datasets • Two synthetic models: simple and complex • |D|=10K, |Q|=10K, 2K test queries • CTR model • |D|=50K, |Q|=50K, 50K test queries from the following day • Baselines • Random: k random documents • Static: fixed set of k documents with the highest average scores • Predictive: Predictive indexing [Goel et al.]

  21. Evaluation metrics • Recall: exact retrieval of the true top-k • Overly conservative • Score loss: average loss in the score of the retrieved docs • Captures application-specific utility, e.g., CTR
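A small sketch of how these two metrics could be computed for one query, assuming access to the true scores of all documents; the paper's exact averaging over queries may differ.

```python
import numpy as np

def recall_at_k(true_scores, retrieved_ids, k):
    """Fraction of the true top-k documents that appear among the retrieved ids.

    true_scores : array of scr(d, q) for every document, for one query q.
    """
    true_top_k = set(np.argsort(true_scores)[::-1][:k])
    return len(true_top_k & set(retrieved_ids)) / k

def score_loss(true_scores, retrieved_ids, k):
    """Average drop in true score between the ideal top-k and the best k retrieved."""
    ideal = np.sort(true_scores)[::-1][:k]
    got = np.sort(true_scores[list(retrieved_ids)])[::-1][:k]
    return float(ideal.mean() - got.mean())
```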

  22. Accuracy

  23. Accuracy (2)

  24. Index size

  25. Retrieval latency: CTR model • Disclaimer: prototype implementation • Brute-force (scoring all 50K ads): 4s per impression • Scoring top-100 candidates: 9ms • Top-100 retrieval • Baselines: ~0 (negligible) • Our approach: ~15ms

  26. Index construction • ~1min per document (prototype implementation) • Trivially parallelizable • Easy to add new documents

  27. Conclusions • A practical method for indexing black-box ML models • Integrates with existing indexing systems • Scales well to large itemsets • Tunable space-speed-accuracy tradeoff

  28. Thank You
