An Interactive Learning Approach to Optimizing Information Retrieval Systems

Presentation Transcript

  1. An Interactive Learning Approach to Optimizing Information Retrieval Systems CMU ML Lunch, September 27th, 2010 Yisong Yue, Carnegie Mellon University

  2. Information Systems

  3. Interactive Learning Setting • Find the best ranking function (out of 1000) • Show results, evaluate using click logs • Clicks biased (users only click on what they see) • Explore / exploit problem

  4. Interactive Learning Setting • Find the best ranking function (out of 1000) • Show results, evaluate using click logs • Clicks biased (users only click on what they see) • Explore / exploit problem • Technical issues • How to interpret clicks? • What is the utility function? • What results to show users?

  6. Team-Game Interleaving (u=thorsten, q=“svm”) f1(u,q) → r1, f2(u,q) → r2
     Ranking r1: 1. Kernel Machines http://svm.first.gmd.de/ 2. SVM-Light Support Vector Machine http://ais.gmd.de/~thorsten/svm light/ 3. Support Vector Machine and Kernel ... References http://svm.research.bell-labs.com/SVMrefs.html 4. Lucent Technologies: SVM demo applet http://svm.research.bell-labs.com/SVT/SVMsvt.html 5. Royal Holloway Support Vector Machine http://svm.dcs.rhbnc.ac.uk
     Ranking r2: 1. Kernel Machines http://svm.first.gmd.de/ 2. Support Vector Machine http://jbolivar.freeservers.com/ 3. An Introduction to Support Vector Machines http://www.support-vector.net/ 4. Archives of SUPPORT-VECTOR-MACHINES ... http://www.jiscmail.ac.uk/lists/SUPPORT... 5. SVM-Light Support Vector Machine http://ais.gmd.de/~thorsten/svm light/
     Interleaving(r1,r2): 1. Kernel Machines (T2) http://svm.first.gmd.de/ 2. Support Vector Machine (T1) http://jbolivar.freeservers.com/ 3. SVM-Light Support Vector Machine (T2) http://ais.gmd.de/~thorsten/svm light/ 4. An Introduction to Support Vector Machines (T1) http://www.support-vector.net/ 5. Support Vector Machine and Kernel ... References (T2) http://svm.research.bell-labs.com/SVMrefs.html 6. Archives of SUPPORT-VECTOR-MACHINES ... (T1) http://www.jiscmail.ac.uk/lists/SUPPORT... 7. Lucent Technologies: SVM demo applet (T2) http://svm.research.bell-labs.com/SVT/SVMsvt.html
     Invariant: For all k, in expectation the top k of the interleaved list contains the same number of team members from each team. • Interpretation: (r1 ≻ r2) ↔ clicks(T1) > clicks(T2) [Radlinski, Kurup, Joachims; CIKM 2008]
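
The interleaving step above can be sketched in a few lines of Python. This is a minimal illustration of the team-draft idea, not the exact algorithm from Radlinski et al.: all names are made up, and ties are broken here by balancing the running pick counts.

```python
import random

def team_draft_interleave(r1, r2):
    """Team-draft interleaving of two rankings (lists of result ids).

    Invariant from the slide: for all k, in expectation the top k of the
    merged list contains the same number of picks from each team.
    """
    remaining = {1: list(r1), 2: list(r2)}
    picks = {1: 0, 2: 0}
    merged, team_of = [], {}
    while remaining[1] or remaining[2]:
        # The team with fewer picks so far picks next; ties -> coin flip.
        if picks[1] != picks[2]:
            team = 1 if picks[1] < picks[2] else 2
        else:
            team = random.choice([1, 2])
        if not remaining[team]:
            team = 3 - team  # that team is exhausted; the other picks
        doc = remaining[team].pop(0)  # team's highest-ranked unused result
        merged.append(doc)
        team_of[doc] = team
        picks[team] += 1
        if doc in remaining[3 - team]:
            remaining[3 - team].remove(doc)  # keep lists duplicate-free
    return merged, team_of

def infer_preference(team_of, clicked):
    # Interpretation from the slide: r1 beats r2 iff team 1 gets more clicks.
    c1 = sum(1 for d in clicked if team_of.get(d) == 1)
    c2 = sum(1 for d in clicked if team_of.get(d) == 2)
    return "r1" if c1 > c2 else ("r2" if c2 > c1 else "tie")
```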

  7. Setting • Find the best ranking function (out of 1000) • Evaluate using click logs • Clicks biased (users only click on what they see) • Explore / exploit problem • Technical issues • How to interpret clicks? • What is the utility function? • What results to show users?

  8. Interleave A vs B

  9. Interleave A vs C

  10. Interleave B vs C

  11. Interleave A vs B

  12. Dueling Bandits Problem • Given K bandits b1, …, bK • Each iteration: compare (duel) two bandits • E.g., interleaving two retrieval functions [Yue & Joachims, ICML 2009] [Yue, Broder, Kleinberg, Joachims, COLT 2009]

  13. Dueling Bandits Problem • Given K bandits b1, …, bK • Each iteration: compare (duel) two bandits • E.g., interleaving two retrieval functions • Cost function (regret): RT = Σt=1..T [ P(b* ≻ bt) + P(b* ≻ bt’) − 1 ] • (bt, bt’) are the two bandits chosen at iteration t • b* is the overall best one • (% users who prefer best bandit over chosen ones) [Yue & Joachims, ICML 2009] [Yue, Broder, Kleinberg, Joachims, COLT 2009]

  14. Example 1 • P(f* > f) = 0.9 • P(f* > f’) = 0.8 • Incurred Regret = 0.7 • Example 2 • P(f* > f) = 0.7 • P(f* > f’) = 0.6 • Incurred Regret = 0.3 • Example 3 • P(f* > f) = 0.51 • P(f* > f’) = 0.55 • Incurred Regret = 0.06
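
The examples follow directly from the regret definition on the previous slide: the per-iteration regret is the sum of the two distinguishability gaps to the best bandit. A one-line check (the function name is ours, not from the papers):

```python
def duel_regret(p_best_vs_bt, p_best_vs_bt2):
    # Per-iteration regret: (P(b* > b_t) - 1/2) + (P(b* > b_t') - 1/2).
    return (p_best_vs_bt - 0.5) + (p_best_vs_bt2 - 0.5)

print(round(duel_regret(0.9, 0.8), 2))    # Example 1 -> 0.7
print(round(duel_regret(0.7, 0.6), 2))    # Example 2 -> 0.3
print(round(duel_regret(0.51, 0.55), 2))  # Example 3 -> 0.06
```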

  15. Assumptions • P(bi > bj) = ½ + εij (distinguishability) • Strong Stochastic Transitivity • For three bandits bi ≻ bj ≻ bk: εik ≥ max(εij, εjk) • Monotonicity property • Stochastic Triangle Inequality • For three bandits bi ≻ bj ≻ bk: εik ≤ εij + εjk • Diminishing returns property • Satisfied by many standard models • E.g., Logistic / Bradley-Terry
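
Both properties can be checked numerically for the logistic / Bradley-Terry model named on the slide. A small sanity check; the utility values below are hypothetical:

```python
import math

def p_beats(u_i, u_j):
    # Bradley-Terry / logistic model: P(b_i > b_j) from the utility difference.
    return 1.0 / (1.0 + math.exp(-(u_i - u_j)))

u_i, u_j, u_k = 2.0, 1.2, 0.3           # made-up utilities with b_i > b_j > b_k
eps = lambda a, b: p_beats(a, b) - 0.5  # distinguishability margin eps_ij

e_ij, e_jk, e_ik = eps(u_i, u_j), eps(u_j, u_k), eps(u_i, u_k)

assert e_ik >= max(e_ij, e_jk)  # strong stochastic transitivity
assert e_ik <= e_ij + e_jk      # stochastic triangle inequality
print(e_ij, e_jk, e_ik)
```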

  16. Explore then Exploit • First explore • Try to gather as much information as possible • Accumulates regret based on which bandits we decide to compare • Then exploit • We have a (good) guess as to which bandit is best • Repeatedly compare that bandit with itself • (i.e., interleave that ranking with itself)

  17. Naïve Approach • In the deterministic case, O(K) comparisons suffice to find the max • Extend to the noisy case: • Repeatedly compare until confident one is better • Problem: comparing two awful (but similar) bandits • Example: • P(A > B) = 0.85 • P(A > C) = 0.85 • P(B > C) = 0.51 • Comparing B and C requires many comparisons!
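
A Hoeffding-style back-of-the-envelope calculation makes "many comparisons" concrete; this is a crude bound, not the paper's analysis. Distinguishing P = ½ + ε from ½ with confidence 1 − δ takes on the order of log(1/δ)/ε² duels:

```python
import math

def duels_needed(eps, delta=0.05):
    # Hoeffding bound: n >= ln(2/delta) / (2 * eps**2) separates 1/2 + eps from 1/2.
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

print(duels_needed(0.35))  # A vs B or A vs C (eps = 0.35): ~16 duels
print(duels_needed(0.01))  # B vs C (eps = 0.01): ~18,445 duels
```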

  18. Interleaved Filter • Choose candidate bandit at random [Yue, Broder, Kleinberg, Joachims, COLT 2009]

  19. Interleaved Filter • Choose candidate bandit at random • Make noisy comparisons (Bernoulli trial) against all other bandits simultaneously • Maintain mean and confidence interval for each pair of bandits being compared [Yue, Broder, Kleinberg, Joachims, COLT 2009]

  20. Interleaved Filter • Choose candidate bandit at random • Make noisy comparisons (Bernoulli trial) against all other bandits simultaneously • Maintain mean and confidence interval for each pair of bandits being compared • …until another bandit is better • With confidence 1 – δ [Yue, Broder, Kleinberg, Joachims, COLT 2009]

  21. Interleaved Filter • Choose candidate bandit at random • Make noisy comparisons (Bernoulli trial) against all other bandits simultaneously • Maintain mean and confidence interval for each pair of bandits being compared • …until another bandit is better • With confidence 1 – δ • Repeat process with new candidate • (Remove all empirically worse bandits) [Yue, Broder, Kleinberg, Joachims, COLT 2009]

  22. Interleaved Filter • Choose candidate bandit at random • Make noisy comparisons (Bernoulli trial) against all other bandits simultaneously • Maintain mean and confidence interval for each pair of bandits being compared • …until another bandit is better • With confidence 1 – δ • Repeat process with new candidate • (Remove all empirically worse bandits) • Continue until 1 candidate left [Yue, Broder, Kleinberg, Joachims, COLT 2009]
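
Putting the bullets above together, here is a compact, simplified sketch of Interleaved Filter. The confidence radius and pruning rule are simplified relative to the COLT 2009 paper, and the duel function and quality scores below are hypothetical stand-ins for real interleaving experiments.

```python
import math
import random

def interleaved_filter(bandits, duel, delta=0.05):
    """Simplified Interleaved Filter sketch (assumes all bandits distinguishable).

    duel(a, b) runs one comparison and returns True if a wins.
    Returns the last surviving candidate, i.e. the presumed best bandit.
    """
    candidate = random.choice(bandits)
    remaining = set(bandits) - {candidate}

    def radius(n):
        # 1 - delta confidence radius for an empirical Bernoulli mean.
        return math.sqrt(math.log(1.0 / delta) / (2.0 * n))

    while remaining:
        wins = {b: 0 for b in remaining}
        n, challenger = 0, None
        while challenger is None and remaining:
            n += 1
            for b in remaining:              # duel all remaining simultaneously
                wins[b] += duel(candidate, b)
            for b in list(remaining):
                p_hat = wins[b] / n          # candidate's empirical win rate vs b
                if p_hat - radius(n) > 0.5:
                    remaining.discard(b)     # candidate confidently better
                elif p_hat + radius(n) < 0.5:
                    challenger = b           # b confidently beats the candidate
        if challenger is not None:
            # Remove all empirically worse bandits, then repeat with challenger.
            remaining = {b for b in remaining
                         if b != challenger and wins[b] / n <= 0.5}
            candidate = challenger
    return candidate

# Hypothetical duel: win probability derived from made-up quality scores.
quality = {"A": 0.9, "B": 0.6, "C": 0.55, "D": 0.2}
def duel(a, b):
    return random.random() < 0.5 + 0.5 * (quality[a] - quality[b])

print(interleaved_filter(list(quality), duel))  # usually "A"
```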

  23. Intuition • Simulate comparing candidate with all remaining bandits simultaneously [Yue, Broder, Kleinberg, Joachims, COLT 2009]

  24. Intuition • Simulate comparing candidate with all remaining bandits simultaneously • Example: • P(A > B) = 0.85 • P(A > C) = 0.85 • P(B > C) = 0.51 • B is candidate • # comparisons between B vs C bounded by comparisons between B vs A! [Yue, Broder, Kleinberg, Joachims, COLT 2009]

  25. Regret Analysis • Can model the sequence of candidate bandits as a random walk • Which will be the next candidate bandit? • O(log K) rounds • [Diagram: random walk over candidate bandits, transition probabilities 1/3, 1/3, 1/3] [Yue, Broder, Kleinberg, Joachims, COLT 2009]

  26. Regret Analysis • After each round, we remove a constant fraction of the remaining bandits • O(K) total matches • [Diagram: successive rounds with 4, 2, and 0 matches] [Yue, Broder, Kleinberg, Joachims, COLT 2009]

  27. Regret Analysis • T – time horizon • K – # bandits / retrieval functions • ε – gap between best and 2nd best • Expected regret E[RT] = O((K/ε) log T) • Average regret RT / T → 0 • Information-theoretically optimal • Also need to prove correctness (see paper) [Yue, Broder, Kleinberg, Joachims, COLT 2009]

  28. Summary • Provably efficient online algorithm • (In a regret sense) • Also results for continuous (convex) setting

  29. Summary • Provably efficient online algorithm • (In a regret sense) • Also results for continuous (convex) setting • Requires comparison oracle • Reflects user preferences • Independence / Unbiased • Strong transitivity • Triangle Inequality

  30. Directions to Explore • Relaxing assumptions • E.g., strong transitivity & triangle inequality • Integrating context • Dealing with large K • Assume additional structure on the retrieval functions? • Other cost models • PAC setting (fixed budget, find the best possible) • Dynamic or changing user interests / environment

  31. Improving Comparison Oracle

  32. Improving Comparison Oracle • Dueling Bandits Problem • Interactive learning framework • Provably minimizes regret • Assumes idealized comparison oracle • Can we improve the comparison oracle? • Can we improve how we interpret results?

  33. Determining Statistical Significance • For each query q, interleave A(q) and B(q) and log clicks • t-Test • For each q, score: % of clicks on A(q) • E.g., 3/4 = 0.75 • Sample mean score (e.g., 0.6) • Compute confidence (p-value) • E.g., want p = 0.05 (i.e., 95% confidence) • More data → more confidence
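
A sketch of this significance test using only Python's standard library. The per-query scores are invented, and the normal approximation stands in for the exact t-distribution (the slide itself notes they agree for large samples):

```python
import math
import statistics

# Per-query score: fraction of clicks on A's results (e.g. 3 of 4 clicks -> 0.75).
scores = [0.75, 0.5, 1.0, 0.6, 0.5, 0.8, 0.4, 0.75]  # hypothetical click-log data

mean = statistics.mean(scores)
stderr = statistics.stdev(scores) / math.sqrt(len(scores))
z = (mean - 0.5) / stderr  # test against the "no preference" null of 0.5

# Two-sided p-value via the normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2.
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(f"mean={mean:.3f}  z={z:.2f}  p={p_value:.3f}")
```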

  34. Limitation • Example: query session with 2 clicks • One click at rank 1 (from A) • Later click at rank 4 (from B) • Normally would count this query session as a tie

  35. Limitation • Example: query session with 2 clicks • One click at rank 1 (from A) • Later click at rank 4 (from B) • Normally would count this query session as a tie • But second click is probably more informative… • …so B should get more credit for this query

  36. Linear Model • Feature vector φ(q,c): • Weight of click is wᵀφ(q,c) [Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]

  37. Example • wᵀφ(q,c) differentiates last clicks from other clicks [Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]

  38. Example • wᵀφ(q,c) differentiates last clicks from other clicks • Interleave A vs B • 3 clicks per session • Last click 60% on result from A • Other 2 clicks random [Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]

  39. Example • wᵀφ(q,c) differentiates last clicks from other clicks • Interleave A vs B • 3 clicks per session • Last click 60% on result from A • Other 2 clicks random • Conventional weighting w = (1,1) has significant variance • Counting only the last click, w = (1,0), minimizes variance [Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]
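
The variance claim is easy to simulate. The toy session model below just mirrors the bullets (last click prefers A 60% of the time, the other two clicks are coin flips); the weights and sample size are ours:

```python
import random
import statistics

def session_score(w_last, w_other):
    # Signed, weighted click credit for one session: + favors A, - favors B.
    score = w_last if random.random() < 0.6 else -w_last        # last click
    for _ in range(2):                                          # 2 random clicks
        score += w_other if random.random() < 0.5 else -w_other
    return score

random.seed(0)
for w in [(1, 1), (1, 0)]:  # conventional vs last-click-only weighting
    s = [session_score(*w) for _ in range(10_000)]
    print(w, round(statistics.mean(s), 3), round(statistics.stdev(s), 3))
# Both weightings keep the mean near +0.2, but w = (1, 0) cuts the standard
# deviation by roughly 40%, so the same preference needs fewer queries to detect.
```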

  40. Scoring Query Sessions • Feature representation for query session: φ(q) = Σ_{c ∈ clicks on A} φ(q,c) − Σ_{c ∈ clicks on B} φ(q,c) [Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]

  41. Scoring Query Sessions • Feature representation for query session: φ(q) = Σ_{c ∈ clicks on A} φ(q,c) − Σ_{c ∈ clicks on B} φ(q,c) • Weighted score for query: score(q) = wᵀφ(q) • Positive score favors A, negative favors B [Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]

  42. Supervised Learning • Will optimize for the z-Test: the Inverse z-Test • Approximately equals the t-Test for large samples • z-score = mean / standard deviation (assumes A ≻ B on the training pairs) [Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]
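
Maximizing the z-score z(w) = wᵀμ / √(wᵀΣw), where μ and Σ are the mean and covariance of the per-session feature vectors φ(q), is a Rayleigh-quotient-style problem whose maximizer is w ∝ Σ⁻¹μ (up to scale). That closed form is standard linear algebra; the paper's exact estimator may differ in details, and the feature vectors below are made up:

```python
import numpy as np

# Rows: per-session feature vectors phi(q) (A-clicks minus B-clicks);
# columns might be [last-click indicator, other-click count]. All hypothetical.
X = np.array([[ 1,  1], [ 1, -1], [-1,  1], [ 1,  0],
              [ 1, -2], [ 1,  2], [-1,  0], [ 1,  1]], dtype=float)

mu = X.mean(axis=0)              # mean session feature vector
sigma = np.cov(X, rowvar=False)  # feature covariance

w = np.linalg.solve(sigma, mu)   # w* proportional to inv(Sigma) @ mu
z = (w @ mu) / np.sqrt(w @ sigma @ w) * np.sqrt(len(X))  # achieved z-score
print(w, z)
```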

  43. Experiment Setup • Data collection • Pool of retrieval functions • Hash users into partitions • Run interleaving of different pairs in parallel • Collected on arXiv.org • 2 pools of retrieval functions • Training Pool (6 pairs): A ≻ B known • New Pool (12 pairs)

  44. Training Pool – Cross Validation

  45. Experimental Results • Inverse z-Test works well • Beats the baseline on most of the new interleaving pairs • Directions of the tests are all in agreement • In 6/12 pairs, at p = 0.1, reduces required sample size by 10% • In 4/12 pairs, achieves p = 0.05 where the baseline does not • 400 to 650 queries per interleaving experiment • Weights are hard to interpret (features are correlated) • Largest weight: “1 if single click & rank > 1”

  46. Interactive Learning • Dueling Bandits Problem • System learns “on-the-fly”. • Maximize total user utility over time • Exploration / exploitation tradeoff

  47. Interactive Learning • Dueling Bandits Problem • System learns “on-the-fly”. • Maximize total user utility over time • Exploration / exploitation tradeoff • Interpreting Implicit Feedback • Supervised learning to learn better interpretation • How do we “close the loop”? • Simple yet practical model • Efficient & compatible with existing approaches

  48. Thank You! Slides, papers & software available at www.yisongyue.com

  49. Extra Slides