
Beat the Mean Bandit






Presentation Transcript


  1. Beat the Mean Bandit ICML 2011 Yisong Yue Carnegie Mellon University Joint work with Thorsten Joachims (Cornell University)

  2. Optimizing Information Retrieval Systems • Increasingly reliant on user feedback • E.g., clicks on search results • Online learning is a popular modeling tool • Especially partial-information (bandit) settings • Our focus: learning from relative preferences • Motivated by recent work on interleaved retrieval evaluation (example following)

  3. Team Draft Interleaving (Comparison Oracle for Search) [Radlinski et al. 2008]
  Ranking A:
  1. Napa Valley – The authority for lodging... (www.napavalley.com)
  2. Napa Valley Wineries – Plan your wine... (www.napavalley.com/wineries)
  3. Napa Valley College (www.napavalley.edu/homex.asp)
  4. Been There | Tips | Napa Valley (www.ivebeenthere.co.uk/tips/16681)
  5. Napa Valley Wineries and Wine (www.napavintners.com)
  6. Napa Country, California – Wikipedia (en.wikipedia.org/wiki/Napa_Valley)
  Ranking B:
  1. Napa Country, California – Wikipedia (en.wikipedia.org/wiki/Napa_Valley)
  2. Napa Valley – The authority for lodging... (www.napavalley.com)
  3. Napa: The Story of an American Eden... (books.google.co.uk/books?isbn=...)
  4. Napa Valley Hotels – Bed and Breakfast... (www.napalinks.com)
  5. NapaValley.org (www.napavalley.org)
  6. The Napa Valley Marathon (www.napavalleymarathon.org)
  Presented Ranking (the two teams take turns contributing their next unseen result):
  1. Napa Valley – The authority for lodging... (www.napavalley.com) [A]
  2. Napa Country, California – Wikipedia (en.wikipedia.org/wiki/Napa_Valley) [B]
  3. Napa: The Story of an American Eden... (books.google.co.uk/books?isbn=...) [B]
  4. Napa Valley Wineries – Plan your wine... (www.napavalley.com/wineries) [A]
  5. Napa Valley Hotels – Bed and Breakfast... (www.napalinks.com) [B]
  6. Napa Valley College (www.napavalley.edu/homex.asp) [A]
  7. NapaValley.org (www.napavalley.org) [B]

  4. Team Draft Interleaving (Comparison Oracle for Search) [Radlinski et al. 2008]
  Same rankings and interleaved list as the previous slide, now annotated with the user's clicks: the clicked results were contributed by Ranking B, so B wins this comparison!
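The draft procedure behind these two slides can be sketched in a few lines of Python. This is a hedged reconstruction from Radlinski et al.'s description, not code from the talk; the function name `team_draft_interleave` and the coin-flip tiebreak are illustrative assumptions.

```python
import random

def team_draft_interleave(rank_a, rank_b, rng=None):
    """Merge two rankings, remembering which 'team' contributed each result."""
    rng = rng or random.Random()
    a, b = list(rank_a), list(rank_b)
    shown, teams = [], []
    count_a = count_b = 0
    while a or b:
        # The team with fewer picks so far drafts next; ties broken by coin flip.
        pick_a = (count_a < count_b) or (count_a == count_b and rng.random() < 0.5)
        if pick_a and not a:
            pick_a = False
        elif not pick_a and not b:
            pick_a = True
        doc = (a if pick_a else b)[0]
        shown.append(doc)
        teams.append('A' if pick_a else 'B')
        count_a += pick_a
        count_b += not pick_a
        # Remove the drafted result from both lists so nothing repeats.
        a = [d for d in a if d != doc]
        b = [d for d in b if d != doc]
    return shown, teams
```

Clicks are then credited to the team that contributed each clicked result; the team with more credited clicks wins the duel, as in the B-wins example above.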

  5. Interleave A vs B

  6. Interleave A vs C

  7. Interleave B vs C

  8. Interleave A vs B

  9. Outline • Learning Formulation • Dueling Bandits Problem [Yue et al. 2009] • Modeling transitivity violations • E.g., does (A >> B) AND (B >> C) imply (A >> C)? • Not done in previous work

  10. Outline • Learning Formulation • Dueling Bandits Problem [Yue et al. 2009] • Modeling transitivity violations • E.g., does (A >> B) AND (B >> C) imply (A >> C)? • Not done in previous work • Algorithm: Beat-the-Mean • Empirical Validation

  11. Dueling Bandits Problem • Given K bandits b1, …, bK • Each iteration: compare (duel) two bandits • E.g., interleaving two retrieval functions [Yue et al. 2009]

  12. Dueling Bandits Problem • Given K bandits b1, …, bK • Each iteration: compare (duel) two bandits • E.g., interleaving two retrieval functions • Cost function (regret) accumulated over T iterations: R_T = Σ_{t=1..T} [ ε(b*, b_t) + ε(b*, b_t′) ], where ε(b, b′) = P(b > b′) − ½ • (b_t, b_t′) are the two bandits chosen at iteration t • b* is the overall best one • (regret measures how strongly users prefer the best bandit over the chosen ones) [Yue et al. 2009]
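The per-duel regret worked through on slides 13–16 follows directly from this definition. A minimal sketch, assuming a dictionary `eps` of ε values; only the handful quoted in the talk's arXiv.org examples are filled in, since the full matrix is not in the transcript:

```python
# eps[(x, y)] stores eps_xy = P(x beats y) - 1/2.
# Only values quoted on the slides are included; the rest are omitted.
eps = {
    ('A', 'A'): 0.00,
    ('A', 'B'): 0.05, ('A', 'C'): 0.05,
    ('A', 'E'): 0.11, ('A', 'F'): 0.11,
}

def duel_regret(best, x, y, eps):
    """Regret of one duel (x, y) when `best` is the overall best bandit."""
    return eps[(best, x)] + eps[(best, y)]
```

`duel_regret('A', 'E', 'F', eps)` recovers the 0.22 of slide 14, `duel_regret('A', 'B', 'C', eps)` the 0.10 of slide 15, and comparing A with itself incurs zero regret (slide 16).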

  13. Example Pairwise Preferences • Values are Pr(row > col) – 0.5 • Derived from interleaving experiments on http://arXiv.org

  14. Example Pairwise Preferences • Compare E & F: • P(A > E) = 0.61 • P(A > F) = 0.61 • Incurred Regret = 0.22 • Values are Pr(row > col) – 0.5 • Derived from interleaving experiments on http://arXiv.org

  15. Example Pairwise Preferences • Compare B & C: • P(A > B) = 0.55 • P(A > C) = 0.55 • Incurred Regret = 0.10 • Values are Pr(row > col) – 0.5 • Derived from interleaving experiments on http://arXiv.org

  16. Example Pairwise Preferences Interleaving shows ranking produced by A. • Compare A & A: • P(A > A) = 0.50 • P(A > A) = 0.50 • Incurred Regret = 0.00 • Values are Pr(row > col) – 0.5 • Derived from interleaving experiments on http://arXiv.org

  17. Example Pairwise Preferences • Violation in internal consistency! • For strong stochastic transitivity: • A > D should be at least 0.06 • Values are Pr(row > col) – 0.5 • Derived from interleaving experiments on http://arXiv.org

  18. Example Pairwise Preferences • Violation in internal consistency! • For strong stochastic transitivity: • C > E should be at least 0.04 • Values are Pr(row > col) – 0.5 • Derived from interleaving experiments on http://arXiv.org

  19. Example Pairwise Preferences • Violation in internal consistency! • For strong stochastic transitivity: • D > F should be at least 0.04 • Values are Pr(row > col) – 0.5 • Derived from interleaving experiments on http://arXiv.org

  20. Modeling Assumptions • P(bi > bj) = ½ + εij • Let b1 be the best overall bandit • Relaxed Stochastic Transitivity • For three bandits b1 > bj > bk: γ·ε1k ≥ max(ε1j, εjk), with γ ≥ 1 (γ = 1 for strong transitivity **) • Relaxed internal consistency property • Stochastic Triangle Inequality • For three bandits b1 > bj > bk: ε1k ≤ ε1j + εjk • Diminishing returns property (** γ = 1 required in previous work, and required to hold for all bandit triplets)
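One way to read the relaxed transitivity condition: γ is the smallest multiplier that repairs every violated triple. A sketch of that computation, on a hypothetical 3-bandit matrix invented for illustration (chosen so the answer comes out to γ = 1.5, the value the next slide reports for the real arXiv.org data):

```python
def min_gamma(eps, order):
    """Smallest gamma >= 1 such that gamma * eps[b1][bk] >= max(eps[b1][bj], eps[bj][bk])
    holds for every pair bj > bk, where b1 = order[0] is the best bandit."""
    b1, gamma = order[0], 1.0
    for j in range(1, len(order)):
        for k in range(j + 1, len(order)):
            bj, bk = order[j], order[k]
            need = max(eps[b1][bj], eps[bj][bk])
            if eps[b1][bk] > 0:
                gamma = max(gamma, need / eps[b1][bk])
    return gamma

# Hypothetical preferences: A > B > C, but eps_AC is "too small" for strong transitivity.
eps = {'A': {'B': 0.05, 'C': 0.06}, 'B': {'C': 0.06}}
eps['A']['C'] = 0.04  # the violating entry: gamma * 0.04 must cover max(0.05, 0.06)
```

Here `min_gamma(eps, ['A', 'B', 'C'])` returns 1.5, since 0.06 / 0.04 = 1.5 is the worst repair factor needed.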

  21. Example Pairwise Preferences γ = 1.5 • Values are Pr(row > col) – 0.5 • Derived from interleaving experiments on http://arXiv.org

  22. Beat-the-Mean

  23. Beat-the-Mean Comparison Results

  24. Beat-the-Mean Mean Score & Confidence Interval
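The "mean score & confidence interval" bookkeeping that slides 24–33 animate can be sketched as follows. The Hoeffding-style radius is a standard choice assumed here; the talk does not specify its constants.

```python
import math

def mean_and_bounds(wins, n, delta=0.05):
    """Empirical win rate against the mean bandit, plus a confidence interval.
    The radius c shrinks like sqrt(log(1/delta) / n) as comparisons accumulate."""
    if n == 0:
        return 0.5, 0.0, 1.0  # no data yet: maximally uncertain
    p = wins / n
    c = math.sqrt(math.log(1.0 / delta) / (2 * n))
    return p, max(0.0, p - c), min(1.0, p + c)
```

A bandit is dominated, and removed, once its upper bound drops below some other active bandit's lower bound, which is exactly the condition triggered on slides 34, 38, 40, and 41.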

  25. Beat-the-Mean A’s performance vs rest

  26. Beat-the-Mean A’s mean performance

  27. Beat-the-Mean

  28. Beat-the-Mean

  29. Beat-the-Mean

  30. Beat-the-Mean

  31. Beat-the-Mean

  32. Beat-the-Mean

  33. Beat-the-Mean

  34. Beat-the-Mean B dominates E! (B’s lower bound greater than E’s upper bound)

  35. Beat-the-Mean

  36. Beat-the-Mean

  37. Beat-the-Mean

  38. Beat-the-Mean B dominates F! (B’s lower bound greater than F’s upper bound)

  39. Beat-the-Mean

  40. Beat-the-Mean B dominates D! (B’s lower bound greater than D’s upper bound)

  41. Beat-the-Mean A dominates C! (A’s lower bound greater than C’s upper bound)

  42. Beat-the-Mean Eventually… A is last bandit remaining. A is declared best bandit!
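Putting slides 22–42 together, the elimination loop can be sketched as below. This is a simplified reconstruction, not the paper's algorithm: `duel(x, y)` is an assumed comparison oracle returning True when x wins, the confidence constant is a generic Hoeffding-style choice, and the full algorithm removes only the comparisons involving an eliminated bandit rather than resetting all counts.

```python
import math, random

def beat_the_mean(bandits, duel, rounds=2000, delta=0.05, rng=None):
    """Each active bandit duels an opponent drawn uniformly from the active
    set (i.e., it plays the 'mean bandit'). A bandit whose upper confidence
    bound falls below another's lower bound is dominated and removed."""
    rng = rng or random.Random()
    active = list(bandits)
    wins = {b: 0 for b in active}
    n = {b: 0 for b in active}
    for _ in range(rounds):
        if len(active) == 1:
            break
        for b in active:
            opp = rng.choice(active)
            wins[b] += bool(duel(b, opp))
            n[b] += 1
        # Shared confidence radius, driven by the least-sampled active bandit.
        c = math.sqrt(math.log(len(active) * rounds / delta)
                      / (2 * min(n[b] for b in active)))
        score = {b: wins[b] / n[b] for b in active}
        best = max(score.values())
        dominated = [b for b in active if score[b] + c < best - c]
        if dominated:
            active = [b for b in active if b not in dominated]
            # Simplification: restart counts; the paper instead discards only
            # the comparisons in which a removed bandit took part.
            for b in active:
                wins[b], n[b] = 0, 0
    return max(active, key=lambda b: wins[b] / max(n[b], 1))
```

With a rigged oracle in which one bandit wins every duel, the loop eliminates the others once the intervals separate and declares that bandit the winner, mirroring the slide sequence above.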

  43. Regret Guarantee • Playing against mean bandit calibrates preference scores • Estimates of (active) bandits directly comparable • One estimate per active bandit = linear number of estimates

  44. Regret Guarantee • Playing against mean bandit calibrates preference scores • Estimates of (active) bandits directly comparable • One estimate per active bandit = linear number of estimates • We can bound comparisons needed to remove worst bandit • Varies smoothly with transitivity parameter γ • High probability bound • We can bound the regret incurred by each comparison • Varies smoothly with transitivity parameter γ

  45. Regret Guarantee • Playing against mean bandit calibrates preference scores • Estimates of (active) bandits directly comparable • One estimate per active bandit = linear number of estimates • We can bound comparisons needed to remove worst bandit • Varies smoothly with transitivity parameter γ • High probability bound • We can bound the regret incurred by each comparison • Varies smoothly with transitivity parameter γ • Thus, we can bound the total regret with high probability (linear in K, logarithmic in T) • γ is typically close to 1 We also have a similar PAC guarantee.

  46. Regret Guarantee • Playing against mean bandit calibrates preference scores • Estimates of (active) bandits directly comparable • One estimate per active bandit = linear number of estimates • We can bound comparisons needed to remove worst bandit • Varies smoothly with transitivity parameter γ • High probability bound • We can bound the regret incurred by each comparison • Varies smoothly with transitivity parameter γ • Thus, we can bound the total regret with high probability (linear in K, logarithmic in T) • γ is typically close to 1 Not possible with previous approaches! We also have a similar PAC guarantee.

  47. Simulation experiment where γ = 1.3 • Light = Beat-the-Mean • Dark = Interleaved Filter [Yue et al. 2009] • Beat-the-Mean maintains linear regret guarantee • Interleaved Filter suffers quadratic regret in the worst case

  48. Simulation experiment where γ = 1 (original DB setting) • Light = Beat-the-Mean • Dark = Interleaved Filter [Yue et al. 2009] • Beat-the-Mean has high probability bound • Beat-the-Mean exhibits significantly lower variance

  49. Conclusions • Online learning approach using pairwise feedback • Well-suited for optimizing information retrieval systems from user feedback • Models violations in preference transitivity • Algorithm: Beat-the-Mean • Regret linear in #bandits and logarithmic in #iterations • Degrades smoothly with transitivity violation • Stronger guarantees than previous work • Empirically supported
