Fast Convergence of Regularized Learning in Games

Presentation Transcript


  1. Fast Convergence of Regularized Learning in Games Vasilis Syrgkanis Microsoft Research NYC Alekh Agarwal Microsoft Research NYC Haipeng Luo Princeton University Robert Schapire Microsoft Research NYC

  2. Strategic Interactions in Computer Systems • Game-theoretic scenarios arise in modern computer systems, e.g. internet routing and advertising auctions

  4. How Do Players Behave? • Classical game theory: players play according to Nash equilibrium • Caveats: how do players converge to equilibrium? Nash equilibrium is computationally hard • Most scenarios: repeated strategic interactions • Simple adaptive game playing is more natural • Learn to play well over time from past experience • e.g. dynamic bid optimization tools in online ad auctions

  8. No-Regret Learning in Games • Each player uses a no-regret learning algorithm with good regret against adaptive adversaries • Regret of each player against any fixed strategy converges to zero • Many simple algorithms achieve no regret • MWU, Regret Matching, Follow the Regularized/Perturbed Leader, Mirror Descent [Freund and Schapire 1995, Foster and Vohra 1997, Hart and Mas-Colell 2000, Cesa-Bianchi and Lugosi 2006, …]
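As a concrete reference point for the algorithms named on this slide, here is a minimal sketch of the Hedge (multiplicative weights) update in Python; the class name and interface are illustrative choices, not notation from the talk.

```python
import numpy as np

class HedgeLearner:
    """Minimal Hedge (multiplicative weights) learner over a finite action set."""

    def __init__(self, num_actions, step_size):
        self.step_size = step_size
        self.cumulative_utility = np.zeros(num_actions)

    def distribution(self):
        # Play each action with probability proportional to
        # exp(step_size * cumulative utility); subtract the max for stability.
        logits = self.step_size * self.cumulative_utility
        weights = np.exp(logits - logits.max())
        return weights / weights.sum()

    def update(self, utility_vector):
        # utility_vector[j] = expected utility had action j been played this round.
        self.cumulative_utility += utility_vector

# Tiny usage example with random utility vectors in [0, 1].
rng = np.random.default_rng(0)
learner = HedgeLearner(num_actions=3, step_size=0.1)
for _ in range(5):
    p = learner.distribution()
    learner.update(rng.random(3))
print(p)
```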

  11. Known Convergence Results • The empirical distribution of play converges to a generalization of Nash equilibrium: coarse correlated equilibrium (CCE) [Young 2004] • The sum of player utilities (welfare) converges to approximate optimality, under structural conditions on the game [Roughgarden 2010] • The convergence rate is inherited from the adversarial analysis: typically O(1/√T), which is impossible to improve against adversaries
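To make the first bullet concrete, the sketch below tallies the joint empirical distribution of play from a recorded history (the object that converges to a coarse correlated equilibrium) together with the average welfare; the data layout and payoff values are invented for illustration.

```python
from collections import Counter

# A recorded history of play: one joint strategy profile per time step.
history = [("top", "left"), ("top", "right"), ("bottom", "left"), ("top", "left")]

# Joint empirical distribution of play: the object that converges to a
# coarse correlated equilibrium as the number of steps grows.
counts = Counter(history)
empirical_distribution = {profile: c / len(history) for profile, c in counts.items()}
print(empirical_distribution)

# Average welfare (sum of player utilities) along the same history.
utilities = {
    ("top", "left"): (1.0, 1.0),
    ("top", "right"): (0.0, 0.5),
    ("bottom", "left"): (0.5, 0.0),
    ("bottom", "right"): (0.2, 0.2),
}
average_welfare = sum(sum(utilities[profile]) for profile in history) / len(history)
print(average_welfare)
```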

  15. Main Question When all players invoke no-regret learning, is convergence faster than the adversarial rate?

  16. High-Level Main Results Yes! We prove that if each player’s algorithm: 1. Makes stable predictions 2. Has regret bounded by the stability of the environment, then convergence is faster than the adversarial O(1/√T) rate. Both properties can be achieved by regularization and recency bias.

  19. Model and Technical Results

  20. Repeated Game Model • n players play a normal form game for T time steps • Each player i has a strategy space S_i (set of available actions) and a utility function u_i (reward) • At each step t, each player i: • Picks a distribution p_i^t over strategies and draws a strategy s_i^t from it • Receives the expected utility of the drawn strategy against the other players’ distributions • Observes the expected utility vector whose coordinate for each strategy is the expected utility had that strategy been played • Examples: bids in an auction (utility from winning an item at some price); paths in a network (utility determined by the latency of the chosen path)
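A minimal sketch of this protocol for two players with explicit payoff matrices follows; the Hedge-style update inside the loop is just a stand-in for whatever no-regret algorithm each player runs, and all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, eta = 100, 0.1

# Payoff matrices of a two-player normal form game
# (rows: player 1's strategy, columns: player 2's strategy).
U1 = np.array([[1.0, 0.0], [0.0, 1.0]])  # player 1's utilities
U2 = np.array([[0.0, 1.0], [1.0, 0.0]])  # player 2's utilities

cum1, cum2 = np.zeros(2), np.zeros(2)    # cumulative utility vectors

for t in range(T):
    # Each player picks a distribution over strategies (Hedge-style here)...
    p1 = np.exp(eta * (cum1 - cum1.max())); p1 /= p1.sum()
    p2 = np.exp(eta * (cum2 - cum2.max())); p2 /= p2.sum()

    # ...and draws a strategy from it (the realized plays of this round).
    s1, s2 = rng.choice(2, p=p1), rng.choice(2, p=p2)

    # Each player observes its expected utility vector: coordinate j is the
    # expected utility had strategy j been played against the opponent's mix.
    u1 = U1 @ p2
    u2 = U2.T @ p1

    cum1 += u1
    cum2 += u2
```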

  28. Regret of Learning Algorithm • Regret: gain from switching to the best fixed strategy in hindsight, i.e. the expected utility if the best fixed strategy had been played at every step minus the expected utility of the algorithm • Many algorithms achieve regret O(√T)
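The sketch below computes this quantity for one player from a recorded run; the array shapes and the random inputs are assumptions made purely for illustration.

```python
import numpy as np

# utility_vectors[t, j]: expected utility at step t had strategy j been played.
# mixed_strategies[t, j]: probability the algorithm put on strategy j at step t.
# (Random inputs here stand in for a real recorded run.)
rng = np.random.default_rng(0)
T, k = 50, 4
utility_vectors = rng.random((T, k))
mixed_strategies = rng.dirichlet(np.ones(k), size=T)

# Expected utility collected by the algorithm over the T steps.
algorithm_utility = np.sum(mixed_strategies * utility_vectors)

# Expected utility of the best fixed strategy in hindsight.
best_fixed_utility = utility_vectors.sum(axis=0).max()

regret = best_fixed_utility - algorithm_utility
print(regret)
```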

  31. Main Results We give general sufficient conditions on learning algorithms (and concrete example families of algorithms) such that: • Thm 1. Each player’s regret is O(T^{1/4}); hence convergence to CCE at rate O(1/T^{3/4}) • Thm 2. The sum of player regrets is O(1) • Corollary. Convergence to approximately optimal welfare at rate O(1/T) if the game is “smooth” • Thm 3. A generic way to modify an algorithm so that it achieves small regret against “nice” algorithms while retaining O(√T) regret against any adversarial sequence • Smoothness: defined by [Roughgarden 2010]; the main tool for proving welfare bounds (price of anarchy); applicable to e.g. congestion games and ad auctions

  37. Related Work • [Daskalakis, Deckelbaum, Kim 2011]: two-player zero-sum games; marginal empirical distributions converge to Nash at a near-optimal rate; unnatural coordinated dynamics • [Rakhlin, Sridharan 2013]: two-player zero-sum games; Optimistic Mirror Descent and Optimistic Hedge; bound on the sum of regrets ⇒ marginal empirical distributions converge to Nash • This work: general multi-player games vs. two-player zero-sum; convergence to CCE and welfare vs. Nash; sufficient conditions vs. specific algorithms; new example algorithms (e.g. general recency bias, Optimistic FTRL)

  40. Why Expect Faster Rates? • If your opponents do not change their mixed strategies much • Then your expected utility from each strategy is approximately the same between iterations • So the last iteration’s utilities are a good proxy for the next iteration’s utilities

  45. Simplified Sufficient Conditions for Fast Convergence • 1. Stability of mixed strategies: consecutive mixed strategies p_i^t and p_i^{t+1} stay close • 2. Regret bounded by the stability of the utility sequence: regret is at most a constant plus a term proportional to how much the observed utility vectors move between rounds, minus a term proportional to how much the player’s own mixed strategies move • Plus conditions on the constants A, B, C, D appearing in these bounds • Then, for an appropriate step size, each player’s regret is O(T^{1/4})
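To illustrate the quantities these conditions refer to, the sketch below measures, for a recorded run, how much a player's mixed strategy and its observed utility vectors move between consecutive rounds; the choice of norms (ℓ1 for strategies, ℓ∞ for utilities) and the random inputs are assumptions made for illustration.

```python
import numpy as np

# A recorded run for one player (random stand-ins for illustration).
rng = np.random.default_rng(0)
T, k = 50, 4
mixed_strategies = rng.dirichlet(np.ones(k), size=T)  # p^1, ..., p^T
utility_vectors = rng.random((T, k))                  # u^1, ..., u^T

# Condition 1 concerns how much the mixed strategy moves between rounds
# (measured here in l1 distance).
strategy_movement = np.abs(np.diff(mixed_strategies, axis=0)).sum(axis=1)

# Condition 2 is stated in terms of how much the observed utility vectors
# move between rounds (measured here in l-infinity distance).
utility_movement = np.abs(np.diff(utility_vectors, axis=0)).max(axis=1)

print("max strategy movement:", strategy_movement.max())
print("sum of squared utility movement:", np.sum(utility_movement ** 2))
```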

  48. Example Algorithm • Hedge [Littlestone-Warmuth’94, Freund-Schapire’97]: play each action with probability proportional to exp(η · cumulative past performance of that action) • Optimistic Hedge [Rakhlin-Sridharan’13]: identical, except that the last iteration’s utility vector is counted once more, as a predictor of the next iteration • Lemma. Optimistic Hedge satisfies both sufficient conditions • Intuition: it uses the last iteration as a “predictor” for the next iteration; it is equivalent to Follow the Regularized Leader, and regularization implies stability
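A minimal sketch contrasting the two updates follows; the step-size schedule and the equivalence to Follow the Regularized Leader discussed on the slide are not reproduced here, only the one-line difference between the updates.

```python
import numpy as np

def hedge_distribution(cumulative_utility, step_size):
    """Hedge: probabilities proportional to exp(step_size * past cumulative utility)."""
    logits = step_size * cumulative_utility
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()

def optimistic_hedge_distribution(cumulative_utility, last_utility, step_size):
    """Optimistic Hedge: the last utility vector is counted once more,
    serving as a predictor of the next round's utilities."""
    logits = step_size * (cumulative_utility + last_utility)
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()

# Tiny usage example on made-up numbers.
cumulative = np.array([3.0, 2.5, 1.0])
last = np.array([1.0, 0.0, 0.0])
print(hedge_distribution(cumulative, step_size=0.5))
print(optimistic_hedge_distribution(cumulative, last, step_size=0.5))
```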
