
Learning in Games



  1. Learning in Games

  2. Fictitious Play

  3. Notation! For n players we have: • n finite strategy spaces S1, S2, …, Sn • n spaces of opponent strategy profiles S-1, S-2, …, S-n (S-i is the set of joint strategies of all players except i) • n payoff functions u1, u2, …, un • for each player i and each s-i in S-i, a set of best responses BRi(s-i)

  4. What is Fictitious Play? Each player i forms an assessment of the opponents' strategies in the form of a weight function κti : S-i → R+, starting from arbitrary initial weights κ0i and updated by κti(s-i) = κt-1i(s-i) + 1 if s-i was played at time t-1, and κti(s-i) = κt-1i(s-i) otherwise.

  5. Prediction The probability player i assigns to player -i playing s-i at time t is the normalized weight: γti(s-i) = κti(s-i) / Σs̃-i in S-i κti(s̃-i)

  6. Fictitious play is … any rule that assigns a best response to the assessment, sti in BRi(γti). Because best responses need not be unique, fictitious play is NOT UNIQUE!

  7. Further Definitions In 2-player games, the marginal empirical distribution of j's play (j = -i) is the frequency with which j has played each strategy: dtj(sj) = (number of times j played sj up to time t) / t
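
The loop on slides 4-7 is compact enough to state directly in code. A minimal sketch in Python (the simulations later in the talk were MATLAB; the function and variable names here are my own):

```python
import numpy as np

def fictitious_play(A, B, T=10_000):
    """T rounds of fictitious play on the bimatrix game (A, B).

    A[i, j], B[i, j]: row / column player's payoffs for actions (i, j).
    Returns the marginal empirical distributions of play (slide 7).
    """
    n, m = A.shape
    w_row = np.ones(m)   # row player's weights over the column player's
    w_col = np.ones(n)   # actions (the kappa of slide 4); initial weights
                         # are arbitrary, uniform here.
    counts_row, counts_col = np.zeros(n), np.zeros(m)
    for _ in range(T):
        # Prediction (slide 5): normalize the weights.
        gamma_row = w_row / w_row.sum()   # row's forecast of the column player
        gamma_col = w_col / w_col.sum()   # column's forecast of the row player
        # Best respond to the forecast (slide 6); argmax breaks ties.
        i = int(np.argmax(A @ gamma_row))
        j = int(np.argmax(gamma_col @ B))
        # Update the weights with the observed actions (slide 4).
        w_row[j] += 1.0
        w_col[i] += 1.0
        counts_row[i] += 1.0
        counts_col[j] += 1.0
    return counts_row / T, counts_col / T

A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # matching pennies (slide 9)
print(fictitious_play(A, -A))   # marginals approach (0.5, 0.5)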

  8. Asymptotic Behavior Propositions: • Strict Nash equilibria are absorbing for the process of fictitious play. • Any pure-strategy steady state of fictitious play must be a Nash equilibrium.

  9. Example “matching pennies” Payoff matrix (the row player wins on a match, the column player wins on a mismatch):
          H       T
    H   1,-1    -1,1
    T   -1,1    1,-1

  10.-19. Example “matching pennies” [Figure residue: these slides step through successive rounds of fictitious play, showing each player's cumulative weights over H and T and the best-response action chosen each round; play flips back and forth between H and T]

  20. Convergence? The strategies themselves cycle and do not converge … but what about the marginal empirical distributions?

  21. MATLAB Simulation - Pennies [Simulation plots: game play, weights over time, payoff]

  22. Proposition Under fictitious play, if the empirical distributions over each player’s choices converge, the strategy profile corresponding to the product of these distributions is a Nash equilibrium.
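
This can be checked numerically: a mixed profile is a Nash equilibrium exactly when no pure-strategy deviation improves either player's payoff. A small self-contained checker in Python (a sketch; names are my own):

```python
import numpy as np

def is_nash(p, q, A, B, eps=1e-3):
    """True if (p, q) is an eps-Nash equilibrium of the bimatrix game (A, B)."""
    row_payoff = p @ A @ q       # row player's expected payoff at (p, q)
    col_payoff = p @ B @ q
    best_row = np.max(A @ q)     # best pure deviation for the row player
    best_col = np.max(p @ B)     # best pure deviation for the column player
    return best_row - row_payoff <= eps and best_col - col_payoff <= eps

A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # matching pennies
half = np.array([0.5, 0.5])
print(is_nash(half, half, A, -A))          # True: (1/2, 1/2) is a Nash pair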

  23.-24. Rock-Paper-Scissors [Payoff matrix over actions A, B, C; simulation plots: game play, weights over time, payoff]

  25. Shapley Game Payoff matrix:
          A     B     C
    A   0,0   1,0   0,1
    B   0,1   0,0   1,0
    C   1,0   0,1   0,0
  [Simulation plots: game play, weights over time, payoff] Does not converge!

  26. Persistent miscoordination Payoff matrix:
          A     B
    A   0,0   1,1
    B   1,1   0,0
  Initial weights: (1, 1.4) for each player. By symmetry both players always best-respond with the same action, so they miscoordinate and earn 0 every round, even though the empirical marginals converge to the mixed Nash (0.5, 0.5). [Simulation plots: game play, weights over time, payoff] Nash: (1,0), (0,1), (0.5,0.5)

  27. Persistent miscoordination Same game, initial weights: (2, 1.4) for each player. [Simulation plots: game play, weights over time, payoff] Nash: (1,0), (0,1), (0.5,0.5)

  28. Persistent miscoordination Same game, initial weights: (2, 2.4) for each player. [Simulation plots: game play, weights over time, payoff] Nash: (1,0), (0,1), (0.5,0.5)

  29. Summary on fictitious play • If play converges, the time average of strategies forms a Nash equilibrium. • The average payoff need not equal that of a Nash equilibrium (e.g. persistent miscoordination). • The time average may not converge at all (e.g. Shapley game).

  30. References • Fudenberg, D., Levine, D. K. (1998). The Theory of Learning in Games. MIT Press.

  31. Nash Convergence of Gradient Dynamics in General-Sum Games

  32. Notation • 2 players • Strategies α in [0,1] and β in [0,1], the probabilities with which the row and column player play their first action • Payoff matrices R = (rij) and C = (cij), i, j in {1, 2}

  33. Objective Functions • Payoff functions: • Vr(α,β) = r11·αβ + r22·(1-α)(1-β) + r12·α(1-β) + r21·(1-α)β • Vc(α,β) = c11·αβ + c22·(1-α)(1-β) + c12·α(1-β) + c21·(1-α)β
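
These are just the players' expected payoffs multiplied out; a quick Python check (my own names) that the expansion of Vr agrees with the bilinear form (α, 1-α)·R·(β, 1-β)ᵀ:

```python
import numpy as np

def V_r(alpha, beta, R):
    """Expanded row-player objective from slide 33."""
    (r11, r12), (r21, r22) = R
    return (r11 * alpha * beta + r22 * (1 - alpha) * (1 - beta)
            + r12 * alpha * (1 - beta) + r21 * (1 - alpha) * beta)

R = np.array([[1.0, -1.0], [-1.0, 1.0]])
a, b = 0.3, 0.8
assert np.isclose(V_r(a, b, R),
                  np.array([a, 1 - a]) @ R @ np.array([b, 1 - b]))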

  34. Hill-climbing Idea

  35. Gradient Ascent for Iterated Games The partial derivatives of the objectives are ∂Vr/∂α = βu + (r12 - r22) and ∂Vc/∂β = αu′ + (c21 - c22), with • u = (r11 + r22) - (r21 + r12) • u′ = (c11 + c22) - (c21 + c12)

  36. Update Rule With step size η, the players update αk+1 = αk + η·∂Vr/∂α(αk, βk) and βk+1 = βk + η·∂Vc/∂β(αk, βk); the initial strategies (α1, β1) can be arbitrary.

  37. Problem • The gradient can lead the players to an infeasible point outside the unit square [0,1]². [Figure: trajectory leaving the unit square]

  38. Solution • Redefine the gradient as the projection of the true gradient onto the boundary of the unit square; call the resulting flow the constrained dynamics. [Figure: projected trajectory along the boundary]

  39. Infinitesimal Gradient Ascent (IGA) Let the step size η → 0: α and β become functions of time! The unconstrained dynamics are linear, [dα/dt, dβ/dt]ᵀ = U·[α, β]ᵀ + [r12 - r22, c21 - c22]ᵀ, with U = [[0, u], [u′, 0]].
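
A Python sketch of the finite-step version of these dynamics, using the closed-form gradients from slide 35 and clipping to [0,1] as a crude stand-in for the projection of slide 38 (the slides' simulations were MATLAB; all names here are my own):

```python
import numpy as np

def iga(R, C, alpha=0.2, beta=0.9, eta=1e-4, T=200_000):
    """Finite-step gradient ascent on V_r, V_c for a 2x2 game (R, C)."""
    (r11, r12), (r21, r22) = R
    (c11, c12), (c21, c22) = C
    u  = (r11 + r22) - (r21 + r12)   # slide 35
    u_ = (c11 + c22) - (c21 + c12)
    avg_r = 0.0
    for t in range(1, T + 1):
        # Closed-form partial derivatives of the objectives on slide 33.
        d_alpha = beta * u + (r12 - r22)
        d_beta = alpha * u_ + (c21 - c22)
        # Simultaneous update (slide 36), clipped to the unit square
        # as a simple finite-step substitute for the projection (slide 38).
        alpha = min(1.0, max(0.0, alpha + eta * d_alpha))
        beta = min(1.0, max(0.0, beta + eta * d_beta))
        # Running average of the row player's payoff V_r(alpha, beta).
        v_r = (r11 * alpha * beta + r12 * alpha * (1 - beta)
               + r21 * (1 - alpha) * beta + r22 * (1 - alpha) * (1 - beta))
        avg_r += (v_r - avg_r) / t
    return alpha, beta, avg_r

R = np.array([[1.0, -1.0], [-1.0, 1.0]])   # matching pennies again
print(iga(R, -R))   # strategies cycle; the average payoff settles near 0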

  40. Case 1: U is invertible The two possible qualitative forms of the unconstrained strategy pair: the eigenvalues of U are ±√(uu′), so trajectories are either ellipses around a center (uu′ < 0, imaginary eigenvalues) or hyperbolas around a saddle (uu′ > 0, real eigenvalues). [Phase portraits]

  41. Case 2: U is not invertible Some example qualitative forms of the unconstrained strategy pair: [phase portraits]

  42. Convergence • If both players follow the IGA rule, then their average payoffs converge to the expected payoffs of some Nash equilibrium. • If the strategy pair trajectory converges at all, then it converges to a Nash pair.

  43. Proposition Both previous propositions also hold with a finite, decreasing step size.

  44. References • Singh, S., Kearns, M., Mansour, Y. (2000). Nash Convergence of Gradient Dynamics in General-Sum Games. Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, pages 541-548.

  45. Dynamic computation of Nash equilibria in two-player general-sum games.

  46. Notation • 2 players • Mixed strategies p = (p1, …, pn) and q = (q1, …, qn) • Payoff matrices R = (rij) and C = (cij)

  47. Objective Functions • Payoff functions: • Row player: pᵀRq = Σi Σj pi rij qj • Col player: pᵀCq = Σi Σj pi cij qj

  48. Observation! The payoff pᵀRq is linear in each pi and qj. Let xi denote the pure strategy that plays action i with probability 1. This means: if xiᵀRq > pᵀRq, then increasing pi increases the payoff.

  49. Hill climbing (again) Multiplicative Update Rules

  50. Hill climbing (again) System of Differential Equations (i=1..n)
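
The transcript drops the update equations from these two slides, so the code below rests on an assumption: replicator-style dynamics are one standard multiplicative rule consistent with the observation on slide 48 (scale pi up when the pure strategy xi outperforms the current mix against q). A Python sketch, with all names my own:

```python
import numpy as np

def replicator_step(p, q, R, C, dt=1e-2):
    """One Euler step of the assumed dynamics dp_i/dt = p_i((Rq)_i - p'Rq)."""
    # (R @ q)[i] is the payoff of the pure strategy x_i against q (slide 48);
    # p_i grows exactly when x_i beats the current mixed payoff p'Rq.
    p_new = p + dt * p * (R @ q - p @ R @ q)
    q_new = q + dt * q * (C.T @ p - p @ C @ q)
    return p_new / p_new.sum(), q_new / q_new.sum()  # keep on the simplex

# Example: the miscoordination game of slide 26, started asymmetrically.
R = np.array([[0.0, 1.0], [1.0, 0.0]])
C = R.copy()
p, q = np.array([0.6, 0.4]), np.array([0.4, 0.6])
for _ in range(5000):
    p, q = replicator_step(p, q, R, C)
print(p, q)   # drifts toward the pure Nash pair: p -> (1, 0), q -> (0, 1)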
