
Collaboration in Repeated Games

Presentation Transcript


  1. Collaboration in Repeated Games. Michael L. Littman (mlittman@cs.rutgers.edu), Rutgers University

  2. Motivation. Create agents that achieve their goals, perhaps working together. • Separate payoff functions • General sum. Compute Nash equilibria (stable strategies). • Algorithm assigns strategies • Not learning (at present)

  3. Grid Game 3 (Hu & Wellman 01). [Figure: grid with two players, A and B.] Actions: U, D, R, L, X. No move on collision. Semiwalls (passable 50% of the time). Rewards: -1 per step, -10 for collision, +100 for reaching the goal. Both players can reach the goal.

  4. Repeated Markov Game. S: finite set of states. A1, A2: finite sets of action choices. R1(s, a1, a2): payoff to the first player. R2(s, a1, a2): payoff to the second player. P(s' | s, a1, a2): transition function. G: goal (terminal) states (a subset of S). Objective: maximize the average (over repetitions) total reward.
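
A minimal sketch of this tuple as a data structure; the names and container choices here are my own, not from the talk.

```python
# Illustrative container for the repeated Markov game (S, A1, A2, R1, R2, P, G).
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

State = int
Action = int

@dataclass
class RepeatedMarkovGame:
    states: FrozenSet[State]                                   # S
    actions1: FrozenSet[Action]                                # A1
    actions2: FrozenSet[Action]                                # A2
    R1: Callable[[State, Action, Action], float]               # payoff to player 1
    R2: Callable[[State, Action, Action], float]               # payoff to player 2
    P: Callable[[State, Action, Action], Dict[State, float]]   # distribution over s'
    goals: FrozenSet[State]                                    # terminal states G
```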

  5. Nash Equilibrium. Pair of strategies such that neither player has an incentive to deviate unilaterally. • Can be a function of history • Can be randomized. Always exists. Claim: assumes the game is repeated and players choose best responses.

  6. Nash in Grid Game. [Figure: grid with players A and B.] Average totals: • (97, 48) • (48, 97) • (-, -) (not Nash) • (64, 64) (not Nash) • (75, 75)?

  7. Collaborative Solution. [Figure: grid showing a coordinated path for players A and B.] Average total: • (96, 96) (not Nash). A won't wait; B changes incentives.

  8. Repeated Matrix Game. One-state Markov game. A1 = A2 = {cooperate, defect}: the Prisoner's Dilemma (PD). One (single-step) Nash.
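
To make the running example concrete, here is the PD as a pair of payoff matrices; the specific numbers are the ones quoted on slide 12.

```python
import numpy as np

# Prisoner's Dilemma as a one-state matrix game.
C, D = 0, 1
R1 = np.array([[3.0, 0.0],    # row = own action (C/D), column = opponent's action
               [5.0, 1.0]])   # e.g. R1[D, C] = 5 is the temptation payoff
R2 = R1.T                     # the PD is symmetric: R2(a, a') = R1(a', a)

# The single (single-step) Nash is mutual defection: D is a best response to D.
assert R1[D, D] >= R1[C, D] and R2[D, D] >= R2[D, C]
```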

  9. Nash-Value Problem. Computational problem: • Given a one-state Markov game (two payoff tables) • Find a Nash equilibrium (one always exists) • Return each player's value. In NP ∩ co-NP; exact complexity open. A useful subproblem for Markov games.

  10. Two Special Cases. Saddle-point equilibrium: • Deviation helps the other player • Value is the unique solution of the zero-sum game. Coordination equilibrium: • Both players get the maximum reward possible • Value is the unique max value.
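
As a small illustration of the second case, a coordination equilibrium exists exactly when some joint action gives both players their maximum possible payoff at once; this check is my own sketch, not from the talk.

```python
import numpy as np

def has_coordination_equilibrium(R1, R2):
    # True iff some cell is simultaneously a global maximum of both payoff tables.
    both_max = (R1 == R1.max()) & (R2 == R2.max())
    return bool(both_max.any())

R1 = np.array([[3.0, 0.0], [5.0, 1.0]])          # PD payoffs (see slide 12)
print(has_coordination_equilibrium(R1, R1.T))    # False: the PD has no such cell
```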

  11. Tit-for-Tat. Saddle point, not coordination. Consider: cooperate, then defect iff defected on. Better (3) than defect-defect (1).

  12. Tit-For-Tat is Nash. PD payoffs (own action, opponent's action): (C,C) = 3, (C,D) = 0, (D,C) = 5, (D,D) = 1. Cooperation (TFT) is a best response to TFT; average payoffs of the reactive policies against TFT: • C: C, D: D (TFT itself) = 3 • C: C, D: C (always cooperate) = 3 • C: D, D: D (always defect) = 1 • C: D, D: C (alternate) = 2.5
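
A quick way to check the averages on this slide is to simulate each reactive policy against Tit-for-Tat; this simulation is my own sketch. Because TFT simply echoes the responder's previous move, the co-player's next action is known in advance, so the reactive rule can be applied to it directly.

```python
import numpy as np

C, D = 0, 1
R1 = np.array([[3.0, 0.0], [5.0, 1.0]])   # responder's PD payoffs, as above

def avg_vs_tft(respond, rounds=10000):
    # respond(a): the responder's action when the co-player plays a
    # (the slide's "C: x, D: y" notation).
    coplayer, total = C, 0.0               # TFT opens with cooperation
    for _ in range(rounds):
        me = respond(coplayer)
        total += R1[me, coplayer]
        coplayer = me                      # TFT echoes the responder's move
    return total / rounds

for name, policy in [("C: C, D: D", lambda a: a),                    # TFT itself
                     ("C: C, D: C", lambda a: C),                    # always cooperate
                     ("C: D, D: D", lambda a: D),                    # always defect
                     ("C: D, D: C", lambda a: D if a == C else C)]:  # alternate
    print(name, round(avg_vs_tft(policy), 2))                        # 3.0, 3.0, 1.0, 2.5
```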

  13. Generalized TFT. TFT stabilizes a mutually beneficial outcome. General class of policies: • Play the beneficial action • Punish deviation to suppress temptation. Need to generalize both components.

  14. Security Level. Values achievable without collaboration: • Solution of a zero-sum game (LP) • The opponent can force a player down to this value • The player can guarantee at least this value • Possibly stochastic. Useful as a punishment, and also as a threshold. PD: (1, 1).
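
One way to compute a player's security level, as a sketch of the LP mentioned above: solve the zero-sum game played on that player's own payoff matrix. The formulation and helper name below are mine.

```python
import numpy as np
from scipy.optimize import linprog

def security_level(M):
    """Maximin value and mixed strategy for the row player of payoff matrix M."""
    n_rows, n_cols = M.shape
    # Variables: [v, x_0, ..., x_{n_rows-1}]; maximize v  ==  minimize -v.
    c = np.concatenate(([-1.0], np.zeros(n_rows)))
    # For every opponent column j:  v - sum_i x_i * M[i, j] <= 0.
    A_ub = np.hstack((np.ones((n_cols, 1)), -M.T))
    b_ub = np.zeros(n_cols)
    # The mixed strategy sums to 1; probabilities are non-negative, v is free.
    A_eq = np.concatenate(([0.0], np.ones(n_rows))).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(None, None)] + [(0.0, None)] * n_rows
    res = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds=bounds)
    return res.x[0], res.x[1:]

R1 = np.array([[3.0, 0.0], [5.0, 1.0]])   # PD payoffs to the row player
v1, _ = security_level(R1)
print(round(v1, 3))                       # 1.0, matching the (1, 1) on the slide
```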

  15. Two-Player Plot. Mark the payoff pair for each combination of actions; mark the security level. [Figure: scatter of the PD payoff pairs (C,C), (C,D), (D,C), (D,D) together with the security-level point.]
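
The figure itself is not reproduced here, but a plot like it can be recreated for the PD numbers used above; this snippet is purely illustrative.

```python
import matplotlib.pyplot as plt

# Payoff pairs (player 1, player 2) for each joint action of the PD.
points = {"(C,C)": (3, 3), "(C,D)": (0, 5), "(D,C)": (5, 0), "(D,D)": (1, 1)}
for label, (p1, p2) in points.items():
    plt.scatter(p1, p2)
    plt.annotate(label, (p1, p2))
plt.scatter(1, 1, marker="x", s=120)      # security level (s1, s2); here it
                                          # happens to coincide with (D,D)
plt.xlabel("payoff to player 1")
plt.ylabel("payoff to player 2")
plt.show()
```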

  16. Dominating Pair. Let (s1, s2) be the security-level values. Dominating pair of actions (a1, a2): c1 = R1(a1, a2), c2 = R2(a1, a2) with c1 > s1 and c2 > s2. ti is the temptation payoff (t1 = max_a R1(a, a2)). ni > (ti - ci) / (ci - si) punishment rounds are sufficient to stabilize cooperation (folk theorem).
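
Plugging in the PD numbers used earlier gives a concrete feel for the bound; the arithmetic is mine, not the talk's.

```python
# PD: security s_i = 1, cooperation c_i = 3, temptation t_i = 5.
s_i, c_i, t_i = 1.0, 3.0, 5.0
bound = (t_i - c_i) / (c_i - s_i)   # (5 - 3) / (3 - 1) = 1.0
print(bound)                        # punishing for any n_i > 1 rounds suffices
```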

  17. Alternation. Play one joint action, then the other; repeat. If the security levels lie below the convex hull of payoffs, the alternation can be stabilized.

  18. Algorithm. Find a Nash pair for the repeated matrix game in polynomial time. Idea: • If the convex hull contains a point dominating the security levels, use generalized TFT • Else, a one-step Nash can be found quickly.
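
A rough sketch of the top-level idea under my own reading of this slide; restricting the search to pure cells and 50/50 alternations, and the helper names, are simplifications of mine rather than the talk's actual algorithm.

```python
import itertools
import numpy as np

def choose_repeated_play(R1, R2, s1, s2):
    """R1, R2: payoff matrices; s1, s2: security levels (e.g. from the LP above)."""
    cells = list(itertools.product(range(R1.shape[0]), range(R1.shape[1])))
    candidates = [(c,) for c in cells] + list(itertools.combinations(cells, 2))
    best = None
    for combo in candidates:                      # single cells and alternations
        v1 = float(np.mean([R1[a] for a in combo]))
        v2 = float(np.mean([R2[a] for a in combo]))
        if v1 > s1 and v2 > s2 and (best is None or v1 + v2 > best[1] + best[2]):
            best = (combo, v1, v2)
    if best is not None:
        return "generalized TFT", best            # punish deviations with security play
    return "one-step Nash", (s1, s2)              # nothing dominates the threat point

R1 = np.array([[3.0, 0.0], [5.0, 1.0]])           # PD with security levels (1, 1)
print(choose_repeated_play(R1, R1.T, 1.0, 1.0))   # supports (C, C) with generalized TFT
```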

  19. Proof Sketch. Try improving player 1's policy S1. If you can't, try improving S2. If you can't, (S1, S2) is a Nash. If you can, the change won't hurt player 1, and player 1 still can't improve, so the result is a Nash. [Diagram: (S1, S2) compared against (S1*, S2) and (S1, S2*).]

  20. Symmetric Case. R1(a, a') = R2(a', a). Value of the game is just the maximum average payoff! Alternate or accept the security level.
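
A small check of this recipe on the PD, as a sketch of my own: take the joint action with the best average payoff and accept it when it beats the security level.

```python
import numpy as np

R1 = np.array([[3.0, 0.0], [5.0, 1.0]])
R2 = R1.T                                # symmetry: R1(a, a') = R2(a', a)
avg = (R1 + R2) / 2.0                    # average payoff of each joint action
best_avg = float(avg.max())              # 3.0, at (C, C); an asymmetric maximizer
                                         # would instead be supported by alternation
security = 1.0                           # PD security level (slide 14)
value = best_avg if best_avg > security else security
print(value)                             # 3.0
```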

  21. Symmetric Markov Game. [Figure: grid game with players A and B.] Episodic; roles chosen randomly. Algorithm: • Maximize the sum (MDP) • Compute security levels (zero-sum) • Choose the max-sum solution if it is better. Converges to a Nash.

  22. Conclusion. Threats can help (Littman & Stone 01). A repeated Nash can be found in polynomial time. Very simple structure for symmetric games. Applies to Markov games.

  23. Discussion. Objectives in game theory/RL for agents? Desiderata? How to learn the state space when the game is repeated? Multiobjective negotiation? Learning: combine leading and following? Different, unknown discount rates? Incomplete rationality? Incomplete information about rewards?
