
Learning in Games


Presentation Transcript


  1. Learning in Games Georgios Piliouras

  2. Games (i.e. Multi-Body Interactions) • Interacting entities • Pursuing their own goals • Lack of centralized control Prediction?

  3. Games (review) • n players • Set of strategies Si for each player i • Possible states (strategy profiles) S = ×Si • Utility ui : S → R • Social Welfare Q : S → R • Extend to allow probabilities Δ(Si), Δ(S): ui(Δ(S)) = E[ui(S)], Q(Δ(S)) = E[Q(S)]

  4. Zero-Sum Games & Equilibria (review) [Rock-Paper-Scissors payoff matrix; each player plays Rock, Paper, Scissors with probability 1/3 each.] Nash: A product of mixed strategies s.t. no player has a profitable deviating strategy.

  5. Why do we study Nash eq? • Nash eq. have a simple intuitive definition. • Nash eq. are applicable to all games. • In some classes of games, Nash eq. is a reasonably good predictor of rational self-interested behavior (e.g. zero-sum games). • Even in general games, Nash eq. analysis seems like a natural, albeit optimistic, first step in understanding rational behavior.

  6. Why is it optimistic? • Nash eq. analysis presumes that agents can resolve issues regarding: • Convergence: Agent behavior will converge to a Nash. • Coordination: If there are many Nash eq., agents can coordinate on one of them. • Communication: Agents are fully aware of each other's utilities/rationality. • Complexity: Computing a Nash can be hard even from a centralized perspective.

  7. Today: Learning in Games • Agent behavior is an online learning algorithm/dynamic • Input: Current state of environment/other agents • (+ history) • Output: Chosen (randomized) action • Analyze the evolution of systems of coupled dynamics, as a way to predict interacting agent behavior. • Advantages: Weaker assumptions. • If the dynamic converges → Nash equilibrium (but it may not converge) • Disadvantages: Harder to analyze

  8. Today: Learning in Games • Agent behavior is an online learning algorithm/dynamic • Input: Current state of environment/other agents • (+ history) • Output: Chosen (randomized) action • Class 1: Best (Better) Response Dynamics • Class 2: No-regret dynamics • (e.g. Weighted Majority/Hedge dynamic)

  9. Best Response Dynamics (BR) • Start from an arbitrary state s ∈ S • Choose an arbitrary agent i • Agent i deviates to a best (better) response given the strategies of the others. • Advantages: Simple, widely applicable • Disadvantages: No intelligence/learning Does this work?
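To make the loop concrete, here is a minimal sketch in Python (an illustration, not from the slides): best response dynamics on a finite game given as a utility oracle. The game encoding and function names are assumptions made for the example.

```python
import random

def best_response_dynamics(num_players, strategies, utility, max_rounds=1000):
    """Run best response dynamics on a finite game.

    strategies[i] -- list of strategies available to player i
    utility(i, s) -- payoff of player i at strategy profile s (a tuple)
    Returns the final profile (a pure Nash equilibrium if no player could improve).
    """
    # Start from an arbitrary state.
    state = tuple(random.choice(strategies[i]) for i in range(num_players))
    for _ in range(max_rounds):
        improved = False
        for i in range(num_players):
            # Best response of player i against the others' current strategies.
            best = max(strategies[i],
                       key=lambda a: utility(i, state[:i] + (a,) + state[i + 1:]))
            if utility(i, state[:i] + (best,) + state[i + 1:]) > utility(i, state):
                state = state[:i] + (best,) + state[i + 1:]
                improved = True
        if not improved:      # no profitable deviation left: pure Nash equilibrium
            return state
    return state              # may not have converged (e.g. BR cycles, slide 12)
```

In potential games (next slides) this loop always terminates at a pure Nash equilibrium; in zero-sum games it can cycle forever, which is the point of slide 12.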

  10. Congestion Games • n players and m resources (“edges”) • Each strategy corresponds to a set of resources (“paths”) • Each edge has a cost function ce(x) that determines the cost as a function of the number x of players using it. • Cost experienced by a player = sum of its edge costs. [Example network with edges of cost x and 2x; Cost(red path) = 6, Cost(green path) = 8.]

  11. Potential Games • A potential game is a game that admits a function Φ: S→R s.t. for every s ∈ S, every agent i, and every deviation si′: ui(si′, s-i) - ui(s) = Φ(si′, s-i) - Φ(s) • Every congestion game is a potential game, via Rosenthal’s potential Φ(s) = Σe Σk=1..xe(s) ce(k), where xe(s) is the number of players using edge e (with costs in place of utilities). • This implies that any such game has pure NE and that best response converges. Speed?
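As a sanity check of the potential property, here is a small sketch (an illustration, not from the slides) that computes Rosenthal's potential for a congestion game, encoded with strategies as sets of edges, and verifies that a unilateral deviation changes the deviator's cost and the potential by the same amount.

```python
from collections import Counter

def loads(profile):
    """Number of players on each edge, for a profile of edge-sets."""
    return Counter(e for path in profile for e in path)

def player_cost(i, profile, cost_fn):
    x = loads(profile)
    return sum(cost_fn[e](x[e]) for e in profile[i])

def rosenthal_potential(profile, cost_fn):
    """Phi(s) = sum over edges e of sum_{k=1..x_e(s)} c_e(k)."""
    x = loads(profile)
    return sum(cost_fn[e](k) for e in x for k in range(1, x[e] + 1))

# Two players, two parallel edges with cost c(x) = x (the load-balancing example).
cost_fn = {"e1": lambda k: k, "e2": lambda k: k}
s  = ({"e1"}, {"e1"})   # both players on edge e1
s2 = ({"e2"}, {"e1"})   # player 0 deviates to edge e2

# Potential property (cost convention): the deviator's cost change equals the potential change.
assert (player_cost(0, s2, cost_fn) - player_cost(0, s, cost_fn)
        == rosenthal_potential(s2, cost_fn) - rosenthal_potential(s, cost_fn))
```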

  12. BR Cycles in Zero-Sum Games

  13. No Regret Learning Regret(T) in a history of T periods: (total profit of the best fixed action in hindsight) - (total profit of the algorithm). No single action significantly outperforms the dynamic. An algorithm is characterized as “no regret” if for every input sequence the regret grows sublinearly in T. [Blackwell ’56], [Hannan ’57], [Fudenberg, Levine ’94], …
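Read as code, the definition is just a comparison against the best fixed action in hindsight; the following short sketch (an illustration, not from the slides) computes it for a given payoff table and play sequence.

```python
def regret(payoffs, played):
    """payoffs[t][a]: profit of action a at period t; played[t]: the algorithm's action.

    Regret(T) = (total profit of the best fixed action in hindsight)
                - (total profit of the algorithm).
    """
    T, num_actions = len(payoffs), len(payoffs[0])
    best_fixed = max(sum(payoffs[t][a] for t in range(T)) for a in range(num_actions))
    alg_total = sum(payoffs[t][played[t]] for t in range(T))
    return best_fixed - alg_total

# "No regret" means regret(T) / T -> 0 for every input sequence (sublinear growth in T).
```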

  14. No Regret Learning No single action significantly outperforms the dynamic.

  15. The Multiplicative Weights Algorithm a.k.a. Hedge a.k.a. Weighted Majority [Littlestone, Warmuth ’94, Freund, Schapire ’99] • Pick s with probability proportional to (1-ε)^total(s), where total(s) denotes the cumulative cost of s in all past periods. • Why is it regret minimizing? • Proof on the board.
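A minimal sketch of this update rule (the class interface is my own framing, not from the slides): costs accumulate per action and each period an action is sampled with probability proportional to (1-ε)^total(s).

```python
import random

class MultiplicativeWeights:
    """Hedge / Weighted Majority with multiplicative factor (1 - eps)."""

    def __init__(self, num_actions, eps=0.1):
        self.eps = eps
        self.total = [0.0] * num_actions   # cumulative cost of each action so far

    def play(self):
        """Sample an action with probability proportional to (1 - eps)^total(a)."""
        weights = [(1 - self.eps) ** t for t in self.total]
        return random.choices(range(len(weights)), weights=weights)[0]

    def observe(self, costs):
        """Add this period's cost vector (one entry per action) to the running totals."""
        self.total = [t + c for t, c in zip(self.total, costs)]
```

With costs in [0, 1] and ε tuned on the order of √(ln n / T) for n actions, the standard analysis gives regret O(√(T log n)), i.e. sublinear in T as required.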

  16. BREAK

  17. No Regret and Equilibria Do no-regret algorithms converge to Nash equilibria in general games? Do no-regret algorithms converge to other equilibria in general games?

  18. Other Equilibrium Notions (review) [Rock-Paper-Scissors payoff matrix; each player mixes uniformly, 1/3 per strategy. Choose any of the green outcomes uniformly (prob. 1/9).] Nash: A probability distribution over outcomes that is a product of mixed strategies s.t. no player has a profitable deviating strategy.

  19. Other Equilibrium Notions (review) [Rock-Paper-Scissors payoff matrix; each player mixes uniformly, 1/3 per strategy.] Nash: A probability distribution over outcomes s.t. no player has a profitable deviating strategy. Coarse Correlated Equilibria (CCE):

  20. Other Equilibrium Notions (review) [Rock-Paper-Scissors payoff matrix.] Coarse Correlated Equilibria (CCE): A probability distribution over outcomes s.t. no player has a profitable deviating strategy.

  21. Other Equilibrium Notions (review) [Rock-Paper-Scissors payoff matrix; choose any of the green outcomes uniformly (prob. 1/6).] Coarse Correlated Equilibria (CCE): A probability distribution over outcomes s.t. no player has a profitable deviating strategy.

  22. Other Equilibrium Notions (review) Is this a CE? NO. [Same distribution: choose any of the green outcomes uniformly (prob. 1/6).] Correlated Equilibria (CE): A probability distribution over outcomes s.t. no player has a profitable deviating strategy, even if she can condition on the advice (her recommended action) drawn from the distribution.

  23. Other Equilibrium Notions (review) [Venn diagram of inclusions: Pure NE ⊆ NE ⊆ CE ⊆ CCE.]

  24. No-regret & CCE A history of no-regret algorithms is a sequence of outcomes s.t. no agent has a single deviating action that can increase her average payoff. A Coarse Correlated Equilibrium is a probability distribution over outcomes s.t. no agent has a single deviating action that can increase her expected payoff.
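To see the correspondence concretely, the sketch below (an illustration, not from the slides) treats the empirical distribution of a play history as a candidate CCE and measures the largest average gain any player could obtain from a single fixed deviation; for a no-regret history this gain is at most the (vanishing) per-period regret, so the empirical distribution of play converges to the set of CCE.

```python
def max_deviation_gain(history, utility, num_players, num_actions):
    """history: list of outcomes (tuples of actions) -- the empirical distribution of play.
    utility(i, outcome): payoff of player i at an outcome.
    Returns the largest average gain any player gets from a single fixed deviation."""
    T = len(history)
    worst = 0.0
    for i in range(num_players):
        current = sum(utility(i, s) for s in history) / T
        for a in range(num_actions):
            deviated = sum(utility(i, s[:i] + (a,) + s[i + 1:]) for s in history) / T
            worst = max(worst, deviated - current)
    return worst   # bounded by max_i Regret_i(T) / T, which -> 0 for no-regret players
```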

  25. No Regret and Equilibria Do no-regret algorithms converge to Nash equilibria in general games? Do no-regret algorithms converge to other equilibria in general games? Do no-regret algorithms converge to Nash equilibria in interesting games?

  26. CCE in Zero-Sum Games In general games, CCE ⊇ conv(NE) Why? In zero-sum games, the marginals and utilities of CCE and NE agree Why? What does it imply for no-regret algs?

  27. BREAK 2 Can learning beat Nash equilibria by an arbitrary factor?

  28. CCE in Congestion Games Load balancing: n balls, n bins. [n parallel links, each with cost c(x) = x.] Makespan: expected maximum latency over all links.

  29. CCE in Congestion Games Pure Nash: one ball per bin, so every link has load 1. Makespan: 1.

  30. CCE in Congestion Games [Koutsoupias, Mavronicolas, Spirakis ’02], [Czumaj, Vöcking ’02] Fully mixed Nash: each ball picks each bin with probability 1/n. Makespan: Θ(log n / log log n).

  31. CCE in Congestion Games [Blum, Hajiaghayi, Ligett, Roth ’08] Coarse Correlated Equilibria Makespan: Ω(√n), exponentially worse than the mixed Nash bound.

  32. No-Regret Algs in Congestion Games Since worst case CCE can be reproduced by worst case no-regret algs, worst case no-regret algorithms do not converge to Nash equilibria in general.

  33. (Multiplicative Weights) Algorithm in (Potential) Games • x(t) is the current state of the system (a tuple of randomized strategies, one for each player). • Each player tosses their coins and a specific outcome is realized. • Depending on the outcome of these random events, we transition to the next state x(t+1). An infinite Markov chain over the infinite state space Δ(S). [Diagram: from x(t), several possible next states x(t+1), each an O(ε) step away in Δ(S).]

  34. (Multiplicative Weights) Algorithm in (Potential) Games • Problem 1: Hard to get intuition about the problem, let alone analyze it. • Let’s try to come up with a “discounted” version of the problem. • Ideas?? [Same diagram: x(t) and possible next states x(t+1), each an O(ε) step away in Δ(S).]

  35. (Multiplicative Weights) Algorithm in (Potential) Games • Idea 1: Analyze the expected motion. [Same diagram: x(t), possible next states x(t+1), state space Δ(S).]

  36. (Multiplicative Weights) Algorithm in (Potential) Games • Idea 1: Analyze the expected motion. • The system evolution is now deterministic: there exists a function f s.t. E[x(t+1)] = f(x(t), ε). • I wish to analyze this function (e.g. find its fixed points).

  37. (Multiplicative Weights) Algorithm in (Potential) Games • Problem 2: The function f is still rather complicated. • Idea 2: Analyze the MWA dynamics for small ε. • Use a Taylor expansion to find a first order approximation to f: f(x(t), ε) = f(x(t), 0) + ε · f′(x(t), 0) + O(ε²).

  38. (Multiplicative Weights) Algorithm in (Potential) Games • As ε→0, the quantity (f(x(t), ε) - f(x(t), 0)) / ε → f′(x(t), 0) specifies a vector at each point of our state space Δ(S) (i.e. a vector field); note that f(x(t), 0) = x(t), since with ε = 0 the update leaves the state unchanged. This vector field defines a system of ODEs which we are going to analyze.

  39. Deriving the ODE • Take expectations, differentiate w.r.t. ε, and take the limit ε → 0. • The resulting ODE is the replicator dynamic studied in evolutionary game theory.
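The equations themselves did not survive the transcript; in the notation of slide 3, with x_is the probability that player i plays strategy s, the replicator dynamic has the standard form (a reconstruction, not copied from the slides):

```latex
% Replicator dynamic (standard form; reconstruction, not copied from the slides)
\dot{x}_{is} \;=\; x_{is}\,\bigl( u_i(s, x_{-i}) - u_i(x) \bigr),
\qquad
u_i(x) \;=\; \sum_{s' \in S_i} x_{is'}\, u_i(s', x_{-i}).
```

Strategies that perform better than player i's current average grow in probability, and worse-performing strategies shrink.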

  40. Motivating Example [Two players (balls), two machines (bins), each machine with cost c(x) = x.]

  41. Motivating Example • Each player’s mixed strategy is summarized by a single number (the probability of picking machine 1), so a mixed strategy profile can be plotted in R². [Plot of the strategy space, marking the mixed Nash and the pure Nash equilibria.]

  42. Motivating Example • Each player’s mixed strategy is summarized by a single number (the probability of picking machine 1); plot the mixed strategy profile in R². [Plot of the strategy space.]

  43. Motivating Example • Even in the simplest case of two balls and two bins with linear costs, the replicator equation has a nonlinear form.
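A small numerical sketch (an illustration, not from the slides) of the replicator dynamic for this two-ball/two-bin example, writing p and q for the two players’ probabilities of picking machine 1; the explicit ODE below is derived from c(x) = x under the cost convention, and the variable names are assumptions made for the example.

```python
def replicator_two_balls_two_bins(p, q, dt=0.01, steps=5000):
    """Euler-integrate the replicator ODE for two players, two machines, c(x) = x.

    p, q: probabilities that player 1 / player 2 pick machine 1.
    With costs c(x) = x the ODE works out to
        dp/dt = p (1 - p) (1 - 2q),   dq/dt = q (1 - q) (1 - 2p),
    which is already nonlinear despite the linear cost functions.
    """
    for _ in range(steps):
        dp = p * (1 - p) * (1 - 2 * q)
        dq = q * (1 - q) * (1 - 2 * p)
        p, q = p + dt * dp, q + dt * dq
    return p, q

# Off the diagonal p = q, trajectories converge to a pure Nash equilibrium,
# (p, q) -> (1, 0) or (0, 1); on the diagonal they converge to the mixed Nash (1/2, 1/2).
print(replicator_two_balls_two_bins(0.6, 0.3))
```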

  44. The potential function • The congestion game has a potential function Φ. • Let Ψ = E[Φ]. A calculation yields that Ψ decreases along the dynamics except when every player randomizes over paths of equal expected cost, i.e. Ψ is a Lyapunov function of the dynamics. [Monderer-Shapley ’96] • Analyzing the spectrum of the Jacobian shows that in “generic” congestion games only pure Nash equilibria are stable. [Kleinberg-Piliouras-Tardos ’09]
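The calculation itself is missing from the transcript; under the cost convention for the replicator dynamic, a standard computation (a reconstruction, not copied from the slides) gives

```latex
% Reconstruction of the Lyapunov computation (cost convention; not from the original slides)
\frac{d\Psi}{dt}
  \;=\; -\sum_i \operatorname{Var}_{s \sim x_i}\!\bigl[\, c_i(s, x_{-i}) \,\bigr]
  \;\le\; 0,
\qquad \text{where } \Psi(x) = \mathbb{E}_{s \sim x}\bigl[\Phi(s)\bigr],
```

so Ψ is non-increasing and stays constant exactly when each player's randomization puts weight only on strategies of equal expected cost, matching the statement on the slide.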

  45. Cyclic Matching Pennies (Jordan’s game) [Jordan ’93] Three players arranged in a cycle, each with strategies {H, T}; each player earns profit 1 if she mismatches her neighbor in the cycle, 0 otherwise. Nash equilibrium: every player mixes ½, ½. • Social Welfare of NE: 3/2

  46. Cyclic Matching Pennies (Jordan’s game) Profit of 1 if you mismatch your neighbor in the cycle; 0 otherwise [Jordan ’93] Best Response Cycle • Social Welfare of NE: 3/2 • (H,H,T)

  47. Cyclic Matching Pennies (Jordan’s game) Profit of 1 if you mismatch your neighbor in the cycle; 0 otherwise [Jordan ’93] Best Response Cycle • Social Welfare of NE: 3/2 • (H,H,T),(H,T,T)

  48. Cyclic Matching Pennies (Jordan’s game) Profit of 1 if you mismatch your neighbor in the cycle; 0 otherwise [Jordan ’93] Best Response Cycle • Social Welfare of NE: 3/2 • (H,H,T),(H,T,T),(H,T,H),(T,T,H),(T,H,H),(T,H,T),(H,H,T) Social Welfare: 2
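A short sketch (an illustration, not from the slides) that verifies the cycle above is indeed a best response cycle and that every state on it has social welfare 2, under the payoff convention that matches the listed cycle: player i earns 1 when her action differs from that of player i-1 (indices mod 3).

```python
def payoff(i, s):
    """Cyclic matching pennies: player i profits 1 by mismatching player i-1 (mod 3).
    (This orientation of the cycle is an assumption chosen to match the listing above.)"""
    return 1 if s[i] != s[(i - 1) % 3] else 0

cycle = [("H", "H", "T"), ("H", "T", "T"), ("H", "T", "H"),
         ("T", "T", "H"), ("T", "H", "H"), ("T", "H", "T")]

# Each consecutive pair differs in one player, who switches to a strictly better (best) response.
for s, t in zip(cycle, cycle[1:] + cycle[:1]):
    i = next(j for j in range(3) if s[j] != t[j])   # the player who moves
    assert payoff(i, t) > payoff(i, s)              # the move strictly improves her payoff

# Social welfare is 2 in every state on the cycle, vs. 3/2 at the mixed Nash equilibrium.
assert all(sum(payoff(i, s) for i in range(3)) == 2 for s in cycle)
```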

  49. Cyclic Matching Pennies (Jordan’s game) [Jordan ’93] Best Response Cycle • Social Welfare of NE: 3/2 • (H,H,T),(H,T,T),(H,T,H),(T,T,H),(T,H,H),(T,H,T),(H,H,T) Social Welfare: 2

  50. Asymmetric Cyclic Matching Pennies [Jordan ’93] Best Response Cycle. Nash equilibrium: each player mixes 1/(M+1), M/(M+1). • Social Welfare of NE: 3M/(M+1) < 3 • (H,H,T),(H,T,T),(H,T,H),(T,T,H),(T,H,H),(T,H,T),(H,H,T) Social Welfare: M+1
