Communication Networks


**Communication Networks**

A Second Course
Jean Walrand
Department of EECS, University of California at Berkeley

**Concave, Learning, Cooperative**

• Concave Games
• Learning in Games
• Cooperative Games

**Concave Games: Motivation**

• In many applications, the possible actions belong to a continuous set. For instance, one chooses prices, transmission rates, or power levels.
• In such situations, one specifies reward functions instead of a matrix of rewards.
• We explain results on Nash equilibria for such games.

**Concave Games: Preliminaries**

Many situations are possible: three NE, one NE, or no NE.

J.B. Rosen, "Existence and Uniqueness of Equilibrium Points for Concave N-Person Games," Econometrica, 33, 520-534, July 1965.

**Concave Game**

Definition: Concave Game
Definition: Nash Equilibrium

**Concave Game**

Theorem: Existence
Proof

**Concave Game**

Special Case:

**Concave Game**

Definition: Diagonally Strictly Concave

**Concave Game**

Theorem: Uniqueness

**Concave Game**

Theorem: Uniqueness - Bilinear Case:

**Concave Game**

Local Improvements

**Learning in Games**

• Motivation
• Examples
• Models
• Fictitious Play
• Stochastic Fictitious Play

Fudenberg D. and D.K. Levine (1998), The Theory of Learning in Games. MIT Press, Cambridge, Massachusetts.
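The formulas on the concave-game slides did not survive extraction. For reference, the standard statements from the cited Rosen (1965) paper are sketched below; the notation ($S$, $u_i$, $g$, $r$) follows Rosen rather than the slides.

```latex
% Setting: joint strategy x = (x_1, \dots, x_N) \in S, with S convex and compact.
\textbf{Concave game:} each payoff $u_i(x)$ is continuous in $x$ and concave
in player $i$'s own strategy $x_i$.

\textbf{Existence:} every concave game has at least one Nash equilibrium.

\textbf{Pseudogradient:} for weights $r = (r_1, \dots, r_N) > 0$,
\[
  g(x, r) = \bigl( r_1 \nabla_{x_1} u_1(x), \dots, r_N \nabla_{x_N} u_N(x) \bigr).
\]

\textbf{Diagonally strictly concave:} for all $x \neq y$ in $S$,
\[
  (y - x)^{\top} g(x, r) + (x - y)^{\top} g(y, r) > 0 .
\]

\textbf{Uniqueness:} if the game is diagonally strictly concave for some
$r > 0$, then its Nash equilibrium is unique.
\]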
Chapters 1, 2, 4.

**Motivation**

Explain equilibrium as the result of players "learning" over time, instead of as the outcome of fully rational players with complete information.

**Examples: 1**

Fixed Player Model

• If P1 is patient and knows that P2 chooses her play based on her forecast of P1's plays, then P1 should always play U to lead P2 to play R.
• A sophisticated and patient player who faces a naïve opponent can develop a reputation for playing a fixed strategy and obtain the rewards of a Stackelberg leader.

Large Population Models

• Most of the theory avoids the possibility above by assuming random pairings in a large population of anonymous users.
• In such a situation, P1 cannot really teach much to the rest of the population, so that myopic play (D, L) is optimal.
• Naïve play: ignore that you affect other players' strategies.

**Examples: 2**

Cournot Adjustment Model

• Each player selects the best response to the other player's strategy in the previous period.
• Converges to the unique NE in this case: the best-response curves BR1 and BR2 intersect at the NE.
• This adjustment is a bit naïve.

**Models**

• Learning Model: specifies the rules of the individual players and examines their interactions in a repeated game. Usually the same game is repeated (there is some work on learning from similar games).
• Fictitious Play: players observe the result of their own match and play a best response to the historical frequency of play.
• Partial Best-Response Dynamics: in each period, a fixed fraction of the population switches to a best response to the aggregate statistics from the previous period.
• Replicator Dynamics: the share of the population using each strategy grows at a rate proportional to that strategy's current payoff.

**Fictitious Play**

• Each player computes the frequency of the actions of the other players (with initial weights).
• Each player selects a best response to the empirical distribution (which need not be a product distribution).

Theorem: Strict NE are absorbing for FP. If s is a pure strategy and is a steady state for FP, then s is a NE.

Proof: Assume s(t) = s = strict NE.
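The Cournot adjustment model above can be sketched numerically. This is a minimal illustration with an assumed linear duopoly (inverse demand p = a − b(q1 + q2) and unit cost c; the parameter values are hypothetical, not from the slides): each firm repeatedly best-responds to the other's previous-period quantity, and the iterates converge to the unique Cournot NE.

```python
# Cournot adjustment: simultaneous best responses in a linear duopoly.
# Assumed (hypothetical) model: inverse demand p = a - b*(q1 + q2), unit cost c.
# Firm i maximizes (a - b*(qi + qj) - c) * qi, so BR(qj) = (a - c - b*qj) / (2b).

a, b, c = 120.0, 1.0, 30.0

def best_response(qj: float) -> float:
    """Profit-maximizing quantity against the rival's previous quantity qj."""
    return max(0.0, (a - c - b * qj) / (2 * b))

q1, q2 = 0.0, 0.0
for _ in range(60):
    q1, q2 = best_response(q2), best_response(q1)

q_star = (a - c) / (3 * b)   # unique Cournot NE quantity for each firm
print(q1, q2, q_star)        # -> 30.0 30.0 30.0
```

The deviation from the NE halves each round, which is why this naïve adjustment converges here; 60 rounds already reach machine precision.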
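The replicator dynamics listed under Models can also be sketched in a few lines. This is a sketch with an assumed hawk-dove game (payoff values V and C are hypothetical, not from the slides); the hawk share grows at a rate proportional to its payoff advantage over the population average and settles at the mixed equilibrium V/C.

```python
# Discrete-time replicator dynamics in an assumed hawk-dove game.
# Payoffs: Hawk vs Hawk = (V - C)/2, Hawk vs Dove = V,
#          Dove vs Hawk = 0,         Dove vs Dove = V/2.
V, C = 2.0, 4.0
x = 0.1            # initial share of the population playing Hawk
dt = 0.01          # Euler step for the continuous-time dynamics
for _ in range(200_000):
    f_hawk = x * (V - C) / 2 + (1 - x) * V
    f_dove = (1 - x) * V / 2
    f_bar = x * f_hawk + (1 - x) * f_dove
    x += dt * x * (f_hawk - f_bar)   # growth rate ~ payoff above average
print(x)           # settles at V/C = 0.5, the mixed equilibrium share
```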
Then, with a := a(t) …,

p(t+1) = (1 - a) p(t) + a d(s),

so that

u(t+1, r) = (1 - a) u(p(t), r) + a u(d(s), r),

which is maximized by r = s if u(p(t), r) is maximized by r = s.

Converse: if FP converges, the players do not want to deviate, so the limit must be a NE.

**Fictitious Play**

Assume initial weights (1.5, 2) and (2, 1.5). Then the play is (T, T), updating the weights to (1.5, 3) and (2, 2.5); then (T, H), (T, H); then (H, H), (H, H), (H, H); then (H, T); …

Theorem: If under FP the empirical frequencies converge, then their product converges to a NE.

Proof: If the strategies converge, the players do not want to deviate, so the limit must be a NE.

Theorem: Under FP, the empirical frequencies converge if one of the following holds:

• 2x2 with generic payoffs
• Zero-sum
• Solvable by iterated strict dominance
• …

Note: The empirical distributions need not converge.

**Fictitious Play**

Assume initial weights (1, 2^0.5) for P1 and P2. Then the play is (A, A), giving weights (2, 2^0.5); then (B, B), (A, A), (B, B), (A, A), etc.

Empirical frequencies converge to the NE. However, the players get 0: the realized plays are correlated, not independent. (Fix: Randomize …)

**Stochastic Fictitious Play**

Motivation:

• Avoid the discontinuity in FP.
• Hope for a stronger form of convergence: not only of the marginals, but also of the intended plays.

**Stochastic Fictitious Play**

Definitions:

• Reward of i = u(i, s) + n(i, si), where n has positive support on an interval.
• BR(i, s)(si) = P[n(i, si) is such that si = BR to s].
• Nash Distribution: si = BR(i, s) for all i.

Harsanyi's Purification Theorem: for generic payoffs, ND → NE as the support of the perturbation → 0.

Key feature: the smoothed BR is continuous and close to the original BR.

Matching Pennies

**Stochastic Fictitious Play**

Theorem (Fudenberg and Kreps, 93): Assume a 2x2 game has a unique mixed NE. If the smoothing is small enough, then the NE is globally stable for SFP.

Theorem (K&Y 95, B&H 96): Assume a 2x2 game has a unique strict NE. Then the unique intersection of the smoothed BRs is a global attractor for SFP. Assume a 2x2 game has 2 strict NE and one mixed NE. Then SFP converges to one of the strict NE, w.p. 1.
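The matching-pennies run above (initial weights (1.5, 2) and (2, 1.5)) can be reproduced with a short deterministic simulation. This is a sketch under the usual conventions (P1 wins on a match, P2 on a mismatch, and best-response ties are broken toward H; these tie-breaking choices are assumptions); it also illustrates the convergence theorem, since the empirical frequencies approach the mixed NE (1/2, 1/2).

```python
# Fictitious play in matching pennies.
# Each player tracks weighted counts of the opponent's past actions
# (index 0 = H, 1 = T) and plays a pure best response to the empirical mix.
# P1 wants to match, P2 wants to mismatch; ties are broken toward H.

w1 = [1.5, 2.0]   # P1's weights on P2's actions (from the slide)
w2 = [2.0, 1.5]   # P2's weights on P1's actions (from the slide)

plays = []
n_heads1 = n_heads2 = 0
T = 50_000
for t in range(T):
    a1 = 0 if w1[0] >= w1[1] else 1   # matcher: copy the likelier action
    a2 = 1 if w2[0] >= w2[1] else 0   # mismatcher: avoid the likelier action
    w1[a2] += 1.0
    w2[a1] += 1.0
    n_heads1 += (a1 == 0)
    n_heads2 += (a2 == 0)
    if t < 2:
        plays.append((a1, a2))

print(plays)                        # first rounds (T, T), (T, H), as on the slide
print(n_heads1 / T, n_heads2 / T)   # both near 1/2, the mixed NE
```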
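The smoothed best response that SFP uses can be sketched with a logit perturbation, a standard smoothing choice (the temperature value here is an assumption, not from the slides). Simulating matching pennies shows the marginal frequencies settling near the mixed NE, consistent with the global-stability result for 2x2 games with a unique mixed NE.

```python
# Stochastic fictitious play: logit (smoothed) best response to beliefs.
# Matching pennies; tau is an assumed smoothing level (smaller = closer to FP).
import math
import random

random.seed(1)
tau = 0.1

def logit_prob0(u0: float, u1: float) -> float:
    """Probability of action 0 under a logit best response to payoffs (u0, u1)."""
    m = max(u0, u1)
    e0 = math.exp((u0 - m) / tau)
    e1 = math.exp((u1 - m) / tau)
    return e0 / (e0 + e1)

w1 = [1.0, 1.0]    # P1's weights on P2's actions (0 = H, 1 = T)
w2 = [1.0, 1.0]    # P2's weights on P1's actions
heads1 = 0
T = 20_000
for _ in range(T):
    p2H = w1[0] / (w1[0] + w1[1])    # P1's belief that P2 plays H
    p1H = w2[0] / (w2[0] + w2[1])    # P2's belief that P1 plays H
    # Matcher's payoffs: u1(H) = 2*P(H) - 1; mismatcher's are reversed.
    pr1H = logit_prob0(2 * p2H - 1, 1 - 2 * p2H)
    pr2H = logit_prob0(1 - 2 * p1H, 2 * p1H - 1)
    a1 = 0 if random.random() < pr1H else 1
    a2 = 0 if random.random() < pr2H else 1
    w1[a2] += 1.0
    w2[a1] += 1.0
    heads1 += (a1 == 0)

print(heads1 / T)   # near 0.5: the unique mixed NE is stable under SFP
```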
Note: Cycling is possible for SFP in multi-player games.

**Stochastic Fictitious Play**

Another justification for randomization: protection against the opponent's mistakes.

Learning rules should be:

• Safe: average utility ≥ minmax value.
• Universally consistent: utility ≥ the utility achievable if the frequency of plays were known, but not their order.

Randomization can achieve universal consistency (e.g., SFP).

**Stochastic Fictitious Play**

Stimulus-Response (Reinforcement Learning): increase the probability of plays that give good results.

General observation: it is difficult to discriminate between learning models on the basis of experimental data; SFP, SR, etc. all seem about comparable.

**Cooperative Games**

• Motivation
• Notions of Equilibrium
• Nash Bargaining Equilibrium
• Shapley Value

**Cooperative Games: Motivation**

• The Nash equilibria may not be the most desirable outcome for the players.
• Typically, players benefit by cooperating.
• We explore some notions of equilibrium that players achieve under cooperation.

**Cooperative Games: Nash B.E.**

Definition: Nash Bargaining Equilibrium
Interpretation:
Fact

**Cooperative Games: Nash B.E.**

Example:
NBE:
Social:

**Cooperative Games: Nash B.E.**

Axiomatic Justification

At the NBE, the sum of the relative increases is zero.

**Shapley Value**

Example:
Shapley Value:

**Fixed Point Theorems**

Theorem (Brouwer):

**Brouwer**

Labels (1, 3); labels (2, 3). One path through doors (1, 2) must end up in a (1, 2, 3) triangle. [Indeed: there is an odd number of boundary doors.]

**Brouwer**

• Take a small (1, 2, 3) triangle.
• Divide it into triangles as before; it contains another (1, 2, 3) triangle.
• Continue in this way. Pick z(n) in triangle n. Let z = lim z(n).

Claim: f(z) = z.

Proof: If f(z) is not z, then z(n) and f(z(n)) are in different small triangles at stage n; but then z(n) cannot be in a (1, 2, 3) triangle ….

**Notes on Kakutani**

Theorem:
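The Nash bargaining equilibrium picks the feasible payoff pair maximizing the product of the players' gains over their disagreement payoffs. Since the slide's example did not survive extraction, here is a minimal numeric sketch with an assumed feasible set (utilities on the line u1 + u2 = 1, disagreement point (0, 0)); the instance is hypothetical, not the slide's.

```python
# Nash bargaining: maximize the Nash product (u1 - d1)*(u2 - d2) over the
# feasible set. Assumed (hypothetical) instance: Pareto frontier u1 + u2 = 1
# with disagreement point d = (0, 0); the NBE splits the surplus equally.

d1, d2 = 0.0, 0.0
N = 10_000
best_u1, best_val = 0.0, -1.0
for k in range(N + 1):
    u1 = k / N
    u2 = 1.0 - u1                  # stay on the Pareto frontier
    val = (u1 - d1) * (u2 - d2)    # the Nash product
    if val > best_val:
        best_u1, best_val = u1, val

print(best_u1, 1.0 - best_u1)      # -> 0.5 0.5
```

At this point the "sum of relative increases is zero" condition from the slide holds: moving along the frontier, the relative gain of one player exactly offsets the relative loss of the other.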
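The Shapley value averages each player's marginal contribution over all orders in which the players could join the coalition. Since the slide's example is not recoverable, this sketch uses the classic glove game as an assumed stand-in: players 0 and 1 hold left gloves, player 2 holds a right glove, and a coalition's worth is the number of matched pairs.

```python
# Shapley value: average marginal contribution over all player orderings.
# Assumed example (classic glove game, not the slide's): v(S) = matched pairs.
from itertools import permutations

players = (0, 1, 2)

def v(S: frozenset) -> int:
    """Worth of coalition S: matched left-right glove pairs."""
    lefts = len(S & {0, 1})    # players 0 and 1 hold left gloves
    rights = len(S & {2})      # player 2 holds the right glove
    return min(lefts, rights)

phi = {i: 0.0 for i in players}
orders = list(permutations(players))
for order in orders:
    coalition = frozenset()
    for i in order:
        phi[i] += v(coalition | {i}) - v(coalition)   # marginal contribution
        coalition = coalition | {i}
for i in players:
    phi[i] /= len(orders)

print(phi)   # player 2 (scarce right glove) gets 2/3; players 0 and 1 get 1/6
```

The scarce player captures most of the surplus, and the values sum to v(N) = 1 (efficiency), illustrating two of the axioms the Shapley value is built on.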