Unit III: The Evolution of Cooperation

Unit III: The Evolution of Cooperation • Can Selfishness Save the Environment? • Repeated Games: the Folk Theorem • Evolutionary Games • A Tournament • How to Promote Cooperation/Unit Review 7/28 4/6 4/14

Repeated Games Some Questions: • What happens when a game is repeated? • Can threats and promises about the future influence behavior in the present? • Cheap talk • Finitely repeated games: Backward induction • Indefinitely repeated games: Trigger strategies

The Folk Theorem Theorem: Any payoff that Pareto-dominates the one-shot NE can be supported in a SPNE of the repeated game, if the discount parameter is sufficiently high. [Figure: payoff diagram with points (S,T), (R,R), (P,P), (T,S)]

The Folk Theorem In other words, in the repeated game, if the future matters “enough,” i.e., d > d*, there are zillions of equilibria! [Figure: payoff diagram with points (S,T), (R,R), (P,P), (T,S)]
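The threshold d* can be made concrete for one particular equilibrium. The sketch below assumes both players use the GRIM (TRIGGER) strategy, so that a single defection is answered by mutual defection forever. Cooperating then pays R/(1-d), deviating pays T + dP/(1-d), and cooperation survives when d is at least (T-R)/(T-P). The function names and the payoff values 5, 3, 1, 0 are illustrative assumptions, not part of the slides.

```python
from fractions import Fraction

def critical_discount(T, R, P, S):
    """Smallest discount parameter d* at which GRIM supports mutual cooperation."""
    if not (T > R > P > S):
        raise ValueError("a Prisoner's Dilemma requires T > R > P > S")
    return Fraction(T - R, T - P)

def cooperation_is_sustainable(T, R, P, S, d):
    """Compare cooperating forever with defecting once and being punished forever."""
    cooperate = Fraction(R) / (1 - d)       # R + dR + d^2 R + ...
    deviate = T + d * Fraction(P) / (1 - d)  # T now, then P in every later round
    return cooperate >= deviate

# Assumed example payoffs: T=5, R=3, P=1, S=0.
d_star = critical_discount(5, 3, 1, 0)
print(d_star)
print(cooperation_is_sustainable(5, 3, 1, 0, Fraction(2, 5)))
```

Other equilibria of the repeated game have other thresholds; this is only the GRIM case.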

The Folk Theorem • The theorem tells us that in general, repeated games give rise to a very large set of Nash equilibria. In the repeated PD, these are Pareto-rankable, i.e., some are efficient and some are not. • In this context, evolution can be seen as a process that selects for repeated game strategies with efficient payoffs. “Survival of the Fittest”

Evolutionary Games Fifteen months after I had begun my systematic enquiry, I happened to read for amusement ‘Malthus on Population’ . . . It at once struck me that . . . favorable variations would tend to be preserved, and unfavorable ones to be destroyed. Here then I had at last got a theory by which to work. Charles Darwin

Evolutionary Games • Evolutionary Stability (ESS) • Hawk-Dove: an example • The Replicator Dynamic • The Trouble with TIT FOR TAT • Designing Repeated Game Strategies • Finite Automata

Evolutionary Games Biological Evolution: Under the pressure of natural selection, any population (capable of reproduction and variation) will evolve so as to become better adapted to its environment, i.e., will develop in the direction of increasing “fitness.” Economic Evolution: Firms that adopt efficient “routines” will survive, expand, and multiply; whereas others will be “weeded out” (Nelson and Winter, 1982).

Evolutionary Stability Evolutionarily Stable Strategy (ESS): A strategy is evolutionarily stable if it cannot be invaded by a mutant strategy (Maynard Smith & Price, 1973). Writing V(A/B) for the payoff to A when playing against B, a strategy, A, is ESS if, for all B: i) V(A/A) ≥ V(B/A), and ii) either V(A/A) > V(B/A) or V(A/B) > V(B/B).

Hawk-Dove: an example Imagine a population of Hawks and Doves competing over a scarce resource (say, food in a given area). The share of each type in the population changes according to the payoff matrix, so that payoffs determine the number of offspring left to the next generation. v = value of the resource; c = cost of fighting. H/D: Hawk gets resource; Dove flees (v, 0). D/D: Share resource (v/2, v/2). H/H: Share resource less cost of fighting ((v-c)/2, (v-c)/2). (See Hargreaves Heap and Varoufakis: 195-214; Casti: 71-75.)

Hawk-Dove: an example
v = value of resource; c = cost of fighting

          H                     D
H   (v-c)/2, (v-c)/2          v, 0
D         0, v              v/2, v/2

Hawk-Dove: an example
v = value of resource = 4; c = cost of fighting = 6

        H         D
H    -1, -1     4, 0
D     0, 4      2, 2
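The numeric matrix follows mechanically from v and c, and so does the population mix at which Hawks and Doves do equally well. Setting the expected payoff of H equal to that of D when a share p of the population plays Hawk gives p = v/c (valid when c > v > 0). The sketch below is illustrative; the function names are assumptions, not part of the slides.

```python
from fractions import Fraction

def hawk_dove_payoffs(v, c):
    """Payoff to the row strategy against the column strategy."""
    v, c = Fraction(v), Fraction(c)
    return {
        ("H", "H"): (v - c) / 2,  # share the resource less the cost of fighting
        ("H", "D"): v,            # Hawk takes the resource
        ("D", "H"): Fraction(0),  # Dove flees
        ("D", "D"): v / 2,        # share the resource
    }

def expected_payoff(own, hawk_share, payoffs):
    """Expected payoff to `own` when a share `hawk_share` of opponents play Hawk."""
    return hawk_share * payoffs[(own, "H")] + (1 - hawk_share) * payoffs[(own, "D")]

def mixed_hawk_share(v, c):
    """Hawk share p = v/c at which both types earn the same expected payoff."""
    if not (c > v > 0):
        raise ValueError("the interior mix requires c > v > 0")
    return Fraction(v, c)

payoffs = hawk_dove_payoffs(4, 6)
p = mixed_hawk_share(4, 6)
print(p, expected_payoff("H", p, payoffs), expected_payoff("D", p, payoffs))
```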

Hawk-Dove: an example

        H         D
H    -1, -1     4, 0
D     0, 4      2, 2

NE = {(1,0); (0,1); (2/3,2/3)}. The pure-strategy NE (1,0) and (0,1) are unstable; the mixed NE (2/3,2/3) is stable. The mixed NE corresponds to a population that is 2/3 Hawks and 1/3 Doves. Is any strategy ESS?

Hawk-Dove: an example

        H         D
H    -1, -1     4, 0
D     0, 4      2, 2

NE = {(1,0); (0,1); (2/3,2/3)}

A strategy, A, is ESS if, for all B: i) V(A/A) ≥ V(B/A), and ii) either V(A/A) > V(B/A) or V(A/B) > V(B/B). In other words, to be ESS, a strategy must be a NE with itself. Neither H nor D is ESS (for these payoffs): V(D/H) = 0 > V(H/H) = -1, and V(H/D) = 4 > V(D/D) = 2, so each fails condition i). What about the mixed NE strategy?

Hawk-Dove: an example Where M is the mixed strategy 2/3 Hawk, 1/3 Dove:
V(H/H) = -1
V(H/D) = 4
V(D/H) = 0
V(D/D) = 2
V(H/M) = 2/3 V(H/H) + 1/3 V(H/D) = 2/3 (-1) + 1/3 (4) = 2/3
V(M/H) = 2/3 V(H/H) + 1/3 V(D/H) = -2/3
V(D/M) = 2/3 V(D/H) + 1/3 V(D/D) = 2/3
V(M/D) = 2/3 V(H/D) + 1/3 V(D/D) = 10/3
V(M/M) = 4/9 V(H/H) + 2/9 V(H/D) + 2/9 V(D/H) + 1/9 V(D/D) = 2/3

Hawk-Dove: an example To be an ESS, M must satisfy, for all B: i) V(M/M) ≥ V(B/M), and ii) either V(M/M) > V(B/M) or V(M/B) > V(B/B).
i) holds with equality: V(M/M) = V(H/M) = V(D/M) = 2/3.
ii) holds through the second inequality: V(M/D) > V(D/D), since 10/3 > 2, and V(M/H) > V(H/H), since -2/3 > -1.
So neither H nor D can invade M.
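The two ESS conditions can be checked mechanically. The sketch below represents a strategy by its probability of playing Hawk and tests an incumbent against a finite grid of mutants. That grid is an assumption made for illustration: passing it is evidence, not a proof, that no mutant can invade. The function names are likewise hypothetical.

```python
from fractions import Fraction

# Payoffs to the row strategy, from the slides (v = 4, c = 6).
PAYOFF = {("H", "H"): -1, ("H", "D"): 4, ("D", "H"): 0, ("D", "D"): 2}

def V(p, q):
    """V(p/q): expected payoff to a strategy playing Hawk with probability p
    against a strategy playing Hawk with probability q."""
    return (p * q * PAYOFF[("H", "H")] + p * (1 - q) * PAYOFF[("H", "D")]
            + (1 - p) * q * PAYOFF[("D", "H")] + (1 - p) * (1 - q) * PAYOFF[("D", "D")])

def resists(a, b):
    """True if incumbent a satisfies both ESS conditions against mutant b."""
    if V(a, a) > V(b, a):
        return True
    return V(a, a) == V(b, a) and V(a, b) > V(b, b)

def is_ess_against(a, mutants):
    """Check the incumbent against every listed mutant other than itself."""
    return all(resists(a, b) for b in mutants if b != a)

H, D, M = Fraction(1), Fraction(0), Fraction(2, 3)
mutants = [Fraction(k, 12) for k in range(13)]  # assumed grid: 0, 1/12, ..., 1

for name, strategy in [("H", H), ("D", D), ("M", M)]:
    print(name, is_ess_against(strategy, mutants))
```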

Evolutionary Stability in IRPD? Evolutionarily Stable Strategy (ESS): A strategy is evolutionarily stable if it cannot be invaded by a mutant strategy (Maynard Smith & Price, 1973). Is D an ESS? Consider a mutant strategy called, e.g., SUSPICIOUS TIT FOR TAT (STFT). STFT defects on the first round, then plays like TFT.
i) V(D/D) ≥ V(STFT/D)? Yes, but only with equality: V(D/D) = V(STFT/D).
ii) V(D/D) > V(STFT/D) or V(D/STFT) > V(STFT/STFT)? No: V(D/D) = V(STFT/D) and V(D/STFT) = V(STFT/STFT).
Every pairing of D and STFT produces mutual defection in every round, so all four payoffs are equal. D and STFT are “neutral mutants.”
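The neutrality of D and STFT can be seen by playing the four pairings directly. The sketch below is illustrative: the payoff values (T=5, R=3, P=1, S=0), the fixed game length of 100 rounds, and the function names are assumptions, not part of the slides.

```python
# Assumed payoffs to the first player: T=5, R=3, P=1, S=0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def always_defect(own_moves, other_moves):
    return "D"

def suspicious_tft(own_moves, other_moves):
    # Defect on the first round, then copy the opponent's previous move.
    return other_moves[-1] if other_moves else "D"

def V(a, b, rounds=100):
    """V(a/b): average per-round payoff to strategy a against strategy b."""
    moves_a, moves_b = [], []
    total = 0
    for _ in range(rounds):
        move_a = a(moves_a, moves_b)
        move_b = b(moves_b, moves_a)
        moves_a.append(move_a)
        moves_b.append(move_b)
        total += PAYOFF[(move_a, move_b)]
    return total / rounds

D, STFT = always_defect, suspicious_tft
print(V(D, D), V(STFT, D), V(D, STFT), V(STFT, STFT))
```

All four values equal the mutual-defection payoff P, so neither ESS inequality can hold strictly.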

Evolutionary Stability in IRPD? • Axelrod & Hamilton (1981) demonstrated that D is not an ESS, opening the way to subsequent tournament studies of the game. • This is a sort-of Folk Theorem for evolutionary games: In the one-shot Prisoner’s Dilemma, DEFECT is strictly dominant. But in the repeated game, ALWAYS DEFECT (D) can be invaded by a mutant strategy, e.g., SUSPICIOUS TIT FOR TAT (STFT). • Many cooperative strategies do better than D, thus they can gain a foothold and grow as a share of the population. • Depending on the initial population, the equilibrium reached can exhibit any amount of cooperation. • Is STFT an ESS?

Evolutionary Stability in IRPD? It can be shown that there is no ESS in IRPD (Boyd & Lorberbaum, 1987; Lorberbaum, 1994). There can be stable polymorphisms among neutral mutants, whose realized behaviors are indistinguishable from one another. (This is the case, for example, of a population of C and TFT.)
Noise: If the system is perturbed by “noise,” these behaviors become distinct and differences in their reproductive success rates are amplified. As a result, interest has shifted from the proof of the existence of a solution to the design of repeated game strategies that perform well against other sophisticated strategies.

Replicator Dynamics • Consider a population of strategies competing over a niche that can only maintain a fixed number of individuals, i.e., the population’s size is upwardly bounded by the system’s carrying capacity. • In each generation, each strategy is matched against every other, itself, & RANDOM in pairwise games. • Between generations, the strategies reproduce, where the chance of successful reproduction (“fitness”) is determined by the payoffs (i.e., payoffs play the role of reproductive rates). • Then, strategies that do better than average will grow as a share of the population and those that do worse than average will eventually die out.

Replicator Dynamics There is a very simple way to describe this process. Let: x(A) = the proportion of the population using strategy A in a given generation; V(A) = strategy A’s tournament score; $\bar{V}$ = the population’s average score. Then A’s population share in the next generation is:
$$x'(A) = x(A)\,\frac{V(A)}{\bar{V}}$$
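One generation of this update is a one-line computation. The sketch below assumes the population's average score is the share-weighted average of the individual scores and that scores are positive; the function name and the example numbers are illustrative only.

```python
def replicator_step(shares, scores):
    """Return next-generation shares: x'(A) = x(A) * V(A) / average score."""
    # Assumption: the population average weights each score by its share.
    average = sum(shares[s] * scores[s] for s in shares)
    if average <= 0:
        raise ValueError("this discrete form needs a positive average score")
    return {s: shares[s] * scores[s] / average for s in shares}

# Hypothetical example: two strategies with equal shares and unequal scores.
shares = {"A": 0.5, "B": 0.5}
scores = {"A": 3.0, "B": 1.0}
next_shares = replicator_step(shares, scores)
print(next_shares)
```

The strategy scoring above the average grows, the one scoring below it shrinks, and the shares still sum to one.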

Replicator Dynamics For any finite set of strategies, the replicator dynamic will attain a fixed-point, where population shares do not change and all strategies are equally fit, i.e., V(A) = V(B), for all B. However, the dynamic described is population-specific. For instance, if the population consists entirely of naive cooperators (ALWAYS COOPERATE), then x(A) = x’(A) = 1, and the process is at a fixed-point. To be sure, the population is in equilibrium, but only in a very weak sense. For if a single D strategy were to “invade” the population, the system would be driven away from equilibrium, and C would be driven toward extinction.
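The weakness of the all-C fixed point can be illustrated numerically. The sketch below makes several simplifying assumptions that are not in the slides: only two strategies (C and D), each individual's score is its expected one-shot payoff against a randomly drawn member of the population, and the payoffs are T=5, R=3, P=1, S=0.

```python
T, R, P, S = 5, 3, 1, 0  # assumed Prisoner's Dilemma payoffs

def scores(shares):
    """Expected payoff of each type against the current population mix."""
    x_c, x_d = shares["C"], shares["D"]
    return {"C": x_c * R + x_d * S, "D": x_c * T + x_d * P}

def next_generation(shares):
    """One replicator update: share times own score over the average score."""
    v = scores(shares)
    average = sum(shares[s] * v[s] for s in shares)
    return {s: shares[s] * v[s] / average for s in shares}

def evolve(shares, generations):
    for _ in range(generations):
        shares = next_generation(shares)
    return shares

all_c = evolve({"C": 1.0, "D": 0.0}, 50)      # no invader: nothing changes
invaded = evolve({"C": 0.99, "D": 0.01}, 50)  # 1% invaders: C collapses
print(all_c, invaded)
```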

Simulating Evolution An evolutionary model includes three components: Reproduction + Selection + Variation. [Diagram labels: Population of Strategies; Selection Mechanism (Competition); Reproduction; Variation Mechanism (Mutation or Learning); Invasion]

The Trouble with TIT FOR TAT TIT FOR TAT is susceptible to 2 types of perturbations: Mutations: random Cs can invade TFT (TFT is not ESS), which in turn allows exploiters to gain a foothold. Noise: a “mistake” between a pair of TFTs induces CD, DC cycles (“mirroring” or “echo” effect). TIT FOR TAT never beats its opponent; it wins because it elicits reciprocal cooperation. It never exploits “naively” nice strategies. (See Poundstone: 242-248; Casti 76-84.)

The Trouble with TIT FOR TAT Noise in the form of random errors in implementing or perceiving an action is a common problem in real-world interactions. Such misunderstandings may lead “well-intentioned” cooperators into periods of alternating or mutual defection, resulting in lower tournament scores.
TFT: C C C C D C D ….
TFT: C C C D C D C ….
(The first D in the second row is the “mistake.”)
Avg payoff falls from R to (T+S)/2.
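The echo effect can be reproduced with two copies of TIT FOR TAT and one forced error. In the sketch below, the second player's move is flipped once, in round 4, matching the slide; the payoff values (T=5, R=3, P=1, S=0), the 10-round length, and the function names are assumptions for illustration.

```python
T, R, P, S = 5, 3, 1, 0  # assumed payoffs
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}

def tit_for_tat(own_last, other_last):
    # Cooperate first, then copy the opponent's previous move.
    return "C" if other_last is None else other_last

def play_with_one_error(strategy, rounds, error_round):
    """Two copies of `strategy` play; player 2's move is flipped in `error_round`."""
    row1, row2 = [], []
    last1 = last2 = None
    for t in range(1, rounds + 1):
        move1 = strategy(last1, last2)
        move2 = strategy(last2, last1)
        if t == error_round:
            move2 = "D" if move2 == "C" else "C"  # the "mistake"
        row1.append(move1)
        row2.append(move2)
        last1, last2 = move1, move2
    return "".join(row1), "".join(row2)

def average_payoff(own_moves, other_moves):
    return sum(PAYOFF[pair] for pair in zip(own_moves, other_moves)) / len(own_moves)

row1, row2 = play_with_one_error(tit_for_tat, rounds=10, error_round=4)
print(row1)
print(row2)
# Player 1's average payoff before the mistake and during the echo that follows.
print(average_payoff(row1[:3], row2[:3]), average_payoff(row1[4:], row2[4:]))
```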

The Trouble with TIT FOR TAT Nowak and Sigmund (1993) ran an extensive series of computer-based experiments and found the simple learning rule PAVLOV outperformed TIT FOR TAT in the presence of noise. PAVLOV (win-stay, lose-shift): Cooperate after both cooperated or both defected; otherwise defect.

The Trouble with TIT FOR TAT PAVLOV cannot be invaded by random C; PAVLOV is an exploiter (will “fleece a sucker” once it discovers no need to fear retaliation). A mistake between a pair of PAVLOVs causes only a single round of mutual defection followed by a return to mutual cooperation.
PAV: C C C C D C C
PAV: C C C D D C C
(The first D in the second row is the “mistake.”)
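PAVLOV's recovery can be traced the same way. The sketch below plays two PAVLOVs against each other and flips the second player's move once, in round 4, as on the slide; the function names and the 7-round length are illustrative assumptions.

```python
def pavlov(own_last, other_last):
    # Cooperate after both cooperated or both defected; otherwise defect.
    # In the first round both arguments are None, so PAVLOV opens with C.
    return "C" if own_last == other_last else "D"

def play_with_one_error(strategy, rounds, error_round):
    """Two copies of `strategy` play; player 2's move is flipped in `error_round`."""
    row1, row2 = [], []
    last1 = last2 = None
    for t in range(1, rounds + 1):
        move1 = strategy(last1, last2)
        move2 = strategy(last2, last1)
        if t == error_round:
            move2 = "D" if move2 == "C" else "C"  # the "mistake"
        row1.append(move1)
        row2.append(move2)
        last1, last2 = move1, move2
    return "".join(row1), "".join(row2)

row1, row2 = play_with_one_error(pavlov, rounds=7, error_round=4)
print(row1)
print(row2)
```

After the error, both players defect once in round 5 and are back to mutual cooperation from round 6.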

Simulating Evolution [Figure: population share (vertical axis, 0.020 to 0.140) against generations (horizontal axis, 0 to 800) for the tournament entries. Curves are numbered by each strategy's position after the 1st generation; TFT is number 1. Source: Axelrod 1984, p. 51.]

Simulating Evolution [Figure: population shares (vertical axis, 0.00 to 0.50) against generations for 6 RPD strategies (including RANDOM), with noise at 0.01 level. Curve labels: PAV, TFT, GRIM (TRIGGER), GTFT?, R, D, C.]

Bounded Rationality In the Repeated Prisoner’s Dilemma, it has been suggested that “uncooperative behavior is the result of ‘unbounded rationality’, i.e., the assumed availability of unlimited reasoning and computational resources to the players” (Papadimitriou, 1992: 122). If players are boundedly rational, on the other hand, the cooperative outcome may emerge as the result of a “muddling” process. They reason inductively and adapt (imitate or learn) locally superior strategies. Thus, not only is bounded rationality a more “realistic” approach, it may also solve some deep analytical problems, e.g., resolution of finite horizon paradoxes.

Tournament Assignment Design a strategy to play an Evolutionary Prisoner’s Dilemma Tournament. Entries will meet in a round robin tournament, with 1% noise (i.e., for each intended choice there is a 1% chance that the opposite choice will be implemented). Games will last at least 1000 repetitions (each generation), and after each generation, population shares will be adjusted according to the replicator dynamic, so that strategies that do better than average will grow as a share of the population whereas others will be driven to extinction. The winner or winners will be those strategies that survive after at least 10,000 generations.
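A scaled-down version of the tournament fits in a few lines. The sketch below is not the course's tournament code; it is an illustration under stated assumptions: payoffs T=5, R=3, P=1, S=0, three example entries restricted to memory-1 rules, players reacting to the moves as implemented (after noise), and a score table computed once and reused in every generation rather than replayed each time.

```python
import random

# Assumed payoffs to the first player (T=5, R=3, P=1, S=0).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Memory-1 rules: (own last move, opponent's last move) -> next move.
# Both arguments are None in the first round.
def tit_for_tat(own, other):
    return other or "C"

def pavlov(own, other):
    return "C" if own == other else "D"

def always_defect(own, other):
    return "D"

def implement(move, noise, rng):
    """Return the intended move, flipped with probability `noise`."""
    if rng.random() < noise:
        return "D" if move == "C" else "C"
    return move

def average_score(a, b, rounds, noise, rng):
    """Average per-round payoff to rule a in a noisy repeated game against b."""
    last_a = last_b = None
    total = 0
    for _ in range(rounds):
        move_a = implement(a(last_a, last_b), noise, rng)
        move_b = implement(b(last_b, last_a), noise, rng)
        total += PAYOFF[(move_a, move_b)]
        last_a, last_b = move_a, move_b
    return total / rounds

def score_table(entries, rounds=1000, noise=0.01, seed=0):
    """Round robin, including self-play. Each cell is a separately played match."""
    rng = random.Random(seed)
    return {i: {j: average_score(entries[i], entries[j], rounds, noise, rng)
                for j in entries} for i in entries}

def evolve(table, generations):
    """Start from equal shares and apply the replicator dynamic each generation."""
    shares = {name: 1 / len(table) for name in table}
    for _ in range(generations):
        fitness = {i: sum(shares[j] * table[i][j] for j in table) for i in table}
        average = sum(shares[i] * fitness[i] for i in table)
        shares = {i: shares[i] * fitness[i] / average for i in table}
    return shares

entries = {"TFT": tit_for_tat, "PAVLOV": pavlov, "D": always_defect}
table = score_table(entries)
shares = evolve(table, generations=100)
print(shares)
```

Which entries survive depends on the field of competitors, the noise level, and the random seed, so the printed shares say nothing general about these three rules.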

Designing Repeated Game Strategies Imagine a very simple decision making machine playing a repeated game. The machine has very little information at the start of the game: no knowledge of the payoffs or “priors” over the opponent’s behavior. It merely makes a choice, receives a payoff, then adapts its behavior, and so on. The machine, though very simple, is able to implement a strategy against any possible opponent, i.e., it “knows what to do” in any possible situation of the game.

Designing Repeated Game Strategies A repeated game strategy is a map from a history to an action. A history is all the actions in the game thus far:
… T-3 T-2 T-1 T0
C C C C D C C
C C C D D C D
History at time T0: next action = ?

Designing Repeated Game Strategies A repeated game strategy is a map from a history to an action. A history is all the actions in the game thus far, subject to the constraint of a finite memory:
… T-3 T-2 T-1 T0
C C C C D C C
C C C D D C C
History of memory-4: next action = ?

Designing Repeated Game Strategies TIT FOR TAT is a remarkably simple repeated game strategy. It merely requires recall of what happened in the last round (memory-1).
… T-3 T-2 T-1 T0
C C C C D D C
C C C D D C D
History of memory-1: next action = ?
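A memory-1 strategy can be written literally as a lookup table from the last round's outcome to the next action. The sketch below encodes TIT FOR TAT this way and replays it against the opponent's moves C C C D D C D. The table layout and function names are assumptions for illustration, and so is the reading of the slide's 14 moves as two rows of seven with the second row as the opponent.

```python
# Key: (own last move, opponent's last move); None marks the first round.
TIT_FOR_TAT = {
    None: "C",
    ("C", "C"): "C",
    ("C", "D"): "D",
    ("D", "C"): "C",
    ("D", "D"): "D",
}

def next_action(table, own_history, other_history):
    """Look up the next move using only the most recent round (memory-1)."""
    if not own_history:
        return table[None]
    return table[(own_history[-1], other_history[-1])]

def replay(table, other_moves):
    """Moves the table would have played, round by round, against `other_moves`."""
    own = []
    for t in range(len(other_moves)):
        own.append(next_action(table, own, other_moves[:t]))
    return "".join(own)

opponent = "CCCDDCD"
own = replay(TIT_FOR_TAT, opponent)
print(own)
print(next_action(TIT_FOR_TAT, own, opponent))  # the "?" on the slide
```

A memory-4 strategy would use the same idea with a key built from the last four rounds, at the cost of a much larger table.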

Finite Automata A FINITE AUTOMATON (FA) is a mathematical representation of a simple decision-making process. FA are completely described by: • A finite set of internal states • An initial state • An output function • A transition function The output function determines an action, C or D, in each state. The transition function determines how the FA changes states in response to the inputs it receives (e.g., actions of other FA). (Rubinstein, “Finite Automata Play the Repeated Prisoner’s Dilemma,” JET, 1986)

Finite Automata FA will implement a strategy against any possible opponent, i.e., they “know what to do” in any possible situation of the game. FA meet in 2-player repeated games and make a move in each round (either C or D). Depending upon the outcome of that round, they “decide” what to play on the next round, and so on. FA are very simple, have no knowledge of the payoffs or priors over the opponent’s behavior, and no deductive ability. They simply read and react to what happens. Nonetheless, they are capable of a crude form of “learning” — they receive payoffs that reinforce certain behaviors and “punish” others.
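The four components of an FA map directly onto a small data structure. The sketch below is one possible encoding, not Rubinstein's notation: the class name, the state labels "c" and "d", and the choice to key transitions on the opponent's action are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FiniteAutomaton:
    initial: str      # initial state
    output: dict      # output function: state -> "C" or "D"
    transition: dict  # transition function: (state, opponent's action) -> state

OUT = {"c": "C", "d": "D"}  # the set of states is the key set of the output function

TIT_FOR_TAT = FiniteAutomaton("c", OUT, {
    ("c", "C"): "c", ("c", "D"): "d", ("d", "C"): "c", ("d", "D"): "d"})
GRIM = FiniteAutomaton("c", OUT, {
    ("c", "C"): "c", ("c", "D"): "d", ("d", "C"): "d", ("d", "D"): "d"})
PAVLOV = FiniteAutomaton("c", OUT, {
    ("c", "C"): "c", ("c", "D"): "d", ("d", "C"): "d", ("d", "D"): "c"})
ALWAYS_DEFECT = FiniteAutomaton("d", {"d": "D"}, {
    ("d", "C"): "d", ("d", "D"): "d"})

def play(a, b, rounds):
    """Run two automata against each other; return their move sequences."""
    state_a, state_b = a.initial, b.initial
    moves_a, moves_b = [], []
    for _ in range(rounds):
        move_a, move_b = a.output[state_a], b.output[state_b]
        moves_a.append(move_a)
        moves_b.append(move_b)
        # Each automaton reads the other's action and changes state.
        state_a = a.transition[(state_a, move_b)]
        state_b = b.transition[(state_b, move_a)]
    return "".join(moves_a), "".join(moves_b)

print(play(TIT_FOR_TAT, ALWAYS_DEFECT, 4))
print(play(PAVLOV, ALWAYS_DEFECT, 4))
print(play(GRIM, TIT_FOR_TAT, 4))
```

Because every (state, input) pair has an entry, each automaton "knows what to do" against any opponent.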

Finite Automata [State diagram: “TIT FOR TAT”]

Finite Automata [State diagram: “TIT FOR TWO TATS”]

Finite Automata Some examples: [State diagrams, each with a START state: “ALWAYS DEFECT”, “TIT FOR TAT”, “GRIM (TRIGGER)”, “PAVLOV”, “M5”]
