Unit III: The Evolution of Cooperation
Topics: Can Selfishness Save the Environment?; Repeated Games: the Folk Theorem; Evolutionary Games; A Tournament; How to Promote Cooperation; Advice to Participants/Reformers; The Logic of Collective Action
Lecture dates: 7/28, 4/14, 7/31
Based on the success of TFT in his tournaments, Axelrod offers two sets of recommendations to promote cooperation (1984, pp. 199-244):
Advice to Participants - How to do well as a player in the game; the ingredients of a strategy that will do well in IRPD.
Advice to Reformers - How to change the rules of the game to promote cooperation; changing the rules changes the players' payoffs and hence the game.
How to Choose Effectively
(Axelrod, 1984, pp. 109-123.)
These are intended as the ingredients of a strategy that will, in the long range and against a wide range of opponents, advance the player’s interests.
Sucker the Simple?
Recall that while TIT FOR TAT never beats its opponent, PAVLOV always defects against a naïve cooperator. Hence, the success of PAVLOV in newer tournaments may suggest it is wise to exploit the weak, both
(i) for “egoistic” benefit; and
(ii) to increase the overall fitness of the population.
Either the simple will learn (not to let themselves be exploited), or they will be winnowed.
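The claim about PAVLOV can be illustrated with a small sketch (the payoff values T=5, R=3, P=1, S=0 are assumed here, not taken from the slides): under the win-stay/lose-shift rule, a single defection against an unconditional cooperator earns T, a "good" payoff, so PAVLOV locks into defection.

```python
# Sketch: why PAVLOV exploits a naive cooperator. PAVLOV (win-stay/
# lose-shift) repeats its last move after a "good" payoff (T or R) and
# switches after a "bad" one (P or S). Against ALL-C, one defection
# earns T, so PAVLOV keeps defecting forever.

T, R, P, S = 5, 3, 1, 0  # assumed standard PD payoffs

def payoff(me, other):
    return {('C', 'C'): R, ('C', 'D'): S,
            ('D', 'C'): T, ('D', 'D'): P}[(me, other)]

def pavlov_next(my_last, my_payoff):
    """Win-stay/lose-shift: keep the move if it paid T or R, else switch."""
    return my_last if my_payoff in (T, R) else ('C' if my_last == 'D' else 'D')

# PAVLOV's first move is forced to D (say, by noise); ALL-C always plays C.
move, history = 'D', []
for _ in range(10):
    history.append(move)
    move = pavlov_next(move, payoff(move, 'C'))

print(history)  # PAVLOV never returns to cooperation
```

Each defection against ALL-C pays T, which the rule counts as a success, so the lock-in is permanent until the cooperator learns or is winnowed.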
Axelrod offers five concrete suggestions on how “the strategic setting itself can be transformed in order to promote cooperation among the players” (1984, pp. 124-141):
Repeated interactions provide the conditions necessary for cooperation by transforming the nature of the interaction in two ways:
[Figure: the discount parameter d on an axis from 0 to 1, with the cooperation threshold d* = (T-R)/(T-P) marked.]
Schelling’s “Errant Economics”
“The Intimate Contest for Self-Command” (1984: 57-82)
“The Mind as a Consuming Organ” (328-46)
The standard model of rational economic man is:
Schelling’s views are not merely critical (negative); his concerns foreshadow much current research on improving the standard model:
Experiments in “behavioral economics” have shown people routinely do not behave the way the standard model predicts:
(Camerer, 1997)
Game theory usually assumes “unbounded,” perfect, or “Olympian” rationality (Simon, 1983). Players:
But observation and experimentation with human subjects tell us that people don’t actually make decisions this way. A more realistic approach would make more modest assumptions: bounded rationality.
Game theory usually assumes players are deductively rational. Starting from certain givens (sets of actions, information, payoffs), they arrive at a choice that maximizes expected utility.
Deductive rationality assumes a high degree of constancy in the decision-makers’ environment. They may have complete or incomplete information, but they are able to form probability distributions over all possible states of the world, and these underlying distributions are themselves stable.
But in more complex environments, the traditional assumptions break down. Every time a decision is made the environment changes, sometimes in unpredictable ways, and every new decision is made in a new environment (S. Smale).
In more complicated environments, the computational requirements to deduce a solution quickly swamp the capacity of any human reasoning. Chess appears to be well beyond the ability of humans to fulfill the requirements of traditional deductive reasoning.
In today’s “fast” economy a more dynamic theory is needed. The long-run position of the economy may be affected by our predictions!
“On Learning and Adaptation in the Economy,” Arthur, 1992, p. 5
There is a peculiar form of regress which characterizes reasoning about someone else’s reasoning, which in turn, is based on assumptions about one's own reasoning, a point repeatedly stressed by Schelling (1960). In some types of games this process comes to an end in a finite number of steps . . . . Reflexive reasoning, . . . ‘folds in on itself,’ as it were, and so is not a finite process. In particular when one makes an assumption in the process of reasoning about strategies, one ‘plugs in’ this very assumption into the ‘data.’ In this way the possibilities may never be exhausted in a sequential examination. Under these circumstances it is not surprising that the purely deductive mode of reasoning becomes inadequate when the reasoners themselves are the objects of reasoning.
(Rapoport, 1966, p. 143)
In the Repeated Prisoner’s Dilemma, it has been suggested that “uncooperative behavior is the result of ‘unbounded rationality’, i.e., the assumed availability of unlimited reasoning and computational resources to the players” (Papadimitriou, 1992: 122). If players are boundedly rational, on the other hand, the cooperative outcome may emerge as the result of a “muddling” process. They reason inductively and adapt (imitate or learn) locally superior strategies.
Thus, not only is bounded rationality a more “realistic” approach, it may also solve some deep analytical problems, e.g., resolution of finite horizon paradoxes.
Learning to Cooperate
[Figure: the feasible payoff set of the Prisoner’s Dilemma, the convex hull of the points (S,T), (R,R), (P,P), and (T,S). The shaded area is the set of SPNE payoffs. The segment from (P,P) to (R,R) is the set of “collectively stable” strategies, for d > d*.]
We have seen that whereas cooperation is irrational in a one-shot Prisoner’s Dilemma, it may be rational (i.e., achieved in a SPNE), if the game is repeated and “the shadow of the future” is sufficiently large:
d > (T-R)/(T-P) (i)
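Condition (i) is easy to verify numerically; a minimal sketch, assuming the standard payoff values T=5, R=3, P=1 (not specified at this point in the slides):

```python
# Grim-trigger cooperation is stable exactly when the discount factor d
# exceeds (T-R)/(T-P): the one-period gain from defecting must be less
# than the discounted per-period loss (R-P) from punishment.
T, R, P = 5, 3, 1
d_star = (T - R) / (T - P)          # threshold; 0.5 with these payoffs

def cooperation_stable(d):
    gain = T - R                    # one-shot gain from defecting
    loss = d * (R - P) / (1 - d)    # lose R-P in every future period
    return gain < loss

print(d_star)                       # 0.5
print(cooperation_stable(0.6))      # True: shadow of the future is large
print(cooperation_stable(0.4))      # False: too impatient to cooperate
```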
Repeated interaction is a necessary but not a sufficient condition for cooperation. In addition, players must have reason to believe the other will reciprocate.
This involves judging intentions, considerations of fairness, (mis)communication, trust, deception, etc.
Learning to Cooperate
Consider two fishermen deciding how many fish to remove from a commonly owned pond. There are Y fish in the pond.
Each fisherman’s best response is to take half of what the other leaves: c1 = (Y – c2)/2 and c2 = (Y – c1)/2.
NE: c1 = c2 = Y/3
Social Optimum: c1 = c2 = Y/4
[Figure: the two reaction curves in (c1, c2) space, crossing at the NE (Y/3, Y/3), with the social optimum (Y/4, Y/4) marked.]
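The Nash equilibrium can be recovered by iterating the best-response functions from any starting point; a minimal sketch (Y = 12, the value used later in the slides):

```python
# Each fisherman best-responds with c_i = (Y - c_j)/2. Iterating the two
# best responses converges to the NE c1 = c2 = Y/3 (the error halves each
# round), which exceeds the social optimum of Y/4 each.
Y = 12  # fish in the pond

def best_response(c_other):
    return (Y - c_other) / 2

c1, c2 = 0.0, 0.0
for _ in range(50):  # simultaneous best-response updating
    c1, c2 = best_response(c2), best_response(c1)

print(round(c1, 6), round(c2, 6))  # both approach Y/3 = 4.0
print(Y / 4)                       # social optimum: 3.0 each
```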
If there are 12 fish in the pond, each will consume (Y/3) 4 in the spring and 2 in the fall in a NE. Both would be better off consuming (Y/4) 3 in the spring, leaving 3 for each in the fall.
A Prisoner’s Dilemma

          C          D
C       9, 9      7.5, 10
D     10, 7.5      8, 8

What would happen if the game were repeated?
Imagine the fishermen make the following deal: Each will Cooperate (consume only 3) in the spring as long as the other does likewise; as soon as one Defects, the other will Defect for ever, i.e., they adopt trigger strategies.
This deal will be stable if the threat of future punishment makes both unwilling to Defect, i.e., if the one period gain from Defect is not greater than the discounted future loss due to the punishment:
(T – R) < (dR/(1-d) – dP/(1-d)) (ii)
Imagine the fishermen make the following deal:
Promise: I’ll consume 3 in the spring, if you do.
Threat: I’ll consume 4, forever, if you deviate.
Cooperate: 9 … 9 … 9 … 9 … 9 … 9 … 9 … 9 … = 9/(1-d)
Deviate: 9 … 9 … 9 … 9 … 10 … 8 … 8 … 8 …
If d is sufficiently high, the threat will be credible, and the pair of trigger strategies is a Nash equilibrium.
Trigger Strategy:
Current gain from deviation = 10 – 9 = 1
Future gain from cooperation = d·9/(1-d) – d·8/(1-d) = d/(1-d)
Cooperation is sustained when 1 < d/(1-d), i.e., when d > d* = 0.5.
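The two payoff streams can be compared directly; a sketch using only the slide's payoffs (R=9, T=10, P=8):

```python
# Compare cooperating forever (9, 9, 9, ...) against deviating once and
# being punished (10, then 8 forever), at several discount factors d.
R, T, P = 9, 10, 8

def pv(head, tail, d):
    """Present value of a stream: a finite head, then `tail` forever."""
    total, disc = 0.0, 1.0
    for x in head:
        total += disc * x
        disc *= d
    return total + disc * tail / (1 - d)  # geometric tail

for d in (0.4, 0.5, 0.6):
    cooperate = pv([], R, d)       # 9, 9, 9, ...
    deviate = pv([T], P, d)        # 10 once, then 8 forever
    print(d, cooperate > deviate)  # True only when d > d* = 0.5
```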
In general payoff terms:
Cooperate: R … R … R … R … R … R … R … R … = R/(1-d)
Deviate: R … R … R … R … T … P … P … P …
If d is sufficiently high, the threat will be credible, and the pair of trigger strategies is a Nash equilibrium.
Trigger Strategy:
Current gain from deviation = T – R
Future gain from cooperation = dR/(1-d) – dP/(1-d)
Cooperation is sustained when T – R < dR/(1-d) – dP/(1-d), i.e., when d > d* = (T-R)/(T-P).
Imagine there are many fishermen, each of whom can adopt D(efect), C(ooperate), or T(rigger). In every generation, each fisherman plays against every other. After each generation, those that did poorly can switch to imitate those that did better. Eventually, C will die out, and the population will be dominated by either D or T, depending on the discount parameter.
Noise (miscommunication) can also affect the outcome.
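This population story can be sketched in a few lines (assumed payoffs T=5, R=3, P=1, S=0; d read as a continuation probability; grim-trigger play for T; replicator updating for the shares):

```python
# ALL-C, ALL-D, and TRIGGER meet pairwise in a repeated PD; population
# shares evolve by the replicator rule (shares grow in proportion to
# fitness relative to the population average).
T, R, P, S = 5, 3, 1, 0  # assumed standard PD payoffs

def repeated_payoff(row, col, d):
    """Expected discounted payoff to `row` against `col` ('C','D','T')."""
    if row == 'D':
        if col == 'D':
            return P / (1 - d)
        if col == 'C':
            return T / (1 - d)
        return T + d * P / (1 - d)   # exploit TRIGGER once, then mutual D
    if col in ('C', 'T'):            # ALL-C and TRIGGER cooperate together
        return R / (1 - d)
    # ...and are suckered by ALL-D (TRIGGER only in the first round)
    return S / (1 - d) if row == 'C' else S + d * P / (1 - d)

def evolve(d, generations=1000):
    strategies = ['C', 'D', 'T']
    shares = {s: 1 / 3 for s in strategies}
    for _ in range(generations):
        fitness = {s: sum(shares[o] * repeated_payoff(s, o, d)
                          for o in strategies) for s in strategies}
        avg = sum(shares[s] * fitness[s] for s in strategies)
        shares = {s: shares[s] * fitness[s] / avg for s in strategies}
    return max(shares, key=shares.get)

print(evolve(0.9))  # high d: TRIGGER comes to dominate
print(evolve(0.3))  # low d: ALL-D comes to dominate
```

With a high discount parameter, TRIGGER feeds on mutual cooperation while punishing defectors, so C is winnowed first and then D; with a low discount parameter the punishment is too weak and D takes over, as the text describes.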
Design a strategy to play an Evolutionary Prisoner’s Dilemma Tournament.
Entries will meet in a round robin tournament, with 1% noise (i.e., for each intended choice there is a 1% chance that the opposite choice will be implemented). Games will last at least 1000 repetitions (each generation), and after each generation, population shares will be adjusted according to the replicator dynamic, so that strategies that do better than average will grow as a share of the population whereas others will be driven to extinction. The winner or winners will be those strategies that survive after at least 10,000 generations.
To design your strategy, access the programs through your fas Unix account. The Finite Automaton Creation Tool (fa) will prompt you to create a finite automaton to implement your strategy. Select the number of internal states, designate the initial state, and define the output and transition functions, which together determine how an automaton “behaves.” The program also allows you to specify probabilistic output and transition functions. Simple probabilistic strategies such as GENEROUS TIT FOR TAT have been shown to perform particularly well in noisy environments, because they avoid costly sequences of alternating defections that undermine sustained cooperation.
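The idea of a strategy as a finite automaton can be sketched as plain data (a hypothetical layout chosen for illustration, not the actual format used by the `fa` tool), here for TIT FOR TAT:

```python
# TIT FOR TAT as a two-state finite automaton: each state plays its own
# name as output, and the transition simply copies the opponent's last move.
TIT_FOR_TAT = {
    'initial': 'C',
    'output': {'C': 'C', 'D': 'D'},
    'transition': {                  # (state, opponent's move) -> next state
        ('C', 'C'): 'C', ('C', 'D'): 'D',
        ('D', 'C'): 'C', ('D', 'D'): 'D',
    },
}

def play(automaton, opponent_moves):
    """Run the automaton against a fixed sequence of opponent moves."""
    state, moves = automaton['initial'], []
    for opp in opponent_moves:
        moves.append(automaton['output'][state])
        state = automaton['transition'][(state, opp)]
    return moves

print(play(TIT_FOR_TAT, ['C', 'D', 'C', 'D']))  # ['C', 'C', 'D', 'C']
```

A probabilistic strategy like GENEROUS TIT FOR TAT would replace the deterministic transition entries with probability distributions over next states.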
[Tournament results (population shares, avg. score ×10): after 5000 generations (as of 4/25/02); after 5000 generations (10pm 4/27/02); after 20000 generations (7am 4/28/02); after 600 generations (4/22/05); after 1000 generations (4/27/05).]