
Unit III: The Evolution of Cooperation



- Can Selfishness Save the Environment?
- Repeated Games: the Folk Theorem
- Evolutionary Games
- A Tournament
- How to Promote Cooperation


- Advice to Participants/Reformers
- The Logic of Collective Action
- Changing the Rules of the Game
- The Problem of Trust
- Limits of Strategic Rationality
- Tournament Update

Based on the success of TFT in his tournaments, Axelrod offers two sets of recommendations to promote cooperation (1984, pp. 199-244):

Advice to Participants - How to do well as a player in the game; the ingredients of a strategy that will do well in IRPD.

Advice to Reformers - How to change the rules of the game to promote cooperation; changing the rules changes the players' payoffs and hence the game.

How to Choose Effectively

(Axelrod, 1984, pp. 109-123.)

- Don’t be envious
- Don’t be the first to defect
- Reciprocate both cooperation and defection
- Don’t be too clever

These are intended as the ingredients of a strategy that will, in the long run and against a wide range of opponents, advance the player’s interests.

- Nice: Never be the first to defect. A nice strategy signals a willingness to cooperate and may induce reciprocal cooperation. Nice strategies did best in Axelrod’s tournaments.

- Forgiving: Reciprocate cooperation. Triggers may be susceptible to misunderstandings, mistakes, etc., that can lead otherwise cooperative players into spirals of alternating or mutual defection.

Sucker the Simple?

Recall that while TIT FOR TAT never beats its opponent, PAVLOV, once it defects against a naïve cooperator (e.g., after a mistake), keeps on defecting. Hence, the success of PAVLOV in newer tournaments may suggest it is wise to exploit the weak, both

(i) for “egoistic” benefit; and

(ii) to increase the overall fitness of the population.

Either the simple will learn (not to let themselves be exploited), or they will be winnowed.
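This exploitation dynamic can be sketched in a few lines of Python. This is a toy illustration, not the course software: the payoffs T=5, R=3, P=1, S=0 and the win-stay/lose-shift formulation of PAVLOV are standard textbook assumptions.

```python
# PAVLOV (win-stay/lose-shift) against an unconditional cooperator.
# Assumed standard PD payoffs: T=5, R=3, P=1, S=0.
T, R, P, S = 5, 3, 1, 0

def payoff(me, other):
    # my payoff given both moves ('C' or 'D')
    return {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}[(me, other)]

def pavlov_next(last_move, last_payoff):
    # Win-stay (payoff T or R): repeat; lose-shift (payoff P or S): switch.
    return last_move if last_payoff in (T, R) else ('D' if last_move == 'C' else 'C')

moves = []
me = 'C'
for t in range(10):
    if t == 3:          # inject a single noisy defection
        me = 'D'
    moves.append(me)
    me = pavlov_next(me, payoff(me, 'C'))   # opponent always cooperates

print(''.join(moves))   # after the slip in round 3, PAVLOV defects for good
```

The slip earns T (a "win"), so win-stay locks PAVLOV into defection: the naïve cooperator is exploited forever, exactly the "sucker the simple" behavior described above.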

Axelrod offers five concrete suggestions on how “the strategic setting itself can be transformed in order to promote cooperation among the players” (1984, pp. 124-141):

- Enlarge the “shadow of the future”
- Change the payoffs
- Teach people to care about each other
- Teach reciprocity
- Improve recognition abilities

Repeated interactions provide the conditions necessary for cooperation by transforming the nature of the interaction in two ways:

- “Enlarge the shadow of the future”
- Increase the amount of information in the system. This may reduce strategic uncertainty (e) and allow players to coordinate their expectations and behavior on mutually beneficial outcomes.

[Figure: the critical discount factor d* = (T – R)/(T – P) plotted against strategic uncertainty e, with d on the vertical axis (up to 1) and a threshold level e0 marked.]

- Behavioral Game Theory
- Learning to Cooperate
- Summary and Conclusions

- The Limits of Homo Economicus
- Bounded Rationality
- Learning to Cooperate
- Tournament Update

Schelling’s “Errant Economics”

“The Intimate Contest for Self-Command” (1984: 57-82)

“The Mind as a Consuming Organ” (328-46)

The standard model of rational economic man:

- is too simple
- assumes time-consistent preferences
- is susceptible to self-deception and ‘sour grapes’
- is overly consequentialist
- ignores ‘labelling’ and ‘framing’ effects


Schelling’s views are not merely critical (negative); his concerns foreshadow much current research on improving the standard model:

- Behavioral economics/cognitive psychology
- Artificial Intelligence
- Learning models: Inductive reasoning

Experiments in “behavioral economics” have shown people routinely do not behave the way the standard model predicts:

- reject profitable bargains they think are unfair
- do not take full advantage of others when they can
- punish others even when costly to themselves
- contribute substantially to public goods
- behave irrationally when they expect others to behave even more irrationally
(Camerer, 1997)

Game theory usually assumes “unbounded,” perfect, or “Olympian” rationality (Simon, 1983). Players:

- have unlimited memory and computational resources.
- solve complex, interdependent maximization problems – instantaneously! – subject only to the constraint that the other player is also trying to maximize.

But observation and experimentation with human subjects tell us that people don’t actually make decisions this way. A more realistic approach would make more modest assumptions: bounded rationality.

Game theory usually assumes players are deductively rational. Starting from certain givens (sets of actions, information, payoffs), they arrive at a choice that maximizes expected utility.

Deductive rationality assumes a high degree of constancy in the decision-makers’ environment. They may have complete or incomplete information, but they are able to form probability distributions over all possible states of the world, and these underlying distributions are themselves stable.

But in more complex environments, the traditional assumptions break down. Every time a decision is made the environment changes, sometimes in unpredictable ways, and every new decision is made in a new environment (S. Smale).

In more complicated environments, the computational requirements of deducing a solution quickly swamp the capacity of human reasoning. Chess, for example, appears to be well beyond the ability of humans to satisfy the requirements of traditional deductive reasoning.

In today’s “fast” economy a more dynamic theory is needed. The long-run position of the economy may be affected by our predictions!

“On Learning and Adaptation in the Economy,” Arthur, 1992, p. 5

The standard model of Homo Economicus breaks down for two reasons:

- (i) human decision making is limited by finite memory and computational resources.
- (ii) thinking about others’ thinking involves forming subjective beliefs, and subjective beliefs about subjective beliefs, and so on.

There is a peculiar form of regress which characterizes reasoning about someone else’s reasoning, which in turn, is based on assumptions about one's own reasoning, a point repeatedly stressed by Schelling (1960). In some types of games this process comes to an end in a finite number of steps . . . . Reflexive reasoning, . . . ‘folds in on itself,’ as it were, and so is not a finite process. In particular when one makes an assumption in the process of reasoning about strategies, one ‘plugs in’ this very assumption into the ‘data.’ In this way the possibilities may never be exhausted in a sequential examination. Under these circumstances it is not surprising that the purely deductive mode of reasoning becomes inadequate when the reasoners themselves are the objects of reasoning.

(Rapoport, 1966, p. 143)

In the Repeated Prisoner’s Dilemma, it has been suggested that “uncooperative behavior is the result of ‘unbounded rationality’, i.e., the assumed availability of unlimited reasoning and computational resources to the players” (Papadimitriou, 1992: 122). If players are boundedly rational, on the other hand, the cooperative outcome may emerge as the result of a “muddling” process: they reason inductively and adapt (imitate or learn) locally superior strategies.

Thus, not only is bounded rationality a more “realistic” approach, it may also solve some deep analytical problems, e.g., resolution of finite horizon paradoxes.

Learning to Cooperate

The shaded area is the set of SPNE payoffs. The segment from (P,P) to (R,R) is the set of “collectively stable” strategies, for d > d*.

[Figure: the feasible payoff region with vertices (S,T), (R,R), (T,S), and (P,P).]

We have seen that whereas cooperation is irrational in a one-shot Prisoner’s Dilemma, it may be rational (i.e., achieved in a SPNE), if the game is repeated and “the shadow of the future” is sufficiently large:

d > (T-R)/(T-P) (i)

Repeated interaction is a necessary but not a sufficient condition for cooperation. In addition, players must have reason to believe the other will reciprocate.

This involves judging intentions, considerations of fairness, (mis)communication, trust, deception, etc.

Learning to Cooperate

Consider two fishermen deciding how many fish to remove from a commonly owned pond. There are Y fish in the pond.

- Period 1: each fisherman chooses how much to consume (c1, c2).
- Period 2: the remaining fish are equally divided, (Y – (c1 + c2))/2 each.

Best response: c1 = (Y – c2)/2

NE: c1 = c2 = Y/3

Social Optimum: c1 = c2 = Y/4

[Figure: best-response lines c1 = (Y – c2)/2 and c2 = (Y – c1)/2 in (c1, c2) space, crossing at the NE (Y/3, Y/3); the social optimum is at (Y/4, Y/4).]
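The best responses and equilibria above can be checked numerically. The sketch below assumes log utility over the two periods, u_i = ln(c_i) + ln((Y – c1 – c2)/2) - an assumption, chosen because it reproduces the best-response line c1 = (Y – c2)/2; the grid search is illustrative, not the course's method.

```python
# Verify the fishermen's NE and social optimum by grid search,
# assuming log utility: u_i = ln(c_i) + ln((Y - c1 - c2)/2).
import math

Y = 12.0
grid = [i / 1000 * Y for i in range(1, 999)]   # candidate consumption levels

def u(c_me, c_other):
    left = (Y - c_me - c_other) / 2            # my period-2 share
    return -math.inf if left <= 0 else math.log(c_me) + math.log(left)

def best_response(c_other):
    return max(grid, key=lambda c: u(c, c_other))

# Nash equilibrium: iterate best responses to a fixed point.
c = Y / 2
for _ in range(50):
    c = best_response(c)
print(round(c, 1))          # ~ Y/3 = 4.0

# Social optimum: maximize the sum of utilities at symmetric consumption.
c_opt = max(grid, key=lambda c: 2 * u(c, c))
print(round(c_opt, 1))      # ~ Y/4 = 3.0
```

The iteration converges because the best-response map c → (Y – c)/2 is a contraction; the social optimum is lower because each fisherman's period-1 consumption imposes an externality on the other's period-2 share.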

Learning to Cooperate


If there are 12 fish in the pond, in the NE each will consume (Y/3 =) 4 in the spring and 2 in the fall. Both would be better off consuming (Y/4 =) 3 in the spring, leaving 3 for each in the fall.


Learning to Cooperate


A Prisoner’s Dilemma (row payoff, column payoff):

          C          D
   C    9, 9      7.5, 10
   D   10, 7.5     8, 8

What would happen if the game were repeated?

Imagine the fishermen make the following deal: Each will Cooperate (consume only 3) in the spring as long as the other does likewise; as soon as one Defects, the other will Defect forever, i.e., they adopt trigger strategies.

This deal will be stable if the threat of future punishment makes both unwilling to Defect, i.e., if the one period gain from Defect is not greater than the discounted future loss due to the punishment:

(T – R) < (dR/(1-d) – dP/(1-d)) (ii)

Imagine the fishermen make the following deal:

Promise: I’ll consume 3 in the spring, if you do.

Threat: I’ll consume 4, forever, if you deviate.

Cooperate: 9, 9, 9, 9, 9, 9, … = 9/(1-d)

Deviate: 9, 9, 9, 9, 10, 8, 8, 8, …

If d is sufficiently high, the threat will be credible, and the pair of trigger strategies is a Nash equilibrium.

d* = 0.5

Trigger Strategy

Current gain from deviation =

10 – 9 = 1

Future gain from cooperation =

9d/(1-d) – 8d/(1-d) = d/(1-d)
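With these payoffs (T = 10, R = 9, P = 8), the threshold d* = 0.5 can be verified directly. This is just a quick check of the algebra, not part of the original materials:

```python
# Trigger-strategy stability check for the fishing game payoffs.
# Deviating pays (T - R) once; cooperation is worth d(R - P)/(1 - d) more
# than punishment forever after, so the deal is stable iff
# (T - R) <= d(R - P)/(1 - d), i.e. d >= (T - R)/(T - P).
T, R, P = 10.0, 9.0, 8.0

d_star = (T - R) / (T - P)
print(d_star)                          # 0.5

def cooperate_is_stable(d):
    return (T - R) <= d * (R - P) / (1 - d)

print(cooperate_is_stable(0.6), cooperate_is_stable(0.4))   # True False
```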

Imagine the fishermen make the following deal:

Promise: I’ll consume 3 in the spring, if you do.

Threat: I’ll consume 4, forever, if you deviate.

Cooperate: R, R, R, R, R, … = R/(1-d)

Deviate: R, R, R, R, T, P, P, P, …

If d is sufficiently high, the threat will be credible, and the pair of trigger strategies is a Nash equilibrium.

d* = (T-R)/(T-P)

Trigger Strategy

Current gain from deviation =

(T – R)

Future gain from cooperation =

dR/(1-d) – dP/(1-d) = d(R-P)/(1-d)

As before, the deal is stable if the threat of future punishment makes both unwilling to Defect, i.e., if the one-period gain from Defecting is not greater than the discounted future loss due to the punishment:

(T – R) < dR/(1-d) – dP/(1-d)

or, rearranging, d > (T-R)/(T-P)

Imagine there are many fishermen, each of whom can adopt either D(efect), C(ooperate), or T(rigger). In every generation, each fisherman plays against every other. After each generation, those that did poorly can switch to imitate those that did better. Eventually, C will die out, and the population will be dominated by either D or T, depending on the discount parameter.

Noise (miscommunication) can also affect the outcome.

Design a strategy to play in an Evolutionary Prisoner’s Dilemma Tournament.

Entries will meet in a round robin tournament, with 1% noise (i.e., for each intended choice there is a 1% chance that the opposite choice will be implemented). Games will last at least 1000 repetitions (each generation), and after each generation, population shares will be adjusted according to the replicator dynamic, so that strategies that do better than average will grow as a share of the population whereas others will be driven to extinction. The winner or winners will be those strategies that survive after at least 10,000 generations.
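The population dynamic described above can be sketched as follows. This is a toy illustration, not the tournament software: it assumes standard payoffs T=5, R=3, P=1, S=0, noiseless play, and discounted-average payoffs for each pairing (e.g., ALL-D facing TRIGGER earns T once and P thereafter).

```python
# Replicator-dynamic sketch with three strategies: ALL-C, ALL-D, TRIGGER.
# Assumed payoffs T=5, R=3, P=1, S=0; pairings use discounted-average values.
T, R, P, S = 5.0, 3.0, 1.0, 0.0

def payoff(row, col, d):
    """Expected (discounted-average) payoff of strategy `row` against `col`."""
    if row == 'C':
        return S if col == 'D' else R          # exploited by D; cooperates with C, T
    if row == 'D':
        if col == 'C':
            return T                           # exploit the naive cooperator forever
        if col == 'D':
            return P
        return (1 - d) * T + d * P             # one T against TRIGGER, then punished
    # row == 'T' (trigger)
    if col == 'D':
        return (1 - d) * S + d * P
    return R

def evolve(d, generations=1000):
    shares = {'C': 1/3, 'D': 1/3, 'T': 1/3}
    for _ in range(generations):
        fit = {s: sum(shares[o] * payoff(s, o, d) for o in shares) for s in shares}
        avg = sum(shares[s] * fit[s] for s in shares)
        # replicator dynamic: above-average strategies grow, others shrink
        shares = {s: shares[s] * fit[s] / avg for s in shares}
    return shares

low, high = evolve(d=0.2), evolve(d=0.9)
print({s: round(x, 3) for s, x in low.items()})    # low d: ALL-D takes over
print({s: round(x, 3) for s, x in high.items()})   # high d: TRIGGER (plus residual C) wins
```

As the text says, the winner depends on the discount parameter: at d = 0.2 defection dominates, while at d = 0.9 trigger strategies drive ALL-D extinct (a residue of naïve cooperators can survive in TRIGGER's shadow once the defectors are gone).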

To design your strategy, access the programs through your fas Unix account. The Finite Automaton Creation Tool (fa) will prompt you to create a finite automaton to implement your strategy. Select the number of internal states, designate the initial state, and define the output and transition functions, which together determine how an automaton “behaves.” The program also allows you to specify probabilistic output and transition functions. Simple probabilistic strategies such as GENEROUS TIT FOR TAT have been shown to perform particularly well in noisy environments, because they avoid the costly sequences of alternating defections that undermine sustained cooperation.
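As an illustration of the output/transition idea, GENEROUS TIT FOR TAT can be written as a two-state machine with a probabilistic transition function. The class below is a hypothetical sketch, not the fa tool's interface, and the generosity parameter g = 0.1 is an assumed value:

```python
# A two-state Moore machine for GENEROUS TIT FOR TAT (illustrative sketch).
import random

class GenerousTitForTat:
    """States 'C' and 'D'; the output function simply emits the state label."""

    def __init__(self, g=0.1, seed=None):
        self.g = g                      # probability of forgiving a defection
        self.state = 'C'                # initial state: cooperate
        self.rng = random.Random(seed)

    def move(self):
        return self.state               # output function

    def observe(self, opponent_move):
        # Transition function: mirror cooperation; after a defection,
        # forgive with probability g, otherwise retaliate.
        if opponent_move == 'C' or self.rng.random() < self.g:
            self.state = 'C'
        else:
            self.state = 'D'

# With g = 0 this reduces to plain TIT FOR TAT:
tft = GenerousTitForTat(g=0.0)
tft.observe('D')
print(tft.move())   # D
tft.observe('C')
print(tft.move())   # C
```

The occasional unconditional return to 'C' is what lets the strategy escape the echo of a noisy defection, which is why such probabilistic automata do well in the 1%-noise environment described above.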

[Charts: average score (×10) by strategy - after 5000 generations (as of 4/25/02); after 5000 generations (10pm 4/27/02); after 20,000 generations (7am 4/28/02); after 600 generations (4/22/05); after 1000 generations (4/27/05).]