
Evolving Strategies for the Prisoner’s Dilemma

Jennifer Golbeck

University of Maryland, College Park

Department of Computer Science

July 23, 2002

Overview
  • Previous Research
  • Prisoner’s Dilemma
  • The Genetic Algorithm
  • Results
  • Conclusions
Axelrod
  • Robert Axelrod’s experiments of the 1980s served as the starting point for this research
  • Implementation closely adheres to the configuration of his experiments
  • Same model for the Prisoner’s Dilemma
  • Minor variation in the implementation of the Genetic Algorithm
The Prisoner’s Dilemma Model
  • The basic two-player Prisoner’s Dilemma
  • Both players are arrested for the same crime
  • Each has a choice
    • Confess - Cooperate with the authorities (admit to doing the crime)
    • Deny - Defect against the other player (claim the other person is responsible)
  • No knowledge of “opponent’s” action
Payoff Matrix
  • Optimization
  • If both players cooperate, each receives 3 points
  • If both players defect, each receives 1 point
  • If the outcome is mixed, the defector receives 5 points and the cooperator receives 0 points
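
The payoffs listed above can be written as a small lookup table. This is only an illustrative sketch: the names `PAYOFF` and `score_round` and the "C"/"D" move labels are assumptions, not part of the original slides.

```python
# Payoffs from the slide: (my move, opponent's move) -> my points.
# "C" = cooperate, "D" = defect (labels chosen for this sketch).
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("D", "D"): 1,  # mutual defection
    ("D", "C"): 5,  # I defect, opponent cooperates
    ("C", "D"): 0,  # I cooperate, opponent defects
}


def score_round(move_a, move_b):
    """Return the points earned by player A and player B in one round."""
    return PAYOFF[(move_a, move_b)], PAYOFF[(move_b, move_a)]
```

For example, `score_round("D", "C")` gives the defector 5 points and the cooperator 0.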
Iterated Game
  • In simulation, the endpoint of the game is unknown to the players, making it essentially an infinitely iterated game
  • Each player has a memory of the previous three rounds on which to base his strategy
  • Strategies are deterministic: for a given history h, a player will always make the same move
  • With 4 possible outcomes in each round and a history of 3 rounds, each strategy consists of 4^3 = 64 moves
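
The count of 64 can be checked by enumerating every possible three-round history. The sketch below assumes each round outcome is written as a two-letter string (my move, then the opponent's move); the names `ROUND_OUTCOMES` and `HISTORIES` are illustrative only.

```python
from itertools import product

# The 4 possible outcomes of a single round: (my move, opponent's move).
ROUND_OUTCOMES = ["CC", "CD", "DC", "DD"]

# Every possible memory of the previous three rounds, oldest round first.
HISTORIES = list(product(ROUND_OUTCOMES, repeat=3))

print(len(HISTORIES))  # 4 * 4 * 4 = 64
```

A deterministic strategy then needs exactly one move for each of these 64 histories.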
Previous Results
  • Axelrod tournaments
  • Using the three-round history model, teams submitted strategies that competed against one another in a round-robin tournament
  • Tit for Tat
  • The Pavlov strategy, developed after these tournaments, was also shown to be effective
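
The slides name the two strategies without defining them. As a reminder, here is a minimal sketch using their commonly cited definitions, which depend only on the most recent round. The function names and the "C"/"D" labels are assumptions made for illustration.

```python
def tit_for_tat(my_last, opp_last):
    """Repeat whatever the opponent did in the previous round."""
    return opp_last


def pavlov(my_last, opp_last):
    """Win-stay, lose-shift.

    After a good payoff (opponent cooperated) keep the same move; after a
    bad payoff (opponent defected) switch. This reduces to cooperating
    exactly when both players made the same move in the previous round.
    """
    return "C" if my_last == opp_last else "D"
```

Both functions answer a defection with a defection and continue cooperating after mutual cooperation.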
The Model
  • Darwinian Survival of the Fittest
  • Genetic representation of entities
  • Fitness function
  • Select most fit individuals to reproduce
  • Mutate
  • Traits of most fit will be passed on
  • Over time, the population evolves toward greater, ideally optimal, fitness
GAs and the Prisoner’s Dilemma
  • Population: 20 individuals
  • Chromosome: 64-bit string in which each bit encodes the Cooperate or Defect move played for one specific three-round history
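
One possible way to read such a chromosome is to treat the three remembered rounds as a base-4 number and use it as an index into the bit string. The slides do not give the actual bit ordering, so the outcome codes, the "1 means Cooperate" convention, and all names below are assumptions made for this sketch.

```python
# Assumed code for one round's outcome: (my move, opponent's move).
OUTCOME_CODE = {"CC": 0, "CD": 1, "DC": 2, "DD": 3}


def history_index(history):
    """Map three round outcomes (oldest first) to an index in range(64)."""
    index = 0
    for outcome in history:
        index = index * 4 + OUTCOME_CODE[outcome]
    return index


def next_move(chromosome, history):
    """Look up the move a 64-character chromosome plays after a history."""
    return "C" if chromosome[history_index(history)] == "1" else "D"


ALL_C = "1" * 64  # cooperate after every history
ALL_D = "0" * 64  # defect after every history

# Tit for Tat copies the opponent's most recent move. The most recent round
# is the last base-4 digit (i % 4); the opponent cooperated when that digit
# is 0 ("CC") or 2 ("DC").
TIT_FOR_TAT = "".join("1" if i % 4 in (0, 2) else "0" for i in range(64))
```

For example, `next_move(TIT_FOR_TAT, ["CC", "CC", "CD"])` returns `"D"`, because the opponent defected in the most recent round.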
GAs and PD II
  • Fitness: Each player competes against every other for 64 consecutive rounds, and a cumulative score is maintained
  • Selection: Roulette wheel selection
  • Reproduction: Random point crossover with replacement
  • Mutation: rate of 0.001
  • Generations: 200,000
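
The parameters above can be put together into one generation of a genetic algorithm. The following is a rough sketch under several assumptions, not the implementation used in this research: chromosomes are lists of 64 bits (1 = cooperate), every match starts from an assumed history of three mutual cooperations, "with replacement" is read as drawing parents with replacement, and all function names are hypothetical.

```python
import random

PAYOFF = {("C", "C"): 3, ("D", "D"): 1, ("D", "C"): 5, ("C", "D"): 0}
CODE = {("C", "C"): 0, ("C", "D"): 1, ("D", "C"): 2, ("D", "D"): 3}


def move(chrom, hist):
    """Look up the move for three (mine, theirs) rounds, oldest first."""
    idx = 0
    for rnd in hist:
        idx = idx * 4 + CODE[rnd]
    return "C" if chrom[idx] == 1 else "D"


def play(a, b, rounds=64):
    """Return the cumulative scores of players a and b over one match."""
    hist_a = [("C", "C")] * 3  # assumed starting history
    hist_b = [("C", "C")] * 3
    score_a = score_b = 0
    for _ in range(rounds):
        ma, mb = move(a, hist_a), move(b, hist_b)
        score_a += PAYOFF[(ma, mb)]
        score_b += PAYOFF[(mb, ma)]
        hist_a = hist_a[1:] + [(ma, mb)]
        hist_b = hist_b[1:] + [(mb, ma)]
    return score_a, score_b


def fitness(pop):
    """Round robin: each player meets every other player once."""
    scores = [0] * len(pop)
    for i in range(len(pop)):
        for j in range(i + 1, len(pop)):
            si, sj = play(pop[i], pop[j])
            scores[i] += si
            scores[j] += sj
    return scores


def next_generation(pop, rng, mutation_rate=0.001):
    """Build a new population of the same size from the current one."""
    scores = fitness(pop)
    children = []
    while len(children) < len(pop):
        # Roulette wheel: chance of selection is proportional to score.
        mom, dad = rng.choices(pop, weights=scores, k=2)
        cut = rng.randrange(1, len(mom))  # random crossover point
        child = mom[:cut] + dad[cut:]
        # Flip each bit independently with the given mutation rate.
        child = [bit ^ 1 if rng.random() < mutation_rate else bit
                 for bit in child]
        children.append(child)
    return children
```

A run would start from a population of 20 chromosomes and call `next_generation` repeatedly, passing a `random.Random` instance so that results are reproducible.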
Hypothesis
  • Past research has looked at which strategy was “best”. This research looks at what makes a “good” strategy.
  • Tit for Tat and Pavlov both perform very well, and share two traits
    • Defend against Defectors
    • Cooperate with other cooperators
Hypothesis
  • All populations evolve over time to possess and exhibit these two traits
  • This behavior evolves regardless of the initial makeup of the population
Experiment I
  • Five initial populations
    • All “Always Cooperate (Confess)” (AllC)
    • All “Always Defect (Deny)” (AllD)
    • All Tit for Tat
    • All Pavlov
    • All randomly generated (independently)
Experiment II
  • Controls: Tit for Tat and Pavlov
    • Statistically equal performance
  • Support the hypothesis by showing:
    • Traits are not present in other initial populations
    • Over time, populations evolve to exhibit those traits and perform as well as Tit For Tat and Pavlov
Experiment II
  • To show that the hypothesized traits evolve, populations must demonstrate
    • In the presence of Defectors, evolved populations perform identically to the controls
    • In the presence of cooperators, evolved populations perform identically to controls
Part 1: Defend Against Defectors I
  • Mix each initial population with a small set of AllD
    • Tit for Tat and Pavlov (controls) perform at about 80% of maximum
    • All others perform significantly worse than Tit for Tat and Pavlov
    • AllC and Random populations perform significantly worse than their normal behavior
    • This shows that a priori, the AllC and random populations cannot defend against Defectors
Part 1: Defend Against Defectors II
  • Evolve each population and then mix it with a small set of AllD
    • All populations now perform as well as one another and as well as the Tit for Tat (TFT) and Pavlov controls
    • Fitness is at about 80% of maximum
Part 2: Cooperate with Cooperators
  • As before, each startup population is mixed with a small set of AllC
    • TFT and Pavlov do very well
    • AllC does exceptionally well
    • Others do significantly worse
  • Evolve and then add AllC
    • All populations perform as well as one another
    • Identical performance to TFT and Pavlov
Conclusions I
  • Performance measures show that AllC, AllD, and random populations do not generally possess defensive or cooperative traits a priori
  • After evolution, all populations have changed to incorporate both traits
  • Evolved strategies perform as well as TFT and Pavlov, traditional “best” strategies
Conclusions II
  • In both experiments there is no statistical difference between the performance of evolved populations before and after the introduction of AllC or AllD players
  • This indicates that the populations exhibit the hypothesized traits not only under experimental conditions but also as their normal behavior
Non-deterministic Players
  • This work shows results for players with deterministic strategies
  • Much previous research has been done on stochastic strategies
  • Preliminary results suggest that the findings presented here apply to stochastic strategies as well, but a formal study is necessary