
Dynamics of Cooperation in Spatial Prisoner’s Dilemma of Memory-Based Players


Presentation Transcript


  1. Dynamics of Cooperation in Spatial Prisoner’s Dilemma of Memory-Based Players. Chenna Reddy Cotla, Department of Computational Social Science, George Mason University, Fairfax, VA, USA

  2. Research Question • What are the emergent dynamics of cooperation in a population of memory-based agents whose interactions are spatially constrained to local neighborhoods on a square lattice?

  3. Prisoner’s Dilemma
• The Prisoner’s Dilemma is an extensively studied social dilemma that characterizes the problem of cooperation.
• The story: there is not enough evidence to convict two suspects of armed robbery, but enough to convict them of the theft of the getaway car.
• If both confess, each serves 3 years; if both stay quiet, each serves 1 year; if one tells and the other doesn’t, the informer serves 0 years and the silent one serves 5 years.
• Stay quiet = cooperate (C); confess = defect (D)
• Payoff matrix (row player’s payoff), with T > R > P > S:
        C        D
  C   R = 3    S = 0
  D   T = 5    P = 1
  where T is the TEMPTATION to defect = 5, R is the REWARD for mutual cooperation = 3, P is the PUNISHMENT for mutual defection = 1, and S is the SUCKER’s payoff = 0.

  4. Cooperation in a Finite Population: Evolutionary Spatial Prisoner’s Dilemma • Evolutionary dynamics, together with non-random, localized strategic interactions on a spatial structure, can support cooperation. • Players are located on a lattice and play bilateral PD with each of their neighbors, where the set of neighbors depends on the chosen notion of neighborhood. • Each player accumulates the sum of its payoffs in each generation. • In the next generation, each player imitates the strategy of its best-scoring neighbor. [Figure: Moore neighborhood and Von Neumann neighborhood on a square lattice] Nowak, M. and May, R. (1992). Evolutionary games and spatial chaos. Nature, 359(6398):826–829.

  5. Evolutionary SPD [Figure: the fraction of cooperators in evolutionary SPD is independent of the initial configuration for the Von Neumann neighborhood] Nowak, M. and May, R. (1992). Evolutionary games and spatial chaos. Nature, 359(6398):826–829.

  6. Evolutionary SPD: Significance • Regular lattices represent a limiting case of interaction topologies in the real world. • Emphasizes the importance of localized interaction in the emergence of cooperation.

  7. Why might we need alternative models? • The assumptions about players are very simplistic: no memory or reasoning. • Players are pure imitators. • This may be more suitable for a biological context than a social one. • Recent experiments with human subjects have shown that humans do not unconditionally imitate the best-scoring neighbor, as assumed in evolutionary SPD (Traulsen et al., 2010; Grujic et al., 2010). Traulsen, A., Semmann, D., Sommerfeld, R., Krambeck, H., and Milinski, M. (2010). Human strategy updating in evolutionary games. Proceedings of the National Academy of Sciences, 107(7):2962. Grujic, J., Fosco, C., Araujo, L., Cuesta, J. A., and Sanchez, A. (2010). Social experiments in the mesoscale: Humans playing a spatial prisoner’s dilemma. PLOS ONE, 5(11):e13749.

  8. SPD with ACT-R Agents • SPD with agents that use the memory model embodied in the ACT-R cognitive architecture. • Why ACT-R? • The ACT-R memory model was able to reproduce important empirical dynamics in the two-person Prisoner’s Dilemma (Lebiere et al., 2000). Lebiere, C., Wallach, D., and West, R. L. (2000). A memory-based account of the prisoner’s dilemma and other 2×2 games. In Proceedings of the 3rd International Conference on Cognitive Modeling, pages 185–193.

  9. Higher-Level Decision Making in ACT-R: Knowledge Representation • In ACT-R, procedural and declarative knowledge interact to produce higher-level cognition. • Procedural memory: • Implemented as a set of productions. • These are IF-THEN rules that specify when a particular production applies and what it does. • Example: a production to add two numbers: IF the goal is to add n1 and n2, and n1 + n2 = n3, THEN set as a subgoal to write n3 as the result. • Declarative memory: • Memory of facts, represented as a collection of declarative structures called chunks. • Example: an addition fact: (Fact3+4 isa additionfact addend1 three addend2 four sum seven)
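To make the representation concrete, here is a minimal sketch of a chunk as a plain Python record; the Chunk class and slot names are illustrative assumptions, not part of any ACT-R implementation:

```python
from dataclasses import dataclass, field

# A declarative chunk: a chunk type ("isa") plus named slots.
@dataclass
class Chunk:
    name: str
    isa: str
    slots: dict = field(default_factory=dict)

# The addition fact from the slide, encoded as a chunk.
fact_3_plus_4 = Chunk(
    name="Fact3+4",
    isa="additionfact",
    slots={"addend1": "three", "addend2": "four", "sum": "seven"},
)
```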

  10. Higher-Level Decision Making in ACT-R: Knowledge Deployment • In ACT-R, procedural and declarative knowledge interact to produce higher-level cognition. • Production rules are selected based on their utility, and declarative chunks are retrieved based on subsymbolic quantities called activation levels. • Activation of a chunk: • In ACT-R, the activation levels of declarative chunks reflect the frequency and recency of usage and contextual similarity. Noise is also added to account for the stochastic component of decision making. • The base-level activation B_i reflects the general usefulness of a given chunk based on how recently and frequently it was used in the past to achieve a goal. It is calculated using the following equation: B_i = ln( Σ_{j=1}^{n} t_j^{−d} ), where d is the forgetting rate and t_j is the time since the jth access.
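As a concrete illustration, the base-level learning equation above can be computed directly; a minimal sketch in Python, where the function name and arguments are assumptions (d = 0.5 is a common ACT-R default):

```python
import math

# Exact base-level activation: B_i = ln(sum_j t_j^(-d)), where each
# lag t_j = now - (time of jth access).
def base_level_activation(access_times, now, d=0.5):
    return math.log(sum((now - t) ** (-d) for t in access_times))

# Example: a chunk accessed at times 1.0, 3.0, and 7.0, evaluated at 10.0.
print(base_level_activation([1.0, 3.0, 7.0], now=10.0))  # ~ 0.25
```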

  11. Representational Details of the Game-Playing Mechanism • The basic idea is to store game outcomes as chunks and use a single production rule that captures decision making. • If we consider a neighborhood of size n, there are 2^(n+1) possible outcomes. • Storing all of them entails exponential time and space requirements: memory is exhausted as n increases, and the computation becomes infeasible by current computational standards. • To simplify matters, a totalistic representation based on the total payoff is used. • Chunk C-kC represents the outcome in which the given player cooperated and k neighbors also cooperated; D-kC represents the same scenario with the given player defecting. • Only 2n + 2 chunks are then needed. • Activation is calculated using the following computationally efficient approximation (Petrov, 2006), which keeps the k most recent access times exactly and approximates the contribution of the older ones: B_i ≈ ln( Σ_{j=1}^{k} t_j^{−d} + (m − k)(t_m^{1−d} − t_k^{1−d}) / ((1 − d)(t_m − t_k)) ), where m is the total number of accesses, t_k is the time since the kth most recent access, and t_m is the time since the first access. Petrov, A. (2006). Computationally efficient approximation of the base-level learning equation in ACT-R. In Proceedings of the Seventh International Conference on Cognitive Modeling, pages 391–392.
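A minimal sketch of Petrov's hybrid approximation, under the same assumptions as the exact version above; the function and parameter names are illustrative:

```python
import math

# Hybrid approximation of base-level learning (Petrov, 2006): the k most
# recent accesses contribute exactly; older accesses are approximated in
# closed form.
def base_level_activation_approx(access_times, now, d=0.5, k=2):
    lags = sorted(now - t for t in access_times)  # most recent first
    m = len(lags)
    if m <= k:
        return math.log(sum(t ** (-d) for t in lags))
    recent = sum(t ** (-d) for t in lags[:k])
    t_k, t_m = lags[k - 1], lags[-1]  # kth most recent and oldest lags
    older = (m - k) * (t_m ** (1 - d) - t_k ** (1 - d)) \
            / ((1 - d) * (t_m - t_k))
    return math.log(recent + older)
```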

  12. Chunks and Production Rule • Declarative memory when a neighborhood of size n is considered: 2n + 2 chunks
(C-nC isa outcome p-action C N-config nC payoff nR)
(C-(n−1)C isa outcome p-action C N-config (n−1)C payoff (n−1)R + S)
…
(C-kC isa outcome p-action C N-config kC payoff kR + (n − k)S)
…
(C-1C isa outcome p-action C N-config 1C payoff R + (n − 1)S)
(C-0C isa outcome p-action C N-config 0C payoff nS)
(D-nC isa outcome p-action D N-config nC payoff nT)
(D-(n−1)C isa outcome p-action D N-config (n−1)C payoff (n−1)T + P)
…
(D-kC isa outcome p-action D N-config kC payoff kT + (n − k)P)
…
(D-1C isa outcome p-action D N-config 1C payoff T + (n − 1)P)
(D-0C isa outcome p-action D N-config 0C payoff nP)
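For illustration, the 2n + 2 totalistic chunks and their payoffs can be generated programmatically; a minimal sketch, assuming plain dictionaries for chunks and the payoff values from slide 3:

```python
# Payoff constants from slide 3: T > R > P > S.
T, R, P, S = 5, 3, 1, 0

# Build the 2n + 2 outcome chunks for a neighborhood of size n.
def make_outcome_chunks(n):
    chunks = {}
    for k in range(n + 1):  # k = number of cooperating neighbors
        chunks[f"C-{k}C"] = {"p-action": "C", "N-config": f"{k}C",
                             "payoff": k * R + (n - k) * S}
        chunks[f"D-{k}C"] = {"p-action": "D", "N-config": f"{k}C",
                             "payoff": k * T + (n - k) * P}
    return chunks

# The Von Neumann neighborhood (n = 4) yields the 10 chunks of slide 14.
vn_chunks = make_outcome_chunks(4)
print(vn_chunks["C-3C"]["payoff"])  # 3R + S = 9
```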

  13. Chunks and Production Rule • Production rule: IF the goal is to play Spatial Prisoner’s Dilemma, and the most likely outcome of C is OutcomeC, and the most likely outcome of D is OutcomeD, THEN make the move associated with the larger payoff of OutcomeC and OutcomeD, observe the opponents’ moves, and push a new goal to make the next play.
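A minimal sketch of this production rule in Python, reusing base_level_activation and the chunk dictionaries from the earlier sketches; it assumes each action already has at least one remembered outcome, and uses Gaussian noise as a stand-in for ACT-R's logistic activation noise:

```python
import math
import random

# Retrieve the name of the most active chunk for the given action:
# base-level activation plus retrieval noise, maximized over all chunks
# that have been experienced at least once.
def most_active(chunks, histories, action, now, d=0.5, noise_s=0.25):
    best, best_act = None, -math.inf
    for name, chunk in chunks.items():
        if chunk["p-action"] != action or not histories.get(name):
            continue
        act = base_level_activation(histories[name], now, d)
        act += random.gauss(0, noise_s)  # stand-in for logistic noise
        if act > best_act:
            best, best_act = name, act
    return best

# The single production rule: compare the payoffs of the most likely
# (most active) outcome for C and for D, and play the better action.
def choose_move(chunks, histories, now):
    c = most_active(chunks, histories, "C", now)
    d = most_active(chunks, histories, "D", now)
    return "C" if chunks[c]["payoff"] >= chunks[d]["payoff"] else "D"
```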

  14. Illustrative Scenario: Von Neumann Neighborhood • Declarative memory when the Von Neumann neighborhood is considered: 10 chunks
(C-4C isa outcome p-action C N-config 4C payoff 4R)
(C-3C isa outcome p-action C N-config 3C payoff 3R + S)
(C-2C isa outcome p-action C N-config 2C payoff 2R + 2S)
(C-1C isa outcome p-action C N-config 1C payoff R + 3S)
(C-0C isa outcome p-action C N-config 0C payoff 4S)
(D-4C isa outcome p-action D N-config 4C payoff 4T)
(D-3C isa outcome p-action D N-config 3C payoff 3T + P)
(D-2C isa outcome p-action D N-config 2C payoff 2T + 2P)
(D-1C isa outcome p-action D N-config 1C payoff T + 3P)
(D-0C isa outcome p-action D N-config 0C payoff 4P)
• Production rule: IF the goal is to play Spatial Prisoner’s Dilemma, and the most likely outcome of C is OutcomeC, and the most likely outcome of D is OutcomeD, THEN make the move associated with the larger payoff of OutcomeC and OutcomeD, observe the opponents’ moves, and push a new goal to make the next play.

  15. Illustrative Scenario: Von Neumann Neighborhood [Figure: chunks associated with C, chunks associated with D, and their payoffs]

  16. Global Cooperation in SPD with Memory-Based Agents (Synchronous Updating) • Lattice size: 100 × 100 • Payoff matrix considered: {T = 5, R = 3, P = 1, S = 0} • Cooperation levels fluctuate around 0.3214
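For orientation, a minimal sketch of one synchronous round under these parameters, building on choose_move from the earlier sketch; the periodic (wrap-around) boundary and the overall structure are assumptions for illustration, not the author's code:

```python
# One synchronous round on an L x L lattice with Von Neumann neighbors:
# every agent chooses via its memory, then all realized outcomes are
# recorded at once.
L = 100
OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # Von Neumann neighborhood

def neighbors(i, j):
    # Periodic boundary conditions (an assumption).
    return [((i + di) % L, (j + dj) % L) for di, dj in OFFSETS]

def synchronous_round(cells, chunks, histories, now):
    # All agents decide simultaneously from their own memories.
    new_moves = {cell: choose_move(chunks, histories[cell], now)
                 for cell in cells}
    # Each agent then records the outcome chunk it actually experienced.
    for (i, j), move in new_moves.items():
        k = sum(new_moves[nb] == "C" for nb in neighbors(i, j))
        histories[(i, j)].setdefault(f"{move}-{k}C", []).append(now)
    return new_moves
```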

  17. Continued.

  18. Effect of Neighborhood Size on Global Cooperation Levels • In SPD with memory-based agents, spatial structure may not always support higher cooperation levels than the well-mixed scenario. • Asymptotic cooperation level in the well-mixed case: 0.2447

  19. Different Activation Regimes and Size of the Lattice • Asynchronous updating (Axtell, 2001): • Asymptotic cooperation level for uniform activation with Von Neumann neighbors: 0.3238 • Asymptotic cooperation level for random activation with Von Neumann neighbors: 0.3220 • Size of the lattice: • The global cooperation level is almost invariant with the size of the lattice. Axtell, R. (2001). Effects of interaction topology and activation regime in several multi-agent systems. Multi-Agent-Based Simulation, pages 33–48.

  20. Conclusion and Further Research • Partial cooperation levels are sustained in an SPD with memory-based agents. • Spatial structure may not always support higher cooperation levels than the well-mixed case, as it does in the evolutionary framework. • A decision-making mechanism grounded in a cognitive architecture may offer a convincing middle way between strict rationality assumptions about behavior and the overly simplistic characterization of behavior in evolutionary game theory. • Future directions: • Validation of model output against experimental results. • Effect of interaction topology on emergent cooperation levels; the properties of real-world interaction topologies lie between those of random networks and regular lattices.

  21. Questions?
