Learning and teaching in games: Statistical models of human play in experiments

Colin F. Camerer, Social Sciences, Caltech (camerer@hss.caltech.edu); Teck Ho, Berkeley (Haas School of Business); Kuan Chong, National University of Singapore

How can bounded rationality be modelled in games?

Presentation Transcript


  1. Learning and teaching in games: Statistical models of human play in experiments. Colin F. Camerer, Social Sciences, Caltech (camerer@hss.caltech.edu); Teck Ho, Berkeley (Haas School of Business); Kuan Chong, National University of Singapore • How can bounded rationality be modelled in games? • Theory desiderata: precise, general, useful (game theory), and cognitively plausible, empirically disciplined (cognitive science) • Three components: a cognitive hierarchy thinking model (one parameter; creates initial conditions), a learning model (EWA, fEWA), and a sophisticated "teaching" model (repeated games) • Shameless plug: Camerer, Behavioral Game Theory (Princeton, Feb ’03), or see website hss.caltech.edu/~camerer

  2. Behavioral models use some game theory principles, and weaken other principles. The slide's table lists the principles as rows (concept of a game, strategic thinking, best response, mutual consistency, learning, strategic foresight) and the four approaches as columns (Equilibrium, Thinking, Learning, Teaching), with checkmarks showing which principles each approach retains and which it weakens.

  3. (Typical) experimental economics methods • Repeated matrix stage game (Markov w/ 1 state) • Repeated with “one night stand” (“stranger”) rematching protocol & feedback (to allow learning without repeated-game reputation-building) • Game is described abstractly; payoffs are public knowledge (e.g., read out loud) • Subjects paid $ according to choices (~$12/hr) • Why this style? The basic question is whether subjects can “compute” equilibrium*, not meant to be realistic • Establish regularity across subjects and different game structures • Statistical fitting: parsimonious (1+ parameter) models, fit (in sample) & predict (out of sample) & compute economic value • *Question now answered (No): it would be useful to move to low-information MAL (multi-agent learning) designs

  4. Beauty contest game: Pick numbers in [0,100]; the number closest to (2/3)*(average number) wins

  5. “Beauty contest” game (Ho, Camerer, Weigelt, Amer. Econ. Rev. ’98): Pick numbers xi ∈ [0,100]; the number closest to (2/3)*(average number) wins $20
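The cognitive hierarchy thinking model from slide 1 is the natural way to generate first-period predictions for this game. The sketch below is only an illustration of that idea, assuming a Poisson distribution over thinking steps with mean τ (the value τ = 1.5 and the level-0 uniform-randomization assumption are illustrative, not taken from these slides):

```python
import numpy as np
from scipy.stats import poisson

def poisson_ch_beauty_contest(tau=1.5, p=2/3, max_level=10):
    """Sketch of Poisson cognitive-hierarchy (CH) play in the p-beauty contest.

    Assumes level-0 players randomize uniformly on [0, 100]; a level-k player
    best-responds to a Poisson(tau) mixture of levels 0..k-1 (normalized).
    Returns the model's predicted average choice.
    """
    freq = poisson.pmf(np.arange(max_level + 1), tau)   # Poisson weights on levels
    level_choice = [50.0]                                # level-0 average: uniform on [0,100]

    for k in range(1, max_level + 1):
        w = freq[:k] / freq[:k].sum()                    # beliefs over lower levels
        believed_avg = float(np.dot(w, level_choice))    # expected average of lower-level play
        level_choice.append(p * believed_avg)            # best response: target p * average

    weights = freq / freq.sum()
    return float(np.dot(weights, level_choice))

print(poisson_ch_beauty_contest())   # prediction well above the Nash equilibrium
```

Since the Nash equilibrium of the (2/3)-average game is for everyone to choose 0, any prediction bounded away from 0 reflects a limited number of thinking steps.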

  6. EWA learning • Attraction A_i^j(t) for strategy j is updated by
        A_i^j(t) = [φ·N(t-1)·A_i^j(t-1) + π_i(s_i(t), s_-i(t))] / [φ·(1-κ)·N(t-1) + 1]   (chosen j)
        A_i^j(t) = [φ·N(t-1)·A_i^j(t-1) + δ·π_i(s_i^j, s_-i(t))] / [φ·(1-κ)·N(t-1) + 1]   (unchosen j)
     logit response (softmax): P_i^j(t) = e^{λ·A_i^j(t)} / Σ_k e^{λ·A_i^k(t)}
     • Key parameters: δ = imagination (weight on foregone payoffs); φ = decay (forgetting) or change-detection; κ = growth rate of attractions (κ=0 → averaging; κ=1 → cumulation, which produces “lock-in” after exploration)
     • “In nature a hybrid [species] is usually sterile, but in science the opposite is often true” -- Francis Crick ’88 • EWA hybridizes its special cases: weighted fictitious play (δ=1, κ=0) and simple choice reinforcement (δ=0)
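As a concrete reading of the update rule above, here is a minimal Python sketch of one EWA attraction update plus the logit choice rule. It follows the standard published EWA form (with the experience weight N(t)); the parameter values and example payoffs are illustrative, and this is not the authors' estimation code:

```python
import numpy as np

def ewa_update(A, N, chosen, payoffs, phi=0.9, delta=0.5, kappa=0.0):
    """One EWA attraction update.

    A        : current attractions A_i^j(t-1), one entry per strategy j
    N        : experience weight N(t-1)
    chosen   : index of the strategy actually played at t
    payoffs  : payoff pi_i(s_i^j, s_-i(t)) each strategy j would have earned
               against the opponents' realized play (so foregone payoffs too)
    """
    N_new = phi * (1 - kappa) * N + 1
    weight = np.where(np.arange(len(A)) == chosen, 1.0, delta)  # delta on unchosen j
    A_new = (phi * N * A + weight * payoffs) / N_new
    return A_new, N_new

def logit_choice_probs(A, lam=1.0):
    """Softmax (logit) response: P_i^j(t) proportional to exp(lam * A_i^j(t))."""
    z = lam * (A - A.max())          # subtract max for numerical stability
    expz = np.exp(z)
    return expz / expz.sum()

# Example: two strategies; the first was chosen and earned 10, while the
# unchosen one would have earned 15 against the opponents' realized play.
A, N = np.zeros(2), 1.0
A, N = ewa_update(A, N, chosen=0, payoffs=np.array([10.0, 15.0]))
print(logit_choice_probs(A, lam=0.2))
```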

  7. Studies comparing EWA and other learning models

  8. 20 estimates of learning model parameters

  9. Functional EWA learning (“EWA Lite”) • Use functions of experience to create parameter values (only free parameter λ) • φ_i(t) is a change detector:
        φ_i(t) = 1 - 0.5·Σ_k [ s_-i^k(t) - (1/t)·Σ_{τ=1..t} s_-i^k(τ) ]²
     Compares the average of past frequencies s_-i(1), s_-i(2), … with s_-i(t) • Decay old experience (low φ) if change is detected • φ=1 when other players always repeat strategies • φ falls after a “surprise”; it falls more if others have been highly variable, less if others have been consistent • δ = φ/(number of Nash strategies) (creates low δ in mixed games) • Questions: (now) Do functional values pick up differences across games? (Yes.) (later) Can function changes create sensible, rapid switching in stochastic games?
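A hedged sketch of the change detector as described on this slide, encoding the opponents' play each period as a one-hot frequency vector (that encoding, and comparing the most recent play only against the simple running average, are simplifying assumptions; the published self-tuning EWA surprise index differs in details):

```python
import numpy as np

def change_detector_phi(history):
    """phi_i(t) = 1 - 0.5 * sum_k ( s_-i^k(t) - mean over tau<=t of s_-i^k(tau) )^2

    history : list of opponents' strategy indices observed so far, s_-i(1..t).
    Returns a decay weight: 1 when opponents always repeat the same strategy,
    lower after a surprising change in their play.
    """
    t = len(history)
    n_strats = max(history) + 1
    onehots = np.eye(n_strats)[history]   # s_-i(tau) as indicator vectors
    current = onehots[-1]                 # most recent play s_-i(t)
    average = onehots.mean(axis=0)        # average of past frequencies
    return 1.0 - 0.5 * np.sum((current - average) ** 2)

print(change_detector_phi([0, 0, 0, 0]))   # 1.0: no change detected
print(change_detector_phi([0, 0, 0, 1]))   # lower: a surprise reduces phi
```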

  10. Example: Price matching with loyalty rewards (Capra, Goeree, Gomez, Holt AER ’99) • Players 1, 2 pick prices in [80, 200] cents • Price is P = min(P1, P2) • The low-price firm earns P+R; the high-price firm earns P-R • What happens? (e.g., R=50)
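A minimal sketch of the stage-game payoffs just described, in a form that could be fed to the EWA/fEWA updates sketched above (the tie rule, giving neither reward nor penalty when prices are equal, is an assumption; the slide does not specify it):

```python
def price_matching_payoffs(p1, p2, R=50):
    """Payoffs for the price-matching game: both firms sell at P = min(p1, p2);
    the lower-priced firm earns P + R, the higher-priced firm earns P - R."""
    P = min(p1, p2)
    if p1 == p2:
        return P, P                 # assumed tie rule: no reward or penalty
    if p1 < p2:
        return P + R, P - R
    return P - R, P + R

print(price_matching_payoffs(120, 180))   # (170, 70)
print(price_matching_payoffs(200, 200))   # (200, 200)
```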

  11. Teaching in repeated (partner) games • Finitely-repeated trust game (Camerer & Weigelt, Econometrica ’88), payoffs listed as (lender, borrower):

                                 borrower action
                                 repay        default
          lender   loan          40, 60       -100, 150
                   no loan       10, 10       10, 10

     • 1 borrower plays against 8 lenders • A fraction p(honest) of borrowers prefer to repay (controlled by the experimenter)
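For concreteness, the stage game can be written as a small payoff table in code; the (lender, borrower) ordering follows the table above, and the honest-type borrower's induced preference for repaying is noted only in a comment since the slide does not give that type's payoffs:

```python
# Stage-game payoffs (lender payoff, borrower payoff) for the trust game.
TRUST_GAME = {
    ("loan", "repay"):     (40, 60),
    ("loan", "default"):   (-100, 150),
    ("no loan", "repay"):  (10, 10),    # borrower's action is moot without a loan
    ("no loan", "default"): (10, 10),
}

def payoffs(lender_action, borrower_action):
    """Return (lender, borrower) payoffs for one period of the trust game.
    A fraction p(honest) of borrowers is induced by the experimenter to prefer
    repaying; 'normal' borrowers earn more from defaulting (150 > 60)."""
    return TRUST_GAME[(lender_action, borrower_action)]

print(payoffs("loan", "repay"))    # (40, 60)
print(payoffs("loan", "default"))  # (-100, 150)
```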

  12. Empirical results (conditional frequencies of no loan and default)

  13. Teaching in repeated trust games (Camerer, Ho, Chong, J. Econ. Theory ’02) • Some borrowers (est. 89%) know lenders learn by fEWA; actions in t “teach” lenders what to expect in t+1 • A “peripheral vision” weight (est. .93) governs how much weight histories from earlier sequences get • E.g., entering period 4 of sequence 17:

        Seq.   period:  1      2        3      4        5  6  7  8
        16              Repay  Repay    Repay  Default  …           ← look “peripherally” (with the peripheral-vision weight)
        17              Repay  No loan  Repay                       ← look back

     • Teaching: strategies have reputations • Bayesian-Nash equilibrium: borrowers have reputations (types)

  14. Heart of the model: attraction of a sophisticated borrower's strategy j after sequence k, before period t • J_{t+1} is a possible sequence of future choices by the borrower • The first term is the expected (myopic) payoff from strategy j • The second term is the summation of expected payoffs in the future (undiscounted), given the effect of j and optimal planned future choices (J_{t+1})
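The equation itself appears to have been an image on the original slide and did not survive into the transcript. The display below is a schematic reconstruction of what the bullet points describe, with \hat{P}_L denoting the sophisticated borrower's belief about the (fEWA-learning) lender's play; the notation and indexing are illustrative rather than the paper's exact expression:

```latex
A^{j}_{B}(k,t) \;=\;
\underbrace{\sum_{s_{L}} \hat{P}_{L}(s_{L}\mid k,t)\,\pi_{B}\!\left(j,s_{L}\right)}_{\text{expected myopic payoff of } j}
\;+\;
\underbrace{\max_{J_{t+1}} \sum_{\tau=t+1}^{T}\sum_{s_{L}}
\hat{P}_{L}\!\left(s_{L}\mid k,\tau;\, j, J_{t+1}\right)\pi_{B}\!\left(j_{\tau},s_{L}\right)}_{\text{undiscounted future payoffs along the planned path } J_{t+1}}
```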

  15. Empirical results (top) and teaching model (bottom)

  16. Conclusions • Learning (λ = response sensitivity): the hybrid fits & predicts well (20+ games); one-parameter fEWA fits well and is easy to estimate; well-suited to Markov games because φ means players can “relearn” if a new state is quite different? • Teaching (a fraction of players teach): retains strategic foresight in repeated games with partner matching; fits trust and entry deterrence better than softmax Bayesian-Nash (aka QRE) • Next? Field applications, explore low-information Markov domains…

  17. Thinking steps (parameter τ) • Parametric EWA learning (Econometrica ’99): free parameters δ, φ, κ, λ, N(0) • Functional EWA learning: functions for parameters, one free parameter (λ) • Strategic teaching (J. Econ. Theory ’02): reputation-building w/o “types”, two parameters (the fraction of teachers and the “peripheral vision” weight)
