Create Presentation
Download Presentation

Download Presentation
## Science of JDM as an Efficient Game of Mastermind

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Science of JDM as an EfficientGame of Mastermind**Michael H. Birnbaum California State University, Fullerton Bonn, July 26, 2013**Mastermind Game**• Goal to find secret code of colors in positions. In “basic” game, there are 4 positions and 6 colors, making 64 = 1296 hypotheses. • Each “play” of the game is an experiment that yields feedback as to the accuracy of an hypothesis. • For each “play”, feedback = 1 black peg for each color in correct position and 1 white peg for each correct color in wrong position.**Play Mastermind Online**• http://www.web-games-online.com/mastermind/index.php • (Mastermind is a variant of “Bulls and Cows”, an earlier code-finding game.)**Analogies**• EXPERIMENTS yield results, from which we revise our theories. • RECORD of experiments and results is preserved. • Experiments REDUCE THE SPACE of compatible with evidence. • Hypotheses can be PARTITIONED with respect to components.**Science vs. Mastermind**• In Mastermind, feedback is 100% accurate; in science, feedback contains “error” and “bias.” Repeat/revise the “same” experiment, different results. • In Mastermind, we can specify the space of hypotheses exactly, but in science, the set of theories under contention expands as people construct new theories. • In Mastermind, we know when we are done; science is never done.**Analogies**• EFFICIENT Mastermind is the goal: Find the secret code with fewest experiments. • If FEEDBACK IS NOT PERFECT, results are fallible, and it would be a mistake to build theory on such fallible results. • REPLICATION is needed, despite the seeming loss of efficiency.**Hypothesis Testing vs. Mastermind**• Suppose we simply tested hypotheses, one at a time and a significance tests says “reject” or “retain”? • With 1296 hypotheses, we get closer to truth with each rejection--BARELY. • Now suppose that 50% of the time we fail to reject false theories and 5% of the time we reject a true theory. • Clearly, significance testing this way is not efficient. More INFORMATIVE FEEDBACK needed.**Experiments that Divide the Space of Hypotheses in Half**• Basic game = 1296 Hypotheses • Suppose each experiment cuts space in half: 1296, 648, 324, 162, 81, 40.5, 20.25, 10.1, 5.1, 2.5, 1.3, done. 11 moves. • But typical game with 1296 ends after 4 or 5 moves, infrequently 6. • So, Mastermind is more efficient than “halving” of the space.**Index of Fit Informative?**• Suppose we assign numbers to each color, R = 1, G = 2, B = 3, etc. and calculate a correlation coefficient between the code and the experimental results? • This index could be highly misleading, it depends on the coding and experiment. • Fit could be higher for “worse” theories. (Devil rides again 1970s).**Psychology vs. Mastermind**• Mastermind: only ONE secret code. • In Psychology, we allow that different people might have different individual difference parameters. • Even more complicated: Perhaps different people have different models. • As if, different experiments in the game have DIFFERENT secret codes.**Testing Critical Properties**• Test properties that do not depend on parameters. • Such properties partition the space of hypotheses, like the test of all REDs. • For example: CPT (including EU) implies STOCHASTIC DOMINANCE. This follows for any set of personal parameters (any utility/value function and any prob. weighting function).**Critical Tests are Theorems of One Model that are Violated**by Another Model • This approach has advantages over tests or comparisons of fit. • It is not the same as “axiom testing.” • Use model-fitting to rival model to predict where to find violations of theorems deduced from model tested.**Outline**• I will discuss critical properties that test between nonnested theories: CPT and TAX. • Lexicographic Semiorders vs. family of transitive, integrative models (including CPT and TAX). • Integrative Contrast Models (e.g., Regret, Majority Rule) vs. transitive, integrative models.**“Prior” TAX Model**Assumptions:**TAX Parameters**For 0 < x < $150 u(x) = x Gives a decent approximation. Risk aversion produced by d. d = 1 .**TAX and CPT nearly identical for binary (two-branch) gambles**• CE (x, p; y) is an inverse-S function of p according to both TAX and CPT, given their typical parameters. • Therefore, there is little point trying to distinguish these models with binary gambles.**Testing CPT**TAX:Violations of: • Coalescing • Stochastic Dominance • Lower Cum. Independence • Upper Cumulative Independence • Upper Tail Independence • Gain-Loss Separability**Testing TAX Model**CPT: Violations of: • 4-Distribution Independence (RS’) • 3-Lower Distribution Independence • 3-2 Lower Distribution Independence • 3-Upper Distribution Independence (RS’) • Res. Branch Indep (RS’)**Stochastic Dominance**• A test between CPT and TAX: G = (x, p; y, q; z) vs. F = (x, p – s; y’, s; z) Note that this recipe uses 4 distinct consequences: x > y’ > y > z > 0; outside the probability simplex defined on three consequences. CPT choose G, TAX choose F Test if violations due to “error.”**Error Model Assumptions**• Each choice pattern in an experiment has a true probability, p, and each choice has an error rate, e. • The error rate is estimated from inconsistency of response to the same choice by same person in a block of trials. The “true”p is then estimated from consistent (repeated) responses to same question.**Violations of Stochastic Dominance**122 Undergrads: 59% TWO violations (BB) 28% Pref Reversals (AB or BA) Estimates: e = 0.19; p = 0.85 170 Experts: 35% repeated violations 31% Reversals Estimates: e = 0.20; p = 0.50**42 Studies of Stochastic Dominance, n = 12,152***• Large effects of splitting vs. coalescing of branches • Small effects of education, gender, study of decision science • Very small effects of 15 probability formats and request to justify choices. • Miniscule effects of event framing (framed vs unframed) * (as of 2010)**Summary: Prospect Theories not Descriptive**• Violations of Coalescing • Violations of Stochastic Dominance • Violations of Gain-Loss Separability • Dissection of Allais Paradoxes: viols of coalescing and restricted branch independence; RBI violations opposite of Allais paradox; opposite CPT.**Results: CPT makes wrong predictions for all 12 tests**• Can CPT be saved by using different formats for presentation? • Violations of coalescing, stochastic dominance, lower and upper cumulative independence replicated with 14 different formats and ten-thousands of participants. • Psych Review 2008 & JDM 2008 “new tests” of CPT and PH.**Lexicographic Semiorders**• Intransitive Preference. • Priority heuristic of Brandstaetter, Gigerenzer & Hertwig is a variant of LS, plus some additional features. • In this class of models, people do not integrate information or have interactions such as the probability X prize interaction in family of integrative, transitive models (CPT, TAX, GDU, EU and others)**LPH LS: G = (x, p; y) F = (x’, q; y’)**• If (y –y’ > D) choose G • Else if (y ’- y > D) choose F • Else if (p – q > d) choose G • Else if (q – p > d) choose F • Else if (x – x’ > 0) choose G • Else if (x’ – x > 0) choose F • Else choose randomly**Family of LS**• In two-branch gambles, G = (x, p; y), there are three dimensions: L = lowest outcome (y), P = probability (p), and H = highest outcome (x). • There are 6 orders in which one might consider the dimensions: LPH, LHP, PLH, PHL, HPL, HLP. • In addition, there are two threshold parameters (for the first two dimensions).**Testing Lexicographic Semiorder Models**Violations of Transitivity Violations of Priority Dominance Integrative Independence Interactive Independence LS Allais Paradoxes CPT TAX EU**New Tests of Independence**• Dimension Interaction: Decision should be independent of any dimension that has the same value in both alternatives. • Dimension Integration: indecisive differences cannot add up to be decisive. • Priority Dominance: if a difference is decisive, no effect of other dimensions.**Family of LS**• 6 Orders: LPH, LHP, PLH, PHL, HPL, HLP. • There are 3 ranges for each of two parameters, making 9 combinations of parameter ranges. • There are 6 X 9 = 54 LS models. • But all models predict SS, RR, or ??.**Analysis of Interaction**• Estimated probabilities: • P(SS) = 0 (prior PH) • P(SR) = 0.75 (prior TAX) • P(RS) = 0 • P(RR) = 0.25 • Priority Heuristic: Predicts SS**Probability Mixture Model**• Suppose each person uses a LS on any trial, but randomly switches from one order to another and one set of parameters to another. • But any mixture of LS is a mix of SS, RR, and ??. So no LS mixture model explains SR or RS.**Results: Dimension Integration**• Data strongly violate independence property of LS family • Data are consistent instead with dimension integration. Two small, indecisive effects can combine to reverse preferences. • Observed with all pairs of 2 dims. • Birnbaum, in J. math Psych, 2010.**New Studies of Transitivity**• LS models violate transitivity: A > B and B > C implies A > C. • Birnbaum & Gutierrez (2007) tested transitivity using Tversky’s gambles, using typical methods for display of choices. • Text displays and pie charts with and without numerical probabilities. Similar results with all 3 procedures.**Replication of Tversky (‘69) with Roman Gutierrez**• 3 Studies used Tversky’s 5 gambles, formatted with tickets and pie charts. • Exp 1, n = 251, tested via computers.**Three of Tversky’s (1969) Gambles**• A = ($5.00, 0.29; $0) • C = ($4.50, 0.38; $0) • E = ($4.00, 0.46; $0) Priority Heurisitc Predicts: A preferred to C; C preferred to E, But E preferred to A. Intransitive. TAX (prior): E > C > A**Comments**• Results were surprisingly transitive. • Differences: no pre-test, selection; • Probability represented by # of tickets (100 per urn); similar results with pies. • Birnbaum & Gutierrez, 2007, OBHDP • Regenwetter, Dana, & Davis-Stober also conclude that evidence against transitivity is weak., Psych Review, 2011. • Birnbaum & Bahra: most Ss transitive. JDM, 2012.