
Adapting Proper Scoring Rules for Measuring Subjective Beliefs to Nonexpected Utility

A Truth Serum for Non-Bayesians. Joep Sonnemans, Theo Offerman, Gijs van de Kuilen, Peter P. Wakker. March 14, 2007, Nottingham.


Presentation Transcript


  1. Adapting Proper Scoring Rules for Measuring Subjective Beliefs to Nonexpected Utility. A Truth Serum for Non-Bayesians. Joep Sonnemans, Theo Offerman, Gijs van de Kuilen, Peter P. Wakker. March 14, 2007, Nottingham. Topic: Subjective chance estimates, e.g.: - chance of rain tomorrow; - chance of strategy choice of opponent. Application considered first: grading of students.

  2. Say you grade a multiple-choice exam in geography that tests students' knowledge of Statement E: Capital of North Holland = Amsterdam. Reward: answering E pays 1 if E is true and 0 if not-E is true; answering not-E pays 0 if E is true and 1 if not-E is true. Assume two kinds of students. Those who know: they answer correctly. Those who don't know: they answer 50-50. Problem: a correct answer does not completely identify a student's knowledge. Some correct answers, and high grades, are due to luck. There is noise in the data.

  3. One way out: an oral exam, or asking for details. Too time consuming. Now comes a perfect way out: exactly identify the state of knowledge of each student. Still multiple choice, taking no more time! Best of all worlds!

  4. Now add a third answer option. Reward: answering E pays 1 if E true, 0 if not-E true; answering not-E pays 0 if E true, 1 if not-E true; answering "don't know" pays 0.75 either way. For those of you who do not know the answer to E: what would you do? Best to say "don't know." The system perfectly discriminates between students!

  5. New assumption: students have all kinds of degrees of knowledge. Some are almost sure E is true but are not completely sure; others have a hunch; etc. The above system does not discriminate perfectly between such various states of knowledge. One solution (too time consuming): oral exams etc. A second solution is to let them make many binary choices. An explanation in some detail follows; this solution will also be too time consuming.

  6. A series of binary choices, each between betting on E (reward 1 if E true, 0 if not-E true) and a sure "don't know" payoff: choice 1: sure payoff 0.10; choice 2: sure payoff 0.20; choice 3: sure payoff 0.30; ...; choice 9: sure payoff 0.90.

  7. Etc. This gives an approximation of the true subjective probability p, i.e. the degree of belief (!?). Binary-choice ("preference") solutions are popular in decision theory. This method works under subjective expected value (linear utility!). Justifiable by de Finetti's (1931) famous book-making argument. Too time consuming for us. Rewards for students are somewhat noisy this way.
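The binary-choice method above can be sketched as a bisection search: under expected value, a respondent with subjective probability p prefers the bet on E to a sure payoff x exactly when p > x, so repeated choices home in on p. A minimal sketch (the respondent model and step count are illustrative assumptions, not the slides' exact protocol):

```python
def elicit_p(prefers_bet, steps=20):
    """Approximate a subjective probability by bisection.

    prefers_bet(x) -> True if the respondent prefers the bet on E
    (1 if E true, 0 otherwise) to the sure payoff x.
    """
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if prefers_bet(mid):   # bet preferred, so p must exceed mid
            lo = mid
        else:                  # sure payoff preferred, so p is below mid
            hi = mid
    return (lo + hi) / 2

# Simulated expected-value respondent with true p = 0.62:
p_true = 0.62
estimate = elicit_p(lambda x: p_true > x)
```

Twenty choices pin p down to within about one millionth, which also illustrates why the method is too time consuming in practice.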

  8. Third solution [introspection]. The student chooses a number r ("reported probability"). Reward: answering E pays 1 if E true, 0 if not-E true; answering "partly know E, to degree r" pays r whether or not E is true. Just ask students for the "indifference r," i.e. ask: "Which r makes the lower row equally good as the upper row?" Problems: Why would the student give the true answer r = p? What at all is the reward (grade?) for the student? Does the reward, whatever it is, give an incentive to give the true answer?

  9. I now promise a perfect way out: de Finetti's dream incentives. Exactly identifies the state of knowledge of each student, no matter what it is. Still multiple choice, taking no more time. Rewards students fairly, with little noise. Best of all worlds. For the full variety of degrees of knowledge. The student can choose a reported probability r for E from the [0,1] continuum, as follows.

  10. Reward as a function of the reported probability r (degree of belief in E): reporting r pays 1 − (1−r)² if E is true and 1 − r² if not-E is true. Thus r = 1 (E is sure!?) pays 1 and 0; r = 0.5 (don't know!?) pays 0.75 and 0.75; r = 0 (not-E is sure!?) pays 0 and 1. Claim: Under "subjective expected value," the optimal reported probability r = the true subjective probability p.

  11. Figure.

  12. To help memory: reporting r (degree of belief in E) pays 1 − (1−r)² if E true and 1 − r² if not-E true. Proof of claim. p = true probability; r = reported probability. Optimize EV = p(1 − (1−r)²) + (1−p)(1−r²). First-order optimality: 2p(1−r) − 2r(1−p) = 0, so r = p! Figure!?
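The first-order condition can also be checked numerically: a grid search over reported probabilities r maximizes the expected quadratic score exactly at the true probability p. A minimal sketch:

```python
def expected_score(p, r):
    # Expected quadratic score under subjective expected value:
    # p * (1 - (1-r)^2) + (1-p) * (1 - r^2)
    return p * (1 - (1 - r) ** 2) + (1 - p) * (1 - r ** 2)

def best_report_ev(p, grid=10_000):
    # Argmax of the expected score over a fine grid of reports r
    rs = [i / grid for i in range(grid + 1)]
    return max(rs, key=lambda r: expected_score(p, r))

# Truth-telling is optimal for any true probability p:
for p in (0.1, 0.5, 0.75):
    r_star = best_report_ev(p)   # equals p up to the grid resolution
```

This is the incentive compatibility that makes the quadratic scoring rule "proper."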

  13. Easy in the algebraic sense. Conceptually: !!! Wow !!! Incentive compatible ... Many implications ... de Finetti (1962) and Brier (1950) were the first neuro-economists!

  14. "Bayesian truth serum" (Prelec, Science, 2005). Superior to elicitations through preferences. Superior to elicitations through indifferences ~ (BDM). Widely used: Hanson (Nature, 2002), Prelec (Science, 2005). In accounting (Wright 1988), Bayesian statistics (Savage 1971), business (Stael von Holstein 1972), education (Echternacht 1972), finance (Shiller, Kon-Ya, & Tsutsui 1996), medicine (Spiegelhalter 1986), psychology (Liberman & Tversky 1993; McClelland & Bolger 1994), experimental economics (Nyarko & Schotter 2002). We want to introduce these very nice features into theories for risk and ambiguity.

  15. Survey. Part I. Deriving the reported probability r from theories: expected value; expected utility; nonexpected utility for probabilities; nonexpected utility for ambiguity. Part II. Deriving theories from observed r. In particular: derive beliefs/ambiguity attitudes. Will be surprisingly easy. Proper scoring rules <==> risk & ambiguity: mutual benefits. Part III. Implementation in an experiment.

  16. Part I. Deriving r from Theories (EV, and then 3 deviations). Event E: Blair will keep his promise and will step down before September. We quantitatively measure your subjective belief about this event (subjective probability?), i.e. how much you trust Blair.

  17. Let us assume that you have doubts about Blair's promise (politicians …). Your "true" subjective probability of E (Blair keeps this promise) = 0.75. EV: then your optimal rE = 0.75.

  18. Reported probability R(p) = rE as a function of the true probability p, under: (a) expected value (EV); (b) expected utility with U(x) = x^0.5 (EU); (c) nonexpected utility for known probabilities, with U(x) = x^0.5 and with w(p) as common (nonEU); (d) nonexpected utility for unknown probabilities ("Ambiguity," nonEUA). For p = 0.75:

EV: rEV = 0.75; reward 0.94 if E true, 0.44 if not-E true; EV 0.8125.
EU: rEU = 0.69; reward 0.91 if E true, 0.52 if not-E true; EV 0.8094.
nonEU: rnonEU = 0.61; reward 0.85 if E true, 0.63 if not-E true; EV 0.7920.
nonEUA: rnonEUA = 0.52; reward 0.77 if E true, 0.73 if not-E true; EV 0.7596.

  19. So far we assumed EV (as does everyone using proper scoring rules, but as no one does in modern risk-ambiguity theories ...). Deviation 1 from EV: EU with U nonlinear. Now optimize pU(1 − (1−r)²) + (1−p)U(1 − r²). r = p need no longer be optimal.

  20. Theorem. Under expected utility with true probability p,

r = p·U′(1−(1−r)²) / [p·U′(1−(1−r)²) + (1−p)·U′(1−r²)].

Reversed (and explicit) expression:

p = r·U′(1−r²) / [r·U′(1−r²) + (1−r)·U′(1−(1−r)²)].

A corollary to distinguish from EV: r is nonadditive as soon as U is nonlinear.
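The theorem's fixed point can be reproduced by maximizing expected utility over reports directly. A small sketch with the concave utility U(x) = x^0.5 used in the slides' examples (the grid search is my illustration, not the slides' method):

```python
import math

def expected_utility(p, r, U=math.sqrt):
    # Expected utility of reporting r under the quadratic scoring rule,
    # with concave utility U (here U(x) = x**0.5)
    return p * U(1 - (1 - r) ** 2) + (1 - p) * U(1 - r ** 2)

def best_report_eu(p, grid=10_000):
    rs = [i / grid for i in range(grid + 1)]
    return max(rs, key=lambda r: expected_utility(p, r))

# For a true probability of 0.75, the optimal report is pulled
# toward the safe fifty-fifty answer:
r_eu = best_report_eu(0.75)
```

With these assumptions the optimum lands near the rEU = 0.69 of the Blair example: concave utility makes the even payoff pair (0.75, 0.75) relatively more attractive.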

  21. How bet on Blair? [Expected utility]. EV: rEV = 0.75. Expected utility, U(x) = x^0.5: rEU = 0.69. You now bet less on Blair. Closer to safety (Winkler & Murphy 1970).

  22. Deviation 2 from EV: nonexpected utility for probabilities (Allais 1953, Machina 1982, Kahneman & Tversky 1979, Quiggin 1982, Schmeidler 1989, Gilboa 1987, Gilboa & Schmeidler 1989, Gul 1991, Luce & Fishburn 1991, Tversky & Kahneman 1992, Birnbaum 2005; survey: Starmer 2000). For two-gain prospects, virtually all those theories are as follows: For r ≥ 0.5, nonEU(r) = w(p)U(1 − (1−r)²) + (1 − w(p))U(1 − r²). For r < 0.5, symmetry; so omit! Different treatment of highest and lowest outcome: "rank-dependence."

  23. Figure: the common weighting function w. w(p) = exp(−(−ln p)^λ) for λ = 0.65. w(1/3) ≈ 1/3; w(2/3) ≈ .51.

  24. Theorem. Under nonexpected utility with true probability p,

r = w(p)·U′(1−(1−r)²) / [w(p)·U′(1−(1−r)²) + (1−w(p))·U′(1−r²)].

Reversed (explicit) expression:

p = w⁻¹( r·U′(1−r²) / [r·U′(1−r²) + (1−r)·U′(1−(1−r)²)] ).

  25. How bet on Blair now? [nonEU with probabilities]. EV: rEV = 0.75. EU: rEU = 0.69. Nonexpected utility, U(x) = x^0.5, w(p) = exp(−(−ln p)^0.65): rnonEU = 0.61. You bet even less on Blair. Again closer to safety. Deviations were at the level of behavior so far, not of beliefs. Now for something different; more fundamental.
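The rnonEU figure above can be checked by maximizing the rank-dependent value w(p)U(1−(1−r)²) + (1−w(p))U(1−r²) over r ≥ 0.5 with the U and w just given; a sketch (the grid search is my illustration):

```python
import math

def w(p, lam=0.65):
    # Probability weighting function from the slides:
    # w(p) = exp(-(-ln p)**lam)
    return math.exp(-((-math.log(p)) ** lam))

def nonEU(p, r, U=math.sqrt):
    # Rank-dependent value of reporting r >= 0.5
    return w(p) * U(1 - (1 - r) ** 2) + (1 - w(p)) * U(1 - r ** 2)

def best_report_noneu(p, grid=10_000):
    rs = [0.5 + i / (2 * grid) for i in range(grid + 1)]  # r in [0.5, 1]
    return max(rs, key=lambda r: nonEU(p, r))

r_noneu = best_report_noneu(0.75)   # pulled further toward fifty-fifty
```

Because w(0.75) ≈ 0.64 < 0.75, probability weighting shrinks the decision weight on the favorable outcome, pushing the optimal report down toward the 0.61 of the Blair example.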

  26. 3rd violation of EV: Ambiguity (unknown probabilities; belief/decision-attitude? Yet to be settled). No objective data on probabilities. How to deal with unknown probabilities? Have to give up Bayesian beliefs descriptively. According to some, even normatively.

  27. Instead of additive beliefs p = P(E): nonadditive beliefs B(E) (Dempster & Shafer, Tversky & Koehler; etc.). All currently existing decision models: for r ≥ 0.5, nonEU(r) = W(E)U(1−(1−r)²) + (1−W(E))U(1−r²). This is '92 prospect theory, = Schmeidler (1989). Can always write B(E) = w⁻¹(W(E)), so W(E) = w(B(E)). Then nonEU(r) = w(B(E))U(1−(1−r)²) + (1−w(B(E)))U(1−r²). For binary gambles: Pfanzagl 1959; Luce ('00, Chapter 3); Ghirardato & Marinacci ('01, "biseparable").

  28. Theorem. Under nonexpected utility with ambiguity,

rE = w(B(E))·U′(1−(1−r)²) / [w(B(E))·U′(1−(1−r)²) + (1−w(B(E)))·U′(1−r²)].

Reversed (explicit) expression:

B(E) = w⁻¹( r·U′(1−r²) / [r·U′(1−r²) + (1−r)·U′(1−(1−r)²)] ).

  29. How bet on Blair now? [Ambiguity, nonEUA]. rEV = 0.75. rEU = 0.69. rnonEU = 0.61. Similarly, rnonEUA = 0.52 (under plausible assumptions). The r's are close to always saying fifty-fifty. "Belief" component B(E) = w⁻¹(W) = 0.62.
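The belief component can be backed out from an observed report with the reversed expression: compute the decision weight W from r, then apply w⁻¹. A sketch using the earlier U(x) = x^0.5 and w with λ = 0.65 (the slides' exact assumptions behind the 0.62 figure are not stated, so the number here comes out slightly different):

```python
import math

def w_inv(y, lam=0.65):
    # Inverse of w(p) = exp(-(-ln p)**lam)
    return math.exp(-((-math.log(y)) ** (1 / lam)))

def belief_from_report(r, Uprime=lambda x: 0.5 * x ** -0.5):
    # Reversed expression: first recover the decision weight W(E) ...
    a = Uprime(1 - (1 - r) ** 2)   # marginal utility at the high outcome
    b = Uprime(1 - r ** 2)         # marginal utility at the low outcome
    W = r * b / (r * b + (1 - r) * a)
    # ... then strip off the weighting function: B(E) = w^-1(W(E))
    return w_inv(W)

B_E = belief_from_report(0.52)   # in the neighborhood of the slides' 0.62
```

A cautious report of 0.52 thus decodes to a belief well above one half once risk attitude and probability weighting are stripped off.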

  30. B(E): ambiguity attitude /=/ beliefs?? Before entering that debate, first: how to measure B(E)? Our contribution: through proper scoring rules with "risk correction." Other proposals for measuring B and W considered in the literature (that we will improve upon):

  31. Proposal 1 (common in decision theory): measure U, W, and w from behavior, and derive B(E) = w⁻¹(W(E)) from it. Problem: much and difficult work!!! Proposal 2 (common in the decision analysis of the 1960s, and in modern experimental economics): measure canonical probabilities, that is, for E, find an event Ep with objective probability p such that (E:100) ~ (Ep:100) = (p:100). Then B(E) = p. Problem: measuring indifferences is difficult. Proposal 3 (common in proper scoring rules): calibration … Problem: needs many repeated observations.

  32. Part II. Deriving Theoretical Things from Empirical Observations of r. We reconsider the reversed explicit expressions:

p = w⁻¹( r·U′(1−r²) / [r·U′(1−r²) + (1−r)·U′(1−(1−r)²)] );

B(E) = w⁻¹( r·U′(1−r²) / [r·U′(1−r²) + (1−r)·U′(1−(1−r)²)] ).

Corollary. p = B(E) if related to the same r!!

  33. Our proposal takes the best of several worlds! Need not measure U, W, and w. Get "canonical probability" without measuring indifferences (BDM …; Holt 2006). Calibration without needing many repeated observations. Do all that with no more than simple proper-scoring-rule questions.

  34. Example (participant 25). Stock 20, CSM certificates, dealing in sugar and bakery ingredients. Reported probability: r = 0.75. For objective probability p = 0.70, the participant also reports r = 0.75. Conclusion: B(elief) of ending in the bar is 0.70! We simply measure the R(p) curves and use their inverses: this is risk correction.
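The risk correction just described can be sketched as: estimate the R(p) curve from the known-probability questions, then map an ambiguous-event report through R⁻¹. A minimal sketch with linear interpolation (the calibration data points are invented for illustration):

```python
# Hypothetical known-probability calibration data: (p, reported r) pairs,
# mimicking a participant who reports r = 0.75 when p = 0.70
calibration = [(0.0, 0.0), (0.25, 0.36), (0.5, 0.5), (0.70, 0.75), (1.0, 1.0)]

def risk_corrected_belief(r_ambiguous, calib):
    """Invert the measured R(p) curve by linear interpolation:
    find p such that R(p) equals the ambiguous-event report."""
    pts = sorted(calib, key=lambda pr: pr[1])        # sort by reported r
    for (p0, r0), (p1, r1) in zip(pts, pts[1:]):
        if r0 <= r_ambiguous <= r1:
            t = (r_ambiguous - r0) / (r1 - r0)
            return p0 + t * (p1 - p0)
    raise ValueError("report outside calibrated range")

# The ambiguous stock report r = 0.75 decodes to a belief of 0.70:
B = risk_corrected_belief(0.75, calibration)
```

No U, W, or w needs to be measured: the participant's own known-probability reports serve as the correction curve.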

  35. Mutual benefits: prospect theory <==> proper scoring rules. We bring insights of modern prospect theory to proper scoring rules. Make them empirically more realistic. (EV is not credible in 2006 …) We bring insights of proper scoring rules to prospect theory. Make B very easy to measure and analyze. Directly implementable empirically. We did so in an experiment, and found plausible results.

  36. Part III. Experimental Test of Our Correction Method

  37. Method. Participants: N = 93 students. Procedure: computerized, in the lab. Groups of 15/16 each. 4 practice questions.

  38. Stimuli 1. First we did the proper scoring rule for unknown probabilities; 72 in total. For each stock: two small intervals and, third, their union. Thus we test for additivity.

  39. Stimuli 2. Known probabilities: two 10-sided dice thrown, yielding a random nr. between 01 and 100. Event E: nr. ≤ 75 (p = 3/4 = 15/20) (etc.). Done for all probabilities j/20. Motivating subjects: real incentives. Two treatments. 1. All-pay: points paid for all questions; 6 points = €1; average earning €15.05. 2. One-pay (random-lottery system): one question, randomly selected afterwards, played for real; 1 point = €20; average earning €15.30.

  40. Results

  41. Average correction curves.

  42. Figure: individual corrections. Cumulative distribution F(ρ) of the corrections ρ (from −2.0 to 1.5), for treatment one and treatment all.

  43. Figure 9.1. Empirical density of the additivity bias for the two treatments (Fig. a: treatment t=ONE; Fig. b: treatment t=ALL), corrected and uncorrected. For each interval [(j−2.5)/100, (j+2.5)/100] of length 0.05 around j/100, we counted the number of additivity biases in the interval, aggregated over 32 stocks and 89 individuals, for both treatments. With risk correction, there were 65 additivity biases between 0.375 and 0.425 in treatment t=ONE, and without risk correction there were 95 such; etc.

  44. Summary and Conclusion. • Under modern risk & ambiguity theories, proper scoring rules are heavily biased. • We correct for those biases. Benefits for the proper-scoring-rule community and for risk and ambiguity theories. • Experiment: correction improves quality; reduces deviations from ("rational"?) Bayesian beliefs. • The corrections do not remove all deviations from Bayesian beliefs. Beliefs are genuinely nonadditive/non-Bayesian/sensitive to ambiguity. • Proper scoring rules: the post-neuroeconomics approach for mind-reading.
