You may believe you are a Bayesian But you are probably wrong

You may believe you are a Bayesian But you are probably wrong Stephen Senn

Outline • The four systems of statistical inference • An example of where it is good to be Bayesian • Fisher’s argument against the Neyman-Pearson approach • Examples of experts applying ‘the Bayesian’ approach • Adrian Smith and colleagues 1987 • Lindley, 1993 • Howson and Urbach, 1989 • Some theoretical reasons for hesitation • Conclusion • Why I shall (probably) still be using mongrel statistics after this conference

Warning • This talk should not be taken as an attack on the subjective Bayesian approach to statistical inference • I do not claim it is a bad approach • I do claim it can be very difficult and perhaps dangerous to rely on it as the only approach

Four systems (Barnard) • Fisherian • Neyman-Pearson • Jeffreys • Bayesian (Ramsey-De Finetti-Savage) George Barnard’s advice was to be familiar with all four

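A two dimensional view of the four systems [Figure: two-by-two grid; rows: inverse probability, direct probability; columns: inferences, decisions] • Inverse probability, inferences: Jeffreys (use of semi-objective prior distributions to produce inverse probabilities) • Inverse probability, decisions: De Finetti (use of subjective expectation via utility) • Direct probability, inferences: Fisher (fiducial inference, likelihood, significance tests) • Direct probability, decisions: Neyman and Pearson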

TGN1412 • A monoclonal antibody • First-in-man study on 13 March 2006 carried out by Parexel on behalf of TeGenero • In first cohort 8 volunteers • Six allocated TGN1412 and two allocated placebo • All six given TGN1412 suffered a cytokine storm

See: Senn SJ. Lessons from TGN1412. Applied Clinical Trials 2007;16(6):18-22.

A Conventional Analysis
FISHER'S EXACT TEST
Statistic based on the observed 2 by 2 table (x):
  P(X) = Hypergeometric Prob. of the table = 0.0357
  FI(X) = Fisher statistic = 6.095
Asymptotic p-value (based on Chi-Square distribution with 1 df):
  Two-sided: Pr{FI(X) .GE. 6.095} = 0.0136
  One-sided: 0.5 * Two-sided = 0.0068
Exact p-value and point probabilities:
  Two-sided: Pr{FI(X) .GE. 6.095} = Pr{P(X) .LE. 0.0357} = 0.0357
             Pr{FI(X) .EQ. 6.095} = Pr{P(X) .EQ. 0.0357} = 0.0357
  One-sided: Let y be the value in Row 1 and Column 1
             y = 6, min(Y) = 4, max(Y) = 6, mean(Y) = 4.500, std(Y) = 0.5669
             Pr{Y .GE. 6} = 0.0357
             Pr{Y .EQ. 6} = 0.0357
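The exact probability in the output above can be checked by hand. The sketch below assumes the 2 by 2 table implied by the TGN1412 slide (6 of 6 on TGN1412 and 0 of 2 on placebo with a cytokine storm); the function name is illustrative and not part of any package.

```python
from math import comb

def hypergeom_pmf(y, row1_total, col1_total, n):
    """P(Y = y) for the top-left cell of a 2x2 table with fixed margins."""
    return (comb(col1_total, y) * comb(n - col1_total, row1_total - y)
            / comb(n, row1_total))

# Assumed table: row 1 = TGN1412 (6 subjects), column 1 = cytokine storm (6 events),
# 8 subjects in total, observed top-left cell y = 6.
p_table = hypergeom_pmf(6, row1_total=6, col1_total=6, n=8)
print(round(p_table, 4))  # 0.0357, i.e. 1/28
```

Because y = 6 is the largest value the margins allow, the one-sided exact p-value coincides with this single point probability.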

A Slightly Less Conventional Analysis
Datafile: C:\Program Files\Numerical\StatXact-4.0.1\Files\Research\TGN1412.cy3
BARNARD'S UNCONDITIONAL TEST FOR DIFFERENCE OF TWO BINOMIAL PROPORTIONS
Statistic based on the observed 2 by 2 table:
  Binomial proportion for column <Yes>: pi_1 = 1.000
  Binomial proportion for column <No>:  pi_2 = 0.0000
  Difference of binomial proportions: Delta = pi_2 - pi_1 = -1.000
  Standardized difference of binomial proportions: Delta/Stdev = -2.828
Results:
-------------------------------------------------------------------------
Method    P-value (1-sided)       P-value (2-sided)
-------------------------------------------------------------------------
Asymp     0.0023 (Left Tail)      0.0047
Exact     0.0111 (Left Tail)      0.0113
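The standardized difference and the one-sided exact p-value can be reproduced with a small enumeration. This is a sketch of one common form of an unconditional test (pooled-variance z statistic, maximised over the nuisance proportion on a grid), not a description of StatXact's internals. It assumes group 1 is TGN1412 (6 of 6 events) and group 2 is placebo (0 of 2 events); all function names are illustrative.

```python
from math import comb, sqrt

def z_stat(x1, n1, x2, n2):
    """Pooled-variance standardized difference (p2 - p1); 0 if the variance is 0."""
    pooled = (x1 + x2) / (n1 + n2)
    var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    return 0.0 if var == 0 else (x2 / n2 - x1 / n1) / sqrt(var)

def unconditional_left_tail(x1_obs, n1, x2_obs, n2, grid=1001):
    """Sup over a common event probability pi of P(Z <= z_obs), on a grid."""
    z_obs = z_stat(x1_obs, n1, x2_obs, n2)
    extreme = [(a, b) for a in range(n1 + 1) for b in range(n2 + 1)
               if z_stat(a, n1, b, n2) <= z_obs + 1e-12]
    best = 0.0
    for i in range(grid):
        pi = i / (grid - 1)
        tail = sum(comb(n1, a) * pi**a * (1 - pi)**(n1 - a)
                   * comb(n2, b) * pi**b * (1 - pi)**(n2 - b)
                   for a, b in extreme)
        best = max(best, tail)
    return best

z_obs = z_stat(6, 6, 0, 2)
p_one_sided = unconditional_left_tail(6, 6, 0, 2)
print(round(z_obs, 3))        # -2.828
print(round(p_one_sided, 4))  # 0.0111
```

Only the observed table is at least as extreme as itself here, so the maximisation reduces to maximising pi^6 (1 - pi)^2, which peaks at pi = 0.75.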

Conclusions • “If you need statistics to prove it, I don’t believe it” • Here the problem is the reverse • You can’t prove it with statistics but everybody believes it • So does this mean statistics is irrelevant? • Not if you look more closely…

Further information • Timing of adverse events • Increasing interest in using this feature in epidemiological studies • Case series methodology • Farrington and Whitaker (2006) • Also if we use background knowledge of risk of cytokine storm we come to quite different conclusions • But this is to be rather Bayesian

Fisher on Neyman-Pearson ‘Their method only leads to definite results when mathematical postulates are introduced which could only be justified as a result of extensive experience.’ Fisher to Chester Bliss 6 October 1938 (Published in Bennett, 1990) What Fisher is pointing to here is that although a null hypothesis may be more primitive than a test statistic, the same is not true of an alternative hypothesis. Thus the alternative hypothesis cannot be made the justification for choosing the test statistic

Three Examples Provided by Expert Bayesians • Two involve choice of prior distribution followed by formal Bayesian updating • Racine-Poon, Grieve, Fluehler and Smith 1987 • Lindley 1993 • One involves an intuitive assertion of the posterior result, which is claimed to be Bayesian • Howson and Urbach

Racine et al • This is a fine paper with many examples as to how the Bayesian approach can be applied in drug development • I shall just look at one of these • The analysis of the Martin and Browning (1985) Data of metoprolol • Actually, this paper is not cited by Racine et al but this is the relevant citation

Design • Run-in: 4 weeks, followed by randomisation • Period 1: 6 weeks • Period 2: 6 weeks • Treatments: 100 mg once daily and 200 mg once daily, one in each period • 31 patients aged 65+ with diastolic blood pressure (DBP) in excess of 100 mmHg randomised to these sequences • DBP measured after 6 weeks and 4-8 hours after last dose

Carry-over Problem • The period 2 values could still be influenced by the period 1 treatment • Hence a comparison of period 1 and period 2 results would provide a biased measurement of the effect of treatment • However, if we knew what the magnitude of the carry-over was we could take account of it • Hence carry-over is a nuisance parameter and a prime candidate for the Bayesian approach

Unfortunately • None of the authors noted that the carry-over effect has to last for six weeks • Nor did any of the discussants whether Bayesian or frequentist • However the treatment effect only has to last for 4-8 hours • The ratio of one to the other is at least 126 • You cannot use an uninformative prior for carry-over and be coherent
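The figure of 126 follows from simple arithmetic: carry-over must persist for the full six-week period, whereas the treatment effect needs to last at most 8 hours. A minimal check (variable names are illustrative):

```python
HOURS_PER_WEEK = 7 * 24

carry_over_hours = 6 * HOURS_PER_WEEK   # six weeks = 1008 hours
longest_treatment_window_hours = 8      # upper end of the 4-8 hour window

ratio = carry_over_hours / longest_treatment_window_hours
print(ratio)  # 126.0
```

Using the 4-hour end of the window instead would double the ratio, which is why 126 is a lower bound.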

Anyone who is not shocked by quantum theory has not understood a single word. Niels Bohr Anyone who is not shocked by the Bayesian theory of statistical inference has not understood a single word Stephen Senn

A Bayesian Lady Tasting Wine Paper by Lindley. Lindley, D. The Analysis of Experimental Data, Teaching Statistics, 15, 22-25 (1993). “The lady is a wine expert, testified by her being a Master (sic) of Wine, MW. She was given 6 pairs of glasses (not cups). One member of each pair contained some French claret. The other had a Californian Cabernet Sauvignon Merlot Blend.” See also Lindley, D. A Bayesian lady tasting tea. In Statistics: An Appraisal, David and David (eds), Iowa State University Press (1984).

Lindley’s Prior for Wine Tasting

‘At this point I can only speak for myself though I hope many will agree with me. You may freely disagree and still be sensible.’ Lindley I do disagree Either the Lady knows something about wine or she hasn’t a clue. If she has, I think that she can repeat the trick of correct identification with high probability. If she is a charlatan, there is a small probability that she may have a fine palate

The Difference between Mathematical and Applied Statistics Mathematical statistics is full of lemmas whereas applied statistics is full of dilemmas.

Senn’s Prior for Wine Tasting

Place Your Bets • Imagine the lady has to distinguish between 20 pairs of glasses. • You are given £100,000 to place at evens either for or against the following: • The lady will choose correctly in 12, 13, 14, 15 or 16 pairs. • How do you choose?
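How you bet depends on the predictive probability your prior assigns to 12 to 16 correct choices out of 20. The sketch below computes that probability for any discrete prior on the lady's success probability. The two-point and evenly spread priors are hypothetical illustrations only; they are not Lindley's prior or the prior shown on the previous slide.

```python
from math import comb

def prob_12_to_16(theta, n=20):
    """P(12 <= X <= 16) for X ~ Binomial(n, theta)."""
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k)
               for k in range(12, 17))

def predictive(prior):
    """Predictive probability of the event; prior is a list of (theta, weight)."""
    return sum(weight * prob_12_to_16(theta) for theta, weight in prior)

pure_guessing = [(0.5, 1.0)]
# Hypothetical "clueless or expert" prior: guessing, or right 95% of the time.
two_point = [(0.5, 0.5), (0.95, 0.5)]
# Hypothetical smooth-ish prior: equal weight on 0.50, 0.55, ..., 1.00.
spread = [((10 + i) / 20, 1 / 11) for i in range(11)]

for name, prior in [("guessing", pure_guessing),
                    ("two-point", two_point),
                    ("spread", spread)]:
    print(name, round(predictive(prior), 3))
```

At evens you would bet for the event only if this predictive probability exceeds one half, so comparing the values across priors shows how strongly the bet depends on the prior's shape.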

An Example of Howson and Urbach’s • Consider the example of a die rolled 600 times • Results are • 100 each of sixes, fives, fours and threes • 123 twos • 77 ones • Pearson-Fisher chi-square statistic is 10.58
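The chi-square statistic quoted above is easy to verify against the fair-die expectation of 100 per face. A minimal check (variable names are illustrative):

```python
observed = {"one": 77, "two": 123, "three": 100,
            "four": 100, "five": 100, "six": 100}

expected = sum(observed.values()) / len(observed)  # 600 / 6 = 100 per face

chi_square = sum((count - expected) ** 2 / expected
                 for count in observed.values())
print(round(chi_square, 2))  # 10.58
```

With 5 degrees of freedom the conventional 5% critical value is about 11.07, so 10.58 falls just short of significance at that level.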

Howson and Urbach’s Conclusion “...one is, therefore, under no obligation to reject the null hypothesis, even though that hypothesis has pretty clearly got it badly wrong, in particular, in regard to the outcomes two and one” (p136, my italics). From the second edition

An Analysis Using Good’s Approach • Lump of probability on fair die • Symmetric Dirichlet prior over alternative • Do not commit yourself to particular value of k, (Dirichlet parameter) • Instead plot Bayes factor as function of K • This is a sort of Type II likelihood

Bayes factor as function of prior Parameter of symmetric Dirichlet
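A curve of this kind can be sketched numerically. The code below assumes the standard closed form for the marginal likelihood of multinomial counts under a symmetric Dirichlet prior with parameter k, and compares it with the fair-die likelihood; the multinomial coefficient cancels in the ratio. The function name and the chosen values of k are illustrative.

```python
from math import exp, lgamma, log

def log_bayes_factor(counts, k):
    """log Bayes factor: symmetric Dirichlet(k) alternative versus a fair die."""
    n = sum(counts)
    m = len(counts)
    log_marginal_alt = (lgamma(m * k) - lgamma(n + m * k)
                        + sum(lgamma(c + k) - lgamma(k) for c in counts))
    log_likelihood_null = -n * log(m)
    return log_marginal_alt - log_likelihood_null

counts = [77, 123, 100, 100, 100, 100]  # ones, twos, threes, fours, fives, sixes

for k in (0.1, 1, 10, 100, 1000):
    print(k, exp(log_bayes_factor(counts, k)))
```

As k grows the Dirichlet prior concentrates on the fair die, so the Bayes factor tends to 1; plotting it over a range of k avoids committing to any single value.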

Conclusion • If you had witnessed the die being rolled you would not necessarily conclude it was unfair • If you were asked to decide whether these were results from a real die or one some philosophers had written down in a book you might decide on the latter • This is because the Dirichlet distribution could not model your prior distribution • It is somewhat unfair of H&U to claim that the frequentist approach has pretty clearly got it badly wrong • I think that they would have great difficulty honestly specifying a prior distribution that allowed them to ‘get it right’ for this example and not look foolish for others

Am I being unfair? • Yes, if my aim is to claim that Bayesian methods are particularly bad • They are not • We all make errors in our search for errors and I am no exception • No, if my aim is to counter the claim that Bayesian statistics is uniquely good • In particular if the argument is that the only requirement for inference is coherence

Perfection and Goodness • The De Finetti theory is a theory of how to remain perfect • You have a prior probability of all possible sequences of events • As events unfold you strike out the sequences that did not occur and renormalise • This is not, however, a theory of how to become good
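The "strike out and renormalise" step can be made concrete with a toy example. The sketch below assumes a hypothetical exchangeable prior over all sequences of three binary events (the one induced by a uniform distribution on the unknown success probability); the prior and the function names are illustrative only.

```python
from fractions import Fraction
from itertools import product
from math import factorial

N = 3

def prior_probability(seq):
    """Hypothetical prior: s!(n-s)!/(n+1)! for a sequence with s successes."""
    s = sum(seq)
    return Fraction(factorial(s) * factorial(N - s), factorial(N + 1))

prior = {seq: prior_probability(seq) for seq in product((0, 1), repeat=N)}

def strike_out(dist, position, outcome):
    """Drop sequences inconsistent with the observed event, then renormalise."""
    kept = {seq: p for seq, p in dist.items() if seq[position] == outcome}
    total = sum(kept.values())
    return {seq: p / total for seq, p in kept.items()}

posterior = strike_out(prior, position=0, outcome=1)
p_second_success = sum(p for seq, p in posterior.items() if seq[1] == 1)
print(p_second_success)  # 2/3
```

Nothing in this procedure revises the prior itself: every later probability is already fixed by the initial assignment over sequences, which is the sense in which the theory tells you how to stay coherent rather than how to get there.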

If You are not Already a Bayesian You have a collection of priors which do not form a coherent set. You can only become Bayesian by trashing some of the priors until those that are left are coherent. But if this is a legitimate thing to do, it seems to me that it must remain a legitimate thing to do in the future. This is then a license to continue not being Bayesian. This then means that the Dutch book argument loses much of its force.

The Date of Information Problem Statistician: Here is the result of the analysis of the trial you asked me to look at. I have added the likelihood to your prior. This is the posterior distribution. Physician: Excellent! Now could you please take the results of the previous trials and do a meta-analysis? Statistician (after a pause): There is no need. The result I gave you is the meta-analysis. The previous trials are in your prior. Physician (after a pause): If the previous trials are in my prior, they got into my prior without your help at all. Why did I need you to help with producing the posterior?

The Bayesian Meta-Analyst’s Dilemma In general, $P_{n-1} + D_n \rightarrow P_n$. Step 1: $P_0 + D_1 \rightarrow P_1$. Step 2: $P_1 + D_2 \rightarrow P_2$. Or equivalently, $P_0 + D_1 + D_2 \rightarrow P_2$. But suppose $P_0$ already includes $D_1$; then this analysis would be illegitimate (like analysing 50 values using a chi-square on the percentages).

The Dilemma Continued So use step 2 only. But suppose that $P_1$ does not include $D_1$. This would be equivalent to analysing a contingency table of 200 observations using a chi-square on the percentages. Then the principle of total information has been violated. (Note, however, that according to David Miller the principle of total information seems to be an independent principle which cannot be derived from maximising expected posterior utility except by imposing very artificial additional conditions.)
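The double-counting half of the dilemma can be illustrated with a conjugate Beta-binomial update, where a prior is a pair (a, b) and data are (successes, failures). This is only a sketch: the prior and both data sets are hypothetical numbers chosen for illustration.

```python
def update(prior, data):
    """Conjugate Beta-binomial update: add successes and failures to (a, b)."""
    a, b = prior
    successes, failures = data
    return (a + successes, b + failures)

p0 = (1, 1)   # hypothetical prior P0
d1 = (7, 3)   # hypothetical first trial D1
d2 = (4, 6)   # hypothetical second trial D2

# Step 1 then step 2 agrees with adding both data sets at once.
p1 = update(p0, d1)
p2_sequential = update(p1, d2)
p2_batch = update(p0, (d1[0] + d2[0], d1[1] + d2[1]))

# If the stated prior already contains D1, adding D1 again counts it twice.
p0_including_d1 = update(p0, d1)
p2_double_counted = update(update(p0_including_d1, d1), d2)

print(p2_sequential, p2_batch, p2_double_counted)  # (12, 10) (12, 10) (19, 13)
```

The arithmetic cannot tell the two situations apart; only knowing what information went into the prior, and when, determines which calculation is legitimate.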

The Two Faces of (subjective) Bayes
Theory • Elegant development based on coherence • Claim that it is the only way to behave • Claim to integrate all sources of information • Requires (in my view) perfect temporal coherence
Practice • A rag-bag of computational tools • Use of Bayes theorem but not therefore coherent • Often surprisingly poor treatment of prior distributions • Back to the drawing board allowed

My conclusion • It is highly doubtful that the strong claims for Bayesian theory are a justification for Bayesian practice • This does not mean that Bayesian statistics as practised is not useful • The applied statistician needs a method that is useful in practice and not just in theory • I remain sceptical of its claims to be the only useful statistical approach, not least because admitting this to be true would still leave you sorely puzzled as to what to do in practice

The Robot Turtle in the Corner • When the robot gets stuck the scientist gets up and gives it a kick • The robot does not know it is stuck • To avoid being stuck in the inferential corner it is useful for us to have different ways of making inferences • Where they disagree there is a warning that it is time to do some creative thinking

Where this leaves me • Bayesian approach is excellent when you have to make decisions • If you are going to use frequentist approaches to decision making you may need to use stopping rules • However, stopping rule adjustments are not a good way to summarise evidence • And the same is true of Bayesian analyses • I like randomisation but don’t make a fetish of it • I like the likelihood principle but don’t make a fetish of it • No (current) single approach to statistical inference seems to fit my needs as a jobbing statistician • I like (to the limits of my lesser ability) following George Barnard’s advice of being prepared to consider four

Finally Frequentists think that it is the thought that counts whereas Bayesians count the thoughts.
