
Developments in Bayesian Priors

Presentation Transcript


  1. Developments in Bayesian Priors. Roger Barlow, Manchester IoP meeting, November 16th 2005

  2. Plan • Probability • Frequentist • Bayesian • Bayes Theorem • Priors • Prior pitfalls (1): Le Diberder • Prior pitfalls (2): Heinrich • Jeffreys’ Prior • Fisher Information • Reference Priors: Demortier

  3. Probability Probability as the limit of a frequency: P(A) = limit of N_A/N_total as N_total → ∞. The usual definition taught to students. Makes sense, and works well most of the time, but not always.

  4. Frequentist probability Statements a frequentist cannot make: “It will probably rain tomorrow.” “Mt = 174.3 ± 5.1 GeV means the top quark mass lies between 169.2 and 179.4, with 68% probability.” The frequentist rephrasings: “The statement ‘It will rain tomorrow’ is probably true.” “Mt = 174.3 ± 5.1 GeV means: the top quark mass lies between 169.2 and 179.4, at 68% confidence.”

  5. Bayesian Probability P(A) expresses my belief that A is true. Limits: 0 (impossible) and 1 (certain). Calibrated against clear-cut instances (coins, dice, urns).

  6. Frequentist versus Bayesian? Two sorts of probability, totally different. (Bayesian probability is also known as Inverse Probability.) Rivals? Religious differences? (Particle physicists tend to be frequentists; cosmologists tend to be Bayesians.) No: two different tools for practitioners. Important to: • be aware of the limits and pitfalls of both • always be aware which you’re using

  7. Bayes Theorem (1763) P(A|B) P(B) = P(A and B) = P(B|A) P(A), so P(A|B) = P(B|A) P(A) / P(B). Frequentist use, e.g. a Čerenkov counter: P(particle | signal) = P(signal | particle) P(particle) / P(signal). Bayesian use: P(theory | data) = P(data | theory) P(theory) / P(data)
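
A minimal numerical sketch of the Čerenkov-style use of Bayes' theorem (the particle types, efficiencies and abundances below are invented numbers for illustration, not from the talk):

```python
# Bayes' theorem for particle ID with a Cherenkov counter (illustrative numbers only).
# P(pion | signal) = P(signal | pion) * P(pion) / P(signal)

p_signal_given_pion = 0.95   # assumed probability the counter fires for a pion
p_signal_given_kaon = 0.05   # assumed probability it fires for a kaon
p_pion = 0.90                # assumed prior fraction of pions in the beam
p_kaon = 1.0 - p_pion

# Law of total probability for the denominator
p_signal = p_signal_given_pion * p_pion + p_signal_given_kaon * p_kaon

# Posterior probability that a track which fired the counter is a pion
p_pion_given_signal = p_signal_given_pion * p_pion / p_signal
print(f"P(pion | signal) = {p_pion_given_signal:.3f}")
```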

  8. Bayesian Prior P(theory) is the Prior. It expresses prior belief that the theory is true, and can be a function of a parameter: P(Mtop), P(MH), P(α, β, γ). Bayes’ Theorem describes the way prior belief is modified by experimental data. But what do you take as the initial prior?

  9. Uniform Prior General usage: choose P(a) uniform in a (principle of insufficient reason). Often ‘improper’: ∫P(a) da = ∞, though the posterior P(a|x) comes out sensible. BUT! If P(a) is uniform, P(a²), P(ln a), P(√a) … are not. Insufficient reason is not valid (unless a is ‘most fundamental’, whatever that means). Statisticians handle this: check results for ‘robustness’ under different priors.
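
A quick Monte Carlo sketch of the reparametrisation problem: if a is taken uniform on (0, 1) (an arbitrary interval chosen for illustration), the implied distributions of a², ln a and √a are far from flat.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.uniform(0.0, 1.0, size=1_000_000)   # "uniform prior" in a on (0, 1)

# Implied distributions of simple transformations of a
for name, t in [("a^2", a**2), ("ln a", np.log(a)), ("sqrt(a)", np.sqrt(a))]:
    hist, _ = np.histogram(t, bins=10)
    print(f"{name:8s} bin probabilities: {np.round(hist / hist.sum(), 3)}")
# A flat distribution would put ~0.1 in every bin; none of these is close.
```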

  10. Example – Le Diberder A sad story: fitting the CKM angle α from B decays. 6 observables and 3 amplitudes give 6 unknown parameters (magnitudes and phases). α is the fundamentally interesting one.

  11. Results (plots: Frequentist and Bayesian, not reproduced here). Set one phase to zero; uniform priors in the other two phases and the 3 magnitudes.

  12. More Results (Bayesian plots). Parametrise the Tree and Penguin amplitudes: 3 amplitudes, giving 3 real parts and 3 imaginary parts.

  13. Interpretation • B shows the same (mis)behaviour • Removing all experimental information gives a similar P(α) • The curse of high dimensions is at work: uniformity in x, y, z makes P(r) peak at large r • This result is not robust under changes of prior
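
A minimal sketch of the curse-of-dimensions point: sampling x, y, z uniformly (here in a unit cube, an arbitrary choice) gives a distribution for r = √(x²+y²+z²) that is pushed towards large r, because the volume element grows like r².

```python
import numpy as np

rng = np.random.default_rng(2)
xyz = rng.uniform(0.0, 1.0, size=(1_000_000, 3))   # uniform priors in x, y, z
r = np.linalg.norm(xyz, axis=1)

hist, edges = np.histogram(r, bins=[0.0, 0.25, 0.5, 0.75, 1.0, np.sqrt(3.0)])
print("P(r) in bins of r:", np.round(hist / hist.sum(), 3))
# Small r is strongly disfavoured: a prior uniform in the components is
# anything but uniform in the magnitude.
```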

  14. Example - Heinrich The CDF statistics group looked at the problem of estimating a signal cross section S in the presence of background and efficiency: N = εS + b. Efficiency and background come from separate calibration experiments (sidebands or MC); the scaling factors κ, ω are known. Everything is done using Bayesian methods with uniform priors and the Poisson statistics formula. The calibration experiments use uniform priors for ε and for b, yielding the posteriors P(ε), P(b) used for S: P(N|S) = (1/N!) ∫∫ e^-(εS+b) (εS+b)^N P(ε) P(b) dε db. Check coverage: all fine.
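
A rough numerical sketch of this construction (all counts, scaling factors and grid choices are invented for illustration): with a uniform prior and a Poisson calibration count m ~ Poisson(κε), the posterior for ε is a Gamma(m+1, rate κ) density, and similarly for b; the integral for P(N|S) is then done by Monte Carlo and a 90% credible upper limit on S is read off a uniform-prior posterior.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(3)

# Invented inputs for illustration
N      = 6      # observed count in the signal region
m_eff  = 20     # calibration count constraining the efficiency
kappa  = 80.0   # known scaling: m_eff ~ Poisson(kappa * eps)
m_bkg  = 9      # calibration count constraining the background
omega  = 3.0    # known scaling: m_bkg ~ Poisson(omega * b)

# Posteriors from the calibration experiments with uniform priors
n_mc = 200_000
eps = rng.gamma(m_eff + 1, 1.0 / kappa, size=n_mc)   # Gamma(m+1, rate=kappa)
b   = rng.gamma(m_bkg + 1, 1.0 / omega, size=n_mc)   # Gamma(m+1, rate=omega)

# P(N|S) = (1/N!) * E_{eps,b}[ exp(-(eps*S+b)) * (eps*S+b)^N ], by Monte Carlo
S_grid = np.linspace(0.0, 60.0, 601)
like = np.array([np.mean(np.exp(-(eps * S + b)) * (eps * S + b) ** N) / factorial(N)
                 for S in S_grid])

# Uniform prior in S: the posterior is the normalised likelihood
post = like / np.trapz(like, S_grid)
cdf = np.cumsum(post) * (S_grid[1] - S_grid[0])
upper90 = S_grid[np.searchsorted(cdf, 0.90)]
print(f"90% Bayesian upper limit on S: {upper90:.1f}")
```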

  15. But it all goes pear-shaped… If the particle decays in several channels (H→γγ, H→τ+τ-, H→bb), each channel has a different b and ε: in total 2N+1 parameters and 2N+1 experiments. Heavy undercoverage! E.g. with 4 channels, all ε = 25 ± 10%, b = 0.75 ± 0.25: for S ≈ 10 the ‘90% upper limit’ lies above the true S in only 80% of cases. (Plot: coverage as a function of S, not reproduced here.)

  16. The curse strikes again • Uniform prior in ε: fine • Uniform priors in ε1, ε2 … εN give an ε^(N-1) prior in the total ε • Prejudice in favour of high efficiency • Signal size downgraded

  17. Happy ending The effect is avoided by using Jeffreys’ priors instead of uniform priors for ε and b: not uniform, but like 1/ε, 1/b. Not entirely realistic, but interesting. The uniform prior in S is not a problem, but maybe one should consider 1/√S? Coverage (a very frequentist concept) is a useful tool for Bayesians.
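
Coverage can be checked by brute force. A minimal sketch for a much simpler counting experiment than Heinrich's (background known exactly, efficiency fixed to 1, uniform prior in S; all numbers are illustrative): throw toy experiments at a fixed true S and count how often the 90% Bayesian upper limit lies above it.

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(4)

b_known = 3.0     # background mean, assumed known exactly (illustrative)
S_true  = 5.0     # true signal mean at which coverage is tested
n_toys  = 5000
S_grid  = np.linspace(0.0, 40.0, 401)
dS = S_grid[1] - S_grid[0]

def upper_limit_90(N, b):
    """90% credible upper limit on S for N ~ Poisson(S + b), uniform prior in S >= 0."""
    log_like = N * np.log(S_grid + b) - (S_grid + b) - lgamma(N + 1)
    post = np.exp(log_like - log_like.max())
    post /= post.sum() * dS
    cdf = np.cumsum(post) * dS
    return S_grid[np.searchsorted(cdf, 0.90)]

covered = sum(upper_limit_90(rng.poisson(S_true + b_known), b_known) >= S_true
              for _ in range(n_toys))
print(f"Coverage of the '90%' upper limit at S_true = {S_true}: {covered / n_toys:.3f}")
```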

  18. Fisher Information An informative experiment is one for which a measurement of x will give precise information about the parameter a. Quantify: I(a) = -<∂² ln L / ∂a²> (second derivative: curvature). P(x,a) is ‘everything’: as a function of x at fixed a it is the pdf P(x|a); as a function of a for the observed x it is the likelihood L(a).
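
A small numerical sketch of the definition (the exponential-decay model, lifetime and sample size are arbitrary choices): estimate I(τ) = -<∂² ln L/∂τ²> by finite differences averaged over simulated data, and compare with the analytic value 1/τ² for a single observation.

```python
import numpy as np

rng = np.random.default_rng(5)

def log_like(x, tau):
    """Log-likelihood of one exponential-decay observation with mean lifetime tau."""
    return -np.log(tau) - x / tau

tau = 2.0
x = rng.exponential(tau, size=1_000_000)   # data simulated from the true model

# I(tau) = -< d^2 ln L / d tau^2 >, second derivative by central finite differences
h = 1e-3
d2 = (log_like(x, tau + h) - 2.0 * log_like(x, tau) + log_like(x, tau - h)) / h**2
print(f"numerical I(tau) = {-d2.mean():.4f},  analytic 1/tau^2 = {1.0 / tau**2:.4f}")
```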

  19. Jeffreys’ Prior A prior may be uniform in a, but if I(a) depends on a it is still not ‘flat’: special values of a give better measurements. Transform a → a′ such that I(a′) is constant, then choose a prior uniform in a′. • location parameter: uniform prior OK • scale parameter: a′ is ln a, prior 1/a • Poisson mean: prior 1/√a
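
A quick numerical check of the Poisson case (the rate values and sample size are arbitrary): the Fisher information of a Poisson mean a is 1/a, so the Jeffreys prior √I(a) is proportional to 1/√a, and in the transformed variable a′ = 2√a the information is constant.

```python
import numpy as np

rng = np.random.default_rng(6)

def fisher_info_poisson(a, n=1_000_000):
    """Monte Carlo estimate of I(a) = <(d ln L / da)^2> for x ~ Poisson(a)."""
    x = rng.poisson(a, size=n)
    score = x / a - 1.0        # d ln L / da for ln L = -a + x ln a - ln x!
    return np.mean(score ** 2)

for a in [0.5, 2.0, 8.0]:
    I = fisher_info_poisson(a)
    # Jeffreys prior ~ sqrt(I(a)) ~ 1/sqrt(a); in a' = 2*sqrt(a) the information
    # is I(a') = I(a) * (da/da')^2 = I(a) * a, which should come out ~ constant.
    print(f"a = {a:4.1f}   I(a) = {I:.3f}   1/a = {1 / a:.3f}   I(a') = {I * a:.3f}")
```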

  20. Objective Prior? Jeffreys called this an ‘objective’ prior, as opposed to ‘subjective’ priors or straight guesswork, but not everyone was convinced. For statisticians a ‘flat prior’ means the Jeffreys prior; for physicists it means a uniform prior. The prior depends on the likelihood: your ‘prior belief’ P(MH) (or whatever) depends on the analysis. Equivalent to a prior proportional to √I.

  21. Reference Priors (Demortier) 4 steps. 1) Intrinsic Discrepancy between two PDFs: δ{P1(z), P2(z)} = Min{ ∫P1(z) ln(P1(z)/P2(z)) dz , ∫P2(z) ln(P2(z)/P1(z)) dz }. A sensible measure of difference: δ = 0 iff P1(z) and P2(z) are the same, otherwise positive. Invariant under all transformations of z.
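
A small numerical sketch of the intrinsic discrepancy for two arbitrarily chosen densities (two Gaussians): compute both directed Kullback-Leibler integrals and take the smaller one.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

p1 = norm(loc=0.0, scale=1.0).pdf    # P1(z): arbitrary example density
p2 = norm(loc=1.0, scale=2.0).pdf    # P2(z): arbitrary example density

def kl(p, q):
    """Integral of p(z) ln(p(z)/q(z)) dz (finite range keeps the tails numerically safe)."""
    val, _ = quad(lambda z: p(z) * np.log(p(z) / q(z)), -20.0, 20.0)
    return val

delta = min(kl(p1, p2), kl(p2, p1))
print(f"KL(P1||P2) = {kl(p1, p2):.4f}, KL(P2||P1) = {kl(p2, p1):.4f}, delta = {delta:.4f}")
```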

  22. Reference Priors (2) 2) Expected Intrinsic Information. Measurement M: x is sampled from p(x|a); the parameter a has a prior p(a). Joint distribution p(x,a) = p(x|a) p(a); marginal distribution p(x) = ∫p(x|a) p(a) da. I(p(a), M) = δ{p(x,a), p(x) p(a)} is the Expected Intrinsic (Shannon) Information from measurement M about the parameter a. It depends on (i) the relationship between x and a and (ii) the breadth of p(a).
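
A discretised toy sketch of I(p(a), M) = δ{p(x,a), p(x)p(a)} (binomial measurement, grid prior; all choices are arbitrary): build the joint and the product of marginals on a grid, then take the smaller of the two directed KL sums.

```python
import numpy as np
from scipy.stats import binom

# Toy measurement M: x ~ Binomial(n, a); the parameter a is discretised on a grid.
n = 10
a_grid = np.linspace(0.01, 0.99, 99)
prior = np.ones_like(a_grid)
prior /= prior.sum()                      # discretised uniform prior p(a)

x_vals = np.arange(n + 1)
p_x_given_a = binom.pmf(x_vals[:, None], n, a_grid[None, :])   # p(x|a), shape (n+1, n_a)

joint = p_x_given_a * prior[None, :]      # p(x, a)
p_x = joint.sum(axis=1, keepdims=True)    # marginal p(x)
product = p_x * prior[None, :]            # p(x) p(a)

def kl(p, q):
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

info = min(kl(joint, product), kl(product, joint))
print(f"Expected intrinsic information I(p(a), M) = {info:.3f} nats")
```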

  23. Reference Priors (3) 3) Missing Information. Measurement Mk: k samples of x. Enough measurements fix a completely, so the limit as k → ∞ of I(p(a), Mk) is the difference between the knowledge encapsulated in the prior p(a) and complete knowledge of a. Hence it is the Missing Information given p(a).

  24. Reference Priors (4) 4) Family of priors P (e.g. Fourier series, polynomials, histograms), with p(a) ∈ P. Ignorance principle: choose the least informative (dumbest) prior in the family: the one for which the missing information, the limit as k → ∞ of I(p(a), Mk), is largest. There are technical difficulties in taking the k → ∞ limit and in integrating over an infinite range of a.
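
A crude sketch of the ignorance principle on the same discretised binomial toy (the candidate family, the finite k standing in for the k → ∞ limit, and the grids are arbitrary simplifications): for each candidate prior, compute the expected information from k observations and pick the prior that maximises it.

```python
import numpy as np
from scipy.stats import binom, beta

a_grid = np.linspace(0.005, 0.995, 199)
k = 100                                    # finite stand-in for the k -> infinity limit
x_vals = np.arange(k + 1)
p_x_given_a = binom.pmf(x_vals[:, None], k, a_grid[None, :])

def expected_information(weights):
    """delta{p(x,a), p(x)p(a)} for a discretised prior: min of the two directed KL sums."""
    prior = weights / weights.sum()
    joint = p_x_given_a * prior[None, :]
    product = joint.sum(axis=1, keepdims=True) * prior[None, :]
    def kl(p, q):
        mask = p > 0
        return np.sum(p[mask] * np.log(p[mask] / q[mask]))
    return min(kl(joint, product), kl(product, joint))

# A small, purely illustrative family of candidate priors on the grid
family = {
    "uniform":        np.ones_like(a_grid),
    "Beta(0.5, 0.5)": beta.pdf(a_grid, 0.5, 0.5),   # the Jeffreys shape for a binomial
    "Beta(2, 2)":     beta.pdf(a_grid, 2.0, 2.0),
    "Beta(5, 5)":     beta.pdf(a_grid, 5.0, 5.0),
}
scores = {name: expected_information(w) for name, w in family.items()}
for name, s in scores.items():
    print(f"{name:14s} expected information = {s:.3f} nats")
print("Reference choice within this family:", max(scores, key=scores.get))
```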

  25. Family of Priors (Google)

  26. Reference Priors Do not represent subjective belief; in fact the opposite (like a jury selection): they allow the most input to come from the data. A formal consensus that practitioners can use to arrive at a sensible posterior. They depend on the measurement p(x|a), cf. Jeffreys, and also require a family P of possible priors. They may be improper, but this doesn’t matter (they do not represent belief). For 1 parameter (if the measurement is asymptotically Gaussian, which the CLT usually secures) they give the Jeffreys prior. But they can also (unlike Jeffreys) work for several parameters.

  27. Summary • Probability • Frequentist • Bayesian • Bayes Theorem • Priors • Prior pitfalls (1): Le Diberder • Prior pitfalls (2): Heinrich • Jeffreys’ Prior • Fisher Information • Reference Priors: Demortier
