
MSc Methods XX: YY



  1. MSc Methods XX: YY Dr. Mathias (Mat) Disney UCL Geography Office: 113, Pearson Building Tel: 7670 0592 Email: mdisney@ucl.geog.ac.uk www.geog.ucl.ac.uk/~mdisney

  2. Lecture outline • Two parameter estimation • Some stuff • Uncertainty & linear approximations – parameter estimation, uncertainty – Practical: basic Bayesian estimation • Linear models – parameter estimation, uncertainty – Practical: basic Bayesian estimation

  3. Parameter estimation continued • Example: signal in the presence of background noise • Very common problem: e.g. peak of a lidar return from a forest canopy? Presence of a star against a background? Transiting planet? [Figure: Gaussian peak of amplitude A above a flat background B, plotted against position x] • See p 35-60 in Sivia & Skilling

  4. Gaussian peak + background • Data are e.g. photon counts in a particular channel, so we expect the count in the kth channel, $N_k$, to scatter about an ideal value set by A, B, the signal and background amplitudes • Assume the peak is Gaussian (for now), of width w, centred on $x_0$, so the ideal datum $D_k$ is given by $D_k = n_0 \left[ A e^{-(x_k - x_0)^2 / 2w^2} + B \right]$ • where $n_0$ is a constant (set by the integration time). Unlike $N_k$, $D_k$ is not a whole number, so the actual datum is some integer close to $D_k$ • The Poisson distribution is the pdf which represents this property i.e. $prob(N_k | D_k) = \frac{D_k^{N_k} e^{-D_k}}{N_k!}$

  5. Aside: Poisson distribution • Poisson: prob. of N events occurring over some fixed time interval, if events occur at a known rate, independently of the time since the previous event • If the expected number over a given interval is D, the prob. of exactly N events is $prob(N | D) = \frac{D^N e^{-D}}{N!}$ • Used in discrete counting experiments, particularly cases with a large number of possible outcomes, each of which is rare (the law of rare events) e.g. • Nuclear decay • No. of calls arriving at a call centre per minute – a large number arriving, BUT rare from the POV of the general population…. See the practical page for poisson_plot.py, and the sketch below
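A minimal sketch of what a script like poisson_plot.py might do (the actual practical script is not reproduced here, so the D values and plot styling are illustrative): plot the Poisson pmf $prob(N|D) = D^N e^{-D}/N!$ for a few expected counts D.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

N = np.arange(0, 30)                  # discrete outcomes 0..29
for D in (1, 5, 12):                  # illustrative expected counts per interval
    plt.plot(N, poisson.pmf(N, D), 'o-', label=f'D = {D}')
plt.xlabel('N (events per interval)')
plt.ylabel('prob(N | D)')
plt.legend()
plt.show()
```

Note how the pmf is skewed for small D but approaches a Gaussian shape as D grows, which is why the quadratic approximation later works well for large counts.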

  6. Gaussian peak + background • So the likelihood for datum $N_k$ is $prob(N_k | A, B, I) = \frac{D_k^{N_k} e^{-D_k}}{N_k!}$ • where I includes the relation between the expected counts $D_k$ and A, B, i.e. for our Gaussian model $x_0$, w, $n_0$ are given (as is $x_k$) • IF the data are independent, then the likelihood over all M data is just the product of the probs of the individual measurements i.e. $prob(\{N_k\} | A, B, I) = \prod_{k=1}^{M} prob(N_k | A, B, I)$ • As usual, we want the posterior pdf of A, B given $\{N_k\}$, I

  7. Gaussian peak + background • Prior? Neither A nor B can be −ve, so the most naïve prior pdf is $prob(A, B | I) = \text{const}$ for $A \geq 0, B \geq 0$ (and 0 otherwise) • To calculate the constant we need $A_{max}$, $B_{max}$, but we may assume they are large enough not to cut off the posterior pdf, i.e. it is effectively 0 by then • So the log of the posterior is $L = \ln\left[prob(A, B | \{N_k\}, I)\right] = \text{const} + \sum_{k=1}^{M} \left[ N_k \ln(D_k) - D_k \right]$ • And, as before, we want the A, B that maximise L • Reliability is given by the width of the posterior about that point

  8. Gaussian peak + background • ‘Generate’ experimental data (see practical, and the sketch below) • $n_0$ chosen to give a maximum expectation $D_k$ = 100. Why are some $N_k$ > 100?
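A minimal sketch of generating such ‘experimental’ data (the practical's own script may differ; the amplitudes, peak centre, width, bin range and seed below are illustrative assumptions): Poisson counts drawn about the Gaussian-peak-plus-background expectation.

```python
import numpy as np

rng = np.random.default_rng(1)        # fixed seed for reproducibility

A_true, B_true = 1.0, 2.0             # assumed signal and background amplitudes
x0, w = 0.0, 2.0                      # assumed peak centre and width
xk = np.linspace(-10.0, 10.0, 15)     # 15 measurement bins

n0 = 100.0 / (A_true + B_true)        # scale so the maximum expectation D_k = 100
Dk = n0 * (A_true * np.exp(-(xk - x0)**2 / (2.0 * w**2)) + B_true)

Nk = rng.poisson(Dk)                  # integer counts scattered about D_k
print(Nk)                             # some exceed 100: Poisson scatter about the peak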

  9. Gaussian peak + background • The posterior PDF is now 2D (see the grid sketch below) • Maximum of L at A = 1.11, B = 1.89 (actual values 1.09, 1.93)
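A minimal sketch (not the course code) of evaluating the 2D log posterior $L(A, B) = \sum_k [N_k \ln D_k - D_k] + \text{const}$ on a grid and locating its maximum; it reuses xk, Nk, x0, w and n0 from the data-generation sketch, and the grid ranges are assumptions.

```python
import numpy as np

Agrid = np.linspace(0.01, 3.0, 200)
Bgrid = np.linspace(0.01, 3.0, 200)
AA, BB = np.meshgrid(Agrid, Bgrid, indexing='ij')

shape_k = np.exp(-(xk - x0)**2 / (2.0 * w**2))       # Gaussian shape per bin
DD = n0 * (AA[..., None] * shape_k + BB[..., None])  # D_k for every (A, B) pair

logL = np.sum(Nk * np.log(DD) - DD, axis=-1)         # log posterior (flat prior)
i, j = np.unravel_index(np.argmax(logL), logL.shape)
print(f"maximum at A = {Agrid[i]:.2f}, B = {Bgrid[j]:.2f}")
```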

  10. Gaussian peak + background • Changing the experimental setup? • E.g. reducing the counts per bin (lower SNR), e.g. because of a shorter integration time, lower signal threshold etc. • Same signal, but the data look much noisier – a broader posterior PDF, truncated at 0, where the prior becomes important

  11. Gaussian peak + background • Changing the experimental setup? • Increasing the number of bins (same count rate, but spread out over twice the measurement range) • Much narrower posterior PDF, BUT the reduction is mostly in B

  12. Gaussian peak + background • More data, so why is the uncertainty in A, B not reduced equally? • Data far from the origin only tell you about the background • Conversely, if we restrict the range of x over which data are collected (fewer bins), it is hard to distinguish A from B (signal from background) • The posterior becomes skewed, with high correlation between A and B

  13. Marginal distribution • If we are only interested in A, then according to the marginalisation rule we integrate the joint posterior PDF wrt B i.e. $prob(A | \{N_k\}, I) = \int_0^{\infty} prob(A, B | \{N_k\}, I)\, dB$ • So, with our flat prior, this is proportional to the integral of the likelihood over B • See the previous experimental cases (and the sketch below)… [Figures 1 & 2: marginal posteriors for 15 bins with ~100 counts maximum, and 15 bins with ~10 counts maximum]
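A minimal sketch of marginalising the gridded posterior over B numerically, continuing from the grid sketch above (reusing logL, Agrid and Bgrid); the simple rectangle-rule integration is an assumption, not the practical's method.

```python
import numpy as np

post = np.exp(logL - logL.max())        # unnormalised posterior, overflow-safe
dA = Agrid[1] - Agrid[0]
dB = Bgrid[1] - Bgrid[0]

marginal_A = post.sum(axis=1) * dB      # integrate over B (axis 1)
marginal_A /= marginal_A.sum() * dA     # normalise so it integrates to 1
print(f"marginal best estimate: A = {Agrid[np.argmax(marginal_A)]:.2f}")
```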

  14. Marginal distribution • Marginal vs. conditional • Marginal pdf: takes into account our prior ignorance of B • Conditional pdf: assumes we know B, e.g. via calibration, i.e. $prob(A | B, \{N_k\}, I)$ • Least difference between the two when measurements extend far from the peak (case 3) • Most when the data are close to the peak (case 4) – see the comparison sketch below [Figures 3 & 4: 31 bins with ~100 counts maximum; 7 bins with ~100 counts maximum]
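A minimal sketch comparing the conditional pdf (a slice of the gridded posterior at the assumed true B) with the marginal from the previous sketch; it reuses post, Agrid, Bgrid, dA, marginal_A and B_true from the earlier sketches.

```python
import numpy as np

jB = np.argmin(np.abs(Bgrid - B_true))      # grid column closest to the true B
conditional_A = post[:, jB].copy()          # prob(A | B = B_true, {N_k}, I)
conditional_A /= conditional_A.sum() * dA   # normalise for comparison

def mean_sd(pdf):
    # Mean and SD of a pdf tabulated on Agrid.
    m = np.sum(Agrid * pdf) * dA
    return m, np.sqrt(np.sum((Agrid - m)**2 * pdf) * dA)

print("marginal    (mean, SD):", mean_sd(marginal_A))
print("conditional (mean, SD):", mean_sd(conditional_A))
```

The conditional pdf is never wider than the marginal; the two agree best when the data pin down B well.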

  15. Uncertainty • The posterior L shows the reliability of the parameters & we want the optimal values • For parameters $\{X_j\}$, with posterior $P = prob(\{X_j\} | \{\text{data}\}, I)$ • The optimal $\{X_{0j}\}$ is the solution of the set of simultaneous eqns $\left.\frac{\partial P}{\partial X_i}\right|_{\{X_{0j}\}} = 0$ • for i = 1, 2, …, $N_{params}$ • So for the log of P, $L = \ln(P)$, equivalently $\left.\frac{\partial L}{\partial X_i}\right|_{\{X_{0j}\}} = 0$, and for 2 parameters (X, Y) we want $\left.\frac{\partial L}{\partial X}\right|_{X_0, Y_0} = 0$ and $\left.\frac{\partial L}{\partial Y}\right|_{X_0, Y_0} = 0$ (see the sketch below) • Sivia & Skilling (2006) Chapter 3, p 35-51
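A minimal sketch of solving $\partial L/\partial A = \partial L/\partial B = 0$ numerically by minimising $-L$ for our two-parameter example (an assumed approach, not the course code); it reuses xk, Nk, x0, w and n0 from the data-generation sketch, and the starting guess is arbitrary.

```python
import numpy as np
from scipy.optimize import minimize

shape_k = np.exp(-(xk - x0)**2 / (2.0 * w**2))

def neg_log_post(params):
    A, B = params
    Dk = n0 * (A * shape_k + B)
    return -np.sum(Nk * np.log(Dk) - Dk)   # minus the log posterior (flat prior)

res = minimize(neg_log_post, [1.0, 1.0],                 # rough starting guess
               bounds=[(1e-6, None), (1e-6, None)])      # enforce A, B >= 0 prior
print("optimal (A, B):", res.x)
```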

  16. Uncertainty • To estimate the reliability of the best estimate we want the spread of P about $(X_0, Y_0)$ • Do this using a Taylor expansion of L about the maximum i.e. $L = L(X_0, Y_0) + \frac{\partial L}{\partial X}(X - X_0) + \frac{\partial L}{\partial Y}(Y - Y_0) + \frac{1}{2}\left[\frac{\partial^2 L}{\partial X^2}(X - X_0)^2 + \frac{\partial^2 L}{\partial Y^2}(Y - Y_0)^2 + 2\frac{\partial^2 L}{\partial X \partial Y}(X - X_0)(Y - Y_0)\right] + \dots$ • So we keep the first three terms (to quadratic order) • We can ignore the $(X - X_0)$ and $(Y - Y_0)$ first-derivative terms, as the expansion is about the maximum, where they vanish • Sivia & Skilling (2006) Chapter 3, p 35-51; http://en.wikipedia.org/wiki/Taylor_series

  17. Uncertainty • So we are mainly concerned with the quadratic terms. Rephrase via matrices • For the quadratic term, Q: $Q = \begin{pmatrix} X - X_0 & Y - Y_0 \end{pmatrix} \begin{pmatrix} A & C \\ C & B \end{pmatrix} \begin{pmatrix} X - X_0 \\ Y - Y_0 \end{pmatrix}$ • where $A = \left.\frac{\partial^2 L}{\partial X^2}\right|_{X_0, Y_0}$, $B = \left.\frac{\partial^2 L}{\partial Y^2}\right|_{X_0, Y_0}$, $C = \left.\frac{\partial^2 L}{\partial X \partial Y}\right|_{X_0, Y_0}$ • A contour of Q in the X–Y plane is a line of constant L • Its orientation and eccentricity are determined by A, B, C • The principal directions $e_1$ and $e_2$ are the eigenvectors of the 2nd-derivative matrix above [Figure: ellipse Q = k centred on $(X_0, Y_0)$, with principal axes $e_1$, $e_2$] • Sivia & Skilling (2006) Chapter 3, p 35-51

  18. Uncertainty • So the (x, y) components of $e_1$ and $e_2$ are given by the solutions of $\begin{pmatrix} A & C \\ C & B \end{pmatrix} e = \lambda e$ (see the eigen-decomposition sketch below) • where the eigenvalues $\lambda_1$ and $\lambda_2$ scale as $1/k^2$ (& $k_{1,2}$ are the widths of the ellipse along the principal directions) • If $(X_0, Y_0)$ is a maximum then $\lambda_1$ and $\lambda_2 < 0$ • So A < 0, B < 0 and $AB > C^2$ • So if C ≠ 0 then the ellipse is not aligned with the axes, and how do we estimate error bars on $X_0$, $Y_0$? • We can get rid of parameters we don't want (Y for e.g.) by integrating i.e. $prob(X | \{\text{data}\}, I) = \int prob(X, Y | \{\text{data}\}, I)\, dY$ • Sivia & Skilling (2006) Chapter 3, p 35-51
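A minimal sketch of the eigen-decomposition of the second-derivative matrix; the numerical values of A, B, C below are illustrative assumptions (chosen so that A, B < 0 and $AB > C^2$, i.e. a genuine maximum).

```python
import numpy as np

A, B, C = -4.0, -2.0, 1.5             # assumed 2nd derivatives at the maximum
H = np.array([[A, C],
              [C, B]])                # note A*B > C**2, so this is a maximum

lam, evecs = np.linalg.eigh(H)        # eigenvalues; eigenvectors as columns
print("eigenvalues:", lam)            # both negative at a true maximum
print("axis widths ~ 1/sqrt(-lam):", 1.0 / np.sqrt(-lam))
print("principal directions e1, e2 (columns):\n", evecs)
```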

  19. Uncertainty • And then use Taylor again & complete the square in the exponent • So (see S&S p 46 & Appendix) $prob(X | \{\text{data}\}, I) \propto \exp\left[\frac{AB - C^2}{2B}(X - X_0)^2\right]$ • And so the marginal distn. for X is just a Gaussian with best estimate (mean) $X_0$ and uncertainty (SD) $\sigma_X = \sqrt{\frac{-B}{AB - C^2}}$ • So all is fine and we can calculate the uncertainty (see the sketch below)……right? • Sivia & Skilling (2006) Chapter 3, p 35-51
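A minimal sketch of applying this in practice: estimate A, B, C by finite differences of the gridded log posterior at its maximum (here X = A and Y = B in the slide's notation), then use $\sigma_X = \sqrt{-B/(AB - C^2)}$. It reuses logL, Agrid, Bgrid, i and j from the grid sketch, and assumes the maximum lies in the grid interior.

```python
import numpy as np

dA = Agrid[1] - Agrid[0]
dB = Bgrid[1] - Bgrid[0]

# Central finite differences for the 2nd derivatives at the maximum (i, j):
d2A = (logL[i+1, j] - 2*logL[i, j] + logL[i-1, j]) / dA**2
d2B = (logL[i, j+1] - 2*logL[i, j] + logL[i, j-1]) / dB**2
d2AB = (logL[i+1, j+1] - logL[i+1, j-1]
        - logL[i-1, j+1] + logL[i-1, j-1]) / (4.0 * dA * dB)

det = d2A * d2B - d2AB**2                    # the AB - C^2 term
print("sigma_A =", np.sqrt(-d2B / det))      # sqrt(-B / (AB - C^2))
print("sigma_B =", np.sqrt(-d2A / det))      # sqrt(-A / (AB - C^2))
```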

  20. Uncertainty • Note that $AB - C^2$ is the determinant of $\begin{pmatrix} A & C \\ C & B \end{pmatrix}$ and equals $\lambda_1 \times \lambda_2$ • So if $\lambda_1$ or $\lambda_2 \to 0$ then $AB - C^2 \to 0$ and $\sigma_X$ and $\sigma_Y \to \infty$ • Oh dear…… • So consider the variance of the posterior $\langle (X - \mu)^2 \rangle = \int (X - \mu)^2\, prob(X | \{\text{data}\}, I)\, dX$ • where μ is the mean • For a 1D normal distribution this gives $\sigma^2$ • For the 2D case (X, Y) here, $\langle (X - X_0)^2 \rangle = \frac{-B}{AB - C^2} = \sigma_X^2$ • which we have from before. Same for Y, so….. $\sigma_Y^2 = \frac{-A}{AB - C^2}$ • Sivia & Skilling (2006) Chapter 3, p 35-51

  21. Uncertainty • Consider the covariance $\sigma_{XY}^2$: $\sigma_{XY}^2 = \langle (X - X_0)(Y - Y_0) \rangle = \iint (X - X_0)(Y - Y_0)\, prob(X, Y | \{\text{data}\}, I)\, dX\, dY$ • which describes the correlation between X and Y; if the estimate of X has little/no effect on the estimate of Y then $\sigma_{XY}^2 \approx 0$ • And, using Taylor as before, $\sigma_{XY}^2 = \frac{C}{AB - C^2}$ • So in matrix notation $\begin{pmatrix} \sigma_X^2 & \sigma_{XY}^2 \\ \sigma_{XY}^2 & \sigma_Y^2 \end{pmatrix} = \frac{1}{AB - C^2}\begin{pmatrix} -B & C \\ C & -A \end{pmatrix}$ • where we remember that A, B, C are the 2nd derivatives of L at the maximum • Sivia & Skilling (2006) Chapter 3, p 35-51

  22. Uncertainty • The covariance (or variance-covariance) matrix describes the covariance of the error terms • When C = 0, $\sigma_{XY}^2 = 0$: no correlation, and $e_1$ and $e_2$ are aligned with the axes • If C increases (relative to A, B), the posterior pdf becomes more skewed and elliptical, rotated at an angle $\pm \tan^{-1}(\sqrt{A/B})$ [Figure: three cases – large +ve correlation; large −ve correlation; C = 0, X and Y uncorrelated] • After Sivia & Skilling (2006) fig 3.7 p. 48

  23. Uncertainty • As the correlation grows, if $C \to (AB)^{1/2}$ then the contours become infinitely wide in one direction (except for the prior bounds) • In this case $\sigma_X$ and $\sigma_Y$ are v. large (i.e. very unreliable parameter estimates) • BUT large off-diagonals in the covariance matrix mean we can still estimate linear combinations of the parameters • For −ve covariance, the posterior is wide in the direction Y = −mX, where $m = (A/B)^{1/2}$, but narrow perpendicular to this, along Y + mX = const • i.e. a lot of information about Y + mX but little about Y − X/m • For +ve correlation, most info. is on Y − mX but not Y + X/m • After Sivia & Skilling (2006) fig 3.7 p. 48

  24. Uncertainty • We have seen the 2 parameter case, so what about generalising the Taylor quadratic approximation to M parameters? • Remember, we want $\{X_{0j}\}$ to maximise L, the (log) posterior pdf • Rephrase in matrix form: for i = 1, 2, …, M we want $\left.\frac{\partial L}{\partial X_i}\right|_{X_0} = 0$, i.e. $\left.\nabla L\right|_{X_0} = 0$ • The extension of the Taylor expansion to M variables is $L = L(X_0) + \frac{1}{2}(X - X_0)^T\, \nabla\nabla L(X_0)\, (X - X_0) + \dots$ • So if X is an M x 1 column vector then, ignoring higher terms, the exponential of the posterior pdf is $prob(X | \{\text{data}\}, I) \propto \exp\left[\frac{1}{2}(X - X_0)^T\, \nabla\nabla L(X_0)\, (X - X_0)\right]$ • Sivia & Skilling (2006) Chapter 3, p 35-51

  25. Uncertainty • where $\nabla\nabla L(X_0)$ is a symmetric M x M matrix of 2nd derivatives, $[\nabla\nabla L]_{ij} = \left.\frac{\partial^2 L}{\partial X_i \partial X_j}\right|_{X_0}$ • And $(X - X_0)^T$ is the transpose of $(X - X_0)$ (a row vector) • So this is the generalisation of Q from the 2D case • And the contour map from before is just a 2D slice through our now M-dimensional parameter space • The constant of proportionality is the multivariate Gaussian normalisation, $\sqrt{\det(-\nabla\nabla L)} / (2\pi)^{M/2}$ • Sivia & Skilling (2006) Chapter 3, p 35-51

  26. Uncertainty • So what are the implications of all of this?? • The maximum of the M parameter posterior PDF is $X_0$, & we know $\nabla\nabla L(X_0)$ • Compare to the 2D case & see that $\nabla\nabla L$ is analogous to $-1/\sigma^2$ • We can show that the generalised case for the covariance matrix $\sigma^2$ is $\sigma^2 = \left[-\nabla\nabla L(X_0)\right]^{-1}$ • The square roots of the diagonals (i = j) give the marginal error bars, and the off-diagonals (i ≠ j) describe the correlations between parameters (see the sketch below) • So the covariance matrix contains most of the information describing the model fit AND the faith we have in the parameter estimates • Sivia & Skilling (2006) Chapter 3, p 35-51
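A minimal sketch of $\sigma^2 = [-\nabla\nabla L]^{-1}$ for our two-parameter example, reusing the finite-difference second derivatives d2A, d2B and d2AB from the earlier sketch (an assumed numerical shortcut, not the course code).

```python
import numpy as np

hess = np.array([[d2A, d2AB],
                 [d2AB, d2B]])        # grad-grad L at the maximum
cov = np.linalg.inv(-hess)            # covariance matrix of the Gaussian approx.

print("sigma_A =", np.sqrt(cov[0, 0]))   # marginal error bars: sqrt of diagonals
print("sigma_B =", np.sqrt(cov[1, 1]))
print("corr(A, B) =", cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1]))
```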

  27. Uncertainty • Sivia & Skilling make the important point (p50) that the inverse of the diagonal elements of the matrix ≠ the diagonal of the inverse of the matrix (see the numerical check below) • i.e. do NOT try to estimate the value / spread of one parameter in the M-dim case by holding all the others fixed at their optimal values – this gives an incorrect ‘best fit’ and underestimates $\sigma_{ii}$ • We need to include marginalisation to get the correct magnitude for the uncertainty [Figure: slice through the posterior at fixed $X_j = X_{0j}$ vs. the marginal over $X_j$, after Sivia & Skilling (2006) p50] • See S&S p51 for discussion of multimodal and asymmetric posterior PDFs, for which the Gaussian is not a good approx
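A small numeric check of this warning, using an illustrative (assumed) Hessian: the variances obtained by holding the other parameter fixed are smaller than the correctly marginalised ones.

```python
import numpy as np

hess = np.array([[-4.0, 1.5],
                 [1.5, -2.0]])            # assumed grad-grad L at the maximum

fixed = 1.0 / np.diag(-hess)              # other parameter held at its optimum
marginal = np.diag(np.linalg.inv(-hess))  # correct marginalised variances
print("held-fixed variances:  ", fixed)     # [0.25, 0.5]
print("marginalised variances:", marginal)  # [~0.348, ~0.696] -- larger
```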

  28. Summary • We have seen that we can express the condition for the best estimate of a set of M parameters $\{X_j\}$ very compactly as $\left.\nabla L\right|_{X_0} = 0$ • where the jth element of $\nabla L$ is $\partial L / \partial X_j$ (L the log posterior pdf) evaluated at $X = X_0$ • So this is a set of simultaneous equations which, IF they are linear in the parameters, i.e. of the form $\nabla L = H X + g$ (H, g constant) • then we can use linear algebra methods to solve, i.e. $X_0 = -H^{-1} g$ (see the sketch below) • This is the power (joy?) of linearity! We will see more on this later • Even if the system is not linear, we can often approximate it as linear over some limited domain to allow linear methods to be used • If not, then we have to use other (non-linear) methods…..
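A minimal sketch of the linear case: if $\nabla L = H X + g$ with constant H and g (the notation and the numerical values below are assumptions for illustration), then the optimum comes from a single direct linear solve of $H X_0 = -g$, with no iteration.

```python
import numpy as np

H = np.array([[-4.0, 1.5],
              [1.5, -2.0]])       # assumed constant (negative-definite) Hessian
g = np.array([2.0, -1.0])         # assumed constant gradient offset

X0 = np.linalg.solve(H, -g)       # one direct solve replaces iterative search
print("best estimate X0:", X0)
```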

  29. Summary • Two parameter e.g.: Gaussian peak + background • Solve via Bayes’ theorem, using a Taylor expansion (to quadratic order) of the log posterior • Issues over the experimental setup • Integration time, number of bins, size etc., and their impact on the posterior PDF • Can use linear methods to derive uncertainty estimates and explore the correlation between parameters • Extend to the multi-dimensional case using the same method • Be careful when dealing with uncertainty • KEY: it is not always useful to look for summary statistics – if in doubt, look at the posterior PDF – this gives the full description
