
# MSc Methods XX: YY


Dr. Mathias (Mat) Disney

UCL Geography

Office: 113, Pearson Building

Tel: 7670 0592

Email: [email protected]

www.geog.ucl.ac.uk/~mdisney

Lecture outline

• Two parameter estimation

  • Some stuff

• Uncertainty & linear approximations

  • parameter estimation, uncertainty

  • Practical – basic Bayesian estimation

• Linear Models

  • parameter estimation, uncertainty

  • Practical – basic Bayesian estimation

• Example: signal in the presence of background noise

• Very common problem: e.g. peak of lidar return from forest canopy? Presence of a star against a background? Transiting planet?

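[Figure: signal peak of amplitude A on a flat background of level B, plotted against x, with the peak at x = 0]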

See p 35-60 in Sivia & Skilling

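• Data are e.g. photon counts in a particular channel, so we expect the count in the kth channel, $N_k$, to be proportional to the sum of signal and background, with amplitudes A and B

• Assume peak is Gaussian (for now), width w, centred on $x_o$, so the ideal datum $D_k$ is then given by $D_k = n_0\left[A\,e^{-(x_k - x_o)^2/2w^2} + B\right]$

• Where $n_0$ is a constant (integration time). Unlike $N_k$, $D_k$ is not a whole number, so the actual datum is some integer close to $D_k$

• The Poisson distribution is the pdf which represents this property i.e. $\mathrm{prob}(N \mid D) = \dfrac{D^N e^{-D}}{N!}$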

• Poisson: prob. of N events occurring over some fixed time interval if events occur at a known rate independently of time of previous event

• If expected number over a given interval is D, prob. of exactly N events is $\mathrm{prob}(N \mid D) = D^N e^{-D} / N!$

• Used in discrete counting experiments, particularly cases where large number of outcomes, each of which is rare (law of rare events) e.g.

• Nuclear decay

• No. of calls arriving at a call centre per minute – large number arriving BUT rare from POV of general population….

See practical page for poisson_plot.py
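The Poisson probabilities described above can be tabulated with a few lines of Python. This is a minimal sketch and is not the course's `poisson_plot.py`; the function name `poisson_pmf` and the expected counts used are illustrative assumptions.

```python
import math

def poisson_pmf(n, d):
    """Probability of exactly n events when the expected number is d."""
    # Work in logs so that large n and d do not overflow the factorial.
    return math.exp(n * math.log(d) - d - math.lgamma(n + 1))

# Tabulate the distribution for a few (hypothetical) expected counts.
for d in (0.5, 4.5, 12.5):
    probs = [poisson_pmf(n, d) for n in range(80)]
    mode = max(range(80), key=lambda n: probs[n])
    print(f"D = {d:4.1f}  most probable N = {mode:2d}  sum of probs = {sum(probs):.6f}")
```

The probabilities sum to one over all N, and the distribution becomes broader and more symmetric as D grows.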

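• So likelihood for datum $N_k$ is $\mathrm{prob}(N_k \mid A, B, I) = \dfrac{D_k^{N_k} e^{-D_k}}{N_k!}$

• Where I includes the relation between expected counts $D_k$ and A, B i.e. for our Gaussian model, $x_o$, w, $n_0$ are given (as is $x_k$).

• IF data are independent, then likelihood over all M data is just the product of the probs of individual measurements i.e. $\mathrm{prob}(\{N_k\} \mid A, B, I) = \prod_{k=1}^{M} \mathrm{prob}(N_k \mid A, B, I)$

• As usual, we want posterior pdf of A, B given $\{N_k\}$, I, i.e. $\mathrm{prob}(A, B \mid \{N_k\}, I) \propto \mathrm{prob}(\{N_k\} \mid A, B, I)\,\mathrm{prob}(A, B \mid I)$

• Prior? Neither A nor B can be –ve so most naïve prior pdf is $\mathrm{prob}(A, B \mid I) = \text{constant}$ for $A \ge 0$ and $B \ge 0$, and 0 otherwise

• To calculate the constant we need Amax, Bmax, but we may assume they are large enough not to cut off the posterior pdf i.e. it is effectively 0 by then

• So, log of posterior is $L = \ln\left[\mathrm{prob}(A, B \mid \{N_k\}, I)\right] = \text{constant} + \sum_{k=1}^{M}\left[N_k \ln D_k - D_k\right]$

• And, as before, we want A, B to maximise L

• Reliability is width of posterior about that point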

• ‘Generate’ experimental data (see practical)

• $n_0$ chosen to give max expectation $D_k$ = 100. Why are some $N_k$ > 100?
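A minimal sketch of how such data could be generated is given here; it is not the practical's own code. The bin positions, width w = 2, amplitudes A = 1, B = 2 and the helper names are all assumptions made for illustration. Poisson counts are drawn with Knuth's multiplication method so that only the standard library is needed.

```python
import math
import random

def expected_counts(xs, A, B, n0, x0=0.0, w=2.0):
    """Ideal datum D_k = n0 * [A * exp(-(x_k - x0)^2 / (2 w^2)) + B]."""
    return [n0 * (A * math.exp(-((x - x0) ** 2) / (2.0 * w * w)) + B) for x in xs]

def poisson_sample(d, rng):
    """Draw one Poisson count with mean d (Knuth's multiplication method)."""
    limit = math.exp(-d)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

# Hypothetical set-up: 15 bins, A = 1, B = 2, peak at the origin.
xs = [float(x) for x in range(-7, 8)]
A_true, B_true = 1.0, 2.0
n0 = 100.0 / (A_true + B_true)      # makes the largest D_k equal to 100
rng = random.Random(1)              # fixed seed so the run is repeatable
D = expected_counts(xs, A_true, B_true, n0)
N = [poisson_sample(d, rng) for d in D]

for x, d, n in zip(xs, D, N):
    print(f"x = {x:5.1f}   D_k = {d:6.2f}   N_k = {n:3d}")
```

Each $N_k$ scatters about its $D_k$ with a standard deviation of roughly $\sqrt{D_k}$, which is why individual counts can exceed 100 even though the largest expectation is 100.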

• Posterior PDF is now 2D

• Maximum of L at A = 1.11, B = 1.89 (actual values 1.09, 1.93)
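One simple way to locate the maximum of the 2D posterior is to evaluate L on a grid of (A, B) values. The sketch below assumes a flat prior for A, B ≥ 0 and, so that the answer is known in advance, uses noise-free expected counts for A = 1, B = 2 as stand-in data; the grid spacing and all names are illustrative assumptions, not the practical's code.

```python
import math

def expected_counts(xs, A, B, n0, x0=0.0, w=2.0):
    """Ideal datum D_k for a Gaussian peak of amplitude A on background B."""
    return [n0 * (A * math.exp(-((x - x0) ** 2) / (2.0 * w * w)) + B) for x in xs]

def log_posterior(A, B, xs, counts, n0):
    """L = sum_k [N_k ln D_k - D_k] up to a constant; flat prior for A, B >= 0."""
    if A < 0.0 or B < 0.0:
        return -math.inf
    total = 0.0
    for d, n in zip(expected_counts(xs, A, B, n0), counts):
        if d <= 0.0:
            return -math.inf
        total += n * math.log(d) - d
    return total

xs = [float(x) for x in range(-7, 8)]
n0 = 100.0 / 3.0
# Stand-in "data": noise-free expected counts for A = 1, B = 2 (illustrative only).
counts = expected_counts(xs, 1.0, 2.0, n0)

grid = [0.05 * i for i in range(1, 61)]          # 0.05, 0.10, ..., 3.00
L_max, A_best, B_best = max(
    (log_posterior(a, b, xs, counts, n0), a, b) for a in grid for b in grid
)
print(f"maximum of L at A = {A_best:.2f}, B = {B_best:.2f}")
```

With real (noisy) counts the maximum moves away from the true values, as in the slide's A = 1.11, B = 1.89 example.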

• Changing the experimental setup?

• E.g. reducing counts per bin (SNR) e.g. because of shorter integration time, lower signal threshold etc.

Same signal, but data look much noisier – broader PDF

Truncated at 0 – prior important

• Changing the experimental setup?

• Increasing number of bins (same count rate, but spread out over twice measurement range)

Much narrower posterior PDF

BUT reduction mostly in B

• More data, so why is uncertainty in A, B not reduced equally?

• Data far from origin only tell you about background

• Conversely, if we restrict the range of x over which data are collected (fewer bins), it is hard to distinguish A from B (signal from noise)

• Skewed & high correlation between A, B

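• If only interested in A then, according to the marginalisation rule, integrate the joint posterior PDF wrt B i.e. $\mathrm{prob}(A \mid \{N_k\}, I) = \int_0^{\infty} \mathrm{prob}(A, B \mid \{N_k\}, I)\,dB$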

• So

• See previous experimental cases…..

[Figure panels: (1) 15 bins, ~100 counts maximum; (2) 15 bins, ~10 counts maximum]

• Marginal vs conditional pdf

• Marginal pdf: takes into account prior ignorance of B

• Conditional pdf: assumes we know B e.g. via calibration

• Least difference when measurements made far from A (3)

• Most when data close to A (4)
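The difference between the marginal and conditional pdfs for A can be checked numerically on a grid. The following is a sketch under assumed settings (15 bins, w = 2, noise-free stand-in data for A = 1, B = 2, uniform grids); none of the names come from the practical.

```python
import math

def log_posterior(A, B, xs, counts, n0, x0=0.0, w=2.0):
    """L = sum_k [N_k ln D_k - D_k], up to a constant (flat prior)."""
    total = 0.0
    for x, n in zip(xs, counts):
        d = n0 * (A * math.exp(-((x - x0) ** 2) / (2.0 * w * w)) + B)
        total += n * math.log(d) - d
    return total

def mean_and_sd(values, weights):
    """Mean and standard deviation of a pdf tabulated on a uniform grid."""
    norm = sum(weights)
    mean = sum(v * p for v, p in zip(values, weights)) / norm
    var = sum((v - mean) ** 2 * p for v, p in zip(values, weights)) / norm
    return mean, math.sqrt(var)

xs = [float(x) for x in range(-7, 8)]
n0 = 100.0 / 3.0
A_true, B_true = 1.0, 2.0
# Noise-free stand-in data (illustrative only).
counts = [n0 * (A_true * math.exp(-(x * x) / 8.0) + B_true) for x in xs]

A_grid = [0.02 * i for i in range(1, 151)]      # 0.02 ... 3.00
B_grid = [0.02 * j for j in range(50, 151)]     # 1.00 ... 3.00

logp = [[log_posterior(a, b, xs, counts, n0) for b in B_grid] for a in A_grid]
peak = max(max(row) for row in logp)
post = [[math.exp(v - peak) for v in row] for row in logp]

# Marginal pdf for A: sum the joint posterior over B (uniform grid spacing).
marginal = [sum(row) for row in post]
# Conditional pdf for A: slice through the joint posterior at the known B.
j_known = min(range(len(B_grid)), key=lambda j: abs(B_grid[j] - B_true))
conditional = [row[j_known] for row in post]

_, sd_marginal = mean_and_sd(A_grid, marginal)
_, sd_conditional = mean_and_sd(A_grid, conditional)
print(f"sd of A, marginal    = {sd_marginal:.3f}")
print(f"sd of A, conditional = {sd_conditional:.3f}")
```

The marginal pdf is the broader of the two because it carries the uncertainty in B; the conditional pdf assumes B is known exactly.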

[Figure panels: (3) 31 bins, ~100 counts maximum; (4) 7 bins, ~100 counts maximum]

Max??

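• Posterior L shows reliability of parameters & we want the optimal values

• For parameters $\{X_j\}$, with posterior $P = \mathrm{prob}(\{X_j\} \mid \{\text{data}\}, I)$

• Optimal $\{X_{oj}\}$ is the solution of the set of simultaneous eqns $\left.\dfrac{\partial P}{\partial X_i}\right|_{\{X_{oj}\}} = 0$

• For i = 1, 2, …, Nparams

• So for log of P, i.e. $L = \ln P$, and for 2 parameters we want $\left.\dfrac{\partial L}{\partial X}\right|_{X_o, Y_o} = 0$ and $\left.\dfrac{\partial L}{\partial Y}\right|_{X_o, Y_o} = 0$

• where $L = \ln\left[\mathrm{prob}(X, Y \mid \{\text{data}\}, I)\right]$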

Sivia & Skilling (2006) Chapter 3, p 35-51

• To estimate reliability of best estimate we want spread of P about (Xo, Yo)

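• Do this using Taylor expansion i.e. $f(x) = f(a) + f'(a)(x - a) + \tfrac{1}{2!} f''(a)(x - a)^2 + \dots$

• Or $f(x) = \sum_{n=0}^{\infty} \dfrac{f^{(n)}(a)}{n!}(x - a)^n$

• So for the first three terms (to quadratic) we have $L = L(X_o, Y_o) + \left.\dfrac{\partial L}{\partial X}\right|_{X_o, Y_o}(X - X_o) + \left.\dfrac{\partial L}{\partial Y}\right|_{X_o, Y_o}(Y - Y_o) + \dfrac{1}{2}\left[\left.\dfrac{\partial^2 L}{\partial X^2}\right|_{X_o, Y_o}(X - X_o)^2 + \left.\dfrac{\partial^2 L}{\partial Y^2}\right|_{X_o, Y_o}(Y - Y_o)^2 + 2\left.\dfrac{\partial^2 L}{\partial X \partial Y}\right|_{X_o, Y_o}(X - X_o)(Y - Y_o)\right] + \dots$

• Ignore the linear (X-Xo) and (Y-Yo) terms, as the expansion is about the maximum and the first derivatives are zero there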

http://en.wikipedia.org/wiki/Taylor_series

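• So mainly concerned with quadratic terms. Rephrase via matrices: $L \approx L(X_o, Y_o) + \tfrac{1}{2} Q$ with $Q = \begin{pmatrix} X - X_o & Y - Y_o \end{pmatrix} \begin{pmatrix} A & C \\ C & B \end{pmatrix} \begin{pmatrix} X - X_o \\ Y - Y_o \end{pmatrix}$

• Where $A = \left.\dfrac{\partial^2 L}{\partial X^2}\right|_{X_o, Y_o}$, $B = \left.\dfrac{\partial^2 L}{\partial Y^2}\right|_{X_o, Y_o}$, $C = \left.\dfrac{\partial^2 L}{\partial X \partial Y}\right|_{X_o, Y_o}$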

• Contour of Q in X-Y plane i.e. line of constant L

• Orientation and eccentricity determined by A, B, C

• Directions e1 and e2 are the eigenvectors of the 2nd derivative matrix with elements A, B, C

[Figure: elliptical contour Q = k in the X-Y plane, centred on (Xo, Yo), with principal directions e1 and e2]

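• So (x,y) components of e1 and e2 given by solutions of $\begin{pmatrix} A & C \\ C & B \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \lambda \begin{pmatrix} x \\ y \end{pmatrix}$

• Where eigenvalues $\lambda_1$ and $\lambda_2$ scale as $1/k^2$ (& $k_{1,2}$ are widths of ellipse along principal directions)

• If (Xo, Yo) is maximum then $\lambda_1$ and $\lambda_2$ < 0

• So A < 0, B < 0 and $AB > C^2$

• So if C ≠ 0 then ellipse not aligned to axes, and how do we estimate error bars on Xo, Yo?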
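As a small numerical illustration of the eigenvalue conditions above, the sketch below finds the eigenvalues and unit eigenvectors of a 2 x 2 symmetric second-derivative matrix in closed form. The values A = -4, B = -1, C = -1.9 are made up for the example, and the function name is an assumption.

```python
import math

def eigen_2x2_symmetric(A, B, C):
    """Eigenvalues and unit eigenvectors of the matrix [[A, C], [C, B]]."""
    if C == 0.0:
        # Already diagonal: principal directions lie along the X and Y axes.
        return [(A, (1.0, 0.0)), (B, (0.0, 1.0))]
    mean = 0.5 * (A + B)
    radius = math.hypot(0.5 * (A - B), C)
    result = []
    for lam in (mean + radius, mean - radius):
        vx, vy = C, lam - A            # satisfies (A - lam) * vx + C * vy = 0
        norm = math.hypot(vx, vy)
        result.append((lam, (vx / norm, vy / norm)))
    return result

# Hypothetical second derivatives at the maximum.
A, B, C = -4.0, -1.0, -1.9
(lam1, e1), (lam2, e2) = eigen_2x2_symmetric(A, B, C)

is_maximum = lam1 < 0.0 and lam2 < 0.0
print(f"lambda1 = {lam1:.4f}, e1 = ({e1[0]:.3f}, {e1[1]:.3f})")
print(f"lambda2 = {lam2:.4f}, e2 = ({e2[0]:.3f}, {e2[1]:.3f})")
print(f"maximum: {is_maximum}; AB - C^2 = {A * B - C * C:.3f}")
```

The eigenvalue closer to zero belongs to the long axis of the ellipse, i.e. the direction in which the posterior is widest.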

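• We can get rid of parameters we don’t want (Y for e.g.) by integrating i.e. $\mathrm{prob}(X \mid \{\text{data}\}, I) = \int_{-\infty}^{+\infty} \mathrm{prob}(X, Y \mid \{\text{data}\}, I)\,dY$

• And then use Taylor again, with $\mathrm{prob}(X, Y \mid \{\text{data}\}, I) \propto \exp(Q/2)$

• So (see S&S p 46 & Appendix) $\mathrm{prob}(X \mid \{\text{data}\}, I) \propto \exp\left[\dfrac{1}{2}\left(\dfrac{AB - C^2}{B}\right)(X - X_o)^2\right]$

• And so marginal distn. for X is just Gaussian with best estimate (mean) Xo and uncertainty (SD) $\sigma_X = \sqrt{\dfrac{-B}{AB - C^2}}$, and similarly $\sigma_Y = \sqrt{\dfrac{-A}{AB - C^2}}$

• So all fine and we can calculate uncertainty……right?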

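• Note $AB - C^2$ is the determinant of $\begin{pmatrix} A & C \\ C & B \end{pmatrix}$ and is $\lambda_1 \times \lambda_2$

• So if $\lambda_1$ or $\lambda_2 \to 0$ then $AB - C^2 \to 0$ and $\sigma_X$ and $\sigma_Y \to \infty$

• Oh dear……

• So consider variance of posterior $\mathrm{Var}(X) = \langle (X - \mu)^2 \rangle = \int (X - \mu)^2\,\mathrm{prob}(X \mid \{\text{data}\}, I)\,dX$

• Where μ is mean

• For a 1D normal distribution this gives $\langle (X - \mu)^2 \rangle = \sigma^2$

• For 2D case (X,Y) here $\sigma_X^2 = \langle (X - X_o)^2 \rangle = \iint (X - X_o)^2\,\mathrm{prob}(X, Y \mid \{\text{data}\}, I)\,dX\,dY = \dfrac{-B}{AB - C^2}$

• Which we have from before. Same for Y so $\sigma_Y^2 = \langle (Y - Y_o)^2 \rangle = \dfrac{-A}{AB - C^2}$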

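• Consider covariance $\sigma_{XY}^2 = \langle (X - X_o)(Y - Y_o) \rangle = \iint (X - X_o)(Y - Y_o)\,\mathrm{prob}(X, Y \mid \{\text{data}\}, I)\,dX\,dY$

• Which describes correlation between X and Y, and if estimate of X has little/no effect on estimate of Y then $|\sigma_{XY}^2| \ll \sqrt{\sigma_X^2 \sigma_Y^2}$

• And, using Taylor as before $\sigma_{XY}^2 = \dfrac{C}{AB - C^2}$

• So in matrix notation $\begin{pmatrix} \sigma_X^2 & \sigma_{XY}^2 \\ \sigma_{XY}^2 & \sigma_Y^2 \end{pmatrix} = \dfrac{1}{AB - C^2}\begin{pmatrix} -B & C \\ C & -A \end{pmatrix} = -\begin{pmatrix} A & C \\ C & B \end{pmatrix}^{-1}$

• Where we remember that $A = \left.\dfrac{\partial^2 L}{\partial X^2}\right|_{X_o, Y_o}$, $B = \left.\dfrac{\partial^2 L}{\partial Y^2}\right|_{X_o, Y_o}$, $C = \left.\dfrac{\partial^2 L}{\partial X \partial Y}\right|_{X_o, Y_o}$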
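To connect this back to the peak-plus-background example, the sketch below builds the covariance matrix from the second derivatives of $L = \sum_k [N_k \ln D_k - D_k]$. It assumes the maximum sits at amplitude 1 and background 2 with noise-free stand-in data, 15 bins and w = 2. To avoid a clash with the amplitudes A and B, the second derivatives are named `H_AA`, `H_BB` and `H_AB` here (they play the roles of A, B and C in the matrix above).

```python
import math

xs = [float(x) for x in range(-7, 8)]
n0, w, x0 = 100.0 / 3.0, 2.0, 0.0
A_hat, B_hat = 1.0, 2.0            # assumed position of the posterior maximum

g = [math.exp(-((x - x0) ** 2) / (2.0 * w * w)) for x in xs]
D = [n0 * (A_hat * gk + B_hat) for gk in g]
N = D[:]                            # noise-free stand-in data (illustrative only)

# Analytic second derivatives of L with respect to the two amplitudes,
# using dD_k/dA = n0 * g_k and dD_k/dB = n0.
H_AA = -sum(n * (n0 * gk) ** 2 / d ** 2 for n, gk, d in zip(N, g, D))
H_BB = -sum(n * n0 ** 2 / d ** 2 for n, d in zip(N, D))
H_AB = -sum(n * n0 ** 2 * gk / d ** 2 for n, gk, d in zip(N, g, D))

# Covariance matrix is minus the inverse of the second-derivative matrix.
det = H_AA * H_BB - H_AB ** 2
var_A, var_B, cov_AB = -H_BB / det, -H_AA / det, H_AB / det
corr = cov_AB / math.sqrt(var_A * var_B)

print(f"sigma_A = {math.sqrt(var_A):.3f}, sigma_B = {math.sqrt(var_B):.3f}")
print(f"correlation coefficient = {corr:.3f}")
```

The covariance comes out negative in this set-up: a larger background estimate is compensated by a smaller signal amplitude.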


• Covariance (or variance-covariance) matrix describes covariance of error terms

• When C = 0, $\sigma_{XY}^2 = 0$ and no correlation, and e1 and e2 aligned with axes

• If C increases (relative to A, B), posterior pdf becomes more skewed and elliptical, rotated at angle $\pm\tan^{-1}\left(\sqrt{A/B}\right)$

[Figure panels: C = 0, X and Y uncorrelated; large +ve correlation; large -ve correlation]

After Sivia & Skilling (2006) fig 3.7 p. 48

• As correlation grows, if $|C| = (AB)^{1/2}$ then contours infinitely wide in one direction (except for prior bounds)

• In this case $\sigma_X$ and $\sigma_Y$ v. large (i.e. very unreliable parameter estimates)

• BUT large off-diagonals in covariance matrix mean we can estimate linear combinations of parameters

• For –ve covariance, posterior wide in direction Y = -mX, where $m = (A/B)^{1/2}$, but narrow perpendicular to that axis (the lines Y + mX = c), so Y + mX is well determined

• For +ve correlation most info. on Y - mX but not Y + X/m
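The point about linear combinations can be checked with the standard result $\mathrm{Var}(aX + bY) = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,\sigma_{XY}^2$. The sketch below uses made-up second derivatives A = -4, B = -1, C = -1.9 (strong negative covariance, since C is close to $-\sqrt{AB} = -2$); the function names are assumptions.

```python
import math

def covariance_2x2(A, B, C):
    """Return (var_X, var_Y, cov_XY) from the second derivatives A, B, C."""
    det = A * B - C * C
    return -B / det, -A / det, C / det

def variance_of_combination(a, b, var_X, var_Y, cov_XY):
    """Variance of the linear combination a*X + b*Y."""
    return a * a * var_X + b * b * var_Y + 2.0 * a * b * cov_XY

# Hypothetical second derivatives with strong negative covariance.
A, B, C = -4.0, -1.0, -1.9
var_X, var_Y, cov_XY = covariance_2x2(A, B, C)
m = math.sqrt(A / B)

var_well = variance_of_combination(m, 1.0, var_X, var_Y, cov_XY)         # Y + mX
var_poor = variance_of_combination(-1.0 / m, 1.0, var_X, var_Y, cov_XY)  # Y - X/m

print(f"var(X) = {var_X:.3f}, var(Y) = {var_Y:.3f}, cov(X, Y) = {cov_XY:.3f}")
print(f"var(Y + mX)  = {var_well:.3f}")
print(f"var(Y - X/m) = {var_poor:.3f}")
```

Although X and Y are individually poorly determined, the combination Y + mX has a much smaller variance than either, while Y - X/m is the poorly constrained direction.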


• Seen the 2 param case, so what about generalisation of Taylor quadratic approximation to M params?

• Remember, we want {Xoj} to maximise L, (log) posterior pdf

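• Rephrase in matrix form with $\mathbf{X}_o$ i.e. for i = 1, 2, …, M we want $\left.\dfrac{\partial L}{\partial X_i}\right|_{\mathbf{X}_o} = 0$

• Extension of Taylor expansion to M variables is $L = L(\mathbf{X}_o) + \dfrac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{M}\left.\dfrac{\partial^2 L}{\partial X_i \partial X_j}\right|_{\mathbf{X}_o}(X_i - X_{oi})(X_j - X_{oj}) + \dots$

• So if X is an M x 1 column vector and ignoring higher terms, exponentiating gives the posterior pdf as $\mathrm{prob}(\mathbf{X} \mid \{\text{data}\}, I) \propto \exp\left[\dfrac{1}{2}(\mathbf{X} - \mathbf{X}_o)^T\,\nabla\nabla L(\mathbf{X}_o)\,(\mathbf{X} - \mathbf{X}_o)\right]$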

• Where $\nabla\nabla L$ is a symmetric M x M matrix of 2nd derivatives

• And $(\mathbf{X} - \mathbf{X}_o)^T$ is the transpose of $(\mathbf{X} - \mathbf{X}_o)$ (a row vector)

• So this is generalisation of Q from 2D case

• And contour map from before is just a 2D slice through our now M dimensional parameter space

• Constant of proportionality is $(2\pi)^{-M/2}\sqrt{\det(-\nabla\nabla L)}$

• So what are the implications of all of this??

• Maximum of M parameter posterior PDF is $\mathbf{X}_o$ & we know $\nabla L(\mathbf{X}_o) = 0$

• Compare to 2D case & see $\nabla\nabla L$ is analogous to $-1/\sigma^2$

• Can show that generalised case for covariance matrix $\sigma^2$ is $[\sigma^2]_{ij} = \langle (X_i - X_{oi})(X_j - X_{oj}) \rangle = -\left[(\nabla\nabla L)^{-1}\right]_{ij}$

• Square root of diagonals (i=j) give marginal error bars and off-diagonals (i≠j) describe correlations between parameters

• So covariance matrix contains most information describing model fit AND faith we have in parameter estimates

• Sivia & Skilling make the important point (p50) that the inverse of the diagonal elements of a matrix ≠ the diagonal of the inverse of the matrix

• i.e. do NOT try and estimate value / spread of one parameter in M dim case by holding all others fixed at optimal values

[Figure: posterior contour in the (Xi, Xj) plane, marking Xoj, the marginal error bar σii and the incorrect ‘best fit’ σii]

• Need to include marginalisation to get correct magnitude for uncertainty

• Discussion of multimodal and asymmetric posterior PDFs, for which Gaussian is not a good approx: S&S p51….

After Sivia & Skilling (2006) p50.
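The warning about diagonals can be made concrete with a small made-up example. The sketch below inverts a hypothetical 3 x 3 second-derivative matrix by Gauss-Jordan elimination (so no external library is needed) and compares the marginal error bars, from the diagonal of $-(\nabla\nabla L)^{-1}$, with the incorrect ones obtained from the inverse of the diagonal elements alone. The matrix values and function name are illustrative assumptions.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hypothetical matrix of second derivatives of L at the maximum (negative definite).
H = [[-2.0, -1.0, 0.0],
     [-1.0, -2.0, -1.0],
     [0.0, -1.0, -2.0]]

covariance = [[-v for v in row] for row in invert(H)]

# Correct: square root of the diagonal of the covariance matrix (marginal error bars).
sigma_marginal = [math.sqrt(covariance[i][i]) for i in range(3)]
# Incorrect: hold the other parameters fixed, i.e. invert each diagonal element alone.
sigma_fixed = [math.sqrt(-1.0 / H[i][i]) for i in range(3)]

for i in range(3):
    print(f"X{i + 1}: marginal sigma = {sigma_marginal[i]:.3f}, "
          f"'others fixed' sigma = {sigma_fixed[i]:.3f}")
```

Holding the other parameters fixed gives error bars that are too small whenever the off-diagonal terms are non-zero.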

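• We have seen that we can express condition for best estimate of set of M parameters $\{X_j\}$ very compactly as $\nabla L(\mathbf{X}_o) = 0$

• Where jth element of $\nabla L$ is $\partial L / \partial X_j$ (L is the log posterior pdf) evaluated at ($\mathbf{X} = \mathbf{X}_o$)

• So this is set of simultaneous equations, which, IF they are linear i.e. $\nabla L = \mathbf{H}\mathbf{X} + \mathbf{C}$ (with H a constant matrix and C a constant vector, not the C of the 2D case)

• Then can use linear algebra methods to solve i.e. $\mathbf{X}_o = -\mathbf{H}^{-1}\mathbf{C}$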

• This is the power (joy?) of linearity! Will see more on this later

• Even if system not linear, we can often approximate as linear over some limited domain to allow linear methods to be used

• If not, then we have to use other (non-linear) methods…..
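As an illustration of the linear case, consider a model that is not part of these slides: a straight line y = m x + c fitted to data with independent Gaussian errors of known size σ and a flat prior. Then $L = \text{constant} - \tfrac{1}{2}\sum_i (y_i - m x_i - c)^2 / \sigma^2$, its gradient is linear in (m, c), and the best estimate follows from one matrix inversion. The data values below are made up.

```python
import math

# Hypothetical data lying exactly on y = 2x + 1, with assumed error sigma = 0.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
sigma = 0.5

Sx = sum(xs)
Sxx = sum(x * x for x in xs)
Sy = sum(ys)
Sxy = sum(x * y for x, y in zip(xs, ys))
n = len(xs)

# grad L = H X + C with X = (m, c)
H = [[-Sxx / sigma ** 2, -Sx / sigma ** 2],
     [-Sx / sigma ** 2, -n / sigma ** 2]]
C = [Sxy / sigma ** 2, Sy / sigma ** 2]

# Inverse of the 2 x 2 matrix H.
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
H_inv = [[H[1][1] / det, -H[0][1] / det],
         [-H[1][0] / det, H[0][0] / det]]

# Best estimate X_o = -H^{-1} C; covariance matrix = -H^{-1}.
m_best = -(H_inv[0][0] * C[0] + H_inv[0][1] * C[1])
c_best = -(H_inv[1][0] * C[0] + H_inv[1][1] * C[1])
sigma_m = math.sqrt(-H_inv[0][0])
sigma_c = math.sqrt(-H_inv[1][1])

print(f"m = {m_best:.3f} +/- {sigma_m:.3f}")
print(f"c = {c_best:.3f} +/- {sigma_c:.3f}")
```

Because the gradient is exactly linear here, no iteration is needed and the same matrix H also supplies the covariance matrix of the estimates.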

• Two parameter eg: Gaussian peak + background

• Solve via Bayes’ Theorem using Taylor expansion (to quadratic)

• Issues over experimental setup

• Integration time, number of bins, size etc.

• Impact on posterior PDF

• Can use linear methods to derive uncertainty estimates and explore correlation between parameters

• Extend to multi-dimensional case using same method

• Be careful when dealing with uncertainty

• KEY: not always useful to look for summary statistics – if in doubt look at the posterior PDF – this gives full description