### MSc Methods XX: YY

Dr. Mathias (Mat) Disney

UCL Geography

Office: 113, Pearson Building

Tel: 7670 0592

Email: [email protected]

www.geog.ucl.ac.uk/~mdisney

- Two parameter estimation
- Some stuff

- Uncertainty & linear approximations
- parameter estimation, uncertainty
- Practical – basic Bayesian estimation

- Linear Models
- parameter estimation, uncertainty
- Practical – basic Bayesian estimation

Parameter estimation continued

- Example: signal in the presence of background noise
- Very common problem: e.g. peak of lidar return from forest canopy? Presence of a star against a background? Transiting planet?

[Figure: signal peak of amplitude A on a flat background B, plotted against x]

See p 35-60 in Sivia & Skilling

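- Data are e.g. photon counts in a particular channel, so we expect the count in the kth channel, $N_k$, to be proportional to the sum of signal and background, where A, B are the signal and background amplitudes
- Assume the peak is Gaussian (for now), of width $w$ and centred on $x_o$, so the ideal datum $D_k$ is then given by $D_k = n_0\left[A\,e^{-(x_k - x_o)^2/2w^2} + B\right]$
- Where $n_0$ is a constant (integration time). Unlike $N_k$, $D_k$ is not a whole no., so the actual datum is some integer close to $D_k$
- The Poisson distribution is the pdf which represents this property, i.e. $\mathrm{prob}(N \mid D) = \dfrac{D^N e^{-D}}{N!}$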

- Poisson: prob. of N events occurring over some fixed time interval if events occur at a known average rate, independently of the time since the previous event
- If the expected number over a given interval is D, the prob. of exactly N events is $\mathrm{prob}(N \mid D) = \dfrac{D^N e^{-D}}{N!}$, for N = 0, 1, 2, …

- Used in discrete counting experiments, particularly cases where large number of outcomes, each of which is rare (law of rare events) e.g.
- Nuclear decay
- No. of calls arriving at a call centre per minute – large number arriving BUT rare from POV of general population….

See practical page for poisson_plot.py
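
The Poisson formula above is easy to check numerically. The following is a minimal standard-library sketch and is not the practical's poisson_plot.py; the function name `poisson_pmf` and the expected count D = 3.5 are illustrative assumptions.

```python
import math

def poisson_pmf(n: int, d: float) -> float:
    """Probability of exactly n events when the expected number is d."""
    return d ** n * math.exp(-d) / math.factorial(n)

# Assumed example: expected count D = 3.5 over the interval.
D = 3.5
probs = [poisson_pmf(n, D) for n in range(60)]   # terms beyond N = 60 are negligible

total = sum(probs)                               # should be 1 (a normalised pdf)
mean = sum(n * p for n, p in enumerate(probs))   # should equal D

print(f"sum of probabilities = {total:.6f}")
print(f"mean number of events = {mean:.6f}")
```

The probabilities sum to one and the mean equals D, as expected for a Poisson distribution.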

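- So the likelihood for datum $N_k$ is $\mathrm{prob}(N_k \mid A, B, I) = \dfrac{D_k^{N_k} e^{-D_k}}{N_k!}$
- Where I includes the reln. between expected counts $D_k$ and A, B, i.e. for our Gaussian model, $x_o$, $w$, $n_0$ are given (as is $x_k$).
- IF data are independent, then the likelihood over all M data is just the product of the probs of the individual measurements, i.e. $\mathrm{prob}(\{N_k\} \mid A, B, I) = \prod_{k=1}^{M} \mathrm{prob}(N_k \mid A, B, I)$
- As usual, we want the posterior pdf of A, B given $\{N_k\}$, I, i.e. $\mathrm{prob}(A, B \mid \{N_k\}, I) \propto \mathrm{prob}(\{N_k\} \mid A, B, I) \times \mathrm{prob}(A, B \mid I)$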

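- Prior? Neither A nor B can be –ve, so the most naïve prior pdf is $\mathrm{prob}(A, B \mid I) = \text{constant}$ for $A \ge 0$ and $B \ge 0$, and 0 otherwise
- To calc the constant we need $A_{max}$, $B_{max}$, but we may assume they are large enough not to cut off the posterior pdf, i.e. it is effectively 0 by then
- So, log of posterior: $L = \ln\left[\mathrm{prob}(A, B \mid \{N_k\}, I)\right] = \text{constant} + \sum_{k=1}^{M}\left[N_k \ln D_k - D_k\right]$
- And, as before, we want the A, B that maximise L
- Reliability is the width of the posterior about that point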

- ‘Generate’ experimental data (see practical)
- n0 chosen to give max expectation Dk = 100. Why are some Nk > 100?
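
One way such data could be generated is sketched below, using only the standard library. This is not the practical's code: the bin layout (15 unit-width bins centred on zero), the width w = 2.12, the seed, and the helper names `ideal_counts` and `poisson_draw` are assumptions made for illustration. A and B take the 'actual' values quoted on the slides.

```python
import math
import random

def ideal_counts(x, A, B, n0, x0=0.0, w=2.12):
    """Ideal datum D_k = n0 * [A exp(-(x_k - x0)^2 / 2w^2) + B] for each x_k."""
    return [n0 * (A * math.exp(-(xk - x0) ** 2 / (2 * w ** 2)) + B) for xk in x]

def poisson_draw(d, rng):
    """Draw one Poisson count with mean d (Knuth's method, fine for d of order 100)."""
    limit, k, p = math.exp(-d), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(42)                   # fixed seed so the run is repeatable
x = [float(k) for k in range(-7, 8)]      # 15 bin centres (assumed layout)
A_true, B_true = 1.09, 1.93               # 'actual' values quoted on the slides
n0 = 100.0 / (A_true + B_true)            # scales the peak so that max D_k = 100
D = ideal_counts(x, A_true, B_true, n0)
N = [poisson_draw(dk, rng) for dk in D]   # integer 'measured' counts

for xk, dk, nk in zip(x, D, N):
    print(f"x = {xk:5.1f}   D_k = {dk:6.2f}   N_k = {nk:3d}")
```

Individual counts can exceed 100 because the Poisson scatter about D_k has a standard deviation of √D_k, i.e. about 10 counts at the peak.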

- Posterior PDF is now 2D
- Maximum of L at A = 1.11, B = 1.89 (actual values 1.09, 1.93)
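
A brute-force way to locate the maximum of L is to evaluate the log posterior on a grid of (A, B). The sketch below is a minimal illustration under stated assumptions: the stand-in data are ideal counts rounded to integers (no Poisson scatter), and the bin layout, w = 2.12 and the grid limits are assumed, so the numbers will not match the slide.

```python
import math

def log_posterior(A, B, x, N, n0, x0=0.0, w=2.12):
    """L = constant + sum_k [N_k ln(D_k) - D_k], with a flat prior on A, B >= 0."""
    if A < 0 or B < 0:
        return -math.inf                  # prior is zero outside the allowed region
    total = 0.0
    for xk, Nk in zip(x, N):
        Dk = n0 * (A * math.exp(-(xk - x0) ** 2 / (2 * w ** 2)) + B)
        if Dk <= 0:
            return -math.inf
        total += Nk * math.log(Dk) - Dk
    return total

# Stand-in data: ideal counts rounded to integers, so the maximum should land
# close to the values used to make them.
x = [float(k) for k in range(-7, 8)]
A_true, B_true = 1.09, 1.93
n0 = 100.0 / (A_true + B_true)
N = [round(n0 * (A_true * math.exp(-xk ** 2 / (2 * 2.12 ** 2)) + B_true)) for xk in x]

grid = [0.02 * i for i in range(1, 151)]  # 0.02, 0.04, ..., 3.00 for both A and B
L_max, A_hat, B_hat = max(
    (log_posterior(A, B, x, N, n0), A, B) for A in grid for B in grid
)
print(f"maximum of L at A = {A_hat:.2f}, B = {B_hat:.2f}")
```

A grid search is only practical for a handful of parameters, but it gives the full 2D posterior rather than just the peak.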

- Changing the experimental setup?
- E.g. reducing counts per bin (SNR) e.g. because of shorter integration time, lower signal threshold etc.

Same signal, but data look much noisier – broader PDF

Truncated at 0 – prior important

- Changing the experimental setup?
- Increasing number of bins (same count rate, but spread out over twice measurement range)

Much narrower posterior PDF

BUT reduction mostly in B

- More data, so why uncertainty in A, B not reduced equally?
- Data far from origin only tell you about background
- Conversely, if we restrict the range of x over which data are collected (fewer bins) it is hard to distinguish A from B (signal from noise)
- Skewed & high correlation between A, B

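- If only interested in A then, according to the marginalisation rule, integrate the joint posterior PDF wrt B, i.e. $\mathrm{prob}(X \mid I) = \int \mathrm{prob}(X, Y \mid I)\,dY$
- So $\mathrm{prob}(A \mid \{N_k\}, I) = \int_0^{\infty} \mathrm{prob}(A, B \mid \{N_k\}, I)\,dB$
- See previous experimental cases…..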

[Figure panels: (1) 15 bins, ~100 counts maximum; (2) 15 bins, ~10 counts maximum]

- Marginal vs. conditional pdf
- Marginal pdf: takes into account prior ignorance of B
- Conditional pdf: assumes we know B e.g. via calibration
- Least difference when measurements made far from A (3)
- Most when data close to A (4)
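
The difference between the marginal and conditional pdfs for A can be seen by tabulating the posterior on a grid, then either summing over B (marginal) or taking the slice at a 'known' B (conditional). This is a sketch under assumptions: rounded ideal counts as stand-in data, 15 unit-width bins, w = 2.12, and B taken to be known exactly at its true value.

```python
import math

def log_posterior(A, B, x, N, n0, x0=0.0, w=2.12):
    """L = constant + sum_k [N_k ln(D_k) - D_k] for A, B > 0."""
    total = 0.0
    for xk, Nk in zip(x, N):
        Dk = n0 * (A * math.exp(-(xk - x0) ** 2 / (2 * w ** 2)) + B)
        total += Nk * math.log(Dk) - Dk
    return total

def spread(values, pdf):
    """Standard deviation of an unnormalised pdf tabulated on a uniform grid."""
    norm = sum(pdf)
    mean = sum(v * p for v, p in zip(values, pdf)) / norm
    return math.sqrt(sum((v - mean) ** 2 * p for v, p in zip(values, pdf)) / norm)

x = [float(k) for k in range(-7, 8)]
A_true, B_true = 1.09, 1.93
n0 = 100.0 / (A_true + B_true)
N = [round(n0 * (A_true * math.exp(-xk ** 2 / (2 * 2.12 ** 2)) + B_true)) for xk in x]

grid = [0.02 * i for i in range(1, 151)]                     # same grid for A and B
logp = [[log_posterior(A, B, x, N, n0) for B in grid] for A in grid]
peak = max(max(row) for row in logp)
post = [[math.exp(v - peak) for v in row] for row in logp]   # rows: A, columns: B

marginal = [sum(row) for row in post]                        # sum over B for each A
j_known = min(range(len(grid)), key=lambda j: abs(grid[j] - B_true))
conditional = [row[j_known] for row in post]                 # slice at the 'known' B

sd_marginal = spread(grid, marginal)
sd_conditional = spread(grid, conditional)
print(f"sd of A, marginal    = {sd_marginal:.3f}")
print(f"sd of A, conditional = {sd_conditional:.3f}")
```

The marginal pdf is the broader of the two because it carries the extra uncertainty from not knowing B.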

[Figure panels: (3) 31 bins, ~100 counts maximum; (4) 7 bins, ~100 counts maximum]

Max??

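- Posterior L shows the reliability of the parameters & we want the optimal values
- For parameters $\{X_j\}$, with posterior $P = \mathrm{prob}(\{X_j\} \mid \{\text{data}\}, I)$
- Optimal $\{X_{oj}\}$ is the solution of the set of simultaneous eqns $\left.\dfrac{\partial P}{\partial X_i}\right|_{\{X_{oj}\}} = 0$
- For i = 1, 2, …, $N_{params}$
- So for the log of P, i.e. $L = \ln P$, and for 2 parameters (X, Y) we want $\left.\dfrac{\partial L}{\partial X}\right|_{X_o, Y_o} = 0$ and $\left.\dfrac{\partial L}{\partial Y}\right|_{X_o, Y_o} = 0$
- where $L = \ln\left[\mathrm{prob}(X, Y \mid \{\text{data}\}, I)\right]$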

Sivia & Skilling (2006) Chapter 3, p 35-51

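- To estimate the reliability of the best estimate we want the spread of P about $(X_o, Y_o)$
- Do this using a Taylor expansion, i.e. $f(x) = f(a) + f'(a)(x - a) + \dfrac{f''(a)}{2!}(x - a)^2 + \dots$
- Or $f(x) = \sum_{n=0}^{\infty} \dfrac{f^{(n)}(a)}{n!}(x - a)^n$
- So for the first three terms (to quadratic) we have $L = L(X_o, Y_o) + \dfrac{\partial L}{\partial X}(X - X_o) + \dfrac{\partial L}{\partial Y}(Y - Y_o) + \dfrac{1}{2}\left[\dfrac{\partial^2 L}{\partial X^2}(X - X_o)^2 + \dfrac{\partial^2 L}{\partial Y^2}(Y - Y_o)^2 + 2\dfrac{\partial^2 L}{\partial X \partial Y}(X - X_o)(Y - Y_o)\right] + \dots$, with all derivatives evaluated at $(X_o, Y_o)$
- Ignore the $(X - X_o)$ and $(Y - Y_o)$ terms as the expansion is about the maximum, where the first derivatives are zero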

http://en.wikipedia.org/wiki/Taylor_series

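- So mainly concerned with the quadratic terms. Rephrase via matrices
- For the quadratic term, Q: $Q = \begin{pmatrix} X - X_o & Y - Y_o \end{pmatrix} \begin{pmatrix} A & C \\ C & B \end{pmatrix} \begin{pmatrix} X - X_o \\ Y - Y_o \end{pmatrix}$, so that $L \approx L(X_o, Y_o) + \tfrac{1}{2} Q$
- Where $A = \left.\dfrac{\partial^2 L}{\partial X^2}\right|_{X_o, Y_o}$, $B = \left.\dfrac{\partial^2 L}{\partial Y^2}\right|_{X_o, Y_o}$, $C = \left.\dfrac{\partial^2 L}{\partial X \partial Y}\right|_{X_o, Y_o}$ (here A, B, C denote second derivatives, not the signal and background amplitudes of the earlier example)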

- Contour of Q in the X-Y plane, i.e. a line of constant L
- Orientation and eccentricity determined by A, B, C
- Directions e1 and e2 are the eigenvectors of the matrix of 2nd derivatives (elements A, B, C)

[Figure: elliptical contour Q = k in the X-Y plane, centred on (Xo, Yo), with principal axes along e1 and e2]

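- So the (x, y) components of e1 and e2 are given by the solutions of $\begin{pmatrix} A & C \\ C & B \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \lambda \begin{pmatrix} x \\ y \end{pmatrix}$
- Where the eigenvalues $\lambda_1$ and $\lambda_2$ scale as $1/k^2$ (& $k_{1,2}$ are the widths of the ellipse along its principal directions)
- If (Xo, Yo) is a maximum then $\lambda_1$ and $\lambda_2$ < 0
- So A < 0, B < 0 and $AB > C^2$
- So if C ≠ 0 then the ellipse is not aligned to the axes, and how do we estimate error bars on Xo, Yo?
- We can get rid of parameters we don’t want (Y for e.g.) by integrating, i.e. $\mathrm{prob}(X \mid \{\text{data}\}, I) = \int_{-\infty}^{+\infty} \mathrm{prob}(X, Y \mid \{\text{data}\}, I)\,dY$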
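
The eigenvalue conditions for a maximum can be checked directly for a 2 x 2 symmetric matrix, where the eigenvalues have a closed form. A minimal sketch follows; the function name `eigen_2x2` and the values of A, B, C are illustrative assumptions.

```python
import math

def eigen_2x2(A, B, C):
    """Eigenvalues and unit eigenvectors of the symmetric matrix [[A, C], [C, B]]."""
    if C == 0:
        return [(A, (1.0, 0.0)), (B, (0.0, 1.0))]   # already aligned with the axes
    mean = 0.5 * (A + B)
    radius = math.hypot(0.5 * (A - B), C)
    pairs = []
    for lam in (mean + radius, mean - radius):
        vx, vy = C, lam - A          # satisfies the first row of (H - lam I) v = 0
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs

A, B, C = -2.0, -1.0, 0.5            # illustrative second derivatives
pairs = eigen_2x2(A, B, C)
(lam1, e1), (lam2, e2) = pairs

print(f"lambda1 = {lam1:.4f}, e1 = ({e1[0]:.4f}, {e1[1]:.4f})")
print(f"lambda2 = {lam2:.4f}, e2 = ({e2[0]:.4f}, {e2[1]:.4f})")
print("maximum:", lam1 < 0 and lam2 < 0)
print("AB > C^2:", A * B > C * C)
```

Both eigenvalues are negative exactly when A < 0, B < 0 and AB > C², and their product equals the determinant AB − C².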

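- And then use Taylor again & $\mathrm{prob}(X, Y \mid \{\text{data}\}, I) = \exp(L) \propto \exp\left(\tfrac{1}{2} Q\right)$
- So (see S&S p 46 & Appendix) $\mathrm{prob}(X \mid \{\text{data}\}, I) \propto \exp\left[\dfrac{1}{2}\left(\dfrac{AB - C^2}{B}\right)(X - X_o)^2\right]$
- And so the marginal distn. for X is just a Gaussian with best estimate (mean) Xo and uncertainty (SD) $\sigma_X = \sqrt{\dfrac{-B}{AB - C^2}}$
- So all fine and we can calculate uncertainty……right?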
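
The marginal width quoted from S&S can be checked numerically: tabulate exp(Q/2) on a grid, sum over Y, and compare the spread of what is left with √(−B/(AB − C²)). This is a sketch with illustrative values of A, B, C; the grid extent and step are assumptions chosen to cover the pdf comfortably.

```python
import math

A, B, C = -2.0, -1.0, 0.5                         # illustrative second derivatives
step = 0.05
offsets = [step * i for i in range(-160, 161)]    # X - Xo and Y - Yo from -8 to 8

def marginal_x(dx):
    """Sum exp(Q/2) over Y at fixed X, with Q = A dx^2 + B dy^2 + 2 C dx dy."""
    return sum(math.exp(0.5 * (A * dx * dx + B * dy * dy + 2 * C * dx * dy))
               for dy in offsets)

pdf = [marginal_x(dx) for dx in offsets]          # unnormalised marginal pdf for X
norm = sum(pdf)
mean = sum(dx * p for dx, p in zip(offsets, pdf)) / norm
sd_numeric = math.sqrt(sum((dx - mean) ** 2 * p for dx, p in zip(offsets, pdf)) / norm)
sd_formula = math.sqrt(-B / (A * B - C * C))

print(f"numerical sd of X = {sd_numeric:.4f}")
print(f"formula sd of X   = {sd_formula:.4f}")
```

The two agree closely, which is a useful sanity check on the signs in the formula (B < 0 and AB − C² > 0 at a maximum).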

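- Note $AB - C^2$ is the determinant of $\begin{pmatrix} A & C \\ C & B \end{pmatrix}$ and is $\lambda_1 \times \lambda_2$
- So if $\lambda_1$ or $\lambda_2 \to 0$ then $AB - C^2 \to 0$ and $\sigma_X$ and $\sigma_Y \to \infty$
- Oh dear……
- So consider the variance of the posterior, $\mathrm{Var}(X) = \langle (X - \mu)^2 \rangle = \int (X - \mu)^2\, \mathrm{prob}(X \mid \{\text{data}\}, I)\,dX$
- Where μ is the mean
- For a 1D normal distribution this gives $\langle (X - \mu)^2 \rangle = \sigma^2$
- For the 2D case (X, Y) here, $\sigma_X^2 = \langle (X - X_o)^2 \rangle = \iint (X - X_o)^2\, \mathrm{prob}(X, Y \mid \{\text{data}\}, I)\,dX\,dY$
- Which we have from before, i.e. $\sigma_X^2 = \dfrac{-B}{AB - C^2}$. Same for Y so $\sigma_Y^2 = \dfrac{-A}{AB - C^2}$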

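- Consider the covariance $\sigma^2_{XY} = \langle (X - X_o)(Y - Y_o) \rangle = \iint (X - X_o)(Y - Y_o)\, \mathrm{prob}(X, Y \mid \{\text{data}\}, I)\,dX\,dY$
- Which describes the correlation between X and Y, and if the estimate of X has little/no effect on the estimate of Y then $\left|\sigma^2_{XY}\right| \ll \sqrt{\sigma_X^2\, \sigma_Y^2}$
- And, using Taylor as before, $\sigma^2_{XY} = \dfrac{C}{AB - C^2}$
- So in matrix notation $\begin{pmatrix} \sigma_X^2 & \sigma^2_{XY} \\ \sigma^2_{XY} & \sigma_Y^2 \end{pmatrix} = \dfrac{1}{AB - C^2} \begin{pmatrix} -B & C \\ C & -A \end{pmatrix} = -\begin{pmatrix} A & C \\ C & B \end{pmatrix}^{-1}$
- Where we remember that $A = \dfrac{\partial^2 L}{\partial X^2}$, $B = \dfrac{\partial^2 L}{\partial Y^2}$, $C = \dfrac{\partial^2 L}{\partial X \partial Y}$, all evaluated at $(X_o, Y_o)$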
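
The matrix relation above can be written out for the 2 x 2 case in a few lines. The following is a sketch; the function name `covariance_from_hessian` and the values of A, B, C are illustrative assumptions.

```python
import math

def covariance_from_hessian(A, B, C):
    """Return [[var_X, cov_XY], [cov_XY, var_Y]] = -inverse of [[A, C], [C, B]]."""
    det = A * B - C * C
    if not (A < 0 and B < 0 and det > 0):
        raise ValueError("second derivatives do not describe a maximum")
    return [[-B / det, C / det], [C / det, -A / det]]

A, B, C = -2.0, -1.0, 0.5                   # illustrative second derivatives
cov = covariance_from_hessian(A, B, C)
sigma_x = math.sqrt(cov[0][0])
sigma_y = math.sqrt(cov[1][1])
rho = cov[0][1] / (sigma_x * sigma_y)       # correlation coefficient, between -1 and 1

print(f"sigma_X = {sigma_x:.4f}, sigma_Y = {sigma_y:.4f}, correlation = {rho:.4f}")
```

The sign of the covariance follows the sign of C, and the correlation coefficient reduces to C/√(AB).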

- Covariance (or variance-covariance) matrix describes the covariance of the error terms
- When C = 0, $\sigma^2_{XY} = 0$ and there is no correlation, and e1 and e2 are aligned with the axes
- If |C| increases (relative to A, B), the posterior pdf becomes more skewed and elliptical, rotated at angle $\pm\tan^{-1}\sqrt{A/B}$

[Figure: three posterior contour plots: large +ve correlation; large –ve correlation; C = 0 with X, Y uncorrelated. After Sivia & Skilling (2008) fig 3.7 p. 48]

- As correlation grows, if $C = \pm(AB)^{1/2}$ then contours are infinitely wide in one direction (except for prior bounds)
- In this case $\sigma_X$ and $\sigma_Y$ are v. large (i.e. very unreliable parameter estimates)
- BUT large off-diagonals in the covariance matrix mean we can still estimate linear combinations of the parameters
- For –ve covariance, the posterior is wide in direction Y = −mX, where $m = (A/B)^{1/2}$, but narrow perpendicular to it, i.e. across the lines Y + mX = c
- i.e. lot of information about Y + mX but little about Y − X/m
- For +ve correlation most info. is on Y − mX but not Y + X/m
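
The point about linear combinations can be made concrete with the usual rule Var(aX + bY) = a²σ²_X + b²σ²_Y + 2ab σ²_XY. The sketch below uses illustrative second derivatives with C close to −√(AB), i.e. a strongly negative covariance; the values and the helper `var_combo` are assumptions.

```python
import math

A, B, C = -2.0, -1.0, -1.4                 # illustrative; C is close to -sqrt(AB)
det = A * B - C * C
var_x, var_y, cov_xy = -B / det, -A / det, C / det
m = math.sqrt(A / B)

def var_combo(a, b):
    """Variance of a*X + b*Y under the Gaussian (quadratic) approximation."""
    return a * a * var_x + b * b * var_y + 2 * a * b * cov_xy

var_sum = var_combo(m, 1.0)                # Y + mX
var_diff = var_combo(-1.0 / m, 1.0)        # Y - X/m

print(f"sd of X        = {math.sqrt(var_x):.3f}")
print(f"sd of Y        = {math.sqrt(var_y):.3f}")
print(f"sd of Y + mX   = {math.sqrt(var_sum):.3f}")
print(f"sd of Y - X/m  = {math.sqrt(var_diff):.3f}")
```

X and Y are individually poorly determined, yet the combination Y + mX is tightly constrained, while Y − X/m is not.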

After Sivia & Skilling (2008) fig 3.7 p. 48

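- Seen the 2 param case, so what about generalisation of the Taylor quadratic approximation to M params?
- Remember, we want $\{X_{oj}\}$ to maximise L, the (log) posterior pdf
- Rephrase in matrix form with $\mathbf{X}_o$, i.e. for i = 1, 2, …, M we want $\left.\dfrac{\partial L}{\partial X_i}\right|_{\mathbf{X}_o} = 0$
- Extension of the Taylor expansion to M variables is $L = L(\mathbf{X}_o) + \dfrac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{M} \left.\dfrac{\partial^2 L}{\partial X_i \partial X_j}\right|_{\mathbf{X}_o} (X_i - X_{oi})(X_j - X_{oj}) + \dots$
- So if $\mathbf{X}$ is an M x 1 column vector, and ignoring higher terms, the posterior pdf (the exponential of L) is $\mathrm{prob}(\mathbf{X} \mid \{\text{data}\}, I) \propto \exp\left[\dfrac{1}{2}(\mathbf{X} - \mathbf{X}_o)^T\, \nabla\nabla L(\mathbf{X}_o)\, (\mathbf{X} - \mathbf{X}_o)\right]$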

- Where $\nabla\nabla L$ is a symmetric M x M matrix of 2nd derivatives
- And $(\mathbf{X} - \mathbf{X}_o)^T$ is the transpose of $(\mathbf{X} - \mathbf{X}_o)$ (a row vector)
- So this is the generalisation of Q from the 2D case
- And the contour map from before is just a 2D slice through our now M-dimensional parameter space
- Constant of proportionality is $(2\pi)^{-M/2}\sqrt{\det(-\nabla\nabla L)}$

- So what are the implications of all of this??
- Maximum of the M-parameter posterior PDF is $\mathbf{X}_o$ & we know $\nabla L(\mathbf{X}_o) = 0$
- Compare to 2D case & see $\nabla\nabla L$ is analogous to $-1/\sigma^2$
- Can show that the generalised case for the covariance matrix $\sigma^2$ is $[\sigma^2]_{ij} = \langle (X_i - X_{oi})(X_j - X_{oj}) \rangle = -\left[(\nabla\nabla L)^{-1}\right]_{ij}$
- Square roots of the diagonals (i=j) give the marginal error bars and the off-diagonals (i≠j) describe correlations between parameters
- So the covariance matrix contains most of the information describing the model fit AND the faith we have in the parameter estimates
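
As an illustration of going from the matrix of second derivatives to error bars, the sketch below estimates ∇∇L by central finite differences for the Gaussian peak plus background model and inverts it. It is written for M = 2 so the inverse can be done by hand, although the `hessian` helper itself works for any M. Assumptions: noise-free expected counts stand in for the data, so the maximum sits exactly at the values used to make them; the bin layout, w = 2.12 and the step h are also assumed.

```python
import math

x = [float(k) for k in range(-7, 8)]                 # 15 bin centres (assumed)
A_true, B_true = 1.09, 1.93
n0 = 100.0 / (A_true + B_true)

def expected(A, B, xk, x0=0.0, w=2.12):
    """Ideal datum D_k for one bin."""
    return n0 * (A * math.exp(-(xk - x0) ** 2 / (2 * w ** 2)) + B)

N = [expected(A_true, B_true, xk) for xk in x]       # noise-free stand-in data

def L(p):
    """Log posterior (up to a constant) for p = [A, B]."""
    return sum(Nk * math.log(expected(p[0], p[1], xk)) - expected(p[0], p[1], xk)
               for xk, Nk in zip(x, N))

def hessian(f, p, h=1e-3):
    """Matrix of second derivatives of f at p by central finite differences."""
    n = len(p)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                q = list(p)
                q[i] += si * h
                q[j] += sj * h
                return f(q)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H

H = hessian(L, [A_true, B_true])
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
cov = [[-H[1][1] / det, H[0][1] / det],              # covariance = -inverse of H
       [H[1][0] / det, -H[0][0] / det]]
sigma_A, sigma_B = math.sqrt(cov[0][0]), math.sqrt(cov[1][1])

print(f"sigma_A = {sigma_A:.3f}, sigma_B = {sigma_B:.3f}, cov_AB = {cov[0][1]:.4f}")
```

The negative off-diagonal term reflects the trade-off between signal and background: raising B can be partly compensated by lowering A.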

- Sivia & Skilling make the important point (p50) that the inverse of the diagonal elements of a matrix ≠ the diagonal of the inverse of the matrix
- i.e. do NOT try to estimate the value / spread of one parameter in the M-dim case by holding all others fixed at their optimal values
- Need to include marginalisation to get the correct magnitude for the uncertainty
- Discussion of multimodal and asymmetric posterior PDFs for which a Gaussian is not a good approx.
- S&S p51….

[Figure: posterior contour in the Xi-Xj plane with Xoj marked, showing the marginal error bar σii and the narrower, incorrect ‘best fit’ σii obtained with Xj held fixed. After Sivia & Skilling (2008) p50.]
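
The difference between the two recipes is easy to see for two parameters. In the sketch below (illustrative values of A, B, C), the correct error bar uses the diagonal of the inverse matrix, while the incorrect one inverts the diagonal element alone, which is what holding Y fixed at Yo amounts to.

```python
import math

A, B, C = -2.0, -1.0, -1.2                  # illustrative second derivatives
det = A * B - C * C

# Correct: diagonal element of minus the inverse matrix (Y marginalised out)
sigma_marginal = math.sqrt(-B / det)

# Incorrect 'best fit' width: minus the inverse of the diagonal element
# (equivalent to holding Y fixed at Yo)
sigma_fixed = math.sqrt(-1.0 / A)

rho_squared = C * C / (A * B)               # squared correlation coefficient
ratio = sigma_marginal / sigma_fixed        # equals 1 / sqrt(1 - rho^2)

print(f"sigma (marginalised) = {sigma_marginal:.3f}")
print(f"sigma (Y held fixed) = {sigma_fixed:.3f}")
print(f"ratio = {ratio:.3f}, 1/sqrt(1 - rho^2) = {1 / math.sqrt(1 - rho_squared):.3f}")
```

The two only agree when C = 0; the stronger the correlation, the more the fixed-parameter width underestimates the uncertainty.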

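- We have seen that we can express the condition for the best estimate of a set of M parameters $\{X_j\}$ very compactly as $\nabla L(\mathbf{X}_o) = 0$
- Where the jth element of $\nabla L$ is $\partial L/\partial X_j$ (L = log posterior pdf), evaluated at $\mathbf{X} = \mathbf{X}_o$
- So this is a set of simultaneous equations, which, IF they are linear, i.e. $\nabla L = \mathbf{H}\mathbf{X} + \mathbf{C}$ (with $\mathbf{H}$ a constant matrix and $\mathbf{C}$ here a constant vector)
- Then can use linear algebra methods to solve, i.e. $\mathbf{X}_o = -\mathbf{H}^{-1}\mathbf{C}$
- This is the power (joy?) of linearity! Will see more on this later
- Even if the system is not linear, we can often approximate it as linear over some limited domain to allow linear methods to be used
- If not, then we have to use other (non-linear) methods…..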
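
For a linear system the best estimate comes from a single linear solve. The sketch below uses a small hand-written Gaussian elimination so that it runs with the standard library alone; in practice a library routine such as `numpy.linalg.solve` would be the usual choice. The 3 x 3 matrix H and vector C are made-up illustrative values, not taken from the lecture.

```python
def solve(H, b):
    """Solve H x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(H, b)]      # augmented matrix [H | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                      # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Assumed linear gradient: grad L = H X + C, with H negative definite
H = [[-4.0, 1.0, 0.0],
     [1.0, -3.0, 1.0],
     [0.0, 1.0, -2.0]]
C = [1.0, 2.0, 3.0]

X_o = solve(H, [-c for c in C])                       # X_o = -H^{-1} C
gradient = [sum(H[i][j] * X_o[j] for j in range(3)) + C[i] for i in range(3)]

print("X_o =", [round(v, 4) for v in X_o])
print("grad L at X_o =", [round(g, 12) for g in gradient])
```

The gradient vanishes at X_o, and because H is also the matrix of second derivatives, minus its inverse gives the covariance matrix directly.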

- Two parameter eg: Gaussian peak + background
- Solve via Bayes’ theorem using Taylor expansion (to quadratic)
- Issues over experimental setup
- Integration time, number of bins, size etc.
- Impact on posterior PDF

- Can use linear methods to derive uncertainty estimates and explore correlation between parameters
- Extend to multi-dimensional case using same method
- Be careful when dealing with uncertainty
- KEY: not always useful to look for summary statistics – if in doubt look at the posterior PDF – this gives full description
