- By
**asta** - Follow User

- 118 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about ' Lecture 6' - asta

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Lecture 6

- Objective functions for model fitting:
- Sum of squared residuals (=> the ‘method of least squares’).
- Likelihood
- Hypothesis testing

Model fitting – reminder of the terminology:

- We have datayi at samples of some independent variable xi.
- The model is our estimate of the parent or truth function.
- Let’s express the model m(xi) as a function of a few parametersθ1, θ2 .. θM.
- Finding the ‘best fit’ model then just means best estimates of the θ. (Bold – shorthand for a list)
- Knowledge of physics informs choice of m, θ.

The parent function –

what we’d like to

find out (but never can,

exactly).

Naive best fit calculation:

- The residuals for a particular model = yi-mi.
- To ‘thread the model through the middle of the noise’, we want the magnitudes of all residuals to be small.
- A reasonable way (not the only way) to achieve this is to define a sum of squared residuals as our objective function:
- Fitting by minimizing this objective function is called the method of least squares. It is extremely common.
- NOTE! This approach IGNORES possible uncerts in x.

But what if the noise is not homogeneous?

- Some bits clearly have higher σ than others.
- Answer: weight by 1/σ2i
- This form of U is sometimes called χ2(pronounced kai squared).
- To use it, we need to know the σi.

Simple example: mi = θ1 + θ2si

Model – red is si, green the flat background.

Contour map of Uls.

The data yi:

Truth values!

An even simpler example:

- Last lecture, I noted that there do exist cases in which we can directly invert
- For least squares, this happens if the model is a polynomial function of the parameters θi.
- Expansion of grad U in this case gives a set of M linear equations in the M parameters called the normal equations.
- It is easy to solve these to get the θi.

Simplest example of all: fitting a straight line.

- Called linear regression by the statisticians.
- There is a huge amount of literature on it.
- Normal equations for a line model turn out to be:
- Polynomial is an easy extension to this.

χ2 for Poisson data – possible, but problematic.

- Choose data yi as estimator for σi2?
- No - can have zero values in denominator.
- Choose (evolving) model as estimator for σi2?
- No - gives a biased result.
- Better: Mighell formula
- Unbiased, but no good for goodness-of-fit.
- Use Mighell to fit θ then standard U for “goodness of fit” (GOF).

Mighell K J, Ap J 518, 380 (1999)

Another choice of U: likelihood.

- Likelihood is best illustrated by Poisson data.
- Consider a single Poisson random variable y: its PDF is

where m here plays the role of the expectation value of y.

- We’re used to thinking of this as a function just of one variable, ie y;
- but it is really a function of both y and m.

The likelihood function.

- Before, we thought “given m, let us apply the PDF to obtain the probability of getting between y and y+dy.”
- Now we are saying “well we know y, we just measured it. We don’t know m. But surely the PDF taken as a function of mindicates the probability density for m.”
- Problems with this:
- Likelihood function is not necessarily normalized, like a ‘proper’ PDF;
- What assurance do we have that the true PDF for m has this shape??

Likelihood continued.

- Usually we have many (N) samples yi. Can we arrive at a single likelihood for all samples taken together?
- (Note that we’ve stopped talking just about Poisson data now – this expression is valid for any form of p.)
- Sometimes easier to deal with the log-likelihoodL:

Likelihood continued

- To get the best-fit model m, we need to maximize the likelihood (or equivalently, the log likelihood).
- If we want an objective function to minimize, it is convenient to choose –L.
- Can show that for Gaussian data, minimizing –L is equivalent to minimizing the variance-weighted sum of squared residuals (=chi squared) given before.
- Proof left as an exercise!

Poissonian/likelihood version of slide 3

Model – red is si, green the flat background.

Map of the joint likelihood L.

The data yi:

What if also errors in xi?

- Tricky… Bayes better in this case.

What next?

- In fitting a model, we want (amplifying a bit on lecture 4):
- The best fit values of the parameters;
- Then we want to know if these values are good enough!
- If not: need to go back to the drawing board and choose a new model.
- If the model passes, want uncertainties in the best-fit parameters.
- (I’ll put this off to a later lecture…)
- Number 1 is accomplished. √

How to tell if our model is correct.

- Supposing our model is absolutely accurate.
- The U value we calculate is, nevertheless, a random variable: each fresh set of data will give rise to a slightly different value of U.
- In other words, U, even in the case of a perfectly accurate model, will have some spread – in fact, like any other random variable, it will have a PDF.
- This PDF is sometimes calculable from first principles (if not, one can do a Monte Carlo to estimate it).

How to tell if our model is correct.

- The procedure is:
- First calculate the PDF for U in the ‘perfect fit’ case;
- From this curve, obtain the value of the PDF at our best-fit value of U;
- If p(Ubest fit) is very small, it is unlikely that our model is correct.
- Note that both χ2 and –L have the property that they cannot be negative.
- A model which is a less than ideal match to the truth function will always generate U values with a PDF displaced to higher values of U.

Perfect vs. imperfect p(U):

A perfect model gives this

shape PDF

PDF for imperfect model

is ALWAYS displaced

to higher U.

Goodness of model continued

- Because plausible Us are >=0; and because an imperfect model always gives higher U: we prefer to
- generate the survival function for the perfect model;
- that tells us the probability of a perfect model giving us the measured value of Uor higher.
- This procedure is called hypothesis testing.
- Because we make the hypothesis:
- “Suppose our model is correct. What sort of U value should we expect to find?”
- We’ll encounter the technique again next lecture when we turn to enquire if there is any signal at all buried in the noise.

Perfect-model p(U)s:

- If we use the least-squares U (also known as χ2), this is easy, because p(U) is known for this:

where

- Г is the gamma function
- and υ is called the degrees of freedom.
- Note: the PDF has a peak at U~υ.

What are degrees of freedom?

- The easiest way to illustrate what degrees of freedom is, is to try fitting a polynomial of higher and higher order to a set of noisy data.
- The more orders we include, the nearer the model will fit the data, and the smaller the sum of squared residuals (χ2) will be, until…
- when M=N (ie the number of parameters, polynomial orders in this case, equals the number of data points), the model will go through every point exactly. χ2 will equal 0.

Degrees of freedom

- Defined as N-M: number of data points minus number of parameters fitted.
- It is sometimes convenient to define a reduced chi squared
- PDF for χ2reduced should of course peak at about 1.
- There is no advantage in using this for minimization rather than the ‘raw’ χ2.

‘Survival function’ for U.

- Remember the survival function of a PDF is defined as
- For χ2 this is
- where Г written with 2 arguments like this is called the incomplete gamma function:

Download Presentation

Connecting to Server..