
# Lecture 6






• Objective functions for model fitting:

• Sum of squared residuals (=> the ‘method of least squares’).

• Likelihood

• Hypothesis testing

Model fitting – reminder of the terminology:

• We have data yi at samples of some independent variable xi.

• The model is our estimate of the parent or truth function.

• Let’s express the model m(xi) as a function of a few parameters θ1, θ2 … θM.

• Finding the ‘best fit’ model then just means finding the best estimates of the θ. (Bold θ is shorthand for the list of parameters.)

• Knowledge of physics informs choice of m, θ.

(Figure: the parent function – what we’d like to find out, but never can, exactly.)

• The residuals for a particular model are yi − mi.

• To ‘thread the model through the middle of the noise’, we want the magnitudes of all residuals to be small.

• A reasonable way (not the only way) to achieve this is to define the sum of squared residuals as our objective function:

ULS = Σi (yi − mi)²

• Fitting by minimizing this objective function is called the method of least squares. It is extremely common.

• NOTE! This approach IGNORES possible uncertainties in x.

• Some bits of the data clearly have higher σ than others, so it makes sense to weight each residual by its uncertainty:

U = Σi (yi − mi)² / σi²

• This form of U is sometimes called χ2 (pronounced ‘chi squared’).

• To use it, we need to know the σi.
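As a concrete sketch, the weighted objective is a few lines of NumPy (the function name chi_squared and the toy data are illustrative, not from the lecture):

```python
import numpy as np

def chi_squared(y, m, sigma):
    """Variance-weighted sum of squared residuals: U = sum((y - m)^2 / sigma^2)."""
    r = (y - m) / sigma
    return np.sum(r**2)

y = np.array([1.0, 2.0, 3.0])
sigma = np.array([0.5, 0.5, 1.0])
print(chi_squared(y, y, sigma))          # perfect model -> 0.0
print(chi_squared(y, y - sigma, sigma))  # every residual at one sigma -> 3.0
```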

Simple example: mi = θ1 + θ2si

(Figures: the model – red is si, green the flat background; the data yi, with truth values marked; and a contour map of ULS.)

• Last lecture, I noted that there do exist cases in which we can directly invert the condition ∇U = 0 for the best-fit parameters.

• For least squares, this happens if the model is a linear function of the parameters θi (for example, a polynomial in x).

• Expansion of grad U in this case gives a set of M linear equations in the M parameters called the normal equations.

• It is easy to solve these to get the θi.

• Called linear regression by the statisticians.

• There is a huge amount of literature on it.

• The normal equations for a straight-line model m = θ1 + θ2x turn out to be:

θ1 Σi 1/σi² + θ2 Σi xi/σi² = Σi yi/σi²

θ1 Σi xi/σi² + θ2 Σi xi²/σi² = Σi xiyi/σi²

• Polynomial is an easy extension to this.
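A minimal sketch of solving the normal equations for the straight-line model (the helper fit_line and the noiseless toy data are illustrative):

```python
import numpy as np

def fit_line(x, y, sigma):
    """Weighted least-squares straight line m(x) = theta1 + theta2*x,
    obtained by solving the 2x2 normal equations (A^T W A) theta = A^T W y."""
    w = 1.0 / sigma**2
    A = np.column_stack([np.ones_like(x), x])   # design matrix: columns [1, x]
    lhs = A.T @ (w[:, None] * A)                # the normal matrix
    rhs = A.T @ (w * y)
    return np.linalg.solve(lhs, rhs)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 + 0.5 * x                               # noiseless line: theta = (2, 0.5)
theta = fit_line(x, y, np.ones_like(x))
print(theta)                                    # recovers [2.0, 0.5]
```

A polynomial of any order works the same way: just add columns x², x³, … to the design matrix.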

χ2 for Poisson data – possible, but problematic.

• Choose data yi as estimator for σi2?

• No - can have zero values in denominator.

• Choose (evolving) model as estimator for σi2?

• No - gives a biased result.

• Better: the Mighell formula

χ2γ = Σi (yi + min(yi, 1) − mi)² / (yi + 1)

• Unbiased, but no good for goodness-of-fit.

• Use Mighell’s formula to fit θ, then the standard U for “goodness of fit” (GOF).

Mighell K J, Ap J 518, 380 (1999)
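A sketch of the Mighell statistic, χ2γ = Σi (yi + min(yi, 1) − mi)² / (yi + 1), in Python (the function name chi2_mighell and the toy counts are illustrative):

```python
import numpy as np

def chi2_mighell(y, m):
    """Mighell's modified chi-squared for Poisson counts:
    sum over i of (y_i + min(y_i, 1) - m_i)^2 / (y_i + 1).
    Well-defined even when y_i = 0, unlike putting y_i in the denominator."""
    y = np.asarray(y, dtype=float)
    return np.sum((y + np.minimum(y, 1.0) - m)**2 / (y + 1.0))

counts = np.array([0, 3, 5, 2])              # note the zero count
print(chi2_mighell(counts, counts.mean()))   # finite: -> 8.9375
```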

Another choice of U: likelihood.

• Likelihood is best illustrated by Poisson data.

• Consider a single Poisson random variable y: its PDF is

p(y | m) = m^y e^(−m) / y!

where m here plays the role of the expectation value of y.

• We’re used to thinking of this as a function just of one variable, ie y;

• but it is really a function of both y and m.

PDF for y vs likelihood for θ:

• Probability: p(y | θ) = θ^y e^(−θ) / y!, read as a function of y at fixed θ.

• Likelihood: p(y | θ) = θ^y e^(−θ) / y!, the same formula, now read as a function of θ at fixed y.

• Before, we thought “given m, let us apply the PDF to obtain the probability of getting between y and y+dy.”

• Now we are saying “well, we know y, we just measured it. We don’t know m. But surely the PDF taken as a function of m indicates the probability density for m.”

• Problems with this:

• The likelihood function is not necessarily normalized, as a ‘proper’ PDF would be;

• What assurance do we have that the true PDF for m has this shape??
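Setting those worries aside, the two readings of the same formula are easy to check numerically (standard library only; the numbers are arbitrary):

```python
import math

def poisson_p(y, m):
    """Poisson probability p(y|m) = m^y * exp(-m) / y!"""
    return m**y * math.exp(-m) / math.factorial(y)

# Read as a PDF in y (m fixed at 2.0): probabilities over all y sum to 1.
total = sum(poisson_p(y, 2.0) for y in range(50))
print(total)                                  # -> 1.0 to machine precision

# Read as a likelihood in m (y fixed at 2): not normalized, and it peaks at m = y.
print(poisson_p(2, 2.0) > poisson_p(2, 1.0))  # True
print(poisson_p(2, 2.0) > poisson_p(2, 3.0))  # True
```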

• Usually we have many (N) samples yi. Can we arrive at a single likelihood for all samples taken together? Yes: for independent samples, the joint likelihood is the product

p(y | θ) = ∏i p(yi | θ)

• (Note that we’ve stopped talking just about Poisson data now – this expression is valid for any form of p.)

• Sometimes it is easier to deal with the log-likelihood L:

L = Σi ln p(yi | θ)

• To get the best-fit model m, we need to maximize the likelihood (or equivalently, the log likelihood).

• If we want an objective function to minimize, it is convenient to choose –L.

• It can be shown that for Gaussian data, minimizing –L is equivalent to minimizing the variance-weighted sum of squared residuals (= χ2) given before.

• Proof left as an exercise!
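As an illustration of maximizing L, here is a crude grid scan of −L for the simplest Poisson model of all, a constant m = θ (the data and the grid are arbitrary; a real fit would use a proper optimizer):

```python
import math

def neg_log_like(theta, data):
    """-L for a constant Poisson model m_i = theta:
    -sum_i [ y_i*ln(theta) - theta - ln(y_i!) ],  using lgamma(y+1) = ln(y!)."""
    return -sum(y * math.log(theta) - theta - math.lgamma(y + 1) for y in data)

data = [3, 5, 4, 6, 2]                      # sample mean = 4.0
grid = [0.5 + 0.1 * k for k in range(100)]  # theta from 0.5 to 10.4
best = min(grid, key=lambda t: neg_log_like(t, data))
print(best)   # the minimum of -L sits at the sample mean, 4.0
```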

(Figures: the model – red is si, green the flat background; the data yi; and a map of the joint likelihood L.)
• Tricky… Bayes better in this case.

• In fitting a model, we want (amplifying a bit on lecture 4):

1. the best-fit values of the parameters;

2. to know whether these values are good enough! If not: we need to go back to the drawing board and choose a new model;

3. if the model passes, the uncertainties in the best-fit parameters. (I’ll put this off to a later lecture…)

• Number 1 is accomplished. √

• Suppose our model is absolutely accurate.

• The U value we calculate is, nevertheless, a random variable: each fresh set of data will give rise to a slightly different value of U.

• In other words, U, even in the case of a perfectly accurate model, will have some spread – in fact, like any other random variable, it will have a PDF.

• This PDF is sometimes calculable from first principles (if not, one can do a Monte Carlo to estimate it).

• The procedure is:

• First calculate the PDF for U in the ‘perfect fit’ case;

• From this curve, obtain the value of the PDF at our best-fit value of U;

• If p(Ubest fit) is very small, it is unlikely that our model is correct.

• Note that both χ2 and –L have the property that they cannot be negative.

• A model which is a less than ideal match to the truth function will always generate U values with a PDF displaced to higher values of U.
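The Monte Carlo route mentioned above can be sketched as follows, for Gaussian data with unit σ, so that U is just the sum of squared standardized residuals (no parameters are fitted here, so the degrees of freedom equal the number of data points):

```python
import random

random.seed(1)

def simulate_U(n_data, n_trials):
    """Draw U = sum of n_data squared unit Gaussians, n_trials times:
    the empirical distribution of U for a perfectly accurate model."""
    return [sum(random.gauss(0.0, 1.0)**2 for _ in range(n_data))
            for _ in range(n_trials)]

us = simulate_U(n_data=10, n_trials=5000)
print(sum(us) / len(us))   # mean ~ 10, i.e. the number of degrees of freedom
```

A histogram of `us` approximates the perfect-model p(U); an observed U far out in its upper tail is the warning sign.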

(Figure: a perfect model gives a χ2-shaped PDF for U; the PDF for an imperfect model is ALWAYS displaced to higher U.)
• Because plausible values of U are ≥ 0, and because an imperfect model always gives higher U, we prefer to

• generate the survival function for the perfect model;

• that tells us the probability of a perfect model giving the measured value of U or higher.

• This procedure is called hypothesis testing.

• Because we make the hypothesis:

• “Suppose our model is correct. What sort of U value should we expect to find?”

• We’ll encounter the technique again next lecture when we turn to enquire if there is any signal at all buried in the noise.

Perfect-model p(U)s:

• If we use the least-squares U (also known as χ2), this is easy, because p(U) is known for this:

p(U) = U^(ν/2 − 1) e^(−U/2) / (2^(ν/2) Γ(ν/2))

where

• Γ is the gamma function

• and ν is called the number of degrees of freedom.

• Note: the PDF has a peak at U ~ ν.

• The easiest way to illustrate what ‘degrees of freedom’ means is to try fitting a polynomial of higher and higher order to a set of noisy data.

• The more orders we include, the nearer the model will fit the data, and the smaller the sum of squared residuals (χ2) will be, until…

• when M = N (ie the number of parameters, polynomial coefficients in this case, equals the number of data points), the model will go through every point exactly. χ2 will equal 0.

• The number of degrees of freedom ν is defined as N − M: the number of data points minus the number of parameters fitted.
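The polynomial illustration is easy to reproduce (NumPy’s polyfit, with arbitrary toy data and seed):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 5)                     # N = 5 data points
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, x.size)

for M in range(1, 6):                            # M parameters = polynomial order M-1
    coeffs = np.polyfit(x, y, deg=M - 1)
    chi2 = np.sum((y - np.polyval(coeffs, x))**2)
    print(M, chi2)                               # chi2 shrinks with M; ~0 at M = N
```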

• It is sometimes convenient to define a reduced chi squared:

χ2reduced = χ2 / ν

• The PDF for χ2reduced should of course peak at about 1.

• There is no advantage in using this for minimization rather than the ‘raw’ χ2.

• Remember the survival function of a PDF is defined as

S(U) = ∫U^∞ p(U′) dU′

• For χ2 this is

S(U) = Γ(ν/2, U/2) / Γ(ν/2)

• where Γ written with 2 arguments like this is called the (upper) incomplete gamma function:

Γ(a, x) = ∫x^∞ t^(a−1) e^(−t) dt
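For even ν this survival function has a closed form, S(U) = e^(−U/2) Σk (U/2)^k / k! with k running from 0 to ν/2 − 1, which makes a compact sketch (standard library only; the even-ν restriction is just to avoid needing an incomplete-gamma routine):

```python
import math

def chi2_sf(U, nu):
    """Survival function of chi-squared, S(U) = Gamma(nu/2, U/2) / Gamma(nu/2),
    via the closed form exp(-U/2) * sum_{k < nu/2} (U/2)^k / k!  (even nu only)."""
    assert nu % 2 == 0, "this closed form requires even nu"
    half = U / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k)
                                 for k in range(nu // 2))

print(chi2_sf(10.0, 10))   # ~0.44: U = nu is entirely unremarkable
print(chi2_sf(30.0, 10))   # ~0.0009: such a high U casts doubt on the model
```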