Lecture 6

• Objective functions for model fitting:
• Sum of squared residuals (=> the ‘method of least squares’).
• Likelihood
• Hypothesis testing
Model fitting – reminder of the terminology:
• We have data y_i at samples of some independent variable x_i.
• The model is our estimate of the parent or truth function.
• Let’s express the model m(x_i) as a function of a few parameters θ_1, θ_2 .. θ_M.
• Finding the ‘best fit’ model then just means finding the best estimates of the θ. (On the slides, bold θ is shorthand for the whole list of parameters.)
• Knowledge of physics informs choice of m, θ.

[Figure: the parent function, which is what we’d like to find out (but never can, exactly).]

Naive best fit calculation:
• The residuals for a particular model are y_i – m_i, where m_i = m(x_i).
• To ‘thread the model through the middle of the noise’, we want the magnitudes of all residuals to be small.
• A reasonable way (not the only way) to achieve this is to define a sum of squared residuals as our objective function:
$$U_{ls} = \sum_{i=1}^{N} (y_i - m_i)^2$$
• Fitting by minimizing this objective function is called the method of least squares. It is extremely common.
• NOTE! This approach IGNORES possible uncertainties in x.
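As a minimal sketch of the objective function described above, assuming it is the plain sum U = Σ (y_i – m_i)², the following compares two candidate models against the same data. The function name `sum_sq_residuals` and the numbers are made up for illustration.

```python
def sum_sq_residuals(y, m):
    """Return U = sum_i (y_i - m_i)^2 for data y and model values m."""
    if len(y) != len(m):
        raise ValueError("y and m must have the same length")
    return sum((yi - mi) ** 2 for yi, mi in zip(y, m))


# Hypothetical data and two candidate models evaluated at the same x_i.
y = [1.1, 1.9, 3.2, 3.8]
m_good = [1.0, 2.0, 3.0, 4.0]   # threads through the noise
m_poor = [2.0, 2.0, 2.0, 2.0]   # flat model, clearly worse

u_good = sum_sq_residuals(y, m_good)
u_poor = sum_sq_residuals(y, m_poor)
```

The model with the smaller U is the better fit in the least-squares sense; nothing here accounts for uncertainties in x.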
But what if the noise is not homogeneous?
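• Some bits clearly have higher σ than others.
• In that case we weight each squared residual by the inverse of its variance:
$$U = \sum_{i=1}^{N} \frac{(y_i - m_i)^2}{\sigma_i^2}$$
• This form of U is sometimes called χ² (pronounced kai squared).
• To use it, we need to know the σ_i.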
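A short sketch of the variance-weighted version, assuming U = Σ (y_i – m_i)² / σ_i². The function name `chi_squared` and the numbers are illustrative only. It shows how a noisy point (large σ_i) contributes less to U than the same residual at a quiet point.

```python
def chi_squared(y, m, sigma):
    """Return sum_i ((y_i - m_i) / sigma_i)^2; every sigma_i must be > 0."""
    if not (len(y) == len(m) == len(sigma)):
        raise ValueError("y, m and sigma must have the same length")
    return sum(((yi - mi) / si) ** 2 for yi, mi, si in zip(y, m, sigma))


# Hypothetical example: only the last point has a residual (of 1.0).
y = [1.0, 2.0, 4.0]
m = [1.0, 2.0, 3.0]

u_noisy_point = chi_squared(y, m, [0.5, 0.5, 2.0])  # last point is noisy
u_quiet_point = chi_squared(y, m, [0.5, 0.5, 0.5])  # last point is precise
```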
Simple example: m_i = θ_1 + θ_2 s_i

[Figure panels: the model (red is s_i, green the flat background); the data y_i; a contour map of U_ls, with the truth values marked.]

An even simpler example:
• Last lecture, I noted that there do exist cases in which we can directly invert the condition ∇U = 0 to obtain the best-fit parameters.
• For least squares, this happens if the model is linear in the parameters θ_i (for example, a polynomial in x whose coefficients are the θ_i).
• Expansion of ∇U = 0 in this case gives a set of M linear equations in the M parameters, called the normal equations.
• It is easy to solve these to get the θ_i.
Simplest example of all: fitting a straight line.
• Called linear regression by the statisticians.
• There is a huge amount of literature on it.
• Normal equations for a line model turn out to be:
• Polynomial is an easy extension to this.
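As an illustration, here is a sketch of the normal equations for a straight line under two assumptions that the transcript does not confirm: the model is written m_i = θ_1 + θ_2 x_i, and U is the unweighted sum of squared residuals. Setting ∂U/∂θ_1 = 0 and ∂U/∂θ_2 = 0 then gives θ_1 N + θ_2 Σx_i = Σy_i and θ_1 Σx_i + θ_2 Σx_i² = Σx_i y_i, which the hypothetical helper `fit_line` solves directly.

```python
def fit_line(x, y):
    """Solve the 2x2 normal equations for m_i = theta1 + theta2 * x_i.

    Assumes unweighted least squares (all sigma_i equal).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx          # determinant of the 2x2 system
    if det == 0:
        raise ValueError("all x values are identical; slope is undefined")
    theta1 = (sxx * sy - sx * sxy) / det   # intercept
    theta2 = (n * sxy - sx * sy) / det     # slope
    return theta1, theta2


# Noise-free check: points lying exactly on y = 1 + 2x.
theta1, theta2 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

With per-point uncertainties, each sum would pick up a weight 1/σ_i²; that variant is not shown here.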
χ² for Poisson data – possible, but problematic.
• Choose the data y_i as estimator for σ_i²?
• No - can have zero values in the denominator.
• Choose the (evolving) model as estimator for σ_i²?
• No - gives a biased result.
• Better: the Mighell formula.
• Unbiased, but no good for goodness-of-fit.
• Use Mighell to fit θ, then the standard U for “goodness of fit” (GOF).

Mighell K. J., ApJ 518, 380 (1999)
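The Mighell formula itself did not survive in this transcript. The sketch below uses the form usually quoted for the χ²_γ statistic of the cited paper, Σ [y_i + min(y_i, 1) – m_i]² / (y_i + 1); treat that expression as an assumption and check it against the paper before relying on it. The function name is illustrative.

```python
def mighell_chi2_gamma(y, m):
    """Modified chi-squared for Poisson counts y and model values m.

    Assumed form: sum_i (y_i + min(y_i, 1) - m_i)^2 / (y_i + 1).
    The denominator is never zero, even for empty bins (y_i = 0).
    """
    if len(y) != len(m):
        raise ValueError("y and m must have the same length")
    return sum((yi + min(yi, 1) - mi) ** 2 / (yi + 1) for yi, mi in zip(y, m))


# Hypothetical counts, including an empty bin.
u_gamma = mighell_chi2_gamma([0, 3], [0.5, 4.0])
```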

Another choice of U: likelihood.
• Likelihood is best illustrated by Poisson data.
• Consider a single Poisson random variable y: its PDF is

$$p(y|m) = \frac{m^y e^{-m}}{y!}$$

where m here plays the role of the expectation value of y.

• We’re used to thinking of this as a function of just one variable, i.e. y;
• but it is really a function of both y and m.
PDF for y vs likelihood for θ.

Probability, as a function of y for fixed θ: p(y|θ) = θ^y e^(–θ) / y!

Likelihood, as a function of θ for fixed y: p(y|θ) = θ^y e^(–θ) / y!

The likelihood function.
• Before, we thought “given m, let us apply the PDF to obtain the probability of getting between y and y+dy.”
• Now we are saying “well we know y, we just measured it. We don’t know m. But surely the PDF taken as a function of m indicates the probability density for m.”
• Problems with this:
• Likelihood function is not necessarily normalized, like a ‘proper’ PDF;
• What assurance do we have that the true PDF for m has this shape??
Likelihood continued.
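• Usually we have many (N) samples y_i. Can we arrive at a single likelihood for all samples taken together? Yes: if the samples are independent, the joint likelihood is the product of the individual ones:
$$p(\mathbf{y}|\boldsymbol{\theta}) = \prod_{i=1}^{N} p(y_i|\boldsymbol{\theta})$$
• (Note that we’ve stopped talking just about Poisson data now – this expression is valid for any form of p.)
• Sometimes it is easier to deal with the log-likelihood L:
$$L = \sum_{i=1}^{N} \ln p(y_i|\boldsymbol{\theta})$$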
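For Poisson data, and assuming independent samples so that the log-likelihood is the sum of the individual log-probabilities, L works out to Σ [y_i ln m_i – m_i – ln(y_i!)]. A minimal sketch follows; the function name is illustrative and `math.lgamma(y + 1)` is used for ln(y!).

```python
import math


def poisson_log_likelihood(y, m):
    """Return L = sum_i [y_i ln m_i - m_i - ln(y_i!)].

    y: observed counts (non-negative integers).
    m: model expectation values, each strictly positive.
    """
    if len(y) != len(m):
        raise ValueError("y and m must have the same length")
    return sum(
        yi * math.log(mi) - mi - math.lgamma(yi + 1)
        for yi, mi in zip(y, m)
    )


# Single measurement y = 2: the likelihood should peak at m = 2.
l_low = poisson_log_likelihood([2], [1.5])
l_peak = poisson_log_likelihood([2], [2.0])
l_high = poisson_log_likelihood([2], [2.5])
```

Working with sums of logs rather than products of probabilities also avoids numerical underflow when N is large.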
Likelihood continued
• To get the best-fit model m, we need to maximize the likelihood (or equivalently, the log likelihood).
• If we want an objective function to minimize, it is convenient to choose –L.
• Can show that for Gaussian data, minimizing –L is equivalent to minimizing the variance-weighted sum of squared residuals (=chi squared) given before.
• Proof left as an exercise!
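For anyone who wants to check their answer, here is one way the argument can go. It assumes independent Gaussian samples whose σ_i are known and do not depend on the parameters θ.

```latex
p(y_i|\boldsymbol{\theta}) = \frac{1}{\sigma_i\sqrt{2\pi}}
    \exp\!\left[-\frac{(y_i - m_i)^2}{2\sigma_i^2}\right]

-L = -\sum_{i=1}^{N} \ln p(y_i|\boldsymbol{\theta})
   = \frac{1}{2}\sum_{i=1}^{N} \frac{(y_i - m_i)^2}{\sigma_i^2}
     + \sum_{i=1}^{N} \ln\!\left(\sigma_i\sqrt{2\pi}\right)
   = \frac{\chi^2}{2} + \text{const}
```

The second sum does not involve θ, so the θ that minimizes –L is the same θ that minimizes χ².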
Poissonian/likelihood version of slide 3

[Figure panels: the model (red is s_i, green the flat background); the data y_i; a map of the joint likelihood L.]

What if there are also errors in the x_i?
• Tricky… a Bayesian approach is better in this case.
What next?
• In fitting a model, we want (amplifying a bit on lecture 4):
1. The best-fit values of the parameters;
2. Then we want to know if these values are good enough!
   • If not: need to go back to the drawing board and choose a new model.
3. If the model passes, we want uncertainties in the best-fit parameters.
   • (I’ll put this off to a later lecture…)
• Number 1 is accomplished. √
How to tell if our model is correct.
• Suppose our model is absolutely accurate.
• The U value we calculate is, nevertheless, a random variable: each fresh set of data will give rise to a slightly different value of U.
• In other words, U, even in the case of a perfectly accurate model, will have some spread – in fact, like any other random variable, it will have a PDF.
• This PDF is sometimes calculable from first principles (if not, one can do a Monte Carlo to estimate it).
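The Monte Carlo route can be sketched as follows. The assumptions are all illustrative: Gaussian noise of known σ, a model that equals the truth exactly (so the residuals are pure noise), and no fitted parameters. The function name and settings are made up for the example.

```python
import random


def simulate_u(n_points, sigma, n_trials, seed=0):
    """Draw n_trials values of U = sum_i (residual_i / sigma)^2.

    Each trial is a fresh data set whose residuals about the perfect
    model are Gaussian noise with standard deviation sigma.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    values = []
    for _ in range(n_trials):
        u = sum((rng.gauss(0.0, sigma) / sigma) ** 2 for _ in range(n_points))
        values.append(u)
    return values


us = simulate_u(n_points=10, sigma=2.0, n_trials=2000)
mean_u = sum(us) / len(us)
```

A histogram of `us` approximates the PDF of U for a perfect model. With no fitted parameters the values scatter around the number of data points.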
How to tell if our model is correct.
• The procedure is:
• First calculate the PDF for U in the ‘perfect fit’ case;
• From this curve, obtain the value of the PDF at our best-fit value of U;
• If p(U_best-fit) is very small, it is unlikely that our model is correct.
• Note that both χ² and –L have the property that they cannot be negative (for –L this holds when each p(y_i|θ) is a probability no greater than 1, as for Poisson data).
• A model which is a less than ideal match to the truth function will always generate U values with a PDF displaced to higher values of U.
Perfect vs. imperfect p(U):

[Figure: two PDFs of U. A perfect model gives the first shape; the PDF for an imperfect model is ALWAYS displaced to higher U.]

Goodness of model continued
• Because plausible Us are >= 0, and because an imperfect model always gives higher U, we prefer to
• generate the survival function for the perfect model;
• that tells us the probability of a perfect model giving us the measured value of U or higher.
• This procedure is called hypothesis testing.
• Because we make the hypothesis:
• “Suppose our model is correct. What sort of U value should we expect to find?”
• We’ll encounter the technique again next lecture when we turn to enquire if there is any signal at all buried in the noise.
Perfect-model p(U)s:
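• If we use the variance-weighted least-squares U (also known as χ²), this is easy, because p(U) is known for this:

$$p(U) = \frac{U^{\nu/2 - 1}\, e^{-U/2}}{2^{\nu/2}\,\Gamma(\nu/2)}$$

where

• Γ is the gamma function
• and ν is called the degrees of freedom.
• Note: the PDF has a peak at U ~ ν (the mean is exactly ν; the mode is ν – 2 for ν ≥ 2).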
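A small sketch of the χ² PDF, assuming the standard form p(U) = U^(ν/2 – 1) e^(–U/2) / (2^(ν/2) Γ(ν/2)). It works in logarithms with `math.lgamma` so that large ν does not overflow. The function name is illustrative.

```python
import math


def chi2_pdf(u, nu):
    """Chi-squared probability density at u for nu degrees of freedom.

    Sketch only: returns 0.0 for u <= 0, which ignores the edge case
    u = 0 (where the density is nonzero for nu <= 2).
    """
    if nu <= 0:
        raise ValueError("nu must be positive")
    if u <= 0:
        return 0.0
    half_nu = nu / 2.0
    log_p = (
        (half_nu - 1.0) * math.log(u)
        - u / 2.0
        - half_nu * math.log(2.0)
        - math.lgamma(half_nu)
    )
    return math.exp(log_p)


# For nu = 10 the density is highest near U = 8 and falls off on both sides.
p_left, p_mode, p_right = chi2_pdf(4.0, 10), chi2_pdf(8.0, 10), chi2_pdf(12.0, 10)
```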
What are degrees of freedom?
• The easiest way to illustrate degrees of freedom is to try fitting a polynomial of higher and higher order to a set of noisy data.
• The more orders we include, the nearer the model will fit the data, and the smaller the sum of squared residuals (χ²) will be, until…
• when M = N (i.e. the number of parameters, polynomial orders in this case, equals the number of data points), the model will go through every point exactly. χ² will equal 0.
Degrees of freedom
• Defined as ν = N – M: number of data points minus number of parameters fitted.
• It is sometimes convenient to define a reduced chi squared:
$$\chi^2_{reduced} = \frac{\chi^2}{\nu} = \frac{\chi^2}{N - M}$$
• The PDF for χ²_reduced should of course peak at about 1.
• There is no advantage in using this for minimization rather than the ‘raw’ χ².
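A tiny sketch of the bookkeeping, assuming the reduced chi squared is χ² divided by ν = N – M. The function name and the numbers are illustrative.

```python
def reduced_chi_squared(chi2, n_data, n_params):
    """Return chi2 / (N - M); needs more data points than parameters."""
    nu = n_data - n_params
    if nu <= 0:
        raise ValueError("need n_data > n_params for a positive nu")
    return chi2 / nu


# Hypothetical fit: 20 data points, 2 fitted parameters, chi2 = 18.
chi2_red = reduced_chi_squared(18.0, n_data=20, n_params=2)
```

A value near 1 is what a correct model with correctly estimated σ_i would typically give; values far above 1 suggest a poor model or underestimated uncertainties.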
‘Survival function’ for U.
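• Remember the survival function of a PDF is defined as
$$S(U) = \int_U^{\infty} p(U')\,dU'$$
• For χ² this is
$$S(U) = \frac{\Gamma(\nu/2,\, U/2)}{\Gamma(\nu/2)}$$
• where Γ written with 2 arguments like this is called the (upper) incomplete gamma function:
$$\Gamma(a, x) = \int_x^{\infty} t^{a-1} e^{-t}\,dt$$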
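To turn a measured U into a probability, one needs the ratio Γ(ν/2, U/2) / Γ(ν/2). The sketch below evaluates it with only the standard library, using a power series for small arguments and a continued fraction for large ones. This is one common numerical recipe, not necessarily what the lecture used, and the function names are illustrative.

```python
import math


def reg_upper_gamma(a, x, tol=1e-12, max_iter=10000):
    """Regularized upper incomplete gamma Q(a, x) = Gamma(a, x) / Gamma(a)."""
    if a <= 0:
        raise ValueError("a must be positive")
    if x <= 0:
        return 1.0
    prefactor = math.exp(a * math.log(x) - x - math.lgamma(a))
    if x < a + 1.0:
        # Series for the lower function P(a, x); return Q = 1 - P.
        term = 1.0 / a
        total = term
        for n in range(1, max_iter):
            term *= x / (a + n)
            total += term
            if abs(term) < tol * abs(total):
                break
        return 1.0 - total * prefactor
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < tol:
            break
    return h * prefactor


def chi2_survival(u, nu):
    """Probability that a perfect model yields a chi-squared of u or higher."""
    return reg_upper_gamma(nu / 2.0, u / 2.0)
```

For example, `chi2_survival(u_best, n_data - n_params)` gives the probability of seeing a value at least as large as the best-fit U if the model were correct; a very small result argues against the model.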