Lecture 5 • Minimization continued • Interpolation & regridding.
Minimization issues: • Robust setting of starting bounds: how to make sure the minimum is within the box. • Once you know that, you can, in the words of Press et al, “hunt it down like a scared rabbit.” • For only 1 parameter, there are algorithms to find a bracket which must contain the minimum (Press et al ch. 10.1). • This is more difficult to arrange in more than 1 dimension because one has then to deal with a bounding surface, possibly of complicated shape, rather than 2 scalar values. • Speed – if you have to find a series of minima in multi-dimensional functions, a good choice of algorithm (and its settings) can give a valuable speed dividend. • Eg the XMM-Newton source-fitting task – fits 7 or more parameters per source – it’s one of the slowest tasks in the pipeline. Could have been a lot worse though with a poor choice of algorithm (it uses Levenberg-Marquardt).
Minimization issues continued: • Constraints. • Stability. • There is often a trade-off between speed and stability. • When to stop – ie convergence tests. • Basically the time to stop is when the step size per iteration is smaller than the dimensions of the uncertainty ellipsoid. • Is it a local minimum or a global one? • Many algorithms will be perfectly happy stopping in a tiny local dip in the function. • Extra intelligence is needed to hunt around and make sure your minimum is the best in the neighbourhood. Annealing algorithms.
Minimization with 1 parameter Can ∂U/∂θ=0 be directly inverted? Do so. QED. no yes Can U’ and U’’be evaluated? Newton-Raphson: find root of U’ no yes Brent’s method
1-parameter methods: • Newton-Raphson method (Press et al ch 9.4). • Finding a minimum in U is the same as finding a root of U’. • Brent’s method (Press et al ch 10.2). This consists of: • Parabolic fit to three samples of U; • Golden-section search when the parabolic fit is not stable.
An example: • A problem in x-ray source detection gave rise to the following requirement: • Find θ that minimizes subject to the constraint that for all k. The bk are known background values and sk ≡sij are known PSF values. See I Stewart (yes, me) A&A (2009) for a detailed description of the problem.
continued... • The minimum is the place at which ∂U/∂θ=0. Ie, θ at the minimum is solution of the equation This can’t (I don’t think) be solved ‘in closed form’ but it is easy to solve via Newton’s method. • Note there are in general MANY solutions of the equation (see how many roots there are in the following graph!) but only one (at most) which satisfies the constraint.
A trick to help the solution: • Newton’s method works best if the function doesn’t curve too wildly. The present hyperbola-like function is not very promising material. • But! If we define a coordinate transform • The result is much more docile:
Bounds and special cases: • A little thought shows that θ is bounded as follows: • Either bound can be taken as a starting value. • Special cases: • If there are no events, there is no solution to the equation: from the physics of the problem one can deduce that θ=-(b/s)min. • If there is just one event, the equation is trivially solvable.
Minimization with >1 parameter • The basic idea is to • Measure the gradient, • then head down-hill. You’re bound to hit at least a local minimum eventually. • But how far should you go at each step, before stopping to sample U and ▼U again? • Suppose you find you gone wayyy too far - have climbed through a gully and higher up the opposite wall than the height you started? • Or suppose you have only crept a few inches down a slope which seems to extend for miles? • Solution: treat each linear advance like a 1-D minimization.
Steepest descent • This method of steepest descent is pretty fool-proof – it will generally get there in the end. • It isn’t very efficient though - you can find yourself engaged in time-consuming ricochets back and forth across a gully. From Press et al, “Numerical Recipes”
Powell’s solution to this: (i) Cycle through each of a set of N directions, finding the minimum in each direction. (i) Discard the direction with the longest step, and replace it with the direction of the vector sum. New direction
Levenberg method: • It’s a bit like Newton’s method – it keeps trying until it converges. • The problem of course is that, like Newton’s method, it may not converge. • Marquardt’s modification is extremely cunning. By means of a single ‘gain’ variable, his algorithm veers between a rapid but undependable Levenberg search and the slower but reliable Steepest Descent. • You get both the hare and the tortoise.
Levenberg-Marquardt • At each iteration, the algorithm solves where I is the identity matrix and λ is a gain parameter. • If the next U is larger than previous, this means the Levenberg assumption was too bold: λ is increased to make the algorithm more like steepest descent (more tortoise). • But if U is smaller than previous, λ is reduced, to make the algorithm more Levenberg-like (more hare).
Interpolation • The difference between interpolation and fitting is the comparison between the number Npar of parameters number Ndata of data points. • If Ndata > Npar, you fit. • If Ndata = Npar, you interpolate. • If Ndata < Npar, you either kill off some parameters... or use Singular Value Decomposition... or resort to Bayesian black magic. • Because interpolation is only loosely constrained by the data (think of a strait jacket with most of the buttons undone), it can be unstable. See eg in a few slides.
An example technique: cubic splines. • FYI: a spline originally was a flexible bit of metal used by boat designers to draw smooth curves between pegs. • The space between any two points is interpolated by a cubic function. • A cubic has 4 parameters; thus you need 4 pieces of information to specify it. • These are obtained from the 4 neighbouring function values. • with special treatment of the ends, where only 3 points are available. • Cubic interpolation matches f, f’ and f” at the points.
Cubic Splines algorithm • See Press et al chapter 3.3 for details. • Diagrammatic definition of the xs and ys: • In the formula which is simplest to calculate, the cubic is heavily disguised: yj+1 yj xj+1 yj-1 yj+2 xj xj-1 xj+2
Cubic Splines algorithm • To evaluate this, obviously we need to calculate A, B, C and D, plus y”j and y”j+1:
Cubic Splines algorithm • All the y”s are obtained at one time by solving a tri-diagonal system of equations specified by: with (at its simplest) y”1 and y”N both set to zero. • For examples of instability....
But splines are Evil! Credit: Andy Read, U Leics
But splines are Evil! Credit: Andy Read, U Leics
Regridding • Suppose you have data at irregularly spaced samples which you want to have on regular samples (ie with the same space between all adjacent samples). • You have to regrid the data – somehow interpolate between the input sample points and resample on the regular grid. • One way is to convolve the samples with a regridding function, then sample the resulting smooth function on the regular grid. • This is often done in radio interferometry. • The reason is to allow a discrete Fourier transform to be used (because it is faster). • Thus, much attention is paid to chosing a regridding function which has a well-behaved Fourier transform.
A trap when regridding data which is already regularly sampled: Original binned data: bin widths are 2 units. Re-binned data: bin widths are 3.5 units. Moiré effect causes a dip every 4th bin (since 4 is the smallest integer n such that nx3.5 is exactly divisible by 2).
Also happens in 2D images. One solution: dithering.