
"The Variable Projection method: some applications to Physics"

"The Variable Projection method: some applications to Physics". V. Pereyra Weidlinger Associates 399 El Camino Real #200 Mountain View, CA 94040 Yale University International Workshop on Numerical Analysys and QCD May 1-3, 2007. Plan. Separable Nonlinear Least Squares Problems

gavril
Download Presentation

"The Variable Projection method: some applications to Physics"

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. "The Variable Projection method: some applications to Physics" V. Pereyra Weidlinger Associates 399 El Camino Real #200 Mountain View, CA 94040 Yale University International Workshop on Numerical Analysys and QCD May 1-3, 2007

  2. Plan • Separable Nonlinear Least Squares Problems • The Variable Projection Method • Properties • Applications • Mössbauer Spectroscopy • Nuclear Magnetic Resonance Spectroscopy • Lattice Quantum Chromodynamics • Highly Accurate Approximation of Difficult Functions • Neural Networks • Surrogates • Example Application: Optimal Design of a Supersonic Business Jet

  3. Separable Nonlinear Least Squares Problems • In 1973, Gene Golub (Stanford University) and I wrote a paper introducing the method of Variable Projections for the solution of separable nonlinear least squares problems: min_{a,w} Σ_j [ y_j − Σ_i a_i φ_i(x_j; w) ]², where the vector 'a' contains the "linear" parameters, 'w' holds the "nonlinear" parameters, 'x' is the vector of input variables and 'y' is the output variable. The pairs (x_j, y_j) are the data to be fitted in the least squares sense. • Thus these models, Σ_i a_i φ_i(x; w), are characterized by being linear combinations of nonlinear functions. • This method has had a remarkable number and variety of applications over the last 30 years (see the references below) and I would like to comment briefly on some of them that may be of interest to this audience. "The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate". SIAM J. Numer. Anal. 10:413-432 (1973). "Separable nonlinear least squares: the Variable Projection method and its applications". Inverse Problems 19:R1-R26 (2003).
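To make the structure concrete (this sketch is not part of the slides), here is how such a separable model looks when fitted "naively" in Python/NumPy, with the linear and nonlinear parameters treated as unknowns of the same nonlinear solver; the two-exponential example and all names (basis, full_residual) are illustrative assumptions:

```python
# A minimal sketch (not from the slides) of a separable model: a linear
# combination of nonlinear basis functions, here two decaying exponentials
#   eta(x; a, w) = a_1 * exp(-w_1 x) + a_2 * exp(-w_2 x).
import numpy as np
from scipy.optimize import least_squares

def basis(x, w):
    # One column per nonlinear basis function phi_i(x; w).
    return np.column_stack([np.exp(-wi * x) for wi in w])

def full_residual(p, x, y, n_lin):
    # Joint ("unseparated") formulation: 'a' (linear) and 'w' (nonlinear) are both unknowns.
    a, w = p[:n_lin], p[n_lin:]
    return basis(x, w) @ a - y

# Synthetic data for the illustration: two exponentials plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 200)
y = 2.0 * np.exp(-0.5 * x) + 1.0 * np.exp(-3.0 * x) + 0.01 * rng.standard_normal(x.size)

# Joint fit: note that initial guesses are needed for the linear parameters too.
p0 = np.array([1.0, 1.0, 0.3, 2.0])          # [a1, a2, w1, w2]
sol = least_squares(full_residual, p0, args=(x, y, 2))
print(sol.x)
```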

  4. Variable Projections Method • The basic idea of the Variable Projection method is to solve "analytically" for the linear parameters in order to eliminate them from the problem: a = Φ⁺(w) y, where Φ⁺(w) is the pseudoinverse of Φ(w). • Since P(w) = I − Φ(w)Φ⁺(w) is the projector onto the space orthogonal to the columns of Φ(w), we call r(w) = ||P(w) y||² the Variable Projections functional. • The method consists of minimizing this modified problem in the nonlinear parameters 'w' first and then obtaining the linear parameters 'a' from the formula above. This implies that no guessed values are needed for the linear parameters and the dimension of the problem is reduced. • It turns out that the reduced problem is always better conditioned than the original one and the same iterative method converges faster for it. If properly implemented, the iterations have a similar cost, so the VP method is always faster. • An additional problem that was solved in the original paper comes from the fact that optimization algorithms requiring gradients of the above functional need the derivative of the pseudo-inverse of a matrix whose elements are functions of several variables.
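For contrast with the joint fit above, a minimal sketch (again illustrative, not the VARPRO program itself) of the Variable Projection reduction: for each trial 'w' the linear parameters are recovered by a linear least-squares solve, and the optimizer only sees the nonlinear parameters.

```python
# Variable Projection sketch: eliminate the linear parameters for each trial w.
import numpy as np
from scipy.optimize import least_squares

def basis(x, w):
    # Phi(w): same illustrative exponential basis as in the previous sketch.
    return np.column_stack([np.exp(-wi * x) for wi in w])

def vp_residual(w, x, y):
    # a(w) = Phi(w)^+ y via a linear least-squares solve;
    # the residual is then P(w) y = y - Phi(w) a(w), up to sign.
    Phi = basis(x, w)
    a, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return Phi @ a - y

rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 200)
y = 2.0 * np.exp(-0.5 * x) + 1.0 * np.exp(-3.0 * x) + 0.01 * rng.standard_normal(x.size)

w0 = np.array([0.3, 2.0])                     # guesses for the nonlinear parameters only
sol = least_squares(vp_residual, w0, args=(x, y))
w = sol.x
a, *_ = np.linalg.lstsq(basis(x, w), y, rcond=None)   # recover the linear parameters
print(w, a)
```

Note that this sketch lets the optimizer build its Jacobian by finite differences, sidestepping the differentiation of the pseudo-inverse that the original paper handles analytically.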

  5. A Comparison of Methods (in the original paper) • Optimization methods used in the comparisons: • A1: Minimization without derivatives (R. Brent, PRAXIS). • A2: Minimization by Gauss-Newton (G-N) with step control. • A3: Minimization by Marquardt's regularization of G-N. • The method was implemented in a publicly available computer program: VARPRO. • A more modern version was produced at Bell Labs (Gay and Kaufman) and it is available at www.nanet.org • An interactive, well documented version was produced at the National Bureau of Standards and has been in extensive use for many years.

  6. Applications: Mössbauer Spectroscopy

  7. Nuclear Magnetic Resonance Spectroscopy • In NMR Spectroscopy, a sample is placed in an external magnetic field and a transient field from an RF coil is used to temporarily drive the various nuclei into a non-equilibrium distribution of magnetic spin states. • As the sample relaxes back to its equilibrium distribution, each type of excited nucleus radiates at a characteristic frequency f_k. The sum of all these radiations gives rise to a signal of the form: y_n = Σ_k a_k exp(iφ_k) exp((−d_k + 2πi f_k) t_n) + e_n. • Generally the frequencies are known (or can be picked in a pre-processing step) and one wants to determine the amplitudes a_k and the damping parameters d_k. There are a number of other a priori constraints that reduce the number of free parameters. • The problem of fitting this data is separable, with a model that is a linear combination of complex exponentials.
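A minimal sketch, with made-up frequencies and data, of how the model above becomes a Variable Projection problem: the complex amplitudes c_k = a_k exp(iφ_k) are linear and are eliminated, leaving only the dampings d_k for the nonlinear solver.

```python
# Illustrative fit of y_n = sum_k c_k exp((-d_k + 2*pi*i*f_k) t_n) with known f_k.
import numpy as np
from scipy.optimize import least_squares

freqs = np.array([50.0, 120.0])                  # assumed known (pre-processing step)

def model_matrix(t, d):
    # Phi_{nk} = exp((-d_k + 2*pi*i*f_k) * t_n)
    return np.exp(np.outer(t, -d + 2j * np.pi * freqs))

def vp_residual(d, t, y):
    Phi = model_matrix(t, d)
    c, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # complex linear amplitudes
    r = Phi @ c - y
    return np.concatenate([r.real, r.imag])      # real residual for the optimizer

# Synthetic noisy signal for the illustration.
rng = np.random.default_rng(1)
t = np.arange(256) * 1e-3
y = (2.0 * np.exp(1j * 0.3) * np.exp((-5.0 + 2j * np.pi * 50.0) * t)
     + 1.0 * np.exp((-12.0 + 2j * np.pi * 120.0) * t)
     + 0.01 * (rng.standard_normal(t.size) + 1j * rng.standard_normal(t.size)))

sol = least_squares(vp_residual, x0=np.array([1.0, 1.0]), args=(t, y))
print(sol.x)                                     # estimated dampings d_k
```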

  8. In vivo NMRS and VP • In 1988 van der Veen et al., at Delft University and Philips Medical Systems, published a very influential paper on the accurate quantification of in vivo NMR data using the Variable Projection method. By August 2002 this paper had more than 200 citations and as a result VARPRO became the standard tool in this area. • There is a center in Barcelona, with more than 300 paying customers, where a modern version of this software is located and used routinely in many different applications. J. van der Veen, R. de Beer, P. Luyten, and D. van Ormondt, 'Accurate quantification of in vivo 31P NMR signals using the Variable Projection method and prior knowledge'. Magnetic Resonance in Medicine 6:92-98 (1988).

  9. Lattice Quantum Chromodynamics G. T. Fleming, 'What can lattice QCD theorists learn from NMR spectroscopists?' Proceedings of the Third International Workshop on Numerical Analysis and Lattice QCD, Edinburgh, Scotland, 2003.

  10. QCD (continued) • This is again a separable problem involving linear combinations of exponentials. • These problems are ubiquitous and have proven to be very hard, since they are usually ill-conditioned. • The great merit of VP is that the reduced problem is always better conditioned than the original one (see the reference below). J. Sjöberg and M. Viberg, 'Separable non-linear least squares minimization - possible improvements for neural net fitting'. IEEE Workshop on Neural Networks for Signal Processing, Amelia Island Plantation, FL (1997).
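For a lattice correlator C(t) = Σ_n A_n exp(−E_n t) the same reduction applies verbatim; only the model matrix changes. A tiny illustrative sketch (the energies E_n are the nonlinear parameters, the amplitudes A_n the linear ones):

```python
# The VP reduction of the earlier sketch can be reused; only the basis differs.
import numpy as np

def correlator_matrix(t, E):
    # Phi_{tn} = exp(-E_n * t): one column per state in the spectral sum.
    return np.exp(-np.outer(t, E))
```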

  11. Some Challenging Data Fitting Problems From Gregory Beylkin (University of Colorado, Boulder) • Beylkin and Monzon have developed some very accurate ways of approximating difficult functions. • Their method combines quadrature formulas, moments, and other techniques, but in some cases the approximants consist of linear combinations of complex exponentials or Gaussians. • I show below how VARPRO performs on some of these tasks. G. Beylkin and L. Monzon, 'On Approximation of Functions by Exponential Sums'. Applied and Computational Harmonic Analysis 19 (2005).

  12. Approximating 1/x in [0.01, 1] • By linear combinations of real exponentials; times are on a Pentium 4 PC, 1.5 GHz, running Linux.

  13. Approximating Bessel's Function J0 in [0, 20] • By a linear combination of 28 damped exponentials (i.e., real parts of complex exponentials), using 2000 uniformly chosen samples. The exact and approximate curves are superposed and indistinguishable.
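A minimal sketch of the idea (with far fewer modes and illustrative starting values, so it is not a reproduction of the 28-term result on the slide): each complex exponential contributes a damped cosine and a damped sine column, so the amplitudes remain linear and can be eliminated as before.

```python
# Illustrative VP fit of J0 on [0, 20] by real parts of damped complex exponentials.
import numpy as np
from scipy.optimize import least_squares
from scipy.special import j0

x = np.linspace(0.0, 20.0, 2000)
y = j0(x)
n_modes = 6                                    # far fewer than the 28 terms on the slide

def basis(x, p):
    d, f = p[:n_modes], p[n_modes:]            # dampings and frequencies
    cols = [np.exp(-di * x) * np.cos(fi * x) for di, fi in zip(d, f)]
    cols += [np.exp(-di * x) * np.sin(fi * x) for di, fi in zip(d, f)]
    return np.column_stack(cols)

def vp_residual(p):
    Phi = basis(x, p)
    a, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return Phi @ a - y

p0 = np.concatenate([np.full(n_modes, 0.1), np.linspace(0.5, 1.5, n_modes)])
sol = least_squares(vp_residual, p0)
print(np.max(np.abs(vp_residual(sol.x))))      # maximum error on the sample grid
```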

  14. J0 Fitting Error

  15. Neural Networks: Single Hidden Layer, Fully Connected Perceptron • Hidden node responses: σ_i(x1, x2) = 1/(1 + exp(w_0i + w_1i x1 + w_2i x2)). • Network output: y(x1, x2) = Σ_{i=1,...,3} a_i σ_i(x1, x2; w_i).

  16. Neural Networks (really) • Neural Networks provide universal approximation functions of n variables that are linear combinations of nonlinear functions, such as sigmoids or arctanh: y(x) = Σ_i a_i σ_i(x; w_i). V. Pereyra, G. Scherer and F. Wong, 'Variable Projections Neural Network Training'. Mathematics and Computers in Simulation 73:231-243 (2006).

  17. Training the Network • Training the network means finding the linear and nonlinear parameters in order to best fit (usually in the l2 sense) a set of input and output values, the training set: T = {(x_s, y_s), s = 1,...,m}, with x_s ∈ R^n. • So, this is also a separable problem, amenable to VARPRO; see the sketch after this slide. • Usually one keeps part of the data separately for testing the resulting network. • One problem is to decide how many basis functions are necessary for a given accuracy. We choose to run repeatedly with an increasing number of nodes until we detect overfitting. • This process is sometimes referred to as "learning by example".
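A minimal sketch, on synthetic data, of this training strategy (not the code of the Pereyra-Scherer-Wong paper): for fixed hidden-layer weights the output weights are linear and are eliminated, and the number of nodes is increased until the test error stops improving.

```python
# Illustrative VP training of a one-hidden-layer sigmoid network.
import numpy as np
from scipy.optimize import least_squares

def hidden(X, W):
    # Sigmoid responses 1/(1 + exp(w0 + w1*x1 + ...)); one row of W per hidden node.
    return 1.0 / (1.0 + np.exp(W[:, 0] + X @ W[:, 1:].T))

def vp_residual(wflat, X, y, nodes, ndim):
    W = wflat.reshape(nodes, ndim + 1)
    H = hidden(X, W)
    a, *_ = np.linalg.lstsq(H, y, rcond=None)   # output (linear) weights eliminated
    return H @ a - y

# Illustrative training/testing data with 2 inputs.
rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(300, 2))
y = np.sin(3.0 * X[:, 0]) * np.cos(2.0 * X[:, 1])
Xtr, ytr, Xte, yte = X[:200], y[:200], X[200:], y[200:]

for nodes in range(2, 9):
    w0 = rng.standard_normal(nodes * 3)
    sol = least_squares(vp_residual, w0, args=(Xtr, ytr, nodes, 2))
    W = sol.x.reshape(nodes, 3)
    a, *_ = np.linalg.lstsq(hidden(Xtr, W), ytr, rcond=None)
    rms_test = np.sqrt(np.mean((hidden(Xte, W) @ a - yte) ** 2))
    print(nodes, rms_test)   # stop adding nodes once the test RMS starts to grow
```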

  18. Surrogates • In applications (such as optimal design) in which one wants to evaluate an expensive function many times (say, the product of a large scale simulation), it is convenient to first create a database of inputs/outputs and use it to generate a faithful approximation that can be evaluated rapidly. • These approximations are called response surfaces, proxies or surrogates, and the process is called fast model evaluation. • Neural Networks can be used to provide a general surrogate, given a database of inputs and outputs that can be used to train and test the network.

  19. Example Design Task • Design requirements for a supersonic aircraft (objectives and constraints in red) • 8,000 lb payload • Cruise Mach number = 1.8 • Take-off and landing field lengths < 6,000 ft • Ultimate load factor = 2.5 for pull-up maneuver • 2 modern turbofan engines • Design parameterization • Wing parameters: sweep, aspect ratio, position, reference area • Mission parameters: Maximum Take-Off Gross Weight, Initial and final cruise altitudes

  20. Example Design Task (Cont’d) • Multi-objective optimization problem. By varying the parameters of the problem, simultaneously: • Minimize the maximum sonic boom perceived loudness (in dBA with rise time modification) • Maximize the range (in nmi) of the aircraft • Mission module accounts for cruise, subsonic climb, supersonic acceleration and take-off and landing segments. • Multiphysics simulation: • Aerodynamics • Aeroelasticity (structural analysis) • Control

  21. Surrogate Functions • Inputs: Wing reference area, Wing aspect ratio, Longitudinal position of the wing, Wing sweep angle, Lift coefficient at initial cruise, Lift coefficient at final cruise, Altitude at initial cruise, Altitude at final cruise. • Outputs: Drag coefficient at initial cruise, Sonic boom initial rise at initial cruise, Sonic boom sound level at initial cruise w/out rise time modification, Sonic boom sound level at initial cruise w/ first type rise time modification, Sonic boom sound level at initial cruise w/ third type rise time modification, Drag coefficient at final cruise, Sonic boom initial rise at final cruise, Sonic boom sound level at final cruise w/out rise time modification, Sonic boom sound level at final cruise w/ first type rise time modification, Sonic boom sound level at final cruise w/ third type rise time modification.

  22. Results of Training for Output #3 • The NN results for output #3, "Sonic boom sound level at initial cruise w/out rise time modification", are shown in the next table. • We look for an increase in the RMS of either the training or the testing set when increasing the number of nodes, but we also monitor the maximum error and the values of the linear coefficients of the network, for a good quality fit. • The output values are in the range 85-95. We scale the input variables to [-1, 1]. • Since the problem is non-convex, 40 random initial vectors of nonlinear parameters are chosen and VARPRO is run from each one of them; the solution with the smallest RMS is selected (see the sketch after this slide).
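The multi-start step can be sketched as follows (illustrative helper names; 'residual' stands for any Variable Projection residual, such as the one in the earlier network sketch):

```python
# Illustrative multi-start wrapper: keep the run with the smallest training RMS.
import numpy as np
from scipy.optimize import least_squares

def multi_start(residual, nparams, n_starts=40, seed=0):
    # 'residual' is a Variable Projection residual of 'nparams' nonlinear parameters.
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        w0 = rng.uniform(-1.0, 1.0, nparams)    # random start for the nonlinear parameters
        sol = least_squares(residual, w0)
        rms = np.sqrt(np.mean(sol.fun ** 2))
        if best is None or rms < best[0]:
            best = (rms, sol.x)
    return best                                  # (smallest RMS, corresponding parameters)
```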

  23. Results of Training for Output #3 • Results on a 3 GHz Intel Xeon 64-bit dual processor

  24. Visualization • An unresolved problem (as far as I know) is the meaningful visualization of functions of many variables (say more than 5, and in this case 8). • In the next slide we show some views of response surface #3 (sonic boom sound level at initial cruise) as a function of pairs of input variables, keeping the others fixed. This is a notoriously nonlinear function with sharp discontinuities (shock fronts). • The NN chosen has 10 sigmoids (i.e., 90 nonlinear parameters and 11 linear ones).
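One simple way to produce such views, sketched here with a toy surrogate and a hypothetical slice_plot helper: fix all but two of the scaled inputs at nominal values and plot the surrogate over the remaining pair.

```python
# Illustrative 2D slices of a multivariate response surface.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the 3d projection)

def slice_plot(surrogate, nominal, i, j, lo=-1.0, hi=1.0, n=60):
    # 'surrogate' maps a full vector of scaled inputs to one output;
    # inputs i and j are varied while the others stay at their nominal values.
    u = np.linspace(lo, hi, n)
    U, V = np.meshgrid(u, u)
    Z = np.empty_like(U)
    for r in range(n):
        for c in range(n):
            x = nominal.copy()
            x[i], x[j] = U[r, c], V[r, c]
            Z[r, c] = surrogate(x)
    ax = plt.figure().add_subplot(projection="3d")
    ax.plot_surface(U, V, Z)
    ax.set_xlabel(f"input {i}")
    ax.set_ylabel(f"input {j}")
    plt.show()

# Illustrative use with a toy "surrogate" of 8 inputs.
slice_plot(lambda x: np.tanh(3.0 * x[0]) * x[1], np.zeros(8), 0, 1)
```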

  25. We recall ... • Multi-objective optimization problem. By varying the parameters of the problem, simultaneously: • Minimize the maximum sonic boom perceived loudness (in dBA with rise time modification) • Maximize the range (in nmi) of the aircraft • The take-off and landing field lengths are restricted: Take-off and landing field lengths < 6,000 ft

  26. Optimization • We use an evolutionary algorithm to get Pareto equilibrium points for the bi-objective optimization problem. • The costly aerodynamics and boom calculations are replaced by the surrogate functions. • We use Deb's algorithm, which tends to locate points on the boundary of the Pareto set. • These points are then analyzed and an appropriate compromise is chosen. • This compromise design is then verified with the high fidelity tools. • Some new algorithms under development promise to be two orders of magnitude faster. V. Pereyra, 'Fast Computation of Equispaced Pareto Manifolds and Pareto Fronts for Multi-objective Optimization Problems'. To appear in Mathematics and Computers in Simulation (2007). Deb, K., Pratap, A., Agrawal, S., and Meyarivan, T., 'A fast and elitist multi-objective genetic algorithm: NSGA-II'. Technical Report No. 2000001, Indian Institute of Technology Kanpur, India (2000). J. J. Alonso, P. LeGresley and V. Pereyra, 'Aircraft design optimization'. To appear in Mathematics and Computers in Simulation (2007).
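Independently of the particular evolutionary algorithm, the Pareto filtering step itself is simple; a small illustrative sketch (not Deb's NSGA-II) that extracts the non-dominated designs from a set of evaluated (loudness, −range) pairs, both written as minimizations:

```python
# Illustrative non-dominated (Pareto) filter for a bi-objective minimization.
import numpy as np

def pareto_front(F):
    # F[k] = (f1, f2) for design k; returns a boolean mask of non-dominated rows.
    F = np.asarray(F)
    keep = np.ones(len(F), dtype=bool)
    for k, fk in enumerate(F):
        dominates_k = np.all(F <= fk, axis=1) & np.any(F < fk, axis=1)
        keep[k] = not np.any(dominates_k)
    return keep

# Toy values of (loudness in dBA, -range in nmi) for four candidate designs.
F = np.array([[84.0, -3200.0], [82.0, -3000.0], [85.0, -3400.0], [83.0, -2900.0]])
print(pareto_front(F))    # True for designs on the Pareto front
```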

  27. Pareto Fronts • Open circles (64): initial population; only blue circles are feasible; • Asterisks: Pareto equilibrium front (blue for feasible designs).

  28. By the way … • This paper has an early proof of the convergence of nested iterations. It gives the recipe for quadratic convergence of inexact Newton for large problems, where one uses an inner iterative method to solve the linear systems. This is usually attributed to: R. S. Dembo, S. C. Eisenstat, and T. Steihaug, 'Inexact Newton methods'. SIAM J. Numer. Anal. 19:400-408 (1982).
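The recipe can be sketched as follows (illustrative; a hand-written conjugate gradient plays the role of the inner iterative solver, and the inner tolerance is taken proportional to ||F(x_k)||):

```python
# Illustrative inexact Newton iteration with a residual-dependent inner tolerance.
import numpy as np

def inner_solve(A, b, tol, max_iter=200):
    # Plain conjugate gradients (A assumed symmetric positive definite),
    # stopped once the inner residual norm drops below 'tol'.
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:
            break
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x

def inexact_newton(F, J, x, n_outer=8):
    for _ in range(n_outer):
        f = F(x)
        # Forcing term: solve J s = -F only to a tolerance proportional to ||F(x_k)||.
        tol = min(0.5, np.linalg.norm(f)) * np.linalg.norm(f)
        s = inner_solve(J(x), -f, tol)
        x = x + s
        print(np.linalg.norm(F(x)))   # residual norms shrink rapidly near the root
    return x

# Illustrative smooth system with a symmetric positive definite Jacobian.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
F = lambda x: A @ x + 0.1 * x ** 3 - np.array([1.0, 2.0])
J = lambda x: A + np.diag(0.3 * x ** 2)
inexact_newton(F, J, np.zeros(2))
```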

  29. And by the way … • This contains a stable O(n²) method for solving (square) Vandermonde systems.
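A minimal sketch of the O(n²) recursion usually attributed to Björck and Pereyra for the dual Vandermonde system, i.e. computing the coefficients of the polynomial that interpolates given values at given points:

```python
# Dual Vandermonde solve: find c with sum_j c_j x_i^j = f_i, in O(n^2) operations.
import numpy as np

def dual_vandermonde_solve(x, f):
    c = np.array(f, dtype=float)
    n = len(x) - 1
    # Stage 1: Newton divided differences, overwriting c in place.
    for k in range(n):
        for j in range(n, k, -1):
            c[j] = (c[j] - c[j - 1]) / (x[j] - x[j - k - 1])
    # Stage 2: convert the Newton form to monomial coefficients (Horner-like).
    for k in range(n - 1, -1, -1):
        for j in range(k, n):
            c[j] = c[j] - x[k] * c[j + 1]
    return c

x = np.array([0.0, 1.0, 2.0, 3.0])
f = np.array([1.0, 2.0, 5.0, 10.0])          # values of 1 + t^2 at the points x
print(dual_vandermonde_solve(x, f))          # ~[1, 0, 1, 0]
```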
