Hanso and hifoo two new matlab codes
1 / 41

HANSO and HIFOO Two New MATLAB Codes - PowerPoint PPT Presentation

  • Uploaded on

HANSO and HIFOO Two New MATLAB Codes. Michael L. Overton Courant Institute of Mathematical Sciences New York University Singapore, Jan 2006. Two New Codes. HANSO: Hybrid Algorithm for Nonsmooth Optimization

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'HANSO and HIFOO Two New MATLAB Codes' - JasminFlorian

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Hanso and hifoo two new matlab codes l.jpg


Michael L. Overton

Courant Institute of Mathematical Sciences

New York University

Singapore, Jan 2006

Two new codes l.jpg
Two New Codes

  • HANSO: Hybrid Algorithm for Nonsmooth Optimization

    • Aimed at finding local minimizers of general nonsmooth, nonconvex optimization problems

    • Joint with Jim Burke and Adrian Lewis

  • HIFOO: H-Infinity Fixed-Order Optimization

    • Aimed at solving specific nonsmooth, nonconvex optimization problems arising in control

    • Built on HANSO

    • Joint with Didier Henrion

Many optimization objectives are l.jpg
Many optimization objectives are

  • Nonconvex

  • Nonsmooth

  • Continuous

  • Differentiable almost everywhere

  • With gradients often available at little additional cost beyond that of computing function

  • Subdifferentially regular

  • Sometimes, non-Lipschitz

Numerical optimization of nonsmooth nonconvex functions l.jpg
Numerical Optimization of Nonsmooth, Nonconvex Functions

  • Steepest descent: jams

  • Bundle methods: better, but these are mainly intended for nonsmooth convex functions

  • We developed a simple method for nonsmooth, nonconvex minimization based on Gradient Sampling

  • Intended for continuous functions that are differentiable almost everywhere, and for which the gradient can be easily computed when it is defined

  • User need only write routine to return function value and gradient – and need not worry about nondifferentiable cases, e.g., ties for a max when coding the gradient

  • Very recently, found BFGS is often very effective when implemented correctly - and much less expensive

Slide5 l.jpg

  • Standard quasi-Newton method for optimizing smooth functions, using gradient differences to update an inverse Hessian matrix H, defining a local quadratic model of the objective function

  • Conventional wisdom: jams on nonsmooth functions

  • Amazingly, works very well as long as implemented right (weak Wolfe line search is essential)

  • It builds an excellent, extremely ill-conditioned quadratic model

  • Often, runs until cond(H) is 1016 before breaking down, taking steplengths of around 1

  • Often converges linearly (not superlinearly)

  • By contrast, steepest descent and Newton’s method usually jam

  • Not very reliable, and no convergence theory

Minimizing a product of eigenvalues l.jpg
Minimizing a Product of Eigenvalues

  • An application due to Anstreicher and Lee

  • Min log of product of K largest eigenvalues of AoX

    subject to X positive semidefinite and diag(X)=1

    (A, X symmetric) (we usually set K=N/2)

  • Would be equivalent to SDP if replace product by sum

  • Number of variables is n = N(N-1)/2 where N is dimension of X

  • Following Burer, substitute Y Y’ for X where Y is N by r so n is reduced to rN and normalize Y so its rows have norm 1: thus both constraints eliminated

  • Typically objective is not smooth at minimizer, because the K-th largest eigenvalue is multiple

  • Results for N=10 and rank r = 2 through 10…

N 10 r 6 eigenvalues of optimal a o x l.jpg
N=10, r=6: eigenvalues of optimal AoX











The u and v spaces l.jpg
The U and V Spaces

  • The condition number of H, the BFGS approximation to inverse Hessian, typically reaches 1016 !!

  • Let’s look at eigenvalues of H:

    H = QDQ* (* denotes transpose)

    Search direction is d = -Hg= -QDQ*g (where g = gradient)

  • Next plot shows

    diag(D), sorted into ascending order

    Components of |Q*g|, using same ordering

    Components of |DQ*g|, using same ordering (search direction expanded in eigenvector basis)

  • Tiny eigenvalues of H correspond to “nonsmooth” V-space

  • Other eigenvalues of H correspond to “smooth” U-space

  • From matrix theory, dim(V-space) = m(m+1)/2 – 1, where m is multiplicity of Kth eigenvalue of AoX

More on the u and v spaces l.jpg
More on the U and V Spaces

  • H seems to be an excellent approximation to the “limiting inverse Hessian” and evidently gives us bases for the optimal U and V spaces

  • Let’s look at plots of f along random directions in the U and V spaces, as defined by eigenvectors of H, passing through minimizer (for the N=10, r=6 example)

Line search for bfgs l.jpg
Line Search for BFGS

  • Sufficient decrease:

    for 0 < c1< 1,

    f (x + td) ≤ f (x) + c1 t(grad f)(x)*d

  • Weak Wolfe condition on derivative

    for c1< c2 < 1,

    (grad f)(x + td)*d ≥ c2 (grad f)(x)*d

  • Not strong Wolfe condition on derivative

    | (grad f)(x + t d)*d | ≤ - c2 (grad f)(x)*d

  • Essential when f is nonsmooth

  • No reason to use strong Wolfe even if f is smooth

  • Much simpler to implement

  • Why ever use strong Wolfe?

But how to check if final x is locally optimal l.jpg
But how to check if final x is locally optimal?

  • Extreme ill-conditioning of H is a strong clue, but is expensive to compute and proves nothing

  • If x is locally optimal, running a local bundle method from x establishes nonsmooth stationarity: this involves repeated null-step line searches and building a bundle of gradients obtained via the line searches

  • When line search returns null step, it must also return gradient at a point lying across the discontinuity

  • Then new d = – arg min { ||d||: dÎconvG }, where G is the set of gradients in the bundle

  • Terminate if ||d|| is small

  • If ||d|| is 0 and line search steps were 0, x is Clarke stationary; more realistically it is “close” to Clarke stationary

  • I’ll call this Clarke Quay Stationary since it is a Quay Idea

First order local optimality l.jpg
First-order local optimality

  • Is this implied by Clarke stationarity of f at x ?

  • No, for example f(x) = -|x|is Clarke stationary at 0

  • Yes, in sense that f’(x; d) ≥ 0 for all directions d, when f is regular at x (subdifferentially regular, Clarke regular)

  • Most of the functions that we have studied areregular everywhere, although this is sometimes hard to prove

  • Regularity generalizes smoothness and convexity

Harder problems l.jpg
Harder Problems

  • Eigenvalue product problems are interesting, but the Lipschitz constant L for f is not large

  • Harder problems:

    • Chebyshev Exponential Approximation

    • Optimization of Distance to Instability

    • Pseudospectral Abscissa Optimization (arbitarily large L)

    • Spectral Abscissa Optimization (not even Lipschitz)

  • In all cases, BFGS turns out to be substantially less reliable than gradient sampling

Gradient sampling algorithm initialize e and x repeat l.jpg
Gradient Sampling Algorithm Initialize e and x. Repeat

  • Get G, a set of gradients of function f evaluated at x and at points near x(sampling controlled by e)

  • Let d = – arg min { ||d||: dÎconvG }

  • Line search: replace x by x+ t d, with f(x + t d) < f(x)(if d is not 0)

until d = 0(or ||d|| £ tol)

Then reduce e and repeat.

Convergence theory for gradient sampling method siopt 2005 l.jpg
Convergence Theory for Gradient Sampling Method (SIOPT, 2005)

  • Suppose

    • f is locally Lipschitz and coercive

    • f is continuously differentiable on an open dense subset of its domain

    • number of gradients sampled near each iterate is greater than problem dimension

    • line search uses an appropriate sufficient decrease condition

  • Then, with probability one and for fixed sampling parameter e, algorithm generates a sequence of points with a cluster point xthat is e-Clarke stationary

  • If f has a unique Clarke stationary point x, then the set of all cluster points generated by the algorithm converges to x as e is reduced to zero

  • Kiwiel has already improved on this! (For a slightly different, as yet untested, version of the algorithm.)

Hanso l.jpg
HANSO 2005)

  • User provides routine to compute f and its gradient at any given x (do not worry about nonsmooth cases)

  • BFGS is then run from many randomly generated or user-provided starting points. Quit if gradient at best point found is small.

  • Otherwise, a local bundle method is run from best point found, attempting to verify local stationarity.

  • If this is not successful, gradient sampling is initiated.

  • Regardless, return a final x along with a set of nearby points, the corresponding gradients, and the d which is the smallest vector in the convex hull of these gradients

  • If the nearby points are near enough and d is small enough, f is Clarke Quay Stationary at x

  • This is an approximate first-order local optimality condition if f is regular at x

Optimization in control l.jpg
Optimization in Control 2005)

  • Often, not many variables, as in low-order controller design

  • Reasons:

    • engineers like simple controllers

    • nonsmooth, nonconvex optimization problems in even a few variables can be very difficult

  • Sometimes, more variables are added to the problem in an attempt to make it more tractable

  • Example: adding a Lyapunov matrix variable P and imposing stability on A by AP + PA*£ 0

  • When A depends linearly on the original parameters, this results in a bilinear matrix inequality (BMI), which is typically difficult to solve

  • Our approach: tackle the original problem

  • Looking for locally optimal solutions

H norm of a transfer function l.jpg
H 2005) Norm of a Transfer Function

  • Transfer function G(s)=C (sI – A)-1 B + D

  • SS representation

  • Hnorm is  if A is not stable (stable means all eigenvalues have negative real part) and otherwise, sup||G(s)||2 over all imaginary s

  • Standard way to compute it is norm(SS,inf) in Matlab control toolbox

  • LINORM from SLICOT is much faster

H fixed order controller design l.jpg
H 2005) Fixed-Order Controller Design

  • Choose matrices a, b, c, d to minimize the H norm of the transfer function

  • The dimension of a is k by k (order, between 0 and n=dim(A))

  • The dimension of b is k by p (number of system inputs)

  • The dimension of c is m (number of system outputs) by k

  • The dimension of d is m by p

  • Total number of variables is (k+m)(k+p)

H fixed order controller design32 l.jpg
H 2005) Fixed-Order Controller Design

  • Choose matrices a, b, c, d to minimize the H norm of the transfer function

  • The dimension of a is k by k (order, between 0 and n=dim(A))

  • The dimension of b is k by p (number of system inputs)

  • The dimension of c is m (number of system outputs) by k

  • The dimension of d is m by p

  • Total number of variables is (k+m)(k+p)

  • The case k=0 is static output feedback

H fixed order controller design33 l.jpg
H 2005) Fixed-Order Controller Design

  • Choose matrices a, b, c, d to minimize the H norm of the transfer function

  • The dimension of a is k by k (order, between 0 and n=dim(A))

  • The dimension of b is k by p (number of system inputs)

  • The dimension of c is m (number of system outputs) by k

  • The dimension of d is m by p

  • Total number of variables is (k+m)(k+p)

  • The case k=0 is static output feedback

  • When B1,C1 are empty, all I/O channels are in performance measure

Hifoo h fixed order optimization l.jpg
HIFOO: H 2005) Fixed-Order Optimization

  • Aims to find a, b, c, d for which H norm is locally optimal

  • Begins by minimizing the spectral abscissa max(real(eig(A-block))) until finds a, b, c, d for which A-block is stable (and therefore H norm is finite)

  • Then locally minimizes H norm

  • Calls HANSO to carry out both optimizations

  • Alternative objectives:

    • Optimize the spectral abscissa instead of quitting when stable

    • Optimize the pseudospectral abscissa

    • Optimize the distance to instability (complex stability radius)

  • The output a,b,c,d can then be input to optimize for larger order k, which cannot give worse result

  • Accepts various input formats

Hifoo provides the gradients l.jpg
HIFOO Provides the Gradients 2005)

  • For H norm, it combines

    • left and right singular vector info at point on imaginary axis where sup achieved

    • chain rule

  • For spectral abscissa max(real(eig)), it combines

    • left and right eigenvector info for eigenvalue achieving max real part

    • chain rule

  • Do not have to worry about ties

  • Function is continuous, but gradient is not

  • However, function is differentiable almost everyhere (and virtually everywhere that it is evaluated)

  • Typically, not differentiable at an exact minimizer

Benchmark examples ac suite l.jpg
Benchmark Examples: AC Suite 2005)

  • Made extensive runs for the AC suite of 18 aircraft control problems in Leibfritz’ COMPLeIB

  • In many cases we found low-order controllers that had smaller H norm than was supposedly possible according to the standard full-order controller MATLAB routine HINFSYN!

  • Sometimes this was even the case when the order was set to 0 (static output feedback)

  • After I announced this at the SIAM Control meeting in July, HINFSYN was extensively debugged and a new version has been released

Benchmark examples mass spring l.jpg
Benchmark Examples: Mass-Spring 2005)

  • A well known simple example with n=4

  • Optimizing spectral abscissa:

    • Order 1, we obtain 0, known to be optimal value

    • Order 2, we obtain about -0.73, previously conjectured to be about -0.5

Benchmarks belgian chocolate challenge l.jpg
Benchmarks: 2005)Belgian Chocolate Challenge

  • Simply stated stabilization problem due to Blondel

  • From 1994 to 2000, not known if solvable by any order controller

  • In 2000, an order 11 controller was discovered

  • We found an order 3 controller, and using our analytical results, proved that it is locally optimal

Availability of hanso and hifoo l.jpg
Availability of HANSO and HIFOO 2005)

  • Version 0.9, documented in a paper submitted to ROCOND 2006, freely available at


  • Version 0.91, available online but not fully tested so no public link yet

  • Version 0.92 will have further enhancements and we hope to announce it widely in a month or two

Some relevant publications l.jpg
Some Relevant Publications 2005)

  • A Robust Gradient Sampling Algorithm for Nonsmooth, Nonconvex Optimization

    • SIOPT, 2005 (with Burke, Lewis)

  • Approximating Subdifferentials by Random Sampling of Gradients

    • MOR, 2002 (with Burke, Lewis)

  • Stabilization via Nonsmooth, Nonconvex Optimization

    • submitted to IEEE Trans-AC (with Burke, Henrion, Lewis)

  • HANSO: A Hybrid Algorithm for Non-Smooth Optimization Based on BFGS, Bundle and Gradient Sampling

    • in planning stage


Gong xi fa cai gung hei fat choy l.jpg

恭喜发财 2005)

Gong xi fa cai!

Gung hei fat choy!