HANSO and HIFOO: Two New MATLAB Codes


Michael L. Overton

Courant Institute of Mathematical Sciences

New York University

Singapore, Jan 2006

Two New Codes
  • HANSO: Hybrid Algorithm for Nonsmooth Optimization
    • Aimed at finding local minimizers of general nonsmooth, nonconvex optimization problems
    • Joint with Jim Burke and Adrian Lewis
  • HIFOO: H-Infinity Fixed-Order Optimization
    • Aimed at solving specific nonsmooth, nonconvex optimization problems arising in control
    • Built on HANSO
    • Joint with Didier Henrion
Many optimization objectives are
  • Nonconvex
  • Nonsmooth
  • Continuous
  • Differentiable almost everywhere
  • With gradients often available at little additional cost beyond that of computing the function
  • Subdifferentially regular
  • Sometimes, non-Lipschitz
Numerical Optimization of Nonsmooth, Nonconvex Functions
  • Steepest descent: jams
  • Bundle methods: better, but these are mainly intended for nonsmooth convex functions
  • We developed a simple method for nonsmooth, nonconvex minimization based on Gradient Sampling
  • Intended for continuous functions that are differentiable almost everywhere, and for which the gradient can be easily computed when it is defined
  • The user need only write a routine that returns the function value and gradient, and need not worry about nondifferentiable cases (e.g., ties for a max) when coding the gradient
  • Very recently, we found that BFGS is often very effective when implemented correctly, and much less expensive
  • Standard quasi-Newton method for optimizing smooth functions, using gradient differences to update an inverse Hessian matrix H, defining a local quadratic model of the objective function (a sketch of the update appears after this list)
  • Conventional wisdom: jams on nonsmooth functions
  • Amazingly, works very well as long as it is implemented right (a weak Wolfe line search is essential)
  • It builds an excellent, extremely ill-conditioned quadratic model
  • Often runs until cond(H) is 10^16 before breaking down, taking steplengths of around 1
  • Often converges linearly (not superlinearly)
  • By contrast, steepest descent and Newton’s method usually jam
  • Not very reliable, and no convergence theory
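As a minimal, purely illustrative sketch (not the HANSO code itself), one BFGS step with the inverse Hessian update looks like this in MATLAB; here x, g are the current point and gradient, xnew, gnew come from the line search, and H is the inverse Hessian approximation (all names are assumptions):

    % one BFGS inverse-Hessian update (illustrative sketch, not HANSO itself)
    s = xnew - x;                 % step actually taken
    y = gnew - g;                 % gradient difference
    sty = s'*y;                   % positive when the weak Wolfe condition holds
    if sty > 0
        rho = 1/sty;
        V = eye(length(s)) - rho*(y*s');
        H = V'*H*V + rho*(s*s');  % standard BFGS update of the inverse Hessian
    end
    d = -H*gnew;                  % next search direction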
Minimizing a Product of Eigenvalues
  • An application due to Anstreicher and Lee
  • Min log of product of K largest eigenvalues of AoX

subject to X positive semidefinite and diag(X)=1

(A, X symmetric) (we usually set K=N/2)

  • Would be equivalent to an SDP if the product were replaced by a sum
  • Number of variables is n = N(N-1)/2, where N is the dimension of X
  • Following Burer, substitute YY' for X, where Y is N by r, so n is reduced to rN; normalize Y so its rows have norm 1: thus both constraints are eliminated
  • Typically the objective is not smooth at the minimizer, because the K-th largest eigenvalue has multiplicity greater than one
  • Results for N=10 and rank r = 2 through 10 follow (a sketch of the objective evaluation appears after this list)
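A hypothetical sketch of the objective evaluation under the substitution just described (the function name and the exact normalization are illustrative, not the authors' code; the gradient is omitted):

    % f(Y) = log of the product of the K largest eigenvalues of A o (YY'),
    % with the rows of Y normalized to unit norm (illustrative sketch)
    function f = eigprod_obj(Y, A, K)
        Y = diag(1./sqrt(sum(Y.^2, 2))) * Y;    % rows of Y get unit norm
        X = Y*Y';                               % X is PSD with diag(X) = 1
        M = A .* X;                             % Hadamard (entrywise) product A o X
        lam = sort(eig((M + M')/2), 'descend'); % eigenvalues, largest first
        f = sum(log(lam(1:K)));                 % log of the product of the K largest
    end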
N=10, r=6: Eigenvalues of Optimal AoX

[Plot of the eigenvalues of the optimal AoX for N=10, r=6.]
The U and V Spaces
  • The condition number of H, the BFGS approximation to the inverse Hessian, typically reaches 10^16 !!
  • Let’s look at eigenvalues of H:

H = QDQ* (* denotes transpose)

Search direction is d = -Hg = -QDQ*g (where g = gradient)

  • The next plot shows (the quantities can be computed roughly as in the sketch after this list):

diag(D), sorted into ascending order

Components of |Q*g|, using the same ordering

Components of |DQ*g|, using the same ordering (the search direction expanded in the eigenvector basis)

  • Tiny eigenvalues of H correspond to “nonsmooth” V-space
  • Other eigenvalues of H correspond to “smooth” U-space
  • From matrix theory, dim(V-space) = m(m+1)/2 – 1, where m is multiplicity of Kth eigenvalue of AoX
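A rough MATLAB sketch of how such a plot could be produced from the final inverse Hessian approximation H and gradient g (illustrative only; variable names are assumptions):

    % eigen-decomposition of the final BFGS inverse Hessian (illustrative)
    [Q, D] = eig(H);                    % H = Q*D*Q' with Q orthogonal
    [dD, p] = sort(diag(D), 'ascend');  % eigenvalues of H, ascending
    Q = Q(:, p);                        % reorder eigenvectors to match
    semilogy(dD, 'o'); hold on
    semilogy(abs(Q'*g), 'x');           % components of |Q'*g|, same ordering
    semilogy(abs(dD .* (Q'*g)), '+');   % |D*Q'*g|: search direction in eigenvector basis
    hold off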
More on the U and V Spaces
  • H seems to be an excellent approximation to the “limiting inverse Hessian” and evidently gives us bases for the optimal U and V spaces
  • Let's look at plots of f along random directions in the U and V spaces, as defined by eigenvectors of H, passing through the minimizer (for the N=10, r=6 example)
Line Search for BFGS
  • Sufficient decrease:

for 0 < c1 < 1,

f(x + td) ≤ f(x) + c1 t (grad f)(x)*d

  • Weak Wolfe condition on the derivative:

for c1 < c2 < 1,

(grad f)(x + td)*d ≥ c2 (grad f)(x)*d

  • Not the strong Wolfe condition on the derivative:

|(grad f)(x + td)*d| ≤ -c2 (grad f)(x)*d

  • The weak Wolfe condition is essential when f is nonsmooth (a sketch of the test appears after this list)
  • No reason to use strong Wolfe even if f is smooth
  • Much simpler to implement
  • Why ever use strong Wolfe?
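A minimal sketch of testing these two conditions at a trial step t (variable names are hypothetical, not HANSO's):

    % f0, g0: value and gradient at x;  ft, gt: value and gradient at x + t*d
    suffdecr  = (ft <= f0 + c1*t*(g0'*d));   % sufficient decrease (Armijo)
    weakwolfe = (gt'*d >= c2*(g0'*d));       % weak Wolfe condition on the derivative
    if suffdecr && weakwolfe
        accept = true;                       % accept the trial step t
    end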
But how to check if final x is locally optimal?
  • Extreme ill-conditioning of H is a strong clue, but it is expensive to compute and proves nothing
  • If x is locally optimal, running a local bundle method from x establishes nonsmooth stationarity: this involves repeated null-step line searches and building a bundle of gradients obtained via the line searches
  • When line search returns null step, it must also return gradient at a point lying across the discontinuity
  • Then the new d = – arg min { ||d|| : d ∈ conv G }, where G is the set of gradients in the bundle (this small QP is sketched after this list)
  • Terminate if ||d|| is small
  • If ||d|| is 0 and line search steps were 0, x is Clarke stationary; more realistically it is “close” to Clarke stationary
  • I’ll call this Clarke Quay Stationary since it is a Quay Idea
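A hedged sketch of that minimum-norm subproblem using quadprog from the Optimization Toolbox (an assumption; HANSO may solve the QP differently); G is n-by-k with one bundled gradient per column:

    % smallest vector in the convex hull of the columns of G (illustrative)
    k = size(G, 2);
    H = 2*(G'*G);                       % quadprog minimizes 0.5*w'*H*w + f'*w
    w = quadprog(H, zeros(k,1), [], [], ones(1,k), 1, zeros(k,1), []);
    dmin = G*w;                         % minimum-norm element of conv G
    d = -dmin;                          % new search direction if ||dmin|| > tol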
First-order local optimality
  • Is this implied by Clarke stationarity of f at x ?
  • No, for example f(x) = -|x| is Clarke stationary at 0
  • Yes, in sense that f’(x; d) ≥ 0 for all directions d, when f is regular at x (subdifferentially regular, Clarke regular)
  • Most of the functions that we have studied are regular everywhere, although this is sometimes hard to prove
  • Regularity generalizes smoothness and convexity
Harder Problems
  • Eigenvalue product problems are interesting, but the Lipschitz constant L for f is not large
  • Harder problems:
    • Chebyshev Exponential Approximation
    • Optimization of Distance to Instability
    • Pseudospectral Abscissa Optimization (arbitrarily large L)
    • Spectral Abscissa Optimization (not even Lipschitz)
  • In all cases, BFGS turns out to be substantially less reliable than gradient sampling
Gradient Sampling Algorithm: Initialize ε and x. Repeat:
  • Get G, a set of gradients of the function f evaluated at x and at points near x (sampling controlled by ε)
  • Let d = – arg min { ||d|| : d ∈ conv G }
  • Line search: replace x by x + td, with f(x + td) < f(x) (if d is not 0)

until d = 0 (or ||d|| ≤ tol)

Then reduce ε and repeat (a sketch of one pass appears below).
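A hypothetical MATLAB sketch of one pass with fixed sampling radius eps (illustrative only: fgrad is an assumed routine returning [f, g], minconvnorm stands for the small QP from the earlier sketch, and the crude backtracking here stands in for the real sufficient-decrease line search):

    % one gradient sampling pass at sampling radius eps (illustrative sketch)
    n = length(x);
    [f0, g0] = fgrad(x);
    G = g0;                                    % always include the gradient at x
    for j = 1:m                                % m > n sampled points near x
        u = randn(n,1);  u = u/norm(u);        % random direction
        [fj, gj] = fgrad(x + eps*rand(1)*u);   % gradient at a nearby point
        G = [G, gj];                           % collect sampled gradients
    end
    d = -minconvnorm(G);                       % minus smallest vector in conv G
    if norm(d) > tol
        t = 1;                                 % crude backtracking line search
        while fgrad(x + t*d) >= f0 && t > 1e-12
            t = t/2;
        end
        x = x + t*d;
    end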

Convergence Theory for Gradient Sampling Method (SIOPT, 2005)
  • Suppose
    • f is locally Lipschitz and coercive
    • f is continuously differentiable on an open dense subset of its domain
    • number of gradients sampled near each iterate is greater than problem dimension
    • line search uses an appropriate sufficient decrease condition
  • Then, with probability one and for fixed sampling parameter ε, the algorithm generates a sequence of points with a cluster point x that is ε-Clarke stationary
  • If f has a unique Clarke stationary point x, then the set of all cluster points generated by the algorithm converges to x as ε is reduced to zero
  • Kiwiel has already improved on this! (For a slightly different, as yet untested, version of the algorithm.)
HANSO: Hybrid Algorithm for Nonsmooth Optimization
  • The user provides a routine to compute f and its gradient at any given x, without worrying about nonsmooth cases (an illustrative sketch of such a routine follows this list)
  • BFGS is then run from many randomly generated or user-provided starting points. Quit if gradient at best point found is small.
  • Otherwise, a local bundle method is run from best point found, attempting to verify local stationarity.
  • If this is not successful, gradient sampling is initiated.
  • Regardless, return a final x along with a set of nearby points, the corresponding gradients, and the d which is the smallest vector in the convex hull of these gradients
  • If the nearby points are near enough and d is small enough, f is Clarke Quay Stationary at x
  • This is an approximate first-order local optimality condition if f is regular at x
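Purely as an illustration of the user's side of the contract (the function below and the calling syntax are hypothetical; the actual HANSO 0.9 interface is described in its own documentation):

    % a user-written objective for HANSO (saved as myfun.m): return f and a
    % gradient where one exists, ignoring ties and other nondifferentiable cases
    function [f, g] = myfun(x)
        [f, i] = max(abs(x));          % f(x) = max_i |x_i|, nonsmooth at ties
        g = zeros(size(x));
        g(i) = sign(x(i));             % a gradient wherever f is differentiable
    end

    % the call itself is version dependent; schematically (hypothetical syntax):
    %   [xopt, fopt] = hanso(...)      % consult the HANSO documentation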
Optimization in Control
  • Often, not many variables, as in low-order controller design
  • Reasons:
    • engineers like simple controllers
    • nonsmooth, nonconvex optimization problems in even a few variables can be very difficult
  • Sometimes, more variables are added to the problem in an attempt to make it more tractable
  • Example: adding a Lyapunov matrix variable P and imposing stability on A by AP + PA* ≤ 0
  • When A depends linearly on the original parameters, this results in a bilinear matrix inequality (BMI), which is typically difficult to solve
  • Our approach: tackle the original problem
  • Looking for locally optimal solutions
H Norm of a Transfer Function
  • Transfer function G(s)=C (sI – A)-1 B + D
  • SS representation
  • Hnorm is  if A is not stable (stable means all eigenvalues have negative real part) and otherwise, sup||G(s)||2 over all imaginary s
  • Standard way to compute it is norm(SS,inf) in Matlab control toolbox
  • LINORM from SLICOT is much faster
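A small Control System Toolbox illustration of the quantity being minimized (the system data here are made up for the example):

    % H-infinity norm of a small stable example system (illustrative data)
    A = [-1 2; 0 -3];  B = [1; 1];  C = [1 0];  D = 0;
    G = ss(A, B, C, D);        % state-space system G(s) = C(sI - A)^(-1)B + D
    hinf = norm(G, inf)        % H-infinity norm; returns Inf if A is unstable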
H∞ Fixed-Order Controller Design
  • Choose matrices a, b, c, d to minimize the H∞ norm of the closed-loop transfer function
  • The dimension of a is k by k (the controller order, between 0 and n = dim(A))
  • The dimension of b is k by p (the number of system inputs)
  • The dimension of c is m (the number of system outputs) by k
  • The dimension of d is m by p
  • Total number of variables is (k+m)(k+p), as the small check below illustrates
  • The case k=0 is static output feedback
  • When B1, C1 are empty, all I/O channels are in the performance measure
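A trivial sanity check of that variable count, with hypothetical dimensions:

    % number of free controller variables for an order-k controller (illustrative)
    k = 2; p = 3; m = 1;                 % controller order, #inputs, #outputs
    nvar = k*k + k*p + m*k + m*p;        % entries of a, b, c, d
    assert(nvar == (k + m)*(k + p))      % matches the (k+m)(k+p) formula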
HIFOO: H Fixed-Order Optimization
  • Aims to find a, b, c, d for which H norm is locally optimal
  • Begins by minimizing the spectral abscissa max(real(eig(A-block))) until finds a, b, c, d for which A-block is stable (and therefore H norm is finite)
  • Then locally minimizes H norm
  • Calls HANSO to carry out both optimizations
  • Alternative objectives:
    • Optimize the spectral abscissa instead of quitting when stable
    • Optimize the pseudospectral abscissa
    • Optimize the distance to instability (complex stability radius)
  • The output a, b, c, d can then be input to optimize for a larger order k, which cannot give a worse result
  • Accepts various input formats
HIFOO Provides the Gradients
  • For H norm, it combines
    • left and right singular vector info at point on imaginary axis where sup achieved
    • chain rule
  • For spectral abscissa max(real(eig)), it combines
    • left and right eigenvector information for the eigenvalue achieving the max real part (the standard first-order formula is recalled after this list)
    • chain rule
  • Do not have to worry about ties
  • Function is continuous, but gradient is not
  • However, the function is differentiable almost everywhere (and virtually everywhere that it is evaluated)
  • Typically, not differentiable at an exact minimizer
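For context (this is the standard first-order perturbation result, not something stated on the slide): if λ is a simple eigenvalue of the closed-loop matrix with right eigenvector x and left eigenvector y, normalized so that y*x = 1, then a perturbation dA changes λ to first order by dλ = y*(dA)x; the gradient of max(real(eig)) with respect to a, b, c, d then follows by the chain rule through the dependence of the closed-loop matrix on those variables. An analogous formula, using the left and right singular vectors at the maximizing frequency, gives the H∞-norm gradient.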
Benchmark Examples: AC Suite
  • Made extensive runs for the AC suite of 18 aircraft control problems in Leibfritz's COMPleib
  • In many cases we found low-order controllers that had a smaller H∞ norm than was supposedly possible according to the standard full-order controller MATLAB routine HINFSYN!
  • Sometimes this was even the case when the order was set to 0 (static output feedback)
  • After I announced this at the SIAM Control meeting in July, HINFSYN was extensively debugged and a new version has been released
Benchmark Examples: Mass-Spring
  • A well known simple example with n=4
  • Optimizing spectral abscissa:
    • Order 1, we obtain 0, known to be the optimal value
    • Order 2, we obtain about -0.73, previously conjectured to be about -0.5
Benchmarks: Belgian Chocolate Challenge
  • Simply stated stabilization problem due to Blondel
  • From 1994 to 2000, it was not known whether it was solvable by a controller of any order
  • In 2000, an order 11 controller was discovered
  • We found an order 3 controller, and using our analytical results, proved that it is locally optimal
Availability of HANSO and HIFOO
  • Version 0.9, documented in a paper submitted to ROCOND 2006, freely available at


  • Version 0.91, available online but not fully tested so no public link yet
  • Version 0.92 will have further enhancements and we hope to announce it widely in a month or two
Some Relevant Publications
  • A Robust Gradient Sampling Algorithm for Nonsmooth, Nonconvex Optimization
    • SIOPT, 2005 (with Burke, Lewis)
  • Approximating Subdifferentials by Random Sampling of Gradients
    • MOR, 2002 (with Burke, Lewis)
  • Stabilization via Nonsmooth, Nonconvex Optimization
    • submitted to IEEE Trans-AC (with Burke, Henrion, Lewis)
  • HANSO: A Hybrid Algorithm for Non-Smooth Optimization Based on BFGS, Bundle and Gradient Sampling
    • in planning stage



Gong xi fa cai!

Gung hei fat choy!

(Mandarin and Cantonese Lunar New Year greetings: "wishing you prosperity!")