Loading in 5 sec....

HANSO and HIFOO Two New MATLAB CodesPowerPoint Presentation

HANSO and HIFOO Two New MATLAB Codes

- 357 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about 'hanso and hifoo two new matlab codes ' - JasminFlorian

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

### HANSO and HIFOOTwo New MATLAB Codes

Michael L. Overton

Courant Institute of Mathematical Sciences

New York University

Singapore, Jan 2006

Two New Codes

- HANSO: Hybrid Algorithm for Nonsmooth Optimization
- Aimed at finding local minimizers of general nonsmooth, nonconvex optimization problems
- Joint with Jim Burke and Adrian Lewis

- HIFOO: H-Infinity Fixed-Order Optimization
- Aimed at solving specific nonsmooth, nonconvex optimization problems arising in control
- Built on HANSO
- Joint with Didier Henrion

Many optimization objectives are

- Nonconvex
- Nonsmooth
- Continuous
- Differentiable almost everywhere
- With gradients often available at little additional cost beyond that of computing function
- Subdifferentially regular
- Sometimes, non-Lipschitz

Numerical Optimization of Nonsmooth, Nonconvex Functions

- Steepest descent: jams
- Bundle methods: better, but these are mainly intended for nonsmooth convex functions
- We developed a simple method for nonsmooth, nonconvex minimization based on Gradient Sampling
- Intended for continuous functions that are differentiable almost everywhere, and for which the gradient can be easily computed when it is defined
- User need only write routine to return function value and gradient – and need not worry about nondifferentiable cases, e.g., ties for a max when coding the gradient
- Very recently, found BFGS is often very effective when implemented correctly - and much less expensive

BFGS

- Standard quasi-Newton method for optimizing smooth functions, using gradient differences to update an inverse Hessian matrix H, defining a local quadratic model of the objective function
- Conventional wisdom: jams on nonsmooth functions
- Amazingly, works very well as long as implemented right (weak Wolfe line search is essential)
- It builds an excellent, extremely ill-conditioned quadratic model
- Often, runs until cond(H) is 1016 before breaking down, taking steplengths of around 1
- Often converges linearly (not superlinearly)
- By contrast, steepest descent and Newton’s method usually jam
- Not very reliable, and no convergence theory

Minimizing a Product of Eigenvalues

- An application due to Anstreicher and Lee
- Min log of product of K largest eigenvalues of AoX
subject to X positive semidefinite and diag(X)=1

(A, X symmetric) (we usually set K=N/2)

- Would be equivalent to SDP if replace product by sum
- Number of variables is n = N(N-1)/2 where N is dimension of X
- Following Burer, substitute Y Y’ for X where Y is N by r so n is reduced to rN and normalize Y so its rows have norm 1: thus both constraints eliminated
- Typically objective is not smooth at minimizer, because the K-th largest eigenvalue is multiple
- Results for N=10 and rank r = 2 through 10…

N=10, r=6: eigenvalues of optimal AoX

3.163533558166873e-001

5.406646569314501e-001

5.406646569314647e-001

5.406646569314678e-001

5.406646569314684e-001

5.406646569314689e-001

5.406646569314706e-001

7.149382062194134e-001

7.746621702661097e-001

1.350303124150612e+000

The U and V Spaces

- The condition number of H, the BFGS approximation to inverse Hessian, typically reaches 1016 !!
- Let’s look at eigenvalues of H:
H = QDQ* (* denotes transpose)

Search direction is d = -Hg= -QDQ*g (where g = gradient)

- Next plot shows
diag(D), sorted into ascending order

Components of |Q*g|, using same ordering

Components of |DQ*g|, using same ordering (search direction expanded in eigenvector basis)

- Tiny eigenvalues of H correspond to “nonsmooth” V-space
- Other eigenvalues of H correspond to “smooth” U-space
- From matrix theory, dim(V-space) = m(m+1)/2 – 1, where m is multiplicity of Kth eigenvalue of AoX

More on the U and V Spaces

- H seems to be an excellent approximation to the “limiting inverse Hessian” and evidently gives us bases for the optimal U and V spaces
- Let’s look at plots of f along random directions in the U and V spaces, as defined by eigenvectors of H, passing through minimizer (for the N=10, r=6 example)

Line Search for BFGS

- Sufficient decrease:
for 0 < c1< 1,

f (x + td) ≤ f (x) + c1 t(grad f)(x)*d

- Weak Wolfe condition on derivative
for c1< c2 < 1,

(grad f)(x + td)*d ≥ c2 (grad f)(x)*d

- Not strong Wolfe condition on derivative
| (grad f)(x + t d)*d | ≤ - c2 (grad f)(x)*d

- Essential when f is nonsmooth
- No reason to use strong Wolfe even if f is smooth
- Much simpler to implement
- Why ever use strong Wolfe?

But how to check if final x is locally optimal?

- Extreme ill-conditioning of H is a strong clue, but is expensive to compute and proves nothing
- If x is locally optimal, running a local bundle method from x establishes nonsmooth stationarity: this involves repeated null-step line searches and building a bundle of gradients obtained via the line searches
- When line search returns null step, it must also return gradient at a point lying across the discontinuity
- Then new d = – arg min { ||d||: dÎconvG }, where G is the set of gradients in the bundle
- Terminate if ||d|| is small
- If ||d|| is 0 and line search steps were 0, x is Clarke stationary; more realistically it is “close” to Clarke stationary
- I’ll call this Clarke Quay Stationary since it is a Quay Idea

First-order local optimality

- Is this implied by Clarke stationarity of f at x ?
- No, for example f(x) = -|x|is Clarke stationary at 0
- Yes, in sense that f’(x; d) ≥ 0 for all directions d, when f is regular at x (subdifferentially regular, Clarke regular)
- Most of the functions that we have studied areregular everywhere, although this is sometimes hard to prove
- Regularity generalizes smoothness and convexity

Harder Problems

- Eigenvalue product problems are interesting, but the Lipschitz constant L for f is not large
- Harder problems:
- Chebyshev Exponential Approximation
- Optimization of Distance to Instability
- Pseudospectral Abscissa Optimization (arbitarily large L)
- Spectral Abscissa Optimization (not even Lipschitz)

- In all cases, BFGS turns out to be substantially less reliable than gradient sampling

Gradient Sampling Algorithm Initialize e and x. Repeat

- Get G, a set of gradients of function f evaluated at x and at points near x(sampling controlled by e)
- Let d = – arg min { ||d||: dÎconvG }
- Line search: replace x by x+ t d, with f(x + t d) < f(x)(if d is not 0)

until d = 0(or ||d|| £ tol)

Then reduce e and repeat.

Convergence Theory for Gradient Sampling Method (SIOPT, 2005)

- Suppose
- f is locally Lipschitz and coercive
- f is continuously differentiable on an open dense subset of its domain
- number of gradients sampled near each iterate is greater than problem dimension
- line search uses an appropriate sufficient decrease condition

- Then, with probability one and for fixed sampling parameter e, algorithm generates a sequence of points with a cluster point xthat is e-Clarke stationary
- If f has a unique Clarke stationary point x, then the set of all cluster points generated by the algorithm converges to x as e is reduced to zero
- Kiwiel has already improved on this! (For a slightly different, as yet untested, version of the algorithm.)

HANSO 2005)

- User provides routine to compute f and its gradient at any given x (do not worry about nonsmooth cases)
- BFGS is then run from many randomly generated or user-provided starting points. Quit if gradient at best point found is small.
- Otherwise, a local bundle method is run from best point found, attempting to verify local stationarity.
- If this is not successful, gradient sampling is initiated.
- Regardless, return a final x along with a set of nearby points, the corresponding gradients, and the d which is the smallest vector in the convex hull of these gradients
- If the nearby points are near enough and d is small enough, f is Clarke Quay Stationary at x
- This is an approximate first-order local optimality condition if f is regular at x

Optimization in Control 2005)

- Often, not many variables, as in low-order controller design
- Reasons:
- engineers like simple controllers
- nonsmooth, nonconvex optimization problems in even a few variables can be very difficult

- Sometimes, more variables are added to the problem in an attempt to make it more tractable
- Example: adding a Lyapunov matrix variable P and imposing stability on A by AP + PA*£ 0
- When A depends linearly on the original parameters, this results in a bilinear matrix inequality (BMI), which is typically difficult to solve
- Our approach: tackle the original problem
- Looking for locally optimal solutions

H 2005) Norm of a Transfer Function

- Transfer function G(s)=C (sI – A)-1 B + D
- SS representation
- Hnorm is if A is not stable (stable means all eigenvalues have negative real part) and otherwise, sup||G(s)||2 over all imaginary s
- Standard way to compute it is norm(SS,inf) in Matlab control toolbox
- LINORM from SLICOT is much faster

H 2005) Fixed-Order Controller Design

- Choose matrices a, b, c, d to minimize the H norm of the transfer function
- The dimension of a is k by k (order, between 0 and n=dim(A))
- The dimension of b is k by p (number of system inputs)
- The dimension of c is m (number of system outputs) by k
- The dimension of d is m by p
- Total number of variables is (k+m)(k+p)

H 2005) Fixed-Order Controller Design

- Choose matrices a, b, c, d to minimize the H norm of the transfer function
- The dimension of a is k by k (order, between 0 and n=dim(A))
- The dimension of b is k by p (number of system inputs)
- The dimension of c is m (number of system outputs) by k
- The dimension of d is m by p
- Total number of variables is (k+m)(k+p)
- The case k=0 is static output feedback

H 2005) Fixed-Order Controller Design

- Choose matrices a, b, c, d to minimize the H norm of the transfer function
- The dimension of a is k by k (order, between 0 and n=dim(A))
- The dimension of b is k by p (number of system inputs)
- The dimension of c is m (number of system outputs) by k
- The dimension of d is m by p
- Total number of variables is (k+m)(k+p)
- The case k=0 is static output feedback
- When B1,C1 are empty, all I/O channels are in performance measure

HIFOO: H 2005) Fixed-Order Optimization

- Aims to find a, b, c, d for which H norm is locally optimal
- Begins by minimizing the spectral abscissa max(real(eig(A-block))) until finds a, b, c, d for which A-block is stable (and therefore H norm is finite)
- Then locally minimizes H norm
- Calls HANSO to carry out both optimizations
- Alternative objectives:
- Optimize the spectral abscissa instead of quitting when stable
- Optimize the pseudospectral abscissa
- Optimize the distance to instability (complex stability radius)

- The output a,b,c,d can then be input to optimize for larger order k, which cannot give worse result
- Accepts various input formats

HIFOO Provides the Gradients 2005)

- For H norm, it combines
- left and right singular vector info at point on imaginary axis where sup achieved
- chain rule

- For spectral abscissa max(real(eig)), it combines
- left and right eigenvector info for eigenvalue achieving max real part
- chain rule

- Do not have to worry about ties
- Function is continuous, but gradient is not
- However, function is differentiable almost everyhere (and virtually everywhere that it is evaluated)
- Typically, not differentiable at an exact minimizer

Benchmark Examples: AC Suite 2005)

- Made extensive runs for the AC suite of 18 aircraft control problems in Leibfritz’ COMPLeIB
- In many cases we found low-order controllers that had smaller H norm than was supposedly possible according to the standard full-order controller MATLAB routine HINFSYN!
- Sometimes this was even the case when the order was set to 0 (static output feedback)
- After I announced this at the SIAM Control meeting in July, HINFSYN was extensively debugged and a new version has been released

Benchmark Examples: Mass-Spring 2005)

- A well known simple example with n=4
- Optimizing spectral abscissa:
- Order 1, we obtain 0, known to be optimal value
- Order 2, we obtain about -0.73, previously conjectured to be about -0.5

Benchmarks: 2005)Belgian Chocolate Challenge

- Simply stated stabilization problem due to Blondel
- From 1994 to 2000, not known if solvable by any order controller
- In 2000, an order 11 controller was discovered
- We found an order 3 controller, and using our analytical results, proved that it is locally optimal

Availability of HANSO and HIFOO 2005)

- Version 0.9, documented in a paper submitted to ROCOND 2006, freely available at
http://www.cs.nyu.edu/overton/software/hifoo/

- Version 0.91, available online but not fully tested so no public link yet
- Version 0.92 will have further enhancements and we hope to announce it widely in a month or two

Some Relevant Publications 2005)

- A Robust Gradient Sampling Algorithm for Nonsmooth, Nonconvex Optimization
- SIOPT, 2005 (with Burke, Lewis)

- Approximating Subdifferentials by Random Sampling of Gradients
- MOR, 2002 (with Burke, Lewis)

- Stabilization via Nonsmooth, Nonconvex Optimization
- submitted to IEEE Trans-AC (with Burke, Henrion, Lewis)

- HANSO: A Hybrid Algorithm for Non-Smooth Optimization Based on BFGS, Bundle and Gradient Sampling
- in planning stage
http://www.cs.nyu.edu/faculty/overton/

- in planning stage

Download Presentation

Connecting to Server..