Qualifier Exam in HPC February 10th, 2010
Quasi-Newton methods
Alexandru Cioaca
Quasi-Newton methods (nonlinear systems)
• Nonlinear systems: F(x) = 0, F : Rⁿ → Rⁿ, F(x) = [ fi(x1,…,xn) ]ᵀ
• Such systems appear in the simulation of processes (physical, chemical, etc.)
• They are solved with iterative algorithms
• Newton's method ≠ nonlinear least-squares (here we solve F(x) = 0 exactly, not minimize ||F(x)||²)
Quasi-Newton methods (nonlinear systems)
Standard assumptions:
• F is continuously differentiable in an open convex set D
• F' is Lipschitz continuous on D
• There is x* in D such that F(x*) = 0 and F'(x*) is nonsingular
Newton's method: starting from an initial iterate x0,
xk+1 = xk – F'(xk)⁻¹ F(xk), {xk} → x*
repeated until a termination criterion is satisfied.
Quasi-Newton methods (nonlinear systems)
• Linear model around xk: Mk(x) = F(xk) + F'(xk)(x – xk)
• Setting Mk(x) = 0 gives xk+1 = xk – F'(xk)⁻¹ F(xk)
• In practice the iterates are computed by solving a linear system:
F'(xk) sk = F(xk)
xk+1 = xk – sk
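A minimal sketch of this iteration in Python/NumPy; the test problem, tolerance and iteration cap are illustrative choices, not part of the original slides:

import numpy as np

def newton(F, J, x0, tol=1e-10, max_iter=50):
    """Newton iteration: solve F'(xk) sk = F(xk), then xk+1 = xk - sk."""
    x = x0.astype(float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:      # termination criterion
            break
        s = np.linalg.solve(J(x), Fx)     # direct solve of the linear system
        x = x - s
    return x

# Hypothetical test problem: F(x) = [x0^2 + x1^2 - 1, x0 - x1]
F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
J = lambda x: np.array([[2*x[0], 2*x[1]], [1.0, -1.0]])
print(newton(F, J, np.array([1.0, 0.5])))   # converges to (1/sqrt(2), 1/sqrt(2))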
Quasi-Newton methods (nonlinear systems)
Evaluate F'(xk):
• Symbolically
• Numerically, with finite differences (a sketch follows below)
• Automatic differentiation
Solve the linear system F'(xk) sk = F(xk):
• Direct solve: LU, Cholesky
• Iterative methods: GMRES, CG
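For the finite-difference option, a rough sketch of a forward-difference Jacobian; the step size h is a naive, unscaled choice:

import numpy as np

def fd_jacobian(F, x, h=1e-7):
    """Approximate F'(x) column by column with forward differences."""
    Fx = F(x)
    n = x.size
    J = np.empty((Fx.size, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (F(x + e) - Fx) / h     # j-th column: dF/dx_j
    return J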
Quasi-Newton methods (nonlinear systems)
Computational cost per iteration:
• F(xk): n scalar function evaluations
• F'(xk): n² scalar function evaluations
• LU: O(2n³/3)
• Cholesky: O(n³/3)
• Krylov methods: depends on the condition number
Quasi-Newton methods (nonlinear systems)
• LU and Cholesky are useful when we want to reuse the factorization (e.g. in quasi-implicit schemes)
• They are difficult to parallelize and to load-balance
• Cholesky is faster and more stable, but requires an SPD matrix
• For large n (n ~ 10⁶), a full factorization becomes impractical
• Krylov methods are built from easily parallelizable kernels (vector updates, inner products, matrix-vector products)
• CG is faster and more stable, but requires an SPD matrix
Quasi-Newton methods (nonlinear systems)
Advantages:
• Under the standard assumptions, Newton's method converges locally and quadratically
• There exists a domain of attraction S containing the solution
• Once the iterates enter S, they stay in S and eventually converge to x*
• The algorithm is memoryless (self-correcting)
Quasi-Newton methods (nonlinear systems)
Disadvantages:
• Convergence depends on the choice of x0
• F'(xk) has to be evaluated at every iterate
• Each iteration can be expensive: F(xk), F'(xk), and the solve for sk
Quasi-Newton methods (nonlinear systems)
Where such systems arise: implicit schemes for ODEs, y' = f(t, y)
• Forward Euler: yn+1 = yn + h f(tn, yn) (explicit)
• Backward Euler: yn+1 = yn + h f(tn+1, yn+1) (implicit)
• Each step of an implicit scheme requires the solution of a nonlinear system (the same holds for Crank-Nicolson, implicit Runge-Kutta and linear multistep formulas)
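As an illustration of why a nonlinear solve is needed, one backward Euler step amounts to solving G(y) = y – yn – h f(tn+1, y) = 0 for yn+1; a sketch, assuming a Newton solver with the signature used earlier:

import numpy as np

def backward_euler_step(f, dfdy, tn, yn, h, newton):
    """One implicit (backward Euler) step: solve y = yn + h*f(tn+1, y) with Newton."""
    t_next = tn + h
    G = lambda y: y - yn - h * f(t_next, y)                 # nonlinear residual
    JG = lambda y: np.eye(yn.size) - h * dfdy(t_next, y)    # its Jacobian
    return newton(G, JG, yn)                                # yn is a natural initial guess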
Quasi-Newton methods (nonlinear systems)
• How can we avoid evaluating F'(xk)?
• Broyden's method:
Bk+1 = Bk + (yk – Bk sk) skᵀ / <sk, sk>
xk+1 = xk – Bk⁻¹ F(xk)
• Inverse update (via the Sherman-Morrison formula):
Hk+1 = Hk + (sk – Hk yk) skᵀ Hk / <sk, Hk yk>
xk+1 = xk – Hk F(xk)
where sk = xk+1 – xk and yk = F(xk+1) – F(xk)
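A minimal sketch of Broyden's method with the inverse (Sherman-Morrison) update from this slide; the identity as the initial H0, the tolerance and the iteration cap are illustrative choices:

import numpy as np

def broyden(F, x0, H0=None, tol=1e-10, max_iter=100):
    """Broyden's method with the inverse update: no Jacobian, no linear solve."""
    x = x0.astype(float)
    H = np.eye(x.size) if H0 is None else H0.copy()   # H approximates F'(x)^(-1)
    Fx = F(x)
    for _ in range(max_iter):
        if np.linalg.norm(Fx) < tol:
            break
        s = -H @ Fx                                   # xk+1 = xk - Hk F(xk)
        x_new = x + s
        F_new = F(x_new)
        y = F_new - Fx                                # yk = F(xk+1) - F(xk)
        Hy = H @ y
        H = H + np.outer(s - Hy, s @ H) / (s @ Hy)    # Hk+1 = Hk + (sk - Hk yk) skᵀ Hk / <sk, Hk yk>
        x, Fx = x_new, F_new
    return x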
Quasi-Newton methods (nonlinear systems)
Advantages:
• No need to compute F'(xk)
• With the inverse update there is no linear system to solve
Disadvantages:
• Convergence is only superlinear (instead of quadratic)
• The method is no longer memoryless
Quasi-Newton methods (unconstrained optimization)
• Problem: find the global minimizer of a cost function f : Rⁿ → R, x* = arg min f(x)
• If f is differentiable, the problem can be attacked by looking for zeros of the gradient ∇f
Quasi-Newton methods (unconstrained optimization)
• Descent methods: xk+1 = xk – λk Pk ∇f(xk)
Pk = In (steepest descent)
Pk = ∇²f(xk)⁻¹ (Newton's method)
Pk = Bk⁻¹ (quasi-Newton)
• The angle between Pk ∇f(xk) and ∇f(xk) must be less than 90°
• Bk has to mimic the behavior of the Hessian
Quasi-Newton methods (unconstrained optimization)
Global convergence strategies:
• Line search
  Step length: backtracking, interpolation
  Sufficient decrease: Wolfe conditions
• Trust regions
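A minimal backtracking line search enforcing the sufficient-decrease (Armijo) part of the Wolfe conditions; the constants c and rho are common textbook values, not taken from the slides:

import numpy as np

def backtracking(f, grad_f, x, p, alpha=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until f(x + alpha*p) <= f(x) + c*alpha*<grad f(x), p>."""
    fx = f(x)
    slope = grad_f(x) @ p              # negative if p is a descent direction
    for _ in range(50):                # cap the number of backtracking steps
        if f(x + alpha * p) <= fx + c * alpha * slope:
            break
        alpha *= rho
    return alpha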
Quasi-Newton methods (unconstrained optimization)
For quasi-Newton methods, Bk has to resemble ∇²f(xk). The updates differ in which properties they enforce:
• Single-rank updates
• Symmetry
• Positive definiteness
• Inverse updates
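The formulas themselves did not survive on this slide; purely as illustrations, here are the textbook symmetric rank-one (SR1) and rank-two BFGS updates for Bk, with sk = xk+1 – xk and yk = ∇f(xk+1) – ∇f(xk):

import numpy as np

def sr1_update(B, s, y):
    """Symmetric rank-one update: preserves symmetry with a single rank-1 correction."""
    r = y - B @ s
    return B + np.outer(r, r) / (r @ s)

def bfgs_update(B, s, y):
    """BFGS rank-two update: preserves symmetry and, when y.s > 0, positive definiteness."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)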
Quasi-Newton methods (unconstrained optimization)
Computation:
• Matrix updates, inner products
• DFP, PSB: 3 matrix-vector products per update
• BFGS: 2 matrix-matrix products per update
Storage:
• Limited-memory versions (L-BFGS)
• Store {sk, yk} for the last m iterations and recompute the action of H on a vector (see the two-loop recursion sketched below)
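A sketch of the standard two-loop recursion behind this idea; s_list and y_list are assumed to hold the last m pairs with the newest last, and the initial scaling gamma is one common choice:

import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Apply the limited-memory inverse Hessian to grad using only {si, yi}; H is never formed."""
    q = grad.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):   # newest pair first
        a = (s @ q) / (y @ s)
        q -= a * y
        alphas.append(a)
    if s_list:                                             # initial scaling H0 = gamma * I
        q *= (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):   # oldest pair first
        b = (y @ q) / (y @ s)
        q += (a - b) * s
    return q          # the search direction is then p = -q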
Further improvements
Preconditioning the linear system:
• For faster convergence, one may solve the preconditioned system K Bk pk = K F(xk)
• If Bk is SPD (and sparse), a sparse approximate inverse can be used to generate the preconditioner K
• This preconditioner can be refined on a subspace of Bk using an algebraic multigrid technique
• This requires solving an eigenvalue problem
Further improvements
Model reduction:
• Sometimes the dimension of the system is very large
• We look for a smaller model that captures the essence of the original
• An approximation of the model variability can be retrieved from an ensemble of forward simulations
• The covariance matrix of the ensemble gives the reduced subspace
• Again, we need to solve an eigenvalue problem
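A rough sketch of extracting a reduced subspace from an ensemble; the random ensemble, the dimensions and the truncation level r below are made-up placeholders:

import numpy as np

n, m, r = 500, 40, 10                          # state size, ensemble size, basis size (made up)
X = np.random.randn(n, m)                      # stand-in for an ensemble of forward simulations

Xc = X - X.mean(axis=1, keepdims=True)         # center the ensemble
C = (Xc @ Xc.T) / (m - 1)                      # sample covariance matrix (n x n)
w, V = np.linalg.eigh(C)                       # symmetric eigenvalue problem
basis = V[:, np.argsort(w)[::-1][:r]]          # r leading eigenvectors span the reduced subspace
x_red = basis.T @ Xc[:, 0]                     # project one state onto the reduced subspace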
QR/QL algorithms for symmetric matrices
• Solve the eigenvalue problem
• Iterative algorithm
• Uses a QR/QL factorization at each step (A = Q R, Q unitary, R upper triangular)
for k = 1, 2, ...
  Ak = Qk Rk
  Ak+1 = Rk Qk
end
• The diagonal of Ak converges to the eigenvalues of A
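A bare-bones version of this loop in Python (no Hessenberg/tridiagonal reduction, no shifts, no convergence test; the iteration count and test matrix are arbitrary):

import numpy as np

def qr_iteration(A, num_steps=200):
    """Unshifted QR iteration: Ak+1 = Rk Qk has the same eigenvalues as A."""
    Ak = A.copy()
    for _ in range(num_steps):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return np.diag(Ak)          # for symmetric A this converges to the eigenvalues

A = np.array([[4.0, 1.0], [1.0, 3.0]])
print(qr_iteration(A))          # compare with np.linalg.eigvalsh(A)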
QR/QL algorithms for symmetric matrices
• The matrix A is reduced to upper Hessenberg form before starting the iterations
• This is done column by column with Householder reflections, U = I – 2vvᵀ/(vᵀv)
• If A is symmetric, it is reduced to tridiagonal form
QR/QL algorithms for symmetric matrices
• Convergence to triangular (here, diagonal) form can be slow
• Origin shifts are used to accelerate it:
for k = 1, 2, ...
  Ak – zk I = Qk Rk
  Ak+1 = Rk Qk + zk I
end
• The Wilkinson shift is a standard choice for zk
• QR makes heavy use of matrix-matrix products
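A sketch of one shifted step, with the Wilkinson shift computed from the trailing 2x2 block of a symmetric Ak; deflation and the full sweep are omitted:

import numpy as np

def wilkinson_shift(A):
    """Eigenvalue of the trailing 2x2 block that is closest to A[-1, -1]."""
    a, b, c = A[-2, -2], A[-2, -1], A[-1, -1]
    d = (a - c) / 2.0
    sign = 1.0 if d >= 0 else -1.0
    return c - sign * b**2 / (abs(d) + np.hypot(d, b))

def shifted_qr_step(A):
    """One step of Ak - zk*I = Qk*Rk, Ak+1 = Rk*Qk + zk*I."""
    z = wilkinson_shift(A)
    n = A.shape[0]
    Q, R = np.linalg.qr(A - z * np.eye(n))
    return R @ Q + z * np.eye(n)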
Alternatives to quasi-Newton
Inexact Newton methods:
• Inner iteration: determine a search direction by solving the Newton linear system only up to a certain tolerance
• Only Hessian-vector products are necessary (see the sketch below)
• Outer iteration: line search along the search direction
Nonlinear CG:
• The residual is replaced by the gradient of the cost function
• Uses a line search
• Comes in different flavors (Fletcher-Reeves, Polak-Ribiere, ...)
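To make the "only Hessian-vector products" point concrete, here is a rough matrix-free sketch: the product ∇²f(x)·v is approximated by a gradient difference and fed to SciPy's CG through a LinearOperator. The step h and the default CG tolerance are naive choices, and the symmetry of the finite-difference operator is only approximate:

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def hessvec(grad_f, x, v, h=1e-6):
    """Finite-difference approximation of the Hessian-vector product H(x) @ v."""
    return (grad_f(x + h * v) - grad_f(x)) / h

def inexact_newton_direction(grad_f, x):
    """Inner iteration: solve H p = -grad f(x) with CG, matrix-free."""
    g = grad_f(x)
    H = LinearOperator((x.size, x.size), matvec=lambda v: hessvec(grad_f, x, v))
    p, _info = cg(H, -g)      # in practice, pass a loose tolerance for a truly inexact solve
    return p                  # the outer iteration then runs a line search along p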
Alternatives to quasi-Newton
Direct search:
• Does not use derivatives of the cost function
• Uses a structure called a simplex to search for decrease in f
• Stops when no further progress can be achieved
• Can get stuck in a local minimum
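The classic method in this family is Nelder-Mead; a usage sketch through SciPy, where the Rosenbrock function is just a stand-in test problem, not from the slides:

import numpy as np
from scipy.optimize import minimize

rosen = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2   # test function
res = minimize(rosen, x0=np.array([-1.2, 1.0]), method='Nelder-Mead')
print(res.x, res.fun)          # no derivatives of the cost function are ever evaluated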
More alternatives
Monte Carlo:
• A computational method relying on repeated random sampling
• Can be used for optimization (e.g. multidisciplinary design optimization) and for inverse problems, via random walks
• When sampling multiple correlated variables, the covariance matrix is SPD, so it can be factorized with Cholesky (see the sketch below)
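For example, correlated Gaussian samples can be drawn by factoring the covariance C = L Lᵀ and transforming independent standard normals; the 2x2 covariance below is a made-up example:

import numpy as np

C = np.array([[1.0, 0.8],
              [0.8, 1.0]])                   # SPD covariance/correlation matrix (example)
L = np.linalg.cholesky(C)                    # C = L @ L.T
z = np.random.standard_normal((2, 10000))    # independent standard normal samples
x = L @ z                                    # columns of x now have covariance close to C
print(np.cov(x))                             # should be close to C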
Conclusions
• Newton's method is a powerful tool with many applications (solving nonlinear systems, finding minima of cost functions), and it is used together with many other numerical algorithms (factorizations, linear solvers, eigenvalue solvers)
• Optimizing and parallelizing matrix-vector and matrix-matrix products, decompositions and the other numerical kernels can have a significant impact on overall performance