Qualifier Exam in HPC February 10th, 2010
Quasi-Newton methods
Alexandru Cioaca
Quasi-Newton methods (nonlinear systems)
• Nonlinear systems: F(x) = 0, F : Rⁿ → Rⁿ, F(x) = [ fi(x1,…,xn) ]ᵀ
• Such systems appear in the simulation of processes (physical, chemical, etc.)
• They are solved with iterative algorithms
• Newton's method ≠ nonlinear least-squares (here we solve F(x) = 0 exactly, not minimize ||F(x)||²)
Quasi-Newton methods (nonlinear systems)
Standard assumptions:
• F is continuously differentiable in an open convex set D
• F' is Lipschitz continuous on D
• There is x* in D such that F(x*) = 0 and F'(x*) is nonsingular
Newton's method: starting from an initial iterate x0,
xk+1 = xk – F'(xk)⁻¹ F(xk), {xk} → x*
repeated until a termination criterion is satisfied.
Quasi-Newton methods (nonlinear systems)
• Linear model around xk: Mk(x) = F(xk) + F'(xk)(x – xk)
• Setting Mk(x) = 0 gives xk+1 = xk – F'(xk)⁻¹ F(xk)
• In practice the iterates are computed by solving a linear system:
F'(xk) sk = F(xk)
xk+1 = xk – sk
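A minimal sketch of this iteration in Python/NumPy; the test problem, tolerance and iteration cap are illustrative choices, not part of the original slides:

import numpy as np

def newton(F, J, x0, tol=1e-10, max_iter=50):
    """Newton iteration: solve F'(xk) sk = F(xk), then xk+1 = xk - sk."""
    x = x0.astype(float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:      # termination criterion
            break
        s = np.linalg.solve(J(x), Fx)     # direct solve of the linear system
        x = x - s
    return x

# Hypothetical test problem: F(x) = [x0^2 + x1^2 - 1, x0 - x1]
F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
J = lambda x: np.array([[2*x[0], 2*x[1]], [1.0, -1.0]])
print(newton(F, J, np.array([1.0, 0.5])))   # converges to (1/sqrt(2), 1/sqrt(2))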
Quasi-Newton methods (nonlinear systems)
Evaluate F'(xk):
• Symbolically
• Numerically, with finite differences (a sketch follows below)
• Automatic differentiation
Solve the linear system F'(xk) sk = F(xk):
• Direct solve: LU, Cholesky
• Iterative methods: GMRES, CG
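For the finite-difference option, a rough sketch of a forward-difference Jacobian; the step size h is a naive, unscaled choice:

import numpy as np

def fd_jacobian(F, x, h=1e-7):
    """Approximate F'(x) column by column with forward differences."""
    Fx = F(x)
    n = x.size
    J = np.empty((Fx.size, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (F(x + e) - Fx) / h     # j-th column: dF/dx_j
    return J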
Quasi-Newton methods (nonlinear systems)
Computational cost per iteration:
• F(xk): n scalar function evaluations
• F'(xk): n² scalar function evaluations
• LU: O(2n³/3)
• Cholesky: O(n³/3)
• Krylov methods: depends on the condition number
Quasi-Newton methods (nonlinear systems)
• LU and Cholesky are useful when we want to reuse the factorization (e.g. in quasi-implicit schemes)
• They are difficult to parallelize and to load-balance
• Cholesky is faster and more stable, but requires an SPD matrix
• For large n (n ~ 10⁶), a full factorization becomes impractical
• Krylov methods are built from easily parallelizable kernels (vector updates, inner products, matrix-vector products)
• CG is faster and more stable, but requires an SPD matrix
Quasi-Newton methods (nonlinear systems)
Advantages:
• Under the standard assumptions, Newton's method converges locally and quadratically
• There exists a domain of attraction S containing the solution
• Once the iterates enter S, they stay in S and eventually converge to x*
• The algorithm is memoryless (self-correcting)
Quasi-Newton methods (nonlinear systems)
Disadvantages:
• Convergence depends on the choice of x0
• F'(xk) has to be evaluated at every iterate
• Each iteration can be expensive: F(xk), F'(xk), and the solve for sk
Quasi-Newton methods (nonlinear systems)
Where such systems arise: implicit schemes for ODEs, y' = f(t, y)
• Forward Euler: yn+1 = yn + h f(tn, yn) (explicit)
• Backward Euler: yn+1 = yn + h f(tn+1, yn+1) (implicit)
• Each step of an implicit scheme requires the solution of a nonlinear system (the same holds for Crank-Nicolson, implicit Runge-Kutta and linear multistep formulas)
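As an illustration of why a nonlinear solve is needed, one backward Euler step amounts to solving G(y) = y – yn – h f(tn+1, y) = 0 for yn+1; a sketch, assuming a Newton solver with the signature used earlier:

import numpy as np

def backward_euler_step(f, dfdy, tn, yn, h, newton):
    """One implicit (backward Euler) step: solve y = yn + h*f(tn+1, y) with Newton."""
    t_next = tn + h
    G = lambda y: y - yn - h * f(t_next, y)                 # nonlinear residual
    JG = lambda y: np.eye(yn.size) - h * dfdy(t_next, y)    # its Jacobian
    return newton(G, JG, yn)                                # yn is a natural initial guess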
Quasi-Newton methods (nonlinear systems)
• How can we avoid evaluating F'(xk)?
• Broyden's method:
Bk+1 = Bk + (yk – Bk sk) skᵀ / <sk, sk>
xk+1 = xk – Bk⁻¹ F(xk)
• Inverse update (via the Sherman-Morrison formula):
Hk+1 = Hk + (sk – Hk yk) skᵀ Hk / <sk, Hk yk>
xk+1 = xk – Hk F(xk)
where sk = xk+1 – xk and yk = F(xk+1) – F(xk)
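A minimal sketch of Broyden's method with the inverse (Sherman-Morrison) update from this slide; the identity as the initial H0, the tolerance and the iteration cap are illustrative choices:

import numpy as np

def broyden(F, x0, H0=None, tol=1e-10, max_iter=100):
    """Broyden's method with the inverse update: no Jacobian, no linear solve."""
    x = x0.astype(float)
    H = np.eye(x.size) if H0 is None else H0.copy()   # H approximates F'(x)^(-1)
    Fx = F(x)
    for _ in range(max_iter):
        if np.linalg.norm(Fx) < tol:
            break
        s = -H @ Fx                                   # xk+1 = xk - Hk F(xk)
        x_new = x + s
        F_new = F(x_new)
        y = F_new - Fx                                # yk = F(xk+1) - F(xk)
        Hy = H @ y
        H = H + np.outer(s - Hy, s @ H) / (s @ Hy)    # Hk+1 = Hk + (sk - Hk yk) skᵀ Hk / <sk, Hk yk>
        x, Fx = x_new, F_new
    return x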
Quasi-Newton methods (nonlinear systems)
Advantages:
• No need to compute F'(xk)
• With the inverse update there is no linear system to solve
Disadvantages:
• Convergence is only superlinear (instead of quadratic)
• The method is no longer memoryless
Quasi-Newton methods (unconstrained optimization)
• Problem: find the global minimizer of a cost function f : Rⁿ → R, x* = arg min f(x)
• If f is differentiable, the problem can be attacked by looking for zeros of the gradient ∇f
Quasi-Newton methods (unconstrained optimization)
• Descent methods: xk+1 = xk – λk Pk ∇f(xk)
Pk = In (steepest descent)
Pk = ∇²f(xk)⁻¹ (Newton's method)
Pk = Bk⁻¹ (quasi-Newton)
• The angle between Pk ∇f(xk) and ∇f(xk) must be less than 90°
• Bk has to mimic the behavior of the Hessian
Quasi-Newton methods (unconstrained optimization)
Global convergence strategies:
• Line search
  Step length: backtracking, interpolation
  Sufficient decrease: Wolfe conditions
• Trust regions
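A minimal backtracking line search enforcing the sufficient-decrease (Armijo) part of the Wolfe conditions; the constants c and rho are common textbook values, not taken from the slides:

import numpy as np

def backtracking(f, grad_f, x, p, alpha=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until f(x + alpha*p) <= f(x) + c*alpha*<grad f(x), p>."""
    fx = f(x)
    slope = grad_f(x) @ p              # negative if p is a descent direction
    for _ in range(50):                # cap the number of backtracking steps
        if f(x + alpha * p) <= fx + c * alpha * slope:
            break
        alpha *= rho
    return alpha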
Quasi-Newton methods (unconstrained optimization)
For quasi-Newton methods, Bk has to resemble ∇²f(xk). The updates differ in which properties they enforce:
• Single-rank updates
• Symmetry
• Positive definiteness
• Inverse updates
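The formulas themselves did not survive on this slide; purely as illustrations, here are the textbook symmetric rank-one (SR1) and rank-two BFGS updates for Bk, with sk = xk+1 – xk and yk = ∇f(xk+1) – ∇f(xk):

import numpy as np

def sr1_update(B, s, y):
    """Symmetric rank-one update: preserves symmetry with a single rank-1 correction."""
    r = y - B @ s
    return B + np.outer(r, r) / (r @ s)

def bfgs_update(B, s, y):
    """BFGS rank-two update: preserves symmetry and, when y.s > 0, positive definiteness."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)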
Quasi-Newton methods (unconstrained optimization)
Computation:
• Matrix updates, inner products
• DFP, PSB: 3 matrix-vector products per update
• BFGS: 2 matrix-matrix products per update
Storage:
• Limited-memory versions (L-BFGS)
• Store {sk, yk} for the last m iterations and recompute the action of H on a vector (see the two-loop recursion sketched below)
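A sketch of the standard two-loop recursion behind this idea; s_list and y_list are assumed to hold the last m pairs with the newest last, and the initial scaling gamma is one common choice:

import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Apply the limited-memory inverse Hessian to grad using only {si, yi}; H is never formed."""
    q = grad.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):   # newest pair first
        a = (s @ q) / (y @ s)
        q -= a * y
        alphas.append(a)
    if s_list:                                             # initial scaling H0 = gamma * I
        q *= (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):   # oldest pair first
        b = (y @ q) / (y @ s)
        q += (a - b) * s
    return q          # the search direction is then p = -q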
Further improvements
Preconditioning the linear system:
• For faster convergence, one may solve the preconditioned system K Bk pk = K F(xk)
• If Bk is SPD (and sparse), a sparse approximate inverse can be used to generate the preconditioner K
• This preconditioner can be refined on a subspace of Bk using an algebraic multigrid technique
• This requires solving an eigenvalue problem
Further improvements
Model reduction:
• Sometimes the dimension of the system is very large
• We look for a smaller model that captures the essence of the original
• An approximation of the model variability can be retrieved from an ensemble of forward simulations
• The covariance matrix of the ensemble gives the reduced subspace
• Again, we need to solve an eigenvalue problem
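A rough sketch of extracting a reduced subspace from an ensemble; the random ensemble, the dimensions and the truncation level r below are made-up placeholders:

import numpy as np

n, m, r = 500, 40, 10                          # state size, ensemble size, basis size (made up)
X = np.random.randn(n, m)                      # stand-in for an ensemble of forward simulations

Xc = X - X.mean(axis=1, keepdims=True)         # center the ensemble
C = (Xc @ Xc.T) / (m - 1)                      # sample covariance matrix (n x n)
w, V = np.linalg.eigh(C)                       # symmetric eigenvalue problem
basis = V[:, np.argsort(w)[::-1][:r]]          # r leading eigenvectors span the reduced subspace
x_red = basis.T @ Xc[:, 0]                     # project one state onto the reduced subspace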
QR/QL algorithms for symmetric matrices
• Solve the eigenvalue problem
• Iterative algorithm
• Uses a QR/QL factorization at each step (A = Q R, Q unitary, R upper triangular)
for k = 1, 2, ...
  Ak = Qk Rk
  Ak+1 = Rk Qk
end
• The diagonal of Ak converges to the eigenvalues of A
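A bare-bones version of this loop in Python (no Hessenberg/tridiagonal reduction, no shifts, no convergence test; the iteration count and test matrix are arbitrary):

import numpy as np

def qr_iteration(A, num_steps=200):
    """Unshifted QR iteration: Ak+1 = Rk Qk has the same eigenvalues as A."""
    Ak = A.copy()
    for _ in range(num_steps):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return np.diag(Ak)          # for symmetric A this converges to the eigenvalues

A = np.array([[4.0, 1.0], [1.0, 3.0]])
print(qr_iteration(A))          # compare with np.linalg.eigvalsh(A)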
QR/QL algorithms for symmetric matrices
• The matrix A is reduced to upper Hessenberg form before starting the iterations
• This is done column by column with Householder reflections, U = I – 2vvᵀ/(vᵀv)
• If A is symmetric, it is reduced to tridiagonal form
QR/QL algorithms for symmetric matrices
• Convergence to triangular (here, diagonal) form can be slow
• Origin shifts are used to accelerate it:
for k = 1, 2, ...
  Ak – zk I = Qk Rk
  Ak+1 = Rk Qk + zk I
end
• The Wilkinson shift is a standard choice for zk
• QR makes heavy use of matrix-matrix products
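A sketch of one shifted step, with the Wilkinson shift computed from the trailing 2x2 block of a symmetric Ak; deflation and the full sweep are omitted:

import numpy as np

def wilkinson_shift(A):
    """Eigenvalue of the trailing 2x2 block that is closest to A[-1, -1]."""
    a, b, c = A[-2, -2], A[-2, -1], A[-1, -1]
    d = (a - c) / 2.0
    sign = 1.0 if d >= 0 else -1.0
    return c - sign * b**2 / (abs(d) + np.hypot(d, b))

def shifted_qr_step(A):
    """One step of Ak - zk*I = Qk*Rk, Ak+1 = Rk*Qk + zk*I."""
    z = wilkinson_shift(A)
    n = A.shape[0]
    Q, R = np.linalg.qr(A - z * np.eye(n))
    return R @ Q + z * np.eye(n)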
Alternatives to quasi-Newton
Inexact Newton methods:
• Inner iteration: determine a search direction by solving the Newton linear system only up to a certain tolerance
• Only Hessian-vector products are necessary (see the sketch below)
• Outer iteration: line search along the search direction
Nonlinear CG:
• The residual is replaced by the gradient of the cost function
• Uses a line search
• Comes in different flavors (Fletcher-Reeves, Polak-Ribiere, ...)
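To make the "only Hessian-vector products" point concrete, here is a rough matrix-free sketch: the product ∇²f(x)·v is approximated by a gradient difference and fed to SciPy's CG through a LinearOperator. The step h and the default CG tolerance are naive choices, and the symmetry of the finite-difference operator is only approximate:

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def hessvec(grad_f, x, v, h=1e-6):
    """Finite-difference approximation of the Hessian-vector product H(x) @ v."""
    return (grad_f(x + h * v) - grad_f(x)) / h

def inexact_newton_direction(grad_f, x):
    """Inner iteration: solve H p = -grad f(x) with CG, matrix-free."""
    g = grad_f(x)
    H = LinearOperator((x.size, x.size), matvec=lambda v: hessvec(grad_f, x, v))
    p, _info = cg(H, -g)      # in practice, pass a loose tolerance for a truly inexact solve
    return p                  # the outer iteration then runs a line search along p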
Alternatives to quasi-Newton
Direct search:
• Does not use derivatives of the cost function
• Uses a structure called a simplex to search for decrease in f
• Stops when no further progress can be achieved
• Can get stuck in a local minimum
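The classic method in this family is Nelder-Mead; a usage sketch through SciPy, where the Rosenbrock function is just a stand-in test problem, not from the slides:

import numpy as np
from scipy.optimize import minimize

rosen = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2   # test function
res = minimize(rosen, x0=np.array([-1.2, 1.0]), method='Nelder-Mead')
print(res.x, res.fun)          # no derivatives of the cost function are ever evaluated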
More alternatives
Monte Carlo:
• A computational method relying on repeated random sampling
• Can be used for optimization (e.g. multidisciplinary design optimization) and for inverse problems, via random walks
• When sampling multiple correlated variables, the covariance matrix is SPD, so it can be factorized with Cholesky (see the sketch below)
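For example, correlated Gaussian samples can be drawn by factoring the covariance C = L Lᵀ and transforming independent standard normals; the 2x2 covariance below is a made-up example:

import numpy as np

C = np.array([[1.0, 0.8],
              [0.8, 1.0]])                   # SPD covariance/correlation matrix (example)
L = np.linalg.cholesky(C)                    # C = L @ L.T
z = np.random.standard_normal((2, 10000))    # independent standard normal samples
x = L @ z                                    # columns of x now have covariance close to C
print(np.cov(x))                             # should be close to C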
Conclusions
• Newton's method is a powerful tool with many applications (solving nonlinear systems, finding minima of cost functions), and it is used together with many other numerical algorithms (factorizations, linear solvers, eigenvalue solvers)
• Optimizing and parallelizing matrix-vector and matrix-matrix products, decompositions and the other numerical kernels can have a significant impact on overall performance