In Lecture 25 of ECE 530, Prof. Tom Overbye discusses Krylov subspace methods, which are pivotal in solving large-scale electrical systems efficiently. Key topics include the Cayley-Hamilton theorem, the definition of Krylov subspaces, and techniques such as steepest descent and the conjugate gradient method. The lecture emphasizes iterative approaches built on Krylov subspaces that avoid the cost of dense direct solutions and allow linear equations to be solved efficiently using only matrix-vector products. Homework and exam details are also provided.
ECE 530 – Analysis Techniques for Large-Scale Electrical Systems Lecture 25: Krylov Subspace Methods Prof. Tom Overbye Dept. of Electrical and Computer Engineering University of Illinois at Urbana-Champaign overbye@illinois.edu
Announcements • HW 8 is due Thursday Dec 5 • Final exam Wednesday Dec 18 from 1:30 to 4:30pm in this room (EL 260) • Closed book, closed notes; you may bring in one new note sheet and your exam 1 note sheet, along with simple calculators
Krylov Subspace Outline • Covered in Lecture 24 • Review of fields and vector spaces • Eigensystem basics • Definition of Krylov subspaces and annihilating polynomial • Generic Krylov subspace solver • Steepest descent • Covered Today • Conjugate gradient • Arnoldi process
Cayley-Hamilton Theorem and Minimum Polynomial • The Cayley-Hamilton theorem states that every square matrix satisfies its own characteristic equation: if p(λ) = det(λI – A) is the characteristic polynomial of A, then p(A) = 0 • The minimal polynomial is the polynomial q(λ) of minimum degree m such that q(A) = 0 • The minimal polynomial and characteristic polynomial are the same if A has n distinct eigenvalues
Cayley-Hamilton Theorem and Minimum Polynomial • This allows us to express A^-1 in terms of powers of A: if the characteristic polynomial is p(λ) = λ^n + c_1 λ^(n-1) + … + c_(n-1) λ + c_n with c_n ≠ 0, then A^-1 = -(1/c_n)(A^(n-1) + c_1 A^(n-2) + … + c_(n-1) I) • For the previous example this relationship can be verified directly
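The relationship above is easy to verify numerically. Below is a minimal Python/NumPy sketch; the 3×3 matrix is an illustrative stand-in (the slide's own example matrix is not reproduced here), and numpy.poly is used to obtain the characteristic-polynomial coefficients.

    import numpy as np

    # Illustrative symmetric 3x3 matrix (not the lecture's example)
    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    n = A.shape[0]

    # Characteristic polynomial coefficients [1, c_1, ..., c_n], highest power first
    c = np.poly(A)

    # Cayley-Hamilton: A^n + c_1 A^(n-1) + ... + c_n I = 0, so
    # A^-1 = -(1/c_n) (A^(n-1) + c_1 A^(n-2) + ... + c_(n-1) I)
    Ainv = np.zeros_like(A)
    for k in range(n):
        Ainv += c[k] * np.linalg.matrix_power(A, n - 1 - k)
    Ainv *= -1.0 / c[n]

    print(np.allclose(Ainv, np.linalg.inv(A)))   # expect True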
Solution of Linear Equations • As covered previously, the solution of a dense (i.e., non-sparse) system of n equations is O(n^3) • Even for a sparse A the direct solution of linear equations can be computationally expensive, and the previous techniques are not easy to parallelize • We next present an alternative, iterative approach that obtains the solution using Krylov subspace based methods • Builds on the idea that we can express x = A^-1 b
Definition of a Krylov Subspace • Given a matrix A and a vector v, the ith order Krylov subspace is defined as K_i(v, A) = span{v, Av, A^2 v, …, A^(i-1) v} • Clearly, i cannot be made arbitrarily large; in fact, for a matrix A of rank n, i ≤ n • For a specified matrix A and a vector v, the largest value of i is given by the order of the annihilating polynomial
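The definition translates directly into code: the columns v, Av, …, A^(i-1) v span K_i(v, A). A minimal NumPy sketch follows, with an illustrative matrix and vector (not taken from the lecture).

    import numpy as np

    def krylov_basis(A, v, i):
        """Return the matrix whose columns v, Av, ..., A^(i-1) v span K_i(v, A)."""
        cols = [v]
        for _ in range(i - 1):
            cols.append(A @ cols[-1])
        return np.column_stack(cols)

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    v = np.array([1.0, 0.0, 0.0])
    K = krylov_basis(A, v, 3)
    print(np.linalg.matrix_rank(K))   # never exceeds n, consistent with i <= n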
Generic Krylov Subspace Solver • The following is a generic Krylov subspace solver method for solving Ax = b using only matrix-vector multiplies • Step 1: Start with an initial guess x(0) and some predefined error tolerance e > 0; compute the residual r(0) = b – A x(0); set i = 0 • Step 2: While ||r(i)|| > e Do (a) i := i + 1 (b) get Ki(r(0), A) (c) get x(i) ∈ {x(0) + Ki(r(0), A)} to minimize ||r(i)|| • Stop
Krylov Subspace Solver • Note that no calculations are performed in Step 2 once i becomes greater than the order of the annihilating polynomial • The Krylov subspace methods differ from each other in • the construction scheme of the Krylov subspace in Step 2(b) • the residual minimization criterion used in Step 2(c) • A common initial guess is x(0) = 0, giving r(0) = b – A x(0) = b
Krylov Subspace Solver • Every solver involves the A matrix only in matrix-vector products: A^i r(0), i = 1, 2, … • The methods strive to effectively exploit the spectral structure of A with the aim of making the overall procedure computationally efficient • To do so, the spectral information of A is used; for this purpose we order the eigenvalues of A according to their absolute values, with |λ_1| ≥ |λ_2| ≥ … ≥ |λ_n|
Steepest Descent Approach • Presented first since it is easiest to explain • Assume in Ax = b that A is symmetric, positive definite (all eigenvalues real and positive), so x^T A x > 0 for all x ≠ 0 • Let f(x) = (1/2) x^T A x – b^T x • The solution x* that minimizes f(x) is given by ∇f(x*) = A x* – b = 0 • Which is obviously the solution to Ax = b
Steepest Descent Algorithm • Set i = 0, e > 0, x(0) = 0, so r(i) = b – A x(0) = b • While ||r(i)|| > e Do (a) calculate a(i) = (r(i)^T r(i)) / (r(i)^T A r(i)) (b) x(i+1) = x(i) + a(i) r(i) (c) r(i+1) = r(i) – a(i) A r(i) (d) i := i + 1 End While • Note there is only one matrix-vector multiply per iteration
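The loop above maps almost line-for-line onto code. A minimal Python/NumPy sketch of the iteration, with an illustrative symmetric positive definite A and right-hand side b (not the lecture's values):

    import numpy as np

    def steepest_descent(A, b, eps=0.1, max_iter=1000):
        """Solve Ax = b for symmetric positive definite A by steepest descent."""
        x = np.zeros_like(b)          # x(0) = 0
        r = b - A @ x                 # r(0) = b
        i = 0
        while np.linalg.norm(r) > eps and i < max_iter:
            Ar = A @ r                # the single matrix-vector multiply per iteration
            alpha = (r @ r) / (r @ Ar)
            x = x + alpha * r
            r = r - alpha * Ar
            i += 1
        return x, i

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    b = np.array([1.0, 2.0, 3.0])
    x, iters = steepest_descent(A, b)
    print(iters, np.linalg.norm(b - A @ x))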
Steepest Descent Example • Keep in mind these algorithms are designed for large systems, whereas the example is necessarily small! • Let A be a 3×3 symmetric, positive definite matrix with right-hand side b • At the solution, f(x*) = -29.03 • Select x(0) = 0, e = 0.1; then r(0) = b
Steepest Descent Convergence • We define the A-norm of x as ||x||_A = sqrt(x^T A x) • We can show that ||e(i+1)||_A ≤ ((κ – 1)/(κ + 1)) ||e(i)||_A, where κ = λ_max/λ_min is the condition number of A • For our example κ ≈ 26.8, and (κ – 1)/(κ + 1) = 0.928
Steepest Descent Convergence • Because (κ – 1)/(κ + 1) < 1 the error will decrease with each steepest descent iteration, albeit potentially quite slowly (as in our example) • The function value decreases more quickly, as per f(x(i)) – f(x*) ≤ ((κ – 1)/(κ + 1))^(2i) (f(x(0)) – f(x*)), but this can still be quite slow if κ is large • If you are thinking, "there must be a better way," you are correct. The problem is we are continually searching along the same set of directions
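Both the A-norm and the convergence factor are straightforward to compute for a small test matrix; a brief sketch, again with an illustrative matrix rather than the lecture's example:

    import numpy as np

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    x = np.array([1.0, -1.0, 2.0])

    A_norm_x = np.sqrt(x @ A @ x)              # ||x||_A = sqrt(x^T A x)
    eigs = np.linalg.eigvalsh(A)               # real eigenvalues of the symmetric A
    kappa = eigs.max() / eigs.min()            # condition number lambda_max / lambda_min
    factor = (kappa - 1.0) / (kappa + 1.0)     # per-iteration error reduction bound
    print(A_norm_x, kappa, factor)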
Conjugate Direction Methods • Steepest descent often finds itself taking steps along the same direction as that of its earlier steps • An improvement over this is to take the exact number of steps using a set of search directions and obtain the solution after n such steps; this is the basic idea in the conjugate direction methods • Image compares steepest descent with a conjugate directions approach Image Source: http://en.wikipedia.org/wiki/File:Conjugate_gradient_illustration.svg
Conjugate Direction Methods • Assume Ax = b with A symmetric, positive definite • We denote the n search directions by d(0), d(1), …, d(n-1) • At the ith iteration, we construct a(i) to obtain the value of the new iterate x(i+1) = x(i) + a(i) d(i)
Conjugate Direction Methods • Proposition: If A is positive definite, and the set of nonzero vectors d(0), d(1), …, d(n-1) are A-orthogonal, then these vectors are l.i. • Proof: Suppose there are constants a_j, j = 0, 1, 2, …, n-1 such that a_0 d(0) + a_1 d(1) + … + a_(n-1) d(n-1) = 0. Multiplying by A and then taking the scalar product with d(i) gives a_i d(i)^T A d(i) = 0, since the other terms vanish by A-orthogonality. Since A is positive definite, it follows that a_i = 0. Hence the vectors are l.i. (Recall the vectors are l.i. only if all a's = 0)
Conjugate Direction Methods • The proposition that the vectors are l.i. implies that, with x(0) = 0, the solution can be expanded as x* = Σ_(j=0)^(n-1) a(j) d(j) • Multiplying by d(i)^T A and using A-orthogonality gives a(i) = d(i)^T A x* / (d(i)^T A d(i)) = d(i)^T b / (d(i)^T A d(i))
Residual and Error • In solving these problems it is helpful to keep in mind the definitions of the residual and the error: r(i) = b – A x(i) and e(i) = x(i) – x* • The residual is easy to compute. However, during the iteration we cannot compute the error since we do not know x* • Both go to zero as we converge
Conjugate Directions Method • The value of a(i) is chosen so that e(i+1) is A-orthogonal to d(i); in this way we need not take another step along the direction d(i): d(i)^T A e(i+1) = d(i)^T A (e(i) + a(i) d(i)) = 0, or a(i) = –d(i)^T A e(i) / (d(i)^T A d(i))
Conjugate Directions Method • To show that this procedure computes x* in n steps, we express e(0) in terms of the search directions: e(0) = Σ_(j=0)^(n-1) δ(j) d(j) • Since the d(j) are A-orthogonal, for 0 ≤ k ≤ n-1, d(k)^T A e(0) = δ(k) d(k)^T A d(k)
Conjugate Directions Method so that δ(k) = d(k)^T A e(0) / (d(k)^T A d(k)) • The previous equation may also be rewritten, using the A-orthogonality of the d(j), as δ(k) = d(k)^T A e(k) / (d(k)^T A d(k)) = –a(k)
Conjugate Directions Method • The function f(x(i) + a d(i)) attains its minimum at the value a(i) at which its derivative with respect to a vanishes: d/da f(x(i) + a d(i)) = –d(i)^T r(i+1) = 0 • Therefore, at each iteration, the component of the residual along the direction d(i) disappears
Conjugate Direction Methods • In fact, a(i) = –d(i)^T A e(i) / (d(i)^T A d(i)) • But this scheme explicitly requires e(i), a value which, as defined, we do not know • Rather, we evaluate it in terms of the known quantities
Conjugate Direction Methods • But A e(i) = A(x(i) – x*) = –r(i), and so we can write a(i) = d(i)^T r(i) / (d(i)^T A d(i)) • The conjugate directions algorithm consists of the following iterations: x(i+1) = x(i) + a(i) d(i), r(i+1) = r(i) – a(i) A d(i), with a(i) computed as above
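Although the construction of the search directions is covered later, the conjugate-directions iteration itself can already be illustrated with any A-orthogonal set. The sketch below uses the eigenvectors of a small symmetric positive definite matrix, which are A-orthogonal because d(i)^T A d(j) = λ_j d(i)^T d(j) = 0 for i ≠ j; the matrix and right-hand side are illustrative only.

    import numpy as np

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    b = np.array([1.0, 2.0, 3.0])
    n = len(b)

    # Eigenvectors of a symmetric matrix are mutually orthogonal, hence A-orthogonal
    _, V = np.linalg.eigh(A)
    directions = [V[:, j] for j in range(n)]

    x = np.zeros(n)
    r = b - A @ x
    for d in directions:
        alpha = (d @ r) / (d @ (A @ d))   # a(i) = d(i)^T r(i) / (d(i)^T A d(i))
        x = x + alpha * d
        r = r - alpha * (A @ d)

    print(np.linalg.norm(b - A @ x))      # essentially zero after n steps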
Conjugate Directions Method What we have not yet covered is how to get the n A-orthogonal search directions. We'll cover that shortly, but the next slide presents an algorithm, followed by an example.
Conjugate Gradient Algorithm • Set i = 0, e > 0, x(0) = 0, so r(i) = b – A x(0) = b • While ||r(i)|| > e Do (a) If i = 0 Then d(i+1) = r(i) Else Begin b(i+1) = (r(i)^T r(i)) / (r(i-1)^T r(i-1)); d(i+1) = r(i) + b(i+1) d(i) End
Conjugate Gradient Algorithm (b) a(i+1) = (r(i)^T r(i)) / (d(i+1)^T A d(i+1)) (c) x(i+1) = x(i) + a(i+1) d(i+1) (d) r(i+1) = r(i) – a(i+1) A d(i+1) (e) i := i + 1 End While Note that there is only one matrix-vector multiply per iteration!
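Steps (a)-(e) assemble into the following Python/NumPy sketch of the conjugate gradient iteration; the function name and the test matrix are illustrative, not the lecture's.

    import numpy as np

    def conjugate_gradient(A, b, eps=0.1, max_iter=None):
        """Solve Ax = b for symmetric positive definite A by conjugate gradients."""
        n = len(b)
        max_iter = max_iter if max_iter is not None else n
        x = np.zeros_like(b)              # x(0) = 0
        r = b - A @ x                     # r(0) = b
        d = r.copy()                      # first direction d(1) = r(0)
        rs_old = r @ r
        i = 0
        while np.sqrt(rs_old) > eps and i < max_iter:
            Ad = A @ d                    # the single matrix-vector multiply
            alpha = rs_old / (d @ Ad)     # step (b)
            x = x + alpha * d             # step (c)
            r = r - alpha * Ad            # step (d)
            rs_new = r @ r
            beta = rs_new / rs_old        # ratio of successive residual norms squared
            d = r + beta * d              # next search direction, step (a)
            rs_old = rs_new
            i += 1
        return x, i

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    b = np.array([1.0, 2.0, 3.0])
    x, iters = conjugate_gradient(A, b)
    print(iters, np.linalg.norm(b - A @ x))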
Conjugate Gradient Example • Using the same A and b as before • Select i = 0, x(0) = 0, e = 0.1; then r(0) = b • With i = 0, d(1) = r(0) = b
Conjugate Gradient Example This first step exactly matches Steepest Descent
Conjugate Gradient Example • With i = 1, solve for b(2) • Then compute d(2), a(2), x(2), and r(2)
Conjugate Gradient Example • With i = 2, solve for b(3) • Then compute d(3), a(3), x(3), and r(3)
Conjugate Gradient Example • And we are done in 3 = n iterations!
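The n-step property can also be checked against a library routine. Assuming SciPy is available, scipy.sparse.linalg.cg accepts a callback that is invoked once per iteration, which lets the iteration count be observed on a random symmetric positive definite system (this check is an addition, not part of the lecture example).

    import numpy as np
    from scipy.sparse.linalg import cg

    rng = np.random.default_rng(0)
    n = 3
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)        # random symmetric positive definite matrix
    b = rng.standard_normal(n)

    count = {"iters": 0}
    def callback(xk):                  # called once per CG iteration
        count["iters"] += 1

    x, info = cg(A, b, callback=callback)
    print(info, count["iters"], np.linalg.norm(b - A @ x))   # expect at most n iterations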
Gram-Schmidt Orthogonalization • The trick is to quickly generate the A-orthogonal search directions • One can make use of the Gram-Schmidt orthogonalization procedure • Let {u_0, u_1, …, u_(n-1)} be a l.i. set of n vectors • We construct each successive direction, d(i), i = 0, 1, 2, …, n-1, by removing from u_i all the components along the previously constructed directions d(0), …, d(i-1) • In the conjugate gradient method u_i = r(i)
Gram-Schmidt Orthogonalization • We set d(0) = u_0, and for i > 0 we set d(i) = u_i + Σ_(k=0)^(i-1) b_ik d(k) • We obtain the b_ik by using the A-orthogonality condition on the d(k)
Gram-Schmidt Orthogonalization • Solving for b_ik we get b_ik = –(u_i^T A d(k)) / (d(k)^T A d(k)) • A direct application of this approach is not computationally practical since at each iteration all of the known search directions are involved in the computation
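For completeness, a direct (and, as noted above, computationally impractical for large n) Gram-Schmidt conjugation might look like the sketch below; the starting vectors u_i and the matrix are illustrative.

    import numpy as np

    def a_orthogonalize(A, U):
        """Gram-Schmidt conjugation: turn l.i. columns of U into A-orthogonal directions."""
        n = U.shape[1]
        D = np.zeros_like(U)
        for i in range(n):
            d = U[:, i].copy()
            for k in range(i):
                # b_ik = - u_i^T A d(k) / (d(k)^T A d(k))
                b_ik = -(U[:, i] @ (A @ D[:, k])) / (D[:, k] @ (A @ D[:, k]))
                d = d + b_ik * D[:, k]
            D[:, i] = d
        return D

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    U = np.eye(3)                      # the standard basis is a l.i. starting set
    D = a_orthogonalize(A, U)
    print(np.round(D.T @ A @ D, 10))   # off-diagonal entries are (numerically) zero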
Gram-Schmidt Orthogonalization • Since in conjugate gradient the residual is orthogonal to the previous search directions, and unless the residual is 0, a new vector is created and added to the l.i. set of search directions • This makes the Gram-Schmidt orthogonalization significantly simpler
Gram-Schmidt Orthogonalization • The new residual is orthogonal to the previous search directions, which span the same subspace as the set {r(0), r(1), …, r(i-1)}, with the property that r(i)^T r(j) = 0 for j < i
Gram-Schmidt Orthogonalization • But in the conjugate gradient approach r(i) is a linear combination of the previous residuals and A d(i-1) • Therefore span{r(0), r(1), …, r(i-1)} = span{r(0), A r(0), …, A^(i-1) r(0)} • Hence this subspace is simply the ith Krylov subspace K_i(r(0), A) • Since r(i+1) is orthogonal to the space K_(i+1)(r(0), A) and A K_i(r(0), A) ⊆ K_(i+1)(r(0), A), r(i+1) is A-orthogonal to the previous search directions, but not to d(i)
Gram-Schmidt Orthogonalization • Also, since r(j+1) = r(j) – a(j) A d(j), then r(i)^T r(j+1) = r(i)^T r(j) – a(j) r(i)^T A d(j) • Which gives r(i)^T A d(j) = (r(i)^T r(j) – r(i)^T r(j+1)) / a(j)
Gram-Schmidt Orthogonalization and so, by the mutual orthogonality of the residuals, with u_i = r(i) the only nonzero Gram-Schmidt coefficient is b_(i,i-1) • Therefore, the previous search directions and b_ij values do not need to be stored • Also we set b(i) = b_(i,i-1)
Gram-Schmidt Orthogonalization and evaluate b(i) = (r(i)^T r(i)) / (r(i-1)^T r(i-1)), where we make use of a(i-1) = (r(i-1)^T r(i-1)) / (d(i-1)^T A d(i-1))
Conjugate Gradient Convergence • We can show that ||e(i)||_A ≤ 2 ((√κ – 1)/(√κ + 1))^i ||e(0)||_A • This compares with the factor (κ – 1)/(κ + 1) for steepest descent • In the previous example with κ = 26.8, (√κ – 1)/(√κ + 1) ≈ 0.68
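A quick arithmetic comparison of the two convergence factors for κ = 26.8 (computed here for illustration):

    import numpy as np

    kappa = 26.8
    sd_factor = (kappa - 1.0) / (kappa + 1.0)                      # steepest descent: about 0.928
    cg_factor = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)    # conjugate gradient: about 0.68
    print(sd_factor, cg_factor)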
THE ARNOLDI PROCESS • One simple way around this problem is to remove from each new vector that is constructed those components that lie along the directions already generated previously; in addition, we also have the option to renormalize each vector in this process
THE ARNOLDI PROCESS • At step i, to construct K_(i+1)(v, A) we define the matrix Q_i = [q(1), q(2), …, q(i)] with orthonormal vectors q(j) spanning K_i(v, A) • We increase by 1 the dimension of the subspace by computing w = A q(i)
THE ARNOLDI PROCESS • We orthogonalize w w.r.t. the vectors q(1), …, q(i) by setting w' = w – Σ_(j=1)^(i) h_ji q(j), where the h_ji = q(j)^T w are chosen so that q(j)^T w' = 0 for j = 1, …, i • We normalize and define q(i+1) = w' / h_(i+1,i), with h_(i+1,i) = ||w'||
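A minimal Python/NumPy sketch of the Arnoldi process as described above; it returns the orthonormal vectors as columns of Q together with the coefficients h_ji, uses 0-based indexing, and the test matrix and starting vector are illustrative.

    import numpy as np

    def arnoldi(A, v, m):
        """Build an orthonormal basis Q of K_m(v, A) and the (m+1)-by-m Hessenberg matrix H."""
        n = len(v)
        Q = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        Q[:, 0] = v / np.linalg.norm(v)
        for i in range(m):
            w = A @ Q[:, i]                      # expand the subspace: w = A q(i)
            for j in range(i + 1):               # orthogonalize w against q(1), ..., q(i)
                H[j, i] = Q[:, j] @ w
                w = w - H[j, i] * Q[:, j]
            H[i + 1, i] = np.linalg.norm(w)
            if H[i + 1, i] < 1e-12:              # breakdown: the Krylov subspace is invariant
                return Q[:, :i + 1], H[:i + 2, :i + 1]
            Q[:, i + 1] = w / H[i + 1, i]        # normalize to get q(i+1)
        return Q, H

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    v = np.array([1.0, 0.0, 0.0])
    Q, H = arnoldi(A, v, 3)
    print(np.round(Q.T @ Q, 10))                 # columns of Q are orthonormal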