In Lecture 25 of ECE 530, Prof. Tom Overbye discusses Krylov subspace methods, which are pivotal in solving large-scale electrical systems efficiently. Key topics include the Cayley-Hamilton theorem, the definition of Krylov subspaces, and techniques such as steepest descent and the conjugate gradient method. The lecture emphasizes iterative approaches built on Krylov subspaces that avoid the cost of dense direct solutions and allow linear equations to be solved efficiently using only matrix-vector products. Homework and exam details are also provided.
ECE 530 – Analysis Techniques for Large-Scale Electrical Systems Lecture 25: Krylov Subspace Methods Prof. Tom Overbye Dept. of Electrical and Computer Engineering University of Illinois at Urbana-Champaign overbye@illinois.edu
Announcements • HW 8 is due Thursday Dec 5 • Final exam Wednesday Dec 18 from 1:30 to 4:30pm in this room (EL 260) • Closed book, closed notes; you may bring in one new note sheet and your exam 1 note sheet, along with simple calculators
Krylov Subspace Outline • Covered in Lecture 24 • Review of fields and vector spaces • Eigensystem basics • Definition of Krylov subspaces and annihilating polynomial • Generic Krylov subspace solver • Steepest descent • Covered Today • Conjugate gradient • Arnoldi process
Cayley-Hamilton Theorem and Minimum Polynomial • The Cayley-Hamilton theorem states that every square matrix satisfies its own characteristic equation: if p(λ) = det(λI – A) is the characteristic polynomial of A, then p(A) = 0 • The minimal polynomial is the polynomial q(λ) of minimum degree m such that q(A) = 0 • The minimal polynomial and characteristic polynomial are the same if A has n distinct eigenvalues
Cayley-Hamilton Theorem and Minimum Polynomial • This allows us to express A^-1 in terms of powers of A: if the characteristic polynomial is p(λ) = λ^n + c_1 λ^(n-1) + … + c_(n-1) λ + c_n with c_n ≠ 0, then A^-1 = -(1/c_n)(A^(n-1) + c_1 A^(n-2) + … + c_(n-1) I) • For the previous example this relationship can be verified directly
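The relationship above is easy to verify numerically. Below is a minimal Python/NumPy sketch; the 3×3 matrix is an illustrative stand-in (the slide's own example matrix is not reproduced here), and numpy.poly is used to obtain the characteristic-polynomial coefficients.

    import numpy as np

    # Illustrative symmetric 3x3 matrix (not the lecture's example)
    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    n = A.shape[0]

    # Characteristic polynomial coefficients [1, c_1, ..., c_n], highest power first
    c = np.poly(A)

    # Cayley-Hamilton: A^n + c_1 A^(n-1) + ... + c_n I = 0, so
    # A^-1 = -(1/c_n) (A^(n-1) + c_1 A^(n-2) + ... + c_(n-1) I)
    Ainv = np.zeros_like(A)
    for k in range(n):
        Ainv += c[k] * np.linalg.matrix_power(A, n - 1 - k)
    Ainv *= -1.0 / c[n]

    print(np.allclose(Ainv, np.linalg.inv(A)))   # expect True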
Solution of Linear Equations • As covered previously, the solution of a dense (i.e., non-sparse) system of n equations is O(n^3) • Even for a sparse A the direct solution of linear equations can be computationally expensive, and the previous techniques are not easy to parallelize • We next present an alternative, iterative approach that obtains the solution using Krylov subspace based methods • Builds on the idea that we can express x = A^-1 b
Definition of a Krylov Subspace • Given a matrix A and a vector v, the ith order Krylov subspace is defined as K_i(v, A) = span{v, Av, A^2 v, …, A^(i-1) v} • Clearly, i cannot be made arbitrarily large; in fact, for a matrix A of rank n, i ≤ n • For a specified matrix A and a vector v, the largest value of i is given by the order of the annihilating polynomial
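The definition translates directly into code: the columns v, Av, …, A^(i-1) v span K_i(v, A). A minimal NumPy sketch follows, with an illustrative matrix and vector (not taken from the lecture).

    import numpy as np

    def krylov_basis(A, v, i):
        """Return the matrix whose columns v, Av, ..., A^(i-1) v span K_i(v, A)."""
        cols = [v]
        for _ in range(i - 1):
            cols.append(A @ cols[-1])
        return np.column_stack(cols)

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    v = np.array([1.0, 0.0, 0.0])
    K = krylov_basis(A, v, 3)
    print(np.linalg.matrix_rank(K))   # never exceeds n, consistent with i <= n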
Generic Krylov Subspace Solver • The following is a generic Krylov subspace solver method for solving Ax = b using only matrix-vector multiplies • Step 1: Start with an initial guess x(0) and some predefined error tolerance e > 0; compute the residual r(0) = b – A x(0); set i = 0 • Step 2: While ||r(i)|| > e Do (a) i := i + 1 (b) get Ki(r(0), A) (c) get x(i) ∈ {x(0) + Ki(r(0), A)} to minimize ||r(i)|| • Stop
Krylov Subspace Solver • Note that no calculations are performed in Step 2 once i becomes greater than the order of the annihilating polynomial • The Krylov subspace methods differ from each other in • the construction scheme of the Krylov subspace in Step 2(b) • the residual minimization criterion used in Step 2(c) • A common initial guess is x(0) = 0, giving r(0) = b – A x(0) = b
Krylov Subspace Solver • Every solver involves the A matrix only in matrix-vector products: A^i r(0), i = 1, 2, … • The methods strive to effectively exploit the spectral structure of A with the aim of making the overall procedure computationally efficient • To do so, the spectral information of A is used; for this purpose we order the eigenvalues of A according to their absolute values, with |λ_1| ≥ |λ_2| ≥ … ≥ |λ_n|
Steepest Descent Approach • Presented first since it is easiest to explain • Assume in Ax = b that A is symmetric, positive definite (all eigenvalues real and positive), so x^T A x > 0 for all x ≠ 0 • Let f(x) = (1/2) x^T A x – b^T x • The solution x* that minimizes f(x) is given by ∇f(x*) = A x* – b = 0 • Which is obviously the solution to Ax = b
Steepest Descent Algorithm • Set i = 0, e > 0, x(0) = 0, so r(i) = b – A x(0) = b • While ||r(i)|| > e Do (a) calculate a(i) = (r(i)^T r(i)) / (r(i)^T A r(i)) (b) x(i+1) = x(i) + a(i) r(i) (c) r(i+1) = r(i) – a(i) A r(i) (d) i := i + 1 End While • Note there is only one matrix-vector multiply per iteration
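The loop above maps almost line-for-line onto code. A minimal Python/NumPy sketch of the iteration, with an illustrative symmetric positive definite A and right-hand side b (not the lecture's values):

    import numpy as np

    def steepest_descent(A, b, eps=0.1, max_iter=1000):
        """Solve Ax = b for symmetric positive definite A by steepest descent."""
        x = np.zeros_like(b)          # x(0) = 0
        r = b - A @ x                 # r(0) = b
        i = 0
        while np.linalg.norm(r) > eps and i < max_iter:
            Ar = A @ r                # the single matrix-vector multiply per iteration
            alpha = (r @ r) / (r @ Ar)
            x = x + alpha * r
            r = r - alpha * Ar
            i += 1
        return x, i

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    b = np.array([1.0, 2.0, 3.0])
    x, iters = steepest_descent(A, b)
    print(iters, np.linalg.norm(b - A @ x))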
Steepest Descent Example • Keep in mind these algorithms are designed for large systems, whereas the example is necessarily small! • Let A be a 3×3 symmetric, positive definite matrix with right-hand side b • At the solution, f(x*) = -29.03 • Select x(0) = 0, e = 0.1; then r(0) = b
Steepest Descent Convergence • We define the A-norm of x as ||x||_A = sqrt(x^T A x) • We can show that ||e(i+1)||_A ≤ ((κ – 1)/(κ + 1)) ||e(i)||_A, where κ = λ_max/λ_min is the condition number of A • For our example κ ≈ 26.8, and (κ – 1)/(κ + 1) = 0.928
Steepest Descent Convergence • Because (κ – 1)/(κ + 1) < 1 the error will decrease with each steepest descent iteration, albeit potentially quite slowly (as in our example) • The function value decreases more quickly, as per f(x(i)) – f(x*) ≤ ((κ – 1)/(κ + 1))^(2i) (f(x(0)) – f(x*)), but this can still be quite slow if κ is large • If you are thinking, "there must be a better way," you are correct. The problem is we are continually searching along the same set of directions
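Both the A-norm and the convergence factor are straightforward to compute for a small test matrix; a brief sketch, again with an illustrative matrix rather than the lecture's example:

    import numpy as np

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    x = np.array([1.0, -1.0, 2.0])

    A_norm_x = np.sqrt(x @ A @ x)              # ||x||_A = sqrt(x^T A x)
    eigs = np.linalg.eigvalsh(A)               # real eigenvalues of the symmetric A
    kappa = eigs.max() / eigs.min()            # condition number lambda_max / lambda_min
    factor = (kappa - 1.0) / (kappa + 1.0)     # per-iteration error reduction bound
    print(A_norm_x, kappa, factor)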
Conjugate Direction Methods • Steepest descent often finds itself taking steps along the same direction as that of its earlier steps • An improvement over this is to take the exact number of steps using a set of search directions and obtain the solution after n such steps; this is the basic idea in the conjugate direction methods • Image compares steepest descent with a conjugate directions approach Image Source: http://en.wikipedia.org/wiki/File:Conjugate_gradient_illustration.svg
Conjugate Direction Methods • Assume Ax = b with A symmetric, positive definite • We denote the n search directions by d(0), d(1), …, d(n-1) • At the ith iteration, we construct a(i) to obtain the value of the new iterate x(i+1) = x(i) + a(i) d(i)
Conjugate Direction Methods • Proposition: If A is positive definite, and the set of nonzero vectors d(0), d(1), …, d(n-1) are A-orthogonal, then these vectors are l.i. • Proof: Suppose there are constants a_j, j = 0, 1, 2, …, n-1 such that a_0 d(0) + a_1 d(1) + … + a_(n-1) d(n-1) = 0. Multiplying by A and then taking the scalar product with d(i) gives a_i d(i)^T A d(i) = 0, since the other terms vanish by A-orthogonality. Since A is positive definite, it follows that a_i = 0. Hence the vectors are l.i. (Recall the vectors are l.i. only if all a's = 0)
Conjugate Direction Methods • The proposition that the vectors are l.i. implies that, with x(0) = 0, the solution can be expanded as x* = Σ_(j=0)^(n-1) a(j) d(j) • Multiplying by d(i)^T A and using A-orthogonality gives a(i) = d(i)^T A x* / (d(i)^T A d(i)) = d(i)^T b / (d(i)^T A d(i))
Residual and Error • In solving these problems it is helpful to keep in mind the definitions of the residual and the error: r(i) = b – A x(i) and e(i) = x(i) – x* • The residual is easy to compute. However, during the iteration we cannot compute the error since we do not know x* • Both go to zero as we converge
Conjugate Directions Method • The value of a(i) is chosen so that e(i+1) is A-orthogonal to d(i); in this way we need not take another step along the direction d(i): d(i)^T A e(i+1) = d(i)^T A (e(i) + a(i) d(i)) = 0, or a(i) = –d(i)^T A e(i) / (d(i)^T A d(i))
Conjugate Directions Method • To show that this procedure computes x* in n steps, we express e(0) in terms of the search directions: e(0) = Σ_(j=0)^(n-1) δ(j) d(j) • Since the d(j) are A-orthogonal, for 0 ≤ k ≤ n-1, d(k)^T A e(0) = δ(k) d(k)^T A d(k)
Conjugate Directions Method so that δ(k) = d(k)^T A e(0) / (d(k)^T A d(k)) • The previous equation may also be rewritten, using the A-orthogonality of the d(j), as δ(k) = d(k)^T A e(k) / (d(k)^T A d(k)) = –a(k)
Conjugate Directions Method • The function f(x(i) + a d(i)) attains its minimum at the value a(i) at which its derivative with respect to a vanishes: d/da f(x(i) + a d(i)) = –d(i)^T r(i+1) = 0 • Therefore, at each iteration, the component of the residual along the direction d(i) disappears
Conjugate Direction Methods • In fact, a(i) = –d(i)^T A e(i) / (d(i)^T A d(i)) • But this scheme explicitly requires e(i), a value which, as defined, we do not know • Rather, we evaluate it in terms of the known quantities
Conjugate Direction Methods • But A e(i) = A(x(i) – x*) = –r(i), and so we can write a(i) = d(i)^T r(i) / (d(i)^T A d(i)) • The conjugate directions algorithm consists of the following iterations: x(i+1) = x(i) + a(i) d(i), r(i+1) = r(i) – a(i) A d(i), with a(i) computed as above
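Although the construction of the search directions is covered later, the conjugate-directions iteration itself can already be illustrated with any A-orthogonal set. The sketch below uses the eigenvectors of a small symmetric positive definite matrix, which are A-orthogonal because d(i)^T A d(j) = λ_j d(i)^T d(j) = 0 for i ≠ j; the matrix and right-hand side are illustrative only.

    import numpy as np

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    b = np.array([1.0, 2.0, 3.0])
    n = len(b)

    # Eigenvectors of a symmetric matrix are mutually orthogonal, hence A-orthogonal
    _, V = np.linalg.eigh(A)
    directions = [V[:, j] for j in range(n)]

    x = np.zeros(n)
    r = b - A @ x
    for d in directions:
        alpha = (d @ r) / (d @ (A @ d))   # a(i) = d(i)^T r(i) / (d(i)^T A d(i))
        x = x + alpha * d
        r = r - alpha * (A @ d)

    print(np.linalg.norm(b - A @ x))      # essentially zero after n steps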
Conjugate Directions Method What we have not yet covered is how to get the n A-orthogonal search directions. We'll cover that shortly, but the next slide presents an algorithm, followed by an example.
Conjugate Gradient Algorithm • Set i = 0, e > 0, x(0) = 0, so r(i) = b – A x(0) = b • While ||r(i)|| > e Do (a) If i = 0 Then d(i+1) = r(i) Else Begin b(i+1) = (r(i)^T r(i)) / (r(i-1)^T r(i-1)); d(i+1) = r(i) + b(i+1) d(i) End
Conjugate Gradient Algorithm (b) a(i+1) = (r(i)^T r(i)) / (d(i+1)^T A d(i+1)) (c) x(i+1) = x(i) + a(i+1) d(i+1) (d) r(i+1) = r(i) – a(i+1) A d(i+1) (e) i := i + 1 End While Note that there is only one matrix-vector multiply per iteration!
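Steps (a)-(e) assemble into the following Python/NumPy sketch of the conjugate gradient iteration; the function name and the test matrix are illustrative, not the lecture's.

    import numpy as np

    def conjugate_gradient(A, b, eps=0.1, max_iter=None):
        """Solve Ax = b for symmetric positive definite A by conjugate gradients."""
        n = len(b)
        max_iter = max_iter if max_iter is not None else n
        x = np.zeros_like(b)              # x(0) = 0
        r = b - A @ x                     # r(0) = b
        d = r.copy()                      # first direction d(1) = r(0)
        rs_old = r @ r
        i = 0
        while np.sqrt(rs_old) > eps and i < max_iter:
            Ad = A @ d                    # the single matrix-vector multiply
            alpha = rs_old / (d @ Ad)     # step (b)
            x = x + alpha * d             # step (c)
            r = r - alpha * Ad            # step (d)
            rs_new = r @ r
            beta = rs_new / rs_old        # ratio of successive residual norms squared
            d = r + beta * d              # next search direction, step (a)
            rs_old = rs_new
            i += 1
        return x, i

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    b = np.array([1.0, 2.0, 3.0])
    x, iters = conjugate_gradient(A, b)
    print(iters, np.linalg.norm(b - A @ x))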
Conjugate Gradient Example • Using the same A and b as before • Select i = 0, x(0) = 0, e = 0.1; then r(0) = b • With i = 0, d(1) = r(0) = b
Conjugate Gradient Example This first step exactly matches Steepest Descent
Conjugate Gradient Example • With i = 1, solve for b(2) • Then compute d(2), a(2), x(2), and r(2)
Conjugate Gradient Example • With i = 2, solve for b(3) • Then compute d(3), a(3), x(3), and r(3)
Conjugate Gradient Example • And we are done in 3 = n iterations!
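The n-step property can also be checked against a library routine. Assuming SciPy is available, scipy.sparse.linalg.cg accepts a callback that is invoked once per iteration, which lets the iteration count be observed on a random symmetric positive definite system (this check is an addition, not part of the lecture example).

    import numpy as np
    from scipy.sparse.linalg import cg

    rng = np.random.default_rng(0)
    n = 3
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)        # random symmetric positive definite matrix
    b = rng.standard_normal(n)

    count = {"iters": 0}
    def callback(xk):                  # called once per CG iteration
        count["iters"] += 1

    x, info = cg(A, b, callback=callback)
    print(info, count["iters"], np.linalg.norm(b - A @ x))   # expect at most n iterations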
Gram-Schmidt Orthogonalization • The trick is to quickly generate the A-orthogonal search directions • One can make use of the Gram-Schmidt orthogonalization procedure • Let {u_0, u_1, …, u_(n-1)} be a l.i. set of n vectors • We construct each successive direction, d(i), i = 0, 1, 2, …, n-1, by removing from u_i all the components along the previously constructed directions d(0), …, d(i-1) • In the conjugate gradient method u_i = r(i)
Gram-Schmidt Orthogonalization • We set d(0) = u_0, and for i > 0 we set d(i) = u_i + Σ_(k=0)^(i-1) b_ik d(k) • We obtain the b_ik by using the A-orthogonality condition on the d(k)
Gram-Schmidt Orthogonalization • Solving for b_ik we get b_ik = –(u_i^T A d(k)) / (d(k)^T A d(k)) • A direct application of this approach is not computationally practical since at each iteration all of the known search directions are involved in the computation
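For completeness, a direct (and, as noted above, computationally impractical for large n) Gram-Schmidt conjugation might look like the sketch below; the starting vectors u_i and the matrix are illustrative.

    import numpy as np

    def a_orthogonalize(A, U):
        """Gram-Schmidt conjugation: turn l.i. columns of U into A-orthogonal directions."""
        n = U.shape[1]
        D = np.zeros_like(U)
        for i in range(n):
            d = U[:, i].copy()
            for k in range(i):
                # b_ik = - u_i^T A d(k) / (d(k)^T A d(k))
                b_ik = -(U[:, i] @ (A @ D[:, k])) / (D[:, k] @ (A @ D[:, k]))
                d = d + b_ik * D[:, k]
            D[:, i] = d
        return D

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    U = np.eye(3)                      # the standard basis is a l.i. starting set
    D = a_orthogonalize(A, U)
    print(np.round(D.T @ A @ D, 10))   # off-diagonal entries are (numerically) zero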
Gram-Schmidt Orthogonalization • Since in conjugate gradient the residual is orthogonal to the previous search directions, and unless the residual is 0, a new vector is created and added to the l.i. set of search directions • This makes the Gram-Schmidt orthogonalization significantly simpler
Gram-Schmidt Orthogonalization • The new residual is orthogonal to the previous search directions, which span the same subspace as the set {r(0), r(1), …, r(i-1)}, with the property that r(i)^T r(j) = 0 for j < i
Gram-Schmidt Orthogonalization • But in the conjugate gradient approach r(i) is a linear combination of the previous residuals and A d(i-1) • Therefore span{r(0), r(1), …, r(i-1)} = span{r(0), A r(0), …, A^(i-1) r(0)} • Hence this subspace is simply the ith Krylov subspace K_i(r(0), A) • Since r(i+1) is orthogonal to the space K_(i+1)(r(0), A) and A K_i(r(0), A) ⊆ K_(i+1)(r(0), A), r(i+1) is A-orthogonal to the previous search directions, but not to d(i)
Gram-Schmidt Orthogonalization • Also, since r(j+1) = r(j) – a(j) A d(j), then r(i)^T r(j+1) = r(i)^T r(j) – a(j) r(i)^T A d(j) • Which gives r(i)^T A d(j) = (r(i)^T r(j) – r(i)^T r(j+1)) / a(j)
Gram-Schmidt Orthogonalization and so, by the mutual orthogonality of the residuals, with u_i = r(i) the only nonzero Gram-Schmidt coefficient is b_(i,i-1) • Therefore, the previous search directions and b_ij values do not need to be stored • Also we set b(i) = b_(i,i-1)
Gram-Schmidt Orthogonalization and evaluate b(i) = (r(i)^T r(i)) / (r(i-1)^T r(i-1)), where we make use of a(i-1) = (r(i-1)^T r(i-1)) / (d(i-1)^T A d(i-1))
Conjugate Gradient Convergence • We can show that ||e(i)||_A ≤ 2 ((√κ – 1)/(√κ + 1))^i ||e(0)||_A • This compares with the factor (κ – 1)/(κ + 1) for steepest descent • In the previous example with κ = 26.8, (√κ – 1)/(√κ + 1) ≈ 0.68
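A quick arithmetic comparison of the two convergence factors for κ = 26.8 (computed here for illustration):

    import numpy as np

    kappa = 26.8
    sd_factor = (kappa - 1.0) / (kappa + 1.0)                      # steepest descent: about 0.928
    cg_factor = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)    # conjugate gradient: about 0.68
    print(sd_factor, cg_factor)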
THE ARNOLDI PROCESS • One simple way around this problem is to remove from each new vector that is constructed those components that lie along the directions already generated previously; in addition, we also have the option to renormalize each vector in this process
THE ARNOLDI PROCESS • At step i, to construct K_(i+1)(v, A) we define the matrix Q_i = [q(1), q(2), …, q(i)] with orthonormal vectors q(j) spanning K_i(v, A) • We increase by 1 the dimension of the subspace by computing w = A q(i)
THE ARNOLDI PROCESS • We orthogonalize w w.r.t. the vectors q(1), …, q(i) by setting w' = w – Σ_(j=1)^(i) h_ji q(j), where the h_ji = q(j)^T w are chosen so that q(j)^T w' = 0 for j = 1, …, i • We normalize and define q(i+1) = w' / h_(i+1,i), with h_(i+1,i) = ||w'||
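A minimal Python/NumPy sketch of the Arnoldi process as described above; it returns the orthonormal vectors as columns of Q together with the coefficients h_ji, uses 0-based indexing, and the test matrix and starting vector are illustrative.

    import numpy as np

    def arnoldi(A, v, m):
        """Build an orthonormal basis Q of K_m(v, A) and the (m+1)-by-m Hessenberg matrix H."""
        n = len(v)
        Q = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        Q[:, 0] = v / np.linalg.norm(v)
        for i in range(m):
            w = A @ Q[:, i]                      # expand the subspace: w = A q(i)
            for j in range(i + 1):               # orthogonalize w against q(1), ..., q(i)
                H[j, i] = Q[:, j] @ w
                w = w - H[j, i] * Q[:, j]
            H[i + 1, i] = np.linalg.norm(w)
            if H[i + 1, i] < 1e-12:              # breakdown: the Krylov subspace is invariant
                return Q[:, :i + 1], H[:i + 2, :i + 1]
            Q[:, i + 1] = w / H[i + 1, i]        # normalize to get q(i+1)
        return Q, H

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    v = np.array([1.0, 0.0, 0.0])
    Q, H = arnoldi(A, v, 3)
    print(np.round(Q.T @ Q, 10))                 # columns of Q are orthonormal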