
ECE 530 – Analysis Techniques for Large-Scale Electrical Systems



Presentation Transcript


  1. ECE 530 – Analysis Techniques for Large-Scale Electrical Systems Lecture 25: Krylov Subspace Methods Prof. Tom Overbye Dept. of Electrical and Computer Engineering University of Illinois at Urbana-Champaign overbye@illinois.edu

  2. Announcements • HW 8 is due Thursday Dec 5 • Final exam Wednesday Dec 18 from 1:30 to 4:30pm in this room (EL 260) • Closed book, closed notes; you may bring in one new note sheet and your exam 1 note sheet, along with simple calculators

  3. Krylov Subspace Outline • Covered in Lecture 24 • Review of fields and vector spaces • Eigensystem basics • Definition of Krylov subspaces and annihilating polynomial • Generic Krylov subspace solver • Steepest descent • Covered Today • Conjugate gradient • Arnoldi process

  4. Cayley-Hamilton Theorem and Minimum Polynomial • The Cayley-Hamilton theorem states that every square matrix satisfies its own characteristic equation • The minimal polynomial is the monic polynomial of minimum degree m such that A^m + c_(m-1)A^(m-1) + … + c_1 A + c_0 I = 0 • The minimal polynomial and characteristic polynomial are the same if A has n distinct eigenvalues

  5. Cayley-Hamilton Theorem and Minimum Polynomial • This allows us to express A^(-1) in terms of powers of A: multiplying the minimal polynomial equation by A^(-1) gives A^(-1) = -(1/c_0)(A^(m-1) + c_(m-1)A^(m-2) + … + c_1 I) • For the previous example this can be verified directly
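
A minimal numerical check of this identity, using a hypothetical 2x2 matrix rather than the lecture's example (for a 2x2 matrix the characteristic and minimal polynomials coincide):

```python
# Check the Cayley-Hamilton identity on a hypothetical 2x2 matrix (not the
# lecture's example). For a 2x2 matrix the characteristic polynomial gives
#   A^2 - trace(A) A + det(A) I = 0,  so  A^(-1) = (trace(A) I - A) / det(A).
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])                        # assumed example matrix
A_inv_ch = (np.trace(A) * np.eye(2) - A) / np.linalg.det(A)
print(np.allclose(A_inv_ch, np.linalg.inv(A)))    # True
```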

  6. Solution of Linear Equations • As covered previously, the solution of a dense (i.e., non-sparse) system of n equations is O(n^3) • Even for a sparse A the direct solution of linear equations can be computationally expensive, and the previous techniques are not easy to parallelize • We next present an alternative, iterative approach to obtain the solution using Krylov subspace based methods • Builds on the idea that we can express x = A^(-1) b

  7. Definition of a Krylov Subspace • Given a matrix A and a vector v, the ith order Krylov subspace is defined as K_i(v, A) = span{v, Av, A^2 v, …, A^(i-1) v} • Clearly, i cannot be made arbitrarily large; in fact, for a matrix A of rank n, i ≤ n • For a specified matrix A and a vector v, the largest value of i is given by the order of the annihilating polynomial
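
As a small illustration (the function name is ours, not from the slides), a basis of K_i(v, A) can be generated with nothing more than repeated matrix-vector products:

```python
# Build a basis of the ith-order Krylov subspace K_i(v, A) using only
# matrix-vector products.
import numpy as np

def krylov_basis(A, v, i):
    """Return an n-by-i matrix whose columns are v, Av, ..., A^(i-1) v."""
    cols = [v]
    for _ in range(i - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)
```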

  8. Generic Krylov Subspace Solver • The following is a generic Krylov subspace solver method for solving Ax = b using only matrix-vector multiplies • Step 1: Start with an initial guess x(0) and some predefined error tolerance e > 0; compute the residual r(0) = b – A x(0); set i = 0 • Step 2: While ||r(i)|| > e Do (a) i := i + 1 (b) get K_i(r(0), A) (c) get x(i) ∈ {x(0) + K_i(r(0), A)} to minimize ||r(i)|| • Stop
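
A skeleton of this generic solver is sketched below; build_krylov_basis and minimize_residual are placeholders for the method-specific Steps 2(b) and 2(c) (the names are ours, not the slides'):

```python
# Skeleton of the generic Krylov subspace solver; the two callables supply the
# method-specific subspace construction (2b) and residual minimization (2c).
import numpy as np

def generic_krylov_solve(A, b, x0, e, build_krylov_basis, minimize_residual):
    x, r0, i = x0, b - A @ x0, 0              # Step 1: initial residual r(0)
    while np.linalg.norm(b - A @ x) > e:      # Step 2
        i += 1
        K = build_krylov_basis(A, r0, i)      # 2(b): K_i(r(0), A)
        x = minimize_residual(A, b, x0, K)    # 2(c): x(i) in x(0) + K_i(r(0), A)
    return x
```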

  9. Krylov Subspace Solver • Note that no calculations are performed in Step 2 once i becomes greater than the order of the annihilating polynomial • The Krylov subspace methods differ from each other in • the construction scheme of the Krylov subspace in Step 2(b) of the scheme • the residual minimization criterion used in Step 2(c) • A common initial guess is x(0)= 0, giving r(0) = b – A x(0)= b

  10. Krylov Subspace Solver • Every solver involves the A matrix only in matrix-vector products: A^i r(0), i = 1, 2, … • The methods strive to exploit the spectral structure of A with the aim of making the overall procedure computationally efficient • For this purpose we order the eigenvalues of A according to their absolute values, with |λ1| ≥ |λ2| ≥ … ≥ |λn|

  11. Steepest Descent Approach • Presented first since it is easiest to explain • Assume in Ax = b that A is symmetric, positive definite (all eigenvalues real and positive), so x^T A x > 0 for all x ≠ 0 • Let f(x) = (1/2) x^T A x – b^T x • The solution x* that minimizes f(x) is given by ∇f(x*) = A x* – b = 0 • Which is obviously the solution to Ax = b

  12. Steepest Descent Algorithm • Set i = 0, e > 0, x(0) = 0, so r(0) = b – A x(0) = b • While ||r(i)|| > e Do (a) calculate a(i) = (r(i)^T r(i)) / (r(i)^T A r(i)) (b) x(i+1) = x(i) + a(i) r(i) (c) r(i+1) = r(i) – a(i) A r(i) (d) i := i + 1 End While • Note there is only one matrix-vector multiply per iteration
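
A direct transcription of this loop (a sketch, assuming A is symmetric positive definite and x(0) = 0 as on the slide):

```python
# Steepest descent iteration as on slide 12; one matrix-vector product per pass.
import numpy as np

def steepest_descent(A, b, e=1e-8, max_iter=10000):
    x = np.zeros_like(b)
    r = b - A @ x                      # r(0) = b
    for _ in range(max_iter):
        if np.linalg.norm(r) <= e:
            break
        Ar = A @ r                     # the only matrix-vector multiply
        alpha = (r @ r) / (r @ Ar)     # step (a)
        x = x + alpha * r              # step (b)
        r = r - alpha * Ar             # step (c)
    return x
```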

  13. Steepest Descent Example • Keep in mind these algorithms are designed for large systems, whereas the example is necessarily small! • Let • At solution f(x*) = -29.03 • Select x(0) = 0, e = 0.1, then r(0)= b

  14. Steepest Descent Convergence • We define the A-norm of x: ||x||_A = (x^T A x)^(1/2) • We can show that ||e(i)||_A ≤ ((κ – 1)/(κ + 1))^i ||e(0)||_A, where κ is the condition number of A, i.e., κ = λmax/λmin • For our example κ = 26.8, and (κ – 1)/(κ + 1) = 0.928
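
A quick way to check these two numbers for any symmetric positive definite matrix (a sketch; A here stands in for the lecture's example, which gives roughly 26.8 and 0.928):

```python
# Condition number and steepest-descent contraction factor for a symmetric
# positive definite A.
import numpy as np

def sd_contraction_factor(A):
    eig = np.linalg.eigvalsh(A)            # real eigenvalues of symmetric A
    kappa = eig.max() / eig.min()          # condition number lambda_max / lambda_min
    return kappa, (kappa - 1) / (kappa + 1)
```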

  15. Steepest Descent Convergence • Because (κ – 1)/(κ + 1) < 1 the error will decrease with each steepest descent iteration, albeit potentially quite slowly (as in our example) • The function value decreases more quickly, as per f(x(i)) – f(x*) ≤ ((κ – 1)/(κ + 1))^(2i) (f(x(0)) – f(x*)), but this can still be quite slow if κ is large • If you are thinking, "there must be a better way," you are correct. The problem is that we keep searching along the same set of directions

  16. Conjugate Direction Methods • Steepest descent often finds itself taking steps along the same direction as that of its earlier steps • An improvement over this is to take the exact number of steps using a set of search directions and obtain the solution after n such steps; this is the basic idea in the conjugate direction methods • Image compares steepest descent with a conjugate directions approach Image Source: http://en.wikipedia.org/wiki/File:Conjugate_gradient_illustration.svg

  17. Conjugate Direction Methods • Assume Ax = b with A symmetric, positive definite • We denote the n search directions by d(0), d(1), …, d(n-1) • At the ith iteration, we construct a(i) to obtain the value of the new iterate x(i+1) = x(i) + a(i) d(i)

  18. Conjugate Direction Methods • Proposition: If A is positive definite, and the set of nonzero vectors d(0), d(1), …, d(n-1) are A-orthogonal (d(i)^T A d(j) = 0 for i ≠ j), then these vectors are l.i. • Proof: Suppose there are constants a_j, j = 0, 1, 2, …, n-1 such that a_0 d(0) + a_1 d(1) + … + a_(n-1) d(n-1) = 0. Multiplying by A and then taking the scalar product with d(i) gives a_i d(i)^T A d(i) = 0. Since A is positive definite, it follows that a_i = 0. Hence the vectors are l.i. Recall the vectors are l.i. only if all the a's = 0

  19. Conjugate Direction Methods • The proposition that the vectors are l.i. implies they span R^n, so we can write x* – x(0) = Σ_(j=0)^(n-1) δ_j d(j) • Multiplying by d(i)^T A and using A-orthogonality gives δ_i = d(i)^T A (x* – x(0)) / (d(i)^T A d(i))

  20. Residual and Error • In solving these problems it is helpful to keep in mind the definitions of the residual and the error: r(i) = b – A x(i), e(i) = x(i) – x* • The residual is easy to compute. However, during the iteration we cannot compute the error since we do not know x* • Both go to zero as we converge

  21. Conjugate Directions Method • The value of a(i) is chosen so that e(i+1) is A-orthogonal to d(i); in this way we need not take another step along the direction d(i): d(i)^T A e(i+1) = d(i)^T A (e(i) + a(i) d(i)) = 0, or a(i) = – d(i)^T A e(i) / (d(i)^T A d(i))

  22. Conjugate Directions Method • To show that this procedure computes x* in n steps, we express e(0) in terms of the d(j): e(0) = Σ_(j=0)^(n-1) δ_j d(j) • Since the d(j) are A-orthogonal, for 0 ≤ k ≤ n-1, d(k)^T A e(0) = δ_k d(k)^T A d(k)

  23. Conjugate Directions Method so that δ_k = d(k)^T A e(0) / (d(k)^T A d(k)) • The previous equation may also be rewritten as δ_k = d(k)^T A (e(0) + Σ_(j=0)^(k-1) a(j) d(j)) / (d(k)^T A d(k)) = d(k)^T A e(k) / (d(k)^T A d(k)), since the added terms vanish by A-orthogonality

  24. Conjugate Directions Method and, since A e(k) = – r(k), δ_k = – d(k)^T r(k) / (d(k)^T A d(k)) = – a(k); hence e(i) = e(0) + Σ_(j=0)^(i-1) a(j) d(j) = – Σ_(j=i)^(n-1) a(j) d(j), which is zero after n steps

  25. Conjugate Directions Method • The function f(x(i) + a d(i)) attains its minimum at the value a(i) at which d/da f(x(i) + a d(i)) = 0, i.e., d(i)^T r(i+1) = 0 • Therefore, at each iteration, the component of the residual along the direction d(i) disappears

  26. Conjugate Direction Methods • In fact, a(i) = – d(i)^T A e(i) / (d(i)^T A d(i)) • But this scheme explicitly requires e(i), a value which, as defined, we do not know • Rather we evaluate it in terms of the known quantities

  27. Conjugate Direction Methods • But A e(i) = A x(i) – b = – r(i), and so we can write a(i) = d(i)^T r(i) / (d(i)^T A d(i)) • The conjugate directions algorithm consists of the following iterations: x(i+1) = x(i) + a(i) d(i), r(i+1) = r(i) – a(i) A d(i)

  28. Conjugate Directions Method What we have not yet covered is how to get the n A-orthogonal search directions. We'll cover that shortly, but the next slide presents an algorithm, followed by an example.

  29. Conjugate Gradient Algorithm • Set i = 0, e > 0, x(0) = 0, so r(0) = b – A x(0) = b • While ||r(i)|| > e Do (a) If i = 0 Then d(i+1) = r(i) Else Begin b(i+1) = (r(i)^T r(i)) / (r(i-1)^T r(i-1)); d(i+1) = r(i) + b(i+1) d(i) End

  30. Conjugate Gradient Algorithm (b) a(i+1) = (r(i)^T r(i)) / (d(i+1)^T A d(i+1)) (c) x(i+1) = x(i) + a(i+1) d(i+1) (d) r(i+1) = r(i) – a(i+1) A d(i+1) (e) i := i + 1 End While • Note that there is only one matrix-vector multiply per iteration!
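
Steps (a) through (e) translate almost line for line into code; a sketch, again assuming A is symmetric positive definite and x(0) = 0:

```python
# Conjugate gradient algorithm of slides 29-30; one matrix-vector multiply per pass.
import numpy as np

def conjugate_gradient(A, b, e=1e-8):
    x = np.zeros_like(b)
    r = b - A @ x                        # r(0) = b
    d = r.copy()                         # d(1) = r(0)
    rs_old = r @ r
    for _ in range(len(b)):              # at most n steps in exact arithmetic
        if np.sqrt(rs_old) <= e:
            break
        Ad = A @ d                       # the only matrix-vector multiply
        alpha = rs_old / (d @ Ad)        # step (b)
        x = x + alpha * d                # step (c)
        r = r - alpha * Ad               # step (d)
        rs_new = r @ r
        d = r + (rs_new / rs_old) * d    # step (a): beta = r(i)'r(i) / r(i-1)'r(i-1)
        rs_old = rs_new
    return x
```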

  31. Conjugate Gradient Example • Using the same system as before, let • Select i=0, x(0) = 0, e = 0.1, then r(0) = b • With i = 0, d(1) = r(0) = b

  32. Conjugate Gradient Example This first step exactly matches Steepest Descent

  33. Conjugate Gradient Example • With i=1 solve for b(i+1) • Then

  34. Conjugate Gradient Example • And

  35. Conjugate Gradient Example • With i=2 solve for b(i+1) • Then

  36. Conjugate Gradient Example • And Done in 3 = n iterations!

  37. Gram-Schmidt Orthogonalization • The trick is to quickly generate the A-orthogonal search directions • One can make use of the Gram-Schmidt orthogonalization procedure • Let {u(0), u(1), …, u(n-1)} be a l.i. set of n vectors • We construct each successive direction d(j), j = 0, 1, 2, …, n-1, by removing from u(j) all the components along the previously determined directions d(0), …, d(j-1) • In the conjugate gradient method u(i) = r(i)

  38. Gram-Schmidt Orthogonalization • We set d(0) = u(0), and for i = 1, …, n-1 we set d(i) = u(i) + Σ_(k=0)^(i-1) b_ik d(k) • We obtain the b_ik by using the A-orthogonality condition on the d(k)

  39. Gram-Schmidt Orthogonalization • Solving for b_ik we get b_ik = – u(i)^T A d(k) / (d(k)^T A d(k)) • A direct application of this approach is not computationally practical since at each iteration all of the known search directions are involved in the computation
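
A sketch of this conjugate (A-orthogonal) Gram-Schmidt procedure as written above, not optimized, with the columns of U holding the l.i. vectors u(0), …, u(n-1):

```python
# A-orthogonal Gram-Schmidt: build directions d(0..n-1) from a l.i. set u(0..n-1)
# so that d(i)^T A d(k) = 0 for i != k.
import numpy as np

def a_orthogonalize(A, U):
    """U is n-by-n with columns u(0), ..., u(n-1); returns D with A-orthogonal columns."""
    D = np.zeros_like(U)
    for i in range(U.shape[1]):
        d = U[:, i].copy()
        for k in range(i):
            dk = D[:, k]
            b_ik = -(U[:, i] @ (A @ dk)) / (dk @ (A @ dk))   # b_ik from the slide
            d = d + b_ik * dk
        D[:, i] = d
    return D
```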

  40. Gram-Schmidt Orthogonalization • In conjugate gradient the residual is orthogonal to the previous search directions, so unless the residual is 0, each new residual is a new vector that is added to the l.i. set of search directions • This makes the Gram-Schmidt orthogonalization significantly simpler

  41. Gram-Schmidt Orthogonalization • The new residual is orthogonal to the previous search directions, which span the same subspace as the set {r(0), r(1), …, r(i-1)}, with the property that r(i)^T r(j) = 0 for i ≠ j

  42. Gram-Schmidt Orthogonalization • But in the conjugate gradient approach r(i) is a linear combination of the previous residuals and A d(i-1) • Therefore span{d(0), …, d(i-1)} = span{r(0), A r(0), …, A^(i-1) r(0)} • Hence this span is simply the ith Krylov subspace K_i(r(0), A) • Since r(i+1) is orthogonal to this subspace, and A maps K_i(r(0), A) into K_(i+1)(r(0), A), r(i+1) is A-orthogonal to the previous search directions, but not to d(i)

  43. Gram-Schmidt Orthogonalization • Also, since r(j+1) = r(j) – a(j) A d(j), then r(i)^T r(j+1) = r(i)^T r(j) – a(j) r(i)^T A d(j) • Which gives r(i)^T A d(j) = (r(i)^T r(j) – r(i)^T r(j+1)) / a(j)

  44. Gram-Schmidt Orthogonalization • By the orthogonality of the residuals, r(i)^T A d(j) is therefore nonzero only for j = i, where it equals r(i)^T r(i) / a(i), and for j = i-1, where it equals – r(i)^T r(i) / a(i-1); since the b_ij are defined only for i > j, only the j = i-1 term survives

  45. Gram-Schmidt Orthogonalization and b_(i,i-1) = r(i)^T r(i) / (a(i-1) d(i-1)^T A d(i-1)) = r(i)^T r(i) / (r(i-1)^T r(i-1)) • Therefore, the previous search directions and b_ij values do not need to be stored • Also we set d(i) = r(i) + b_(i,i-1) d(i-1)

  46. Gram-Schmidt Orthogonalization and evaluate a(i) = r(i)^T r(i) / (d(i)^T A d(i)), where we make use of d(i)^T r(i) = r(i)^T r(i)

  47. Conjugate Gradient Convergence • We can show that ||e(i)||_A ≤ 2 ((√κ – 1)/(√κ + 1))^i ||e(0)||_A • This compares with the ((κ – 1)/(κ + 1))^i factor for steepest descent • In the previous example with κ = 26.8, (√κ – 1)/(√κ + 1) = 0.676

  48. THE ARNOLDI PROCESS • One simple way around this problem (the raw Krylov vectors A^i v become closer and closer to linearly dependent as i grows) is to remove from each new vector that is constructed the components along the directions already generated previously; in addition, we can also have the option to renormalize each vector in this process

  49. THE ARNOLDI PROCESS • At step i, to construct K_(i+1)(v, A) we start with orthonormal vectors q(1), …, q(i) that span K_i(v, A) • We increase by 1 the dimension of the subspace by computing w = A q(i)

  50. THE ARNOLDI PROCESS • We orthogonalize w w.r.t. the vectors q(1), …, q(i) by setting w' = w – Σ_(j=1)^(i) h_(j,i) q(j), where the h_(j,i) = q(j)^T w are chosen so that q(j)^T w' = 0 • We normalize and define q(i+1) = w' / h_(i+1,i), with h_(i+1,i) = ||w'||
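
Putting slides 48 through 50 together, a sketch of the Arnoldi process in code (the array names Q, H and the breakdown tolerance are our choices):

```python
# Arnoldi process: build orthonormal vectors q(1..m+1) spanning the Krylov
# subspaces of A and the upper Hessenberg coefficients h(j,i).
import numpy as np

def arnoldi(A, v, m):
    n = len(v)
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = v / np.linalg.norm(v)
    for i in range(m):
        w = A @ Q[:, i]                    # increase the subspace dimension: w = A q(i)
        for j in range(i + 1):             # orthogonalize w.r.t. q(1), ..., q(i)
            H[j, i] = Q[:, j] @ w
            w = w - H[j, i] * Q[:, j]
        H[i + 1, i] = np.linalg.norm(w)
        if H[i + 1, i] < 1e-12:            # breakdown: the Krylov subspace is invariant
            return Q[:, :i + 1], H[:i + 1, :i]
        Q[:, i + 1] = w / H[i + 1, i]      # normalize to get q(i+1)
    return Q, H
```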
