Solving Large Sparse Linear Systems

Solving Large Sparse Linear Systems Ruth Shaw Dept. of CS & Appl. Stats. UNB Saint John

Outline • Solving Linear Systems • Iterative Methods • Jacobi Method • PDE Example • Designing the Parallel Algorithm

Solving Linear Systems Ax = b

Iterative Methods • Jacobi • Gauss-Seidel • successive over-relaxation (SOR) • conjugate gradient method • biconjugate gradient method • generalized minimal residual method

Jacobi Method Convert Ax = b into the form (L + D + U)x = b Dx = (-L - U)x + b x = D-1(-L - U)x + D-1b x = Cx + d Generate a sequence of approximations x (k+1) = Cx (k) + d

Jacobi Method Consider a small 3 x 3 system of equations: 2x1 – x2 + x3 = -1 x1 + 2x2 – x3 = 6 x1 – x2 + 2x3 = -3 This is converted to: x1 = [-1 – (x2 - x3)] / 2 x2 = [ 6 – (- x1 + x3)] / 2 x3 = [-3 – (x1 + x2)] / 2

Jacobi Method Successive iterates are generated as follows: x1 (k+1) = [-1 – (x2 (k)- x3 (k))] / 2 x2(k+1) = [ 6 – (- x1 (k) + x3 (k))] / 2 x3(k+1) = [-3 – (x1 (k) + x2 (k))] / 2 The initial guess (k = 0) to the solution vector is usually taken to be zero

Jacobi Method Approximation to solution after 13 iterations: x1 = 1.0002, x2 = 2.0001, x3 = -0.9997 Exact Solution: x1 = 1, x2 = 2, x3 = -1 Max Error: 3.0 x 10-4

PDE Example • Finite-Difference Solution of a PDE • typically involves solving a linear system with a large banded coefficient matrix • based on subdividing the domain of the problem with a mesh of discrete points … derivatives are replaced with difference quotients

PDE Example The steady-state temperature distribution in a plate, 0 ≤ x≤ 1, 0 ≤ y ≤ 1, can be described by the Poisson equation: uxx + uyy = f(x, y) If there are no heat sources (f(x, y) = 0) the equation is know as Laplace’s equation.

PDE Example Consider the two-dimension Laplace equation uxx + uyy = 0, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 with boundary conditions u(0, y) = y2, u(1, y) = 1, 0 < y < 1 u(x, 0) = x2, u(x, 1) = 1, 0 < x < 1

PDE Example We define a mesh in the (x, y) plane: xi = ih, i = 0, 1, …, n, h = ∆x = 1/n yi = jk, j = 0, 1, …, m, k = ∆y = 1/m Denote the value of u(x, y) at the point (xi, yi) as uij

PDE Example Replacing the second derivatives in the Laplace equation uxx + uyy = 0 by centered differences gives a system of algebraic equations for the function values at the mesh points (1/k2)(ui,j+1 – 2uij + ui,j-1) + (1/h2)(ui+1,j – 2uij + ui-1,j) = 0

PDE Example Setting ∆x = ∆ y gives (ui,j+1 – 2uij + ui,j-1) + (ui+1,j – 2uij + ui-1,j) = 0 Which results in ui,j+1 – 4uij + ui,j-1 + ui+1,j + ui-1,j = 0

PDE Example Setting n =3 gives the following system of equations: u12 – 4u11 + u10 + u21 + u01 = 0 u13 – 4u12 + u11 + u22 + u02 = 0 : u10 u01

PDE Example The values outside the mesh represent the boundaries of the plate

PDE Example Denote the boundary values by bk , k = 1,…,4n and the unknown mesh values by ui , i = 1,…,n*n

PDE Example Moving the boundary values to the right of each equation gives us the following linear system: 4u1 – u2 – u4 = b1 + b12 - u1 + 4u2 – u3 – u5 = b2 – u2 + 4u3 – u6 = b3 + b4 - u1 + 4u4 – u5 – u7 = b11 – u2 – u4 + 4u5 – u6 – u8 = 0 - u3 – u5 + 4u6 – u9 = b5 - u4 + 4u7 – u8 = b9 + b10 - u5 – u7 + 4u8 – u9 = b8 - u6 – u8 4u9 = b6 + b7

PDE Example Assigning values to the boundaries then gives us the following graph

PDE Example Which now represents the following system: 4u1 – u2 – u4 = 0.3 - u1 + 4u2 – u3 – u5 = 0.25 – u2 + 4u3 – u6 = 1.9 - u1 + 4u4 – u5 – u7 = 0.5 – u2 – u4 + 4u5 – u6 – u8 = 0 - u3 – u5 + 4u6 – u9 = 1.0 - u4 + 4u7 – u8 = 1.8 - u5 – u7 + 4u8 – u9 = 1.0 - u6 – u8 + 4u9 = 2.0

PDE Example For the Jacobi method, this system is converted to the following set of equations: u1 = [0.3 – (u2 + u4)] / 4 u2 = [0.25 – (u1 + u3 + u5)] / 4 : u9 = [2.0 – (u6 + u8)] / 4

PDE Example The Jacobi iteration scheme is then given by: u1(k+1) = [0.3 – (u2(k) + u4(k) )] / 4 u2(k+1) = [0.25 – (u1(k) + u3(k) + u5(k))] / 4 : u9(k+1) = [2.0 – (u6(k) + u8(k))] / 4

PDE Example After 38 iterations Max error 8.9 x 10-7

Designing the Parallel Algorithm There are essentially two approaches to designing a parallel program: data-parallel approach domain decomposition … divide the data into pieces control-parallel approach functional decomposition … divide the computation into pieces

Designing the Parallel Algorithm After deciding on the decomposition of the algorithm, the communication pattern needs to be determined => minimize processor communication

Parallel Jacobi Recall our PDE example for n = 3: u1(k+1) = [0.3 – (u2(k) + u4(k) )] / 4 u2(k+1) = [0.25 – (u1(k) + u3(k) + u5(k))] / 4 u3(k+1) = [1.9 – (u2(k) + u6(k))] / 4 u4(k+1) = [0.5 – (u1(k) + u5(k) + u7(k))] / 4 u5(k+1) = [0 – (u2(k) + u4(k) + u6(k) + u8(k))] / 4 u6(k+1) = [1.0 – (u3(k) + u5(k) + u9(k))] / 4 u7(k+1) = [1.8 – (u4(k) + u8(k))] / 4 u8(k+1) = [1.0 – (u5(k) + u7(k) + u9(k))] / 4 u9(k+1) = [2.0 – (u6(k) + u8(k))] / 4

Parallel Jacobi Assume p processes where n ≥ p … An obvious approach is to have each process calculate the values in a subvector of the solution vector => for simplicity, assume n/p is even

Parallel Jacobi Assume p = 3: P0: u1(k+1) = [0.3 – (u2(k) + u4(k) )] / 4 u2(k+1) = [0.25 – (u1(k) + u3(k) + u5(k))] / 4 u3(k+1) = [1.9 – (u2(k) + u6(k))] / 4 P1: u4(k+1) = [0.5 – (u1(k) + u5(k) + u7(k))] / 4 u5(k+1) = [0 – (u2(k) + u4(k) + u6(k) + u8(k))] / 4 u6(k+1) = [1.0 – (u3(k) + u5(k) + u9(k))] / 4 P2: u7(k+1) = [1.8 – (u4(k) + u8(k))] / 4 u8(k+1) = [1.0 – (u5(k) + u7(k) + u9(k))] / 4 u9(k+1) = [2.0 – (u6(k) + u8(k))] / 4

Parallel Jacobi Partition coefficient matrix A using block-row chunkA = (n*n) / p call MPI_SCATTER(A, chunkA, MPI_REAL, A_local, chunkA, MPI_REAL, 0, PI_COMM_WORLD, ierr) 4 -1 0 -1 0 0 0 0 0 -1 4 -1 0 -1 0 0 0 0 0 -1 4 0 0 -1 0 0 0 -1 0 0 4 -1 0 -1 0 0 0 -1 0 -1 4 -1 0 -1 0 0 0 -1 0 -1 4 0 0 -1 0 0 0 -1 0 0 4 -1 0 0 0 0 0 -1 0 -1 4 -1 0 0 0 0 0 -1 0 -1 4

Parallel Jacobi Scatter the RHS vector b chunkb = n/p call MPI_SCATTER(b, chunkb, MPI_REAL, b_local, chunkb, MPI_REAL, 0, MPI_COMM_WORLD, ierr) 0.3 0.25 1.9 0.5 0 1.0 1.8 1.0 2.0

Parallel Jacobi Each processor calculates n/p elements of the solution vector P0: x_local(1), x_local(2), x_local(3) P1: x_local(4), x_local(5), x_local(6) P2: x_local(7), x_local(8), x_local(9)

Parallel Jacobi After each iteration use MPI_Allgather to give all processes a copy of the updated solution vector call MPI_ALLGATHER(x_local, chunkb, MPI_REAL, x_new, chunkb, MPI_REAL, MPI_COMM_WORLD, ierr) P0 P2 P1

Parallel Jacobi Determine if the norm of the solution vector has reached the set tolerance option 1: all processes compute global norm option 2: each process computes local norm and then use MPI_Allreduce to give all processes a copy of the global norm

Parallel Jacobi Speedup = serial time / parallel time

Parallel Jacobi

References Iterative Methods for Solving Linear Systems http://www.quantlet.com/mdstat/scripts/csa/html/node38.html Applied Numerical Analysis Using MATLAB, 2nd Ed., by Laurene Fausett, Pearson Prentic-Hall, 2008 Parallel programming in C with MPI and OpenMP by Michael Quinn, McGraw-Hill, 2004 Parallel Programming with MPI by Peter Pacheco, Morgan Kaufmann, 1997

Solving Large Sparse Linear Systems

Solving Large Sparse Linear Systems

Presentation Transcript

Solving Systems of Linear Equations

Solution of Sparse Linear Systems

Solving Linear Systems

3-2: Solving Linear Systems

Solving Linear Systems

3.1 – Graphically Solving Linear Systems

Solving three-variable linear systems

Solving Linear Systems Algebraically

A Novel approach to solving large-scale linear systems

A 3-Query PCP over integers a.k.a Solving Sparse Linear Systems

Solution of Sparse Linear Systems

Linear Systems and Problem Solving

Solving Linear Systems: Iterative Methods and Sparse Systems

Solving Linear Systems

Solving Linear Systems by Linear Combinations

Solving Linear Systems by Linear Combinations

Solving systems of linear equations

Solving Linear Systems Algebraically

Solving Linear Systems

Solving Linear Systems

Solving Linear Systems by Linear Combinations