Parallel solution of high-frequency Helmholtz equation using high-order finite difference schemes

Parallel solution of high-frequency Helmholtz equation using high-order finite difference schemes Dan Gordon Computer Science University of Haifa Rachel Gordon Aerospace Eng. Technion

Outline • Background on Helmholtz equation • The CARP-CG parallel algorithm • Comparative results using low- and high-order finite difference schemes

The Helmholtz Equation Eqn: -Δu - k2u = g (k = "wave no.") c = speed of sound, f = frequency Wave length: l = c/f = 2p/k No. of grid pts per l: Ng = l/h, h=mesh size Shifted Laplacian approach: Bayliss, Goldstein & Turkel, 1983 Erlangga, Vuik & Oosterlee, 2004/06 introduced imaginary shift: -Δu – (1 - i b) k2 u = f

The Helmholtz Equation Some other approaches: Elman, Ernst & O'Leary, 2001 Plessix & Mulder, 2003 Duff, Gratton, Pinel & Vasseeur, 2007 Bollhöfer, Grote & Schenk, 2009 Osei-Kuffuor & Saad, 2010 This work: hi-order schemes following Singer & Turkel, 2006 Erlangga & Turkel, 2011 (to appear)

Difficulties with Helmholtz • High frequencies  small diagonal • 2nd order schemes require many grid points/wavelength • "Pollution effect": high frequency requires more than fixed number of grid points/wavelength (Babuška & Sauter, 2000) •  high-order schemes required

CARP: block-parallel Kaczmarz Given: Ax=b "Normal equations": AATy=b, x=ATy Kaczmarz algorithm (1937) "KACZ" is SOR on normal equations Relaxation parameter of KACZ is the usual relax. par. of SOR Cyclic relax. par.: each eq. gets its own relax. par.

Parallel solution of the Helmholtz equation KACZ: Geometric Description initial point eq. 1 eq. 2 eq. 3

CARP: Component-Averaged Row Projections • A block-parallel version of KACZ • Equations divided into blocks (not necessarily disjoint) • Initial estimate: vector x=(x1,…,xn) • Suppose component x1 appears in 3 blocks • x1 is “cloned” as y1 , z1 , t1 in the different blocks. • Perform a KACZ iteration on each block (independently, in parallel)

CARP – Explanation (cont) • The internal iterations in each block produce 3 new values for the clones of x1: y1’ , z1’ , t1’ • The next iterative value of x1 is x1’ = (y1’ + z1’ + t1’)/3 • The next iterate is x’ = (x1’ , ... , xn’) • Repeat iterations as needed for convergence

CARP as Domain Decomposition clone of x1 y 1 x x 1 0 external grid point of A Note: domains may overlap domain A domain B

Overview of CARP domain A domain B averaging KACZ iterations KACZ iterations cloning KACZ in some superspace (with cyclic relaxation)

Convergence of CARP • Averaging Lemma: the component-averaging operations of CARP are equivalent to KACZ row-projections in a certain superspace (with w=1) •  CARP is equivalent to KACZ in the superspace, with cyclic relaxation parameters – known to converge

CARP Applications • Elliptic PDEs w/large convection term result in stiff linear systems (large off-diagonal elements) • CARP very robust on such systems, compared to leading solver & preconditioner combinations • Downside: Not always efficient • Electron tomography (ET) – joint work with J.-J. Fernández

CARP-CG: CG acceleration of CARP • CARP is KACZ in some superspace (with cyclic relaxation parameters) • Björck & Elfving (1979): developed CGMN, which is a (sequential)CG-acceleration of KACZ (double sweep, fixed relax. parameter) • We extended this result to allow cyclic relaxation parameters • Result: CARP-CG

CARP-CG: Properties • Same robustness as CARP • Very significant improvement in performance on stiff linear systems derived from elliptic PDEs • Very competitive runtime compared to leading solver/preconditioner combinations on systems derived from convection-dominated PDEs • Highly scalable on Helmholtz eqns

CARP-CG: Properties • On one processor, CARP-CG is identical to CGMN • Particularly useful on systems with LARGE off-diagonal elements • example: convection-dominated PDEs • Discontinuous coefficients are handled without requiring domain decomposition (DD)

Robustness of CARP-CG • KACZ inherently "normalizes" the eqns (eqn i is divided by ║Ai║2) • Normalization is generally useful for discontinuous coefficients • After normalization, the diagonal elements of AATare all1, and strictly greater than the off-diagonal elements • This is not diagonal dominance, but it makes the normal eqns manageable • Also: when diag ofA decreases, sum of off-diag ofAAT decreases.

Experiments with Hi-Order Relax. par. = 1.5 for all problems 2nd, 4th & 6th order central difference schemes, following Singer & Turkel, 2006 Erlangga & Turkel, 2011 Hi-order schemes  9-pt. stencil Complex eqns: separated real & imag., interleaved equations (following Day & Heroux, 2001)

Problem 1 (with analytic sol'n) • Based on Erlangga & Turkel, 2011 • Eqn: (Δ+k2)u = 0, on [-0.5,0.5][0,1] • bndry condition: Dirichlet on 3 sides: • u=0 for x=-0.5 and x=0.5 • u=cos(px) for y=0 • Sommerfeld: uy+iβu=0 for y=1,β2=k2-p2 • Analytic solution: u = cos(px)exp(-iβy) • Grid points per l: Ng = 9,12,15,18 • Approx. 186,000 – 742,000 complex variables • One processor • k = 300

Prob. 1: rel-res for 2nd, 4th, 6th order schemes

Prob. 1: rel-err for 2nd, 4th, 6th order schemes

Problem 2 (with analytic soln) Eqn: Δu + k2u = 0 Domain: [0,1][0,1] Analytic sol'n: u=sin(px)cos(βy), β2=k2-p2 Dirichlet bndry cond determined by u on the boundaries Grid points per l: Ng = 9 to 18 Approx. 186,000 – 742,000 real variables One processor k = 300 2nd, 4th, 6th order schemes

Prob. 2: rel-res for 2nd, 4th, 6th order schemes

Prob. 2: rel-err, 6th order, Ng=9–18

Problem 3 (no analytic soln) Eqn: Δu + k2u = 0 Domain: [0,1][0,1] Bndry cond on y=0: discontinuity at midpt.: u(0.5,0)=1, u(x,0) = 0 for x ≠ 0 other sides: 1st order absorbing Approx. 515,000 complex variables Grid points per l: Ng = 15 One processor k = 300 2nd, 4th, 6th order schemes

Problem 3: evaluating the error No analytic solution Run 6th order scheme to rel-res=10-13 Saved result as “true” solution Compared results of 2nd, 4th and 6th order schemes with the “true” solution

Parallel Performance, 1 to 16 Proc. No. iter for rel-res=10-7, 6th order, Ng=15, ~515,000 var.

Parallel Performance, 1 to 16 Proc. Problem 3: time (s), 6th order scheme, Ng=15, ~515,000 var. Times taken on a 12-node cluster, 2 quad proc. per node

Prob. 2 & 3: rel-res for 1 to 16 processors

Summary • Hi-freq Helmholtz require hi-order schemes • CARP-CG is applicable to hi-freq Helmholtz with hi-order schemes • Parallel and simple • General-purpose – for problems with large off-diagonal elements and discontinuous coefficients

Other Potential Applications • Hi-order schemes for Helmholtz in homog & heterog 3D domains • Maxwell equations • Other physics equations • Saddle-point problems • Circuit problems • Linear solver in some eigenvalue methods

Publications and Software http://cs.haifa.ac.il/~gordon/pub.html CARP: SIAM J Sci Comp 2005 CGMN: ACM Trans Math Software 2008 Microscopy: J Parallel & Distr Comp 2008 Large convection + discont coef: CMES 2009 CARP-CG: Parallel Comp 2010 Normalization for discont coef: J Comp & Appl Math 2010 CARP-CG software: http://cs.haifa.ac.il/~gordon/soft.html THANK YOU!

Parallel solution of high-frequency Helmholtz equation using high-order finite difference schemes

Parallel solution of high-frequency Helmholtz equation using high-order finite difference schemes

Presentation Transcript

High-Frequency Phrases

High Frequency Phrases Phrases 1 High Frequency “a”

Finite Difference Schemes

High Frequency Vocabulary

Parallel solution of the Helmholtz equation with large wave numbers

High-Frequency Words

High-Frequency Words

High Frequency Oscillatory

A Look at High-Order Finite-Volume Schemes for Simulating Atmospheric Flows

High Frequency Words

High Frequency Words

High Frequency Words

High Frequency Words

High-Frequency Phrases

High Frequency Words

Benefits of using high frequency massage machine