domain decomposition in parallel computing l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Domain decomposition in parallel computing PowerPoint Presentation
Download Presentation
Domain decomposition in parallel computing

Loading in 2 Seconds...

play fullscreen
1 / 33

Domain decomposition in parallel computing - PowerPoint PPT Presentation


  • 248 Views
  • Uploaded on

Domain decomposition in parallel computing. COT 5410 – Spring 2004. Ashok Srinivasan www.cs.fsu.edu/~asriniva Florida State University. Outline. Background Geometric partitioning Graph partitioning Static Dynamic Important points. Background.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Domain decomposition in parallel computing' - paul2


Download Now An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
domain decomposition in parallel computing

Domain decomposition in parallel computing

COT 5410 – Spring 2004

Ashok Srinivasan

www.cs.fsu.edu/~asriniva

Florida State University

outline
Outline
  • Background
  • Geometric partitioning
  • Graph partitioning
    • Static
    • Dynamic
  • Important points
background
Background
  • Tasks in a parallel computation need access to certain data
  • Same datum may be needed by multiple tasks
    • Example: In matrix-vector multiplication, b2 is needed for the computation of all ci2, 1 < i < n
    • If a process does not “own” a datum needed by its task, then it has to get it from a process that has it
      • This communication is expensive
    • Aims of domain decomposition
      • * Distribute the data in such a manner that the communication required is minimized
      • * Ensure that the computational loads on processes are balanced
domain decomposition example
Domain decomposition example
  • Finite difference computation
    • New value of a node depends on old values of its neighbors
  • We want to divide the nodes amongst the processes so that
    • Communication is minimized
      • Measure of partition quality
    • Computational load is evenly balanced
geometric partitioning
Geometric partitioning
  • Partition a set of points
    • Uses only coordinate information
  • Balances the load
    • The heuristic tries to ensure that communication costs are low
  • Algorithms are typically fast, but partition not of high quality
  • Examples
    • Orthogonal recursive bisection
    • Inertial
    • Space filling curves
orthogonal recursive bisection
Orthogonal recursive bisection
  • Recursively bisect orthogonal to the longest dimension
    • Assume communication is proportional to the surface area of the domain, and aligned with coordinate axes
    • Recursive bisection
      • Divide into two pieces, keeping load balanced
      • Apply recursively, until desired number of partitions obtained
inertial
Inertial
  • ORB may not be effective if cuts along the x, y, or z directions are not good ones
  • Inertial
    • Recursively bisect orthogonal to the inertial axis
space filling curves
Space filling curves
  • * Space filling curves
    • A continuous curve that fills the space
    • Order the points based on their relative position on the curve
    • Choose a curve that preserves proximity
      • Points that are close in space should be close in the ordering too
      • Example
        • Hilbert curve
hilbert curve

Hi

H1

H2

Hi+1

Hilbert curve = lim Hn

n

Hilbert curve
  • Sources
    • http://www.dcs.napier.ac.uk/~andrew/hilbert.html
    • http://www.fractalus.com/kerry/tutorials/hilbert/hilbert-tutorial.html
domain decomposition with a space filling curve
Domain decomposition with a space filling curve
  • Order points based on their position on the curve
  • Divide into P parts
    • P is the number of processes
  • Space filling curves can be used in adaptive computations too
  • They can be extended to higher dimensions too
graph partitioning
Graph partitioning
  • * Model as graph partitioning
    • Graph G = (V, E)
    • Each task is represented by a vertex
      • A weight can be used to represent the computational effort
    • An edge exists between tasks if one needs data owned by the other
      • Weights can be associated with edges too
    • Goal
      • Partition vertices into P parts such that each partition has equal vertex weights
      • Minimize the weights of edges cut
      • Problem is NP hard
    • Edge cut metric
      • Judge the quality of the partitioning by the number of edges cut
static graph partitioning
Static graph partitioning
  • Combinatorial
    • Levelized nested dissection
    • Kernighan-Lin/Feduccia-Matheyses
  • Spectral partitioning
  • Multi-level methods
combinatorial partitioning
Combinatorial partitioning
  • Use only connectivity information
  • Examples
    • Levelized nested dissection
    • Kernighan-Lin/Feduccia-Matheyses
levelized nested dissection lnd
Levelized nested dissection (LND)
  • Idea is similar to the geometric methods
    • But cannot use coordinate information
    • Instead of projecting vertices along the longest axis, order them based on distance from a vertex that may be one extreme of the longest dimension of a graph
      • Pseudo-peripheral vertex
        • Perform a breadth-first search, starting from an arbitrary vertex
        • The vertex that is encountered last might be a good approximation to a peripheral vertex
lnd example finding a pseudoperipheral vertex
LND example Finding a pseudoperipheral vertex

3

2

3

2

1

3

1

2

Initial vertex

1

3

4

Pseudoperipheral vertex

lnd example partitioning
LND example – Partitioning

5

6

3

4

5

2

5

4

2

3

1

Partition

Initial vertex

Recursively bisect the subgraphs

kernighan lin fiduccia matheyses
Kernighan-Lin/Fiduccia-Matheyses
  • Refines an existing partition
  • Kernighan-Lin
    • Consider pairs of vertices from different partitions
    • Choose a pair whose swapping will result in the best improvement in partition quality
      • The best improvement may actually be a worsening
    • Perform several passes
      • Choose best partition among those encountered
  • Fiduccia-Matheyses
    • Similar but more efficient
  • Boundary Kernighan-Lin
    • Consider only boundary vertices to swap
  • ... and many other variants
kernighan lin example
Kernighan-Lin example

Swap these

Better partition

Edge cut = 3

Existing partition

Edge cut = 4

spectral method
Spectral method
  • Based on the observation that a Fiedler vector of a graph contains connectivity information
  • Laplacian of a graph: L
    • lii = di (degree of vertex i)
    • lij= -1 if edge {i,j} exists, otherwise 0
      • Smallest eigenvalue of L is 0 with eigenvector all 1
      • All other eigenvalues are positive for a connected graph
  • Fiedler vector
    • Eigenvector corresponding to the second smallest eigenvalue
fiedler vector
Fiedler vector
  • Consider a partitioning of V into A and B
    • Let yi = 1 if vie A, and yi = -1 if vie B
    • For load balance, Si yi = 0
    • Also Seije E (yi-yj)2 = 4 x number of edges across partitions
    • Also, yTLy = Si di yi2 – 2 Seije E yiyj

= Seije E (yi-yj)2

optimization problem
Optimization problem
  • * The optimal partition is obtain by solving
    • Minimize yTLy
    • Constraints:
      • yie {-1,1}
      • Si yi = 0
    • This is NP hard
  • Relaxed problem
    • Minimize yTLy
    • Constraints:
      • Si yi = 0
      • Add a constraint on a norm of y, example, ||y||2 = n0.5
    • Note
      • (1, 1, ..., 1)T is an eigenvector with eigenvalue 0
      • For a connected graph, all other eigenvalues are positive and orthogonal to this eigenvector, which implies Si yi = 0
      • The objective function is minimized by a Fiedler vector
spectral algorithm
Spectral algorithm
  • Find a Fiedler vector of the Laplacian of the graph
    • Note that the Fiedler value (the second smallest eigenvalue) yields a lower bound on the communication cost, when the load is balanced
  • From the Fiedler vector, bisect the graph
    • Let all vertices with components in the Fiedler vector greater than the median be in one component, and the rest in the other
  • Recursively apply this to each partition
  • Note: Finding the Fiedler vector of a large graph can be time consuming
multilevel methods
Multilevel methods
  • Idea
    • It takes time to partition a large graph
    • So partition a small graph instead!
  • * Three phases
    • Graph coarsening
      • Combine vertices to create a smaller graph
        • Example: Find a suitable matching
      • Apply this recursively until a suitably small graph is obtained
    • Partitioning
      • Use spectral or another partitioning algorithm to partition the small graph
    • Multilevel refinement
      • Uncoarsen the graph to get a partitioning of the original graph
      • At each level, perform some graph refinement
multilevel example without refinement26
Multilevel example(without refinement)

9

10

5

7

3

1

1

2

11

1

2

1

2

2

4

8

1

12

16

1

1

1

6

15

1

13

14

multilevel example without refinement27
Multilevel example(without refinement)

9

10

5

7

3

1

1

2

11

1

2

1

2

2

4

8

1

12

16

1

1

6

15

1

13

14

multilevel example without refinement28
Multilevel example(without refinement)

9

10

5

7

3

1

1

2

11

1

2

1

2

2

4

8

1

12

16

1

1

6

15

1

13

14

1

2

2

1

dynamic partitioning
Dynamic partitioning
  • We have an initial partitioning
    • Now, the graph changes
    • * Determine a good partition, fast
    • * Also minimize the number of vertices that need to be moved
  • Examples
    • PLUM
    • Jostle
    • Diffusion
slide30
PLUM
  • Partition based on the initial mesh
    • Vertex and edge weights alone changed
  • Map partitions to processors
    • Use more partitions than processors
      • Ensures finer granularity
    • Compute a similarity matrix based on data already on a process
      • Measures savings on data redistribution cost for each (process, partition) pair
      • Choose assignment of partitions to processors
        • Example: Maximum weight matching
          • Duplicate each processor: # of partitions/P times
        • Alternative: Greedy approximation algorithm
          • Assign in order of maximum similarity value
  • http://citeseer.nj.nec.com/oliker98plum.html
jostle
JOSTLE
  • Use Hu and Blake’s scheme for load balancing
    • Solve Lx = b using Conjugate Gradient
      • L = Laplacian of processor graph, bi = Weight on process Pi – Average weight
    • Move max(xi-xj, 0) weight between Pi and Pj
      • Leads to balanced load
        • Equivalent to Pi sending xi load to each neighbor j, and each neighbor Pj sending xj to Pi
        • Net loss in load for Pi = dixi - Sneighborjxj = L(i)x = bi
          • where L(i) is row i of L, and di is degree of i
        • New load for Pi = weight on Pi - bi = average weight
      • Leads to minimum L2 norm of load moved
        • Using max(xi-xj, 0)
  • Select vertices to move, based on relative gain
    • http://citeseer.nj.nec.com/walshaw97parallel.html
diffusion
Diffusion
  • Involves only communication with neighbors
  • A simple scheme
    • Processor Pi repeatedly sends a wi weight to each neighbor
      • wi = weight on Pi
      • wk = (I – a L) wk-1 , wk = weight vector at iteration k
        • Simple criteria exist for choosing a to ensure convergence
          • Example: a = 0.5/(maxi di),
  • More sophisticated schemes exist
important points
Important points
  • Goals of domain decomposition
    • Balance the load
    • Minimize communication
  • Space filling curves
  • Graph partitioning model
    • Spectral method
      • Relax NP hard integer optimization to floating point, and then discretize to get approximate integer solution
    • Multilevel methods
      • Three phases
  • Dynamic partitioning – additional requirements
    • Use old solution to find new one fast
    • Minimize number of vertices moved