# Domain decomposition in parallel computing


### Domain decomposition in parallel computing

COT 5410 – Spring 2004

Ashok Srinivasan

www.cs.fsu.edu/~asriniva

Florida State University

Outline
• Background
• Geometric partitioning
• Graph partitioning
• Static
• Dynamic
• Important points
Background
• Tasks in a parallel computation need access to certain data
• Same datum may be needed by multiple tasks
• Example: In matrix-vector multiplication c = A b, the entry b_2 is needed to compute every c_i, 1 ≤ i ≤ n, since c_i = Σ_j a_ij b_j (see the sketch after this list)
• If a process does not “own” a datum needed by its task, then it has to get it from a process that has it
• This communication is expensive
• Aims of domain decomposition
• Distribute the data in such a manner that the communication required is minimized
• Ensure that the computational loads on processes are balanced
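To make the data dependence concrete, here is a minimal sketch (not from the slides), assuming a row-wise distribution of A across a few hypothetical processes: each process computes its own rows of c = A b, but it needs the entire vector b, including b_2.

```python
import numpy as np

# Minimal sketch: row-wise distribution of c = A @ b across P hypothetical
# processes. Each "process" owns a block of rows of A, but needs the whole
# vector b, so b must be replicated or communicated.
n, P = 8, 4
A = np.random.rand(n, n)
b = np.random.rand(n)

rows_per_proc = np.array_split(np.arange(n), P)
c = np.empty(n)
for p, rows in enumerate(rows_per_proc):
    # "process p" computes its rows; note it touches every entry of b,
    # including b[1] (b_2 in 1-based notation)
    c[rows] = A[rows, :] @ b

assert np.allclose(c, A @ b)
```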
Domain decomposition example
• Finite difference computation
• New value of a node depends on old values of its neighbors
• We want to divide the nodes amongst the processes so that
• Communication is minimized – this is the measure of partition quality
• Computational load is evenly balanced
Geometric partitioning
• Partition a set of points
• Uses only coordinate information
• Balances the load
• The heuristic tries to ensure that communication costs are low
• Algorithms are typically fast, but the partition is not of high quality
• Examples
• Orthogonal recursive bisection
• Inertial
• Space filling curves
Orthogonal recursive bisection
• Recursively bisect orthogonal to the longest dimension
• Assume communication is proportional to the surface area of the domain, and aligned with coordinate axes
• Recursive bisection
• Divide into two pieces, keeping load balanced
• Apply recursively, until the desired number of partitions is obtained (a minimal sketch follows)
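A minimal Python sketch of orthogonal recursive bisection (not from the slides), assuming the number of partitions is a power of two and splitting at the median along the longest coordinate direction:

```python
import numpy as np

# Orthogonal recursive bisection: recursively split a point set at the
# median of its longest coordinate direction, keeping the halves balanced.
def orb(points, n_parts):
    """Return a list of index arrays, one per partition (n_parts a power of 2)."""
    index = np.arange(len(points))

    def bisect(idx, parts):
        if parts == 1:
            return [idx]
        pts = points[idx]
        longest = np.argmax(pts.max(axis=0) - pts.min(axis=0))  # longest dimension
        order = idx[np.argsort(pts[:, longest])]
        half = len(order) // 2                                   # balanced split
        return bisect(order[:half], parts // 2) + bisect(order[half:], parts // 2)

    return bisect(index, n_parts)

# Example: partition 1000 random 2-D points into 4 parts
parts = orb(np.random.rand(1000, 2), 4)
print([len(p) for p in parts])
```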
Inertial
• ORB may not be effective if cuts along the x, y, or z directions are not good ones
• Inertial
• Recursively bisect orthogonal to the inertial (principal) axis of the point set (see the sketch below)
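A minimal sketch of one inertial bisection step (not from the slides), assuming the inertial axis is taken as the principal axis of the point set, i.e. the eigenvector of the covariance matrix with the largest eigenvalue:

```python
import numpy as np

# One inertial bisection step: find the principal axis of the point set and
# split at the median of the projections onto that axis.
def inertial_bisect(points):
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered                  # inertia / covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    axis = eigvecs[:, -1]                        # direction of largest spread
    proj = centered @ axis
    median = np.median(proj)
    left = np.where(proj <= median)[0]
    right = np.where(proj > median)[0]
    return left, right

left, right = inertial_bisect(np.random.rand(1000, 2))
print(len(left), len(right))
```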
Space filling curves
• Space filling curves
• A continuous curve that fills the space
• Order the points based on their relative position on the curve
• Choose a curve that preserves proximity
• Points that are close in space should be close in the ordering too
• Example
• Hilbert curve

Figure: successive approximations H1, H2, …, Hi, Hi+1; the Hilbert curve is the limit of Hn as n → ∞.
Hilbert curve
• Sources
• http://www.dcs.napier.ac.uk/~andrew/hilbert.html
• http://www.fractalus.com/kerry/tutorials/hilbert/hilbert-tutorial.html
Domain decomposition with a space filling curve
• Order points based on their position on the curve
• Divide into P parts
• P is the number of processes
• Space filling curves can be used in adaptive computations too
• They can be extended to higher dimensions too (a 2-D ordering sketch follows)
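A minimal 2-D sketch (not from the slides), assuming points are snapped to a 2^k × 2^k grid, ordered by the standard Hilbert-index mapping, and then cut into P contiguous, equally sized pieces:

```python
import numpy as np

def hilbert_index(order, x, y):
    """Map integer cell (x, y) on a 2^order x 2^order grid to its Hilbert index."""
    d = 0
    s = 2 ** (order - 1)
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

order, P = 6, 4
grid = 2 ** order
points = np.random.rand(500, 2)
cells = np.minimum((points * grid).astype(int), grid - 1)
keys = np.array([hilbert_index(order, cx, cy) for cx, cy in cells])
ranked = np.argsort(keys)                        # points in curve order
partitions = np.array_split(ranked, P)           # P contiguous pieces
print([len(p) for p in partitions])
```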
Graph partitioning
• Model as graph partitioning
• Graph G = (V, E)
• Each task is represented by a vertex
• A weight can be used to represent the computational effort
• An edge exists between tasks if one needs data owned by the other
• Weights can be associated with edges too
• Goal
• Partition vertices into P parts such that each partition has equal vertex weights
• Minimize the weights of edges cut
• Problem is NP hard
• Edge cut metric
• Judge the quality of the partitioning by the number of edges cut (a small edge-cut function is sketched below)
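A minimal sketch of the edge-cut metric (not from the slides), for a partition given as a vertex-to-part mapping; the example graph is made up for illustration:

```python
# Edge-cut metric: count edges whose endpoints lie in different parts.
def edge_cut(edges, part):
    return sum(1 for u, v in edges if part[u] != part[v])

# Example: a 4-cycle split into two parts of two vertices each
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
part = {0: 0, 1: 0, 2: 1, 3: 1}
print(edge_cut(edges, part))   # 2 edges cross the cut
```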
Static graph partitioning
• Combinatorial
• Levelized nested dissection
• Kernighan-Lin / Fiduccia-Mattheyses
• Spectral partitioning
• Multi-level methods
Combinatorial partitioning
• Use only connectivity information
• Examples
• Levelized nested dissection
• Kernighan-Lin / Fiduccia-Mattheyses
Levelized nested dissection (LND)
• Idea is similar to the geometric methods
• But cannot use coordinate information
• Instead of projecting vertices along the longest axis, order them based on their distance from a vertex that may lie at one extreme of the longest dimension of the graph
• Pseudo-peripheral vertex
• Perform a breadth-first search, starting from an arbitrary vertex
• The vertex that is encountered last might be a good approximation to a peripheral vertex (a BFS-based sketch follows)
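A minimal sketch of finding a pseudo-peripheral vertex by breadth-first search (not from the slides); the repeated-BFS refinement shown here, restarting from the last vertex reached until the eccentricity stops growing, is a common variant of the single-BFS idea on the slide:

```python
from collections import deque

def bfs_levels(adj, start):
    """Return the last vertex reached by BFS from start, and its level."""
    level = {start: 0}
    queue = deque([start])
    last = start
    while queue:
        u = queue.popleft()
        last = u
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return last, level[last]

def pseudo_peripheral(adj, start=0):
    vertex, ecc = start, -1
    while True:
        far, new_ecc = bfs_levels(adj, vertex)
        if new_ecc <= ecc:        # eccentricity stopped growing
            return vertex
        vertex, ecc = far, new_ecc

# Example: a path graph 0-1-2-3-4; starting from the middle we end at an endpoint
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(pseudo_peripheral(adj, start=2))   # an endpoint of the path (vertex 0 here)
```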
LND example – Finding a pseudoperipheral vertex

Figure: BFS levels (1–4) radiating from an arbitrary initial vertex; the vertex reached last, at level 4, is taken as the pseudoperipheral vertex.

LND example – Partitioning

Figure: BFS levels (1–6) from the initial (pseudoperipheral) vertex; the level sets are split into two balanced halves to form the partition.

Recursively bisect the subgraphs

Kernighan-Lin / Fiduccia-Mattheyses
• Refines an existing partition
• Kernighan-Lin
• Consider pairs of vertices from different partitions
• Choose a pair whose swapping will result in the best improvement in partition quality (see the gain sketch after this list)
• The best improvement may actually be a worsening
• Perform several passes
• Choose best partition among those encountered
• Fiduccia-Mattheyses
• Similar but more efficient
• Boundary Kernighan-Lin
• Consider only boundary vertices to swap
• ... and many other variants
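A minimal sketch of the swap-gain computation used when choosing a pair to exchange (not from the slides); the gain formula D(a) + D(b) − 2·c(a, b) and the small example graph are standard illustrations:

```python
# Gain of swapping a vertex pair (a in one part, b in the other) in a
# Kernighan-Lin pass: reduction in edge cut = D(a) + D(b) - 2*c(a, b),
# where D(v) is the external minus the internal cost of v.
def d_value(adj, part, v):
    ext = sum(1 for u in adj[v] if part[u] != part[v])       # edges crossing the cut
    internal = sum(1 for u in adj[v] if part[u] == part[v])
    return ext - internal

def swap_gain(adj, part, a, b):
    c_ab = 1 if b in adj[a] else 0
    return d_value(adj, part, a) + d_value(adj, part, b) - 2 * c_ab

# Example: two triangles joined by an edge; vertices 2 and 5 start on the wrong sides
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4, 5], 4: [3, 5], 5: [3, 4]}
part = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 0}
print(swap_gain(adj, part, 2, 5))   # 4: the swap removes four cut edges
```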
Kernighan-Lin example

Figure: an existing partition with edge cut = 4; swapping the marked pair of vertices yields a better partition with edge cut = 3.

Spectral method
• Based on the observation that a Fiedler vector of a graph contains connectivity information
• Laplacian of a graph: L
• L_ii = d_i (degree of vertex i)
• L_ij = −1 if edge {i,j} exists, otherwise 0
• Smallest eigenvalue of L is 0 with eigenvector all 1
• All other eigenvalues are positive for a connected graph
• Fiedler vector
• Eigenvector corresponding to the second smallest eigenvalue
Fiedler vector
• Consider a partitioning of V into A and B
• Let y_i = 1 if v_i ∈ A, and y_i = −1 if v_i ∈ B
• For load balance, Σ_i y_i = 0
• Also, Σ_{(i,j) ∈ E} (y_i − y_j)² = 4 × (number of edges across partitions)
• Also, yᵀL y = Σ_i d_i y_i² − 2 Σ_{(i,j) ∈ E} y_i y_j = Σ_{(i,j) ∈ E} (y_i − y_j)² (a small numeric check follows)
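A small numeric check of the identity above on a made-up graph, using a balanced ±1 partition vector y:

```python
import numpy as np

# Check y^T L y = sum over edges of (y_i - y_j)^2 on a small example graph.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4
L = np.zeros((n, n))
for i, j in edges:
    L[i, i] += 1
    L[j, j] += 1
    L[i, j] -= 1
    L[j, i] -= 1

y = np.array([1, 1, -1, -1])                      # a balanced ±1 partition vector
lhs = y @ L @ y
rhs = sum((y[i] - y[j]) ** 2 for i, j in edges)
print(lhs, rhs)                                   # both equal 4 * edge cut (3 edges cross)
```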

Optimization problem
• The optimal partition is obtained by solving
• Minimize yᵀL y
• Constraints:
• y_i ∈ {−1, 1}
• Σ_i y_i = 0
• This is NP hard
• Relaxed problem
• Minimize yᵀL y
• Constraints:
• Σ_i y_i = 0
• Add a constraint on the norm of y, for example ||y||_2 = √n
• Note
• (1, 1, …, 1)ᵀ is an eigenvector of L with eigenvalue 0
• For a connected graph, all other eigenvalues are positive and their eigenvectors are orthogonal to this one, which implies Σ_i y_i = 0
• The objective function is minimized by a Fiedler vector
Spectral algorithm
• Find a Fiedler vector of the Laplacian of the graph
• Note that the Fiedler value (the second smallest eigenvalue) yields a lower bound on the communication cost, when the load is balanced
• From the Fiedler vector, bisect the graph
• Let all vertices whose Fiedler-vector components are greater than the median be in one part, and the rest in the other
• Recursively apply this to each partition
• Note: Finding the Fiedler vector of a large graph can be time consuming (a dense-algebra sketch follows)
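A minimal sketch of the core step of recursive spectral bisection (not from the slides), using dense linear algebra that is only practical for small graphs; the example graph is made up:

```python
import numpy as np

# Spectral bisection: build the Laplacian, take the eigenvector of the second
# smallest eigenvalue (a Fiedler vector), and split at its median.
def spectral_bisect(edges, n):
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1
        L[j, j] += 1
        L[i, j] -= 1
        L[j, i] -= 1
    eigvals, eigvecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                   # second smallest eigenvalue
    median = np.median(fiedler)
    return (fiedler > median).astype(int)     # 1 = above the median, 0 = rest

# Example: two triangles joined by a single edge separate cleanly
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
print(spectral_bisect(edges, 6))
```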
Multilevel methods
• Idea
• It takes time to partition a large graph
• So partition a small graph instead!
• Three phases
• Graph coarsening
• Combine vertices to create a smaller graph
• Example: Find a suitable matching
• Apply this recursively until a suitably small graph is obtained
• Partitioning
• Use spectral or another partitioning algorithm to partition the small graph
• Multilevel refinement
• Uncoarsen the graph to get a partitioning of the original graph
• At each level, perform some graph refinement (a coarsening-by-matching sketch follows)
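A minimal sketch of the coarsening phase using a greedy matching (not from the slides); vertex weights of matched pairs are summed, and the example graph is made up:

```python
# Coarsening by matching: greedily match each unmatched vertex with an
# unmatched neighbor and merge every matched pair into one coarse vertex.
def coarsen(adj, weight):
    matched = set()
    coarse_of = {}          # fine vertex -> coarse vertex id
    coarse_weight = []
    for u in adj:
        if u in matched:
            continue
        partner = next((v for v in adj[u] if v not in matched and v != u), None)
        cid = len(coarse_weight)
        if partner is None:
            coarse_of[u] = cid
            coarse_weight.append(weight[u])
            matched.add(u)
        else:
            coarse_of[u] = coarse_of[partner] = cid
            coarse_weight.append(weight[u] + weight[partner])
            matched.update((u, partner))
    # Build the coarse adjacency (self-loops dropped)
    coarse_adj = {c: set() for c in range(len(coarse_weight))}
    for u in adj:
        for v in adj[u]:
            cu, cv = coarse_of[u], coarse_of[v]
            if cu != cv:
                coarse_adj[cu].add(cv)
    return coarse_adj, coarse_weight, coarse_of

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(coarsen(adj, {v: 1 for v in adj}))
```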
Multilevel example (without refinement)

Figures (three slides): a 16-vertex weighted graph is coarsened by matching pairs of vertices, the small coarse graph is partitioned, and the partition is projected back onto the original graph without any refinement during uncoarsening.

Dynamic partitioning
• We have an initial partitioning
• Now, the graph changes
• Determine a good partition, fast
• Also minimize the number of vertices that need to be moved
• Examples
• PLUM
• Jostle
• Diffusion
PLUM
• Partition based on the initial mesh
• Only the vertex and edge weights change
• Map partitions to processors
• Use more partitions than processors
• Ensures finer granularity
• Compute a similarity matrix based on data already on a process
• Measures savings on data redistribution cost for each (process, partition) pair
• Choose assignment of partitions to processors
• Example: Maximum weight matching
• Duplicate each processor (number of partitions)/P times
• Alternative: Greedy approximation algorithm
• Assign in order of maximum similarity value (a greedy sketch follows the link below)
• http://citeseer.nj.nec.com/oliker98plum.html
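A minimal sketch of the greedy alternative (not from the slides), assuming a made-up similarity matrix with rows indexed by processes and columns by partitions:

```python
import numpy as np

# Greedy assignment: repeatedly take the largest remaining (process, partition)
# similarity entry, assign that partition to that process, and give each
# process the same number of partitions.
def greedy_assign(similarity, partitions_per_proc):
    n_proc, n_part = similarity.shape
    load = np.zeros(n_proc, dtype=int)
    owner = np.full(n_part, -1)
    # Visit (process, partition) pairs in order of decreasing similarity
    for flat in np.argsort(similarity, axis=None)[::-1]:
        p, k = np.unravel_index(flat, similarity.shape)
        if owner[k] == -1 and load[p] < partitions_per_proc:
            owner[k] = p
            load[p] += 1
    return owner

# Example: 2 processes, 4 partitions, 2 partitions per process
sim = np.array([[9.0, 1.0, 7.0, 2.0],
                [3.0, 8.0, 6.0, 5.0]])
print(greedy_assign(sim, 2))   # [0 1 0 1]: partitions 0, 2 -> process 0; 1, 3 -> process 1
```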
JOSTLE
• Use Hu and Blake’s scheme for load balancing
• Solve Lx = b using Conjugate Gradient
• L = Laplacian of the processor graph, b_i = weight on process P_i − average weight
• Move max(x_i − x_j, 0) weight from P_i to its neighbor P_j
• Leads to balanced load
• Equivalent to P_i sending x_i load to each neighbor, and each neighbor P_j sending x_j to P_i
• Net loss in load for P_i = d_i x_i − Σ_{neighbors j} x_j = L_(i) x = b_i
• where L_(i) is row i of L, and d_i is the degree of i
• New load for P_i = weight on P_i − b_i = average weight
• Leads to minimum L2 norm of the load moved
• Using max(x_i − x_j, 0)
• Select vertices to move, based on relative gain
• http://citeseer.nj.nec.com/walshaw97parallel.html
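A minimal sketch of the balancing flow (not from the slides): solve L x = b on a made-up 4-processor ring with SciPy's conjugate gradient, then move max(x_i − x_j, 0) load across each edge. L is singular, but b sums to zero, so the system is consistent:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import cg

proc_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]      # 4 processors in a ring
weights = np.array([10.0, 2.0, 6.0, 2.0])
n = len(weights)

# Laplacian of the processor graph
L = np.zeros((n, n))
for i, j in proc_edges:
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1

b = weights - weights.mean()                        # b_i = weight_i - average weight
x, info = cg(csr_matrix(L), b)                      # solve L x = b with Conjugate Gradient

for i, j in proc_edges:
    flow = x[i] - x[j]
    if flow > 0:
        print(f"move {flow:.2f} load from P{i} to P{j}")
    elif flow < 0:
        print(f"move {-flow:.2f} load from P{j} to P{i}")
```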
Diffusion
• Involves only communication with neighbors
• A simple scheme
• Processor P_i repeatedly sends α·w_i weight to each neighbor
• w_i = weight on P_i
• w^(k) = (I − α L) w^(k−1), where w^(k) is the weight vector at iteration k
• Simple criteria exist for choosing α to ensure convergence
• Example: α = 0.5 / (max_i d_i)
• More sophisticated schemes exist (the simple scheme is sketched below)
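A minimal sketch of the simple diffusion scheme (not from the slides) on a made-up 4-processor ring, iterating w ← (I − αL) w with α = 0.5 / max_i d_i:

```python
import numpy as np

proc_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]    # 4 processors in a ring
w = np.array([10.0, 2.0, 6.0, 2.0])              # initial load per processor
n = len(w)

# Laplacian of the processor graph
L = np.zeros((n, n))
for i, j in proc_edges:
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1

alpha = 0.5 / L.diagonal().max()                 # 0.5 / (max degree)
M = np.eye(n) - alpha * L
for _ in range(50):                              # each step uses only neighbor data
    w = M @ w
print(w)                                         # converges toward the average load
```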
Important points
• Goals of domain decomposition
• Balance the load
• Minimize communication
• Space filling curves
• Graph partitioning model
• Spectral method
• Relax the NP-hard integer optimization to a continuous one, then discretize to get an approximate integer solution
• Multilevel methods
• Three phases
• Dynamic partitioning – additional requirements
• Use old solution to find new one fast
• Minimize number of vertices moved