## Domain decomposition in parallel computing


COT 5410 – Spring 2004

Ashok Srinivasan

www.cs.fsu.edu/~asriniva

Florida State University

Outline

- Background
- Geometric partitioning
- Graph partitioning
- Static
- Dynamic
- Important points

Background

- Tasks in a parallel computation need access to certain data
- Same datum may be needed by multiple tasks
- Example: In matrix-vector multiplication c = Ab, b2 is needed for the computation of every ci, 1 ≤ i ≤ n, since ci = Σj aij bj (see the sketch after this list)
- If a process does not “own” a datum needed by its task, then it has to get it from a process that has it
- This communication is expensive
- Aims of domain decomposition
- Distribute the data in such a manner that the communication required is minimized
- Ensure that the computational loads on processes are balanced
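
A minimal illustration (mine, not from the slides) of the data-access pattern in the matrix-vector example: each hypothetical process owns a block of rows of A, yet every process needs the entire vector b, which is exactly the kind of shared data that a good decomposition tries to keep cheap.

```python
import numpy as np

def matvec_row_blocks(A, b, nprocs):
    """Row-wise domain decomposition of c = A @ b across `nprocs` processes."""
    n = A.shape[0]
    row_blocks = np.array_split(np.arange(n), nprocs)  # contiguous rows per process
    c = np.empty(n)
    for owned in row_blocks:          # the rows "owned" by one process
        c[owned] = A[owned, :] @ b    # every process touches the full vector b
    return c

A = np.arange(16.0).reshape(4, 4)
b = np.ones(4)
print(np.allclose(matvec_row_blocks(A, b, nprocs=2), A @ b))  # True
```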

Domain decomposition example

- Finite difference computation
- New value of a node depends on old values of its neighbors

- We want to divide the nodes amongst the processes so that
- Communication is minimized (the measure of partition quality)
- Computational load is evenly balanced

Geometric partitioning

- Partition a set of points
- Uses only coordinate information
- Balances the load
- The heuristic tries to ensure that communication costs are low
- Algorithms are typically fast, but the resulting partitions are not of high quality
- Examples
- Orthogonal recursive bisection
- Inertial
- Space filling curves

Orthogonal recursive bisection

- Recursively bisect orthogonal to the longest dimension
- Assume communication is proportional to the surface area of the domain, and aligned with coordinate axes
- Recursive bisection
- Divide into two pieces, keeping load balanced
- Apply recursively until the desired number of partitions is obtained (see the sketch below)
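
A serial sketch of orthogonal recursive bisection on a point set (coordinates in a NumPy array; the part count is assumed to be a power of two, and the function names are mine): at each level the points are split at the median along their longest coordinate direction, which keeps the two halves balanced.

```python
import numpy as np

def orb(points, nparts):
    """Return a partition label for each point via orthogonal recursive bisection."""
    labels = np.zeros(len(points), dtype=int)

    def bisect(idx, first, count):
        if count == 1:
            labels[idx] = first
            return
        pts = points[idx]
        dim = np.argmax(pts.max(axis=0) - pts.min(axis=0))  # longest dimension
        order = idx[np.argsort(pts[:, dim])]                 # sort along that axis
        half = len(order) // 2                               # median split keeps balance
        bisect(order[:half], first, count // 2)
        bisect(order[half:], first + count // 2, count - count // 2)

    bisect(np.arange(len(points)), 0, nparts)
    return labels

pts = np.random.rand(1000, 2)
print(np.bincount(orb(pts, 4)))   # four nearly equal parts
```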

Inertial

- ORB may not be effective if cuts along the x, y, or z directions are not good ones
- Inertial
- Recursively bisect orthogonal to the inertial axis of the point set (see the sketch below)
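
A sketch of one inertial bisection step under similar assumptions (2-D points in a NumPy array, names mine): the principal inertial axis is the dominant eigenvector of the centered covariance matrix, and the points are split at the median of their projections onto it.

```python
import numpy as np

def inertial_bisect(points):
    """Split points into two balanced halves orthogonal to the principal inertial axis."""
    centered = points - points.mean(axis=0)
    # Principal axis = eigenvector of the inertia (covariance) matrix
    # with the largest eigenvalue; eigh returns eigenvalues in ascending order.
    _, vecs = np.linalg.eigh(centered.T @ centered)
    axis = vecs[:, -1]
    proj = centered @ axis                 # position of each point along the axis
    order = np.argsort(proj)
    half = len(points) // 2
    return order[:half], order[half:]      # index sets of the two parts

pts = np.random.rand(100, 2)
left, right = inertial_bisect(pts)
print(len(left), len(right))
```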

Space filling curves

- A space filling curve is a continuous curve that fills the space
- Order the points based on their relative position on the curve
- Choose a curve that preserves proximity
- Points that are close in space should be close in the ordering too
- Example
- Hilbert curve

Figure: successive Hilbert curve approximations H1, H2, ..., Hi+1; the Hilbert curve = lim Hn as n → ∞
Hilbert curve sources:
- http://www.dcs.napier.ac.uk/~andrew/hilbert.html
- http://www.fractalus.com/kerry/tutorials/hilbert/hilbert-tutorial.html

Domain decomposition with a space filling curve

- Order points based on their position on the curve
- Divide into P parts
- P is the number of processes
- Space filling curves can be used in adaptive computations too
- They can be extended to higher dimensions too (see the sketch below)
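
A sketch of SFC-based decomposition in 2-D. The `xy2d` routine below is the standard iterative Hilbert-index conversion; it assumes an n-by-n grid with n a power of two, and the point coordinates are assumed to already be integer grid cells. The points are ordered by Hilbert index and the ordering is cut into P nearly equal pieces.

```python
def xy2d(n, x, y):
    """Hilbert index of integer cell (x, y) on an n-by-n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

def sfc_partition(cells, nprocs, n=1024):
    """Order integer grid cells along the Hilbert curve and split into nprocs chunks."""
    order = sorted(range(len(cells)), key=lambda i: xy2d(n, *cells[i]))
    size = -(-len(cells) // nprocs)      # ceiling division
    return [order[k * size:(k + 1) * size] for k in range(nprocs)]

cells = [(3, 7), (100, 5), (10, 10), (512, 600), (2, 2), (900, 900)]
print(sfc_partition(cells, nprocs=3))
```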

Graph partitioning

- Model as graph partitioning
- Graph G = (V, E)
- Each task is represented by a vertex
- A weight can be used to represent the computational effort
- An edge exists between tasks if one needs data owned by the other
- Weights can be associated with edges too
- Goal
- Partition vertices into P parts such that each partition has equal vertex weights
- Minimize the weights of edges cut
- Problem is NP hard
- Edge cut metric
- Judge the quality of the partitioning by the number (or total weight) of edges cut (see the sketch after this list)
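
A small sketch of the edge cut metric (the edge-list and vertex-to-part formats are assumptions for illustration): sum the weights of edges whose endpoints land in different parts.

```python
def edge_cut(edges, part, weight=None):
    """Total weight of edges whose endpoints are in different partitions.

    edges  : list of (u, v) pairs
    part   : dict or list mapping vertex -> partition id
    weight : optional dict mapping (u, v) -> edge weight (default 1)
    """
    cut = 0
    for u, v in edges:
        if part[u] != part[v]:
            cut += 1 if weight is None else weight[(u, v)]
    return cut

# A 4-cycle split into {0, 1} and {2, 3}: two edges are cut.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(edge_cut(edges, part=[0, 0, 1, 1]))   # -> 2
```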

Static graph partitioning

- Combinatorial
- Levelized nested dissection
- Kernighan-Lin / Fiduccia-Mattheyses
- Spectral partitioning
- Multi-level methods

Combinatorial partitioning

- Use only connectivity information
- Examples
- Levelized nested dissection
- Kernighan-Lin / Fiduccia-Mattheyses

Levelized nested dissection (LND)

- Idea is similar to the geometric methods
- But cannot use coordinate information
- Instead of projecting vertices along the longest axis, order them by graph distance from a vertex that lies roughly at one extreme of the graph's longest dimension
- Pseudo-peripheral vertex
- Perform a breadth-first search, starting from an arbitrary vertex
- The vertex that is encountered last is often a good approximation to a peripheral vertex (see the sketch below)
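
A sketch of a single LND bisection (adjacency stored as a dict of neighbor lists, an assumed format): one BFS from an arbitrary vertex yields a pseudo-peripheral vertex, a second BFS from that vertex orders the graph by level, and the first half of the ordering becomes one part.

```python
from collections import deque

def bfs_order(adj, start):
    """Vertices in breadth-first order from `start`."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order

def lnd_bisect(adj):
    """Levelized nested dissection bisection of a connected graph."""
    some_vertex = next(iter(adj))
    pseudo_peripheral = bfs_order(adj, some_vertex)[-1]   # last vertex reached
    order = bfs_order(adj, pseudo_peripheral)             # order vertices by BFS level
    half = len(order) // 2
    return set(order[:half]), set(order[half:])

# A 2x3 grid graph.
adj = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5], 3: [0, 4], 4: [1, 3, 5], 5: [2, 4]}
print(lnd_bisect(adj))
```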

Kernighan-Lin / Fiduccia-Mattheyses

- Refines an existing partition
- Kernighan-Lin
- Consider pairs of vertices from different partitions
- Choose a pair whose swapping will result in the best improvement in partition quality
- The best improvement may actually be a worsening
- Perform several passes
- Choose best partition among those encountered
- Fiduccia-Mattheyses
- Similar but more efficient
- Boundary Kernighan-Lin
- Consider only boundary vertices to swap
- ... and many other variants (a simplified single-pass sketch follows)
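
Below is a simplified single-pass sketch of the Kernighan-Lin idea for an unweighted graph. It recomputes vertex gains from scratch at every step rather than doing the incremental bookkeeping of the real algorithm, and it omits the Fiduccia-Mattheyses refinements; swaps are made even when the gain is negative, and the best prefix of swaps seen during the pass is kept, as described above.

```python
def kl_pass(adj, part_a, part_b):
    """One simplified Kernighan-Lin pass on an unweighted graph.

    adj            : dict vertex -> set of neighbor vertices
    part_a, part_b : the two current vertex sets
    Returns the improved (part_a, part_b).
    """
    a, b = set(part_a), set(part_b)

    def d(v, own, other):
        # external edges minus internal edges for vertex v
        return sum(n in other for n in adj[v]) - sum(n in own for n in adj[v])

    locked, swaps, gains = set(), [], []
    for _ in range(min(len(a), len(b))):
        # pick the swap with the best gain, even if that gain is negative
        g, u, v = max((d(u, a, b) + d(v, b, a) - 2 * (v in adj[u]), u, v)
                      for u in a - locked for v in b - locked)
        a.remove(u); b.add(u)
        b.remove(v); a.add(v)
        locked.update((u, v))
        swaps.append((u, v))
        gains.append(g)

    # keep only the prefix of swaps with the best cumulative gain
    running = best_gain = best_prefix = 0
    for i, g in enumerate(gains, 1):
        running += g
        if running > best_gain:
            best_gain, best_prefix = running, i
    a, b = set(part_a), set(part_b)
    for u, v in swaps[:best_prefix]:
        a.remove(u); b.add(u)
        b.remove(v); a.add(v)
    return a, b

# Two triangles {0,1,2} and {3,4,5} joined by edge (2,3), starting from a bad split.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(kl_pass(adj, {0, 1, 4}, {2, 3, 5}))   # -> ({0, 1, 2}, {3, 4, 5})
```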

Spectral method

- Based on the observation that a Fiedler vector of a graph contains connectivity information
- Laplacian of a graph: L
- Lii = di (the degree of vertex i)
- Lij = -1 if edge {i,j} exists, otherwise 0
- The smallest eigenvalue of L is 0, with eigenvector (1, 1, ..., 1)
- All other eigenvalues are positive for a connected graph
- Fiedler vector
- Eigenvector corresponding to the second smallest eigenvalue

Fiedler vector

- Consider a partitioning of V into A and B
- Let yi = 1 if vi ∈ A, and yi = -1 if vi ∈ B
- For load balance, Σi yi = 0
- Also Σ{i,j}∈E (yi - yj)² = 4 × (number of edges across partitions)
- Also, yᵀLy = Σi di yi² - 2 Σ{i,j}∈E yi yj = Σ{i,j}∈E (yi - yj)²

Optimization problem

- The optimal partition is obtained by solving
- Minimize yᵀLy
- Constraints:
- yi ∈ {-1, 1}
- Σi yi = 0
- This is NP hard
- Relaxed problem
- Minimize yᵀLy
- Constraints:
- Σi yi = 0
- Add a constraint on a norm of y, for example ||y||₂ = √n
- Note
- (1, 1, ..., 1)ᵀ is an eigenvector with eigenvalue 0
- For a connected graph, all other eigenvalues are positive, and their eigenvectors are orthogonal to (1, 1, ..., 1)ᵀ; that orthogonality is exactly the constraint Σi yi = 0
- The objective function is then minimized by a Fiedler vector

Spectral algorithm

- Find a Fiedler vector of the Laplacian of the graph
- Note that the Fiedler value (the second smallest eigenvalue) yields a lower bound on the communication cost, when the load is balanced
- From the Fiedler vector, bisect the graph
- Let all vertices with components in the Fiedler vector greater than the median be in one component, and the rest in the other
- Recursively apply this to each partition
- Note: finding the Fiedler vector of a large graph can be time consuming (a small-scale sketch follows)
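
A small-scale sketch of the spectral algorithm using dense NumPy linear algebra (suitable only for tiny graphs; practical codes use sparse iterative eigensolvers such as Lanczos): build the Laplacian, take the eigenvector for the second smallest eigenvalue, and bisect at its median.

```python
import numpy as np

def spectral_bisect(adj_matrix):
    """Bisect a connected graph using the Fiedler vector of its Laplacian."""
    A = np.asarray(adj_matrix, dtype=float)
    L = np.diag(A.sum(axis=1)) - A          # Laplacian: L = D - A
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    fiedler = vecs[:, 1]                    # eigenvector of the second smallest eigenvalue
    part = fiedler > np.median(fiedler)     # split at the median component
    return np.where(part)[0], np.where(~part)[0]

# Two triangles (vertices 0-2 and 3-5) joined by the single edge (2, 3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[u, v] = A[v, u] = 1
print(spectral_bisect(A))    # expected split: {0, 1, 2} vs {3, 4, 5}
```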

Multilevel methods

- Idea
- It takes time to partition a large graph
- So partition a small graph instead!
- Three phases
- Graph coarsening
- Combine vertices to create a smaller graph
- Example: find a suitable matching (see the coarsening sketch after this list)
- Apply this recursively until a suitably small graph is obtained
- Partitioning
- Use spectral or another partitioning algorithm to partition the small graph
- Multilevel refinement
- Uncoarsen the graph to get a partitioning of the original graph
- At each level, perform some graph refinement
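
A sketch of the coarsening phase only (one level of a greedy matching that simply pairs each vertex with its first unmatched neighbor, rather than the heavy-edge matchings used in practice; formats and names are mine): matched pairs are merged into coarse vertices and vertex weights are accumulated, after which partitioning and refinement would run on the coarse graph.

```python
def coarsen(adj, vwgt):
    """One level of matching-based graph coarsening.

    adj  : dict vertex -> iterable of neighbors
    vwgt : dict vertex -> weight
    Returns (coarse_adj, coarse_vwgt, fine_to_coarse map).
    """
    matched, fine_to_coarse, coarse = set(), {}, 0
    for u in adj:                       # greedy maximal matching
        if u in matched:
            continue
        partner = next((v for v in adj[u] if v not in matched and v != u), None)
        matched.add(u)
        fine_to_coarse[u] = coarse
        if partner is not None:
            matched.add(partner)
            fine_to_coarse[partner] = coarse
        coarse += 1

    coarse_adj = {c: set() for c in range(coarse)}
    coarse_vwgt = {c: 0 for c in range(coarse)}
    for u in adj:                       # build the contracted graph
        cu = fine_to_coarse[u]
        coarse_vwgt[cu] += vwgt[u]
        for v in adj[u]:
            cv = fine_to_coarse[v]
            if cu != cv:
                coarse_adj[cu].add(cv)
    return coarse_adj, coarse_vwgt, fine_to_coarse

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}   # a 4-cycle
print(coarsen(adj, {v: 1 for v in adj}))
```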

Dynamic partitioning

- We have an initial partitioning
- Now, the graph changes
- Determine a good partition, fast
- Also minimize the number of vertices that need to be moved
- Examples
- PLUM
- Jostle
- Diffusion

PLUM

- Partition based on the initial mesh
- Only the vertex and edge weights change
- Map partitions to processors
- Use more partitions than processors
- Ensures finer granularity
- Compute a similarity matrix based on data already on a process
- Measures savings on data redistribution cost for each (process, partition) pair
- Choose assignment of partitions to processors
- Example: Maximum weight matching
- Duplicate each processor (# of partitions)/P times, so each processor can receive several partitions
- Alternative: Greedy approximation algorithm
- Assign in order of decreasing similarity value (see the sketch after this list)
- http://citeseer.nj.nec.com/oliker98plum.html
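
A sketch of the greedy alternative for PLUM's assignment step (the similarity matrix S, with S[p, k] the savings of placing partition k on processor p, and the cap of partitions per processor are assumed inputs): visit (processor, partition) pairs in order of decreasing similarity and assign greedily.

```python
import numpy as np

def greedy_assign(S, parts_per_proc):
    """Greedily assign partitions to processors by decreasing similarity.

    S[p, k] = savings in redistribution cost if partition k is placed on processor p.
    Returns owner[k] = processor assigned to partition k.
    """
    nproc, nparts = S.shape
    owner = [-1] * nparts
    load = [0] * nproc
    for flat in np.argsort(S, axis=None)[::-1]:   # pairs in decreasing similarity order
        p, k = divmod(int(flat), nparts)
        if owner[k] == -1 and load[p] < parts_per_proc:
            owner[k] = p
            load[p] += 1
    return owner

S = np.array([[5.0, 1.0, 0.0, 2.0],
              [0.0, 4.0, 3.0, 1.0]])
print(greedy_assign(S, parts_per_proc=2))   # -> [0, 1, 1, 0]
```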

JOSTLE

- Use Hu and Blake's scheme for load balancing
- Solve Lx = b using conjugate gradient
- L = Laplacian of the processor graph, bi = weight on process Pi - average weight
- Move max(xi - xj, 0) weight from Pi to Pj
- Leads to a balanced load
- Equivalent to Pi sending xi load to each neighbor j, and each neighbor Pj sending xj to Pi
- Net loss in load for Pi = di xi - Σ(neighbors j) xj = L(i) x = bi
- where L(i) is row i of L, and di is the degree of Pi
- New load for Pi = weight on Pi - bi = average weight
- Leads to the minimum L2 norm of load moved
- when moving max(xi - xj, 0) across each edge
- Select the vertices to move based on relative gain (see the sketch after this list)
- http://citeseer.nj.nec.com/walshaw97parallel.html
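
A sketch of the Hu and Blake computation on a made-up processor graph. For brevity it uses a dense least-squares solve in place of the conjugate gradient iteration (the Laplacian is singular, but any solution of Lx = b gives the same differences xi - xj); the amount shipped across edge (i, j) is then max(xi - xj, 0).

```python
import numpy as np

def hu_blake_flow(edges, weights):
    """Compute load to move along each processor-graph edge (Hu & Blake scheme).

    edges   : list of (i, j) pairs of the processor graph
    weights : current load on each processor
    Returns {(i, j): amount processor i should send to processor j}.
    """
    w = np.asarray(weights, dtype=float)
    n = len(w)
    L = np.zeros((n, n))
    for i, j in edges:                    # graph Laplacian of the processor graph
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
    b = w - w.mean()                      # deviation from the average load
    x, *_ = np.linalg.lstsq(L, b, rcond=None)   # L is singular; least squares suffices
    flow = {}
    for i, j in edges:
        if x[i] > x[j]:
            flow[(i, j)] = x[i] - x[j]
        else:
            flow[(j, i)] = x[j] - x[i]
    return flow

# A path of three processors with loads 9, 3, 0 (average 4):
# processor 0 sends 5 to 1, and 1 sends 4 to 2, balancing the load.
print(hu_blake_flow([(0, 1), (1, 2)], [9, 3, 0]))
```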

Diffusion

- Involves only communication with neighbors
- A simple scheme
- Processor Pi repeatedly sends α·wi weight to each neighbor
- wi = weight on Pi
- w(k) = (I - αL) w(k-1), where w(k) is the weight vector at iteration k
- Simple criteria exist for choosing α to ensure convergence
- Example: α = 0.5 / (maxi di)
- More sophisticated schemes exist (a sketch of the simple iteration above follows)
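
A sketch of the simple diffusion iteration above, written in dense matrix form for readability (a real implementation would perform only nearest-neighbor exchanges of α·wi): repeatedly apply w ← (I - αL)w with α = 0.5 / max di.

```python
import numpy as np

def diffuse(edges, weights, iters=100):
    """First-order diffusion load balancing: w <- (I - alpha*L) w, repeated."""
    w = np.asarray(weights, dtype=float)
    n = len(w)
    L = np.zeros((n, n))
    for i, j in edges:                    # Laplacian of the processor graph
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
    alpha = 0.5 / L.diagonal().max()      # alpha = 0.5 / max degree ensures convergence
    M = np.eye(n) - alpha * L
    for _ in range(iters):
        w = M @ w                         # each step exchanges load with neighbors only
    return w

# A ring of four processors; the load converges toward the average (5 each).
print(diffuse([(0, 1), (1, 2), (2, 3), (3, 0)], [20, 0, 0, 0]))
```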

Important points

- Goals of domain decomposition
- Balance the load
- Minimize communication
- Space filling curves
- Graph partitioning model
- Spectral method
- Relax the NP-hard integer optimization to a real-valued one, then discretize to get an approximate integer solution
- Multilevel methods
- Three phases
- Dynamic partitioning – additional requirements
- Use old solution to find new one fast
- Minimize number of vertices moved
