Application Paradigms: Unstructured Grids CS433 Spring 2001


Presentation Transcript


  1. Application Paradigms: Unstructured Grids
     CS433, Spring 2001
     Laxmikant Kale

  2. Unstructured Grids
  • Typically arise in the finite element method
    • E.g., space is tiled with variable-size-and-shape triangles
    • In 3D: may be tetrahedra or hexahedra
  • Allows one to adjust the resolution in different regions
  • The base data structure is a graph
    • Often represented as a bipartite graph
      • E.g., triangles (elements) and nodes
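To make the bipartite element/node representation concrete, here is a minimal C++ sketch; the type names and the particular attributes are illustrative assumptions, not something from the slides.

```cpp
#include <array>
#include <vector>

struct Node {                     // one mesh node
    double x, y;                  // coordinates (2-D here)
    double temperature;           // example nodal attribute
};

struct Element {                  // one triangle
    std::array<int, 3> node;      // indices into Mesh::nodes
    double stress;                // example element attribute
};

struct Mesh {
    std::vector<Node>    nodes;
    std::vector<Element> elements;
};
```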

  3. Unstructured grid computations
  • Typically:
    • Attributes (stresses, strains, pressure, temperature, velocities) are attached to nodes and elements
    • Programs loop over elements and loop over nodes, separately
  • Each time you “visit” an element:
    • Need to access, and possibly modify, all nodes connected to it
  • Each time you visit a node:
    • Typically, access and modify only node attributes
    • Rarely: access/modify attributes of elements connected to it
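A rough sketch of the two loop styles, reusing the hypothetical Mesh structure above (the updates themselves are placeholders):

```cpp
void timestep(Mesh &m) {
    // Element loop: each visit reads (and may modify) all connected nodes.
    for (Element &e : m.elements) {
        double avgT = 0.0;
        for (int n : e.node) avgT += m.nodes[n].temperature;
        e.stress = avgT / 3.0;            // placeholder element update
    }
    // Node loop: typically touches only nodal attributes.
    for (Node &n : m.nodes) {
        n.temperature *= 0.99;            // placeholder nodal update
    }
}
```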

  4. Unstructured grids: parallelization issues
  • Two concerns:
    • The unstructured grid graph must be partitioned across processors
      • vprocs (virtual processors), in general
    • Boundary values must be shared
  • What to partition and what to duplicate (at the boundaries)?
    • Partition elements (so each element belongs to exactly one vproc)
    • Share nodes at the boundary
      • Each shared node potentially has several ghost copies
    • Why is this better than partitioning nodes and sharing elements?
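One way to see why partitioning elements forces node sharing: given an element-to-vproc assignment, a node needs ghost copies exactly when elements from more than one vproc touch it. A hedged sketch (again reusing the hypothetical Mesh type):

```cpp
#include <cstddef>
#include <set>
#include <vector>

// owners[n] = set of vprocs whose elements touch node n;
// owners[n].size() > 1 means node n is shared and needs ghost copies.
std::vector<std::set<int>> nodeOwners(const Mesh &m,
                                      const std::vector<int> &elemToVproc) {
    std::vector<std::set<int>> owners(m.nodes.size());
    for (std::size_t e = 0; e < m.elements.size(); ++e)
        for (int n : m.elements[e].node)
            owners[n].insert(elemToVproc[e]);
    return owners;
}
```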

  5. Partitioning unstructured grids
  • Not as simple as structured grids
    • “By rows”, “by columns”, “rectangular”, ... don’t work
  • Geometric?
    • Applicable only if each node has coordinates
    • Even when applicable, may not lead to good performance
  • What performance metrics to use?
    • Load balance: the number of elements in each partition
    • Communication:
      • Number of shared nodes (total)
      • Maximum number of shared nodes for any one partition
      • Maximum number of “neighbor partitions” for any partition
        • Why? Per-message cost
  • Geometric methods: difficult to optimize both load balance and communication
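All of the metrics above can be computed directly from the element-to-vproc map; a sketch, assuming the nodeOwners() helper from the previous sketch (every name here is illustrative):

```cpp
#include <algorithm>
#include <set>
#include <vector>

struct PartitionMetrics {
    int maxElementsPerVproc = 0;   // load balance
    int totalSharedNodes    = 0;   // total communication volume
    int maxSharedPerVproc   = 0;   // worst single partition
    int maxNeighborVprocs   = 0;   // drives per-message cost
};

PartitionMetrics evaluate(const Mesh &m, const std::vector<int> &elemToVproc,
                          int nvprocs) {
    std::vector<int> elems(nvprocs, 0), shared(nvprocs, 0);
    std::vector<std::set<int>> nbrs(nvprocs);
    for (int v : elemToVproc) elems[v]++;

    PartitionMetrics r;
    for (const std::set<int> &o : nodeOwners(m, elemToVproc)) {
        if (o.size() < 2) continue;            // interior node: not shared
        r.totalSharedNodes++;
        for (int v : o) {
            shared[v]++;
            for (int w : o) if (w != v) nbrs[v].insert(w);
        }
    }
    r.maxElementsPerVproc = *std::max_element(elems.begin(), elems.end());
    r.maxSharedPerVproc   = *std::max_element(shared.begin(), shared.end());
    for (const std::set<int> &s : nbrs)
        r.maxNeighborVprocs = std::max(r.maxNeighborVprocs, (int)s.size());
    return r;
}
```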

  6. MP issues
  • Charm++ help:
    • Today (Wed, 2/21), 2:00 pm to 5:30 pm
    • 2504, 2506, 2508 DCL (Parallel Programming Laboratory)
  • My office hours for this week:
    • Thursday, 10:00 a.m. to 12:00 noon

  7. Grid partitioning
  • When communication costs are relatively low
    • Either because the data set is large or the computation per element is large
    • Geometric methods can be used
  • Orthogonal Recursive Bisection (ORB)
    • Basic idea: recursively divide sets into two
    • Keep shapes squarish as long as possible
    • For each set:
      • Find the bounding box (Xmax, Xmin, Ymax, Ymin, ...)
      • Find the longer dimension (X or Y or ...)
      • Find a cut along the longer dimension that will divide the set equally
        • It doesn’t have to be at the midpoint of that dimension
      • Partition the elements into the two sets based on the cut
      • Repeat for each set
    • Variation: non-power-of-two processors
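A compact sketch of ORB over element centroids in 2-D; the Point type, the proportional split used for non-power-of-two counts, and all names are assumptions made for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Point { double x, y; };   // centroid of one element

// Assign partition ids [firstPart, firstPart + nparts) to the elements
// whose indices are in idx; part[i] receives element i's partition.
void orb(std::vector<int> idx, const std::vector<Point> &c,
         std::vector<int> &part, int firstPart, int nparts) {
    if (nparts == 1) { for (int i : idx) part[i] = firstPart; return; }

    // Bounding box of this set, and its longer dimension.
    double xmin = 1e300, xmax = -1e300, ymin = 1e300, ymax = -1e300;
    for (int i : idx) {
        xmin = std::min(xmin, c[i].x); xmax = std::max(xmax, c[i].x);
        ymin = std::min(ymin, c[i].y); ymax = std::max(ymax, c[i].y);
    }
    bool cutX = (xmax - xmin) >= (ymax - ymin);

    // Split in proportion nl : (nparts - nl); nl = nparts/2 also handles
    // non-power-of-two processor counts.
    int nl = nparts / 2;
    std::size_t k = idx.size() * nl / nparts;
    std::nth_element(idx.begin(), idx.begin() + k, idx.end(),
        [&](int a, int b) { return cutX ? c[a].x < c[b].x : c[a].y < c[b].y; });

    std::vector<int> left(idx.begin(), idx.begin() + k);
    std::vector<int> right(idx.begin() + k, idx.end());
    orb(left,  c, part, firstPart,      nl);
    orb(right, c, part, firstPart + nl, nparts - nl);
}
```

Because the cut is always along the currently longer dimension, the pieces stay roughly squarish as the recursion proceeds.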

  8. Grid partitioning: quad/oct trees
  • Another geometric technique:
    • At each step, divide the set into 2^D subsets, where D is the number of physical dimensions
      • In 2-D: 4 quadrants
    • The dividing lines go through the geometric midpoint of the box
    • The bounding box is NOT recalculated each time in the recursion
  • Comparison with ORB
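A one-step sketch of the quad-tree split in 2-D, reusing the hypothetical Point type from the ORB sketch; note that the cut is at the midpoint of the given box, which is not recomputed from the points:

```cpp
#include <vector>

struct Box { double xmin, xmax, ymin, ymax; };

// Distribute the element indices in idx into the 4 quadrants of box b.
void quadSplit(const std::vector<int> &idx, const std::vector<Point> &c,
               const Box &b, std::vector<int> child[4]) {
    double xm = 0.5 * (b.xmin + b.xmax);
    double ym = 0.5 * (b.ymin + b.ymax);
    for (int i : idx) {
        int q = (c[i].x >= xm ? 1 : 0) + (c[i].y >= ym ? 2 : 0);
        child[q].push_back(i);
    }
    // Recursing on a quadrant uses that quadrant's half of b,
    // not a recomputed bounding box of the points it contains.
}
```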

  9. Grid partitioning: graph partitioners
  • CHACO and METIS are well-known programs
    • They optimize both load imbalance and communication overhead
    • But they often ignore the per-message cost, or the maximum per-partition costs
  • Earlier algorithm: Kernighan–Lin (KL)
  • METIS first coarsens the graph, applies KL to it, and then refines (un-coarsens) the graph
    • It does this not just once, but as a k-level coarsening and refining scheme
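For reference, a hedged sketch of handing the element dual graph (one vertex per element, an edge between elements that share a face) to METIS, assuming the METIS 5.x C API; the CSR arrays and the choice of METIS_PartGraphKway are assumptions about one possible way to wire this up, not a description of any particular framework:

```cpp
#include <metis.h>
#include <vector>

// xadj/adjncy: the dual graph in CSR form; returns part[e] = chunk of element e.
std::vector<idx_t> partitionDualGraph(std::vector<idx_t> &xadj,
                                      std::vector<idx_t> &adjncy,
                                      idx_t nparts) {
    idx_t nvtxs  = static_cast<idx_t>(xadj.size()) - 1;  // number of elements
    idx_t ncon   = 1;                                    // one balance constraint
    idx_t objval = 0;                                    // resulting edge cut
    std::vector<idx_t> part(nvtxs);
    METIS_PartGraphKway(&nvtxs, &ncon, xadj.data(), adjncy.data(),
                        nullptr, nullptr, nullptr,       // unit vertex/edge weights
                        &nparts, nullptr, nullptr, nullptr,
                        &objval, part.data());
    return part;
}
```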

  10. Crack Propagation
  • Explicit FEM code
  • Zero-volume cohesive elements are inserted near the crack
    • As the crack propagates, more cohesive elements are added near the crack, which leads to severe load imbalance
  • The framework handles:
    • Partitioning elements into chunks
    • Communication between chunks
    • Load balancing
  [Figure: decomposition into 16 chunks (left) and 128 chunks, 8 for each PE (right); the middle area contains cohesive elements. Pictures: S. Breitenfeld and P. Geubelle]

  11. Crack Propagation
  [Figure: decomposition into 16 chunks (left) and 128 chunks, 8 for each PE (right); the middle area contains cohesive elements. Both decompositions were obtained using METIS. Pictures: S. Breitenfeld and P. Geubelle]

  12. Unstructured grid: managing communication
  • Suppose triangles A, B, and C are on different processors
    • Node 1 is shared between all 3 processors
    • It must therefore have a copy on all 3 processors
  • When values need to be added up:
    • Option 1 (star): let A (say) be the “owner” of node 1
      • B and C send their copies of “1” to A
      • A combines them (usually, just adding them up)
      • A sends the updated value back to B and C
    • Option 2 (symmetric): each processor sends its copy of node 1 to both of the others
    • Which one is better?
  [Figure: triangles A, B, and C meeting at the shared node 1]
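A sketch of the two schemes for a single shared node; send() and recv() are assumed stand-ins for whatever the message-passing layer provides, not real API calls:

```cpp
#include <vector>

// Assumed stand-ins for the runtime's messaging layer.
void   send(int proc, double value);
double recv(int proc);

// "Star" scheme: non-owners send their partial value to the owner,
// the owner adds everything up and sends the total back.
double combineStar(double myPart, bool iAmOwner, int owner,
                   const std::vector<int> &otherCopies) {
    double value = myPart;
    if (iAmOwner) {
        for (int p : otherCopies) value += recv(p);   // gather partials
        for (int p : otherCopies) send(p, value);     // scatter the total
    } else {
        send(owner, myPart);
        value = recv(owner);
    }
    return value;
}

// "Symmetric" scheme: every copy sends to every other copy and adds locally.
double combineSymmetric(double myPart, const std::vector<int> &otherCopies) {
    for (int p : otherCopies) send(p, myPart);
    double value = myPart;
    for (int p : otherCopies) value += recv(p);
    return value;
}
```

With k copies of a node, the star scheme sends 2(k-1) messages but needs two communication phases, while the symmetric scheme sends k(k-1) messages in a single phase; which is better depends on whether message count or the latency of the owner round trip dominates.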

  13. Unstructured grid: managing communication
  • In either scheme:
    • Each vproc maintains a list of neighboring vprocs
    • For each neighbor, it maintains a list of shared nodes
      • Each node has a local index (“my 5th node”)
    • The same list works in both directions:
      • Send
      • Receive
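A sketch of that bookkeeping (all names illustrative):

```cpp
#include <vector>

// Nodes shared with one particular neighboring vproc.
struct SharedNodeList {
    int neighborVproc;              // id of the neighboring chunk/vproc
    std::vector<int> localIndex;    // my local indices of the shared nodes,
                                    // kept in the same order on both sides,
                                    // so the one list drives both the send
                                    // and the matching receive
};

struct ChunkComm {
    std::vector<SharedNodeList> neighbors;   // one entry per neighboring vproc
};
```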

  14. Adaptive variations: structured grids
  • Suppose you need a different level of refinement at different places in the grid:
    • Adaptive Mesh Refinement (AMR)
    • Quad- and oct-trees can be used
    • Neighboring regions may have resolutions that differ by 1 level
      • Requiring (possibly complex) interpolation algorithms
  • The fact that you have to do the refinement in the middle of a parallel computation makes a difference
    • It happens again and again, though often not at every step
    • Adjust your communication lists
    • Alternatively, put a layer of software in the middle to do the interpolations
      • So that each square chunk thinks it has exactly one neighbor on each side
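A small sketch of a quad-tree cell for structured AMR (illustrative only; the interpolation layer the slide mentions would sit on top of this):

```cpp
#include <array>
#include <memory>

struct Cell {
    int level = 0;                               // refinement level
    std::array<std::unique_ptr<Cell>, 4> child;  // all null => leaf
    void refine() {                              // split a leaf into 4 children
        for (std::unique_ptr<Cell> &c : child) {
            c = std::make_unique<Cell>();
            c->level = level + 1;
        }
    }
    // A 2:1 balance rule (neighbors differ by at most one level) keeps the
    // face interpolation between coarse and fine chunks manageable.
};
```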

  15. Adaptive variations: unstructured grids
  • The mesh may be refined in places, dynamically:
    • This is much harder to do (even sequentially) than for structured grids
    • Think about triangles:
      • Quality restriction: avoid long, skinny triangles
  • From the parallel computing point of view:
    • Need to change the list of shared nodes
    • Load balance may shift
  • Load balancing options:
    • Abandon the partitioning and repartition from scratch
    • Incrementally adjust (typically with virtualization)
