
An Evaluation of Partitioners for Parallel SAMR Applications






Presentation Transcript


  1. An Evaluation of Partitioners for Parallel SAMR Applications Sumir Chandra & Manish Parashar ECE Dept., Rutgers University Submitted to: Euro-Par 2001: European Conference on Parallel Computing

  2. Introduction • AMR – Adaptive Mesh Refinement • AMR is used for solving PDEs in dynamic applications • Challenges involved: • Dynamic resource allocation • Dynamic data distribution and load balancing • Communication and coordination • Partitioning of the adaptive grid hierarchy • Evaluation of dynamic domain-based partitioning strategies with an application-centric approach

  3. Motivation & Goal • Even for a single application, the most suitable partitioning technique depends on input parameters and its run-time state • Application-centric characterization of partitioners as a function of number of processors, problem size, and granularity • Enable the run-time selection of partitioners based on input parameters and application state

  4. Partitioning Adaptive Grid Hierarchies • Adaptive Mesh Refinement • Start with a base coarse grid with minimum acceptable resolution • Tag regions in the domain requiring additional resolution, cluster the tagged cells, and fit finer grids over these clusters • Proceed recursively so that regions on the finer grid requiring more resolution are similarly tagged and even finer grids are overlaid on these regions • Resulting grid structure is a dynamic adaptive grid hierarchy

  The Berger-Oliger Algorithm:
    Recursive Procedure Integrate(level)
      If (RegridTime) Regrid
      Step Δt on all grids at level "level"
      If (level + 1 exists)
        Integrate(level + 1)
        Update(level, level + 1)
      End If
    End Recursion

    level = 0
    Integrate(level)
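The recursion above can be sketched in Python. The `regrid`, `step`, and `update` helpers are stand-in stubs (not from the slides), and a factor-2 time refinement is assumed, matching the RM3D setup described later:

```python
# Sketch of the Berger-Oliger recursion. The helpers below are illustrative
# stubs that only record what happens at each level.
REFINE_FACTOR = 2   # assumed factor-2 time refinement, as in the RM3D runs

calls = []  # record of (operation, level...) tuples, for illustration

def regrid(level):
    calls.append(("regrid", level))       # re-cluster tagged cells, rebuild finer grids

def step(level):
    calls.append(("step", level))         # advance all grids at this level by dt(level)

def update(coarse, fine):
    calls.append(("update", coarse, fine))  # project the fine solution onto the coarse grid

def integrate(level, num_levels, regrid_time=False):
    """Advance level `level` by one of its own (smaller) time steps."""
    if regrid_time:
        regrid(level)
    step(level)
    if level + 1 < num_levels:            # a finer level exists
        for _ in range(REFINE_FACTOR):    # finer level takes REFINE_FACTOR substeps
            integrate(level + 1, num_levels)
        update(level, level + 1)

integrate(0, num_levels=3)
# With 3 levels and factor 2: level 0 steps once, level 1 twice, level 2 four times.
```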

  5. SAMR 2-D Grid Hierarchy [figure: grid-hierarchy snapshots at time steps 0, 40, 80, 120, 160, and 182, with a legend for refinement levels 0–4]

  6. Partitioning Techniques • Static or Dynamic techniques • Geometric or Non-geometric • Dynamic partitioning – global or local approaches • Partitioners for SAMR grid applications • Patch-based • Domain-based • Hybrid

  7. Partitioners Evaluated • SFC: Space-Filling Curve based partitioning • G-MISP: Geometric Multi-level Inverse Space-filling curve Partitioning • G-MISP+SP: Geometric Multi-level Inverse Space-filling curve Partitioning with Sequence Partitioning • pBD-ISP: p-way Binary Dissection Inverse Space-filling curve Partitioning • SP-ISP: "Pure" Sequence Partitioning with Inverse Space-filling curve Partitioning • WD: Wavefront Diffusion based on global work load

  8. SFC • Recursive linear representation of multi-dimensional grid hierarchy using space-filling mappings (N-to-1D mapping) • Computational load determined by segment length and recursion level
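A minimal sketch of the idea, assuming a Morton (Z-order) curve as the N-to-1D mapping (the slide does not fix a particular curve) and a cut of the resulting 1-D order into contiguous, roughly equal-load segments:

```python
# Illustrative SFC partitioning: map 2-D blocks to a 1-D order with a
# Morton (Z-order) key, then cut the ordered list into contiguous,
# roughly equal-load segments, one per processor. Morton is one possible
# space-filling mapping; the deck does not name a specific curve.

def morton_key(x, y, bits=16):
    """Interleave the bits of (x, y) into a single Z-order index."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key

def sfc_partition(blocks, loads, nprocs):
    """blocks: list of (x, y) block coordinates; loads: workload per block.
    Returns one list of block indices per processor."""
    order = sorted(range(len(blocks)), key=lambda i: morton_key(*blocks[i]))
    target = sum(loads) / nprocs              # ideal load per processor
    parts, current, acc = [[] for _ in range(nprocs)], 0, 0.0
    for i in order:
        if acc >= target * (current + 1) and current < nprocs - 1:
            current += 1                      # start filling the next processor
        parts[current].append(i)
        acc += loads[i]
    return parts
```

Because consecutive positions on the curve are spatially close, each processor ends up with a compact region, which keeps communication patterns reasonable.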

  9. G-MISP & G-MISP+SP G-MISP • Multi-level algorithm views matrix of workloads from SAMR grid hierarchy as a one-vertex graph, refined recursively • Speed at expense of load balance G-MISP+SP • “Smarter” variant of G-MISP – uses sequence partitioning to assign consecutive portions of one-dimensional list to processors • Load balance improves but scheme is computationally more expensive
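Sequence partitioning of the one-dimensional list can be viewed as choosing cut points in a fixed order so that the heaviest chunk is as light as possible. A minimal sketch using binary search on the bottleneck load (function names are illustrative, not from the partitioner's actual code):

```python
# Sketch of sequence partitioning: split an ordered 1-D workload list into
# at most `nprocs` consecutive chunks, minimizing the heaviest chunk.
# Binary search on the bottleneck; integer workloads assumed.

def chunks_needed(loads, cap):
    """Greedy count of consecutive chunks whose per-chunk load stays <= cap."""
    count, acc = 1, 0
    for w in loads:
        if acc + w > cap:
            count, acc = count + 1, 0   # close the chunk, start a new one
        acc += w
    return count

def sequence_partition(loads, nprocs):
    """Return the minimal achievable bottleneck (max chunk load)."""
    lo, hi = max(loads), sum(loads)
    while lo < hi:
        mid = (lo + hi) // 2
        if chunks_needed(loads, mid) <= nprocs:
            hi = mid                    # feasible: try a tighter bottleneck
        else:
            lo = mid + 1                # infeasible: relax the bottleneck
    return lo
```

The extra search over cut points is what makes G-MISP+SP better balanced but computationally more expensive than plain G-MISP.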

  10. pBD-ISP • Generalization of binary dissection – domain partitioned into p partitions • Each split divides load as evenly as possible, considering processors
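A rough sketch of p-way binary dissection over an already ISP-ordered workload list: split the processor group as evenly as possible, cut the list where the accumulated load matches the processor split, and recurse (illustrative, not the partitioner's actual code):

```python
# Sketch of p-way binary dissection: at each step the processor group is
# split p -> (p//2, p - p//2), and the 1-D workload list is cut where the
# accumulated load best matches that ratio.

def binary_dissect(loads, nprocs):
    """Return one list of indices into `loads` per processor."""
    def rec(idx, p):
        if p == 1 or len(idx) <= 1:
            return [list(idx)] + [[] for _ in range(p - 1)]
        p_left = p // 2                        # processors on the left side
        total = sum(loads[i] for i in idx)
        target = total * p_left / p            # load the left side should receive
        acc, cut = 0.0, 0
        while cut < len(idx) - 1 and acc + loads[idx[cut]] <= target:
            acc += loads[idx[cut]]
            cut += 1
        return rec(idx[:cut], p_left) + rec(idx[cut:], p - p_left)
    return rec(list(range(len(loads))), nprocs)
```

Each split is fast and local to one side of the recursion, which is consistent with pBD-ISP being the quickest of the evaluated schemes while tolerating some residual imbalance.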

  11. SP-ISP • Domain sub-divided into p*b equally sized blocks • Dual-level algorithm with parameter settings for each level • Fine granularity scheme: good load balance but increased overhead, communication, and computational cost

  12. WD • Part of the ParMetis suite, based on global workload • Used for repartitioning graphs with scattered refinements • Results in fine-grained partitionings with jagged boundaries, increasing communication costs and overheads • Metis integration proved extremely expensive; the dedicated SAMR partitioners performed much better • Two extra steps are needed for Metis in our interface: a Metis graph is generated from the grid before partitioning, and clustering is used to regenerate grid blocks from the graph partitions after partitioning

  13. Experimental Setup • Application – RM3D • 3-D "real world" compressible turbulence application solving the Richtmyer-Meshkov instability • Fingering instability that occurs at a material interface accelerated by a shock wave • Machine – NPACI IBM SP2 Blue Horizon at SDSC • Teraflop-scale Power3-based SMP cluster • 1152 processors and 512 GB of main memory • AIX operating system • Peak bi-directional data transfer rate of approx. 115 MBps

  14. Experimental Setup (contd.) • Base coarse grid – 128 * 32 * 32 • 3 levels of factor-2 space-time refinements • Application ran for 150 coarse-level time-steps • Experiments consisted of varying – • Partitioner (from the set of evaluated partitioners) • Number of processors (16 – 128) • Granularity, i.e. the atomic unit (2*2*2 – 8*8*8) • Metrics used – total run-time, maximum load imbalance, AMR efficiency
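The slides do not give the exact formula behind the maximum load imbalance metric; a common definition, used here as an assumption, is the percentage excess of the most loaded processor over the average:

```python
# Assumed definition of maximum load imbalance (the deck does not state the
# exact formula): percentage by which the most loaded processor exceeds the
# average per-processor load.

def max_load_imbalance(proc_loads):
    avg = sum(proc_loads) / len(proc_loads)
    return 100.0 * (max(proc_loads) - avg) / avg

# e.g. loads of [10, 10, 10, 14] average 11, giving roughly 27.3% imbalance
```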

  15. Experimental Results

  16. Run-times

  17. Max. Load Imbalance

  18. AMR Efficiency

  19. Experimental Evaluation • RM3D needs rapid refinement and efficient redistribution • pBD-ISP, G-MISP+SP, and SFC are best suited for RM3D – fast partitioners with low imbalance that maintain good communication patterns • pBD-ISP is the fastest but yields average load imbalance • G-MISP+SP and SFC generate the lowest imbalance but are relatively slower • The evaluated partitioning techniques scale reasonably well

  20. Evaluation (contd.) • Coarse granularity produces high load imbalance • Fine granularity leads to greater synchronization and coordination overheads and higher execution times • Optimal partitioning granularity requires a trade-off between execution speed and load imbalance • For RM3D application, granularity of 4 gives lowest execution time with acceptable load imbalance

  21. Conclusions • Experimental evaluation of dynamic domain-based partitioning and load-balancing techniques • RM3D compressible turbulence application • Effect of choice of partitioner and granularity on execution time • Formulation of application-centric characterization of the partitioners as a function of number of processors, problem size, and partitioning granularity
