1 / 46

Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization. Hank Childs Lawrence Berkeley National Laboratory / University of California at Davis October 25, 2010. Outline. Motivation Parallelization strategies Master-slave parallelization

ion
Download Presentation

Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Large Vector-Field Visualization,Theory and Practice:Large Data and Parallel Visualization Hank Childs Lawrence Berkeley National Laboratory / University of California at Davis October 25, 2010

  2. Outline • Motivation • Parallelization strategies • Master-slave parallelization • Hybrid parallelism

  3. Supercomputers are generating large data sets that often require parallelized postprocessing. 1 billion element unstructured mesh 217 pin reactor cooling simulation. Nek5000 simulation on ¼ of Argonne BG/P. Image credit: Paul Fischer using VisIt

  4. Communication between “channels” are a key factor in effective cooling.

  5. Particle advection can be used to study communication properties.

  6. Observe where particles come out (compare with experimental data) Observe which channels the particles pass through This sort of analysis requires many particles to be statistically significant. How can we parallelize this process? Place thousands of particles in one channel Repeat for other channels

  7. Outline • Motivation • Parallelization strategies • Master-slave parallelization • Hybrid parallelism

  8. P2 P4 P1 P3 P0 P5 P8 P7 P6 P9 For embarrassingly parallel algorithms, the most common processing technique is pure parallelism. Parallelized visualization data flow network Parallel Simulation Code Read Process Render Particle advection is not embarrassingly parallel. So how to parallelize? A: it depend Processor 0 Read Process Render P0 P1 P2 P3 Processor 1 P4 P5 P6 P7 Read Process Render Pieces of data (on disk) P8 P9 Processor 2

  9. Particle advection:Four dimensions of complexity vs Data set size Seed set distribution vs vs Seed set size Vector field complexity

  10. Do we need advanced parallelization techniques? When? • Data set size? • Not enough! • Large #’s of particles?

  11. Parallelization for small data and a large number of particles. Parallelized visualization data flow network Simulation code Read Advect Render Processor 0 File Read Advect Render Processor 1 Read Advect Render Processor 2

  12. Do we need advanced parallelization techniques? When? • Data set size? • Not enough! • Large #’s of particles? • Need to parallelize, but embarrassingly parallel OK • Large #’s of particles + large data sets sizes

  13. P2 P4 P1 P3 P0 P5 P8 P7 P6 P9 Parallelization for large data with good “distribution”. Parallelized visualization data flow network Parallel Simulation Code Read Advect Render Processor 0 Read Advect Render P0 P1 P2 P3 Processor 1 P4 P5 P6 P7 Read Advect Render Pieces of data (on disk) P8 P9 Processor 2

  14. Do we need advanced parallelization techniques? When? • Data set size? • Not enough! • Large #’s of particles? • Need to parallelize, but embarrassingly parallel OK • Large #’s of particles + large data sets sizes • Need to parallelize, simple schemes may be OK • Large #’s of particles + large data set sizes + (bad distribution OR complex vector field) • Need smart parallelization

  15. Parallelization with big data &lots of seed points & bad distribution P3 P0 P4 P1 P2 P4 P1 P2 P0 P3 P2 P1 P3 P4 P0 P3 P1 P2 P0 P4 P4 P2 P1 P0 P3 • Two extremes: • Partition data over processors and pass particles amongst processors • Parallel inefficiency! • Partition seed points over processors and process necessary data for advection • Redundant I/O! P0 P1 P3 P2 P4 Parallelize over particles Parallelize over data Hybrid algorithms Notional streamline example

  16. Outline • Motivation • Parallelization strategies • Master-slave parallelization • Hybrid parallelism

  17. The master-slave algorithm is an example of a hybrid technique. • “Scalable Computation of Streamlines on Very Large Datasets”, Pugmire, Childs, Garth, Ahern, Weber. SC09 • Many of the following slides compliments of Dave Pugmire. • Algorithm adapts during runtime to avoid pitfalls of parallelize-over-data and parallelize-over-seeds. • Nice property for production visualization tools. • Implemented inside VisIt visualization and analysis package.

  18. Master P0 P1 P2 P3 Master Master Master P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 Slave Slave Slave Slave Slave Slave Slave Slave Slave Slave Slave Slave Master-Slave Hybrid Algorithm • Divide processors into groups of N • Uniformly distribute seed points to each group • Master: • Monitor workload • Make decisions to optimize resource utilization • Slaves: • Respond to commands from Master • Report status when work complete

  19. Master Process Pseudocode Master() { while ( ! done ) { if ( NewStatusFromAnySlave() ) { commands = DetermineMostEfficientCommand() for cmd in commands SendCommandToSlaves( cmd ) } } } What are the possible commands?

  20. Commands that can be issued by master • Assign /Loaded Block • Assign / Unloaded Block • Handle OOB / Load • Handle OOB / Send • OOB = out of bounds Master Slave Slave is given a streamline that is contained in a block that is already loaded

  21. Commands that can be issued by master • Assign /Loaded Block • Assign / Unloaded Block • Handle OOB / Load • Handle OOB / Send • OOB = out of bounds Master Slave Slave is given a streamline and loads the block

  22. Load Commands that can be issued by master • Assign /Loaded Block • Assign / Unloaded Block • Handle OOB / Load • Handle OOB / Send • OOB = out of bounds Master Slave Slave is instructed to load a block. The streamline in that block can then be computed.

  23. Send to J Commands that can be issued by master • Assign /Loaded Block • Assign / Unloaded Block • Handle OOB / Load • Handle OOB / Send • OOB = out of bounds Master Slave Slave J Slave is instructed to send a streamline to another slave that has loaded the block

  24. Master Process Pseudocode Master() { while ( ! done ) { if ( NewStatusFromAnySlave() ) { commands = DetermineMostEfficientCommand() for cmd in commands SendCommandToSlaves( cmd ) } } } * See SC 09 paper for details

  25. Algorithm Test Cases • Core collapse supernova simulation • Magnetic confinement fusion simulation • Hydraulic flow simulation Dense and sparse seeding

  26. Workload distribution in supernova simulation Parallelization by: Seeds Blocks Hybrid Colored by processor doing integration

  27. Workload distribution in seed algorithm Too much I/O

  28. Workload distribution in block algorithm Starvation

  29. Workload distribution in hybrid algorithm Just right

  30. Comparison of workload distribution Hybrid Block Seed

  31. Astrophysics Test Case: Total time to compute 20,000 Streamlines Seconds Seconds Number of procs Number of procs

  32. Astrophysics Test Case: Number of blocks loaded Blocks loaded Blocks loaded Number of procs Number of procs

  33. Outline • Motivation • Parallelization strategies • Master-slave parallelization • Hybrid parallelism

  34. Are today’s algorithms going to fit correctly on tomorrow’s machines? • Traditional approachfor parallel visualization – 1 core per MPI task – may not work well on future supercomputers, which will have100-1000 cores per node. • Exascale machines will likely have O(1M) nodes … and we anticipate in situ particle advection. Hybrid parallelism blends distributed- and shared-memory parallelism concepts.

  35. The word “hybrid” is being used in two contexts… • The master-slave algorithm is a hybrid algorithm, sharing concepts from both parallelization-over-data and parallelization-over-seeds. • Hybrid parallelism involves using a mix shared and distributed memory techniques, e.g. MPI + pthreads or MPI+CUDA. • One could think about implement a hybrid particle advection algorithm in a hybrid parallel setting.

  36. What can we learn from a hybrid parallel study? • How do we implement parallel particle advection algorithms in a hybrid parallel setting? • How do they perform? • Which ones perform better? How much better? • Why? • How does hybrid parallelism benefit this algorithm? Streamline Integration Using MPI-Hybrid Parallelism on a Large Multi-Core Architecture by David Camp, Christoph Garth, Hank Childs, Dave Pugmire and Ken Joy Accepted to TVCG

  37. Streamline integration using MPI-hybrid parallelism on a large multi-core architecture • Implement parallelize-over-data and parallelize-over-seeds in a hybrid parallel setting • Did not study the master-slave algorithm • Run series of tests on NERSC Franklin machine (Cray) • Compare 128 MPI tasks (non-hybrid) vs 32 MPI tasks / 4 cores per task (hybrid) • 12 test cases: large vs small # of seeds dense vs sparse seed locations 3 data sets

  38. Hybrid parallelism for parallelize-over-data • Expected benefits: • Less communication and communicators • Should be able to avoid starvation by sharing data within a group.

  39. Hybrid parallelism for parallelize-over-seeds • Expected benefits: • Only need to read blocks once for node, instead of once for core. • Larger cache allows for reduced reads • “Long” paths automatically shared among cores on node

  40. Measuring the benefits of hybrid parallelism for parallelize-over-seeds

  41. Gantt chart for parallelize-over-seeds

  42. Measuring the benefits of hybrid parallelism for parallelize-over-data

  43. Gantt chart for parallelize-over-data

  44. Summary of Hybrid Parallelism Study • Hybrid parallelism appears to be extremely beneficial to particle advection. • We believe the parallelize-over-data results are highly relevant to the in situ use case. • Although we didn’t implement the master-slave algorithm, we believe the benefits shown at the spectrum extremes provide good evidence that it will also benefit. • Implemented on VisIt branch, goal to integrate into VisIt proper in the coming months.

  45. Summary for Large Data and Parallelization • The type of parallelization required will vary based on data set size, number of seeds, seed locations, and vector field complexity • Parallelization may occur via parallelization-over-data, parallelization-over-seeds, or somewhere in between (master-slave). • Hybrid parallelism appears to provide a large benefit, since it de-emphasizes the negative aspects of each parallelization approach.

  46. Thank you for your attention! • Acknowledgments: DoESciDAC program, VACET • Hank Childs (hchilds@lbl.gov) • + Dave Pugmire (ORNL), Dave Camp (LBNL/UCD), Christoph Garth (UCD), Sean Ahern (ORNL), Gunther Weber (LBNL), and Ken Joy (UCD) • Questions?

More Related