ASTA Progress Report


Presentation Transcript


  1. ASTA Progress Report • Robert Henschel • April 23, 2009

  2. Contents • The Problem(s) • Benchmarking and Tracing • First Steps Toward a Solution • Future Work

  3. The Problem(s) • Researchers at Indiana University use a 3D hydrodynamic code to study planet formation • The code is a finite-difference scheme on an Eulerian grid • The user has a legacy Fortran code that has been made OpenMP parallel • It is a mix of Fortran 66, 77 and 90 • It was originally parallelized for small problem sizes (16K computational cells) and small core counts (8-16), but is now used for much larger problem sizes (16M computational cells) • The code generates large amounts of data (1-4 TB per simulation) that need to be analyzed interactively • The user would like to do many simulations
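
To make the parallelization concrete for readers outside the field, here is a minimal hypothetical sketch of the general shape of an OpenMP-parallel finite-difference update on a 3D Eulerian grid. The subroutine name, array names, stencil and coefficient are illustrative assumptions; this is not the user's code.

```fortran
! Hypothetical sketch only (not the user's code): the general shape of an
! OpenMP-parallel finite-difference update on a 3D Eulerian grid.  The
! 7-point stencil and the coefficient c are placeholders for the real scheme.
subroutine fd_step(q, q_new, nx, ny, nz, c)
  implicit none
  integer, intent(in)  :: nx, ny, nz
  real(8), intent(in)  :: q(nx, ny, nz), c
  real(8), intent(out) :: q_new(nx, ny, nz)
  integer :: i, j, k

!$omp parallel do private(i, j, k)
  do k = 2, nz - 1
     do j = 2, ny - 1
        do i = 2, nx - 1
           ! simple explicit update standing in for the real hydro scheme
           q_new(i, j, k) = q(i, j, k) + c *                              &
                (q(i+1, j, k) + q(i-1, j, k) + q(i, j+1, k) +             &
                 q(i, j-1, k) + q(i, j, k+1) + q(i, j, k-1) -             &
                 6.0d0 * q(i, j, k))
        end do
     end do
  end do
!$omp end parallel do
end subroutine fd_step
```

OpenMP distributes the outermost grid loop across threads, so scaling beyond the original 8-16 cores depends on how evenly that work divides and on where the arrays live in memory on a NUMA machine.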

  4. The Problem(s) • The main issues to address are: • Scaling the code to larger core counts (64) and acquiring time on shared-memory machines • Transferring multiple TB of data from the compute site to IU for interactive analysis • Automating the simulation workflow

  5. Benchmarking and Tracing • To determine how to improve scalability, we first benchmarked the code, traced it with VampirTrace on 64 cores of Pople, and profiled it with Pfmon • Analysis was performed on three Altix systems, at PSC, NASA and ZIH
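
As a rough illustration of how a trace region is attached to the code, the sketch below uses VampirTrace's manual Fortran instrumentation (VT_USER_START / VT_USER_END via vt_user.inc). Whether this code was instrumented manually or automatically by the compiler wrappers is not stated in the slides, and the build details (vtf90 wrapper, -DVTRACE) are assumptions to be checked against the VampirTrace manual.

```fortran
! Rough sketch of manual VampirTrace instrumentation in Fortran.  The region
! name and the loop inside it are placeholders; VT_USER_START/VT_USER_END and
! vt_user.inc ship with VampirTrace and are only active when the file is
! compiled through the VampirTrace wrapper (e.g. vtf90) with -DVTRACE.
#include "vt_user.inc"
subroutine traced_work(n, a)
  implicit none
  integer, intent(in)    :: n
  real(8), intent(inout) :: a(n)
  integer :: i

  VT_USER_START('potential_solver')      ! region appears under this name in Vampir
  do i = 1, n
     a(i) = 0.5d0 * a(i)                 ! stand-in for the real work being timed
  end do
  VT_USER_END('potential_solver')
end subroutine traced_work
```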

  6. Benchmarking and Tracing • Based on the benchmarking, the bottleneck subroutines were identified: the subroutine that calculates the gravitational potential for the boundary cells, and the subroutine that calculates the gravitational potential using the boundary potential as an input • These subroutines inherently require that all cells communicate with each other
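
To see why these routines are communication-heavy, the illustrative sketch below shows a direct-summation boundary-potential calculation of the kind described: every boundary value depends on every cell of the grid, giving O(N_boundary x N_cells) work and grid-wide data traffic. The interface, flattened cell lists and cgs units are assumptions; this is not the actual subroutine.

```fortran
! Illustrative sketch only, not the actual subroutine.  Every boundary cell
! accumulates a contribution from every interior cell, so the whole density
! grid is touched for each boundary point.
subroutine boundary_potential(nb, nc, xb, yb, zb, xc, yc, zc, mass, phi_b)
  implicit none
  integer, intent(in)  :: nb, nc                  ! # boundary cells, # interior cells
  real(8), intent(in)  :: xb(nb), yb(nb), zb(nb)  ! boundary cell positions
  real(8), intent(in)  :: xc(nc), yc(nc), zc(nc)  ! interior cell positions
  real(8), intent(in)  :: mass(nc)                ! interior cell masses
  real(8), intent(out) :: phi_b(nb)               ! potential on the boundary
  real(8), parameter   :: g = 6.674d-8            ! G in cgs units
  integer :: ib, ic
  real(8) :: r

!$omp parallel do private(ib, ic, r)
  do ib = 1, nb
     phi_b(ib) = 0.0d0
     do ic = 1, nc
        r = sqrt((xb(ib) - xc(ic))**2 + (yb(ib) - yc(ic))**2 + (zb(ib) - zc(ic))**2)
        phi_b(ib) = phi_b(ib) - g * mass(ic) / r
     end do
  end do
!$omp end parallel do
end subroutine boundary_potential
```

On a NUMA shared-memory system such as an Altix, each thread streams the entire grid for its share of boundary cells, so many of those reads come from remote memory once the problem outgrows a single node, which is what limits scaling.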

  7. Benchmarking and Tracing • Based on these results we identified two subroutines to restructure and devised a way to reformulate them to reduce off-node communication • Several iterations of improvement were performed, and the user reported a speedup of 1.8 in the boundary-generation subroutine • A more extensive restructuring of both subroutines has been devised, but has not yet been implemented
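
The actual reformulation is not detailed in the slides. As a hedged illustration of the general idea, one common locality-oriented restructuring is to block the interior-cell loop so that each chunk of cell data is streamed from (possibly remote) memory once and reused for all boundary cells:

```fortran
! Hedged sketch of a generic locality-oriented restructuring, not the
! reformulation actually applied to the user's code.  Interior cells are
! processed in blocks small enough to stay cache-resident, so each block of
! xc/yc/zc/mass is read once and reused for every boundary cell, instead of
! the whole grid being re-read per boundary point as in the naive version.
subroutine boundary_potential_blocked(nb, nc, xb, yb, zb, xc, yc, zc, mass, phi_b)
  implicit none
  integer, intent(in)  :: nb, nc
  real(8), intent(in)  :: xb(nb), yb(nb), zb(nb)
  real(8), intent(in)  :: xc(nc), yc(nc), zc(nc), mass(nc)
  real(8), intent(out) :: phi_b(nb)
  real(8), parameter   :: g = 6.674d-8            ! G in cgs units
  integer, parameter   :: blk = 4096              ! block size, tuned per machine
  integer :: ib, ic, ic0, ic1

  phi_b = 0.0d0
!$omp parallel private(ib, ic, ic0, ic1)
  do ic0 = 1, nc, blk                             ! stream each interior block once
     ic1 = min(ic0 + blk - 1, nc)
!$omp do
     do ib = 1, nb                                ! reuse the cached block for all boundary cells
        do ic = ic0, ic1
           phi_b(ib) = phi_b(ib) - g * mass(ic) /                            &
                sqrt((xb(ib) - xc(ic))**2 + (yb(ib) - yc(ic))**2 +           &
                     (zb(ib) - zc(ic))**2)
        end do
     end do
!$omp end do
  end do
!$omp end parallel
end subroutine boundary_potential_blocked
```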

  8. Transferring Data • Due to the large problem size, each of the user's simulations generates several terabytes of data, which is then analyzed interactively using IDL, a proprietary data analysis and plotting package • Transferring this amount of data to IU via traditional methods (ftp, scp, etc.) is extremely time consuming and tedious • By mounting the Data Capacitor at PSC on Pople, the user can write their data directly to IU and then access it from servers in their department
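
For context, from the application's side writing to the WAN-mounted Data Capacitor is just ordinary Fortran I/O to a different path. The sketch below is a minimal illustration with a placeholder mount point and file naming, not the user's actual output routine.

```fortran
! Minimal sketch: writing a snapshot to the WAN-mounted file system is plain
! Fortran I/O to a path under the mount point.  The path and file naming are
! placeholders, not the real Data Capacitor mount point.
subroutine dump_step(nstep, q, n)
  implicit none
  integer, intent(in) :: nstep, n
  real(8), intent(in) :: q(n)
  character(len=256)  :: fname

  write(fname, '(a,i6.6,a)') '/path/to/data_capacitor/run01/step', nstep, '.dat'
  open(unit=10, file=trim(fname), form='unformatted', status='replace')
  write(10) q
  close(10)
end subroutine dump_step
```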

  9. Transferring Data • The IU Data Capacitor is a Lustre file system that can be mounted over the WAN • Some modifications and tuning were required • Although the data are written directly to the Data Capacitor, network issues can introduce significant I/O overhead

  10. Transferring Data • While tracing the code at PSC, we discovered that I/O to the Data Capacitor was much slower than I/O to the local Lustre scratch disk

  11.–14. I/O Behavior (figure-only slides; figures not included in the transcript)

  15.–16. Local NFS vs. Local Lustre (figure-only slides)

  17.–18. Local Lustre vs. Data Capacitor (figure-only slides)

  19.–20. Data Capacitor: Impact of Network Fixes (figure-only slides)

  21. Transferring Data • Working with the Data Capacitor team, we were able to track down the network issue and eliminate the I/O overhead, resulting in a 30% speedup for the user • Files now appear locally as they are generated by the user's simulation

  22. Automating the Workflow • The user would like to be able to run roughly 10-15 simulations at the same time and have the data transfer and some preliminary analysis occur automatically • Some of the analysis can be automated: • Generation of images • Calculation of gravitational torques • We are currently working on automating the user's workflow
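
As an example of what such automated analysis might look like, the sketch below computes a gravitational torque directly from simulation output. It is a hedged illustration, not the user's IDL analysis: it assumes a Cartesian midplane slice of cell masses and potential, takes the force per unit mass from centred differences of the potential, and sums m*(x*Fy - y*Fx) for the z-torque.

```fortran
! Hedged sketch of a gravitational-torque calculation on a Cartesian midplane
! slice.  The grid geometry, array layout and names are assumptions for
! illustration, not the user's analysis or the code's actual coordinates.
function torque_z(nx, ny, x, y, dx, dy, mass, phi) result(tau)
  implicit none
  integer, intent(in) :: nx, ny
  real(8), intent(in) :: x(nx), y(ny)          ! cell-centre coordinates
  real(8), intent(in) :: dx, dy                ! grid spacings
  real(8), intent(in) :: mass(nx, ny)          ! cell masses in the slice
  real(8), intent(in) :: phi(nx, ny)           ! gravitational potential
  real(8) :: tau, fx, fy
  integer :: i, j

  tau = 0.0d0
  do j = 2, ny - 1
     do i = 2, nx - 1
        fx = -(phi(i+1, j) - phi(i-1, j)) / (2.0d0 * dx)   ! -dPhi/dx per unit mass
        fy = -(phi(i, j+1) - phi(i, j-1)) / (2.0d0 * dy)   ! -dPhi/dy per unit mass
        tau = tau + mass(i, j) * (x(i) * fy - y(j) * fx)
     end do
  end do
end function torque_z
```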

  23. Future Work • Fully restructure the calculation of the gravitational potential • Devise and implement a workflow framework
