
Harald Servat Program Development for Extreme-Scale Computing May 3rd, 2010


Presentation Transcript


  1. Harald Servat, Program Development for Extreme-Scale Computing, May 3rd, 2010 • Benefits of sampling in tracefiles

  2. Outline • Instrumentation and sampling • Folding • Summarized traces • Some results • Current work

  3. Instrumentation • Performance tools based on instrumentation • Granularity of the results depends on the application structure • Data gathered includes performance counters, callstack, message size…

  4. Sampling • Sampling can reach any point of the application at a given interval • Easily tunable frequency • Gathers performance counters and the callstack

  5. Main objective • Combine both mechanisms • Deeper performance details • Using PAPI_overflow(..) • ... what about the frequency trade-off? • Not so high that it disrupts the performance data • Not so low that it fails to give useful information

  6. Work done: Folding • Harald Servat, Germán Llort, Judit Giménez, Jesús Labarta: Detailed performance analysis using coarse grain sampling. PROPER, 2009. • Objective: get detailed metrics with few samples • Benefits from both high and low frequencies! • Take advantage of the stationary behavior of scientific applications • Build a synthetic region from scattered samples • Reintroduce it into the tracefile at a chosen ratio

  7. Folding: Moving samples (steps) • Main idea: move samples to the target iteration, preserving their original relative time.
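The main idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: `fold_samples` and its parameters are hypothetical names, the iteration boundaries are assumed to be known from instrumentation, and "relative time" is assumed to mean the offset from the start of the iteration divided by that iteration's duration.

```python
from bisect import bisect_right


def fold_samples(samples, iteration_starts, target_start, target_duration):
    """Move (timestamp, value) samples into one target iteration.

    samples          -- list of (timestamp, value) pairs scattered over many iterations
    iteration_starts -- sorted boundary times; the last entry closes the final iteration
    target_start     -- start time of the synthetic (target) iteration
    target_duration  -- duration of the synthetic (target) iteration

    All names are hypothetical and only serve this sketch.
    """
    folded = []
    for timestamp, value in samples:
        # Find the iteration that contains this sample.
        idx = bisect_right(iteration_starts, timestamp) - 1
        if idx < 0 or idx >= len(iteration_starts) - 1:
            continue  # sample lies outside every delimited iteration
        start = iteration_starts[idx]
        end = iteration_starts[idx + 1]
        # Relative position inside its own iteration, in [0, 1).
        relative = (timestamp - start) / (end - start)
        # Same relative position inside the target iteration.
        folded.append((target_start + relative * target_duration, value))
    folded.sort()
    return folded
```

With three iterations bounded at times 0, 10, 20 and 30, one sample taken in each iteration ends up inside the single target iteration, so a low sampling frequency over many iterations yields a dense set of points for one synthetic iteration.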

  8. Folding: Interpolation • Evolution of completed instructions for routine copy_faces of NAS MPI BT class B • There are no instrumentation points within the routine, yet folding still provides detail • Red crosses represent the folded samples and show the instructions completed since the start of the routine • The green line is the curve fitted to the folded samples and is used to reintroduce the values into the tracefile • The blue line is the derivative of the fitted curve
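A rough sketch of how a curve and its derivative could be obtained from folded samples follows. The slide does not say which curve-fitting method is used, so this example swaps in a much simpler technique: it averages the samples in equal-width bins and takes finite differences between bins. The function names and the assumption that sample times are already normalized to the range 0 to 1 are illustrative only.

```python
def binned_curve(folded, num_bins):
    """Smooth folded (relative_time, instructions) samples by bin averaging.

    A stand-in for real curve fitting: each bin contributes one point placed
    at the mean time and mean value of the samples that fall into it.
    """
    time_sums = [0.0] * num_bins
    value_sums = [0.0] * num_bins
    counts = [0] * num_bins
    for t, value in folded:
        b = min(int(t * num_bins), num_bins - 1)  # t == 1.0 goes to the last bin
        time_sums[b] += t
        value_sums[b] += value
        counts[b] += 1
    return [
        (time_sums[b] / counts[b], value_sums[b] / counts[b])
        for b in range(num_bins)
        if counts[b]  # skip empty bins
    ]


def rate(curve):
    """Finite-difference slope between consecutive curve points.

    Returns (midpoint_time, instructions_per_unit_of_relative_time) pairs.
    """
    return [
        ((t0 + t1) / 2, (v1 - v0) / (t1 - t0))
        for (t0, v0), (t1, v1) in zip(curve, curve[1:])
    ]
```

The slope plays the role of the blue line on the slide: where the accumulated instruction count rises steeply the routine is completing instructions quickly, and where it flattens the routine is progressing slowly.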

  9. Folding areas • Folding is applied to delimited regions • Previously instrumented: user function, iteration • Automatically obtained from the gathered results: clusters of computation bursts (Juan González, Judit Giménez, Jesús Labarta, Automatic detection of parallel applications computation phases, IPDPS 2009); delimited time regions (Marc Casas, Rosa M. Badia, Jesús Labarta, Automatic Structure Extraction from MPI Applications Tracefiles, Euro-Par 2007)

  10. Impact of the sampling frequency • The more samples are folded, the more detailed the results • Longer executions • Increased frequency • Do the results reach stability? • Example: NAS BT class B copy_faces, showing from 10 to 200 iterations, 20 samples per second on an SGI Altix

  11. Impact of the sampling frequency • Choosing a sampling frequency is important • Sampling frequency can couple with application frequency • Choose frequencies based on prime factors
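The coupling effect on this slide can be illustrated with a small sketch. It assumes, for illustration only, that the application repeats with a fixed iteration period and that samples fire at a fixed sampling period, both expressed in the same integer time unit. `distinct_phases` is a hypothetical helper, not part of any tool mentioned in the talk, and reading "prime factors" as "avoid common factors between the two periods" is one possible interpretation of the slide.

```python
from math import gcd


def distinct_phases(sampling_period, iteration_period, num_samples):
    """Count how many different points of the iteration the samples hit.

    Sample k fires at time k * sampling_period; its phase is that time
    modulo the iteration period.
    """
    return len({(k * sampling_period) % iteration_period for k in range(num_samples)})


# Sampling period 50 against iteration period 100: the samples couple with
# the application and only ever see two points of the iteration.
coupled = distinct_phases(50, 100, 1000)    # 2

# Sampling period 53 (prime, no common factor with 100): the samples sweep
# across the whole iteration.
decoupled = distinct_phases(53, 100, 1000)  # 100

# With enough samples, the count equals iteration_period // gcd(periods).
assert coupled == 100 // gcd(50, 100)
assert decoupled == 100 // gcd(53, 100)
```

Folding only gains detail when the folded samples land on different relative positions, which is why a sampling period that shares a large factor with the iteration period wastes most of its samples.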

  12. Outline • Instrumentation and sampling • Folding • Summarized traces • Some results • Current work

  13. Dealing with large scale traces • Jesús Labarta, Judit Giménez, Eloy Martínez, Pedro González, Harald Servat, Germán Llort, Xavier Aguilar: Scalability of tracing and visualization tools, PARCO 2005. • An application's behavior can be divided into communication phases and intensive computation phases • The instrumentation library identifies the relevant computation phases

  14. Dealing with large scale traces • Information emitted at phase change • Punctual (callstack) • Aggregated: hardware counters; software counters (number of point-to-point and collective operations, number of bytes transferred, time in MPI)
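The aggregation described on this slide might look roughly like the sketch below. It is not the instrumentation library's code: `SoftwareCounters`, `PhaseSummarizer` and their methods are hypothetical names, and the sketch assumes the library is told about every MPI call and about every phase change. Instead of emitting one trace record per MPI call, it accumulates counters and emits a single summary record at each phase change.

```python
from dataclasses import dataclass


@dataclass
class SoftwareCounters:
    """Software counters accumulated between two phase changes."""
    p2p_ops: int = 0
    collective_ops: int = 0
    bytes_transferred: int = 0
    time_in_mpi: float = 0.0


class PhaseSummarizer:
    """Hypothetical summarizer: one record per phase instead of one per call."""

    def __init__(self):
        self.current = SoftwareCounters()
        self.records = []  # list of (timestamp, SoftwareCounters)

    def on_mpi_call(self, kind, nbytes, elapsed):
        # kind is "collective" or anything else (treated as point-to-point).
        if kind == "collective":
            self.current.collective_ops += 1
        else:
            self.current.p2p_ops += 1
        self.current.bytes_transferred += nbytes
        self.current.time_in_mpi += elapsed

    def on_phase_change(self, timestamp):
        # Emit the aggregated counters and start a fresh accumulation.
        self.records.append((timestamp, self.current))
        self.current = SoftwareCounters()
```

The trace then grows with the number of phases rather than with the number of MPI calls, which is the source of the size reduction the talk reports for summarized tracefiles.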

  15. Example • PEPC, 16384 tasks on Jaguar • [Figure panels: duration of the computation bursts; number of MPI collective operations]

  16. Benefits of summarized tracefiles • Significant trace size reduction • Gadget2 (128 tasks): 10 Gbytes down to 428 Mbytes • PEPC (16k tasks): 19 Gbytes down to 400 Mbytes • PFLOTRAN (16k tasks): more than 250 Gbytes down to 6 Gbytes • Whole execution analysis

  17. Working with large traces? • We're dealing with large scale executions • Maintain scalability of tracing + sampling • By adding more data? • Use folding to reduce data • Example (Gadget2 using 128 tasks) • 100 iterations, 5 samples/s during 90 minutes ~ 236 MB • Folding on 1 iteration @ 200 samples/s ~ 64 MB

  18. Outline • Instrumentation and sampling • Folding • Summarized traces • Combining mechanisms • Some results • Current work

  19. Gadget2 analysis, 128 tasks • [Figure labels, code regions given as source line ranges: force_tree.c +75 - gravity_tree.c +167; predict.c +92 - pm_periodic.c +385; gravity_tree.c +528 - density.c +167; force_tree.c +1701 - hydra.c +246] • [Percentages shown: 32%, 16%, 13%, 8%]

  20. PEPC analysis, 32 tasks • [Figure labels, code regions given as source line ranges: tree_aswalk.f90 +162 - tree_aswalk.f90 +380; tree_aswalk.f90 +380 - tree_aswalk.f90 +162; tree_domains.f90 +548 - tree_branches.f90 +155; tree_branches.f90 +548 - tree_properties.f90 +328] • [Percentages shown: 45%, 37%, 5%, 3%]

  21. Current directions • We are working on: • Whether there is an optimal sampling frequency • Quantifying correctness and validating the results • Callstack analysis

  22. Thank you!
