John Mellor-Crummey Mike Fagan Mark Krentel Nathan Tallent Department of Computer Science

HPCToolkit is a suite of tools for analyzing the performance of large, multi-lingual applications. It measures and analyzes both serial and parallel codes without manual instrumentation, recompilation, or significant changes to the build process. It works at the binary level, relies on profiling rather than added instrumentation, and helps pinpoint scalability bottlenecks in parallel code.



Presentation Transcript


  1. Application Performance Analysis with HPCToolkit John Mellor-Crummey Mike Fagan Mark Krentel Nathan Tallent Department of Computer Science Rice University http://www.hipersoft.rice.edu/hpctoolkit/

  2. HPCToolkit Goals
     • Support large, multi-lingual applications
       • a mix of Fortran, C, C++
       • external libraries (possibly binary only)
       • thousands of procedures, hundreds of thousands of lines
     • Avoid intrusion into development practices
       • no manual instrumentation
       • don’t significantly alter the build process
       • no recompilation
     • Collect execution measurements scalably and efficiently
       • don’t excessively dilate or perturb execution
       • avoid large trace files for long-running codes
     • Support measurement and analysis of serial and parallel codes
     • Present analysis results effectively

  3. HPCToolkit Approach
     • Work at binary level for language independence
       • support multi-lingual codes with external binary-only libraries
     • Profile rather than adding code instrumentation
       • minimize measurement overhead and distortion
       • enable data collection for large-scale parallelism
     • Collect and correlate multiple performance measures
       • can’t diagnose a problem with only one species of event
     • Compute derived metrics to aid analysis
     • Support top-down performance analysis
       • cope with complex programs
       • intuitive enough for scientists and engineers to use
       • detailed enough to meet the needs of compiler writers
     • Aggregate events for loops and procedures
       • accurate despite approximate event attribution from counters
       • loop-level info is more important than line-level info
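The last point on slide 3, that loop-level totals stay accurate even when counters attribute events to the wrong line, can be illustrated with a minimal sketch. The line-to-scope table, the scope names, and the sample counts below are all hypothetical and are not taken from HPCToolkit itself.

```python
from collections import defaultdict

# Hypothetical static structure: each source line maps to its enclosing
# (procedure, loop) scope. The loop is None for lines outside any loop.
LINE_SCOPE = {
    10: ("solve", None),
    11: ("solve", "loop@11"),
    12: ("solve", "loop@11"),
    13: ("solve", "loop@11"),
    20: ("init", None),
}

def aggregate(line_samples):
    """Roll line-level sample counts up to loop and procedure totals."""
    loops = defaultdict(int)
    procs = defaultdict(int)
    for line, count in line_samples.items():
        proc, loop = LINE_SCOPE[line]
        procs[proc] += count
        if loop is not None:
            loops[(proc, loop)] += count
    return dict(loops), dict(procs)

# Two made-up profiles of the same run. In the second, some samples are
# attributed to a neighboring line inside the same loop.
exact = {12: 90, 13: 10, 20: 5}
skidded = {12: 40, 13: 60, 20: 5}

loops_exact, procs_exact = aggregate(exact)
loops_skid, procs_skid = aggregate(skidded)
```

The per-line counts differ between the two profiles, but the totals for the loop and for each procedure are identical, which is why aggregation tolerates approximate attribution as long as the misattributed samples stay within the same scope.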

  4. HPCToolkit Workflow
     [Workflow diagram. Labels: application source, binary object code, compilation, linking, source correlation, binary analysis, profile execution, program structure, hyperlinked database, performance profile, interpret profile, hpcviewer]

  5. HPCToolkit Workflow
     [Workflow diagram repeated from slide 4]
     • launch optimized application binaries
     • collect statistical profiles of events of interest

  6. HPCToolkit Workflow
     [Workflow diagram repeated from slide 4]
     • decode instructions and combine with profile data
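One way to picture combining profile data with information recovered from the binary is to map each sampled instruction address to the procedure whose address range contains it. The sketch below assumes a hypothetical symbol table and made-up sample addresses. It does not decode real instructions and is not HPCToolkit's implementation.

```python
import bisect
from collections import Counter

# Hypothetical symbol table: (start address, procedure name), sorted by address.
SYMBOLS = [(0x1000, "main"), (0x1200, "solve"), (0x1800, "init")]
TEXT_END = 0x2000  # assumed end of the code region
STARTS = [start for start, _ in SYMBOLS]

def procedure_for(addr):
    """Return the procedure whose address range contains addr."""
    if addr < STARTS[0] or addr >= TEXT_END:
        return "<unknown>"
    index = bisect.bisect_right(STARTS, addr) - 1
    return SYMBOLS[index][1]

def attribute(sample_addrs):
    """Count samples per procedure."""
    return Counter(procedure_for(addr) for addr in sample_addrs)

# Made-up program counter samples from a statistical profile.
samples = [0x1204, 0x1210, 0x1004, 0x17FC, 0x1800, 0x3000]
profile = attribute(samples)
```

Because the lookup uses only addresses and the binary's own symbol ranges, it needs no source-level instrumentation, which is consistent with the binary-level approach described on slide 3.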

  7. HPCToolkit Workflow
     [Workflow diagram repeated from slide 4]
     • extract loop nesting & inlining from executables

  8. HPCToolkit Workflow
     [Workflow diagram repeated from slide 4]
     • synthesize new metrics as functions of existing metrics
     • relate metrics and structure to program source
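Synthesizing new metrics as functions of existing ones can be sketched as follows. The metric names, the formulas, the assumed peak rate of 4 floating-point operations per cycle, and the counts are illustrative assumptions, not values or formulas taken from the slides.

```python
# Assumed machine peak, used only for this illustration.
PEAK_FLOPS_PER_CYCLE = 4

# Each derived metric is a function of the metrics already measured for a scope.
FORMULAS = {
    "cycles_per_instruction": lambda m: m["cycles"] / m["instructions"],
    "flop_waste": lambda m: m["cycles"] * PEAK_FLOPS_PER_CYCLE - m["flops"],
}

def derive(measured):
    """Return a copy of the measured metrics extended with derived metrics."""
    result = dict(measured)
    for name, formula in FORMULAS.items():
        result[name] = formula(result)
    return result

# Hypothetical raw event counts for two loops.
scopes = {
    "loop@11": {"cycles": 1000, "instructions": 500, "flops": 1000},
    "loop@42": {"cycles": 200, "instructions": 400, "flops": 700},
}

report = {scope: derive(metrics) for scope, metrics in scopes.items()}
```

A derived metric such as the gap between peak and achieved floating-point work can rank scopes by how much there is to gain, which a single raw counter cannot do on its own.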

  9. HPCToolkit Workflow
     [Workflow diagram repeated from slide 4]
     • support top-down analysis with interactive viewer
     • analyze results anytime, anywhere with Java-based viewer

  10. HPCToolkit User Interface
     • Integrates static and dynamic calling context to attribute performance measurements to loops and inlined code
     [Screenshot: call path profile of Chroma, a C++ program for Lattice Quantum Chromodynamics. Pane labels: Source View, Navigation, Metrics]

  11. HPCToolkit Innovations
     • Measurement
       • attribute costs to dynamic context using call path profiling
       • quantify context-dependent costs with minimal overhead
       • support measurement of fully optimized programs
       • analyze executables to enable stack unwinding of optimized code
     • Attribution
       • recover information about loops and inlined code from binaries
     • Analysis
       • pinpoint and quantify scalability bottlenecks in parallel code
       • use differential analysis of call path profiles
     • Presentation
       • integrate static and dynamic context for presenting measurements
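Call path profiling and differential analysis, both named on slide 11, can be sketched together. The example below assumes a weak-scaling experiment, where the ideal cost of each calling context is the same in both runs, so any increase counts as scaling loss. The call paths, sample counts, and function names are hypothetical, and the slides do not specify the exact formula used.

```python
from collections import defaultdict

def inclusive_costs(samples):
    """Map every calling-context prefix to its inclusive sample count."""
    costs = defaultdict(int)
    for path, count in samples:
        for depth in range(1, len(path) + 1):
            costs[path[:depth]] += count
    return dict(costs)

def scaling_loss(base, scaled):
    """Excess inclusive cost per context in the larger run.

    Assumes weak scaling: ideally each context costs the same in both runs.
    """
    contexts = set(base) | set(scaled)
    return {ctx: scaled.get(ctx, 0) - base.get(ctx, 0) for ctx in contexts}

# Made-up call path samples: (call path from root to leaf, sample count).
run_small = [
    (("main", "solve", "compute"), 80),
    (("main", "solve", "exchange"), 20),
]
run_large = [
    (("main", "solve", "compute"), 80),
    (("main", "solve", "exchange"), 60),
]

loss = scaling_loss(inclusive_costs(run_small), inclusive_costs(run_large))

# The leaf context with the largest excess cost is the candidate bottleneck.
leaves = [ctx for ctx in loss if len(ctx) == 3]
worst = max(leaves, key=loss.get)
```

Keeping the full call path as the key is what makes the cost context-dependent: the same procedure reached through two different callers is tracked as two separate contexts, so the difference between runs points at a specific calling context rather than at a procedure in general.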
