Introduction Scope, Tool Categories, Definitions

Martin Schulz Center for Applied Scientific Computing Lawrence Livermore National Laboratory. Introduction Scope, Tool Categories, Definitions. Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA 94551.


Presentation Transcript


  1. Introduction: Scope, Tool Categories, Definitions
     Martin Schulz, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
     Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA 94551
     This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344
     LLNL-PRES-426152

  2. Development Tools for HPC
     • A wide variety of tools exists
       • Open source & commercial
       • Cross platform & vendor specific
       • Area of active research
     • The tool teams at all three labs support many tools
       • Production support across all labs
       • Experimental installs at individual sites/machines
       • Research at the laboratories
     • Goals of this workshop
       • Make users aware of what is available
       • Provide basic usage instructions
       • Cover a selected few tools in more depth in the afternoon sessions
       • Identify gaps / missing functionality

  3. Questions we will try to answer
     • Which tool can I use to …
       • … debug my code?
       • … find memory corruptions?
       • … profile the performance of my code?
       • … understand the communication behavior?
     • Where can I find the tools and how can I use them?
       • Supported platforms
       • Installation locations
       • Basic user guides
     • Who can I contact for more help & information?

  4. Questions we would like answered
     • Which tools are you using?
       • Actively and regularly vs. occasionally vs. not at all?
       • Do you have any problems with them?
     • Which concrete tools are you missing?
       • Do you know of concrete tools that you would like?
       • Are there tools on some platforms that you like, but that are not available on the platforms you need?
     • What tool capabilities are you missing?
       • What do you need to know about your codes?
     • Would you like extended training sessions for any tools?
       • On-site tutorials?
       • One-on-one sessions with tool developers?

  5. Categories of Program Development Tools
     • Debugging Tools
       • What should I do if my program fails?
       • What should I do if my program hangs?
     • Performance Analysis Tools
       • Where is my code spending its time?
       • Which communication operations take the most time?
       • What is the message pattern of my code?
       • Which call paths are most often taken?
     • Memory Analysis Tools
       • Two separate categories
       • How can I detect silent memory corruptions?
       • How can I find out how much memory I am using?

  6. Debugging Options
     • Traditional debugging
       • Interactive sessions
       • Ability to attach and control jobs
       • Set breakpoints, single stepping, inspect state
       • BUT: inherent scalability problems
         • Should work up to 4096 processes
         • Likely to be infeasible after that
     • Debugging at scale
       • Need a pre-selection of processes
       • Subset detection
         • Representatives of equivalence classes
         • Application information or light-weight tools
       • Subset attach features of traditional debuggers
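
The idea of picking representatives of equivalence classes can be sketched as follows. This is only an illustration, assuming that processes are grouped by identical call stacks; the stack traces, function names, and the rule of taking the lowest rank per class are all hypothetical and are not taken from any particular tool.

```python
from collections import defaultdict

def equivalence_classes(stacks):
    """Group ranks whose call stacks are identical."""
    classes = defaultdict(list)
    for rank in sorted(stacks):
        classes[tuple(stacks[rank])].append(rank)
    return dict(classes)

def representatives(classes):
    """Pick the lowest rank of each class as the process to attach to."""
    return sorted(ranks[0] for ranks in classes.values())

# Hypothetical stack traces for an 8-process job (assumed data).
stacks = {rank: ["main", "solve", "MPI_Waitall"] for rank in range(8)}
stacks[3] = ["main", "solve", "exchange_halo", "MPI_Recv"]
stacks[5] = ["main", "solve", "exchange_halo", "MPI_Recv"]

classes = equivalence_classes(stacks)
reps = representatives(classes)

for stack, ranks in classes.items():
    print(" > ".join(stack), "ranks:", ranks)
print("attach debugger to ranks:", reps)
```

With this assumed input, a traditional debugger would only need to attach to two of the eight processes, one per distinct behavior.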

  7. Performance Analysis Options
     • Tools to analyze, understand, and optimize performance
       • Wide range of tools with varying level of detail
       • Varying functionality: from display to automatic tuning
     • Typical workflow
       • Instrument code
       • Run and gather data
       • Analyze data (command line or GUI)
     • Instrumentation Options
       • Transparent instrumentation (online or pre-loading)
       • Binary or offline instrumentation
       • Automatic source code instrumentation
       • Manual source code instrumentation
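
As a rough illustration of the workflow above using manual source code instrumentation, the sketch below wraps a region of code in a timer, runs it, and prints a summary. The `region` helper and the `smooth` kernel are hypothetical names made up for this example; the tools covered in this workshop each provide their own instrumentation interfaces.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds and call counts per named region.
elapsed = defaultdict(float)
calls = defaultdict(int)

@contextmanager
def region(name):
    """Time the enclosed block and add it to the totals for `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed[name] += time.perf_counter() - start
        calls[name] += 1

def smooth(values):
    """Hypothetical compute kernel, present only to have something to measure."""
    with region("smooth"):  # step 1: instrument the code by hand
        return [(a + b) / 2 for a, b in zip(values, values[1:])]

# Step 2: run and gather data.
data = list(range(1000))
for _ in range(5):
    data = smooth(data)

# Step 3: analyze the data (here, a plain command-line report).
for name in sorted(elapsed, key=elapsed.get, reverse=True):
    print(f"{name}: {calls[name]} calls, {elapsed[name] * 1e3:.3f} ms")
```

Manual instrumentation gives full control over what is measured, at the cost of editing the source; the transparent and binary options in the list avoid that edit.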

  8. Profiling Techniques
     • Aggregate events over time
       • Aggregate metrics (e.g., time spent in all MPI calls)
       • Statistical sampling
     • When should I use it?
       • Getting a first overview of performance
       • Finding hot spots
     • Tradeoffs
       • (+) Easy to use, low overhead, small result files
       • (-) Little detail, sometimes hard to correlate
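
The analysis side of statistical sampling can be sketched as below: the program is assumed to have been interrupted at a fixed interval, and the function active at each interrupt was recorded. The sample list, the function names, and the 10 ms interval are invented for illustration only.

```python
from collections import Counter

def hot_spots(samples, interval_ms):
    """Aggregate samples into (function, estimated ms, share of samples)."""
    counts = Counter(samples)
    total = len(samples)
    return [(fn, n * interval_ms, n / total) for fn, n in counts.most_common()]

# Hypothetical samples: one entry per timer interrupt (assumed data).
samples = ["solve"] * 6 + ["MPI_Allreduce"] * 3 + ["write_output"] * 1

profile = hot_spots(samples, interval_ms=10)
for fn, est_ms, share in profile:
    print(f"{fn:15s} ~{est_ms:4d} ms  {share:.0%}")
```

The result is small and cheap to collect, which matches the tradeoffs listed above, but it no longer says when each sample occurred or how individual calls relate to each other.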

  9. Tracing Techniques
     • Gather information about individual events
       • Optional: combine with profile data
       • Typically visualized as timeline graph
     • Examples
       • Collect information on all MPI calls
       • Find all calls to I/O routines
     • When should I use it?
       • After profiling points to a particular segment
       • Understand individual event interactions
     • Tradeoffs
       • (+) Very detailed information, catch outliers
       • (-) Higher overhead, potentially huge output files
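
To show why per-event records can catch outliers that an aggregate profile hides, here is a minimal sketch. The `Event` record layout, the trace contents, and the "five times the median" threshold are assumptions chosen for the example, not the format of any real trace file.

```python
from dataclasses import dataclass
from statistics import median

@dataclass(frozen=True)
class Event:
    rank: int
    call: str
    start_us: int
    end_us: int

    @property
    def duration_us(self):
        return self.end_us - self.start_us

def outliers(trace, factor=5):
    """Return events lasting more than `factor` times the median for their call."""
    by_call = {}
    for ev in trace:
        by_call.setdefault(ev.call, []).append(ev.duration_us)
    typical = {call: median(durs) for call, durs in by_call.items()}
    return [ev for ev in trace if ev.duration_us > factor * typical[ev.call]]

# Hypothetical trace of MPI calls on two ranks (assumed data).
trace = [
    Event(0, "MPI_Send", 100, 110),
    Event(1, "MPI_Recv", 100, 112),
    Event(0, "MPI_Send", 200, 211),
    Event(1, "MPI_Recv", 200, 950),  # one unusually long receive
    Event(0, "MPI_Send", 300, 309),
    Event(1, "MPI_Recv", 300, 311),
]

slow = outliers(trace)
for ev in slow:
    print(f"rank {ev.rank}: {ev.call} took {ev.duration_us} us at t={ev.start_us}")
```

A profile of the same run would report only the total time in `MPI_Recv`; the trace pinpoints which call was slow and when, at the price of storing every event.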

  10. Memory Analysis Tools
     • Use case 1: Debugging
       • Detect memory access problems
         • Stray writes / memory corruptions
         • Repeated frees / stale pointers
       • Options
         • Guard blocks, “Electric Fence”
         • Emulation / simulation of each memory access
     • Use case 2: Memory profiling
       • How much memory am I using?
       • Where does memory get allocated?
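
The guard block idea can be modeled in a few lines. The sketch below is a toy simulation only: it uses a Python `bytearray` to stand in for raw memory, and the `GuardedBuffer` class and the guard pattern are hypothetical. Real tools apply the same principle to the allocations of compiled code.

```python
GUARD = b"\xde\xad\xbe\xef"  # known pattern placed before and after the payload

class GuardedBuffer:
    """Payload of `size` bytes surrounded by guard blocks."""

    def __init__(self, size):
        self.size = size
        self._raw = bytearray(GUARD + bytes(size) + GUARD)

    def write(self, offset, data):
        """Unchecked write, like a raw pointer store: it may overrun the payload."""
        start = len(GUARD) + offset
        self._raw[start:start + len(data)] = data

    def guards_intact(self):
        """Check both guard blocks, as a tool would do on free or on demand."""
        tail_start = len(GUARD) + self.size
        head = bytes(self._raw[:len(GUARD)])
        tail = bytes(self._raw[tail_start:tail_start + len(GUARD)])
        return head == GUARD and tail == GUARD

buf = GuardedBuffer(8)
buf.write(0, b"12345678")      # stays inside the 8-byte payload
ok_before = buf.guards_intact()

buf.write(6, b"ABCD")          # stray write: 2 bytes spill past the payload
ok_after = buf.guards_intact()

print("guards intact after valid write:", ok_before)
print("guards intact after stray write:", ok_after)
```

Guard blocks only reveal a corruption when the guards are checked, and only if the stray write lands on a guard; checking every memory access by emulation catches more but costs far more run time.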

  11. List of Tools (in order)
     • Debuggers
       • Totalview
       • STAT
     • Performance Analysis
       • TAU
       • perflib
       • Open|SpeedShop
       • Vampir
       • mpiP
       • gprof
       • CrayPAT
       • CrayApprentice
     • Memory Analysis
       • Totalview
       • Valgrind
       • memP
     • Other tools
       • Scalasca
       • Libra load balancer
       • LOBA
       • HPCToolkit
       • Paraver
       • DDT
       • Javelina
       • ThreadSpotter

  12. List of Tools (by category)
     • Debuggers
       • Full featured
         • Totalview, DDT
       • Scalable pre-selection
         • STAT
     • Memory Analysis
       • Corruption detection
         • Valgrind (emulation)
         • Totalview (guard blocks)
       • Memory usage
         • memP
     • Other categories
       • ThreadSpotter (threading)
       • Javelina (code coverage)
     • Performance Analysis
       • Profiling (transparent)
         • mpiP, gprof, HPCToolkit
         • Libra (load balance)
         • LOBA (MPI details)
       • Tracing (code instr.)
         • Vampir, Paraver
       • Profiling & Tracing
         • Open|SpeedShop (transparent)
         • TAU (code or manual instrument.)
         • CrayPAT/Apprentice (code instr.)
         • HPCToolkit (binary instrument.)
       • Hybrid
         • Scalasca (trace analysis)
       • Manual instrumentation
         • perflib
