
CEPBA Tools (DiP) Evaluation Report




  1. CEPBA Tools (DiP) Evaluation Report Adam Leko, Hans Sherburne • UPC Group, HCS Research Laboratory, University of Florida • Color encoding key: Blue: information; Red: negative note; Green: positive note

  2. Basic Information • Name: Dimemas, MPITrace, Paraver • Developer: European Center for Parallelism of Barcelona • Current versions: • MPITrace 1.1 • Paraver 3.3 • Dimemas 2.3 • Website: http://www.cepba.upc.es/tools_i.htm • Contact: • Judit Gimenez (judit@cepba.upc.edu)

  3. DiP Overview • DiP = Dimemas, Paraver • Toolset used for improving performance of parallel programs • Created by CEPBA ca. 1992/93, still in development • Has three main components: • Trace collection • MPITrace for MPI programs • OMPTrace for OpenMP programs (not evaluated) • OMPITrace for hybrid OpenMP/MPI programs (not evaluated) • Trace visualization: Paraver • Trace simulation: Dimemas (uses MPIDTrace for instrumentation) • Workflow encouraged by DiP: the "measure-modify" approach • Write code → instrument (MPITrace) → examine tracefile (Paraver) → hypothesize about bottlenecks → verify via simulation (Dimemas) → fix bottlenecks → test new hypothesis

  4. MPITrace Overview • Automatically profiles all MPI commands using the MPI profiling interface • Compilation command: mpicc -L/path/to/mpitrace/libs -L/path/to/papi/libs -lmpitrace -lpapi <rest of compilation cmds> • Can record other information too • Hardware counters via PAPI (MPItrace_counters) • Custom events (MPItrace_event; see the sketch after this slide) • Requires special runtime wrapper script to produce tracefile • Command: mpitrace mpirun <rest of regular cmds> • mpitrace requires a license to run • mpitrace must be started from a machine listed in the license file
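
As an illustration, a minimal sketch of marking a program phase with a custom event. MPItrace_event is named in the MPITrace docs [1], but the prototype shown here (an event type plus a value) is an assumption and may differ by version:

    #include <mpi.h>

    /* Assumed prototype for MPITrace's custom-event call; check the
       MPITrace headers/docs [1] for the real signature. */
    extern void MPItrace_event(unsigned int type, unsigned int value);

    #define SOLVER_PHASE 1000  /* hypothetical user-chosen event type */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        MPItrace_event(SOLVER_PHASE, 1);   /* mark phase entry */
        /* ... solver work; all MPI calls are traced automatically ... */
        MPItrace_event(SOLVER_PHASE, 0);   /* mark phase exit */
        MPI_Finalize();
        return 0;
    }

The event type/value pairs end up in the .prv tracefile, and the generated .pcf file can map the type to a name and color for Paraver (see slide 5).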

  5. MPITrace Overview (2) • After running mpitrace, several .mpit files are created (one per MPI process) • Collect them into a single tracefile with the command: mpi2prv -syn *.mpit • -syn flag necessary to line up events correctly (not mentioned in docs [1]) • This command creates a single logfile (.prv) and Paraver config file (.pcf) • .pcf file also contains names and colors of custom events • Tracefile format • ASCII (plain text), well-documented (see [1]) • Can get to be quite large • .prv files can be converted to a faster-loading, platform-dependent, undocumented binary format via the prv2log command • Was never able to get hardware counters working • Took several tries to get any tracefile to be created • PAPI 3.0.7 installed with no problems on Kappas 1-8 • No errors, but no hardware counter events in tracefile! • Rest of review assumes that this can be fixed given enough time

  6. MPITrace Overhead • All programs executed correctly when instrumented • Benchmarks marked with a star had high variability in execution time • Readings with stars are probably not accurate • Based on the LU benchmark, expect ~30% tracing overhead • More communication means more overhead • Wasn't able to test overhead of hardware counter instrumentation

  7. Paraver Overview • Four main pieces of Paraver (see right): • Filtering • Semantic module • Visualization • Graphical timeline • Text • Analysis (1D/2D) • Complex piece of software! • Had to review several documents to get a feel for how to use it [2, 3, 4, 5] • Tutorial is short but not very clear • Reference manual is the best documentation, but lengthy • Image courtesy [2]

  8. Paraver: Process/Resource Models Process model (courtesy [3]) Resource model (courtesy [3])

  9. Paraver: Graphical Timeline • Graphic display uses standard timeline view • Event view similar to Jumpshot, Upshot, etc. (right, top) • Can also display time-varying data like global CPU utilization (right, bottom) • Tool can display more than one trace file at a time • Uses "tape" metaphor instead of scrolling • Play, pause, rewind to beginning, fast forward to end • Cumbersome and nonintuitive • Breaks intuition of what scroll bars do (scroll bars do not scroll the window) • Moving the window creates animations, which is slower than regular scrolling • Interface is workable, but takes some getting used to • Zooming always brings up another window • Quickly results in many open windows • This complexity is mitigated by a save/restore function for open windows • Save/restore windows is a nice feature • Interface is generally snappy • Uses an ugly widget set by today's standards

  10. Paraver: Text Views • Provide very detailed information about trace files • Textual listing of events • Which events happen when • Accessed by clicking on the graphical timeline

  11. Paraver: 1D/2D Analysis • 1D Analysis (right, top) • Shows statistics about various types of events • Shown per thread as text or histogram • 2D Analysis (right, bottom) • Shows statistics for one event type between pairs of threads • Item chosen by semantic module • Uses color to encode information (high variance, max/min) • Analysis mode takes into account the filter and semantic modules (described next) • Very complex and user-unfriendly, but • Allows complicated analyses to be performed; can easily reconstruct most "normal" profiling information

  12. Paraver: Filter Module • Filter module allows filtering of events before they are • Shown in the timeline • Processed by the semantic module • Analyzed by the 1D/2D analyzers • Can filter events by communication parameters • Who sends/receives the message • Message tag (MPI tag) • Logical times (when send/receive functions are called) or physical times (when send/receive actually takes place) • Combination of ANDs/ORs from the above • Also by user events • Type and/or value • Interface for filtering events is straightforward

  13. Paraver: Semantic Module • Interface between raw tracefile data and what the user sees • Sits above the filter module, below the visualization modules • Makes heavy use of the runtime/process model • Uses 3 different methods for getting values • Work with the process model (next slide) • Application, task, thread, and workload levels • Work with the available system resources (next slide) • Node, CPU, and system levels • Combine different existing views • E.g., combine TLB misses with loads for average TLB miss ratios (see the sketch after this slide) • In a few words: controls how trace file information is displayed • Flexible way of displaying disparate types of information (communication vs. hardware counters) • Can take a lot of work to get Paraver to show the information you're looking for • Saved window configurations can help greatly here (perform the steps only once, use for all traces later on) • Easily the most confusing aspect of Paraver • Documentation doesn't necessarily help with this
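
A sketch of how such a composed view works, under the assumption that each view is a per-thread signal over time and that composition applies pointwise arithmetic: with m(t) the TLB-miss counter signal and l(t) the load counter signal over a sampling interval,

    \text{TLB miss ratio}(t) = \frac{m(t)}{l(t)}

The semantic module then renders this derived signal on the timeline like any native counter.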

  14. Dimemas Overview • Uses a generic "network of SMPs" model to perform trace-driven simulation • Outputs trace files that can be directly visualized by Paraver • Uses a different input tracefile format than Paraver does • Was never able to get this to work • "dimemas" GUI crashed • Java version works, but other problems exist… • "Dimemas" complained about a missing license even though one was in $DIMEMAS_HOME/etc/license.dat • Need MPIDTrace? • Rest of evaluation based on available documentation [4, 5, 6]

  15. Dimemas: Architectural/Process Model • Simulated architecture: network of SMPs • Parameters for interconnection network • Number of buses (models resource contention) • Bisection bandwidth of network • Full duplex/half duplex links (from node to bus) • Parameters for nodes • Bandwidth and latency for intra-node communication • Latency for inter-node communication • Processor speed (uses linear speedup model; see the note after this slide) • Parameters for existing systems are collected (manually) via microbenchmarks • Uses the same process model as Paraver • Application (Ptask), task, thread levels • Can model MPI, OMP, and hybrid programs with this model • Image courtesy [5]
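
A plausible reading of the "linear speedup model" (an assumption; the docs do not spell it out): each CPU burst of duration t recorded in the trace is replayed in the simulator as

    t_{\mathrm{sim}} = t / s

where s is the configured relative processor speed, so s = 2 halves all computation time while communication is simulated separately.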

  16. Dimemas: Communication Model • Figures at right illustrate the timing information that is simulated • Point-to-point communication model • Shown right top • Straightforward model based on latencies, bandwidth, and contention (bus model); sketched below • Collective communication model • Shown right bottom • Implicit barrier before all collective operations • Two phases: • Fan in • Fan out • Collective communication time represented in 3 ways (selected by user) • Constant • Linear • Logarithmic • User specifies parameters • Located in special Dimemas "database" text files • Existing set covers IBM SP, SGI Origin 2000, and a few others • Images courtesy [5]
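
A hedged sketch of the simulated timings (the exact formulation is in [5]): a point-to-point transfer of S bytes first waits T_{bus} for a free bus (contention), then pays latency plus transfer time,

    T_{\mathrm{p2p}} \approx T_{\mathrm{bus}} + L + \frac{S}{BW}

while a collective operation pays the implicit barrier plus a fan-in and a fan-out phase, each scaled by the user-selected model f(P) \in \{c,\ cP,\ c\log P\} in the number of processes P:

    T_{\mathrm{coll}} \approx T_{\mathrm{barrier}} + f_{\mathrm{in}}(P) + f_{\mathrm{out}}(P)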

  17. Dimemas: Accuracy, Other Features • Accuracy • On trivial applications (ping-pong), expected error with correct parameters is less than 12% [4] • Collective communication model for MPI verified in [6] on NAS benchmark suite • Most applications within 30% accuracy (IS.A.8 jumped to over 150% error) • Other features • Critical path selection • Starts at end, shows dependency path back to beginning of critical path • Sensitivity analysis (factorial analysis, vary parameters within 10%) • “What-if” analysis • Can adjust the time taken for each function call to see what would happen if you could write a faster version • Can also answer questions like “what would happen if we double our bandwidth?” • Simulation time: unknown (not reported in any documentation) • Only communication events are simulated • Therefore, assume simulation time is proportional to amount of communication • Also, uses simple (coarse bus-based) contention model, so simulation times should be reasonable

  18. Bottleneck Identification Test Suite • Testing metric: what did trace visualization tell us (with automatic instrumentation)? • Assumed a fully-functional installation of Paraver and Dimemas • CAMEL: PASSED • Identified large number of small messages at beginning of program execution • Assuming hardware counters worked, could also identify sequential parts of the algorithm (sort on node 0, etc.) • NAS LU ("W" workload): PASSED • Showed communication bottlenecks very clearly • Large(!) number of small messages • Illustrated time taken for repartitioning data • Shows sensitivity to latency for processors waiting on data from other processors • Could use Dimemas to pinpoint the latency problem by testing on an ideal network with no/little latency • Moderately-sized trace file (62 MB) loaded slowly (>60 seconds) in Paraver

  19. Bottleneck Identification Test Suite (2) • Big message: PASSED • Traces illustrated a large amount of time spent in send and receive • Diffuse procedure: PASSED • Traces illustrated a lot of synchronization, with one process doing more work • Since there is no source code correlation, it was hard to tell why the problem existed • Hot procedure: TOSS-UP • Assuming hardware counters work, it would be easy to see the extra CPU utilization • No source code correlation would make it difficult to pinpoint the problem • Intensive server: PASSED • Traces showed that other nodes were waiting on node 0 • Ping pong: PASSED • Traces illustrated that the application was very latency-sensitive • Much time spent waiting for messages to arrive • Random barrier: PASSED • Traces showed that one process was doing more work than the others • Small messages: PASSED • Traces illustrated a large number of messages being sent to node 0 • Also illustrated the overhead of instrumentation for writing tracefile information • System time: FAILED • No way to tell system time vs. user time • Wrong way: PASSED • Trace showed the first receive waiting a long time for its message to arrive

  20. General Comments • Very steep learning curve • Complex software with lots of concepts • Concepts must be thoroughly understood, or • The software doesn't make sense • The software seems like it has no functionality • Some "common" actions (e.g., view TLB cache misses) can be very difficult to do at first in Paraver • Stored window configurations help with this • Older tools • Seem to have grown and gained features as the need for them arose • Lots of "cruft" and strange ways of presenting things • User interface clunky by today's standards • User interface complicated by anyone's standards!

  21. General Comments (2) • Trace-driven simulation: useful? • Can be useful for performing "what-if" studies and sensitivity analyses • But still limited in what you can explore without modifying the application • Can see what happens when a function runs twice as fast • Can't see the effect of different algorithms without rerunning the application • Tools provide little guidance on what the user should do next • Heavily reliant on the skill of the user to make efficient use of the tools

  22. Adding UPC/SHMEM Support • Commercial tool! • No way to explicitly add support into Dimemas or Paraver for UPC or SHMEM • However, tools written using a modular design • Existing process and resource models can be used to model UPC and SHMEM applications • Paraver and Dimemas do not need to explicitly support UPC and SHMEM, just their trace files • Assuming we have methods for instrumenting UPC and SHMEM code, all that is required is writing the .prv file format • Documented! (see the sketch below) • Not sure about Dimemas' trace file format…
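
Since the .prv format is plain ASCII [1], a hypothetical UPC/SHMEM instrumentation layer would only need to print records. A minimal sketch in C: the colon-separated field layout shown (record kind, CPU, application, task, thread, timestamp, event type, event value) is an assumption about the documented format, and event type 2000 is a made-up identifier:

    #include <stdio.h>

    /* Illustrative emitter for a Paraver-style ASCII event record.
       Field order is assumed; consult the .prv spec in [1]. */
    static void emit_event(FILE *prv, int cpu, int appl, int task,
                           int thread, long long time_ns,
                           int ev_type, long long ev_value) {
        fprintf(prv, "2:%d:%d:%d:%d:%lld:%d:%lld\n",
                cpu, appl, task, thread, time_ns, ev_type, ev_value);
    }

    int main(void) {
        FILE *prv = fopen("upc_run.prv", "w");
        if (!prv) return 1;
        /* Hypothetical event type 2000 marking a upc_barrier entry. */
        emit_event(prv, 1, 1, 1, 1, 123456789LL, 2000, 1);
        fclose(prv);
        return 0;
    }

A matching .pcf entry would then give the event a name and color in Paraver, as with MPITrace's custom events.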

  23. Evaluation (1) • Available metrics: 5/5 • Can use PAPI and existing hardware counters • Paraver can combine trace information and give you just about any metric you can think of • Cost: 1/5 • For Paraver, Dimemas, and MPITrace, 1 seat: 2000 Euros (~$2,600) • Documentation quality: 1/5 • MPITrace: inadequate documentation for Linux • Dimemas: only a tutorial available, unless you want to read through conference papers and PhD theses • Paraver: user manual very thorough but technical and unclear • Many grammar errors impair reading! • "temporal files" → temporary files • Many more… • *Note: evaluated the Linux version

  24. Evaluation (2) • Extensibility: 0/5 • Commercial (no source), but • Can add new functions to the semantic module for Paraver • Flexible design lets you support a wide variety of programming paradigms by using the documented trace file format • Filtering and aggregation: 5/5 • Paraver has powerful filtering & aggregation capability • Filtering & aggregation only post-mortem, however • Hardware support: 3/5 • AlphaServer (Tru64), 64-bit Linux (Opteron, Itanium), IBM SP (AIX), IRIX, HP-UX • Most everything supported: Linux, AIX, IRIX, HP-UX • No Cray support • Heterogeneity support: 0/5 (not supported)

  25. Evaluation (3) • Installation: 1/5 • Linux installation riddled with errors and problems • PAPI dependency for hardware counters complicates things (needs a kernel patch) • Have had the software over 2 months, still not working correctly • According to our contact this is not normal, but the other tools we evaluated were nowhere near as hard to install • Interoperability: 1/5 • No export interoperability with other tools • Apparently tools exist to import SDDF and other formats (but I couldn't find them) • Can import UTE traces • Learning curve: 1/5 • All graphical interfaces are unintuitive • Software is complex, and the tutorials do not lessen the learning curve very much • Manual overhead: 1/5 • MPITrace only records MPI events • Linux needs extra instructions in the source code to get hardware counter information • Need to relink or recode to turn tracing on or off • Measurement accuracy: 4/5 • CAMEL overhead: ~8% • Tracing overhead not negligible, but within acceptable limits • Dimemas accuracy only decent, but good enough for what Dimemas is intended to do

  26. Evaluation (4) • Multiple executions: 1/5 • Paraver supports displaying multiple tracefiles at the same time • This lets you relate different runs (with different parameters) to each other relatively easily • Multiple analyses & views: 4/5 • Semantic modules provide a convenient (if awkward) way of displaying different types of data • Semantic modules also allow the displaying of the same type of data in different ways • Analysis modules show statistical summary information over time ranges • Performance bottleneck identification: 4.5/5 • No automatic bottleneck identification • All the information you need to identify a bottleneck should be available between Paraver and Dimemas • However, much manual effort is needed to determine where bottlenecks are • Also, no information is related back to the source code level • Profiling/tracing support: 2/5 • Only supports tracing • Trace files can be quite large and can take some time to open • Response time: 3/5 • No data at all until after run has completed and tracefile has been opened • Dimemas requires simulation to fully finish and Paraver to open up the generated tracefile before information is shown to user

  27. Evaluation (5) • Searching: 3/5 • Search features provided by Dimemas • Software support: 3.5/5 • MPI profiling library allows linking against any existing libraries • OpenMP, OpenMP+MPI programs also supported via add-on instrumentation libraries • Source code correlation: 0/5 • Not supported directly; can use user events to identify program phases • System stability: 3/5 • MPITrace stable (had no problems other than installation) • Paraver crashed relatively often (at least once per hour) • Dimemas stability not tested • Technical support: 3/5 • Responses from contact within 24-48 hours • Some problems not resolved quickly, though

  28. References
[1] "MPITrace tool version 1.1: User's guide," November 2000. http://www.cepba.upc.es/paraver/docs/MPItrace.pdf
[2] "Paraver version 2.1: Tutorial," November 2000. http://www.cepba.upc.es/paraver/docs/Paraver_TUTORIAL.pdf
[3] "Paraver version 3.1: Reference manual (DRAFT)," October 2001. http://www.cepba.upc.es/paraver/docs/Paraver_MANUAL.pdf
[4] Jesús Labarta et al., "DiP: A Parallel Program Development Environment," in Proc. 2nd International Euro-Par Conference (Euro-Par 96), Lyon, France, August 1996.

  29. References (2)
[5] Sergi Turell, "Performance Prediction and Evaluation Tools," PhD thesis, Universitat Politecnica de Catalunya, March 2003.
[6] S. Girona et al., "Validation of Dimemas communication model for collective MPI communications," in Proc. EuroPVM/MPI 2000, Balatonfüred, Lake Balaton, Hungary, September 2000.
[7] "Introduction to Dimemas" (tutorial). http://www.cepba.upc.edu/dimemas/docs/Dimemas_MANUAL.pdf
