
Projections – A Performance Tool for Charm++ Applications



Presentation Transcript


  1. Projections – A Performance Tool for Charm++ Applications • Chee Wai Lee (cheelee@uiuc.edu) • Parallel Programming Laboratory, Dept. of Computer Science, University of Illinois at Urbana-Champaign • http://charm.cs.uiuc.edu

  2. Outline • General Introduction to Projections • Projections Basics • Advanced Features • Features to aid Effective Analysis • Extremely Large Datasets • Tips and Notes

  3. Projections • Projections is a performance tool designed for use with Charm++/AMPI. • Trace-based, post-mortem analysis. • Supports highly detailed traces, summary formats and a flexible user-level API. • Java-based visualization tool for presenting performance information. 10/19/05 Projections Tutorial 3

  4. Charm++ Model • Object Oriented. Chares (objects) encapsulate data, standard C++ methods and entry methods. • Message Driven. Entry methods represent work units activated by an incoming message. • Only one entry method may execute at any time in a Chare. • A runtime schedules incoming messages.

  5. What you will need • A version of Charm++ built without the CMK_OPTIMIZE flag: if user-built, download our release or check out a copy from anonymous CVS; if pre-built, check with your machine's sysadmin. • Java Runtime 1.3.1 or higher. • The Projections visualization binary (projections.jar): with a user-built Charm++, it is located in charm/tools/projections/bin; with a pre-built Charm++, check with your sysadmin or acquire the binary separately.

  6. Outline: Basics • General Introduction to Projections • Projections Basics • Instrumentation • Trace generation • Visualization • Advanced Features • Features to aid Effective Analysis • Extremely Large Datasets

  7. Trace Generation: Basics • Automatic trace instrumentation: no user code is required by default. • Any Charm++ version built without the CMK_OPTIMIZE flag supports tracing. • All Charm++ entry methods and messaging events are traced.

  8. Trace Generation: Steps • Link your application with the link-time options “-tracemode projections -tracemode summary”. • Run your application normally. • At the end of the run, you will see “.log”, “.sum” and “.sts” files generated in the same directory as your application binary.

  9. Visualization: Basic Steps • Run the script found at charm/tools/projections/bin as follows: • projections [<application>.sts] • Or launch the Java binary projections.jar from the command line, passing in the optional <application>.sts filename as an argument.

  10. Visualization: Main Window

  11. Visualization: Overview

  12. Visualization: Usage Profile

  13. Visualization: Time Profile

  14. Visualization: Timeline

  15. Visualization: Task Histogram

  16. Visualization: Communication

  17. Visualization: Tabulated Call Info

  18. Projections Basic demo.

  19. Outline: Advanced Features • General Introduction to Projections • Projections Basics • Advanced Features • Partial Tracing • Tracing User Events • Tracing AMPI Functions • Features to aid Effective Analysis • Extremely Large Datasets • Tips and Notes

  20. Partial Trace Generation • The following API calls are provided by the tracing framework: • void traceBegin() • void traceEnd() • The above calls turn tracing on/off for the processor on which the call was made. • int traceIsOn() queries the tracing framework status. • The +traceoff runtime option causes tracing (over all processors) to be turned off when the application is started.

  21. Partial Tracing “watch-it”s • traceBegin() and traceEnd() calls apply only to the processor on which they are invoked. • This offers flexibility but is vulnerable to programmer error. • They are typically used in a collective manner (e.g. NAMD does this at specific load balancing operations). • Partial trace calls are invoked in the context of an entry method. One should be prepared to drop initial performance data just after turning tracing on and just before turning tracing off. • Appropriate use of the +traceoff runtime option is essential.

  22. Partial Tracing Example

          // In the case when trace is off at the beginning,
          // only turn trace on from after the first LB to the firstLdbStep after
          // the second LB.
          // 1 2 3 4 5 6 7
          // off on Alg7 refine refine ... on
          #if CHARM_VERSION >= 050606
          if (traceAvailable()) {
            static int specialTracing = 0;
            if (ldbCycleNum == 1 && traceIsOn() == 0) specialTracing = 1;
            if (specialTracing) {
              if (ldbCycleNum == 4) traceBegin();
              if (ldbCycleNum == 6) traceEnd();
            }
          }
          #endif
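To see which load-balancing cycles end up traced under the logic on this slide, the decision can be replayed outside of Charm++. The following is only a stand-alone model under stated assumptions: `FakeTracer` and `tracedCycles` are hypothetical names, and a plain boolean stands in for the real per-processor tracing state behind `traceIsOn()`, `traceBegin()` and `traceEnd()`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the per-processor tracing state; not Charm++ code.
struct FakeTracer {
    bool on;
    void begin() { on = true; }
    void end() { on = false; }
};

// Replays the slide's logic for cycles 1..numCycles and returns, per cycle,
// whether tracing is on after that cycle's check (index 0 is cycle 1).
std::vector<bool> tracedCycles(bool startWithTraceOff, int numCycles) {
    assert(numCycles >= 0);
    FakeTracer tracer{!startWithTraceOff};
    bool specialTracing = false;
    std::vector<bool> result;
    for (int ldbCycleNum = 1; ldbCycleNum <= numCycles; ++ldbCycleNum) {
        // Special handling is armed only if tracing was off at the first cycle.
        if (ldbCycleNum == 1 && !tracer.on) specialTracing = true;
        if (specialTracing) {
            if (ldbCycleNum == 4) tracer.begin();
            if (ldbCycleNum == 6) tracer.end();
        }
        result.push_back(tracer.on);
    }
    return result;
}
```

In this model, a run started with tracing off is traced only between the checks at cycles 4 and 6, while a run started with tracing on is never touched by the special-case logic.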

  23. Tracing User Events • The following APIs are provided for user event registration and tracing: • int traceRegisterUserEvent(char *eventDesc, int EventNum=-1): acquire or specify an event ID to be associated with the event name. • void traceUserEvent(int eventNum): use a valid event ID to record an event. • void traceUserBracketEvent(int eventNum, double startTime, double endTime): use a valid event ID to record an event interval.
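A minimal sketch of how the three user-event calls might fit together: register once, then record point events and bracketed intervals with the returned IDs. So that it compiles without Charm++, the three trace functions here are local stubs that only mimic the signatures listed on the slide (with `const char *` for the description) and record into vectors. The stubs, `EventIds`, `registerEvents`, `recordStep` and the event names are all illustrative assumptions, not part of Charm++.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// ---- Stubs standing in for the Charm++ tracing calls (assumption). ----
struct BracketRecord { int eventNum; double start; double end; };
static std::vector<std::string> g_eventNames;  // index is the event ID
static std::vector<int> g_pointEvents;
static std::vector<BracketRecord> g_brackets;

static bool isRegistered(int eventNum) {
    return eventNum >= 0 &&
           static_cast<std::size_t>(eventNum) < g_eventNames.size();
}

int traceRegisterUserEvent(const char *eventDesc, int eventNum = -1) {
    // -1 means "pick the next free ID for me".
    if (eventNum < 0) eventNum = static_cast<int>(g_eventNames.size());
    if (!isRegistered(eventNum))
        g_eventNames.resize(static_cast<std::size_t>(eventNum) + 1);
    g_eventNames[static_cast<std::size_t>(eventNum)] = eventDesc;
    return eventNum;
}

void traceUserEvent(int eventNum) {
    assert(isRegistered(eventNum));  // register before use
    g_pointEvents.push_back(eventNum);
}

void traceUserBracketEvent(int eventNum, double startTime, double endTime) {
    assert(isRegistered(eventNum) && startTime <= endTime);
    g_brackets.push_back({eventNum, startTime, endTime});
}

// ---- Hypothetical application code. ----
struct EventIds { int checkpoint; int forceCalc; };

// Register every event once, up front, and keep the IDs.
EventIds registerEvents() {
    return { traceRegisterUserEvent("checkpoint reached"),
             traceRegisterUserEvent("force calculation") };
}

// Record an interval for the work done, then a point event marking its end.
void recordStep(const EventIds &ids, double startTime, double endTime) {
    traceUserBracketEvent(ids.forceCalc, startTime, endTime);
    traceUserEvent(ids.checkpoint);
}
```

In a real application the stubs would be dropped in favor of the calls supplied by the Charm++ headers, and the start and end times would come from a wall-clock timer read before and after the work being bracketed.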

  24. AMPI Function Tracing API • Works like User Events in Charm++ Applications. • Why? MPI Function abstraction is invisible to the Charm++ runtime. • REGISTER_FUNCTION(<namestring>) to register <namestring> as a string-id to be traced. • TRACEFUNC(<funcall>,<namestring>) to record a call to function <funcall> to be associated with the event registered as <namestring>.

  25. Advanced Features Demo

  26. Outline: Effective Analysis • General Introduction to Projections • Projections Basics • Advanced Features • Features to aid Effective Analysis • How Tracing Works • Memory Footprint Control • Data Volume Control • Visualization Controls • Extremely Large Datasets • Tips and Notes

  27. Log Event Tracing • Each Charm++ event and any registered user events are recorded into a pre-assigned memory buffer on each processor. • The default buffer size is 10,000 trace log entries. • When a buffer is full, a special flush event is logged and the buffer is flushed to disk. • Flushing is done independently of other processors. There is no synchronization on a flush.
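The buffering behavior described on this slide can be pictured with a small model. This is a sketch only: `LogBuffer` and `kFlushMarker` are hypothetical names, events are reduced to integers, the "disk write" is just a counter, and placing the flush marker at the start of the refilled buffer is an assumption about ordering that the slide does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical marker value for the special flush event.
const int kFlushMarker = -1;

// Hypothetical model of one processor's log buffer; not the Charm++ code.
class LogBuffer {
public:
    explicit LogBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity >= 2);  // room for a flush marker plus one real event
        entries_.reserve(capacity);
    }

    // Records one event; if the buffer is already full, it is flushed first.
    void record(int eventId) {
        if (entries_.size() == capacity_) flush();
        entries_.push_back(eventId);
    }

    std::size_t flushCount() const { return flushes_; }
    std::size_t buffered() const { return entries_.size(); }
    std::size_t written() const { return written_; }

private:
    // Local to this buffer: no other processor is consulted or synchronized.
    void flush() {
        written_ += entries_.size();  // stands in for the write to disk
        entries_.clear();
        ++flushes_;
        entries_.push_back(kFlushMarker);  // log the flush itself as an event
    }

    std::size_t capacity_;
    std::vector<int> entries_;
    std::size_t flushes_ = 0;
    std::size_t written_ = 0;
};
```

A small capacity makes flushes frequent but cheap, and a large one makes them rare but costly in memory and in time per flush, which is the trade-off the later slides on memory footprint return to.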

  28. Summary Tracing • Invoked by the same event set as in Log tracing. User events are, however, ignored. • The memory buffer is organized into k bins, each initially representing 1 ms of time. Each event contributes data into the appropriate time bin. • When the buffer is filled, the time represented by each bin is doubled and the existing data is packed into the first k/2 bins. Event contribution continues at the (k/2+1)th bin.
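The bin-doubling scheme can be sketched as follows. This is a hypothetical model, not the Charm++ implementation: `SummaryBins` is an invented name, each bin simply accumulates a number of busy seconds, and times are in seconds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a summary buffer with k bins of equal width.
class SummaryBins {
public:
    SummaryBins(std::size_t binCount, double binSize)
        : bins_(binCount, 0.0), binSize_(binSize) {
        assert(binCount >= 2 && binCount % 2 == 0 && binSize > 0.0);
    }

    // Adds `busy` seconds of work to the bin covering time `t` (in seconds).
    void add(double t, double busy) {
        assert(t >= 0.0);
        // Compact until the buffer's time span reaches t.
        while (t >= binSize_ * static_cast<double>(bins_.size())) compact();
        std::size_t idx = static_cast<std::size_t>(t / binSize_);
        if (idx >= bins_.size()) idx = bins_.size() - 1;  // guard rounding
        bins_[idx] += busy;
    }

    double binSize() const { return binSize_; }
    double bin(std::size_t i) const { return bins_[i]; }

private:
    // Doubles the bin width and packs pairs of old bins into the first half.
    void compact() {
        const std::size_t half = bins_.size() / 2;
        for (std::size_t i = 0; i < half; ++i)
            bins_[i] = bins_[2 * i] + bins_[2 * i + 1];
        for (std::size_t i = half; i < bins_.size(); ++i) bins_[i] = 0.0;
        binSize_ *= 2.0;
    }

    std::vector<double> bins_;
    double binSize_;
};
```

Memory use stays fixed at k bins for any run length. The cost is resolution: every compaction halves the time granularity of all data recorded so far.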

  29. Memory Footprint Control • Tracing Memory Footprint • Event Log tracemode (-tracemode projections): default 10,000 event entries, controlled by the runtime flag “+logsize <size>”. • Summary tracemode (-tracemode summary): default 10,000 time bins of 1 ms, controlled by the runtime flags “+bincount <#bins>” and “+binsize <seconds>”.

  30. Memory Footprint Control (2) • Trade-offs to consider • Log buffers: flush overhead vs memory usage; frequency of flushes vs size of flushes. • Summary buffers: frequency of compaction and data granularity vs memory usage.
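For summary buffers, the granularity side of the trade-off can be estimated with a little arithmetic: a buffer of k bins of width b covers k × b seconds, and each compaction doubles that span. The helper below is a rough sketch under that assumption. `SummaryCoverage` and `coverageFor` are hypothetical names, and a run that ends exactly on a boundary is treated as still fitting.

```cpp
#include <cassert>

struct SummaryCoverage {
    int compactions;      // number of times the bin width was doubled
    double finalBinSize;  // seconds per bin at the end of the run
};

// Estimates how many compactions a run of `runSeconds` would trigger for
// `binCount` bins that start out `binSize` seconds wide.
SummaryCoverage coverageFor(double runSeconds, long binCount, double binSize) {
    assert(runSeconds >= 0.0 && binCount > 0 && binSize > 0.0);
    SummaryCoverage c{0, binSize};
    while (c.finalBinSize * static_cast<double>(binCount) < runSeconds) {
        c.finalBinSize *= 2.0;  // each compaction doubles the bin width
        ++c.compactions;
    }
    return c;
}
```

With the defaults of 10,000 bins of 1 ms, the buffer initially covers 10 s. By this estimate a 100 s run needs four doublings (to 20, 40, 80 and then 160 s of coverage), leaving bins of 16 ms. Raising the bin count would keep finer granularity at the cost of more memory.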

  31. Controlling Data Volume • [Code-time] User API for partial tracing. • [Link-time] Generating only summary data. • [Run-time] Writing compressed output (runtime flag): +gz-trace • [Post-run] Deleting subset of generated logs. • [Visualization] Parameter range control, analysis “Memory”.

  32. Visualization Parameter Control

  33. Specific Visualization Tool Issues • Memory usage constraints • Timeline: dependent on the event density of the selected time range of the application. A typical workable range is 10 ms to 10 s for 10 to 20 processors. • Processor-based tools (Overview, Communications, Usage Profile): limit to 2000 processors or fewer. • Interval-based tools (Graph, Time Profile, Communication vs Time, Animation): limit to 1500 time intervals. • Histogram tools: limit to 1000 bins or fewer.
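When choosing parameters for an interval-based tool, one way to respect the 1500-interval guideline is to derive the interval size from the selected time range. The helper below is purely illustrative: `minIntervalSizeUs` is a hypothetical name, and microseconds are an assumed unit chosen for the example.

```cpp
#include <cassert>
#include <cstdint>

// Smallest interval size (in microseconds) that splits [startUs, endUs)
// into at most maxIntervals intervals.
std::int64_t minIntervalSizeUs(std::int64_t startUs, std::int64_t endUs,
                               std::int64_t maxIntervals = 1500) {
    assert(endUs > startUs && maxIntervals > 0);
    const std::int64_t span = endUs - startUs;
    return (span + maxIntervals - 1) / maxIntervals;  // ceiling division
}
```

For a 10 s range (10,000,000 microseconds) this gives 6667 microseconds per interval, so any interval size at or above that keeps the view within the suggested limit.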

  34. Analysis Techniques • Zoom in from wide-ranging, low-detail views to a detailed look at problem spots. • Make use of range histories. • Control data volume. • Use effective colors. • Make effective use of specific tool features (see manual).

  35. Analysis Techniques (2) • Load Imbalance: Overview, Usage Profile. • Where's my work going?: Time Profile. • How is my communication behaving?: Communication, Communication vs Time. • Are there critical paths? I need details!: Timeline.

  36. Putting it all together

  37. Outline: Extremely Large Datasets • General Introduction to Projections • Projections Basics • How Tracing Works • Features to aid Effective Analysis • Extremely Large Datasets • Tips and Notes

  38. Considerations for Extremely Large Datasets • Large number of files – beware the filesystem. • Huge amounts of data – control!

  39. Demo – NAMD on 8192 processors

  40. Outline: Miscellany • General Introduction to Projections • Projections Basics • How Tracing Works • Features to aid Effective Analysis • Extremely Large Datasets • Tips and Notes

  41. Conveniently Placing Logs • Specifying a user-defined output location (runtime option): +traceroot <desired log directory> • It is important to note that <desired log directory> must be available on a machine's compute nodes.

  42. Perturbation Issues • Perturbation of application. • Tracing overhead • Timer overheads (rdtsc, machine wallclock). • Acquisition of performance data and storage. • Observed in the case of NAMD with timesteps below 10ms with many compute objects in the microsecond range.

  43. Look Closely! • Do not be fooled! Projections does not handle some visualization artifacts well: • Fine grain details can sometimes look like one big solid block on timeline. • It is hard to mouse-over items that represent fine-grained events. • Other times, tiny slivers of activity become too small to be drawn.

  44. Answering my questions (again!) • Load Imbalance: Overview, Usage Profile. • Where's my work going?: Time Profile. • How is my communication behaving?: Communication, Communication vs Time. • Are there critical paths? I need details!: Timeline.

  45. Frequently Asked Questions • Q: I tried user events and the Projections visualization crashes! A: Did you register the events before using them? • Q: There are giant stretched event(s) in my run! A: Did you set a large enough log buffer size? • Q: The Projections visualization crashes! A: Are your logs corrupted? This can happen on bad I/O (experienced on NFS).
