Overview of CrayPat and Apprentice 2

Overview of CrayPat and Apprentice 2. Adam Leko, UPC Group, HCS Research Laboratory, University of Florida. Color encoding key: Blue: information; Red: negative note; Green: positive note. Basic information: Name: CrayPat, Apprentice 2; Developer: Cray; Current version: CrayPat v24.107

  1. Overview of CrayPat and Apprentice2
     Adam Leko
     UPC Group, HCS Research Laboratory, University of Florida
     Color encoding key: Blue: information; Red: negative note; Green: positive note

  2. Basic Information
     • Name: CrayPat, Apprentice2
     • Developer: Cray
     • Current version:
        • CrayPat v24.107
        • Apprentice2 v2.0 (not available separately)
     • Languages: Fortran, C, C++
     • Website: documentation available at
        • http://www.cray.com/cgi-bin/swpubs/craydoc30/craydoc.cgi
     • Contact:
        • Luiz DeRose (ldr@cray.com)

  3. CrayPat and Apprentice2 Overview
     • CrayPat
        • Cray's toolkit for instrumenting executables and producing data from runs
        • Uses static binary instrumentation
        • Supports tracing, profiling, and sampling
        • Outputs data in a binary format that can be converted to
           • XML format (for Apprentice2)
           • Text format (a report containing statistical information)
     • Apprentice2
        • Visualization tool for CrayPat data files
        • Reads .xml or .xml.gz files (plain or gzipped XML reports converted from CrayPat's binary output)
        • Several visualizations available

  4. CrayPat Overview
     • Command-line-based performance optimization tools
     • In CrayPat, you perform experiments on instrumented executables
     • Several types of experiments are available
        • Tracing: records timestamps and arguments for all instrumented functions
        • Sampling: samples hardware counters or the call stack at fixed intervals
        • Profiling: a specific sampling experiment in which user + system time is sampled for all functions in a program
     • The type of experiment is selected by setting environment variables
        • However, tracing experiments can only be performed on executables instrumented for tracing
        • Sampling-type experiments, on the other hand, can also be performed on executables instrumented for tracing
     • General workflow
        1. Compile the application and run it as normal
        2. Instrument it using pat_build
        3. Run the instrumented executable as normal; a binary .xf log file will be produced
        4. View the report using pat_report
     • Can also use pat_run to combine steps 3 & 4, or pat_hwpc on uninstrumented executables to get hardware counter reports
     • CrayPat supports many languages + extensions
        • C, C++, Fortran, UPC, MPI, CoArray Fortran, OpenMP, SHMEM
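
To make the difference between the tracing and sampling experiments concrete, here is a toy Python model (not how CrayPat works internally). It assumes a hypothetical single-process timeline given as (function, start, end) in integer milliseconds: tracing sums the exact entry-to-exit intervals, while sampling charges one full interval to whichever function happens to be running at each sample point.

```python
"""Toy model contrasting tracing and sampling (not CrayPat's implementation)."""
from collections import Counter

# Hypothetical execution timeline in integer milliseconds:
# (function, start_ms, end_ms), non-overlapping, covering [0, 4000).
TIMELINE = [("init", 0, 500), ("compute", 500, 3500), ("io", 3500, 4000)]


def traced_times(timeline):
    """Tracing: every entry and exit is timestamped, so per-function time is exact."""
    totals = Counter()
    for func, start, end in timeline:
        totals[func] += end - start
    return dict(totals)


def sampled_times(timeline, interval_ms):
    """Sampling: every interval_ms, note which function is running and
    charge it one full interval. Cheaper to record, but only an estimate."""
    end_of_run = max(end for _, _, end in timeline)
    totals = Counter()
    for t in range(0, end_of_run, interval_ms):
        for func, start, end in timeline:
            if start <= t < end:
                totals[func] += interval_ms
                break
    return dict(totals)


if __name__ == "__main__":
    print("traced: ", traced_times(TIMELINE))
    print("sampled:", sampled_times(TIMELINE, 300))
```

With a 300 ms interval the sampled estimate for init and io is 600 ms instead of 500 ms; with a 100 ms interval it matches the traced times for this timeline. Finer sampling is more accurate but takes more samples, whereas tracing records every call.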

  5. CrayPat Instrumentation
     • Instrumentation is very simple!
     • Build the application as normal (debugging symbols are not even needed), keeping the .o files
        • E.g.:
           • UPC: cc -hupc -hkeepfiles *.upc -o exe
           • C/C++/MPI C: cc -hkeepfiles *.c -o exe
           • Fortran: ftn -hkeepfiles *.f77 -o exe
        • Can also use the -c flag with the compilers and link in a separate stage as normal
     • Use pat_build to build the instrumented executable
        • For profiling or sampling: pat_build exe inst.exe
        • For tracing:
           • UPC: pat_build -g upc exe inst.exe
           • MPI: pat_build -g mpi exe inst.exe
           • Several other things can be traced with the -g flag (CoArray Fortran, heap calls, I/O system calls)
           • Passing the -u flag also traces all (non-inlined) user function calls
     • Then run the program as normal, as shown earlier
     • Binary instrumentation means low overhead and no interference with compiler optimizations
        • The X1 and X1E are extremely dependent on compiler optimizations (especially loop vectorization), so this is an absolute necessity for CrayPat
     • In our informal tests, sampling instrumentation resulted in negligible overhead (< 2-3%)
        • Also, the .xf log files from runs seem very compact
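
The pat_build invocations above differ only in their options. The sketch below assembles them as argument lists in Python without running anything. pat_build_cmd is a hypothetical helper that uses only the -g and -u flags named on this slide; placing -u alongside -g in one command is an assumption, since the slide shows the two flags separately.

```python
"""Dry-run helper that assembles the pat_build command lines from this slide.

Hypothetical wrapper: only the -g and -u flags shown on the slide are used,
and nothing is executed. Pass the list to subprocess.run() yourself on a
system where CrayPat is installed.
"""
import shlex


def pat_build_cmd(exe, instrumented, trace_group=None, trace_user_funcs=False):
    """Return the argument list for pat_build.

    trace_group=None       -> profiling/sampling build: pat_build exe inst.exe
    trace_group="mpi", ... -> tracing build:            pat_build -g mpi exe inst.exe
    trace_user_funcs=True  -> also pass -u (trace non-inlined user functions)
    """
    cmd = ["pat_build"]
    if trace_group is not None:
        cmd += ["-g", trace_group]
    if trace_user_funcs:
        cmd.append("-u")
    # Options precede the input and output executables; the combined
    # -g/-u ordering is an assumption not shown on the slide.
    cmd += [exe, instrumented]
    return cmd


if __name__ == "__main__":
    for args in (
        pat_build_cmd("exe", "inst.exe"),
        pat_build_cmd("exe", "inst.exe", trace_group="upc"),
        pat_build_cmd("exe", "inst.exe", trace_group="mpi", trace_user_funcs=True),
    ):
        print(shlex.join(args))  # shlex.join requires Python 3.8+
```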

  6. Sample pat_report Output
     Table 1:  -d time%,cum_time%,time,traces,P,E,M
               -b exp,pe,thread,ssp,function,ca

       Time% | Cum.Time% |      Time | Traces |Experiment=1 |PE=0 |Thread=0 |SSP=0 |Function |Caller
      100.0% |    100.0% | 33.364290 |     72 |Total
     |------------------------------------------------------------------
     | 100.0% |   100.0% | 33.359228 |      1 |main
     |        |          |           |        | (N/A)
     |   0.0% |   100.0% |  0.003155 |     45 |timer_now$$CFE_id_hex2UINT
     ||-----------------------------------------------------------------
     ||   0.0% |  100.0% |  0.001608 |     21 | timer_elapsed$$CFE_id_hex2UINT
     ||        |         |           |        |  main
     ||   0.0% |  100.0% |  0.001547 |     24 | main
     ||=================================================================
     |   0.0% |   100.0% |  0.001443 |      2 |ioctl
     |        |          |           |        | printf
     |        |          |           |        |  main
     |   0.0% |   100.0% |  0.000395 |     21 |timer_elapsed$$CFE_id_hex2UINT
     |        |          |           |        | main
     |   0.0% |   100.0% |  0.000026 |      1 |extendDC
     |        |          |           |        | main
     |   0.0% |   100.0% |  0.000025 |      1 |_exit
     |        |          |           |        | sigtramp
     |        |          |           |        |  main
     |   0.0% |   100.0% |  0.000018 |      1 |hex2UINT
     |        |          |           |        | main
     |==================================================================

     • By default, pat_report lists profile-type information
     • Can also produce a listing of events with the -c records option, but this is not very useful
        • Although it is necessary for exporting traces to Apprentice2
     • Lots of different summary information can be displayed using pat_report
        • Output is very customizable
        • Can change the text format, how stats are computed, which data is displayed, …
        • Like prof on steroids
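
As a check on how the Time% and Cum.Time% columns relate to the Time column: the seven top-level function rows in Table 1 add up exactly to the Total row (33.364290). The sketch below assumes Time% is a row's Time divided by Total and Cum.Time% is the running sum of Time% in the listed (largest-first) order; the nested caller rows are ignored. Under those assumptions it reproduces the printed columns: main is about 99.98%, which rounds to 100.0%, and every other row rounds to 0.0%.

```python
"""Recompute Time% and Cum.Time% for the top-level rows of Table 1.

Assumption: Time% = row time / Total, and Cum.Time% is the running sum of
Time% in the order the rows are listed. Caller rows are not modeled.
"""

TOTAL = 33.364290  # "Total" row of Table 1

# (function, Time) for the top-level function rows of Table 1
ROWS = [
    ("main", 33.359228),
    ("timer_now$$CFE_id_hex2UINT", 0.003155),
    ("ioctl", 0.001443),
    ("timer_elapsed$$CFE_id_hex2UINT", 0.000395),
    ("extendDC", 0.000026),
    ("_exit", 0.000025),
    ("hex2UINT", 0.000018),
]


def percent_columns(rows, total):
    """Return (function, time, time_pct, cum_pct) tuples."""
    out = []
    cum = 0.0
    for func, time in rows:
        pct = 100.0 * time / total
        cum += pct
        out.append((func, time, pct, cum))
    return out


if __name__ == "__main__":
    for func, time, pct, cum in percent_columns(ROWS, TOTAL):
        print(f"{pct:6.1f}% | {cum:6.1f}% | {time:10.6f} | {func}")
```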

  7. Apprentice2 Overview
     • Visualization tool for XML files produced by CrayPat
     • Supports visualization of
        • Call stack sampling experiments
        • MPI trace experiments
     • Available visualizations
        • Overview pie charts that contain a breakdown of data by time and calls
        • Traffic (timeline/Gantt chart)
        • Text report (similar to what is available from CrayPat)
        • Mosaic (shows communication volume between processing elements)
        • Activity (shows % time spent in different MPI functions as a function of time)
        • Profile (shows the call tree with observed times)
     • Several visualizations also have “calipers” at the bottom of the screen to restrict the view to certain time periods

  8. Apprentice2 Problems
     • Was never able to get Apprentice2 to run properly
        • Followed the instructions provided by Cray [1], but was never able to get Apprentice2 to show a call stack profile or an MPI trace
        • All visualizations looked empty!
        • See right for examples
        • Probably due to using a (beta) public-access Cray machine
     • The rest of the information was gathered from [2]

  9. Apprentice2 Visualizations
     • Call graph view
        • Shows a summary of sampled call stacks
        • Similar to the KCacheGrind display
        • Inclusive/exclusive time is indicated by the height and width of each function's box
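
The inclusive/exclusive distinction in the call graph view can be computed directly from sampled call stacks. A minimal sketch, assuming each sample is a stack listed outermost frame first and every sample represents the same interval; the stacks and function names are hypothetical, not Apprentice2 output.

```python
"""Inclusive vs. exclusive sample counts from sampled call stacks (illustrative)."""
from collections import Counter

# Hypothetical samples: each is a call stack, outermost frame first.
SAMPLES = [
    ("main", "solve", "dot"),
    ("main", "solve", "dot"),
    ("main", "solve"),
    ("main", "io"),
]


def inclusive_exclusive(samples):
    inclusive, exclusive = Counter(), Counter()
    for stack in samples:
        # Exclusive: only the leaf frame was actually executing.
        exclusive[stack[-1]] += 1
        # Inclusive: every function on the stack was active; count each
        # function once per sample even if it appears recursively.
        for func in set(stack):
            inclusive[func] += 1
    return inclusive, exclusive


if __name__ == "__main__":
    inc, exc = inclusive_exclusive(SAMPLES)
    for func in sorted(inc, key=inc.get, reverse=True):
        print(f"{func:6s} inclusive={inc[func]} exclusive={exc[func]}")
```

Here main is on every stack (inclusive 4) but never the leaf (exclusive 0), while dot is the leaf in both of its samples. Counting each function once per sample via set(stack) keeps recursive calls from inflating the inclusive counts.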

  10. Apprentice2 Visualizations (2)
     • Overview display
        • Shows a breakdown of execution time by function in a pie chart
        • Clicking on a function brings up a tab showing its breakdown per node
        • Clicking on “other” brings up a text list of the remaining functions
        • Can also display a pie chart of function times by node
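
The overview pie chart's “other” slice can be sketched as: rank functions by time, keep the largest few as their own slices, and fold the rest into “other”, remembering which functions went there so they can be listed on click. The top_n cutoff and the timings below are assumptions; Apprentice2's actual grouping rule is not given here.

```python
"""Fold small functions into an "other" pie slice (hypothetical cutoff rule)."""


def pie_slices(times, top_n):
    """Return (slices, others): slices maps name -> fraction of total time,
    others lists the functions folded into the "other" slice."""
    total = sum(times.values())
    # Stable sort: functions with equal time keep their input order.
    ranked = sorted(times.items(), key=lambda kv: kv[1], reverse=True)
    kept, rest = ranked[:top_n], ranked[top_n:]
    slices = {name: t / total for name, t in kept}
    if rest:
        slices["other"] = sum(t for _, t in rest) / total
    return slices, [name for name, _ in rest]


if __name__ == "__main__":
    # Hypothetical per-function times
    times = {"solve": 6.0, "MPI_Send": 2.0, "init": 1.0, "io": 0.5, "log": 0.5}
    slices, others = pie_slices(times, top_n=2)
    print(slices)  # fractions of total time per slice
    print(others)  # functions listed when "other" is clicked
```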

  11. Apprentice2 Visualizations (3)
     • Timeline view
        • Shows communication in a Gantt chart view
        • Similar to other trace-based MPI visualization tools

  12. Apprentice2 Visualizations (4)
     • Mosaic view
        • Shows pair-wise communication statistics
        • Can show different stats
           • Max time
           • Average time
           • Min time
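
The mosaic view's per-pair max/average/min statistics amount to grouping messages by (sender, receiver) and reducing each group. A minimal sketch with hypothetical message records (sender PE, receiver PE, seconds); this record layout is an assumption, not CrayPat's trace format.

```python
"""Pair-wise communication statistics in the spirit of the mosaic view (illustrative)."""
from collections import defaultdict

# Hypothetical message records: (sender PE, receiver PE, seconds)
MESSAGES = [
    (0, 1, 0.002), (0, 1, 0.004), (0, 1, 0.003),
    (1, 0, 0.010),
    (2, 3, 0.001), (2, 3, 0.005),
]


def pairwise_stats(messages):
    """Map (sender, receiver) -> {"max", "avg", "min"} of message time."""
    by_pair = defaultdict(list)
    for src, dst, seconds in messages:
        by_pair[(src, dst)].append(seconds)
    return {
        pair: {"max": max(ts), "avg": sum(ts) / len(ts), "min": min(ts)}
        for pair, ts in by_pair.items()
    }


if __name__ == "__main__":
    stats = pairwise_stats(MESSAGES)
    for (src, dst), s in sorted(stats.items()):
        print(f"PE {src} -> PE {dst}: max={s['max']:.3f} avg={s['avg']:.3f} min={s['min']:.3f}")
```

Each (sender, receiver) cell of a mosaic would then be shaded by whichever of the three statistics is selected.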

  13. Apprentice2 Visualizations (5)
     • Activity view
        • Shows the percentage of time spent in MPI calls as a function of time
        • E.g.:
           • Red = barrier
           • Light green = broadcast
           • Dark green = send
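
The activity view's “percentage of time in MPI calls as a function of time” can be sketched by splitting the run into fixed-width bins and measuring how much of each bin overlaps each MPI call. Assumptions: a single process, a hypothetical list of (call, start, end) intervals in integer microseconds, and an arbitrary bin width.

```python
"""Per-bin share of time spent in each MPI call (illustrative sketch)."""
from collections import defaultdict

# Hypothetical single-process MPI calls: (call, start_us, end_us)
CALLS = [
    ("MPI_Barrier", 0, 300),
    ("MPI_Bcast", 1200, 1500),
    ("MPI_Send", 1800, 2600),
]


def ceil_div(a, b):
    return -(-a // b)


def activity(calls, run_end_us, bin_us):
    """Return one dict per time bin: call name -> % of that bin spent in the call."""
    n_bins = ceil_div(run_end_us, bin_us)
    bins = [defaultdict(float) for _ in range(n_bins)]
    for name, start, end in calls:
        # Visit only the bins this call can overlap.
        for b in range(start // bin_us, min(n_bins, ceil_div(end, bin_us))):
            lo = b * bin_us
            hi = min((b + 1) * bin_us, run_end_us)  # last bin may be shorter
            overlap = min(end, hi) - max(start, lo)
            if overlap > 0:
                bins[b][name] += 100.0 * overlap / (hi - lo)
    return [dict(shares) for shares in bins]


if __name__ == "__main__":
    for i, shares in enumerate(activity(CALLS, run_end_us=3000, bin_us=1000)):
        print(f"bin {i}: {shares}")
```

For this data with 1000 µs bins, bin 0 is 30% MPI_Barrier, bin 1 is 30% MPI_Bcast plus 20% MPI_Send, and bin 2 is 60% MPI_Send; per-bin shares like these correspond roughly to the colored areas in the view.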

  14. References
     [1] “Optimizing Applications on Cray X1 Series Systems,” #S-2315-54, 2005. (Available from docs.cray.com.)
     [2] L. DeRose, “Performance Analysis and Visualization with Cray Apprentice2,” SC 2004, Pittsburgh, PA, November 2004.