
The SIGMA Tools

Jeff Hollingsworth (University of Maryland), Luiz DeRose and K. Ekanadham (IBM Research).



  1. The SIGMA Tools • Jeff Hollingsworth (University of Maryland) • Luiz DeRose, K. Ekanadham (IBM Research)

  2. SIGMA Goals • Family of tools to understand caches • Focus on detailed statistics • Complement existing hardware counters • Ability to handle real applications • MPI and OpenMP programs • Fortran and C • Provide hints about restructuring • Padding (both inter and intra data structures) • Blocking

  3. Approach • Run instrumented program • Capture full information about memory use • Produce compact trace • Extract loops and memory strides • Post-execution tools • Memory profiler • Share of accesses due to each data structure • Cache Prediction Tool • Predict cache misses using symbolic equations • Detailed simulator • Full discrete event simulator
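
As a rough illustration of the memory profiler bullet, the sketch below attributes each traced address to a data structure and reports each structure's share of accesses. The inputs are assumptions made for the example: the trace is a plain list of byte addresses, and `structures` is a hypothetical list of `(name, base, size)` ranges. The slides do not describe SIGMA's actual trace or symbol formats.

```python
from bisect import bisect_right
from collections import Counter


def access_shares(trace, structures):
    """Return {structure name: fraction of accesses} for a list of addresses.

    trace      -- iterable of byte addresses (hypothetical input format)
    structures -- list of (name, base, size) ranges, assumed non-overlapping
    """
    ordered = sorted(structures, key=lambda s: s[1])
    bases = [s[1] for s in ordered]
    counts = Counter()
    total = 0
    for addr in trace:
        total += 1
        idx = bisect_right(bases, addr) - 1  # last structure starting at or below addr
        if idx >= 0:
            name, base, size = ordered[idx]
            if addr < base + size:
                counts[name] += 1
                continue
        counts["<other>"] += 1  # address outside every known structure
    return {name: n / total for name, n in counts.items()} if total else {}
```

For example, `access_shares([1000, 1004, 2000, 5000], [("a", 1000, 400), ("b", 2000, 800)])` attributes half of the accesses to `a`, a quarter to `b`, and a quarter to no known structure.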

  4. Structure of SIGMA Data Collection • [Figure: data-flow diagram with boxes labeled source files, SigmaCompile/Link, .lst files, Instrumented binary, dumpMap, .addr, Program Execution, trace files, Cache Simulator, Prediction Tool, Memory Ref Tool]

  5. New Dyninst Features for SIGMA • Fortran Common Blocks • Class BPatch_cblock • Represents a unique definition of a common block • getComponents – returns members of the common block • getFunctions – returns functions that define this block • Class BPatch_type • getCblocks – returns list of BPatch_cblock • Global Variables • Named common blocks now visible • Fortran-specific Debug Symbols • Now parsed and visible

  6. Representing Program Execution • Capture full execution behavior • Record all basic blocks and memory addresses • Produces large traces (due to looping) • Trace compression • Maintain pattern buffer • Scan for repeating patterns • Extract memory strides • Repeat algorithm for nested loops • [Figure: example compressed trace with entries RPT, BLK1, ADR, ADR, ADR, BLK2, ADR, ADR, BLK3 and fields labeled Base, Count, Length, Stride; values shown: 250, 100, 200, 300, 300, 500 and 7, 4, 4, 4, 4, 4]
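
The stride-extraction step can be illustrated with a minimal sketch that collapses a flat address stream into `(base, count, stride)` runs. This is a simplification made for the example: it handles a single stream of addresses with a greedy scan, and it does not model the pattern buffer, basic-block entries, or nested-loop repetition described on the slide. The function names are hypothetical.

```python
def compress_strides(addresses):
    """Collapse constant-stride runs into (base, count, stride) tuples."""
    runs = []
    i = 0
    n = len(addresses)
    while i < n:
        base = addresses[i]
        if i + 1 == n:
            runs.append((base, 1, 0))  # lone trailing address
            break
        stride = addresses[i + 1] - base
        count = 2
        # Extend the run while consecutive differences match the stride.
        while i + count < n and addresses[i + count] - addresses[i + count - 1] == stride:
            count += 1
        runs.append((base, count, stride))
        i += count
    return runs


def expand(runs):
    """Inverse of compress_strides: rebuild the full address list."""
    return [base + k * stride for base, count, stride in runs for k in range(count)]
```

A regular access pattern such as a unit-stride array sweep shrinks to a single tuple, while irregular accesses degrade toward one tuple per one or two addresses, which matches the observation that the compression ratio depends on regularity.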

  7. Trace Information • Compression ratio a function of regularity • Slowdown depends on fraction of instructions that load/store memory

  8. Using SIGMA Trace Generation • Compiling - modify makefile • .f to .o rules • prepend $(SIGMA)/bin/sigmaCompile $< • Link step • prepend $(SIGMA)/bin/sigmaLink • Running • Two environment variables • SIGMA_TRACELEVEL • SIGMA_TRACEDIR • Selected instrumentation • Run sigmaCompile only on selected files • No overhead for uninstrumented files • Explicit calls to enable/disable • Some overhead remains

  9. Cache Prediction Tool • Use compressed traces • Convert memory refs back to array refs • Compute Miss Equations • Re-use vectors (Ghosh & Martonosi) • Direct set of linear constraints (Chatterjee et al.) • To Compute Misses • Define misses as a system of linear equations • Use Omega library to solve • Provides • Count of misses • Information about iterations that cause misses

  10. Iteration Space • Re-use vectors • define points in the iteration space that access the same data • Miss equations • describe points in the iteration space that cause misses on conflicts

  11. Predicting cache misses • Operate on compact traces • Only expand to full trace if needed • Use algorithms developed for compilers • Re-use vectors • Cache miss equations • Miss types are identified • capacity, cold, and conflict

  12. Cache Terminology • Memory consists of lines L • Cache is A-way associative • Each line maps to a set S
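
To make the terminology concrete, the following sketch maps a byte address to its memory line and cache set. It assumes the conventional modulo mapping of lines to sets; the helper names and the cache parameters in the usage note are hypothetical, not taken from the slides.

```python
def line_of(addr, line_size):
    """Index of the memory line that contains byte address addr."""
    return addr // line_size


def sets_in_cache(cache_bytes, line_size, assoc):
    """Number of sets in a set-associative cache with the given geometry."""
    return cache_bytes // (line_size * assoc)


def set_of(addr, line_size, num_sets):
    """Cache set that the line containing addr maps to (modulo mapping assumed)."""
    return line_of(addr, line_size) % num_sets
```

With a 32 KiB, 4-way cache and 64-byte lines there are 128 sets, and address `0x12345` falls in line 1165, which maps to set 13.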

  13. Array References • A reference Rv(i1,i2) refers to • the vth array reference in a loop • the i1th iteration of the outer loop • the i2th iteration of the inner loop • Rv(i1,i2) precedes Ru(j1,j2) if • i1 < j1 or • i1 = j1 and i2 < j2 or • i1 = j1 and i2 = j2 and v < u
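
The precedence rule is a lexicographic order on (outer iteration, inner iteration, reference number), so it can be written as a single tuple comparison. The function name and argument layout below are chosen for this example only.

```python
def precedes(v, i, u, j):
    """True if reference R_v at iteration i=(i1, i2) precedes R_u at j=(j1, j2).

    Python compares tuples lexicographically, which matches the three cases:
    i1 < j1, or i1 == j1 and i2 < j2, or equal iterations and v < u.
    """
    (i1, i2), (j1, j2) = i, j
    return (i1, i2, v) < (j1, j2, u)
```

For instance, `precedes(2, (0, 5), 1, (1, 0))` is true because the outer iteration is earlier, while `precedes(2, (1, 2), 1, (1, 2))` is false because within the same iteration reference 1 comes before reference 2.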

  14. A Replacement Miss • There exists a reference Ra(i1,i2) such that • Ra(i1,i2) refers to line L and maps to set S • There exists another reference Rb(j1,j2) such that • Rb(j1,j2) refers to line L and maps to set S • Rb(j1,j2) precedes Ra(i1,i2) • There exist at least A references Rn(k1,k2), where A is the cache associativity, such that • Rn(k1,k2) maps to set S • Rn(k1,k2) refers to a line Ln, where • Ln is distinct from L and from all other Ln’s • Rb(j1,j2) precedes Rn(k1,k2) precedes Ra(i1,i2)
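
The conditions above can be checked operationally with a small set-associative cache model. The sketch below is not the symbolic miss-equation approach of the prediction tool; it is a brute-force simulation that assumes LRU replacement (the slides do not name a replacement policy) and takes a hypothetical trace of line numbers in execution order. Under LRU, a line that was referenced before is missing from its set exactly when at least associativity-many distinct other lines mapping to that set were referenced in between.

```python
def classify(line_trace, num_sets, assoc):
    """Label each access as 'hit', 'cold', or 'replacement' (LRU assumed)."""
    seen = set()                          # lines referenced at least once
    sets = [[] for _ in range(num_sets)]  # per-set LRU order, most recent last
    labels = []
    for line in line_trace:
        ways = sets[line % num_sets]
        if line in ways:
            ways.remove(line)             # will be re-appended as most recent
            labels.append("hit")
        else:
            labels.append("replacement" if line in seen else "cold")
            if len(ways) == assoc:
                ways.pop(0)               # evict the least recently used line
        ways.append(line)
        seen.add(line)
    return labels
```

With 2 sets and 2 ways, the line trace `[0, 2, 4, 0, 2]` yields three cold misses followed by two replacement misses, because lines 0, 2, and 4 all compete for set 0.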

  15. Using Miss Data • For each Reference get • Set of iterations that produce cold misses • Set of iterations that produce replacement misses • Counting Misses • Can count misses at each reference • Combined counts for a loop nest
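
As an illustration of the counting step, the sketch below aggregates miss records into per-reference counts and a combined count for the loop nest. The record format, a list of `(reference, iteration, kind)` tuples, is an assumption made for this example; the slides do not specify how the tool represents its miss sets.

```python
from collections import Counter, defaultdict


def summarize(miss_records):
    """Return (per-reference miss counts, combined counts for the loop nest).

    miss_records -- iterable of (reference, iteration, kind) tuples, where
                    kind is 'cold' or 'replacement' (hypothetical format)
    """
    per_ref = defaultdict(Counter)
    for ref, _iteration, kind in miss_records:
        per_ref[ref][kind] += 1
    loop_total = Counter()
    for counts in per_ref.values():
        loop_total.update(counts)         # add this reference's counts to the total
    return dict(per_ref), loop_total
```

Keeping the iteration in each record, even though the counts ignore it, preserves the information about which iterations cause misses.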

  16. Status • Trace Generation Running • Cache Prediction Running for small loops • Future Work • Multiple loop nests • Multi-level caches • Irregular programs
