The SIGMA tools provide detailed cache statistics and restructuring hints for real applications, including MPI and OpenMP programs. SIGMA includes a memory profiler and a cache prediction tool; it extracts loops and memory strides from program traces and predicts cache misses using symbolic equations.
The SIGMA Tools • Jeff Hollingsworth (University of Maryland) • Luiz DeRose, K. Ekanadham (IBM Research)
SIGMA Goals • Family of tools to understand caches • Focus on detailed statistics • Complement existing hardware counters • Ability to handle real applications • MPI and OpenMP programs • Fortran and C • Provide hints about restructuring • Padding (both inter and intra data structures) • Blocking
Approach • Run instrumented program • Capture full information about memory use • Produce compact trace • Extract loops and memory strides • Post-execution tools • Memory profiler • Share of accesses due to each data structure • Cache Prediction Tool • Predict cache misses using symbolic equations • Detailed simulator • Full discrete event simulator
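The memory profiler's "share of accesses due to each data structure" can be illustrated with a small sketch. It assumes each data structure is described by a hypothetical (name, start address, size) triple and that the trace has already been expanded to a flat list of addresses; neither the function name nor the input layout comes from SIGMA itself.

```python
from bisect import bisect_right
from collections import Counter


def access_shares(regions, addresses):
    """Return {structure name: fraction of accesses}.

    regions   -- list of (name, start, size) tuples (hypothetical layout)
    addresses -- iterable of integer memory addresses from a trace
    """
    ordered = sorted(regions, key=lambda r: r[1])
    starts = [start for _, start, _ in ordered]
    counts = Counter()
    total = 0
    for addr in addresses:
        total += 1
        # Find the last region whose start is <= addr.
        idx = bisect_right(starts, addr) - 1
        if idx >= 0:
            name, start, size = ordered[idx]
            if addr < start + size:
                counts[name] += 1
                continue
        # Address falls outside every known data structure.
        counts["<unknown>"] += 1
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    regions = [("A", 1000, 800), ("B", 2000, 400)]
    print(access_shares(regions, [1000, 1008, 1016, 2000]))
```

Sorting the regions once and using binary search keeps the per-access cost logarithmic in the number of data structures, which matters when traces are long.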
Structure of SIGMA Data Collection • [Figure: data-flow diagram with the components source files, SigmaCompile/Link, .lst files, instrumented binary, dumpMap, .addr, program execution, trace files, Cache Simulator, Prediction Tool, and MemoryRef Tool]
New Dyninst Features for SIGMA • Fortran Common Blocks • Class BPatch_cblock • Represents a unique definition of a common block • getComponents – returns members of the common block • getFunctions – returns functions that define this block • Class BPatch_type • getCblocks – returns list of BPatch_cblock • Global Variables • Named common blocks now visible • Fortran-specific debug symbols • Now parsed and visible
Representing Program Execution • Capture full execution behavior • Record all basic blocks and memory addresses • Produces large traces (due to looping) • Trace compression • Maintain pattern buffer • Scan for repeating patterns • Extract memory strides • Repeat algorithm for nested loops • [Figure: example compressed trace showing an RPT record that contains basic-block entries BLK1, BLK2, BLK3 and their ADR entries, with fields labeled Base, Count, Length, Stride]
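The stride-extraction step of trace compression can be sketched as follows. This is a simplified illustration, not SIGMA's algorithm: it handles only a single flat address stream, emits hypothetical (base, count, stride) records, and ignores the Length field, basic blocks, the pattern buffer, and nested loops.

```python
def compress_strides(addresses):
    """Collapse an address sequence into (base, count, stride) records.

    Greedy sketch: each record covers the longest run of addresses that
    advance by a constant stride starting at `base`.
    """
    records = []
    i = 0
    n = len(addresses)
    while i < n:
        base = addresses[i]
        if i + 1 == n:
            # A single trailing address has no stride.
            records.append((base, 1, 0))
            break
        stride = addresses[i + 1] - base
        count = 2
        # Extend the run while the stride stays constant.
        while (i + count < n
               and addresses[i + count] - addresses[i + count - 1] == stride):
            count += 1
        records.append((base, count, stride))
        i += count
    return records


def expand(records):
    """Rebuild the full address sequence from compressed records."""
    return [base + k * stride
            for base, count, stride in records
            for k in range(count)]


if __name__ == "__main__":
    trace = [100, 104, 108, 112, 500, 508, 516]
    packed = compress_strides(trace)
    print(packed)
    print(expand(packed) == trace)
```

The round trip through `expand` shows that the compression is lossless: a regular loop of any length collapses to one record, which is why the compression ratio depends on how regular the access pattern is.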
Trace Information • Compression ratio a function of regularity • Slowdown depends on fraction of instructions that load/store memory
Using SIGMA: Trace Generation • Compiling - modify makefile • .f to .o rules • prepend $(SIGMA)/bin/sigmaCompile $< • Link step • prepend $(SIGMA)/bin/sigmaLink • Running • Two environment variables • SIGMA_TRACELEVEL • SIGMA_TRACEDIR • Selected instrumentation • Run sigmaCompile only on selected files • No overhead for uninstrumented files • Explicit calls to enable/disable • Some overhead remains
Cache Prediction Tool • Use compressed traces • Convert memory refs back to array refs • Compute miss equations • Re-use vectors (Ghosh & Martonosi) • Direct set of linear constraints (Chatterjee et al.) • To compute misses • Define misses as a system of linear equations • Use Omega library to solve • Provides • Count of misses • Information about iterations that cause misses
Iteration Space • Re-use vectors • define points in the iteration space that access the same data • Miss equations • describe points in the iteration space that cause misses on conflicts
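To make the idea of a re-use vector concrete, the sketch below finds them by brute-force enumeration of a small two-level loop nest. This is only an illustration: the compiler techniques cited in the slides derive re-use vectors analytically from the array subscripts, whereas the hypothetical `reuse_vectors` helper here simply walks every iteration and records the distance to the previous iteration that touched the same element.

```python
from itertools import product


def reuse_vectors(index_fn, n1, n2):
    """Return the set of (d1, d2) distances between consecutive accesses
    to the same array element in an n1 x n2 loop nest.

    index_fn(i1, i2) -- element index touched at iteration (i1, i2)
    """
    last_seen = {}
    vectors = set()
    # product() visits iterations in loop (lexicographic) order.
    for i1, i2 in product(range(n1), range(n2)):
        elem = index_fn(i1, i2)
        if elem in last_seen:
            p1, p2 = last_seen[elem]
            vectors.add((i1 - p1, i2 - p2))
        last_seen[elem] = (i1, i2)
    return vectors


if __name__ == "__main__":
    # B[i2]: the same element is touched again one outer iteration later.
    print(reuse_vectors(lambda i1, i2: i2, 3, 4))
    # A[i1 + i2]: re-use along the anti-diagonal of the iteration space.
    print(reuse_vectors(lambda i1, i2: i1 + i2, 3, 4))
```

A reference such as A[4*i1 + i2] with an inner trip count of 4 touches every element once, so it has no re-use vectors at all.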
Predicting cache misses • Operate on compact traces • Only expand to full trace if needed • Use algorithms developed for compilers • Re-use vectors • Cache miss equations • Miss types are identified • capacity, cold, and conflict
Cache Terminology • Memory consists of lines L • Cache is k-way associative • Each line maps to a set S
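The line and set terminology can be sketched in a few lines. The mapping below assumes the conventional scheme in which the line number is the address divided by the line size and the set is the line number modulo the number of sets; the slides do not state the mapping, and all function and parameter names are illustrative.

```python
def num_sets(cache_size, line_size, ways):
    """Number of sets in a set-associative cache (sizes in bytes)."""
    return cache_size // (line_size * ways)


def line_of(addr, line_size):
    """Memory line L that contains byte address `addr`."""
    return addr // line_size


def set_of(addr, line_size, sets):
    """Cache set S that the line holding `addr` maps to.

    Assumes simple modulo indexing, which is common but not universal.
    """
    return line_of(addr, line_size) % sets


if __name__ == "__main__":
    sets = num_sets(cache_size=32768, line_size=64, ways=4)
    print(sets)
    print(line_of(130, 64), set_of(130, 64, sets))
```

Two addresses conflict only if they fall in different lines that map to the same set, which is the condition the miss definitions build on.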
Array References • A reference Rv(i1,i2) refers to • the vth array reference in a loop • the i1th iteration of the outer loop • the i2th iteration of the inner loop • Rv(i1,i2) precedes Ru(j1,j2) if • i1 < j1 or • i1 = j1 and i2 < j2 or • i1 = j1 and i2 = j2 and v < u
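The three "precedes" conditions amount to a lexicographic comparison of (outer iteration, inner iteration, reference number). A minimal sketch, with a hypothetical function name and argument layout:

```python
def precedes(v, i, u, j):
    """Return True if reference R_v(i1, i2) precedes R_u(j1, j2).

    v, u -- position of each array reference within the loop body
    i, j -- (outer, inner) iteration tuples
    """
    # Tuple comparison is lexicographic: outer iteration first, then
    # inner iteration, then the order of references in the loop body.
    return (i[0], i[1], v) < (j[0], j[1], u)


if __name__ == "__main__":
    print(precedes(0, (1, 5), 1, (2, 0)))  # earlier outer iteration
    print(precedes(1, (1, 2), 0, (1, 2)))  # same iteration, later reference
```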
A Replacement Miss • There exists a reference Ra(i1,i2) such that • Ra(i1,i2) refers to line L and maps to set S • There exists another reference Rb(j1,j2) such that • Rb(j1,j2) refers to line L and maps to set S • Rb(j1,j2) precedes Ra(i1,i2) • There exist at least as many references Rn(k1,k2) as the cache has ways such that • Rn(k1,k2) maps to set S • Rn(k1,k2) refers to line Ln, where • Ln is distinct from L and from all other Ln’s • Rb(j1,j2) precedes Rn(k1,k2) precedes Ra(i1,i2)
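The definition can be checked against a direct simulation. The sketch below is not the symbolic miss-equation approach; it replays a list of (reference id, address) pairs through a set-associative cache and labels each access as a hit, a cold miss, or a replacement miss. It assumes LRU replacement and modulo set indexing, neither of which is stated in the slides, and all names are illustrative.

```python
from collections import Counter, OrderedDict


def classify_misses(accesses, line_size, num_sets, ways):
    """Count hits, cold misses and replacement misses per reference.

    accesses -- iterable of (ref_id, addr) in execution order
    Returns a Counter keyed by (ref_id, kind), where kind is one of
    'hit', 'cold', 'replacement'. Assumes LRU replacement.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    seen_lines = set()
    result = Counter()
    for ref_id, addr in accesses:
        line = addr // line_size
        lru = sets[line % num_sets]
        if line in lru:
            lru.move_to_end(line)  # mark as most recently used
            kind = "hit"
        else:
            # A line seen before but no longer cached was evicted by
            # enough distinct lines mapping to the same set.
            kind = "replacement" if line in seen_lines else "cold"
            seen_lines.add(line)
            lru[line] = True
            if len(lru) > ways:
                lru.popitem(last=False)  # evict least recently used
        result[(ref_id, kind)] += 1
    return result


if __name__ == "__main__":
    trace = [("a", 0), ("b", 64), ("c", 128), ("a", 0)]
    print(classify_misses(trace, line_size=64, num_sets=1, ways=2))
```

In the example, two distinct lines (from references b and c) map to the single two-way set between the two accesses by reference a, so the second access by a is a replacement miss; with three ways it would be a hit.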
Using Miss Data • For each reference, get • Set of iterations that produce cold misses • Set of iterations that produce replacement misses • Counting misses • Can count misses at each reference • Combined counts for a loop nest
Status • Trace Generation Running • Cache Prediction Running for small loops • Future Work • Multiple loop nests • Multi-level caches • Irregular programs