Extended Whole Program Paths

Sriraman Tallam, Rajiv Gupta, Xiangyu Zhang
University of Arizona

Control Flow and Dependence Traces
• Control flow traces
  • Sequence of basic blocks.
  • Identification of hot paths.
  • Path-sensitive instruction scheduling and optimization.
  • Path prediction and instruction fetching.
• Dependence traces
  • Capture data dependences.
  • Flow from a definition to a use.
  • Data-speculative optimizations for Itanium.
  • Computation of dynamic slices.
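To make the two trace kinds concrete, here is a minimal Python sketch that derives both from one hypothetical execution. The input format (basic blocks holding statements with read and write address tuples) and the names collect_traces, S1, S3 and S4 are assumptions for illustration only. The dependence trace links each use to the dynamic instance of the last definition of the same address.

```python
from collections import defaultdict


def collect_traces(execution):
    """Build a control flow trace and a dependence trace from one execution.

    execution: list of (block_id, stmts); each stmt is (sid, reads, writes),
    where reads/writes are tuples of memory addresses (hypothetical format).
    """
    cf_trace = []                  # sequence of executed basic block ids
    dep_trace = []                 # (def_sid, def_instance, use_sid, use_instance)
    last_def = {}                  # address -> (sid, instance) of the latest write
    executions = defaultdict(int)  # sid -> how many times it has run so far

    for block_id, stmts in execution:
        cf_trace.append(block_id)
        for sid, reads, writes in stmts:
            executions[sid] += 1
            inst = executions[sid]
            for addr in reads:     # each read is a use; link it to its definition
                if addr in last_def:
                    def_sid, def_inst = last_def[addr]
                    dep_trace.append((def_sid, def_inst, sid, inst))
            for addr in writes:    # each write becomes the new last definition
                last_def[addr] = (sid, inst)
    return cf_trace, dep_trace


X = 0x100  # hypothetical address of variable X
run = [
    (1, [("S1", (), (X,))]),   # S1: X = _
    (3, [("S3", (), (X,))]),   # S3: *p = _  (here p == &X)
    (4, [("S4", (X,), ())]),   # S4: _ = X
]
cf, deps = collect_traces(run)
print(cf)    # [1, 3, 4]
print(deps)  # [('S3', 1, 'S4', 1)]
```

The control flow trace [1, 3, 4] alone does not reveal that S3 overwrote X through p; that is exactly the information the separate dependence trace carries.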

Control Flow and Dependence Traces
• Control flow traces are smaller than dependence traces and compress well.
  • Average size for SPEC 2K benchmarks: 179 MB.
  • Compression factor: Sequitur 681, VPC 442.
• Dependence traces are large and do not compress as well as control flow traces.
  • Average size for SPEC 2K benchmarks: 565 MB.
  • Compression factor: Sequitur 1.31, VPC 5.8.
• Is there an alternative trace representation?
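A back-of-the-envelope estimate shows why dependence traces dominate even after compression. This Python sketch divides the average sizes above by the reported compression factors. Treating an average factor as if it applied to the average trace size is a simplifying assumption, so the results are only indicative.

```python
# Average trace sizes (MB) and compression factors from the slide above.
AVG_SIZE_MB = {"control_flow": 179, "dependence": 565}
FACTOR = {
    ("control_flow", "Sequitur"): 681,
    ("control_flow", "VPC"): 442,
    ("dependence", "Sequitur"): 1.31,
    ("dependence", "VPC"): 5.8,
}


def estimated_compressed_mb(trace, compressor):
    """Rough compressed size: average size / average compression factor (assumption)."""
    return AVG_SIZE_MB[trace] / FACTOR[(trace, compressor)]


for trace, compressor in FACTOR:
    print(f"{trace:12s} {compressor:8s} ~{estimated_compressed_mb(trace, compressor):7.2f} MB")
```

This gives roughly 0.26 MB (Sequitur) and 0.40 MB (VPC) for control flow traces, versus roughly 431 MB and 97 MB for dependence traces.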

Our Approach
• Extended control flow trace: a unified trace representation.
• Captures both control flow and dependence information.
• The data dependences are embedded as control flow.
• The unified trace is smaller than the control flow + dependence traces.
• Our compressed unified trace is also smaller than the compressed control flow + compressed dependence traces.

Goals in Designing the eCF
[Figure: example control flow graph with definitions X = _ and *p = _, a use = X, and an inserted check if p == &X]
• Because of possible aliasing through p, the dependence can no longer be recovered from the control flow alone.
• Additional control flow can capture the dependence.
• The dependence can then be recovered from the control flow.
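The idea of capturing a dependence with extra control flow can be sketched in Python as follows. The block labels (def_X, store_p, alias, no_alias, use_X) and the functions execute and reaching_def are hypothetical; the sketch only shows that once the outcome of the comparison p == &X is recorded as a branch, the reaching definition of the use follows from the control flow trace alone.

```python
def execute(p, x_addr):
    """Hypothetical instrumented path: X = _; *p = _; check; _ = X.

    Returns the control flow trace as a list of block labels.
    """
    trace = ["def_X"]             # block containing  X = _
    trace.append("store_p")       # block containing *p = _
    # Inserted disambiguation check: the alias outcome becomes a branch,
    # so it is recorded in the control flow trace.
    trace.append("alias" if p == x_addr else "no_alias")
    trace.append("use_X")         # block containing  _ = X
    return trace


def reaching_def(trace):
    """Recover the definition of X seen by the use, from control flow alone."""
    last = None
    for block in trace:
        if block == "def_X":
            last = "X = _"
        elif block == "alias":    # check succeeded: *p overwrote X
            last = "*p = _"
        elif block == "use_X":
            return last
    return None


print(reaching_def(execute(p=0x100, x_addr=0x100)))  # *p = _
print(reaching_def(execute(p=0x200, x_addr=0x100)))  # X = _
```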

Cost of Capturing Dependences
• No-cost capture
  • No disambiguation checks are needed for these dependences.
• Fixed-cost capture
  • The number of disambiguation checks needed is a constant.
• Variable-cost capture
  • The number of disambiguation checks varies.

No-Cost Capture
• All instances of the dependence can be recovered from the control flow trace.

Fixed-Cost Capture
• A single disambiguation check is sufficient to capture this dependence.
[Figure: single check]

Variable-Cost Capture
• The instances of the dependence can be caused by any instance of the definition statement.
[Figure: multiple checks]
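The difference between the fixed-cost and variable-cost cases can be modeled by counting comparisons; a no-cost dependence needs none. This Python sketch is a hypothetical model, not the authors' instrumentation: fixed_cost_check compares one store address with &X, while variable_cost_check must search earlier instances of a loop store (a[i] = _) to find the one a later use depends on.

```python
def fixed_cost_check(p, x_addr):
    """Fixed cost: a single comparison decides whether the store *p defined X.

    Returns (number_of_checks, store_defined_x).
    """
    return 1, p == x_addr


def variable_cost_check(def_addrs, use_addr):
    """Variable cost: the use may depend on ANY earlier instance of the def.

    def_addrs[k] is the address written by instance k+1 of the defining
    statement (e.g. a[i] = _ inside a loop). Candidates are compared from the
    newest instance backwards; the first match is the reaching instance.
    Returns (number_of_checks, 1-based instance or None).
    """
    checks = 0
    for k in range(len(def_addrs) - 1, -1, -1):
        checks += 1
        if def_addrs[k] == use_addr:
            return checks, k + 1
    return checks, None


# Hypothetical loop: a[i] = _ for i in 0..3 (4-byte elements at 100, 104, ...),
# followed by a use of a[1].
print(fixed_cost_check(p=0x100, x_addr=0x100))                 # (1, True)
print(variable_cost_check([100, 104, 108, 112], use_addr=104))  # (3, 2)
```

The number of comparisons in the variable-cost case grows with the number of definition instances, which is why these dependences are the expensive ones to capture.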

Cost of Instrumentation and Trace Compressibility
• Reducing the number of checks
  • Reduces the size of the generated trace.
  • Reduces run-time overhead.
• Improving the compressibility
  • Similar control flow signatures.

Two-Phase Approach
• Static pointer analysis is conservative.
  • It reports too many potential dependences per use.
• Two-phase approach
  • Filtering phase: find all dependences that are actually exercised.
  • Profiling phase: add disambiguation checks only for the exercised dependences.
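A minimal Python sketch of the two phases, assuming a simplified event format of (statement, read/write, address) and hypothetical names filtering_phase and profiling_phase: the filtering run records which (definition, use) pairs actually occur, and the profiling phase keeps checks only for statically possible dependences in that set.

```python
def filtering_phase(events):
    """Phase 1: run once and record which (def, use) pairs are actually exercised.

    events: list of (sid, kind, addr) with kind "w" (write) or "r" (read).
    """
    last_writer = {}          # address -> statement that last wrote it
    exercised = set()
    for sid, kind, addr in events:
        if kind == "w":
            last_writer[addr] = sid
        elif addr in last_writer:
            exercised.add((last_writer[addr], sid))
    return exercised


def profiling_phase(potential, exercised):
    """Phase 2: place disambiguation checks only for exercised potential dependences."""
    return sorted(potential & exercised)


# Conservative static pointer analysis: S1, S2 and S3 may all define what S4 reads.
potential = {("S1", "S4"), ("S2", "S4"), ("S3", "S4")}
events = [("S1", "w", 0x10), ("S3", "w", 0x20), ("S4", "r", 0x10)]
checks = profiling_phase(potential, filtering_phase(events))
print(checks)  # [('S1', 'S4')]
```

Static analysis reports three candidate definitions for S4, but only one check is placed, because only the S1 to S4 dependence was exercised in the filtering run.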

Binary Search vs. Linear Search
• Track the last definition (statement and dynamic instance) of every write to a memory address.
• Search the address array using binary search instead of linear search.
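The last-definition table can be sketched in Python as a sorted address array searched with the standard bisect module. The class LastDefTable and the comparison counters linear_comparisons and binary_comparisons are hypothetical helpers, and a Python list stands in for whatever array the real instrumentation uses.

```python
from bisect import bisect_left


class LastDefTable:
    """Sorted address array mapping each address to its last (sid, instance) writer."""

    def __init__(self):
        self.addrs = []   # kept sorted so lookups can use binary search
        self.defs = []    # parallel list: (sid, instance) of the last write

    def record_write(self, addr, sid, instance):
        i = bisect_left(self.addrs, addr)
        if i < len(self.addrs) and self.addrs[i] == addr:
            self.defs[i] = (sid, instance)      # newer write replaces the old one
        else:
            self.addrs.insert(i, addr)          # keep the array sorted
            self.defs.insert(i, (sid, instance))

    def last_def(self, addr):
        i = bisect_left(self.addrs, addr)
        if i < len(self.addrs) and self.addrs[i] == addr:
            return self.defs[i]
        return None


def linear_comparisons(addrs, target):
    """Address comparisons a linear scan makes before finding target."""
    for count, a in enumerate(addrs, 1):
        if a == target:
            return count
    return len(addrs)


def binary_comparisons(addrs, target):
    """Address comparisons a bisect_left-style binary search makes."""
    lo, hi, count = 0, len(addrs), 0
    while lo < hi:
        mid = (lo + hi) // 2
        count += 1
        if addrs[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return count


table = LastDefTable()
table.record_write(0x20, "S1", 1)
table.record_write(0x10, "S2", 1)
table.record_write(0x20, "S1", 2)
print(table.last_def(0x20))   # ('S1', 2)

addrs = list(range(0, 4 * 100_000, 4))   # 100,000 hypothetical word addresses
print(linear_comparisons(addrs, addrs[-1]), binary_comparisons(addrs, addrs[-1]))
```

With 100,000 addresses, the linear scan makes 100,000 comparisons to find the last address, while the binary search makes at most 17, since each step at least halves the remaining range.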

Optimizing Trace Length and Compressibility

Experimental Results
• Implemented on the Microsoft Phoenix RDK.
  • SPEC 2K benchmark binaries were rewritten to obtain instrumented versions.
  • Easy to implement using Phoenix.
  • The intermediate representation was the low-level x86 instruction set.
• Dependences were split into register and memory dependences.
  • Register dependences are always recoverable from the control flow trace.
  • Memory dependences were recovered using our approach.

Register and Memory Dependences
• A significant fraction (76 %) of dependences are register dependences, which can be recovered from the control flow trace.

Uncompressed Trace Sizes
[Chart: Cont. + Dep. vs. Unified trace size, with their ratio]
• The unified trace is 62 % of the size of the control flow + dependence traces.

Sequitur Compressed
[Chart: Cont. + Dep. vs. Unified compressed trace size, with their ratio]
• The compressed unified trace is 4 % of the size of the compressed control flow + dependence traces.

VPC Compressed
[Chart: Cont. + Dep. vs. Unified compressed trace size, with their ratio]
• The compressed unified trace is 21 % of the size of the compressed control flow + dependence traces.

Memory Dependence Types
• 30 % of memory dependences can be recovered at no cost.

Address Comparisons
• Binary search reduces the number of address comparisons by 4 orders of magnitude.

Run-time Overhead
• Collecting the unified trace increases run-time overhead by 20 %.

Conclusions
• We have designed an extended control flow trace that captures both control flow and data dependence history.
• The key to the unified trace is the ability to convert memory data dependences into control flow.
• The resulting unified trace is smaller than the combined control flow + dependence trace.
• The run-time overhead increases by 20 %.

Our thanks to Hoi Vo of Microsoft Corporation and the Phoenix Compiler Infrastructure Group.
