
Lengthening Traces to Improve Opportunities for Dynamic Optimization

Chuck Zhao, Cristiana Amza, Greg Steffan (University of Toronto); Youfeng Wu (Intel Research). Feb. 16, 2007, Interact-12, HPCA.

Presentation Transcript


  1. Lengthening Traces to Improve Opportunities for Dynamic Optimization Chuck Zhao, Cristiana Amza, Greg Steffan, University of Toronto Youfeng Wu Intel Research Feb. 16, 2007 Interact-12, HPCA

  2. Intel’s StarDBT Project
     • StarDBT
       • A Dynamic Binary Translation framework
       • Operates on traces, optimizes hot traces
     • Long-term goal: use StarDBT to allow legacy apps to exploit TM support
       • (NOT by automatically parallelizing legacy apps)
       • Allow speculative sequential optimizations
       • Use hardware TM’s checkpoint/restore
     • Problem: default traces are too small
       • TM overheads would overwhelm benefits
     Challenge: lengthening traces can be tricky

  3. Trace Formation
     [Figure: a basic-block profile and the corresponding trace profile over blocks A through G, with the on-trace blocks and an off-trace stub labeled]
     Control flow that goes off-trace can be costly

  4. Trade-offs when Lengthening Traces
     • Completion ratio: the likelihood of execution staying on trace, i.e., the percentage of executions reaching the trace tail
     [Figure: traces of increasing length built from blocks A, B, D, F, G, each side exit labeled with a 5% side-exit ratio; side exits totaling 10% give a completion ratio of 100% - 10% = 90%, and side exits totaling 25% give 100% - 25% = 75%]
     Trade-offs:
     • longer traces have more optimization opportunities
     • longer traces have more side-exit branches
     A sweet spot exists in between; can we find it?
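The arithmetic in the figure can be reproduced with a small sketch. It assumes, as the figure's labels suggest, that each side exit is taken 5% of the time and that side-exit ratios simply add up; the function name `completion_ratio` is made up for illustration and is not part of StarDBT.

```python
def completion_ratio(side_exit_ratios):
    """Fraction of trace entries that reach the trace tail.

    Additive model used on the slide: 100% minus the sum of the
    side-exit ratios along the trace.
    """
    return 1.0 - sum(side_exit_ratios)


# Appending blocks gives the optimizer more instructions to work with,
# but every appended side exit lowers the completion ratio.
for exits in (2, 5):
    ratio = completion_ratio([0.05] * exits)
    print(f"{exits} side exits -> completion ratio {ratio:.0%}")
```

The two cases correspond to the 10% and 25% totals shown in the figure.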

  5. Our Work So Far (i.e., this talk)
     • Lengthening traces while maintaining completion ratios
       • Through unrolling and straightening
       • A characterization of the impact on traces: length, completion ratio, unroll factor, …
     • Improving optimization opportunities on longer traces
       • Improve Local Value Numbering (LVN) hits
       • Measurement of impact on performance is pending
     • Performing on-the-fly actions by DBT system
       • Decisions made by instrumenting/sampling code online

  6. Related Work
     • Binary Translation Systems
       • Dynamo
       • DynamoRIO
       • PIN
       • StarDBT: transparent translation, x86 legacy code
     • Trace Collection and Optimizations
       • Java JIT
       • Dynamo, DynamoRIO, Mojo
       • StarDBT: x86 binary level, MRET2 to improve trace formation, aggressive trace optimizations
     First full analysis of trace-lengthening issues for DBT systems

  7. StarDBT Trace Types
     [Figure: traces a, b, c, d and the dispatcher, illustrating three trace types: self type, other-trace type, and elsewhere type]

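  8. Lengthening Traces Through Unrolling
     [Figure: trace a, with a completion ratio of 90%, unrolled into consecutive copies; the completion ratio after each copy is 90%, 81%, 72.9%]
     Unrolling increases a trace’s length but reduces its completion ratio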

  9. Finding the Sweet-Spot Unroll Factor
     [Figure: trace a unrolled into N copies]
     … chosen by system designer; example given for p_orig = 99% and p_target = 90%
     Traces with 100% completion ratio: set N = 10
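The slide's formula is not part of this transcript, so the following is only a sketch of one way to pick N. It assumes the multiplicative model from the previous slide (90%, 81%, 72.9%: each extra copy multiplies the completion ratio by p_orig) and picks the largest N that keeps the unrolled trace at or above p_target. The function name and the `cap` parameter are hypothetical; the default cap of 10 mirrors the slide's rule for traces with a 100% completion ratio.

```python
def max_unroll_factor(p_orig, p_target, cap=10):
    """Largest N (1 <= N <= cap) with p_orig ** N >= p_target.

    Assumption: every unrolled copy completes independently with
    probability p_orig. N == 1 means the trace is left as it is.
    """
    n = 1
    ratio = p_orig                      # completion ratio of one copy
    while n < cap and ratio * p_orig >= p_target:
        ratio *= p_orig                 # add one more copy
        n += 1
    return n


# Example from the slide: p_orig = 99%, p_target = 90%.
print(max_unroll_factor(0.99, 0.90, cap=100))
# A trace that always completes would unroll forever without the cap.
print(max_unroll_factor(1.0, 0.90))
```

A loop is used instead of a closed-form logarithm to avoid off-by-one results from floating-point rounding when p_orig ** N lands very close to p_target.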

  10. Lengthening Traces Through Straightening
      [Figure: traces b, c, and d before and after straightening]
      We don’t yet implement/evaluate straightening

  11. Evaluation

  12. Distribution of Original Completion Ratios
      [Figure: distribution of hot traces by original completion ratio]
      Majority of hot traces have completion ratios in 90%-100%

  13. Impact of Unrolling on Hot Trace Size
      [Figure: average number of instructions per hot trace vs. completion ratio, annotated "36% longer"]
      Select SPECint CPU2000 benchmarks with MinneSPEC input
      Lengthening increases hot trace size by more than 36%

  14. How Much are Traces Unrolled?
      [Figure: average unroll factor vs. target completion ratio, annotated "1.38-1.58x" and "Not unrolled"]
      Hot traces are unrolled on average by 1.38x or more

  15. Average Completion Ratio After Lengthening
      [Figure: completion ratio (axis from 10% to 90%) after lengthening, annotated "<0.5%"]
      Lengthening traces reduces completion ratio by < 0.5%

  16. Impact of Lengthening on Optimizations

  17. Local Value Numbering (LVN)
      • No need to build Control Flow Graph (CFG)
        • Partial info
      • No need to perform Data Flow Analysis (DFA)
        • Expensive, relies on CFG
      • Can be arranged into a single-pass scan
        • Ease of implementation
        • Relatively lightweight algorithm
      • Performs three optimizations:
        • Common Subexpression Elimination (CSE)
        • Copy Propagation (CP)
        • Dead-Code Elimination (DCE)
      LVN is common in JIT optimizers

  18. Ex: LVN On a Lengthened Trace
      Original traces:
        … c = a + b;  d = a;  e = b …
        f = d + e;  d = x …
      Lengthened trace (annotated with value numbers):
        … c3 = a1 + b2
        d1 = a1
        e2 = b2
        f3 = d1 + e2  →  f3 = c3
        d4 = x4 …
      Optimized trace:
        … c = a + b;  e = b;  f = c;  d = x …
      Annotations on the slide: CSE hit, DCE hit
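The example above can be replayed with a minimal value-numbering sketch. This is not StarDBT's implementation: it assumes side-effect-free three-address instructions written as hypothetical `(dest, op, args)` tuples, treats every variable as live at the trace exit, and does not canonicalize commutative operands. All names are illustrative.

```python
import itertools


def value_number(trace):
    """Single forward pass: replace recomputed expressions with copies."""
    fresh = itertools.count(1)
    var_vn = {}    # variable -> value number it currently holds
    expr_vn = {}   # (op, operand value numbers) -> value number of result
    home = {}      # value number -> a variable holding that value right now
    out, cse_hits = [], 0

    def vn_of(var):
        if var not in var_vn:              # first read of a live-in value
            var_vn[var] = next(fresh)
            home[var_vn[var]] = var
        return var_vn[var]

    for dest, op, args in trace:
        arg_vns = tuple(vn_of(a) for a in args)
        instr = (dest, op, args)
        if op == "copy":
            vn = arg_vns[0]                # a copy carries the same value
        else:
            key = (op, arg_vns)
            vn = expr_vn.get(key)
            if vn is not None and vn in home:
                instr = (dest, "copy", (home[vn],))   # CSE hit
                cse_hits += 1
            elif vn is None:
                vn = expr_vn[key] = next(fresh)
        old = var_vn.get(dest)             # dest loses its previous value
        if old is not None and home.get(old) == dest:
            del home[old]
        var_vn[dest] = vn
        home.setdefault(vn, dest)
        out.append(instr)
    return out, cse_hits


def drop_overwritten_defs(trace):
    """Backward pass: drop definitions overwritten before any read."""
    overwritten, kept = set(), []
    for dest, op, args in reversed(trace):
        if dest in overwritten:
            continue                       # never read: dead definition
        kept.append((dest, op, args))
        overwritten.add(dest)
        overwritten.difference_update(args)
    kept.reverse()
    return kept


lengthened = [
    ("c", "+", ("a", "b")),
    ("d", "copy", ("a",)),
    ("e", "copy", ("b",)),
    ("f", "+", ("d", "e")),
    ("d", "copy", ("x",)),
]
rewritten, hits = value_number(lengthened)
optimized = drop_overwritten_defs(rewritten)
```

On this input the sketch rewrites `f = d + e` to `f = c` and then drops `d = a`, leaving the same four instructions as the optimized trace on the slide. Neither rewrite is possible on the two original traces separately, because `c = a + b` and `f = d + e` sit in different traces.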

  19. LVN Hits Improvement (%)
      [Figure: % increase in LVN hits (axis from 5% to 35%) vs. target completion ratio]
      10+% more LVN hits are available through lengthening

  20. Ongoing Work
      • Complete DBT Optimization Framework
      • Evaluate speculative optimizations on long hot traces with high completion ratios
      • Automatically determine optimal transaction granularity
      • Use HTM to support trace-based speculative optimizations

  21. Control Speculation
      Source: "A Compiler Framework for Speculative Analysis and Optimizations", Lin et al., PLDI 03
      [Figure: a cmp whose branch goes 90+% one way and 10-% the other, followed by "ld x=[y] …"]
        ld.s x = [y]
        if(c){
          chk.s x, recovery
        next:
          …
        }
        recovery:
          ld x=[y]
          jmp next

  22. Use HTM to Support Trace-based Speculative Optimizations
      [Figure: a cmp whose branch goes 90+% one way and 10-% the other, followed by "ld x=[y] …"]
        start_tx
        ld x = [y]
        if(c){
          chk x, abort_tx
          …
        }
        commit_tx
      Use longer traces with high completion ratio as tx granularity
      HTM hardware support simplifies speculative optimization

  23. Conclusion
      • Traces can be effectively lengthened
        • trace size increases by 36+%
        • completion ratio decreases by less than 0.5%
      • Longer traces provide better opportunities for optimization
        • LVN hits increase by 10+%

  24. Q + A

  25. Complete StarDBT Optimization Framework
      • x86 CISC ISA
        • code patching won’t work
        • Really need a code generator and IR
      • Design + implement a low-level Runtime IR
        • close to hardware
        • capture + represent all necessary low-level info
        • easy to convert from/to machine code
        • easy to implement analysis and optimizations
      • Starting point
        • Dynamo IR
        • LLVM IR
        • GCC RTL
        • …

  26. StarDBT Overall Structure

  27. Trace Formation Heuristics
      • MRET: Most Recent Execution Tail
        • originally proposed by Dynamo
      • Trace head
        • loop head (backward branch target)
        • sampling counter reaches a certain threshold
      • Trace tail
        • satisfies certain trace-tail conditions
      • MRET2: 2-pass MRET
        • perform 2 independent MRET trace formations
        • intersect traces with common head
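The slide does not say how MRET2 intersects the two traces. One plausible reading, sketched here purely as an assumption, is to keep the blocks that both independently recorded tails agree on, starting from the shared head. The function name and the list-of-block-labels representation are hypothetical.

```python
def intersect_traces(first, second):
    """Common prefix of two recorded traces that share a head.

    Each trace is a list of basic-block labels in execution order.
    Returns an empty list when the heads differ.
    """
    if not first or not second or first[0] != second[0]:
        return []
    common = []
    for a, b in zip(first, second):
        if a != b:
            break                 # the two recordings diverge here
        common.append(a)
    return common


# Two recordings from head A that diverge after block D.
print(intersect_traces(["A", "B", "D", "F", "G"], ["A", "B", "D", "E", "G"]))
```

Under this reading, a block makes it into the final trace only if it followed the head on both recordings, which filters out paths that were taken once by chance.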

  28. Traces and Hot Traces
      • Trace
        • MRET2 recognizes trace heads
        • Trace tails satisfy certain conditions
        • Blocks in between become a trace
      • Hot Trace
        • Based on recognized traces
        • Put in additional software counters
          • head: head counter
          • each early-exit branch: off-trace counters
          • sampling: hot-trace’s completion ratio
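The counters listed above are enough to estimate a completion ratio at run time. The sketch below assumes that every entry into the trace increments the head counter and that every early exit increments exactly one off-trace counter; the function name and the sample counts are made up for illustration.

```python
def sampled_completion_ratio(head_count, off_trace_counts):
    """Estimate a hot trace's completion ratio from its software counters.

    head_count: number of times the trace head was entered.
    off_trace_counts: one count per early-exit branch.
    Returns None when the trace was never entered during sampling.
    """
    if head_count <= 0:
        return None
    early_exits = sum(off_trace_counts)
    return (head_count - early_exits) / head_count


# Hypothetical sample: 1000 entries, two side exits taken 30 and 20 times.
ratio = sampled_completion_ratio(1000, [30, 20])
print(f"completion ratio: {ratio:.1%}")
```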
