
Feedback Directed Dynamic Recompilation for Statically Compiled Languages

Feedback Directed Dynamic Recompilation for Statically Compiled Languages. Dorit Nuzman, Sergei Dyshel, Revital Eres, IBM Research, Haifa. Thematic Session on Dynamic Compilation, HiPEAC Computing Systems Week, Paris, May 3rd 2013.

Presentation Transcript


  1. Feedback Directed Dynamic Recompilation for Statically Compiled Languages. Dorit Nuzman, Sergei Dyshel, Revital Eres, IBM Research, Haifa. Thematic Session on Dynamic Compilation, HiPEAC Computing Systems Week, Paris, May 3rd 2013

  2. Performance problem: motivating scenario. An (IBM's) customer runs third-party software, owned by an Independent Software Vendor (ISV), on a Power780 server supplied by a Computer System Vendor (e.g., IBM), and hits a performance problem. The usual remedies are not available: Increase the target platform level? Nope. Increase the optimization level? Can't do. Apply feedback-directed optimization? No.

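  3. Performance problem: motivating scenario. The Independent Software Vendor compiles the program source code with the static compiler using conservative settings (opt = -O2, arch = common, no profile). In the proposed flow, the static compiler emits a fat binary that holds both native machine code and an intermediate representation; at the (IBM's) customer site, on the Power780 server from the Computer System Vendor (e.g., IBM), the dynamic execution stage runs it under a runtime engine containing a profiler and a JIT compiler.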


  5. Our approach: fat-binary based, feedback-directed, dynamic recompilation, i.e., selective, profile-driven recompilation. Diagram: program source code → static compiler → fat binary (native machine code + intermediate representation) → runtime engine (profiler, JIT compiler) in the dynamic execution stage. • Used for years in dynamic languages and Java → needed also for static languages. • As opposed to dynamic binary optimization, the fat binary includes high-level semantic information, which allows aggressive, speculative transformations.

  6. Background • Modern compilers provide sophisticated optimizations: -O3 (-O4, -O5), inter-procedural, auto-vectorization/parallelization, feedback-directed, hardware-specific. • These optimizations are usually not used, except in benchmarking and HPC, because they: • complicate the build process • prolong the development & testing cycle • require per-customer tuning, which is too costly • need representative input, which is often unavailable. • We can gain back the lost performance benefit by applying the optimizations dynamically, at runtime.

  7. Dynamic Recompilation • Solves the static-compiler usability issue • Transparent feedback-directed optimization for the current workload • Tuning for the current hardware • Separation of optimization from software production • Allows adaptive optimization • Allows iterative optimization • Virtualization & cloud: physical resources are known only at runtime, and they change continuously

  8. Dynamic Recompilation for Static Languages. Other approaches: • Focus only on very long-running programs with heavy workloads, to compensate for the time spent profiling • Focus on optimization across consecutive runs of repetitive programs • Domain-specific (a specific optimization, applied to a small pre-selected part of the code) • Trace-based binary optimization. Our goal: demonstrate an execution environment with overheads low enough to let the dynamic optimizer speed up execution of the current invocation, for regular programs and workloads.

  9. Our approach: fat-binary based, feedback-directed, dynamic recompilation. Diagram: program source code → static compiler → fat binary (native machine code + Split-IR) → runtime engine (profiler, JIT compiler) in the dynamic execution stage.

  10. Runtime Monitoring and Recompilation. Timeline figure (t0 to t9) with two threads, the execution and sampling thread and the recompilation thread. Each method moves from its original version to an instrumented version and then to an optimized version: sampling-based profiling detects method hotness, and instrumentation-based profiling then collects the detailed profile. Costs marked on the timeline: startup cost (loading & mapping); recompilation cost, once to produce the instrumented version and once to produce the optimized version; synchronization cost; and monitoring overhead from the slow instrumented execution.
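The method life cycle on this slide can be pictured as a small state machine. The following is a minimal single-threaded Python sketch, not the authors' runtime engine: the class RecompilationController, its callbacks on_sample and on_instrumented_exit, and both thresholds are hypothetical, and the real system performs recompilation on a separate thread.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict


class Version(Enum):
    ORIGINAL = auto()      # native code shipped in the fat binary
    INSTRUMENTED = auto()  # recompiled from IR with profiling counters
    OPTIMIZED = auto()     # recompiled from IR using the collected profile


@dataclass
class MethodState:
    name: str
    version: Version = Version.ORIGINAL
    samples: int = 0             # timer-interrupt PC samples that landed in this method
    instrumented_runs: int = 0   # completed executions of the instrumented version
    profile: Dict[str, int] = field(default_factory=dict)  # basic-block counts


class RecompilationController:
    """Hypothetical controller; both thresholds are illustrative values."""

    def __init__(self, hot_samples: int = 3, profile_runs: int = 2) -> None:
        self.hot_samples = hot_samples    # samples before a method counts as warm
        self.profile_runs = profile_runs  # instrumented runs before optimizing
        self.methods: Dict[str, MethodState] = {}

    def _state(self, name: str) -> MethodState:
        return self.methods.setdefault(name, MethodState(name))

    def on_sample(self, name: str) -> None:
        """Sampling-based profiling: one PC sample attributed to `name`."""
        m = self._state(name)
        m.samples += 1
        if m.version is Version.ORIGINAL and m.samples >= self.hot_samples:
            # Warm method: switch to an instrumented version to collect a profile.
            m.version = Version.INSTRUMENTED

    def on_instrumented_exit(self, name: str, block_counts: Dict[str, int]) -> None:
        """Instrumentation-based profiling: merge counts from one execution."""
        m = self._state(name)
        if m.version is not Version.INSTRUMENTED:
            return  # only the instrumented version reports counts
        m.instrumented_runs += 1
        for block, count in block_counts.items():
            m.profile[block] = m.profile.get(block, 0) + count
        if m.instrumented_runs >= self.profile_runs:
            # Enough profile data: switch to the feedback-optimized version.
            m.version = Version.OPTIMIZED


# Usage: three samples make "foo" warm, two instrumented runs complete its profile.
ctl = RecompilationController(hot_samples=3, profile_runs=2)
for _ in range(3):
    ctl.on_sample("foo")
ctl.on_instrumented_exit("foo", {"B0": 1, "B1": 10})
ctl.on_instrumented_exit("foo", {"B0": 1, "B1": 7})
```

Splitting profiling into a cheap sampling phase and a short instrumented phase mirrors the slide's point that slow instrumented execution is paid only for methods already known to be hot.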

  11. SPECint2006: Dynamic Optimization Overheads, “ref” dataset. Stress test 1: using a highly statically optimized executable (-O3 -qhot). Overall, performance does not degrade.

  12. SPECint2006: Dynamic Optimization Overheads, “train” dataset. Stress test 2: using a highly statically optimized executable (-O3 -qhot). Also works for very short-running programs. Currently limited gain from FDO alone.

  13. Optimization effect (isolated from overheads): (1) Using the sampled profile yields a similar impact to using a “perfect” profile, so the problem is not the profile quality. (2) The offline optimizer applies link-time FDO (across methods and modules); our optimizer is currently limited to a single module.

  14. Programs are statically under-optimized or moderately optimized: the Independent Software Vendor compiles the program source code with opt = -O2, arch = common, and no profile, and the (IBM's) customer runs it on a Power780 server from the Computer System Vendor (e.g., IBM). The fat binary (native machine code + intermediate representation) lets the runtime engine's profiler and JIT compiler recover the missing optimization in the dynamic execution stage.

  15. SPECint2006: Overall Effect of Dynamic Execution (“ref” dataset). Moderately optimized scenario (program statically compiled with -O2): selected methods from the program are dynamically recompiled at a higher optimization level. Overall 7% improvement on average.

  16. Recompilation Statistics. Moderately optimized scenario (program statically compiled with -O2): selected methods from the program are dynamically recompiled at a higher optimization level. • Default recompilation mode (default method hotness threshold): overall 7% improvement on average. • Aggressive recompilation mode (lower method hotness threshold): overall 8% improvement on average.
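The default and aggressive modes differ only in the method hotness threshold. A minimal sketch, assuming hotness is measured as a method's share of all PC samples (the slides do not say whether the threshold is absolute or relative); select_hot_methods and the sample counts below are hypothetical.

```python
from typing import Dict, List


def select_hot_methods(samples: Dict[str, int], threshold: float) -> List[str]:
    """Return methods whose share of all PC samples is at least `threshold`.

    `threshold` is a fraction in (0, 1]; lowering it ("aggressive" mode)
    selects more methods for recompilation.
    """
    total = sum(samples.values())
    if total == 0:
        return []
    hot = [m for m, n in samples.items() if n / total >= threshold]
    # Hottest first, so the recompilation thread handles the biggest wins early.
    return sorted(hot, key=lambda m: samples[m], reverse=True)


# Made-up sample counts (100 samples in total).
samples = {"main": 5, "hash_lookup": 40, "parse": 30, "log": 12, "init": 13}
default = select_hot_methods(samples, threshold=0.20)     # ['hash_lookup', 'parse']
aggressive = select_hot_methods(samples, threshold=0.10)  # adds 'init' and 'log'
```

With these counts the lower threshold also picks init and log: more methods get recompiled, trading extra recompilation cost for potentially more speedup.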

  17. More Benchmarks: SQLite • Static version compiled with the default compiler options: -O2 warm. • Using 1 GB of TPC-H tables (the smallest dataset). • Using TPC-H queries: a stream of 13 instances of query #1 → 13% improvement from dynamic FDO. • Most of the improvement comes from the higher optimization level.

  18. Summary and Conclusions • The overall cost of the runtime optimization environment, including environment startup cost, recompilation, and profiling overheads, is less than 2% on average (SPECint2006). • For highly optimized native binaries there is, on average, no overall degradation. • These low overheads imply that the fat-binary based approach is practical for real-world use cases and workloads. • Feedback-directed optimization can easily surpass these costs. • Applying an aggressive optimization level to selected methods at runtime brings up to 20% speedup, and 8% on average. • Much more potential is available: • more aggressive optimizations: loop-nest, memory-hierarchy, parallelization • more profiling (event-based?) • more synergy with the static compiler • more synergy with the underlying (virtual) environment, to adapt to changes
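The cost argument in this summary can be written as a break-even condition. The symbols are ours, not the authors': T_static is the run time of the statically compiled binary, T_exec is the time spent executing application code under the runtime engine, and C groups the environment costs listed above.

```latex
\begin{aligned}
T_{\text{dyn}} &= T_{\text{exec}} + C,
  \qquad C = C_{\text{startup}} + C_{\text{recomp}} + C_{\text{profile}},\\
S &= \frac{T_{\text{static}}}{T_{\text{dyn}}} > 1
  \iff T_{\text{static}} - T_{\text{exec}} > C,\\
T_{\text{exec}} = T_{\text{static}},\; c = \frac{C}{T_{\text{static}}} \le 0.02
  &\;\Rightarrow\; S = \frac{1}{1 + c} \ge \frac{1}{1.02} \approx 0.98 .
\end{aligned}
```

Even if the recompiled code saved nothing, an overhead below 2% would bound the slowdown at about 2%, so even modest FDO gains can offset it.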

  19. Thematic Session on Dynamic Compilation 1) What is the dynamic optimization stage? → During program execution 2) What triggers the dynamic compilation cycle? → A method gets warm 3) How are these triggers being detected? → Sampling execution/PCs (via timer interrupts & code instrumentation) to monitor application behavior 4) How/when are the above triggers being inserted? → At run-time 5) What is the recompilation scope/granularity? → Method 6) What is the target application domain? → General-purpose/commercial applications 7) What is the input code for the dynamic optimization? → Fat binary (binary + IR) 8) What is the programming language of the target applications? → Statically compiled languages (C/C++...) 9) What specific adaptation / optimization / code-transformation is applied? → General feedback-directed optimizations (BB ordering, …)
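Basic-block (BB) ordering, the example optimization named in answer 9, can be illustrated with a greedy profile-guided layout in the style of Pettis and Hansen's chain merging. This is a swapped-in textbook technique, not necessarily the one the authors' JIT compiler uses; order_blocks and the edge-count dictionary format are hypothetical, and for simplicity the non-entry chains are emitted in name order rather than by hotness.

```python
from typing import Dict, List, Tuple


def order_blocks(entry: str, edges: Dict[Tuple[str, str], int]) -> List[str]:
    """Greedy profile-guided basic-block layout (Pettis-Hansen-style chaining).

    `edges` maps (src, dst) control-flow edges to profiled execution counts.
    Hot edges become fall-throughs by merging chains tail-to-head.
    """
    blocks = {entry} | {b for edge in edges for b in edge}
    chain_of = {b: [b] for b in blocks}  # every block starts in its own chain

    # Visit edges from hottest to coldest; ties broken by name for determinism.
    for (src, dst), _count in sorted(edges.items(), key=lambda kv: (-kv[1], kv[0])):
        a, b = chain_of[src], chain_of[dst]
        # Merge only if src ends one chain and dst starts a different one;
        # the entry block must stay at the head of the layout.
        if a is not b and a[-1] == src and b[0] == dst and dst != entry:
            a.extend(b)
            for blk in b:
                chain_of[blk] = a

    # Emit the entry chain first, then the remaining chains once each.
    ordered: List[str] = []
    seen = set()
    for chain in [chain_of[entry]] + sorted(chain_of.values(), key=lambda c: c[0]):
        if id(chain) not in seen:
            seen.add(id(chain))
            ordered.extend(chain)
    return ordered


# Diamond CFG where the A -> B -> D path is hot.
layout = order_blocks("A", {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90, ("C", "D"): 10})
# ['A', 'B', 'D', 'C']
```

The hot path A, B, D becomes straight-line fall-through code and the cold block C moves to the end, which is the kind of layout change a sampled or instrumented profile makes possible at runtime.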
