
Adaptive Optimization in the Jalapeño JVM

M. Arnold, S. Fink, D. Grove, M. Hind, P. Sweeney. Presented by Andrew Cove, 15-745, Spring 2006.


Presentation Transcript


  1. Adaptive Optimization in the Jalapeño JVM
     M. Arnold, S. Fink, D. Grove, M. Hind, P. Sweeney
     Presented by Andrew Cove, 15-745, Spring 2006

  2. Jalapeño JVM
     • Research JVM developed at IBM T.J. Watson Research Center
     • Extensible system architecture based on a federation of threads that communicate asynchronously
     • Supports adaptive multi-level optimization with low overhead
     • Uses low-overhead statistical sampling

  3. Contributions
     • Extensible adaptive optimization architecture that enables online feedback-directed optimization
     • Adaptive optimization system that uses multiple optimization levels to improve performance
     • Implementation and evaluation of feedback-directed inlining based on low-overhead sample data
     • Requires no programmer directives

  4. Jalapeño JVM - Details
     • Written in Java
       • Optimizations apply not only to the application and libraries, but to the JVM itself
     • Bootstrapped
       • Boot image contains core Jalapeño services precompiled to machine code
       • Doesn't need to run on top of another JVM
     • Subsystems
       • Dynamic class loader, dynamic linker, object allocator, garbage collector, thread scheduler
       • Profiler (online measurement system)
       • Two compilers

  5. Jalapeño JVM - Details
     • Two compilers
       • Baseline compiler
         • Translates bytecodes directly into native code by simulating Java's operand stack
         • No register allocation
       • Optimizing compiler
         • Converts bytecodes into an IR, on which it performs optimizations
         • Linear-scan register allocation
         • Three levels of optimization
     • Compile-only strategy: all methods are compiled to native code before execution
     • …

  6. Jalapeño JVM - Details
     Optimizing compiler (without online feedback):
     • Level 0: optimizations performed during IR conversion
       • Copy, constant, type, and non-null propagation
       • Constant folding, arithmetic simplification
       • Dead-code elimination
       • Inlining
       • Unreachable-code elimination
       • Elimination of redundant null checks
       • …
     • Level 1:
       • Common-subexpression elimination
       • Array bounds-check elimination
       • Redundant load elimination
       • Inlining (size heuristics)
       • Global flow-insensitive copy and constant propagation, dead-assignment elimination
       • Scalar replacement of aggregates and short arrays

  7. Jalapeño JVM - Details
     Optimizing compiler (without online feedback):
     • Level 2:
       • SSA-based flow-sensitive optimizations
       • Array SSA optimizations

  8. Jalapeño JVM - Details

  9. Jalapeño Adaptive Optimization System (AOS)
     • Sample-based profiling drives optimized recompilation
     • Exploits runtime information beyond the scope of a static model
     • Multi-level and adaptive optimizations
       • Balances optimization effectiveness against compilation overhead to maximize performance
     • Three component subsystems (asynchronous threads):
       • Runtime measurement
       • Controller
       • Recompilation
       • Database (3 + 1 = 3?)

  10. Jalapeño Adaptive Optimization System (AOS)

  11. Subsystems - Runtime Measurement
      • Sample-driven program profile
        • Instrumentation: hardware monitors, VM instrumentation
        • Sampling
      • Timer interrupts trigger yields between threads
      • Method-associated counters are updated at yields
      • Triggers the controller at threshold levels
      • Data is processed by organizers:
        • Hot-method organizer: tells the controller which time-dominant methods aren't fully optimized
        • Decay organizer: decreases sample weights to emphasize recent data
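The decay organizer's weighting can be sketched as periodic exponential decay of per-method sample counts, so a sample's influence halves after one half-life. This is an illustrative sketch, not Jalapeño's actual API; all names are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the decay organizer: method sample counts are
// periodically multiplied by a decay factor so that recent samples dominate.
class SampleDecay {
    private final Map<String, Double> samples = new HashMap<>();
    private final double decayFactor; // applied once per organizer period

    SampleDecay(double halfLifeSecs, double periodSecs) {
        // After halfLifeSecs, a weight should be halved:
        // factor^(halfLife/period) = 0.5  =>  factor = 0.5^(period/halfLife)
        this.decayFactor = Math.pow(0.5, periodSecs / halfLifeSecs);
    }

    // Called from the sampling mechanism at a thread yield
    void recordSample(String method) {
        samples.merge(method, 1.0, Double::sum);
    }

    // Called periodically by the decay organizer
    void decayAll() {
        samples.replaceAll((m, w) -> w * decayFactor);
    }

    double weight(String method) {
        return samples.getOrDefault(method, 0.0);
    }
}
```

With the 1.7 s method-sample half-life quoted later in the talk, running the decay once per half-life halves every weight.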

  12. Hotness
      • A hot method is one where the program spends much of its time
      • Hot edges are used later to determine good call sites to inline
      • In both cases, hotness is a function of the number of samples taken
        • In a method
        • On a given caller-to-callee edge
      • The system can adaptively adjust hotness thresholds
        • To reduce optimization during startup
        • To encourage optimization of more methods
        • To reduce analysis time when too many methods are hot
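The adaptive threshold adjustment described above can be sketched as a simple feedback rule: raise the threshold when too many methods qualify as hot, lower it when too few do. The target count and scaling factors here are placeholders, not values from the paper.

```java
// Illustrative sketch of adaptive hotness-threshold adjustment.
class AdaptiveThreshold {
    private double threshold;       // fraction of samples needed to be "hot"
    private final int targetHotCount;

    AdaptiveThreshold(double initial, int targetHotCount) {
        this.threshold = initial;
        this.targetHotCount = targetHotCount;
    }

    /** Adjust after each organizer period based on how many methods were hot. */
    double adjust(int hotCountLastPeriod) {
        if (hotCountLastPeriod > targetHotCount) {
            threshold *= 1.5;        // too many hot methods: be pickier
        } else if (hotCountLastPeriod < targetHotCount / 2) {
            threshold *= 0.75;       // too few: encourage more optimization
        }
        return threshold;
    }
}
```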

  13. Subsystems - Controller
      • Orchestrates the other components of AOS
      • Directs data monitoring
      • Creates organizer threads
      • Chooses what to recompile based on sample data and a cost/benefit model

  14. Subsystems - Controller
      To recompile or not to recompile?
      • Let Ti be the expected future time spent in method m at its current level i, Cj the cost of recompiling m at level j, and Tj = Ti · Si / Sj the expected future time in m if recompiled at level j (Sk is the estimated speedup of level k)
      • Find the level j that minimizes Cj + Tj
      • If Cj + Tj < Ti, recompile m at level j
      • Assume, arbitrarily, that the program will run for twice its current duration
      • Ti = Tf · Pm, where Tf is the expected future running time of the program and Pm is the estimated percentage of future time spent in m
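The controller's test can be sketched as follows: with Ti the expected future time in method m at its current level, Cj the compile cost of level j, and Tj = Ti · Si / Sj, pick the level minimizing Cj + Tj and recompile only if that beats doing nothing. The speedup constants and compile costs below are placeholders, not Jalapeño's measured values.

```java
// Illustrative sketch of the controller's cost/benefit recompilation test.
class RecompileDecision {
    // Sk: estimated speedup of each level relative to baseline
    // (index 0 = baseline, then opt levels 0..2; made-up numbers)
    static final double[] SPEEDUP = {1.0, 2.0, 3.0, 3.5};

    /**
     * Returns the best level to recompile at, or currentLevel if no
     * recompilation pays off. futureTimeInMethod is Ti = Tf * Pm;
     * compileCost[j] is Cj, estimated from the linear compile-time model.
     */
    static int chooseLevel(int currentLevel, double futureTimeInMethod,
                           double[] compileCost) {
        double ti = futureTimeInMethod;
        double best = ti;            // doing nothing costs Ti
        int bestLevel = currentLevel;
        for (int j = currentLevel + 1; j < SPEEDUP.length; j++) {
            double tj = ti * SPEEDUP[currentLevel] / SPEEDUP[j];
            if (compileCost[j] + tj < best) {
                best = compileCost[j] + tj;
                bestLevel = j;
            }
        }
        return bestLevel;
    }
}
```

A method with lots of expected future time justifies an expensive high-level recompile; a method with little expected future time stays at its current level.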

  15. Subsystems - Controller
      • The system estimates the effectiveness (speedup) of each optimization level as a constant, based on offline measurements
      • Uses a linear model of compilation speed for each optimization level, as a function of method size
      • Is compilation time at the higher optimization levels really linear in method size?
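The linear compile-time model amounts to dividing method size by a per-level compilation rate measured offline. The rates below are made-up placeholders for illustration.

```java
// Hypothetical sketch of the controller's linear compile-time model.
class CompileCostModel {
    // Bytecodes compiled per millisecond at each level (index 0 = baseline,
    // then opt levels 0..2). Placeholder numbers: higher levels compile
    // more slowly per bytecode.
    static final double[] RATE = {1000.0, 100.0, 40.0, 15.0};

    /** Estimated compile time (ms) for a method of the given size at a level. */
    static double estimateCostMs(int level, int methodSizeBytecodes) {
        return methodSizeBytecodes / RATE[level];
    }
}
```

The talk's closing question applies here: for the flow-sensitive level 2 analyses, compile time may well grow super-linearly in method size, which this model cannot capture.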

  16. Subsystems - Recompilation
      • In theory:
        • Multiple compilation threads invoke the compilers
        • Compilation can occur in parallel with the application
      • In practice:
        • A single compilation thread
        • Some JVM services require the master lock, so multiple compilation threads are not effective: lock contention between compilation and application threads
        • Left as a footnote!
      • Recompilation times are stored to improve the time estimates in the cost/benefit analysis

  17. Feedback-Directed Inlining
      • Statistical samples of method calls are used to build a dynamic call graph
        • Traverse the call stack at thread yields
        • Identify hot edges
        • Recompile caller methods with the callee inlined (even if the caller was already optimized)
        • Decay old edges
      • Adaptive inlining organizer
        • Determines hot edges and hot methods worth recompiling with an inlined call
        • Weights inlining rules with a boost factor, based on the number of calls on the edge and a previous study of the effects of removing call overhead
        • Future work: more sophisticated heuristics
      • Seems obvious: new inlining decisions don't undo old inlines
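The dynamic call graph and boost factor can be sketched as edge-weight accounting: each call-stack sample at a yield increments a caller-to-callee edge, an edge is hot when its share of samples crosses a threshold, and hot edges scale an inliner's budget. All names and thresholds are illustrative, not Jalapeño's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of hot-edge tracking for feedback-directed inlining.
class DynamicCallGraph {
    private final Map<String, Double> edgeWeight = new HashMap<>();
    private double totalWeight = 0.0;

    // Called at a thread yield with the top caller->callee pair of the stack
    void sampleEdge(String caller, String callee) {
        edgeWeight.merge(caller + "->" + callee, 1.0, Double::sum);
        totalWeight += 1.0;
    }

    // An edge is hot when it accounts for at least `threshold` of all samples
    boolean isHot(String caller, String callee, double threshold) {
        double w = edgeWeight.getOrDefault(caller + "->" + callee, 0.0);
        return totalWeight > 0 && w / totalWeight >= threshold;
    }

    // Scale the inliner's size budget in proportion to edge hotness,
    // from 1.0 (cold) up to maxBoost (all samples on this edge)
    double boostFactor(String caller, String callee, double maxBoost) {
        double w = edgeWeight.getOrDefault(caller + "->" + callee, 0.0);
        return totalWeight == 0 ? 1.0
                : 1.0 + (maxBoost - 1.0) * (w / totalWeight);
    }
}
```

Decaying old edges, as on this slide, would apply the same half-life decay used for method samples to `edgeWeight` and `totalWeight`.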

  18. Experimental Methodology
      • System
        • Dual 333 MHz PowerPC processors, 1 GB memory
        • Timer interrupts at 10 ms intervals
        • Recompilation organizer runs from twice per second down to once every 4 s
        • DCG and adaptive inlining organizers run every 2.5 s
        • Method-sample half-life: 1.7 s; edge-weight half-life: 7.3 s
      • Benchmarks: SPECjvm98, the Jalapeño optimizing compiler, Volano chat-room simulator
      • Startup and steady-state measurements

  19. Results
      • Compile-time overhead plays a large role in startup performance

  20. Results
      • Multi-level adaptive does well (and the JIT configurations don't have their compilation overhead counted)

  21. Results
      • During startup, methods don't reach a high enough optimization level to benefit

  22. Questions
      • Assuming execution time will be twice the current duration is completely arbitrary, but has a nice outcome (less optimization at startup, more at steady state)
      • Measurements of optimizations across phase shifts are meaningless, because the execution-time estimation assumes past behavior predicts the future

  23. Questions
      • Does it scale?
        • More online feedback-directed optimizations
        • More threads needing cycles (organizer threads, recompilation threads)
        • More data to measure
        • Especially slow if there can be only one recompilation thread
        • More complicated cost/benefit analysis (potential speedups and estimated compilation times)

  24. Questions

  25. Questions
