
Optimizing Compilers CISC 673 Spring 2011 Dynamic Compilation

This presentation gives a high-level view of the JVM interpreter and of dynamic compilation techniques for optimizing compilers. It covers the benefits and drawbacks of interpretation and just-in-time compilation, selective optimization, recompilation policies, and profiling mechanisms.


Presentation Transcript


  1. Optimizing Compilers, CISC 673, Spring 2011: Dynamic Compilation. John Cavazos, University of Delaware

  2. High Level View of JVM

  3. JVM Interpreter • Reads a bytecode from a method • “Interprets” the bytecode • Decodes opcode and operands • Based on the opcode, jumps to the C code that implements it • Passes the operands • Continues reading bytecodes from the method until a call, return, or exception (see the dispatch-loop sketch below)
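As a concrete illustration of that loop (not actual JVM source; the opcodes, frame layout, and class name below are invented for the sketch), a switch-based dispatcher for a toy stack machine could look like this:

```java
// Minimal sketch of a switch-based bytecode interpreter for a toy
// stack machine; opcodes and stack layout are invented for illustration.
final class ToyInterpreter {
    static final byte ICONST = 0, IADD = 1, RETURN = 2;

    static int interpret(byte[] code) {
        int[] stack = new int[16];   // operand stack
        int sp = 0;                  // stack pointer
        int pc = 0;                  // program counter
        while (true) {
            byte opcode = code[pc++];          // read a bytecode
            switch (opcode) {                  // decode and jump to its handler
                case ICONST:
                    stack[sp++] = code[pc++];  // operand follows the opcode
                    break;
                case IADD:
                    int b = stack[--sp], a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                case RETURN:
                    return stack[--sp];        // leave the loop on return
                default:
                    throw new IllegalStateException("bad opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // Equivalent of "push 2; push 3; add; return" -> prints 5
        System.out.println(interpret(new byte[]{ICONST, 2, ICONST, 3, IADD, RETURN}));
    }
}
```

Each iteration reads one bytecode, decodes it, and jumps to the handler for its opcode: the read-decode-dispatch cycle described above.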

  4. Interpretation • Popular approach for high-level languages, e.g., Python, APL, SNOBOL, BCPL, Perl, MATLAB • Useful for memory-challenged environments • Low startup time & space overhead, but much slower than native-code execution • MMI (Mixed Mode Interpreter) [Suganuma’01]: fast interpreter implemented in assembly

  5. Dynamic Compilation Techniques • Baseline compiler • Translates bytecodes one by one to machine code (see the template sketch below) • Quick compilation • Reduced set of optimizations for fast compilation
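To make “one bytecode at a time” concrete, a baseline compiler can be pictured as pasting a fixed native-code template per opcode. The sketch below emits textual pseudo-assembly instead of real machine code, and all opcode and register names are illustrative:

```java
import java.util.List;

// Illustrative baseline "compiler": each bytecode maps to a fixed
// template, with no analysis across bytecodes (hence quick compilation).
final class ToyBaselineCompiler {
    static String compile(List<String> bytecodes) {
        StringBuilder asm = new StringBuilder();
        for (String bc : bytecodes) {
            switch (bc) {                        // one template per opcode
                case "iconst_1" -> asm.append("  push 1\n");
                case "iconst_2" -> asm.append("  push 2\n");
                case "iadd"     -> asm.append("  pop r1\n  pop r0\n  add r0, r1\n  push r0\n");
                case "ireturn"  -> asm.append("  pop r0\n  ret\n");
                default -> throw new IllegalArgumentException(bc);
            }
        }
        return asm.toString();
    }

    public static void main(String[] args) {
        System.out.print(compile(List.of("iconst_1", "iconst_2", "iadd", "ireturn")));
    }
}
```

Because each bytecode is translated in isolation, compilation is fast but the output is naive, which is exactly the trade-off the slide describes.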

  6. Dynamic Compilation Techniques • Full compilation • Full optimizations only for selected hot methods • Classic just-in-time compilation • Compile methods to native code on first invocation, e.g., ParcPlace Smalltalk-80, Self-91 • Initial high (time & space) overhead for each compilation • Precludes use of sophisticated optimizations (e.g., SSA) • Responsible for many of today’s myths

  7. Interpretation vs. JIT [Diagram contrasting per-method cost: 2000 time units interpreted vs. 20 time units JIT-compiled]

  8. Selective Optimization • Hypothesis: most execution is spent in a small percentage of methods (90/10 rule) • Idea: use two execution strategies: (1) interpreter or non-optimizing compiler, (2) full-fledged optimizing compiler • Strategy: use option 1 for initial execution of all methods, profile to find the “hot” subset of methods, and use option 2 on that subset (a driver sketch follows below)
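A minimal sketch of such a two-tier driver, with invented method names, threshold, and tier bookkeeping (a real VM would key on compiled-method metadata rather than strings):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical two-tier controller: everything starts at tier 0
// (interpreter / non-optimizing compiler); methods whose sample count
// crosses a threshold are handed to the optimizing compiler.
final class SelectiveOptimizer {
    static final int HOT_THRESHOLD = 100;              // invented tuning knob
    private final Map<String, Integer> samples = new ConcurrentHashMap<>();
    private final Map<String, Integer> tier = new ConcurrentHashMap<>();

    // Called by the profiler each time a method is observed executing.
    void recordSample(String method) {
        int n = samples.merge(method, 1, Integer::sum);
        if (n >= HOT_THRESHOLD && tier.getOrDefault(method, 0) == 0) {
            tier.put(method, 1);                       // promote the hot method
            optimizeAndInstall(method);
        }
    }

    private void optimizeAndInstall(String method) {
        // Stand-in for invoking the full-fledged optimizing compiler.
        System.out.println("recompiling hot method: " + method);
    }
}
```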

  9. Selective Optimization • Selective opt compiles 10%–20% of methods, representing 90%–99% of execution time [Diagram repeating the 2000 vs. 20 time-unit comparison under selective optimization]

  10. Designing a Selective Optimizer • AKA: Adaptive Optimization System • What is the system architecture? • What are the profiling mechanisms and policies for driving recompilation? • How effective are these systems?

  11. Basic Structure of a Dynamic Compiler • Still needs a good core compiler, but more [Diagram of the optimization pipeline: Program → Structural (inlining, unrolling, loop permutation) → Scalar (CSE, constants, expressions) → Memory (scalar replacement, pointers) → Reg. Alloc → Scheduling → Peephole → Machine code]

  12. Basic Structure of a Dynamic Compiler [Architecture diagram: the executing program (instrumented code) emits raw profile data; a profile processor turns it into processed profiles; the controller combines profiles with a history of prior decisions to make compilation decisions; the compiler subsystem (interpreter or simple translation, plus compile-time optimizations) produces new code]

  13. Method Profiling • Counters • Call Stack Sampling • Combinations

  14. Method Profiling: Counters • Insert method-specific counter on method entry and loop back edges • Counts how often a method is called and approximates how much time is spent in a method • Very popular approach: Self, HotSpot • Issues: overhead for incrementing counter can be significant • Not present in optimized code

  15. Method Profiling: Counters
  foo ( … ) {
    fooCounter++;
    if (fooCounter > Threshold) {
      recompile( … );
    }
    . . .
  }

  16. Method Profiling: Call Stack Sampling • Periodically record which method(s) are on the call stack • Approximates amount of time spent in each method • Checks can be compiled into the code (Jikes RVM, JRockit) or driven by hardware sampling • Issues: timer-based sampling is not deterministic (a sampler sketch follows below)
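In plain Java, the idea can be approximated with a timer thread and `Thread.getAllStackTraces()`; a production VM samples far more cheaply inside the runtime, so treat this only as a sketch (the sampling period and map layout are invented):

```java
import java.util.Map;
import java.util.concurrent.*;

// Sketch of timer-based call-stack sampling: every few milliseconds,
// record the method on top of each thread's stack. Hit counts then
// approximate where execution time is spent.
final class StackSampler {
    private final ConcurrentHashMap<String, Long> hits = new ConcurrentHashMap<>();
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    void start(long periodMillis) {
        timer.scheduleAtFixedRate(() -> {
            for (Map.Entry<Thread, StackTraceElement[]> e
                    : Thread.getAllStackTraces().entrySet()) {
                StackTraceElement[] frames = e.getValue();
                if (frames.length > 0) {                 // top-of-stack method
                    String top = frames[0].getClassName() + "."
                               + frames[0].getMethodName();
                    hits.merge(top, 1L, Long::sum);
                }
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    Map<String, Long> snapshot() { return Map.copyOf(hits); }
}
```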

  17. Method Profiling: Call Stack Sampling [Diagram: successive samples of a call stack A → B → C, recording the method on top of the stack at each sample]

  18. Method Profiling: Mixed • Combinations: use counters initially and sampling later on • Used in the IBM DK for Java
  foo ( … ) {
    fooCounter++;
    if (fooCounter > Threshold) {
      recompile( … );
    }
    . . .
  }

  19. Recompilation Policies • Problem: given optimization candidates, which should be optimized? • Counters: optimize methods that surpass a threshold • Simple, but hard to tune, and doesn’t consider context • Sampling: optimize the method on top of the call stack • Addresses the context issue

  20. Recompilation Policies • Problem: given optimization candidates, which should be optimized? • Call stack sampling: optimize all methods that are sampled • Simple to implement • Or: use a cost/benefit model • Seemingly complicated, but easy to engineer • Maintenance free • Naturally supports multiple optimization levels

  21. Jikes RVM: Recompilation Policy – Cost/Benefit Model • Define • cur, the current opt level for method m • Exe(j), expected future execution time of m at level j • Comp(j), compilation cost at opt level j • Choose the j > cur that minimizes Exe(j) + Comp(j) • If Exe(j) + Comp(j) < Exe(cur), recompile at level j

  22. Jikes RVM: Recompilation Policy – Cost/Benefit Model • Assumptions • Sample data determines how long a method has executed • Method will execute as much in the future as it has in the past • Compilation cost and speedup are offline averages
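A sketch of that decision rule, with invented speedup and compile-rate tables standing in for the offline-measured averages, and sample-derived past execution time standing in for Exe(cur):

```java
// Sketch of the cost/benefit recompilation test. SPEEDUP[j] and
// COMPILE_RATE[j] stand in for offline averages; per the model's
// assumption, a method runs as long in the future as it has in the past.
final class CostBenefit {
    static final double[] SPEEDUP = {1.0, 2.0, 3.5, 4.0};     // invented, per opt level
    static final double[] COMPILE_RATE = {0, 0.1, 0.5, 1.5};  // invented cost per bytecode

    // pastTime: execution time attributed to the method so far (from samples).
    static int chooseLevel(int cur, double pastTime, int methodSize) {
        double bestCost = pastTime;          // Exe(cur): keep running as-is
        int best = cur;
        for (int j = cur + 1; j < SPEEDUP.length; j++) {
            double exeJ = pastTime * SPEEDUP[cur] / SPEEDUP[j]; // Exe(j)
            double compJ = COMPILE_RATE[j] * methodSize;        // Comp(j)
            if (exeJ + compJ < bestCost) {   // recompiling pays for itself
                bestCost = exeJ + compJ;
                best = j;
            }
        }
        return best;                         // best == cur means: don't recompile
    }
}
```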

  23. Optimization Levels • Opt Level O0: Branch Opts (Low), Constant Prop / Local CSE, Reorder Code, Copy Prop / Tail Recursion, Static Splitting • Opt Level O1: Branch Opts (Med), Simple Opts (Low), While into Untils / Loop Unroll • Opt Level O2: Branch Opts (High) / Redundant BR, Simple Opts (Med) / Load Elim, Expression Fold / Coalesce, Global Copy Prop / Global CSE, SSA

  24. Short Running Programs [Benchmark results chart; no FDO (feedback-directed optimization), Mar’04, AIX/PPC]

  25. Short Running Programs [Benchmark results chart; no FDO, Mar’04, AIX/PPC]

  26. Steady State [Benchmark results chart; no FDO, Mar’04, AIX/PPC]

  27. Steady State [Benchmark results chart]

  28. Profiling for What to Do • Myth: Sophisticated profiling is too expensive to perform online • Reality: Well-known technology can collect sophisticated profiles with sampling and minimal overhead

  29. Suggested Reading: Dynamic Compilation • M. Arnold, S. Fink, D. Grove, M. Hind, and P. Sweeney. Adaptive optimization in the Jalapeño JVM. In Proceedings of the 2000 ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages & Applications (OOPSLA ’00), pages 47–65, Oct. 2000.

  30. Spare Slides

  31. Method Profiling: Timer Based • On method entry, exit, and all loop back edges, the compiled code tests a flag: if (flag) handler(); • Useful for more than profiling • In Jikes RVM the same mechanism schedules garbage collection, drives thread-scheduling policies, etc.
  class Thread {
    scheduler (...) {
      ...
      flag = 1;   // set periodically by the timer
    }
  }
  void handler(...) {
    // sample stack, perform GC, swap threads, etc.
    ....
    flag = 0;
  }
  foo ( … ) {
    // on method entry, exit, & all loop back edges
    if (flag) { handler( … ); }
    . . .
  }

  32. Arnold-Ryder [PLDI ’01]: Full Duplication Profiling • Generate two copies of a method • Execute the “fast path” most of the time • Execute the “slow path,” with detailed profiling, occasionally • Adopted by J9 due to proven accuracy and low overhead (see the sketch below)
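The shape of the transformed code can be sketched as follows; the counter, sampling interval, and method names are invented, and a real implementation duplicates the method at the compiler's IR level rather than in source:

```java
// Sketch of full-duplication profiling: the compiler emits two bodies
// for a method; a cheap check occasionally diverts execution into a
// fully instrumented duplicate.
final class DuplicatedMethod {
    static int checkCounter = 0;
    static final int SAMPLE_INTERVAL = 1000;     // invented sampling rate
    static long profiledCalls = 0;               // detailed profile data

    static int foo(int x) {
        // Cheap check on entry (and, in real code, on loop back edges).
        if (++checkCounter >= SAMPLE_INTERVAL) {
            checkCounter = 0;
            return fooSlow(x);                   // rare: instrumented copy
        }
        return fooFast(x);                       // common: uninstrumented copy
    }

    private static int fooFast(int x) { return x * x; }

    private static int fooSlow(int x) {
        profiledCalls++;                         // detailed profiling goes here
        return x * x;                            // same semantics as fast path
    }
}
```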
