Some Challenges Facing Effective Native Code Compilation in a Modern Just-In-Time Compiler

Mark Stoodley and Compilation Control Team
Testarossa JIT Compiler Team
IBM Toronto Lab

Outline
• Identification Challenge
  • Finding the right methods to compile
• Effectiveness Challenge
  • What is the right way to compile those methods?
• Timing Challenge
  • When is the right time to compile those methods?
• Summary

Compilation Technology

Identification Challenge: Finding the right methods to compile
• What are the right methods?
  • Methods that will execute a lot in future
  • Methods that benefit best from compilation
• Race with the program itself
  • Want to discover methods as early as possible
  • But minimize false positives
  • Cannot afford much overhead

Finding Methods with Invocation Count
• We detect “hot” interpreted methods via counts
  • Theory: invoked a lot means program executes it a lot
• On the plus side:
  • Easy to implement, low overhead for interpreter
• But:
  • Frequently invoked methods don’t necessarily consume lots of CPU and may not be good compilation choices
  • e.g. getters and setters invoked a lot but don’t consume CPU
  • e.g. a big matrix multiply invoked less but consumes CPU
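A minimal sketch of count-based detection, to make the getter vs. matrix multiply point concrete. The threshold, method names, call counts and per-call cost units are all made up for illustration; they are not Testarossa's values.

```python
from collections import Counter

# Hypothetical trigger: interpreted invocations before a method is queued for compilation.
COMPILE_THRESHOLD = 1000


class InvocationCounter:
    def __init__(self, threshold=COMPILE_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()
        self.compiled = []  # methods in the order they crossed the threshold

    def on_invoke(self, method):
        """Called by the (imaginary) interpreter on every method entry."""
        self.counts[method] += 1
        if self.counts[method] == self.threshold:
            self.compiled.append(method)


def run_workload():
    counter = InvocationCounter()
    # Made-up workload: a cheap getter called often, an expensive multiply called rarely.
    cost_per_call = {"Point.getX": 1, "Matrix.multiply": 100_000}
    calls = {"Point.getX": 5000, "Matrix.multiply": 50}
    cpu_time = {}
    for method, n in calls.items():
        for _ in range(n):
            counter.on_invoke(method)
        cpu_time[method] = n * cost_per_call[method]
    return counter.compiled, cpu_time


compiled, cpu_time = run_workload()
```

In this toy workload only the getter reaches the trigger, even though the multiply accounts for nearly all of the CPU cost units.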

Finding Methods with Sampling
• Periodically record top method of active thread stacks
• In theory:
  • M of N ticks in one method means that method is consuming M/N of CPU
• In practice:
  • Depends on application characteristics
  • Hindered by sampling granularity
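The "M of N ticks" estimate can be sketched as follows. The sample list and method names are hypothetical; a real sampler would record the top frame of each active thread on a timer tick.

```python
from collections import Counter


def cpu_share_from_samples(samples):
    """Estimate each method's CPU share as (ticks seen in method) / (total ticks)."""
    total = len(samples)
    if total == 0:
        return {}
    ticks = Counter(samples)
    return {method: count / total for method, count in ticks.items()}


# Hypothetical data: the top-of-stack method recorded at each of N = 10 ticks.
samples = ["parse"] * 6 + ["hash"] * 3 + ["log"]
shares = cpu_share_from_samples(samples)
```

With so few ticks the estimate is crude: a method that ran between ticks never shows up at all, which is the granularity problem noted above.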

Thinking about Sampling
• Most operating systems give a sampling period of around 10ms
  • 10ms means 100Hz, which is the Nyquist rate for a signal with max frequency 50Hz
  • Of course, we don’t need perfect knowledge of the input “signal”
• Processor speeds measured in GHz
  • Method invocation rate still near or in MHz band
• Sampling works best for applications where methods execute for a looooooong time, e.g. apps with hot spots
• What about programs that don’t have hot spots?
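The arithmetic behind these bullets, as a small worked sketch. The 1 GHz clock and the 1 MHz invocation rate are illustrative assumptions standing in for "GHz" and "MHz band"; only the 10ms period comes from the slide.

```python
def sampling_numbers(period_s=0.010, clock_hz=1e9, invocations_per_s=1e6):
    """Relate a sampling period to what happens between two consecutive samples."""
    sample_rate_hz = 1.0 / period_s  # a 10 ms period is a 100 Hz sampling rate
    return {
        "sample_rate_hz": sample_rate_hz,
        # Nyquist: a 100 Hz sampler can only resolve signal content up to 50 Hz.
        "max_resolvable_hz": sample_rate_hz / 2,
        # Work done between two samples under the assumed clock and call rate.
        "cycles_per_sample": clock_hz * period_s,
        "invocations_per_sample": invocations_per_s * period_s,
    }


numbers = sampling_numbers()
```

Under these assumptions roughly ten million cycles and ten thousand invocations pass between samples, so each sample sees only one of thousands of methods that ran in that interval.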

Sampling Effectiveness Depends on Platform and Application Characteristics
• More stuff happens in 10ms on faster machines than on slower machines
  • Raw machine speeds vary widely
  • Virtualized targets are entering the mainstream
  • Emulated targets seem really slow
• Matters more for applications without hot spots than for those with hot spots
  • Sampling will find hot spots
  • Sampling frequency too coarse even on slow machines when no hot spots

The Identification Challenge
• Identify the methods burning CPU as quickly as possible with low overhead
• Machine speeds leveling off, but still a wide range of frequencies, especially on virtualized/emulated platforms
• More cores
  • Beware synchronization
  • Cache per thread decreasing: cache footprint critical
• Application characteristics evolving
  • New layers of abstraction
  • Easy to write lots of code automatically (visual interface)
  • Increased use of generated classes

Identification Challenge: Steps We’ve Taken
• Sampling framework: relative hotness
  • Compiles not only triggered by absolute sample count
  • Instead: how hot is this method compared to all others
• Sampling windows adjusted based on method size
  • More likely to catch samples in big methods
  • Small hot methods harder to find
  • Make it easier for them to reach compilation trigger
• Large set of heuristics in this space
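One way to picture relative hotness with a size adjustment is sketched below. This is not the Testarossa heuristic: the share threshold, the "small method" cutoff, the discount factor and the method names are all hypothetical, chosen only to show a trigger that compares a method against everything else in the window and is easier for small methods to reach.

```python
from collections import Counter

BASE_SHARE = 0.10             # hypothetical: share of window samples a method needs
SMALL_METHOD_BYTES = 32       # hypothetical cutoff for a "small" method
SMALL_METHOD_DISCOUNT = 0.25  # hypothetical: small methods need a quarter of the share


def required_share(bytecode_size):
    """Size-adjusted trigger: small methods need a smaller share of the samples."""
    if bytecode_size <= SMALL_METHOD_BYTES:
        return BASE_SHARE * SMALL_METHOD_DISCOUNT
    return BASE_SHARE


def hot_methods(window_samples, sizes):
    """Return methods whose share of this window's samples meets their trigger."""
    total = len(window_samples)
    if total == 0:
        return []
    counts = Counter(window_samples)
    return sorted(m for m, c in counts.items()
                  if c / total >= required_share(sizes[m]))


# Made-up window of 100 samples: one big hot method, one small hot method,
# one medium method, and 52 filler methods seen once each.
fillers = [f"filler{i}" for i in range(52)]
window = ["bigLoop"] * 40 + ["tinyGetter"] * 3 + ["medium"] * 5 + fillers
sizes = {"bigLoop": 900, "tinyGetter": 12, "medium": 200}
sizes.update({name: 100 for name in fillers})
hot = hot_methods(window, sizes)
```

Here `tinyGetter` reaches its trigger with 3 samples while `medium` does not with 5, which is the intended effect of making the trigger easier for small methods.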

Effectiveness Challenge: What’s the right way to compile a method?
• Depends on many factors:
  • Application phase
  • Application requirements
  • Application characteristics
  • Availability of resources
  • System utilization

Example: Middleware Server + Application
• IBM WebSphere Application Server startup:
  • Loads 15,000 classes (includes DayTrader application)
  • Executes more than 38,000 methods
  • Takes 10s – 120s, depending on platform
• Application then runs for extended period
• Some methods active in start-up and steady-state
  • Forces trade-off: start-up vs. steady-state performance

Start-up executes lots of methods a few times
• Want to compile many many methods cheaply
  • Native code performance for highest number of methods
  • Cheap compilations mean better coverage
• Also methods can appear hot at startup that aren’t important later
  • Benefit of aggressive optimization is lower
  • Class hierarchy is highly unstable
• Careful about methods also active in steady-state
  • Cheap compilations also mean slower code
  • Will need fast performance for these methods in steady-state

Steady-State is very different from Start-up
• Flat profile, thousand(s) of active methods
• Want to compile many methods more aggressively
  • Best throughput performance
  • Class hierarchy stabilized so aggressive opts more worthwhile
  • Application code complexity requires profiling and analysis
  • Large application scale limits effectiveness of some opts
• Tough to find methods that matter due to flat profile
  • Also to upgrade cheap compilations from startup that matter

Classic Phase Identification Problem, Right?
• Distinguish “start-up” from “steady-state”
  • Apply different compilation strategy in each phase
• Testarossa uses class load phase heuristic
  • Loading lots of classes means start-up
  • Compilations during class load phase done cheaply (cold)
  • Compilations outside class load phase more aggressive (warm)
• Mostly works
  • But not easy
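A rough sketch of a class load phase heuristic in this spirit. The per-interval threshold, the interval tick, and the class and method names are assumptions for illustration; the slides do not give the actual values or mechanism.

```python
class ClassLoadPhaseDetector:
    """Decide 'start-up' vs 'steady-state' from how many classes loaded recently."""

    def __init__(self, loads_per_interval_threshold=50):
        # Hypothetical threshold: what counts as "lots" of class loads per interval.
        self.threshold = loads_per_interval_threshold
        self.loads_this_interval = 0
        self.in_class_load_phase = False

    def on_class_load(self):
        self.loads_this_interval += 1

    def end_interval(self):
        """Called on a periodic tick; judges the interval that just ended."""
        self.in_class_load_phase = self.loads_this_interval >= self.threshold
        self.loads_this_interval = 0
        return self.in_class_load_phase

    def opt_level(self):
        """Cheap (cold) compiles inside the class load phase, warm outside it."""
        return "cold" if self.in_class_load_phase else "warm"


detector = ClassLoadPhaseDetector()

for _ in range(200):      # start-up: a burst of class loading
    detector.on_class_load()
detector.end_interval()
startup_level = detector.opt_level()

for _ in range(3):        # steady-state: the occasional class load
    detector.on_class_load()
detector.end_interval()
steady_level = detector.opt_level()
```

Both the threshold and the interval length are tuning knobs in this sketch, and neither has an obvious value that suits every machine.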

Class Load Phase Heuristic Complexities
• What does “lots” of classes mean?
  • Need to establish some threshold
• IBM SE JVMs support 12 platforms
  • Ranging from laptops to mainframes
  • Processor / memory / disk speeds vary substantially from machine to machine and platform to platform
  • Especially with growth of virtualized and emulated targets
• Sensitivity
  • How long to wait before saying in or out?
  • How long does the decision last?

More Complexities
• Compiles hurt more on some platforms than others
  • Slower systems seem to pay a higher (relative) price
  • Easy to miss mistakes because they don’t hurt you everywhere
• Not all class loads are equal
  • Classes vary widely in size
  • Increased use of generated classes
  • Tools that “precompile” to bytecode

Some (Annoying) Facts
Fact #1: Lots of people care about how fast the application can process transactions (steady-state throughput)
Fact #2: Lots of people care about how fast the server can start (startup time)
Personal Observation: These two sets intersect less than I’d like
Fact #3: Everyone wants what they care about to get better
Really Annoying Fact of Life: People complain a LOT if the thing they care about gets worse
Really Annoying Fact of Life: Customers rarely care if something works well for their platform but not for another platform

…And I’ve Simplified the Problem
• Other criteria matter too (not just start-up and throughput)
  • Throughput ramp-up time
  • Throughput variability from run to run
  • Maximum application pause
  • Application utilization
  • Power and energy consumption are becoming important
  • Memory for code and used by JIT
• All these criteria are also sensitive to the target platform
  • Matter to varying degrees, from not at all to very very much
• Evolving heuristics is really hard

The Effectiveness Challenge
• Properly account for relative importance of a growing set of criteria when generating native code while adapting to the characteristics of increasingly complex applications running on a wide range of targets
• We’ve always had to deal with platform sensitivity
  • Increases the challenge

Effectiveness Challenge: Steps We’ve Taken
• Adaptive class load phase
  • Tries to adjust for different machine speeds
• Ahead-Of-Time (AOT) compilation
  • Store code in a persistent cache
  • Avoid compilation cost completely
  • Can be used to amortize compilation cost across JVMs
  • Trade-off is lower code quality for Java conformant persistence
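The idea of amortizing compilation cost through a persistent cache can be sketched like this. It is a toy model only: the JSON file, the key scheme, and the `PersistentCodeCache` and `fake_compile` names are hypothetical and say nothing about how the IBM JVM actually stores or validates AOT code.

```python
import hashlib
import json
import os
import tempfile


class PersistentCodeCache:
    """Toy persistent cache: maps (method name, bytecode hash) to 'compiled code'."""

    def __init__(self, path):
        self.path = path
        self.entries = {}
        if os.path.exists(path):
            with open(path) as f:
                self.entries = json.load(f)

    @staticmethod
    def key(method_name, bytecode):
        # Hash the bytecode so a changed method body never matches a stale entry.
        return method_name + ":" + hashlib.sha256(bytecode).hexdigest()

    def get_or_compile(self, method_name, bytecode, compile_fn):
        """Return (code, was_cache_hit); compile and persist only on a miss."""
        k = self.key(method_name, bytecode)
        if k in self.entries:
            return self.entries[k], True
        code = compile_fn(bytecode)
        self.entries[k] = code
        with open(self.path, "w") as f:
            json.dump(self.entries, f)
        return code, False


def demo():
    compiles = []

    def fake_compile(bytecode):  # stand-in for an expensive compilation
        compiles.append(bytecode)
        return "native:" + bytecode.hex()

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "code_cache.json")
        first_jvm = PersistentCodeCache(path)
        _, hit1 = first_jvm.get_or_compile("Foo.bar", b"\x2a\xb4", fake_compile)
        second_jvm = PersistentCodeCache(path)  # a later JVM sharing the same cache
        _, hit2 = second_jvm.get_or_compile("Foo.bar", b"\x2a\xb4", fake_compile)
    return hit1, hit2, len(compiles)


first_hit, second_hit, compile_count = demo()
```

The second "JVM" finds the entry written by the first and skips compilation, which is the amortization the slide describes.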

Timing Challenge: When is the Right Time to Compile a Method?
• A1: As early as possible (maximize benefit?)
• A2: After behaviour has “settled”
  • Resolve references, class hierarchy stabilized, code paths executed
• A3: When application is idle (minimize impact?)
• A4: Not “now”
  • e.g. Real-time applications have utilization expectations
  • e.g. CPU consumption may cost money
• A5: RIGHT NOW!

The Timing Challenge
• Compile methods at the right time to maximize benefit and minimize impact to application
• Current approach relies on when we identify a method
• “Benefit” comes back to effectiveness

Timing Challenge: Steps We’ve Taken
• Avoid aggressive class hierarchy optimizations during startup
• Real-Time: lower compilation thread priority so real-time tasks take precedence
• Real-Time: Avoid compiling while GC is active
• Dynamic Loop Transfer
  • Identify interpreted methods “stuck” in a loop
  • Generate a compiled body that can accept transition from interpreter on the loop back-edge
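Dynamic Loop Transfer can be illustrated with a toy interpreter that counts loop back-edges and hands its live state to a "compiled" body once the loop looks stuck. The back-edge threshold and both function names are assumptions; a real JIT generates native code with a special entry point rather than calling another Python function.

```python
BACK_EDGE_THRESHOLD = 1000  # hypothetical: back-edges before the loop counts as "stuck"


def compiled_loop_body(i, total, n):
    """Stand-in for compiled code that resumes the loop from the state (i, total)."""
    while i < n:
        total += i
        i += 1
    return total


def interpret_sum(n, threshold=BACK_EDGE_THRESHOLD):
    """'Interpreted' loop summing 0..n-1, transferring to compiled code when hot.

    Returns (result, transfer_point); transfer_point is None if no transfer happened.
    """
    i, total, back_edges = 0, 0, 0
    while i < n:
        total += i
        i += 1
        back_edges += 1  # one loop back-edge taken
        if back_edges >= threshold:
            # Transition on the back-edge: pass the live loop state across.
            return compiled_loop_body(i, total, n), i
    return total, None


short_result, short_transfer = interpret_sum(10)
long_result, long_transfer = interpret_sum(5000)
```

The short loop finishes in the interpreter, while the long loop is handed over after 1000 back-edges and finishes with the same result it would have produced had it stayed interpreted.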

Summary
• Three big challenges facing effective native code generation in modern JITs:
  • Identification Challenge: find the right methods to compile
  • Effectiveness Challenge: compile methods in the right way
  • Timing Challenge: compile methods at the right time
• Complex system: lots of overlap in these challenges
• Any functional JIT must deal with these challenges
  • But degree of success varies!

Questions?
• Mark Stoodley
• mstoodle@ca.ibm.com

Backup Slides

When do you find the method?
• After program completes
  • Good knowledge of what methods matter
  • But no opportunity to improve program execution
• Before program executes
  • Most opportunity to improve execution time
  • But no knowledge of what to focus on
• People tend to get stuck on “most opportunity”
  • But how much improvement will native code bring?
  • Answer can depend on when you compile it
