
Java Bytecode Optimization

Optimizing Java Bytecode for Embedded Systems, by Stefan Hepp. Topics: the JOP toolchain and JIT vs. ahead-of-time compilation, existing open source tools, the JOPtimizer framework and its code representations, inlining, and results.



Presentation Transcript


  1. Optimizing Java Bytecode for Embedded Systems • Stefan Hepp • Java Bytecode Optimization

  2. Overview • Toolchain JOP, JIT vs. ahead-of-time compilation • Existing open source tools • JOPtimizer framework and code representations • Inlining • Results

  3. Toolchain Overview • Source code compiled with javac to Java bytecode • Optimization deferred to the JVM; profiling information and a JIT compiler are used • Not feasible on embedded processors like JOP

  4. Toolchain Overview • Ahead-of-time optimization needed • Optimization of bytecode for target platform • Output is Java bytecode • Profiling vs. static WCET

  5. Toolchain Overview • Advantages over JIT • Run time of the optimizer is not critical • No warm-up phase to gather profiling information and to run the JIT compiler • Disadvantages • Less accurate or no profiling information available at design time, class hierarchy may change dynamically • Target platform must be known

  6. Existing Tools • Soot framework looks promising, but not designed for embedded systems and very complex • Other open source tools usually only remove unused methods and obfuscate code

  7. JOPtimizer • JOPtimizer: a new framework for optimizations • Intermediate code representations • Inlining which respects method size restrictions

  8. Assumptions • Assumptions about embedded applications • No dynamic class loading or class modifications at runtime • Reflection is not used • All class files are available at compile time (except native classes) • Allows more optimizations (but assumptions can be disabled) • Exclude “library” code (like java.*) • Define: library classes must not extend/reference application classes

  9. Java Class Files • A class consists of: • ConstantPool: indexed table of constants (numbers, Strings, class names, method names, signatures, ..) • Class name, super-class, interfaces (references to CP) • Fields, methods: name, signature, flags • Method code as attribute of methods • Stack architecture with variable-length encoding • Parsing and compiling of class files done by existing libraries (BCEL, ASM, ...)
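
As a rough illustration of the layout described on this slide, the sketch below reads only the fixed-size fields that precede the constant pool in a class file (magic number, minor and major version, constant pool count). The class name `ClassFileHeader` is hypothetical and not part of JOPtimizer, BCEL or ASM; in practice one of those libraries would do the parsing.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: reads the header fields that precede the constant pool. */
final class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount; // number of constant pool entries plus one

    private ClassFileHeader(int minor, int major, int cpCount) {
        this.minorVersion = minor;
        this.majorVersion = major;
        this.constantPoolCount = cpCount;
    }

    static ClassFileHeader parse(byte[] classBytes) {
        // Class files are big-endian, which is ByteBuffer's default byte order.
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        int magic = buf.getInt();                 // u4 magic, always 0xCAFEBABE
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = buf.getShort() & 0xFFFF;      // u2 minor_version
        int major = buf.getShort() & 0xFFFF;      // u2 major_version
        int cpCount = buf.getShort() & 0xFFFF;    // u2 constant_pool_count
        return new ClassFileHeader(minor, major, cpCount);
    }
}
```

The constant pool entries, class name, super-class, interfaces, fields and methods follow these fields; their variable-length encoding is the reason existing libraries are normally used instead of hand-written parsers.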

  10. The JVM Instruction Set • (Partially) typed stack instructions • 32-bit (int, float, reference, byte, short, ..) and 64-bit (long, double) variables • Exception handling, synchronization, subroutines • Stack and variable table entries always 32-bit • No indirect jumps, stack size must be static

      Java source:
        private Map m;
        private void test(int i) {
            int j[] = new int[2];
            float a = 2.0f;
            j[0] = i * (int) a;
            m.put(this, j);
        }

      Bytecode:
        private test(I)V
          ICONST_2
          NEWARRAY T_INT
          ASTORE 2
          FCONST_2
          FSTORE 3
          ALOAD 2
          ICONST_0
          ILOAD 1
          FLOAD 3
          F2I
          IMUL
          IASTORE
          ALOAD 0
          GETFIELD #4
          ALOAD 0
          ALOAD 2
          INVOKEINTERFACE #7
          POP
          RETURN

  11. Stackcode Representation • Internal representation (“stackcode”) • Types and constant values as parameters of instructions to reduce number of different instructions (~40 stackcode instructions) • Stack emulation to determine operand types for all instructions (swap, dup, ..) • Variables and types instead of 32-bit slots • Constant values instead of references into CP • split basic blocks at exception handler ranges too • Still a stack architecture • Stackcode can be mapped directly to bytecode (allows analysis of code size and execution time)

  12. Quadcode Representation • The stack creates implicit dependencies between instructions and blocks, which makes optimizations more complex • Quadruple form of code (“quadcode”) • Create a local variable per stack slot, emulate the stack to determine the arguments of instructions • Instructions with types and constants as parameters • Instructions to manipulate the stack are not needed (pop) or are replaced with copy instructions (load, swap, dup, ..)

  13. Quadcode Representation • Quadcode representation enables simpler implementation of optimizations, but code cannot be mapped to bytecode directly • Stackcode and Quadcode similar to Soot internal representations (Baf, Jimple, Shimple)

      Example (columns: quadcode // stackcode // bytecode):
        public int calc(int a, int b) {
            copy.ref s0, l0                  // load.ref l0              // aload_0
            getfield.'Test.fField' s0, s0    // getfield 'Test.fField'   // getfield #3
            copy.float s1, 2.0f              // push.float 2.0f          // fconst_2
            binop.float.div s0, s0, s1       // binop.float.div          // fdiv
            copy.float l3, s0                // store.float l3           // fstore_3
            copy.int s0, l1                  // load.int l1              // iload_1
            return.int s0                    // return.int               // ireturn
        }
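
The stack emulation that turns stackcode into quadcode can be sketched as follows. This is a minimal illustration, not JOPtimizer's implementation: the class `StackToQuad` and its textual instruction format are assumptions modelled on the slide 13 listing, only the instruction kinds that appear there are handled, and every value is assumed to occupy a single emulated stack slot.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: converts textual stackcode to quadcode by emulating the stack depth. */
final class StackToQuad {
    static List<String> convert(List<String> stackcode) {
        List<String> quads = new ArrayList<>();
        int depth = 0; // current emulated stack depth; slot i becomes variable "s<i>"
        for (String insn : stackcode) {
            int space = insn.indexOf(' ');
            String op = space < 0 ? insn : insn.substring(0, space);
            String arg = space < 0 ? "" : insn.substring(space + 1);
            if (op.startsWith("load.") || op.startsWith("push.")) {
                // Pushing a variable or constant becomes a copy into the next stack variable.
                quads.add("copy." + op.substring(5) + " s" + depth + ", " + arg);
                depth++;
            } else if (op.startsWith("store.")) {
                // Popping into a local becomes a copy out of the top stack variable.
                depth--;
                quads.add("copy." + op.substring(6) + " " + arg + ", s" + depth);
            } else if (op.startsWith("binop.")) {
                // Pops two operands and pushes the result into the lower slot.
                depth -= 2;
                quads.add(op + " s" + depth + ", s" + depth + ", s" + (depth + 1));
                depth++;
            } else if (op.equals("getfield")) {
                // Replaces the object reference on top of the stack with the field value.
                int top = depth - 1;
                quads.add("getfield." + arg + " s" + top + ", s" + top);
            } else if (op.startsWith("return.")) {
                depth--;
                quads.add(op + " s" + depth);
            } else {
                throw new IllegalArgumentException("unsupported instruction: " + insn);
            }
        }
        return quads;
    }
}
```

Feeding the stackcode column of the slide 13 example into `convert` yields the quadcode column of the same listing.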

  14. Creation of Bytecode • Transformation back from quadcode to bytecode • Create complete expressions from instructions (“Decompile” code), compile expression trees to JVM instructions like javac (Soot does this (Grimp)) • Create stack form of quadruple instructions, compile to bytecode (JOPtimizer does this, optional in Soot) • Per quadcode instruction: load parameters on stack, execute operation and store result back • load/store elimination and local variable allocation for stackcode needed before bytecode can be created • Decompilation method of Soot gets slightly better results
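
The "load parameters, execute operation, store result" scheme can be sketched as a naive per-instruction expansion. The class `QuadToStack` and its textual format are hypothetical and cover only `copy`, `binop` and `return` with an operand. Operands named `s<n>` or `l<n>` are treated as variables, and anything else is assumed to be a constant.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: expands each quadcode instruction into load / operate / store. */
final class QuadToStack {
    static List<String> expand(List<String> quadcode) {
        List<String> out = new ArrayList<>();
        for (String quad : quadcode) {
            int space = quad.indexOf(' ');
            if (space < 0) {
                throw new IllegalArgumentException("unsupported instruction: " + quad);
            }
            String op = quad.substring(0, space);
            String[] args = quad.substring(space + 1).split(",\\s*");
            String type = op.split("\\.")[1]; // e.g. "float" in "binop.float.div"
            if (op.startsWith("copy.")) {
                out.add(loadOrPush(type, args[1]));
                out.add("store." + type + " " + args[0]);
            } else if (op.startsWith("binop.")) {
                out.add(loadOrPush(type, args[1]));
                out.add(loadOrPush(type, args[2]));
                out.add(op);
                out.add("store." + type + " " + args[0]);
            } else if (op.startsWith("return.")) {
                out.add(loadOrPush(type, args[0]));
                out.add(op);
            } else {
                throw new IllegalArgumentException("unsupported instruction: " + quad);
            }
        }
        return out;
    }

    // Variables are loaded, constants are pushed.
    private static String loadOrPush(String type, String operand) {
        boolean isVariable = operand.matches("[sl]\\d+");
        return (isVariable ? "load." : "push.") + type + " " + operand;
    }
}
```

Even a single `copy.ref s0, l0` expands to a `load.ref l0` followed by a `store.ref s0`, which shows why load/store elimination and local variable allocation are needed before compact bytecode can be emitted.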

  15. Inlining • Invocations are expensive on JOP • Inline methods to eliminate invocation overhead • Inlining is not always possible • Callee code restrictions • Code size and variable table size restrictions of JOP • Inlining comes at a price • Caller code size increases, which makes a caller cache miss more costly • Overall program size increases if the callee is not removed (e.g. because it is still called somewhere else)

  16. Inlining methods • Traverse call graph bottom-up (leaves first) • Find and devirtualize invocations • static, final, private invocations are not virtual • Check class hierarchy for overriding methods • Check if inlining is admissible • Estimate gain • Replace invocation with copy of callee • Insert null-pointer check for callee class reference • Map local variables of callee above caller variables
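
The bottom-up (leaves first) traversal can be sketched as a post-order depth-first walk over the call graph. The representation below, a map from method name to the list of its callees, and the class name `BottomUpOrder` are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: lists the methods reachable from a root, callees before callers. */
final class BottomUpOrder {
    static List<String> order(Map<String, List<String>> callGraph, String root) {
        List<String> result = new ArrayList<>();
        visit(root, callGraph, new HashSet<>(), result);
        return result;
    }

    private static void visit(String method, Map<String, List<String>> callGraph,
                              Set<String> seen, List<String> out) {
        if (!seen.add(method)) {
            return; // already visited; this also stops recursion on call cycles
        }
        for (String callee : callGraph.getOrDefault(method, Collections.emptyList())) {
            visit(callee, callGraph, seen, out);
        }
        out.add(method); // emitted only after all of its callees
    }
}
```

Processing methods in this order means a callee has already received its own inlining before it is considered for inlining into its callers, so its final code size is known at that point.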

  17. Inlining Checks • Inlining is not possible if • new code size or variable table size of caller exceeds platform limits • the callee uses exception handlers or synchronized code • throwing an exception clears the stack • stack of the caller needs to be saved and restored if an exception is handled within the inlined method (NYI in JOPtimizer) • the method or class is excluded from inlining by configuration (caller or callee, e.g. a native class)
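
The checks on this slide can be combined into one admissibility predicate, sketched below. The class and field names are hypothetical, the platform limits are passed in as parameters because the slides do not state JOP's actual values, and the size estimate (sum of caller and callee) is a deliberately pessimistic assumption.

```java
/** Hypothetical sketch of the admissibility checks listed on this slide. */
final class InlineCheck {
    static final class Method {
        final int codeSize;            // bytecode size in bytes
        final int localSlots;          // variable table size
        final boolean hasExceptionHandlers;
        final boolean isSynchronized;
        final boolean excluded;        // excluded by configuration, e.g. a native class

        Method(int codeSize, int localSlots, boolean hasExceptionHandlers,
               boolean isSynchronized, boolean excluded) {
            this.codeSize = codeSize;
            this.localSlots = localSlots;
            this.hasExceptionHandlers = hasExceptionHandlers;
            this.isSynchronized = isSynchronized;
            this.excluded = excluded;
        }
    }

    static boolean canInline(Method caller, Method callee, int maxCodeSize, int maxLocalSlots) {
        if (caller.excluded || callee.excluded) {
            return false;
        }
        if (callee.hasExceptionHandlers || callee.isSynchronized) {
            return false;
        }
        // Assumption: the caller grows by the full callee size.
        int newCodeSize = caller.codeSize + callee.codeSize;
        // Callee locals are mapped above the caller's locals, so the tables add up.
        int newLocalSlots = caller.localSlots + callee.localSlots;
        return newCodeSize <= maxCodeSize && newLocalSlots <= maxLocalSlots;
    }
}
```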

  18. Inlining Checks (cont.) • Check field and method references in callee code • Must be accessible from caller • Else make field or method public if possible • Always possible for fields as they are not virtual in Java • All overriding methods must be made public too • If a private method is made public, all invocations have to be changed from invokespecial to invokevirtual (luckily only methods of the callee class have to be searched) • Naming conflicts or dynamic class loading can prevent changes, thus preventing inlining

      Example:
        class A
          public a()
            tmp = new C()
            invoke tmp.b()
        class B
          public b()
            if (v == null) invoke B.c()
          private c()
        class C extends B
          private c()

  19. Inlining Checks (cont.) • Estimate gain of inlining • Depends on cache state • Possible degradation of performance if inlined method is seldom invoked • Calculate gain based on invocation frequency and cache state estimations • Decrease weight of callees with multiple call sites to limit the increase in application code size • Select method with highest (positive) weight for inlining • Add inlined invocations to inlining candidate list, repeat inlining (check with new code size)
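
The selection step can be illustrated with a simple greedy choice over weighted candidates. The weight formula below is purely an assumption for illustration (estimated cycles saved minus an estimated cache penalty, divided by the number of call sites). The slides do not give JOPtimizer's actual formula, and all names here are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch: picks the inlining candidate with the highest positive weight. */
final class InlineSelector {
    static final class Candidate {
        final String name;
        final double invocationFrequency; // estimated number of invocations
        final double invokeOverhead;      // estimated cycles saved per invocation
        final double cachePenalty;        // estimated extra cache-miss cost of the larger caller
        final int callSites;              // number of call sites of the callee, at least 1

        Candidate(String name, double invocationFrequency, double invokeOverhead,
                  double cachePenalty, int callSites) {
            this.name = name;
            this.invocationFrequency = invocationFrequency;
            this.invokeOverhead = invokeOverhead;
            this.cachePenalty = cachePenalty;
            this.callSites = callSites;
        }

        // Assumed weight: net gain, scaled down for callees with many call sites.
        double weight() {
            double gain = invocationFrequency * invokeOverhead - cachePenalty;
            return gain / callSites;
        }
    }

    /** Returns the best candidate, or null if no candidate has a positive weight. */
    static Candidate selectBest(List<Candidate> candidates) {
        Candidate best = null;
        for (Candidate c : candidates) {
            if (c.weight() > 0 && (best == null || c.weight() > best.weight())) {
                best = c;
            }
        }
        return best;
    }
}
```

After each selected invocation is inlined, the invocations contained in the inlined code would be added to the candidate list and the selection repeated against the caller's new code size.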

  20. Benchmark Results • Inlining of stackcode, JBE benchmark @ 60 MHz • Inlining limited by the maximum code size imposed by JOP's memory cache • Removal of unused code should be implemented

  21. Inlining Improvements • Many improvements possible • Type analysis/call graph thinning for better devirtualization • Better cache state and invocation frequency estimation (WCET-driven?) • Run optimizations to reduce code size prior to inlining • Allow inlining of synchronized code/exception handlers • Try to find invocations with highest gain application-wide • ...

  22. Summary • Optimizing code at runtime not feasible for (realtime) embedded systems • Existing open source tools not designed for embedded systems • Inlining implemented in JOPtimizer which takes target platform into account (code restrictions, caching, ..), up to 14% speedup of JBE benchmark • load/store elimination and local variable allocation needed for further optimizations to be implemented • Still many improvements possible ..

  23. Thanks for your attention! Questions? Q&A

  24. Transformations
