
Dynamic Compilation and Optimization


Presentation Transcript


  1. Dynamic Compilation and Optimization CS 471 December 3, 2007

  2. So Far… Static Compilation: High-Level Programming Languages → Compiler → Machine Code (plus Error Messages). Digging Deeper…: High-Level Programming Languages → Front End → Back End → Machine Code (plus Error Messages).

  3. Alternatives to the Traditional Model • Static Compilation: all work is done “ahead-of-time” • Just-in-Time Compilation: postpone some compilation tasks until run time • Multiversioning and Dynamic Feedback: include multiple options in the binary • Dynamic Binary Optimization: keep the traditional compilation model, but let executables adapt as they run

  4. Move More of Compilation to Run Time • Execution environment may be quite different from the assumptions made at compile time • Dynamically loaded libraries • User inputs • Hardware configurations • Dependence on software vendors • Apps on tap • Incorporate profiling

  5. Just-in-Time Compilation • Ship bytecodes (think IR) rather than binaries • Binaries execute on machines • Bytecodes execute on virtual machines • (diagram: the compiler pipeline from slide 2 – High-Level Programming Languages, Front End, Back End, Machine Code, Error Messages)

  6. Just-in-Time Compilation (diagram: source → javac → bytecode → java → execute) • javac: the Java bytecode compiler • java: the Java virtual machine • Bytecode: machine independent, portable • Step One: “Compile” Circle.java • % javac Circle.java -> Circle.class • Step Two: “Execute” • % java Circle (the JVM loads Circle.class; the class name, not the file name, is given to java)
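To make the two-step workflow concrete, here is a minimal, hypothetical Circle.java; the class body is an assumption for illustration (the slides only name the file), but the javac/java commands above work on it as written.

// Circle.java – hypothetical example class for the two-step workflow.
// Step one:  % javac Circle.java     (produces Circle.class, i.e. bytecode)
// Step two:  % java Circle           (the JVM loads Circle.class and runs main)
public class Circle {
    private final double radius;

    public Circle(double radius) { this.radius = radius; }

    public double area() {
        return Math.PI * radius * radius;
    }

    public static void main(String[] args) {
        Circle c = new Circle(2.0);
        System.out.println("Area: " + c.area());
    }
}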

  7. Bytecodes • Each frame contains local variables and an operand stack • Instruction set • Load/store between locals and operand stack • Arithmetic on operand stack • Object creation and method invocation • Array/field accesses • Control transfers and exceptions • The type of the operand stack at each program point is known at compile time

  8–15. Bytecodes (cont.) • Example: computes c := 2 * (a + b), with locals a = 42, b = 7, c = 0 • iconst 2 pushes the constant 2 (operand stack: 2) • iload a pushes 42 (stack: 2, 42) • iload b pushes 7 (stack: 2, 42, 7) • iadd pops 7 and 42 and pushes 49 (stack: 2, 49) • imul pops 49 and 2 and pushes 98 (stack: 98) • istore c pops 98 into c (stack: empty; locals now a = 42, b = 7, c = 98)
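For reference, a Java method like the one below (names assumed; not from the slides) compiles to essentially this sequence; running javac and then javap -c shows the same iconst_2 / iload / iload / iadd / imul / istore pattern, with local-variable slot numbers in place of the names a, b, c.

// Compile with javac and disassemble with "javap -c StackExample" to see
// a bytecode sequence matching the slides' example.
public class StackExample {
    static int compute(int a, int b) {
        int c = 2 * (a + b);   // iconst_2, iload a, iload b, iadd, imul, istore c
        return c;
    }

    public static void main(String[] args) {
        System.out.println(compute(42, 7));   // prints 98, as in the slides
    }
}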

  16. Executing Bytecode • java Circle – what happens? • Interpreting: map each bytecode to a machine code sequence; for each bytecode, execute that sequence (a sketch follows below) • Translation to machine code: map all the bytecodes to machine code (or to a higher-level intermediate representation), massage them (e.g., remove redundancies), then execute the machine code
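The sketch below illustrates the interpreting approach in Java rather than in a VM's native implementation language: a dispatch loop fetches one bytecode at a time and runs a handler for it. The opcode names, their encodings, and the run method are simplified assumptions, not the real JVM instruction set.

import java.util.ArrayDeque;
import java.util.Deque;

// Toy interpreter: a dispatch loop over a simplified, invented bytecode encoding.
// A real JVM interpreter maps each bytecode to a hand-tuned machine-code handler.
public class ToyInterpreter {
    static final int ICONST = 0, ILOAD = 1, IADD = 2, IMUL = 3, ISTORE = 4, HALT = 5;

    static void run(int[] code, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();      // the operand stack
        int pc = 0;
        while (true) {
            switch (code[pc++]) {
                case ICONST: stack.push(code[pc++]); break;           // push constant
                case ILOAD:  stack.push(locals[code[pc++]]); break;   // push local
                case IADD:   stack.push(stack.pop() + stack.pop()); break;
                case IMUL:   stack.push(stack.pop() * stack.pop()); break;
                case ISTORE: locals[code[pc++]] = stack.pop(); break; // pop into local
                case HALT:   return;
                default: throw new IllegalStateException("unknown opcode");
            }
        }
    }

    public static void main(String[] args) {
        // c := 2 * (a + b) with a = 42, b = 7 (locals 0, 1, 2 hold a, b, c)
        int[] code = {ICONST, 2, ILOAD, 0, ILOAD, 1, IADD, IMUL, ISTORE, 2, HALT};
        int[] locals = {42, 7, 0};
        run(code, locals);
        System.out.println("c = " + locals[2]);         // prints c = 98
    }
}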

  17. Hotspot Compilation • A hybrid approach • Initially interpret • Find the “hot” (frequently executed) methods • Translate only hot methods to machine code

  18. The Virtual Machine • An extreme version of an old idea • Previously: a separate MyApp binary per architecture (x86, alpha, pa-risc), each tied to its own processors (P III, P IV, 21164, 21264, PA-8000, PA-7000) • Now: one MyApp runs on the JVM, with a VM implementation per processor

  19. Compile-Time Multiversioning • Multiple versions of code sections are generated at compile-time • Most appropriate variant is selected at runtime based upon characteristics of the input data and/or machine environment • Multiple variants can cause code explosion • Thus typically only a few versions are created
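A hedged sketch of compile-time multiversioning in Java: two variants of the same routine are built into the program, and a cheap runtime test on the input picks one. The variants and the 10,000-element threshold are illustrative assumptions, not taken from the slides.

// Two precompiled variants of the same computation live in the binary;
// the most appropriate one is chosen from input characteristics at run time.
public class Multiversion {
    static long sum(int[] data) {
        // Selection criterion and threshold are placeholders for illustration.
        return (data.length < 10_000) ? sumSequential(data) : sumParallel(data);
    }

    static long sumSequential(int[] data) {
        long total = 0;
        for (int x : data) total += x;
        return total;
    }

    static long sumParallel(int[] data) {
        return java.util.Arrays.stream(data).parallel().asLongStream().sum();
    }
}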

  20. Another Alternative (diagram: binary → ???? → optimized binary) • Optimize a traditional application as it executes • Why? • Don’t have source code!

  21. What is a Dynamic Optimization System? • Transforms* an application at run time • * {translate, optimize, extend} • (diagram components: Application, Profile, Transform, Code Cache, Execute)

  22. Classification • Dynamic binary optimizers (x86 → x86opt) • Complement the static compiler • User inputs, phases, DLLs, hardware features • Examples: DynamoRIO, Mojo, Strata • Dynamic translators (x86 → PPC) • Convert applications to run on a new architecture • Examples: Rosetta, Transmeta CMS, DAISY • Binary instrumentation (x86 → x86instr) • Inspect and/or add features to existing applications • Examples: Pin, Valgrind • JITs + adaptive systems (Java bytecode → x86)

  23. Dynamic Instrumentation Demo • Pin • Four architectures – IA32, EM64T, IPF, XScale • Four OSes – Linux, FreeBSD, MacOS, Windows

  24. What are the Challenges? • Performance! • Solutions: • Code caches – only transform code once • Trace selection – focus on hot paths • Branch linking – only perform cache lookup once • Indirect branch hash tables / chaining • Memory “management” • Correctness – self-modifying code, munmaps • Transparency – context switching, eflags

  25. Improving Performance: Code Caches • Flowchart: start with a branch target address and do a hash table lookup; on a hit, execute from the code cache; on a miss, interpret and increment a counter; once the code is hot, perform region formation & optimization, evict code if there is no room in the code cache, insert the new code (with exit stubs), and update the hash table (a sketch of this dispatch loop follows below)
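The following is a simplified Java sketch of that dispatch loop; every name (HOT_THRESHOLD, CachedTrace, formAndOptimizeRegion, and so on) is invented for illustration and stands in for machinery that real systems implement at the machine-code level.

import java.util.HashMap;
import java.util.Map;

// Sketch of the dispatcher in the flowchart above; all identifiers are illustrative.
public class CodeCacheDispatch {
    private static final int HOT_THRESHOLD = 50;           // assumed tuning value
    private final Map<Long, CachedTrace> codeCache = new HashMap<>();
    private final Map<Long, Integer> counters = new HashMap<>();

    void dispatch(long branchTarget) {
        CachedTrace trace = codeCache.get(branchTarget);    // hash table lookup
        if (trace != null) {                                // hit: run cached code
            trace.execute();
            return;
        }
        int count = counters.merge(branchTarget, 1, Integer::sum);
        interpret(branchTarget);                            // miss: interpret it
        if (count >= HOT_THRESHOLD) {                       // code became hot
            if (cacheIsFull()) evictCode();                 // make room if needed
            codeCache.put(branchTarget, formAndOptimizeRegion(branchTarget));
        }
    }

    interface CachedTrace { void execute(); }
    private void interpret(long target) { /* interpret until the next branch */ }
    private boolean cacheIsFull() { return false; }         // placeholder policy
    private void evictCode() { /* flush or evict cached traces */ }
    private CachedTrace formAndOptimizeRegion(long target) { return () -> { }; }
}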

  26. Improving Performance: Trace Selection • Interprocedural path • Single entry, multiple exits • Trace (superblock) • (figure: basic blocks A–I shown three ways – as a CFG with calls and returns to D and H, as their layout in memory, and as their layout in the code cache, where the hot interprocedural path is placed contiguously with exit stubs to C and F)

  27. Optimizing Traces • Remove fall-through branches • Backward, then forward optimization pass • Optimizations: constant propagation, copy propagation, loop-invariant code motion, strength reduction, redundancy removal • Must be lightweight!! • (figure: blocks A, B, C, D before and after trace layout; a small before/after illustration follows below)
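A small hand-made illustration of those lightweight rewrites, written as Java for readability (the real systems rewrite machine instructions in a trace, not source code); the method names are invented.

public class TraceOptimizationExample {
    // Before: the trace as first laid out, with fall-through branches removed.
    static int before(int a) {
        int x = 4;            // constant
        int y = x * 8;        // constant propagation turns this into y = 32
        int z = y;            // copy propagation replaces later uses of z with y
        return z + y + a;     // redundancy removal leaves 64 + a
    }

    // After the backward, then forward, optimization passes.
    static int after(int a) {
        return 64 + a;
    }

    public static void main(String[] args) {
        System.out.println(before(1) + " == " + after(1));   // both print 65
    }
}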

  28. Trace formation – partial procedure inlining & code layout • Slowdowns result from short execution (ijpeg) or too many dynamic paths (go, vortex)

  29. O4 performs global interprocedural and link-time optimization • Dynamo + O2 ≈ native O4 • Cannot win over O4 + profile, which already gets static inlining & path-sensitive optimizations

  30. Improving Performance: Cache Linking • (figure: traces #1, #2, and #3 in the code cache; exits #1a and #1b of trace #1 either return to the dispatch routine or are linked directly to other traces; a conceptual sketch follows below)
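A conceptual sketch of linking with invented types: each trace exit starts out unlinked, meaning control returns to the dispatcher and its hash-table lookup; once the exit's target is also in the code cache, the exit is patched to transfer control to it directly.

// Illustrative only: in a real system "linking" patches a jump in the code cache;
// here a null exit target stands for "return to the dispatcher".
public class TraceLinking {
    static class Trace {
        final long entryAddress;       // original (source) address the trace starts at
        final Trace[] exitTargets;     // null entry = unlinked exit

        Trace(long entryAddress, int numExits) {
            this.entryAddress = entryAddress;
            this.exitTargets = new Trace[numExits];
        }
    }

    // Link exit #exitIndex of 'from' straight to 'target', skipping the dispatcher.
    static void link(Trace from, int exitIndex, Trace target) {
        from.exitTargets[exitIndex] = target;
    }

    // Unlink it again, e.g. before 'target' is evicted from the code cache.
    static void unlink(Trace from, int exitIndex) {
        from.exitTargets[exitIndex] = null;
    }
}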

  31. Importance of Linking • Slowdown when linking is disabled

  32. Code Cache Visualization

  33. Challenge: Achieving Transparency (SPC = source program counter, TPC = translated program counter) • Pretend as though the original program is executing • Original code: 0x1000: call 0x4000 (i.e., push 0x1006 on the stack, then jump to 0x4000) • Translated code: 0x7000: push 0x1006, 0x7006: jmp 0x8000 • Code cache address mapping: 0x1000 → 0x7000 (“caller”), 0x4000 → 0x8000 (“callee”)
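A sketch of that SPC-to-TPC bookkeeping, using a plain hash map as a stand-in for the runtime's lookup structure; the two addresses are the ones on the slide, everything else is an assumption.

import java.util.HashMap;
import java.util.Map;

// Transparency sketch: translated code pushes the ORIGINAL return address (0x1006),
// so the application never observes code-cache addresses, while control actually
// continues at the translation found through the SPC -> TPC map.
public class AddressMap {
    private final Map<Long, Long> spcToTpc = new HashMap<>();

    void record(long spc, long tpc) { spcToTpc.put(spc, tpc); }

    long lookup(long spc) {
        Long tpc = spcToTpc.get(spc);
        return (tpc != null) ? tpc : translate(spc);   // translate on first use
    }

    private long translate(long spc) {
        // Would build and cache a trace here; this stub just echoes the address.
        return spc;
    }

    public static void main(String[] args) {
        AddressMap map = new AddressMap();
        map.record(0x1000L, 0x7000L);   // "caller" mapping from the slide
        map.record(0x4000L, 0x8000L);   // "callee" mapping from the slide
        // The call at 0x1000 becomes: push 0x1006, then jmp to lookup(0x4000) == 0x8000.
        System.out.printf("call 0x4000 -> jmp 0x%x%n", map.lookup(0x4000L));
    }
}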

  34. Challenge: Self-Modifying Code • The problem • Code cache must detect SMC and invalidate corresponding cached traces • Solutions • Many proposed … but without HW support, they are very expensive! • Changing page protection • Memory diff prior to execution • On ARM, there is an explicit instruction for SMC!

  35. False Self-Modifying Code • The problem • On some architectures (x86) code may be mixed with data • Write to data – OK • Write to code – need to synch • Solution? • No great solution! (Yet…)

  36. Dynamic Optimization Summary • Complement the static compiler • Shouldn’t compete with static compilers • Observe execution pattern • Optimize frequently executed code • Optimization overhead could degrade performance • Exploits opportunities • Arise only at runtime • DLLs • Runtime constants • Hardware features, user patterns, etc. • Too expensive to fully exploit statically • Path-sensitive optimizations

  37. Next Time… • Course Summary and Wrap-Up • Preparation for Compiler Wars
