
Dynamic Binary Translation



  1. Dynamic Binary Translation (Lecture 24). Acknowledgements: E. Duesterwald (IBM), S. Amarasinghe (MIT). Ras Bodik, CS 164, Lecture 24

  2. Lecture Outline • Binary Translation: Why, What, and When. • Why: Guarding against buffer overruns • What, when: overview of two dynamic translators: • Dynamo-RIO by HP, MIT • CodeMorph by Transmeta • Techniques used in dynamic translators • Path profiling

  3. Motivation: preventing buffer overruns Recall the typical buffer overrun attack: • the program calls a method foo() • foo() copies a string into an on-stack array: • the string is supplied by the user • the user’s malicious code is copied into foo’s array • foo’s return address is overwritten to point to the user code • foo() returns • unknowingly jumping to the user code
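
A minimal C sketch of the vulnerable pattern described above (the function name foo, the buffer size, and the use of strcpy are illustrative assumptions, not taken from any particular program):

    #include <string.h>

    /* foo() copies an attacker-controlled string into a fixed-size on-stack
       array with no bounds check; a long enough input runs past the array
       and overwrites foo's return address on the stack. */
    void foo(const char *user_input) {
        char buf[64];               /* on-stack array */
        strcpy(buf, user_input);    /* no length check: classic overrun */
    }                               /* the return jumps wherever the (possibly
                                       overwritten) return address points */

    int main(int argc, char **argv) {
        if (argc > 1)
            foo(argv[1]);
        return 0;
    }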

  4. Preventing buffer overrun attacks Two general approaches: • static (compile-time): analyze the program • find all array writes that may land outside array bounds • the program is proven safe before you run it • dynamic (run-time): analyze the execution • make sure no write outside an array happens • the execution is proven safe (enough to achieve security)

  5. Dynamic buffer overrun prevention The idea, again: • prevent writes outside the intended array • as is done in Java • harder in C: must add a “size” to each array • done in CCured, a Berkeley project

  6. A different idea Perhaps less safe, but easier to implement. • Goal: detect that the return address was overwritten. Instrument the program so that • it keeps an extra copy of the return address: • store aside the return address when the function is called (store it in an inaccessible shadow stack) • when returning, check that the return address in the AR (activation record) matches the stored one • on a mismatch, terminate the program
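
A minimal sketch of the shadow-stack check in C, assuming an instrumenter would insert the shadow_push/shadow_check calls at function entry and exit (those names are hypothetical, and __builtin_return_address is a GCC/Clang extension used here only for illustration):

    #include <stdio.h>
    #include <stdlib.h>

    #define SHADOW_DEPTH 1024
    static void *shadow_stack[SHADOW_DEPTH];   /* the "inaccessible" copies live here */
    static int shadow_top = 0;

    /* inserted at function entry: store aside a copy of the return address */
    static void shadow_push(void *ret_addr) {
        shadow_stack[shadow_top++] = ret_addr;
    }

    /* inserted just before return: compare the AR's return address to the copy */
    static void shadow_check(void *ret_addr) {
        if (shadow_stack[--shadow_top] != ret_addr) {
            fprintf(stderr, "return address overwritten: terminating\n");
            abort();
        }
    }

    void foo(void) {
        shadow_push(__builtin_return_address(0));
        /* ... function body that might smash the stack ... */
        shadow_check(__builtin_return_address(0));
    }

    int main(void) {
        foo();
        return 0;
    }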

  7. Commercially interesting • A similar idea is behind the product by determina.com • key problem: reducing the overhead of instrumentation • what’s instrumentation, anyway? • adding statements to an existing program • in our case, to x86 executables • Determina uses binary translation

  8. What is Binary Translation? • Translating a program in one binary format to another, for example: • MIPS → x86 (to port programs across platforms) • We can view “binary format” liberally: • Java bytecode → x86 (to avoid interpretation) • x86 → x86 (to optimize the executable)

  9. When does the translation happen? • Static (off-line): before the program is run • Pros: no serious translation-time constraints • Dynamic (on-line): while the program is running • Pros: • access to the complete program (the program is fully linked) • access to program state (including values of data structures) • can adapt to changes in program behavior • Note: Pros(dynamic) = Cons(static)

  10. Why? Translation Allows Program Modification The slide’s diagram places modifiers along the compilation pipeline (Program → Compiler → Linker → Loader → Runtime System): • Static (compiler, linker, loader): instrumenters, load-time optimizers, the shared library mechanism • Dynamic (runtime system): debuggers, interpreters, just-in-time compilers, dynamic optimizers, profilers, dynamic checkers, instrumenters, etc.

  11. Applications, in more detail • profilers: • add instrumentation instructions that count basic block executions (e.g., gprof) • load-time optimizers: • remove caller/callee save instructions (callers/callees are known after DLLs are linked) • replace long jumps with short jumps (code positions are known after linking) • dynamic checkers: • finding memory access bugs (e.g., Rational Purify)
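
To make the profiler case concrete, here is a hedged C sketch of what the inserted instrumentation amounts to: a counter increment at each basic block entry (the block ids, the COUNT_BLOCK macro, and the counter table are hypothetical):

    #include <stdio.h>

    #define NUM_BLOCKS 3
    static unsigned long block_count[NUM_BLOCKS];   /* one counter per basic block */

    /* the instrumenter would insert this increment at every block entry */
    #define COUNT_BLOCK(id) (block_count[id]++)

    int main(void) {
        COUNT_BLOCK(0);                             /* entry block */
        for (int i = 0; i < 10; i++)
            COUNT_BLOCK(1);                         /* loop body block */
        COUNT_BLOCK(2);                             /* exit block */
        for (int id = 0; id < NUM_BLOCKS; id++)
            printf("block %d executed %lu times\n", id, block_count[id]);
        return 0;
    }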

  12. Dynamic Program Modifiers The slide’s diagram layers a Dynamic Program Modifier between the Running Program and the Hardware Platform: it observes and can manipulate every instruction of the running program.

  13. In more detail The slide’s diagram compares three stacks (the common setup and the two translators): • common setup: application + DLL, OS, CPU • CodeMorph (Transmeta): application + DLL, OS, CodeMorph, CPU = VLIW • Dynamo-RIO (HP, MIT): application + DLL, OS, Dynamo, CPU = x86

  14. Dynamic Program Modifiers Requirements: • Ability to intercept execution at arbitrary points • Observe executing instructions • Modify executing instructions • Transparency - modified program is not specially prepared • Efficiency - amortize overhead and achieve near-native performance • Robustness • Maintain full control and capture all code - sampling is not an option (there are security applications)

  15. HP Dynamo-RIO • Building a dynamic program modifier • Trick I: adding a code cache • Trick II: linking • Trick III: efficient indirect branch handling • Trick IV: picking traces • Dynamo-RIO performance • Run-time trace optimizations

  16. System I: Basic Interpreter The slide’s diagram shows the interpreter loop: take the next VPC, fetch the next instruction, decode, execute, update the VPC (with exception handling on the side). • Intercepts execution • Can observe & modify executing instructions • Transparent • Efficiency? Up to a several-hundred-times slowdown
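
A schematic of that fetch/decode/execute loop, written as C over a toy instruction set (the opcodes and VM state are invented for illustration; a real instruction interpreter would decode x86):

    #include <stdio.h>

    enum op { OP_ADD, OP_JMP, OP_HALT };
    struct insn { enum op op; int arg; };

    /* fetch / decode / execute loop driven by a virtual program counter (VPC) */
    static void interpret(const struct insn *prog) {
        int vpc = 0, acc = 0;
        for (;;) {
            struct insn i = prog[vpc];                    /* fetch next instruction at VPC */
            switch (i.op) {                               /* decode */
            case OP_ADD:  acc += i.arg; vpc++;  break;    /* execute, update VPC */
            case OP_JMP:  vpc = i.arg;          break;
            case OP_HALT: printf("acc = %d\n", acc); return;
            }
        }
    }

    int main(void) {
        struct insn prog[] = { {OP_ADD, 5}, {OP_ADD, 7}, {OP_HALT, 0} };
        interpret(prog);
        return 0;
    }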

  17. Trick I: Adding a Code Cache The slide’s diagram adds a BASIC BLOCK CACHE (holding translated blocks of non-control-flow instructions): look up the next VPC in the cache; on a miss, fetch the block at the VPC, emit it into the cache, and context switch into it; execute the cached block; exception handling remains with the runtime.
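
A hedged C sketch of the dispatch logic this trick introduces, under a toy model in which translated fragments are plain function pointers that return the next VPC (the cache layout and all names are illustrative, not Dynamo-RIO’s actual data structures):

    #include <stdio.h>
    #include <stddef.h>

    typedef int (*fragment_fn)(void);            /* a translated basic block; returns next VPC */

    #define CACHE_SIZE 256
    static fragment_fn block_cache[CACHE_SIZE];  /* indexed directly by VPC for simplicity */

    static int block_0(void) { puts("block 0"); return 1; }
    static int block_1(void) { puts("block 1"); return -1; }   /* -1 means "program exit" */

    /* stand-in for "fetch the block at VPC and translate it" */
    static fragment_fn build_block(int vpc) {
        return vpc == 0 ? block_0 : block_1;
    }

    int main(void) {
        int vpc = 0;
        while (vpc >= 0) {
            fragment_fn frag = block_cache[vpc]; /* lookup VPC in the block cache */
            if (!frag) {                         /* miss: context switch to the builder */
                frag = build_block(vpc);
                block_cache[vpc] = frag;         /* emit the block into the cache */
            }
            vpc = frag();                        /* execute the cached block */
        }
        return 0;
    }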

  18. Example Basic Block Fragment
Original basic block:
    add %eax, %ecx
    cmp $4, %eax
    jle $0x40106f
Fragment emitted into the code cache:
    frag7:  add %eax, %ecx
            cmp $4, %eax
            jle <stub1>
            jmp <stub2>
    stub1:  mov %eax, eax-slot    # spill eax
            mov &dstub1, %eax     # store ptr to stub table
            jmp context_switch
    stub2:  mov %eax, eax-slot    # spill eax
            mov &dstub2, %eax     # store ptr to stub table
            jmp context_switch

  19. Runtime System with Code Cache The slide’s diagram: the dispatcher hands the next VPC to the basic block builder and context switches into the BASIC BLOCK CACHE of non-control-flow instruction blocks. • Improves performance: slowdown reduced from ~100x to 17-26x • Remaining bottleneck: frequent (costly) context switches

  20. Linking a Basic Block Fragment
Original basic block:
    add %eax, %ecx
    cmp $4, %eax
    jle $0x40106f
Linked fragment: the exits now jump directly to other fragments instead of to the exit stubs:
    frag7:  add %eax, %ecx
            cmp $4, %eax
            jle <frag42>
            jmp <frag8>
    stub1:  mov %eax, eax-slot
            mov &dstub1, %eax
            jmp context_switch
    stub2:  mov %eax, eax-slot
            mov &dstub2, %eax
            jmp context_switch

  21. Trick II: Linking The slide’s diagram extends the builder: look up the VPC; on a miss, fetch the block at the VPC, link the block to already-cached fragments, emit it, and context switch in; execution then stays inside the BASIC BLOCK CACHE until the next cache miss.
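
A hedged sketch of direct-branch linking under the same toy fragment model as before: each fragment records its successor, and an exit is “patched” from NULL (meaning an exit stub that context switches back to the runtime) to the target fragment once that fragment exists in the cache. All names are illustrative.

    #include <stdio.h>
    #include <stddef.h>

    struct fragment {
        void (*body)(void);          /* the translated block body */
        struct fragment *taken;      /* patched link; NULL models an unlinked exit stub */
    };

    static void dispatcher(struct fragment *frag) {
        while (frag) {
            frag->body();            /* stay inside the code cache... */
            frag = frag->taken;      /* ...as long as the exit has been linked */
        }
        /* a NULL link means: context switch back to the runtime system */
        puts("context switch: back to the runtime to build/link the next block");
    }

    static void body_a(void) { puts("fragment A"); }
    static void body_b(void) { puts("fragment B"); }

    int main(void) {
        struct fragment b = { body_b, NULL };
        struct fragment a = { body_a, NULL };
        dispatcher(&a);              /* unlinked: one block, then a context switch */
        a.taken = &b;                /* Trick II: link A's exit directly to fragment B */
        dispatcher(&a);              /* now A falls through to B without leaving the cache */
        return 0;
    }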

  22. Performance Effect of Basic Block Cache with Direct Branch Linking (chart not reproduced). Performance problem: mispredicted indirect branches.

  23. Indirect Branch Handling
Conditionally “inline” a preferred indirect branch target as the continuation of the trace. The original indirect branch, a ret whose preferred target is known, is rewritten as:
    mov %edx, edx_slot       # save app's edx
    pop %edx                 # load actual target
    <save flags>
    cmp %edx, $0x77f44708    # compare to preferred target
    jne <exit stub>
    mov edx_slot, %edx       # restore app's edx
    <restore flags>
    <inlined preferred target>

  24. Indirect Branch Linking: Shared Indirect Branch Target (IBT) Table
The rewritten indirect branch:
    <load actual target>
    <compare to inlined target>
    if equal goto <inlined target>
    lookup IBT table
    if (! tag-match) goto <exit stub>
    jump to tag-value
The slide’s diagram shows fragments whose original targets are linked through the shared IBT table to their cached counterparts, with <inlined target> as the fast path and <exit stub> as the fallback.
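
A hedged C sketch of the shared IBT lookup: a small open-addressed hash table maps original target addresses (tags) to code-cache fragments, and a failed tag match corresponds to falling through to the exit stub. The table size, hash, and names are illustrative, not Dynamo-RIO’s actual implementation.

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    #define IBT_SIZE 64                        /* power of two for cheap masking */
    struct ibt_entry { uintptr_t tag; void *fragment; };
    static struct ibt_entry ibt[IBT_SIZE];

    static size_t ibt_hash(uintptr_t target) { return (target >> 4) & (IBT_SIZE - 1); }

    static void ibt_insert(uintptr_t target, void *fragment) {
        size_t i = ibt_hash(target);
        while (ibt[i].tag != 0)                /* linear probing; 0 marks an empty slot */
            i = (i + 1) & (IBT_SIZE - 1);
        ibt[i].tag = target;
        ibt[i].fragment = fragment;
    }

    /* returns the cached fragment, or NULL: "if (! tag-match) goto <exit stub>" */
    static void *ibt_lookup(uintptr_t target) {
        size_t i = ibt_hash(target);
        while (ibt[i].tag != 0) {
            if (ibt[i].tag == target)
                return ibt[i].fragment;        /* tag match: jump to the cached target */
            i = (i + 1) & (IBT_SIZE - 1);
        }
        return NULL;                           /* miss: take the exit stub */
    }

    int main(void) {
        static int dummy_fragment;             /* stands in for a code-cache address */
        ibt_insert(0x77f44708u, &dummy_fragment);
        printf("hit:  %p\n", ibt_lookup(0x77f44708u));
        printf("miss: %p\n", ibt_lookup(0x401000u));
        return 0;
    }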

  25. Trick III: Efficient Indirect Branch Handling The slide’s diagram adds an indirect branch lookup stage next to the BASIC BLOCK CACHE: indirect branches are resolved inside the cache, and only a lookup miss (or a missing block) context switches back to the basic block builder.

  26. Performance Effect of Indirect Branch Linking (chart not reproduced). Performance problem: poor code layout in the code cache.

  27. Trick IV: Picking Traces The block cache has poor execution efficiency: increased branching, poor locality. Pick traces to: • reduce branching & improve layout and locality • expose new optimization opportunities across block boundaries. The slide contrasts basic blocks (A through L) scattered across the Block Cache with the same blocks laid out as contiguous hot paths in the Trace Cache.

  28. Picking Traces The slide’s diagram: from START, the dispatcher feeds both a basic block builder and a trace selector; blocks live in the BASIC BLOCK CACHE and hot sequences are promoted into the TRACE CACHE (both holding non-control-flow instruction blocks), with the indirect branch lookup and the context switch shared between them.

  29. Picking hot traces • The goal: path profiling • find frequently executed control-flow paths • Connect basic blocks along these paths into contiguous sequences, called traces. • The problem: find a good trade-off between • profiling overhead (counting execution events), and • accuracy of the profile.

  30. Alternative 1: Edge profiling The algorithm: • Edge profiling: measure frequencies of all control-flow edges, then after a while • Trace selection: select hot traces by following the highest-frequency branch outcome. Disadvantages: • Inaccurate: may select infeasible paths (due to branch correlation) • Overhead: must profile all control-flow edges
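
A hedged C sketch of this alternative: per-edge counters filled during profiling, then greedy trace selection that follows the highest-frequency outcome from a start block (the toy CFG and all names are invented for illustration):

    #include <stdio.h>

    #define NUM_BLOCKS 4
    /* edge_count[b][s] = how often block b took outgoing edge s (0 or 1) */
    static unsigned edge_count[NUM_BLOCKS][2];
    /* succ[b][s] = CFG successor for edge s of block b; -1 means "no such edge" */
    static const int succ[NUM_BLOCKS][2] = {
        { 1,  2},    /* block 0 branches to block 1 or block 2 */
        { 3, -1},
        { 3, -1},
        {-1, -1},
    };

    static void profile_edge(int block, int side) { edge_count[block][side]++; }

    /* after profiling: follow the highest-frequency outcome from the start block */
    static void select_trace(int block) {
        printf("trace:");
        while (block >= 0) {
            printf(" B%d", block);
            int side = (edge_count[block][1] > edge_count[block][0]) ? 1 : 0;
            block = succ[block][side];
        }
        printf("\n");
    }

    int main(void) {
        for (int i = 0; i < 100; i++) profile_edge(0, i % 10 == 0 ? 1 : 0);  /* 0->1 is hot */
        for (int i = 0; i < 90; i++)  profile_edge(1, 0);                    /* 1->3 */
        for (int i = 0; i < 10; i++)  profile_edge(2, 0);
        select_trace(0);   /* prints: trace: B0 B1 B3 */
        return 0;
    }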

  31. Alternative 2: Bit-tracing path profiling The algorithm: • collect path signatures and their frequencies • path signature = <start addr>.history • example: <label7>.0101101 • must include addresses of indirect branches Advantages: • accuracy Disadvantages: • overhead: need to monitor every branch • overhead: counter storage (one counter per path!)
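
A hedged C sketch of building such a path signature: starting from the trace start address, shift in one bit per conditional branch outcome; the (start, history) pair would then key a per-path counter. The struct and the start address are invented for illustration.

    #include <stdint.h>
    #include <stdio.h>

    struct path_sig {
        uintptr_t start;     /* start address of the path, e.g. <label7> */
        uint32_t history;    /* one bit per conditional branch: 1 = taken, 0 = not taken */
        unsigned len;        /* number of branch outcomes recorded */
    };

    static void record_branch(struct path_sig *sig, int taken) {
        sig->history = (sig->history << 1) | (taken ? 1u : 0u);
        sig->len++;
    }

    int main(void) {
        struct path_sig sig = { 0x401060, 0, 0 };     /* hypothetical start address */
        int outcomes[] = { 0, 1, 0, 1, 1, 0, 1 };     /* reproduces the 0101101 example */
        for (unsigned i = 0; i < 7; i++)
            record_branch(&sig, outcomes[i]);
        printf("<%#lx>.", (unsigned long)sig.start);
        for (unsigned i = sig.len; i-- > 0; )
            printf("%u", (unsigned)((sig.history >> i) & 1u));
        printf("\n");        /* one counter per distinct signature -> high storage overhead */
        return 0;
    }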

  32. Alternative 3: Next Executing Tail (NET) This is the algorithm of Dynamo: • profiling: count only frequencies of start-of-trace points (which are targets of original backedges) • trace selection: when a start-of-trace point becomes sufficiently hot, select the sequence of basic blocks executed next. • may select a rare (cold) path, but statistically selects a hot path!
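
A hedged C sketch of NET-style selection: a counter only at start-of-trace points; once one crosses a threshold, record whatever blocks execute next until another trace head is reached. The threshold, the simulated loop, and all names are illustrative; Dynamo’s real end-of-trace conditions are more involved.

    #include <stdio.h>

    #define HOT_THRESHOLD 50
    #define MAX_TRACE     8

    static unsigned counter[256];     /* one counter per start-of-trace point */
    static int trace[MAX_TRACE];      /* block ids of the selected trace */
    static int trace_len = 0;
    static int recording = 0;
    static int trace_done = 0;

    /* called when control reaches a start-of-trace point (a backward-branch target) */
    static void start_of_trace(int point) {
        if (trace_done)
            return;
        if (recording) {              /* reaching the next trace head ends the trace */
            recording = 0;
            trace_done = 1;
            return;
        }
        if (++counter[point] >= HOT_THRESHOLD)
            recording = 1;            /* hot: record the next executing tail */
    }

    /* called on every basic-block entry while recording */
    static void block_entered(int block_id) {
        if (recording && trace_len < MAX_TRACE)
            trace[trace_len++] = block_id;
    }

    int main(void) {
        /* simulate a hot loop whose head is start-of-trace point 0 */
        for (int iter = 0; iter < 60; iter++) {
            start_of_trace(0);
            block_entered(1); block_entered(2); block_entered(3);
        }
        printf("selected trace:");
        for (int i = 0; i < trace_len; i++)
            printf(" B%d", trace[i]);
        printf("\n");                 /* prints: selected trace: B1 B2 B3 */
        return 0;
    }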

  33. NET (continued) • Advantages of NET: • very lightweight: #instrumentation points = #targets of backward branches, #counters = #targets of backward branches • statistically likely to pick the hottest path • picks only feasible paths • easy to implement (illustrated on the slide with the example CFG of blocks A through L)

  34. Spec2000 Performance on Windows (w/o trace optimizations)

  35. Spec2000 Performance on Linux (w/o trace optimizations)

  36. Performance on Desktop Applications

  37. Performance Breakdown

  38. Trace optimizations • Now that we have built the traces, let’s optimize them • But what’s left to optimize in statically optimized code? • Limitations of static compiler optimization: • cost of call-specific interprocedural optimization • cost of path-specific optimization in the presence of complex control flow • difficulty of predicting indirect branch targets • lack of access to shared libraries • sub-optimal register allocation decisions • register allocation for individual array elements or pointers

  39. Maintaining Control (in the real world) • Capture all code: execution only takes place out of the code cache • This is challenging for abnormal control flow • The system must intercept all abnormal control-flow events: • Exceptions • Callbacks in Windows • Asynchronous procedure calls • Setjmp/longjmp • Set thread context
