
Efficient, Transparent, and Comprehensive Managed Program Execution

Efficient, Transparent, and Comprehensive Managed Program Execution. Derek Bruening, Determina Corporation. Typical modern application: IIS. Design goals: efficient (near-native performance), transparent (match native behavior), comprehensive (control every instruction, in any application).


Presentation Transcript


  1. Efficient, Transparent, and Comprehensive Managed Program Execution • Derek Bruening • Determina Corporation

  2. Typical Modern Application: IIS

  3. Design Goals • Efficient: near-native performance • Transparent: match native behavior • Comprehensive: control every instruction, in any application • Customizable

  4. Managed Program Execution Engine • First software system that can manipulate, at runtime, every instruction an arbitrary application executes, with: • Minimal performance penalty • Full transparency • Exports interface for building custom tools • No modifications to the hardware, operating system, or application

  5. Challenges of Real-World Apps • Multiple threads • Cache management • Application introspection • Inter-process manipulation: hooks • Transparency corner cases are the norm • Scalability • Must adapt to varying code sizes, thread counts, etc.

  6. Outline • Efficient • Transparent • Comprehensive • Customizable

  7. Outline • Efficient • Software code cache • Traces • Base performance • Transparent • Comprehensive • Customizable

  8. Basic Interpreter START fetch decode execute Slowdown: ~300x

  9. Interpreter + Basic Block Cache • Non-control-flow instructions executed from software code cache • [Diagram components: START, basic block builder, dispatch, context switch, BASIC BLOCK CACHE (non-control-flow instructions)] • Slowdown: 300x → 25x • Shade [Cmelik 1994]

  10. Linking Direct Branches • Direct branch to existing block can bypass dispatch • [Diagram components: START, basic block builder, dispatch, context switch, BASIC BLOCK CACHE (non-control-flow instructions)] • Slowdown: 300x → 25x → 3x • Shade [Cmelik 1994]

  11. Linking Indirect Branches • Application address mapped to code cache • [Diagram components: START, basic block builder, dispatch, context switch, indirect branch lookup, BASIC BLOCK CACHE (non-control-flow instructions)] • Slowdown: 300x → 25x → 3x → 1.2x • Dynamo [Bala 2000]
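The "application address mapped to code cache" step can be pictured as a small hash table consulted on every indirect branch. The following is a minimal sketch under stated assumptions, not the engine's actual lookup routine: the names `ibl_table`, `ibl_add`, and `ibl_lookup`, the table size, and the use of linear probing are all hypothetical, and application address 0 is reserved to mean "empty slot".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names and sizes; illustration only. */
#define TABLE_BITS 8
#define TABLE_SIZE (1u << TABLE_BITS)

typedef struct {
    uintptr_t app_pc;   /* application address (0 marks an empty slot) */
    uintptr_t cache_pc; /* address of the translated block in the code cache */
} ibl_entry;

typedef struct {
    ibl_entry slots[TABLE_SIZE];
} ibl_table;

static unsigned ibl_hash(uintptr_t app_pc) {
    return (unsigned)(app_pc & (TABLE_SIZE - 1));
}

/* Record that app_pc has been translated to cache_pc.
 * Returns 1 on success, 0 if the table is full. */
int ibl_add(ibl_table *t, uintptr_t app_pc, uintptr_t cache_pc) {
    unsigned start = ibl_hash(app_pc);
    for (unsigned n = 0; n < TABLE_SIZE; n++) {
        ibl_entry *e = &t->slots[(start + n) & (TABLE_SIZE - 1)];
        if (e->app_pc == 0 || e->app_pc == app_pc) {
            e->app_pc = app_pc;
            e->cache_pc = cache_pc;
            return 1;
        }
    }
    return 0;
}

/* Map an indirect branch target to its code cache address.
 * Returns 0 on a miss, in which case control would go back to dispatch. */
uintptr_t ibl_lookup(const ibl_table *t, uintptr_t app_pc) {
    unsigned start = ibl_hash(app_pc);
    for (unsigned n = 0; n < TABLE_SIZE; n++) {
        const ibl_entry *e = &t->slots[(start + n) & (TABLE_SIZE - 1)];
        if (e->app_pc == app_pc)
            return e->cache_pc;
        if (e->app_pc == 0)
            return 0; /* reached an empty slot: target not yet translated */
    }
    return 0;
}
```

A hit keeps execution inside the code cache; only a miss pays for the context switch back to dispatch, which is consistent with the drop in slowdown the slide reports.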

  12. Picking Traces • [Diagram components: START, basic block builder, trace selector, dispatch, context switch, indirect branch lookup, BASIC BLOCK CACHE (non-control-flow instructions), TRACE CACHE (non-control-flow instructions), check: indirect branch stays on trace?] • Slowdown: 300x → 25x → 3x → 1.2x → <1.1x • Dynamo [Bala 2000]
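One common way to pick traces is to count executions of candidate trace heads and, once a head becomes hot, record the blocks that execute next. The sketch below assumes that scheme; the slide does not spell out the engine's selection policy. The threshold, the size limit, and every name here (`block_info`, `trace_builder`, `trace_on_block`) are hypothetical, and the "indirect branch stays on trace?" check is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

#define HOT_THRESHOLD 50    /* assumed value, for illustration only */
#define MAX_TRACE_BLOCKS 8  /* assumed cap on trace length */

typedef struct {
    unsigned exec_count;  /* executions seen while not recording */
    bool is_trace_head;   /* candidate starting point for a trace */
} block_info;

typedef struct {
    int blocks[MAX_TRACE_BLOCKS]; /* block ids, in execution order */
    size_t len;
    bool recording;
} trace_builder;

/* Called by the (hypothetical) dispatcher each time block `id` is about to run.
 * `ends_trace` says whether this block should terminate a trace being recorded.
 * Returns true when a trace has just been completed in tb->blocks[0..len). */
bool trace_on_block(trace_builder *tb, block_info *blocks, int id, bool ends_trace) {
    if (tb->recording) {
        tb->blocks[tb->len++] = id;
        if (ends_trace || tb->len == MAX_TRACE_BLOCKS) {
            tb->recording = false;
            return true;
        }
        return false;
    }
    block_info *b = &blocks[id];
    if (b->is_trace_head && ++b->exec_count >= HOT_THRESHOLD) {
        /* Head is hot: start a new trace beginning with this block. */
        b->exec_count = 0;
        tb->len = 0;
        tb->blocks[tb->len++] = id;
        tb->recording = true;
    }
    return false;
}
```

Recording the blocks that actually execute after a hot head stitches a frequently taken path into one contiguous sequence for the trace cache.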

  13. Outline • Efficient • Transparent • Rules of transparency • Cache consistency • Comprehensive • Customizable

  14. Transparency • Do not want to interfere with the semantics of the program • Dangerous to make any assumptions about: • Register usage • Calling conventions • Stack layout • Memory/heap usage • I/O and other system call use

  15. Painful, But Necessary • Difficult and costly to handle corner cases • Many applications will not notice… • …but some will! • Non-exceptional exceptions: Adobe Photoshop • Stack convention violations: Microsoft Office • Self-modifying code: Adobe Premiere

  16. Rule 1: Avoid resource conflicts • [Diagrams labeled: Windows, Linux]

  17. Rule 2: If it’s not broken, don’t change it • Threads • Executable on disk • Application data • Including the stack!

  18. Example Transparency Violation • [Chart: "Error" markers across benchmark groups SPEC CPU2000, Server, and Desktop]

  19. Rule 3: If you change it, emulate original behavior’s visible effects • Application addresses • Address space • Error transparency • Code cache consistency

  20. Cache Consistency

  21. Detecting Code Changes • Memory unmap • Example: shared library being unloaded • Detect by monitoring system calls (munmap, NtUnmapViewOfSection) • Memory modification • Dynamically modified code • IA-32 keeps icache consistent in hardware...

  22. Detecting Code Changes • Solution: • Page protection when rarely written • Instrumentation when frequently written or when writer and target on same page
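The choice between the two detection mechanisms can be written as a small decision function. This is a sketch of the rule as stated on the slide, under stated assumptions: the cut-off for "frequently written" is an invented number, and `choose_monitor`, `monitor_kind`, and `CODE_PAGE_SIZE` are hypothetical names.

```c
#include <stdint.h>

#define CODE_PAGE_SIZE 4096u          /* assumed page size */
#define FREQUENT_WRITE_THRESHOLD 10u  /* assumed cut-off for "frequently written" */

typedef enum {
    MONITOR_PAGE_PROTECT, /* mark the code page read-only and catch write faults */
    MONITOR_INSTRUMENT    /* add explicit checks to the cached code instead */
} monitor_kind;

static uintptr_t page_of(uintptr_t addr) {
    return addr / CODE_PAGE_SIZE;
}

/* writer_pc:   address of the instruction performing the write
 * target_addr: address of the code being written
 * writes_seen: number of writes to this region observed so far */
monitor_kind choose_monitor(uintptr_t writer_pc, uintptr_t target_addr,
                            unsigned writes_seen) {
    /* Writer and target on the same page: protecting the page would
     * also interfere with the code that is doing the writing. */
    if (page_of(writer_pc) == page_of(target_addr))
        return MONITOR_INSTRUMENT;
    /* Frequently written code would fault on every write under protection. */
    if (writes_seen >= FREQUENT_WRITE_THRESHOLD)
        return MONITOR_INSTRUMENT;
    return MONITOR_PAGE_PROTECT;
}
```

Page protection costs nothing until a write happens, so it suits rarely written code; instrumentation is the fallback for the cases where protection would fault too often or would cover the writer itself.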

  23. Outline • Efficient • Transparent • Comprehensive • Kernel-mediated control transfers • Customizable

  24. Kernel-Mediated Control Transfers • [Timeline diagram, user mode vs. kernel mode: message pending, save user context → message handler (callout: majority of executed code in a typical Windows application) → no message pending, restore context]

  25. Challenges • Interception • Set up own handler in place of original • Continuation • May never return to interrupted state • Self-interruption • Kernel emulation

  26. Intercept and Re-direct Messages • [Timeline diagram, user mode vs. kernel mode: message pending, save user context → intercept → message handler → no message pending, restore context] • Mojo [Chen 2000]

  27. Kernel Emulation • Exception and signal handlers are passed machine context of the faulting instruction • For transparency, that context must be translated from the code cache to the original code location • [Diagram: two user contexts, each pointing at a faulting instr.]
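The program-counter part of that translation can be sketched as a lookup from a code cache address back to the application instruction it was generated from. This is an illustration under stated assumptions, not the engine's mechanism: `xl8_entry`, `fragment`, and `translate_pc` are hypothetical names, the per-fragment map is assumed to be sorted by cache address, and restoring registers or other machine state is not covered.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t cache_pc; /* where the translated instruction starts in the code cache */
    uintptr_t app_pc;   /* the original application instruction it came from */
} xl8_entry;

typedef struct {
    uintptr_t cache_start; /* fragment occupies [cache_start, cache_end) */
    uintptr_t cache_end;
    const xl8_entry *map;  /* sorted by cache_pc, ascending */
    size_t n;
} fragment;

/* Translate the pc in a fault context from the code cache to the original code.
 * Code inserted by the engine or a client is attributed to the application
 * instruction it precedes in the map order. Returns 0 if fault_pc is not
 * inside this fragment. */
uintptr_t translate_pc(const fragment *f, uintptr_t fault_pc) {
    if (fault_pc < f->cache_start || fault_pc >= f->cache_end)
        return 0;
    uintptr_t app = 0;
    for (size_t i = 0; i < f->n && f->map[i].cache_pc <= fault_pc; i++)
        app = f->map[i].app_pc;
    return app;
}
```

The handler would then be given the translated address in place of the code cache address, so the application never observes that it was running from the cache.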

  28. Outline • Efficient • Transparent • Comprehensive • Customizable • Client Hooks • API • Examples

  29. Clients • The engine exports an API for building a client • System details abstracted away: client focuses on manipulating the code stream

  30. Client Hooks • [Diagram: the execution engine (START, basic block builder, trace selector, dispatch, context switch, indirect branch lookup, BASIC BLOCK CACHE, TRACE CACHE, check: indirect branch stays on trace?) with three boxes labeled "client"]

  31. Client Hooks: Code Stream • Application code stream • Basic block creation • Trace creation • Client has opportunity to inspect and potentially modify every single application instruction, immediately before it executes

  32. Client Hooks: Bookkeeping • Initialization and Exit • Entire process • Each thread • Basic block and trace deletion during cache management

  33. Client API • Code manipulation • IR • Saving eflags, spilling registers • Processor feature identification • Transparency support • Separate I/O and memory allocation • Thread support • Thread-local memory, simple mutexes

  34. Instruction Representation • Costly to decode and encode IA-32 • Variable length • Specialized instruction templates • Complex decoding/encoding heuristics • Often only interested in high-level information for subset of instructions • Many instructions copied to cache unmodified • Solution: adaptive level of detail

  35. API Highlights • Clean calls • Branch instrumentation • Adaptive code transformation • Custom traces • Custom exit stubs and prefixes • Standalone library support

  36. Adaptive Code Transformation • Re-decode fragment in cache • Replace fragment in cache • Even while executing inside of it • Works by creating a new fragment and shifting all incoming links to it
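Replacing a fragment by "creating a new fragment and shifting all incoming links to it" can be modeled with a list of incoming links per fragment. The sketch below is a toy model under stated assumptions: `fragment_t`, `link_t`, `add_incoming`, `replace_fragment`, and the fixed link capacity are all hypothetical and do not reflect the engine's real data structures.

```c
#include <stddef.h>

#define MAX_INCOMING 4 /* assumed small fixed capacity, for illustration */

struct fragment;

/* An exit branch in some fragment, currently pointing at `target`. */
typedef struct link {
    struct fragment *target;
} link_t;

typedef struct fragment {
    int version;                    /* stands in for the fragment's code */
    link_t *incoming[MAX_INCOMING]; /* links that branch to this fragment */
    size_t n_incoming;
} fragment_t;

/* Point link l at fragment f. Returns 1 on success, 0 if f has no room. */
int add_incoming(fragment_t *f, link_t *l) {
    if (f->n_incoming == MAX_INCOMING)
        return 0;
    f->incoming[f->n_incoming++] = l;
    l->target = f;
    return 1;
}

/* Redirect every link that targets old_f so that it targets new_f.
 * old_f itself is left untouched, so a thread still executing inside it
 * can finish; it simply can no longer be entered through a link.
 * Returns 1 on success, 0 if new_f cannot hold all the links. */
int replace_fragment(fragment_t *old_f, fragment_t *new_f) {
    if (new_f->n_incoming + old_f->n_incoming > MAX_INCOMING)
        return 0;
    for (size_t i = 0; i < old_f->n_incoming; i++)
        add_incoming(new_f, old_f->incoming[i]);
    old_f->n_incoming = 0;
    return 1;
}
```

Because the old copy is not modified in place, the replacement is safe "even while executing inside of it": the old code stays valid until execution leaves it, and every later entry arrives at the new fragment.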

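  37. Example Client

      /* Basic block hook: insert an increment of a global counter before
       * every system call instruction. OF_slot and counter are globals
       * that the slide does not show. */
      EXPORT void dynamorio_basic_block(void *cxt, app_pc tag, InstrList *bb)
      {
          Instr *instr;
          for (instr = instrlist_first(bb); instr != NULL;
               instr = instr_get_next(instr)) {
              if (instr_is_syscall(instr)) {
                  /* inc modifies arithmetic flags, so preserve the application's */
                  dr_save_arith_flags(cxt, bb, instr, &OF_slot);
                  instrlist_preinsert(bb, instr,
                      INSTR_CREATE_inc(cxt, OPND_CREATE_MEM32(REG_NULL, &counter)));
                  dr_restore_arith_flags(cxt, bb, instr, &OF_slot);
              }
          }
      }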

  38. Dynamic Optimization Examples • Adaptive • Tune for current behavior, not single profile run • Microarchitecture-specific • Specialize to underlying processor • Inter-module • All code is available • Traditional static optimizations • Vendor may not have applied all optimizations

  39. Pentium 4?

      static bool enable;
      static bool replace_inc_with_add(void *drcontext, Instr *instr,
                                       InstrList *trace);

      EXPORT void dynamorio_init()
      {
          enable = (proc_get_family() == FAMILY_PENTIUM_IV);
      }

      EXPORT void dynamorio_trace(void *drcontext, app_pc tag, InstrList *trace)
      {
          Instr *instr, *next_instr;
          int opcode;
          if (!enable)
              return;
          for (instr = instrlist_first_expanded(trace); instr != NULL;
               instr = next_instr) {
              /* fetch the successor first: instr may be replaced below */
              next_instr = instr_get_next_expanded(instr);
              opcode = instr_get_opcode(instr);
              /* Look for inc / dec */
              if (opcode == OP_inc || opcode == OP_dec)
                  replace_inc_with_add(drcontext, instr, trace);
          }
      }

      static bool replace_inc_with_add(void *drcontext, Instr *instr,
                                       InstrList *trace)
      {
          Instr *in;
          uint eflags;
          int opcode = instr_get_opcode(instr);
          bool ok_to_replace = false;
          /* Ensure eflags change ok: add/sub write CF but inc/dec do not, so
           * CF must be overwritten before it is read or the trace is exited */
          for (in = instr; in != NULL; in = instr_get_next_expanded(in)) {
              eflags = instr_get_arith_flags(in);
              if ((eflags & EFLAGS_READ_CF) != 0)
                  return false;
              if ((eflags & EFLAGS_WRITE_CF) != 0) {
                  ok_to_replace = true;
                  break;
              }
              if (instr_is_exit_cti(in))
                  return false;
          }
          if (!ok_to_replace)
              return false;
          /* Replace with add / sub */
          if (opcode == OP_inc)
              in = INSTR_CREATE_add(drcontext, instr_get_dst(instr, 0),
                                    OPND_CREATE_INT8(1));
          else
              in = INSTR_CREATE_sub(drcontext, instr_get_dst(instr, 0),
                                    OPND_CREATE_INT8(1));
          instr_set_prefixes(in, instr_get_prefixes(instr));
          instrlist_replace(trace, instr, in);
          instr_destroy(drcontext, instr);
          return true;
      }
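The eflags check is the subtle part of this client: on IA-32, inc and dec leave the carry flag untouched while add and sub overwrite it, so the substitution is only safe when nothing can observe the old CF afterwards. The following standalone model of that scan is a sketch under stated assumptions. It does not use the engine's API; the flag bits and the name `cf_dead_after` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-instruction summary bits. */
enum {
    F_READ_CF  = 1 << 0, /* instruction reads the carry flag */
    F_WRITE_CF = 1 << 1, /* instruction overwrites the carry flag */
    F_EXIT     = 1 << 2  /* control may leave the trace here */
};

/* effects[0..n) summarize the instructions that follow an inc/dec.
 * Returns true only if CF is overwritten before anything reads it and
 * before any trace exit, i.e. the old CF value is provably dead. */
bool cf_dead_after(const unsigned *effects, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (effects[i] & F_READ_CF)
            return false; /* old CF is observed (checked before write: adc-style) */
        if (effects[i] & F_WRITE_CF)
            return true;  /* old CF is clobbered before any use */
        if (effects[i] & F_EXIT)
            return false; /* code outside the trace might read CF */
    }
    return false;         /* ran off the end without proof: be conservative */
}
```

For example, the sequence "plain instruction, then a CF-writing instruction" yields true, while a sequence whose first instruction both reads and writes CF yields false. The check is conservative in the same way as the slide's loop: any doubt leaves the original inc or dec in place.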

  40. Summary • First software system that can manipulate code at runtime in a manner that is: • Efficient: minimal performance penalty • Transparent: unperturbed native behavior • Comprehensive: every instruction an arbitrary application executes • Customizable: can build runtime tools
