1 / 36

JIT Instrumentation – A Novel Approach To Dynamically Instrument Operating Systems

JIT Instrumentation – A Novel Approach To Dynamically Instrument Operating Systems. Marek Olszewski Keir Mierle Adam Czajkowski Angela Demke Brown University of Toronto. Instrumenting Operating Systems. Operating systems are growing in complexity Kernel instrumentation can help

jolene
Download Presentation

JIT Instrumentation – A Novel Approach To Dynamically Instrument Operating Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. JIT Instrumentation – A Novel Approach To Dynamically Instrument Operating Systems Marek Olszewski Keir Mierle Adam Czajkowski Angela Demke Brown University of Toronto

  2. Instrumenting Operating Systems • Operating systems are growing in complexity • Kernel instrumentation can help • Used for: debugging, profiling, monitoring, and security auditing... • Dynamic instrumentation • No recompilation & no reboot • Good for debugging systemic problems • Feasible in production settings

  3. Current Approach: Probe-Based • Dynamic instrumentation tools for OSs are probe based • Overwrite existing code with jump/trap • Efficient on fixed length architectures • Slow on variable length architectures • Not safe to overwrite multiple instructions with jump: • Branch to between instructions might exist • Thread might be sleeping in between the instructions • Must use trap instruction

  4. Current Approach: Trap-based Area of interest Trap Handler Instrumentation Code sub $6c,esp • Save processor state • Lookup which instrumentation to call • Call instrumentation • Emulate overwritten instruction • Restore processor state mov $ffffe000,edx add $1,count_l adc $0,count_h and esp,edx int3 inc 14(edx) mov 28(edi),eax mov 2c(edi),ebx mov 30(edi),ebp add $1,eax and $3,eax or $c, eax mov eax,(ebx) add $2,ebp or $f, ebp mov ebp,4(ebx) Very Expensive!

  5. Alternative: JIT Instrumentation • Propose to use just-in-time dynamic instrumentation • Rewrite code to insert new instructions in between existing ones • More Efficient. • More Powerful. Supports: • Instrumenting branch directions • Basic block-level instrumentation • Per execution-path instrumentation • Proven itself in user space (Pin, Valgrind)

  6. JIT Instrumentation sub $6c,esp mov $ffffe000,edx and esp,edx inc 14(edx) mov 28(edi),eax mov 2c(edi),ebx mov 30(edi),ebp add $1,eax and $3,eax or $c, eax mov eax,(ebx) add $2,ebp or $f, ebp mov ebp,4(ebx) Area of Interest Code Cache Instrumentation Code sub $6c,esp mov $ffffe000,edx add $1,count_l and esp,edx adc $0,count_h inc 14(edx) mov 28(edi),eax mov 2c(edi),ebx mov 30(edi),ebp add $1,eax and $3,eax or $c, eax mov eax,(ebx) add $2,ebp or $f, ebp mov ebp,4(ebx)

  7. JIT Instrumentation inc 14(edx) mov 28(edi),eax mov 2c(edi),ebx mov 30(edi),ebp add $1,eax and $3,eax or $c, eax mov eax,(ebx) add $2,ebp or $f, ebp mov ebp,4(ebx) Area of Interest Code Cache Instrumentation Code sub $6c,esp sub $6c,esp mov $ffffe000,edx mov $ffffe000,edx add $1,count_l and esp,edx and esp,edx adc $0,count_h inc 14(edx) pushf mov 28(edi),eax callinstrmtn mov 2c(edi),ebx popf mov 30(edi),ebp add $1,eax and $3,eax or $c, eax mov eax,(ebx) add $2,ebp or $f, ebp mov ebp,4(ebx)

  8. JIT Instrumentation add $1,count_l adc $0,count_h popf inc 14(edx) mov 28(edi),eax mov 2c(edi),ebx mov 30(edi),ebp add $1,eax and $3,eax or $c, eax mov eax,(ebx) add $2,ebp or $f, ebp mov ebp,4(ebx) Area of Interest Code Cache Instrumentation Code sub $6c,esp sub $6c,esp mov $ffffe000,edx mov $ffffe000,edx and esp,edx and esp,edx inc 14(edx) pushf mov 28(edi),eax callinstrmtn mov 2c(edi),ebx mov 30(edi),ebp add $1,eax and $3,eax or $c, eax mov eax,(ebx) add $2,ebp or $f, ebp mov ebp,4(ebx)

  9. Dynamic Binary Rewriting • Use dynamic binary rewriting to insert the new instructions. • Interleaves binary rewriting with execution • Performed by a runtime system • Typically at basic block granularity • Code is rewritten into a code cache • Rewritten code must be: • Efficient • Unaware of its new location

  10. Dynamic Binary Rewriting Original Code: Code Cache: bb1 bb1 bb1 bb2 bb3 bb4 Runtime System

  11. Dynamic Binary Rewriting Original Code: Code Cache: bb1 bb1 bb2 bb2 bb3 bb2 bb4 Runtime System

  12. Dynamic Binary Rewriting Original Code: Code Cache: bb1 bb1 bb1 bb2 bb3 bb2 bb2 bb4 bb4 bb4 No longer need to enter runtime system Runtime System

  13. Dynamic Binary Rewriting • Used for rewriting operating systems • Virtualization (VMware) • Emulation (QEMU) • Never used for instrumentation of OSs • Never used to rewrite host OS in a general manner • Allows instrumentation of live system

  14. Outline • Prototype (JIFL) • Design • OS Issues • Performance comparison • Kprobes vs JIFL • Example Plugin • Checking branch hint directions

  15. Prototype Design • JIFL - JIT Instrumentation Framework for Linux • Instruments code reachable from system calls

  16. JIFL Software Architecture JIFL Plugin Starter JIFL Plugin Starter User Space Linux Kernel (All code reachable from system calls) JIFL Plugin (Loadable Kernel Module) Kernel Space JIFL (Loadable Kernel Module) Runtime System Code Cache Dispatcher JIT Compiler Heap Memory Allocator

  17. Gaining Control • Runtime System must gain control before it can start rewriting/instrumenting OS • Update system call table entry to point to dynamically emitted entry stub • Calls per-system call instrumentation • Calls dispatcher • Passing original system call pointer

  18. Dispatcher • Saves registers and condition code states • Dispatcher checks if target basic block is in code cache • If so it jumps to this basic block • Otherwise it invokes the JIT to compile and instrument the new basic block

  19. JIT Compiler • Like conventional JIT compiler, except its input/output is x86 machine code • Compiles at a dynamic basic block granularity • All but the last control flow instruction are copied directly into the code cache • Control flow instructions are modified to account for the new location of the code • Communicates with the JIFL plugin to determine what instrumentation to insert

  20. JIT: Inserting Instrumentation • Instrumentation is added by inserting a call instruction into the basic block • Additional instructions are also needed to: • Push/Pop instrumentation parameters • Save/Restore volatile registers (eax, edx, ecx) • Save/Restore condition code register • Several optimizations can be performed to reduce instrumentation cost

  21. Eliminating Redundant State Saving • Eliminate dead register and condition code saving code • Perform liveness analysis • Reduce state saving overhead • Per-basic block Instrumentation • Search for the cheapest place to insert it

  22. Inlining Instrumentation • Small instrumentation can be inlined into the basic block • Removes the call and ret instructions • Constant parameters are propagated to remove stack accesses • Copy propagation and dead-code elimination is applied to specialize the instrumentation routine for context • All done on native x86 code. No IR!

  23. Effect of Optimizations • Average system call latencies with per-basic block instrumentation Normalized Execution Time

  24. Prototype • Operating System Issues

  25. Memory Allocator • While JITing JIFL often needs to allocate dynamic memory • Cannot rely on Linux kmalloc and vmalloc routines as they are not reentrant • Instead, we created our own memory allocator • Pre-allocate a heap when JIFL starts up

  26. Releasing Control • Calls to schedule() have to be redirected • Otherwise, JIFL keeps control even after context switch • Have to: • Save return address in hash table • Call schedule() • Look up and call dispatcher

  27. Performance Comparison • JIFL vs. Kprobes

  28. Performance Evaluation • Instrument every system call with three types of instrumentation: • System Call Monitoring (Coarse Grained) • Call Tracing (Medium Grained) • Basic Block Counting (Fine Grained) • LMbench and ApacheBench2 benchmarks • Test Setup • 4-way Intel Pentium 4 Xeon SMP - 2.8GHz • Linux 2.6.17.13 • With SMP support and no preemption

  29. System Call Monitoring Normalized Execution Time

  30. Call Tracing Normalized Execution Time Log Scale

  31. Basic Block Counting Normalized Execution Time Log Scale

  32. Apache Throughput Normalized Requests / Second

  33. Example Plugin • Checking Correctness of Branch Hints

  34. Example Plugin: Checking Branch Hints Int correct_count 0 Int incorrect_count  0 // Called for every newly discovered basic block. Procedure Basic_Block_Callback if last instruction is not a hinted branch return if hinted in the branch not taken direction call Insert_Branch_Not_Taken_Instrumentation( Increment_Counter, &correct_count) call Insert_Branch_Taken_Instrumentation( Increment_Counter, &incorrect_count) else // Insert same instrumentation but for reverse // branch directions // Executed for every instrumented branch. Procedure Increment_Counter(Int *Counter) *Counter  *Counter + 1

  35. Example Plugin: Checking Branch Hints • 5 system calls with bad branch hint performance • Misprediction rates > 75% • Contained > 30% of hinted branch executed • Examined using a second plugin • Monitored individual branches • Found 4 greatest contributors • Mapped back to source code • Can’t fix: Not hinted by programmer!

  36. Conclusions • JIT instrumentation viable for operating systems • Developed a prototype for the Linux kernel (JIFL) • Results are very competitive • JIFL outperforms Kprobes by orders of magnitude • Enables more powerful instrumentation • e.g. Branch Hints

More Related