Efficient x86 Instrumentation: Dynamic Rewriting and Function Relocation
Itai Gurari
gurari@cs.wisc.edu
Computer Science Department, University of Wisconsin
1210 W. Dayton St., Madison, WI 53706-1685
Paradyn/Condor Week, Madison, WI, March 12-14, 2001

Introduction
Dynamic instrumentation:
• Inserts instrumentation into an application while it is executing
• Used by Paradyn to gather performance data
• Paradyn inserts instrumentation at three types of points: function entry, function exit, and call

Paradyn Instrumentation Points
Executable code: foo () { call <bar> }
Instrumentation attached to each point:
• Entry: startTimer()
• Call (at call <bar>): counter++
• Exit: stopTimer()

Goal
Transfer from a function to its instrumentation code as quickly as possible.

Control Transfer
To switch execution from a function to its instrumentation code:
• Overwrite instructions in the function with a control transfer instruction.
• Copy equivalents of the overwritten instructions to the code patch area.
• On the x86, Paradyn uses a 5-byte jump by default to transfer control to the instrumentation code.
• The 5-byte jump can reach the whole address space.
• If a 5-byte instruction won’t fit, we use a 1-byte trap (the int3 instruction).
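The two control transfer instructions can be sketched at the byte level. This is a minimal illustration, not Paradyn's code: the function names are hypothetical, and it assumes the standard x86 encodings (opcode 0xE9 followed by a 32-bit displacement measured from the end of the jump, and opcode 0xCC for int3).

```python
import struct

JMP_REL32_OPCODE = 0xE9  # near jump with a 32-bit relative displacement
INT3_OPCODE = 0xCC       # one-byte breakpoint trap


def encode_jmp_rel32(jump_addr: int, target_addr: int) -> bytes:
    """Encode a 5-byte jump placed at jump_addr that lands on target_addr."""
    # The displacement is relative to the address of the next instruction,
    # i.e. jump_addr + 5. Masking wraps negative values to 32 bits.
    disp = (target_addr - (jump_addr + 5)) & 0xFFFFFFFF
    return bytes([JMP_REL32_OPCODE]) + struct.pack("<I", disp)


def encode_trap() -> bytes:
    """Encode the 1-byte trap used when a 5-byte jump does not fit."""
    return bytes([INT3_OPCODE])
```

Because the displacement is a full 32 bits, the jump reaches any address in a 32-bit address space, which is why it is the default choice.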

Inserting Control Transfer Instructions
• Dynamically rewrite the function in place
• Use different techniques for different types of instrumentation points

Jumps and Traps: Instrument Entry Point, Case 1
Function entry: push, mov, sub
• Enough room to replace the instructions with a jump
After: jmp <instrumentation>

Jumps and Traps: Instrument Entry Point, Case 2
Function entry: push, mov, jmp (the jmp is a backwards jump whose target lies in the entry instructions)
• Inserting a jump instruction interferes with the target of the backwards jump.
• Must use a trap instruction to get to the instrumentation.
After: int3, mov, jmp
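The choice between Case 1 and Case 2 comes down to whether any branch lands inside the bytes a jump would overwrite. The check below is a simplified sketch under stated assumptions: `can_use_jump` and its parameters are hypothetical names, branch targets are assumed to be known already, and a branch to the first overwritten byte is treated as harmless because it would simply execute the inserted jump.

```python
JUMP_LEN = 5  # size of the 5-byte x86 jump used for instrumentation


def can_use_jump(point_addr: int, branch_targets: set) -> bool:
    """Return True if a 5-byte jump at point_addr clobbers no branch target."""
    # A target strictly inside the jump would land in the middle of the
    # new instruction, so such a point needs a trap instead.
    return not any(
        point_addr < target < point_addr + JUMP_LEN
        for target in branch_targets
    )


# Case 2 above: push is 1 byte, so mov starts at entry + 1. A backwards
# jump to that mov falls inside the 5 bytes the jump would occupy.
entry = 0x8048000
print(can_use_jump(entry, set()))        # Case 1: no interfering branch
print(can_use_jump(entry, {entry + 1}))  # Case 2: must use a trap
```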

Jumps and Traps: Instrument Call Point
Call site: call <Foo>
• Enough room to replace the instruction with a jump
After: jmp <instrumentation>

Jumps and Traps: Instrument Exit Point, Case 1
Function exit: mov, leave, ret
• Back up far enough from the ret to replace the instructions with a jump
After: jmp <instrumentation>

Jumps and Traps: Instrument Exit Point, Case 2
Function exit: call <Foo>, leave, ret
• A jump placed by backing up from the ret interferes with the preceding call.

Jumps and Traps: Instrument Exit Point, Case 2a
Function exit: call <Foo>, leave, ret, followed by padding (? ? ?) up to the beginning of the next function (4-byte boundary)
• The compiler pads the gap after the ret with “bonus bytes”.
• Replace leave, ret, and the bonus bytes with a jump.
After: call <Foo>, jmp <instrumentation>

Jumps and Traps: Instrument Exit Point, Case 2b
Function exit: call <Foo>, leave, ret, followed by too little padding (?)
• Not enough “bonus bytes” (if any) to overwrite with a jump
• Overwrite the return with a trap.
After: call <Foo>, leave, int3, ?
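The split between Case 2a and Case 2b is a matter of counting bytes. The sketch below assumes the 1-byte encodings of leave and ret, so the exit itself offers 2 bytes and the padding must supply the remaining 3. The function name and its parameters are hypothetical.

```python
JUMP_LEN = 5   # 5-byte jump to instrumentation
LEAVE_LEN = 1  # leave encodes as a single byte
RET_LEN = 1    # near ret encodes as a single byte


def exit_transfer(bonus_bytes: int, exit_len: int = LEAVE_LEN + RET_LEN) -> str:
    """Pick the transfer for an exit that directly follows a call."""
    # The jump may cover the exit instructions plus any padding before
    # the next function, but it must not reach back into the call.
    return "jump" if exit_len + bonus_bytes >= JUMP_LEN else "trap"


print(exit_transfer(3))  # Case 2a: three bonus bytes, the jump fits
print(exit_transfer(1))  # Case 2b: too little padding, fall back to int3
```

With functions aligned on 4-byte boundaries there are at most three padding bytes, so under these assumptions only the maximum padding lets the jump fit.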

Jumps and Traps: Extra Slot
Function entry: push, mov, sub, mov
• No jumps to the first ten bytes of the function
• Enough space to overwrite the entry with a jump
• Make a 2-byte jump to the “extra slot”, and overwrite the “extra slot” with a jump to instrumentation.
After: jmp <instrumentation>, jmp <instrumentation>

Control Transfer: Traps on x86
• A trap generates an exception that is caught by either the application (Solaris, Linux) or the Paradyn daemon (Windows NT).
• The address of the trap instruction is used to calculate which instrumentation code to execute.
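The address-based lookup can be pictured as a table keyed by trap address. This is only an illustrative sketch: the table layout, the function names, and the handler callables are hypothetical, and it assumes the handler is given the saved program counter, which on x86 points one byte past the int3 that fired.

```python
INT3_LEN = 1  # int3 is a single byte


def dispatch_trap(trap_table: dict, saved_pc: int):
    """Run the instrumentation registered for the trap that just fired."""
    # Recover the address of the trap instruction from the saved PC.
    trap_addr = saved_pc - INT3_LEN
    handler = trap_table.get(trap_addr)
    if handler is None:
        raise LookupError(f"no instrumentation for trap at {trap_addr:#x}")
    return handler()


# Hypothetical table: one exit point instrumented with a trap.
table = {0x8048123: lambda: "stopTimer"}
print(dispatch_trap(table, 0x8048124))
```

Even this small lookup runs only after the exception has been delivered, which is where most of the cost of a trap comes from.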

Problem
Trap handling is slow:
• On Solaris 2.6, jumps are over 1000 times faster than traps.
• On Linux 2.2, jumps are over 200 times faster than traps.
Traps limit instrumentation:
• Can’t insert as much instrumentation, or at as fine a granularity
Trap handling logic is difficult:
• Susceptible to bugs
• Difficult to understand and maintain

Solution
Rewrite functions that do not have enough room for jumps into functions that do.
• Rewrite the function on the fly, combining dynamic instrumentation and binary rewriting.

Dynamic Rewriting
• Overwrite existing instructions
• Expand instrumentation points
• Relocate the function

Function Rewriting and Relocation
In Paradyn we rewrite a function:
• only if the function contains an instrumentation point that would require a trap to instrument
• the first time a request to instrument the function is made
• even if the instrumentation to be inserted is not for a point that requires a trap
  • e.g. the exit needs a trap, the entry can use a jump, and the request is to instrument the entry
When a function is rewritten, all instrumentation points that cannot use a jump are expanded.

Rewriting a Function
Function: push, mov, call <Foo>, call <Bar>, ret
Instrumentation points: entry (push, mov), call (call <Foo>), call (call <Bar>), exit (ret)
• Insert a nop at the entry (push, nop, mov) so that the entry can be overwritten with jmp <instrumentation>.
• Insert nops at the exit (nop, nop, nop, nop, ret) so that the exit can be overwritten with jmp <instrumentation>.
Rewritten function: jmp <instrumentation>, call <Foo>, call <Bar>, jmp <instrumentation>
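Expanding a point amounts to padding it with nops until a 5-byte jump fits. The helper below is a minimal sketch with hypothetical names: it works on raw bytes, always places the nops before the original instructions, and assumes the 1-byte nop (0x90) and the 1-byte ret (0xC3).

```python
JUMP_LEN = 5  # room needed for the 5-byte jump
NOP = 0x90    # 1-byte x86 nop


def expand_point(point_bytes: bytes) -> bytes:
    """Pad an instrumentation point with nops so a 5-byte jump fits."""
    missing = max(0, JUMP_LEN - len(point_bytes))
    return bytes([NOP]) * missing + point_bytes


# An exit that is a lone 1-byte ret needs four nops, as on the slide.
print(expand_point(b"\xc3").hex())
```

A point that is already 5 bytes or longer, such as a 5-byte call, comes back unchanged.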

Rewriting a Function: Original Function
Original function: push, mov, call <Foo>, call <Bar>, ret
• Overwrite the entry of the original function with a jump to the rewritten function.
After: jmp <rewritten function>, call <Foo>, call <Bar>, ret

Update Jumps and Calls
• PC-relative jump and call instructions:
  • those with destinations outside the function will have incorrect displacements
  • some jumps to locations inside the function will have incorrect displacements
• 2-byte jumps:
  • have a range of 127 bytes forward and 128 bytes backward, measured from the end of the instruction
  • if the target address is no longer in range, replace the 2-byte instruction with a 5-byte instruction that has further reach
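Both fix-ups follow from how PC-relative displacements are computed. The sketch below is illustrative rather than Paradyn's implementation: the function names are hypothetical, only unconditional jumps are handled (0xEB for the 2-byte form, 0xE9 for the 5-byte form), and displacements are measured from the end of the instruction.

```python
import struct


def relocate_rel32(old_addr: int, new_addr: int, old_disp: int) -> int:
    """New displacement for a 5-byte call or jump whose target did not move."""
    target = old_addr + 5 + old_disp  # absolute destination, unchanged
    return target - (new_addr + 5)    # measured from the new location


def encode_jump(jump_addr: int, target_addr: int) -> bytes:
    """Use the 2-byte jump when the target is in range, else the 5-byte one."""
    disp8 = target_addr - (jump_addr + 2)
    if -128 <= disp8 <= 127:
        return b"\xeb" + struct.pack("<b", disp8)
    disp32 = target_addr - (jump_addr + 5)
    return b"\xe9" + struct.pack("<i", disp32)


print(encode_jump(0x1000, 0x1010).hex())  # still fits in 2 bytes
print(encode_jump(0x1000, 0x1100).hex())  # widened to 5 bytes
```

Widening a jump adds three bytes and shifts every instruction after it, so a real rewriter has to recompute the displacements until the layout stops changing. This sketch leaves that out.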

Status
Dynamic rewriting and function relocation are operational in Paradyn release 3.2 for x86 (Solaris, Linux, Windows NT).

Current Limitations
We do not relocate a function if:
• the application is executing within the function we want to instrument
• the function has a jump table

Jumps vs. Traps
Average time to get to instrumentation and back (time in microseconds):

           Trap    Jump
Solaris    37.6    0.03
Linux       8.3    0.04
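The measurements can be checked against the earlier "over 1000 times" and "over 200 times" claims with a few lines of arithmetic. The snippet assumes the table reads as Solaris trap 37.6 and jump 0.03, Linux trap 8.3 and jump 0.04, which is the only assignment of the four values that matches those claims. The variable names are illustrative.

```python
# Average round-trip times in microseconds, as read from the table.
times_us = {
    "Solaris": {"trap": 37.6, "jump": 0.03},
    "Linux": {"trap": 8.3, "jump": 0.04},
}

# How many times faster a jump is than a trap on each platform.
ratios = {name: t["trap"] / t["jump"] for name, t in times_us.items()}

for name, ratio in ratios.items():
    print(f"{name}: jumps are about {ratio:.0f} times faster than traps")
```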

Jumps vs. Traps
• Relocating functions that are performance bottlenecks leads to the greatest speedup.
• More instrumentation can be inserted, since perturbation of the system is minimized.
• In Paradyn, the speedup ratio depends on the type of metric (e.g. CPU time, number of procedure calls).

Some Results
bubba (circuit layout)
• instrumented 9 functions for CPU
  • all required a trap for the exit point
• 5 relocated functions
  • called 400 thousand times
  • consumed 20% of CPU
• 23 seconds to execute using relocation
• 42 seconds to execute without relocation

Some Results
fspx (2-D heat transfer simulation)
• 4 of 46 functions required traps
  • all for exit points
• instrumented __atan for CPU
  • required a trap for the exit
  • called 107 million times
  • consumed 25% of CPU
• 7.5 minutes to execute using relocation
• 115 minutes to execute without relocation
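The speedups implied by the two sets of timings follow directly from the reported run times. The helper name below is illustrative, and each ratio is computed within one application, so the mix of seconds and minutes does not matter.

```python
def speedup(without_relocation: float, with_relocation: float) -> float:
    """Ratio of run time without relocation to run time with it."""
    return without_relocation / with_relocation


bubba = speedup(42, 23)   # seconds
fspx = speedup(115, 7.5)  # minutes

print(f"bubba: {bubba:.2f}x faster with relocation")
print(f"fspx: {fspx:.1f}x faster with relocation")
```

The much larger gain for fspx is consistent with its instrumented function being called 107 million times, against 400 thousand calls in bubba.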
