290 likes | 461 Views
Binary Analysis and Rewriting. Min Gyung Kang Stephen McCamant Pongsin Poosankam Dawn Song UC, Berkeley. Arvind Ayyangar Niranjan Hasabnis Alireza Saberi Tung Tran R. Sekar Stony Brook University. Motivation.
E N D
Binary Analysis and Rewriting Min Gyung Kang Stephen McCamant Pongsin Poosankam Dawn Song UC, Berkeley Arvind Ayyangar Niranjan Hasabnis Alireza Saberi Tung Tran R. Sekar Stony Brook University
Motivation • A popular approach for protecting applications from untrusted OS is to rely on a trusted VMM • Binary translation is one of the commonly used implementation technologies in VMMs • QEMU, earlier versions of VMWare, … • Benefits: No need for hardware support, applicable to COTS binaries, whole system can be instrumented • Unfortunately, existing binary translators unsuited for enforcing higher level properties • Information flow, control-flow integrity, object-granularity memory safety, … • Incur very high overheads (4x to 10x slowdown), or are simply unable to express certain properties
Our Approach • Develop novel static analysis based methods to overcome the drawbacks of today’s techniques • Robust, scalable static analysis of low-level code • From different compilers, or hand-coded assembly • Accurate disassembly of binary code • Indirect control-flow transfers, non-standard call/return conventions, mixing of data and code, … • Accurate reasoning about key properties • Dynamic taint analysis
Static analysis of low-level code • Scalability: requires modular analysis • Analyze functions individually, compose results • Avoids repeated analysis of same code (esp. libraries) • Strength: requires accurate reasoning about variables (esp. local variables) • Challenges in low-level binary code • Difficult to identify parameter passing in optimized code • Missing pushes, parameter passing via registers,… • Difficult to distinguish local variables from other accesses • Caller/callee-saved registers, stack pointer conventions, …
Static analysis of low-level code • To solve these challenges, previous approaches • make optimistic assumptions, or rely on compiler idioms • often fail on optimized code and/or large programs • don’t work for other compilers, or hand-written assembly • Our solution: Develop a new approach that • Uses systematic analysis to reduce assumptions/heuristics • Accurately tracks local variables by analyzing values held in registers and on the stack
Stack Analysis • Analyzes one function at a time • Examines the use of stack to • Determine parameters • Number of them, whether in registers or on stack • Caller- and callee-saved registers • Summarize effect on parameters • Preservation of SP, return to caller, changes in parameter or register contents,… ESP RETURN ADDR ƒ
Abstract Interpretation for Stack Analysis EBP ESP Base_BP +[0,0] Base_SP +[0,0] <ƒ> : push %ebp mov %esp, %ebp sub $16, %esp Base_SP 0 Activation Record LATTICE
Abstract Interpretation for Stack Analysis EBP ESP Base_BP +[0,0] Base_SP +[-4,-4] <ƒ> : push %ebp mov %esp, %ebp sub $16, %esp Base_SP 0 Activation Record Base_BP+[0,0] -4 LATTICE
Stack Analysis (contd) <f>: push %ebp mov %esp, %ebp sub $16, %esp mov 8(%ebp), %eax add $3, %eax mov %eax, 8(%ebp) mov $7, -12(%ebp) mov 12(%ebp), %edx mov %edx, -8(%ebp) leave ret • Summary for f: • No change to ESP • Two input parameters on stack • EAX, EDX, arg1 changed as shown • Others unchanged Base_SP+[-20,-20] Base_SP args locals
Background: Disassembly Techniques • Linear sweep algorithm • Start with program entry point, proceed to disassemble instructions sequentially • Key assumption: all instructions appear one after the next, without any gaps • Violated in most code (presence of data or padding) • Recursive Traversal Algorithm • After a control-flow transfer instruction (CTI), proceed to disassemble target address • For conditional CTI and non-CTI, proceed to disassemble next instruction • Key problems • Code reached only through indirect CTIs • Functions that don’t return in the usual way
Our Approach for Disassembly • Assumption • No code obfuscation • Non-assumptions • Function prologue and epilogue patterns • Compiler idioms or (lack of) optimizations • Approach • Use recursive traversal • Use stack analysis to compute/verify return targets • Develop new analysis to determine targets of indirect control-flow transfers
Our Approach: Type inference • Key insight: Code pointer values don’t undergo arithmetic or other transformations • Implication: values assigned to code pointers must represent indirect CTI targets • Achieves much better results than data flow analysis • Avoids global def-use problem, which is very hard in low-level languages • Compute sets C of possible code addresses and C of definite code addresses • Code at addresses in C can be safely disassembled • Code at addresses not in C can be safely relocated
Static Disassembly: Preliminary Results • Analysis of disassembler on 'ls' binary
Static Disassembly: Preliminary Results • Gap in dhclient due to incomplete implementation, dealing with global arrays
DTA++: Improving accuracy of Dynamic Taint Analysis [NDSS 2011]
Under-tainting and Over-tainting • Results vary based on which values are considered to depend on others:
Under-tainting and Over-tainting • Results vary based on which values are considered to depend on others: • Too few dependencies lead to under-tainting
Under-tainting and Over-tainting • Results vary based on which values are considered to depend on others: • Too many dependencies lead to over-tainting
Key Idea • Under-tainting occurs when control flow state represents (almost) all of the information in inputs • Key idea: propagate taint only for control dependencies that would cause under-tainting (culprit implicit flows)
Key Idea • Under-tainting occurs when control flow state represents (almost) all of the information in inputs • Key idea: propagate taint only for control dependencies that would cause under-tainting (culprit implicit flows) 1 char output[256]; 2 char input = next_in(); 3 long len = 0; 4 if (input == '{') { 5 output[0] = '\\'; 6output[1] = '{'; 7 len = 2; 8 }
DTA++ Approach Overview • Hypothesis: under-tainting occurs at just a few locations in a program (culprit branches) • Approach: find these locations in advance, and construct new taint propagation rules for them • Assumption: we are given test inputs that demonstrate the under-tainting
Approach Details • Under-tainting Detection Predicate • Given a (partial) execution trace t, φ(t) holds if t contains a culprit implicit flow • Implementation • Use symbolic execution to count how many other inputs could take the same execution path as t • Few or none → φ(t) = true • Search for Culprit Branches • Find shortest prefix of t that satisfies φ • the last instruction in the prefix is the culprit • Remove culprit, repeat the search to find others
Summary and Future Work • Develop novel static analysis based methods to overcome the drawbacks of today’s techniques • Robust, scalable static analysis of low-level code • Accurate disassembly of binary code • Accurate reasoning about key properties • Dynamic taint analysis • Future work • Experimentation and evaluation of stack analysis and disassembly • Robust and efficient binary instrumentation for information flow and related properties • Application to hostile OS defense