
Binary Analysis and Rewriting

Min Gyung Kang, Stephen McCamant, Pongsin Poosankam, Dawn Song (UC Berkeley). Arvind Ayyangar, Niranjan Hasabnis, Alireza Saberi, Tung Tran, R. Sekar (Stony Brook University).


Presentation Transcript


  1. Binary Analysis and Rewriting
     Min Gyung Kang, Stephen McCamant, Pongsin Poosankam, Dawn Song (UC Berkeley)
     Arvind Ayyangar, Niranjan Hasabnis, Alireza Saberi, Tung Tran, R. Sekar (Stony Brook University)

  2. Motivation
     • A popular approach for protecting applications from an untrusted OS is to rely on a trusted VMM
     • Binary translation is one of the commonly used implementation technologies in VMMs
       • QEMU, earlier versions of VMware, …
       • Benefits: no need for hardware support, applicable to COTS binaries, whole system can be instrumented
     • Unfortunately, existing binary translators are unsuited for enforcing higher-level properties
       • Information flow, control-flow integrity, object-granularity memory safety, …
       • They incur very high overheads (4x to 10x slowdown), or are simply unable to express certain properties

  3. Our Approach
     • Develop novel static-analysis-based methods to overcome the drawbacks of today’s techniques
       • Robust, scalable static analysis of low-level code
         • From different compilers, or hand-coded assembly
       • Accurate disassembly of binary code
         • Indirect control-flow transfers, non-standard call/return conventions, mixing of data and code, …
       • Accurate reasoning about key properties
         • Dynamic taint analysis

  4. Robust and scalable static analysis of low-level code

  5. Static analysis of low-level code
     • Scalability: requires modular analysis
       • Analyze functions individually, compose results
       • Avoids repeated analysis of same code (esp. libraries)
     • Strength: requires accurate reasoning about variables (esp. local variables)
     • Challenges in low-level binary code
       • Difficult to identify parameter passing in optimized code
         • Missing pushes, parameter passing via registers, …
       • Difficult to distinguish local variables from other accesses
         • Caller/callee-saved registers, stack pointer conventions, …

  6. Static analysis of low-level code
     • To solve these challenges, previous approaches
       • make optimistic assumptions, or rely on compiler idioms
       • often fail on optimized code and/or large programs
       • don’t work for other compilers, or hand-written assembly
     • Our solution: develop a new approach that
       • Uses systematic analysis to reduce assumptions/heuristics
       • Accurately tracks local variables by analyzing values held in registers and on the stack

  7. Stack Analysis
     • Analyzes one function at a time
     • Examines the use of the stack to
       • Determine parameters
         • Number of them, whether in registers or on stack
         • Caller- and callee-saved registers
       • Summarize effect on parameters
         • Preservation of SP, return to caller, changes in parameter or register contents, …
     [Figure: stack at entry to ƒ, with ESP pointing at the return address]

  8. Abstract Interpretation for Stack Analysis
     [Figure: lattice and activation record at entry to ƒ. EBP = Base_BP + [0,0], ESP = Base_SP + [0,0]; offset 0 of the activation record is at Base_SP.]
     <ƒ>:
         push %ebp
         mov  %esp, %ebp
         sub  $16, %esp

  9. Abstract Interpretation for Stack Analysis
     [Figure: lattice and activation record after push %ebp. EBP = Base_BP + [0,0], ESP = Base_SP + [-4,-4]; the slot at offset -4 holds Base_BP + [0,0].]
     <ƒ>:
         push %ebp
         mov  %esp, %ebp
         sub  $16, %esp

  10. Stack Analysis (contd)
      <f>:
          push %ebp
          mov  %esp, %ebp
          sub  $16, %esp
          mov  8(%ebp), %eax
          add  $3, %eax
          mov  %eax, 8(%ebp)
          mov  $7, -12(%ebp)
          mov  12(%ebp), %edx
          mov  %edx, -8(%ebp)
          leave
          ret
      • Summary for f:
        • No change to ESP
        • Two input parameters on stack
        • EAX, EDX, arg1 changed as shown
        • Others unchanged
      [Figure: activation record of f, with args above Base_SP and locals below it, down to Base_SP+[-20,-20]]
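The summary for f can be reproduced with a very small abstract interpreter. The following is only a sketch under stated assumptions, not the authors' implementation: the tuple encoding of instructions, the names `F` and `summarize`, and the 32-bit layout (return address at Base_SP, stack parameters at Base_SP+4 and above) are all hypothetical, and the intervals of the slides are simplified to single integers because every offset in this example is exact.

```python
# Toy abstract interpreter for the function f shown on the slide above.
# Assumption: instructions are already decoded into small tuples (hypothetical
# encoding); stack offsets are plain integers relative to Base_SP.

F = [
    ("push_ebp",),            # push %ebp
    ("mov_esp_ebp",),         # mov  %esp, %ebp
    ("sub_esp", 16),          # sub  $16, %esp
    ("load", 8, "eax"),       # mov  8(%ebp), %eax
    ("alu", "eax"),           # add  $3, %eax
    ("store", 8),             # mov  %eax, 8(%ebp)
    ("store", -12),           # mov  $7, -12(%ebp)
    ("load", 12, "edx"),      # mov  12(%ebp), %edx
    ("store", -8),            # mov  %edx, -8(%ebp)
    ("leave",),               # leave
    ("ret",),                 # ret
]


def summarize(instrs):
    esp, ebp = 0, None        # ESP = Base_SP + 0 at entry; EBP is not an SP offset yet
    saved = {}                # stack offset -> symbolic value stored there
    read, written, regs = set(), set(), set()
    lowest_esp, esp_at_ret, ebp_preserved = 0, None, False

    for ins in instrs:
        op = ins[0]
        if op == "push_ebp":
            esp -= 4
            saved[esp] = "Base_BP"
        elif op == "mov_esp_ebp":
            ebp = esp
        elif op == "sub_esp":
            esp -= ins[1]
        elif op == "load":                 # k(%ebp) -> register
            read.add(ebp + ins[1])
            regs.add(ins[2])
        elif op == "alu":                  # arithmetic that overwrites a register
            regs.add(ins[1])
        elif op == "store":                # value -> k(%ebp)
            written.add(ebp + ins[1])
            saved.pop(ebp + ins[1], None)  # a store clobbers any saved value
        elif op == "leave":                # mov %ebp, %esp ; pop %ebp
            ebp_preserved = saved.get(ebp) == "Base_BP"
            esp, ebp = ebp + 4, None
        elif op == "ret":
            esp_at_ret = esp               # should again point at the return address
        lowest_esp = min(lowest_esp, esp)

    touched = read | written
    return {
        "esp_at_ret": esp_at_ret,
        "lowest_esp": lowest_esp,
        "ebp_preserved": ebp_preserved,
        "stack_params": sorted(o for o in touched if o >= 4),
        "params_modified": sorted(o for o in written if o >= 4),
        "locals": sorted(o for o in touched if o < 0),
        "regs_written": sorted(regs),
    }


summary = summarize(F)
```

In this sketch, offsets 4 and 8 stand for arg1 and arg2, so `summary` reports two stack parameters, a write to arg1, writes to EAX and EDX, a lowest stack pointer of Base_SP-20, and ESP back at Base_SP when `ret` executes, which lines up with the summary on the slide. A real analysis would need intervals, joins at control-flow merges, and handling for accesses made through registers other than EBP.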

  11. Stack Analysis: Preliminary results

  12. Static disassembly of binary code

  13. Background: Disassembly Techniques
      • Linear sweep algorithm
        • Start with program entry point, proceed to disassemble instructions sequentially
        • Key assumption: all instructions appear one after the next, without any gaps
          • Violated in most code (presence of data or padding)
      • Recursive traversal algorithm
        • After a control-flow transfer instruction (CTI), proceed to disassemble target address
        • For conditional CTI and non-CTI, proceed to disassemble next instruction
        • Key problems
          • Code reached only through indirect CTIs
          • Functions that don’t return in the usual way
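The difference between the two algorithms shows up as soon as a data byte sits between instructions. The sketch below uses a made-up four-opcode instruction set (the opcodes, `decode`, `linear_sweep`, `recursive_traversal` and the `CODE` bytes are all hypothetical, chosen only for illustration) rather than x86, and it has no indirect CTIs, so it does not exhibit the key problems listed for recursive traversal.

```python
# Hypothetical toy ISA: 0x01 nop (1 byte), 0x02 jmp <addr> (2 bytes),
# 0x03 jcc <addr> (2 bytes, conditional), 0x04 ret (1 byte).
NAMES = {0x01: "nop", 0x02: "jmp", 0x03: "jcc", 0x04: "ret"}
SIZES = {0x01: 1, 0x02: 2, 0x03: 2, 0x04: 1}

#        0: jcc 6    2: jmp 5    4: data  5: nop  6: ret
CODE = bytes([0x03, 0x06, 0x02, 0x05, 0x02, 0x01, 0x04])


def decode(code, pc):
    """Decode one instruction at pc; unknown opcodes become 1-byte 'bad'."""
    op = code[pc]
    name, size = NAMES.get(op, "bad"), SIZES.get(op, 1)
    has_operand = name in ("jmp", "jcc") and pc + 1 < len(code)
    return name, size, (code[pc + 1] if has_operand else None)


def linear_sweep(code, entry=0):
    """Decode sequentially, assuming there are no gaps between instructions."""
    starts, pc = [], entry
    while pc < len(code):
        _name, size, _target = decode(code, pc)
        starts.append(pc)
        pc += size
    return starts


def recursive_traversal(code, entry=0):
    """Follow control flow: CTI targets, plus fall-through where it can occur."""
    seen, work = set(), [entry]
    while work:
        pc = work.pop()
        if pc in seen or not 0 <= pc < len(code):
            continue
        name, size, target = decode(code, pc)
        if name == "bad":
            continue
        seen.add(pc)
        if target is not None:           # jmp or jcc: disassemble the target
            work.append(target)
        if name in ("nop", "jcc"):       # non-CTI or conditional CTI: fall through
            work.append(pc + size)
    return sorted(seen)


sweep = linear_sweep(CODE)
traversal = recursive_traversal(CODE)
```

Here linear sweep treats the data byte at address 4 as a `jmp` opcode and swallows the real `nop` at address 5, while recursive traversal never visits address 4 because no control-flow path leads to it.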

  14. Our Approach for Disassembly
      • Assumption
        • No code obfuscation
      • Non-assumptions
        • Function prologue and epilogue patterns
        • Compiler idioms or (lack of) optimizations
      • Approach
        • Use recursive traversal
        • Use stack analysis to compute/verify return targets
        • Develop new analysis to determine targets of indirect control-flow transfers

  15. Our Approach: Type inference
      • Key insight: code pointer values don’t undergo arithmetic or other transformations
      • Implication: values assigned to code pointers must represent indirect CTI targets
      • Achieves much better results than data flow analysis
        • Avoids global def-use problem, which is very hard in low-level languages
      • Compute a set of possible code addresses and a set of definite code addresses
        • Code at addresses in the definite set can be safely disassembled
        • Code at addresses not in the possible set can be safely relocated

  16. Static Disassembly: Preliminary Results • Analysis of disassembler on 'ls' binary

  17. Static Disassembly: Preliminary Results • Gap in dhclient due to incomplete implementation, dealing with global arrays

  18. DTA++: Improving accuracy of Dynamic Taint Analysis [NDSS 2011]

  19. Under-tainting and Over-tainting
      • Results vary based on which values are considered to depend on others:

  20. Under-tainting and Over-tainting
      • Results vary based on which values are considered to depend on others:
        • Too few dependencies lead to under-tainting

  21. Under-tainting and Over-tainting
      • Results vary based on which values are considered to depend on others:
        • Too many dependencies lead to over-tainting

  22. Key Idea
      • Under-tainting occurs when control flow state represents (almost) all of the information in inputs
      • Key idea: propagate taint only for control dependencies that would cause under-tainting (culprit implicit flows)

  23. Key Idea
      • Under-tainting occurs when control flow state represents (almost) all of the information in inputs
      • Key idea: propagate taint only for control dependencies that would cause under-tainting (culprit implicit flows)
      char output[256];
      char input = next_in();
      long len = 0;
      if (input == '{') {
          output[0] = '\\';
          output[1] = '{';
          len = 2;
      }
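The brace-escaping fragment above can be mirrored in a few lines to show why a purely data-flow taint tracker under-taints it. This is an illustrative sketch only: the function name `escape_open_brace` and the `control_taint` switch are hypothetical, and taint is modeled as a boolean attached to each output character rather than as shadow memory.

```python
def escape_open_brace(ch, control_taint):
    """Mirror of the slide's fragment; returns (output, length).

    output is a list of (char, tainted) pairs. control_taint selects whether
    the branch on the input propagates taint to the values written inside it.
    """
    input_tainted = True                 # the input byte comes from an untrusted source
    output, length = [], 0
    if ch == "{":                        # branch on tainted data: an implicit flow
        # Data-flow-only tracking: the constants written below are untainted.
        # With a propagation rule for this branch: they inherit the input's taint.
        tainted = input_tainted and control_taint
        output.append(("\\", tainted))
        output.append(("{", tainted))
        length = 2
    return output, length


data_flow_only = escape_open_brace("{", control_taint=False)
with_branch_rule = escape_open_brace("{", control_taint=True)
```

With `control_taint=False` the two output characters are reported as untainted even though they are fully determined by the input, which is the under-tainting case; with `control_taint=True` they are tainted. Applying such a rule to every branch would over-taint, which is why the slides restrict it to culprit implicit flows.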

  24. DTA++ Approach Overview
      • Hypothesis: under-tainting occurs at just a few locations in a program (culprit branches)
      • Approach: find these locations in advance, and construct new taint propagation rules for them
      • Assumption: we are given test inputs that demonstrate the under-tainting

  25. Approach Details
      • Under-tainting detection predicate
        • Given a (partial) execution trace t, φ(t) holds if t contains a culprit implicit flow
      • Implementation
        • Use symbolic execution to count how many other inputs could take the same execution path as t
        • Few or none → φ(t) = true
      • Search for culprit branches
        • Find shortest prefix of t that satisfies φ
          • The last instruction in the prefix is the culprit
        • Remove culprit, repeat the search to find others
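The prefix search can be sketched for a one-byte input. Note the substitutions: instead of symbolic execution, this sketch counts matching inputs by brute force over all 256 byte values, and "removing" a culprit is modeled by dropping that branch from the trace. The names `matching_inputs`, `phi`, `find_culprits`, the `threshold` parameter and the example `TRACE` are all hypothetical.

```python
def matching_inputs(prefix):
    """All one-byte inputs that follow the same branch outcomes as the prefix."""
    return [b for b in range(256)
            if all(cond(b) == taken for _name, cond, taken in prefix)]


def phi(prefix, x, threshold=0):
    """True if at most `threshold` inputs other than x follow this prefix."""
    others = [b for b in matching_inputs(prefix) if b != x]
    return len(others) <= threshold


def find_culprits(trace, x, threshold=0):
    """Repeatedly find the shortest prefix satisfying phi; its last branch is a culprit."""
    remaining, culprits = list(trace), []
    while True:
        for n in range(1, len(remaining) + 1):
            if phi(remaining[:n], x, threshold):
                culprits.append(remaining[n - 1][0])
                del remaining[n - 1]     # drop the culprit, then search again
                break
        else:
            return culprits              # no prefix satisfies phi any more


# Branches taken while processing the input byte '{': (name, condition, outcome).
TRACE = [
    ("is_printable", lambda b: 32 <= b < 127, True),
    ("is_open_brace", lambda b: b == ord("{"), True),
    ("is_not_nul", lambda b: b != 0, True),
]
culprits = find_culprits(TRACE, ord("{"))
```

For this trace only the equality test against `'{'` pins the input down to a single value, so it is the one branch reported; the printable-range check still admits 94 other inputs and is left alone.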

  26. DTA++ Results: Diagnosis Time

  27. DTA++ Results: Over-tainting

  28. Summary and Future Work
      • Develop novel static-analysis-based methods to overcome the drawbacks of today’s techniques
        • Robust, scalable static analysis of low-level code
        • Accurate disassembly of binary code
        • Accurate reasoning about key properties
          • Dynamic taint analysis
      • Future work
        • Experimentation and evaluation of stack analysis and disassembly
        • Robust and efficient binary instrumentation for information flow and related properties
        • Application to hostile OS defense
