
Labeling Library Functions in Stripped Binaries



Presentation Transcript


  1. Labeling Library Functions in Stripped Binaries • Emily R. Jacobson, Nathan Rosenblum, and Barton P. Miller • Computer Sciences Department, University of Wisconsin - Madison

  2. Why Binary Code? • Source code isn’t available • Source code isn’t the right representation

  3. Binary Tools Need Symbol Tables • Debugging tools: GDB, IDA Pro, … • Instrumentation tools: PIN, Dyninst, … • Static analysis tools: CodeSurfer/x86, … • Security analysis tools: IDA Pro, …

  4. Restoring Information • Function locations, complicated by: • Missing symbol information • Variability in function layout (e.g. code sharing, outlined basic blocks) • High degree of indirect control flow • [Diagram: program binary whose functions carry placeholder labels targ80c3bd0 and targ80c3df4]

  5. Restoring Information • What about semantic information? • The program’s interaction with the operating system (system calls) is encapsulated by wrapper functions • Library fingerprinting: identify functions based on patterns learned from exemplar libraries • [Diagram: program binary whose functions carry placeholder labels targ80c3bd0 and targ80c3df4]

  6. unstrip • stripped binary: parsing + library fingerprinting + binary rewriting • [Diagram: placeholder labels targ80c3bd0 and targ80c3df4 are replaced by recovered names such as getpid and accept]

  7. <accept>:
       mov %ebx,%edx            (save registers)
       mov $0x66,%eax           (set up system call arguments)
       mov $0x5,%ebx
       lea 0x4(%esp),%ecx
       int $0x80                (invoke a system call)
       mov %edx,%ebx            (error check and return)
       cmp $0xffffff83,%eax
       jae syscall_error
       ret

  8. The same function can be realized in a variety of ways in the binary
     Builds compared: glibc 2.2.4 on RHEL with GCC 2.95.3; glibc 2.5 on RHEL with GCC 4.1.2; glibc 2.5 on RHEL with GCC 3.4.4
     Variant with a direct system call only:
       <accept>:
         mov %ebx,%edx
         mov $0x66,%eax
         mov $0x5,%ebx
         lea 0x4(%esp),%ecx
         int $0x80
         mov %edx,%ebx
         cmp $0xffffff83,%eax
         jae syscall_error
         ret
     Variant that adds an enable_asyncancel/disable_asyncancel path and enters the kernel with int $0x80:
       <accept>:
         cmpl $0x0,%gs:0xc
         jne 80f669c
         mov %ebx,%edx
         mov $0x66,%eax
         mov $0x5,%ebx
         lea 0x4(%esp),%ecx
         int $0x80
         mov %edx,%ebx
         cmp $0xffffff83,%eax
         jae syscall_error
         ret
         push %esi
         call enable_asyncancel
         mov %eax,%esi
         mov %ebx,%edx
         mov $0x66,%eax
         mov $0x5,%ebx
         lea 0x8(%esp),%ecx
         int $0x80
         mov %edx,%ebx
         xchg %eax,%esi
         call disable_asyncancel
         mov %esi,%eax
         pop %esi
         cmp $0xffffff83,%eax
         jae syscall_error
         ret
     Variant with the same two paths that enters the kernel through an indirect call:
       <accept>:
         cmpl $0x0,%gs:0xc
         jne 80f669c
         mov %ebx,%edx
         mov $0x66,%eax
         mov $0x5,%ebx
         lea 0x4(%esp),%ecx
         call *0x814e93c
         mov %edx,%ebx
         cmp $0xffffff83,%eax
         jae syscall_error
         ret
         push %esi
         call enable_asyncancel
         mov %eax,%esi
         mov %ebx,%edx
         mov $0x66,%eax
         mov $0x5,%ebx
         lea 0x8(%esp),%ecx
         call *0x8181578
         mov %edx,%ebx
         xchg %eax,%esi
         call disable_asyncancel
         mov %esi,%eax
         pop %esi
         cmp $0xffffff83,%eax
         jae syscall_error
         ret

  9. Binary-level Code Variations • Function inlining • Code reordering • Minor code changes • Alternative code sequences

  10. Semantic Descriptors • Rather than recording byte patterns, we take a semantic approach • Record information that is likely to be invariant across multiple versions of the function
      <accept>:
        mov %ebx,%edx
        mov $0x66,%eax
        mov $0x5,%ebx
        lea 0x4(%esp),%ecx
        int $0x80
        mov %edx,%ebx
        cmp $0xffffff83,%eax
        jae 8048300
        ret
        mov %esi,%esi
      The instructions mov $0x66,%eax, mov $0x5,%ebx, and int $0x80 yield the descriptor {<socketcall, 5>}
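
One way to picture a semantic descriptor is as a multiset of system-call tuples. The sketch below is illustrative only and is not taken from unstrip: `SYSCALL_NAMES` is a partial table of i386 Linux system call numbers limited to those that appear in these slides, and `make_descriptor` is a hypothetical helper name.

```python
from collections import Counter

# Partial i386 Linux system call table (only numbers seen in the slides).
SYSCALL_NAMES = {0x04: "write", 0x05: "open", 0x06: "close",
                 0x58: "reboot", 0x66: "socketcall"}

def make_descriptor(calls):
    """Build a descriptor (a multiset) from (syscall_number, *constant_args) tuples."""
    return Counter(
        (SYSCALL_NAMES.get(num, hex(num)), *args)  # fall back to the raw number
        for num, *args in calls
    )

# accept: EAX = 0x66 (socketcall) and EBX = 5 when int $0x80 executes.
accept_descriptor = make_descriptor([(0x66, 5)])
print(accept_descriptor)
```

A multiset is used rather than a plain set so that a function issuing the same call twice is distinguishable from one issuing it once.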

  11. Building Semantic Descriptors • We parse an input binary, locate system calls and wrapper function calls, and employ dataflow analysis.
      reboot:
        push %ebp
        mov %esp,%ebp
        sub $0x10,%esp
        push %edi
        push %ebx
        mov 0x8(%ebp),%edx
        mov $0xfee1dead,%edi
        mov $0x28121969,%ecx
        push %ebx
        mov %edi,%ebx
        mov $0x58,%eax
        int $0x80
        …
      Values reaching the system call: EAX = 0x58 (reboot), EBX = %edi = 0xfee1dead, ECX = 0x28121969
      Resulting descriptor: {<reboot, 0xfee1dead, 0x28121969>}
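
The dataflow step for the reboot example can be sketched as a backward search for constants over a straight-line instruction list. This is a toy under stated assumptions, not the analysis unstrip performs: instructions are hand-written (opcode, source, destination) tuples, only `mov` is treated as defining a register, and control flow is ignored. The names `resolve` and `syscall_tuple` are hypothetical.

```python
# Toy instruction stream in AT&T operand order: (opcode, source, destination).
REBOOT = [
    ("mov", "0x8(%ebp)", "%edx"),
    ("mov", "$0xfee1dead", "%edi"),
    ("mov", "$0x28121969", "%ecx"),
    ("push", "%ebx", None),
    ("mov", "%edi", "%ebx"),
    ("mov", "$0x58", "%eax"),
    ("int", "$0x80", None),
]

def resolve(insns, index, reg):
    """Walk backwards from insns[index] for a constant reaching reg, else None."""
    for i in range(index - 1, -1, -1):
        op, src, dst = insns[i]
        if op == "mov" and dst == reg:
            if src.startswith("$"):          # immediate operand
                return int(src[1:], 16)
            if src.startswith("%"):          # register copy: follow the source
                return resolve(insns, i, src)
            return None                      # memory operand: value unknown
    return None

def syscall_tuple(insns, regs=("%eax", "%ebx", "%ecx")):
    """Return the values of regs at the first int $0x80 in insns."""
    idx = next(i for i, ins in enumerate(insns) if ins[:2] == ("int", "$0x80"))
    return tuple(resolve(insns, idx, r) for r in regs)

print([hex(v) for v in syscall_tuple(REBOOT)])
```

The register copy `mov %edi,%ebx` is why the search follows sources recursively: EBX has no constant of its own, but %edi does.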

  12. Building Semantic Descriptors Recursively
      open:
        …
        mov $0x5,%eax
        int $0x80
        …
      write:
        …
        mov $0x4,%eax
        int $0x80
        …
      sethostid:
        …
        call open          {<open, “/etc/hostid”, 577, 420>}
        …
        call write         {<write, ?, ?, 4>}
        …
        mov $0x6,%eax      {<close>}
        int $0x80
        …
      Descriptor for sethostid: {<close>, <open, “/etc/hostid”, 577, 420>, <write, ?, ?, 4>}
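
The recursive construction for sethostid can be sketched as follows. The `FUNCTIONS` table is a hand-written stand-in for what a parser would recover, "?" marks an argument whose value is unknown (as on the slide), and `describe` is a hypothetical name. The sketch assumes the call graph has no cycles.

```python
from collections import Counter

# Hypothetical parsed bodies: direct system calls, and calls to wrappers
# together with the constant arguments known at the call site.
FUNCTIONS = {
    "open":  [("syscall", "open")],
    "write": [("syscall", "write")],
    "sethostid": [
        ("call", "open", ("/etc/hostid", 577, 420)),
        ("call", "write", ("?", "?", 4)),
        ("syscall", "close"),
    ],
}

def describe(name, args=()):
    """Descriptor of a function: its own system calls plus those of its callees."""
    descriptor = Counter()
    for kind, target, *rest in FUNCTIONS[name]:
        if kind == "syscall":
            descriptor[(target, *args)] += 1         # arguments come from the caller
        else:
            descriptor += describe(target, rest[0])  # merge the callee's descriptor
    return descriptor

print(describe("sethostid"))
```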

  13. Building a Descriptor Database • Locate wrapper functions in a glibc reference library • Build semantic descriptors • Record each descriptor and its function name in the descriptor database
      <accept>:
        mov %ebx,%edx
        mov $0x66,%eax
        mov $0x5,%ebx
        lea 0x4(%esp),%ecx
        int $0x80
        …
      Descriptor Database:
        {<socketcall, 5>}: accept
        {<socketcall, 4>}: listen
        {<getpid>}: getpid
        …

  14. Building a Descriptor Database • The same steps are repeated for each of several glibc reference libraries, and the descriptors from every library are added to the descriptor database
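
A descriptor database of this kind can be sketched as a map from descriptors to the set of names observed for them across reference libraries. The library contents below are hand-written for illustration (`lib_b` is a hypothetical variant of accept, not real glibc data), and `build_database` is a hypothetical name.

```python
from collections import Counter, defaultdict

def build_database(reference_libraries):
    """Map each descriptor to every function name seen with it."""
    database = defaultdict(set)
    for library in reference_libraries:
        for name, descriptor in library.items():
            # frozenset of (call, count) pairs makes the multiset hashable
            database[frozenset(descriptor.items())].add(name)
    return database

lib_a = {
    "accept": Counter({("socketcall", 5): 1}),
    "listen": Counter({("socketcall", 4): 1}),
    "getpid": Counter({("getpid",): 1}),
}
# Hypothetical second library in which accept looks different.
lib_b = {"accept": Counter({("socketcall", 5): 2, ("futex",): 1})}

database = build_database([lib_a, lib_b])
for key, names in database.items():
    print(sorted(names), dict(key))
```

Storing a set of names per descriptor leaves room for several functions sharing one descriptor.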

  15. Pattern Matching Criteria • Two stages: 1. Exact matches 2. Best match based on a coverage criterion • Handle minor code variations by allowing flexible matches

  16. Pattern Matching Criteria • A ∩ B = { b ∈ B | b ∈ A } • A, the fingerprint from the database: {<socketcall,5>} • B, the semantic descriptor from the code: {<socketcall,5>, <socketcall,5>, <futex>} • coverage(A,B) = … (the right-hand side is not preserved in this transcript)
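
The right-hand side of the coverage formula did not survive in this transcript. The sketch below works through the slide's example under an explicit assumption: that coverage(A,B) is the fraction of B's elements that also occur in A, i.e. |A ∩ B| / |B| with A ∩ B defined as on the slide. That ratio is a guess that fits the asymmetric definition of A ∩ B, not a formula confirmed by the source. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def restricted_intersection(a, b):
    """Multiset { b in B | b in A }: every element of B whose value occurs in A."""
    return Counter({item: count for item, count in b.items() if item in a})

def coverage(a, b):
    # ASSUMPTION: coverage(A, B) = |A ∩ B| / |B|; the slide's formula was lost.
    size_b = sum(b.values())
    if size_b == 0:
        return Fraction(0)
    return Fraction(sum(restricted_intersection(a, b).values()), size_b)

A = Counter({("socketcall", 5): 1})                  # fingerprint from the database
B = Counter({("socketcall", 5): 2, ("futex",): 1})   # descriptor from the code

print(restricted_intersection(A, B), coverage(A, B))
```

Under this assumption both copies of <socketcall,5> in B count as covered, so the example evaluates to 2/3.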

  17. Multiple Matches • It’s possible that two or more functions are indistinguishable • Policy decision: return the set of potential matches • In practice, we’ve observed that 8% of functions have multiple matches, but the size of the match set is small (≤ 3)

  18. Identifying Functions in a Stripped Binary • For each wrapper function: 1. Build the semantic descriptor. 2. Search the database for a match (apply the two-stage matching process). 3. Add the label to the symbol table. • [Diagram: unstrip takes a stripped binary and the descriptor database and produces an unstripped binary]
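
The labeling loop can be sketched end to end as below. Several details are assumptions made for illustration rather than facts from the slides: the coverage ratio (fraction of the code descriptor's elements that occur in the fingerprint), the `threshold` parameter, the pairing of placeholder addresses with descriptors, and the names `coverage` and `label`.

```python
from collections import Counter

def coverage(fingerprint, descriptor):
    # ASSUMPTION: fraction of the descriptor's elements found in the fingerprint.
    total = sum(descriptor.values())
    if total == 0:
        return 0.0
    return sum(n for item, n in descriptor.items() if item in fingerprint) / total

def label(descriptor, database, threshold=0.5):
    """Two-stage match: exact first, then best coverage. Returns a set of names."""
    exact = [names for fp, names in database if fp == descriptor]
    if exact:
        return set().union(*exact)
    scored = [(coverage(fp, descriptor), names) for fp, names in database]
    best = max((score for score, _ in scored), default=0.0)
    if best < threshold:                 # hypothetical cut-off: leave unlabeled
        return set()
    return set().union(*(names for score, names in scored if score == best))

database = [
    (Counter({("socketcall", 5): 1}), {"accept"}),
    (Counter({("socketcall", 4): 1}), {"listen"}),
    (Counter({("getpid",): 1}), {"getpid"}),
]
# Illustrative descriptors for the placeholder labels seen in the slides.
stripped = {
    "targ80c3bd0": Counter({("getpid",): 1}),
    "targ80c3df4": Counter({("socketcall", 5): 2, ("futex",): 1}),
}
symbols = {addr: label(desc, database) for addr, desc in stripped.items()}
print(symbols)
```

Returning a set rather than a single name follows the policy of reporting every potential match when functions are indistinguishable.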

  19. Implementation • Stripped binary: parsing + library fingerprinting + binary rewriting

  20. Evaluation • To evaluate across three dimensions of variation, we constructed three data sets: GCC version, glibc version, and distribution vendor • In each set, we compile statically linked binaries, build a descriptor database (DDB), and compare unstrip to IDA Pro’s FLIRT • The evaluation measure is accuracy

  21. Evaluation Results: GCC Version Study

  22. Evaluation Results: glibc Version Study

  23. Evaluation Results: Distribution Study

  24. unstrip is available at http://www.paradyn.org/html/tools/unstrip.html

  25. Backup slides follow

  26. Evaluation Results: GCC Version Study (Temporal: backwards)

  27. Evaluation Results: glibc Version Study (Temporal: backwards)

  28. Evaluation Results: Distribution Study (one predicts the rest)

  29. Evaluation Results: GCC Version Study (one predicts the rest)

  30. Evaluation Results: glibc Version Study (one predicts the rest)

  31. Evaluation Results: Distribution Study (one predicts the rest)
