1 / 21

Dynamic Data Structure Excavation or “ Gimme back my symbol table!”

Dynamic Data Structure Excavation or “ Gimme back my symbol table!”. Asia Slowinska , Traian Stancescu , Herbert Bos VU University Amsterdam. Anonymous bytes only…. Goals. Long term: reverse engineer complex software. struct employee { char name [ 128 ]; int year ; int month ;

urbana
Download Presentation

Dynamic Data Structure Excavation or “ Gimme back my symbol table!”

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Dynamic Data Structure Excavationor “Gimme back my symbol table!” Asia Slowinska, TraianStancescu, Herbert Bos VU University Amsterdam

  2. Anonymous bytes only…

  3. Goals • Long term: reverse engineer complex software structemployee{ charname[128]; intyear; intmonth; intday; }; structemployee* foo (structemployee* src) { structemployeedst; // init dst }

  4. Goals • Long term: reverse engineer complex software • Short term: reverse engineer data structures structs1{ charf1[128]; intf2; intf3; intf4; }; structs1* fun1(structs1* a1) { structs1l1; }

  5. WHY?

  6. Application I: legacy binary protection • Legacy binaries everywhere • We suspect they are vulnerable But… How to protect legacy code from memory corruption? Answer: find the buffers and make sure that all accesses to them do not stray beyond array bounds.

  7. Application II: binary analysis • We found a suspicious binary – is it malware? • A program crashed… - let’s investigate! But… Without symbols, what can we do? Answer: generate the symbols ourselves!

  8. (demo later)

  9. Why is it difficult? 1. structemployee { 2.char name[128]; 3.int year; 4.int month; 5.intday; 6. }; 7. 8.structemployeee; 9.e.year = 2010; • MISSING • Data structures • Semantics Instr 1 Instr 2

  10. Data structures: key insight 1. structemployee { 2.char name[128]; 3.int year; 4.int month; 5.int day 6. }; 7. 8.structemployeee; 9.e.year = 2010; Yes, data is unstructured… But – usage is NOT!

  11. Data structures: key insight 1. structemployee { 2.char name[128]; 3.int year; 4.int month; 5.int day 6. }; 7. 8.structemployeee; 9.e.year = 2010; Yes, data is unstructured… But – usage is NOT!

  12. Data structures: key insight Analyse dynamically 1. structemployee { 2.char name[128]; 3.int year; 4.int month; 5.int day 6. }; 7. 8.structemployeee; 9.e.year = 2010; test app KLEE/ S2E Emulator inputs data structures

  13. Intuition 2. and A is an address of a structure, then *(A + 8) is perhaps a field in this structure field3 • Observe how memory is used at runtime to detect data structures • E.g., if A is a pointer… field2 field1 A field0 3. and A is an address of an array, then *(A + 8) is perhaps an element of this array 3. and A is an address of an array, then *(A + 8) is perhaps an element of this array and A is a function frame pointer, then *(A + 8) is perhaps a function argument fun arg2 elem5 elem4 fun arg1 return addr elem3 A parent EBP elem2 elem1 A elem0

  14. Arrays are tricky Access pattern & detection: • elem = next++; • Look for chains of accesses in a loop

  15. Arrays are tricky Access pattern & detection: • elem = next++; • Look for chains of accesses in a loop • elem = array[i]; • Look for sets of accesses with the same base in a linear space

  16. Arrays are tricky Access pattern & detection: • elem = next++; • Look for chains of accesses in a loop • elem = array[i]; • Look for sets of accesses with the same base in a linear space Challenges: • Boundary elements accessed outside the loop • Nested loops • Multiple loops in sequence

  17. More challenges structure array 1 array 2 Examples: • Decide which memory accesses are relevant • Problems caused by e.g.,memset-like functions Suggested by memset

  18. More challenges structure array 1 array 2 Examples: • Decide whichmemory accessesare relevant • Problems caused by e.g.,memset-like functions • Even more in the paper  Suggested by memset

  19. Results in terms of accuracy – heap memory bytes variables

  20. demo now

  21. Conclusions • We can recover data structures by tracking memory accesses • We believe we can protect legacy binaries • We are working on data coverage

More Related