1 / 25

Recovery of Variables and Heap Structure in x86 Executables

Recovery of Variables and Heap Structure in x86 Executables. Gogul Balakrishnan Thomas Reps University of Wisconsin. Overview. Introduction Challenges Background Recovering A-locs via Iteration An Abstraction for Heap-Allocated Storage Experiments. Introduction.

nelia
Download Presentation

Recovery of Variables and Heap Structure in x86 Executables

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Recovery of Variables and Heap Structure in x86 Executables Gogul Balakrishnan Thomas Reps University of Wisconsin

  2. Overview • Introduction • Challenges • Background • Recovering A-locs via Iteration • An Abstraction for Heap-Allocated Storage • Experiments

  3. Introduction • The Need of Analyzing Executables • What You See Is Not What You eXecute • Many Obstacles in Analyzing Executables • Data Objects are Not Easily Identifiable. • Absence of Symbol Table & Debugging Information • DeterminingtheMemoryAddressesofDataObjects • Difficult to Track the Flow of Data through Memory • Challengingtogetusefulinformationabouttheheap e.g) memset(password, ‘\0’, len); free(password);

  4. Challenges(1/3) • Recovering Variable-like Entities • The layout of Memory is known at Compile time or Assembly time (IDAPro’ Approach) • To Recover y, the Set of Values that eax Holds at 5 Needs to be Determined. void main() { int x, y; x = 1; y = 2; return; } proc main 1 mov ebp, esp 2 sub esp, 8 3 mov [ebp-8], 1 4 mov eax, ebp 5 mov [eax-4], 2 6 add esp, 8 7 retn

  5. Challenges(2/3) • GranularityofRecoveredVariable-likeEntities • Affects the complexity and accuracy of subsequent analyses • The Structure of Heap-Allocated Objects • Only the Size of the Allocated Block is Known. • Using Abstract-Refinement Algorithm

  6. Challenges(3/3) • Resolving Virtual-Function Calls • A Definite Link between the Object and the Virtual Function Table is Never Established. (Weak Update) one-variable-per-malloc-site abstraction

  7. Background(1/6) • Abstract Locations (A-locs) • Memory Region • A Set of Disjoint Memory Areas • Represents a Group of Locations that have Similar Runtime Properties • Abstract Locations • Locations between two addresses/offsets in Memory-Region • Address & Offsets are Statically Determined

  8. Background(2/6) • Abstract Locations (cont’d) proc main 0 mov ebp,esp 1 sub esp,40 2 mov ecx,0 3 lea eax,[ebp-40] L1: mov [eax], 1 5 mov [eax+4],2 6 add eax, 8 7 inc ecx 8 cmp ecx, 5 9 jl L1 10 mov eax,[ebp-36] 11 add esp,40 12 retn

  9. Background(3/6) • Value-Set Analysis (VSA) • Combined Numeric-Analysis & Pointer-Analysis • Over-Approximation of the values that each a-loc holds at each program point • Value-Set • The Set of Addresses and Numeric Values • N-tuple of strided intervals of the form s[l, u] • (Global Region, Procedure Region, …) • (1[0, 9], ∮) versus (∮, -8[-40, -8]) N : the number of memory-regions e.g) 8[-40, -8] = {-40, -32, -24, -16, -8}

  10. Background(4/6) • Value-Set Analysis (cont’d) • The Value-Set of eax at L1 • (∮, 8[-40, -8]) • eax holds the offsets {-40, -32, -24, -16, -8} • Starting Addresses of Field x of p proc main 0 mov ebp,esp 1 sub esp,40 2 mov ecx,0 3 lea eax,[ebp-40] L1: mov [eax], 1 5 mov [eax+4],2 6 add eax, 8 7 inc ecx 8 cmp ecx, 5 9 jl L1 10 mov eax,[ebp-36] 11 add esp,40 12 retn Typedef struct { int x, y; } Point; int main() { int i; Point p[5]; for(i=0; i<5; ++i) { p[i].x = 1; p[i].y = 2; } return p[0].y; }

  11. Background(5/6) • Aggregate Structure Identification (ASI) • Can Distinguish between Accesses to Different Parts of the Same Aggregate • Aggregate is broken up into smaller parts (atoms) • Data-Access Constraint Language (DAC) • Specifying Data-Access Pattern in the Program

  12. Background(6/6) • Aggregate Structure Identification (cont’d) • Data-Access Constraint Language (DAC) • DataRef [l : u] refers to bytes l through u in DataRef • DataRef n : n isthe number of elements • ASI DAG e.g) P[0:11] 3 = P[0:3], P[4:7], or P[8:11] return_main p[0:39] 5[0:3] ≈ const_1[0:3]; p[0:39] 5[4:7] ≈ const_2[0:3]; return_main[0:3] ≈ p[4:7]

  13. Recovering A-locs via Iteration • Problems of VSA • Can only Represent a Contiguous Sequence of Memory Locations • Cannot Detect Internal Substructure • Basic Idea • VSA is used to obtain memory-access patterns in the executable; • ASI is used as a heuristic to determine a set of a-locs according to the memory-access patterns obtained from the information recovered by VSA. Final Value-Sets ASI VSA IDAPro

  14. Recovering A-locs via Iteration • Generating Data-Access Constraints from Value Input : (r, s[l, u], length) Output : (ASI Ref, Boolean) <Algorithm 1 SI2ASI> ifs[l,u] is a singleton then return <“r[l : l+length-1]”, true> else size ← max(s, length) n ← (u – l + size – 1) / size ref ← “r[l : u+size-1] n[0 : size-1]” return <ref, (s = size)> enf if e.g) s[l, l] Actual Byte Range The number of array elements (AR_main, 8[-40, -8], length) => {AR_main[(-40):(-1)] 5[0:7]} AR_main[-40:-33][0:7] AR_main[-32:-25][0:7] AR_main[-24:-17][0:7] AR_main[-16:-9][0:7] AR_main[-8:-1][0:7]

  15. Recovering A-locs via Iteration • Generating Data-Access Constraints from Value e.g) eax : (1[0:9], ∮) ecx : (∮, 16[-160, -16]) In case of [ecx+eax] => AR[-160:-1] 10[0:15] [0:9] 10[0:0] Row-major order <Algorithm 2> if (s1[l1,u1] or s2[l2,u2] is a singleton then return SI2ASI(r, s1[l1, u1] ⊕ s2[l2, u2], length) end if if s1≥ (u2 – l2 + length) then baseSI ← s1[l1, u1] indexSI ← s2[l2, u2] else if s2≥ (u1 – l1 + length) then baseSI ← s2[l2, u2] indexSI ← s1[l1, u1] else return SI2ASI(r, s1[l1, u1] ⊕ s2[l2, u2], length) end if <baseRef, exactRef> ← SI2ASI(r, baseSI, stride(baseSI)) if exactRef is false then return SI2ASI(r, s1[l1, u1] ⊕ s2[l2, u2], length) else return concat(baseRef, SI2ASI(‘’, indexSI, length)) endif Base Addr Determine base register Index Addr Base Addr

  16. Recovering A-locs via Iteration • Interpreting Indirect Memory-References • Lookup Algorithm • NodeDesc : <name, length> • NodeDescList :An Ordered List of NodeDesc • Three Operations name :the name associated with the ASI tree node length : the length of above node e.g) [nd1, nd2, …, ndn]

  17. Recovering A-locs via Iteration • Lookup Algorithm Examples e.g) Lookup p[0:39] 5[0:3] GetChildren(p) = [<a3, 4>, <a4, 4>, <i2, 32>] GetRange(0, 39) = [<a3, 4>, <a4, 4>, <i2, 32>] GetArrayElements(5) = [<a3, 4>, <a4, 4>], [<a5, 4>, <a6, 4>] GetRange(0, 3) = [<a3, 4>, <a5, 4>]

  18. An Abstraction for Heap-Allocated Storage • Previous Abstraction • Recency Abstraction • Allowing VSA & ASI to recover Info. About virtual-function tables • Use Two Memory-Regions per allocation site s • MRAB[s] : Most Recently Allocated Block • NMRAB[s] : Non-Most Recently Allocated Block • count : How many concrete blocks the memory-region represents (MRAB[s].count, NMRAB[s].count) • SmallRange = {[0, 0], [0, 1], [1, 1], [0, ∞], [1, ∞], [2, ∞]} • size : over-approximation of the size of block (MRAB[s].size, NMRAB[s].size) All of the nodes allocated at a given allocation site s are folded together into a single summary node ns.

  19. An Abstraction for Heap-Allocated Storage • Operation • AbsEnv[s]:MRAB[s]/NMRAB[s]→<count,size,alocEnv> • AlocEnv = a-loc → ValueSet • Allocation site s transforms absEnv to absEnv’ • absEnv’(MRAB[s]) = <[0,1], size, a-loc.Value-Set> • absEnv’(NMRAB[s]).count = absEnv(NMRAB[s]).count + absEnv(MRAB[s]).count • absEnv’(NMRAB[s]).size = absEnv(NMRAB[s]).size ∪ absEnv(MRAB[s]).size • absEnv’(NMRAB[s]).alocEnv = absEnv(NMRAB[s]).alocEnv ∪ absEnv(MRAB[s]).alocEnv

  20. An Abstraction for Heap-Allocated Storage

  21. Experiments • Environments • Software

  22. Experiments • Results of Virtual-Function Call Resolution

  23. Experiments • Results of A-loc Identification • Comparing the Results of Algorithm with Debugging Information The structure of 87% of the local variables is correct

  24. Experiments • Results of A-loc Identification The structure of 72% of the objects in the heap is correct

  25. Q & A

More Related