
Decompilers and beyond



    1. Decompilers and beyond

    2. Presentation Outline
    Why do we need decompilers? Complexity must be justified
    Typical decompiler design: there are some misconceptions
    Decompiler-based analysis: new analysis types and tools become possible
    Future: “...is bright and sunny”
    Your feedback
    An online copy of this presentation is available at http://www.hex-rays.com/idapro/ppt/decompilers_and_beyond.ppt

    3. Disassemblers
    We need disassemblers to analyze binary code
    Simple disassemblers produce a listing with instructions
    Better disassemblers assist in analysis with code annotations, good navigation, etc. You know the difference.
    Even the ideal disassembler stays at a low level: the output is an assembler listing
    The main output of a disassembler is still a one-to-one mapping of opcodes to instruction mnemonics
    No leverage, no abstractions, little insight
    The analyst must mentally map assembly instructions to higher-level abstractions and concepts
    A boring and routine task after a while
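
As a hedged illustration of that abstraction gap (the instruction sequence and names below are hypothetical, not taken from the slides), even a single C statement expands into several assembly instructions that the analyst must mentally recombine:

```c
/* Hypothetical x86 listing a disassembler would show for the body below:
 *   mov  eax, [ebp+8]      ; load argument n
 *   lea  eax, [eax+eax*4]  ; eax = n * 5
 *   shl  eax, 1            ; eax = n * 10
 *   add  eax, 7            ; eax = n * 10 + 7
 *   ret                    ; result returned in eax
 * The analyst has to recombine those five instructions into one statement: */
int scale(int n)
{
    return n * 10 + 7;
}
```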

    4. Disassembler limitations
    The output is:
    Boring
    Inhuman
    Repetitive
    Error prone
    Requires special skills
    Did I say repetitive?
    Yet some geeks like it?...

    5. Decompilers
    The need:
    Software grows like gas
    Time spent on analysis skyrockets
    Malware proliferates and mutates
    We need better tools to handle this
    Decompilation is the next logical step, yet a tough one

    6. Building the ideal decompiler
    The answer is clear and easy to give: ideal decompilers do not exist
    It is customary to compare compilers and decompilers:
    Preprocessing
    Lexical analysis
    Syntax analysis
    Code generation
    Optimization
    This comparison is correct but superficial

    7. Compilers are privileged
    Strictly defined input language
    Anything nonconforming – spit out an error message
    Reasonable amount of information on all functions, variables, types, etc.
    The output may be ugly
    Who will ever read it but some geeks? :)

    8. Machine code decompilers are impossible
    Informal and sometimes hostile input
    Many problems are unsolved or proven to be unsolvable in general
    The output is examined in detail by a human being; any suboptimality is noticed because it annoys the analyst
    Conclusion: robust decompilers are impossible
    What if we address the common cases?
    For example, if we cover 90%, can the rest be handled manually?

    9. Easy for humans, hard for computers
    In fact, many (all?) problems encountered during decompilation are hard
    For every problem, there is a naïve solution which, unfortunately, does not work
    Just a few examples...

    10. Function calls are a problem
    Function calls require answering the following questions:
    Where does the function expect its input registers?
    Where does it return the result?
    What registers or memory cells does it spoil?
    How does it change the stack pointer?
    Does it return to the caller or somewhere else?
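
As a hedged illustration (the function and conventions below are hypothetical examples, not from the slides), the same C call compiles to quite different instruction sequences depending on the calling convention, and the decompiler must recover all of the answers above from the binary alone:

```c
/* A hypothetical callee; how its arguments travel depends on the convention.
 *
 * cdecl (caller cleans the stack):        fastcall (first args in registers):
 *   push 2                                  mov edx, 2
 *   push 1                                  mov ecx, 1
 *   call add2                               call add2
 *   add  esp, 8
 *
 * The decompiler must infer where the inputs live (stack vs. ecx/edx),
 * that the result comes back in eax, which registers the callee spoils,
 * and who adjusts the stack pointer afterwards. */
int add2(int a, int b)
{
    return a + b;
}
```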

    11. Function return values are a problem
    Does the function return anything?
    How big is the return value?
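
A minimal, hypothetical example of the ambiguity: on x86 both functions below leave their result in (part of) eax, and a caller that ignores the result looks the same either way, so the binary does not directly say whether anything meaningful is returned or how wide it is.

```c
/* Both compile to code that ends with the value in al/eax before 'ret';
 * from the machine code alone it is not obvious whether the function
 * returns a char, an int, or nothing that the caller ever uses. */
unsigned char low_byte(unsigned int x) { return (unsigned char)x; } /* result in al  */
unsigned int  identity(unsigned int x) { return x; }                /* result in eax */
```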

    12. Function input arguments are a problem
    When a register is accessed, it may be:
    To save its value
    To allocate a stack frame
    To pass it as a function argument
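
A hedged sketch of the ambiguity (hypothetical code, not from the slides): the very same `push esi` instruction can mean three different things, and only data flow analysis can tell them apart.

```c
int process(char *buf, int len); /* hypothetical callee */

void caller(char *buf, int len)
{
    /* In the binary, this call might appear as:
     *   push esi        ; (3) pass 'len'  (second argument)
     *   push edi        ;     pass 'buf'  (first argument)
     *   call process
     * while elsewhere 'push esi' merely (1) saves a callee-saved register
     * on function entry, or (2) reserves 4 bytes of stack space. Only data
     * flow analysis distinguishes the three cases. */
    process(buf, len);
}
```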

    13. Indirect accesses are a problem
    Pointer aliases
    No precise object boundaries
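
To make the aliasing point concrete, here is a small hypothetical C fragment (not from the slides): if the decompiler cannot prove that `p` and `q` point to different objects, it cannot safely fold the two loads of `*p` into one.

```c
/* If p and q may alias, the second read of *p can observe the write
 * through q, so the decompiler (like a compiler) must not assume the
 * value loaded from *p is unchanged. */
int sum_with_store(int *p, int *q)
{
    int a = *p;   /* first load  */
    *q = 42;      /* may or may not modify *p */
    int b = *p;   /* second load: cannot be merged with the first */
    return a + b;
}
```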

    14. Indirect jumps are a problem
    Indirect jumps are used for switch idioms and tail calls
    Recognizing them is necessary to build the control flow graph
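
As a hedged illustration (hypothetical source, showing one common compilation pattern among several): a dense C switch is often compiled into an indirect jump through a table, and the decompiler must recognize the table and its bounds check to recover the hidden CFG edges.

```c
/* A dense switch like this is commonly lowered to something like:
 *   cmp  eax, 3
 *   ja   default_case
 *   jmp  [jump_table + eax*4]   ; indirect jump hiding four CFG edges
 * Without recognizing the table, the decompiler sees only one jump
 * with an unknown target and the control flow graph falls apart. */
const char *day_kind(int d)
{
    switch (d) {
    case 0:  return "holiday";
    case 1:  return "workday";
    case 2:  return "workday";
    case 3:  return "weekend";
    default: return "unknown";
    }
}
```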

    15. Problems, problems, problems...
    Save-restore (push/pop) pairs
    Partial register accesses (al/ah/ax/eax)
    64-bit arithmetic
    Compiler idioms
    Variable live ranges (for stack variables)
    Lost type information
    Pointers vs. numbers
    Virtual functions
    Recursive functions
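
One concrete instance from this list, sketched with hypothetical code: on a 32-bit target a single 64-bit addition is split across two instructions and two register pairs, and the decompiler has to glue them back into one operation.

```c
/* On 32-bit x86 the addition below typically compiles to an add/adc pair:
 *   add  eax, ecx    ; low 32 bits
 *   adc  edx, ebx    ; high 32 bits plus carry
 * The decompiler must recognize the idiom and merge the register pairs
 * back into single 64-bit values instead of showing two unrelated adds. */
unsigned long long add64(unsigned long long a, unsigned long long b)
{
    return a + b;
}
```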

    16. Hopeless situation?
    Well, yes and no
    While a fully automatic decompiler capable of handling arbitrary input is impossible, approximate solutions exist
    We could start with a “simple” case:
    Compiler generated output (no hostile adversary generating increasingly complex input)
    Only 32-bit code
    No floating point, exception handling, or other fancy stuff

    17. Basic ideas
    Make some configurable assumptions about the input (calling conventions, stack frames, memory model, etc.)
    Use a sound theoretical approach for solvable problems (data flow analysis on registers, peephole optimization within basic blocks, instruction simplification, etc.)
    Use heuristics for unsolvable problems (indirect jumps, function prologs/epilogs, call arguments)
    Prefer to generate ugly but correct output rather than nice but incorrect code
    Let the user guide the decompilation in difficult cases (specify indirect call targets, function prototypes, etc.)
    Interactivity is necessary to achieve good results
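
As a hedged sketch of what "instruction simplification" might look like internally (the structures and rule below are invented for illustration, not the actual Hex-Rays microcode machinery), a decompiler can repeatedly rewrite recognizable instruction patterns into simpler equivalents until nothing changes:

```c
#include <stdbool.h>

/* Hypothetical intermediate instruction: dst = src1 OP src2. */
typedef enum { OP_XOR, OP_SUB, OP_ADD, OP_MOVCONST } opcode_t;
typedef struct { opcode_t op; int dst, src1, src2, imm; } insn_t;

/* One peephole rule: 'xor r, r' and 'sub r, r' both mean 'r = 0'.
 * Real simplifiers apply many such rules until a fixed point is reached. */
static bool simplify(insn_t *ins)
{
    if ((ins->op == OP_XOR || ins->op == OP_SUB) &&
        ins->dst == ins->src1 && ins->src1 == ins->src2) {
        ins->op  = OP_MOVCONST;
        ins->imm = 0;
        return true;   /* instruction was rewritten */
    }
    return false;      /* no rule matched */
}
```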

    18. Decompiler architecture
    Overall, it could look like this: (architecture diagram in the original slide)

    19. Decompilation phases - 1

    20. Decompilation phases - 2

    21. Microcode – just generated
    It is very detailed
    Redundant
    One basic block at a time

    22. After preoptimization

    23. After local optimization
    This is much better
    Please note that the condition codes are still present because they might be used by other blocks
    Use-def lists are calculated dynamically

    24. After global optimization
    Condition codes are gone
    The LDX instruction got propagated into the jz and all references to eax are gone
    Note that the jz target has changed (@3) since global optimization removed some unused code and blocks
    We are ready for local variable allocation
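
A hedged, source-level analogue of that propagation step (hypothetical names; the real transformation happens on microcode, not C): once the loaded value is substituted directly into the comparison, the temporary register has no uses left and disappears, and the jump can be retargeted past the removed code.

```c
/* Before: the load has its own temporary, which the conditional jump uses.
 *     tmp = *flag_ptr;          // LDX-like load into a register
 *     if (tmp == 0) goto skip;  // jz tests the register
 * After propagation the temporary disappears and the test reads memory
 * directly; unused code between the jump and its target can then be
 * dropped, which is why the jump target changes. */
int is_set(const int *flag_ptr)
{
    return *flag_ptr != 0;   /* load folded straight into the comparison */
}
```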

    25. After local variable allocation
    All registers have been replaced by local variables (ecx0, esi1; except ds)
    Use-def lists are useless now, but we do not need them anymore
    Now we will perform structural analysis and create pseudocode

    26. Control graphs
    Original graph view
    Control flow graph

    27. Graph structure as a tree
    Structural analysis extracts the standard control flow constructs from the CFG
    The result is a tree similar to the one below; it will be used to generate pseudocode
    The structural analysis algorithm is robust and can handle any graph, including irreducible ones
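
As a hedged illustration of structural analysis (hypothetical basic blocks, not the slide's example): a small CFG whose blocks and edges match a known schema is collapsed into a structured construct, and nesting such matches yields the tree used for pseudocode generation.

```c
/* CFG with four basic blocks:
 *
 *   B1: if (x > 0) --> B2 else --> B3
 *   B2: y = x;     --> B4
 *   B3: y = -x;    --> B4
 *   B4: return y;
 *
 * The diamond B1/B2/B3/B4 matches the "if-then-else" schema, so
 * structural analysis collapses it into a single structured node: */
int abs_value(int x)
{
    int y;
    if (x > 0)
        y = x;      /* block B2 */
    else
        y = -x;     /* block B3 */
    return y;       /* block B4 */
}
```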

    28. Initial pseudocode is ugly
    Almost unreadable...

    29. Transformations improve it
    Some casts still remain

    30. Interactive operation allows us to fine-tune it
    Final result after some renamings and type adjustments:
    The initial assembly is too long to be displayed on a slide
    Pseudocode is much shorter and more readable
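
To give a feel for that interactive clean-up (a hypothetical fragment, not the slide's screenshot), raw pseudocode with generated names becomes far more readable once the analyst renames variables and fixes their types:

```c
/* Raw pseudocode, straight out of the decompiler (hypothetical):
 *
 *   int __cdecl sub_401320(int a1, unsigned int a2)
 *   {
 *     int v1 = 0;
 *     for (unsigned int i = 0; i < a2; ++i)
 *       v1 += *(unsigned char *)(a1 + i);
 *     return v1;
 *   }
 *
 * The same function after renaming and retyping in the interactive UI: */
int checksum(const unsigned char *buf, unsigned int len)
{
    int sum = 0;
    for (unsigned int i = 0; i < len; ++i)
        sum += buf[i];
    return sum;
}
```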

    31. What decompilation gives us
    Obvious benefits:
    Saves time
    Eliminates routine tasks
    Makes source code recovery easier (...)
    New things:
    Next abstraction level - closer to the application domain
    Data flow based tools (vulnerability scanner, anyone? :)
    Binary translation

    32. Base to build on...
    To be useful and make other tools possible, the decompiler must have a programmable API
    It already exists, but it needs some refinement
    Microcode is not accessible yet
    The decompiler is retargetable (x86 now, ARM will be next)
    Both interactive and batch modes are possible
    In addition to being a tool to examine binaries, the decompiler could be used for...
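
A purely hypothetical sketch of what such a programmable API could expose to plugins; the names below are invented for illustration and are not the actual Hex-Rays SDK. The idea is that a plugin requests the decompiled function, walks its expression tree, and reports findings, interactively or in batch mode.

```c
/* Invented plugin-facing API (illustration only, not the real SDK). */
typedef struct dfunc dfunc_t;          /* decompiled function (opaque)  */
typedef struct dexpr dexpr_t;          /* expression tree node (opaque) */

dfunc_t    *decompile_function(unsigned long entry_address);
void        visit_expressions(dfunc_t *f,
                              void (*callback)(dexpr_t *e, void *user),
                              void *user);
int         expr_is_call(const dexpr_t *e);
const char *expr_callee_name(const dexpr_t *e);

/* Example use: count named calls in a function during a batch run. */
static void count_calls(dexpr_t *e, void *user)
{
    if (expr_is_call(e) && expr_callee_name(e))
        ++*(int *)user;
}
```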

    33. ...program verification
    Well, “verification” won't be strict, but it can help to spot interesting locations in the code:
    Missing return value validations (e.g. for NULL pointers)
    Missing input value validations
    Taint analysis
    Insecure code patterns
    Uninitialized variables
    etc.
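
For instance (a hypothetical fragment illustrating the first two items, not a claim about how any particular checker works), pseudocode-level data flow makes patterns like an unchecked allocation or an unvalidated length stand out:

```c
#include <stdlib.h>
#include <string.h>

/* Two findings a pseudocode-level checker could flag in recovered code:
 *  1) the malloc() result is dereferenced without a NULL check;
 *  2) 'len' comes straight from the caller (potentially tainted) and is
 *     used as a copy size without any validation. */
char *copy_message(const char *src, size_t len)
{
    char *dst = malloc(len + 1);   /* finding 1: no NULL check below  */
    memcpy(dst, src, len);         /* finding 2: unvalidated length   */
    dst[len] = '\0';
    return dst;
}
```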

    34. ...assembly listing improvement
    Hardcore users who prefer to work with assembly instructions can benefit from data flow analysis results
    Hover the mouse over a register or data item to get:
    Its possible values or value ranges
    Locations where it is defined
    Locations where it is used
    Highlight definitions or uses of the current register in two different colors
    Show a list of indirect call targets, calling conventions, etc.
    Gray out dead instructions
    Determine if a value comes from a system call (ReadFile)
    etc...

    35. ...more insight into the application domain
    One could reconstruct the data types used by the application
    In fact, serious reverse engineering is impossible without knowing the data types
    Fortunately, the API already exposes all necessary information for type handling
    Plenty of work ahead

    36. ...more abstract representations
    Tools to build more abstract representations:
    Function clustering (think of modules or libraries)
    Global data flow diagrams (functions exposed to tainted data in red)
    Statistical analysis of pseudocode
    C++ template detection, generic code detection

    37. ...binary code comparison
    You know the possible applications better than me:
    To find code plagiarism
    To detect changes between program versions
    To find library functions (high-gear FLIRT)
    etc... (you know better than me :)

    38. Back to the earth
    The tools and possibilities described on the previous slides do not exist yet
    Yes, they become possible thanks to decompilation
    We have a long way to go:
    More processors and platforms
    Floating point calculations
    Exception handling
    Type recovery
    Handling hostile code
    In fact, too many ideas to enumerate here
    The future is bright... is it?...

    39. The “thank you” slide
    Thank you for your attention!
    Questions?
