
Triaging Bugs with Dynamic Dataflow Analysis



Presentation Transcript


  1. Julio Auto [julio {funny a} julioauto com] Triaging Bugs with Dynamic Dataflow Analysis

  2. Agenda • The Problem • The Solution • Demo • Solution Details • What’s Next? • Greetings & References

  3. Preface • We will be talking about analyzing closed-source software here • Absolutely no debugging information needed • However... • Depending on the complexity of the bug, even people with the source might opt for this analysis too • E.g. Vendors receiving crash reports

  4. The Problem • Sometimes people just have to analyze bugs in closed-source software • These bugs may come from: • A fuzzing session • Contributor-sent Proof-of-Concept code • In-the-wild exploit code • Etc... • The reasons for analyzing these bugs vary as much as their sources, but that is irrelevant here. The fact is...

  5. The Problem (2) • ANALYZING BUGS CAN BE HARD! • A seasoned reverse engineer may take weeks to get somewhere • If the target software is too big • If the data consumed is in a very complex and/or undisclosed format • If bugs in this target are so rare that your reversing team has no previous experience with it • But which bugs do we mostly care for?

  6. The Problem (3) • “Analyzing bugs” is very broad • No ./write-me-a-very-detailed-advisory • We will concentrate on answering one question: what exact part of my data made the program crash? • Understanding that, and how such data is transformed, is essential

  7. The Solution • Dynamic Dataflow Analysis • Watching data and its ramifications as the doomed program executes • What we really do is Taint Analysis • We start with a subset of the program’s data: the attacker’s input – assume it’s evil • Its ‘ramifications’ are tainted memory, tainted registers • ... but we do it backwards.

  8. The Solution (2) • [Figure: taint analysis vs. backwards taint analysis] • Taint analysis: “This is the Evil Input. Is any of these of interest?” • Backwards taint analysis: “This is of interest. Is any of these from the Evil Input?”

  9. The Solution (3) • So we really don’t care about every tainted piece of data in the process space • Most of it is legitimate, anyway • Thus, we avoid the explosion of watched data • Plus we can do stuff like: • Bug: mov eax, [esi] (where esi = DEADBEEFh) • Analysis runs... • ... and reports: esi = user[4] + var_unk * 8

  10. The Solution (4) • This is all done in two steps: tracing and analysis • First we trace the program from a “good” point until it crashes • The trace is incrementally dumped to a file • Not just the disassembly, but also some extra info • E.g.: In the previous slide’s example, effective address ([esi]) == DEADBEEFh • Then the trace file is analyzed

  11. The Solution (5) • The “good” starting point

  12. The Solution (6) • So we feed the trace file to the analyzer and tell it: • “Address ranges from ABCDh to ACCDh and from DCBAh to DCCAh held Evil Input” • “I wanna know if ‘esi’ was tainted by Evil Input” • And magic happens! 
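The first half of that query is a set of address ranges known to have held the Evil Input. A minimal Python sketch of how such ranges might be checked follows. The names `EVIL_RANGES` and `is_evil_address` are hypothetical, and inclusive bounds are assumed, since the slides do not show VDT's actual interface.

```python
# Hypothetical representation of the analyzer's input; VDT's real
# interface is not shown in the slides.
EVIL_RANGES = [(0xABCD, 0xACCD), (0xDCBA, 0xDCCA)]  # assumed inclusive bounds


def is_evil_address(addr, ranges=EVIL_RANGES):
    """Return True if addr falls inside any range that held Evil Input."""
    return any(lo <= addr <= hi for lo, hi in ranges)
```

The backwards search then only needs this predicate at the leaves of the dataflow, i.e. for memory locations that were never written during the trace.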

  13. The Solution (7) • Considerations • Tracing is very time-consuming • For the bug I’ll analyze as an example, it takes about 2 hours to dump the 650,000+ instructions it executes • The analysis... not so much • 1 to 2 minutes • May sound like much, but how long would it take to do it manually? • Plus, you can always use this time to do something else while the computer is working for you

  14. Demo • Introducing... Visual Data Tracer!

  15. Solution Details • The VDT Tracer is implemented as a WinDbg extension • Because WinDbg is free and it’s a great debugger • The VDT Analyzer is a stand-alone C++ app • The tracer needs to understand some simple instruction “semantics” • E.g.: The source and destination operands • Currently only the basic x86 subset is implemented (no x87, MMX, etc)

  16. Solution Details (2) • The semantic rules are simplified to avoid dumping useless info to the trace file • E.g.: a ‘push’ does not meaningfully change ‘esp’ (same for ‘inc’, ‘dec’, and their destination ops) • They are also written to fit the very simplistic format of the trace file entries • All of this makes the analysis easier, thus faster, and yet useful

  17. Solution Details (3) • Trace file entry: • Mnemonic • Destination operand • Source operand • Up to three source operand “dependences” • Dependences are, for example, the elements of an indirectly addressed memory operand • This effectively exposes the dataflow relations as a tree (rooted at the crash instruction) • Performing the backwards taint analysis then becomes a matter of searching the tree, which VDT does with a BFS algorithm
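As an illustration of the entry layout listed on this slide, here is a minimal Python sketch of one trace record. The class name `TraceEntry`, its field names, and the string encoding of operands are assumptions made for illustration; the slides do not give the on-disk format, and the real analyzer is a C++ application.

```python
from dataclasses import dataclass
from typing import Tuple

MAX_DEPS = 3  # "up to three source operand dependences"


@dataclass(frozen=True)
class TraceEntry:
    """One instruction as recorded by the tracer (field names assumed)."""
    mnemonic: str
    dst: str
    src: str
    deps: Tuple[str, ...] = ()

    def __post_init__(self):
        # Reject records that exceed the format's dependence limit.
        if len(self.deps) > MAX_DEPS:
            raise ValueError("at most three source dependences per entry")

    def parents(self):
        """Locations whose values flow into dst: its children in the tree."""
        return (self.src, *self.deps)
```

For example, `TraceEntry("lea", "ebx", "eax", ("ecx",)).parents()` gives `("eax", "ecx")`, which are the nodes a backwards search would visit next.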

  18. Solution Details (4) • Putting it together so far
      mov edi, 0x1234        ; dst=edi, src=0x1234
      mov eax, [0xABCD]      ; dst=eax, src=ptr 0xABCD ; Note 0xABCD is evil addr
      lea ebx, [eax+ecx*8]   ; dst=ebx, src=eax, srcdep1=ecx
      mov [edi], ebx         ; dst=ptr 0x1234, src=ebx
      mov esi, [edi]         ; dst=esi, src=ptr 0x1234, srcdep1=edi
      mov edx, [esi]         ; Crash!!!
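The trace on this slide can be run through a small backwards search to see how the question "was esi tainted by Evil Input?" gets answered. The Python sketch below is an illustration only, not VDT's code: the tuple encoding, the `ptr 0x...` strings, and the function name `backward_taint` are assumptions. It also treats a location as evil only if nothing inside the trace overwrote it.

```python
from collections import deque

# The slide's trace as (dst, src, deps) tuples. The crashing instruction
# (mov edx, [esi]) is the one being asked about, so it is not listed.
TRACE = [
    ("edi", "0x1234", ()),            # mov edi, 0x1234
    ("eax", "ptr 0xABCD", ()),        # mov eax, [0xABCD]
    ("ebx", "eax", ("ecx",)),         # lea ebx, [eax+ecx*8]
    ("ptr 0x1234", "ebx", ()),        # mov [edi], ebx
    ("esi", "ptr 0x1234", ("edi",)),  # mov esi, [edi]
]
EVIL = {"ptr 0xABCD"}  # locations that held Evil Input


def backward_taint(trace, target, evil):
    """Breadth-first search of the dataflow tree rooted at `target`.

    Returns the set of evil locations that `target` depends on.
    """
    found = set()
    seen = set()
    queue = deque([(target, len(trace))])  # (location, search below index)
    while queue:
        loc, upto = queue.popleft()
        if (loc, upto) in seen:
            continue
        seen.add((loc, upto))
        # Look for the most recent earlier entry that wrote `loc`.
        for i in range(upto - 1, -1, -1):
            dst, src, deps = trace[i]
            if dst == loc:
                queue.extend((parent, i) for parent in (src, *deps))
                break
        else:
            # Never written inside the trace: a leaf of the tree.
            if loc in evil:
                found.add(loc)
    return found
```

Under these assumptions, `backward_taint(TRACE, "esi", EVIL)` returns `{"ptr 0xABCD"}`: esi came from memory at 0x1234, which was written from ebx, which was computed from eax, which was loaded from the evil address. Asking about `edi` instead returns an empty set, because edi was loaded from an immediate.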

  19. Solution Details (5) • Simplifying semantic rules to fit that format is not always easy • CMPXCHG r/m32, r32 • “Compare EAX with r/m32. If equal, ZF is set and r32 is loaded into r/m32. Else, clear ZF and load r/m32 into EAX.” • The aftermath: the need for “conditional taints” • i.e. One of the possibilities of controlling ‘r/m32’ is controlling ‘r32’ AND ‘eax’ • Note that “alternative taints” also exist, implemented in the form of srcdep{1,2,3}
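One way to picture the difference is that a conditional taint is an AND over locations, while alternative taints are an OR. The Python sketch below uses a hypothetical tuple notation and a hypothetical helper, `controlled`, to evaluate such expressions; it does not reflect how VDT stores them.

```python
def controlled(expr, tainted):
    """Evaluate a taint expression against a set of tainted locations.

    expr is a location name, ("and", e1, e2, ...) or ("or", e1, e2, ...).
    """
    if isinstance(expr, str):
        return expr in tainted
    op, *args = expr
    results = [controlled(arg, tainted) for arg in args]
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError("unknown operator: %r" % (op,))


# CMPXCHG r/m32, r32: conditional taint, both r32 AND eax must be controlled.
CMPXCHG_DST = ("and", "r32", "eax")
# mov esi, [edi]: alternative taints, the memory contents OR the pointer.
MOV_ESI = ("or", "ptr 0x1234", "edi")
```

With this encoding, `controlled(CMPXCHG_DST, {"r32"})` is false while `controlled(MOV_ESI, {"edi"})` is true.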

  20. Solution Details (6) • Other subtleties to watch for • AH defines EAX • EAX defines AL • AL does not define AH • Similar problem for 1-byte and 2-byte memory accesses
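These rules amount to a byte-overlap test: a write "defines" a later read when the two touch at least one common byte. A minimal Python sketch, assuming a hypothetical table (`SPANS`) that maps each register name to its byte span within EAX:

```python
# Byte span of each name inside its 32-bit parent: (parent, offset, size).
# Offset 0 is the least significant byte. Table and names are illustrative.
SPANS = {
    "eax": ("eax", 0, 4),
    "ax": ("eax", 0, 2),
    "al": ("eax", 0, 1),
    "ah": ("eax", 1, 1),
}


def defines(written, read):
    """True if a write to `written` changes at least one byte of `read`."""
    w_parent, w_off, w_size = SPANS[written]
    r_parent, r_off, r_size = SPANS[read]
    return (w_parent == r_parent
            and w_off < r_off + r_size
            and r_off < w_off + w_size)


def mem_defines(w_addr, w_size, r_addr, r_size):
    """The same overlap test for memory accesses of different widths."""
    return w_addr < r_addr + r_size and r_addr < w_addr + w_size
```

Here `defines("ah", "eax")` and `defines("eax", "al")` hold, while `defines("al", "ah")` does not, matching the three bullets on the slide. A 1-byte write at address 0x1000 likewise defines a 2-byte read at 0x1000 but not a 1-byte read at 0x1001.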

  21. Release • This is a private tool • It has not been publicly released so far • SOURCE attendees will get it, though • PLEASE, do not redistribute • In the next few hours, downloadable at: • http://www.julioauto.com/VDT.zip • After I remove it from there, you can get it by e-mailing me

  22. What’s Next? • Extending the coverage of x86 • Enhancing speed • God knows how... • Heuristically detecting user input • e.g. By making the tracer understand CreateFile() • Automatic exploit generation • What else? • Any ideas, let me know...

  23. References • SpiderPig Project - http://piotrbania.com/all/spiderpig/ • Very similar ideas, different approach • !exploitable - http://www.codeplex.com/msecdbg • A more superficial (but much faster) tool for bug triaging • If you have many bugs to triage, you can first run !exploitable on them and, then, use VDT on those that seem really interesting

  24. Greetings • iSight Partners • For sponsoring this work! • Julien Vanegue • For all the lecturing, motivating and supporting • Piotr Bania • For discussing DDF analysis and much more • People from PSV (http://www.unprotectedhex.com/psv) • For letting me idle on IRC, leeching their knowledge • Everyone else who talks to me about security and similarly cool stuff

  25. Julio Auto [julio {funny a} julioauto com] Triaging Bugs with Dynamic Dataflow Analysis
