Fuzzing with complexities

Fuzzing with complexities Vishwas Sharma http://nullcon.net/

Introduction
We have all witnessed major threats in the past years, and I guess no one could forget names like ‘Conficker’ (1), ‘Stuxnet’ (2) and ‘Aurora Project’ (3). All of these malware had a unique delivery system based on exploiting the host operating system and then taking control of the OS. These threats are always there, and the only thing we can expect to achieve is to find a vulnerability before a bad guy does and do something about it. Software companies spend a lot of time and money making their products more stable, more reliable and more secure. With Vista, Microsoft has made sure that functions like strcpy, sprintf etc. are eliminated as part of the software development lifecycle (SDL).

Introduction
Figure 1: Microsoft Simplified SDL (4)
In fact, all major vendors have realized the importance of a secure SDL and of testing in their products. Google and Mozilla (Firefox) have a policy of rewarding any researcher who comes up with a bug or a resulting exploit.

Software Testing
Software testing is any activity aimed at evaluating an attribute or capability of a program or system and determining that it meets its required results. Unlike most physical systems, most of the defects in software are design errors, not manufacturing defects.

Code Coverage
Code coverage is one of the most important metrics used to decide on the completeness of the test cases. This metric gives us the relationship between the tests conducted and the instructions executed within the application.

Code Coverage
Of course, this metric can be further broken down into more detailed metrics:
• Function coverage - Has each function (or subroutine) in the program been called?
• Statement coverage - Has each node in the program been executed?
• Decision coverage - Has every edge in the program been executed? For instance, have the requirements of each branch of each control structure (such as in IF and CASE statements) been met as well as not met?
• Condition coverage - Has each Boolean sub-expression evaluated both to true and false?
• Condition/decision coverage - Both decision and condition coverage should be satisfied.
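
The difference between these metrics is easier to see on a toy example. The sketch below is only an illustration, not a real coverage tool: `classify` and `measure` are hypothetical names, and the function records by hand which conditions and decisions each test exercised.

```python
def classify(a, b, seen):
    """Toy function under test. `seen` records which outcomes were exercised."""
    cond_a = a > 0  # Boolean sub-expression 1
    cond_b = b > 0  # Boolean sub-expression 2
    seen["conditions"].update({("a>0", cond_a), ("b>0", cond_b)})
    decision = cond_a and cond_b
    seen["decisions"].add(decision)
    result = "low"
    if decision:  # there is no else branch
        result = "high"
        # This is the only statement that is not always executed.
        seen["all_statements"] = True
    return result


def measure(tests):
    """Run every (a, b) test and report which coverage criteria were met."""
    seen = {"conditions": set(), "decisions": set(), "all_statements": False}
    for a, b in tests:
        classify(a, b, seen)
    return {
        "statement": seen["all_statements"],
        "decision": seen["decisions"] == {True, False},
        # 2 conditions x 2 outcomes = 4 entries when fully covered
        "condition": len(seen["conditions"]) == 4,
    }


if __name__ == "__main__":
    print(measure([(1, 1)]))             # one test that enters the if-body
    print(measure([(1, -1), (-1, 1)]))   # flips each condition separately
    print(measure([(1, 1), (-1, -1)]))   # satisfies all three criteria
```

In this sketch a single test reaches full statement coverage without decision coverage, and the second test set reaches condition coverage while the decision is never true, which is why the combined condition/decision criterion exists.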

Code Coverage
[Figure: an example of code coverage]

Code Coverage
[Figure labels: "Tests needed to find bugs" and "Tests needed for coverage"]
The figure shows that even with good coverage, some bugs would still be left undiscovered.

Black-box Testing
The tester has no knowledge of the inner workings of the software, of the protocol, or of the kind of input expected. This situation is rightly named black-box testing.

White-box Testing
Information on internal data structures and algorithms is completely shared between the product development team and the testers' team. This information can be used for testing APIs, code coverage, fault injection, mutation testing and many more.

Fuzzing
Credit for first working on and formulating this technique goes to Barton Miller and his students from the University of Wisconsin-Madison in 1989.
In simple words, it is the technique in which invalid, mutated or malformed input is repeatedly supplied to an application with the sole intention of finding bugs in the application.
It is observed that fuzzing is most effective against applications developed in C/C++, as these languages make the programmer responsible for memory management, whereas managed code, i.e. code developed in C#, Java etc., would yield bugs of a very different class.
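
The loop described above can be sketched in a few lines. This is a minimal mutation fuzzer under stated assumptions: `toy_parser` is a hypothetical target with a deliberately planted bug (it trusts a length byte), and an "interesting" result is any exception other than the parser's own `ValueError` rejection. A real fuzzer would instead monitor a separate process for crashes.

```python
import random


def toy_parser(data: bytes) -> int:
    """Hypothetical target: magic byte 0x7F, a length byte, then a payload."""
    if len(data) < 2 or data[0] != 0x7F:
        raise ValueError("rejected: bad magic or too short")
    length = data[1]
    # Deliberate bug: the length field is trusted without a bounds check.
    return data[1 + length]


def mutate(seed: bytes, rng: random.Random, max_changes: int = 4) -> bytes:
    """Overwrite a few randomly chosen bytes of the seed with random values."""
    buf = bytearray(seed)
    for _ in range(rng.randint(1, max_changes)):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed: bytes, iterations: int = 1000, rng_seed: int = 0):
    """Feed mutated inputs to the target and collect unexpected failures."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        case = mutate(seed, rng)
        try:
            toy_parser(case)
        except ValueError:
            pass  # graceful rejection by input validation, not a bug
        except Exception as exc:  # anything else stands in for a crash
            crashes.append((case, type(exc).__name__))
    return crashes


if __name__ == "__main__":
    found = fuzz(b"\x7f\x03abc")
    print(f"{len(found)} crashing inputs found")
    if found:
        print("first:", found[0])
```

Note how many mutated inputs are simply rejected by the magic-byte check; this is the same effect that makes dumb fuzzing weak against formats with strict input validation.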

Fuzzing
There is an important distinction between fuzzing and other testing activities: the intent. A testing team knows a lot about the program and basically tests whether the program behaves as it is supposed to behave, whereas a security researcher only cares that his fuzzer crashes the tested application.

Fuzzer
I would like to make note of the two Python-based fuzzing frameworks available in the open source community that I use most extensively.
• PeachFuzzer - Peach is a SmartFuzzer that is capable of performing both generation- and mutation-based fuzzing (10).
• Sulley - Sulley is a fuzzer development and fuzz testing framework consisting of multiple extensible components. Sulley (IMHO) exceeds the capabilities of most previously published fuzzing technologies, commercial and public domain.

Fuzzer
Peach is being improved day in and day out, and it is the only open source fuzzer that is maintained apart from the Metasploit fuzzer. Peach is written as a primary data fuzzer, but as it is open source it can be extended to a secondary or even an nth-class fuzzer. Peach is also used by Adobe in its testing of Adobe Reader.
Sulley is not maintained, but it is as good as you can get when it comes to generation-based fuzzing.
Collection of fuzzers: http://packetstormsecurity.org/fuzzer/

Complexity
“Software bugs will almost always exist in any software module with moderate size: not because programmers are careless or irresponsible, but because the complexity of software is generally intractable -- and humans have only limited ability to manage complexity. It is also true that for any complex systems, design defects can never be completely ruled out” - Jiantao Pan, Carnegie Mellon University
With many fuzzers it is observed that the test cases produced fail to pass even the basic packet sanitation test of the target application if the fuzzer has an improper understanding of the input type and structure.

Complexity
Microsoft conducted a study in which 450 lines of code were tested with various fuzz combinations to see how effective the results were. The outcome is shown below:
[Figure: analysis based on effort in producing the fuzzer and defects found, correlated with the kind of fuzzer]

Packets
• An example of an ASCII-based packet (IRC)
• There are a few other popularly known examples, e.g.
• HTML
• CSS
• FTP
• And many more

Binary-based Packets
But what happens when a format no longer sticks to one data format? What happens when the data switches from one format such as ASCII to binary, and then from binary back to ASCII again? And, as the cherry on top, sections are encoded differently: even the ASCII portion can be encoded, or even imported from other binary or ASCII-based formats.

Example of one such format
One example of such a complex format is PDF. We see these formats used in everyday applications such as office documents, Adobe PDF, SMB protocols and more. One cannot fuzz these files randomly, as the applications have pretty good input validation modules which prevent any dumb attempt to fuzz them.

What we know so far
What we have gathered so far is summarized here. As we move ahead, you will find answers to these problems.

Some answers
Code coverage fails for these applications
Protocol awareness can be used instead. Once we have all the information about a protocol that we can get, we can intuitively say that the packet containing the largest number of tags or objects will require the most code in that module to be covered. One could object that we still cannot guarantee code coverage if we do not find a packet that contains all the tags or objects. But testing all cases in one go was never the idea; multiple tests that together cover every tag are what will be fruitful.
Data format inconsistency
One can easily write a fuzzer for either an ASCII-based packet or a binary-based packet. But when these formats come together in one packet, it becomes unusually difficult to write one. The solution lies in visualizing the problem and breaking it into the parts we are most comfortable with. We can separate out the data generation capability for the ASCII format and for the binary format. Remember that here I am trying to separate out these capabilities for data generation, not necessarily for fuzzing.
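
Separating the ASCII and binary generators can be sketched as follows. The packet layout here is purely hypothetical (an ASCII "Key: value" header, a blank line, then a big-endian binary body); it is not PDF or any real protocol, and the function names are made up for illustration. The point is only that each part is produced by its own generator and joined at the end.

```python
import struct


def gen_ascii_header(fields: dict) -> bytes:
    """ASCII generator: 'Key: value' lines terminated by a blank line."""
    lines = [f"{key}: {value}" for key, value in fields.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def gen_binary_body(values: list) -> bytes:
    """Binary generator: 16-bit count followed by 32-bit unsigned integers."""
    return struct.pack(f">H{len(values)}I", len(values), *values)


def build_packet(fields: dict, values: list) -> bytes:
    """Combine both generators; the ASCII part describes the binary part."""
    body = gen_binary_body(values)
    # The Length field is derived from the body so the packet stays consistent
    # and can get past a basic sanity check in the target.
    header = gen_ascii_header({**fields, "Length": len(body)})
    return header + body


if __name__ == "__main__":
    packet = build_packet({"Type": "demo"}, [1, 2, 0xFFFFFFFF])
    print(packet)
```

Because the generators are independent, either one can later be swapped for a fuzzing generator without touching the other, and the derived length keeps the combined packet well-formed.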

Some answers
Multiple files embedded in a single packet
Having separated the types, we can separate further into a secondary-level data production module, i.e. a different level of data generation. This means that if a PDF file has a font and an image embedded inside it, we can write one fuzzer for the font and a different one for the image, and combine each of their results with the PDF file in a manner similar to the multiple encoding level problem.
Multiple encoding levels
As we have separated ASCII from binary within the same format, one can further add a custom encoding to each packet as one likes. They will all fall back together when we combine them later. See the case study for more clarification. For example, if a PDF file has multiple fonts embedded inside it, we can use a different encoder for each font, as each is generated separately.
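
A rough sketch of this layering is given below, assuming placeholder generators for the embedded font and image. `gen_font`, `gen_image`, `make_stream` and `build_container` are hypothetical names, and the output is only a PDF-like fragment (stream objects with a `/Filter` entry), not a complete valid PDF: the cross-reference table and trailer are omitted.

```python
import binascii
import zlib


def gen_font() -> bytes:
    """Second-level generator (placeholder): would emit a fuzzed font."""
    return b"FONT" + bytes(range(16))


def gen_image() -> bytes:
    """Second-level generator (placeholder): would emit a fuzzed image."""
    return b"IMG" + b"\x00\xff" * 8


# One encoder per PDF filter name; '>' is the ASCIIHexDecode end-of-data mark.
ENCODERS = {
    "FlateDecode": zlib.compress,
    "ASCIIHexDecode": lambda data: binascii.hexlify(data) + b">",
}


def make_stream(obj_num: int, payload: bytes, filter_name: str) -> bytes:
    """Wrap an independently generated payload in an encoded stream object."""
    encoded = ENCODERS[filter_name](payload)
    head = (
        f"{obj_num} 0 obj\n"
        f"<< /Length {len(encoded)} /Filter /{filter_name} >>\n"
        "stream\n"
    ).encode("ascii")
    return head + encoded + b"\nendstream\nendobj\n"


def build_container() -> bytes:
    """Combine the separately generated and encoded parts into one file."""
    parts = [
        make_stream(1, gen_font(), "FlateDecode"),
        make_stream(2, gen_image(), "ASCIIHexDecode"),
    ]
    return b"%PDF-1.4\n" + b"".join(parts)


if __name__ == "__main__":
    print(build_container())
```

Each embedded object picks its encoder independently, so the font could be compressed while the image is hex-encoded, and the outer ASCII structure stays consistent because `/Length` is computed after encoding.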

Strategy
Now is the right time to talk about the strategy that I have used when fuzzing one such format, PDF. You will find different definitions of these terms, but this is what I understand by them. This process is typically described in terms of the system under test and directed at an area within that system, whereas in my study I have taken it out of the box and placed these conditions on the data packet itself.

Attack Point Selection
Attack point selection is a simple process in which I try to specify a specific point within the packet which needs to be tested. The selection of these points depends a lot on gathered intelligence about the system, including previous vulnerabilities, as this eliminates a few attack points that have already been attacked before. For example, when working on a simple PDF file which contains a U3D file, which is known to have previously caused a vulnerability in Adobe Reader, one can say that this format has already been tested (after looking at the vulnerability), so a lot more effort would be required to find a vulnerability there next time. One can instead focus time and energy on finding other routes into the application which have not yet been tested by security researchers.

To Fuzz
Directed Fuzzing
Whenever a vulnerability is disclosed, it is released with very little information. One such disclosure example would be:
Adobe Flash Player Multiple Tag JPEG Parsing Remote Code Execution Vulnerability -- Vulnerability Details: This vulnerability allows remote attackers to execute arbitrary code on vulnerable installations of Adobe Flash Player. User interaction is required in that a target must visit a malicious website. The specific flaw exists within the code for parsing embedded image data within SWF files. The DefineBits tag and several of its variations are prone to a parsing issue while handling JPEG data. Specifically, the vulnerability is due to decompression routines that do not validate image dimensions sufficiently before performing operations on heap memory. An attacker can exploit this vulnerability to execute arbitrary code under the context of the user running the browser.
Figure 7: An example of vulnerability disclosure

Demo
CVE-2010-2862: Integer overflow in CoolType.dll in Adobe Reader 8.2.3 and 9.3.3, and Acrobat 9.3.3, allows remote attackers to execute arbitrary code via a TrueType font with a large maxCompositePoints value in a Maximum Profile (maxp) table.
Figure 7: An example of vulnerability disclosure
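
A disclosure like this one names the exact field to direct the fuzzer at. The sketch below is not the demo from the talk; it only illustrates generating directed test values for one field of a version 1.0 maxp table (a 32-bit version followed by fourteen 16-bit fields). The baseline values and the boundary list are assumptions chosen for illustration, and embedding the table in a font with correct checksums is left out.

```python
import struct

# Version 1.0 'maxp' layout: Fixed version + 14 uint16 fields = 32 bytes.
MAXP_FORMAT = ">I14H"
FIELD_NAMES = [
    "version", "numGlyphs", "maxPoints", "maxContours",
    "maxCompositePoints", "maxCompositeContours", "maxZones",
    "maxTwilightPoints", "maxStorage", "maxFunctionDefs",
    "maxInstructionDefs", "maxStackElements", "maxSizeOfInstructions",
    "maxComponentElements", "maxComponentDepth",
]

# Assumed harmless baseline values (hypothetical, for illustration only).
BASE_VALUES = {name: 1 for name in FIELD_NAMES}
BASE_VALUES.update({"version": 0x00010000, "maxZones": 2})

# Boundary values for a 16-bit field, where integer problems tend to appear.
BOUNDARY_VALUES = [0, 1, 0x7FFF, 0x8000, 0xFFFE, 0xFFFF]


def build_maxp(**overrides) -> bytes:
    """Pack a maxp table, overriding selected fields of the baseline."""
    values = {**BASE_VALUES, **overrides}
    return struct.pack(MAXP_FORMAT, *(values[name] for name in FIELD_NAMES))


def directed_cases(field: str = "maxCompositePoints") -> list:
    """One table per boundary value, changing only the chosen field."""
    return [build_maxp(**{field: value}) for value in BOUNDARY_VALUES]


if __name__ == "__main__":
    for table in directed_cases():
        print(table.hex())
```

Only the attack point changes between cases, so every other field stays valid and the table is more likely to reach the code path named in the disclosure.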
