TaintScope: A Checksum-Aware Directed Fuzzing Tool for Automatic Software Vulnerability Detection
Tielei Wang (1), Tao Wei (1), Guofei Gu (2), Wei Zou (1)
(1) Peking University, China; (2) Texas A&M University, US
2 Terms
• Checksum: a value computed over data to check its integrity; used in network protocols and file formats.
• Fuzzing: generating malformed inputs and feeding them to the application.
• Dynamic taint analysis: running a program and observing which computations are affected by predefined taint sources (e.g. input data).
[Figure: data, checksum function, checksum field]
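To make the checksum idea concrete, here is a minimal C sketch that assumes CRC-32 (the reflected variant with polynomial 0xEDB88320, as used by zlib and PNG) as the checksum function. The helper names `crc32_compute` and `passes_integrity_check` are hypothetical. The sketch shows why a single mutated byte is enough to make the integrity check fail.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and
   final XOR 0xFFFFFFFF), the variant used by zlib and PNG. */
uint32_t crc32_compute(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial if the low bit was 1. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical integrity check: recompute the checksum over the data and
   compare it with the value stored in the checksum field. */
int passes_integrity_check(const uint8_t *data, size_t len, uint32_t stored) {
    return crc32_compute(data, len) == stored;
}
```

A fuzzer that changes one data byte without updating the stored field is rejected at the comparison, because CRC-32 detects every error confined to a single byte.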
3 The problem
• The input mutation space is enormous.
• If the program employs a checksum mechanism, most malformed inputs are dropped at an early stage.
4 The problem
 1 void decode_image(FILE* fd){
 2   ...
 3   int length = get_length(fd);
 4   int recomputed_chksum = checksum(fd, length);
 5   int chksum_in_file = get_checksum(fd);
     // line 6 checks the integrity of the input
 6   if(chksum_in_file != recomputed_chksum)
 7     error();
 8   int Width = get_width(fd);
 9   int Height = get_height(fd);
10   int size = Width*Height*sizeof(int);  // can overflow for large Width, Height
11   int* p = malloc(size);
12   ...
13   for(int i=0; i<Height; i++){ // read the ith row into p
14     read_row(p+Width*i, i, fd);
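Why this code is dangerous: if Width and Height come from the file, the product on line 10 can wrap around, so malloc receives a small size while the loop on lines 13 and 14 writes Height full rows. The C sketch below models 32-bit arithmetic with uint32_t (signed overflow is undefined behavior in C) and assumes sizeof(int) is 4. It contrasts the wrapping computation with a checked one. `wrapped_size` and `checked_size` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Models line 10 with 32-bit unsigned arithmetic: the product silently
   wraps modulo 2^32 instead of reporting overflow. */
uint32_t wrapped_size(uint32_t width, uint32_t height) {
    return width * height * (uint32_t)sizeof(int);
}

/* Checked variant: returns 0 (and leaves *out untouched) if
   width * height * sizeof(int) does not fit in 32 bits. */
int checked_size(uint32_t width, uint32_t height, uint32_t *out) {
    uint64_t cells = (uint64_t)width * height;   /* cannot overflow 64 bits */
    if (cells > UINT32_MAX / sizeof(int))
        return 0;                                /* byte count would not fit */
    *out = (uint32_t)(cells * sizeof(int));
    return 1;
}
```

With Width = 0x10000 and Height = 0x4000, the 32-bit size wraps to 0, so the buffer handed back by malloc is far smaller than the rows later written into it.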
5 The idea
• Infer whether and where a program checks the integrity of its input.
• Identify which input bytes can flow into sensitive points: byte-level taint analysis monitors how the application uses the input data.
• Create malformed inputs by focusing on the “hot bytes”.
• Repair checksum fields in the input to expose vulnerabilities.
• Fully automatic.
• Found 27 previously unknown vulnerabilities, e.g. in Adobe Acrobat Reader, Google Picasa and more.
6 How does it work?
1. Dynamic taint tracing
2. Detecting checksums
3. Directed fuzzing
4. Repairing crashed samples
7 How does it work?
[Figure: TaintScope architecture. Components: Execution Monitor, Checksum Locator, Directed Fuzzer, Checksum Repairer. Other labels: Hot Bytes Info, Instruction Profile, Modified Program, Crashed Samples, Reports.]
8 How does it work?
1. Dynamic taint tracing
• Runs the program with well-formed inputs.
• The execution monitor records:
  • which input bytes are related to the arguments of API functions (e.g. malloc, strcpy): the “hot bytes” report;
  • which input bytes each conditional jump instruction depends on (e.g. JZ, JE, JB): the checksum report.
• Considers only data flow (no control flow).
9 How does it work?
1. Dynamic taint tracing
• Instruments data movement (e.g. MOV, PUSH), arithmetic (e.g. SUB, ADD) and logic (e.g. AND, XOR) instructions.
• Taints every value written by an instruction with the union of the taint labels of all values that instruction reads.
• Also taints the eflags register.
Example:
  before:  eax {0x6, 0x7}, ebx {0x8, 0x9}
  add eax, ebx
  after:   eax {0x6, 0x7, 0x8, 0x9}, eflags {0x6, 0x7, 0x8, 0x9}
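The propagation rule can be sketched in C as follows. The sketch assumes each taint label is an input-byte offset below 64, so a label set fits in one uint64_t bitmask; a real monitor needs larger sets, since inputs are often 1024 bytes or more. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical label set: bit k set means "depends on input byte k".
   Only offsets 0..63 fit; this is a simplification for illustration. */
typedef uint64_t labelset;

/* Shadow (taint) state for the registers touched by the example. */
typedef struct {
    labelset eax, ebx, eflags;
} shadow_regs;

static labelset label(unsigned offset) { return (labelset)1 << offset; }

/* Propagation for "add eax, ebx": every value the instruction writes
   (eax and the flags) receives the union of the labels it reads. */
void propagate_add_eax_ebx(shadow_regs *s) {
    labelset used = s->eax | s->ebx;
    s->eax = used;
    s->eflags = used;
}
```

Tainting eflags as well is what later lets the monitor ask how many input bytes a conditional jump depends on.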
10 How does it work?
1. Dynamic taint tracing: example (input size is 1024 bytes)
 8   int Width = get_width(fd);
 9   int Height = get_height(fd);
10   int size = Width*Height*sizeof(int);
11   int* p = malloc(size);
“Hot bytes” report:
  …
  0x8048d5b: invoking malloc: [0x8,0xf]
  …
The argument of the malloc call at 0x8048d5b depends on input bytes 0x8 through 0xf.
11 How does it work?
1. Dynamic taint tracing: example (input size is 1024 bytes)
 6   if(chksum_in_file != recomputed_chksum)
 7     error();
Checksum report:
  …
  0x8048d4f: JZ: 1024: [0x0,0x3ff]
  …
The conditional jump at 0x8048d4f depends on all 1024 input bytes (offsets 0x0 through 0x3ff).
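The report entries compress a set of input offsets into ranges such as `[0x0,0x3ff]`. A C sketch of that summarization follows, assuming the label set is stored as one flag byte per input offset. The function name and the exact output format are assumptions modeled on the entries above, not TaintScope's actual code.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes the tainted offsets as "[lo,hi]" ranges of consecutive bytes
   (e.g. "[0x8,0xf]") into buf, separated by spaces. tainted[i] != 0 means
   input byte i is in the set; bufsize must be at least 1.
   Returns the number of tainted bytes. */
size_t format_ranges(const unsigned char *tainted, size_t n,
                     char *buf, size_t bufsize) {
    size_t count = 0, used = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < n; ) {
        if (!tainted[i]) { i++; continue; }
        size_t lo = i;
        while (i < n && tainted[i]) i++;          /* extend the run */
        count += i - lo;
        if (used < bufsize) {                     /* stop writing once full */
            int w = snprintf(buf + used, bufsize - used, "%s[0x%zx,0x%zx]",
                             used ? " " : "", lo, i - 1);
            if (w > 0) used += (size_t)w;
        }
    }
    return count;
}
```

Called on the checksum comparison's label set, it would report a count of 1024 and the single range `[0x0,0x3ff]`.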
12 How does it work?
2. Detecting checksums
Checksum detector:
• Identifies potential checksum check points: the recomputed checksum value depends on many input bytes.
• Instruments conditional jumps: before each one executes, checks whether the number of taint labels on the eflags register exceeds a threshold.
• Problem: decompressed bytes can also depend on many input bytes, so jumps on decompressed data may exceed the threshold too.
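The threshold test can be sketched in C as below. The sketch assumes the eflags label set is a flag-byte-per-offset array and uses the threshold of 16 labels from the evaluation; the names are hypothetical.

```c
#include <stddef.h>

#define CHECKSUM_LABEL_THRESHOLD 16  /* value used in the evaluation */

/* Counts how many input offsets are marked in a flag-per-offset label set. */
size_t label_count(const unsigned char *labels, size_t input_len) {
    size_t count = 0;
    for (size_t i = 0; i < input_len; i++)
        count += labels[i] != 0;
    return count;
}

/* Called before a conditional jump executes: flags it as a potential
   checksum check point if eflags depends on more input bytes than the
   threshold. */
int is_candidate_check_point(const unsigned char *eflags_labels,
                             size_t input_len) {
    return label_count(eflags_labels, input_len) > CHECKSUM_LABEL_THRESHOLD;
}
```

On its own this filter also accepts jumps on decompressed data, which is why the refinement step follows.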
16 How does it work?
2. Detecting checksums
Refinement:
• Well-formed inputs can pass the checksum test, but most malformed inputs cannot.
• Run well-formed inputs and identify the always-taken and always-not-taken conditional jumps.
• Run malformed inputs and likewise identify the always-taken and always-not-taken conditional jumps.
• Identify the conditional jumps that behave completely differently on well-formed and malformed inputs.
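A C sketch of the refinement follows. It assumes the monitor has counted, for each candidate jump, how often the jump was taken across the well-formed runs and across the malformed runs; the struct, enum and function names are hypothetical. A jump is kept only if it is always taken on one set of inputs and never taken on the other, and the resulting bypass rule records its behavior on well-formed inputs.

```c
#include <stddef.h>

typedef enum { NOT_CHECKSUM, ALWAYS_TAKEN, ALWAYS_NOT_TAKEN } bypass_rule;

/* Outcome counts for one conditional jump (hypothetical layout). */
typedef struct {
    size_t good_runs, good_taken;   /* well-formed inputs */
    size_t bad_runs, bad_taken;     /* malformed inputs */
} jump_profile;

/* Returns the bypass rule for a candidate jump, or NOT_CHECKSUM if it does
   not behave completely differently on the two sets of inputs. */
bypass_rule classify_jump(const jump_profile *p) {
    if (p->good_runs == 0 || p->bad_runs == 0)
        return NOT_CHECKSUM;                    /* not enough evidence */
    int good_always = p->good_taken == p->good_runs;
    int good_never  = p->good_taken == 0;
    int bad_always  = p->bad_taken == p->bad_runs;
    int bad_never   = p->bad_taken == 0;
    if (good_always && bad_never)
        return ALWAYS_TAKEN;        /* keep it taken while fuzzing */
    if (good_never && bad_always)
        return ALWAYS_NOT_TAKEN;    /* keep it not taken while fuzzing */
    return NOT_CHECKSUM;
}
```

For the JZ at 0x8048d4f, well-formed inputs always take the jump past error(), which gives the rule `0x8048d4f: JZ: always-taken`.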
17 How does it work?
2. Detecting checksums
Checksum detector:
• Creates bypass rules (always-taken or always-not-taken):
 6   if(chksum_in_file != recomputed_chksum)
 7     error();
Checksum report:
  …
  0x8048d4f: JZ: 1024: [0x0,0x3ff]
  …
Bypass info:
  0x8048d4f: JZ: always-taken
18 How does it work?
2. Detecting checksums
Checksum detector:
• Checksum field identification: the input bytes that affect chksum_in_file form the checksum field.
 6   if(chksum_in_file != recomputed_chksum)
 7     error();
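A C sketch of checksum field identification follows. It needs to tell the two compared operands apart, and for that it assumes a heuristic: the recomputed checksum depends on many input bytes, the stored one on few, so the operand with fewer labels is taken to be chksum_in_file. That rule, the flag-per-offset representation, and all names are assumptions of this sketch.

```c
#include <stddef.h>

/* Label sets of the two operands compared at a checksum check point,
   one flag byte per input offset (hypothetical representation). */
typedef struct {
    const unsigned char *lhs, *rhs;
    size_t input_len;
} cmp_operands;

static size_t count_marked(const unsigned char *set, size_t n) {
    size_t c = 0;
    for (size_t i = 0; i < n; i++) c += set[i] != 0;
    return c;
}

/* Heuristic (assumed): the operand with fewer labels is chksum_in_file.
   Copies its label set into field and returns the number of bytes in the
   checksum field. */
size_t checksum_field(const cmp_operands *op, unsigned char *field) {
    size_t nl = count_marked(op->lhs, op->input_len);
    size_t nr = count_marked(op->rhs, op->input_len);
    const unsigned char *stored = nl <= nr ? op->lhs : op->rhs;
    for (size_t i = 0; i < op->input_len; i++)
        field[i] = stored[i] != 0;
    return nl <= nr ? nl : nr;
}
```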
19 How does it work?
3. Directed fuzzing
• Generates malformed test cases and feeds them to the original or the instrumented program.
• Following the bypass rules, alters the execution trace at check points by setting the eflags register.
20 How does it work?
3. Directed fuzzing
• All malformed test cases are constructed from the “hot bytes” information.
• Attack heuristics:
  • bytes that influence memory allocation are set to small, large or negative values;
  • bytes that flow into string functions are replaced by characters such as %n and %p.
• Output: test cases that crash the program or make it consume 100% CPU.
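A C sketch of the memory-allocation heuristic follows. It assumes the hot bytes form one 4-byte little-endian integer (as a Width or Height field might) and uses an illustrative set of small, large and negative values. The value list and the function name are assumptions, not TaintScope's actual choices, and the string-function heuristic is not shown. Each call produces one malformed test case from a well-formed seed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative boundary values: small, large, and negative when read as a
   signed 32-bit integer. */
static const uint32_t ALLOC_HEURISTIC_VALUES[] = {
    0x00000000u, 0x00000001u,           /* small */
    0x7FFFFFFFu, 0x40000000u,           /* large */
    0xFFFFFFFFu, 0x80000000u            /* -1 and INT32_MIN */
};
#define N_ALLOC_VALUES \
    (sizeof ALLOC_HEURISTIC_VALUES / sizeof ALLOC_HEURISTIC_VALUES[0])

/* Copies the seed into out and overwrites the 4 hot bytes at offset with
   the k-th heuristic value (little-endian). Returns 0 if the field does
   not fit in the input or k is out of range. */
int make_alloc_test_case(const uint8_t *seed, size_t len, size_t offset,
                         size_t k, uint8_t *out) {
    if (k >= N_ALLOC_VALUES || len < 4 || offset > len - 4)
        return 0;
    memcpy(out, seed, len);
    uint32_t v = ALLOC_HEURISTIC_VALUES[k];
    for (int b = 0; b < 4; b++)
        out[offset + b] = (uint8_t)(v >> (8 * b));
    return 1;
}
```

In the running example the candidate offsets would come from the malloc entry `[0x8,0xf]`, e.g. 0x8 and 0xc if those bytes hold two 4-byte fields (an assumption). Such test cases only reach malloc if the checksum check is bypassed.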
22 How does it work?
3. Directed fuzzing
 6   if(chksum_in_file != recomputed_chksum)
 7     error();
 8   int Width = get_width(fd);
 9   int Height = get_height(fd);
10   int size = Width*Height*sizeof(int);
11   int* p = malloc(size);
Checksum report:
  …
  0x8048d4f: JZ: 1024: [0x0,0x3ff]
  …
“Hot bytes” report:
  …
  0x8048d5b: invoking malloc: [0x8,0xf]
  …
Bypass info:
  0x8048d4f: JZ: always-taken
Before executing 0x8048d4f, the fuzzer sets the ZF flag in eflags to the opposite value, so the jump is taken even when the checksum does not match.
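The eflags manipulation can be sketched in C. The sketch assumes the instrumentation exposes the program's saved EFLAGS value just before the JZ executes. ZF is bit 6 of EFLAGS on x86, and JZ is taken exactly when ZF is 1. The function and enum names are hypothetical.

```c
#include <stdint.h>

#define EFLAGS_ZF ((uint32_t)1 << 6)   /* zero flag, bit 6 of EFLAGS */

typedef enum { RULE_ALWAYS_TAKEN, RULE_ALWAYS_NOT_TAKEN } jz_rule;

/* Adjusts ZF before a JZ so that the jump follows its bypass rule.
   Returns the (possibly) modified flags; other flag bits are preserved. */
uint32_t apply_jz_bypass(uint32_t eflags, jz_rule rule) {
    if (rule == RULE_ALWAYS_TAKEN)
        return eflags | EFLAGS_ZF;      /* force the jump past error() */
    return eflags & ~EFLAGS_ZF;         /* force fall-through */
}
```

Setting ZF according to the rule is equivalent to flipping it only when the malformed input would have sent execution the other way.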
23 How does it work?
4. Repairing crashed samples
• Repair is expensive, so checksum fields are fixed only in test cases that caused a crash.
• How? Notation:
  Cr: raw data in the checksum field
  D: input data protected by the checksum field
  Checksum(): the complete checksum algorithm
  T: the transformation applied to Cr before the comparison
• Goal: satisfy the constraint Checksum(D) == T(Cr).
24 How does it work?
4. Repairing crashed samples
Using symbolic execution to solve the constraint:
• Checksum(D) is a constant c that can be determined at runtime, so only Cr is symbolic:
  Checksum(D) == T(Cr)  becomes  c == T(Cr)
• Common transformations (e.g. converting from hex or octal to decimal) can be solved by existing solvers such as STP.
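TaintScope hands this constraint to a solver (STP). To show what solving c == T(Cr) produces, the C sketch below instead uses a closed-form inverse of one assumed transformation: an 8-character ASCII hex field that the program parses into a 32-bit value. `T_hex` and `repair_hex_field` are hypothetical names. The sketch also assumes c was already obtained by running the checksum over the mutated data D.

```c
#include <stdint.h>

/* Hypothetical transformation T: the checksum field holds 8 ASCII hex
   digits (e.g. "cbf43926") that the program parses into a 32-bit value. */
uint32_t T_hex(const char cr[8]) {
    uint32_t v = 0;
    for (int i = 0; i < 8; i++) {
        char ch = cr[i];
        uint32_t d = (ch >= '0' && ch <= '9') ? (uint32_t)(ch - '0')
                   : (ch >= 'a' && ch <= 'f') ? (uint32_t)(ch - 'a' + 10)
                   : (ch >= 'A' && ch <= 'F') ? (uint32_t)(ch - 'A' + 10)
                   : 0;  /* non-hex characters count as 0 in this sketch */
        v = (v << 4) | d;
    }
    return v;
}

/* Solves c == T_hex(cr) for cr: writes c as 8 lowercase hex digits,
   most significant digit first. */
void repair_hex_field(uint32_t c, char cr[8]) {
    static const char digits[] = "0123456789abcdef";
    for (int i = 7; i >= 0; i--) {
        cr[i] = digits[c & 0xFu];
        c >>= 4;
    }
}
```

A general solver is needed when T is not known in advance; a hand-written inverse like this covers only one transformation.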
25 How does it work?
4. Repairing crashed samples
If the new test case causes the original program to crash, a potential vulnerability is detected!
26 Evaluation
An incomplete list of applications:
27 Evaluation
“Hot bytes” identification results: memory allocation
28 Evaluation
Checksum identification results (threshold = 16):
29 Evaluation
Correct checksum fields:
30 Evaluation
27 previously unknown vulnerabilities: MS Paint, Google Picasa, Adobe Acrobat, ImageMagick, IrfanView, GStreamer, Winamp, XEmacs, wxWidgets, PDFlib, Amaya, Dillo
31 Evaluation
Vulnerabilities detected by TaintScope:
32 Discussion
• TaintScope cannot deal with secure integrity check schemes (e.g. cryptographic hash algorithms, digital signatures): generating valid test cases is infeasible.
• Limited effectiveness when all input data are encrypted (requires tracking the decrypted data).
• Identification of checksum check points can be affected by the quality of the inputs.
• Does not track control-flow propagation.
• The execution monitor does not instrument all x86 instructions.
33 Conclusion
TaintScope can perform:
• Directed fuzzing
  • identifies which bytes flow into system/library calls;
  • dramatically reduces the mutation space.
• Checksum-aware fuzzing
  • disables checksum checks by control-flow alteration;
  • generates correct checksum fields in invalid inputs.
34 Questions