Delta: Heuristically Minimize “Interesting” Files • delta.tigris.org • Daniel S. Wilkerson, work with Scott McPeak
This quarter million line file crashes my tool! • We had a quarter million line (preprocessed) C++ file that crashed our C++ front-end (Elsa) • How long would it take you to minimize that by hand? • Delta reduced it in a few hours to a page or two of code • While we did something else!
Delta Debugging Algorithm • Andreas Zeller’s Delta Debugging Algorithm • For file minimization, it reduces to this: • for each granularity g from 0 to log2 N (N = number of lines) • partition the file into 2^g parts • for each part • test if the file minus that part is still interesting • if so, permanently throw out that part • Result is “one minimal” • removing any one line will make the test fail
Example: both blue needed (the blue lines are b and e) • a • b • c • d • e • f • g • h
both blue needed: g = 0 • a • b • c • d • e • f • g • h • one part [a b c d e f g h]: can’t delete the box since it contains both b and e
both blue needed: g = 1 • a • b • c • d • e • f • g • h • [a b c d]: can’t delete; contains b • [e f g h]: can’t delete; contains e
both blue needed: g = 2 • a • b • c • d • e • f • g • h • [c d]: can delete • [g h]: can delete
both blue needed: g = 3 • a • b • c • d • e • f • g • h • [a]: can delete • [f]: can delete
both blue needed: final • a • b • c • d • e • f • g • h • only b and e remain
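The loop from the algorithm slide, run on the a..h example, can be sketched in Python as follows. This is a minimal illustration, not the delta script itself: the names `minimize` and `needs_b_and_e` are made up for this sketch, and parts are taken as equal slices of the original line numbering, which is an assumption.

```python
def minimize(lines, is_interesting):
    """Greedy chunk removal at successively finer granularities."""
    n = len(lines)
    if n == 0:
        return []
    kept = [True] * n                 # which original lines survive so far
    max_g = (n - 1).bit_length()      # smallest g with 2**g >= n
    for g in range(max_g + 1):
        parts = 2 ** g
        for p in range(parts):
            lo, hi = p * n // parts, (p + 1) * n // parts
            chunk = [i for i in range(lo, hi) if kept[i]]
            if not chunk:
                continue              # empty slice, or already thrown out
            # The file minus this part.
            trial = [lines[i] for i in range(n)
                     if kept[i] and not (lo <= i < hi)]
            if is_interesting(trial):
                for i in chunk:       # permanently throw out that part
                    kept[i] = False
    return [lines[i] for i in range(n) if kept[i]]


def needs_b_and_e(candidate):
    """Stand-in test: the file is interesting if it keeps both b and e."""
    return "b" in candidate and "e" in candidate


result = minimize(list("abcdefgh"), needs_b_and_e)
print(result)  # ['b', 'e']
```

The sketch makes a single pass per granularity. It never goes back to retry a line it already failed to delete, so on its own it does not guarantee the “one minimal” property for every possible test.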
You could do this manually... • and be much more clever • ...but delta is often faster • I find it surprising that, for minimizing a file exhibiting a certain behavior, brute force mostly wins over cleverness • “Computers are as dumb as hell but they go like 60” -- Richard Feynman
Do a controlled experiment • An experiment does many things • the interesting bit • and the boilerplate just to make it go • A control is another experiment • that only does the boilerplate • Do both and “subtract” to find the interesting bit • Test: gcc -c $F && oink $F | grep 'error:...' • control: $F passes gcc • but not oink
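The control-then-experiment test can be written as a small Python function. This is a sketch under stated assumptions: the function name, its parameters, and the idea of matching a `marker` substring in the tool's output are illustrative stand-ins for the shell pipeline on the slide, not part of delta.

```python
import subprocess


def is_interesting(path, control_cmd, experiment_cmd, marker):
    """True when the control accepts `path` but the experiment reports `marker`."""
    # Control: only the boilerplate. If this fails, the file is simply broken.
    control = subprocess.run(control_cmd + [path],
                             capture_output=True, text=True)
    if control.returncode != 0:
        return False
    # Experiment: the tool under test. Look for the message we care about.
    experiment = subprocess.run(experiment_cmd + [path],
                                capture_output=True, text=True)
    return marker in experiment.stdout + experiment.stderr


# Hypothetical call mirroring the slide, where `marker` is whatever
# message you would otherwise grep for:
#   is_interesting(f, ["gcc", "-c"], ["oink"], marker)
```

Requiring the control to pass keeps the minimizer from drifting toward files that fail for boring reasons, such as no longer being valid input at all.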
topformflat: “explaining hierarchical structure” • To delta, a file is a sequence of lines • topformflat “explains” the nesting of C/C++ • Simple flex filter that copies input to output • but doesn’t print newlines nested deeper than a nesting-depth argument • Strategy: repeatedly minimize with increasing nesting depths
topformflat Example
void foo() {
  for(...) {
    x -= 5;
    bar();
  }
  while(...) {
    j++;
  }
}
void bar() {
  z |= 17;
  foo();
}
void baz() {...}
topformflat Example, level=0
void foo() {for(...){x -= 5;bar();}while(...){j++;}}
void bar() {z |= 17;foo();}
void baz() {...}
topformflat Example, level=1 void foo() { for(...) { x -= 5; bar(); } while(...) { j++; } } void bar() { z |= 17; foo(); } void baz() {...} deleted
topformflat Example, level=2 void foo() { for(...) { x -= 5; bar(); } while(...) { j++; } } void bar() { z |= 17; foo(); } void baz() {...}
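The filtering rule described for topformflat (copy input to output, but drop newlines nested deeper than a given depth) can be sketched in Python. This is a simplified illustration, not the real flex filter: the function name `flatten` is made up, and the sketch counts only braces, ignoring quotes and comments.

```python
def flatten(source, level):
    """Copy `source`, dropping newlines nested deeper than `level` braces."""
    out = []
    depth = 0
    for ch in source:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        if ch == "\n" and depth > level:
            continue              # nested too deep: keep the form on one line
        out.append(ch)
    return "".join(out)


example = """void foo() {
  for(...) {
    x -= 5;
  }
}
void bar() {
  z |= 17;
}
"""

print(flatten(example, 0))  # each top-level function ends up on a single line
print(flatten(example, 1))  # each statement directly inside a function body gets its own line
```

At level 0 a line-based minimizer can only delete whole functions. Raising the level lets it delete individual statements inside the functions that survived.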
Science: Most bugs exhibitable by small inputs • For any input size, the minimized result is almost always small • for C++ input to a compiler, 1-2 pages of code. • Seems to be a phenomenon of computation • there actually is Science in Computer Science! • but not always • delta worked for a week and still had 50 files • a buffer had to fill up and then flush
The “Configuration File Trick” • Delta generalizes to many situations if you • parameterize the process with a file • minimize the file. • Simon Goldsmith was instrumenting Java system binaries • “during class-loading JVM would seg-fault; nothing really comprehensible would happen” • wrote a script to read a config file for which instrumented classes to put into the jar file • use delta to minimize the config file
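A rough Python sketch of the trick: the interestingness test treats the file as a list of parameters and hands them to the process being debugged. Everything named here is hypothetical, including `parse_config`, `config_is_interesting`, the stand-in `fake_build_and_run`, and the class names; the real script built a jar file and ran a JVM.

```python
def parse_config(text):
    """Return the non-empty, non-comment lines of a config file."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


def config_is_interesting(text, build_and_run):
    """A config is interesting if the process it parameterizes still fails."""
    return build_and_run(parse_config(text)) != 0


def fake_build_and_run(class_names):
    """Hypothetical stand-in: fails only when ClassA and ClassB are both listed."""
    return 1 if {"ClassA", "ClassB"} <= set(class_names) else 0


print(config_is_interesting("ClassA\nClassB\nClassC\n", fake_build_and_run))  # True
print(config_is_interesting("ClassA\nClassC\n", fake_build_and_run))          # False
```

Minimizing the config file with such a test leaves only the entries that are needed to trigger the failure.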
Simulated Annealing • Simulated Annealing • Large, non-convex sub-space • Gradient of goodness • Random local moves • likely to find another point in the sub-space • Moves parameterizable by a temperature. • Some say the ability to sometimes get worse is essential • I say: locality, randomness, and temperature
Delta as Simulated Annealing • space: files that pass your test • goodness: smaller file is better • local moves: chop out a chunk of file • note that we never “get worse” • so delta is greedy • temperature: chunk size • we have an exponential “annealing schedule”, which is not unusual, according to Wikipedia anyway.
Delta surprisingly effective • Especially given how ignorant and general it is • Most ideas for improvements are how to make the local moves better at staying in the space • These ideas generally require knowing what the file means. • Important point: But note how well delta already does knowing nothing! • and topformflat only knows nesting and quotes!
Improvement: use knowledge of dependencies to improve moves • If you know the language semantics, reject moves that would violate them, or only make moves that would produce a legal file • (figure: a decl and its use)
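One way to picture such a filter is a check that runs before the expensive test. The Python sketch below is an assumption-laden illustration, not a feature of delta: `move_is_legal` and the `uses_to_decl` map (line index of a use mapped to the line index of its declaration, one declaration per use) are hypothetical.

```python
def move_is_legal(kept, chunk, uses_to_decl):
    """Reject a deletion that removes a declaration whose use would survive."""
    survivors = set(kept) - set(chunk)
    for use, decl in uses_to_decl.items():
        if use in survivors and decl not in survivors:
            return False          # a use would be left without its declaration
    return True


kept = {0, 1, 2, 3}               # line indices currently in the file
deps = {2: 0}                     # line 2 uses what line 0 declares

print(move_is_legal(kept, {0}, deps))     # False: the use on line 2 survives
print(move_is_legal(kept, {0, 2}, deps))  # True: decl and use go together
```

Skipping illegal moves saves a test run each time, at the cost of needing a parser or some other source for the dependency map.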
Fan Mail • From: Flash Sheridan • This is just a quick thank-you note for Delta. ... it immediately reduced a ... bug file from 16K lines to ten (GCC bug 22604). • Oddly enough, it initially found a different bug (22603), since I'd only specified "internal compiler error", not "segmentation fault".
Fan Mail, p.2 • From: Flash Sheridan • Delta has become even more valuable since my initial thank-you note. • I'm not sure it's helped with all of the GCC bugs I've been filing... but I couldn't have filed most of them without Delta. • Delta has always been able to find a radically smaller file, which I have been able to attach to my bug report.
Fan Mail, p.3 • From: Richard Guenther • delta is saving a lot of gcc developers life ;) I would guess 1 of 3 bugs submitted to the gcc bugzilla get their testcase reduced using delta. • ... a little bit more accurate would be to say we're using delta to reduce all testcases from the gcc bugzilla in case they get entered unreduced.
Delta: This simple dumb script is everywhere! • One class devoted to it in both Berkeley and Stanford Software Engineering courses • Berkeley: “We've just assigned a delta-related homework to the students today” • Stanford: “I gave them a homework assignment for CS295 using delta. Feedback was positive but unquantified.” • Why did it take so long to think of this simple thing?