
# Today


Presentation Transcript

• Random testing again

• Some background (Hamlet)

• Why not always use random testing?

• More YAFFS & project

• Grill Alex!

• Maybe I’ve forgotten something important

• CUTE: “concolic” testing

• “Random testing is, of course, the most used and least useful method”

• The original slang meaning of “random” was “wrong” or “disorganized and useless”

• We mean random in the mathematical sense

• Take a stream of pseudo-random numbers and map them into test operations/cases
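That mapping can be made concrete in a few lines. A minimal Python sketch, with a fixed seed so every test is reproducible; the operation names are illustrative stand-ins, not the YAFFS API:

```python
import random

# Illustrative operation set; a real tester would invoke the system
# under test instead of just naming operations.
OPS = ["mkdir", "rmdir", "open", "write", "close"]

def random_test(seed, length=8):
    """Map a pseudo-random number stream to a sequence of test operations."""
    rng = random.Random(seed)  # fixed seed, so the test is reproducible
    return [OPS[rng.randrange(len(OPS))] for _ in range(length)]

# The same seed always yields the same operation sequence.
print(random_test(42))
```

Replaying a failure is then just a matter of recording the seed.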

• Hamlet talks about one advantage of random testing (that often doesn’t really appear):

• With an operational profile giving usage patterns for the program, with probabilities attached,

• random testing can establish statistically meaningful estimates of program reliability

“In program testing, with systematic methods we know what we are doing, but not what it means; only by giving up all systematization can the significance of testing be known.” - Hamlet, “Random Testing”

• Can make statements like:

• “It’s 99% certain that P will fail no more than 1 in 1,000,000 times.”

• “It’s 95% certain that P has a mean-time-to-failure greater than 100 hours of operation.”

• Real statistics!

• Sadly, usable operational profiles with probabilities attached are very rare

• And the numbers mean nothing if the profile is something you make up

• Hamlet also notes that random testing is a good “baseline” for other methods to compare to

• Keeps us honest

• If a systematic method is no better than random, it may not be a very good approach

• What’s good about 80% (no loop) path coverage?

“If, on the other hand, a comparison with random testing as the standard were available, it might help us to write better standards, or to improve the significance of systematic methods.” - Hamlet, “Random Testing”

• Two cases “when only random testing will do” (Hamlet, Workshop on Random Testing 06)

• Well, maybe not only random testing

• Cases where systematic testing is meaningless (no plan has a rational basis)

• Cases where systematic testing is too difficult to carry out

• Hamlet emphasizes the dangers of adding systematic choice without justification: confusing what software should do with what it does do

• Danger of ignoring a test case because

• “Oh come on, it couldn’t possibly fail to handle that correctly” or

• “Nobody would ever do that”

• Compare to game theory: cases where if we really know something about opponent’s play we can take advantage

• But, lacking that, random strategy may be “inefficient” but is the only strategy that cannot be “gamed” if opponent knows what we’re up to

This is not to imply that the programs we test are adversaries, “out to get us”, but it’s sometimes useful to act as if they are

• Why not use random testing for everything?

• Oracle problem: figuring out if a random test is successful is often much harder than with a systematic test

• Sometimes we can’t do differential testing

• Why not use random testing for everything?

• Generation problem: how do we make a random input?

• What, exactly, is a random C program?

• Is a random C program going to fit any sane (but unknown) operational profile?

• Are these the bugs we care about most?

• For some programs, producing well-formed input that makes for interesting tests is fundamentally hard

• Why not use random testing for everything?

• Even with feedback, random testing produces lots of redundant or uninteresting operations

• Not good at testing boundary conditions where the boundaries are drawn from a large range

• If the program only breaks when x = 2^31, don’t expect to find that randomly

• Why not use random testing for everything?

• Related problem: not good when an error depends on an unlikely relationship between inputs

• Program only fails when x + y = MAXINT?

• Good luck finding that if you don’t bake it into the “random” tester explicitly...
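A sketch of what “baking it in” can look like: bias the generator toward a small pool of boundary values so related inputs can line up. The pool, the bias, and the x + y == MAXINT oracle are all illustrative choices:

```python
import random

MAXINT = 2**31 - 1
# Boundary values "baked into" the generator; the pool is an illustrative choice.
SPECIAL = [0, 1, MAXINT - 1, MAXINT, 2**31]

def gen_input(rng, bias=0.1):
    """Mostly uniform 32-bit values, occasionally a seeded boundary value."""
    if rng.random() < bias:
        return rng.choice(SPECIAL)
    return rng.randrange(0, 2**32)

def count_hits(seed, trials=100_000, bias=0.1):
    """How often two generated inputs satisfy the unlikely relationship."""
    rng = random.Random(seed)
    return sum(gen_input(rng, bias) + gen_input(rng, bias) == MAXINT
               for _ in range(trials))

print(count_hits(0))            # seeded boundaries: on the order of 100 hits
print(count_hits(0, bias=0.0))  # pure uniform: essentially none
```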

• Project due date: May 13

• What to submit:

• Test report

• Document, preferably a pdf

• Tester

• More on how to submit in a second

• Two buggy versions of YAFFS

• Submit as .c (or .h I guess) file, where the name is original_yaffs_name.login.bug#.c

• And two test cases (more on this too)

• Give me a tarball

• I want to be able to go to a YAFFS install

cd direct

make clean; make

./directtest2k

• And see it run

• Use whatever language you see fit, so long as that holds true

• Admittedly, if I can’t make head or tail of your tester (say it’s in FORTRAN or unlambda), grading it fairly will be harder

• If YAFFS passes the test, your tester should terminate with error code 0 and print (on standard output) the string:

• TEST SUCCESSFULLY COMPLETED

• If YAFFS fails, terminate with code 2 and print (again, on stdout):

• TEST FAILED
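A minimal Python sketch of that output protocol (the actual checks against YAFFS are elided):

```python
import sys

def report(passed):
    """Emit the exact required string on stdout and return the exit code:
    0 for success, 2 for failure (pass the result to sys.exit)."""
    if passed:
        print("TEST SUCCESSFULLY COMPLETED")
        return 0
    print("TEST FAILED")
    return 2

# In a real tester: run the YAFFS operations, check the oracle, then
#   sys.exit(report(all_checks_passed))
```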

• See my (very) stupid tester on the website

• I’ll count it as a case where you find a bug in YAFFS if the program hangs:

• Hasn’t terminated by the time limit of 60 minutes

• And is not producing any new output
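The termination half of that rule is easy to automate with a wall-clock timeout. This Python sketch treats a timeout as a hang and maps exit codes to pass/fail; the output-progress check is left out, and the commands shown are placeholders, not the real invocation:

```python
import subprocess

def run_tester(cmd, timeout_s=3600):
    """Run a tester command; a process that outlives the wall-clock limit
    counts as a hang. (Checking for stalled output would be a refinement.)"""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "HUNG"
    return "PASS" if proc.returncode == 0 else "FAIL"

# Placeholder command; the real invocation would be ./directtest2k
print(run_tester(["python3", "-c", "print('TEST SUCCESSFULLY COMPLETED')"]))
```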

• Sanity check

• I’m going to make sure none of your testers say “TEST FAILED” or hang when run with the original YAFFS

• So let me know if you have found a YAFFS bug

• If you want to use a script to have your “directtest2k” run another tool on YAFFS, and then parse the output to produce that result, that’s ok with me

• It’s worth some points, but not strictly required, that your tool also be able to produce a test case when a test fails – something more specific than “run the tester”

• Bonus if you include delta-debugging tools for your test case format

• C programs (or python scripts) are very nice test cases, and easily delta-debuggable
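To make “one-minimal” concrete: a greedy single-removal reducer over a list of operations, a simplified cousin of delta debugging (ddmin proper bisects first; this sketch only guarantees 1-minimality, and the failure predicate is a toy):

```python
def one_minimize(ops, fails):
    """Greedily drop any single operation whose removal still makes the
    test fail; the result is 1-minimal (removing any one remaining
    operation makes the failure disappear). `fails` must be a
    deterministic predicate that replays the candidate test."""
    ops = list(ops)
    changed = True
    while changed:
        changed = False
        for i in range(len(ops)):
            candidate = ops[:i] + ops[i + 1:]
            if fails(candidate):
                ops = candidate
                changed = True
                break
    return ops

# Toy oracle: the test "fails" whenever both operations appear.
trace = ["open a", "mkdir /d", "write a", "close a", "rmdir /d"]
print(one_minimize(trace, lambda t: "mkdir /d" in t and "rmdir /d" in t))
# → ['mkdir /d', 'rmdir /d']
```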

• Document your test case format and why you chose it in the test report

• Again, any format you like, so long as I can replay and see exactly what to do with YAFFS to produce the bug

• One-minimal test cases are worth more credit

• Oh, I forgot to mention – your bugs should be ones that my stupid tester can’t find

• Shouldn’t be hard, I give some examples for you to look at

• Use the same output

• TEST SUCCESSFULLY COMPLETED

• vs.

• TEST FAILED

• I know our hardware will vary

• If you want, send me early versions of your tester and I’ll try to run them on my machine and let you know

• If it works

• How long it takes to run

• You must test these functions:

• yaffs_StartUp

• yaffs_mount

• yaffs_unmount

• yaffs_open

• yaffs_write

• yaffs_close

• yaffs_mkdir

• yaffs_rmdir

• Can use other functions to figure out what’s going on with YAFFS

• Might make it easier to find some bugs

• But only use these basics in the test cases you submit for your bugs – make sure the bug can be exposed using only the core operations!

• For open, need to test these options:

• O_TRUNC, O_APPEND, O_RDONLY, O_WRONLY, O_EXCL, O_CREAT, O_RDWR

• Perform all tests on /ram2k
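Since C programs make good test cases, one option is to have your tester emit each failing test as a small C file that uses only the core operations, the required open flags, and /ram2k. A Python sketch; the exact yaffs_* signatures, header name, and flag spellings are assumptions to check against your YAFFS install:

```python
# The yaffs_* calls, header name, and flags below are assumptions about
# the YAFFS direct interface; verify them against your install.
CASE_TEMPLATE = """\
#include "yaffsfs.h"

int main(void) {{
    yaffs_StartUp();
    yaffs_mount("/ram2k");
{body}
    yaffs_unmount("/ram2k");
    return 0;
}}
"""

def emit_case(ops):
    """Render a list of core-operation statements as a standalone C test case."""
    body = "\n".join("    " + op + ";" for op in ops)
    return CASE_TEMPLATE.format(body=body)

print(emit_case([
    'int h = yaffs_open("/ram2k/f", O_CREAT | O_RDWR, 0666)',
    'yaffs_write(h, "xyzzy", 5)',
    'yaffs_close(h)',
]))
```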

• Use my replacement yaffscfg2k.c

• On the website

• Might want to look at CUTE and SPLAT (links on the web page)

• Warning: academic software, don’t expect it to work (I’m having difficulties right now)

• CIL is a very useful tool if your testing ambitions involve instrumenting the code somehow (http://hal.cs.berkeley.edu/cil)

• E.g., want to compute path coverage? Instrument every branch with a bit vector insertion
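The bit-vector idea, sketched in Python rather than CIL-instrumented C: every branch decision appends one bit, so each complete run yields a path signature, and distinct signatures count the (loop-free) paths covered:

```python
# In C, CIL would insert the recorder calls; this Python stand-in shows
# the data structure: one bit per branch decision, one vector per run.
class PathRecorder:
    def __init__(self):
        self.bits = []
        self.paths_seen = set()

    def branch(self, taken):
        """Record one branch outcome; called at every decision point."""
        self.bits.append(1 if taken else 0)
        return taken

    def end_of_run(self):
        """A completed run's bit vector is its path signature."""
        self.paths_seen.add(tuple(self.bits))
        self.bits = []

rec = PathRecorder()

def classify(x):
    if rec.branch(x < 0):
        result = "negative"
    elif rec.branch(x == 0):
        result = "zero"
    else:
        result = "positive"
    rec.end_of_run()
    return result

for x in (-1, 0, 5, 7):
    classify(x)
print(len(rec.paths_seen))  # → 3 distinct paths covered
```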

• Lots of other tools out there for testing

• Look around – you might find something useful that will save you a lot of work