
Comparison of Unit-Level Automated Test Generation Tools


Presentation Transcript


1. Comparison of Unit-Level Automated Test Generation Tools
Shuang Wang
Co-authored with Jeff Offutt
April 4, 2009

2. Motivation
• We have more software, but insufficient resources
• We need to be more efficient
• Frameworks like JUnit provide empty boxes (see the sketch after this slide)
• Hard question: what do we put in them?
• Automated test data generation tools
  - Reduce time and effort
  - Are easier to maintain
  - Encapsulate knowledge of how to design and implement high-quality tests
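To make the "empty boxes" point concrete, here is a minimal JUnit 4 skeleton; the class and method names are hypothetical, chosen only for illustration.

```java
import org.junit.Test;

// A hypothetical "empty box": JUnit provides the frame for a test,
// but not the inputs or the expected outputs.
public class ExampleTest {
    @Test
    public void testSomething() {
        // What values do we pass in? What do we assert?
        // Filling in these blanks is what automated test data
        // generators try to do for us.
    }
}
```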

3. What are our criteria?
• Free
• Unit-level
• Automated test generation
• Java
What's available out there?
• Two commercial tools, AgitarOne and JTest (excluded, since they are not free)

4. Experiment Goals and Design
• Compare three unit-level automatic test data generators
• Evaluate them based on their mutation scores
• Subjects
  - Three free automated testing tools: JCrasher, TestGen4J, and JUB
• Control groups
  - Edge Coverage and Random Test
• Metric
  - Mutation score results
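For background (this definition is standard mutation-testing practice, not taken from the slides): the mutation score of a test set is the fraction of non-equivalent mutants it kills. A minimal sketch with made-up numbers:

```java
// Standard mutation score: mutants killed divided by the number
// of non-equivalent mutants. The values below are illustrative.
public class MutationScore {
    static double score(int killed, int total, int equivalent) {
        return (double) killed / (total - equivalent);
    }

    public static void main(String[] args) {
        // e.g., 120 of 200 mutants killed, 10 judged equivalent
        System.out.printf("%.3f%n", score(120, 200, 10)); // 0.632
    }
}
```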

5. Experiment Design
[Diagram: muJava generates mutants for each subject program. Five test sets (JCrasher/JC, TestGen4J/TG, JUB, a manually built Random set (Ram), and a manually built Edge Coverage set (EC)) are each run against the mutants, yielding one mutation score per test set.]

6. Experiment Design (same diagram repeated)

7. Java Programs Used [table of the subject Java programs]

8. Experiment Design (same diagram repeated)

9. Subjects (Automatic Test Data Generators): Control Groups
• Edge Coverage: one of the weakest and most basic test criteria (see the sketch after this list)
• Random Test: the "weakest-effort" testing strategy
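To illustrate the Edge Coverage control group with a hypothetical method (not one of the study's subjects): satisfying edge coverage for a single if-statement requires only one test per branch outcome.

```java
// Hypothetical subject: one branch, so edge coverage needs just
// two tests, one per outgoing edge of the decision node.
public class Discount {
    static double price(double base, boolean member) {
        if (member) {
            return base * 0.9;  // true edge
        }
        return base;            // false edge
    }

    public static void main(String[] args) {
        System.out.println(price(100.0, true));  // covers the true edge
        System.out.println(price(100.0, false)); // covers the false edge
    }
}
```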

10. Experiment Design (same diagram repeated)

11. muJava
• Create mutants (see the sketch below)
• Run tests
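As a sketch of the "create mutants" step (the class is hypothetical; AOR is one of muJava's real method-level operators): each mutant is a copy of the program with one small syntactic change.

```java
// Hypothetical illustration of a muJava AOR (Arithmetic Operator
// Replacement) mutant: a single '+' becomes '-'.
public class AorExample {
    // original method
    static int add(int a, int b) {
        return a + b;
    }

    // the AOR mutant of the same method
    static int addMutant(int a, int b) {
        return a - b;
    }

    public static void main(String[] args) {
        // A test asserting add(2, 3) == 5 kills this mutant,
        // because the mutant returns -1 instead.
        System.out.println(add(2, 3) + " vs " + addMutant(2, 3));
    }
}
```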

12. Results & findings: Total % Killed [chart]

13. Results & findings: Efficiency [chart]

14. Results & findings [figure]

15. Example: vendingMachine
• For vendingMachine, except for edge coverage, the other four mutation scores are below 10%
• muJava creates dozens of mutants on its predicates, and the mostly random values created by the three generators have only a small chance of killing those mutants (see the sketch after this list)
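A hypothetical sketch (not the study's actual vendingMachine code) of why random values rarely kill predicate mutants: a relational operator replacement misbehaves only at a boundary input.

```java
// Hypothetical vendingMachine-style predicate and one ROR
// (Relational Operator Replacement) mutant of it.
public class PredicateExample {
    // original: dispense once at least 90 cents is deposited
    static boolean canDispense(int cents) {
        return cents >= 90;
    }

    // ROR mutant: '>=' replaced by '>'
    static boolean canDispenseMutant(int cents) {
        return cents > 90;
    }

    public static void main(String[] args) {
        // The two versions differ only at cents == 90, so uniformly
        // random integers almost never expose the mutant.
        System.out.println(canDispense(90));       // true
        System.out.println(canDispenseMutant(90)); // false: mutant killed
    }
}
```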

16. Example: BoundedStack
• Scores for BoundedStack were the second lowest for all the test sets except edge coverage
• Only two of its eleven methods have parameters; the three test generators depend largely on the method signature, so fewer parameters may mean weaker tests (see the sketch after this list)
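A hedged skeleton suggesting why signature-driven generation struggles here; the method list is illustrative, not the study's actual BoundedStack:

```java
// Illustrative skeleton: most methods take no parameters, so a
// generator that derives inputs from method signatures has little
// to vary; the interesting behavior lies in push/pop sequences.
public class BoundedStack {
    private final int[] elems = new int[10];
    private int size = 0;

    public void push(int k)  { elems[size++] = k; }      // has a parameter
    public int pop()         { return elems[--size]; }   // no parameters
    public int top()         { return elems[size - 1]; } // no parameters
    public boolean isEmpty() { return size == 0; }       // no parameters
}
```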

17. Example: JCrasher
• JCrasher earned the highest mutation score among the three generators
• JCrasher uses invalid values to attempt to "crash" the class (see the sketch after this list)
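A rough sketch of the kind of robustness probe JCrasher aims at (hypothetical code reusing the BoundedStack skeleton above, not JCrasher's actual output): call methods with invalid values or in invalid states and flag unexpected runtime exceptions.

```java
// Hypothetical JCrasher-style probe: drive the class into an
// invalid state and report any unexpected runtime exception.
public class CrashProbe {
    public static void main(String[] args) {
        BoundedStack s = new BoundedStack();
        try {
            s.pop(); // pop on an empty stack
        } catch (RuntimeException e) {
            // JCrasher reports exceptions like this as potential bugs
            System.out.println("Possible bug: " + e);
        }
    }
}
```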

18. Conclusion
• By themselves, these three tools generate tests that are very poor at detecting faults
• Among publicly accessible tools, criteria-based testing is hardly used
• We need better automated test generation tools

19. Contact
Shuang Wang
Computer Science Department
George Mason University
SWANGB@gmu.edu
