A Systematic Study of Automated Program Repair: Fixing 55 out of 105 bugs for $8 each

Presentation Transcript


  1. A Systematic Study of Automated Program Repair: Fixing 55 out of 105 bugs for $8 each • Claire Le Goues • Michael Dewey-Vogt • Stephanie Forrest • Westley Weimer http://genprog.cs.virginia.edu

  2. Problem: Buggy Software • “Everyday, almost 300 bugs appear […] far too many for only the Mozilla programmers to handle.” – Mozilla Developer, 2005 • Annual cost of software errors in the US: $59.5 billion (0.6% of GDP). • Average time to fix a security-critical error: 28 days. • [Chart: 90% maintenance, 10% everything else]

  3. How bad is it?

  6. …Really? • Tarsnap: 125 spelling/style, 63 harmless, 11 minor, 1 major • 75/200 = 38% true-positive (TP) rate • $17 + 40 hours per TP
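The true-positive rate on this slide can be checked directly. A minimal sketch, assuming (as the totals suggest) that the 75 true positives are the harmless, minor, and major reports, with only the 125 spelling/style reports excluded:

```python
# Tarsnap report counts from the slide.
reports = {"spelling/style": 125, "harmless": 63, "minor": 11, "major": 1}

total = sum(reports.values())  # 200 reports in all
# Assumption: every category except spelling/style counts as a true positive.
true_positives = total - reports["spelling/style"]  # 63 + 11 + 1 = 75
tp_rate = true_positives / total  # 0.375, rounded to 38% on the slide

print(f"{true_positives}/{total} = {tp_rate:.1%}")  # 75/200 = 37.5%
```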

  8. …Really?

  9. Solution: Pay Strangers

  11. Solution: Automate

  12. Automated Program Repair • GenProg: automatic¹, scalable, competitive bug repair. • ¹ C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer, “GenProg: A generic method for automated software repair,” IEEE Transactions on Software Engineering, vol. 38, no. 1, pp. 54–72, 2012. W. Weimer, T. Nguyen, C. Le Goues, and S. Forrest, “Automatically finding patches using genetic programming,” in International Conference on Software Engineering, 2009, pp. 364–374.

  14. Automated Program Repair • GenProg: automatic, scalable, competitive bug repair.

  17. [Diagram: the GenProg repair loop, with stages INPUT, EVALUATE FITNESS, DISCARD, ACCEPT, MUTATE, OUTPUT]
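The loop in the diagram can be sketched as a toy genetic search. This is a minimal illustration, not GenProg's implementation: the "program" is assumed to be a list of labeled integer functions, fitness is simply the number of passing test cases, and the population size, generation limit, and keep-the-fitter-half selection are hypothetical choices.

```python
import random
from typing import Callable, List, Optional, Tuple

# A "statement" is a (label, effect) pair; a program runs its statements in order.
Stmt = Tuple[str, Callable[[int], int]]
Program = List[Stmt]


def run(program: Program, x: int) -> int:
    for _, effect in program:
        x = effect(x)
    return x


def fitness(program: Program, tests: List[Tuple[int, int]]) -> int:
    # Fitness = number of (input, expected_output) tests the variant passes.
    return sum(run(program, inp) == out for inp, out in tests)


def mutate(program: Program, donors: Program, rng: random.Random) -> Program:
    # Apply one random edit; any new code is copied from the original program.
    prog = list(program)
    op = rng.choice(("delete", "insert", "replace")) if prog else "insert"
    if op == "delete":
        del prog[rng.randrange(len(prog))]
    elif op == "insert":
        prog.insert(rng.randrange(len(prog) + 1), rng.choice(donors))
    else:
        prog[rng.randrange(len(prog))] = rng.choice(donors)
    return prog


def repair(original: Program, tests: List[Tuple[int, int]], pop_size: int = 20,
           generations: int = 50, seed: int = 0) -> Optional[Program]:
    rng = random.Random(seed)
    # INPUT: seed the population with single-edit variants of the original.
    population = [mutate(original, original, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # EVALUATE FITNESS, best variants first.
        ranked = sorted(population, key=lambda p: fitness(p, tests), reverse=True)
        if fitness(ranked[0], tests) == len(tests):
            return ranked[0]  # OUTPUT: a variant that passes every test
        accepted = ranked[: pop_size // 2]  # ACCEPT the fitter half, DISCARD the rest
        # MUTATE accepted variants to refill the population.
        children = [mutate(rng.choice(accepted), original, rng)
                    for _ in range(pop_size - len(accepted))]
        population = accepted + children
    return None  # no repair found within the budget


add_one: Stmt = ("x = x + 1", lambda x: x + 1)
double: Stmt = ("x = x * 2", lambda x: x * 2)
buggy = [add_one, double, add_one]  # computes 2x + 3
tests = [(0, 2), (1, 4), (3, 8)]    # intended behavior: 2(x + 1)

fixed = repair(buggy, tests)
if fixed is not None:
    print([label for label, _ in fixed])
```

Any variant that computes 2(x + 1), such as the one that deletes the final `x = x + 1`, passes all three tests; which variant is found depends on the seed.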

  19. Bird’s Eye View • Search: random genetic programming (GP) search through nearby patches. • Approach: compose small random edits. • Where to change? • How to change it?

  20. [Figure: input program, drawn as statement nodes numbered 1–12]

  21. [Figure: input program, statement nodes 1–12 shaded by legend: high change probability, low change probability, not changed]
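The legend's change probabilities can be sketched as weighted sampling over statements. A minimal sketch, assuming statement coverage is already known for the failing (negative) and passing (positive) tests; the weights 1.0 and 0.1 and the coverage sets are assumed values for illustration, not figures given on these slides.

```python
import random
from typing import Dict, Set


def localization_weights(negative_cov: Set[int], positive_cov: Set[int],
                         high: float = 1.0, low: float = 0.1) -> Dict[int, float]:
    """Weight each statement executed by a failing test.

    Statements run only by failing tests get `high`; statements also run by
    passing tests get `low`; statements never run by a failing test are left
    out (weight 0, never changed). The default weights are assumptions.
    """
    return {s: (low if s in positive_cov else high) for s in negative_cov}


def pick_statement(weights: Dict[int, float], rng: random.Random) -> int:
    # Choose where to change, in proportion to each statement's weight.
    stmts = sorted(weights)
    return rng.choices(stmts, weights=[weights[s] for s in stmts], k=1)[0]


# Hypothetical coverage over the statement IDs 1-12 in the figure.
neg = {1, 2, 3, 4, 7}      # executed by the failing test
pos = {1, 2, 3, 5, 6, 8}   # executed by passing tests
w = localization_weights(neg, pos)
print(dict(sorted(w.items())))  # {1: 0.1, 2: 0.1, 3: 0.1, 4: 1.0, 7: 1.0}
print(pick_statement(w, random.Random(0)))
```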

  22. An edit is: • Replace statement X with statement Y • Insert statement X after statement Y • Delete statement X [Figure: statement nodes 1–12]
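The three edit kinds can be sketched as operations on a list of statements. A minimal sketch, assuming a flat statement list (the figure shows a tree) and that the statement being copied in a replacement or insertion comes from elsewhere in the same program; the `Edit` type and its `target`/`donor` fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Edit:
    kind: str                     # "replace", "insert", or "delete"
    target: int                   # position being changed (0-based)
    donor: Optional[int] = None   # position of the statement to copy, if any


def apply_edit(stmts: List[str], e: Edit) -> List[str]:
    out = list(stmts)  # copy: the input program is never modified
    if e.kind == "delete":          # delete the statement at target
        del out[e.target]
        return out
    if e.donor is None:
        raise ValueError(f"{e.kind} needs a donor statement")
    if e.kind == "replace":         # replace target with a copy of donor
        out[e.target] = stmts[e.donor]
    elif e.kind == "insert":        # insert a copy of donor after target
        out.insert(e.target + 1, stmts[e.donor])
    else:
        raise ValueError(f"unknown edit kind: {e.kind}")
    return out


prog = ["s1", "s2", "s3", "s4"]
print(apply_edit(prog, Edit("insert", target=2, donor=3)))   # ['s1', 's2', 's3', 's4', 's4']
print(apply_edit(prog, Edit("replace", target=1, donor=3)))  # ['s1', 's4', 's3', 's4']
print(apply_edit(prog, Edit("delete", target=0)))            # ['s2', 's3', 's4']
```

Copying existing statements, rather than synthesizing new ones, keeps every edit within code the program already contains.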

  26. An edit is: (edit list as in slide 22) [Figure: statement nodes 1–12 plus a second copy of statement 4]

  28. An edit is: (edit list as in slide 22) [Figure: statement nodes with a second copy of statement 4, and statement 8 replaced by 4’]

  30. Automated Program Repair • GenProg: automatic, scalable, competitive bug repair.

  32. Scalable: Search Space [Figure: statement nodes 1–12]

  35. Scalable: Search Space • Fix localization: intelligently choose code to move. [Figure: statement nodes 1–12]
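Fix localization narrows which existing statements may be copied into a program. The slide does not say how the choice is made; this sketch assumes one plausible, hypothetical criterion: a donor statement is eligible at a target location only if every variable it uses is in scope there.

```python
from typing import Dict, List, Set


def eligible_donors(uses: Dict[int, Set[str]], in_scope: Set[str]) -> List[int]:
    """Return IDs of statements whose variables are all in scope at the target.

    `uses` maps a statement ID to the variables it reads or writes, and
    `in_scope` is the set of variables visible at the target location.
    Both inputs are hypothetical stand-ins for real compiler analysis.
    """
    return sorted(s for s, vars_used in uses.items() if vars_used <= in_scope)


uses = {1: {"x"}, 4: {"x", "y"}, 8: {"buf"}, 11: set()}
print(eligible_donors(uses, in_scope={"x", "y"}))  # [1, 4, 11]
```

Filtering donors this way shrinks the search space before any variant is compiled or tested.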

  36. Scalable: representation [Figure: an input program (statements 1–5) under the naïve representation (complete modified program trees) versus the new representation (a list of edits, e.g., Delete(3), Replace(3,5))]

  37. Scalable: representation • New fitness, crossover, and mutation operators to work with a variable-length genome. [Figure: as in slide 36]
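Representing a variant as a patch (a list of edits against the original, such as `Delete(3)` or `Replace(3,5)`) gives a variable-length genome. The sketch below assumes edits are stored as tuples and uses a simple one-point crossover that picks a separate cut point in each parent, so offspring can be longer or shorter than either parent; it is an illustrative operator, not necessarily the one GenProg uses.

```python
import random
from typing import List, Tuple

Edit = Tuple[str, int, int]  # (kind, target, donor); donor is -1 when unused
Patch = List[Edit]


def crossover(a: Patch, b: Patch, rng: random.Random) -> Tuple[Patch, Patch]:
    # Independent cut points in each parent, so genome length can change.
    i = rng.randint(0, len(a))
    j = rng.randint(0, len(b))
    return a[:i] + b[j:], b[:j] + a[i:]


p1: Patch = [("delete", 3, -1)]
p2: Patch = [("replace", 3, 5), ("insert", 1, 4)]
c1, c2 = crossover(p1, p2, random.Random(1))
print(c1, c2)
```

Because each individual stores only its edits, the two children together always contain exactly the parents' edits, and memory grows with patch size rather than program size.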

  38. Scalable: Parallelism • Fitness: subsample test cases; evaluate in parallel. • Random runs: multiple simultaneous runs on different seeds.
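Test-case subsampling and parallel evaluation can be sketched with Python's standard `concurrent.futures`. This assumes each test is an independent callable returning True on a pass; the 50% sample fraction and the eight-test suite are hypothetical. Threads keep the example self-contained; for CPU-bound tests a process pool would be the usual choice.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Test = Callable[[], bool]


def sampled_fitness(tests: Sequence[Test], fraction: float,
                    rng: random.Random, workers: int = 4) -> float:
    """Run a random subset of the tests in parallel; return the pass fraction."""
    k = max(1, round(len(tests) * fraction))
    sample = rng.sample(list(tests), k)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results: List[bool] = list(pool.map(lambda t: t(), sample))
    return sum(results) / len(results)


# Hypothetical suite: 8 tests, of which the variant passes 6.
suite = [lambda: True] * 6 + [lambda: False] * 2
# Independent random runs differ only in their seeds.
for seed in (1, 2, 3):
    print(seed, sampled_fitness(suite, 0.5, random.Random(seed)))
```

A subsampled score is only an estimate, so a variant that looks perfect on the sample would still need checking against the full suite.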

  39. Automated Program Repair • GenProg: automatic, scalable, competitive bug repair.

  41. Competitive • How many bugs can GenProg fix? • How much does it cost?

  42. Setup • Goal: systematically test GenProg on a general, indicative bug set. • General approach: • Avoid overfitting: fix the algorithm. • Systematically create a generalizable benchmark set. • Try to repair every bug in the benchmark set and establish grounded cost measurements.

  43. Setup • Goal: systematically evaluate GenProg on a general, indicative bug set. • General approach: • Avoid overfitting: fix the algorithm. • Systematically create a generalizable benchmark set. • Try to repair every bug in the benchmark set and establish grounded cost measurements.

  44. Challenge: Indicative Bug Set

  45. Systematic Benchmark Selection • Goal: a large set of important, reproducible bugs in non-trivial programs. • Approach: use historical data to approximate the discovery and repair of bugs in the wild.

  46. Systematic Benchmark Selection • Consider top programs from SourceForge, Google Code, Fedora SRPM, etc.: • Find pairs of viable versions where test case behavior changes. • Take all tests from the most recent version. • Go back in time through the source control. • Each such pair corresponds to a human-written repair for the bug tested by the failing test case(s).
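The version-pair search can be sketched as a backward scan over consecutive revisions. A minimal sketch, assuming the newest test suite has already been run on each revision and the outcomes recorded in a dictionary; the revision names and data layout are hypothetical, and a real pipeline would also confirm that both versions build (are "viable").

```python
from typing import Dict, List, Tuple

# outcomes[revision][test_name] -> True if the test passes on that revision
Outcomes = Dict[str, Dict[str, bool]]


def find_bug_pairs(history: List[str],
                   outcomes: Outcomes) -> List[Tuple[str, str, List[str]]]:
    """Walk back from the newest revision and report (buggy, fixed, tests).

    A test that explicitly fails on the older revision and passes on the
    newer one marks a human-written repair. Tests with no recorded result
    on the older revision are skipped rather than counted as failures.
    """
    pairs = []
    for older, newer in reversed(list(zip(history, history[1:]))):
        fixed_tests = sorted(
            t for t, passed in outcomes[newer].items()
            if passed and outcomes[older].get(t) is False
        )
        if fixed_tests:
            pairs.append((older, newer, fixed_tests))
    return pairs


history = ["r1", "r2", "r3"]  # oldest to newest
outcomes = {
    "r1": {"t1": True, "t2": False},
    "r2": {"t1": True, "t2": False},
    "r3": {"t1": True, "t2": True, "t3": True},
}
print(find_bug_pairs(history, outcomes))  # [('r2', 'r3', ['t2'])]
```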

  47. Benchmarks

  49. Challenge: Grounded Cost Measurements
