


  1. An Empirical Study of Regression Test Selection Techniques. Todd L. Graves, Mary Jean Harrold, Jung-Min Kim, Adam Porter, and Gregg Rothermel. ACM TSE 2001. Nov 6, 2008. Presented by Amy Siu and EJ Park

  2. What is Regression Testing? [Diagram: Application Release 1 with its R1 test cases evolves into Application Release 2, which is tested with the R1 test cases plus new R2 test cases] • Validate modified software • Often with existing test cases from previous release(s) • Ensure existing features are still working • Regression testing is expensive!

  3. What is Regression Test Selection? • A strategy to • Minimize the test suite • Maximize fault detection ability • Considerations and trade-offs • Cost to select test cases • Time to execute test suite • Fault detection effectiveness

  4. Problem Statement • Regression test case selection techniques affect the cost-effectiveness of regression testing • Empirical evaluation of 5 selection techniques • No new technique proposed

  5. Formal Definition of Regression Testing • Programs: P, P' • Test suite: T • Selected test cases: T' ⊆ T • New test cases: T'' for P' • New test suite: T''' for P', including the selection from T' (see the sketch below) [Diagram: P = Application Release 1, P' = Application Release 2, with the R1 test cases as T; T, T', T'', and T''' annotate the regression test selection problem]
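A minimal sketch of how this selection problem can be framed in code; `is_relevant` is a hypothetical stand-in for whatever analysis a concrete technique performs, not anything from the paper.

```python
# Minimal sketch of the regression test selection problem, not the paper's
# algorithm. `is_relevant` is a hypothetical stand-in for whatever analysis
# a concrete technique (minimization, dataflow, safe, ...) performs.

def select_regression_tests(T, P, P_prime, is_relevant):
    """Return T' (a subset of T): test cases judged worth re-running on P'."""
    return [t for t in T if is_relevant(t, P, P_prime)]

def build_new_suite(T_prime, T_double_prime):
    """T''': selected old tests plus new tests T'' written for P'."""
    return list(T_prime) + list(T_double_prime)

# Toy usage with a trivial relevance predicate.
print(select_regression_tests(["t1", "t2", "t3"], "P", "P'",
                              lambda t, p, pp: t != "t2"))  # ['t1', 't3']
```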

  6. State of the Art • 5 test case selection techniques • Minimization • Dataflow • Safe • Ad Hoc / Random • Retest-All

  7. Regression Test Selection Technique - Minimization • Select minimal sets of test cases T' • Cover only modified or affected portions of P • '81 Fischer et al. • '90 Hartman and Robson

  8. Regression Test Selection Technique - Dataflow • Select test cases T' that exercise data interactions affected by the modifications in P' • '88 Harrold and Soffa • '88 Ostrand and Weyuker • '89 Taha et al.

  9. Regression Test Selection Technique - Safe • Guarantee that T' contains all test cases in T that can reveal faults in P' • '92 Laski and Szermer • '94 Chen et al. • '97 Rothermel and Harrold • '97 Vokolos and Frankl

  10. Regression Test Selection Technique - Ad Hoc / Random • Select T' based on hunches, or loose associations of test cases with functionality

  11. Regression Test Selection Technique - Retest-All • “Select” all the test cases in T to test P'

  12. Open Questions • How do the techniques differ? • In their ability to reduce regression testing cost • In their ability to detect faults • What are the trade-offs between test suite size reduction and fault detection? • How do the techniques compare in cost-effectiveness? • Which factors affect the efficiency and effectiveness of test selection techniques?

  13. Modeling Costs • Calculating the cost of RTS (Regression Test Selection) techniques • A: the cost of the analysis required to select test cases • E(T'): the cost of executing and validating the selected test cases • They measure • the reduction of E(T') via the reduction in test suite size • the average of A by simulating the analysis on several machines (see the sketch below)
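Under this kind of cost model, selection pays off roughly when the analysis cost A plus E(T') stays below E(T), the cost of retest-all. A minimal sketch of that comparison, with invented numbers:

```python
# Hedged sketch of the cost comparison behind this model: a selective
# technique pays off only when the analysis cost A plus the cost E(T') of
# executing and validating the selected tests stays below E(T), the cost
# of retest-all. All numbers below are invented for illustration.

def selection_pays_off(analysis_cost, cost_selected, cost_full):
    """True when A + E(T') < E(T)."""
    return analysis_cost + cost_selected < cost_full

# Example: 1000 tests at 1 cost unit each, 300 selected, analysis costing
# the equivalent of 250 test executions.
print(selection_pays_off(analysis_cost=250, cost_selected=300, cost_full=1000))  # True
```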

  14. Modeling Fault-Detection Effectiveness • On a per-test-case basis • Effectiveness is measured via the number of test cases in T that reveal a fault in P' but are not in T' • On a per-test-suite basis (the authors' choice) • Classify each selection outcome: • no test case in T reveals the fault (so neither does T'), or • some test cases in both T and T' reveal the fault, or • some test case in T reveals the fault, but none in T' does • Effectiveness = 1 - (percentage of outcomes where T reveals the fault but T' does not) (see the sketch below)
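A small sketch of the per-test-suite measure, assuming each trial records whether T and T' revealed the fault; the trial data below is invented.

```python
# Sketch of the per-test-suite effectiveness measure, assuming each trial
# records whether the full suite T and the selected suite T' revealed the
# fault. Trials where T itself misses the fault are excluded, since T'
# cannot do better than T. The trial data below is invented.

def suite_effectiveness(trials):
    """trials: iterable of (T_reveals_fault, T_prime_reveals_fault) booleans."""
    relevant = [(t, tp) for t, tp in trials if t]      # T revealed the fault
    missed = sum(1 for _, tp in relevant if not tp)    # ...but T' did not
    return 1.0 - missed / len(relevant) if relevant else 1.0

# Example: T' misses 1 of 4 fault-revealing trials -> effectiveness 0.75.
print(suite_effectiveness([(True, True), (True, True), (True, False),
                           (True, True), (False, False)]))
```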

  15. Programs • All subject programs are C programs • The Siemens programs: 7 small C programs • Space: interpreter for an array definition language • Player: subsystem of Empire (an Internet game) • Each program comes with faulty versions • How do the authors create the test pools and test suites?

  16. Programs: Test Pool Design • Siemens programs • Test pool of black-box test cases constructed from Hutchins et al. • Additional white-box test cases added • Space • 10,000 randomly generated test cases from Vokolos and Frankl • New test cases added to cover the control-flow graph (CFG) • Player • 5 different versions of Player, each termed a “base” version • Own test cases created from Empire information files

  17. Programs: Test Suite Design [Diagram: for each program (Siemens / Space, Player), test suites are drawn from the program's test pool by random selection using a random number generator; per-program figures shown: Siemens 0.06%~19.77%, Space 0.04%~94.35%, Player 0.77%~4.55%] (sketched below)
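A rough sketch of one way coverage-based suites can be drawn from a test pool; the pool format and helper below are illustrative assumptions, not the authors' tooling.

```python
import random

# Rough sketch of coverage-based test suite construction, assuming each pool
# entry maps a test case id to the set of coverage items (e.g. edges) it
# exercises. Tests are drawn in random order and kept only if they add new
# coverage, until every coverable item is covered. This illustrates the
# general idea, not the authors' exact procedure.

def build_coverage_suite(pool, rng=random):
    """pool: dict test_id -> set of covered items. Returns a list of test ids."""
    remaining = set().union(*pool.values())
    suite, candidates = [], list(pool)
    rng.shuffle(candidates)
    for test in candidates:
        if pool[test] & remaining:   # keep the test only if it adds coverage
            suite.append(test)
            remaining -= pool[test]
        if not remaining:
            break
    return suite

# Toy pool of four tests covering items a..d.
pool = {"t1": {"a", "b"}, "t2": {"b"}, "t3": {"c"}, "t4": {"d"}}
print(build_coverage_suite(pool))
```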

  18. RTS Techniques • Minimization • Simulator tool created by the authors • Dataflow (applied only to the Siemens programs) • Simulated dataflow testing tool • Selects def-use pairs affected by the modification • Safe • DejaVu: Rothermel and Harrold's RTS algorithm • Detects “dangerous edges” • Built on Aristotle, a program analysis system • Random: n% of the test cases in T, selected randomly (see the sketch below)
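The safe and dataflow tools are not reproduced here; only the random variant is simple enough to sketch. A minimal version, assuming T is any sequence of test case identifiers:

```python
import random

# Sketch of the random technique as described on the slide: select n% of the
# test cases in T uniformly at random. The suite and percentage below are
# illustrative.

def random_selection(T, percent, rng=random):
    """Return a random n% sample of T (at least one test case)."""
    k = max(1, round(len(T) * percent / 100))
    return rng.sample(list(T), k)

# Example: pick 25% of a 100-test suite.
T = [f"tc{i}" for i in range(100)]
print(len(random_selection(T, 25)))  # 25
```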

  19. Experimental Design • Variables • Independent • 9 programs (Siemens, Space, and Player) • RTS technique (safe, dataflow, minimization, random(25, 50, 75), retest-all) • Test suite creation criteria • Dependent • Average reduction in test suite size • Fault detection effectiveness • Design • Test suites: 100 coverage-based + 100 random

  20. Threats to Validity • Internal • Instrumentation effects can bias results → mitigated by running each test selection algorithm on each test suite and each subject program • External • Limited ability to generalize results to industrial practice • Small size and simple fault patterns of the subject programs • Only the corrective maintenance process is considered • Construct • Adequacy of measurement • The cost and effectiveness measures are coarse

  21. Analysis Strategy • Comparison 1 • Test suite size reduction • Fault detection effectiveness • Comparison 2 • Program-analysis-based techniques (minimization, safe, and dataflow) vs. the random technique

  22. Test Suite Size Reduction • Random techniques: select a constant percentage of test cases • Safe and Dataflow: similar behavior on the Siemens programs • Minimization: always chooses 1 test case • Safe: best on Space and Player

  23. Fault Detection Effectiveness • Random techniques: effectiveness increased with test suite size • Random techniques: the rate of increase diminished as size increased • Safe & Dataflow: similar median performance on the Siemens programs • Minimization: lowest effectiveness overall

  24. Cost-Benefit Trade-Offs • Minimization vs. Random • Assumption: the k value represents analysis time • Comparison method • Start from a trial value of k • Choose a test suite via minimization • Choose |test suite| + k test cases at random • Adjust k until the effectiveness is equal (see the sketch below) • Comparison result • For coverage-based test suites: k = 2.7 • For random test suites: k = 4.65 • Safe vs. Random • Same assumption about k • Find the k at which the random technique detects a fixed 100(1-p)% of the faults • Comparison results • Coverage-based → k = 0: 96.7%, k = 0.1: 99% • Random → k = 0: 89%, k = 10: 95%, k = 25: 99% • Summary of techniques • Random: effective in general; selection ratio ↑ → effectiveness ↑, but the rate of increase ↓ • Minimization: very high reduction, varying effectiveness • Safe: 100% effectiveness, varying test suite size • Dataflow: 100% effectiveness too, but not safe • Safe vs. Retest-All • When is Safe desirable? • When the analysis cost is less than the cost of running the unselected test cases • Test suite reduction depends on the program
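A hedged sketch of the k-adjustment idea in the minimization-vs-random comparison: grow the random suite by k extra test cases beyond the minimization suite's size until its estimated effectiveness reaches the target. `effectiveness_of` is a hypothetical callback and the toy model is invented.

```python
# Hedged sketch of the k-adjustment idea from the minimization-vs-random
# comparison: grow the random suite by k extra test cases beyond the
# minimization suite's size until its estimated fault-detection
# effectiveness reaches the target. `effectiveness_of` is a hypothetical
# callback, and the toy model in the usage line is invented.

def find_break_even_k(min_suite_size, target_effectiveness, effectiveness_of,
                      max_k=100):
    for k in range(max_k + 1):
        if effectiveness_of(min_suite_size + k) >= target_effectiveness:
            return k
    return None  # no break-even point within the search range

# Toy usage: effectiveness grows linearly with suite size in this made-up model.
print(find_break_even_k(1, 0.5, lambda size: min(1.0, size / 10)))  # 4
```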

  25. Conclusions of Experimental Studies • Minimization • Smallest test suite size but least effective • “On the average” applies to long-run behavior • The number of test cases to choose depends on run time • Safe and Dataflow • Nearly equivalent average behavior in cost-effectiveness • Why is Safe better than Dataflow? • When is Dataflow useful? • Better analysis is required for Safe • Random • Constant percentage of size reduction • Size ↑ → fault detection effectiveness ↑ • Retest-All • No size reduction, 100% fault detection effectiveness

  26. Future Work & Discussion • Improve the cost model with other factors • Extend the analysis to multiple types of faults • Develop time-series-based models • Scalability with more complex fault distributions [Timeline of follow-up work since the current paper (2001): Java software [1], test prioritization [2], cost models with more factors [3],[4], using field data [5],[6], larger software [7], larger and more complex software [8], improved cost model [9], multiple types of faults [10]]

  27. References
[1] Mary Jean Harrold, James A. Jones, Tongyu Li, Donglin Liang, Alessandro Orso, Maikel Pennings, Saurabh Sinha, Steven Spoon, “Regression Test Selection for Java Software”, OOPSLA 2001, October 2001.
[2] Jung-Min Kim, Adam Porter, “A History-Based Test Prioritization Technique for Regression Testing in Resource Constrained Environments”, 24th International Conference on Software Engineering, May 2002.
[3] A. G. Malishevsky, G. Rothermel, and S. Elbaum, “Modeling the Cost-Benefits Tradeoffs for Regression Testing Techniques”, Proceedings of the International Conference on Software Maintenance, October 2002.
[4] S. Elbaum, P. Kallakuri, A. Malishevsky, G. Rothermel, and S. Kanduri, “Understanding the Effects of Changes on the Cost-Effectiveness of Regression Testing Techniques”, Technical Report 020701, Department of Computer Science and Engineering, University of Nebraska-Lincoln, July 2002.
[5] Alessandro Orso, Taweesup Apiwattanapong, Mary Jean Harrold, “Improving Impact Analysis and Regression Testing Using Field Data”, RAMSS 2003, May 2003.
[6] Taweesup Apiwattanapong, Alessandro Orso, Mary Jean Harrold, “Leveraging Field Data for Impact Analysis and Regression Testing”, ESEC 9 / FSE 11, September 2003.
[7] Alessandro Orso, Nanjuan Shi, Mary Jean Harrold, “Scaling Regression Testing to Large Software Systems”, FSE 2004, November 2004.
[8] J. M. Kim, A. Porter, and G. Rothermel, “An Empirical Study of Regression Test Application Frequency”, Journal of Software Testing, Verification, and Reliability, Vol. 15, No. 4, December 2005, pages 257-279.
[9] H. Do and G. Rothermel, “An Empirical Study of Regression Testing Techniques Incorporating Context and Lifecycle Factors and Improved Cost-Benefit Models”, FSE 2006, November 2006.
[10] H. Do and G. Rothermel, “On the Use of Mutation Faults in Empirical Assessments of Test Case Prioritization Techniques”, IEEE Transactions on Software Engineering, Vol. 32, No. 9, September 2006, pages 733-752.
