An experimental evaluation of continuous testing during development

David Saff

2003 October 1

PAG group meeting

Overview
  • Continuous testing automatically runs tests in the background to provide feedback as developers code.
  • Previous work suggested that continuous testing would speed development
  • We performed a user study to evaluate this hypothesis
  • Result: Developers were helped, and not distracted
Outline
  • Introduction
  • Tools Built
  • Experimental Design
  • Quantitative Results
  • Qualitative Results
  • Conclusion
Continuous Testing
  • Continuous testing uses excess cycles on a developer's workstation to continuously run regression tests in the background as the developer edits code.

  • Feedback loop (diagram): developer changes code → system is notified about the changes → system runs tests → system notifies the developer about errors.
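A minimal sketch of this loop, assuming a JUnit 4-style API and a single watched directory (names such as SampleSuite and the "src" path are illustrative, not part of the tool described in the talk): watch the source tree and rerun the regression suite whenever a file changes.

```java
import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.JUnitCore;
import org.junit.runner.Result;

import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class ContinuousTestLoop {

    // A stand-in regression suite so the sketch is self-contained.
    public static class SampleSuite {
        @Test public void addition() { Assert.assertEquals(4, 2 + 2); }
    }

    public static void main(String[] args) throws Exception {
        Path src = Paths.get("src");
        WatchService watcher = FileSystems.getDefault().newWatchService();
        src.register(watcher,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_MODIFY,
                StandardWatchEventKinds.ENTRY_DELETE);

        while (true) {
            WatchKey key = watcher.take();   // developer changes code: block until notified
            key.pollEvents();                // drain events; we only care that something changed
            Result result = JUnitCore.runClasses(SampleSuite.class);   // system runs tests
            System.out.printf("tests run: %d, failures: %d%n",        // system notifies about errors
                    result.getRunCount(), result.getFailureCount());
            key.reset();
        }
    }
}
```
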

Previous Work
  • Monitored two single-developer software projects
  • A model of developer behavior interpreted results and predicted the effect of changes on wasted time:
    • Time waiting for tests to complete
    • Extra time tracking down and fixing regression errors
Previous Work: Findings
  • Delays in notification about regression errors correlate with delays in fixing these errors.
  • Therefore, quicker notification should lead to quicker fixes
  • Predicted improvement: 10-15%
Introduction: Basic Experiment
  • Controlled human experiment with 22 subjects.
  • Each subject performed two unrelated development tasks.
  • Continuous testing has no significant effect on time worked
  • Continuous testing has a significant effect on success completing the task.
  • Developers enjoy using continuous testing, and find it helpful, not distracting.
SECTION: Tools Built
  • Introduction
  • Tools Built (this section)
  • Experimental Design
  • Quantitative Results
  • Qualitative Results
  • Conclusion
JUnit wrapper

[Diagram: the wrapper sits between the test suite and JUnit, and between JUnit and the reported results]

  • Reorder tests
  • Time individual tests
  • Remember results
  • Output failures immediately
  • Distinguish regressions from unimplemented tests
  • Reorder and filter result text
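The bookkeeping the wrapper performs can be sketched as a test-run listener. The sketch below uses the JUnit 4 RunListener API for brevity (the original wrapped JUnit 3, and the class and field names here are assumptions): it times each test, remembers which tests passed on the previous run, reports failures immediately, and calls a failure a regression only if that test passed last time.

```java
import org.junit.runner.Description;
import org.junit.runner.notification.Failure;
import org.junit.runner.notification.RunListener;

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WrapperListener extends RunListener {
    // Results remembered from the previous run: names of tests that passed last time.
    private final Set<String> passedLastRun;
    private final Set<String> passedThisRun = new HashSet<>();
    private final Map<String, Long> startTimes = new HashMap<>();

    public WrapperListener(Set<String> passedLastRun) {
        this.passedLastRun = passedLastRun;
    }

    @Override public void testStarted(Description d) {
        startTimes.put(d.getDisplayName(), System.nanoTime());  // time individual tests
        passedThisRun.add(d.getDisplayName());                  // assume pass until a failure arrives
    }

    @Override public void testFailure(Failure f) {
        String name = f.getDescription().getDisplayName();
        passedThisRun.remove(name);
        // Output failures immediately, distinguishing regressions from tests
        // that have never passed (for example, not yet implemented).
        String kind = passedLastRun.contains(name) ? "REGRESSION" : "unimplemented / never passed";
        System.out.println(kind + ": " + name + " - " + f.getMessage());
    }

    @Override public void testFinished(Description d) {
        long elapsedMs = (System.nanoTime() - startTimes.get(d.getDisplayName())) / 1_000_000;
        // Timing data can drive test reordering on the next run.
        System.out.println(d.getDisplayName() + " took " + elapsedMs + " ms");
    }

    public Set<String> passedTests() { return passedThisRun; }
}
```
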

Emacs plug-in
  • On file save, or after a 15-second pause with unsaved changes, run a “shadow” compile and test.
  • Display results in modeline:
    • “Compilation Errors”
    • “Unimpl Tests: 45”
    • “Regression Errors: 2”
  • Clicking on modeline brings up description of indicated errors.
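The plug-in itself is Emacs Lisp; as a rough illustration of the trigger policy only, here is a sketch in Java (ShadowTrigger and runShadowBuild are hypothetical names): run the shadow compile-and-test immediately on save, or after fifteen seconds of editing pause when there are unsaved changes.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class ShadowTrigger {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> pending;

    public synchronized void onSave() {
        cancelPending();
        runShadowBuild();                       // saving triggers a run immediately
    }

    public synchronized void onEdit() {
        cancelPending();                        // restart the 15-second clock on every edit
        pending = timer.schedule(this::runShadowBuild, 15, TimeUnit.SECONDS);
    }

    private void cancelPending() {
        if (pending != null) pending.cancel(false);
    }

    private void runShadowBuild() {
        // Copy the current state into the shadow directory, compile, run tests,
        // then update the modeline ("Compilation Errors", "Unimpl Tests: N",
        // "Regression Errors: N").
        System.out.println("shadow compile + test");
    }
}
```
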
Shadow directory
  • The developer’s code directory is “shadowed” in a hidden directory.
  • The shadow directory has the state the code would have if the developer saved and compiled right now.
  • Compilation and test results are filtered to appear as if they occurred in the developer’s code directory.
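One way to picture the shadowing, as a sketch under assumed paths rather than the actual implementation: mirror the working directory into a hidden copy and rewrite tool output so that reported paths point back at the developer's own files. (The real tool also reflects unsaved editor buffers; this sketch only mirrors what is on disk.)

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class ShadowDir {
    static final Path WORK   = Paths.get("/home/dev/ps1");          // developer's directory (illustrative)
    static final Path SHADOW = Paths.get("/home/dev/.ps1-shadow");  // hidden mirror (illustrative)

    // Bring the shadow up to date with the developer's directory.
    static void sync() throws Exception {
        try (Stream<Path> files = Files.walk(WORK)) {
            for (Path src : (Iterable<Path>) files::iterator) {
                Path dst = SHADOW.resolve(WORK.relativize(src));
                if (Files.isDirectory(src)) {
                    Files.createDirectories(dst);
                } else {
                    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    // Filter one line of compiler or test output so it appears to come from WORK.
    static String filter(String line) {
        return line.replace(SHADOW.toString(), WORK.toString());
    }
}
```
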
Monitoring
  • Developers who agree to the study have a monitoring plug-in installed at the same time as the continuous testing plug-in.
  • Sent to a central server:
    • Changes to the source in Emacs (saved or unsaved)
    • Changes to the source on the file system
    • Manual test runs
    • Emacs session stops/starts
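As a concrete picture of what gets logged, here is a minimal, hypothetical event record; the field names and the enum of event kinds are assumptions, not the study's actual schema.

```java
import java.io.Serializable;
import java.time.Instant;

public class MonitorEvent implements Serializable {
    public enum Kind {
        EMACS_BUFFER_CHANGE,   // saved or unsaved edit to the source in Emacs
        FILE_SYSTEM_CHANGE,    // change to the source on the file system
        MANUAL_TEST_RUN,       // developer ran the tests by hand
        SESSION_START,         // Emacs session started
        SESSION_STOP           // Emacs session stopped
    }

    public final String participantId;
    public final Kind kind;
    public final Instant timestamp;
    public final String detail;    // e.g. file name or test outcome summary

    public MonitorEvent(String participantId, Kind kind, Instant timestamp, String detail) {
        this.participantId = participantId;
        this.kind = kind;
        this.timestamp = timestamp;
        this.detail = detail;
    }
}
```
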
SECTION: Experimental Design
  • Introduction
  • Tools Built
  • Experimental Design (this section)
  • Quantitative Results
  • Qualitative Results
  • Conclusion
Participants
  • Students in MIT’s 6.170 Laboratory in Software Engineering class.

  • Of 107 total students, 34 agreed to participate and 73 did not.
  • Of the 34 participants, 14 were excluded for logistical reasons; 20 were successfully monitored.
  • Treatment groups (averages over both tasks):
    • 25% (6): no tools
    • 25% (5): compilation notification only
    • 50% (9): compilation and test error notification

Demographics: Experience (1)
  • Relatively inexperienced group of participants
Demographics: Experience (2)

  • Usual environment: Unix 29%, Windows 38%, both 33%

Problem Sets
  • Participants completed several classes in a skeleton implementation of (PS1) a poker game and (PS2) a graphing polynomial calculator.
Test Suites
  • Students were provided with test suites written by course staff.
  • Passing the tests was worth 75% of the grade.
Sources of data
  • Quantitative:
    • Monitored state changes
    • Student submissions
    • Grades from TAs
  • Qualitative:
    • Questionnaire from all students
    • E-mail feedback from some students
    • Interviews and e-mail from staff
SECTION: Quantitative Results
  • Introduction
  • Tools Built
  • Experimental Design
  • Quantitative Results (this section)
  • Qualitative Results
  • Conclusion
Success Variables
  • time worked: See next slide
  • grade: as assigned by TAs.
  • errors: Number of tests that the student submission failed.
  • correct: True if the student solution passed all tests.
More variables: where students spent their time
  • All time measurements used time worked, at a five-minute resolution:
    [Figure: a one-hour timeline marked off in five-minute intervals; x = source edit]
  • Some selected time measurements:
    • Total time worked
    • Ignorance time
      • between introducing an error and becoming aware of it
    • Fix time
      • between becoming aware of an error and fixing it
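One plausible way to compute the five-minute-resolution measure from the monitored edit events (an assumption for illustration, not necessarily the authors' analysis code): bin each source edit into a five-minute slot and count the occupied slots.

```java
import java.util.List;
import java.util.TreeSet;

public class TimeWorked {
    // editTimesMillis: timestamps of monitored source edits, in milliseconds.
    static long minutesWorked(List<Long> editTimesMillis) {
        TreeSet<Long> buckets = new TreeSet<>();
        for (long t : editTimesMillis) {
            buckets.add(t / (5 * 60 * 1000));   // which five-minute slot does this edit fall in?
        }
        return buckets.size() * 5L;             // each occupied slot counts as five minutes worked
    }

    public static void main(String[] args) {
        // Edits at 0:00, 0:02, and 0:12 occupy two slots -> 10 minutes worked.
        System.out.println(minutesWorked(List.of(0L, 120_000L, 720_000L)));
    }
}
```
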
Other predictions
  • Problem set predicts time worked
  • On PS1 only, years of Java experience predicts correctness
  • No other interesting predictions found at the p < .05 level
  • No effect on working time seen
    • student time budgets may have had an effect.
Ignorance and fix time
  • Ignorance time and fix time are correlated, confirming previous result.
  • Chart shown for the single participant with the most regression errors
Errors over time
  • Participants with no tools make progress faster at the beginning, then taper off; may never complete.
  • Participants with automatic tools make steadier progress.
SECTION: Qualitative Results
  • Introduction
  • Tools Built
  • Experimental Design
  • Quantitative Results
  • Qualitative Results (this section)
  • Conclusion
How did working habits change?
  • “I got a small part of my code working before moving on to the next section, rather than trying to debug everything at the end.”
  • “It was easier to see my errors when they were only with one method at a time.”
  • “The constant testing made me look for a quick fix rather than examine the code to see what was at the heart of the problem.”
Positive feedback
  • “Once I finally figured out how it worked, I got even lazier and never manually ran the test cases myself anymore.”
  • Head TA: “the continuous testing worked well for students. Students used the output constantly, and they also seemed to have a great handle on the overall environment.”
Pedagogical usefulness
  • Several students mentioned that continuous testing was most useful when:
    • Code was well-modularized
    • Specs and tests were written before development.
  • These are important goals of the class
Negative comments
  • “Since I had already been writing extensive Java code for a year using emacs and an xterm, it simply got in the way of my work instead of helping me. I suppose that, if I did not already have a set way of doing my coding, continuous testing could have been more useful.”
  • Some didn’t understand the modeline, or how shadowing worked.
Suggestions for improvement
  • More flexibility in configuration
  • More information about failures
  • Smarter timing of feedback
  • Implementation issues
    • JUnit wrapper filtered JUnit output, which was confusing.
    • Infinite loops led to no output.
    • Irreproducible failures to run.
    • Performance not acceptable on all machines.
SECTION: Conclusion
  • Introduction
  • Tools Built
  • Experimental Design
  • Quantitative Results
  • Qualitative Results
  • Conclusion (this section)
Threats to validity
  • Participants were relatively inexperienced
    • 2.8 years programming experience
    • Only 0.4 years with Java
    • 67% were not familiar with regression testing.
    • Can't predict what the effect of more experience would have been.
  • This was almost a worst-case scenario for continuous testing:
    • Testing was easy.
    • Regression errors were unlikely
Future Work
  • We can’t repeat the experiment: continuous testing helps, and we ethically can’t deny it to some students.
  • Case studies in industry
  • Extend to bigger test suites:
    • Integrate with Delta Debugging (Zeller)
    • Better test prioritization
    • Test factoring: making small tests from big ones.
Conclusion
  • Continuous testing has a significant effect on developer success in completing a programming task.
  • Continuous testing does not significantly affect time worked
  • Most developers enjoy using continuous testing, and find it helpful.
The End
  • Thanks to:
    • Michael Ernst
    • 6.170 staff
    • participants
Introduction: Previous Work: Findings
  • Finding 2: Continuous testing is more effective at reducing wasted time than:
    • changing test frequency
    • reordering tests
  • Finding 3: Continuous testing reduces total development time by 10 to 15%.
Reasons cited for not participating

Students could choose as many reasons as they wished.

  • Other IDEs cited, in order of popularity:
    • Eclipse
    • text editors (vi, pico, EditPlus2)
    • Sun ONE Studio
    • JBuilder
Variables that predicted participation
  • Students with more Java experience were less likely to participate
    • already had work habits they didn’t want to change
  • Students with more experience compiling programs in Emacs were more likely to participate
  • We used a control group within the set of voluntary participants, so the results were not skewed.