Investigating Defect Detection in Object-Oriented Design and Cost-Effectiveness of Software Inspection

Giedre Sabaliauskaite, Inoue Laboratory, Department of Informatics and Mathematical Science, Graduate School of Engineering Science, Osaka University

Introduction • Software is an important part of the technical products developed today • Software quality is becoming an increasingly important issue as the use of software grows • Development of high-quality software continues to be a major problem, largely due to the late removal of defects • Software inspection is one of the methods used to ensure software quality through early detection and removal of defects

Software Inspection • Inspections have been used for over 30 years • To date, little research has focused on the inspection of Object-Oriented artifacts • A typical inspection process consists of six stages [1]: Planning, Overview, Preparation, Inspection meeting, Rework, Follow-up • Two stages are critical for defect detection: • Preparation (individual inspection) • Inspection meeting • Recently, researchers have questioned whether inspection meetings are necessary, since: • They are costly • An insignificant number of new defects is found [1] M. Fagan, Design and Code Inspections to Reduce Errors in Program Development, IBM Systems Journal 15 (3) (1976) 182-211.

Reducing Testing Cost • Inspection and testing are the two main activities used for defect detection in software products • Testing cannot be performed until software is implemented • Inspection can be applied in early stages of software development and help reduce testing cost • Inspection is an effective method to reduce testing cost [1] • Design inspection saves on average 44% of testing costs • Code inspection saves on average 39% of testing costs [1] L. Briand, K. El Emam, O. Laitenberger, T. Fussbroich, Using Simulation to Build Inspection Efficiency Benchmarks for Development Projects, Proceedings of the 20th International Conference on Software Engineering, Kyoto, Japan, (1998) 340-349.

Evaluation of Inspection Cost-Effectiveness • Several metrics have been previously proposed to evaluate the cost-effectiveness of inspection with respect to testing cost • However, none of the conventional metrics considers false positives, although their rework is costly and may introduce new defects [1] • New metrics are needed to allow a more precise evaluation of inspection than the conventional metrics provide [1] C. Sauer, R. Jeffery, L. Land, P. Yetton, The Effectiveness of Software Development Technical Reviews: a Behaviorally Motivated Program of Research, IEEE Transactions on Software Engineering 26 (1) (2000) 1-14.

The Purpose of the Research This thesis addresses the following issues: • Inspection of Object-Oriented design • Development of two inspection strategies (reading techniques) • Experimental evaluation of these strategies • Usefulness of inspection meetings • Experimental evaluation of inspection meeting effectiveness • Evaluation of cost-effectiveness of inspection • The proposal of four new metrics • Experimental evaluation of proposed metrics

Thesis Outline Chapter 1. Introduction Chapter 2. Preliminaries Chapter 3. Evaluation of Two Reading Techniques for OO Design Inspection (Experiment 1) [1-1] Chapter 4. Investigating Individual and 3-Person Team Performance (Experiment 2) [1-2] Chapter 5. Assessing Inspection Meetings [1-3] Chapter 6. Extended Metrics to Evaluate Cost-Effectiveness of Software Inspections [1-4] Chapter 7. Experimental Evaluation of the New Metrics Chapter 8. Conclusions and Future Work

Development and Experimental Evaluation of Two Reading Techniques for Object-Oriented Design Inspection (Experiment 1)

Reading Techniques • A reading technique is a defect detection strategy that helps individual inspectors find defects during the preparation stage of the inspection process • Several reading techniques have been proposed in the literature • Non-systematic: do not offer concrete instructions on how to proceed during inspection • Systematic: provide inspectors with a scenario that gives guidance on • How to proceed • What to look for • This research adapts two existing reading techniques for inspection of OO design and experimentally evaluates them • Non-systematic technique: Checklist-Based Reading (CBR) • Systematic technique: Perspective-Based Reading (PBR)

Checklist-Based Reading (CBR) • CBR is a widely used technique in inspections [1] • It provides inspectors with a checklist [2], which consists of “yes/no” questions • Inspectors are requested to answer these questions while checking the software document for defects [1] O. Laitenberger, J.M. DeBaud, An Encompassing Life Cycle Centric Survey of Software Inspection, The Journal of Systems and Software 50 (1) (2000) 5-31. [2] Y. Chernak, A Statistical Approach to the Inspection Checklist Formal Synthesis and Improvement, IEEE Transactions on Software Engineering 22 (12) (1996) 866-874.

Checklist

Perspective-Based Reading (PBR) • PBR is a recently proposed inspection technique, which provides inspectors with more guidance than CBR • The main idea of PBR is that a software product should be inspected from different perspectives [1] • Perspectives depend on the roles that people have during the software development process (e.g. user, designer, programmer) • The union of perspectives is expected to provide extensive coverage of the software document • For inspection from each perspective, PBR provides a Scenario, which consists of • An introduction to the quality requirements for that perspective • Instructions on how to proceed during inspection • Questions [1] V.R. Basili, S. Green, O. Laitenberger, F. Lanubile, F. Shull, S. Sorumgard, M.V. Zelkowitz, The Empirical Investigation of Perspective-Based Reading, Empirical Software Engineering: An International Journal 1 (2) (1996) 133-164.

Scenario

Comparing Reading Techniques • The majority of work in the area of software inspection concerns testing and comparing different reading techniques • The non-systematic techniques (Ad hoc, CBR) are usually compared with systematic techniques (PBR) • The main findings of experimental evaluations are contradictory • PBR is more effective than CBR [1,2] • CBR is more effective than PBR on an individual level [3,4] [1] V.R. Basili, S. Green, O. Laitenberger, F. Lanubile, F. Shull, S. Sorumgard, M.V. Zelkowitz, The Empirical Investigation of Perspective-Based Reading, Empirical Software Engineering: An International Journal 1 (2) (1996) 133-164. [2] O. Laitenberger, C. Atkinson, M. Schlich, K. El Emam, An Experimental Comparison of Reading Techniques for Defect Detection in UML Design Documents, The Journal of Systems and Software 53 (2000) 183-204. [3] C. Wohlin, A. Aurum, H. Petersson, F. Shull, M. Ciolkowski, Software Inspection Benchmarking – A Qualitative and Quantitative Comparative Opportunity, Proceedings of the Eighth IEEE Symposium on Software Metrics (2002) 118-127. [4] S. Biffl, M. Halling, Investigating the Influence of Inspector Capability Factors with Four Inspection Techniques on Inspection Performance, Proceedings of the Eighth IEEE Symposium on Software Metrics (2002) 107-117.

The Purpose of Experiment 1 • Experiment 1 was conducted in December 2001 • The goals of the experiment were: • Application of CBR and PBR to UML diagram inspection • Experimental comparison of CBR and PBR with respect to an individual inspector's • Defect detection effectiveness • Efficiency • Time spent on inspection

Development of Reading Techniques • CBR Checklist • Includes 20 “yes/no” questions about Class, Activity, Sequence and Component diagrams • A negative answer to a question indicated that a defect was detected, and inspectors had to enter the defect information into the Defect registration form • PBR Scenarios • Three perspectives have been defined • User’s: ensures that the software satisfies the user’s requirements • Designer’s: verifies the static and dynamic structure of the system • Implementer’s: ensures that the system design is consistent, complete and ready for transfer from design to code • For inspection from each of the perspectives, a scenario has been developed • Inspectors had to perform tasks in the Comment form and enter information about defects into the Defect registration form

Experimental Planning • Subjects: 59 third year students of Software Development course of Osaka University • Objects: UML diagrams of two software systems (Seminar and Hospital), borrowed from Itoh et al. [1] • The following material has been borrowed • Requirements specifications • Use-Case diagrams • Activity diagrams • Class diagrams • Sequence diagrams • In addition, Component diagrams were developed • The size of Seminar system documentation was 24 pages, Hospital system – 18 pages • Assignment of objects to checklist and scenarios [1] K. Itoh, T. Hirota, T. Fuji, S. Kumagai, R. Kawabata, Software Engineering Exercises, Ohmsha, 2001. (in Japanese)

Defects • Three types of defects have been inserted • Syntactic: a concept from the requirements specification is omitted or included in the wrong place • Semantic: a concept from the requirements specification is misinterpreted or ambiguous • Consistency: the representation of a concept in one diagram disagrees with its representation in either the same or another diagram • In total, 15 defects were inserted into each system • Class diagrams: 3 • Activity diagrams: 4 • Sequence diagrams: 5 • Component diagrams: 3 • Requirements specifications and Use-Case diagrams were assumed to be defect-free
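
The per-inspector measures compared in Experiment 1 can be sketched in a few lines of Python. This is a minimal illustration only: the slides do not give the formulas, so effectiveness is assumed here to be the fraction of the 15 seeded defects that an inspector found, and efficiency is assumed to be defects found per hour. The function names and the sample figures are hypothetical.

```python
def effectiveness(defects_found: int, defects_seeded: int = 15) -> float:
    """Fraction of seeded defects that an inspector reported (assumed definition)."""
    if defects_seeded <= 0:
        raise ValueError("defects_seeded must be positive")
    return defects_found / defects_seeded


def efficiency(defects_found: int, minutes_spent: float) -> float:
    """Defects found per hour of inspection (assumed definition)."""
    if minutes_spent <= 0:
        raise ValueError("minutes_spent must be positive")
    return defects_found / (minutes_spent / 60.0)


# Hypothetical inspector: 9 of the 15 seeded defects found in 90 minutes.
print(effectiveness(9))   # 0.6
print(efficiency(9, 90))  # 6.0
```

Keeping the two measures separate matters because an inspector can find the same share of defects as another while taking more or less time to do so.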

Experimental Design and Operation • Experimental Design • Experimental Operation • Week 1: Training session to improve students' understanding of the software systems • Week 2: Experiment • Explanation of the experiment activities (20 minutes) • Individual inspection (maximum 120 minutes) • Week 3: Feedback questionnaire to collect additional information from students

Threats to Validity of Experimental Results (1) There are four groups of threats to the validity of experiment results [1]: internal, external, conclusion and construct • Internal Validity describes the extent to which the research design affects the results. There might be some threats due to • Selection of inspectors • to minimize it, we randomly assigned students to reading techniques and software systems • Software systems used for inspection • to minimize it, we made sure both software systems were similar in size and complexity • Process conformance of inspectors • the data of inspectors who did not conform to the process were eliminated from further analysis [1] C. Wohlin, P. Runeson, M. Höst, M.C. Ohlsson, B. Regnell, A. Wesslen, Experimentation in Software Engineering: an Introduction, Kluwer Academic Publishers, 2000.

Threats to Validity of Experimental Results (2) • Conclusion Validity concerns the issues that affect the ability to draw a correct conclusion • Considered small • Construct Validity concerns the ability to generalize from the experiment results to the concepts of theory • Considered small • External Validity concerns the ability to generalize the experiment results to industrial practice • Students instead of practitioners were used as subjects; however, they were in their third year of studies, close to their professional start in industry • The inspected software systems were smaller than those used in industry; however, we think their size was appropriate for this experiment • It can be concluded that there were threats to validity, but they were not considered to be large in this experiment

Summary of Experimental Results and Conclusions • CBR and PBR are effective techniques for OO design inspection • Both lead to the detection of 70% of defects on average • There is no difference in defect detection effectiveness between inspectors who use PBR and inspectors who use CBR • CBR is more efficient than PBR • PBR inspectors need less time for inspection than inspectors who use CBR

Investigating Inspection Meetings (Experiment 2)

Introduction • In Fagan’s original inspection process [1] • The preparation stage is used by inspectors to obtain a deep understanding of the inspection artifact • The inspection meeting stage is used by the inspectors as a group to carry out defect detection • Although defects can be detected during preparation as well, it is often assumed that the meeting allows inspectors to detect more defects • However, a series of empirical studies into the usefulness of inspection meetings question whether meetings are really necessary [1] M. Fagan, Design and Code Inspections to Reduce Errors in Program Development, IBM Systems Journal 15 (3) (1976) 182-211.

Empirical Studies into the Usefulness of Inspection Meetings • Votta (1993) suggests that inspection meetings are no longer required [1] since • The number of new defects detected at the meeting (Meeting Gains) over those found in preparation is relatively small (4% on average) • Porter et al. (1995) reported that inspection meetings suffer from process loss [2] • Some defects identified by individual inspectors during preparation are not included in the defect list at the meeting (Meeting Losses) [1] L.G. Votta Jr, Does Every Inspection Need a Meeting?, Proceedings of the 1993 ACM SIGSOFT Symposium on Foundations of Software Engineering, ACM Software Engineering Notes 18 (5) (1993) 107-114. [2] A.A. Porter, L.G. Votta, V. Basili, Comparing Detection Methods for Software Requirements Inspections: A Replicated Experiment, IEEE Transactions on Software Engineering 21 (6) (1995) 563-575.
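
Meeting gains and meeting losses as described above can be read as set differences between what the individuals reported in preparation and what the team list contains after the meeting. The sketch below assumes each defect has a stable identifier; the function name, the identifiers and the sample data are hypothetical and are not taken from the experiments.

```python
def meeting_gains_and_losses(individual_lists, team_list):
    """Return (gains, losses) as sets of defect identifiers.

    gains:  defects on the team list that no individual reported in preparation
    losses: defects reported in preparation that are missing from the team list
    """
    found_in_preparation = set().union(*individual_lists)
    team = set(team_list)
    gains = team - found_in_preparation
    losses = found_in_preparation - team
    return gains, losses


# Hypothetical 3-person team.
individuals = [{"D1", "D2"}, {"D2", "D5"}, {"D7"}]
team = {"D1", "D2", "D7", "D9"}

gains, losses = meeting_gains_and_losses(individuals, team)
print(sorted(gains))   # ['D9']
print(sorted(losses))  # ['D5']
```

The same function could be applied to false positives instead of true defects, in which case an item dropped at the meeting counts as an eliminated false positive rather than a loss.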

Investigating False Positives • In addition to true defects, false positives (erroneously identified defects) are reported during inspection • False positives do not improve software quality; their rework is costly and may introduce more defects [1] • However, the majority of researchers do not consider them to be of great importance • Land (2000) reported that inspection meetings are especially effective in eliminating false positives [2] [1] C. Sauer, R. Jeffery, L. Land, P. Yetton, The Effectiveness of Software Development Technical Reviews: a Behaviorally Motivated Program of Research, IEEE Transactions on Software Engineering 26 (1) (2000) 1-14. [2] L.P.W. Land, Software Group Review and the Impact of Procedural Roles on Defect Detection Performance, PhD Dissertation, University of New South Wales, 2000.

Purpose of Experiment 2 • Experiment 2 was conducted in July 2002 • The goals of the experiment were: • Verification of the results of Experiment 1 • Further comparison of CBR and PBR techniques with respect to 3-person teams • Effectiveness • Efficiency • Meeting gains and meeting losses • Investigation of the usefulness of inspection meetings • The number of new defects found during the meeting • The number of false positives eliminated during the meeting • The effectiveness of inspection teams as compared to individual inspectors

Experimental Planning and Operation • The following elements were the same as used in Experiment 1 • Reading techniques (CBR and PBR) • Experimental objects • Experimental Subjects: 54 third year students of Software Design course • Experimental Design • Experimental Operation • Explanations of inspection activities (about 20 minutes) • Individual inspection (maximum 60 minutes) • Inspection meetings (maximum 30 minutes)

The Results of Comparison between CBR and PBR • The results of individual inspector performance confirmed the results of Experiment 1 • There is no significant difference in effectiveness between CBR and PBR • CBR is more efficient than PBR • The results of comparison between CBR and PBR 3-person teams revealed that • There is no significant difference in effectiveness and efficiency • PBR team meetings are more beneficial • The meeting losses of PBR teams are similar to meeting gains • CBR teams exhibit significantly greater meeting losses as compared to meeting gains

The Results of Investigation of Inspection Meeting Usefulness • 3-person inspection teams do not detect a significant number of new defects during the inspection meeting • 3-person inspection teams eliminate a significant number of false positives during the inspection meeting • Individual inspectors are more effective than 3-person inspection teams in defect detection

Conclusions • PBR 3-person team meetings are more beneficial than CBR 3-person team meetings • Inspection meetings are effective in • Eliminating false positives • Inspection meetings are not effective in • Detecting new defects

Evaluation of Cost-Effectiveness of Software Inspection

Introduction • Several metrics have been previously proposed to evaluate cost-effectiveness of inspection • Mc : Cost consumed by inspection / Cost saved by inspection [1] • Mk : The degree to which testing costs are reduced by inspection [2] • However, none of those metrics considers false positives, which • Are costly • May introduce new defects • This research proposes • Two cost models to describe costs of preparation and inspection meeting stages • Four new metrics to evaluate • Cost-effectiveness of preparation and inspection meeting stages • Inspection losses due to false positives [1] J.S. Collofello, S.N. Woodfield, Evaluating the Effectiveness of Reliability-Assurance Techniques, Journal of Systems and Software 9 (3) (1989) 191-195. [2] S. Kusumoto, K. Matsumoto, T. Kikuno, K. Torii, A New Metrics for Cost Effectiveness of Software Reviews, IEICE Transactions on Information and Systems E75-D (5) (1992) 674-680.

Cr Ct Inspection Testing Cost Ct Virtual testing cost Traditional Inspection Cost Model Traditional inspection cost model, which shows the relationship between inspection and testing costs [1], consists of • Cr – cost spent for inspection • Ct– cost needed for testing • Ct– testing cost saved by inspection • Virtual testing cost – testing cost if no inspections are executed [1] S. Kusumoto, K. Matsumoto, T. Kikuno, K. Torii, A New Metrics for Cost Effectiveness of Software Reviews, IEICE Transactions on Information and Systems E75-D (5) (1992) 674-680.

Cr CrDEF CrFP Ct Preparation Testing Cost CtFP CtDEF Preparation gains Preparation losses Ct Extended Cost Model for Preparation Stage of Inspection In order to evaluate the influence of false positives introduced during preparation stage, we extend traditional cost model • Preparation costs: • CrDEF– cost spent to detect actual defects • CrFP– cost spent to detect false positives • Testing costs: • CtDEF– cost needed for testing to detect remaining defects • CtFP– cost needed for testing to detect defects introduced by false positives

Extended Cost Model for Preparation and Inspection Meeting Stages • When the inspection process comprises both preparation and inspection meeting stages, the extended cost model can be depicted in the following way • Inspection meeting costs, spent to: • CmDEF – confirm actual defects • CmFP – confirm false positives • CmADD_DEF – detect additional defects • CmADD_FP – detect additional false positives • CmLOST_DEF – eliminate actual defects • CmELIM_FP – eliminate false positives • Testing costs: • CtADD_FP – to detect defects introduced by additional false positives • CtLOST_DEF – to detect defects eliminated during meeting • CtADD_DEF – saved by additional defects detected during meeting • CtELIM_FP – saved by false positives eliminated during meeting • CtDEF – saved by defects detected during preparation and confirmed during meeting [Figure: extended cost model for the preparation and inspection meeting stages, with inspection meeting gains and inspection meeting losses marked]

New Metrics • In order to evaluate inspection losses, we proposed two metrics • Preparation Losses: Ml_IDV = Preparation losses / Preparation gains • Inspection Meeting Losses: Ml_MEET = Inspection meeting losses / Inspection meeting gains • We extended the cost-effectiveness metric Mk to conform to the extended cost models and proposed two metrics • Extended Cost Effectiveness of Preparation Stage: Mg_IDV • Extended Cost Effectiveness of Preparation and Inspection Meeting Stages: Mg_MEET
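
Both loss metrics share the same shape, a ratio of losses to gains, so one helper can illustrate them. The sketch below takes the loss and gain totals as inputs because the transcript does not spell out which cost terms make up each total; the function name and the sample figures are hypothetical.

```python
def loss_ratio(losses: float, gains: float) -> float:
    """Losses divided by gains, the common form of Ml_IDV and Ml_MEET."""
    if gains <= 0:
        raise ValueError("gains must be positive for the ratio to be defined")
    return losses / gains


# Hypothetical totals in hours of testing cost.
ml_idv = loss_ratio(losses=6, gains=40)    # preparation stage
ml_meet = loss_ratio(losses=3, gains=12)   # inspection meeting stage

print(ml_idv)   # 0.15
print(ml_meet)  # 0.25
```

A value close to 0 means the stage loses little relative to what it gains, while a value above 1 means the losses outweigh the gains.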

Experimental Evaluation of New Metrics To demonstrate the validity of new metrics • Proposed metrics are applied along with metric Mk to the data collected from Experiment 2 • The resultant values obtained by these metrics are compared

Experimental Data • Data collected from Experiment 2 • Included only inspection data of eighteen 3-person teams • Testing costs were calculated from the set of defects found and assumptions on the benefit of finding a defect during inspection [1] • We assumed that • A major defect detected during inspection saves 8 hours of testing [2] • A minor defect saves 1 hour of testing [2] [1] S. Biffl, W. Gutjahr, Influence of Team Size and Defect Detection Methods on Inspection Effectiveness, Proceedings of IEEE International Software Metrics Symposium, London, UK, (2001) 63-75. [2] T. Gilb, D. Graham, Software Inspection, Addison-Wesley, 1993.

Summary of Evaluation Results and Conclusions • There is a strong correlation between cost-effectiveness metric Mk and extended cost-effectiveness metrics Mg_IDV and Mg_MEET • The values of extended cost-effectiveness metrics Mg_IDV and Mg_MEET are significantly smaller than those of metric Mk • The preparation stage alone is more cost-effective than the preparation and inspection meeting stages together when the probability that false positives will propagate into testing is small • Inspection meetings are cost-effective when false positives propagate into testing and introduce major defects • The proposed metrics enable a more precise evaluation of software inspections than the conventional metrics

Summary of Major Results of the Thesis • Software inspection has been applied for defect detection in object-oriented design • Two reading techniques have been developed and empirically evaluated • The usefulness of inspection meetings has been investigated • New cost models to describe inspection costs, and metrics to evaluate inspection cost-effectiveness and inspection losses have been proposed and experimentally evaluated

Conclusions • The thesis has shown the way in which software inspection can be applied for defect detection in Object-Oriented design • The reading techniques, cost models and metrics proposed in this thesis may facilitate the work of researchers and practitioners when utilizing and evaluating software inspections

Future Work • Several issues arising from the work carried out in this thesis require further investigation • Further experimental evaluation and refinement of reading techniques • Application in an industrial environment • Further evaluation and extension of proposed metrics • Evaluation using industrial data • Extension of metrics to evaluate inspection of different types of artifacts: requirements, design, code • From May 2004, I am planning to continue my research at Fraunhofer IESE, Germany

The END

Individual Defect Registration Form

Team Defect Registration Form

Data Collected from Experiment 1

Summary of Data of Experiment 1

Inspector Effectiveness in Detecting Defects

Data Collected from Experiment 2
