
Classifying Software Faults to Improve Fault Detection Effectiveness

This technical briefing discusses a research study conducted at NASA's Jet Propulsion Laboratory to classify and analyze software faults in order to improve fault detection effectiveness. The research aims to determine the relative frequencies of specific types of faults, develop effective techniques for identifying and removing faults, and provide guidelines for applying these techniques in current and future missions.


Presentation Transcript


1. Classifying Software Faults to Improve Fault Detection Effectiveness
Technical Briefing
NASA OSMA Software Assurance Symposium, September 9-11, 2008
Allen P. Nikora, JPL/Caltech
This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. The work was sponsored by the NASA Office of Safety and Mission Assurance under the Software Assurance Research Program led by the NASA Software IV&V Facility. This activity is managed locally at JPL through the Assurance and Technology Program Office.

2. Agenda
• Problem/Approach
• Relevance to NASA
• Accomplishments and/or Tech Transfer Potential
• Related Work
• Technology Readiness Level
• Data Availability
• Impediments to Research or Application
• Next Steps

3. Problem/Approach
• All software systems contain faults
• Different types of faults exhibit different failure behaviors
• Different types of faults require different identification techniques
• Some faults are easier to find than others
• The likelihood of detecting and removing software faults during development and testing, as well as the possible strategies for dealing with residual faults during mission operations, depend on the fault type
• Goals are to:
  • Determine the relative frequencies of specific types of faults and identify trends in those frequencies
  • Develop effective techniques for identifying and removing faults, or for masking their effects
  • Develop guidelines, based on the analysis of faults and failures, for applying these techniques in the context of current and future missions

4. Problem/Approach (cont’d)
• What must be done?
  • Analyze software failure data (test and operations) from historical and current JPL and NASA missions and classify the underlying software faults
  • Further classify the faults by criticality (e.g., non-critical, significant mission impact, mission critical) and by detection phase
  • Perform statistical analysis (see the sketch after this slide):
    • Proportions of faults in each category
    • Conditional frequencies (e.g., percentage of critical faults among aging-related bugs, percentage of aging-related bugs among the critical faults)
    • Trends in conditional frequencies (within and across missions)
  • Determine criteria for further classifying faults (e.g., for the aging-related bugs: faults causing round-off errors, faults causing memory leaks, etc.) to identify classes of faults with high criticality and low detectability
  • For highly critical faults that are difficult to detect prior to release, develop techniques for:
    • Identifying the component(s) most likely to contain these types of faults
    • Improving the detectability of the faults with model-based verification or static analysis tools, as well as during testing
    • Masking the faults via fault tolerance (e.g., software rejuvenation for aging-related faults); such techniques must be able to accurately distinguish between behavioral changes resulting from normal changes in the system’s operating environment and input space and those brought about by aging-related faults
  • Develop guidelines for implementing the techniques in the context of current and future missions
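A minimal sketch of the proportion and conditional-frequency computations described above, assuming the failure records have already been classified; the record schema and counts are illustrative, not the actual JPL problem-report format:

```python
from collections import Counter

# Hypothetical classified failure records; the field names are
# illustrative stand-ins, not the JPL problem-report schema.
records = [
    {"fault_type": "bohrbug",       "critical": False},
    {"fault_type": "mandelbug",     "critical": True},
    {"fault_type": "aging_related", "critical": True},
    {"fault_type": "bohrbug",       "critical": False},
    {"fault_type": "aging_related", "critical": False},
]

# Proportion of faults in each category.
type_counts = Counter(r["fault_type"] for r in records)
total = sum(type_counts.values())
for fault_type, count in type_counts.items():
    print(f"P({fault_type}) = {count / total:.2f}")

# Conditional frequencies, e.g. the percentage of critical faults among
# the aging-related bugs, and of aging-related bugs among critical faults.
aging = [r for r in records if r["fault_type"] == "aging_related"]
critical = [r for r in records if r["critical"]]
p_crit_given_aging = sum(r["critical"] for r in aging) / len(aging)
p_aging_given_crit = sum(
    r["fault_type"] == "aging_related" for r in critical
) / len(critical)
print(f"P(critical | aging-related) = {p_crit_given_aging:.2f}")
print(f"P(aging-related | critical) = {p_aging_given_crit:.2f}")
```

Tracking the same conditional frequencies per mission, or per development phase, gives the within- and across-mission trends the slide calls for.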

5. Relevance to NASA
• Different types of faults have different types of effects; choose fault identification/mitigation strategies based on the types of failures encountered in the system being developed
  • Bohrbugs
    • Deterministically cause failures
    • Easiest to find during testing
    • Fault tolerance of the operational system can mainly be achieved with design diversity
  • Mandelbugs
    • Difficult to find, isolate, and correct during testing
    • Re-execution of an operation that failed because of a Mandelbug will generally not result in another failure
    • Fault tolerance can be achieved by simple retries or more sophisticated approaches like checkpointing and recovery-oriented computing
  • Aging-related bugs
    • The tendency to cause a failure increases with system run time
    • Proactive measures that clean the internal system state (software rejuvenation) and thus reduce the failure rate are useful (see the sketch after this slide)
• Aging can be a significant threat to NASA software systems (e.g., continuously operating planetary exploration spacecraft flight control systems), since aging-related faults are often difficult to find during development
• Related work
  • Rejuvenation has been implemented in many different kinds of software systems, including telecommunications systems, transaction processing systems, and cluster servers
  • Various types of software systems, like web servers and military systems, have been found to age
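To make the rejuvenation idea concrete, here is a minimal sketch of proactive process restarts; the interval and workload are illustrative assumptions, and a real flight system would first checkpoint state and coordinate the restart with operations rather than simply terminating a task:

```python
import time
import multiprocessing

# Minimal software-rejuvenation sketch: proactively restart a worker
# process on a fixed schedule to flush accumulated internal error state
# (leaked memory, fragmented heaps, stale caches) before it can cause a
# failure. The interval and the workload are illustrative assumptions.

REJUVENATION_INTERVAL_S = 3600  # restart once per hour (illustrative)

def worker():
    """Long-running task that may accumulate internal error states."""
    while True:
        time.sleep(1)  # stand-in for real work

def run_with_rejuvenation():
    """Supervisor loop: run the worker, then restart it on schedule."""
    while True:
        proc = multiprocessing.Process(target=worker)
        proc.start()
        proc.join(timeout=REJUVENATION_INTERVAL_S)
        if proc.is_alive():
            proc.terminate()   # the clean restart resets internal state
            proc.join()

if __name__ == "__main__":
    run_with_rejuvenation()
```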

6. Accomplishments and/or Tech Transfer Potential
• Collected over 40,000 failure records from the JPL problem reporting system
  • Operational failures and failures observed during system test and ATLO (Assembly, Test, and Launch Operations)
  • All failures (software and non-software)
  • Over two dozen projects represented
    • Planetary exploration
    • Earth orbiters
    • Instruments
• Continued analysis of software failures
  • Classified flight software failures for 18 projects
  • Classification of ground software failures for the same 18 missions in progress
• Completed statistical analysis of flight software failure data
• Started applying machine learning/data mining techniques to improve classification accuracy:
  • Software vs. non-software failures
  • Types of software failures
  • Supervised vs. unsupervised learning

7. Related Work
• Working with the Software Quality Initiative project at JPL to analyze software failure reports across multiple projects
• Work performed includes:
  • Applying software reliability modeling techniques to multiple sets of failure data for a current flight project (see the sketch after this slide)
  • Investigating the use of text mining/machine learning techniques for discriminating between ground software anomalies encountered during test and those encountered during operations
    • Both types of anomalies are reported using the same problem reporting system; the operations vs. test distinction is not consistently recorded
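As a sketch of the reliability-modeling step, the following fits the Goel-Okumoto mean value function m(t) = a(1 - exp(-b t)) to cumulative failure counts. The briefing does not name the models actually applied, so both the model choice and the data below are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import curve_fit

def goel_okumoto(t, a, b):
    """Goel-Okumoto NHPP mean value function: expected cumulative
    failures by time t, with a = total faults, b = detection rate."""
    return a * (1.0 - np.exp(-b * t))

# Illustrative cumulative failure counts over ten weeks of testing.
weeks = np.arange(1, 11, dtype=float)
cum_failures = np.array([4, 9, 13, 16, 19, 21, 22, 24, 24, 25], dtype=float)

(a, b), _ = curve_fit(goel_okumoto, weeks, cum_failures, p0=(30.0, 0.1))
print(f"estimated total faults a = {a:.1f}, detection rate b = {b:.3f}")
print(f"expected residual faults: {a - goel_okumoto(weeks[-1], a, b):.1f}")
```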

8. Potential Applications
• Improved software reliability and software development practices for NASA software systems
  • Identify/develop the most appropriate techniques for identifying the most common types of defects
  • Identify/develop appropriate techniques for preventing the introduction of the most common types of defects
  • Identify/develop fault mitigation techniques for the most common types of defects
• Applicable to mission- and safety-critical software systems
  • Human-rated systems
  • Robotic planetary exploration systems
  • Critical ground support systems (e.g., planning and sequencing, navigation, engineering analysis)

9. Technology Readiness Level
• Current
  • Level 3 (defect classification)
• Target
  • Level 3
    • “Analytical and experimental critical function and/or characteristic proof-of-concept achieved in a laboratory environment”
    • Limited functionality
    • Small representative datasets

10. Data Availability
• Data collected from the JPL Problem Reporting system
  • Failure reports from test and operations for current and historical missions going back to Voyager
  • Over 40,000 failure reports
  • Software and non-software
  • Flight vs. ground software
  • Detailed descriptions of the observed problem, analysis and verification, and corrective action in the PR database
• Additional problem reports available for DSN software
• Software-related failures are tagged by a “Cause” field
  • However, analysis to date indicates that the “Cause” field is not reliable. For a limited sample of problem reports:
    • All problems identified as “Software” are software-related, but
    • Some problems identified as non-software are also software-related
  • For a subset of problem reports analyzed by Nelson Green et al., analysis indicates that relying on the “Cause” field to identify software anomalies may undercount them by a factor of 4-6 (see the sketch after this slide)
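A minimal sketch of what that undercount implies, assuming the 4-6x factor estimated from the sampled reports holds across the whole database; the tagged count below is invented for illustration:

```python
# Correcting the "Cause"-field undercount, assuming the 4-6x factor
# from the sampled problem reports generalizes. Numbers are illustrative.
tagged_software = 2000                      # reports tagged "Software" (invented)
undercount_low, undercount_high = 4, 6      # factor from the sampled subset

print(f"Estimated software-related reports: "
      f"{tagged_software * undercount_low}-{tagged_software * undercount_high}")
```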

11. Impediments to Research or Application
• More software failures may have occurred than are documented in the problem reporting database
  • Will require detailed analysis of a statistically significant subset of the database to obtain more accurate counts of the different types of failures
• Currently applying text mining/machine learning techniques to:
  • Identify software failures
  • Classify identified software failures by type
• A potential issue is the computation time required to complete an experimental learning run
  • Example: applying 34 WEKA machine learners to the Nelson Green data set to distinguish between software and non-software anomalies took approximately 3 weeks per data set
  • Remaining experiments will have to be designed to make the most effective use of currently available results, so as to minimize the computation time required

12. Next Steps
• Complete analysis of failures
  • Complete analysis of ground software ISAs by end of September 2008
  • Complete statistical analyses for all failures to identify trends:
    • Proportions of software failures
    • Proportions of Bohrbugs vs. Mandelbugs vs. aging-related bugs
• Complete experiments with machine learning/data mining; identify the most appropriate failure data representations and learning models to distinguish between:
  • Software and non-software failures: find additional software failures in the problem reporting system and classify them, which can improve the accuracy of software failure type classification
  • Different types of software failures
• Based on analyses of proportions and trends in the failure data, identify/develop appropriate fault prevention/mitigation strategies (e.g., software rejuvenation)
• Other software improvement/defect analysis tasks and organizations at JPL have expressed interest in collaborating with this effort:
  • JPL Software Product and Process Assurance Group
  • JPL Software Quality Improvement project

  13. Technical Details

14. Fault Classifications
• Classification Scheme
  • The following definitions of software fault types are based on [Grottke05a, Grottke05b]:
    • Mandelbug := A fault whose activation and/or error propagation are complex, where “complexity” can be caused either by interactions of the software application with its system-internal environment (hardware, operating system, other applications) or by a time lag between the fault activation and the occurrence of a failure. Typically, a Mandelbug is difficult to isolate, and/or the failures it causes are not systematically reproducible. (Sometimes, Mandelbugs are incorrectly referred to as Heisenbugs.)
    • Bohrbug := A fault that is easily isolated and that manifests consistently under a well-defined set of conditions, because its activation and error propagation lack “complexity.” Complementary antonym of Mandelbug.
    • Aging-related bug := A fault that leads to the accumulation of internal error states, resulting in an increased failure rate and/or degraded performance. Sub-type of Mandelbug.
  • According to these definitions, the classes of Bohrbugs, aging-related bugs, and non-aging-related Mandelbugs partition the space of all software faults (see the sketch after this slide)
• References
  • [Grottke05a] M. Grottke and K. S. Trivedi, “Software faults, software aging and software rejuvenation,” Journal of the Reliability Engineering Association of Japan 27(7):425-438, 2005.
  • [Grottke05b] M. Grottke and K. S. Trivedi, “A classification of software faults,” Supplemental Proc. Sixteenth International Symposium on Software Reliability Engineering, 2005, pp. 4.19-4.20.
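The partition can be made concrete with a small sketch; the boolean attributes below are hypothetical stand-ins for the judgments a human analyst makes when reading a problem report, not fields that exist in the data:

```python
from enum import Enum

# A sketch of the fault-type partition from [Grottke05a, Grottke05b]:
# every software fault falls into exactly one of these three classes.

class FaultType(Enum):
    BOHRBUG = "Bohrbug"
    AGING_RELATED = "aging-related bug"
    NON_AGING_MANDELBUG = "non-aging-related Mandelbug"

def classify(complex_activation_or_propagation: bool,
             accumulates_error_states: bool) -> FaultType:
    """Assign a fault to exactly one class, mirroring the definitions above."""
    if not complex_activation_or_propagation:
        return FaultType.BOHRBUG            # simple activation/propagation
    if accumulates_error_states:
        return FaultType.AGING_RELATED      # sub-type of Mandelbug
    return FaultType.NON_AGING_MANDELBUG

# Example: a memory leak is a Mandelbug whose error states accumulate.
print(classify(True, True))   # FaultType.AGING_RELATED
```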

15. Mission Characteristics Summary

16. Analysis Results
Fault type proportions for the eight projects with the largest number of unique faults

17. Analysis Results (cont’d)
Proportion of Bohrbugs for the four earlier missions
Proportion of non-aging-related Mandelbugs for the four earlier missions

18. Analysis Results (cont’d)
Proportion of Bohrbugs for missions 3 and 9, and 95% confidence interval based on the four earlier missions
Proportion of Bohrbugs for missions 6 and 14, and 95% confidence interval based on the four earlier missions
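For reference, a standard way to build such an interval is the normal-approximation confidence interval for a proportion; the briefing does not state the exact construction used, so this method and the counts below are assumptions made for illustration:

```python
import math

def proportion_ci(successes: int, n: int, z: float = 1.96):
    """Normal-approximation 95% CI for a proportion:
    p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# E.g., 122 Bohrbugs among 200 classified faults from the four earlier
# missions (invented counts, not the actual mission data):
low, high = proportion_ci(122, 200)
print(f"95% CI for the Bohrbug proportion: ({low:.3f}, {high:.3f})")
```

A later mission whose observed proportion falls outside this interval suggests a shift in the fault-type mix relative to the earlier missions.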

19. Analysis Results (cont’d)
Proportion of non-aging-related Mandelbugs for missions 3 and 9, and 95% confidence interval based on the four earlier missions
Proportion of non-aging-related Mandelbugs for missions 6 and 14, and 95% confidence interval based on the four earlier missions

20. Machine Learning/Text Mining Results
• Discriminate between FSW, GSW, Procedural, and “Other” anomalies
  • Use standard text-mining techniques to convert the natural-language text in the problem description, problem verification, and corrective action fields of an anomaly report to vectors
    • Training set built from the anomaly reports analyzed by Nelson Green et al.
    • Three representations used to date: word counts, word frequencies, and TFxIDF
    • These representations will also be tried with text that includes part-of-speech (POS) information; POS information can be obtained with publicly available tools
  • Apply machine learners implemented in the WEKA machine learning environment (see the sketch after this slide)
    • 34 machine learners applied
    • 10-fold cross-validation used to build the learning models
    • “Leave-one-out” cross-validation not used because of the computing time required
    • “Best” learner found for pd, pf, accuracy, precision, and F-measure
    • Also “best” learner found based on the distance of the ROC curve from the ideal point (0, 1)
• Discriminate between Bohrbugs, Mandelbugs, and aging-related bugs
  • Training set based on the results of classifying flight software anomalies
  • Separate training set for ground software anomalies in development
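The briefing’s experiments ran 34 learners in WEKA; the following is an analogous sketch of one such pipeline in Python/scikit-learn, combining TFxIDF vectors, a single learner, and 10-fold cross-validation. The report texts and labels are invented placeholders, not JPL data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented anomaly-report snippets and class labels, repeated so that
# 10-fold stratified cross-validation has enough samples per class.
reports = [
    "task watchdog reset after memory heap exhausted during long pass",
    "command sequence rejected due to operator procedure error",
    "telemetry dropout traced to ground antenna pointing fault",
    "flight computer rebooted when timing race corrupted buffer",
] * 10
labels = ["FSW", "Procedural", "GSW", "FSW"] * 10

# TFxIDF vectorization followed by one learner (naive Bayes here,
# standing in for any of the 34 WEKA learners).
pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())
scores = cross_val_score(pipeline, reports, labels, cv=10)
print(f"mean 10-fold accuracy: {scores.mean():.2f}")
```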

21. Machine Learning/Text Mining Results (cont’d)
Flight software failures vs. all other failures

22. Machine Learning/Text Mining Results (cont’d)
Ground software failures vs. all other failures

23. Machine Learning/Text Mining Results (cont’d)
Flight and ground software failures vs. all other failures

24. Machine Learning/Text Mining Results (cont’d)
Procedural/process errors vs. all other failures
