
NASA OSMA SAS '03

System and Software Reliability Dolores R. Wallace SRS Technologies Software Assurance Technology Center http://satc.gsfc.nasa.gov / Dr. William H. Farr, Dr. John R. Crigler Naval Surface Warfare Center Dahlgren Division. NASA OSMA SAS '03. Overview of the Problem.


Presentation Transcript


  1. System and Software Reliability • Dolores R. Wallace, SRS Technologies, Software Assurance Technology Center, http://satc.gsfc.nasa.gov/ • Dr. William H. Farr, Dr. John R. Crigler, Naval Surface Warfare Center Dahlgren Division • NASA OSMA SAS '03 SAS 03/ GSFC/SATC-NSWC-DD

  2. Overview of the Problem • Reliability measurement is a critical objective for NASA systems • Systems are assessed from the software, hardware, and systems perspectives • Methodologies for hardware reliability assessment have been developed and utilized over the past several decades • Methodologies for software reliability assessment have been developed since the 1970s and have been utilized over the last twenty years • Methodologies for system reliability assessment have only been addressed over the last 10 years, with little application experience • There is a need for a tool that integrates all aspects of reliability data (software, hardware, and systems perspectives)

  3. Project Objectives • Enhance the capability for NASA to assess software reliability by identifying and incorporating recent models into the tool Statistical Modeling and Estimation of Reliability Functions for Systems (SMERFS^3) • First Year Initiative • Perform a detailed literature search (1990 and beyond) • Enhance the capability for NASA to assess system reliability by updating SMERFS^3 • Second Year Initiative • Identify system models for incorporation • Apply the identified methodologies to project data sets within the NASA/DoD environments

  4. FY03 Research Plan • Literature search • Selection of new models • Build new models into SMERFS^3 • Test new models with Goddard project data • Make latest version of SMERFS^3 available

  5. Literature Search • Articles from 1990 forward • Journals (sample): IEEE TSE; IEEE Reliability; Software Testing, Verification, and Reliability; IEEE Software; IEEE Computer • Conferences: ISSRE; ICSE; Reliability & Maintainability; High-Assurance Systems Eng.; various others • Model selection criteria: model assumptions; fit within current SMERFS^3; type of system; data availability • Domain experts

  6. Characteristics of the Software Based Systems • Software: real-time; large-scale; time-critical; embedded; maybe heavy COTS; distributed • System: safety-critical components; heterogeneous; fault tolerant; costly to develop; long lifetime, evolutionary

  7. SMERFS^3 • Current version features: • 6 software reliability models • 2D, 3D plots of input data, fit into each model • Various reliability estimates • User queries for predictions • Update constraints: • Employ data from integration, system test, or operational phase • Use existing graphics of SMERFS^3 • Integrate with existing user interfaces, goodness-of-fit tests, and prediction capabilities

  8. Available Data • Large GSFC project, but confidentiality required • GSFC person invaluable in explaining the system and the data • Several subsystems • Data flat files – much effort into spreadsheet/database • Operational failures only • Remove specific faults and sort others • Apply IntervalCounter • Bottom line: organizing data required substantial effort – minimized if project person prepared the data
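The preparation steps on this slide end with turning dated failure records into fault counts per interval. The sketch below is not the IntervalCounter tool itself, whose behavior the slides do not describe; it is a minimal illustration of that kind of binning. The function name `interval_counts`, its parameters, and the sample dates are all hypothetical.

```python
from datetime import date


def interval_counts(failure_dates, start, interval_days, n_intervals):
    """Count failures in consecutive, equal-length intervals.

    Interval k (0-based) covers the days
    [start + k*interval_days, start + (k+1)*interval_days).
    Dates outside the covered range are ignored.
    """
    counts = [0] * n_intervals
    for d in failure_dates:
        idx = (d - start).days // interval_days
        if 0 <= idx < n_intervals:
            counts[idx] += 1
    return counts


# Hypothetical failure dates, binned into five one-week intervals.
dates = [date(2003, 1, 3), date(2003, 1, 5), date(2003, 1, 20), date(2003, 2, 2)]
weekly = interval_counts(dates, start=date(2003, 1, 1), interval_days=7, n_intervals=5)
print(weekly)
```

Equal-length intervals are used here because the Schneidewind model, one of the two models selected for this task, assumes them.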

  9. Identified Models • Hypergeometric • Schneidewind (enhancements) • Log-logistic • Extended Execution Time (EET) • The first two models require error count failure data; the last two require time-between-failure data • Only error count data has been captured in the GSFC project database available for analysis • Hence, software reliability additions to SMERFS^3 in this task will be limited to the hypergeometric model and the metrics enhancements to the Schneidewind model

  10. Hypergeometric Model Assumptions • Test instance, t(i): A collection of input test data. • N: Total number of initial faults in the software. • Faults detected by a test instance are removed before the next test instance is exercised. • No new fault is inserted into the software in the removal of the detected fault. • A test instance t(i) senses w(i) initial faults. w(i) may vary with the condition of test instances over i. It is sometimes referred to in the authors' papers as a "sensitivity" factor. • The initial faults actually sensed by t(i) depend upon t(i) itself. The w(i) initial faults are taken randomly from the N initial faults.
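Under the assumptions above, if `detected` faults have already been found and removed, the number of new faults found by a test instance that senses w of the N initial faults at random follows a hypergeometric distribution. The sketch below works that out directly from the stated assumptions; it is not the SMERFS^3 implementation, and the function names and numeric values are illustrative only.

```python
from math import comb


def prob_new_faults(x, N, detected, w):
    """Probability that exactly x of the w sensed faults are newly detected.

    The w sensed faults are a random sample (without replacement) of the N
    initial faults, of which `detected` were found by earlier test instances.
    """
    if x < 0 or x > w or x > N - detected or w - x > detected:
        return 0.0
    return comb(N - detected, x) * comb(detected, w - x) / comb(N, w)


def expected_cumulative(N, w_list):
    """Expected cumulative number of distinct faults found after each instance.

    A given fault escapes instance j with probability 1 - w_j/N, so after
    instances 1..i it is still undetected with the product of those terms.
    """
    undetected_fraction = 1.0
    result = []
    for w in w_list:
        undetected_fraction *= 1 - w / N
        result.append(N * (1 - undetected_fraction))
    return result


# Illustrative values: 10 initial faults, 3 already found, next instance senses 4.
dist = [prob_new_faults(x, N=10, detected=3, w=4) for x in range(5)]
print(dist)
print(expected_cumulative(10, [5, 5]))
```

Fitting the model to data means estimating N and the w(i) from observed counts, which this sketch does not attempt.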

  11. Hypergeometric Model • Meets many of our selection criteria: • Data type • Fits within the framework of the SMERFS^3 software • Research shows that it appears to perform well against other models • Allows for testing intensity factor (for example: number of test cases, number of testing personnel, debug time) • Scheduled for implementation in the last quarter of FY03

  12. Schneidewind Model • There are three versions, for fault counts observed over n testing periods: • Model 1: All of the fault counts for each testing period are treated the same. • Model 2: Ignore the first s-1 testing periods and their associated fault counts. Only use the data from s to n. • Model 3: Combine the fault counts of the intervals 1 to s-1 into the first data point. Thus there are n-s+2 data points.
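The three versions differ only in how the per-period fault counts are prepared before fitting. A minimal sketch of that data preparation follows; the function name `schneidewind_data`, the 1-based convention for s, and the sample counts are assumptions made for illustration, not part of SMERFS^3.

```python
def schneidewind_data(counts, s, version):
    """Prepare per-period fault counts for one of the three model versions.

    counts[0] holds the count for testing period 1; s is 1-based.
    """
    if not 1 <= s <= len(counts):
        raise ValueError("s must lie between 1 and the number of periods")
    if version == 1:
        # Use every period as observed.
        return list(counts)
    if version == 2:
        # Drop periods 1..s-1 and keep periods s..n.
        return list(counts[s - 1:])
    if version == 3:
        # Collapse periods 1..s-1 into a single leading data point.
        return [sum(counts[:s - 1])] + list(counts[s - 1:])
    raise ValueError("version must be 1, 2, or 3")


# Hypothetical fault counts for six testing periods, with s = 3.
counts = [5, 3, 4, 2, 1, 0]
print(schneidewind_data(counts, 3, 1))
print(schneidewind_data(counts, 3, 2))
print(schneidewind_data(counts, 3, 3))
```

With these sample counts, version 2 keeps the last four periods, and version 3 keeps those four preceded by the combined count 8 from periods 1 and 2.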

  13. Schneidewind Assumptions • The numbers of faults detected in the respective intervals are independent. • The fault correction rate is proportional to the number of faults to be corrected. • The intervals over which the software is tested are all taken to be of the same length. • The cumulative number of faults by time t, M(t), follows a Poisson process with mean value function μ(t). The mean value function is such that the expected number of fault occurrences for any time period is proportional to the expected number of undetected faults at that time. • The failure intensity function, λ(t), is assumed to be an exponentially decreasing function of time; that is, λ(t) = α exp(-βt) for some α, β > 0.
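Integrating the failure intensity from 0 to t gives the mean value function μ(t) = (α/β)(1 - exp(-βt)), so the expected total number of faults is α/β and the expected number still undetected at time t is (α/β) exp(-βt). The sketch below evaluates these quantities; the function names and the parameter values are illustrative assumptions, and no parameter estimation is shown.

```python
from math import exp


def intensity(t, alpha, beta):
    """Failure intensity: lambda(t) = alpha * exp(-beta * t)."""
    return alpha * exp(-beta * t)


def mean_faults(t, alpha, beta):
    """Mean value function: the integral of lambda from 0 to t."""
    return alpha / beta * (1 - exp(-beta * t))


def expected_in_interval(i, alpha, beta):
    """Expected fault count in the i-th unit-length interval (i = 1, 2, ...)."""
    return mean_faults(i, alpha, beta) - mean_faults(i - 1, alpha, beta)


def expected_remaining(t, alpha, beta):
    """Expected number of faults not yet detected at time t."""
    return alpha / beta * exp(-beta * t)


# Illustrative parameters only; real values come from fitting to fault counts.
alpha, beta = 10.0, 0.5
for i in range(1, 4):
    print(i, round(expected_in_interval(i, alpha, beta), 3))
print(round(expected_remaining(3, alpha, beta), 3))
```

The proportionality assumption on the slide can be read off these formulas: λ(t) equals β times the expected number of undetected faults at time t.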

  14. Schneidewind Model Enhancements • Meets many of our selection criteria: • Data type • Basic model already in the SMERFS^3 software • It has been shown to perform well against other models • Allows learning curve effect • Updates are being implemented this quarter • Risk measures • Operational quality at time t • Risk criterion metric for the remaining faults at time t • Risk criterion metric for the time to next failure at time t • Confidence intervals

  15. Data Analysis of NASA Three Month Fault Counts

  16. Proposed Next Steps • FY03 – Focused on software • Complete implementation and testing • Prepare paper describing the research and model selection, implementation, conclusions • Apply the enhancements on the Goddard data set • Prepare SMERFS^3 for distribution • FY04 • Conduct similar research effort for System Reliability • University of Connecticut will participate • Enhance and validate system models
