An Empirical Study on Reliability Modeling for Diverse Software Systems


Presentation Transcript


  1. An Empirical Study on Reliability Modeling for Diverse Software Systems Xia Cai and Michael R. Lyu Dept. of Computer Science & Engineering The Chinese University of Hong Kong

  2. Outline • Introduction • Objectives and previous work • Analyses and investigations on reliability models for diverse software systems • Reliability bounds model by Popov, Strigini, et al. • System reliability model by Dugan and Lyu • Discussion • Conclusion

  3. Introduction • Design diversity is one of the two main techniques for software fault tolerance • The rationale of this approach is the expectation that software programs built differently will fail differently • Reliability models attempt to estimate the probability of coincident failures in multiple versions • Empirical data are in high demand for evaluating and cross-validating the usefulness and effectiveness of these models

  4. Reliability models for design diversity • Conceptual models: Eckhardt and Lee (1985): variation of difficulty on demand space, positive correlations between version failures; Littlewood and Miller (1989): forced design diversity, possibility of negative correlations • Structural models: Dugan and Lyu (1995): Markov reward model; Tomek and Trivedi (1995): stochastic reward net • In between: Popov, Strigini et al. (2003): subdomains on demand space, upper/lower bounds for failure probability

  5. Our objectives • To study reliability and fault correlation issues in design diversity by means of mutation testing • To investigate and compare the prediction performance of different existing reliability models for design diversity

  6. Our previous work • Motivated by the lack of empirical data, we conducted the RSDIMU project in 2002 • It took more than 100 students 12 weeks to develop 34 program versions • 1200 test cases were executed on these program versions • 426 mutants were generated, each by injecting a single fault identified in the testing phase • A number of analyses and evaluations were conducted in our previous work

  7. Outline • Introduction • Objectives and previous work • Analyses and investigations on reliability models for diverse software systems • Reliability bounds model by Popov, Strigini, et al. (PS model) • System reliability model by Dugan and Lyu (DL model) • Discussion • Conclusion

  8. PS Model • Proposed by P. T. Popov, L. Strigini, J. May and S. Kuball (2003) • Target: give the upper and “likely” lower bounds for the probability of coincident failures • Assumptions: • Knowledge of disjoint subdomains Si of the demand space is given, i.e., 1) the probability P(Si) of a random demand being drawn from Si; 2) the probabilities of failure on demand (pfds) of versions A and B for demands from Si, PA|Si and PB|Si.

  9. PS Model (cont’) • Alternative estimates for probability of failures on demand (pfd) of a 1-out-of-2 system

  10. PS Model (cont’) • Upper bound of system pfd • “Likely” lower bound of system pfd - under the assumption of conditional independence
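
The formulas on the two slides above did not survive in this transcript. The sketch below shows one common reading of the PS bounds, stated as an assumption rather than as the authors' exact expressions: the upper bound takes the smaller of the two versions' pfds in each subdomain, and the "likely" lower bound multiplies them (conditional independence within a subdomain), both weighted by P(Si). The function name `ps_bounds` and all numbers are hypothetical.

```python
from typing import Sequence, Tuple


def ps_bounds(
    p_sub: Sequence[float],
    pfd_a: Sequence[float],
    pfd_b: Sequence[float],
) -> Tuple[float, float]:
    """Return (likely_lower, upper) bounds on the pfd of a 1-out-of-2 system.

    p_sub[i] : P(Si), probability that a demand falls in subdomain Si
    pfd_a[i] : PA|Si, pfd of version A on demands from Si
    pfd_b[i] : PB|Si, pfd of version B on demands from Si
    """
    if not (len(p_sub) == len(pfd_a) == len(pfd_b)):
        raise ValueError("all inputs must have one entry per subdomain")
    if abs(sum(p_sub) - 1.0) > 1e-9:
        raise ValueError("subdomain probabilities must sum to 1")

    # Upper bound: in each subdomain both versions cannot fail together
    # more often than the better of the two fails.
    upper = sum(p * min(a, b) for p, a, b in zip(p_sub, pfd_a, pfd_b))
    # "Likely" lower bound: failures assumed independent within a subdomain.
    lower = sum(p * a * b for p, a, b in zip(p_sub, pfd_a, pfd_b))
    return lower, upper


# Hypothetical three-subdomain profile (not data from the study).
p_sub = [0.5, 0.3, 0.2]
pfd_a = [0.01, 0.02, 0.10]
pfd_b = [0.02, 0.01, 0.05]
lower, upper = ps_bounds(p_sub, pfd_a, pfd_b)
print(f"likely lower bound = {lower:.5f}, upper bound = {upper:.5f}")
```

With these illustrative inputs the two bounds bracket the joint pfd between roughly 0.001 and 0.018; how tight the bracket is depends entirely on how the demand space is partitioned.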

  11. Experimental setup • Mutants are treated as program versions in our experiment • The 1200 test cases are divided into seven categories by system status • The first 800 test cases (manually designed for functionality testing) are used as the qualification test and the other 400 test cases (randomly generated) as the operational test

  12. Information on subdomains • Failure data and demand profiles (real and hypothetical) on subdomains [Diagram: programs that passed the qualification test, faults in the operational test, and analysis of upper and lower bounds]

  13. Estimation Method • Since no failure was observed in some subdomains, we adopt the confidence-bounds method rather than point estimates in our experiment • One-sided confidence bounds (Bayesian bounds) are computed for the probabilities of failure • 90% confidence upper bounds as well as lower bounds on the pfds of mutants in subdomains were estimated under all demand profiles
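
The slides do not state which prior was used for the Bayesian bounds. As a minimal sketch, assuming a uniform prior on the pfd, observing k failures in n demands gives a Beta(k+1, n-k+1) posterior, and a one-sided 90% bound is a quantile of that posterior. The helper names and the demand counts below are hypothetical, and the code uses only the standard library.

```python
from math import comb


def posterior_cdf(p: float, k: int, n: int) -> float:
    """CDF at p of Beta(k+1, n-k+1), the posterior under a uniform prior.

    For integer parameters the Beta CDF equals a binomial tail:
    P(Beta(k+1, n-k+1) <= p) = P(Binomial(n+1, p) >= k+1).
    """
    m = n + 1
    head = sum(comb(m, j) * p**j * (1.0 - p) ** (m - j) for j in range(k + 1))
    return 1.0 - head


def bayes_bound(k: int, n: int, q: float) -> float:
    """q-quantile of the posterior, found by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if posterior_cdf(mid, k, n) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical subdomain: 60 operational demands, no failure observed.
upper_90 = bayes_bound(k=0, n=60, q=0.90)  # pfd <= upper_90 with 90% belief
lower_90 = bayes_bound(k=0, n=60, q=0.10)  # pfd >= lower_90 with 90% belief
print(f"90% upper bound = {upper_90:.4f}, 90% lower bound = {lower_90:.4f}")
```

The point of the bound is visible in the zero-failure case: a point estimate would be exactly 0, whereas the upper bound stays positive and shrinks only as more demands are run in the subdomain.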

  14. Bayesian Bounds under DP4 • 90% confidence upper bounds on pfds in subdomains • 90% confidence lower bounds on pfds in subdomains

  15. Upper bounds • Upper bounds on the joint pfds under all demand profiles

  16. Lower bounds • “Likely” lower bounds on the joint pfds under all demand profiles

  17. Analysis on upper/lower bounds

  18. Discussion • With our data, the confidence bounds in the PS model are tighter than PA*PB and min(PA, PB) under most circumstances, except when: • One program performs worse than the other in all subdomains • Negative covariance holds between the failure probabilities of the two programs • Difficulties and limitations of the PS model • Choosing how to divide the demand space into disjoint subdomains • Requiring thorough knowledge of the subdomain probabilities and of the performance of all versions in each subdomain
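
The two exceptions listed above can be illustrated numerically. This is a sketch under the same assumed reading of the PS bounds (per-subdomain minimum for the upper bound, per-subdomain product for the "likely" lower bound); the function name `compare` and the two-subdomain profiles are hypothetical, not data from the study.

```python
from typing import Dict, Sequence


def compare(
    p_sub: Sequence[float],
    pfd_a: Sequence[float],
    pfd_b: Sequence[float],
) -> Dict[str, float]:
    """Compute the PS bounds next to the subdomain-free estimates."""
    pa = sum(p * a for p, a in zip(p_sub, pfd_a))  # marginal pfd of A
    pb = sum(p * b for p, b in zip(p_sub, pfd_b))  # marginal pfd of B
    return {
        "PA*PB": pa * pb,
        "min(PA,PB)": min(pa, pb),
        "ps_lower": sum(p * a * b for p, a, b in zip(p_sub, pfd_a, pfd_b)),
        "ps_upper": sum(p * min(a, b) for p, a, b in zip(p_sub, pfd_a, pfd_b)),
    }


# Case 1: A is worse than B in every subdomain.
# The PS upper bound collapses to min(PA, PB), so subdomains add nothing.
worse = compare([0.5, 0.5], [0.02, 0.04], [0.01, 0.02])

# Case 2: pfds are negatively correlated across subdomains.
# The PS lower bound drops below the independence estimate PA*PB.
negative = compare([0.5, 0.5], [0.01, 0.05], [0.05, 0.01])

for name, result in (("A always worse", worse), ("negative covariance", negative)):
    print(name, {key: round(value, 6) for key, value in result.items()})
```

The gap between `ps_lower` and `PA*PB` is exactly the covariance of the two versions' pfds across subdomains, which is why its sign decides which estimate is larger.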

  19. DL Model • Proposed by Dugan and Lyu (1995) • 3-level reliability model • A Markov model detailing the system structure • Two fault trees representing the causes of failures in the initial configuration and the reconfigured state • Assumptions • Unrelated faults: different erroneous results • Related faults: similar erroneous results

  20. DL Model • Example: Reliability model of DRB

  21. DL Model (cont’) • Fault tree models for 2-, 3-, and 4-version systems

  22. Results of DL model with our project data • The new experimental data are applied to verify the effectiveness and consistency of the DL model • Six mutants with various failure characteristics are employed in the operational test

  23. Results of DL model with our project data • Failure characteristics for 2,3,4-version configurations

  24. Results of DL model with our project data • Summary of parameter values (table not reproduced in this transcript): probability of unrelated faults; probability of related faults between two versions; probability of related faults in all versions
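
To show how the three parameter types above could combine, here is a deliberately simplified fault-tree calculation for a three-version majority-voting system. It is not the DL model itself: it ignores the Markov layer, decider and hardware failures, and assumes all basic events are independent. The function name and the parameter values are hypothetical.

```python
def nvp3_failure_prob(p_v: float, p_rv: float, p_rall: float) -> float:
    """Probability that a 3-version majority vote yields a wrong result.

    p_v    : probability of an unrelated fault in a single version
    p_rv   : probability of a related fault shared by one pair of versions
    p_rall : probability of a related fault shared by all versions
    """
    # Unrelated faults defeat the vote only if at least 2 of 3 versions fail.
    two_or_more_unrelated = 3 * p_v**2 * (1 - p_v) + p_v**3
    # A related fault in any of the 3 possible pairs also defeats the vote.
    any_pair_related = 1 - (1 - p_rv) ** 3
    # OR gate over independent events: 1 - product of complements.
    return 1 - (
        (1 - two_or_more_unrelated) * (1 - any_pair_related) * (1 - p_rall)
    )


# Hypothetical parameter values, chosen only to show relative magnitudes.
print(nvp3_failure_prob(p_v=0.01, p_rv=0.001, p_rall=0.0001))
```

Even in this toy form, the related-fault terms dominate once p_v is small, which matches the general motivation for tracking related and unrelated faults separately.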

  25. Results of DL model with our project data • Predicted reliability by different configurations

  26. Results of DL model with our project data • Predicted safety by different configurations

  27. Discussion • Comparing our project with the former project, the reliability and safety performance of DRB, NVP and NSCP shows that the DL model is consistent with our experimental data • The discrepancy in the first thousands of hours may indicate dependence on operational domains • The simplified classification of related and unrelated faults needs to be improved by including real-life scenarios • To achieve more accurate results, information about the correlation between successive executions should be included

  28. Comparison of PS & DL Model

  29. Conclusion • Mutants are employed to investigate the prediction performance of two reliability models • Advantages, limitations and performance of the PS and DL models are compared • With our data, the confidence bounds in the PS model are tighter than PA*PB and min(PA, PB) under most circumstances

  30. Conclusion • With our data, the PS approach helps to analyze the behavior of the versions in each subdomain and reveals features of fault correlation among diverse programs • Our analyses with the DL model of the reliability and safety features of DRB, NVP and NSCP are consistent with the original experiment, although there are crossovers in the first thousands of hours in the reliability curves

  31. Future work • More test cases should be employed to cross-validate the prediction accuracy of the PS and DL models • Other existing reliability models can be applied to our experimental data for further comparison

  32. Q & A Thank you! Dept. of Computer Science & Engineering
