
SENG 521 Software Reliability & Testing


mika

Presentation Transcript


  1. SENG 521 Software Reliability & Testing: Defining Necessary Reliability (Part 3)

  2. Contents • Steps in defining necessary reliability • Failure severity class (FSC) • Failure intensity objective (FIO) • Strategies to meet FIO • System reliability • Reliability economics

  3. SRE: Process /1 • 5 steps in SRE process: • Define necessary reliability • Develop operational profiles • Prepare for test • Execute test • Apply failure data to guide decisions

  4. Part 3 Section 1 How to define Necessary Reliability?

  5. Necessary Reliability: How to • Define failure with “failure severity classes (FSC)” for the product. • Set a “failure intensity objective (FIO)” for each system to be tested. • Choose a common scale for all associated systems. • Find the developed software failure intensity objective. • Engineer strategies to meet the software failure intensity objective.

  6. 1. Failure Severity Classes • Failures usually differ in their impact on the system • A failure severity class (FSC) is a set of failures that have the same per-failure impact on users, according to a failure classification criterion • Common classification criteria: • cost, system capability, human life, environment • The severity of a failure is distinct from its complexity • Severity can change with the time at which the failure occurs

  7. FSC: Common Classification • Common classification criteria: Cost • What does this failure cost in terms of operational cost, repair cost, loss of business, disruption, etc. • Severity classes based on cost may be scaled by a factor of 10. • Usually 4 ranges are enough.
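As a minimal sketch of cost-based classes separated by a factor of 10, the function below assigns a severity class from the estimated cost of a single failure. The boundary values (100,000 down to 1,000 currency units), the four-class default, and the function name are illustrative assumptions, not values taken from the slides.

```python
def severity_class_by_cost(cost_per_failure: float,
                           top_threshold: float = 100_000.0,
                           num_classes: int = 4) -> int:
    """Return a severity class (1 = most severe) for a per-failure cost.

    Class boundaries step down by a factor of 10 from top_threshold.
    The default boundaries are hypothetical and should be set per project.
    """
    if cost_per_failure < 0:
        raise ValueError("cost_per_failure must be non-negative")
    threshold = top_threshold
    for severity in range(1, num_classes):
        if cost_per_failure >= threshold:
            return severity
        threshold /= 10
    # Anything below the lowest boundary falls into the least severe class.
    return num_classes


for cost in (250_000, 50_000, 5_000, 500):
    print(cost, "->", severity_class_by_cost(cost))
```

With the default boundaries, the four sample costs fall into classes 1 through 4 respectively.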

  8. FSC: Common Classification • Common classification criteria: System capability (Services) • May include factors such as loss of data, downtime, recoverability, etc.

  9. FSC: Common Classification • Common classification criteria: Environment • May include factors such as harm to the environment, loss of wildlife, etc. • Applicable to the nuclear and chemical industries, etc.

  10. FSC: Common Classification • Common classification criteria: Human life • May include factors such as harm to humans, loss of human life, etc. • Applicable to the aeronautical, automotive, nuclear, and health care industries, military systems, etc.

  11. How to Define FSC? • Experience based: ask users/ stakeholders/ developers/ compare to similar products • List all factors that may be considered as failure severity for the project • Narrow the list down to the most critical and/or measurable ones • Some factors may be hard to measure, such as impact on company reputation, etc.

  12. FSC: Conflicting Concerns • Conflicting viewpoints (concerns) between the software developer and the customer regarding failure severity classes (FSC) should be resolved before proceeding to set the failure intensity objective • Comparing the FSC for the software with those of a similar product is usually useful

  13. Documenting FSC Define classes for each criterion separately

  14. 2. Failure Intensity Objective (FIO) • The failure intensity objective (FIO) reflects an estimate of the “bugs” allowed to remain in the product at release time. • FIO is an alternative way of expressing reliability.

  15. Failure Intensity Objective • Failure intensity is usually given as the number of failures per unit of time (or per some other defined unit), e.g., • 3 alarms per 100 hours of operation. • 5 failures per 1000 print jobs, etc. • The failure intensity of a system is the sum of the failure intensities of all of its components (assuming the exponential model).

  16. How to Set FIO /1 • Mainly experience based and depends on the project. • Depends on the trade-offs among quality characteristics, development time and cost, functionality, and technology. • Rule of thumb: Estimate the project’s total cost (C), e.g., using COCOMO’s Early Design Model, and set FIO to 1/C, i.e., one failure per C units of operation, assuming that the cost of the highest-impact failure equals the total development cost.

  17. How to Set FIO /2 • Typical FIO for various projects

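  18. How to Set FIO: Reliability • Setting FIO in terms of reliability: $\lambda = -\ln R / t$, which is approximately $(1 - R)/t$ for $R$ close to 1, where $\lambda$ is failure intensity, $R$ is reliability, and $t$ is the number of natural units (time, etc.) • For a reliability of about 0.992 over 8 hours of operation, $\lambda$ is set to 0.001 failures per hour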
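The conversion between reliability and failure intensity can be sketched as follows, assuming the exponential model $R = e^{-\lambda t}$. The function names are illustrative, not from the slides.

```python
import math


def failure_intensity_from_reliability(reliability: float, duration: float) -> float:
    """Return lambda such that exp(-lambda * duration) equals reliability."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    if duration <= 0.0:
        raise ValueError("duration must be positive")
    return -math.log(reliability) / duration


def reliability_from_failure_intensity(failure_intensity: float, duration: float) -> float:
    """Inverse conversion: reliability over the given duration."""
    return math.exp(-failure_intensity * duration)


# Slide example: reliability 0.992 over 8 hours of operation.
fio = failure_intensity_from_reliability(0.992, 8.0)
print(f"{fio:.4f} failures per hour")  # about 0.0010
```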

  19. Reliability & Failure Intensity

  20. How to Set FIO: Availability • Setting FIO in terms of system availability (A) for the exponential model: $A = 1 / (1 + \lambda t_m)$, so $\lambda = (1 - A) / (A \, t_m)$, where $\lambda$ is failure intensity and $t_m$ is downtime per failure • e.g., if a product must be available 99% of the time and downtime is 6 min per failure, then the FIO is about 1 failure per 10 hours.
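A small sketch of the availability-based calculation, assuming the relation $\lambda = (1 - A) / (A \, t_m)$ with $t_m$ as downtime per failure. The function name and the choice of hours as the time unit are assumptions made for illustration.

```python
def failure_intensity_from_availability(availability: float,
                                        downtime_per_failure: float) -> float:
    """Return the failure intensity implied by a target availability.

    downtime_per_failure sets the time unit of the result
    (hours here, so the result is failures per hour).
    """
    if not 0.0 < availability < 1.0:
        raise ValueError("availability must be strictly between 0 and 1")
    if downtime_per_failure <= 0.0:
        raise ValueError("downtime_per_failure must be positive")
    return (1.0 - availability) / (availability * downtime_per_failure)


# Slide example: 99% availability, 6 minutes (0.1 hour) of downtime per failure.
fio = failure_intensity_from_availability(0.99, 0.1)
print(f"{fio:.3f} failures per hour")  # about 0.101, i.e. roughly 1 per 10 hours
```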

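  21. How to Set FIO: MTTF • Using MTTF: $\lambda = 1 / \mathrm{MTTF}$, where $\lambda$ is failure intensity, MTTR is mean time to repair, and MTTF is mean time to failure • Another definition of availability: $A = \mathrm{MTTF} / (\mathrm{MTTF} + \mathrm{MTTR})$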
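The MTTF-based relations can be sketched the same way, assuming $\lambda = 1/\mathrm{MTTF}$ and $A = \mathrm{MTTF} / (\mathrm{MTTF} + \mathrm{MTTR})$. The sample values (MTTF of 9.9 hours, MTTR of 0.1 hour) are hypothetical, chosen only to line up with a 99% availability target.

```python
def failure_intensity_from_mttf(mttf: float) -> float:
    """Failure intensity is the reciprocal of mean time to failure."""
    if mttf <= 0.0:
        raise ValueError("mttf must be positive")
    return 1.0 / mttf


def availability_from_mttf(mttf: float, mttr: float) -> float:
    """Fraction of time the system is up, given mean up time and mean repair time."""
    if mttf <= 0.0 or mttr < 0.0:
        raise ValueError("mttf must be positive and mttr non-negative")
    return mttf / (mttf + mttr)


# Hypothetical values: 9.9 hours between failures, 0.1 hour to repair.
print(f"{failure_intensity_from_mttf(9.9):.3f} failures per hour")
print(f"availability = {availability_from_mttf(9.9, 0.1):.2f}")
```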

  22. How to Set FIO: Hazard Rate • Hazard rate z(t): the probability that the component will fail in a given time interval, given that it has not failed prior to the interval • A hazard rate of 0.05 means that there is a 5% chance that the first failure will occur in the specified time interval and not before • For the exponential distribution, z(t) is constant: $z(t) = \lambda$
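The constant hazard rate for the exponential model follows from a short derivation. The failure density symbol $f(t)$ is introduced here for illustration and does not appear in the slides.

```latex
\[
z(t) = \frac{f(t)}{R(t)}, \qquad
R(t) = e^{-\lambda t}, \qquad
f(t) = -\frac{dR(t)}{dt} = \lambda e^{-\lambda t}
\]
\[
\Longrightarrow \quad
z(t) = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda
\]
```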

  23. Reliability vs. Availability • Why specify reliability when availability is better understood and has better intuitive appeal? • Availability has a subjective appeal to the user, and there are usually workarounds that make the system available without increasing its intrinsic reliability. • Example: Using a replica server in case the domain server goes down increases the availability of the system, but it does not necessarily increase the reliability of the server software.

  24. Developed Software Product • The developed software product is usually only a part of the whole system • Parts of the whole system: interfaces to other systems; acquired components; developed components; OS and system software; hardware

  25. 3. Choose a Common Scale • Different parts of a project may express FIO on different scales. • Example: • System failure intensity objective = 30 failures/1,000,000 transactions • MTTF for OS is 3,000 hours for 10 million transactions • Hardware failure intensity is 1 failure per 30 hours of operation • One must define a single common scale for all FIOs

  26. FIO for Developed Product • How to compute the failure intensity objective for the developed software? • Set the FIO for the whole system • Set a common measurement unit for failure intensity for the whole system • Subtract the expected failure intensity of acquired components from the FIO. • Subtract the expected failure intensity of the environment (OS, interface systems) that the developed software will run on • The remainder is the failure intensity objective for the developed software components.

  27. Computing Developed FIO Example 1: • System failure intensity objective = 100 failures/1,000,000 transactions • Failure intensity for hardware = 0.1 failure/hour • OS failure intensity at a load of 100,000 transactions/hour = 0.4 failure/hour • At that load, hardware contributes 1 failure and the OS 4 failures per 1,000,000 transactions • Therefore, developed software FIO = 100 - 1 - 4 = 95 failures/1,000,000 transactions
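Example 1 can be reproduced with a short sketch. It assumes the stated load means 100,000 transactions per hour, which is the reading under which the slide's result of 95 follows. The function names are illustrative.

```python
from typing import Iterable


def per_million_transactions(failures_per_hour: float,
                             transactions_per_hour: float) -> float:
    """Convert a per-hour failure intensity to failures per 1,000,000 transactions."""
    if transactions_per_hour <= 0.0:
        raise ValueError("transactions_per_hour must be positive")
    return failures_per_hour * (1_000_000 / transactions_per_hour)


def developed_software_fio(system_fio: float,
                           other_intensities: Iterable[float]) -> float:
    """Subtract non-developed contributions from the system FIO (common scale)."""
    remaining = system_fio - sum(other_intensities)
    if remaining <= 0.0:
        raise ValueError("system FIO is already consumed by other components")
    return remaining


load = 100_000  # transactions per hour (assumed reading of the slide)
hardware = per_million_transactions(0.1, load)          # 1 per million
operating_system = per_million_transactions(0.4, load)  # 4 per million
developed = developed_software_fio(100.0, [hardware, operating_system])
print(f"{developed:.1f} failures per 1,000,000 transactions")  # 95.0
```

Converting everything to one scale before subtracting is the point of the "common scale" step: per-hour and per-transaction intensities cannot be combined directly.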

  28. Computing Developed FIO Example 2: Database system running on Win 2K • System failure intensity objective = 30 failures/1,000,000 transactions • MTTF for Win 2K is around 3,000 hours for 10 million transactions • Average hardware failure intensity is 1 failure per 30 hours • Failure intensity of other systems is 9 failures per 1,000,000 transactions • What is the FIO for the developed software?

  29. Computing Developed FIO

  30. 4. Strategies to Meet FIO • Engineer strategies to meet the software failure intensity objective for the developed software. • 4 main strategies: • Fault prevention • Fault removal • Fault tolerance • Fault/failure forecasting

  31. Fault Prevention • To avoid fault occurrences by construction. • Activities: • Requirement review • Design review • Clear code • Establishing standards (ISO 9000-3, etc.) • Using CASE tools with built-in check mechanisms • Effectiveness factor: • Proportion of the faults remaining after prevention activities.

  32. Fault Removal • To detect, by verification and validation, the existence of faults and eliminate them. • Activities: • Code review • Test • Effectiveness factor: • Reduction of failure intensity due to code review. • Ratio of failure intensity after test and before test.

  33. Fault Tolerance • To provide, by redundancy, service complying with the specification in spite of fault occurrences. • Activities: • Designing and implementing redundancy • Effectiveness factor: • Reduction of failure intensity as a result of redundant design.

  34. Fault / Failure Forecasting • To estimate, by evaluation, the presence of faults and the occurrences of failures • Activities: • Establishing reliability model • Collecting failure data • Analysis and interpretation of results • Effectiveness factor: • Reduction of failure intensity as a result of applying reliability engineering

  35. Part 3 Section 2 System Reliability

  36. System Reliability /1 • A system usually consists of components. • Each component consists of sub-components. • Components may have • Different reliability • Different dependencies among each other • System reliability is a function of the reliabilities of the (sub-) components and of the relationships between the components.

  37. Serial System Reliability • [Figure: chain of serially connected components, each labeled with its reliability and failure intensity] • The system is composed of n independent, serially connected components. • Failure of any component has a cross-system effect, i.e., results in failure of the whole system. • A serial system never has higher reliability than any of its components (because $R_k \le 1$).

  38. Combining Reliabilities /1 • Serial system reliability can be calculated from component reliabilities, if the components fail independently of each other. • For serial systems: $R_{system} = \prod_{k=1}^{Q_p} R_k$, where $Q_p$ is the number of components and $R_k$ is the reliability of component $k$ • Component reliabilities ($R_k$) must be expressed with respect to a common interval.

  39. Combining Reliabilities /2 • Using the relation between reliability and failure intensity: $R_k = e^{-\lambda_k t}$ • Will lead to: $\lambda_{system} = \sum_{k=1}^{Q_p} \lambda_k$ • i.e., the total failure intensity is the sum of the failure intensities of the components

  40. Example: Serial System • The system is composed of 4 independent serially connected components • R1 = 0.95 • R2 = 0.87 • R3 = 0.82 • R4 = 0.73 • Rsystem = 0.95 × 0.87 × 0.82 × 0.73 = 0.4947 • Serial system reliability is smaller than any individual reliability of the components
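The serial calculation is a plain product of component reliabilities. A minimal sketch, assuming independent components whose reliabilities refer to a common interval (the function name is illustrative):

```python
import math
from typing import Iterable


def serial_reliability(reliabilities: Iterable[float]) -> float:
    """Reliability of independent components connected in series."""
    values = list(reliabilities)
    if any(not 0.0 <= r <= 1.0 for r in values):
        raise ValueError("each reliability must be in [0, 1]")
    return math.prod(values)


components = [0.95, 0.87, 0.82, 0.73]
system = serial_reliability(components)
print(round(system, 4))           # 0.4947
print(system < min(components))   # True: weaker than the weakest component
```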

  41. Parallel System Reliability • The system is composed of n independent components connected in parallel. • The whole system fails only if all components fail (principle of active redundancy).

  42. Example: Parallel System • The system is composed of 4 independent components connected in parallel • R1 = 0.95 • R2 = 0.87 • R3 = 0.82 • R4 = 0.73 • Rsystem = 1 - ((1 - 0.95) × (1 - 0.87) × (1 - 0.82) × (1 - 0.73)) = 1 - 0.000316 ≈ 0.9997 • Parallel system reliability is greater than any individual reliability of the components
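The parallel calculation multiplies the component unreliabilities and subtracts the result from 1. A minimal sketch, assuming independent components with active redundancy (the function name is illustrative):

```python
import math
from typing import Iterable


def parallel_reliability(reliabilities: Iterable[float]) -> float:
    """Reliability of independent components connected in parallel.

    The system fails only if every component fails.
    """
    values = list(reliabilities)
    if any(not 0.0 <= r <= 1.0 for r in values):
        raise ValueError("each reliability must be in [0, 1]")
    return 1.0 - math.prod(1.0 - r for r in values)


components = [0.95, 0.87, 0.82, 0.73]
system = parallel_reliability(components)
print(round(system, 4))           # 0.9997
print(system > max(components))   # True: stronger than the strongest component
```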

  43. Parallel-Series System • [Figure: m × n grid of components R11 ... Rmn; label: path]

  44. Series-Parallel System • [Figure: m × n grid of components R11 ... Rmn; label: subsystem]
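The two grid configurations can be sketched by composing the serial and parallel formulas. This assumes the usual reading of the figure labels: a parallel-series system is a set of serial paths connected in parallel, and a series-parallel system is a chain of parallel subsystems connected in series. The function names and the 2 × 2 sample grid are illustrative.

```python
import math
from typing import Sequence

Grid = Sequence[Sequence[float]]


def series(reliabilities: Sequence[float]) -> float:
    """All components must work."""
    return math.prod(reliabilities)


def parallel(reliabilities: Sequence[float]) -> float:
    """At least one component must work."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


def parallel_series(paths: Grid) -> float:
    """Each inner sequence is one serial path; the paths are in parallel."""
    return parallel([series(path) for path in paths])


def series_parallel(subsystems: Grid) -> float:
    """Each inner sequence is one parallel subsystem; subsystems are in series."""
    return series([parallel(subsystem) for subsystem in subsystems])


grid = [[0.9, 0.9], [0.9, 0.9]]  # hypothetical component reliabilities
print(round(parallel_series(grid), 4))  # 0.9639
print(round(series_parallel(grid), 4))  # 0.9801
```

For the same components, redundancy at the component level (series-parallel) gives a higher reliability here than redundancy at the path level (parallel-series).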

  45. Other Constructs • One-way bridge • Two-way bridge

  46. Active Redundancy • Employs parallel systems. • All components are active at the same time. • Each component is able to meet the functional requirements of the system. • Only one component is required to meet the functional requirements of the system. • Each component satisfies the minimum reliability condition for the system. • System only fails if all components fail.

  47. m-out-of-n System • System has n components. • At least m components need to work correctly for the system to function properly (m ≤ n). • m = n: serial system • m = 1: parallel system • e.g., an airplane with 4 engines can fly with only 2 engines. • Assumption: All components have the same reliability.
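Under the stated assumption that all n components are independent and share the same reliability, the m-out-of-n system reliability is a binomial tail sum. A minimal sketch (the function name and the component reliability of 0.9 are illustrative):

```python
import math


def m_out_of_n_reliability(m: int, n: int, reliability: float) -> float:
    """Probability that at least m of n identical, independent components work."""
    if not 1 <= m <= n:
        raise ValueError("require 1 <= m <= n")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be in [0, 1]")
    return sum(
        math.comb(n, k) * reliability**k * (1.0 - reliability) ** (n - k)
        for k in range(m, n + 1)
    )


# Airplane example: 4 engines, at least 2 must work; 0.9 is a hypothetical value.
print(round(m_out_of_n_reliability(2, 4, 0.9), 4))  # 0.9963
```

Setting m = n reproduces the serial product, and m = 1 reproduces the parallel formula.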

  48. Reliability Block Diagram (RBD) • A Reliability Block Diagram (RBD) is a graphical representation of how the components of a system are connected from a reliability point of view. • The most common configurations of an RBD are the series and parallel configurations. • In a serial configuration, all elements must work for the system to work, and the system fails if any one of the components fails. The overall reliability of a serial system is lower than the reliability of its individual components. • In a parallel configuration, the components are considered redundant, and the system ceases to work only if all the parallel components fail. The overall reliability of a parallel system is higher than the reliability of its individual components. • A system is usually composed of combinations of serial and parallel configurations. • RBD analysis is essential for determining the reliability, availability, and downtime of the system.

  49. RBD: Example /1
