Software Engineering 533: Software Metrics and Economics


Presentation Transcript


  1. Software Engineering 533: Software Metrics and Economics Linda M. Laird Reliability Stevens Institute of Technology © 2006 Linda Laird All Rights Reserved

  2. A question to consider • You have a boss who is extremely committed to quality. He tells you, “I want 99.99% reliability for the first 6 months in operation.” You are still in the architecture phase of your project. What do you say and do?

  3. Software Reliability • Software Reliability is a probability that the software system will function without failure under a given environment and during a specified period of time.

  4. Looking deeper into the definition • Consider the word probability. • We are not speaking about absolutes. We are speaking about probabilistic events and predictions. The system may actually have higher or lower reliability in operation, and the reliability prediction can still be correct. • The next interesting phrase is without failure. • This phrase indicates that you need to have a good definition of what a failure is and what it is not. For the telephone network, dropping a call was not considered a failure, crashing the network was. • Under a given environment is extremely important. • A system which was designed and tested on Windows XP may have significantly different behavior on Windows NT. Or a system which was built and proved extremely effective and safe for controlling one type of Anti-Ballistic Missile may not be effective and safe for the next generation of ABM. • During a specified period of time • The requirement for a highly critical piece of software to have a reliability of 99.999% for two seconds versus two years is entirely different.

  5. Reliability Agenda • Faults and Failures • Cost of Reliability • Failure, Severity Classes, and Failure Intensity Definition • Reliability Theory • Reliability Models • Operational Profiles • When do I ship? • System Configurations: Probability and Reliability

  6. Defects and Reliability • People make errors. • A fault leads to a failure. • Defect metrics measure faults. • Reliability metrics measure failures. • These are key concepts – REMEMBER THEM – especially the difference between faults and failures.

  7. Faults vs. Failures • If the code that contains the faults is never executed in operation, then the system never fails, and the MTBF -> infinity. • Conversely, if there is only 1 fault in the entire system, and it is executed every time, then the system has no reliability, and the MTBF -> 0. • Failures only occur in operation. • Faults are defects in the system that may or may not ever be seen in operation.

  8. Cost of Reliability – No free lunch • Reliability is not free. • Each percentage of improvement is costly. • The exact relationship between cost and reliability is not completely understood. • This is in part due to the difficulty of isolating the variables of cost and reliability in any kind of repeatable study. • Multiple Cost of Reliability Models/Theories • All are consistent in that cost increases as the reliability requirement increases. • Will look at 3 different models

  9. Reliability Cost Models – COCOMO-II • RELY – one of the COCOMO-II cost drivers • Measure of the extent to which the software should not fail over a period of time • Ranges from Very Low at 0.82 to Very High at 1.26, with 1.00 being “nominal”; the full range therefore changes the cost equation by a factor of about 1.54 (1.26 / 0.82). • Nominal means “moderate, easily recoverable losses.” • Example: • If a project were to cost $1M where the impact of a failure was “moderate, easily recoverable,” and you then realized that the impact of a failure was really “risk to human life,” the same project would be projected to cost $1.26M in order to improve the reliability to an acceptable level.
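The RELY arithmetic on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary holds just the three multipliers quoted on the slide, the names `RELY` and `adjusted_cost` are hypothetical, and every other COCOMO-II cost driver is assumed to stay fixed.

```python
# RELY effort multipliers as quoted on the slide (other rating levels omitted).
RELY = {"very_low": 0.82, "nominal": 1.00, "very_high": 1.26}


def adjusted_cost(nominal_cost: float, rating: str) -> float:
    """Scale a cost estimated at nominal reliability by the RELY multiplier.

    Assumes all other cost drivers are unchanged, so cost scales linearly
    with the multiplier.
    """
    return nominal_cost * RELY[rating]


def rely_spread() -> float:
    """Ratio between the most and least demanding ratings on the slide."""
    return RELY["very_high"] / RELY["very_low"]
```

For the slide's example, `adjusted_cost(1_000_000, "very_high")` gives the $1.26M figure, and `rely_spread()` gives the factor of roughly 1.54.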

  10. “Availability Index” for Systems • Marcus and Stern Availability Index, which shows a logarithmic relationship between system availability and cost. • Mathematical relationship between Availability and Reliability: • $R = e^{\left(\frac{A-1}{A \cdot MTTR}\right) t}$, where A is availability and MTTR is Mean Time to Repair. • Can transform the Marcus and Stern Index into a Reliability versus Investment chart, which has a logarithmic relationship as well. • Note that this transformation is for systems, and includes hardware as well as software. Many experienced software practitioners feel this curve is valid for software alone. • Source: Marcus, Evan and Stern, Hal, Blueprints for High Availability, pp. 5-7
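A short sketch of where the availability-to-reliability formula comes from. It assumes the usual steady-state definition A = MTTF / (MTTF + MTTR) and a constant failure rate (exponential model); neither assumption is stated on the slide, and the function names are hypothetical.

```python
import math


def failure_rate_from_availability(availability: float, mttr: float) -> float:
    """Failure rate lambda = 1/MTTF implied by A = MTTF / (MTTF + MTTR).

    Solving for MTTF gives MTTF = A * MTTR / (1 - A), so
    lambda = (1 - A) / (A * MTTR). Units follow those of mttr.
    """
    return (1.0 - availability) / (availability * mttr)


def reliability_from_availability(availability: float, mttr: float, t: float) -> float:
    """R(t) = exp(((A - 1) / (A * MTTR)) * t), i.e. exp(-lambda * t)."""
    return math.exp(-failure_rate_from_availability(availability, mttr) * t)
```

With A = 0.99 and an MTTR of 1 hour, the implied MTTF is 99 hours, so the reliability over a 99-hour interval is e^-1, about 0.37.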

  11. Cost of Reliability Index

  12. Sha’s Reliability Model: $R = e^{-kCt/E}$ • Relates Effort and Reliability • k is a scaling constant • C is complexity • E is the additional effort spent to improve reliability • t is the operating time • This equation gives the same logarithmic relationship between Investment and Reliability, that is, $E = -kCt / \ln R$. • Source: Bernstein, Larry. “Software Fault Tolerance” from Advances in Computers, Volume 58 (edited by Marvin V. Zelkowitz), Academic Press, 2003.

  13. Using Sha’s model: $R = e^{-kCt/E}$ • If you increase the effort spent on reliability by 10%, what does it do to the reliability?

  14. Using Sha’s model: $R = e^{-kCt/E}$ • If your MTTF is 100 days, and you increase the effort from 5 staff years to 6 staff years, what does it do to the MTTF? • $\lambda_{old} = 1/100 = 0.01$ per day; since $\lambda = kC/E$, $kC/5 = 0.01$, so $kC = 0.05$ • $R_{new} = e^{-0.05t/6} = e^{-0.0083t}$ • $\lambda_{new} = 0.0083$ per day, so $MTTF_{new} = 1/\lambda_{new} = 120$ days • More generally, if you increase the effort spent on reliability by x%, the MTTF increases by x%!
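The worked example can be checked numerically. The sketch below treats the product kC as a single calibration constant, as the slide does; the function names are hypothetical and the units (days, staff years) are the ones used in the example.

```python
import math


def calibrate_kc(mttf: float, effort: float) -> float:
    """Solve lambda = kC / E for kC, given an observed MTTF = 1 / lambda."""
    return effort / mttf


def sha_reliability(kc: float, effort: float, t: float) -> float:
    """Sha's model: R = exp(-kC * t / E)."""
    return math.exp(-kc * t / effort)


def sha_mttf(kc: float, effort: float) -> float:
    """MTTF implied by the model: 1 / lambda = E / kC (linear in E)."""
    return effort / kc


kc = calibrate_kc(mttf=100.0, effort=5.0)   # kC = 0.05, as on the slide
new_mttf = sha_mttf(kc, effort=6.0)         # 20% more effort -> 20% more MTTF
```

Because the MTTF is E / kC, it is linear in effort, which is why an x% increase in effort produces an x% increase in MTTF.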

  15. Cost of Reliability Engineering Rules • There seems to be a logarithmic relationship between cost and software reliability, as shown in the previous chart. • Recommend using that relationship and/or the COCOMO factors, and tuning them for your environment. • To get started: cost vs. “nominal cost,” based upon the COCOMO parameters • Use 25% more effort for a highly reliable system (NOT ultra-reliable) • Use 20% less for “don’t care” reliability software

  16. A Few Reliability Concepts • Failures and Faults – Faults are in the code; failures occur during operation • Failure Intensity – Failures per unit of operation (e.g., time) • Failure Severity Class – Set of failures which have the same impact on the user

  17. Defining Failures, Severity Levels & Failure Objectives

  18. Defining Failures • Based upon System • Could be • Crashes or Hangs • Transaction aborts • Storing Invalid Data • Security Breach • ? • Need to decide what is a failure for your system

  19. Failure Examples • Telephone Network • Network unavailable? • Dropped Call? • Connection to incorrect number? • ? • Amazon • Site unavailable? • Cannot take orders? • Search function not working? • Personal recommendations function not working? • Space Shuttle • Explosion? • Unable to perform some parts of mission? • Notice the difference between these and defects: they are concerned with the operational impact, rather than the defect(s) which cause them.

  20. Failure Severity Classes • Need to understand failures and failure severity classes (e.g., equivalence classes) • Allows prioritization of repair and restore activities: different levels of failures will have different processes and/or performance requirements • Provides a finer granularity for defining reliability

  21. Failure Severity Classes Examples: • Telephone Network: • Network Down • Part of Network Down • One Switching Office Down • One Call Dropped • Amazon.com: • Entire Site Down or Billing Incorrect • Some “Stores” Unavailable • Transaction Crashes • Challenger: • Mission Aborts • Mission At Risk • Some functions of mission compromised • Inconsequential

  22. Failure Severity Class Definition • Severity classes and meanings will vary with your product • Can be based upon cost impact, system capability impact, user impact… • Typically have ~4 to 5 classes • Generic severity levels (from Musa), cost based: • Severity 1: > $100,000 impact • Severity 2: $10k to $100k • Severity 3: $1k to $10k • Severity 4: < $1k • Can be used as a starting point, or when you don’t know of anything better to use. • When you have multiple components within your product, use the same failure classes for all components.
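As a small illustration, the cost-based levels above can be written as a lookup. The function name is hypothetical, and the slide does not say which class an impact of exactly $1k, $10k, or $100k belongs to; here a boundary value is assigned to the less severe class, which is an assumption.

```python
def musa_cost_severity(cost_impact_usd: float) -> int:
    """Map the dollar impact of a failure to a generic cost-based severity class.

    Thresholds are the ones listed on the slide; boundary values fall into
    the less severe class (an assumption, not stated in the source).
    """
    if cost_impact_usd > 100_000:
        return 1
    if cost_impact_usd > 10_000:
        return 2
    if cost_impact_usd > 1_000:
        return 3
    return 4
```

For example, a failure costing $25,000 would be Severity 2, and one costing $500 would be Severity 4.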

  23. Failures need a unit of execution • Reliability needs to be specified over some unit of execution, e.g., failures per: • Execution Time • Number of Transactions • Number of Missions • Elapsed Time • Frequently use execution time (5 failures per hour), but other units may be better • Use the unit that makes most sense for your application

  24. Failure Execution Unit Examples: • Telephone Network: Possible Units: Calls Processed, Time • Network Down • Part of Network Down • One Switching Office Down • One Call Dropped • Amazon.com: Possible Units: Transactions Processed, Time • Entire Site Down or Billing Incorrect • Some “Stores” Unavailable • Transaction Crashes • Challenger: Possible Units: Missions, Flight Time • Mission Aborts • Mission At Risk • Some functions of mission compromised • Inconsequential

  25. Failure Intensity Objectives • Need reliability objectives for your product • Example: • Amazon.com: • Entire Site Down – Objective: Once per year • Some “Stores” Unavailable – Objective: One store unavailable per week • Transaction Crashes – 1 crash per 2 Million transactions

  26. Failure intensity objectives • Set the overall product objective based on analysis of the value/specific user need of the individual components • Musa has some guidelines • Use when there is nothing better to use • Based upon acceptable risk concept • Other possibilities • Value/cost to business • Continuous improvement • …

  27. Failure Intensity Engineering Rules (failures per hour) • Ultrahigh reliability: < 10^-4 • High reliability: 10^-4 -> 10^-2 • Commercial: 10^-2 -> 2 • Prototype: > 2 • Source: Musa 1998

  28. Example Using Engineering Rules • Amazon example: what unit should time be, calendar or execution? • If we use the commercial guideline, the failure intensity objective (FIO) is 10^-2 to 2 failures per hour, which translates into one failure every ½ hour to 100 hours. • This is hard to make sense of in this case, since a single objective does not distinguish between the different severity classes.
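The translation between a failure intensity and "once every N hours" is just a reciprocal. The sketch below also encodes the engineering-rule bands from the preceding slide; it assumes failure intensity is measured in failures per hour, and the function names and the handling of values exactly on a band edge are illustrative choices.

```python
def hours_between_failures(failures_per_hour: float) -> float:
    """Mean time between failures implied by a constant failure intensity."""
    return 1.0 / failures_per_hour


def classify_failure_intensity(failures_per_hour: float) -> str:
    """Place a failure intensity into one of the rule-of-thumb bands.

    Band edges follow the engineering rules attributed to Musa (1998);
    which band owns an exact edge value is an assumption.
    """
    if failures_per_hour < 1e-4:
        return "ultrahigh reliability"
    if failures_per_hour < 1e-2:
        return "high reliability"
    if failures_per_hour <= 2:
        return "commercial"
    return "prototype"
```

The two ends of the commercial band, 2 and 10^-2 failures per hour, correspond to one failure every ½ hour and every 100 hours respectively.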

  29. Using the Engineering Rules • As you can see, very rough – but can get you started • Helps to put a value on acceptable risk for human life • Best is if you understand the costs of the failures, and can balance that against the cost of the reliability.

  30. Introduction to Reliability Theory

  31. Reliability – The Problem • The basic problem is to predict when a system will fail, be it an America’s Cup spinnaker, a probe to Mars, a network, or Google.

  32. Hardware Reliability Components wear out – due to corrosion, shock, overheating etc… • Physical in nature • Probabilistic

  33. Software Reliability: • Is similar: we can use the same basic approach, even though we don’t have the same physical issues. • We have probabilities of failure • They vary over time • Can create a graph of them and a model • For each model, there is a probability density function, f(t), which describes how likely the system is to fail at time t.

  34. One Software Reliability Model [Figure: f(t) versus t, constant at 0.1 up to t = 10] • Suppose we have a software component that we know will fail at least once in 10 days, and may fail any time up to that point with equal probability. • Then f(t) = 1/10 for t = 1 to 10, and f(t) = 0 for t > 10.

  35. Uniform Distribution • In the above case, f(t) has a uniform distribution, i.e., the probability that it will fail is the same on any day (until day 10) • A uniform distribution describes some situations well, but has some major limitations, in that • We may not know the endpoint • The probability may vary over time

  36. Random distribution [Figure: f(t) versus t] • Another model uses random distributions: failures are apt to occur randomly over time, i.e., independent of the past. • Then $f(t) = \lambda e^{-\lambda t}$, where $\lambda$ is the “instantaneous failure rate” (also called the failure intensity, and equal to 1/MTTF) • Notes: 1) It never reaches 0. 2) It is also called the exponential distribution.

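  37. Probability of failure during an interval • Probability of failure between time $t_1$ and $t_2$ = $\int_{t_1}^{t_2} f(t)\,dt$ • For our example of the uniform distribution: $\int_{t_1}^{t_2} 0.1\,dt = 0.1\,(t_2 - t_1)$, for $0 \le t_1 \le t_2 \le 10$ • For the exponential distribution: $\int_{t_1}^{t_2} \lambda e^{-\lambda t}\,dt = e^{-\lambda t_1} - e^{-\lambda t_2}$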

  38. Examples - Interval • Probability of failure between time 5 and time 6 for our uniform distribution = 0.1 × (6 − 5) = 0.1, i.e., 10% • Probability of failure between time 5 and time 6 for our exponential distribution = $e^{-5\lambda} - e^{-6\lambda}$; for $\lambda = 1$ this is about 0.004 (0.4%), and for $\lambda = 0.1$ it is about 0.058 (5.8%)
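These interval probabilities are easy to reproduce. The sketch below uses the closed-form integrals for the two distributions discussed so far; the function names and the `end` parameter are hypothetical, with `end=10` matching the 10-day uniform example.

```python
import math


def uniform_interval_prob(t1: float, t2: float, end: float = 10.0) -> float:
    """P(failure in [t1, t2]) when f(t) = 1/end on [0, end] and 0 elsewhere."""
    lo = max(0.0, t1)
    hi = min(end, t2)
    return max(0.0, hi - lo) / end


def exponential_interval_prob(t1: float, t2: float, lam: float) -> float:
    """P(failure in [t1, t2]) when f(t) = lam * exp(-lam * t)."""
    return math.exp(-lam * t1) - math.exp(-lam * t2)


p_uniform = uniform_interval_prob(5, 6)
p_exp_fast = exponential_interval_prob(5, 6, lam=1.0)
p_exp_slow = exponential_interval_prob(5, 6, lam=0.1)
```

With a high failure rate (λ = 1) almost all of the probability mass lies before t = 5, which is why the interval probability is so much smaller than for λ = 0.1.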

  39. Cumulative Distribution Function: F(t) • The Cumulative Distribution Function (CDF) is the probability that the system will fail by time t: $F(t) = \int_0^t f(x)\,dx$ • For our uniform distribution: $F(t) = 0.1\,t$, for $0 \le t \le 10$ • For the exponential distribution: $F(t) = 1 - e^{-\lambda t}$

  40. Reliability Function – R(t) • The reliability function is the probability that the system has NOT failed by time t: $R(t) = 1 - F(t)$ • For the random distribution (i.e., the exponential case): • $R(t) = 1 - (1 - e^{-\lambda t}) = e^{-\lambda t}$
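A minimal sketch of F(t) and R(t) for the two distributions covered so far, useful for tabulating the curves. The function names are hypothetical; the uniform case assumes the 10-day endpoint from the earlier example.

```python
import math


def uniform_cdf(t: float, end: float = 10.0) -> float:
    """F(t) for failure times uniform on [0, end]: t/end, clamped to [0, 1]."""
    return min(max(t, 0.0), end) / end


def exponential_cdf(t: float, lam: float) -> float:
    """F(t) = 1 - exp(-lam * t) for t >= 0."""
    return 1.0 - math.exp(-lam * t) if t > 0 else 0.0


def reliability(cdf_value: float) -> float:
    """R(t) = 1 - F(t): probability of no failure by time t."""
    return 1.0 - cdf_value


# Tabulate both reliability curves at a few time points (lambda = 0.1).
table = [
    (t, reliability(uniform_cdf(t)), reliability(exponential_cdf(t, 0.1)))
    for t in (0, 2, 5, 10)
]
```

At t = 10 the uniform model has R = 0 (failure is certain by day 10), while the exponential model with λ = 0.1 still has R = e^-1, roughly 0.37; the exponential curve approaches zero but never reaches it.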

  41. Graphically – Uniform Distribution: F(t), R(t)

  42. Graphically – Exponential Distribution: F(t), R(t), with $\lambda = 0.1$

  43. Summary • There are 3 primary functions in reliability theory • f(t) = Probability density function of failures • F(t) = Probability of failure by time t • R(t) = Probability of no failure by time t • They are related by: • $F(t) = \int_0^t f(x)\,dx$ • $R(t) = 1 - F(t)$ • And f(t) describes the distribution of the failure arrivals; it can be a uniform distribution, normal, or other distributions

  44. Questions?

  45. Reliability Models • Purpose • Static vs. Dynamic Models • 3 Dynamic Arrival Distributions • Summary

  46. Reliability Models • Used to estimate: • the number of latent defects when shipped or • a product’s reliability • Why? • Objective statement of quality of product • Number of reasons – • When to ship • Meeting Reliability requirement • Resource planning for maintenance

  47. Model Types • Two categories: static and dynamic • Static models use attributes of the program to estimate the number of defects • Usually work better at the module level, to indicate to engineers where to focus • Typically of the form y = f(a, b, c, d, e, …), where y is the defect rate or number of defects, and a, b, c, … are attributes of the product, process, or project (e.g., COQUALMO) • Dynamic models are usually based on statistical distributions • Work better “in the large” on projects, when you need to estimate when/if the project will fail • Two types • Those that model the entire development process -> Rayleigh distributions • Those that model the testing/deployment process -> exponential models

  48. Reliability Distributions • Many proposed fault distribution models – get very complicated very quickly • The 4 that we will be concerned with are: • Uniform • Exponential (aka, Random) • Rayleigh • S Curves – Take into account “find/fix” times for faults • Will examine the last 3 in more detail

  49. Rayleigh Distributions • Definition • What do they look like • Predicting defects • Recommendations

  50. Rayleigh Curves • One distribution of arrival rates for failures • Distribution used extensively in hardware reliability and in many texts on defects (see previous lecture) • One of the curves that has been shown to match experiential data well; others are exponential and S curves • In the family of Weibull curves, which have the form: • $F(t) = 1 - e^{-(t/c)^m}$ • $f(t) = \frac{m}{t} \left(\frac{t}{c}\right)^m e^{-(t/c)^m}$ • For m = 1 -> Exponential distribution • For m = 2 -> Rayleigh distribution
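The Weibull family above can be explored with a short sketch. The function names are hypothetical; `c` is the scale parameter and `m` the shape parameter, as on the slide. The location of the Rayleigh peak, t = c/√2, follows from setting the derivative of f(t) to zero for m = 2.

```python
import math


def weibull_cdf(t: float, c: float, m: float) -> float:
    """F(t) = 1 - exp(-(t/c)^m) for t >= 0."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / c) ** m))


def weibull_pdf(t: float, c: float, m: float) -> float:
    """f(t) = (m/t) * (t/c)^m * exp(-(t/c)^m) for t > 0."""
    if t <= 0:
        return 0.0
    return (m / t) * (t / c) ** m * math.exp(-((t / c) ** m))


def rayleigh_peak_time(c: float) -> float:
    """Time at which the Rayleigh (m = 2) density is highest: c / sqrt(2)."""
    return c / math.sqrt(2.0)
```

Setting m = 1 reproduces the exponential density with λ = 1/c, while m = 2 gives a curve that rises to a peak and then decays, the shape typically used for defect arrivals over a whole development cycle.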
