
Assessment Topics, Part 1

This session focuses on the assessment of essential properties, such as dependability, reliability, availability, robustness, fault tolerance, cyber-security, and safety, for digital equipment used in safety-critical applications. It discusses the real needs, potential failures, mitigation approaches, and the importance of cyber-security and safety in ensuring the reliable operation of digital systems.





Presentation Transcript


  1. Assessment of Digital Equipment for Safety and High Integrity Applications – Session 2 of 6 Assessment Topics, Part 1 Thuy Nguyen and Ray Torok Joint IAEA - EPRI Workshop on Modernization of Instrumentation and Control Systems in NPPs 3 - 6 October, 2006 Vienna, Austria

  2. Essential Properties Assessment of Digital Equipment for Safety and High Integrity Applications

  3. Essential Properties -Dependability • Property allowing a well-founded confidence in the ability of a system to correctly provide an expected service • Different services may be associated with different dependability levels • Usually includes the following main factors • Adequacy of the specified service with respect to the real needs to be addressed by the system • Reliability, i.e., the likelihood that the specified service will be provided as specified • Availability, i.e., the proportion of time where the specified service is effectively provided • Robustness, i.e., the degree to which the system can provide an acceptable service even in abnormal conditions. Also includes • Fault-tolerance • Cyber-security • Safety, i.e., the avoidance of failure modes that have unacceptable consequences • Maintainability of the preceding factors over the required time period

  4. Adequacy: What Are the Real Needs? • Often, the dominant cause of failure of highly dependable digital systems is specification faults • Functional ambition and complexity of individual services • Interdependencies between services • Complexity of interfaces and interactions with other equipment or systems, and with human beings • Typical specification issues • Understanding the real needs to be satisfied, the environment of the digital system(s), and / or operational constraints • Real needs and system contexts may change over time • “Traduttore, traditore”: what is specified may not be exactly what is intended • “Intrinsic” errors: incompleteness, ambiguity, inconsistency, ...

  5. Reliability: What Can Go Wrong? • Random failures • Due, e.g., to hardware aging, wear, radiation • Effects specific to modern electronic technologies • Failures caused by manufacturing / installation errors • Failures caused by maintenance / modification errors • Competencies, data collection processes, spares • Failures caused by incorrect human-system interactions • Digital HSIs may be inadequate / too complex (human factors) • Digital systems may reduce / mitigate human mistakes • Digital failures • Digital faults: specification faults, design faults, incorrect data • Digital failures occur systematically in the same conditions • Risk of Common Cause Failures (CCF) of multiple systems or channels • Digital failures may originate in individual systems, or in interactions between systems
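The point that digital failures occur systematically, and therefore threaten redundant channels together, can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the presentation: the function names, the setpoint, the intended requirement (trip at or above the setpoint) and the injected design fault.

```python
from typing import List


def channel_trip(pressure: float, setpoint: float = 100.0) -> bool:
    """Hypothetical protection channel containing a design fault.

    Assumed requirement: trip when pressure >= setpoint.
    The comparison below wrongly excludes the boundary value.
    """
    return pressure > setpoint  # design fault: should be >=


def two_out_of_three(votes: List[bool]) -> bool:
    """2-out-of-3 voting over the channel outputs."""
    return sum(votes) >= 2


def redundant_trip(pressure: float) -> bool:
    """Three channels running the same (faulty) software, then voting."""
    votes = [channel_trip(pressure) for _ in range(3)]
    return two_out_of_three(votes)


if __name__ == "__main__":
    # At the boundary condition every channel fails in the same way,
    # so the voter cannot mask the fault.
    print(redundant_trip(100.0))
    print(redundant_trip(100.1))
```

Redundancy masks independent random failures, but in this sketch the same input activates the same fault in all three channels, which is the CCF concern raised on the slide.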

  6. Digital Faults • No rigorous means to eliminate all digital faults • Mitigation approaches • Fault avoidance • Engineering processes, design rules • Fault detection & removal • Verification & Validation (V&V) processes and rules • Tolerance of residual faults • Avoid activation of residual faults • Activation of residual faults not resulting in system failure • Acceptable system failure modes • Many types of digital faults • Avoidance, elimination, tolerance approaches may depend on fault types

  7. Availability • Main causes of digital systems unavailability • Failures • Maintenance, Periodic testing • Repairs, restarts • Restoring a complex digital system back to service often requires more than just hardware repairs or software reboots • Examples • Understanding the causes and consequences of the failure • Repairing databases compromised by the failure • Resynchronization with other systems of the infrastructure
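As a rough illustration, availability in the sense defined earlier (the proportion of time where the specified service is effectively provided) can be estimated from an outage log. This is a minimal sketch: the record format, the cause labels and the numbers are assumptions made for the example, not data from the presentation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Outage = Tuple[str, float]  # (cause label, hours out of service)


def availability(period_hours: float, outages: List[Outage]) -> float:
    """Fraction of the observation period during which service was provided."""
    downtime = sum(hours for _, hours in outages)
    if period_hours <= 0 or downtime > period_hours:
        raise ValueError("inconsistent observation period or outage data")
    return (period_hours - downtime) / period_hours


def downtime_by_cause(outages: List[Outage]) -> Dict[str, float]:
    """Total downtime per cause, to show what dominates unavailability."""
    totals: Dict[str, float] = defaultdict(float)
    for cause, hours in outages:
        totals[cause] += hours
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical log: restoration covers diagnosis, data repair
    # and resynchronization, not only the hardware fix.
    log = [("failure", 4.0), ("periodic testing", 6.0), ("restoration", 10.0)]
    print(availability(1000.0, log))
    print(downtime_by_cause(log))
```

Keeping the cause with each outage makes it visible when restoration activities, rather than the failure itself, account for most of the lost time.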

  8. Robustness • Ability to maintain the expected service even in abnormal situations • Abnormal external situations, including voluntary aggression • Internal failures • Ability to provide “graceful degradation” if the service cannot be maintained • Identification and specification of acceptable failure modes • Self-monitoring • Highly reliable, but delicate electronic components • Digital failure modes are sometimes difficult to predict
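Self-monitoring combined with graceful degradation might look like the following minimal sketch. The class, the three modes, the range check and the threshold of three bad samples are all illustrative assumptions; real acceptable failure modes have to be identified and specified per application.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Mode(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"      # service continues on the last good value
    SAFE_STATE = "safe_state"  # specified acceptable failure mode


@dataclass
class MonitoredInput:
    """Hypothetical input channel with a plausibility self-check."""
    low: float
    high: float
    max_bad_samples: int = 3
    last_good: Optional[float] = None
    bad_count: int = 0

    def update(self, raw: float) -> Tuple[Mode, Optional[float]]:
        # Self-monitoring: accept only values inside the plausible range
        # (a NaN fails this comparison and is treated as a bad sample).
        if self.low <= raw <= self.high:
            self.last_good = raw
            self.bad_count = 0
            return Mode.NORMAL, raw
        self.bad_count += 1
        # Graceful degradation: briefly hold the last good value.
        if self.bad_count < self.max_bad_samples and self.last_good is not None:
            return Mode.DEGRADED, self.last_good
        # Service can no longer be maintained: go to the specified safe state.
        return Mode.SAFE_STATE, None


if __name__ == "__main__":
    channel = MonitoredInput(low=0.0, high=100.0)
    for sample in (50.0, 500.0, 500.0, 500.0):
        print(channel.update(sample))
```

In this sketch the safe state is not latched, so a later valid sample returns the channel to normal; whether that is acceptable is itself a specification decision.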

  9. Cyber Security: Vulnerability Factors • Need to optimize operation & maintenance of plant systems and I&C equipment • Remote operation, Remote diagnostics, Remote software maintenance, Data collection • Defenses may need frequent updates that may adversely affect the other dependability factors • Use of “COTS” (Commercial Off-The-Shelf) products • “Black-boxes”, products with unknown vulnerabilities • Consequences may be serious • Unauthorized modification of critical data and software • Confidentiality of information • Standards exist but need to be adapted • Designed mainly for “classical” information systems

  10. Safety • Ability to avoid / mitigate dangerous failures • Issues specific to digital systems • International standards, Best practices

  11. Maintenance of Dependability • Despite • Commercial obsolescence and aging of I&C components & platforms • Modification in plant systems, other I&C systems, operation procedures and / or requirements • Staff turnover

  12. Evaluating Quality & Dependability Assessment of Digital Equipment for Safety and High Integrity Applications

  13. Rule-Based Approaches • Due to their complexity, the quality, dependability and safety of digital systems are often difficult to achieve and assess • Particularly when high levels of achievement and confidence are required • Standards, technical codes and regulations often specify “how to” requirements • It is assumed that complying with the rules helps • However • There is usually no strong guarantee that the desired properties will be achieved to the desired levels • The desired properties and levels could be achieved using different approaches than those specified by the rules • Rules are often technology and application domain dependent • New (regulatory) issues may not be covered by existing rules

  14. Performance-Based Approaches • Direct justification of quality / dependability / safety “claims” • Claims that are difficult to justify can be decomposed (iteratively if necessary) into supposedly simpler sub-claims • Final sub-claims are supported by factual evidence • Claim → Argument → Evidence • Solves some of the weaknesses of rule-based approaches • Good design measures (beyond the generic rules) can be credited and are encouraged • Appropriately documented claims, argument and evidence can be reviewed by independent assessors • Modifications might be easier to assess and justify when a suitable claim-argument-evidence justification already exists • However • More practical experience is still needed • Existing systems and products have usually relied on rule-based approaches • Switching to performance-based approaches might be economically impractical
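The claim-argument-evidence decomposition lends itself to a simple tree structure. The sketch below is one possible representation, with hypothetical class and field names and an invented example claim; it only checks that every final sub-claim has some evidence attached, not that the evidence is convincing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    """A claim, the argument linking it to its support, and that support."""
    statement: str
    argument: str = ""
    evidence: List[str] = field(default_factory=list)
    sub_claims: List["Claim"] = field(default_factory=list)

    def unsupported(self) -> List[str]:
        """Statements of final sub-claims that have no evidence yet."""
        if not self.sub_claims:
            return [] if self.evidence else [self.statement]
        gaps: List[str] = []
        for sub in self.sub_claims:
            gaps.extend(sub.unsupported())
        return gaps


if __name__ == "__main__":
    # Hypothetical example tree; the claims and documents are invented.
    top = Claim(
        statement="The trip function is performed on demand",
        argument="Holds if inputs are valid and the logic is correct",
        sub_claims=[
            Claim("Inputs are validated", evidence=["test report TR-01"]),
            Claim("Trip logic matches the specification"),
        ],
    )
    print(top.unsupported())
```

A structure like this also supports the modification case on the slide: after a change, only the sub-claims whose evidence is affected need to be revisited.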

  15. Types of Evidence - Development Process • Most requirements of rule-based approaches concern the development process • Good development processes can help, but they are neither necessary nor sufficient • Most of these requirements represent good practice and should be followed anyway

  16. Types of Evidence – Use of Standards • Hundreds of software development standards are available • No consensus on which development approach is best • Usually intended for large software development from scratch • Overkill for utility applications • Graded approach is most useful • Regulators have endorsed standards • Basis for selection is not clear • Perhaps because “something is better than nothing” • Use of standards implies systematic, well-documented development process • Reviewer should understand principles and confirm that standards were: • Correctly applied and documented • Used on the products of interest

  17. Types of Evidence - Rigorous Reasoning • Systematic “proof” that a (sub-)claim is true • High level of confidence, but usually dependent on assumptions that should be clearly stated • Examples • Static resource allocation can guarantee that all the required resources will be available when necessary (a priori proof) • Formal verification may be used to guarantee freedom from particular “intrinsic software programming faults”, such as index overflows (a posteriori proof) • See also Defensive Measures, and Inter-Channel / Inter-System Data Communication and Susceptibility to Digital CCF • Not applicable to all types of claims and designs • Also, usually not applicable to the last stages of system integration
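The static resource allocation example can be made concrete with a small a priori check performed before the system runs. This is a minimal sketch under stated assumptions: the `Task` fields, the single processing cycle and the budget figures are hypothetical, and the conclusion is only as good as the worst-case figures fed into it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    """Hypothetical statically scheduled task with fixed worst-case needs."""
    name: str
    wcet_ms: float     # assumed worst-case execution time per cycle
    memory_kb: float   # assumed statically allocated memory


def check_static_budget(tasks: List[Task], cycle_ms: float,
                        memory_kb: float) -> List[str]:
    """Return the list of budget violations; empty means the budgets hold."""
    total_cpu = sum(t.wcet_ms for t in tasks)
    total_mem = sum(t.memory_kb for t in tasks)
    problems: List[str] = []
    if total_cpu > cycle_ms:
        problems.append(f"CPU: {total_cpu} ms needed, {cycle_ms} ms available")
    if total_mem > memory_kb:
        problems.append(f"Memory: {total_mem} kB needed, {memory_kb} kB available")
    return problems


if __name__ == "__main__":
    tasks = [Task("acquire", 2.0, 64.0), Task("vote", 1.0, 32.0)]
    print(check_static_budget(tasks, cycle_ms=10.0, memory_kb=128.0))
```

The argument is of the form "if the worst-case figures are correct, resources are always available", which is why the slide insists that the underlying assumptions be clearly stated.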

  18. Types of Evidence - “Sampling” Techniques • Sampling is a universal practice • Testing, simulation • Many support tools • Essential in the later stages of system integration and for validation • Sufficiency criteria (coverage) and levels may be specified when high levels of achievement and confidence are required • But • When is enough enough? • Experience in operation • Some commercial products benefit from large or massive experience in operation • Necessary conditions • Credibility: Can we trust the claimed information? How do we know that failures are reported and correctly analyzed? • Applicability: Is the claimed experience applicable to the product we intend to use and to the expected conditions of use? • Sufficient volume of experience
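One common way to put a number on the sufficiency question, not taken from the presentation, is the failure-free statistical testing bound: if n independent tests, representative of the expected conditions of use, all pass, then a per-demand failure probability of at most p can be claimed with confidence C once (1 - p)^n <= 1 - C. The sketch below assumes exactly that model; the function name is hypothetical.

```python
import math


def failure_free_tests_needed(max_failure_prob: float, confidence: float) -> int:
    """Smallest n such that (1 - p)**n <= 1 - C.

    Assumes independent tests drawn from the real operational profile
    and zero observed failures; both are strong assumptions.
    """
    if not (0.0 < max_failure_prob < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_failure_prob))


if __name__ == "__main__":
    for p in (1e-2, 1e-3, 1e-4):
        print(p, failure_free_tests_needed(p, confidence=0.95))
```

The required number of tests grows roughly in inverse proportion to the target failure probability, which is one reason sampling alone becomes impractical when very high levels of confidence are required.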

  19. Types of Evidence - Experience in Operation • Some commercial products benefit from large or massive experience in operation • Necessary conditions • Credibility: Can we trust the claimed information? How do we know that failures are reported and correctly analyzed? • Applicability: Is the claimed experience applicable to the product we intend to use and to the expected conditions of use? • Sufficient volume of experience • Usually not well-suited to programmable products or to products with complex behavior when high levels of achievement and confidence are required • May be used as complementary evidence, or as confirmation

  20. Types of Evidence - Expert Judgment • In most cases, key aspects of assessments must rely on subjective judgment • Trade-offs, acceptance criteria for testing, ... • “Fuzzy” properties like ease of understanding, clarity of documentation, ... • Subjective judgment in an evaluation should be identified • So that other experts can say if they agree or not • Whenever possible, review guidelines should be provided

  21. Conclusion • Rule-based evaluation approaches are still widely applied • And will remain so for the foreseeable future • Performance-based evaluation approaches can be used where rule-based approaches cannot be applied • New technologies (see FPGA) • Issues not well covered by current rules (see Inter-Channel / Inter-System Data Communication, and Susceptibility to Digital CCF) • Wider use of performance-based approaches and increased experience may help improve quality / dependability / safety evaluations
