
Error Detection in SW


Presentation Transcript


  1. Error Detection in SW • David Rigler

  2. Overview • Types of error detection • Fault/Error classification • Description of certain SW error detection techniques • Evaluation (Coverage / Overhead) • Conclusion

  3. Failure Runtime Detection (in Software) • Software Diversity / N-Version Programming • Defensive Programming • Assertions • Bound/Range checking • Control Flow checking • Block Entry Exit Checking • Error Capturing Instructions • Advanced Techniques … • Redundant Data/Code (diagram labels: SW failures, HW failures)

  4. Transient Hardware Error Classification • Data Errors • Code Errors • Type S1 Statements: affecting data only • Type S2 Statements: affecting the execution flow • Type E1 Errors: changing the operation (not the control flow) • Type E2 Errors: changing the statement type (S1 ↔ S2)

  5. Data Errors (Executable Assertions) • Generic • Bound • Integrity • For SW and HW Errors • Non-Generic • Value Range • Approximate (False alarm)
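
The two assertion kinds on this slide can be illustrated with a minimal Python sketch. It is not taken from the presentation: the function names, the `AssertionViolation` exception, and the temperature limits of -40 to 125 are assumptions chosen for illustration. The bound check is generic (it needs no application knowledge), while the value-range check relies on limits the application has to supply.

```python
class AssertionViolation(Exception):
    """Raised when an executable assertion detects an implausible value."""


def check_bound(index, length):
    # Generic bound assertion: an index must address an existing element.
    if not 0 <= index < length:
        raise AssertionViolation(f"index {index} outside [0, {length})")
    return index


def check_range(value, low, high):
    # Non-generic value-range assertion: the limits come from application knowledge.
    if not low <= value <= high:
        raise AssertionViolation(f"value {value} outside [{low}, {high}]")
    return value


def read_temperature(samples, index):
    # Hypothetical sensor read guarded by both kinds of assertion.
    raw = samples[check_bound(index, len(samples))]
    return check_range(raw, -40, 125)
```

A bit flip that turns the sample 21 into 1045 is caught by the range check. A flip that turns 21 into 20 stays inside the range and goes unnoticed, which is why range assertions alone give limited coverage.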

  6. Data Errors (systematic Data Redundancy) • Rules • Duplicate every variable: x -> (x1 and x2) • Perform write operations on x1 and x2 • Read operation on x -> check for consistency of x1 and x2
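
The three rules above can be sketched in Python as a small wrapper class. This is an illustration under assumptions, not the pre-processor approach itself: the class name `Duplicated` and the exception `DataError` are hypothetical, and a real implementation would transform every variable in the source automatically.

```python
class DataError(Exception):
    """Raised when the two copies of a variable disagree."""


class Duplicated:
    """Holds a variable x as two copies, x1 and x2."""

    def __init__(self, value):
        self.x1 = value
        self.x2 = value

    def write(self, value):
        # Rule: every write is performed on both copies.
        self.x1 = value
        self.x2 = value

    def read(self):
        # Rule: every read checks the copies for consistency first.
        if self.x1 != self.x2:
            raise DataError(f"copies differ: {self.x1} != {self.x2}")
        return self.x1
```

Flipping one bit in a single copy, for example `d.x1 ^= 1 << 3`, makes the next `read()` raise `DataError`. A fault that hits both copies identically would not be detected by this scheme.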

  7. Data Errors (systematic Data Redundancy) • Generic Approach • Use pre-processor on high level language • Compiler optimisations may be a problem • All (visible) single Bit Flip Errors in DATA Memory can be detected

  8. Control Flow Errors: Block Entry Exit Checking • Unique signatures for Basic Blocks • Assign at Entry • Compare at Exit • Problems • Jumps within Block • Granularity • Jumps to unused Area
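
Block entry exit checking can be simulated in a few lines of Python. The sketch below is an assumption-laden illustration: `SignatureMonitor`, `block_a`, the signature value `0xA5`, and the `skip_entry` flag (which stands in for an erroneous jump into the middle of the block) are all hypothetical names.

```python
class ControlFlowError(Exception):
    """Raised when a block exit sees a signature it did not expect."""


class SignatureMonitor:
    def __init__(self):
        self.current = None

    def enter(self, signature):
        # Assign the block's unique signature at entry.
        self.current = signature

    def exit(self, signature):
        # Compare at exit; a mismatch means the block was not entered properly.
        if self.current != signature:
            raise ControlFlowError(f"expected {signature:#x}, found {self.current}")
        self.current = None


def block_a(monitor, x, skip_entry=False):
    # skip_entry simulates a faulty jump that lands after the entry code.
    if not skip_entry:
        monitor.enter(0xA5)
    y = x + 1
    monitor.exit(0xA5)
    return y
```

A jump that starts and ends inside the same block after the entry code has run leaves the signature intact, which corresponds to the "Jumps within Block" problem listed on the slide.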

  9. Control Flow Errors • Duplicate Condition Checks
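
The slide gives no detail on duplicate condition checks. One common reading, assumed here, is that the branch condition is evaluated again inside each branch to confirm that the correct branch was taken. In the Python sketch below, `clamp_speed` and the `fault` flag (which inverts the branch decision to simulate a corrupted branch) are hypothetical.

```python
class ControlFlowError(Exception):
    """Raised when the branch taken contradicts the re-evaluated condition."""


def clamp_speed(speed, limit, fault=False):
    # fault=True inverts the branch decision, simulating a control flow error.
    taken = (speed > limit) != fault
    if taken:
        # Duplicate check in the "true" branch.
        if not speed > limit:
            raise ControlFlowError("true branch taken although condition is false")
        return limit
    else:
        # Duplicate check in the "false" branch.
        if speed > limit:
            raise ControlFlowError("false branch taken although condition is true")
        return speed
```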

  10. Control Flow Errors • Error Capturing Instructions • Special or unused Instructions • Trap, SWI, … • Spread over unused Memory • Program Memory • Data Memory • Call Error Handling Function
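
Error capturing instructions depend on the processor, so the following Python sketch only simulates the idea with a toy machine. Everything in it is an assumption for illustration: the three-instruction set (`INC`, `HALT`, `TRAP`), the memory size, and the function names. Unused memory is filled with `TRAP`, so a program counter that strays into it reaches the error handler (here, an exception).

```python
TRAP = "TRAP"


class TrapError(Exception):
    """Stands in for the error handling function called by a trap."""


def build_memory(program, size):
    # Fill all of memory with trap instructions, then load the program at address 0.
    memory = [TRAP] * size
    memory[:len(program)] = program
    return memory


def run(memory, pc=0):
    # Toy interpreter: INC increments the accumulator, HALT returns it.
    acc = 0
    while True:
        op = memory[pc]
        if op == TRAP:
            raise TrapError(f"error capturing instruction hit at address {pc}")
        if op == "HALT":
            return acc
        if op == "INC":
            acc += 1
        pc += 1
```

Starting execution at an address inside the program (for example `pc=1`) is not caught here, since only the unused area contains traps.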

  11. Control Flow Errors • Watchdog Timer • Periodically reset timer • Take Action at specific timer value • Needs Support of Hardware • Common in embedded Controllers • Detects infinite loop errors
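
A real watchdog is a hardware counter. The Python sketch below only simulates its behaviour with a software tick count so that the logic can be followed; `Watchdog`, `main_loop`, and the `hang_at` parameter (which simulates an infinite loop in one iteration) are hypothetical names.

```python
class WatchdogExpired(Exception):
    """Stands in for the action taken when the timer reaches its limit."""


class Watchdog:
    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.counter = 0

    def kick(self):
        # The application resets the timer periodically.
        self.counter = 0

    def tick(self):
        # Called once per simulated time unit.
        self.counter += 1
        if self.counter >= self.timeout_ticks:
            raise WatchdogExpired("timer was not reset in time")


def main_loop(wd, iterations, hang_at=None):
    for i in range(iterations):
        if i == hang_at:
            # Simulated infinite loop: time passes, but the timer is never reset.
            while True:
                wd.tick()
        wd.kick()
        wd.tick()
    return iterations
```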

  12. Coverage Example 1 • BEEC (Block Entry Exit Checking), Duplicate Condition Checks, Systematic Data Redundancy • Simulated bit-flip errors in memory • ~5x performance slowdown • ~2x size • No Silent Violations (Data) • High Coverage even for Errors in Code Area.

  13. Coverage Example 2 • Physical Fault Injection • Heavy-Ion Radiation • Power-Supply Disturbances • Hardware WDT • Effect of additional SW • 60% → 85%

  14. Improving Coverage • Separate BB for redundant variables • Separated in Memory • No single bit-flip jumps • Use cumulative Signatures • Detect jumps within Block • Avoid Signature aliasing • Hamming distance
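
Two of the ideas on this slide can be sketched in Python. First, signatures whose pairwise Hamming distance is greater than one cannot be turned into each other by a single bit flip. Second, a cumulative signature that is updated at several checkpoints inside a block changes if any checkpoint is skipped. The checkpoint constants, the XOR update rule, and all function names below are assumptions for illustration.

```python
def hamming_distance(a, b):
    # Number of bit positions in which two signatures differ.
    return bin(a ^ b).count("1")


def min_distance(signatures):
    # Smallest pairwise distance; values above 1 rule out single-bit aliasing.
    return min(
        hamming_distance(a, b)
        for i, a in enumerate(signatures)
        for b in signatures[i + 1:]
    )


CHECKPOINTS = [0x1D, 0x2E, 0x47]        # hypothetical per-checkpoint constants
EXPECTED = 0x1D ^ 0x2E ^ 0x47           # signature after a fault-free pass


def run_block(skip=None):
    # skip simulates an erroneous jump over one checkpoint inside the block.
    signature = 0
    for i, constant in enumerate(CHECKPOINTS):
        if i == skip:
            continue
        signature ^= constant           # cumulative update at each checkpoint
    return signature == EXPECTED
```

XOR is order-independent, so this particular update rule would not notice checkpoints executed in the wrong order; an order-sensitive update would be needed for that.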

  15. 100% Coverage • For simple failure model • Single bit-flip • Data- and Code-Memory/Registers • Hidden Registers not included (Branch Buffer, Cache tags, etc) • High Overhead • ~4x Memory usage • >3x Time

  16. Conclusion: Error Detection in SW • Pure SW: high coverage only for simple failure models • Addition to HW Error Detection • Trade-off: Overhead vs. Coverage • Fine tuning possible • Use available Resources (Time, Memory)

  17. References • Miremadi G., J. Karlsson, U. Gunneflo, and J. Torin, Two Software Techniques for On-Line Error Detection, Proc. of the 22nd International Symposium on Fault-Tolerant Computing (FTCS-22), July 1992, pp. 328-335. • Miremadi G. and J. Torin, Evaluating Processor-Behavior and Three Error-Detection Mechanisms Using Physical Fault-Injection, IEEE Trans. on Reliability, Vol. 44, No. 3, Sept. 1995, pp. 441-453. • Rabejac C., J.-P. Blanquart, and J.-P. Queille (Lab. for Dependability Eng., CNRS, Toulouse, France), Executable Assertions and Timed Traces for On-Line Software Error Detection, Proc. of the 26th International Symposium on Fault-Tolerant Computing (FTCS-26), 1996. • Alkhalifa Z., V. S. S. Nair, N. Krishnamurthy, and J. A. Abraham, Design and Evaluation of System-Level Checks for On-Line Control Flow Error Detection, IEEE Trans. on Parallel and Distributed Systems, Vol. 10, No. 6, June 1999, pp. 627-641. • Fazeli M., R. Farivar, and S. G. Miremadi, A Software-Based Concurrent Error Detection Technique for PowerPC Processor-based Embedded Systems, Proc. of the 20th IEEE Symposium on Defect and Fault Tolerance in VLSI Systems (DFT), Monterey, California, 2005. • Nicolescu B., Y. Savaria, and R. Velazco, Software Detection Mechanisms Providing Full Coverage Against Single Bit-Flip Faults. • Rebaudengo M., M. Sonza Reorda, M. Torchiano, and M. Violante, Soft-error Detection through Software Fault-Tolerance Techniques.
