
Error Detection in Hardware

Error Detection in Hardware. VO Hardware-Software-Codesign, Philipp Jahn. Error detection: how to detect errors with hardware methods during system operation. Conditions: coverage (probability that an error is detected), latency (time between the start of an error and its detection), performance.


Presentation Transcript


  1. Error Detection in Hardware. VO Hardware-Software-Codesign, Philipp Jahn

  2. Error detection
     • How to detect errors with hardware methods during system operation
     • Conditions
       • Coverage (probability that an error is detected)
       • Latency (time between the start of an error and its detection)
       • Performance
     • Slide from VO „Echtzeitsysteme“, H. Kopetz

  3. Hardware-based error detection
     • Hardware redundancy
       • Passive (TMR, majority voting)
       • Active (duplication and comparison, standby)
       • Hybrid
     • Information redundancy
       • Parity
       • Checksums
       • Arithmetic codes
     • Time redundancy
     • Watchdog timers
     • Checking
       • Capability checking
       • Consistency checking
       • Control-flow checking
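
The two hardware-redundancy styles named above can be sketched in software. This is only an illustration, not taken from the slides: the function names `majority_vote` and `duplicate_and_compare` are hypothetical, and module outputs are modelled as plain integers.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Passive redundancy (TMR): bitwise 2-of-3 majority of three module outputs.

    A single faulty module is masked, because every output bit follows
    the value that at least two modules agree on.
    """
    return (a & b) | (a & c) | (b & c)


def duplicate_and_compare(a: int, b: int) -> bool:
    """Active redundancy: return True if the two module outputs disagree.

    A mismatch only detects the error; it does not say which module failed.
    """
    return a != b


# One of three modules delivers a wrong word; the voter still outputs 0b1010.
voted = majority_vote(0b1010, 0b1010, 0b0110)
mismatch = duplicate_and_compare(0b1010, 0b0110)
```

The voter masks the fault without signalling it, whereas the comparator signals it without masking it, which is the practical difference between the passive and active approaches.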

  4. Information redundancy (1)
     • Detection / Correction
     • Hamming distance
       • X = (1001), Y = (0111)
       • d(X,Y) = 3
     • SEC-DED (single error correction, double error detection)
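
The Hamming distance example from this slide can be checked with a few lines. This is a minimal sketch; the helper names are hypothetical, and the two capability formulas are the standard ones for a code with minimum distance d (detect up to d - 1 errors when only detecting, correct up to floor((d - 1) / 2)).

```python
def hamming_distance(x: str, y: str) -> int:
    """Number of positions in which two equal-length bit strings differ."""
    if len(x) != len(y):
        raise ValueError("code words must have the same length")
    return sum(a != b for a, b in zip(x, y))


def max_detectable(d_min: int) -> int:
    """Errors guaranteed to be detected when the code is used for detection only."""
    return d_min - 1


def max_correctable(d_min: int) -> int:
    """Errors guaranteed to be corrected."""
    return (d_min - 1) // 2


d = hamming_distance("1001", "0111")  # the X and Y from the slide
```

A SEC-DED code needs a minimum distance of 4: it then corrects one error and, at the same time, still detects a second one.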

  5. Information redundancy (2)
     • Parity
       • One extra bit (even / odd)
       • Decoding circuit (set of XOR gates)
       • Routine checking in buses, memory and registers
       • Detects single-bit errors (no stuck-at faults)
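
A software model of the parity scheme, as a sketch only: in hardware the same function is an XOR tree over the data bits. The names `parity_bit` and `check_parity` are hypothetical.

```python
def parity_bit(word: int, even: bool = True) -> int:
    """Parity bit for a non-negative data word.

    With even parity the bit makes the total number of ones even;
    with odd parity it makes the total odd.
    """
    ones_are_odd = bin(word).count("1") & 1
    return ones_are_odd if even else ones_are_odd ^ 1


def check_parity(word: int, stored_bit: int, even: bool = True) -> bool:
    """Return True if the word is consistent with its stored parity bit."""
    return parity_bit(word, even) == stored_bit


word = 0b1011
p = parity_bit(word)                              # three ones, so the even-parity bit is 1
single_flip_ok = check_parity(word ^ 0b0100, p)   # one bit flipped
double_flip_ok = check_parity(word ^ 0b0110, p)   # two bits flipped
```

The single flip is caught, while the double flip passes the check: a single parity bit detects any odd number of flipped bits and misses any even number.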

  6. Information redundancy (3)
     • Overlapping parity
     • m-of-n codes
     • Duplication codes
     • Cyclic redundancy checks
       • Sender and receiver agree upon a generator polynomial G(x)
       • Append a checksum (k bits) to the end of the data (n-k bits), giving a frame of n bits
       • Received frame mod G(x) = 0 → correct
       • Simple implementation (linear feedback shift register and XOR gates)
       • Detects single-bit errors, multiple adjacent bit errors affecting at most k bits, and burst transient errors
       • Highly successful in serial transmission (communication channels: Ethernet, Token Ring)
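
The CRC procedure can be sketched as bitwise division over GF(2), which is what the shift register with XOR gates does one bit per clock. The function names and the small generator G(x) = x^3 + x + 1 (binary 1011, so k = 3) are illustrative assumptions, not taken from the slides.

```python
def crc_remainder(bits: int, nbits: int, poly: int, k: int) -> int:
    """Remainder of an nbits-long bit string divided by poly (degree k) over GF(2)."""
    reg = bits
    for shift in range(nbits - 1, k - 1, -1):
        if reg & (1 << shift):
            # Subtract (XOR) the generator, aligned with the current leading one.
            reg ^= poly << (shift - k)
    return reg


def crc_encode(data: int, data_bits: int, poly: int, k: int) -> int:
    """Append the k-bit checksum to the data; returns a frame of data_bits + k bits."""
    shifted = data << k
    return shifted | crc_remainder(shifted, data_bits + k, poly, k)


def crc_ok(frame: int, frame_bits: int, poly: int, k: int) -> bool:
    """Receiver check: the frame is accepted if it divides evenly by the generator."""
    return crc_remainder(frame, frame_bits, poly, k) == 0


G, K = 0b1011, 3                       # assumed generator x^3 + x + 1
frame = crc_encode(0b1101, 4, G, K)    # 4 data bits plus 3 check bits
corrupted = frame ^ (0b111 << 2)       # burst error of length 3 inside the frame
```

Here `crc_ok(frame, 7, G, K)` holds and `crc_ok(corrupted, 7, G, K)` does not. Real links use much longer generators (Ethernet uses a 32-bit CRC), but the division is the same.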

  7. Information redundancy (4)
     • Checksums

  8. Information redundancy (5)
     • Arithmetic codes
       • Detect errors in arithmetic units (parity would not be preserved)
       • Separate or nonseparate
       • Examples
         • AN codes
         • Residue codes
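
Both example codes can be illustrated for addition. This is a sketch under assumptions: the constant A = 3, the modulus 3 and all function names are chosen for illustration only. An AN code is nonseparate (the data is multiplied by A), while a residue code is separate (the check value x mod m travels next to the data).

```python
A = 3  # assumed AN-code constant


def an_encode(x: int) -> int:
    """AN code: represent x as A * x. Sums of code words are again code words."""
    return A * x


def an_valid(code_word: int) -> bool:
    """A result is accepted only if it is still a multiple of A."""
    return code_word % A == 0


def residue_add_ok(x: int, y: int, adder_output: int, m: int = 3) -> bool:
    """Residue code: compare the adder output with a sum computed on the residues."""
    return adder_output % m == ((x % m) + (y % m)) % m


good_sum = an_encode(5) + an_encode(7)   # equals an_encode(12)
bad_sum = good_sum ^ 1                   # adder output with one flipped bit
```

`an_valid(good_sum)` holds and `an_valid(bad_sum)` does not. With A = 3, every single-bit error changes the result by a power of two, which is never a multiple of 3, so it is always detected.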

  9. Time redundancy (1)
     • Repeat a computation two or more times and compare the results (detection, or correction by majority)
     • Error detected → possibly retry
     • Good for detecting transient faults
     • Does not protect against errors resulting from permanent faults
     • No extra hardware needed, but longer processing time
     • Suited to non-time-critical applications
     • Alternate logic also detects permanent faults (self-checking circuits, f(x) = f'(x'))
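
Repetition with comparison can be sketched as follows. The helper `run_redundant` is hypothetical; it assumes the computation returns a hashable value and that a transient fault affects at most a minority of the runs.

```python
from collections import Counter
from typing import Callable, Hashable, Tuple


def run_redundant(compute: Callable[[int], Hashable], x: int,
                  repeats: int = 3) -> Tuple[Hashable, bool]:
    """Run compute(x) several times on the same hardware and compare the results.

    Returns (majority value, error_detected). Raises if no value reaches a
    strict majority, in which case the caller may retry.
    """
    results = [compute(x) for _ in range(repeats)]
    value, votes = Counter(results).most_common(1)[0]
    if votes <= repeats // 2:
        raise RuntimeError("no majority among repeated computations")
    return value, votes < repeats


# Simulated transient fault: only the second run is disturbed.
calls = {"n": 0}


def flaky_square(x: int) -> int:
    calls["n"] += 1
    return x * x + (1 if calls["n"] == 2 else 0)


value, error_detected = run_redundant(flaky_square, 4)
```

A permanent fault would disturb every run in the same way, so all results would agree and nothing would be flagged, which is the limitation the slide points out.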

  10. Time redundancy (2)
     • Handle permanent faults by encoding the operands of the second computation (the encoding must not alter the calculation), e.g. a k-bit shift
     • Errors in k-1 consecutive bits of an arithmetic or logical operation are detected
     • Additional hardware (two shifters, storage register, comparator)
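
A toy model of recomputation with shifted operands, assuming a permanent stuck-at-0 fault on one output bit of an adder. The fault model and the names `faulty_add` and `shifted_recompute_add` are illustrative assumptions, not part of the slides.

```python
from typing import Optional, Tuple


def faulty_add(a: int, b: int, stuck_bit: Optional[int] = None) -> int:
    """Adder whose output bit `stuck_bit` is permanently stuck at 0 (if given)."""
    s = a + b
    if stuck_bit is not None:
        s &= ~(1 << stuck_bit)
    return s


def shifted_recompute_add(a: int, b: int, k: int = 1,
                          stuck_bit: Optional[int] = None) -> Tuple[int, bool]:
    """Add twice on the same (possibly faulty) adder and compare.

    The second run uses operands shifted left by k bits and shifts the result
    back, so the stuck bit hits a different position of the true sum.
    Returns (first result, error_detected).
    """
    first = faulty_add(a, b, stuck_bit)
    second = faulty_add(a << k, b << k, stuck_bit) >> k
    return first, first != second


result, detected = shifted_recompute_add(5, 2, k=1, stuck_bit=1)
```

Plain repetition would return the same wrong value twice here. Shifting moves the fault to another bit of the sum, so the two runs disagree and the comparator flags the error.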

  11. Watchdog timers
     • Implemented in hardware (external timer) or software (process)
     • If the timer expires → system reset or recovery
     • Detect only a very specific type of error: control-flow errors
     • If an error occurs but the timer is still reset → no detection
     • Difficult to determine the runtime (and hence the timeout)
     • High detection latency
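
The watchdog behaviour can be modelled with a simple tick counter. The class below is a hypothetical sketch: a real watchdog is driven by a hardware clock, and expiry would trigger a reset or recovery routine rather than set a flag.

```python
class Watchdog:
    """Countdown timer that must be reset ("kicked") before it reaches zero."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout = timeout_ticks
        self.remaining = timeout_ticks
        self.expired = False

    def kick(self) -> None:
        """Called by the monitored program to signal that it is still making progress."""
        self.remaining = self.timeout

    def tick(self) -> None:
        """One clock tick; after `timeout` ticks without a kick the watchdog expires."""
        if self.expired:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.expired = True  # here the system would be reset or recovered


wd = Watchdog(timeout_ticks=3)
for _ in range(5):       # healthy program: kicks on every loop iteration
    wd.tick()
    wd.kick()
healthy = not wd.expired
for _ in range(3):       # program hangs: no more kicks
    wd.tick()
```

The timeout is the detection latency: a hang is noticed only after the full interval has elapsed. A program that computes wrong values but still kicks the timer is never noticed.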

  12. Capability & Consistency Checking
     • Capability checking limits access to objects (e.g. memory segments) to authorized users (processes)
       • Implemented in hardware (error traps) or software (firewall)
       • e.g. checking of address validity by the MMU
     • Consistency checking determines whether states or results are reasonable
       • e.g. range checking, address checking, opcode checking
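
Both kinds of check reduce to simple predicates. The sketch below uses a hypothetical `Segment` descriptor and a made-up opcode set, purely to illustrate what an MMU-style address check and a range or opcode check decide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical memory segment descriptor owned by one process."""
    base: int
    limit: int       # size in bytes
    writable: bool


def access_allowed(seg: Segment, addr: int, write: bool) -> bool:
    """Capability check: the address must lie in the segment and the mode must be permitted."""
    in_bounds = seg.base <= addr < seg.base + seg.limit
    return in_bounds and (seg.writable or not write)


def in_range(value: float, low: float, high: float) -> bool:
    """Consistency check: is a state or result within its plausible range?"""
    return low <= value <= high


VALID_OPCODES = frozenset({0x01, 0x02, 0x10})  # assumed instruction set


def opcode_ok(opcode: int) -> bool:
    """Consistency check: is the fetched opcode defined at all?"""
    return opcode in VALID_OPCODES


code_segment = Segment(base=0x1000, limit=0x100, writable=False)
```

For example, `access_allowed(code_segment, 0x1010, write=True)` is rejected because the segment is read-only, and `in_range(412.0, -40.0, 125.0)` flags an implausible sensor reading.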

  13. Control-Flow Checking (1)
     • Hardware scheme
       • Divide the application program into blocks
       • Each block has a single entry and exit point
       • A reference signature represents an encoding of the correct execution
       • A watchdog processor validates the application program by comparing the signature computed at run time with the reference signature
     • 70% of transient faults lead to control-flow errors
     • Limitations
       • Only suitable for processors running a single program (not multiple processes or threads)
       • Reduced coverage if transmission errors occur on the bus to the watchdog processor

  14. Control-Flow Checking (2)
     • Signatured Instruction Stream (SIS)
       • Hardware: watchdog processor with cyclic code signature generator
       • Software: modified assembler and loader
     • Control-flow checking using shadow processing
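
The signature idea can be sketched in software. This is not the SIS implementation: the toy program, the block names and the use of `zlib.crc32` as the cyclic-code signature generator are assumptions made for illustration. The reference signatures stand for what a modified assembler would embed, and `watchdog_check` for the comparison done by the watchdog processor.

```python
import zlib
from typing import Dict, List

# Toy program: each block has a single entry and a single exit point.
PROGRAM: Dict[str, List[bytes]] = {
    "B0": [b"load r1, a", b"load r2, b"],
    "B1": [b"add r3, r1, r2", b"store r3, c"],
}


def signature(instructions: List[bytes]) -> int:
    """Cyclic-code signature accumulated over the instructions of one block."""
    sig = 0
    for ins in instructions:
        sig = zlib.crc32(ins, sig)
    return sig


# Computed once, ahead of execution.
REFERENCE: Dict[str, int] = {name: signature(body) for name, body in PROGRAM.items()}


def watchdog_check(block: str, executed: List[bytes]) -> bool:
    """Compare the signature of the instructions actually executed with the reference."""
    return signature(executed) == REFERENCE[block]


ok = watchdog_check("B0", PROGRAM["B0"])
# A fault changes one instruction on its way to the processor.
corrupted_ok = watchdog_check("B0", [b"load r1, a", b"load r2, c"])
```

The check runs at the end of each block, so the detection latency is bounded by the block length.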

  15. Summary
     • Hardware detection offers low error latency
     • Hardware is more expensive
     • e.g. massively parallel multiprocessors
     • Combine error detection mechanisms

  16. References
     • Ravishankar K. Iyer, Zbigniew Kalbarczyk - Hardware and Software Error Detection - Center for Reliable and High-Performance Computing, University of Illinois at Urbana-Champaign
     • Hermann Kopetz - Real-Time Systems: Design Principles for Distributed Embedded Applications - 1997, 356 p., hardcover, ISBN 978-0-7923-9894-3
     • Alireza Vahdatpour, Mahdi Fazeli, Seyed Ghassem Miremadi - Transient Error Detection in Embedded Systems Using Reconfigurable Components - IES, October 2006
     • M. Dal Cin, W. Hohl, E. Michel, A. Pataricza - Error Detection Mechanisms for Massively Parallel Multiprocessors - IEEE Proceedings, 1993
     • Evaluation of error detection coverage and fault-tolerance of digital plant protection system in nuclear power plants
     • http://robotics.ee.uwa.edu.au/courses/faulttolerant/notes/FT2b.pdf
     • A. Steininger, C. Scherrer - Identifying Efficient Combinations of Error Detection Mechanisms Based on Results of Fault Injection Experiments - IEEE Transactions on Computers, Vol. 51, No. 2, February 2002
