
Evaluation of Error Detection Strategies for an FPGA-Based Self-Checking Arithmetic and Logic Unit

Evaluation of Error Detection Strategies for an FPGA-Based Self-Checking Arithmetic and Logic Unit. Varadarajan Srinivasan, Julian W. Farquharson, William H. Robinson, and Bharat L. Bhuva, Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN.


Presentation Transcript


  1. Evaluation of Error Detection Strategies for an FPGA-Based Self-Checking Arithmetic and Logic Unit
     Varadarajan Srinivasan, Julian W. Farquharson, William H. Robinson, and Bharat L. Bhuva
     Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN

  2. Soft Errors in FPGAs
     Error Location
     • Data/Logic Errors: corrupt the data processed by the circuit
     • Configuration Bit Errors: can alter the functionality of the circuit
     Error Classification [1]
     • Persistent Errors: cannot be flushed out of the system by an SEU correction scheme (examples: counters, flip-flops)
     • Non-Persistent Errors: can be corrected by scrubbing or partial reconfiguration (examples: adder, feed-forward logic)
     [Figure: Error Space of an FPGA Unit. Labels: Data Errors, Configuration Bit Errors, State Machine, Persistent Errors, Throughput Logic, Non-Persistent Errors]
     [1] D. E. Johnson et al., "Persistent Errors in SRAM based FPGAs," MAPLD 2004, Washington DC
     MAPLD 2005/216

  3. Error Correction Techniques
     • Error correction for SEUs in configuration bits
       • Scrubbing
       • Partial reconfiguration
       • Disadvantages of this error correction scheme:
         • Frequent scrubbing is required to ensure proper functionality
         • Configuration logic will be in write mode for a greater percentage of time
         • Covers bitstream errors only, which account for 45% of the total errors observed in an accelerator test [1]
     • Error correction for SEUs in data bits
       • Triple Modular Redundancy (TMR) in the base module
       • Disadvantages of this error correction scheme:
         • Large area and power penalties from replicating the circuit
     [1] Earl Fuller et al., "Radiation Testing Update, SEU Mitigation, and Availability Analysis of the Virtex FPGA for Space Reconfigurable Computing," MAPLD 2000, Washington DC

  4. Error Correction in FPGA
     [Figure: values normalized to the TMR processor. Legend: TMR = Triple Modular Redundancy, BC = Berger Check, ED = Remainder/Parity Check]
     • In the accelerator test reported previously (Fuller, MAPLD 2000), only 45% of the errors were due to configuration bitstream errors
     • Built-In-Error-Detection can detect the majority of errors (including single-event transients) in an FPGA at a reduced area and performance penalty
     • Errors can be removed by instruction re-execution or bitstream re-configuration

  5. Built-In-Error-Detection (BIED) Scheme for SEUs in Data and Configuration Bits
     [Figure: BIED block diagram. Labels: inputs A and B, Logic Block, Check Symbol From Inputs, Check Symbol From Output, Compare Check Symbols, Re-Generate/Re-Configure Signal]
     • Systematic codes can be used to design BIED logic that detects errors in both configuration bits and data
     • Errors in data can be eliminated by re-executing the instruction and the corresponding data
     • Errors in the configuration bits can be corrected at the instant of error detection rather than by frequent scrubbing

  6. Error Correction Codes
     Three techniques are studied:
     • Global coding technique for all the operations
       • Berger check prediction [1][2]
     • Error codes based on instruction groupings
       • Remainder check and parity check
     • Triple Modular Redundancy
     A single-instruction-issue processor has been designed and implemented with each of the error detection techniques mentioned.
     Errors are corrected by re-executing the instruction and data.
     [1] J. C. Lo et al., "An SFS Berger check prediction ALU and its application to self checking processor designs," IEEE Trans. on Computer-Aided Design, vol. 11, pp. 525-540, Apr. 1992.
     [2] J. C. Lo et al., "Concurrent Error Detection in Arithmetic and Logical Operations Using Berger Codes," Proceedings of 9th Symposium on Computer Arithmetic, pp. 233-240, Sep. 1989.

  7. Target Device – Altera's FLEX chip EPF10k70RC240
     • EPF10k70RC240 device features
       • Typical gates (logic and RAM): 70000
       • Logic elements: 3744
       • Supply voltage: 5.0 V
       • 0.42 μm CMOS process
     • Simulation software
       • Quartus II
     [Figure: FLEX 10k Device Block Diagram]

  8. Fault Injection
     • Implemented the three ALU designs (BCALU, EDALU, and TMR ALU)
     • Realized the ALU designs in the target chip
     • Bit-flip model used to inject faults while running an assembly language program
     • Error correction achieved by re-transmission of instruction and data
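The bit-flip fault model and the TMR baseline can be illustrated in software. The sketch below is only an assumption-laden illustration, not the authors' test harness: `flip_bit`, `majority_vote`, `alu_add`, and `tmr_add` are hypothetical names, and the ALU is reduced to a 16-bit adder.

```python
def flip_bit(value: int, bit: int) -> int:
    """Bit-flip fault model: invert a single bit of a word."""
    return value ^ (1 << bit)


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority voter, as used by a TMR design."""
    return (a & b) | (a & c) | (b & c)


def alu_add(x: int, y: int, width: int = 16) -> int:
    """Stand-in ALU: a plain adder truncated to `width` bits (assumption)."""
    return (x + y) & ((1 << width) - 1)


def tmr_add(x: int, y: int, faulty_copy=None, faulty_bit: int = 0,
            width: int = 16) -> int:
    """Run three ALU copies, optionally upset one of them, then vote."""
    results = [alu_add(x, y, width) for _ in range(3)]
    if faulty_copy is not None:
        results[faulty_copy] = flip_bit(results[faulty_copy], faulty_bit)
    return majority_vote(*results)


# A single upset in one copy is masked by the voter.
print(hex(tmr_add(0x1EF2, 0x0324, faulty_copy=1, faulty_bit=5)))
```

The voter masks the fault without re-execution, which is what the BCALU and EDALU designs trade away in exchange for lower area.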

  9. Berger Check Prediction: Design of BCALU

  10. Background Information – Berger Codes
      • Berger codes are systematic, separable codes [1]
        • The information symbol and the check symbol are separated from each other
      • Berger codes are capable of detecting multiple-bit unidirectional errors
      • Two possible Berger encoding schemes
        • The check symbol is the binary representation of the number of 0's in the information symbol
        • The check symbol is the 1's complement of the number of 1's in the information symbol
      • The length of the Berger check symbol is minimal among all systematic codes [2]
      [1] C. Metra et al., "Novel Berger code checker," IEEE Proceedings on Defect and Fault Tolerance in VLSI Systems, Nov. 1995, pp. 287-295
      [2] C. V. Freiman, "Optimal error detection codes for completely asymmetric binary channels," Inform. Contr., vol. 5, pp. 64-71, March 1962.
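As a rough illustration of the two encoding schemes, the following sketch computes both check symbols and shows a unidirectional error being caught. The function names are hypothetical, and the check-symbol width of ceil(log2(n + 1)) bits is the usual Berger code assumption rather than something stated on the slide.

```python
def berger_check_zeros(word: int, width: int) -> int:
    """Scheme 1: check symbol = number of 0's in the information symbol."""
    ones = bin(word & ((1 << width) - 1)).count("1")
    return width - ones


def berger_check_ones_complement(word: int, width: int) -> int:
    """Scheme 2: 1's complement of the number of 1's, on k check bits."""
    k = width.bit_length()  # equals ceil(log2(width + 1))
    ones = bin(word & ((1 << width) - 1)).count("1")
    return ~ones & ((1 << k) - 1)


def is_valid(word: int, check: int, width: int) -> bool:
    """A codeword is valid when the stored check matches the 0's count."""
    return berger_check_zeros(word, width) == check


word = 0b101011
check = berger_check_zeros(word, 6)

# Unidirectional error: two bits fall 1 -> 0, so the 0's count rises.
corrupted = word & ~0b001010
print(is_valid(word, check, 6), is_valid(corrupted, check, 6))
```

A bidirectional error (one bit 1 -> 0 and another 0 -> 1) leaves the 0's count unchanged and is not detected, which is why the slide restricts the claim to unidirectional errors.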

  11. Example: Berger Check Prediction for ADD Instruction
      ALU operation (ADD)         Berger check symbol calculation
        C   =   1 0 1 1 1 1       Cc = 1
        X   =   1 0 1 0 1 1       Xc = 2
        Y   = + 1 0 1 1 0 1       Yc = 2
        Cin = 0
        S   =   0 1 1 0 0 0       Cout = 1
      Sc = Xc + Yc - Cc - Cin + Cout = 2 + 2 - 1 - 0 + 1 = 4
      Xc, Yc = number of 0's in data X and Y; Cin = carry in; Cout = carry out; Cc = number of 0's in the internal carries; Sc = Berger check symbol; S = ALU output
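The ADD prediction on this slide can be reproduced with a small ripple-carry model. This is a sketch under assumptions: the helper names are hypothetical, and the carry word is taken to be the carries out of each bit position (c1..cn, so it includes Cout), which is the reading that matches the slide's C = 101111.

```python
def zeros(word: int, width: int) -> int:
    """Number of 0 bits in a `width`-bit word."""
    return width - bin(word & ((1 << width) - 1)).count("1")


def add_with_carries(x: int, y: int, cin: int, width: int):
    """Ripple-carry add. Returns (sum, carries c1..cn as a word, cout)."""
    s, carries, c = 0, 0, cin
    for i in range(width):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        s |= (xi ^ yi ^ c) << i
        c = (xi & yi) | (xi & c) | (yi & c)
        carries |= c << i  # carry out of bit position i
    return s, carries, c


def predict_berger_add(x: int, y: int, cin: int, width: int):
    """Predict Sc from the inputs and carries: Sc = Xc + Yc - Cc - Cin + Cout."""
    s, carries, cout = add_with_carries(x, y, cin, width)
    sc = zeros(x, width) + zeros(y, width) - zeros(carries, width) - cin + cout
    return s, sc


s, sc_predicted = predict_berger_add(0b101011, 0b101101, 0, 6)
print(format(s, "06b"), sc_predicted, zeros(s, 6))
```

The checker compares the predicted value with the 0's count of the actual ALU output; any mismatch raises the error flag.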

  12. Example: Berger Check Prediction for AND Instruction
      ALU operation (AND)         Berger check symbol calculation
        X   =   1 0 1 0 1 1       Xc = 2
        Y   =   1 0 1 1 0 1       Yc = 2
        Cin = 0
        S   =   1 0 1 0 0 1
      X or Y = 101111, (X or Y)c = 1
      Sc = Xc + Yc - (X or Y)c = 2 + 2 - 1 = 3
      Xc, Yc = number of 0's in data X and Y; Cin = carry in; Cout = carry out; Cc = number of 0's in the internal carries; Sc = Berger check symbol; S = ALU output
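The AND prediction follows from the identity that the 0's count of (X and Y) plus the 0's count of (X or Y) equals Xc + Yc. A minimal sketch, with hypothetical function names, checks the slide's numbers:

```python
def zeros(word: int, width: int) -> int:
    """Number of 0 bits in a `width`-bit word."""
    return width - bin(word & ((1 << width) - 1)).count("1")


def predict_berger_and(x: int, y: int, width: int) -> int:
    """Predict the check symbol of X AND Y: Sc = Xc + Yc - (X or Y)c."""
    return zeros(x, width) + zeros(y, width) - zeros(x | y, width)


x, y = 0b101011, 0b101101
print(predict_berger_and(x, y, 6), zeros(x & y, 6))
```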

  13. Berger Check ALU Process Flow
      [Figure: BCALU block diagram. Blocks: State Machine, Memory, PC, ALU, Berger Check Calculator, 0's Counters, Comparator, DFF. Signals: X (16), Y (16), S, Opcode, Cin, Cout, Internal Carries, Xc, Yc, Cc, Sc from inputs and carries, Sc from ALU output, Clock, Reset, Error_flag_out, Re-Generate/Re-Configure Signal. Bus widths 32 and 8 also appear in the diagram.]

  14. Remainder Check and Parity Check: Design of EDALU

  15. Instruction Groupings
      • Arithmetic
        • Addition (ADD)
        • Subtraction (SUB)
        • Multiplication (MUL)
      • Shift
        • Shift Left Logical (SLL)
        • Shift Right Logical (SRL)
      • Logical
        • Bitwise AND
        • Bitwise OR
        • Bitwise XOR
      • Rotate
        • Rotate Left (ROL)
        • Rotate Right (ROR)

  16. Remainder Check for Arithmetic Instructions
      Remainder check at the ALU inputs
      • Generate remainder RX from data X
      • Generate remainder RY from data Y
      • Calculate the remainder check from the two remainders: RC = f(RX, RY)
      Remainder check at the ALU output
      • Generate the remainder check from the ALU output
      • A comparator checks the two remainder checks and drives the Re-Generate/Re-Configure signal

  17. Example: Remainder Check for ADD Instruction
      ALU operation (ADD)         Remainder check symbol calculation
        X =   1EF2 (hex)          (1 + E + F + 2) mod 15 = 2
        Y = + 0324 (hex)          (0 + 3 + 2 + 4) mod 15 = 9
                                  2 + 9 = B
        S =   2216 (hex)          (2 + 2 + 1 + 6) mod 15 = B
      • A single-bit error changes the ALU output, which results in a different remainder check at the output
      • Remainder check can detect single-bit errors
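The mod-15 check works digit by digit because 16 leaves remainder 1 when divided by 15, so a hexadecimal number and its digit sum have the same remainder. The sketch below reproduces the slide's example. The function names are hypothetical, and the sum is assumed to be passed in full (including any carry out), since truncating it would shift the remainder.

```python
def residue15(word: int) -> int:
    """Remainder mod 15, computed as the hex-digit sum mod 15."""
    total = 0
    while word:
        total += word & 0xF
        word >>= 4
    return total % 15


def check_add(x: int, y: int, s: int) -> bool:
    """True when the output remainder matches the one predicted from inputs."""
    predicted = (residue15(x) + residue15(y)) % 15
    return residue15(s) == predicted


x, y = 0x1EF2, 0x0324
s = x + y
print(hex(s), residue15(x), residue15(y), residue15(s), check_add(x, y, s))

# Every single-bit flip of the output changes its remainder mod 15,
# because no power of two is a multiple of 15.
print(all(not check_add(x, y, s ^ (1 << k)) for k in range(16)))
```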

  18. Parity Check for Logical and Shift-Rotate Instructions
      [Figure: the parity of the output, determined using inputs Data X and Data Y, is compared with the parity of the ALU output; the comparator drives the Re-Generate/Re-Configure signal]
      • Logical instruction group
        • Parity bit chosen to make the total parity even
      • Shift instruction group
        • Truncated bits would not affect parity
      • Rotate instruction group
        • Parity does not change through a rotate operation
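Two of the parity predictions are easy to show in a few lines: a rotate only permutes bits, so parity is preserved, and the parity of X XOR Y is the XOR of the input parities. The sketch below covers only these two cases; the function names and the 16-bit width are assumptions, and the AND, OR, and shift predictors are not reconstructed here.

```python
def parity(word: int) -> int:
    """1 if the word has an odd number of 1 bits, else 0."""
    return bin(word).count("1") & 1


def rotate_left(word: int, amount: int, width: int = 16) -> int:
    """Rotate a `width`-bit word left by `amount` positions."""
    mask = (1 << width) - 1
    word &= mask
    amount %= width
    return ((word << amount) | (word >> (width - amount))) & mask


def predict_parity_xor(x: int, y: int) -> int:
    """Parity of X XOR Y predicted from the input parities alone."""
    return parity(x) ^ parity(y)


def predict_parity_rotate(x: int) -> int:
    """A rotate leaves the parity unchanged."""
    return parity(x)


x, y = 0x1EF2, 0x0324
print(parity(x ^ y) == predict_parity_xor(x, y))
print(parity(rotate_left(x, 5)) == predict_parity_rotate(x))

# A single-bit upset in the output flips its parity and is caught.
print(parity(rotate_left(x, 5) ^ (1 << 3)) != predict_parity_rotate(x))
```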

  19. Error Detection ALU with Error Correction by Instruction Reissue
      [Figure: EDALU block diagram. Blocks: State Machine, PC, Memory, Decoder, ALU, Check-Sum/Parity Calculator, Comparator, DFF. Signals: Opcode, Data, ALU output, Remainder/Parity Check from inputs, Remainder/Parity Check from ALU output, Enable, Clock, Reset, Error_flag, Re-Generate/Re-Configure Signal. Bus widths 16, 32, and 8 appear in the diagram.]
      • The state machine monitors the error signal
      • The state machine checks for the error signal in the instruction fetch stage
      • If an error is detected, the PC is not updated and the same instruction is fetched again
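The reissue behaviour can be sketched as a small fetch loop in which the program counter advances only when the error flag is clear. Everything here is a hypothetical software model, not the EDALU state machine itself: the program contains only ADD instructions, the check is the mod-15 remainder check, and `faults` maps an instruction index to the result bit upset on its first attempt.

```python
def residue15(value: int) -> int:
    """Remainder mod 15 (same value as the hex-digit sum mod 15)."""
    return value % 15


def run(program, faults):
    """Execute ADD instructions, re-fetching any whose check fails.

    Returns the list of results and the total number of fetches.
    """
    pending = dict(faults)
    pc, results, fetches = 0, [], 0
    while pc < len(program):
        x, y = program[pc]
        fetches += 1
        s = x + y
        if pc in pending:
            s ^= 1 << pending.pop(pc)  # transient upset, first attempt only
        error_flag = residue15(s) != (residue15(x) + residue15(y)) % 15
        if error_flag:
            continue  # PC is not updated: the same instruction is fetched again
        results.append(s)
        pc += 1
    return results, fetches


results, fetches = run([(0x1EF2, 0x0324), (1, 2)], faults={0: 3})
print([hex(r) for r in results], fetches)
```

Each detected upset costs one extra fetch, which is the performance penalty the discussion slide ties to the bit error rate.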

  20. Error Correction By Re-Transmit
      • A mismatch between the encoder and decoder outputs due to an SEU sets the regenerate flag high
      • The subtract instruction is re-executed on detection of the error

  21. Results and Discussion

  22. FPGA Resource Utilization for ALU
      [Figure legend: TMR = Triple Modular Redundancy, BC = Berger Check, ED = Remainder/Parity Check]
      • EDALU requires 46% of the area of an ALU with TMR
      • BCALU requires 58% of the area of a TMR ALU
      • BCALU can detect multiple feed-forward errors, whereas EDALU can detect only single-bit feed-forward errors

  23. Processor Implementation
      [Figure legend: TMR = Triple Modular Redundancy, BC = Berger Check, ED = Remainder/Parity Check]
      • The single-instruction-issue BCALU processor runs at 85% of the clock frequency of the TMR ALU processor
      • The EDALU processor operates at 60% of the clock frequency of the TMR ALU processor
      • EDALU performance is affected by the sequential decoding element

  24. Area and Delay Results
      [Figure: area and delay of the TMR, ED, and BC processors; values normalized to the TMR processor. Legend: TMR = Triple Modular Redundancy, BC = Berger Check, ED = Remainder/Parity Check]
      • The area-delay product of the Berger Check processor is 71% of that of the TMR processor
      • The Berger Check processor achieves error correction at a reduced area penalty
      • A Berger Check ALU with a larger data size would not require a significant increase in error detection logic

  25. Discussion
      • Re-issuing the instruction and data to recover from an SEU would affect processor performance
      • The performance penalty for re-generation depends on the bit error rate
      • Errors in configuration bits would affect the functionality of the circuit, which would set the re-generate/re-configure flag
      • Scrubbing can be done on demand to correct SEUs in configuration bits

  26. Summary
      • Triple Modular Redundancy and scrubbing to correct errors in FPGAs involve large penalties
      • Built-In-Error-Detection can be used to detect errors in data as well as configuration bits
      • Berger Check error detection minimizes penalties and scales better for higher data widths
      • The area-delay product of the Berger Check single-instruction-issue processor is about 70% of the area-delay product of a TMR processor

  27. Future Work
      • Running benchmark applications to estimate the penalty of re-generating the instruction/data
      • ALU design with a combination of Berger Check and Remainder-Parity Check to optimize area and performance penalties
