Designing and Testing Fault-Tolerant Techniques for SRAM-based FPGAs
Paper by F.L. Kastensmidt, G. Neuberger, L. Carro, R. Reis
Talk by Nick Boyd
Overview of the paper
• Exploring techniques for detecting and mitigating radiation-induced faults in FPGAs
• Why?
  • Drive to use commercial off-the-shelf (COTS) parts to minimize cost and development time for space applications
  • As transistor feature sizes shrink, radiation becomes an issue even at ground level
Background – Radiation and Transistors
• Incident radiation deposits energy
• Creation of electron-hole pairs and secondary ionization produces a transient current pulse
• Can change a ‘0’ to a ‘1’ or vice versa, often called a “bit flip”
• In combinational logic: Single Event Transient (SET)
• In sequential logic (or memory): Single Event Upset (SEU)
Background – Transistor Faults in FPGAs
• In FPGAs there are further considerations
  • SEU in the configuration SRAM (logic, routing)
  • SET in combinational FPGA fabric
  • SEU in BlockRAM
Background – Transistor Faults in FPGAs
• Effects of SEU in configuration fabric
Techniques – TMR
• TMR = Triple Modular Redundancy
• Logic is triplicated and the result is accepted by majority vote
• Everything is triplicated, including combinational logic, sequential logic, routing, and I/O
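The majority vote above can be illustrated in software. This is only a minimal sketch, not the paper's hardware voter: `majority_vote`, `tmr`, `upset_copy`, and `flip_mask` are hypothetical names, and the bit flip is injected into one copy's output purely for illustration.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority: each output bit takes the value at least two copies agree on."""
    return (a & b) | (a & c) | (b & c)

def tmr(func, x, upset_copy=None, flip_mask=0):
    """Run three copies of func; optionally flip bits in one copy's output to mimic an upset."""
    outputs = [func(x) for _ in range(3)]
    if upset_copy is not None:
        outputs[upset_copy] ^= flip_mask  # illustrative bit flip in a single copy
    return majority_vote(*outputs)

def times3(x):
    """Example 8-bit function to protect (hypothetical)."""
    return (x * 3) & 0xFF

clean = tmr(times3, 0x15)
upset = tmr(times3, 0x15, upset_copy=1, flip_mask=0x08)
print(clean == upset)  # the upset in one copy is outvoted by the other two
```

As long as only one of the three copies is upset, the voter masks the error without any recomputation, which is why TMR carries an area cost but no time penalty.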
Techniques – TMR
• Benefits
  • Able to detect and correct SEU/SET anywhere in the FPGA
  • No performance penalty
• Drawbacks
  • Very large area/resource penalty (particularly problematic for I/O pads)
Techniques – DMR-CED
• A new technique proposed by the authors of this paper
• DMR-CED: Double Modular Redundancy with Concurrent Error Detection
• Motivation: match TMR's reliability in detecting and correcting errors, with less area overhead
Background: CED
• CED = Concurrent Error Detection
• Exploits some property of the logic block to find an error
• Time-redundant examples:
  • bit-wise inversion
  • re-computing with shifted operands (RESO)
  • re-computing with swapped operands (RESWO)
Background: CED
• The result is calculated from the direct input and stored
• The input is then encoded, a new result is calculated, and that result is decoded
• The two outputs are compared; they should be equal
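The three steps above can be sketched for re-computing with shifted operands (RESO) on an adder. This is a software illustration under stated assumptions, not the paper's circuit: `add_unit`, `reso_add`, and the stuck-at fault model on a single output bit are hypothetical, and the adder is assumed wide enough (`WIDTH`) to hold the shifted sum.

```python
WIDTH = 10  # assumed adder width: 8-bit operands, one carry bit, one extra bit for the shift

def add_unit(a, b, stuck_bit=None, stuck_val=0):
    """Adder whose output bit `stuck_bit` is forced to `stuck_val` (None means fault-free)."""
    total = (a + b) & ((1 << WIDTH) - 1)
    if stuck_bit is not None:
        if stuck_val:
            total |= 1 << stuck_bit
        else:
            total &= ~(1 << stuck_bit)
    return total

def reso_add(a, b, stuck_bit=None, stuck_val=0):
    """Return (direct result, True if the direct and recomputed results agree)."""
    direct = add_unit(a, b, stuck_bit, stuck_val)  # step 1: direct result, stored
    # step 2: encode (shift left), recompute on the same hardware, decode (shift right)
    recomputed = add_unit(a << 1, b << 1, stuck_bit, stuck_val) >> 1
    return direct, direct == recomputed            # step 3: compare

print(reso_add(100, 27))        # fault-free: results agree
print(reso_add(100, 27, 7, 1))  # output bit 7 stuck at 1: results disagree
```

The shift moves each result bit onto a different physical output, so a fault on one output corrupts the two computations in different bit positions and the comparison exposes it.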
Back to DMR-CED
• How can we use CED?
• Only duplicate, rather than triplicate, the combinational logic
• When the two copies disagree, use CED to determine which module is faulty
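A rough software sketch of this idea follows, assuming a single faulty module and using shifted-operand recomputation as the CED step. The names `add_unit`, `self_check`, and `dmr_ced_add` and the `(bit, value)` stuck-at fault tuples are hypothetical, and the paper's actual hardware scheme may differ in detail.

```python
def add_unit(a, b, fault=None):
    """Adder copy; fault=(bit, value) forces one output bit, None means fault-free."""
    total = a + b
    if fault is not None:
        bit, val = fault
        total = (total | (1 << bit)) if val else (total & ~(1 << bit))
    return total

def self_check(a, b, result, fault):
    """CED step: recompute with shifted operands on the same copy and compare."""
    return (add_unit(a << 1, b << 1, fault) >> 1) == result

def dmr_ced_add(a, b, fault0=None, fault1=None):
    r0 = add_unit(a, b, fault0)
    r1 = add_unit(a, b, fault1)
    if r0 == r1:
        return r0  # copies agree: accept the result, no CED needed
    # Copies disagree: the copy that passes its own check is taken as fault-free.
    if self_check(a, b, r0, fault0):
        return r0
    return r1      # assumes only one copy is faulty

print(dmr_ced_add(100, 27, fault0=(7, 1)))  # fault in copy 0
print(dmr_ced_add(100, 27, fault1=(0, 0)))  # fault in copy 1
```

The extra recomputation only runs on disagreement in this sketch; in hardware the time-redundant step costs clock cycles, which is consistent with the speed penalty noted in the evaluation.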
Evaluating DMR-CED Effectiveness: Methodology
• Three sample sequential circuits tested
  • 8-bit multiplier
  • 8-bit ALU
  • FIR filter
• Sample circuits were generated, then each node was replaced with a multiplexer that passes either the correct value, ‘0’, or ‘1’
• This makes it possible to simulate every possible SEU fault
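The multiplexer-per-node injection can be sketched on a toy circuit. The 1-bit full-adder netlist below is a stand-in chosen for brevity, not one of the paper's test circuits, and `NETLIST`, `simulate`, and `fault_campaign` are hypothetical names.

```python
from itertools import product

# Hypothetical netlist for a 1-bit full adder, in evaluation order: (node, gate, inputs)
NETLIST = [
    ("x1", "xor", ("a", "b")),
    ("s", "xor", ("x1", "cin")),
    ("a1", "and", ("a", "b")),
    ("a2", "and", ("x1", "cin")),
    ("cout", "or", ("a1", "a2")),
]
GATES = {
    "xor": lambda p, q: p ^ q,
    "and": lambda p, q: p & q,
    "or": lambda p, q: p | q,
}

def simulate(inputs, fault=None):
    """Evaluate the netlist; fault=(node, value) acts as the mux forcing that node."""
    values = dict(inputs)
    if fault is not None and fault[0] in values:
        values[fault[0]] = fault[1]          # fault on a primary input
    for name, gate, ins in NETLIST:
        out = GATES[gate](*(values[i] for i in ins))
        if fault is not None and fault[0] == name:
            out = fault[1]                   # mux selects the forced '0' or '1'
        values[name] = out
    return values["s"], values["cout"]

def fault_campaign():
    """For every (node, forced value), count input vectors whose outputs become wrong."""
    nodes = ["a", "b", "cin"] + [name for name, _, _ in NETLIST]
    results = {}
    for node, forced in product(nodes, (0, 1)):
        errors = 0
        for a, b, cin in product((0, 1), repeat=3):
            ins = {"a": a, "b": b, "cin": cin}
            if simulate(ins, (node, forced)) != simulate(ins):
                errors += 1
        results[(node, forced)] = errors
    return results

for fault, errors in fault_campaign().items():
    print(fault, errors)
```

Running the same campaign against a protected version of the circuit and counting the faults that still reach the outputs would give a coverage figure in the spirit of the evaluation described above.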
Evaluating DMR-CED
• Benefits
  • Reduces the area required for combinational logic (by a significant amount in some cases)
• Drawbacks
  • Significantly more complicated due to CED
  • The CED scheme has to be chosen and optimized separately for each combinational circuit being protected
  • Speed reduced by as much as 50%
Comments on the original paper
• Reasonably well written and complete
• Necessary to read the references to understand the minutiae of the underlying principles
• DMR-CED is probably only useful under very specific conditions