Cost-Efficient Soft Error Protection for Embedded Microprocessors
Jason Blome¹, Shuguang Feng¹, Shantanu Gupta¹, Scott Mahlke¹, Daryl Bradley²
¹University of Michigan ²ARM, Ltd.
The Soft Error Problem
[Figure: a transient fault in the logic feeding a clocked flip-flop (D, CLK, Q) is latched and becomes a soft error]
Fault Masking
[Figure: a register file and decoder executing mov r2, 4; mov r5, 8; add r6, r2, r5, with a clock waveform marking the setup (tsetup) and hold (thold) times around the latching edge]
• Logical: the faulted value does not affect the logical operation of the circuit
• Architectural/Software: the incorrect state is overwritten before it is read
• Latching-window: the fault pulse does not reach a state element within the latching window
• Electrical: the fault pulse is electrically attenuated by subsequent gates in the circuit
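Three of the four masking mechanisms can be illustrated with a few lines of Python. This is a minimal sketch, not the authors' analysis: the `(destination, sources)` trace format, the function names, and the rule that a pulse is latched only if it covers the whole setup/hold window are all simplifying assumptions. Electrical masking is left out because it needs a circuit-level delay and attenuation model.

```python
def logically_masked(gate, inputs, faulty_index):
    """Logical masking: does flipping one input bit leave the gate's output unchanged?"""
    good = gate(*inputs)
    flipped = list(inputs)
    flipped[faulty_index] ^= 1
    return gate(*flipped) == good


def architecturally_masked(trace, faulty_reg, after_index):
    """Architectural masking: is the corrupted register overwritten before it is read?

    trace: list of (destination, sources) pairs, one per instruction (hypothetical format).
    """
    for dest, srcs in trace[after_index + 1:]:
        if faulty_reg in srcs:
            return False  # read first: the error propagates
        if dest == faulty_reg:
            return True   # overwritten first: the error is masked
    return True           # never read again within this trace


def latched(pulse_start, pulse_width, edge, t_setup, t_hold):
    """Latching-window masking (simplified): the pulse is captured only if it
    covers the whole window [edge - t_setup, edge + t_hold]."""
    return pulse_start <= edge - t_setup and pulse_start + pulse_width >= edge + t_hold


def and_gate(a, b):
    return a & b


# mov r2, 4 ; mov r5, 8 ; add r6, r2, r5
TRACE = [("r2", ()), ("r5", ()), ("r6", ("r2", "r5"))]
```

For example, a fault in r6 right after the first `mov` is masked because the `add` overwrites r6 before anything reads it, while a fault in r2 at the same point reaches the `add`.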
Soft Error Rate Trends
[Figures: soft error rate contributions; sources: Mitra 2005, Shivakumar 2002]
Faults in combinational logic contribute an increasing share of the overall soft error rate.
Outline
• Soft error analysis setup
• Summary of fault analysis results
• Fault tolerance techniques
  • Register value cache
  • Strategic deployment of fault detectors
• Conclusion
Fault Analysis Framework
[Figure: a fault injection/error analysis framework in which a testbench runs a benchmark on a reference design and a test design of the ARM926EJ-S, with a fault injection scheduler, error checking and logging, and report generation. Modeled core blocks include Instruction Fetch, Instruction Decode, Register Bank, Mux Array, ALU, Shift, Multiply, MMU, instruction and data caches, Instruction and Data Address Logic, Write Buffer/Bus Interface, and the Data Interface.]
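The framework pairs a fault-free reference design with a test design that receives injected faults, and compares the two while a benchmark runs. The loop below is a minimal sketch of that flow under stated assumptions: the "design" is a hypothetical 8-bit accumulator with an unobserved scratch register (not the ARM926EJ-S), faults are single-bit flips of state, and an error is any mismatch at the observable output.

```python
import random


def step(state, inp):
    """Toy design (hypothetical): an 8-bit accumulator plus a scratch register
    whose value never reaches the output."""
    acc = (state["acc"] + inp) & 0xFF
    scratch = inp ^ 0x5A
    return {"acc": acc, "scratch": scratch}, acc  # (next state, observable output)


def run_campaign(inputs, num_injections, seed=0):
    """Inject one single-bit upset per run and compare test vs reference outputs."""
    rng = random.Random(seed)
    log = []
    for _ in range(num_injections):
        # Fault injection scheduler: choose when, where, and which bit to flip
        cycle = rng.randrange(len(inputs))
        reg = rng.choice(["acc", "scratch"])
        bit = rng.randrange(8)

        ref = {"acc": 0, "scratch": 0}
        test = dict(ref)
        outcome = "masked"
        for t, inp in enumerate(inputs):
            if t == cycle:
                test[reg] ^= 1 << bit  # upset a state element in the test design only
            ref, ref_out = step(ref, inp)
            test, test_out = step(test, inp)
            if ref_out != test_out:    # error checking against the reference design
                outcome = "error"
                break
        log.append((cycle, reg, bit, outcome))  # error logging
    return log


def report(log):
    """Report generation: fraction of injected faults that became visible errors."""
    return sum(outcome == "error" for *_, outcome in log) / len(log)
```

In this toy, every upset in `scratch` is masked (it never feeds the output) and every upset in `acc` becomes an error, which is the kind of per-structure breakdown the real framework reports at much larger scale.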
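Observed Error Rates
[Charts: observed error rates for faults occurring in registers and for faults occurring in combinational logic; plotted values include 94%, 7%, 16%, and 4%]
At the software interface, the error rates for the two fault populations are within 3% of each other.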
Targeting the Faults that Count
• The ARM926EJ-S register file consumes 8.7% of the total core area
• It is responsible for 57.4% of architectural errors
• Register file area is dominated by combinational logic
• ECC: what does it cost, and how effective is it?
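A quick normalization of the two percentages shows why the register file is the first target. Treating share of core area as the only normalization (a simplifying assumption), the register file's share of architectural errors per unit of area share, compared with the rest of the core, is:

```latex
\frac{57.4\%}{8.7\%} \approx 6.6,
\qquad
\frac{100\% - 57.4\%}{100\% - 8.7\%} = \frac{42.6\%}{91.3\%} \approx 0.47,
\qquad
\frac{6.6}{0.47} \approx 14
```

So, per unit of area, the register file produces roughly 14 times as many architectural errors as the rest of the core.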
The Register Value Cache
[Figure: the register file (decoder, entries 0 to 5, read/write address and data, read result) beside a register value cache whose entries are compared (CMP) with register file reads; a mismatch (x) triggers a stall and a CRC check]
The Register Value Cache
[Figure: RVC organization with an index array, valid bits, and a value array holding previous read values; write data enters with its CRC, read data is compared (CMP) against the cached value, and a CRC check raises an error. Operations shown: read, check, and write.]
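The slides describe three RVC operations (read, check, write) over an index array, valid bits, and a value array, with a CRC used to arbitrate mismatches. The following is a minimal behavioral sketch under loudly stated assumptions: the cache is small, fully associative, uses FIFO replacement, and is filled by calling `write` (the slide labels "previous read values", so a real design may fill on reads instead); a CRC is stored next to each cached value; CRC-32 stands in for whatever narrower hardware CRC is used. All names are hypothetical, not the authors' hardware design.

```python
import zlib
from collections import OrderedDict


def crc(value):
    """Stand-in CRC over a 32-bit register value (CRC-32 chosen for illustration)."""
    return zlib.crc32((value & 0xFFFFFFFF).to_bytes(4, "little"))


class RegisterValueCache:
    """Hypothetical behavioral model of a register value cache (RVC)."""

    def __init__(self, num_entries=4):
        self.num_entries = num_entries
        self.entries = OrderedDict()  # register index -> (value, CRC of value)

    def write(self, reg, value):
        """Write operation: record the value and its CRC (FIFO replacement assumed)."""
        if reg in self.entries:
            del self.entries[reg]
        elif len(self.entries) >= self.num_entries:
            self.entries.popitem(last=False)  # evict the oldest entry
        self.entries[reg] = (value, crc(value))

    def check_read(self, reg, read_value):
        """Read and check operations: compare a register file read with the cached copy."""
        if reg not in self.entries:
            return "unchecked"  # cache miss: this read is not covered
        cached, stored_crc = self.entries[reg]
        if cached == read_value:
            return "ok"
        # Mismatch: stall and use the CRC to decide which copy is corrupt
        if crc(cached) == stored_crc:
            return "register_file_error"  # cached copy intact: recover from it
        # Assumption: if the cached copy is corrupt, the register file read is trusted
        return "cache_error"
```

Because CRC-32 never collides on two distinct 32-bit inputs, a single corrupted cached value is always caught by the CRC in this model; a narrower hardware CRC would trade some of that certainty for area.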
Check CRC Example
[Figure: mov r2, 4 and mov r5, 8 write 4 and 8 (each with a CRC) into the register file, and the register cache holds 4 and 8; after the fault, the intended add r3, r2, r5 appears as add r3, r1, r4, and the mismatches (x) against the cached values trigger a CRC check]
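The example can be replayed in a few lines. This sketch reflects our reading of the slide, with assumptions stated: the fault corrupts the register file's address decoder so that r2 and r5 are read from r1 and r4, and the value cache is indexed by the intended register number, so its copies still match what the program meant to read. The decoder functions and helper names are hypothetical; the CRC arbitration step that follows a mismatch is omitted.

```python
def read_operands(register_file, decode, srcs):
    """Read source registers through a (possibly faulty) address decoder."""
    return [register_file[decode(r)] for r in srcs]


def mismatched(value_cache, srcs, values):
    """Registers whose read value disagrees with the copy cached under the intended name."""
    return [r for r, v in zip(srcs, values) if r in value_cache and value_cache[r] != v]


def healthy_decoder(reg):
    return reg


def faulty_decoder(reg):
    # Hypothetical decoder fault matching the slide: r2 -> r1 and r5 -> r4
    return reg - 1 if reg in (2, 5) else reg


register_file = [0] * 16
value_cache = {}
for dest, imm in ((2, 4), (5, 8)):  # mov r2, 4 ; mov r5, 8
    register_file[dest] = imm
    value_cache[dest] = imm

SRCS = [2, 5]  # add r3, r2, r5
```

With the healthy decoder both operands match their cached copies; with the faulty decoder the add reads 0 and 0 from r1 and r4, and both operands are flagged, which is the point at which the RVC would stall and check CRCs.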
RVC Fault Coverage
[Chart: RVC fault coverage, annotated 57.4%]
RVC Overhead
[Chart: RVC overhead]
What About the Rest?
• Leverage fault fanout to place detectors at the state elements that faults are most likely to reach
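One way to act on this idea is greedy maximum coverage: repeatedly put a detector on the flip-flop reached by the most still-uncovered fault sites, until a detector budget is spent. This is a hedged sketch, not necessarily the authors' placement method; it assumes each fault site's fanout (the set of flip-flops its faults can reach) is already known, and that a fault site counts as covered once any flip-flop in its fanout has a detector.

```python
def place_detectors(fanout, budget):
    """Greedy max-coverage detector placement (hypothetical sketch).

    fanout: dict mapping each fault site to the set of flip-flops it can reach.
    budget: maximum number of flip-flops that may receive a detector.
    Returns (chosen flip-flops in order, set of covered fault sites).
    """
    uncovered = set(fanout)
    chosen = []
    candidates = set().union(*fanout.values()) if fanout else set()
    while uncovered and len(chosen) < budget:
        # Flip-flop reached by the most still-uncovered fault sites (ties: sorted order)
        best = max(
            sorted(candidates - set(chosen)),
            key=lambda ff: sum(ff in fanout[s] for s in uncovered),
            default=None,
        )
        if best is None:
            break
        gained = {s for s in uncovered if best in fanout[s]}
        if not gained:
            break  # no remaining flip-flop adds coverage
        chosen.append(best)
        uncovered -= gained
    return chosen, set(fanout) - uncovered
```

Sweeping `budget` upward traces out a coverage-versus-cost curve, which is the kind of tradeoff the later coverage slides plot.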
Fault Fanout
[Figure: fault fanout]
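Fanout here is taken to mean the set of state elements a fault at a given gate can propagate to through combinational logic (our interpretation; the slide gives only the title). A breadth-first search over the netlist computes it. The netlist format below (a dict from each node to the nodes it drives) and all names are hypothetical.

```python
from collections import deque


def fault_fanout(netlist, flip_flops, site):
    """Return the flip-flops reachable from `site` through combinational logic.

    netlist: dict mapping each node to the nodes its output drives (hypothetical format).
    flip_flops: set of nodes that are state elements; traversal stops at them.
    """
    reached, seen = set(), {site}
    queue = deque([site])
    while queue:
        node = queue.popleft()
        for succ in netlist.get(node, ()):
            if succ in seen:
                continue
            seen.add(succ)
            if succ in flip_flops:
                reached.add(succ)  # state element: record it, do not traverse through
            else:
                queue.append(succ)
    return reached
```

Stopping at flip-flops keeps the analysis to a single clock cycle; following faults across several cycles would require simulation rather than a static walk.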
Transient Fault Detector
[Figure: a main flip-flop (D, CLK, Q) paired with a shadow latch clocked through a delay; disagreement between the two raises an Error signal]
Source: "A Self-Tuning DVS Processor Using Delay-Error Detection and Correction," S. Das, 2006.
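The detector compares the value captured by the main flip-flop at the clock edge with the value a shadow latch captures after a delay. A short glitch present at the edge is gone by the delayed sample, so the two disagree. The model below is a simplified sketch: sampling is instantaneous (setup, hold, and metastability are ignored), signals are hypothetical lists of `(time, value)` transitions, and the function names are ours.

```python
def sample(signal, t):
    """Value of a piecewise-constant signal at time t.

    signal: (time, value) transitions sorted by time (hypothetical format);
    the first entry gives the initial value.
    """
    value = signal[0][1]
    for when, v in signal:
        if when > t:
            break
        value = v
    return value


def shadow_latch_check(signal, clock_edge, delay):
    """Return (value captured by the main flip-flop, error flag)."""
    main = sample(signal, clock_edge)            # main flip-flop samples at the clock edge
    shadow = sample(signal, clock_edge + delay)  # shadow latch samples after the delay
    return main, main != shadow
```

A glitch from 9.8 to 10.3 around a clock edge at 10.0 with a delay of 1.0 is flagged, but a pulse lasting from 9.5 to 12 spans both sampling points and escapes, so the delay bounds the widest transient this detector can catch.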
Glitch Detector Coverage
[Charts: coverage (percent) versus power overhead (percent) and versus area overhead (percent)]
Combined Technique Coverage
[Charts: coverage (percent) versus power overhead (percent) and versus area overhead (percent)]
Conclusion
• Circuit-level soft error analysis offers significant insight
• Protecting against faults in combinational logic does not require structural duplication
• Coverage versus cost tradeoffs are available
• A compromise between coverage and cost yields significant benefits:
  • 85% fault coverage for only 5.5% area overhead
  • 2-3x increase in MTTF
Questions?