
Dealing with Multiple Simultaneous Faults in Future Technologies

Doctoral candidate: Carlos Arthur Lang Lisbôa. Advisor: Luigi Carro.


Presentation Transcript


  1. Dealing with Multiple Simultaneous Faults in Future Technologies. Doctoral candidate: Carlos Arthur Lang Lisbôa. Advisor: Luigi Carro

  2. Why Multiple Simultaneous Faults ? • Future technologies (2010 and beyond) • very small transistors and fewer electrons to form the channel (→ SETs) • transient pulses due to radiation attack will last longer than the propagation delays of gates and cycle times • devices will be more sensitive to the effects of electromagnetic noise, neutrons and alpha particles

  3. Single Event Upset Origin [figure: grid of bit values illustrating a single event upset]

  4. Why Should One Study Multiple Faults ? Changes in paradigm: • Gates will behave statistically, producing correct outputs only a fraction of the time • Faster devices → cycle times shorter than the duration of transient pulses

  5. How to Deal with Multiple Faults ? • New paradigm: multiple simultaneous faults • new fault tolerance techniques will be required (TMR will no longer provide enough protection)

  6. How to Deal with Multiple Faults ? • New paradigm: multiple simultaneous faults • new fault tolerance techniques will be required (TMR will no longer provide enough protection) • How to deal with this problem ? • new materials and manufacturing technologies must be developed OR • new design approaches must be taken

  7. How to Deal with Multiple Faults ? • New paradigm: multiple simultaneous faults • new fault tolerance techniques will be required (TMR will no longer provide enough protection) • How to deal with this problem ? • new design approaches must be taken (our bet !)

  8. Research Evolution - Overview [timeline figure, 2004 to 2007: Stochastic Operators (IOLTS 04) • Bit Stream Operators (DFT 04, WDES 04) • TMR and Analog Voter (ETS 05, SBCCI 05) • Research Report (SRC 2005 TechCon) • Research Report (DATE 06 PhD Forum) • MemProc (LATW 06, ETS 06) • Majority Logic (DFT 06) • Online Hardening (DFT 06) • Low cost redundancy • Statistical Computation (VTS 07, submitted)]

  9. Published Papers • Lisbôa, C. and Carro, L., “Arithmetic Operators Robust to Multiple Simultaneous Upsets”, 10th IEEE International On-Line Testing Symposium - IOLTS 2004, IEEE Computer Society, Funchal, Madeira Island, Portugal, July 2004. • Lisbôa, C. and Carro, L., “Highly Reliable Arithmetic Multipliers for Future Technologies”, in Proceedings of the International Workshop on Dependable Embedded Systems - WDES 2004 - in conjunction with the 23rd International Symposium on Reliable Distributed Systems - SRDS 2004, pp. 13-18. Edited by Becker, L. B. and Kaiser, J., Florianópolis, October 17, 2004. • Lisbôa, C. and Carro, L., “Arithmetic Operators Robust to Multiple Simultaneous Upsets”, in Proceedings of the 19th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems - DFT 2004, pp. 289-297, ISBN 0-7695-2241-6. IEEE Computer Society, New York, October 2004.

  10. Published Papers • Lisbôa, C. A. L., Carro, L. and Cota, E., “RobOps - Arithmetic Operators for Future Technologies”, 10th European Test Symposium - ETS 2005, Tallinn, Estonia, May 2005. • Lisbôa, C. A. L., Schüler, E. and Carro, L., “Going Beyond TMR for Protection Against Multiple Faults”, in Proceedings of the 18th Symposium on Integrated Circuits and Systems Design - SBCCI 2005, September 2005. • Rhod, E.; Lisbôa, C. A. L. and Carro, L., “Using Memory to Cope with Simultaneous Transient Faults”, in Proceedings of the 7th Latin-American Test Workshop - LATW 2006, pp. 151-156, IEEE Computer Society, New York, March 2006.

  11. Published Papers • Rhod, E.; Lisbôa, C. A. L.; Michels, Á. and Carro, L., “Fault Tolerance Against Multiple SEUs using Memory-Based Circuits to Improve the Architectural Vulnerability Factor”, in Informal Digest of Papers of the 11th IEEE European Test Symposium - ETS 2006, pp. 229-234, IEEE Computer Society, New York, May 2006. • Michels, Á., Petroli, L., Lisbôa, C. A. L., Kastensmidt, F. and Carro, L. “SET Fault Tolerant Combinational Circuits Based on Majority Logic”, in Proceedings of the 21st IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems - DFT 2006, pp. 345-352, IEEE Computer Society, Los Alamitos, CA, October 2006. • Lisbôa, C. A. L., Carro, L., Sonza Reorda, M., and Violante, M. “Online Hardening of Programs against SEUs and SETs”, in Proceedings of the 21st IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems - DFT 2006, pp. 280-288, IEEE Computer Society, Los Alamitos, CA, October 2006.

  12. Research Approaches - 2004 / 2005 • Use of stochastic operators • Use of bit stream operators • Ensuring voter reliability to use n-MR while dealing with multiple simultaneous faults

  13. Research Evolution - 2004 / 2005 IOLTS 2004 Stochastic Operators

  14. Research Evolution - 2004 / 2005 IOLTS 2004 Stochastic Operators OK for some DSP Applications

  15. Research Evolution - 2004 / 2005 DFT 2004 WDES 2004 Bit Stream Operators Looking for more speed Stochastic Operators

  16. Research Evolution - 2004 / 2005 DFT 2004 WDES 2004 Bit Stream Operators Small footprint and fast Looking for more speed Stochastic Operators

  17. Research Evolution - 2004 / 2005 Bit Stream Operators Looking for more speed Looking for tolerant converter Stochastic Operators Analog Voter ETS 2005 SBCCI 2005

  18. Research Evolution - 2004 / 2005 Bit Stream Operators Tolerant to multiple faults in n-MR solutions Looking for more speed Looking for tolerant converter Stochastic Operators TMR and Analog Voter ETS 2005 SBCCI 2005

  19. Research Evolution - 2004 / 2005 SRC 2005 TechCon Bit Stream Operators Research Report Looking for more speed Looking for tolerant converter Stochastic Operators TMR and Analog Voter

  20. Research approach - 2006 / 2007 • cooperation with peers • use of memory for computation • analog voter + majority logic • use of an I-IP to harden instructions

  21. Research approach - 2006 / 2007 • cooperation with peers • use of memory for computation • analog voter + majority logic • use of an I-IP to harden instructions • low cost redundancy using statistical parallel computation

  22. Research Evolution - 2006 / 2007 DATE 06 PhD Forum Research Report

  23. Research Evolution - 2006 / 2007 DATE 06 PhD Forum Research Report MemProc LATW 06 ETS 06

  24. Research Evolution - 2006 / 2007 DATE 06 PhD Forum Research Report MemProc Majority Logic LATW 06 ETS 06 DFT 06

  25. Research Evolution - 2006 / 2007 DATE 06 PhD Forum Research Report Low cost redundancy MemProc Majority Logic LATW 06 ETS 06 DFT 06

  26. Research Evolution - 2006 / 2007 DATE 06 PhD Forum DFT 06 Online Hardening Research Report Low cost redundancy MemProc Majority Logic LATW 06 ETS 06 DFT 06

  27. Research Evolution - 2006 / 2007 DATE 06 PhD Forum DFT 06 Online Hardening Research Report Statistical Computation Low cost redundancy MemProc Majority Logic VTS 07 (submitted) LATW 06 ETS 06 DFT 06

  28. Current research - motivation • future technologies: faster devices; transient pulse duration scaling not proportional to speed scaling → transient pulses will last longer than one cycle

  29. Current research - motivation • future technologies: faster devices; transient pulse duration scaling not proportional to speed scaling → transient pulses will last longer than one cycle • techniques relying on time redundancy will fail

  30. Current research - motivation • alternative approach (space redundancy): current solutions have area overhead ≥ 100%; small granularity does not provide low overhead (what can one do with 50% of a MOSFET ?)

  31. Current research - motivation • proposed solution: fingerprinting; parallel processing on a subset of possible inputs; small transient fault probability (desired: 0%) • alternative approach (space redundancy): current solutions have area overhead ≥ 100%; small granularity does not provide low overhead (what can one do with 50% of a MOSFET ?)

  32. Current research - focus • use of low cost redundancy and statistical computation to cope with transient faults [figure: main circuit with inputs and output, plus a random checker that produces an error signal]

  33. Sample application • Freivalds: matrix multiplication correctness • given matrices A and B, n x n • given one algorithm that calculates C = A x B • goal: check if the algorithm performs correctly by executing thousands of multiplications and comparing the results • naive solution: calculate again and compare, $O(n^3)$

  34. Sample application • Freivalds technique 1. generate a random vector r, with values from {0,1} 2. compute vector $Cr = C \times r$, $O(n^2)$ 3. compute vector $ABr = A \times (B \times r)$, $O(n^2)$ 4. if $C \neq A \times B$, then $\Pr[ABr = Cr] \leq 1/2$. After k independent repetitions of steps 1, 2 and 3: $\Pr[ABr = Cr] \leq 1/2^k$
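
The four steps of Freivalds technique can be sketched in Python as follows. This is a minimal illustration, not the implementation evaluated in the presentation; the names `mat_vec` and `freivalds_check`, the list-of-rows matrix layout, and the default of 20 rounds are assumptions made for the example.

```python
import random


def mat_vec(m, v):
    """Multiply an n x n matrix (list of rows) by a vector in O(n^2)."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def freivalds_check(a, b, c, k=20, rng=random):
    """Return False as soon as one round proves C != A x B.

    A True result can still be wrong, with probability at most 1/2**k.
    """
    n = len(a)
    for _ in range(k):
        r = [rng.randint(0, 1) for _ in range(n)]  # step 1: random 0/1 vector
        cr = mat_vec(c, r)                         # step 2: C x r
        abr = mat_vec(a, mat_vec(b, r))            # step 3: A x (B x r)
        if abr != cr:                              # step 4: mismatch proves an error
            return False
    return True


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(freivalds_check(a, b, [[19, 22], [43, 50]]))  # correct product: True
```

Each round costs three matrix-vector products instead of a full matrix multiplication, which is where the saving over the naive recomputation comes from.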

  35. Sample application • Our extension of Freivalds technique 1. generate a random vector r, with values from {0,1} 2. generate a vector rc with $rc_i = \mathrm{not}(r_i)$ for $i = 1, \dots, n$ 3. compute $Cr = C \times r$ and $Crc = C \times rc$ 4. compute $ABr = A \times (B \times r)$ and $ABrc = A \times (B \times rc)$ 5. if $ABr \neq Cr$ OR $ABrc \neq Crc$, then $\Pr[ABr \neq Cr] = 1$
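
The extension with the complemented vector rc can be sketched as below. The function name `complement_check` and the matrix layout are assumptions for illustration only. The sketch relies on one observation: a single corrupted element of C sits in some column j, and exactly one of r_j and rc_j equals 1, so one of the two comparisons must fail whatever r was drawn.

```python
def mat_vec(m, v):
    """Multiply an n x n matrix (list of rows) by a vector in O(n^2)."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def complement_check(a, b, c, r):
    """Check C against A x B using r and its bitwise complement rc.

    Returns True when both comparisons agree, False when either one fails.
    """
    rc = [1 - x for x in r]                              # step 2: rc_i = not(r_i)
    ok_r = mat_vec(a, mat_vec(b, r)) == mat_vec(c, r)    # ABr vs Cr
    ok_rc = mat_vec(a, mat_vec(b, rc)) == mat_vec(c, rc)  # ABrc vs Crc
    return ok_r and ok_rc


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    corrupted = [[19, 22], [43, 51]]  # one wrong element in C
    print(complement_check(a, b, corrupted, [1, 0]))  # False
```

Note that this sketch only demonstrates certain detection for a single wrong element; errors spread over several elements of one row could in principle cancel in both comparisons.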

  36. Sample Implementation • matrix multiplier with checker • application of Freivalds technique [figure: the main circuit takes inputs (A, B) and computes the output C ← A * B; the checker computes Cr ← C * r and ABr ← A * (B * r) and drives the error signal]

  37. Sample Implementation Area overhead (# of gates)

  38. Sample implementation Time overhead (# of instructions)

  39. Sample implementation Fault injection results

  40. PhD program requirements • 36 credits • qualifying examination • proficiency exams in 2 foreign languages • academic week seminar • Thesis proposal: February 2007 • Thesis presentation: December 2007

  41. Questions ?    

  42. Using Stochastic Operators • SEU induced transient errors are of random nature • Stochastic operators rely on randomness to produce approximate results • The injection of random faults in the input signals processed by stochastic operators did not impact the precision of the results [table: % errors in 1,000 additions, conventional adder vs. stochastic adder with 0, 2, 4 and 8 faults; values in extraction order: 0.0000, 0.1412, 0.2580, 0.1768, 0.2196]

  43. Using Stochastic Operators • SEU induced transient errors are of random nature • Stochastic operators rely on randomness to produce approximate results • The injection of random faults in the input signals processed by stochastic operators did not impact the precision of the results • Several application areas (DSP) can deal with approximate values and still produce acceptable results (outputs)

  44. Using Stochastic Operators • Benefit: reduced area of the operators [figures: stochastic adder circuit with streams S1, S2, S3 and Sum; stochastic multiplier circuit with input streams 1001000100001011 and 1000100110011010 and output stream 1000000100001010, the bitwise AND of the inputs]
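
The multiplier figure on this slide can be reproduced with a few lines of Python. The sketch assumes the usual unipolar stochastic encoding, in which a stream represents the fraction of its bits that are 1; the slide does not state the encoding, and the names `stream_value` and `stochastic_multiply` are made up for the example. Treating the first and third bit strings of the figure as inputs and the middle one as output is also a reading of the transcript, supported by the fact that the middle string is exactly their bitwise AND.

```python
def stream_value(bits):
    """Value of a stream under the assumed unipolar encoding: fraction of 1s."""
    return bits.count("1") / len(bits)


def stochastic_multiply(x, y):
    """Multiply two equally long streams with one AND gate per bit position."""
    return "".join("1" if a == "1" and b == "1" else "0" for a, b in zip(x, y))


x = "1001000100001011"  # input stream from the slide
y = "1000100110011010"  # input stream from the slide
product = stochastic_multiply(x, y)

if __name__ == "__main__":
    print(product)                           # 1000000100001010, as on the slide
    print(stream_value(x), stream_value(y))  # 0.375 0.4375
    print(stream_value(product))             # 0.25, an approximation of the product
```

With only 16 bits the result is a coarse approximation of 0.375 x 0.4375, which matches the slides' remark that stochastic operators produce approximate results.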

  45. Using Bit Stream Operators • Computation principles similar to those of the stochastic adder and multiplier • Operators can produce bit streams which represent the exact results of the operation [figure: Proposed Multiplication Algorithm - bit stream product; the bits of factor F1 (F1_2 F1_1 F1_0) are combined with the bits of factor F2 (F2_2 F2_1 F2_0) into partial products F2_j F1_i that fill the stream b48 .. b0; the count of 1’s in the stream is equal to the product value]
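
One way to build a 49-bit stream whose count of 1's equals the product of two 3-bit factors is sketched below. It is an interpretation of the figure, not the circuit from the papers: each partial product F1_i AND F2_j is simply repeated 2^(i+j) times, and the order of the bits inside the stream is an assumption. The function name `bit_stream_product` is hypothetical.

```python
def bit_stream_product(f1, f2, bits=3):
    """Return a list of 0/1 values whose number of 1s equals f1 * f2.

    Each partial product (bit i of f1 AND bit j of f2) carries weight
    2**(i + j), so it is replicated that many times in the stream.
    """
    stream = []
    for j in range(bits):
        for i in range(bits):
            partial = (f1 >> i) & (f2 >> j) & 1
            stream.extend([partial] * (1 << (i + j)))
    return stream


if __name__ == "__main__":
    s = bit_stream_product(5, 6)
    print(len(s), sum(s))  # 49 30
```

For 3-bit factors the stream length is (1 + 2 + 4)^2 = 49, which matches the b48 .. b0 labels in the figure.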

  46. Using Bit Stream Operators • Computation principles similar to those of the stochastic adder and multiplier • Operators can produce bit streams which represent the exact results of the operation • Redundancy is added to the bit streams in order to withstand multiple bit flips [figure: Adding robustness to the bit stream through redundancy; each bit b48 .. b0 is repeated 8 times, plus 4 (+4), so that the total count of 1’s = 8 * product + 4]
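
The relation "total count of 1's = 8 * product + 4" suggests a simple way to see why the redundant stream tolerates bit flips. The sketch below is an assumption-laden illustration: it takes the "+4" to be four extra 1 bits appended to the stream, and it decodes by integer division by 8, neither of which is spelled out on the slide. The names `encode` and `decode` are hypothetical.

```python
def encode(stream):
    """Repeat every bit 8 times and append four 1s (assumed meaning of '+4')."""
    out = []
    for bit in stream:
        out.extend([bit] * 8)
    return out + [1, 1, 1, 1]


def decode(redundant):
    """Recover the product from the count of 1s: floor(count / 8)."""
    return sum(redundant) // 8


if __name__ == "__main__":
    original = [1] * 5 + [0] * 44   # a 49-bit stream representing the value 5
    protected = encode(original)    # count of 1s is 8 * 5 + 4 = 44
    protected[0] = 0                # flip a few bits
    protected[1] = 0
    protected[100] = 1
    print(decode(protected))        # 5
```

Under these assumptions the offset of 4 centers the count inside its interval of 8, so the decoded value survives a net change of up to 4 bits flipped from 1 to 0 or up to 3 bits flipped from 0 to 1.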

  47. Using Bit Stream Operators • Computation principles similar to those of the stochastic adder and multiplier • Operators can produce bit streams which represent the exact results of the operation • Redundancy is added to the bit streams in order to withstand multiple bit flips • Conversion of bit streams to binary coded values is delayed as much as possible, and conversion circuits must use TMR or n-MR for protection against faults

  48. Using Bit Stream Operators • Computation principles similar to those of the stochastic adder and multiplier • Operators can produce bit streams which represent the exact results of the operation • Redundancy is added to the bit streams in order to withstand multiple bit flips • Conversion of bit streams to binary coded values is delayed as much as possible, and conversion circuits must use TMR or n-MR for protection against faults • Issues to be further investigated: size of bit streams and area of the conversion circuits

  49. What is Wrong with TMR ? • TMR protects only against single faults in one of the modules [figure: Modules 1, 2 and 3 each send a correct output to the voter, which produces a correct output]

  50. What is Wrong with TMR ? • TMR protects only against single faults in one of the modules [figure: Module 2 sends a wrong output, Modules 1 and 3 send correct outputs, and the voter still produces a correct output]
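
The limitation described on these two slides can be shown with a plain majority voter. This is a generic sketch of triple modular redundancy, assuming an ideal (fault-free) voter; the function name `majority_vote` is chosen for the example.

```python
def majority_vote(a, b, c):
    """Bitwise 2-out-of-3 majority of three module outputs."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    correct = 1
    # Single fault (module 2 wrong): the voter masks it.
    print(majority_vote(correct, 0, correct))  # 1
    # Two simultaneous faults (modules 1 and 2 wrong): the voter is outvoted.
    print(majority_vote(0, 0, correct))        # 0
```

As soon as two modules fail at the same time, the faulty outputs form the majority, which is the situation the multiple simultaneous fault scenario of the presentation is concerned with.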
