1 / 45

Challenges of Future: Robust Design against Soft Errors with Low Power

Challenges of Future: Robust Design against Soft Errors with Low Power. Prof. Dhiraj Pradhan Department of Computer Science University of Bristol, UK http://www.cs.bris.ac.uk/~pradhan/. Overview. Background Introduction Soft Error Problem Memory Solution Logic Solution

doane
Download Presentation

Challenges of Future: Robust Design against Soft Errors with Low Power

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Challenges of Future: Robust Design against Soft Errors with Low Power Prof. Dhiraj Pradhan Department of Computer Science University of Bristol, UK http://www.cs.bris.ac.uk/~pradhan/

  2. Overview • Background • Introduction • Soft Error Problem • Memory Solution • Logic Solution • Architecture Level • Conclusion

  3. Our Recent Related Publications • A Defect Tolerance Scheme for Nanotechnology Circuits. IEEE Trans. On Circuits and Sytems I, . December 2007 . • A Graph-Based Unified Technique for Computing and Representing Coefficients over Finite Fields . IEEE Transactions on Computers, 56(8)., pp. 1119–1132. August 2007. • Reliable Network-on-Chip Based on Generalized de Bruijn Graph. DATE 2008. • An Efficient Technique for Synthesis and Optimization of Polynomials in GF(2m), ICCAD 2006. • “Efficient Method to Tolerate Multiple Bit Upsets in SRAM Memory”. ETS 2007. • LPRAM: a novel low-power high-performance RAM design with testability and scalability, IEEE TCAD. May 2004 • Simultaneous scheduling and binding for low gate leakage nano-complementary metal-oxide-semiconductor data path circuit behavioural synthesis. IET Computers & Digital Techniques, 2008. • Single Error Correcting Finite Field Multipliers over GF(2m). VLSID 08 . • GfXpress: An Efficient Technique for Synthesis and Optimization of Polynomials in GF(2m), IEEE TCAD 2008.

  4. Source: Shekhar Borkar, Intel

  5. CLK 0 Q D transient fault soft error The Soft Error Problem 1 A 1->0 change at the above AND gate output may not cause soft error

  6. Soft Error Rate Trends Soft Error Rate Contributions Mitra 2005 Shivakumar 2002 Increasing contribution of faults in combinational logic to the overall soft error rate

  7. Robust Systems in Scaled CMOS Aging, Infant mortality, Process variation Radiation, Erratic bits Robust System “Acceptable” Outputs Performance Power Data Integrity Availability Security Inputs Design errors, Software failures Malicious attacks, Human errors Source: S Mitra

  8. Solution? -Different solutions for different components -Memory -Logic -Subsystem

  9. Low Power Soft Error Tolerant RAM Architecture Address, Data and Control Bus Root Node SWITCH NODE (Sub-tree select, Buffers) MEMORY NODES

  10. Error Tolerant Low Power RAM • Consider a 64 MB RAM • This can be partitioned into 16 4MB RAM Modules The 16 leaf nodes B0, B1…B15 connected in a binary tree and laid out as a H-Tree as shown next. Assume the rows in each module are encoded using different classes of codes shown below • Hamming Code denoted as H • Matrix-Product code denoted as M • Reed Muller Codes

  11. Cell Array Cell Array B5 B6 B9 B10 R o w D e c o d e r Sense Amp. Column Dec. B4 B7 B8 B11 Cell Array Cell Array B3 B0 B15 B12 Address Buffer Timing Control Data Buffer B2 B1 B14 B13 ECC Circuitry Error Tolerant Low Power RAM

  12. Different ECC Effectiveness • Hamming Code denoted as H(single error correction) • Matrix-Product code denoted as M(double errror correction) • Reed Muller Codes(two or more error correction) A row is segmented into Different sized data bits and encoded

  13. Reduction in Power Consumption compared to traditional design Power savings compared to traditional design Guaranteed protection against certain number of Soft Errors H- Uses Hamming M- Uses Matrix product code

  14. Performance improvement along with soft error correction Delay is reduced compared to traditional design Soft Error Protection Incorporated

  15. Reliability Results (no. information bits On z-axis) Memory size 64Mbits

  16. Logic Synthesis for Low Power and Soft Error Robustness using Finite Field Theory

  17. Finite Field • Also called Galois Field, denoted by GF(N) • N = pk, p is a prime number, and k an integer • Each element is a k-tuple • There are exactly N elements in the field (0, 1,…, N-1) or (0, 1, , 2 , 3 ,... , N-1) where ,  - primitive element

  18. Finite Field • Two operators • ‘+’ called addition operation forming Abelian Group • ‘.’ called multiplication operation forming Abelian group. • Additive and multiplicative identities 0 and 1 respectively • Additive and multiplicative inverse exists for each element

  19. GF(4) Elements [0, 1, , 2 ] = [0, 1, , ] Elements can be represented in GF(2n) as 2-tuples over GF(2) as shown below

  20. Example GF(4) Let 2 =  Thus we have 4 elements 0, 1,  and 

  21. Addition and Multiplication • Addition and Multiplication in GF(4) •  is multiplicative inverse of , and vice versa.

  22. Addition The addition of any two elements in GF(4), can be performed as addition of polynomials over GF(2) For example, consider  +   = x and  = x +1  +  = x + x + 1 = 2 x + 1 Since 2 x = 0 over GF(22)  +  = x + x + 1 = 1

  23. Multiplication Multiplication is done by using polynomials mod a primitive polynomial P(x) Let P(x) = x2 + x + 1 Thus  *  = x (x + 1) = x2 + x (x2 + x) mod x2 + x + 1 = 1 So  *  = 1  -1 =  ,  - 1 =   and  are inverse of each other

  24. - N 1 å - = - - d N 1 f ( x , , x , , x ) [ 1 ( x ( e )) ] f = d 1 i n i x ( e ) i = e 0 An Expansion Theorem (Pradhan 1978) Theorem:Any n-variable function f (x1,...,xi,...,xn) in GF(N) can be expanded as, • where  : IN  GF(N) is a one-to-one mapping, with IN = {0,1,...,N - 1}, and (0)=0. • fxi=(e) is called the cofactor of f with respect to xi= (e) • The term [1 - (xi - (e))N - 1] is a multiple-valued literal in GF(N).

  25. 1 å = - - d f ( x , , x , , x ) [ 1 ( x ( e ))] f = d 1 ( ) i n i x e i = 0 e = - - d + - - d [ 1 ( x ( 0 ))] f [ 1 ( x ( 1 ))] f = d = d ( 0 ) ( 1 ) i x i x i i = - + [ 1 x ] f x f = d = d ( 0 ) ( 1 ) i x i x i i = + x f x f i = d = d ( 0 ) ( 1 ) x i x i i Special Case • Let N = 2 • Then, • But this is Shannon’s expansion theorem! • i.e., the expansion theorem reduces to Shannon’s theorem in GF(2).

  26. GFMODD • Galois Field Multiple Output Decision Diagram (GFMODD) • Based on Finite Field

  27. An Example 2-Bit Multiplier Bit level 2x2 multiplier word level 2x2multiplier

  28. 2-Bit Multiplier Galois Expression • Inputs partitioned as X = (x1, x0), Y = (y1, y0) • Outputs partitioned as Z1 = (z3, z2), Z2= (z1, z0) • GFMODD yields Gfxpress (TCAD 2008) uses an algorithm using the MODD description as input to synthesize low power designs. This has been recently augmented to incorporate soft Error correction

  29. Power Aware Soft Error Robust Multiplier Design Area in 10-6 mm2, delay in ns, power in microW at 1.8V

  30. An order of magnitude improvement! Power Aware Soft Error Robust Adder Design

  31. Architecture Level Solution

  32. M1 Input 1 Voter M2 Input 2 Output M3 Input 3 Traditional Soft Error Tolerance at the system Level • TMR • - 3X power Power Dissipation is too high and Errors In Two or More Modules are not tolerated.

  33. Rollback Recovery • Checkpoints are taken at regular intervals • When an error is detected the program is rolled back to the last checkpoint and re-executed • Loss of performance. One checkpoint interval for every occurrence of soft error. • Loss of Power • Limited to one check point interval

  34. Rollback Recovery P t Checkpoint 1 T t Checkpoint 2 T t Checkpoint i X Soft Error Roll Back T t Checkpoint i+1 t = Time to take checkpoint T = Checkpoint Interval

  35. How to detect faults • Diagnostic Program (single module) • Time overhead of running the diagnostic program. • Fault coverage a problem • Duplicated System • Requires two modules • Comparison • Fault coverage can be good

  36. Traditional Duplex Rollback Schemes Rollback tch tch tch tch tr A Ij A Ij B tu X B (a) (b)

  37. A Novel Power Aware Soft Error Tolerant Scheme (Roll forward) Spare is released at time t2 and the system continues to operate as before Ij Ij+1 A 3 X B 2 1 s Ij Time t0 t1 t2 1: Copy state to the spare 2: Compare state of the spare with the state of A and B 3: Copy state from A to B X A fault

  38. Roll forward Cont. • Spare is activated for a short duration (a single check point interval) when there is a soft error. • This scheme tolerate all single transient/intermittent faults with minimum loss of performance and power. • This scheme also tolerates hard faults and various multiple transient faults

  39. Roll Forward Scheme with additional checking of the spare Better Reliability with Minimal loss of Performance with soft errors Ij Ij+1 Ij+2 A 3 4 X B 2 1 s Ij+1 Ij Time t0 t1 t2 4: Compare state of the spare and module A

  40. Roll Forward Scheme with a Permanent fault in B Ij Ij+1 Ij+2 A 3 4 X X X X B 2 Spare replaces B 1 s Ij+1 Ij Time t0 t1 t2 Spare replaces faulty module

  41. Roll forward with minimal Power/performance penalty roll-forward I2 I1 A I2 2 I1 X B 3 I1 1 S Spare activated Spare released

  42. Double failures can force a roll back. The additional power required still remains the same rollback I1 I2 I1 A X I2 2 I1 I1 X B 3 I1 1 S Spare activated Spare released

  43. Future Challenges • Develop new Design Paradigm: Soft Error Tolerant Low Power Design • Hierarchical Soft Error Tolerance with low Power • Design for Testing under Process Variation -It is a major challenge because variability may cause soft error • Impact of workload and speed on soft error rate • Trade-off of Soft error robustness againt power

  44. Other ongoing Research in Our Group • Memory Cell design for robustness against Soft Error • Logic Synthesis for Low power using TED • Logic Error Correction using Mutliple Parity bits • Soft Error Tolerant Lowpower Sensor Networks

More Related