Post-Manufacturing ECC Customization Based on Orthogonal Latin Square Codes and Its Application to Ultra-Low Power Caches. Rudrajit Datta and Nur A. Touba, Computer Engineering Research Center, Dept. of Electrical and Computer Engineering, University of Texas at Austin.
Motivation • For memories with high defect rates • Reduce check-bit overhead • Increase reliability • Applicable to low voltage caches
Agenda • Introduction • Proposed Approach • Application • Related Work • Orthogonal Latin Square (OLS) Codes • Customization • Results • Conclusion
Introduction • Goal: tolerate high defect rates in memories • High defect rates occur in memories operating at ultra-low voltages • Expected in future nanoscale technologies • E.g., nanoscale crossbar architectures • Conventional method • ECC selected based on the expected maximum number of defects per word
Introduction [Figure: conventional ECC datapath. Data → check bit generator (cfull check bits) → memory (information bits plus cfull check bits) → decoder (uses all cfull check bits) → corrected data.]
Observations • A priori information available for location of defects • Through post-manufacturing memory tests • Obtain a defect map • Use information to customize code • Reduce check bit storage in memory/caches
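Proposed Approach [Figure: customized ECC datapath. Data → check bit generator (cfull check bits) → switch network (keeps cused check bits) → memory (information bits, cused check bits, configuration bits) → switch network (maps the cused stored check bits onto the cfull decoder inputs) → decoder → corrected data.]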
Proposed Approach • Customize code by disabling rows of the H-matrix • Possible if a modular code is used for ECC • Current work looks at OLS codes [Figure label: Configuration Bits]
Application - Low-voltage Caches • Microprocessor voltage lowered while idle • Reduces power • Caches and memories susceptible at lower voltages • Unreliable below Vccmin • Enable reliable cache operation at lower voltages • At lower voltages use part of cache to store extra check bits
Related Work • Word-disable and Bit-fix [Wilkerson 08] • Defect map • Identify vulnerable bits • Mitigates only persistent errors • Uses up half of the cache to store extra check-bits • Two-dimensional ECC [Kim 07] • Slow • Complicated decoding • Multi-bit segmented ECC [Chishti 09] • Orthogonal Latin Square (OLS) code • Single step decodable • High redundancy
Key Takeaways • Have full ECC on chip • Can handle all defect maps • Generate defect map • Disable part of the original code • Reduces check bit redundancy • Retain capability of original code with respect to the defect map
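One Step Majority Decoding • t-error correctable: each information bit is available as 2t+1 independent copies • One copy: the bit itself • Rest: 2t independent parity equations [Figure: the stored di and the estimates of di recomputed from the parity equations (other data bits such as dp, dq, ds XORed with check bits cp, cq, cs) feed a majority voter that outputs the corrected di.]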
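The voting step above can be sketched in a few lines. This is a minimal illustration, not the authors' decoder: the function name `majority_decode_bit`, the way parity equations are passed in, and the tiny 2x2 row/column parity code in the usage note are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def majority_decode_bit(
    data: Sequence[int],
    checks: Sequence[int],
    i: int,
    equations: Sequence[Tuple[int, Sequence[int]]],
) -> int:
    """Correct data[i] by one-step majority voting (hypothetical helper).

    equations holds one entry per parity equation covering bit i, given as
    (check_index, data_indices_covered_by_that_check).
    """
    votes = [data[i]]  # one copy is the stored bit itself
    for check_idx, covered in equations:
        # Recompute bit i from the check bit and the other covered data bits.
        estimate = checks[check_idx]
        for j in covered:
            if j != i:
                estimate ^= data[j]
        votes.append(estimate)
    # Strict majority over the 2t + 1 copies.
    return 1 if 2 * sum(votes) > len(votes) else 0


def encode_2x2(d: Sequence[int]) -> list:
    """Row and column parities of a 2x2 data array (t = 1, assumed layout)."""
    return [d[0] ^ d[1], d[2] ^ d[3], d[0] ^ d[2], d[1] ^ d[3]]
```

For example, with data `[1, 0, 1, 1]` the checks are `encode_2x2([1, 0, 1, 1])`. If bit 0 is then flipped in storage, calling `majority_decode_bit` on the corrupted word with equations `[(0, [0, 1]), (2, [0, 2])]` recovers the original value, because the two parity estimates outvote the corrupted stored copy.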
Orthogonal Latin Square Codes • Latin Square • m x m array • Each row and each column is a permutation of the digits 0, 1, …, m-1 • Orthogonal Latin Squares • When two squares are superimposed, each ordered pair of elements appears only once • m² data bits, 2tm check bits, t-error correctable [Hsiao 70] • Single step decodable
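To make the m² data bits / 2tm check bits structure concrete, here is a small sketch of one common way to build the parity groups. It assumes m is prime and uses the textbook squares L_k[r][c] = (k·r + c) mod m; the group ordering (rows, columns, then 2t−2 Latin squares) and all function names are assumptions for illustration, not taken from the slides.

```python
from itertools import product
from typing import List


def latin_square(m: int, k: int) -> List[List[int]]:
    """L_k[r][c] = (k*r + c) mod m; mutually orthogonal for prime m, distinct k in 1..m-1."""
    return [[(k * r + c) % m for c in range(m)] for r in range(m)]


def are_orthogonal(a: List[List[int]], b: List[List[int]]) -> bool:
    """True if every ordered pair (a[r][c], b[r][c]) occurs exactly once."""
    m = len(a)
    pairs = {(a[r][c], b[r][c]) for r, c in product(range(m), repeat=2)}
    return len(pairs) == m * m


def ols_parity_groups(m: int, t: int) -> List[List[int]]:
    """Return 2*t*m parity groups, each a list of data-bit indices (bit = r*m + c).

    Assumes m is prime (not checked) and 2t - 2 <= m - 1.
    """
    if 2 * t - 2 > m - 1:
        raise ValueError("not enough orthogonal Latin squares for this t")
    groups = [[r * m + c for c in range(m)] for r in range(m)]    # row parities
    groups += [[r * m + c for r in range(m)] for c in range(m)]   # column parities
    for k in range(1, 2 * t - 1):                                 # 2t - 2 squares
        sq = latin_square(m, k)
        for s in range(m):
            groups.append(
                [r * m + c for r in range(m) for c in range(m) if sq[r][c] == s]
            )
    return groups
```

With `m = 5` and `t = 2` this yields 20 groups over 25 data bits; each data bit falls in exactly 2t = 4 groups and any two data bits share at most one group, which is the property that makes the 2t parity estimates of a bit independent for majority voting.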
Proposed Scheme • Implement full OLS code on chip • Run memory tests • Generate defect map • At manufacturing time or at boot-time • Identify vulnerable bits • Disable rows in OLS H-matrix • On chip-by-chip basis, based on defect map • Correct all erasures plus ‘e’ random errors in each cache line • Reduce redundancy while providing same reliability
Definitions • “good row” – for information bit di • Row of OLS H-matrix • No ‘1’ in any erasure position other than bit di • Holds true for all lines in the cache • “bad row” – for information bit di • Row of OLS H-matrix • ‘1’ in one or more erasure positions apart from bit di • Holds for at least one line of the cache
“Good Rows” & “Bad Rows”

          d0  d1  d2  d3  d4  d5  d6  d7
line1      -   E   -   -   -   E   -   -
line2      -   -   -   E   -   -   -   -

H-row1     1   0   0   0   1   0   1   1
H-row2     0   1   1   0   1   0   0   1
H-row3     1   0   0   1   0   1   1   0

H-row1     G   -   -   -   G   -   G   G
H-row2     -   G   B   -   B   -   -   B
H-row3     B   -   -   B   -   B   B   -

E = erasure position in that cache line; G = good row for that bit; B = bad row for that bit
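The G/B labels in this example follow mechanically from the definitions. A minimal sketch, assuming only the data-bit columns of each H-matrix row are considered (as on the slide) and that erasures are given as one set of bit positions per cache line; the function name `classify_row` is hypothetical.

```python
from typing import List, Set


def classify_row(h_row: List[int], erasures_per_line: List[Set[int]]) -> List[str]:
    """Label a row 'G' or 'B' for each bit it covers, '-' where it has a 0."""
    covered = {j for j, bit in enumerate(h_row) if bit}
    labels = []
    for i, bit in enumerate(h_row):
        if not bit:
            labels.append("-")
            continue
        # Bad for bit i if, in at least one line, the row also covers an
        # erasure position other than bit i itself.
        bad = any((covered & erasures) - {i} for erasures in erasures_per_line)
        labels.append("B" if bad else "G")
    return labels


# Defect map and H-matrix rows copied from the slide's example.
ERASURES = [{1, 5}, {3}]  # line1: d1 and d5, line2: d3
H_ROWS = [
    [1, 0, 0, 0, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 1, 1, 0],
]
```

Running `classify_row` on each entry of `H_ROWS` with `ERASURES` reproduces the three G/B rows of the example, including the case of d3 in H-row3, which is bad because line1 has an erasure at d5 even though d3 itself is only an erasure in line2.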
Necessary and Sufficient Conditions • Tolerate ‘e’ random errors • (number of “good rows”) − (number of “bad rows”) ≥ 2(e + 1) • Original code – t-error correcting • (Max vulnerable bits in any line) + e ≤ t
Row Selection • Covering problem • Select enough good rows for each information bit di • Until constraint is satisfied • NP-complete problem • Apply heuristics

                              d0  d1  d2  d3  d4  d5  d6  d7
H-row1                         G   -   -   -   G   -   G   G
H-row2                         -   G   B   -   B   -   -   B
H-row3                         B   -   -   B   -   B   B   -

“good rows” − “bad rows”      -1   1  -1  -1  -1  -1  -1  -1   (matches selecting H-row2 and H-row3)
“good rows” − “bad rows”       1   1  -1   0   0   0   1   0   (matches selecting H-row1 and H-row2)
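The per-bit tally used to compare candidate row selections can be computed directly from the G/B labels. This is a sketch for the slide's three-row example only; the label strings are transcribed from the slide, and `good_minus_bad` is a hypothetical helper, not the authors' selection heuristic.

```python
from typing import Dict, List

# G/B labels per data bit d0..d7, transcribed from the slide's example.
LABELS: Dict[str, str] = {
    "H-row1": "G---G-GG",
    "H-row2": "-GB-B--B",
    "H-row3": "B--B-BB-",
}


def good_minus_bad(selected: List[str], labels: Dict[str, str] = LABELS) -> List[int]:
    """For each data bit, count good rows minus bad rows among the selected rows."""
    score = {"G": 1, "B": -1, "-": 0}
    width = len(next(iter(labels.values())))
    return [sum(score[labels[row][i]] for row in selected) for i in range(width)]
```

Selecting H-row2 and H-row3 gives `[-1, 1, -1, -1, -1, -1, -1, -1]`, while selecting H-row1 and H-row2 gives `[1, 1, -1, 0, 0, 0, 1, 0]`, which agree with the two tallies shown on the slide. A selection heuristic would keep adding rows until every bit's tally meets the required margin.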
Covering Problem • Solve for the cache line with maximum erasures first • Apply the solution to all other cache lines • If unsatisfactory, add erasures from one of the unsolved lines • Repeat until the solution fits the entire cache
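Implementation [Figure: the majority voter is replaced by an adjustable threshold voter. Each parity estimate of di (data bits such as dp, dq, ds XORed with check bits cp, cq, cs) is ANDed with its own control bit (ctlp, ctlq, ctls) before entering the voter; the voter threshold is set by ctl, and the voter outputs the corrected di.]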
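The gating shown in the figure can be mimicked in software. This is only a behavioural sketch under an assumed threshold rule (strict majority among the stored bit plus the enabled parity estimates); the slides do not state how the threshold is derived from ctl, and the name `gated_vote` is hypothetical.

```python
from typing import Sequence


def gated_vote(stored_bit: int, estimates: Sequence[int], enable: Sequence[int]) -> int:
    """Vote over the stored bit and the parity estimates whose control bit is 1.

    Assumption: the threshold is a strict majority of the active inputs.
    """
    # AND each estimate with its control bit, so disabled rows contribute 0.
    gated = [est & en for est, en in zip(estimates, enable)]
    ones = stored_bit + sum(gated)
    active_inputs = 1 + sum(enable)       # stored bit plus enabled rows
    threshold = active_inputs // 2 + 1    # adjusted to the number of active inputs
    return 1 if ones >= threshold else 0
```

With two of four rows enabled, `gated_vote(0, [1, 1, 0, 1], [1, 1, 0, 0])` votes among three inputs and returns 1, while a fixed five-input majority voter fed the same gated values would see only two ones and return 0. That difference is why the threshold has to track the number of enabled rows.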
Experimental Results Results for Word Size of 256 Bits and Bit-Error Rate of 10⁻³
Experimental Results Results for Constant Cache Size of 64KB
Experimental Results 64 KB cache, 484-bit word, 10⁻³ bit-error rate
Conclusion • Post-manufacturing customization • Reduces large check-bit overhead • Provides requisite reliability • Applicable to systems with high defect rate