Designing a Fast and Adaptive Error Correction Scheme for Increasing the Lifetime of Phase Change Memories

Rudrajit Datta and Nur A. Touba, Computer Engineering Research Center, Dept. of Electrical and Computer Engineering, University of Texas at Austin.

Presentation Transcript


  1. Designing a Fast and Adaptive Error Correction Scheme for Increasing the Lifetime of Phase Change Memories Rudrajit Datta and Nur A. Touba Computer Engineering Research Center Dept. of Electrical and Computer Engineering University of Texas at Austin

  2. Introduction • Challenges for traditional memories • Scalability • Device leakage • Retention time • Phase Change Memories (PCM) – a possible substitute • Non-volatile • Amenable to process scaling • High density – 4x DRAM [Seznec 10]

  3. Phase Change Memories • Crystalline state • Low resistance – ‘1’ • Amorphous state • High resistance – ‘0’ • Thermally induced state changes • Scalable • Disadvantages • Relatively quick degradation • ~10^7 writes [Ferreira 10] • Slow writes • PCM in place of DRAM – fix PCM reliability [Fantini 06]

  4. Previous Work • Hybrid PCM/DRAM [Zhang 09] • OS level paging scheme • BCH code correcting up to 7 errors • Slow • Spread/minimize PCM writes • [Ferreira 10] – minimize PCM writes • [Lee 09] – buffer reorganization and partial writes

  5. Previous Work • Architectural solutions so far • None using novel error correction code (ECC) • PCM errors increasing function of time • Function of writes/cell • Very different from traditional DRAM • Increasing permanent errors

  6. Proposed Scheme • Adaptive Error Correction • OS monitors errors corrected • Signals memory controller • Increase number of check bits • Physical line size of memory unchanged • More check bits, fewer data bits • Main memory to cache bandwidth affected • Gradually decreasing cache line size • Minimal performance impact • Orthogonal Latin Square (OLS) codes used • Fast – single step decode • Modular
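
The monitoring loop on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say what triggers the switch, so the class name `AdaptiveEccMonitor`, the `threshold` parameter, and the rule "switch once a single read needs `threshold` corrections" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveEccMonitor:
    """Hypothetical OS-side monitor; not the authors' implementation."""

    threshold: int = 3      # assumed trigger: corrections needed on one read
    enhanced: bool = False  # True once the controller was told to switch
    worst_seen: int = 0     # largest number of bits corrected on any read

    def record_read(self, corrected_bits: int) -> bool:
        """Log one read; return True exactly once, when the switch is signalled."""
        self.worst_seen = max(self.worst_seen, corrected_bits)
        if not self.enhanced and self.worst_seen >= self.threshold:
            # Signal the memory controller: use more check bits from now on.
            self.enhanced = True
            return True
        return False
```

A caller would feed `record_read` the number of bits the decoder corrected on each read and, on a `True` return, reconfigure the controller to the enhanced check-bit generator.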

  7. Proposed Scheme [Figure: memory line layout. Regular ECC: Word 1 to Word 4 plus OLS check bits. Enhanced ECC: Word 1 to Word 3 plus OLS check bits; Words 1 to 3 are passed into the cache.]

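  8. Proposed Scheme [Figure: block diagram. On a write, data goes through either the regular or the enhanced check-bit generator, selected by a signal from the OS, and is stored in main memory as information bits and check bits. On a read, the regular or enhanced check-bit generator is applied again to produce corrected data.]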

  9. Orthogonal Latin Square Codes • Latin Square • m x m array • Each row and column is a permutation of the digits 0, 1, …, m-1 • Orthogonal Latin Squares • Ordered pair of elements (r, c, s) appears only once • m^2 data bits, 2tm check bits, t-error correctable [Hsiao 70]
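
The definitions on this slide can be checked with a small sketch. It builds Latin squares with the textbook rule L_k[i][j] = (k*i + j) mod m, which yields mutually orthogonal squares only when m is prime; the code sizes in this talk use m = 16, 8 and 4, which need a finite-field construction not shown here. All function names are illustrative.

```python
from itertools import product


def latin_square(m, k):
    """Return the m x m square L[i][j] = (k*i + j) mod m (assumes m prime, 1 <= k < m)."""
    return [[(k * i + j) % m for j in range(m)] for i in range(m)]


def is_latin(square):
    """True if every row and every column is a permutation of 0..m-1."""
    m = len(square)
    digits = set(range(m))
    rows_ok = all(set(row) == digits for row in square)
    cols_ok = all({square[i][j] for i in range(m)} == digits for j in range(m))
    return rows_ok and cols_ok


def are_orthogonal(a, b):
    """True if superimposing a and b gives every ordered pair exactly once."""
    m = len(a)
    pairs = {(a[i][j], b[i][j]) for i, j in product(range(m), repeat=2)}
    return len(pairs) == m * m


def ols_code_size(m, t):
    """Return (data_bits, check_bits) for a t-error-correcting OLS code: m^2 and 2tm."""
    return m * m, 2 * t * m
```

For example, `ols_code_size(16, 3)` gives 256 data bits and 96 check bits, which is the 352-bit line used in the numerical example later in the talk.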

  10. Adaptive ECC • Increase number of check bits per line • Break up line into small segments • Based on number of data bits • Implement ECC separately on each segment • Constraint – original line size unchanged • (Data + ECC)_Original = ∑_Segments (Data_Segment + ECC_Segment) • Overall error tolerance goes up

  11. Adaptive ECC [Figure: three line layouts. Regular ECC: Word 1 to Word 4 plus ECC_OLS. Enhanced ECC: Word 1 to Word 3 plus ECC_OLS. Enhanced adaptive ECC: Segments 1 to 4, each protected by its own code (ECC1 to ECC4).]

  12. Adaptive ECC – Numerical example • Original configuration • 3-bit OLS code on 256-bit line – total 352 bits • Corrects all patterns of 3 or fewer errors • Increased check-bits • 25% of data-bits store ECC – 192 data bits • 2 64-bit data segments • 4 16-bit data segments • Check-bits – (352 – 192) = 160 • 3-bit OLS on the 64-bit segments • 2-bit OLS on the 16-bit segments
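
The bit budget in this example can be verified with a short sketch. It assumes that "t-bit OLS" means a t-error-correcting OLS code with m^2 data bits and 2tm check bits, as on the OLS slide; the helper name `ols_check_bits` and the variable names are illustrative.

```python
import math


def ols_check_bits(data_bits, t):
    """Check bits (2*t*m) of a t-error-correcting OLS code over m*m data bits."""
    m = math.isqrt(data_bits)
    if m * m != data_bits:
        raise ValueError("OLS codes need a square number of data bits")
    return 2 * t * m


# Original configuration: one 3-error-correcting code over the whole 256-bit line.
line_bits = 256 + ols_check_bits(256, 3)

# Enhanced configuration: (data_bits, t) per segment.
segments = [(64, 3)] * 2 + [(16, 2)] * 4
enhanced_data = sum(d for d, _ in segments)
enhanced_check = sum(ols_check_bits(d, t) for d, t in segments)

print(line_bits, enhanced_data, enhanced_check)
```

Each 64-bit segment needs 2*3*8 = 48 check bits and each 16-bit segment needs 2*2*4 = 16, so the segments consume 96 + 64 = 160 check bits and the physical line stays at 352 bits.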

  13. Adaptive ECC – Numerical example • Enhanced ECC configuration corrects • 99.97% of 3-bit errors • 99.73% of 4-bit errors • … • Small fraction of 14-bit errors • Segmented ECC implementation boosts error tolerance

  14. Results Error Tolerance (no. of errors / no. of bits * 100) for varying memory sizes

  15. Results Percentage of operational memory lines versus number of errors injected out of 100,000 experiments

  16. Results

  17. Results SPEC2006 Benchmarks

  18. Results SPEC2006 Benchmark – bzip2

  19. Conclusion • Novel error correction scheme for PCM • Fast • Adaptive • Graceful decrease in memory capacity • Increases PCM lifetime • Switching period (to enhanced ECC) of the order of years
