
Energy Reduction for STT-RAM Using Early Write Termination

Ping Zhou, Bo Zhao, Jun Yang, *Youtao Zhang. Electrical and Computer Engineering Department, *Department of Computer Science, University of Pittsburgh. ICCAD 2009.


Presentation Transcript


  1. Energy Reduction for STT-RAM Using Early Write Termination • Ping Zhou, Bo Zhao, Jun Yang, *Youtao Zhang • Electrical and Computer Engineering Department, *Department of Computer Science, University of Pittsburgh • ICCAD 2009

  2. Introduction • Traditional SRAM Cache • Limited by density, leakage and scalability • STT-RAM Cache? • High density (~4x that of SRAM) • High speed (same read speed as SRAM) • Non-volatile • No write endurance problem

  3. STT-RAM: Cell • Magnetic Tunnel Junction (MTJ) • Relative magnetization direction • Different resistances → Logic 0 or 1 • Write: spin-polarized current • Much less write current than conventional MRAM • [Figure: MTJ stack with Reference Layer, MgO and Free Layer, shown in two states: High Resistance (Logic 1) and Low Resistance (Logic 0)]

  4. STT-RAM: Cell Array • Array structure similar to SRAM • Bidirectional write current • [Figure: cell array with labels BL, SL, WL and MTJ, showing the write 0 and write 1 current directions]

  5. STT-RAM Cache: Challenge • High dynamic energy • 6~14x more energy per write access [Dong et al. DAC 2008, Sun et al. HPCA 2009] • Writes contribute >74% of total dynamic energy (chart label: 74.2%) • Need to reduce write energy in STT-RAM cache!

  6. Opportunity • Many bits are unchanged in a write access: redundant bit-writes [Zhou et al. ISCA 2009] • Redundant bit-writes in 16MB STT-RAM cache (chart label: 88%) • How to exploit this opportunity?

  7. Exploiting Redundant Bit-Writes • Need to know the old value… • Read & compare before write [Zhou et al. ISCA 2009] • Can we do better?
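The read-and-compare approach on slide 7 can be illustrated with a minimal sketch. The function name `read_compare` and the word width are assumptions made for illustration only; they do not come from the slides or the cited paper.

```python
def read_compare(old_word: int, new_word: int, width: int = 64):
    """Hypothetical helper: compare the stored word with the incoming word.

    Returns (changed_mask, n_redundant), where changed_mask has a 1 in every
    bit position that actually needs to be written and n_redundant is the
    number of bit-writes that could be skipped.
    """
    mask = (1 << width) - 1
    # XOR marks the positions where old and new values differ.
    changed_mask = (old_word ^ new_word) & mask
    n_changed = bin(changed_mask).count("1")
    return changed_mask, width - n_changed


# Example: only one of four bits differs, so three bit-writes are redundant.
changed_mask, n_redundant = read_compare(0b1010, 0b1000, width=4)
```

The cost of this scheme is the extra read that must precede every write, which is the overhead the following slides aim to avoid.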

  8. Observation • MTJ resistance changes abruptly by the end of write cycle • Cell still holds old value at early stage of write cycle • Read is much faster than write • [Y. Chen et al. ISQED 2008] • Possible to sense the old value at early stage of write cycle

  9. Early Write Termination: Idea • On a write access… • Start write cycle like normal • Sense the old value at early stage • Terminate the write cycle if old value is same as new value • Does not require a preceding read & compare!
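The steps on slide 9 can be sketched as a purely behavioral model. This is not the authors' circuit. `ewt_write` is a hypothetical name, and the loop over bits only illustrates the per-cell decision (in hardware each cell would decide in parallel).

```python
def ewt_write(old_word: int, new_word: int, width: int = 8):
    """Behavioral sketch of one write access with early write termination.

    Returns (stored_word, n_completed, n_terminated).
    """
    stored = old_word
    completed = 0
    terminated = 0
    for bit in range(width):
        # The write cycle starts normally; early in the cycle the cell still
        # holds its old value, so that value can be sensed.
        old_bit = (old_word >> bit) & 1
        new_bit = (new_word >> bit) & 1
        if old_bit == new_bit:
            # Redundant bit-write: terminate the write cycle for this cell.
            terminated += 1
        else:
            # Value differs: let the write cycle run to completion.
            stored ^= 1 << bit
            completed += 1
    return stored, completed, terminated


stored, completed, terminated = ewt_write(0b1100, 0b1010, width=4)
```

In this example two cells complete their write and two are terminated early, and the stored word ends up equal to the new word.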

  10. EWT Circuit • Conversion circuit • Basic differential amplifier • Input lower → Output higher • Input higher → Output lower • [Figure: EWT circuit with labels BL, SL, WL, MTJ, write 0, write 1, Rwire, Vin0, Vin1, conversion, Vsense0, Vsense1, pass, Vref0, Vref1, New value, Sense-Amp, Terminate?]

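  11. How EWT Works • [Figure: EWT circuit during a write 0, with labels BL, SL, WL, MTJ, Rwire, high, low, Vin0, Vin1, conversion, Vsense0, Vsense1, pass; timing annotation 0.536ns] • Old value 0, new value 0: Vin0 lower, Vsense0 higher, SA output 1, action Terminate • Old value 1, new value 0: Vin0 higher, Vsense0 lower, SA output 0, action Continue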
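The write-0 decision table on slide 11 can be mimicked with a small behavioral model. All voltages, the gain and the reference level below are made-up normalized values chosen only to reproduce the "lower/higher" relationships in the table. The function names are hypothetical, and the assumption that the sense amplifier outputs 1 when Vsense0 exceeds Vref0 is an inference from the table, not something the slides state.

```python
def conversion(vin: float, vmid: float = 0.5, gain: float = 1.0) -> float:
    """Inverting stage: a lower input gives a higher output (assumed linear)."""
    return vmid - gain * (vin - vmid)


def write0_terminates(vin0: float, vref0: float = 0.5) -> bool:
    """Return True if the write-0 cycle would be terminated early."""
    vsense0 = conversion(vin0)
    # Assumed sense-amp behavior: output 1 when Vsense0 is above Vref0.
    sa_output = 1 if vsense0 > vref0 else 0
    return sa_output == 1


# Old value 0: Vin0 is lower, Vsense0 is higher, SA output 1, terminate.
terminate_when_old_is_0 = write0_terminates(vin0=0.3)
# Old value 1: Vin0 is higher, Vsense0 is lower, SA output 0, continue.
terminate_when_old_is_1 = write0_terminates(vin0=0.7)
```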

  12. Advantages of EWT • No performance penalty! • Carried out within a write cycle • No need to read & compare before a write • Write access may finish early → Slight speedup • Low energy overhead (3.23%) • Low complexity • Easy to integrate with existing designs

  13. Modeling STT-RAM and EWT

  14. Latency Modeling • Cell • Derived from recent work [Dong et al. DAC 2008] • Peripheral • Derived from CACTI [Thoziyoor et al. ISCA 2008, Dong et al. DAC 2008]

  15. Dynamic Energy Modeling • Baseline: Derived from recent work [Dong et al. DAC 2008] • EWT • Read energy: same as baseline • Write energy: variable, made up of the following components • Peripheral (derived from CACTI) • Extra energy introduced by EWT circuits (HSPICE) • N_changed × E_changed + N_unchanged × E_unchanged, where E_changed is the energy of a cell change and E_unchanged is the energy of a terminated cell change
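The write-energy components on slide 15 can be combined in a short sketch. Treating the components as a simple sum is an assumed reading of the slide, and the parameter names and numeric values are illustrative placeholders, not figures from the paper.

```python
def ewt_write_energy(n_changed: int, n_unchanged: int,
                     e_changed: float, e_unchanged: float,
                     e_peripheral: float, e_ewt_extra: float) -> float:
    """Per-access write energy under EWT (arbitrary energy units).

    e_changed    energy of a cell whose value actually changes
    e_unchanged  energy of a cell whose write is terminated early
    e_peripheral peripheral energy (CACTI-derived on the slide)
    e_ewt_extra  extra energy of the EWT circuits (HSPICE on the slide)
    """
    cell_energy = n_changed * e_changed + n_unchanged * e_unchanged
    return e_peripheral + e_ewt_extra + cell_energy


# Hypothetical 8-bit write in which only 2 bits change.
energy = ewt_write_energy(n_changed=2, n_unchanged=6,
                          e_changed=1.0, e_unchanged=0.25,
                          e_peripheral=0.5, e_ewt_extra=0.125)
```

Because E_unchanged is smaller than E_changed, the total shrinks as the share of redundant bit-writes grows, which is why the write energy is described as variable.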

  16. Leakage Energy Modeling • STT-RAM is non-volatile • Power gate the idle banks • Assume 1ns delay to “wake up” • Used in both baseline and EWT

  17. Experimental Setup • Simics-based simulator • 4-core CMP, 1GHz • 32KB private L1 cache • 16MB shared L2 cache using STT-RAM, 16 banks • 4GB main memory • Enhanced cache model: STT-RAM & EWT

  18. Results: Performance • Normalized cycles per instruction (CPI) (chart label: 1% speedup) • Slight performance improvement

  19. Results: Write Energy • Normalized write energy (chart label: 70% saving) • Up to 80% write energy reduction

  20. Results: Dynamic Energy • Normalized dynamic energy, Base vs. EWT (chart label: 52% reduction)

  21. Results: Total Energy • Normalized total energy (chart label: 33% reduction)

  22. Results: Energy-Delay Product • Normalized ED² (chart label: 34% reduction)
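As a rough sanity check, the reported ED² figure can be compared with the total-energy and CPI results from the preceding slides. This assumes that ED² means energy times delay squared, that delay scales with CPI, and that the 33% and 1% figures are averages over the same workloads. None of these assumptions is confirmed by the slides.

```python
def normalized_ed2(norm_energy: float, norm_delay: float) -> float:
    """Energy-delay-squared product, relative to a baseline of 1.0."""
    return norm_energy * norm_delay ** 2


# 33% total energy reduction -> 0.67; 1% speedup -> normalized delay of 0.99.
ed2 = normalized_ed2(norm_energy=0.67, norm_delay=0.99)
reduction_pct = (1.0 - ed2) * 100.0
```

Under these assumptions the computed reduction rounds to the 34% shown on the slide.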

  23. Conclusion • Addresses a key challenge for STT-RAM caches: dynamic energy • EWT: exploits redundant bit-writes without performance penalty • Low overhead and complexity • Modeling and evaluation • Up to 80% write energy reduction • 34% ED² reduction

  24. Thank you!
