Fine-Grain Power-Gating on STT-MRAM Peripheral Circuits with Locality-aware Access Control

Eishi Arima†, Hiroki Noguchi*, Takashi Nakada†, Shinobu Miwa†, Susumu Takeda*, Shinobu Fujita*, and Hiroshi Nakamura†. †The University of Tokyo, *Toshiba Corporation.

Presentation Transcript


  1. Fine-Grain Power-Gating on STT-MRAM Peripheral Circuits with Locality-aware Access Control
     Eishi Arima†, Hiroki Noguchi*, Takashi Nakada†, Shinobu Miwa†, Susumu Takeda*, Shinobu Fujita*, and Hiroshi Nakamura†
     †The University of Tokyo, *Toshiba Corporation
     Normally-off Computing Project, http://noff-pj.jp/en/
     The Memory Forum 2014

  2. Background
     • STT-MRAM is considered the best candidate to replace SRAM in the last-level cache (LLC).
         • Low leakage, high density, high write endurance
     • Write access energy has been regarded as the most critical problem in an STT-MRAM cache.
         • With state-of-the-art MTJ cells, it is reduced dramatically.
         • 43 nJ -> 1.1 nJ (1 MB)†
     • On the other hand, we need to consider the leakage power of STT-MRAM peripheral circuits.
         • Driving the write current requires high-performance but leaky transistors in the peripherals.
     †E. Kitagawa et al., “Impact of Ultra Low Power and Fast Write Operation of Advanced Perpendicular MTJ on Power Reduction for High-Performance Mobile CPU,” in IEDM, 2012

  3. Motivation
     • In an STT-MRAM LLC, leakage energy accounts for 48% of the total LLC energy consumption.
         • It is consumed mainly in the peripheral circuits.
         • The leakage of the memory cells is nearly zero.
     Figure annotations: “We need to reduce them.” “There are many techniques for reducing them.”

  4. Our goal and approaches
     • The goal of this research
         • Leakage reduction for the peripheral circuits of an STT-MRAM LLC while maintaining processor performance
     • Our approaches
         • Fine-grained power-gating on peripheral circuits
             • Especially at the granularity of subarrays
         • Access control for further energy reduction
             • Particularly, gathering the cache accesses

  6. Subarray-level power-gating on STT-MRAM peripheral circuits
     • We assume power-gating at the granularity of subarrays.
         • Finer granularity increases the opportunities for power-gating.
         • Granularity finer than a subarray is difficult to implement.
     • A frequently accessed subarray should be kept awake.
         • It takes a few ns to wake a subarray up.
     • To solve this problem, we adopt time-out control.
         • A subarray that has not been accessed for a while tends to stay idle for a long time.
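The time-out control described on this slide can be sketched as a small cycle-level model. This is a minimal illustration, not the authors' implementation: the class and method names, the 10-cycle wake-up penalty, and the choice to start every subarray asleep are all assumptions. The slides only state that wake-up takes a few ns and that the evaluated time-out interval is 1K cycles.

```python
from dataclasses import dataclass


@dataclass
class Subarray:
    awake: bool = False      # assumption: every subarray starts power-gated
    last_access: int = 0     # cycle of the most recent access


class TimeoutGatingController:
    """Hypothetical per-subarray time-out power-gating model."""

    def __init__(self, num_subarrays: int, timeout: int, wakeup_latency: int):
        self.timeout = timeout                # idle cycles before gating
        self.wakeup_latency = wakeup_latency  # assumed penalty, in cycles
        self.subarrays = [Subarray() for _ in range(num_subarrays)]

    def tick(self, now: int) -> None:
        """Gate every awake subarray that has been idle for `timeout` cycles."""
        for sa in self.subarrays:
            if sa.awake and now - sa.last_access >= self.timeout:
                sa.awake = False

    def access(self, index: int, now: int) -> int:
        """Access one subarray; return the extra cycles spent waking it up."""
        self.tick(now)
        sa = self.subarrays[index]
        penalty = 0 if sa.awake else self.wakeup_latency
        sa.awake = True
        sa.last_access = now
        return penalty


ctrl = TimeoutGatingController(num_subarrays=8, timeout=1000, wakeup_latency=10)
first = ctrl.access(0, now=0)       # subarray 0 is asleep, so it pays the penalty
second = ctrl.access(0, now=500)    # still inside the time-out window
third = ctrl.access(0, now=1500)    # idle for 1000 cycles, so it was gated again
```

In this model a frequently accessed subarray never reaches the time-out and so never pays the wake-up penalty, while a subarray that goes quiet is gated after `timeout` idle cycles.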

  8. Access control strategy for further leakage reduction
     • We should improve subarray-level access locality for further leakage reduction.
         • While minimizing performance degradation
     • Our methodologies
         • Locality-aware subarray mapping
             • For spatial locality enhancement
         • Write aggregation with buffers
             • For temporal locality enhancement

  10. Locality-aware subarray mapping
     • There are two types of subarray mapping in a cache.
         • Way-division and set-division
     • Way-division is usually adopted.
     • However, a set-division cache has better spatial locality for a successive data access sequence.
     Figure annotations (way-division): “The way of a line is decided unpredictably.” “So they are scattered like this.” “All the subarrays are awake.”
     Figure annotations (set-division): “The index of a line is decided according to its address.” “Only one subarray is awake.”
     Figure labels: 64 sets; 8 ways; 256 KB; 8 subarrays; LLC; subarray
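To make the difference between the two mappings concrete, the sketch below computes which subarrays a run of successive cache lines touches under each scheme. Every parameter here is an assumption chosen to line up with the figure labels (a 256 KB cache with 8 ways and 8 subarrays); the 64-byte line size, the function names, and the use of a random way as a stand-in for an unpredictable replacement decision are hypothetical.

```python
import random

# All parameters below are assumptions for illustration.
LINE_SIZE = 64                                    # bytes per cache line
NUM_WAYS = 8
NUM_SUBARRAYS = 8
CACHE_SIZE = 256 * 1024                           # bytes
NUM_SETS = CACHE_SIZE // (LINE_SIZE * NUM_WAYS)   # 512 sets
SETS_PER_SUBARRAY = NUM_SETS // NUM_SUBARRAYS     # 64 sets


def set_index(addr: int) -> int:
    """Set index taken from the address bits above the line offset."""
    return (addr // LINE_SIZE) % NUM_SETS


def way_division_subarray(way: int) -> int:
    """Way-division: each subarray holds one way of every set."""
    return way * NUM_SUBARRAYS // NUM_WAYS


def set_division_subarray(addr: int) -> int:
    """Set-division: each subarray holds all ways of a contiguous group of sets."""
    return set_index(addr) // SETS_PER_SUBARRAY


def touched_subarrays(addrs, rng: random.Random):
    """Return the subarrays touched under (way-division, set-division)."""
    by_way, by_set = set(), set()
    for addr in addrs:
        way = rng.randrange(NUM_WAYS)   # stand-in for an unpredictable way choice
        by_way.add(way_division_subarray(way))
        by_set.add(set_division_subarray(addr))
    return by_way, by_set


# 64 successive cache lines starting at address 0.
stream = [i * LINE_SIZE for i in range(64)]
by_way, by_set = touched_subarrays(stream, random.Random(0))
```

Under these assumptions the successive lines all fall into one set-division subarray, because the subarray is selected by the address alone, whereas the way-division subarray depends on which way each line happens to occupy.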

  12. Write aggregation for temporal locality enhancement
     • Our technique gathers write accesses with buffers.
         • Each subarray has a set of buffers.
         • A buffer set is flushed when the corresponding subarray is read or when the buffer set is full.
     • Write access latency is not critical for performance.
     Figure: the access sequence for a subarray over time (t), showing writes (LLC misses and writebacks), reads (demand LLC hits), buffering, the time-out interval, and the active and sleep periods.
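The flush rule on this slide (flush on a read to the subarray, or when the buffer set is full) can be sketched as follows. This is a simplified model under stated assumptions: the class name and its methods are hypothetical, writes are kept in arrival order without coalescing, and lookups of data that is still sitting in the buffer are not modeled. The capacity of 8 entries follows the value reported in the experiment slide.

```python
class SubarrayWriteBuffer:
    """Hypothetical write-aggregation buffer set for a single subarray."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.pending = []    # buffered (address, data) writes, oldest first
        self.flushes = 0     # how often the subarray had to absorb writes

    def write(self, addr: int, data: bytes) -> list:
        """Buffer a write; drain everything once the buffer set is full."""
        self.pending.append((addr, data))
        if len(self.pending) >= self.capacity:
            return self.flush()
        return []

    def read(self, addr: int) -> list:
        """A read wakes the subarray anyway, so drain the pending writes too."""
        return self.flush()

    def flush(self) -> list:
        """Return the buffered writes that now go to the subarray."""
        drained, self.pending = self.pending, []
        if drained:
            self.flushes += 1
        return drained


buf = SubarrayWriteBuffer(capacity=8)
for i in range(9):
    buf.write(i * 64, b"line")    # the 8th write fills the buffer and drains it
drained_on_read = buf.read(0)     # the read drains the one remaining write
```

With this rule the subarray is woken for writes once per full buffer rather than once per write, which is what lets the writes share a single active period.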

  13. Experiment
     • We evaluate the effect of our methodologies using the processor simulator gem5.
     • Environment
         • We set the time-out interval to 1K cycles and the number of entries in each buffer to 8.
         • These are the optimal values in our simulations.
     [Table: simulation environment, including the LLC configuration]

  14. Result
     • The figure shows the leakage energy of an STT-MRAM LLC for each method.
     • On average, more than 80% of the L2 cache leakage can be reduced with our techniques.
     Figure labels: average; best case for each method; 67%; 83%; 40%; 30%; contribution of set-division; contribution of buffers

  15. Summary
     • To reduce the leakage power of the STT-MRAM peripheral circuits, we propose subarray-level power-gating.
     • We also propose two locality-aware access control methodologies to achieve further leakage power reduction.
     • Our experimental results show that, on average, more than 80% of the L2 cache leakage can be reduced with our techniques.

  18. Performance degradation
     • The performance degradation caused by our methodologies is almost negligible.

  19. Access control strategy for further leakage reduction
     • Aim: improving the overall sleep rate of the subarrays.
     • To achieve this, we only need to improve the spatial and temporal subarray-level locality of cache accesses.
     Figure labels: temporal locality improvement; spatial locality improvement; sleep rate 50%; access; sleep; active; almost 0 leakage; time-out interval; 75%→94%; 50%→75%; subarrays; time (t)
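The sleep rate this slide aims to improve can be computed from an access trace. The sketch below is an offline model under stated assumptions: a subarray counts as active from each access until the time-out expires, wake-up latency is ignored, and the function names and the example traces are hypothetical rather than taken from the slides.

```python
def sleep_rate(access_times, timeout: int, total_cycles: int) -> float:
    """Fraction of `total_cycles` that one subarray spends power-gated."""
    active = 0
    active_until = 0   # end of the active window opened by earlier accesses
    for t in sorted(access_times):
        start = max(t, active_until)             # skip cycles already counted
        end = min(t + timeout, total_cycles)     # clip to the observed window
        if end > start:
            active += end - start
        active_until = max(active_until, end)
    return 1.0 - active / total_cycles


def overall_sleep_rate(per_subarray_accesses, timeout: int, total_cycles: int) -> float:
    """Average sleep rate over all subarrays."""
    rates = [sleep_rate(a, timeout, total_cycles) for a in per_subarray_accesses]
    return sum(rates) / len(rates)


# Temporal locality: the same two accesses, far apart versus close together.
far_apart = sleep_rate([0, 2000], timeout=1000, total_cycles=4000)
gathered = sleep_rate([0, 500], timeout=1000, total_cycles=4000)

# Spatial locality: two accesses spread over two subarrays versus one.
spread = overall_sleep_rate([[0], [0], [], []], timeout=1000, total_cycles=2000)
packed = overall_sleep_rate([[0, 0], [], [], []], timeout=1000, total_cycles=2000)
```

In both comparisons the number of accesses is unchanged; only their placement in time or across subarrays differs, and the more local placement yields the higher sleep rate in this model.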

  20. Implementation
     • The buffers are constructed as an SRAM array.
     • The buffer array is accessed just after an LLC access.
         • It is accessed only on an LLC miss.

  21. The optimal time-out interval
     • The figure shows the relationship between the time-out interval and performance degradation.
     • We assume that a performance degradation of 1.5% is acceptable and therefore choose 1K cycles as the time-out interval.
     Figure annotation: (of all the benchmarks)

  22. The optimal number of buffer entries
     • The figure shows the total energy of the L2 cache and the write buffers for the subarrays.
     • The graph shows that the optimal number of entries is 8.

  23. Access distributions
     • With our techniques, we can reduce the number of small access intervals.
     Figure labels: sleep rate; active rate
