
Kilmo Choi rlfah926@naver.com

A Software Memory Partition Approach for Eliminating Bank-level Interference in Multicore Systems. Lei Liu, Zehan Cui, Mingjie Xing, Yungang Bao, Mingyu Chen, Chengyong Wu. Kilmo Choi rlfah926@naver.com. Contents. Background and Motivation. Bank-Level Partition Mechanism (BPM).


Presentation Transcript


  1. A Software Memory Partition Approach for Eliminating Bank-level Interference in Multicore Systems. Lei Liu, Zehan Cui, Mingjie Xing, Yungang Bao, Mingyu Chen, Chengyong Wu. Kilmo Choi rlfah926@naver.com

  2. Contents • Background and Motivation • Bank-Level Partition Mechanism(BPM) • Results • Conclusion • Reference

  3. Background and Motivation • Memory bank - a set of memory cells with the same access speed • Multicore platform - multiple banks can serve memory requests independently and concurrently → Bank-Level Parallelism

  4. Background and Motivation • Row buffer conflict • Causes performance degradation (throughput slowdown and unfairness) • e.g., the row-buffer hit rate decreases from over 60% with 1 core to 35% with 16 cores [Figure: Row-buffer Hit: Core 0 accesses data in the same page, so only R/W is needed. Row-buffer Conflict: Core 0 accesses data in Row 1 and Core 1 accesses data in Row 3 of the same bank, so each access needs a precharge operation and an activate operation before R/W. Each bank is drawn with Rows 0-3.]
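The hit/conflict behavior on this slide can be reproduced with a toy model. The following is a minimal sketch, assuming an open-page policy in which each bank keeps its most recently accessed row in the row buffer; the function name `row_buffer_hit_rate` and the access traces are illustrative and do not come from the presentation.

```python
def row_buffer_hit_rate(accesses):
    """Return the fraction of accesses that hit in the row buffer.

    accesses: sequence of (bank, row) pairs in arrival order.
    Assumption: open-page policy, one row buffer per bank.
    """
    open_row = {}  # bank -> row currently held in that bank's row buffer
    hits = 0
    total = 0
    for bank, row in accesses:
        total += 1
        if open_row.get(bank) == row:
            hits += 1              # row-buffer hit: R/W only
        else:
            open_row[bank] = row   # conflict or cold miss: precharge + activate
    return hits / total if total else 0.0


# One core streaming through Row 1 of bank 0.
solo = [(0, 1)] * 8
# Two cores interleaving on bank 0: Core 0 uses Row 1, Core 1 uses Row 3.
shared = [(0, 1), (0, 3)] * 4
# The same two cores when their data sits in different banks.
separate = [(0, 1), (1, 3)] * 4

print(row_buffer_hit_rate(solo))      # 0.875
print(row_buffer_hit_rate(shared))    # 0.0
print(row_buffer_hit_rate(separate))  # 0.75
```

In this toy trace the interleaved case loses every hit, which is an extreme version of the degradation the slide reports for real workloads.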

  5. Bank-Level Partition Mechanism (BPM) • Numerous new memory scheduling algorithms have been proposed to address the interference problem • However, these algorithms usually employ complex scheduling logic and need hardware modifications to memory controllers • Overview of BPM • The OS memory management system uses a page-coloring mechanism to partition banks into several groups and maps each thread (process) to a specific bank group • Address mapping policy
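The page-coloring idea in the overview can be sketched as a small allocator. This is an illustration under stated assumptions, not the authors' kernel implementation: the page size (`PAGE_SHIFT`), the bank-bit positions (`BANK_BITS`), and the names `bank_color` and `ColoredAllocator` are all hypothetical.

```python
PAGE_SHIFT = 12            # assumption: 4 KiB pages
BANK_BITS = (13, 14, 15)   # hypothetical physical-address bits that select the bank


def bank_color(phys_addr, bank_bits=BANK_BITS):
    """Pack the bank-select bits of a physical address into a small color number."""
    color = 0
    for i, bit in enumerate(bank_bits):
        color |= ((phys_addr >> bit) & 1) << i
    return color


class ColoredAllocator:
    """Toy page-frame allocator that keeps one free list per bank color."""

    def __init__(self, num_frames):
        self.free = {}    # color -> list of free page frame numbers
        for pfn in range(num_frames):
            self.free.setdefault(bank_color(pfn << PAGE_SHIFT), []).append(pfn)
        self.groups = {}  # thread id -> tuple of colors the thread may use

    def assign(self, thread_id, colors):
        """Map a thread to a bank group (a set of colors)."""
        self.groups[thread_id] = tuple(colors)

    def alloc_page(self, thread_id):
        """Return a free frame from the thread's own bank group."""
        for color in self.groups[thread_id]:
            if self.free.get(color):
                return self.free[color].pop(0)
        raise MemoryError(f"no free frame left in the bank group of thread {thread_id}")


allocator = ColoredAllocator(num_frames=64)
allocator.assign(0, [0, 1, 2, 3])   # thread 0 owns bank colors 0-3
allocator.assign(1, [4, 5, 6, 7])   # thread 1 owns bank colors 4-7
frames_t0 = [allocator.alloc_page(0) for _ in range(4)]
frames_t1 = [allocator.alloc_page(1) for _ in range(4)]
print(frames_t0, frames_t1)
```

Because every frame handed to a thread carries one of that thread's colors, two threads with disjoint bank groups never share a bank in this model.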

  6. Bank-Level Partition Mechanism (BPM) [Figure: four cores and eight banks; each bank has its own row buffer and Rows 0-3]

  7. Bank-Level Partition Mechanism (BPM) • Advantages • row buffer conflict ↓ row buffer hit ↑ • BPM is an entirely software approach → Flexible • It is easier for the OS than for hardware to monitor a thread's behavior • Bank-level conflicts can be fully eliminated by exclusively mapping a thread's data to specific banks • How much does the number of available banks influence a thread's performance?

  8. Bank-Level Partition Mechanism (BPM) • Discover the bank bits by a software method (algorithm), using uncached accesses • FOR x: higher latency (row miss) → Row{ }; left parts → Remain{ } • FOR y { FOR x }: higher latency (row miss) → Column{ }; left parts (mapped to different banks) → BANK{ } [Figure: address bits x and y set to 0 or 1, with the resulting row hit or row miss in banks drawn with Rows 0-3]
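The slide gives only the outline of the discovery algorithm, so the following is one possible reading of it, run against a simulated DRAM rather than real hardware. Assumptions that are not in the presentation: x ranges over all address bits in the first loop and over Remain{ } in the second, y ranges over Row{ }, the address layout (`COLUMN_BITS`, `BANK_BITS`, `ROW_BITS`) is a simple non-XOR mapping, and the latencies and threshold are arbitrary units.

```python
# Hypothetical address layout, known only to the simulated DRAM below.
COLUMN_BITS = set(range(0, 13))
BANK_BITS = {13, 14, 15}
ROW_BITS = set(range(16, 24))
ADDR_BITS = 24
HIT_LATENCY, CONFLICT_LATENCY = 50, 100   # arbitrary units


def field(addr, bits):
    """Extract the given bit positions of addr as a tuple."""
    return tuple((addr >> b) & 1 for b in sorted(bits))


def measure(a, b, rounds=4):
    """Average latency of alternating uncached accesses to a and b (simulated)."""
    open_row = {}
    total = 0
    for i in range(rounds + 1):
        for addr in (a, b):
            bank, row = field(addr, BANK_BITS), field(addr, ROW_BITS)
            hit = open_row.get(bank) == row
            open_row[bank] = row
            if i > 0:                      # round 0 is an untimed warm-up
                total += HIT_LATENCY if hit else CONFLICT_LATENCY
    return total / (2 * rounds)


def discover(measure, addr_bits, threshold):
    """Classify address bits as row, column, or bank bits from latency alone."""
    base = 0
    row, remain = set(), set()
    for x in range(addr_bits):             # FOR x
        if measure(base, base ^ (1 << x)) > threshold:
            row.add(x)                     # same bank, different row: row miss
        else:
            remain.add(x)
    column, bank = set(), set()
    for x in remain:                       # FOR y { FOR x }, written per bit x
        slow = all(measure(base, base ^ (1 << x) ^ (1 << y)) > threshold
                   for y in row)
        (column if slow else bank).add(x)  # fast means x moved us to another bank
    return row, column, bank


row, column, bank = discover(measure, ADDR_BITS, threshold=75)
print(sorted(bank))   # [13, 14, 15]
```

On real hardware the measurement step would need uncached memory and timing code, and controllers that hash several address bits into the bank index would need a more careful analysis than this sketch performs.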

  9. Bank-Level Partition Mechanism (BPM) • Advantages • row buffer conflict ↓ row buffer hit ↑ • BPM is an entirely software approach → Flexible • It is easier for the OS than for hardware to monitor a thread's behavior

  10. Results • Environments • 4 cores, 2.8 GHz Intel Core i7-860 processor, 8 GB DDR3 main memory with 64 banks, 5 bank bits • CentOS Linux 5.4 with kernel 2.6.32.15 • SPEC CPU2006 benchmarks

  11. Results • Overall system performance

  12. Results • Page-Policy and Power

  13. Results • BPM vs. Cache-Partition-Only • The correlation between BPM improvements and per-core bandwidth

  14. Conclusion • BPM is a new approach to eliminate the interference between threads and improve the overall system performance • BPM achieves this goal by assigning different groups of banks to different threads to eliminate inter-thread bank-level interference • This leads to a reduction in row buffer misses as well as in the energy consumption of the memory system

  15. Reference • J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan. Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems. In HPCA-14, 2008. • Junghoon Kim, Junghan Kim, and Youngik Eom. A Page Coloring Scheme through Page Cache Separation for Improving Cache Performance. In NIPA-2010. • Dimitris Kaseridis, Jeffrey Stuecheli, and Lizy Kurian John. Minimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era. In MICRO 44, 2011.

  16. Appendix: Page Coloring • Physically indexed caches are divided into multiple regions (colors). • All cache lines in a physical page are cached in one of those regions (colors). • The OS can control the page color of a virtual page through address mapping (by selecting a physical page with a specific value in its page color bits). [Figure: a virtual address (virtual page number, page offset) goes through address translation, which is under OS control, to a physical address (physical page number, page offset); the same address seen by the physically indexed cache is split into cache tag, set index, and block offset; the page color bits are the set-index bits that fall within the physical page number]
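The relation between cache geometry and page colors can be worked through in a few lines. This is a minimal sketch for a physically indexed cache with a plain (unhashed) set index; the function names and the cache parameters in the example are illustrative assumptions, not values taken from the presentation.

```python
def num_page_colors(cache_size, associativity, page_size):
    """Number of colors = bytes covered by one cache way / page size."""
    return cache_size // (associativity * page_size)


def page_color(phys_addr, cache_size, associativity, page_size):
    """Color of the page containing phys_addr: the low bits of its page number
    that also take part in the cache set index."""
    pfn = phys_addr // page_size
    return pfn % num_page_colors(cache_size, associativity, page_size)


# Hypothetical cache: 8 MiB, 16-way set associative, with 4 KiB pages.
CACHE_SIZE = 8 * 1024 * 1024
WAYS = 16
PAGE_SIZE = 4096

print(num_page_colors(CACHE_SIZE, WAYS, PAGE_SIZE))         # 128
print(page_color(0x12345000, CACHE_SIZE, WAYS, PAGE_SIZE))  # 69
```

With 128 colors, the OS picks the color of a virtual page by choosing a physical page whose page number has the desired value in its low 7 bits.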

  17. Appendix: Page Coloring • Physical pages are grouped [Figure: physical pages 1, 2, 3, 4, ..., i, i+1, i+2, ... of Process 1 and of Process 2, mapped onto the physically indexed cache]
