
Last Bank: Dealing with Address Reuse in Non-Uniform Cache Architecture for CMPs

Euro-Par 2009, Delft (The Netherlands) - August 27, 2009. Last Bank: Dealing with Address Reuse in Non-Uniform Cache Architecture for CMPs. Javier Lira ψ Carlos Molina ф Antonio González λ. ф Dept. Enginyeria Informàtica Universitat Rovira i Virgili Tarragona, Spain carlos.molina@urv.net.


Presentation Transcript


  1. Euro-Par 2009, Delft (The Netherlands) - August 27, 2009. Last Bank: Dealing with Address Reuse in Non-Uniform Cache Architecture for CMPs. Javier Lira ψ, Carlos Molina ф, Antonio González λ. ф Dept. Enginyeria Informàtica, Universitat Rovira i Virgili, Tarragona, Spain, carlos.molina@urv.net. ψ Dept. Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain, javier.lira@ac.upc.edu. λ Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain, antonio.gonzalez@intel.com

  2. Outline • Introduction • Methodology • Last Bank • Characterization of replacements in NUCA • Last Bank Optimizations • Conclusions

  3. Introduction • CMPs have emerged as a dominant paradigm in system design. • Keep improving performance while reducing power consumption. • Take advantage of thread-level parallelism. • Commercial CMPs are currently available. • CMPs incorporate larger, shared last-level caches. • Wire delay is a key constraint.

  4. NUCA • Non-Uniform Cache Architecture (NUCA) was first proposed at ASPLOS 2002 by Kim et al. [1]. • NUCA divides a large cache into smaller and faster banks. • Banks close to the cache controller have lower latencies than banks farther away. [1] C. Kim, D. Burger and S. W. Keckler. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. ASPLOS '02

  5. Outline • Introduction • Methodology • Last Bank • Characterization of replacements in NUCA • Last Bank Optimizations • Conclusions

  6. Methodology • Simulation tools: • Simics + GEMS • CACTI v6.0 • PARSEC Benchmark Suite

  7. Baseline NUCA cache architecture • 8 cores • 256 banks [2] B. M. Beckmann and D. A. Wood. Managing wire delay in large chip-multiprocessor caches. MICRO '04

  8. Outline • Introduction • Methodology • Last Bank • Characterization of replacements in NUCA • Last Bank Optimizations • Conclusions

  9. Last Bank • Data movements concentrate the most frequently accessed data in a few banks. • Data replacements in HOT banks are unfair.

  10. Last Bank • An extra bank is included in the NUCA cache. • It acts as a victim cache, but it is not fully associative. • It gives evicted data a second chance to stay in the NUCA cache.
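The idea on this slide can be illustrated with a minimal sketch of a set-associative victim store. The slides do not give the Last Bank's associativity, index function, or internal replacement policy, so the modulo indexing, the per-set LRU order, and the names `LastBank`, `insert`, and `lookup` are all assumptions made for illustration.

```python
from collections import OrderedDict


class LastBank:
    """Illustrative set-associative victim store (not the authors' design)."""

    def __init__(self, num_sets=4, ways=2):
        # num_sets and ways are hypothetical parameters.
        self.num_sets = num_sets
        self.ways = ways
        # One ordered dict per set; insertion order doubles as LRU order.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _set_for(self, addr):
        # Assumed index function: block address modulo number of sets.
        return self.sets[addr % self.num_sets]

    def insert(self, addr, data):
        """Store a block evicted from the NUCA.

        Returns the (addr, data) pair pushed out of the Last Bank, or None.
        """
        s = self._set_for(addr)
        victim = None
        if addr in s:
            s.pop(addr)  # refresh an existing entry
        elif len(s) >= self.ways:
            victim = s.popitem(last=False)  # drop the oldest entry in the set
        s[addr] = data
        return victim

    def lookup(self, addr):
        """Called on a NUCA miss; a hit removes the block so it can re-enter the NUCA."""
        return self._set_for(addr).pop(addr, None)
```

Because the store is set-associative rather than fully associative, two evicted blocks that map to the same set compete for its ways even when other sets are empty.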

  11. Last Bank • Performance benefits restricted by Last Bank size. • Significant performance potential. • Analysis of reused addresses to find improvement points.

  12. Outline • Introduction • Methodology • Last Bank • Characterization of replacements in NUCA • Last Bank Optimizations • Conclusions

  13. Characterization of replacements in NUCA • How many evicted addresses are later reused? • How many cycles does a reused address usually spend out of the NUCA before being reinserted? • Where were reused addresses located within the NUCA just before being evicted? • What action caused the eviction of reused addresses from the NUCA?

  14. Reused address statistics • Nearly 70% of evicted addresses return to the NUCA cache. • Most reused addresses return to the NUCA at least twice.
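Statistics of this kind could be gathered from an eviction/insertion trace roughly as follows. This is a sketch only: the trace format, the function name `reuse_stats`, and the choice to count distinct addresses (rather than individual eviction events) are assumptions, since the slides do not say how the percentages were computed.

```python
from collections import Counter


def reuse_stats(events):
    """Summarise address reuse from a time-ordered trace.

    events: iterable of (kind, addr) with kind in {"evict", "insert"}
    (a hypothetical trace format).
    Returns (fraction of evicted addresses that return,
             fraction of returning addresses that return at least twice).
    """
    out_of_cache = set()   # addresses currently evicted
    evicted = set()        # every address evicted at least once
    returns = Counter()    # number of reinsertions per address
    for kind, addr in events:
        if kind == "evict":
            evicted.add(addr)
            out_of_cache.add(addr)
        elif kind == "insert" and addr in out_of_cache:
            out_of_cache.discard(addr)
            returns[addr] += 1
    reused = len(returns)
    frac_reused = reused / len(evicted) if evicted else 0.0
    multi = sum(1 for n in returns.values() if n >= 2)
    frac_multi = multi / reused if reused else 0.0
    return frac_reused, frac_multi
```

An insertion of an address that was never evicted (a cold miss) is ignored, so only genuine returns are counted.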

  15. Time between Eviction and Reinsertion • Nearly 30% of evicted addresses return in less than 100,000 cycles. • In blackscholes, almost 50% of reused addresses return to NUCA in less than 1,000 cycles.
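The eviction-to-reinsertion distance can be measured in a similar way. The sketch below assumes a hypothetical trace of `(cycle, kind, addr)` tuples; the helper names `reinsertion_gaps` and `fraction_below` are illustrative, and the fraction is taken over reinsertions, which may differ from the denominator used on the slide.

```python
def reinsertion_gaps(events):
    """Return the cycle gap of every eviction that is followed by a reinsertion.

    events: time-ordered iterable of (cycle, kind, addr),
    kind in {"evict", "insert"} (assumed trace format).
    """
    evicted_at = {}  # addr -> cycle of its pending eviction
    gaps = []
    for cycle, kind, addr in events:
        if kind == "evict":
            evicted_at[addr] = cycle
        elif kind == "insert" and addr in evicted_at:
            gaps.append(cycle - evicted_at.pop(addr))
    return gaps


def fraction_below(gaps, threshold):
    """Fraction of reinsertions that happen in fewer than `threshold` cycles."""
    return sum(g < threshold for g in gaps) / len(gaps) if gaps else 0.0
```

Calling `fraction_below(gaps, 1_000)` and `fraction_below(gaps, 100_000)` would give figures comparable in spirit to the thresholds quoted above.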

  16. Last location within the NUCA • Most reused addresses were evicted from Local Banks. • Most addresses replaced from Central Banks are not reused later.

  17. Outline • Introduction • Methodology • Last Bank • Characterization of replacements in NUCA • Last Bank Optimizations • Conclusions

  18. Selective Last Bank • Target: To reduce pollution in the Last Bank. • This mechanism selects which evicted data blocks are stored in the Last Bank. • Implemented Selective Last Bank: • Stores a data block if and only if it was evicted from a Local Bank. • Otherwise, sends it back to main memory.
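The filter described on this slide amounts to a single routing decision at eviction time. A minimal sketch, assuming hypothetical names (`BankType`, `route_eviction`) and using plain lists as stand-ins for the Last Bank and main memory:

```python
from enum import Enum


class BankType(Enum):
    LOCAL = "local"
    CENTRAL = "central"


def route_eviction(addr, source_bank, last_bank, memory):
    """Decide where a block evicted from the NUCA goes.

    last_bank and memory are plain lists standing in for the real structures.
    """
    if source_bank is BankType.LOCAL:
        # Blocks leaving a Local Bank were found to be reused more often.
        last_bank.append(addr)
        return "last_bank"
    # Everything else bypasses the Last Bank to avoid polluting it.
    memory.append(addr)
    return "memory"
```

The filter needs no history tables; it relies only on the bank type the block was evicted from.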

  19. LRU Prioritising Last Bank • Target: To maintain reused addresses in the NUCA cache. • Modifies the data eviction policy of the NUCA banks. • Prioritises lines that come from the Last Bank during the data replacement process. (Figure: replacement stack between MRU and LRU positions holding lines @D, @B, @A and @C, each tagged P:0.)
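The slide does not spell out how the priority bit changes victim selection, so the following is only one plausible reading: a line with P set is skipped once, and its bit is cleared, before it becomes eligible for eviction. The function name `choose_victim` and the list-of-dicts layout are assumptions.

```python
def choose_victim(lines):
    """Pick and remove a victim from one cache set.

    lines: list of {"addr": ..., "p": 0 or 1}, ordered from MRU (index 0)
    to LRU (last index). p == 1 marks a line that came from the Last Bank.
    Assumed policy: scanning from the LRU end, a prioritised line spends its
    priority (p is cleared) and is skipped; the first line with p == 0 is
    evicted.
    """
    for i in range(len(lines) - 1, -1, -1):
        if lines[i]["p"] == 0:
            return lines.pop(i)["addr"]
        lines[i]["p"] = 0  # second chance used up
    # Every line was prioritised: fall back to plain LRU.
    return lines.pop()["addr"]
```

With all P bits at 0, as in the state shown on the slide, this reduces to ordinary LRU replacement.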

  20. Results • Both optimizations increase the performance benefits of the Last Bank. • There is still room for improvement. • Adaptive filters will be analysed in future work.

  21. Outline • Introduction • Methodology • Last Bank • Characterization of replacements in NUCA • Last Bank Optimizations • Conclusions

  22. Conclusions • Data movements provoke unfair replacements in HOT banks. • The Last Bank reduces the access latency of promptly reused addresses. • Huge performance potential. • Two optimizations are proposed: • Selective Last Bank: Reduces pollution in the Last Bank. • LRU Prioritising Last Bank: Maintains reused addresses in the NUCA cache.

  23. Last Bank: Dealing with Address Reuse in Non-Uniform Cache Architecture for CMPs • Questions?
