Last Bank: Dealing with Address Reuse in Non-Uniform Cache Architecture for CMPs

Euro-Par 2009, Delft (The Netherlands) - August 27, 2009 Last Bank: Dealing with Address Reuse in Non-Uniform Cache Architecture for CMPs Javier Liraψ Carlos Molinaф Antonio Gonzálezλ ψDept. Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain, javier.lira@ac.upc.edu фDept. Enginyeria Informàtica, Universitat Rovira i Virgili, Tarragona, Spain, carlos.molina@urv.net λIntel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain, antonio.gonzalez@intel.com

Outline • Introduction • Methodology • Last Bank • Characterization of replacements in NUCA • Last Bank Optimizations • Conclusions

Introduction • CMPs have emerged as a dominant paradigm in system design. • Keep improving performance while reducing power consumption. • Take advantage of thread-level parallelism. • Commercial CMPs are currently available. • CMPs incorporate larger, shared last-level caches. • Wire delay is a key constraint.

NUCA • Non-Uniform Cache Architecture (NUCA) was first proposed at ASPLOS 2002 by Kim et al. [1]. • NUCA divides a large cache into smaller and faster banks. • Banks close to the cache controller have lower latencies than banks farther away. [1] C. Kim, D. Burger and S.W. Keckler. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. ASPLOS '02
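The distance-dependent latency described on this slide can be sketched with a toy model. This is a minimal sketch under assumptions that are not in the slides: banks sit on a 2D grid, the cache controller is at position (0, 0), requests travel a Manhattan path, and the cycle counts `bank_cycles` and `hop_cycles` are illustrative values only.

```python
def bank_latency(row, col, bank_cycles=3, hop_cycles=1):
    """Round-trip latency (in cycles) to the bank at grid position (row, col).

    Hypothetical model: the controller is at (0, 0), the request and the
    reply each cross `row + col` links, and the bank itself takes
    `bank_cycles` to access. All numbers are assumed for illustration.
    """
    hops = row + col  # Manhattan distance from the controller
    return bank_cycles + 2 * hops * hop_cycles


# Banks next to the controller answer faster than distant ones.
near = bank_latency(0, 0)
far = bank_latency(3, 4)
```

In this model the access time grows linearly with distance, which is the property that makes the placement of frequently accessed data matter in a NUCA.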

Methodology • Simulation tools: • Simics + GEMS • CACTI v6.0 • PARSEC Benchmark Suite

Baseline NUCA cache architecture • 8 cores • 256 banks [2] B. M. Beckmann and D. A. Wood. Managing wire delay in large chip-multiprocessor caches. MICRO '04

Last Bank • Data movements concentrate the most accessed data in a few banks. • Data replacements in these hot banks are unfair.

Last Bank • An extra bank is included in the NUCA cache. • Acts as a victim cache, but it is not fully associative. • Gives evicted data a second chance to stay in the NUCA.
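The victim-cache behaviour described above can be sketched as a small set-associative structure. This is a minimal sketch, not the authors' implementation: the class name `LastBank`, the set-index function (`addr % num_sets`), the insertion-order replacement inside each set, and the default sizes are all assumptions made for illustration. Stored data is assumed to be non-`None`.

```python
from collections import OrderedDict


class LastBank:
    """Hypothetical set-associative victim bank for lines evicted from the NUCA."""

    def __init__(self, num_sets=4, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        # One ordered dict per set; the oldest entry is first.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def insert(self, addr, data):
        """Store an evicted line; return the (addr, data) it displaces, if any."""
        s = self.sets[addr % self.num_sets]
        s.pop(addr, None)  # refresh the entry if the address is already present
        dropped = None
        if len(s) >= self.ways:
            dropped = s.popitem(last=False)  # oldest line leaves the chip
        s[addr] = data
        return dropped

    def lookup(self, addr):
        """On a hit, remove and return the data so it can re-enter the NUCA."""
        return self.sets[addr % self.num_sets].pop(addr, None)


lb = LastBank(num_sets=2, ways=1)
lb.insert(0, "a")            # set 0 now holds address 0
displaced = lb.insert(2, "b")  # address 2 also maps to set 0 and displaces 0
hit = lb.lookup(2)           # second chance: the line returns to the NUCA
```

Because each address can only live in one set, two evicted lines that map to the same set compete for space, which is one way the limited size and associativity of the bank can restrict its benefit.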

Last Bank • Performance benefits restricted by Last Bank size. • Significant performance potential. • Analysis of reused addresses to find improvement points.

Characterization of replacements in NUCA • How many evicted addresses are later reused? • How many cycles does a reused address usually spend out of the NUCA before being reinserted? • Where were reused addresses located within the NUCA just before being evicted? • What action caused reused addresses to be evicted from the NUCA?

Reused address statistics • Nearly 70% of evicted addresses return to the NUCA cache. • Most of the reused addresses return to the NUCA at least twice.

Time between Eviction and Reinsertion • Nearly 30% of evicted addresses return in less than 100,000 cycles. • In blackscholes, almost 50% of reused addresses return to NUCA in less than 1,000 cycles.
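The reuse measurements on the last two slides can be reproduced in outline from an eviction and insertion trace. This is a minimal sketch under assumptions: the trace format `(cycle, kind, addr)` and the function name `reuse_stats` are hypothetical, and the fraction is computed per eviction event (an eviction counts as reused if the same address is inserted again later). The slides do not say how the original statistics were gathered.

```python
def reuse_stats(events):
    """Summarise address reuse from a trace of (cycle, kind, addr) tuples.

    `kind` is "evict" or "insert" (hypothetical trace format). Returns the
    fraction of evictions followed by a reinsertion of the same address,
    and the list of cycle gaps between each eviction and its reinsertion.
    """
    evicted_at = {}  # addr -> cycle of its pending eviction
    evictions = 0
    gaps = []
    for cycle, kind, addr in events:
        if kind == "evict":
            evicted_at[addr] = cycle
            evictions += 1
        elif kind == "insert" and addr in evicted_at:
            gaps.append(cycle - evicted_at.pop(addr))
    fraction = len(gaps) / evictions if evictions else 0.0
    return fraction, gaps


trace = [
    (10, "insert", 1),   # first insertion, not a reuse
    (100, "evict", 1),
    (600, "insert", 1),  # address 1 returns after 500 cycles
    (700, "evict", 2),   # address 2 never returns
]
fraction, gaps = reuse_stats(trace)
```

Bucketing `gaps` by thresholds such as 1,000 or 100,000 cycles would give a distribution of the kind reported on the slide.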

Last location within the NUCA • Most reused addresses were evicted from Local Banks. • Most addresses replaced from Central Banks are not reused later.

Selective Last Bank • Target: To reduce pollution in the Last Bank. • This mechanism selects which evicted data blocks are stored in the Last Bank. • Implemented Selective Last Bank: • Stores a data block if and only if it was evicted from a Local Bank. • Otherwise, sends it back to main memory.
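The filter rule stated above (keep blocks evicted from a Local Bank, send the rest to main memory) can be sketched as a single routing decision. This is a minimal sketch: the function name `route_eviction`, the `"local"` and `"central"` labels, and the use of plain lists as stand-ins for the Last Bank and main memory are assumptions made for illustration.

```python
def route_eviction(addr, source_bank, last_bank, memory):
    """Decide where a block evicted from the NUCA goes.

    `source_bank` is "local" or "central" (hypothetical labels). Blocks
    from a Local Bank are kept in the Last Bank; all others go back to
    main memory. `last_bank` and `memory` are lists used as stand-ins.
    """
    if source_bank == "local":
        last_bank.append(addr)
        return "last_bank"
    memory.append(addr)
    return "memory"


last_bank, memory = [], []
route_eviction(0x40, "local", last_bank, memory)    # kept for a second chance
route_eviction(0x80, "central", last_bank, memory)  # written back to memory
```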

LRU Prioritising Last Bank • Target: To keep reused addresses in the NUCA cache. • Modifies the data eviction policy of the NUCA banks. • Prioritises lines that come from the Last Bank during the data replacement process. (Figure: example set ordered from MRU to LRU, with lines @D, @B, @A and @C, each tagged with a priority bit P:0.)
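One way a priority bit could interact with LRU replacement is sketched below. The slide only states that lines coming from the Last Bank are prioritised, so the exact rule here is an assumption: a line inserted from the Last Bank gets P = 1, and when such a line reaches the LRU position it survives one replacement (its bit is cleared and the next least recently used line is evicted instead). The class name `PrioritisingSet` is hypothetical, and promotion on hits is left out to keep the sketch short.

```python
class PrioritisingSet:
    """Hypothetical cache set with LRU replacement and a per-line priority bit."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = []  # index 0 is MRU, last index is LRU; entries are [addr, p]

    def insert(self, addr, from_last_bank=False):
        """Insert a line at the MRU position; return the evicted address, if any."""
        victim = None
        if len(self.lines) >= self.ways:
            victim = self._evict()
        self.lines.insert(0, [addr, 1 if from_last_bank else 0])
        return victim

    def _evict(self):
        lru = self.lines[-1]
        if lru[1] == 1 and len(self.lines) > 1:
            # Assumed rule: spend the priority bit and evict the
            # next least recently used line instead.
            lru[1] = 0
            return self.lines.pop(-2)[0]
        return self.lines.pop()[0]


s = PrioritisingSet(ways=2)
s.insert("A", from_last_bank=True)  # A returns from the Last Bank with P = 1
s.insert("B")
first_victim = s.insert("C")   # A is LRU but prioritised, so B is evicted
second_victim = s.insert("D")  # A has spent its priority and is evicted now
```

Clearing the bit after one use keeps a prioritised line from occupying its way indefinitely; other policies, such as a multi-bit counter, would be equally compatible with the description on the slide.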

Results • Both optimizations increase Last Bank performance benefits. • There is still room for improvement. • Adaptive filters will be analysed in future work.

Conclusions • Data movements provoke unfair replacements in hot banks. • Last Bank reduces the access latency of promptly reused addresses. • Huge performance potential. • Two optimizations are proposed: • Selective Last Bank: Reduces pollution in the Last Bank. • LRU Prioritising Last Bank: Keeps reused addresses in the NUCA cache.

Questions?
