
Scaling the Bandwidth Wall: Challenges in and Avenues for CMP Scalability
36th International Symposium on Computer Architecture

Brian Rogers†‡, Anil Krishna†‡, Gordon Bell‡,

Ken Vu‡, Xiaowei Jiang†, Yan Solihin†

NC STATE UNIVERSITY


As Process Technology Scales …

[Figure: with each technology generation, the number of cores (P) and cache banks ($) on chip multiplies, while the number of DRAM interfaces stays roughly constant.]

Scaling the Bandwidth Wall -- ISCA 2009


Problem

  • Core growth >> Memory bandwidth growth

    • Cores: ~ exponential growth (driven by Moore’s Law)

    • Bandwidth: ~ much slower growth (pin and power limitations)

  • At each relative technology generation (T):

    • (# Cores = 2^T) >> (Bandwidth = B^T)

  • Some key questions (Our contributions):

    • How constraining is the increasing gap between the number of cores and the available memory bandwidth?

    • How should future CMPs be designed; how should we allocate transistors to caches and cores?

    • What techniques can best reduce memory traffic demand?

Build Analytical CMP Memory Bandwidth Model
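The core/bandwidth gap compounds geometrically. A minimal sketch of the relationship; the bandwidth growth factor b = 1.4 is an illustrative assumption, not a figure from the talk:

```python
# Per-core bandwidth after t generations, relative to generation 0,
# when cores double each generation (2**t) but bandwidth grows only
# by a factor b per generation (b**t). b = 1.4 is assumed for illustration.

def per_core_bandwidth(t: int, b: float = 1.4) -> float:
    """Fraction of the original per-core bandwidth left after t generations."""
    return b ** t / 2 ** t

# After 4 generations at b = 1.4, each core sees only ~24% of the
# per-core bandwidth it started with (0.7**4 ≈ 0.24).
```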



Agenda

  • Background / Motivation

  • Assumptions / Scope

  • CMP Memory Traffic Model

  • Alternate Views of Model

  • Memory Traffic Reduction Techniques

    • Indirect

    • Direct

    • Dual

  • Conclusions



Assumptions / Scope

  • Homogeneous cores

  • Single-threaded cores (multi-threading adds to the problem)

  • Co-scheduled sequential applications

    • Multi-threaded apps with data sharing evaluated separately

  • Enough work to keep all cores busy

  • Workloads static across technology generations

  • Equal amount of cache per core

  • Power/Energy constraints outside scope of this study




Cache Miss Rate vs. Cache Size

  • The relationship follows the power law of Hartstein et al. (the "√2 rule")

R = new cache size / old cache size

α = sensitivity of the workload's miss rate to cache size change

M = M0 · R^(-α)  (M0 = miss rate at the old cache size)
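The power law is easy to encode. A minimal sketch (α = 0.5 gives the √2 rule):

```python
# Hartstein et al.'s power law: scaling the cache by a factor R changes
# the miss rate from M0 to M0 * R**(-alpha). With alpha = 0.5, doubling
# the cache divides misses by sqrt(2), hence the "root-2 rule".

def miss_rate(m0: float, r: float, alpha: float = 0.5) -> float:
    """Miss rate after changing cache size by a factor r (new/old)."""
    return m0 * r ** (-alpha)

# e.g. miss_rate(0.10, 2.0) ≈ 0.0707: doubling the cache removes ~29%
# of misses under the root-2 rule.
```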



CMP Traffic Model

  • Express chip area in terms of Core Equivalent Areas (CEAs)

    • Core = 1 CEA, Unit_of_Cache = 1 CEA

    • P = # cores, C = # cache CEAs, N = P+C, S = C/P

  • Assume that non-core and non-cache components require constant fraction of area

  • Add a # of cores term to get the CMP traffic model: T = P · M = P · M0 · R^(-α), with R now the ratio of per-core cache (S = C/P)



CMP Traffic Model (2)

P = # cores, C = # cache CEAs, N = P+C, S = C/P

  • Going from CMP1=<P1,C1> to CMP2=<P2,C2>

  • Remove common terms and express M2 in terms of M1: M2 = M1 · (S2/S1)^(-α)
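Putting the two slides together gives a small calculator for relative traffic. This is a sketch of the model described above (the constant M0 cancels when comparing two configurations); the variable names are mine.

```python
# Relative total memory traffic between CMP1 = <P1, C1> and CMP2 = <P2, C2>,
# assuming traffic ~ P * M and the power-law miss model on per-core cache
# S = C / P:  T2 / T1 = (P2 / P1) * (S2 / S1)**(-alpha).

def traffic_ratio(p1: float, c1: float, p2: float, c2: float,
                  alpha: float = 0.5) -> float:
    s1, s2 = c1 / p1, c2 / p2
    return (p2 / p1) * (s2 / s1) ** (-alpha)

# Doubling cores and cache together (per-core cache unchanged) doubles
# traffic, which is why bandwidth must also double for ideal scaling.
```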



One Generation of Scaling

  • Baseline Processor: 8 cores, 8 cache CEAs

    • N1=16, P1=8, C1=8, S1=1, and ~ fully utilized BW

    • α = 0.5

  • How many cores are possible if 32 CEAs are now available?

    • Ideal Scaling = 2X # of cores at each successive technology generation

[Chart: ideal scaling (16 cores) vs. bandwidth-limited scaling (fewer cores within the same traffic budget).]
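Plugging the baseline into the model shows the squeeze. The sketch below assumes α = 0.5 and no bandwidth growth across the generation; it compares candidate splits of the 32 CEAs against the 8-core baseline's traffic.

```python
# Traffic of a <P cores, 32 - P cache CEAs> design relative to the
# <8 cores, 8 cache CEAs> baseline (S1 = 1), with T ~ P * S**(-alpha).

def relative_traffic(p: float, n: float = 32.0, alpha: float = 0.5) -> float:
    s = (n - p) / p                   # cache CEAs per core
    return (p / 8.0) * s ** (-alpha)  # baseline: 8 cores at S1 = 1

for p in (8, 11, 16):
    print(p, round(relative_traffic(p), 2))
# 16 cores (ideal scaling) would double traffic; ~11 cores stays within
# the original bandwidth budget.
```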





CMP Design Constraint

P = # cores, C = # cache CEAs, N = P+C, S = C/P

  • If available off-chip BW grows by factor of B:

    • Total memory traffic should grow by at most a factor of B each generation

  • Write S2 in terms of P2 and N2: S2 = (N2 - P2) / P2

  • New technology: N2 CEAs, B bandwidth => solve for P2 numerically

P2 is # of cores that can be supported
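The constraint has no closed form in P2, but traffic increases monotonically in P2, so a bisection search finds the break-even point. A sketch under the model's assumptions (function and parameter names are mine; B is the per-generation bandwidth growth factor):

```python
# Largest core count P2 (out of N2 CEAs) whose traffic stays within B times
# the baseline's, using T2/T1 = (P2/P1) * (S2/S1)**(-alpha), S2 = (N2-P2)/P2.

def max_cores(n2: float, b: float = 1.0, p1: float = 8.0, c1: float = 8.0,
              alpha: float = 0.5) -> float:
    s1 = c1 / p1

    def traffic(p2: float) -> float:
        s2 = (n2 - p2) / p2
        return (p2 / p1) * (s2 / s1) ** (-alpha)

    lo, hi = 1e-6, n2 - 1e-6   # must leave some area for cache
    for _ in range(100):       # bisection: traffic is increasing in p2
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if traffic(mid) <= b else (lo, mid)
    return lo

# max_cores(32) ≈ 11: with no bandwidth growth, doubling the die
# supports only ~1.4x the cores.
```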



Scaling Under Area Constraints

  • With an increasing # of CEAs available, how many cores can be supported at a constant BW requirement?

  • 2x die area: 1.4x cores

  • 4x die area: 1.9x cores

  • 8x die area: 2.4x cores

  • 16x die area: 3.2x cores





Categories of Techniques

  • Indirect
    • Cache Compression
    • DRAM Caches
    • 3D-stacked Cache
    • Unused Data Filter
    • Smaller Cores
  • Direct
    • Link Compression
    • Sectored Caches
  • Dual
    • Cache+Link Compression
    • Small Cache Lines
    • Data Sharing



Indirect – DRAM Cache

[Chart: cores supported vs. technology generation with a DRAM cache, compared to ideal scaling; the capacity factor F is driven by DRAM's higher density.]



Direct – Link Compression

[Chart: cores supported vs. technology generation with link compression, compared to ideal scaling; the traffic factor R is set by the achievable compression ratio.]



Dual – Small Cache Lines

[Chart: cores supported vs. technology generation with small cache lines, compared to ideal scaling; both F and R depend on the percentage of unused data per line.]
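The F and R annotations on these technique slides can be folded into the traffic model. The formulation below is my own sketch, not the paper's exact equations: an indirect technique multiplies effective cache capacity by F, a direct technique divides off-chip traffic by R, and a dual technique does both.

```python
# Traffic model with the two knobs from these slides:
#   F: effective cache-capacity multiplier (indirect techniques),
#   R: raw traffic divisor (direct techniques).
# T2/T1 = (P2/P1) * ((F * S2) / S1)**(-alpha) / R

def traffic_ratio(p1, s1, p2, s2, alpha=0.5, f=1.0, r=1.0):
    return (p2 / p1) * ((f * s2) / s1) ** (-alpha) / r

naive = traffic_ratio(8, 1, 16, 1)              # doubling cores: 2.0x traffic
dram_cache = traffic_ratio(8, 1, 16, 1, f=8.0)  # assume 8x denser cache
link_comp = traffic_ratio(8, 1, 16, 1, r=2.0)   # assume 2:1 compression
```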



Dual – Data Sharing

  • Please see the paper for details on the modeling of sharing

  • Data sharing unlikely to provide a scalable solution



Summary of Individual Techniques

[Charts: scaling achieved by each individual technique, grouped into indirect, direct, and dual categories.]



Summary of Combined Techniques



Conclusions

  • Contributions

    • Simple, powerful analytical CMP memory traffic model

    • Quantify significance of memory BW wall problem

      • Only ~10% of chip area can be devoted to cores within 4 generations under a constant-traffic requirement

    • Guide design (cores vs. cache) of future CMPs

      • Given fixed chip area and BW scaling, how many cores?

    • Evaluate memory traffic reduction techniques

      • Combinations can enable ideal scaling for several generations

  • Need bandwidth-efficient computing:

    • Hardware/Architecture level: DRAM caches, cache/link compression, prefetching, smarter memory controllers, etc.

    • Technology level: 3D chips, optical interconnects, etc.

    • Application level: working set reduction, locality enhancement, data vs. pipelined parallelism, computation vs. communication, etc.



Questions?

Thank You

Brian Rogers

[email protected]


