

An Analytical Model for CMPs

Spring 2003 ECE/CS 757

University of Wisconsin - Madison

Peter McClone

Kim-Huei Low


Overview

  • Introduction

  • Simulation Environment

  • Analytical Model

  • Results

  • Future work

  • Conclusion


Introduction

  • Performance limits of superscalar processors

  • Relative chip area increases

  • Chip Multiprocessors (CMP)

  • Design driven by simulation data

  • Time constraint

  • Analytical Model


CMP System Model

[Figure: CPU0 through CPUN, each with an iL1 and a dL1 cache; a shared L2 cache; main memory]

  • Similar to Piranha [3]

  • Multi-programmed workload

  • SimpleMP?


CacheTracer


Processor Model

  • Most area-efficient model

  • Area vs performance

  • SimpleScalar is used


SimpleScalar

  • Generate address traces

  • Ignore instruction addresses

    • ~100% hit rate

    • Fetched directly from memory, no allocation in L2

    • Very little interference in L2

    • Performance


Simulator Combination

  • 1. Generate address traces using word-size cache blocks at all levels and minimal cache sizes

  • 2. Feed the traces into CacheTracer to simulate cache interference at the L2 level and generate the statistics needed for the model computation

  • 3. Use the model to compute performance estimates for variations of all of the cache parameters
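One plausible reading of steps 1 and 3 is that recording traces at word granularity lets block-level statistics for any block size B be derived afterwards, simply by grouping consecutive words into blocks. The Python sketch below illustrates that idea under stated assumptions: the trace is a list of word addresses, block sizes are counted in words, and the function `block_stats` is hypothetical (it is not CacheTracer's interface). It returns u(B), U(B) and the number of time granules used by the model.

```python
def block_stats(word_trace, block_words, r):
    """Hypothetical helper: return (u, U, T) for a word-address trace.

    u: unique blocks touched in the first time granule of r references
    U: unique blocks touched over the whole trace
    T: number of time granules (a partial final granule counts as one)
    """
    if block_words < 1 or r < 1:
        raise ValueError("block_words and r must be positive")
    # Map each word address to the block of size block_words that contains it.
    blocks = [addr // block_words for addr in word_trace]
    u = len(set(blocks[:r]))
    U = len(set(blocks))
    T = -(-len(blocks) // r)  # ceiling division
    return u, U, T


# Word addresses 0..7 followed by 0..3 again, granule size r = 4.
trace = [0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3]
print(block_stats(trace, block_words=1, r=4))  # (4, 8, 3)
print(block_stats(trace, block_words=4, r=4))  # (1, 2, 3)
```

The same trace yields different u(B) and U(B) as the block size grows, which is what allows one trace to feed estimates for many cache configurations.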


Analytical Model

  • A mathematical equation for IPC

  • Combines observations from research papers with intuition

  • Based mainly on a detailed cache model [1], which was the focus of the project

  • Processor model is very simple


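Analytical Model

  • Three-part miss-rate model:

    $$M(C,t) = M(C,t)_{\text{startup}} + M(C,t)_{\text{nonstationary}} + M(C,t)_{\text{intrinsic}}$$

  • C is a cache configuration

  • t is a time granule of r references; T is the number of time granules in the trace, so rT is the total number of references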


Analytical Model

  • Startup misses are the u(B) unique blocks of size B accessed in the first time granule

  • Their contribution to the miss rate is u(B) divided by the total number of references:

    $$M(C,t)_{\text{startup}} = \frac{u(B)}{rT}$$


Analytical Model

  • Nonstationary misses are caused by changes in a process's working set over time

  • They are the difference between U(B), the total number of unique blocks accessed up to this point, and the u(B) blocks attributed to startup:

    $$M(C,t)_{\text{nonstationary}} = \frac{U(B) - u(B)}{rT}$$

  • Together, the startup and nonstationary terms count each unique block in the trace exactly once, since u(B) + (U(B) - u(B)) = U(B).
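The two compulsory-style terms are plain arithmetic on the trace statistics. This minimal Python sketch uses hypothetical function names and made-up numbers to show that, with the shared denominator rT, the startup and nonstationary terms add up to U(B) / (rT).

```python
def startup_miss_rate(u, r, T):
    """Startup term: unique blocks in the first granule over all r*T references."""
    return u / (r * T)


def nonstationary_miss_rate(u, U, r, T):
    """Nonstationary term: unique blocks first touched after the first granule."""
    return (U - u) / (r * T)


# Hypothetical numbers: 40 unique blocks in the first granule,
# 100 unique blocks overall, 10 granules of 1000 references each.
u, U, r, T = 40, 100, 1000, 10
s = startup_miss_rate(u, r, T)          # 40 / 10000
n = nonstationary_miss_rate(u, U, r, T)  # 60 / 10000
# Together they count every unique block once: U / (r * T).
assert abs((s + n) - U / (r * T)) < 1e-12
```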


Analytical Model

  • Intrinsic misses are caused inherently by program behavior

  • The natural sequence of accesses causes cache lines to displace each other

  • Intrinsic miss rate:

    $$M(C,t)_{\text{intrinsic}} = \frac{c\left[u(B) - \sum_{d=0}^{D} S\,d\,P(B,d)\right]}{r}$$


  • S is the number of sets

  • P(B,d) is the probability that d blocks of size B map into any given cache set

  • D is the set size (associativity). The sum computes the expected number of blocks that map to sets holding at most D blocks. These blocks cannot cause misses, because no more blocks map to such a set than it can hold

  • The remaining blocks (u(B) minus the sum) may cause misses. This is estimated by multiplying by the measured collision rate c
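The slides do not say how P(B,d) is obtained. The Python sketch below assumes that blocks map to sets uniformly at random, which makes P(B,d) a binomial probability over the u(B) unique blocks; that assumption, the function names, and the sample numbers are illustrative only and are not necessarily the authors' method.

```python
from math import comb


def p_block_set(u, S, d):
    """Assumed P(B,d): probability that exactly d of u unique blocks map to a
    given set, if each block lands in one of S sets uniformly at random."""
    p = 1.0 / S
    return comb(u, d) * p**d * (1.0 - p) ** (u - d)


def intrinsic_miss_rate(u, S, D, c, r):
    """Intrinsic term: c * [u - sum_{d=0}^{D} S*d*P(B,d)] / r."""
    # Expected number of blocks in sets holding at most D blocks;
    # those blocks fit in their set and cannot collide.
    safe = sum(S * d * p_block_set(u, S, d) for d in range(0, min(D, u) + 1))
    return c * (u - safe) / r


# Two blocks, two direct-mapped sets: they share a set half the time,
# so on average one block is exposed to collisions.
print(intrinsic_miss_rate(u=2, S=2, D=1, c=1.0, r=1))  # 1.0
```

When D is at least u(B), every block fits and the intrinsic term falls to zero, which matches the intuition in the slide.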


Analytical Model

  • Combining all three components of the miss rate gives:

    $$M(C,t) = \frac{u(B)}{rT} + \frac{U(B) - u(B)}{rT} + \frac{c\left[u(B) - \sum_{d=0}^{D} S\,d\,P(B,d)\right]}{r}$$

  • The average memory access time (AMA) then follows directly:

    $$AMA = (1 - M_{L1})\,HitTime_{L1} + M_{L1}(1 - M_{L2})\,HitTime_{L2} + M_{L1} M_{L2}\,HitTime_{main}$$
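The AMA formula can be evaluated directly once the model has produced per-level miss rates. This Python sketch assumes M_L2 is the miss rate among accesses that reach the L2 (a local miss rate), which is how the formula combines it with M_L1; the function name and input values are made up for illustration.

```python
def average_memory_access_time(m_l1, m_l2, hit_l1, hit_l2, hit_main):
    """AMA as written on the slide: each access is charged the latency of the
    level that services it. m_l2 is assumed to be a local L2 miss rate."""
    return ((1 - m_l1) * hit_l1
            + m_l1 * (1 - m_l2) * hit_l2
            + m_l1 * m_l2 * hit_main)


# Hypothetical inputs: 5% of accesses miss in L1, 20% of those also miss
# in L2; latencies of 1, 10 and 100 cycles.
print(average_memory_access_time(0.05, 0.2, 1, 10, 100))  # about 2.35 cycles
```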


Analytical Model

  • The final model then incorporates the AMA into a simple processor model based on:

    • Issue Width (IW)

    • Issue Capabilities (IC)

    • Average memory penalty (AMP)

    • Average memory access time (AMA)

  • Per-processor performance:

    $$P_i = IW_i \cdot \left[\frac{AMA_i + 2}{3} \cdot AMP_i\right]$$

  • Total CMP performance is the sum of the per-processor values over the N_c processors:

    $$P_{cmp} = \sum_{i=1}^{N_c} P_i$$
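The per-processor and CMP-level formulas can be transcribed as written. This is a hedged Python sketch: the function names, the tuple layout, and the example inputs are hypothetical, and the code reproduces the slide's expression without interpreting the units of AMP or of the resulting performance value.

```python
def core_performance(iw, ama, amp):
    """P_i = IW_i * [(AMA_i + 2) / 3 * AMP_i], transcribed from the slide."""
    return iw * ((ama + 2) / 3 * amp)


def cmp_performance(cores):
    """P_cmp: sum of P_i over all N_c cores.

    cores: list of (IW, AMA, AMP) tuples, one per processor (hypothetical layout).
    """
    return sum(core_performance(iw, ama, amp) for iw, ama, amp in cores)


# Two hypothetical cores with different issue widths and memory behaviour.
print(cmp_performance([(4, 2.35, 1.0), (2, 1.0, 1.0)]))
```

Summing per-core values treats the cores as independent, which fits the multi-programmed workload assumed by the system model.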


Results


Future work

  • Modify SimpleMP or RSIM

    • Convert the SMP or DSM system into a CMP system

    • Fix or create the multi-programmed loader

  • Incorporate processor parameters into the model

    • Issue width, instruction window size, branch predictor policy, number of ALUs, etc.


Conclusion

  • Questions?


References

  • [1] A. Agarwal, M. Horowitz, and J. Hennessy. An analytical cache model. Computer Systems Laboratory Report TR 86-304, Stanford University, Stanford, CA, September 1986.

  • [2] J. Huh, D. Burger, and S. W. Keckler. Exploring the design space of future CMPs. In The 10th International Conference on Parallel Architectures and Compilation Techniques, pages 199–210, September 2001.

  • [3] L. A. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese. Piranha: A scalable architecture based on single-chip multiprocessing. In The 27th Annual International Symposium on Computer Architecture, pages 282–293, June 2000.

