An Analytical Model for CMPs


Spring 2003 ECE/CS 757

University of Wisconsin - Madison

Peter McClone

Kim-Huei Low

Overview
  • Introduction
  • Simulation Environment
  • Analytical Model
  • Results
  • Future work
  • Conclusion
Introduction
  • Performance limits of superscalar processors
  • Relative chip area increases
  • Chip Multiprocessors (CMP)
  • Design by simulation data
  • Time constraint
  • Analytical Model
CMP System Model

[Figure: block diagram showing CPU0 through CPUN, each with its own iL1 and dL1 caches, above a single L2 and Memory]
  • Like Piranha
  • Multi-programmed workload
  • SimpleMP?
Processor Model
  • Most area-efficient model
  • Area vs performance
  • SimpleScalar is used
SimpleScalar
  • Generate address traces
  • Ignore instruction addresses
    • ~100% hit rate
    • Fetched directly from memory, no allocation in L2
    • Very little interference in L2
    • Performance
Simulator Combination
  • 1. Generate address traces using word-size cache blocks at all levels and minimal cache sizes
  • 2. Feed the traces into CacheTracer to simulate cache interference at the L2 level and generate the statistics needed for the model computation
  • 3. Use the model to compute performance estimates for variations of all of the cache parameters
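
One reason to record traces at word granularity is that the same trace can later be regrouped into blocks of any size. A minimal sketch of that regrouping, assuming addresses are word indices and block sizes are given in words (the function names are hypothetical and are not part of SimpleScalar or CacheTracer):

```python
def to_block_addresses(word_addresses, block_size_words):
    """Map word-granularity addresses to block numbers for one block size."""
    return [addr // block_size_words for addr in word_addresses]


def unique_blocks(word_addresses, block_size_words):
    """Count the distinct blocks touched by the trace at this block size."""
    return len(set(to_block_addresses(word_addresses, block_size_words)))


# Hypothetical word-address trace, reused for two block sizes.
trace = [0, 1, 2, 3, 8, 9, 16, 0, 1]
blocks_of_one_word = unique_blocks(trace, 1)
blocks_of_four_words = unique_blocks(trace, 4)
```

Larger blocks merge neighbouring words, so the unique-block count can only stay the same or shrink as the block size grows.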
Analytical Model
  • A mathematical equation for IPC
  • Combination of observations made in research papers and from intuitive knowledge
  • Based mainly on a detailed cache model that was the focus of the project
  • Processor model is very simple
Analytical Model
  • 3-part miss rate model:

$$M(C,t) = M(C,t)_{\text{startup}} + M(C,t)_{\text{nonstationary}} + M(C,t)_{\text{intrinsic}}$$

  • C is a cache configuration
  • t is a time granule of r references
Analytical Model
  • Startup effects are measured by the number of unique blocks accessed in the first time granule, u(B)
  • The startup miss rate is u(B) divided by the total number of references, rt:

$$M(C,t)_{\text{startup}} = \frac{u(B)}{rt}$$
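
As a rough illustration of the startup term, the sketch below computes u(B) from a list of block addresses. It assumes the trace has already been converted to block numbers and holds whole granules of r references, so its length equals rt. The function name and the sample trace are hypothetical.

```python
def startup_miss_rate(block_trace, r):
    """Startup miss rate: unique blocks in the first granule over all references."""
    u = len(set(block_trace[:r]))   # u(B): unique blocks in the first granule
    total_refs = len(block_trace)   # equals r * t for t whole granules
    return u / total_refs


# Two granules of r = 4 references each (hypothetical block numbers).
blocks = [0, 1, 0, 2, 1, 3, 0, 4]
m_startup = startup_miss_rate(blocks, r=4)   # 3 unique blocks / 8 references
```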

Analytical Model
  • Nonstationary misses are caused by changes in the working set of a process
  • This is the difference between the total number of unique blocks accessed to this point, U(B), and the blocks attributed to startup, u(B):

$$M(C,t)_{\text{nonstationary}} = \frac{U(B) - u(B)}{rt}$$

  • The sum of the startup and nonstationary misses is simply the number of unique blocks in the trace, U(B).
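
The nonstationary term can be sketched the same way. The example assumes a trace of block numbers whose length equals rt, and it checks that the startup and nonstationary terms together give U(B) divided by the number of references. All names and the sample trace are hypothetical.

```python
def startup_miss_rate(block_trace, r):
    """u(B) / (r * t), with u(B) counted over the first r references."""
    return len(set(block_trace[:r])) / len(block_trace)


def nonstationary_miss_rate(block_trace, r):
    """(U(B) - u(B)) / (r * t): unique blocks first seen after the first granule."""
    u = len(set(block_trace[:r]))    # u(B)
    big_u = len(set(block_trace))    # U(B): unique blocks in the whole trace
    return (big_u - u) / len(block_trace)


blocks = [0, 1, 0, 2, 1, 3, 0, 4]    # hypothetical trace, r = 4
cold = startup_miss_rate(blocks, 4) + nonstationary_miss_rate(blocks, 4)
```

Here `cold` equals 5/8, the five unique blocks divided by the eight references.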
Analytical Model
  • Intrinsic misses are caused inherently by program behavior
  • The natural sequence of accesses causes cache lines to displace each other

$$M(C,t)_{\text{intrinsic}} = \frac{c}{r}\left[u(B) - \sum_{d=0}^{D} S \cdot d \cdot P(B,d)\right]$$

  • S is the number of sets
  • P(B,d) is the probability that d blocks of size B map into any cache set
  • The sum covers the sets that have no more than D blocks mapping to them. This many blocks cannot cause misses, because the blocks mapping to such a set do not exceed the set size
  • However, the other blocks (u(B) minus the sum) may cause misses. This is estimated by multiplying by the measured collision rate c
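
A minimal sketch of the intrinsic term follows. The slides do not say how P(B,d) is obtained, so the sketch assumes blocks map to sets uniformly and independently, which makes P(B,d) a binomial probability; this is an assumption for illustration only. It also treats D as the set size. The function names and parameter values are hypothetical.

```python
from math import comb


def set_occupancy_prob(u, S, d):
    """Assumed P(B,d): binomial probability that d of u blocks land in one set."""
    p = 1.0 / S
    return comb(u, d) * p**d * (1.0 - p) ** (u - d)


def intrinsic_miss_rate(u, S, D, c, r):
    """c / r * [u(B) - sum_{d=0..D} S * d * P(B,d)]."""
    non_colliding = sum(S * d * set_occupancy_prob(u, S, d) for d in range(D + 1))
    return c * (u - non_colliding) / r


# Same working set and set count, two set sizes (hypothetical values).
two_way = intrinsic_miss_rate(u=64, S=16, D=2, c=0.5, r=1000)
four_way = intrinsic_miss_rate(u=64, S=16, D=4, c=0.5, r=1000)
```

Under this assumption a larger D adds non-negative terms to the sum, so `four_way` is smaller than `two_way`. When D reaches u the sum equals u(B) and the intrinsic term vanishes.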

Analytical Model
  • Combining all three components of the miss rate results in the equation:

$$M(C,t) = \frac{u(B)}{rt} + \frac{U(B) - u(B)}{rt} + \frac{c}{r}\left[u(B) - \sum_{d=0}^{D} S \cdot d \cdot P(B,d)\right]$$

  • Average memory access time (AMA) can then be computed from the L1 and L2 miss rates:

$$AMA = (1 - M_{L1}) \cdot HitTime_{L1} + M_{L1}(1 - M_{L2}) \cdot HitTime_{L2} + M_{L1} M_{L2} \cdot HitTime_{main}$$
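
The AMA expression translates directly into code. The sketch below weights each level's hit time by the fraction of references served there. The miss rates and latencies in the example are hypothetical values chosen for illustration, not results from the project.

```python
def average_memory_access_time(m_l1, m_l2, hit_l1, hit_l2, hit_main):
    """AMA: per-level hit times weighted by the share of references served there."""
    l1_part = (1.0 - m_l1) * hit_l1            # references that hit in L1
    l2_part = m_l1 * (1.0 - m_l2) * hit_l2     # L1 misses that hit in L2
    main_part = m_l1 * m_l2 * hit_main         # misses in both L1 and L2
    return l1_part + l2_part + main_part


# Hypothetical miss rates and latencies in cycles.
ama = average_memory_access_time(m_l1=0.1, m_l2=0.5, hit_l1=1, hit_l2=10, hit_main=100)
```

With these inputs the three parts are 0.9, 0.5 and 5.0 cycles, so the AMA is about 6.4 cycles.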

Analytical Model
  • The final model then incorporates the AMA into a simple processor model based on:
    • Issue Width (IW)
    • Issue Capabilities (IC)
    • Average memory penalty (AMP)
    • Average memory access time (AMA)
  • Pi = IWi * [(AMAi + 2)/3 * AMPi ]
  • Total performance is the sum over all N_c processors:

$$P_{cmp} = \sum_{i=1}^{N_c} P_i$$

Future work
  • Modify SimpleMP or RSIM
    • Modify SMP or DSM system to CMP system
    • Fix or create the multi-programmed loader
  • Incorporate processor parameters into model
    • Issue width, instruction window size, branch predictor policy, # of ALUs, etc.
Conclusion
  • ###
  • Questions?
References
  • [1] A. Agarwal, M. Horowitz, and J. Hennessy. An analytical cache model. Computer Systems Lab. Rep. TR 86-304, Stanford Univ., Stanford, Calif., September 1986.
  • [2] J. Huh, D. Burger, and S. W. Keckler. Exploring the design space of future CMPs. In The 10th International Conference on Parallel Architectures and Compilation Techniques, pages 199-210, September 2001.
  • [3] L. A. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese. Piranha: A scalable architecture based on single-chip multiprocessing. In The 27th Annual International Symposium on Computer Architecture, pages 282-293, June 2000.