An Analytical Model for CMPs

Spring 2003 ECE/CS 757

University of Wisconsin - Madison

Peter McClone

Kim-Huei Low

- Introduction
- Simulation Environment
- Analytical Model
- Results
- Future work
- Conclusion

- Performance limits of superscalar processors
- Relative chip area increases
- Chip Multiprocessors (CMP)
- Design by simulation data
- Time constraint
- Analytical Model

[Figure: CMP organization. CPU0 … CPUN, each with its own iL1 and dL1 caches, above a single L2 cache and Memory.]

- Like Piranha
- Multi-programmed workload
- SimpleMP?

- Most area-efficient model
- Area vs performance
- SimpleScalar is used

- Generate address traces
- Ignore instruction addresses
- ~100% hit rate
- Fetched directly from memory, no allocation in L2
- Very little interference in L2
- Performance

- 1. Generate address traces using word-size cache blocks at all levels and minimal cache sizes
- 2. Feed the traces into CacheTracer to simulate cache interference at the L2 level and generate the statistics needed for the model computation
- 3. Use the model to compute performance estimates for variations of all of the cache parameters
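One reason to collect traces at word granularity is that block-level traces for any larger block size can be derived from them afterwards. The following is a minimal sketch of that idea, not the CacheTracer tool itself; the function names and the sample trace are hypothetical and chosen only for illustration.

```python
def to_block_addresses(word_addresses, block_words):
    """Map word addresses to block numbers for blocks of `block_words` words.

    Hypothetical helper: assumes addresses are word indices, not byte addresses.
    """
    return [addr // block_words for addr in word_addresses]


def unique_block_count(word_addresses, block_words):
    """Number of distinct blocks touched by the trace at this block size."""
    return len(set(to_block_addresses(word_addresses, block_words)))


# Made-up word-granularity trace, reused for several block sizes.
trace = [0, 1, 2, 3, 8, 9, 0, 1, 16, 17]
for block_words in (1, 2, 4, 8):
    print(block_words, unique_block_count(trace, block_words))
```

Because the trace is stored once and re-blocked on demand, each cache configuration can be evaluated without rerunning the simulator.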

- A mathematical equation for IPC
- Combination of observations made in research papers and from intuitive knowledge
- Based mainly on a detailed cache model that was the focus of the project
- Processor model is very simple

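- Three-part miss-rate model:
$M(C,t) = M_{\text{startup}}(C,t) + M_{\text{nonstationary}}(C,t) + M_{\text{intrinsic}}(C,t)$

- C is a cache configuration
- t is the number of time granules, each containing r references (so the trace has rt references in total)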

- Startup effects are the number of unique blocks accessed in the first time granule, u(B)
- The miss rate is u(B) divided by the total number of references.
- $M_{\text{startup}}(C,t) = \dfrac{u(B)}{rt}$

- Nonstationary misses are caused by the change in the working set of a process
- This is the difference between the total number of unique blocks accessed to this point, U(B), and the blocks attributed to startup, u(B):
- $M_{\text{nonstationary}}(C,t) = \dfrac{U(B) - u(B)}{rt}$

- The sum of the startup and nonstationary misses is simply the number of unique blocks in the trace.
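The two components above can be computed directly from a block-level trace. The sketch below assumes the trace is a list of block numbers and that only complete granules of r references are counted; the function name and the sample trace are hypothetical.

```python
def startup_and_nonstationary(blocks, r):
    """Return (startup, nonstationary) miss rates for a block-address trace.

    Assumption: only the first t complete granules of r references are used.
    """
    t = len(blocks) // r            # number of complete time granules
    used = blocks[: r * t]
    u = len(set(used[:r]))          # u(B): unique blocks in the first granule
    U = len(set(used))              # U(B): unique blocks in the whole trace
    total_refs = r * t
    return u / total_refs, (U - u) / total_refs


# Made-up trace whose working set changes every four references.
blocks = [0, 1, 0, 1, 2, 3, 2, 3, 4, 5, 4, 5]
startup, nonstationary = startup_and_nonstationary(blocks, r=4)
print(startup, nonstationary, startup + nonstationary)
```

In this example the two rates add up to U(B)/(rt) = 6/12, which matches the observation that together they account for every unique block in the trace.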

- Intrinsic misses are inherent to program behavior
- The natural sequence of accesses causes cache lines to displace each other
- $M_{\text{intrinsic}}(C,t) = \dfrac{c}{r}\left[u(B) - \sum_{d=0}^{D} S \cdot d \cdot P(B,d)\right]$
- S is the number of sets
- P(B,d) is the probability that d blocks of size B map into a given cache set
- The sum covers the sets that have at most D blocks mapped to them, where D is the set size. This many blocks cannot cause misses, because no more blocks than the set can hold map to the set.
- However, the other blocks (u(B) minus the sum) may cause misses. This is estimated by multiplying by the measured collision rate c
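To make the intrinsic term concrete, the sketch below evaluates it under one explicit assumption that the slides do not state: blocks map to sets uniformly at random, so P(B,d) is taken to be binomial. The function names are hypothetical, and c would in practice come from trace measurements rather than being passed in by hand.

```python
from math import comb


def set_occupancy_prob(u, S, d):
    """P(B,d) under the ASSUMED uniform-random mapping of u blocks onto S sets."""
    return comb(u, d) * (1 / S) ** d * (1 - 1 / S) ** (u - d)


def intrinsic_miss_rate(u, S, D, c, r):
    """(c / r) * [u - sum_{d=0..D} S * d * P(B,d)].

    u: unique blocks per granule, S: number of sets, D: set size,
    c: measured collision rate, r: references per granule.
    """
    # A set cannot hold more than u blocks, so terms beyond d = u are zero.
    non_colliding = sum(
        S * d * set_occupancy_prob(u, S, d) for d in range(min(D, u) + 1)
    )
    return c / r * (u - non_colliding)


# Larger set sizes leave fewer blocks able to collide.
for D in (1, 2, 4, 8):
    print(D, intrinsic_miss_rate(u=8, S=4, D=D, c=0.5, r=100))
```

When D is at least u, the sum equals u and the intrinsic term vanishes, which is a useful sanity check on any implementation of the formula.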

- Combining all three components of the miss rate results in the equation:
$M(C,t) = \dfrac{u(B)}{rt} + \dfrac{U(B) - u(B)}{rt} + \dfrac{c}{r}\left[u(B) - \sum_{d=0}^{D} S \cdot d \cdot P(B,d)\right]$
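Since the startup and nonstationary terms share the same denominator, they can be merged. This is only an algebraic simplification of the equation above, consistent with the earlier remark that the two together count the unique blocks in the trace:

```latex
\begin{aligned}
M(C,t) &= \frac{u(B)}{rt} + \frac{U(B) - u(B)}{rt}
          + \frac{c}{r}\left[u(B) - \sum_{d=0}^{D} S \cdot d \cdot P(B,d)\right] \\
       &= \frac{U(B)}{rt}
          + \frac{c}{r}\left[u(B) - \sum_{d=0}^{D} S \cdot d \cdot P(B,d)\right]
\end{aligned}
```

In this form the first term shrinks as the trace grows (larger t), while the intrinsic term does not depend on t.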

- Average memory access time then follows directly:
$\mathrm{AMA} = (1 - M_{L1}) \cdot \mathrm{HitTime}_{L1} + M_{L1}(1 - M_{L2}) \cdot \mathrm{HitTime}_{L2} + M_{L1} M_{L2} \cdot \mathrm{HitTime}_{\text{main}}$
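The AMA formula translates into a few lines of code. In this sketch the parameter names are hypothetical, the L2 miss rate is assumed to be local (misses per L2 access), and the latencies in the example are made-up values rather than figures from the project.

```python
def average_memory_access_time(m_l1, m_l2, hit_l1, hit_l2, hit_main):
    """AMA as a weighted sum over where an access is finally served.

    m_l1: L1 miss rate; m_l2: L2 miss rate, ASSUMED local (per L2 access).
    hit_*: access time, in cycles, of the level that serves the access.
    """
    served_by_l1 = (1 - m_l1) * hit_l1
    served_by_l2 = m_l1 * (1 - m_l2) * hit_l2
    served_by_memory = m_l1 * m_l2 * hit_main
    return served_by_l1 + served_by_l2 + served_by_memory


# Illustrative numbers only: 10% L1 misses, half of which also miss in L2.
ama = average_memory_access_time(0.1, 0.5, hit_l1=1, hit_l2=10, hit_main=100)
print(ama)
```

With these inputs the three weights are 0.9, 0.05 and 0.05, so even a small fraction of accesses reaching main memory dominates the result.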

- The final model then incorporates the AMA into a simple processor model based on:
- Issue Width (IW)
- Issue Capabilities (IC)
- Average memory penalty (AMP)
- Average memory access time (AMA)

- $P_i = IW_i \cdot [(AMA_i + 2)/3 \cdot AMP_i]$
- Total performance is the sum of the per-processor performance over all processors
- $P_{\text{cmp}} = \sum_{i=1}^{N_c} P_i$

- Modify SimpleMP or RSIM
- Modify SMP or DSM system to CMP system
- Fix or create the multi-programmed loader

- Incorporate processor parameters into model
- Issue width, instruction window size, branch predictor policy, # of ALUs, etc.

- Questions?

- [1] A. Agarwal, M. Horowitz, and J. Hennessy. An analytical cache model. Computer Systems Lab. Rep. TR 86-304, Stanford Univ., Stanford, Calif., September 1986.
- [2] J. Huh, D. Burger, and S. W. Keckler. Exploring the design space of future CMPs. In The 10th International Conference on Parallel Architectures and Compilation Techniques, pages 199–210, September 2001.
- [3] L. A. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese. Piranha: A scalable architecture based on single-chip multiprocessing. In The 27th Annual International Symposium on Computer Architecture, pages 282–293, June 2000.