
Microbenchmarks for Memory Hierarchy

This presentation uses microbenchmarks and VTune to verify known attributes of the Pentium 4 (P4) processor and to determine its cache replacement policy, suspected to be a variant of least recently used (LRU). It includes benchmark results for the L1 and L2 cache sizes, as well as a discussion of the tree-based pseudo-LRU policy.


Presentation Transcript


  1. Microbenchmarks for Memory Hierarchy • Brooks Mattox • Matthew Sweet

  2. Overview • Objective • Microbenchmarks • Verifying Known P4 Specifications • VTune Data Observations • Tree-based Pseudo Least Recently Used Policy • Conclusions

  3. Objective • Verify known attributes of the Pentium 4 (P4) using VTune and microbenchmarks • Determine the LRU policy of the Pentium 4 using similar benchmarks

  4. Microbenchmark • for (i = 0; i < iterations; i++) { for (j = 0; j < vectorSize; j = j + stride) { vector[j] = vector[j] + 1; } }
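The loop on this slide can be wrapped into a small, compilable harness. The following is a minimal sketch, not the authors' actual benchmark: the function names `stride_walk` and `ns_per_access` are hypothetical, the stride is assumed to be counted in `int` elements rather than bytes, and timing with `clock()` is an assumption standing in for the VTune miss counters the slides rely on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Touch every `stride`-th element of `vector`, `iterations` times.
   `volatile` keeps the compiler from optimizing the accesses away.
   Returns the number of element updates performed. */
size_t stride_walk(volatile int *vector, size_t vectorSize,
                   size_t stride, size_t iterations)
{
    assert(stride > 0);  /* a zero stride would never terminate */
    size_t accesses = 0;
    for (size_t i = 0; i < iterations; i++) {
        for (size_t j = 0; j < vectorSize; j += stride) {
            vector[j] = vector[j] + 1;
            accesses++;
        }
    }
    return accesses;
}

/* Average CPU time per access in nanoseconds for one configuration.
   Returns -1.0 if the vector cannot be allocated. clock() is coarse,
   so `iterations` should be large enough for a measurable interval. */
double ns_per_access(size_t vectorSize, size_t stride, size_t iterations)
{
    int *vector = calloc(vectorSize, sizeof *vector);
    if (vector == NULL)
        return -1.0;

    clock_t start = clock();
    size_t accesses = stride_walk(vector, vectorSize, stride, iterations);
    clock_t end = clock();
    free(vector);

    if (accesses == 0)
        return 0.0;
    return 1e9 * (double)(end - start) / CLOCKS_PER_SEC / (double)accesses;
}
```

Calling `ns_per_access` for a range of vector sizes at a fixed stride gives a rough time-per-access curve; a hardware-counter tool such as VTune would report cache misses directly instead.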

  5. Verify L1 & L2 cache size • Measure the number of cache misses while increasing the vector size over an interval • The vector size at which cache misses begin to increase substantially indicates the cache size
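The "substantial increase" criterion can be made concrete as a simple knee search over the measured miss counts. This is only a sketch under stated assumptions: the function name `find_knee` and the multiplicative threshold `factor` are hypothetical choices for illustration, and the slides do not say how the authors picked the transition point.

```c
#include <assert.h>
#include <stddef.h>

/* Given miss counts measured at increasing vector sizes, return the index
   of the last measurement before the miss count grows by more than
   `factor` from one size to the next. That index corresponds to the
   largest vector size that still fit in the cache.
   Returns n if no such jump is found. */
size_t find_knee(const double *misses, size_t n, double factor)
{
    assert(factor > 1.0);
    for (size_t k = 1; k < n; k++) {
        /* Use a floor of 1 miss so a zero baseline cannot hide a jump
           or trigger on tiny absolute changes. */
        double baseline = misses[k - 1] < 1.0 ? 1.0 : misses[k - 1];
        if (misses[k] > factor * baseline)
            return k - 1;
    }
    return n;
}
```

If the vector sizes are stored in a parallel array, the size at the returned index is the cache size estimate. The threshold would need tuning against real counter data, since prefetching and noise can blur the transition.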

  6. L1 Cache Size Benchmark Results • ~8 KB

  7. L2 Cache Size Benchmark Results • ~512 KB

  8. Suspected P4 LRU Policy • Tree-based Pseudo LRU • Characteristics • Requires only one track bit for 2-way associativity • With higher associativity PLRUt still has better performance and lower complexity than the basic LRU, Round Robin, or Random policies.

  9. Tree-based Pseudo Least Recently Used Policy
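To illustrate how a tree-based pseudo-LRU policy works, here is a minimal sketch for a single 4-way set, which needs three tracking bits (a 2-way set needs only one). The type and function names (`plru4_t`, `plru4_touch`, `plru4_victim`), the 4-way associativity, and the bit convention are all assumptions chosen for illustration; the slides do not specify the Pentium 4's actual implementation.

```c
#include <assert.h>

/* Tracking bits for one 4-way set, arranged as a binary tree:
   bit 0: root       (0 = victim in ways 0-1, 1 = victim in ways 2-3)
   bit 1: left node  (0 = victim is way 0,    1 = victim is way 1)
   bit 2: right node (0 = victim is way 2,    1 = victim is way 3) */
typedef struct {
    unsigned bits;
} plru4_t;

/* Record an access: flip the nodes on the path to `way` so that each
   one points away from the way that was just used. */
void plru4_touch(plru4_t *s, unsigned way)
{
    assert(way < 4);
    if (way < 2) {
        s->bits |= 1u;              /* root now points to the right half */
        if (way == 0)
            s->bits |= 2u;          /* left node points to way 1 */
        else
            s->bits &= ~2u;         /* left node points to way 0 */
    } else {
        s->bits &= ~1u;             /* root now points to the left half */
        if (way == 2)
            s->bits |= 4u;          /* right node points to way 3 */
        else
            s->bits &= ~4u;         /* right node points to way 2 */
    }
}

/* Choose a victim by following the tracking bits from the root. */
unsigned plru4_victim(const plru4_t *s)
{
    if (s->bits & 1u)
        return (s->bits & 4u) ? 3u : 2u;
    return (s->bits & 2u) ? 1u : 0u;
}
```

The policy only approximates LRU. Starting from all bits cleared and touching ways 0, 1, 2, 3, 0 in that order, true LRU would evict way 1, whereas this tree selects way 2. A microbenchmark that distinguishes the two policies would look for access patterns like this one where their victims differ.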

  10. Sources • Aleksandar Milenkovic, "Cache Replacement Policies for Future Processors" • Rafael H. Saavedra, Chapter 5, "Locality Effects and Characterization of the Memory Hierarchy", in "CPU Performance Evaluation and Execution Time Prediction Using Narrow Spectrum Benchmarking"
