
Lecture 14: Caching, cont.







  1. Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr. Rozier (UM)

  2. EXAM

  3. Exam is one week from Thursday • Thursday, April 3rd • Covers: • Chapter 4 • Chapter 5 • Buffer Overflows • Exam 1 material fair game

  4. LAB QUESTIONS?

  5. CACHING

  6. Caching Review • What is temporal locality? • What is spatial locality?

  7. Caching Review • What is a cache set? • What is a cache line? • What is a cache block?

  8. Caching Review • What is a direct mapped cache? • What is a set associative cache?
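To make the mapping questions concrete, here is a minimal Python sketch of how an address is split into tag, set index, and block offset for a direct mapped versus a set associative cache (the 64KB size and 64-byte blocks are illustrative assumptions, not part of the slide):

```python
# Assumed (hypothetical) cache configuration for illustration only.
CACHE_BYTES = 64 * 1024
BLOCK_BYTES = 64

def decompose(addr, ways):
    """Split an address into (tag, set_index, block_offset)."""
    num_blocks = CACHE_BYTES // BLOCK_BYTES   # 1024 blocks total
    num_sets = num_blocks // ways             # more ways -> fewer sets
    offset = addr % BLOCK_BYTES
    set_index = (addr // BLOCK_BYTES) % num_sets
    tag = addr // (BLOCK_BYTES * num_sets)
    return tag, set_index, offset

# Direct mapped (1-way): each block maps to exactly one set.
print(decompose(0x12345678, ways=1))
# 4-way set associative: 4 candidate lines per (now wider) set.
print(decompose(0x12345678, ways=4))
```

Note how the set-index field shrinks and the tag field grows as associativity increases, since there are fewer sets to index.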

  9. Caching Review • What is a cache miss? • What happens on a miss?

  10. ASSOCIATIVITY AND MISS RATES

  11. How Much Associativity • Increased associativity decreases miss rate, but with diminishing returns • Simulation of a system with 64KB D-cache, 16-word blocks, SPEC2000: • 1-way: 10.3% • 2-way: 8.6% • 4-way: 8.3% • 8-way: 8.1%
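The trend on this slide can be reproduced qualitatively with a toy LRU simulator (a sketch: the trace below is synthetic, not SPEC2000, so the exact percentages will differ from the slide's):

```python
import random

def simulate(trace, num_sets, ways, block=64):
    """Tiny set-associative cache with LRU replacement; returns miss rate."""
    sets = [[] for _ in range(num_sets)]   # each set: list of tags, MRU last
    misses = 0
    for addr in trace:
        blk = addr // block
        s, tag = blk % num_sets, blk // num_sets
        lines = sets[s]
        if tag in lines:
            lines.remove(tag); lines.append(tag)   # refresh LRU order
        else:
            misses += 1
            if len(lines) == ways:
                lines.pop(0)                        # evict least recently used
            lines.append(tag)
    return misses / len(trace)

random.seed(0)
# Synthetic trace with locality: a hot region plus occasional far accesses.
trace = [random.randrange(1 << 13) if random.random() < 0.9
         else random.randrange(1 << 20) for _ in range(50_000)]
total_lines = 256                       # keep capacity fixed, vary ways
for ways in (1, 2, 4, 8):
    print(ways, round(simulate(trace, total_lines // ways, ways), 4))
```

With capacity held constant, extra ways only remove conflict misses, which is why the improvement flattens out quickly.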

  12. Set Associative Cache Organization

  13. Multilevel Caches • Primary cache attached to CPU: small, but fast • Level-2 cache services misses from primary cache: larger, slower, but still faster than main memory • Main memory services L-2 cache misses • Some high-end systems include L-3 cache

  14. Multilevel Cache Example • Given: CPU base CPI = 1, clock rate = 4GHz • Miss rate/instruction = 2% • Main memory access time = 100ns • With just primary cache: • Miss penalty = 100ns / 0.25ns = 400 cycles • Effective CPI = 1 + 0.02 × 400 = 9

  15. Example (cont.) • Now add L-2 cache: • Access time = 5ns • Global miss rate to main memory = 0.5% • Primary miss with L-2 hit: penalty = 5ns / 0.25ns = 20 cycles • Primary miss with L-2 miss: extra penalty = 400 cycles • CPI = 1 + 0.02 × 20 + 0.005 × 400 = 3.4 • Performance ratio = 9 / 3.4 = 2.6
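The arithmetic in this example can be checked in a few lines (using the 400-cycle main-memory penalty that the slide's CPI formula itself uses):

```python
# Reproducing the slide's multilevel-cache arithmetic.
clock_ns = 1 / 4          # 4 GHz -> 0.25 ns per cycle
base_cpi = 1.0
miss_rate = 0.02          # primary-cache misses per instruction
mem_ns = 100

# With just the primary cache:
mem_penalty = mem_ns / clock_ns                 # 400 cycles
cpi_l1_only = base_cpi + miss_rate * mem_penalty
print(cpi_l1_only)

# Now add an L-2 cache:
l2_ns = 5
l2_penalty = l2_ns / clock_ns                   # 20 cycles
global_miss = 0.005                             # misses that reach main memory
cpi_l2 = base_cpi + miss_rate * l2_penalty + global_miss * mem_penalty
print(cpi_l2)
print(cpi_l1_only / cpi_l2)                     # performance ratio
```

The key modeling point: every primary miss pays the L-2 hit penalty, and only the global misses additionally pay the main-memory penalty.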

  16. Multilevel Cache Considerations • Primary cache: focus on minimal hit time • L-2 cache: focus on low miss rate to avoid main memory access; hit time has less overall impact • Results: • L-1 cache usually smaller than a single-level cache would be • L-1 block size smaller than L-2 block size

  17. Interactions with Advanced CPUs • Out-of-order CPUs can execute instructions during a cache miss • Pending store stays in the load/store unit • Dependent instructions wait in reservation stations • Independent instructions continue • Effect of a miss depends on program data flow • Much harder to analyse; use system simulation

  18. Interactions with Software • Misses depend on memory access patterns: • Algorithm behavior • Compiler optimization for memory access

  19. Replacement Policy • Direct mapped: no choice • Set associative: prefer a non-valid entry, if there is one; otherwise, choose among entries in the set • Least-recently used (LRU): choose the one unused for the longest time • Simple for 2-way, manageable for 4-way, too hard beyond that • Random: gives approximately the same performance as LRU for high associativity

  20. Replacement Policy • Optimal algorithm: • The clairvoyant algorithm • Not a real possibility • Why is it useful to consider?

  21. Replacement Policy • Optimal algorithm: • The clairvoyant algorithm • Not a real possibility • Why is it useful to consider? • We can never beat the Oracle • Good baseline to compare to • Get as close as possible
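Although the clairvoyant policy is not realizable in hardware, it is easy to simulate offline as the baseline the slide mentions. A sketch of Belady's optimal algorithm for a fully associative cache:

```python
def belady_misses(trace, capacity):
    """Clairvoyant (Belady) replacement for a fully associative cache:
    on a miss with a full cache, evict the block whose next use is
    farthest in the future (or never used again)."""
    cache, misses = set(), 0
    for i, blk in enumerate(trace):
        if blk in cache:
            continue
        misses += 1
        if len(cache) == capacity:
            def next_use(b):
                # Look ahead in the trace for b's next reference.
                for j in range(i + 1, len(trace)):
                    if trace[j] == b:
                        return j
                return float("inf")     # never used again: ideal victim
            cache.remove(max(cache, key=next_use))
        cache.add(blk)
    return misses

# Classic example trace, 3-entry cache: the optimal policy misses 7 times.
trace = ["a", "b", "c", "d", "a", "b", "e", "a", "b", "c", "d", "e"]
print(belady_misses(trace, 3))   # -> 7
```

Running a real policy and Belady's on the same trace shows how close to optimal you are, which is exactly why the unbeatable oracle is a useful comparison point.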

  22. LRU • Least Recently Used • Discard the data which was used least recently • Need to maintain information on the age of each cache line. How do we do so efficiently? • Break into groups and propose an algorithm
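One possible answer to the group exercise (a sketch, not the slide's intended solution): give every line in a set an age counter, reset it on each access, and evict the line with the largest age, preferring an invalid line as the earlier replacement-policy slide suggests:

```python
class LRUSet:
    """One cache set with per-line age counters for exact LRU."""
    def __init__(self, ways):
        self.tags = [None] * ways   # None marks an invalid line
        self.age = [0] * ways

    def access(self, tag):
        """Return True on hit. Ages all lines; replaces the LRU line on a miss."""
        if tag in self.tags:
            hit, way = True, self.tags.index(tag)
        else:
            hit = False
            # Prefer an invalid line; otherwise evict the oldest.
            way = (self.tags.index(None) if None in self.tags
                   else self.age.index(max(self.age)))
            self.tags[way] = tag
        for w in range(len(self.tags)):
            self.age[w] += 1
        self.age[way] = 0              # the touched line becomes youngest
        return hit

s = LRUSet(ways=2)
print([s.access(t) for t in ["A", "B", "A", "C", "B"]])
# -> [False, False, True, False, False]  (C evicts B; B then evicts A)
```

This is the expensive part the next slide complains about: an n-way set needs enough age bits to order n lines, and every access updates all of them.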

  23. Using a BDD

  24. LRU is Expensive • Age bits take a lot of storage • What about pseudo-LRU?

  25. Using a BDD • BDDs have decisions at their leaf nodes • BDDs have variables at their non-leaf nodes • Each path from the root to a leaf evaluates the variable at every node it passes through • Standard convention: take the left edge on 0, the right edge on 1

  26. BDD for LRU: 4-way associative cache, all lines valid [diagram: the root tests b0, each subtree tests b1, and the four leaves select line0, line1, line2, line3; an invalid line, if any, is chosen first]

  27. BDD for LRU: 4-way associative cache, all lines valid [same diagram, repeated]

  28. BDD for LRU, 4-way associative cache • Pseudo-LRU • Uses only 3 bits
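A sketch of how those 3 bits can drive replacement in a 4-way set (one common tree-PLRU encoding; the exact bit convention in the slide's figure may differ): on every access the bits are flipped to point away from the line just used, and the victim is found by simply following the bits.

```python
class PLRU4:
    """Tree pseudo-LRU for a 4-way set using 3 bits."""
    def __init__(self):
        self.b = [0, 0, 0]   # b[0]: root; b[1]: lines 0/1; b[2]: lines 2/3

    def touch(self, line):
        """Record a use of line 0..3: make the tree point away from it."""
        self.b[0] = 0 if line >= 2 else 1    # root points to the other half
        if line < 2:
            self.b[1] = 1 - (line & 1)       # point to the sibling line
        else:
            self.b[2] = 1 - (line & 1)

    def victim(self):
        """Follow the bits from the root to the pseudo-LRU line."""
        if self.b[0] == 0:
            return 0 + self.b[1]
        return 2 + self.b[2]

p = PLRU4()
for line in (0, 1, 2, 3):
    p.touch(line)
print(p.victim())   # -> 0: with this convention, line 0 is the next victim
```

Unlike true LRU (which needs enough bits to order all 4 lines), this tracks only which half, and which line within each half, was used more recently, so it can evict a line that is not the exact LRU one, but it is cheap and close enough in practice.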

  29. WRAP UP

  30. For next time • Read Chapter 5, section 5.4
