


Lecture 14: Caching, cont.

EEN 312: Processors: Hardware, Software, and Interfacing

Department of Electrical and Computer Engineering

Spring 2014, Dr. Rozier (UM)



EXAM



Exam is one week from Thursday

  • Thursday, April 3rd

  • Covers:

    • Chapter 4

    • Chapter 5

    • Buffer Overflows

    • Exam 1 material fair game



LAB QUESTIONS?



CACHING



Caching Review

  • What is temporal locality?

  • What is spatial locality?
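
As a quick refresher (an illustration, not from the slides), both kinds of locality show up in a loop as simple as an array sum:

    /* Sketch: both kinds of locality in one loop. */
    #include <stddef.h>

    long sum(const int *a, size_t n) {
        long total = 0;                 /* 'total' and 'i' are reused on every
                                           iteration: temporal locality */
        for (size_t i = 0; i < n; i++) {
            total += a[i];              /* a[i], a[i+1], ... are adjacent in
                                           memory: spatial locality, so one
                                           cache block fill serves several reads */
        }
        return total;
    }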



Caching Review

  • What is a cache set?

  • What is a cache line?

  • What is a cache block?



Caching Review

  • What is a direct mapped cache?

  • What is a set associative cache?
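
One way to make the distinction concrete is to split an address the way the cache hardware does. The C sketch below assumes a made-up geometry (16-byte blocks, 64 sets): in a direct-mapped cache the index names a single line to check, while in an n-way set-associative cache it names a set of n lines whose tags are all compared.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical geometry: 16-byte blocks (4 offset bits), 64 sets (6 index bits). */
    #define OFFSET_BITS 4
    #define INDEX_BITS  6

    int main(void) {
        uint32_t addr   = 0x12345678;
        uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
        uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
        uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);
        /* Direct mapped: check the one line at 'index'.
           n-way set associative: compare 'tag' with all n lines in set 'index'. */
        printf("tag=0x%x index=%u offset=%u\n",
               (unsigned)tag, (unsigned)index, (unsigned)offset);
        return 0;
    }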



Caching Review

  • What is a cache miss?

  • What happens on a miss?



ASSOCIATIVITY AND MISS RATES


How Much Associativity

  • Increased associativity decreases miss rate

    • But with diminishing returns

  • Simulation of a system with 64KB D-cache, 16-word blocks, SPEC2000:

    • 1-way: 10.3%

    • 2-way: 8.6%

    • 4-way: 8.3%

    • 8-way: 8.1%



Set Associative Cache Organization


Multilevel Caches

  • Primary cache attached to CPU

    • Small, but fast

  • Level-2 cache services misses from primary cache

    • Larger, slower, but still faster than main memory

  • Main memory services L-2 cache misses

  • Some high-end systems include an L-3 cache


Multilevel Cache Example

  • Given:

    • CPU base CPI = 1, clock rate = 4GHz

    • Miss rate/instruction = 2%

    • Main memory access time = 100ns

  • With just primary cache:

    • Miss penalty = 100ns / 0.25ns = 400 cycles

    • Effective CPI = 1 + 0.02 × 400 = 9


Example (cont.)

  • Now add an L-2 cache:

    • Access time = 5ns

    • Global miss rate to main memory = 0.5%

  • Primary miss with L-2 hit:

    • Penalty = 5ns / 0.25ns = 20 cycles

  • Primary miss with L-2 miss:

    • Extra penalty = 400 cycles

  • CPI = 1 + 0.02 × 20 + 0.005 × 400 = 3.4

  • Performance ratio = 9 / 3.4 = 2.6
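
The arithmetic in this example is easy to check by machine. A minimal C sketch (all values copied from the slides) reproduces both CPIs and the performance ratio:

    #include <stdio.h>

    int main(void) {
        double base_cpi  = 1.0;
        double cycle_ns  = 1.0 / 4.0;  /* 4 GHz clock -> 0.25 ns per cycle */
        double miss_rate = 0.02;       /* primary misses per instruction   */
        double mem_ns    = 100.0;      /* main memory access time          */

        /* Primary cache only: every miss pays the full memory penalty. */
        double mem_cycles = mem_ns / cycle_ns;              /* 400 cycles */
        double cpi_l1 = base_cpi + miss_rate * mem_cycles;  /* 9.0 */

        /* With L-2: all primary misses pay the 5ns L-2 access, and the
           0.5% that also miss in L-2 pay the memory penalty on top. */
        double l2_cycles = 5.0 / cycle_ns;                  /* 20 cycles */
        double global_mr = 0.005;
        double cpi_l2 = base_cpi + miss_rate * l2_cycles
                                 + global_mr * mem_cycles;  /* 3.4 */

        printf("L1 only: CPI = %.1f\n", cpi_l1);
        printf("L1 + L2: CPI = %.1f, ratio = %.1f\n", cpi_l2, cpi_l1 / cpi_l2);
        return 0;
    }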


Multilevel Cache Considerations

  • Primary cache

    • Focus on minimal hit time

  • L-2 cache

    • Focus on low miss rate to avoid main memory access

    • Hit time has less overall impact

  • Results

    • L-1 cache usually smaller than a single-level cache would be

    • L-1 block size smaller than L-2 block size


Interactions with Advanced CPUs

  • Out-of-order CPUs can execute instructions during a cache miss

    • Pending store stays in load/store unit

    • Dependent instructions wait in reservation stations

    • Independent instructions continue

  • Effect of a miss depends on program data flow

    • Much harder to analyze

    • Use system simulation


Interactions with Software

  • Misses depend on memory access patterns (illustrated below)

    • Algorithm behavior

    • Compiler optimization for memory access
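
As an illustration (not from the slides), the two functions below do the same work but have very different miss rates, because C stores 2-D arrays in row-major order:

    #define N 1024
    static double a[N][N];

    /* Row-major traversal: a[i][j], a[i][j+1], ... are contiguous,
       so each cache block fill serves several iterations. */
    double sum_rows(void) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }

    /* Column-major traversal: consecutive accesses are N * 8 bytes
       apart, so for large N nearly every access misses. Same result,
       same operation count, very different memory behavior. */
    double sum_cols(void) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }

This is exactly the kind of pattern a compiler's loop-interchange optimization targets.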


Replacement Policy

  • Direct mapped: no choice

  • Set associative:

    • Prefer a non-valid entry, if there is one

    • Otherwise, choose among entries in the set

  • Least-recently used (LRU)

    • Choose the one unused for the longest time

    • Simple for 2-way, manageable for 4-way, too hard beyond that

  • Random

    • Gives approximately the same performance as LRU for high associativity





Replacement Policy

  • Optimal algorithm:

    • The clairvoyant algorithm

    • Not a real possibility

    • Why is it useful to consider?

    • We can never beat the Oracle

    • Good baseline to compare to

      • Get as close as possible



LRU

  • Least Recently Used

    • Discard the data which was used least recently.

    • Need to maintain information on the age of each cache line. How do we do so efficiently?

      Break into groups and propose an algorithm. (One possible design is sketched below.)
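
One possible answer, as a sketch rather than a hardware design (the names lru_set, lru_touch, and lru_victim are made up for illustration): keep a small age counter per line in each set.

    #include <stdint.h>

    #define WAYS 4

    /* Per-set LRU state: one small age counter per line.
       Age 0 = most recently used, WAYS-1 = least recently used. */
    typedef struct {
        uint8_t age[WAYS];
    } lru_set;

    /* Initialize ages to 0..WAYS-1 so they always form a permutation. */
    void lru_init(lru_set *s) {
        for (int w = 0; w < WAYS; w++)
            s->age[w] = (uint8_t)w;
    }

    /* On an access to 'way': every line younger than it ages by one,
       and the accessed line becomes the youngest. */
    void lru_touch(lru_set *s, int way) {
        uint8_t old = s->age[way];
        for (int w = 0; w < WAYS; w++)
            if (s->age[w] < old)
                s->age[w]++;
        s->age[way] = 0;
    }

    /* Victim: the line whose age is WAYS-1. */
    int lru_victim(const lru_set *s) {
        for (int w = 0; w < WAYS; w++)
            if (s->age[w] == WAYS - 1)
                return w;
        return 0; /* unreachable if ages stay a permutation */
    }

Note the cost: log2(WAYS) age bits per line plus an update on every access, which is exactly why the next slides look for something cheaper.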





LRU is Expensive

  • Age bits take a lot of space to store

  • What about pseudo LRU?



Using a BDD

  • BDDs have decisions at their leaf nodes

  • BDDs have variables at their non-leaf nodes

  • Each branch out of a non-leaf node corresponds to one value of the variable tested at that node

  • Standard convention:

    • Take the left branch on 0

    • Take the right branch on 1
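
For concreteness, a node of such a diagram could be represented as below; this is an illustrative sketch, not code from the lecture:

    /* Sketch of a binary decision node. Following the convention above,
       take 'lo' when the tested variable is 0 and 'hi' when it is 1;
       leaves carry the decision itself. */
    struct bdd_node {
        int is_leaf;
        int value;                /* the decision (valid at a leaf)        */
        int var;                  /* variable tested (valid at a non-leaf) */
        struct bdd_node *lo, *hi; /* 0-branch and 1-branch children        */
    };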


BDD for LRU 4-way associative cache

[Diagram: decision tree for choosing a victim. Root test: are all lines valid? If not, choose an invalid line. Otherwise bit b0 selects a pair of lines and that pair's b1 bit selects within it, reaching line0, line1, line2, or line3.]



BDD for LRU 4-way associative cache

  • Pseudo-LRU

    • Uses only 3 bits per set (a sketch follows below)
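
A C sketch of the 3-bit tree pseudo-LRU from the diagram. The bit layout and function names are assumptions, but the bit roles follow the slides: b0 picks a pair of lines and a b1 bit picks the line within the pair, with each bit kept pointing toward the pseudo-least-recently-used side.

    #include <stdint.h>

    /* One 4-way set, 3 bits of state:
       bit 0: b0, chooses the pair  (0 -> lines 0/1, 1 -> lines 2/3)
       bit 1: b1 for the left pair  (0 -> line 0, 1 -> line 1)
       bit 2: b1 for the right pair (0 -> line 2, 1 -> line 3) */

    /* Victim: walk the tree in the direction the bits point. */
    int plru_victim(uint8_t bits) {
        if ((bits & 1) == 0)
            return (bits >> 1) & 1;        /* left pair: line 0 or 1  */
        else
            return 2 + ((bits >> 2) & 1);  /* right pair: line 2 or 3 */
    }

    /* On an access, flip the bits on the path to point AWAY from the
       line just used, so the next victim walk avoids it. */
    uint8_t plru_touch(uint8_t bits, int line) {
        if (line < 2) {
            bits |= 1;                                   /* b0 -> right pair */
            bits = (line == 0) ? (bits | 2) : (bits & ~2u);
        } else {
            bits &= ~1u;                                 /* b0 -> left pair  */
            bits = (line == 2) ? (bits | 4) : (bits & ~4u);
        }
        return bits;
    }

In use: call plru_touch on every hit or fill, and plru_victim only when all lines are valid; an invalid line, as in the diagram, is always chosen first.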



WRAP UP


For Next Time

  • Read Chapter 5, section 5.4

