
Chapter 7b: Cache Memory Performance


Direct Mapping Review

[Figure: a 16-word main memory addressed with 6-bit addresses (each word holds an example value, e.g. 000000 → 5600, 011000 → 845, 110100 → 775, 111100 → 3649) feeding a 4-entry direct-mapped cache.]

Cache contents in the figure:

Index   Valid   Tag   Data
00      Y       00    5600
01      Y       11    775
10      Y       01    845
11      N       00    33234

Memory Address: Tag | Index | byte offset (always zero for word addresses)

  • The split between tag and index depends on the cache size

  • Each word has only one place it can be in the cache: the index must match exactly

7.2



Missed me, Missed me...

  • What to do on a hit:

    • Carry on... (Hits should take one cycle or less)

  • What to do on an instruction fetch miss:

    • Undo PC increment (PC <-- PC-4)

    • Do a memory read

    • Stall until memory returns the data

    • Update the cache (data, tag and valid) at index

    • Un-stall

  • What to do on a load miss:

    • Same thing, except don’t mess with the PC (a sketch of this miss-handling sequence follows below)

7.2
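To make the sequence concrete, here is a minimal Python sketch of a direct-mapped cache handling hits and read misses. The class name, the word-addressed memory dictionary, and the field split are illustrative assumptions, not part of the slides.

```python
# Minimal direct-mapped cache model illustrating the hit/miss sequence above.
# All names (DirectMappedCache, the memory dict, etc.) are illustrative.

class DirectMappedCache:
    def __init__(self, num_entries, memory):
        self.num_entries = num_entries      # must be a power of two
        self.valid = [False] * num_entries
        self.tags  = [0] * num_entries
        self.data  = [0] * num_entries      # one word per entry
        self.memory = memory                # word-addressed backing store (dict)

    def split(self, addr):
        """Split a word-aligned byte address into (tag, index)."""
        word_addr = addr >> 2               # drop the 2-bit byte offset
        index = word_addr % self.num_entries
        tag = word_addr // self.num_entries
        return tag, index

    def load(self, addr):
        """Return (value, hit). On a miss: stall, read memory, update the cache."""
        tag, index = self.split(addr)
        if self.valid[index] and self.tags[index] == tag:
            return self.data[index], True   # hit: carry on, one cycle
        # Miss: do a memory read (the CPU would stall here), then update
        # data, tag and valid at this index, then un-stall.
        value = self.memory[addr >> 2]
        self.valid[index] = True
        self.tags[index] = tag
        self.data[index] = value
        return value, False


# Example: a 4-entry cache over a small word-addressed memory,
# using values that appear in the earlier figure.
memory = {0: 5600, 1: 3223, 5: 32324, 6: 845, 13: 775, 15: 3649}
cache = DirectMappedCache(4, memory)
print(cache.load(0b000000))   # (5600, False): compulsory miss, block is cached
print(cache.load(0b000000))   # (5600, True): hit
```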



Missed me, Missed me...

  • What to do on a store (hit or miss)

    • It won’t do to just write it to the cache

    • The cache would have a different (newer) value than main memory

  • Simple Write-Through

    • Write both the cache and memory

      • Works correctly, but slowly

  • Buffered Write-Through

    • Write the cache

    • Buffer a write request to main memory

      • 1 to 10 buffer slots are typical (a sketch of buffered write-through follows below)

7.2
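Below is a minimal sketch of buffered write-through, assuming a hypothetical WriteThroughCache class and a small fixed-size buffer (all names are illustrative). A real design would also drain the buffer in the background while the bus is idle.

```python
from collections import deque

class WriteThroughCache:
    """Write-through with a small write buffer (illustrative sketch)."""

    def __init__(self, num_entries, memory, buffer_slots=4):
        self.num_entries = num_entries
        self.valid = [False] * num_entries
        self.tags  = [0] * num_entries
        self.data  = [0] * num_entries
        self.memory = memory                     # word-addressed backing store
        self.write_buffer = deque()              # pending (word_addr, value) pairs
        self.buffer_slots = buffer_slots         # 1 to 10 slots are typical

    def store(self, addr, value):
        word_addr = addr >> 2
        index = word_addr % self.num_entries
        tag = word_addr // self.num_entries
        # Write the cache (write-allocate on a miss, for simplicity).
        self.valid[index], self.tags[index], self.data[index] = True, tag, value
        # Simple write-through would do the memory write here too: correct but slow.
        # Buffered write-through queues the memory write instead.
        if len(self.write_buffer) == self.buffer_slots:
            self.drain_one()                     # buffer full: the CPU would stall here
        self.write_buffer.append((word_addr, value))

    def drain_one(self):
        """Memory retires one buffered write (normally in the background)."""
        word_addr, value = self.write_buffer.popleft()
        self.memory[word_addr] = value
```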


Splitting up

[Figure: the five pipeline stages IF, RF, EX, M, WB; the IF stage fetches instructions while the M stage accesses data.]

  • It is common to use two separate caches for Instructions and for Data

    • All Instruction fetches use the I-cache

    • All data accesses (loads and stores) use the D-cache

  • This allows the CPU to access the I-cache at the same time it is accessing the D-cache

    • Still have to share a single memory

Note: The hit rate will probably be lower than for a combined cache of the same total size.

7.2


What about Spatial Locality?

  • Spatial locality says that physically close data is likely to be accessed close together

  • On a cache miss, don’t just grab the word needed, but also the words nearby

  • The easiest way to do this is to increase the block size

[Figure: a cache entry holding a valid bit, a tag, and one 4-word data block (Word 3, Word 2, Word 1, Word 0). All words in the same block have the same index and tag.]

Address split (32-bit address): Tag = bits 31-14 (18 bits), Index = bits 13-4 (10 bits), Block offset = bits 3-2 (2 bits), Byte offset = bits 1-0 (2 bits). Note: 2^2 = 4 words per block. A sketch of this field split follows below.

7.2
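A small sketch of the field split drawn above (18-bit tag, 10-bit index, 2-bit block offset, 2-bit byte offset); the function name and the example addresses are made up for illustration.

```python
def split_address(addr, index_bits=10, block_offset_bits=2, byte_offset_bits=2):
    """Split a 32-bit byte address into (tag, index, block_offset, byte_offset)."""
    byte_offset = addr & ((1 << byte_offset_bits) - 1)
    addr >>= byte_offset_bits
    block_offset = addr & ((1 << block_offset_bits) - 1)
    addr >>= block_offset_bits
    index = addr & ((1 << index_bits) - 1)
    tag = addr >> index_bits
    return tag, index, block_offset, byte_offset

# All four words of a block share the same tag and index:
for byte_addr in (0x12340, 0x12344, 0x12348, 0x1234C):
    print(hex(byte_addr), split_address(byte_addr))
```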


32KByte/4-Word Block D.M. Cache

32 KB / (4 words/block x 4 bytes/word) = 32 KB / 16 bytes/block --> 2K blocks (2^11 = 2K)

[Figure: the cache holds 2048 indexed entries (0 .. 2047), each with a valid bit, a 17-bit tag, and a 4-word data block. The 2-bit block offset drives a 4-to-1 multiplexer that selects one 32-bit word, and the 17-bit tag comparison produces the Hit signal and the selected Data.]

Address split: Tag = bits 31-15 (17 bits), Index = bits 14-4 (11 bits), Block offset = bits 3-2, Byte offset = bits 1-0. The short calculation below rechecks these field widths.

7.2
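The field widths above can be re-derived with a few lines of arithmetic; the variable names below are illustrative.

```python
import math

cache_bytes    = 32 * 1024                         # 32 KB
words_per_blk  = 4
bytes_per_word = 4
block_bytes    = words_per_blk * bytes_per_word    # 16 bytes per block

num_blocks   = cache_bytes // block_bytes                        # 2048 = 2K blocks
index_bits   = int(math.log2(num_blocks))                        # 11
block_offset = int(math.log2(words_per_blk))                     # 2
byte_offset  = int(math.log2(bytes_per_word))                    # 2
tag_bits     = 32 - index_bits - block_offset - byte_offset      # 17

print(num_blocks, index_bits, tag_bits)                          # 2048 11 17
```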


How Much Change?

Miss rates for the DEC 3100 (a MIPS machine), with separate 64 KB instruction and data caches (16K 1-word blocks or 4K 4-word blocks):

Benchmark   Block size (words)   Instruction miss rate   Data miss rate   Combined miss rate
gcc         1                    6.1%                    2.1%             5.4%
gcc         4                    2.0%                    1.7%             1.9%
spice       1                    1.2%                    1.3%             1.2%
spice       4                    0.3%                    0.6%             0.4%

7.2


The cost of a cache miss

  • For a memory access, assume:

    • 1 clock cycle to send the address to memory

    • 40 clock cycles for each DRAM access (0.5 ns clock cycle, 20 ns access time); this actually depends on the bus speed

    • 1 clock cycle to send each resulting data word back

  • Miss access time (4-word block):

    • 4 x (address + access + sending the data word)

    • 4 x (1 + 40 + 1) = 168 cycles for each miss (restated in the sketch below)

7.2
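The same arithmetic as a tiny function (the name and parameters are illustrative):

```python
def miss_penalty_sequential(words=4, addr_cycles=1, dram_cycles=40, xfer_cycles=1):
    """Each word pays the full address + DRAM access + transfer cost in turn."""
    return words * (addr_cycles + dram_cycles + xfer_cycles)

print(miss_penalty_sequential())   # 4 * (1 + 40 + 1) = 168 cycles
```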


Memory Interleaving

[Figure: two organizations. Default: CPU, cache, and a 4-byte bus to a single memory. Interleaved: CPU, cache, and a 4-byte bus to four memory banks (Memory 0 .. Memory 3), each 1/4 size, with per-word timings of 1 cycle (bus), 40 cycles (access), 1 cycle (bus).]

  • Default: must finish accessing one word before starting the next access

    • (1 + 40 + 1) x 4 = 168 cycles

  • Interleaving: begin accessing one word, and while waiting, start accessing the other three words (pipelining)

    • 1 + 40 + 4 x 1 = 45 cycles

    • Requires 4 separate memories, each 1/4 size

    • Spread out addresses among the memories

  • Interleaving works perfectly with caches

  • Sophisticated DRAMs (EDO, SDRAM, etc.) provide support for this

A sketch comparing the two timings follows below.

7.2
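With interleaving, the four DRAM accesses overlap, so a 4-word miss pays for the address once, one (overlapped) access time, and then one bus transfer per word. A sketch of the comparison under the slide's timing assumptions; the function names are illustrative.

```python
def miss_penalty_sequential(words=4, addr=1, dram=40, xfer=1):
    # One memory: each access must finish before the next begins.
    return words * (addr + dram + xfer)

def miss_penalty_interleaved(words=4, addr=1, dram=40, xfer=1):
    # Four banks: send the address once, the DRAM accesses proceed in
    # parallel, then the words come back over the bus one per cycle.
    return addr + dram + words * xfer

print(miss_penalty_sequential())    # 168 cycles
print(miss_penalty_interleaved())   # 45 cycles
```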


The issue of Writes

[Figure: the cache block at index 1000 initially holds V = 1, tag 3000, and four data words; a write then arrives for index 1000, tag 2420, word 1, value 4334.]

  • On a read miss, we read the entire block from memory into the cache

  • On a write hit, we write one word into the block; the other words in the block are unchanged

  • On a write miss, if we simply write the one word into the block and update the tag, the other words still hold the old data (for tag 3000). Bad news!

  • Solution 1: Don’t update the cache on a write miss. Write only to memory.

  • Solution 2: On a write miss, first read the referenced block in (including the old value of the word being written), then write the new word into the cache and write-through to memory.

A sketch of the two solutions follows below.

7.2
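A sketch of the two write-miss solutions for a write-through cache with 4-word blocks; the class and its methods are illustrative, not from the slides.

```python
class BlockCacheWrites:
    """Write-through cache with 4-word blocks, showing the two write-miss policies."""

    def __init__(self, num_blocks, memory, allocate_on_write_miss=True):
        self.num_blocks = num_blocks
        self.valid  = [False] * num_blocks
        self.tags   = [0] * num_blocks
        self.blocks = [[0, 0, 0, 0] for _ in range(num_blocks)]
        self.memory = memory                      # word-addressed backing store (dict)
        self.allocate = allocate_on_write_miss    # Solution 2 if True, Solution 1 if False

    def _split(self, word_addr):
        block_addr = word_addr // 4
        return block_addr // self.num_blocks, block_addr % self.num_blocks, word_addr % 4

    def store(self, word_addr, value):
        tag, index, offset = self._split(word_addr)
        hit = self.valid[index] and self.tags[index] == tag
        if hit:
            self.blocks[index][offset] = value            # write hit: update one word
        elif self.allocate:
            # Solution 2: read the whole referenced block in first, then write the word.
            base = (word_addr // 4) * 4
            self.blocks[index] = [self.memory.get(base + i, 0) for i in range(4)]
            self.valid[index], self.tags[index] = True, tag
            self.blocks[index][offset] = value
        # Solution 1: on a miss with allocate=False, leave the cache alone entirely.
        self.memory[word_addr] = value                    # write-through in every case
```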



Choosing a block size

  • Large block sizes help with spatial locality, but...

    • It takes time to read the memory in

      • Larger block sizes increase the time for misses

    • It reduces the number of blocks in the cache

      • Number of blocks = cache size/block size

  • Need to find a middle ground

    • 16-64 bytes works nicely

7.2


Other Cache organizations

[Figure: a direct-mapped cache with 16 indexed entries (0 .. 15) next to a fully associative cache with no index.]

  • Direct Mapped: each address has only one possible location, selected by the index

    • Address = Tag | Index | Block offset

  • Fully Associative: no index; a block may be placed in any entry

    • Address = Tag | Block offset

7.3



Fully Associative vs. Direct Mapped

  • Fully associative caches provide much greater flexibility

    • Nothing gets “thrown out” of the cache until it is completely full

  • Direct-mapped caches are more rigid

    • Any cached data goes directly where the index says to, even if the rest of the cache is empty

  • A problem, though...

    • Fully associative caches require a complete search through all the tags to see if there’s a hit

    • Direct-mapped caches only need to look in one place

7.3


A Compromise

[Figure: a 2-way set associative cache (8 sets, 0 .. 7) next to a 4-way set associative cache (4 sets, 0 .. 3).]

  • 2-way set associative: each address has two possible locations with the same index

    • One fewer index bit than direct mapped: 1/2 the indexes

  • 4-way set associative: each address has four possible locations with the same index

    • Two fewer index bits: 1/4 the indexes

  • In both cases, Address = Tag | Index | Block offset

7.3


Set Associative Example

128-byte cache, 4-word blocks, 10-bit addresses, 1- to 4-way associativity

Address fields: Tag (3-5 bits) | Index (1-3 bits) | Block offset (2 bits) | Byte offset (2 bits)

  • Direct-Mapped: 8 blocks, 3-bit index, 3-bit tag

  • 2-Way Set Assoc.: 4 sets, 2-bit index, 4-bit tag

  • 4-Way Set Assoc.: 2 sets, 1-bit index, 5-bit tag

Access sequence and hit/miss results:

Address       Direct-Mapped   2-Way Set Assoc.   4-Way Set Assoc.
0100111000    Miss            Miss               Miss
1100110100    Miss            Miss               Miss
0100111100    Miss            Hit                Hit
0110110000    Miss            Miss               Miss
1100111000    Miss            Miss               Hit

[Figure: the valid bits and tags left in each cache after the access sequence.]

A simulator sketch that reproduces this table follows below.

7.3
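A sketch of an LRU set-associative simulator that reproduces the table above; the function name and structure are illustrative. With 10-bit addresses and 16-byte blocks, the block address is simply addr >> 4.

```python
from collections import OrderedDict

def simulate(addresses, num_blocks=8, ways=1):
    """Return 'Hit'/'Miss' per access for an LRU set-associative cache."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]   # tag -> None, ordered by recency
    results = []
    for addr in addresses:
        block_addr = addr >> 4                        # drop byte + block offsets (2+2 bits)
        index, tag = block_addr % num_sets, block_addr // num_sets
        s = sets[index]
        if tag in s:
            s.move_to_end(tag)                        # refresh LRU order
            results.append("Hit")
        else:
            if len(s) == ways:
                s.popitem(last=False)                 # evict the least recently used tag
            s[tag] = None
            results.append("Miss")
    return results

accesses = [0b0100111000, 0b1100110100, 0b0100111100, 0b0110110000, 0b1100111000]
for ways in (1, 2, 4):
    print(ways, "-way:", simulate(accesses, ways=ways))
# Prints: 1-way all misses; 2-way hits only on the third access;
# 4-way hits on the third and fifth accesses (matching the table above).
```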


New Performance Numbers

Miss rates for the DEC 3100 (a MIPS machine), with separate 64 KB instruction and data caches (4K 4-word blocks):

Benchmark   Associativity   Instruction miss rate   Data miss rate   Combined miss rate
gcc         Direct          2.0%                    1.7%             1.9%
gcc         2-way           1.6%                    1.4%             1.5%
gcc         4-way           1.6%                    1.4%             1.5%
spice       Direct          0.3%                    0.6%             0.4%
spice       2-way           0.3%                    0.6%             0.4%
spice       4-way           0.3%                    0.6%             0.4%

7.3



Block Replacement Strategies

  • We have to replace a block when there is a collision

    • Collisions occur whenever the selected set is full

  • Strategy 1: Ideal (Oracle)

    • Replace the block that won’t be used again for the longest time

    • Drawback - Requires knowledge of the future

  • Strategy 2: Least Recently Used (LRU)

    • Replace the block that was last used (hit) the longest time ago

    • Drawback - Requires difficult bookkeeping

  • Strategy 3: Approximate LRU

    • Set a use bit for each block every time it is hit, clear all periodically

    • Replace a block without its use bit set

  • Strategy 4: Random

    • Pick a block at random (works almost as well as approximate LRU; a sketch of both LRU variants follows below)

7.5
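A sketch contrasting exact LRU with the use-bit approximation, for victim selection within one set; all names are illustrative.

```python
import random

def lru_victim(set_tags, last_used):
    """Exact LRU: evict the tag whose last hit was longest ago.
    last_used maps tag -> timestamp of its most recent hit (costly bookkeeping)."""
    return min(set_tags, key=lambda tag: last_used[tag])

def approx_lru_victim(set_tags, use_bit):
    """Approximate LRU: prefer any block whose use bit is clear.
    use_bit is set on every hit and cleared for all blocks periodically."""
    not_recently_used = [tag for tag in set_tags if not use_bit.get(tag, False)]
    candidates = not_recently_used or set_tags    # fall back if every use bit is set
    return random.choice(candidates)

# Example for one 4-way set:
tags = ["A", "B", "C", "D"]
print(lru_victim(tags, last_used={"A": 40, "B": 12, "C": 33, "D": 27}))   # "B"
print(approx_lru_victim(tags, use_bit={"A": True, "C": True}))            # "B" or "D"
```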



The Three C’s of Misses

  • Compulsory Misses

    • The first time a memory location is accessed, it is always a miss

    • Also known as cold-start misses

    • The only way to decrease this miss rate is to increase the block size

  • Capacity Misses

    • Occur when a program is using more data than can fit in the cache

    • Some misses will result because the cache isn’t big enough

    • Increasing the size of the cache solves this problem

  • Conflict Misses

    • Occur when a block forces out another block with the same index

    • Increasing associativity reduces conflict misses

    • Worst in Direct-Mapped, non-existent in Fully Associative

7.5

