
Chapter 4

Cache Memory

4.1 Memory system

4.2 Cache principles

4.3 Cache design

4.4 Examples

SYSC 2001 - F09. SYSC2001-Ch4.ppt

Memory: Location
  • Registers: inside CPU
    • Fastest – on CPU chip
  • Cache: very fast semiconductor memory, close to CPU
  • Internal or main memory
    • Typically semiconductor media (transistors)
    • Fast, random access, on system bus
  • External or secondary memory
    • peripheral storage devices (e.g. disk, tape)
    • Slower, often magnetic media, maybe a slower bus

Memory: Capacity
  • Word size: # of bits in the natural unit of organization
    • Usually related to the length of an instruction or the number of bits used to represent an integer
  • Capacity expressed as number of words or number of bytes
    • Usually a power of 2, e.g. 1 KB = 1024 bytes (why?)

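Capacities come in powers of 2 because an n-bit address can name 2^n distinct locations. A minimal sketch of that relationship (the function name is my own, not from the slides):

```python
# n address bits select among 2**n locations, so capacities
# naturally land on powers of 2.
def addressable_bytes(address_bits: int) -> int:
    return 2 ** address_bits

print(addressable_bytes(10))  # 1 KB = 1024 bytes
print(addressable_bytes(24))  # 16 MB, as in the mapping example later
```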
Other Memory System Characteristics
  • Unit of transfer: number of bits read from, or written into, memory at a time
    • Internal: usually governed by data bus width
    • External: usually a block of words, e.g. 512 or more
  • Addressable unit: smallest location which can be uniquely addressed
    • Internal: word or byte
    • External: device dependent, e.g. a disk “cluster”

Sequential Access Method
  • Start at the beginning – read through in order
  • Access time depends on location of data and previous location
  • e.g. tape

[Diagram: a tape read starts at the beginning, passes the first location, and reads through in order to the location of interest]

Direct Access
  • Individual blocks have unique address
  • Access is by jumping to vicinity plus sequential search (or waiting! e.g. waiting for disk to rotate)
  • Access time depends on target location and previous location
  • e.g. disk

[Diagram: jump directly to the vicinity of block i, then read sequentially to the target location]

Random Access Method
  • Individual addresses identify specific locations
  • Access time independent of location or previous access
  • e.g. RAM

[Diagram: any location can be read directly; main memory types are covered in Ch. 5.1]

Performance (Speed)
  • Access time
    • Time between presenting the address and getting the valid data (memory or other storage)
  • Memory cycle time
    • Some time may be required for the memory to “recover” before next access
    • cycle time = access + recovery
  • Transfer rate
    • rate at which data can be moved
    • for random access memory: transfer rate = 1 / (cycle time)

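The relationships above can be checked numerically; the access and recovery times below are assumed values for illustration, not from the slides:

```python
# cycle time = access time + recovery time;
# transfer rate (for random-access memory) is its reciprocal.
access_ns, recovery_ns = 10.0, 2.0       # assumed figures
cycle_ns = access_ns + recovery_ns       # 12 ns per access
words_per_second = 1e9 / cycle_ns        # 1 / cycle time
print(f"{words_per_second:.2e} words/s")
```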
Memory Hierarchy
  • size? speed? cost?
  • registers
    • in CPU – smallest, fastest, most expensive, most frequently accessed
  • internal
    • may include one or more levels of cache – medium size, quick, price varies
  • external memory
    • backing store – largest, slowest, cheapest, least frequently accessed

Performance & Hierarchy List
(faster and more $/byte at the top; slower and less $/byte at the bottom)

  • Registers
  • Level 1 Cache
  • Level 2 Cache
  • Main memory
  • Disk cache (more on this soon – 2 slides!)
  • Disk
  • Optical
  • Tape

Locality of Reference (circa 1968)
  • During program execution, memory references tend to cluster, e.g. loops
  • Many instructions in localized areas of the program are executed repeatedly during some time period, and the remainder of the program is accessed infrequently. (Tanenbaum)
  • Temporal LOR: a recently executed instruction is likely to be executed again soon
  • Spatial LOR: instructions with addresses close to a recently executed instruction are likely to be executed soon
  • The same principles apply to data references


Cache
  • small amount of fast memory – smaller than main memory
  • sits between normal main memory and CPU
  • may be located on CPU chip or module
  • cache views main memory as organized in “blocks” (recall “blocks” from a few slides back)
  • block transfers move data between main memory and cache; word transfers move data between cache and CPU

Why does Caching Improve Speed?

Example:

  • Main memory has 100,000 words; access time is 0.1 µs.
  • Cache has 1000 words; access time is 0.01 µs.
  • If a word is
    • in cache (hit), it can be accessed directly by the processor.
    • in memory (miss), it must first be transferred to cache before access.
  • Suppose that 95% of access requests are hits – the key proviso.
  • Average time to access a word: (0.95)(0.01 µs) + (0.05)(0.1 µs + 0.01 µs) = 0.015 µs – close to cache speed.

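The average-access-time arithmetic above can be reproduced directly:

```python
# Average access time for the slide's example, assuming a miss
# first loads the word into cache and then reads it from cache.
hit_rate = 0.95
cache_us, memory_us = 0.01, 0.1
avg_us = hit_rate * cache_us + (1 - hit_rate) * (memory_us + cache_us)
print(avg_us)  # ≈ 0.015 µs, close to the cache's own speed
```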
Cache Read Operation
  • CPU requests contents of memory location
  • check cache for contents of location
    • present (cache hit!) → get data from cache (fast)
    • not present (cache miss!) → read required block from main memory into cache
  • then deliver data from cache to CPU

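The read flow above can be sketched as a function; the single-level dict cache and the fake memory contents are simplifications for illustration:

```python
# Minimal sketch of the read path: a hit serves from cache,
# a miss fills the cache from memory first, then serves.
memory = {addr: addr * 2 for addr in range(1024)}  # fake main memory
cache = {}

def read(addr):
    if addr not in cache:      # miss: load the word into cache
        cache[addr] = memory[addr]
    return cache[addr]         # deliver from cache

print(read(5), read(5))  # prints 10 10 (first call misses, second hits)
```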
Cache Design
  • Size
  • Mapping Function
  • Replacement Algorithm
  • Write Policy
  • Block Size
  • Number of Caches

Size
  • Cost
    • More cache is expensive
  • Speed
    • More cache is faster (up to a point)
    • Checking cache for data takes time


Mapping Function
  • how does cache contents map to main memory contents?
  • use the tag (and maybe the line address) to identify the block address

[Diagram: each cache line holds a tag and a data block; main memory is shown as addresses (000, …) and contents (xxx, …), with blocks i, j, … mapped into cache lines]
Cache Basics
  • cache line vs. main memory location
    • same concept – avoid confusion (?)
    • line has address and contents
    • cache line width is bigger than memory location width!
  • contents of cache line divided into tag and data fields
    • fixed width
    • fields used differently!
    • data field holds contents of a block of main memory
    • tag field helps identify the start address of the block of memory that is in the data field

Mapping Function Example
  • cache of 64 KByte – holds up to 64 KBytes of main memory contents
    • 16K (2^14) lines – each line is 5 bytes wide = 40 bits
    • tag field: 1 byte; data field: 4 bytes (one 4-byte block of main memory)
  • 16 MBytes main memory
  • 24 bit address
    • 2^24 = 16M
  • will consider DIRECT and ASSOCIATIVE mappings

Direct Mapping
  • each block of main memory maps to only one cache line
    • i.e. if a block is in cache, it must be in one specific place – based on address!
  • split address into two parts
    • least significant w bits identify unique word in block
    • most significant s bits specify one memory block
      • split the s bits into:
        • line address field: r bits – the line field identifies the line containing the block!
        • tag field: the s – r most significant bits

Address layout:  | tag: s – r bits | line: r bits | word: w bits |

Direct Mapping: Address Structure for Example

24 bit address:  | tag: s – r = 8 bits | line address: r = 14 bits | word: w = 2 bits |

  • 2 bit word identifier (4 byte block)
  • s = 22 bit block identifier
  • two blocks may have the same r value, but then they always have different tag values!

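The 8/14/2 split can be expressed with shifts and masks; the sample address below is an arbitrary choice for illustration:

```python
# Split a 24-bit address into tag (8 bits), line (14), word (2)
# for the chapter's direct-mapped example.
def split_direct(addr):
    word = addr & 0x3             # low 2 bits
    line = (addr >> 2) & 0x3FFF   # next 14 bits
    tag  = (addr >> 16) & 0xFF    # top 8 bits
    return tag, line, word

print([hex(f) for f in split_direct(0xFFFFFC)])  # ['0xff', '0x3fff', '0x0']
```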
Direct Mapping Cache Line Table (each block = 4 bytes; s = 22, m = 2^14)

  cache line | main memory blocks held
  0          | 0, m, 2m, 3m, …, 2^s – m
  1          | 1, m+1, 2m+1, …, 2^s – m + 1
  …          | …
  m–1        | m–1, 2m–1, 3m–1, …, 2^s – 1

But… a line can contain only one of these at a time!

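The table's pattern is just block number mod m; a quick sketch:

```python
# A block maps to line (block_number % m); blocks that differ by a
# multiple of m collide on the same line, as in the table above.
m = 2 ** 14

def line_for_block(block):
    return block % m

assert line_for_block(0) == line_for_block(m) == line_for_block(2 * m) == 0
print(line_for_block(1), line_for_block(m + 1))  # both map to line 1
```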
Direct Mapping Summary
  • Address length = (s + w) bits
  • Number of addressable units = 2^(s+w) words or bytes
  • Block size = line size – tag size = 2^w words or bytes
  • Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
  • Number of lines in cache = m = 2^r
  • Size of tag = (s – r) bits

Direct Mapping pros & cons
  • Simple
  • Inexpensive
  • Fixed location for given block
    • If a program accesses 2 blocks that map to the same line repeatedly, cache misses are very high

Associative Memory
  • read: specify tag field value and word select
  • checks all lines – finds matching tag
    • return contents of data field @ selected word
  • access time independent of location or previous access
  • write to data field @ tag value + word select
    • what if no words with matching tag? (see Ch. 4.3)

Associative Mapping
  • main memory block can load into any line of cache
  • memory address is interpreted as tag and word select in block
  • tag uniquely identifies block of memory !
  • every line’s tag is examined for a match
  • cache searching gets expensive

s = tag does not use line address !

Associative Mapping: Address Structure
  • 22 bit tag stored with each 32 bit block of data
  • Compare tag field with tag entry in cache to check for hit
  • Least significant 2 bits of address identify which 8 bit word is required from the 32 bit data block
  • e.g.

      Address | Tag    | Data     | Cache line
      FFFFFC  | 3FFFFF | 24682468 | any, e.g. 3FFF

Field layout:  | Tag: 22 bits | Word: 2 bits |

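The tag/word split for the FFFFFC example can be checked directly:

```python
# Associative mapping: the tag is the whole 22-bit block address,
# the word is the low 2 bits. FFFFFC is the slide's example address.
addr = 0xFFFFFC
tag, word = addr >> 2, addr & 0x3
print(hex(tag), word)  # 0x3fffff 0
```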
Associative Mapping Summary
  • Address length = (s + w) bits
  • Number of addressable units = 2^(s+w) words or bytes
  • Block size = line size – tag size = 2^w words or bytes
  • Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
  • Number of lines in cache = undetermined
  • Size of tag = s bits

Set Associative Mapping
  • Cache is divided into a number of sets
  • Each set contains k lines → k-way associative
  • A given block maps to any line in a given set
    • e.g. Block B can be in any line of set i
  • e.g. 2 lines per set
    • 2-way associative mapping
    • A given block can be in one of 2 lines, in only one set

Set Associative Mapping: Address Structure
  • Use the set field to determine which set of cache lines to look in (direct)
  • Within this set, compare tag fields to see if we have a hit (associative)
  • e.g.

      Address | Tag | Data     | Set number
      FFFFFC  | 1FF | 12345678 | 1FFF
      00FFFF  | 001 | 11223344 | 1FFF

      (same Set, different Tag, different Word)

Field layout:  | Tag: 9 bits | Set: 13 bits | Word: 2 bits |

e.g. Breaking into Tag, Set, Word
  • Given Tag = 9 bits, Set = 13 bits, Word = 2 bits
  • Given address FFFFFD (hex)
  • What are the values of Tag, Set, Word?
    • First 9 bits are Tag, next 13 are Set, last 2 are Word
    • Rewrite the address in base 2: 1111 1111 1111 1111 1111 1101
    • Split into fields: 1 1111 1111 | 1 1111 1111 1111 | 01
    • Pad each field with leading zero bits to a whole number of hex digits: 0001 1111 1111 | 0001 1111 1111 1111 | 01
  • → 1FF, 1FFF, 1 (Tag, Set, Word)

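The same extraction can be done with shifts and masks, confirming the worked example:

```python
# Extract Tag (9 bits), Set (13), Word (2) from a 24-bit address.
def split_set_assoc(addr):
    word = addr & 0x3
    set_ = (addr >> 2) & 0x1FFF
    tag  = (addr >> 15) & 0x1FF
    return tag, set_, word

tag, set_, word = split_set_assoc(0xFFFFFD)
print(hex(tag), hex(set_), word)  # 0x1ff 0x1fff 1
```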
Replacement Algorithms Direct Mapping
  • what if bringing in a new block, but no line available in cache?
  • must replace (overwrite) a line – which one?
  • direct → no choice
    • each block only maps to one line
  • replace that line

Replacement Algorithms Associative & Set Associative
  • hardware implemented algorithm (speed)
  • Least Recently Used (LRU)
  • e.g. in 2-way set associative
    • which of the 2 blocks is LRU?
  • First In First Out (FIFO)
    • replace block that has been in cache longest
  • Least Frequently Used (LFU)
    • replace block which has had fewest hits
  • Random

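For one 2-way set, LRU can be tracked by access order; this is an illustrative software sketch, not the hardware implementation the slide refers to:

```python
from collections import OrderedDict

# LRU for a single 2-way set: on a miss when the set is full,
# evict the least recently used of the two blocks.
class TwoWaySet:
    def __init__(self):
        self.lines = OrderedDict()       # least recently used first

    def access(self, tag):
        if tag in self.lines:            # hit: mark most recently used
            self.lines.move_to_end(tag)
            return "hit"
        if len(self.lines) == 2:         # miss on a full set: evict LRU
            self.lines.popitem(last=False)
        self.lines[tag] = True
        return "miss"

s = TwoWaySet()
print([s.access(t) for t in ["A", "B", "A", "C", "B"]])
# ['miss', 'miss', 'hit', 'miss', 'miss']
```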
Write Policy
  • must not overwrite a cache block unless main memory is up to date
  • Complication: Multiple CPUs may have individual caches!!
  • Complication: I/O may address main memory too (read and write)!!
  • N.B. 15% of memory references are writes

Write Through Method
  • all writes go to main memory as well as cache
  • Each of multiple CPUs can monitor main memory traffic to keep its own local cache up to date
  • lots of traffic → slows down writes

Write Back Method
  • updates initially made in cache only
  • update (dirty) bit for cache slot is set when update occurs
  • if block is to be replaced, write to main memory only if update bit is set
  • Other caches get out of sync
  • I/O must access main memory through cache

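The write-back steps above can be sketched with a dirty bit per cached entry; the dict-based cache is a simplification for illustration:

```python
# Write-back sketch: writes update the cache and set a dirty bit;
# main memory is written only when a dirty entry is replaced.
cache = {}    # addr -> (value, dirty)
memory = {}

def write(addr, value):
    cache[addr] = (value, True)    # cache-only update, mark dirty

def evict(addr):
    value, dirty = cache.pop(addr)
    if dirty:                      # write to main memory only if dirty
        memory[addr] = value

write(0x10, 99)
evict(0x10)
print(memory[0x10])  # 99
```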
Multiple Caches on One Processor
  • two levels – L1 close to the processor (often on chip), L2 between L1 and main memory
  • check L1 first – if miss, then check L2
    • if L2 misses too – get from memory

[Diagram: processor ↔ L1 over a local bus; L1 ↔ L2; L2 attaches to the system bus, out to a high-speed bus]

Unified vs. Split Caches
  • unified: both instructions and data in the same cache
  • split: separate caches for instructions and data
    • separate local busses to cache
  • increased concurrency → pipelining (Ch. 12)
    • allows instruction fetch to be concurrent with operand access

Pentium Family Cache Evolution
  • 80386 – no on chip cache
  • 80486 – 8k using 16 byte lines and four way set associative organization
  • Pentium (all versions) – two on chip L1 (split) caches
    • data & instructions

Pentium 4 Cache
  • Pentium 4 – split L1 caches
    • 8k bytes
    • 128 lines of 64 bytes each
    • four way set associative = 32 sets
  • unified L2 cache – feeding both L1 caches
    • 256k bytes
    • 2048 (2k) lines of 128 bytes each
    • 8 way set associative = 256 sets
  • how many bits?
    • w (word field)?
    • s (set field)?

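The field widths follow from line size and set count; a check, assuming byte addressing:

```python
from math import log2

# Field widths implied by the Pentium 4 geometry above (byte-addressed):
# w bits select a byte within a line, s bits select a set.
def field_bits(line_bytes, num_sets):
    return int(log2(line_bytes)), int(log2(num_sets))  # (w, s)

print(field_bits(64, 32))    # L1: w = 6, s = 5
print(field_bits(128, 256))  # L2: w = 7, s = 8
```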
Power PC Cache Evolution
  • 601 – single 32 KB cache, 8-way set associative
  • 603 – 16 KB (2 × 8 KB), two-way set associative
  • 604 – 32 KB
  • 610 – 64 KB
  • G3 & G4
    • 64 KB L1 cache → 8-way set associative
    • 256 KB, 512 KB or 1 MB L2 cache → two-way set associative
