Prof. Sirer, CS 316, Cornell University

Prof. Sirer

CS 316

Cornell University

Multilevel Memory Caches


Storage Hierarchy


Technology      Capacity   Cost/GB    Latency
Tape            1 TB       $0.17      100 s
Disk            300 GB     $0.34      4 ms
DRAM            4 GB       $520       20 ns
SRAM off chip   512 KB     $123,000   5 ns
SRAM on chip    16 KB      ???        2 ns

Capacity and latency are closely coupled (larger memories are slower), and cost per GB is inversely related to both.

How do we create the illusion of a large and fast memory?

[Figure: the storage hierarchy, from top to bottom: SRAM on chip, SRAM off chip, DRAM, disk, tape]


Memory Hierarchy

  • Principle: Hide latency using small, fast memories called caches

  • Caches exploit locality

    • Temporal locality: If a memory location is referenced, it is likely to be referenced again in the near future

    • Spatial locality: If a memory location is referenced, nearby locations are likely to be referenced in the near future


Cache Lookups (Read)

  • Take the address issued by the processor and search the cache tags to see whether the block containing it is in the cache

    • Hit: the block is in the cache; return the requested data

    • Miss: the block is not in the cache; read the line from memory, evict an existing line to make room, place the new line in the cache, and return the requested data


Cache Organization

  • Cache has to be fast and small

    • Gain speed by performing lookups in parallel, which costs die real estate

    • Reduce hardware required by limiting where in the cache a block might be placed

  • Three common designs

    • Fully associative: Block can be anywhere in the cache

    • Direct mapped: Block can only be in one line in the cache

    • Set-associative: Block can be in a few (2 to 8) places in the cache


Tags and Offsets

  • Cache block size determines how an address is split into tag and offset

Address layout (32-byte blocks): the 32-bit virtual address (bits 31..0) splits into a tag in bits 31..5 and a block offset in bits 4..0


Fully Associative Cache

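[Figure: fully associative cache. Each line holds a valid bit (V), a tag, and a block. The address is split into tag and offset; the tag is compared (=) against every line's tag at once, a hit encoder turns the matching comparator into the line select, and the offset drives word/byte select.]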


Direct Mapped Cache

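[Figure: direct-mapped cache. Each line holds V, Tag, and Block. The address is split into tag, index, and offset; the index selects a single line, and one comparator (=) checks that line's tag against the address tag.]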


2-Way Set-Associative Cache

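[Figure: 2-way set-associative cache. The address is split into tag, index, and offset; the index selects a set of two lines, each holding V, Tag, and Block, and two comparators (=) check both tags in parallel.]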


Valid Bits

  • Valid bits indicate whether a cache line contains an up-to-date copy of the values in memory

    • Must be 1 for a hit

    • Reset to 0 on power up

  • An item can be removed from the cache by setting its valid bit to 0


Eviction

  • Which cache line should be evicted from the cache to make room for a new line?

    • Direct-mapped

      • no choice, must evict line selected by index

    • Associative caches

      • random: select one of the lines at random

      • round-robin: cycle through the lines in turn; behaves much like random

      • FIFO: replace the line that was brought into the cache earliest

      • LRU: replace line that has not been used in the longest time


Cache Writes

[Figure: the CPU sends addresses and data to an SRAM cache, which sits in front of DRAM main memory]

  • No-Write

    • writes invalidate the cache and go to memory

  • Write-Through

    • writes go to both main memory and the cache

  • Write-Back

    • writes go only to the cache; main memory is written only when the block is evicted


Dirty Bits and Write-Back Buffers

  • Dirty bits indicate which lines have been written

  • Dirty bits enable the cache to handle multiple writes to the same cache line without having to go to memory

  • Write-back buffer

    • A queue where dirty lines are placed

    • Items added to the end as dirty lines are evicted from the cache

    • Items removed from the front as memory writes are completed

[Figure: a cache line holding a dirty bit (D), a valid bit (V), a tag, and data bytes Byte 0, Byte 1 … Byte N]


Misses

  • Three types of misses

    • Cold

      • The line is being referenced for the first time

    • Capacity

      • The line was evicted because the cache was not large enough

    • Conflict

      • The line was evicted because of another access whose index conflicted


Cache Design

  • Need to determine these parameters:

    • Block size

    • Number of ways

    • Eviction policy

    • Write policy

    • Whether to separate the instruction cache (I-cache) from the data cache (D-cache)


Virtual vs. Physical Caches

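  • L1 (on-chip) caches are typically virtual

  • L2 (off-chip) caches are typically physical

[Figure: the CPU's address passes through the MMU before reaching the SRAM cache, which sits in front of DRAM memory. Cache works on physical addresses]

[Figure: the CPU's address goes directly to the SRAM cache; the MMU sits between the cache and DRAM memory. Cache works on virtual addresses]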


Cache Conscious Programming

int a[NCOL][NROW];
int sum = 0;

for (i = 0; i < NROW; ++i)
    for (j = 0; j < NCOL; ++j)
        sum += a[j][i];

  • Speed up this program


Cache Conscious Programming

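int a[NCOL][NROW];
int sum = 0;

for (i = 0; i < NROW; ++i)
    for (j = 0; j < NCOL; ++j)
        sum += a[j][i];

  • Every access is a cache miss! The inner loop varies the first subscript, so consecutive accesses are NROW ints apart and each touches a different cache line; when the array is much larger than the cache, each line is evicted before the next outer iteration could reuse it


Cache Conscious Programming

int a[NCOL][NROW];
int sum = 0;

for (j = 0; j < NCOL; ++j)
    for (i = 0; i < NROW; ++i)
        sum += a[j][i];

  • Same program, trivial transformation (the two loops are interchanged): the inner loop now walks consecutive ints, so with four ints per cache line 3 out of every 4 accesses hit in the cache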

