Multilevel Memory Caches

Prof. Sirer
CS 316
Cornell University


Storage Hierarchy

Technology        Capacity   Cost/GB    Latency
Tape              1 TB       $0.17      100 s
Disk              300 GB     $0.34      4 ms
DRAM              4 GB       $520       20 ns
SRAM (off chip)   512 KB     $123,000   5 ns
SRAM (on chip)    16 KB      ???        2 ns

Capacity and latency are closely coupled; cost per GB is inversely proportional to both

How do we create the illusion of large and fast memory?

[Diagram: the storage hierarchy pyramid, from on-chip SRAM at the top through off-chip SRAM, DRAM, and disk down to tape.]



Memory Hierarchy

  • Principle: Hide latency using small, fast memories called caches

  • Caches exploit locality

    • Temporal locality: If a memory location is referenced, it is likely to be referenced again in the near future

    • Spatial locality: If a memory location is referenced, locations near it are likely to be referenced in the near future (both kinds are sketched below)
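
Both kinds of locality show up in even the simplest loop; a small illustrative sketch in C (not from the slides):

/* Both kinds of locality in one loop (illustrative sketch). */
int sum_array(const int *a, int n)
{
    int sum = 0;                /* sum and i are reused on every
                                   iteration: temporal locality  */
    for (int i = 0; i < n; i++)
        sum += a[i];            /* consecutive elements share a
                                   cache block: spatial locality */
    return sum;
}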



Cache Lookups (Read)

  • Look at address issued by processor, search cache tags to see if that block is in the cache

    • Hit: Block is in the cache, return requested data

    • Miss: Block is not in the cache; read the block from memory, evict an existing line to make room, place the new line in the cache, and return the requested data (sketched below)
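
The same hit/miss flow can be sketched in C; the structures, sizes, and the fake memory below are illustrative stand-ins, not the actual hardware:

#define BLOCK_SIZE 32
#define NLINES     8

static unsigned char memory[1 << 16];   /* stand-in for DRAM; 16-bit addresses assumed */

struct line {
    int           valid;
    unsigned      tag;
    unsigned char data[BLOCK_SIZE];
};
static struct line cache[NLINES];

unsigned char read_byte(unsigned addr)
{
    unsigned tag    = addr / BLOCK_SIZE;
    unsigned offset = addr % BLOCK_SIZE;

    for (int i = 0; i < NLINES; i++)              /* search the cache tags */
        if (cache[i].valid && cache[i].tag == tag)
            return cache[i].data[offset];         /* hit */

    /* Miss: evict a line (here simply line 0) and refill it from memory. */
    struct line *victim = &cache[0];
    for (unsigned b = 0; b < BLOCK_SIZE; b++)
        victim->data[b] = memory[tag * BLOCK_SIZE + b];
    victim->valid = 1;
    victim->tag   = tag;
    return victim->data[offset];
}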



Cache Organization

  • Cache has to be fast and small

    • Gain speed by performing lookups in parallel, which costs die real estate

    • Reduce hardware required by limiting where in the cache a block might be placed

  • Three common designs

    • Fully associative: Block can be anywhere in the cache

    • Direct mapped: Block can only be in one line in the cache

    • Set-associative: Block can be in a few (2 to 8) places in the cache



Tags and Offsets

  • Cache block size determines how an address splits into tag and offset fields

With 32-byte blocks, a 32-bit virtual address splits as:

    bits 31..5:  Tag
    bits 4..0:   Offset (byte within the block)
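
In software the same split is a shift and a mask; a small sketch assuming the 5-bit offset shown above:

#define OFFSET_BITS 5                          /* log2 of the 32-byte block */
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1)

unsigned tag_of(unsigned addr)    { return addr >> OFFSET_BITS; }
unsigned offset_of(unsigned addr) { return addr & OFFSET_MASK; }

/* Example: tag_of(0x1234) == 0x91, offset_of(0x1234) == 0x14 */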



Fully Associative Cache

[Diagram: fully associative cache. Each line holds a valid bit (V), a tag, and a data block. The address is split into tag and offset; the address tag is compared (=) against every line's tag in parallel, a valid match drives line select, the offset drives word/byte select, and the match signals are encoded into a hit.]



Direct Mapped Cache

[Diagram: direct mapped cache. The address is split into tag, index, and offset; the index selects a single line, its stored tag is compared (=) with the address tag, and the valid bit (V) must be set for a hit. The offset selects the word/byte within the block.]
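
A software sketch of the direct-mapped lookup; the 64-line size and 32-byte blocks are assumptions for illustration:

#define BLOCK  32
#define NLINES 64

struct dm_line { int valid; unsigned tag; unsigned char data[BLOCK]; };
static struct dm_line dm_cache[NLINES];

/* Returns 1 on a hit: the index picks exactly one candidate line. */
int dm_hit(unsigned addr)
{
    unsigned block = addr / BLOCK;        /* strip the offset            */
    unsigned index = block % NLINES;      /* which line to check         */
    unsigned tag   = block / NLINES;      /* which block maps there      */
    return dm_cache[index].valid && dm_cache[index].tag == tag;
}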



2-Way Set-Associative Cache

[Diagram: 2-way set-associative cache. The index selects one set holding two lines; both stored tags are compared (=) with the address tag in parallel, and a valid match in either way is a hit. The offset selects the word/byte within the block.]
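
The corresponding sketch for a 2-way set-associative lookup; the loop below is serial, but in hardware both tag comparisons happen in parallel:

#define BLOCK 32
#define NSETS 32
#define WAYS  2

struct sa_line { int valid; unsigned tag; unsigned char data[BLOCK]; };
static struct sa_line sa_cache[NSETS][WAYS];

int sa_hit(unsigned addr)
{
    unsigned block = addr / BLOCK;
    unsigned set   = block % NSETS;       /* which set to search     */
    unsigned tag   = block / NSETS;
    for (int w = 0; w < WAYS; w++)        /* parallel in hardware    */
        if (sa_cache[set][w].valid && sa_cache[set][w].tag == tag)
            return 1;                     /* hit */
    return 0;                             /* miss */
}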



Valid Bits

  • Valid bits indicate whether a cache line contains an up-to-date copy of the values in memory

    • Must be 1 for a hit

    • Reset to 0 on power up

  • An item can be removed from the cache by setting its valid bit to 0



Eviction

  • Which cache line should be evicted from the cache to make room for a new line?

    • Direct-mapped

      • no choice, must evict line selected by index

    • Associative caches

      • random: select one of the lines at random

      • round-robin: cycle the victim choice through the lines in order (performs similarly to random)

      • FIFO: replace oldest line

      • LRU: replace the line that has not been used for the longest time (sketched below)
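
LRU can be sketched by stamping each line with a counter on every access and evicting the oldest stamp; real hardware usually settles for cheaper approximations:

#define WAYS 4

struct lru_line { int valid; unsigned tag; unsigned long last_used; };

/* Pick the victim within one set: a free line if any,
   otherwise the least recently used one. */
int choose_victim(struct lru_line set[WAYS])
{
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid)
            return w;                      /* free line: no eviction needed */
        if (set[w].last_used < set[victim].last_used)
            victim = w;                    /* older stamp: better victim    */
    }
    return victim;
}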



Cache Writes


  • No-Write

    • writes invalidate the matching cache line and go directly to memory

  • Write-Through

    • writes go to main memory and cache

  • Write-Back

    • write cache, write main memory only when block is evicted

[Diagram: the CPU sends addr and data to the SRAM cache, which sits in front of DRAM main memory.]
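
The three policies differ only in where a store's data ends up. A toy sketch with a single cached byte (all names and sizes are made up) contrasts them:

static unsigned char dram[1 << 16];   /* stand-in for main memory; 16-bit addresses assumed */

/* One cached byte: just enough state to contrast the policies. */
static struct { int valid, dirty; unsigned addr; unsigned char val; } line;

void store_no_write(unsigned addr, unsigned char v) {
    if (line.valid && line.addr == addr)
        line.valid = 0;               /* invalidate the stale copy    */
    dram[addr] = v;                   /* data goes straight to memory */
}

void store_write_through(unsigned addr, unsigned char v) {
    if (line.valid && line.addr == addr)
        line.val = v;                 /* keep the cache up to date    */
    dram[addr] = v;                   /* and always write memory too  */
}

void store_write_back(unsigned addr, unsigned char v) {
    if (line.valid && line.addr == addr) {
        line.val = v;                 /* update only the cache...     */
        line.dirty = 1;               /* ...memory is written when the
                                         line is eventually evicted   */
    } else {
        dram[addr] = v;               /* write-no-allocate, for brevity */
    }
}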



Dirty Bits and Write-Back Buffers

  • Dirty bits indicate which lines have been written

  • Dirty bits enable the cache to handle multiple writes to the same cache line without having to go to memory

  • Write-back buffer

    • A queue where dirty lines are placed

    • Items added to the end as dirty lines are evicted from the cache

    • Items removed from the front as memory writes are completed

[Diagram: cache line layout with a dirty bit (D), a valid bit (V), a tag, and data bytes 0 through N.]
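
The write-back buffer behaves like a small ring queue; a sketch with an assumed depth and layout:

#define WB_DEPTH 8
#define BLOCK    32

struct wb_entry { unsigned tag; unsigned char data[BLOCK]; };

static struct wb_entry wb[WB_DEPTH];
static int wb_head, wb_tail, wb_count;     /* remove at head, add at tail */

/* Called when a dirty line is evicted from the cache. */
int wb_push(struct wb_entry e) {
    if (wb_count == WB_DEPTH)
        return 0;                          /* buffer full: eviction stalls */
    wb[wb_tail] = e;
    wb_tail = (wb_tail + 1) % WB_DEPTH;
    wb_count++;
    return 1;
}

/* Called as each memory write completes. */
int wb_pop(struct wb_entry *e) {
    if (wb_count == 0)
        return 0;                          /* nothing pending */
    *e = wb[wb_head];
    wb_head = (wb_head + 1) % WB_DEPTH;
    wb_count--;
    return 1;
}

When the buffer is full, a new eviction must stall until an outstanding memory write completes and frees a slot.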



Misses

  • Three types of misses

    • Cold

      • The line is being referenced for the first time

    • Capacity

      • The line was evicted because the cache was not large enough

    • Conflict

      • The line was evicted because another access mapped to the same index



Cache Design

  • Need to determine parameters

    • Block size

    • Number of ways

    • Eviction policy

    • Write policy

    • Whether to separate the I-cache from the D-cache



Virtual vs. Physical Caches


  • L1 (on-chip) caches are typically virtual

  • L2 (off-chip) caches are typically physical

[Diagram: physically addressed cache. The address from the CPU goes through the MMU before reaching the SRAM cache, so the cache works on physical addresses; the cache then talks to DRAM main memory.]

[Diagram: virtually addressed cache. The SRAM cache sits between the CPU and the MMU, so the cache works on virtual addresses; translation happens on the way to DRAM main memory.]



Cache Conscious Programming

int a[NCOL][NROW];
int sum = 0;

for (int i = 0; i < NROW; ++i)
    for (int j = 0; j < NCOL; ++j)
        sum += a[j][i];

  • Speed up this program



Cache Conscious Programming

int a[NCOL][NROW];
int sum = 0;

for (int i = 0; i < NROW; ++i)      /* inner index j varies the first
                                       dimension: stride of NROW ints */
    for (int j = 0; j < NCOL; ++j)
        sum += a[j][i];

  • Every access is a cache miss!



Cache Conscious Programming

int a[NCOL][NROW];
int sum = 0;

for (int j = 0; j < NCOL; ++j)      /* inner index i varies the last
                                       dimension: contiguous accesses */
    for (int i = 0; i < NROW; ++i)
        sum += a[j][i];

  • Same program after a trivial transformation (loop interchange): 3 out of 4 accesses now hit in the cache
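
To make the effect measurable, here is a self-contained benchmark sketch (mine, not from the slides); the exact ratio depends on the machine's block and cache sizes, and it should be compiled without heavy optimization so the loops survive:

#include <stdio.h>
#include <time.h>

#define NCOL 1000
#define NROW 1000

static int a[NCOL][NROW];        /* ~4 MB, far larger than typical caches */

int main(void)
{
    long sum;
    clock_t t0;

    /* Original loop order: the inner loop strides by NROW ints per step. */
    sum = 0;
    t0 = clock();
    for (int i = 0; i < NROW; ++i)
        for (int j = 0; j < NCOL; ++j)
            sum += a[j][i];
    printf("strided:    sum=%ld  %.3f s\n", sum,
           (double)(clock() - t0) / CLOCKS_PER_SEC);

    /* Interchanged loops: the inner loop walks each row contiguously. */
    sum = 0;
    t0 = clock();
    for (int j = 0; j < NCOL; ++j)
        for (int i = 0; i < NROW; ++i)
            sum += a[j][i];
    printf("contiguous: sum=%ld  %.3f s\n", sum,
           (double)(clock() - t0) / CLOCKS_PER_SEC);

    return 0;
}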

