Memory hierarchies

An introduction to the use of cache memories as a technique for exploiting the ‘locality-of-reference’ principle

Recall system diagram

[Block diagram: the CPU and the main memory are attached to the system bus, along with several I/O devices]

SRAM versus DRAM
  • Random Access Memory uses two distinct technologies: ‘static’ and ‘dynamic’
  • Static RAM is much faster
  • Dynamic RAM is much cheaper
  • Static RAM retains its contents without needing to be refreshed
  • Dynamic RAM must be refreshed periodically, or its contents decay
  • Our workstations have 1GB of DRAM, but only 512KB of SRAM (a ratio of 2048 to 1)
Data access time
  • The time between the issuing of a data-access signal and the output (or acceptance) of the data by the addressed subsystem
  • FETCH-OPERATION: the CPU places the address of a memory-location on the system bus, and the memory-system responds by placing that memory-location’s data on the system bus
Comparative access timings
  • Processor registers: 5 nanoseconds
  • SRAM memory: 15 nanoseconds
  • DRAM memory: 60 nanoseconds
  • Magnetic disk: 10 milliseconds
  • Optical disk: (even slower)
System design decisions
  • How much memory?
  • How fast?
  • How expensive?
  • Tradeoffs and compromises
  • Modern systems employ a ‘hybrid’ design in which small, fast, expensive SRAM is supplemented by larger, slower, cheaper DRAM
Memory hierarchy

[Diagram: the CPU exchanges single-‘word’ transfers with the CACHE MEMORY; the CACHE MEMORY exchanges multi-word ‘burst’ transfers with the MAIN MEMORY]

Instructions versus Data
  • Modern system designs frequently use a pair of separate cache memories, one for storing processor instructions and another for storing the program’s data
  • Why is this a good idea?
  • Let’s examine the way a cache works
Fetching program-instructions
  • The CPU fetches its next program-instruction by placing the value from register EIP on the system bus and issuing a ‘read’ signal
  • If that instruction has not previously been fetched, it will not yet be in the ‘fast’ cache memory, so it must be fetched from the ‘slower’ main memory; however, it will be ‘saved’ in the cache in case it is needed again very soon (e.g., in a program loop)
Most programs use ‘loops’

// EXAMPLE:

int sum = 0, i = 0;

do {
        sum += array[ i ];
        ++i;
} while ( i < 64 );

Assembly language loop

        movl    $0, %eax                 # initialize accumulator
        movl    $0, %esi                 # initialize array-index
again:
        addl    array(, %esi, 4), %eax   # add next int to accumulator
        incl    %esi                     # increment array-index
        cmpl    $64, %esi                # check for exit-condition
        jae     finis                    # index is beyond bounds
        jmp     again                    # else do another addition
finis:

Benefits of cache-mechanism
  • During the first pass through the loop, all of the loop-instructions must be fetched from the (slow) DRAM memory
  • But loops are typically short – maybe only a few dozen bytes – and multiple bytes can be fetched together in a single ‘burst’
  • Thus subsequent passes through the loop can fetch all of these same instructions from the (fast) SRAM memory
‘Locality-of-Reference’
  • The ‘locality-of-reference’ principle says that, with most computer programs, when the CPU fetches an instruction to execute, it is very likely that the next instruction it fetches will be located at a nearby address
Accessing data-operands
  • Accessing a program’s data differs from accessing its instructions in that the data consists of ‘variables’ as well as ‘constants’
  • Instructions are constant, but data may be modified (i.e., the CPU may ‘write’ as well as ‘read’)
  • This introduces the problem known as ‘cache coherency’: if a new value gets assigned to a variable, its cached value will be modified, but what will happen to its original memory-value?
Cache ‘write’ policies
  • ‘Write-Through’ policy: a modified variable in the cache is immediately copied back to the original memory-location, to ensure that the memory and the cache are kept ‘consistent’ (synchronized)
  • ‘Write-Back’ policy: a modified variable in the cache is not immediately copied back to its original memory-location, since some values nearby might also get changed soon, in which case all the adjacent changes can be copied back later in one ‘burst’
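
To make the two policies concrete, here is a minimal sketch in C (the names and the one-line ‘cache’ are invented for illustration, not taken from the course code):

#define LINE_WORDS 8                     /* a 32-byte line holds 8 ints     */

int memory[64];                          /* pretend 'main memory'           */

struct line {
    int dirty;                           /* modified since it was loaded?   */
    int base;                            /* first memory index in the line  */
    int data[LINE_WORDS];                /* cached copy of one block        */
} cached;

/* 'Write-Through': update the cache AND main memory on every store */
void store_write_through(int addr, int value)
{
    cached.data[addr - cached.base] = value;
    memory[addr] = value;                /* memory stays synchronized       */
}

/* 'Write-Back': update only the cache now, and mark the line 'dirty';
 * the whole block will be copied back later, in a single 'burst'          */
void store_write_back(int addr, int value)
{
    cached.data[addr - cached.base] = value;
    cached.dirty = 1;                    /* memory is now out-of-date       */
}
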
Locality-of-reference for data

// EXAMPLE (from array-processing):

int price[64], quantity[64], revenue[64];

for (int i = 0; i < 64; i++)
        revenue[ i ] = price[ i ] * quantity[ i ];

  • Successive iterations touch adjacent array-elements, so after the first ‘miss’ in each cache-line the next several accesses are ‘hits’

Cache ‘hit’ or cache ‘miss’
  • When the CPU accesses an address, it first looks in the (fast) cache to see if that address’s value is perhaps already there
  • If the CPU finds the address’s value in the cache, this is called a ‘cache hit’
  • Otherwise, if the CPU finds that the value it wants is not presently in the cache, this is called a ‘cache miss’ (a simplified lookup is sketched below)
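
Here is a minimal stand-alone sketch (the sizes and names are invented for illustration, not taken from the course code) of how a simple direct-mapped cache could decide ‘hit’ versus ‘miss’:

#include <stdint.h>

#define NUM_LINES   64
#define LINE_BYTES  32

struct cache_line {
    int      valid;                     /* does this line hold anything?   */
    uint32_t tag;                       /* high-order bits of the address  */
    uint8_t  data[LINE_BYTES];          /* one cached block of memory      */
};

struct cache_line cache[NUM_LINES];

/* Returns 1 for a 'cache hit', 0 for a 'cache miss' */
int is_cache_hit(uint32_t address)
{
    uint32_t index = (address / LINE_BYTES) % NUM_LINES;   /* which line   */
    uint32_t tag   = address / (LINE_BYTES * NUM_LINES);   /* rest of addr */

    return cache[index].valid && cache[index].tag == tag;
}
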
Cache-Line Replacement
  • A small cache-memory quickly fills up
  • So some ‘stale’ items need to be removed in order to make room for the ‘fresh’ items
  • Various policies exist for ‘replacement’
  • But ‘write-back’ of any data in the cache which is ‘inconsistent’ with data in memory can safely be deferred until it is time for the cached data to be ‘replaced’ (see the sketch below)
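
Continuing the same sort of simplified model (again with invented names), the deferred ‘write-back’ at replacement-time might look like this:

#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32

uint8_t memory[4096];                   /* pretend 'main memory'            */

struct cache_line {
    int      valid, dirty;
    uint32_t block;                     /* which memory block is cached     */
    uint8_t  data[LINE_BYTES];
};

void replace_line(struct cache_line *line, uint32_t new_block)
{
    if (line->valid && line->dirty)     /* deferred write-back, one 'burst' */
        memcpy(&memory[line->block * LINE_BYTES], line->data, LINE_BYTES);

    memcpy(line->data, &memory[new_block * LINE_BYTES], LINE_BYTES);
    line->block = new_block;            /* install the 'fresh' block        */
    line->valid = 1;
    line->dirty = 0;
}
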
Performance impact

[Graph: average access time y (in nanoseconds) plotted against x, the proportion of accesses that are cache-hits (0 ≤ x ≤ 1); the line falls from 75ns at x = 0 to 15ns at x = 1]

  • A cache-hit costs 15ns (the SRAM access time); a cache-miss costs 75ns (the 15ns cache lookup plus the 60ns DRAM access)
  • Average access time: y = 15x + 75(1 - x) = 75 - 60x
  • Examples: if x = .90, then y = 21ns; if x = .95, then y = 18ns
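
This formula can be checked with a few lines of C (an illustration only, not part of the course code):

#include <stdio.h>

/* Average access time, assuming a 15ns hit and a 75ns miss */
double average_access_time(double hit_rate)
{
    return 15.0 * hit_rate + 75.0 * (1.0 - hit_rate);   /* = 75 - 60x */
}

int main(void)
{
    printf("x = 0.90: y = %.0f ns\n", average_access_time(0.90));   /* 21 ns */
    printf("x = 0.95: y = %.0f ns\n", average_access_time(0.95));   /* 18 ns */
    return 0;
}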

Control Register 0
  • Privileged software (e.g., a kernel module) can disable the Pentium’s cache-mechanism (by using some inline assembly language)

asm(" movl %cr0, %eax ");           /* copy the current value of CR0 into EAX        */
asm(" orl $0x60000000, %eax ");     /* set the CD (bit 30) and NW (bit 29) flag-bits  */
asm(" movl %eax, %cr0 ");           /* write the modified value back into CR0         */

Layout of the flag-bits in CR0:

        bit 31: PG    bit 30: CD    bit 29: NW    bit 18: AM    bit 16: WP    bit 5: NE
        bit  4: ET    bit  3: TS    bit  2: EM    bit  1: MP    bit  0: PE
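
As a small illustration (this is not the course’s 'cr0.cpp'), the mask used above can be decoded against this bit-layout:

#include <stdint.h>
#include <stdio.h>

static int bit(uint32_t value, int position) { return (value >> position) & 1; }

int main(void)
{
    uint32_t cr0 = 0x60000000;          /* the mask ORed into CR0 above      */

    printf("CD (cache disable,     bit 30) = %d\n", bit(cr0, 30));
    printf("NW (not write-through, bit 29) = %d\n", bit(cr0, 29));
    printf("caching is %s\n", bit(cr0, 30) ? "disabled" : "enabled");
    return 0;
}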

In-class demonstration
  • Install ‘nocache.c’ module to turn off cache
  • Execute ‘cr0.cpp’ to verify cache disabled
  • Run ‘sieve.cpp’ using the ‘time’ command to measure performance w/o caching
  • Remove ‘nocache’ to re-enable the cache
  • Run ‘sieve.cpp’ again to measure speedup
  • Run ‘fsieve.cpp’ for comparing the speeds of memory-access versus disk-access
In-class exercises
  • Try modifying the ‘sieve.cpp’ source-code so that each array-element is stored at an address which is in a different cache-line (the cache-line size on the Pentium is 32 bytes); one possible approach is sketched below
  • Try modifying the ‘fsieve.cpp’ source-code to omit the file’s ‘O_SYNC’ flag; also try shrinking the ‘m’ parameter so that more than one element is stored in a disk-sector (the size of each disk-sector is 512 bytes)
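
One possible shape for the first modification (a sketch only, not the actual 'sieve.cpp'): pad each array-element out to 32 bytes, so that consecutive elements land in different cache-lines and sequential access loses its locality-of-reference benefit:

#include <stdio.h>

#define N 100000

struct padded_flag {
    char is_composite;
    char padding[31];                   /* pad the element to 32 bytes  */
};

static struct padded_flag flags[N];     /* one cache-line per element   */

int main(void)
{
    for (int p = 2; p * p < N; p++)
        if (!flags[p].is_composite)
            for (int multiple = p * p; multiple < N; multiple += p)
                flags[multiple].is_composite = 1;

    int count = 0;
    for (int i = 2; i < N; i++)
        count += !flags[i].is_composite;
    printf("primes below %d: %d\n", N, count);
    return 0;
}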