Memory hierarchies

An introduction to the use of cache memories as a technique for exploiting the ‘locality-of-reference’ principle

Recall system diagram

[Diagram: the CPU, main memory, and several I/O devices, all attached to the system bus]
SRAM versus DRAM
  • Random Access Memory uses two distinct technologies: ‘static’ and ‘dynamic’
  • Static RAM is much faster
  • Dynamic RAM is much cheaper
  • Static RAM retains its contents without needing periodic ‘refresh’
  • Dynamic RAM must be periodically ‘refreshed’ in order to retain its contents
  • Our workstations have 1GB of DRAM, but only 512KB of SRAM (a ratio of roughly 2000-to-1)
Data access time
  • The time-period between the supplying of the data access signal and the output (or acceptance) of the data by the addressed subsystem
  • FETCH-OPERATION: the CPU places the address of a memory-location on the system bus, and the memory-system responds by placing that location’s data on the system bus
Comparative access timings
  • Processor registers: 5 nanoseconds
  • SRAM memory: 15 nanoseconds
  • DRAM memory: 60 nanoseconds
  • Magnetic disk: 10 milliseconds
  • Optical disk: (even slower)
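For perspective, here is a quick back-of-the-envelope computation of the ratios implied by the figures above (the little program is purely illustrative and is not one of the course demos):

#include <cstdio>

int main()
{
    // Access times from the list above, all expressed in nanoseconds
    const double registerNs = 5.0;
    const double sramNs     = 15.0;
    const double dramNs     = 60.0;
    const double diskNs     = 10e6;        // 10 milliseconds = 10,000,000 ns

    std::printf( "SRAM is %.0fx slower than a register\n", sramNs / registerNs );
    std::printf( "DRAM is %.0fx slower than SRAM\n",       dramNs / sramNs );
    std::printf( "disk is about %.0fx slower than DRAM\n", diskNs / dramNs );
    return 0;
}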
System design decisions
  • How much memory?
  • How fast?
  • How expensive?
  • Tradeoffs and compromises
  • Modern systems employ a ‘hybrid’ design in which small, fast, expensive SRAM is supplemented by larger, slower, cheaper DRAM
Memory hierarchy

[Diagram: the memory hierarchy, in which the processor registers and the cache exchange ‘word’ transfers, while the cache and main memory exchange ‘burst’ transfers]
Instructions versus Data
  • Modern system designs frequently use a pair of separate cache memories, one for storing processor instructions and another for storing the program’s data
  • Why is this a good idea?
  • Let’s examine the way a cache works
Fetching program-instructions
  • CPU fetches its next program-instruction by placing the value from register EIP on the system bus and issuing a ‘read’ signal
  • If that instruction has not previously been fetched, it will not yet be in the ‘fast’ cache memory, so it must be fetched from the ‘slower’ main memory; however, it will be ‘saved’ in the cache in case it is needed again very soon (e.g., in a program loop)
Most programs use ‘loops’


int sum = 0, i = 0;

do {
        sum += array[ i ];
        ++i;
} while ( i < 64 );

Assembly language loop

        movl    $0, %eax                 # initialize accumulator
        movl    $0, %esi                 # initialize array-index
again:
        addl    array(, %esi, 4), %eax   # add next int
        incl    %esi                     # increment array-index
        cmpl    $64, %esi                # check for exit-condition
        jae     finis                    # index is beyond bounds
        jmp     again                    # else do another addition
finis:

Benefits of cache-mechanism
  • During the first pass through the loop, all of the loop-instructions must be fetched from the (slow) DRAM memory
  • But loops are typically short – maybe only a few dozen bytes – and multiple bytes can be fetched together in a single ‘burst’
  • Thus subsequent passes through the loop can fetch all of these same instructions from the (fast) SRAM memory
Locality of reference
  • The ‘locality-of-reference’ principle says that, with most computer programs, when the CPU fetches an instruction to execute, it is very likely that the next instruction it fetches will be located at a nearby address
Accessing data-operands
  • Accessing a program’s data differs from accessing its instructions in that the data consists of ‘variables’ as well as ‘constants’
  • Instructions are constant, but data may be modified (i.e., the program may ‘write’ data as well as ‘read’ it)
  • This introduces the problem known as ‘cache coherency’ – if a new value gets assigned to a variable, its cached value will be modified, but what will happen to its original memory-value?
Cache ‘write’ policies
  • ‘Write-Through’ policy: a modified variable in the cache is immediately copied back to the original memory-location, to ensure that the memory and the cache are kept ‘consistent’ (synchronized)
  • ‘Write-Back’ policy: a modified variable in the cache is not immediately copied back to its original memory-location – some values nearby might also get changed soon, in which case all adjacent changes could be done in one ‘burst’
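As a rough illustration of the difference, here is a toy model (a single cached variable, with invented names and counters, not the Pentium’s actual write hardware) that counts how many times main memory gets written when the same variable is stored one hundred times under each policy:

#include <cstdio>

// Toy model: one cached value, plus a counter of main-memory writes
struct ToyCache {
    int  cachedValue  = 0;
    bool dirty        = false;     // used only by the write-back policy
    int  memoryWrites = 0;         // how many times DRAM was written
};

// Write-Through: every store updates the cache AND main memory
void storeWriteThrough( ToyCache &c, int value )
{
    c.cachedValue = value;
    ++c.memoryWrites;              // memory is updated immediately
}

// Write-Back: stores update only the cache; memory is written once,
// later, when the modified ('dirty') line is finally replaced
void storeWriteBack( ToyCache &c, int value )
{
    c.cachedValue = value;
    c.dirty = true;                // memory is now out-of-date
}

void evictWriteBack( ToyCache &c )
{
    if ( c.dirty ) { ++c.memoryWrites; c.dirty = false; }
}

int main()
{
    ToyCache wt, wb;
    for (int i = 0; i < 100; ++i) {    // one hundred stores to one variable
        storeWriteThrough( wt, i );
        storeWriteBack( wb, i );
    }
    evictWriteBack( wb );              // a single write when the line is replaced
    std::printf( "write-through: %d memory writes\n", wt.memoryWrites );  // 100
    std::printf( "write-back:    %d memory writes\n", wb.memoryWrites );  // 1
    return 0;
}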
Locality-of-reference for data

// EXAMPLE (from array-processing):

int price[64], quantity[64], revenue[64];

for (int i = 0; i < 64; i++)
        revenue[ i ] = price[ i ] * quantity[ i ];
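A minimal sketch of how much this locality matters in practice: the program below sums the same array twice, once at consecutive addresses (good spatial locality) and once jumping a whole 32-byte cache-line between accesses, which typically runs noticeably slower. The array size and strides are illustrative choices, not taken from the course demos:

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

int main()
{
    const std::size_t N = 1 << 24;              // 16M ints (about 64 MB)
    std::vector<int> data( N, 1 );

    // Sum every element, visiting them with the given stride (in ints);
    // stride 8 means consecutive accesses are 32 bytes apart
    auto timedSum = [&]( std::size_t stride ) {
        auto t0 = std::chrono::steady_clock::now();
        long long sum = 0;
        for (std::size_t start = 0; start < stride; ++start)
            for (std::size_t i = start; i < N; i += stride)
                sum += data[ i ];
        auto t1 = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>( t1 - t0 ).count();
        std::printf( "stride %2zu: sum = %lld, %.1f ms\n", stride, sum, ms );
    };

    timedSum( 1 );    // sequential: most accesses hit in the cache
    timedSum( 8 );    // one access per 32-byte line, poor spatial locality
    return 0;
}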

Cache ‘hit’ or cache ‘miss’
  • When the CPU accesses an address, it first looks in the (fast) cache to see if the address’s value is perhaps already there
  • If the CPU finds that the address’s value is in the cache, this is called a ‘cache hit’
  • Otherwise, if the CPU finds that the value it wants to access is not presently in the cache, this is called a ‘cache miss’
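In hardware this lookup amounts to splitting the address into a ‘tag’, an ‘index’, and an ‘offset’. The sketch below models a hypothetical direct-mapped cache of 256 lines of 32 bytes each (the sizes and class names are chosen for illustration; this is not the Pentium’s actual organization):

#include <cstdint>
#include <vector>

// One cache line: a valid bit plus the tag of the block it currently holds
struct CacheLine {
    bool          valid = false;
    std::uint32_t tag   = 0;
};

class DirectMappedCache {
    std::vector<CacheLine> lines = std::vector<CacheLine>( 256 );
public:
    // Returns true on a cache hit; on a miss, the line is (re)filled
    bool access( std::uint32_t address )
    {
        std::uint32_t index = ( address >> 5 ) & 0xFF;  // 32-byte lines: drop 5 offset bits
        std::uint32_t tag   =   address >> 13;          // everything above the 8 index bits
        CacheLine &line = lines[ index ];
        if ( line.valid && line.tag == tag )
            return true;                                // cache hit
        line.valid = true;                              // cache miss: the 32-byte block
        line.tag   = tag;                               // is fetched from DRAM into this line
        return false;
    }
};

With this model, the first access to any address reports a miss; a second access to the same address (or to a neighbouring address in the same 32-byte block) reports a hit, unless some other block that maps to the same index has displaced it in the meantime.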
Cache-Line Replacement
  • Small-size cache-memory quickly fills up
  • So some ‘stale’ items need to be removed in order to make room for the ‘fresh’ items
  • Various policies exist for ‘replacement’
  • But ‘write-back’ of any data in the cache which is ‘inconsistent’ with data in memory can safely be deferred until it’s time for the cached data to be ‘replaced’
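One widely used replacement policy is ‘least recently used’ (LRU). The sketch below models it for a hypothetical 4-entry fully-associative cache (purely for illustration); the eviction point is where a deferred write-back of any inconsistent data would finally take place:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <list>

class LruCache {
    std::list<std::uint32_t> blocks;             // front = most recently used
    static constexpr std::size_t capacity = 4;   // room for only 4 block addresses
public:
    bool access( std::uint32_t blockAddress )
    {
        auto it = std::find( blocks.begin(), blocks.end(), blockAddress );
        if ( it != blocks.end() ) {              // hit: mark as most recently used
            blocks.splice( blocks.begin(), blocks, it );
            return true;
        }
        if ( blocks.size() == capacity )         // miss with a full cache:
            blocks.pop_back();                   //   evict the 'stale' (LRU) block;
                                                 //   a 'dirty' block would be written
                                                 //   back to memory at this point
        blocks.push_front( blockAddress );       // install the 'fresh' block
        return false;
    }
};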
Performance impact

Effective (average) memory-access time, using the 15 ns SRAM and 60 ns DRAM figures from the earlier timing slide:

        y = 75 - 60x nanoseconds

where x is the proportion of accesses that are cache hits (consistent with a 15 ns cache access on every reference, plus a further 60 ns DRAM access on a miss).

        If x = .90, then y = 21 ns
        If x = .95, then y = 18 ns

[Graph: effective access time y plotted against x, the access frequency (proportion of cache-hits)]
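A quick check of that formula for a few hit-rates (the program is only illustrative; the figures come from the slides above):

#include <cstdio>

int main()
{
    // Effective access time per the formula above: y = 75 - 60x nanoseconds,
    // where x is the fraction of accesses that hit in the 15 ns cache and a
    // miss pays an additional 60 ns DRAM access
    const double hitRates[] = { 0.80, 0.90, 0.95, 0.99 };
    for (double x : hitRates) {
        double y = 75.0 - 60.0 * x;
        std::printf( "hit-rate %.2f  ->  %.0f ns average access time\n", x, y );
    }
    return 0;
}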

Control Register 0
  • Privileged software (e.g., a kernel module) can disable Pentium’s cache-mechanism (by using some inline assembly language)

asm(" movl %cr0, %eax ");               /* copy control register CR0 into EAX      */
asm(" orl $0x60000000, %eax ");         /* set bit 30 (CD) and bit 29 (NW)         */
asm(" movl %eax, %cr0 ");               /* write it back: caching is now disabled  */

[Diagram: CR0 register bit layout (bits 31 30 29 18 16 5 4 3 2 1 0); bit 30 is CD (cache disable) and bit 29 is NW (not write-through), the two bits set by 0x60000000]
In-class demonstration
  • Install ‘nocache.c’ module to turn off cache
  • Execute ‘cr0.cpp’ to verify cache disabled
  • Run ‘sieve.cpp’ using the ‘time’ command to measure performance w/o caching
  • Remove ‘nocache’ to re-enable the cache
  • Run ‘sieve.cpp’ again to measure speedup
  • Run ‘fsieve.cpp’ for comparing the speeds of memory-access versus disk-access
In-class exercises
  • Try modifying the ‘sieve.cpp’ source-code so that each array-element is stored at an address which is in a different cache-line (cache-line size on Pentium is 32 bytes); a padding sketch follows this list
  • Try modifying the ‘fsieve.cpp’ source-code to omit the file’s ‘O_SYNC’ flag; also try shrinking the ‘m’ parameter so that more than one element is stored in a disk-sector (size of each disk-sector is 512 bytes)
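The course’s ‘sieve.cpp’ is not reproduced here, but the following hypothetical sketch shows the kind of change the first exercise asks for: each sieve element is padded and aligned to 32 bytes, so that consecutive elements always land in different cache lines (the struct name and the limit are invented for illustration):

#include <cstdio>

// Each element is aligned and padded to 32 bytes (one Pentium cache line),
// so no two different elements of 'flags' ever share a cache line
struct alignas(32) PaddedFlag {
    char value;
};

int main()
{
    const int LIMIT = 100000;
    static PaddedFlag flags[ LIMIT ];      // static storage: zero-initialized
    for (int i = 2; i < LIMIT; ++i)
        flags[ i ].value = 1;              // assume prime until crossed off
    int count = 0;
    for (int i = 2; i < LIMIT; ++i) {
        if ( !flags[ i ].value ) continue;
        ++count;                           // i is prime
        for (int j = 2 * i; j < LIMIT; j += i)
            flags[ j ].value = 0;          // cross off every multiple of i
    }
    std::printf( "%d primes were found below %d\n", count, LIMIT );
    return 0;
}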