Cosc 3P92 • Week 9 Lecture slides
"The human brain starts working the moment you are born and never stops until you stand up to speak in public." (George Jessel)
Memory Organization • In a typical computer system, the storage system is organized according to the following hierarchy:
• Archival storage (magnetic tape or photographic): slow access (1-5 s) and large capacity (almost unlimited)
• Moving head disk (magnetic or optical)
• High speed drum
• Charge coupled device
• Main memory
• Cache
• Internal: fast access (1-20 ns) and small capacity (1-4K byte)
Moving from archival storage toward internal storage, access time decreases while cost per bit increases.
Memory speed • Access time (Ta): the average time taken to read a unit of information, e.g., 100 ns (100 x 10^-9 s) • Access rate (Ra) = 1/Ta (bits/second), e.g., 1/100 ns = 10 Mb/s • Cycle time (Tc): the average time lapse between two successive read operations, e.g., 500 ns (500 x 10^-9 s) • Bandwidth or transfer rate (Rc) = 1/Tc (bits/second), e.g., 1/500 ns = 2 Mb/s
Classes of Memory • RAM ("normal memory") • Direct-access storage: HD, CD-ROM, DVD • Sequential-access storage (tapes): DAT • Associative (content-addressable) memory: searches for data via bit patterns • CAM (Content Addressable Memory) • includes comparison logic with each bit of storage • a data value is broadcast to all words of storage and compared with the values there • words which match are flagged • subsequent operations can then work on the flagged words • (computing-dictionary.thefreedictionary.com) • ROM
Categories of RAM and ROM (primary memory):
• RAM: magnetic core, or semiconductor (bipolar or MOS), static or dynamic
• ROM: bipolar (Mask ROM, PROM) or MOS (Mask ROM, PROM, EPROM, EAROM)
Main Memory Design • A 1K x 4 RAM chip has 10 address lines (A9-A0), 4 bidirectional data lines (D3-D0), a write enable input (WE) and a chip select input (CS):
CS   WE   Mode           Status of the bidirectional data lines D3-D0   Power
H    X    Not selected   High impedance                                 Standby
L    L    Write          Acts as input bus                              Active
L    H    Read           Acts as output bus                             Active
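The table's behaviour can be modelled with a small C sketch (not from the slides; the array cells and the function ram_access are made-up names for illustration):

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical model of one 1K x 4 RAM chip: 1024 words of 4 bits each. */
    static uint8_t cells[1024];      /* only the low 4 bits of each entry are used */

    /* cs_n high -> chip not selected: standby, data lines stay high-impedance.   */
    /* cs_n low, we_n low  -> write: D3-D0 act as an input bus.                   */
    /* cs_n low, we_n high -> read:  D3-D0 act as an output bus.                  */
    bool ram_access(bool cs_n, bool we_n, uint16_t addr10, uint8_t *d3_d0)
    {
        if (cs_n)
            return false;                        /* not selected: standby        */
        addr10 &= 0x3FF;                         /* 10 address lines A9-A0       */
        if (!we_n)
            cells[addr10] = *d3_d0 & 0x0F;       /* write cycle                  */
        else
            *d3_d0 = cells[addr10] & 0x0F;       /* read cycle                   */
        return true;                             /* chip active                  */
    }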
Main Memory Design • Q. How do we build a 4K x 4 RAM using four 1K x 4 RAM chips? A. Use the two high-order address bits A11-A10 to select one of the four chips and A9-A0 to address the word inside the selected chip (see the sketch below):
Chip   A11 A10   A9 A8 A7 . . . A0   Range
0       0   0     x  x  x . . .  x   0000 to 1023
1       0   1     x  x  x . . .  x   1024 to 2047
2       1   0     x  x  x . . .  x   2048 to 3071
3       1   1     x  x  x . . .  x   3072 to 4095
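A minimal C sketch of this address decode, assuming the chip-select scheme in the table (the function name decode_4k is hypothetical):

    #include <stdint.h>

    /* Address decode for a 4K x 4 memory built from four 1K x 4 chips:
     * A11-A10 pick the chip (via a 2-to-4 decoder), A9-A0 go to every chip. */
    void decode_4k(uint16_t addr12, unsigned *chip, uint16_t *addr10)
    {
        addr12 &= 0x0FFF;            /* 12-bit address, 0..4095              */
        *chip   = addr12 >> 10;      /* A11-A10: chip 0..3                   */
        *addr10 = addr12 & 0x3FF;    /* A9-A0: word 0..1023 inside the chip  */
    }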
Main Memory Design • Q. How do we build a 256KB RAM system with a 16-bit address bus and four 64KB RAM chips? • Memory bank-switching: the 16-bit address bus is wired to all memory banks in parallel, and log2 n extra bits from the processor drive a 1-of-n decoder whose outputs (Enable 1 .. Enable n) enable the one bank that should respond (see the sketch below)
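A rough C sketch of bank-switching under these assumptions (the names bank, bank_reg, select_bank, read_byte and write_byte are made up; a real system performs the selection in hardware with the decoder):

    #include <stdint.h>

    /* The 16-bit address bus reaches only 64 KB, so a separate 2-bit bank
     * register drives a 1-of-4 decoder that enables one of four 64 KB chips
     * sitting on the bus in parallel, for 256 KB in total.                  */
    static uint8_t bank[4][65536];   /* four 64 KB RAM chips                 */
    static unsigned bank_reg;        /* written by the processor             */

    void select_bank(unsigned b) { bank_reg = b & 0x3; }  /* decoder input   */

    uint8_t read_byte(uint16_t addr)               { return bank[bank_reg][addr]; }
    void    write_byte(uint16_t addr, uint8_t val) { bank[bank_reg][addr] = val; }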
Main memory design • Memory address extension: the processor holds a 4-bit base (base address) register; the base supplies the high-order bits and the 16-bit address bus supplies the offset, together forming the 20-bit physical address sent to memory, while data travels over the data bus (see the sketch below)
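A minimal sketch, assuming the 4-bit base is simply concatenated in front of the 16-bit offset (4 + 16 = 20 bits; the function name physical_address is hypothetical):

    #include <stdint.h>

    /* Memory address extension: 4-bit base register + 16-bit offset
     * -> 20-bit physical address (up to 1 MB from a 16-bit bus).     */
    uint32_t physical_address(uint8_t base4, uint16_t offset16)
    {
        return ((uint32_t)(base4 & 0x0F) << 16) | offset16;   /* 20-bit result */
    }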
Cache Memory • (diagram: the CPU accesses main memory through the cache; main memory connects to external storage) • Cache: fast-access memory buffer • locality principle: programs usually use limited memory areas, in contrast to totally random access • spatial: location, address • temporal: time accessed • if commonly used memory can be buffered in a high-speed cache, overall performance is enhanced • the cache takes the form of a small amount of store, with hardware support for maintenance and lookup • each cache cell saves a cache line: a block of main memory (4-64 words) • cache hit: requested memory resides in cache
Cache • cache miss: requested memory is not in the cache and must be fetched from main memory and put into the cache • unified cache: instructions and data share the same cache • split cache: separate instruction and data caches • parallel access: doubles the bandwidth • level 2 cache: between the instruction/data caches and main memory • cache maintenance algorithms are similar in spirit to virtual memory ideas at the operating system level; the main difference is that the cache is hardware-supported, whereas virtual memory is implemented in software
Measuring cache performance • c: cache access time • m: main memory access time • hr: hit ratio (0 <= hr <= 1) = # cache hits / total memory requests • mr: miss ratio = 1 - hr • mean access time = c + (1 - hr)m • as hr --> 1, mean access time --> c • as hr --> 0, mean access time --> c + m
Cache Example • example: let c = 160 ns, m = 960 ns, hr = 0.90 (common) • mean = 160 + (1 - 0.90)(960) = 256 ns • efficiency = c / mean = 160/256 = 62.5% (see the sketch below)
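The arithmetic can be checked with a short C program using the formula from the previous slide:

    #include <stdio.h>

    /* mean = c + (1 - hr) * m, efficiency = c / mean */
    int main(void)
    {
        double c = 160.0, m = 960.0, hr = 0.90;     /* ns, ns, hit ratio      */
        double mean = c + (1.0 - hr) * m;           /* 160 + 0.1*960 = 256 ns */
        printf("mean = %.1f ns, efficiency = %.1f%%\n", mean, 100.0 * c / mean);
        return 0;
    }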
Direct mapping (diagram): main memory block i (blocks 0 to M-1) maps to cache slot i mod N (slots 0 to N-1).
Direct mapping • use a hash function to find the cache location • normally, modulo some bit field of the address, so just use the low-end field • cache fields: • valid bit • tag: block # being held • value: data block • scheme (see the sketch below): • memory request: • compute cache slot (low n bits) • check block (tag) field • hit: return value • miss: fetch block from memory, give it to the CPU, and put it into the computed slot (replacing the existing item if there) • can occasionally produce thrashing • e.g., addresses that are a multiple of the cache size (64K) will reside at the same entry • a split instruction/data cache helps avoid thrashing
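A minimal C sketch of the lookup scheme above; the cache size, block size and field names are illustrative, not taken from the slides:

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_SLOTS 64            /* illustrative cache size (power of 2)  */

    struct slot {
        bool     valid;             /* valid bit                             */
        uint32_t tag;               /* block number held                     */
        uint32_t value[16];         /* data block (illustrative size)        */
    };

    static struct slot cache[NUM_SLOTS];

    /* Direct-mapped lookup: the low bits of the block number pick the slot,
     * and the stored tag is compared against the requested block number.    */
    bool lookup(uint32_t block)
    {
        struct slot *s = &cache[block % NUM_SLOTS];   /* hash: block mod N   */
        return s->valid && s->tag == block;           /* hit if tag matches  */
    }

    /* On a miss, the fetched block replaces whatever is in the computed slot. */
    void fill(uint32_t block)
    {
        struct slot *s = &cache[block % NUM_SLOTS];
        s->valid = true;
        s->tag   = block;
        /* ...copy the block's data from main memory into s->value... */
    }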
Set associative mapping (diagram): main memory block i (blocks 0 to M-1) maps to set i mod (N/S), where the cache's N blocks are grouped into sets 0 to N/S - 1 with S blocks per set.
Set associative mapping • [4.39] • use the same hash function as direct mapping, except that each cache slot holds multiple data blocks • usually max. 4 blocks ("4-way") • searching the blocks in a slot is done associatively: simultaneous pattern matching • more flexible than direct mapping: multiple blocks per set • uses a smaller tag than a fully associative cache, therefore cheaper to implement the associative matching • commonly used in larger systems (VAX 11/780) • which line should be replaced when the slot is full? • e.g., LRU (least recently used); see the sketch below
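A rough C sketch of a 4-way set-associative lookup with LRU replacement, under assumed sizes (NUM_SETS, WAYS and the age-counter bookkeeping are illustrative choices, not from the slides):

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_SETS 16             /* illustrative number of sets            */
    #define WAYS      4             /* "4-way": S = 4 blocks per set          */

    struct line { bool valid; uint32_t tag; unsigned age; };
    static struct line cache[NUM_SETS][WAYS];

    /* Block i maps to set i mod NUM_SETS; all WAYS lines in that set are
     * compared at once, and on a miss the least recently used line (largest
     * age, or an empty line) in the set is replaced.                         */
    bool access_block(uint32_t block)
    {
        struct line *set = cache[block % NUM_SETS];
        int way = -1;

        for (int w = 0; w < WAYS; w++)                /* associative compare   */
            if (set[w].valid && set[w].tag == block)
                way = w;
        bool hit = (way >= 0);

        if (!hit) {                                   /* miss: choose a victim */
            way = 0;
            for (int w = 0; w < WAYS; w++) {
                if (!set[w].valid) { way = w; break; }       /* empty line     */
                if (set[w].age > set[way].age) way = w;      /* else LRU line  */
            }
            set[way].valid = true;                    /* fetch block into it   */
            set[way].tag   = block;
        }

        for (int w = 0; w < WAYS; w++) set[w].age++;  /* LRU bookkeeping       */
        set[way].age = 0;                             /* just-used is youngest */
        return hit;
    }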
Writing back to the memory • only write to memory if the cache data has been modified • write back (write-deferred): use a modified bit; when swapping out a cache slot or ending a job, write the slot back if its modified bit is set • write through: whenever modifying data, always write it back to main memory as well • write-through is needed if memory is being shared in a DMA or multiprocessing system • (both policies are sketched below)
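A short C sketch contrasting the two policies using a modified (dirty) bit; the structure and function names are made up, and main_memory_write is a placeholder stub rather than a real memory transaction:

    #include <stdbool.h>
    #include <stdint.h>

    struct cline { bool valid, modified; uint32_t tag; uint32_t data; };

    /* Placeholder for the actual main-memory write transaction. */
    static void main_memory_write(uint32_t block, uint32_t data)
    {
        (void)block; (void)data;
    }

    /* Write-through: every store also goes straight to main memory. */
    void store_write_through(struct cline *l, uint32_t data)
    {
        l->data = data;
        main_memory_write(l->tag, data);
    }

    /* Write-back (write-deferred): only set the modified bit now...          */
    void store_write_back(struct cline *l, uint32_t data)
    {
        l->data = data;
        l->modified = true;
    }

    /* ...and write the line out only when it is evicted (or the job ends).   */
    void evict(struct cline *l)
    {
        if (l->valid && l->modified)
            main_memory_write(l->tag, l->data);
        l->valid = l->modified = false;
    }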
Example: direct mapping • 4-byte blocks • 1-byte words • 8 slots in cache
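As a worked breakdown (the slide does not fix the address width; an 8-bit address is assumed here purely for illustration): with 4-byte blocks the low 2 address bits select the word within a block, the next 3 bits select one of the 8 cache slots, and the remaining high-order bits form the tag. For example, address 0xB4 = 10110100 in binary gives tag 101, slot 101 (slot 5), and word offset 00.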