William Stallings Computer Organization and Architecture

William Stallings Computer Organization and Architecture Chapter 4 & 5 Cache Memory and Internal Memory Rev. by Luciano Gualà (2008)

Computer Components: Top Level View

Memory • How much? • As much as possible • How fast? • As fast as possible • How expensive? • As cheap as possible • Fast memory is expensive • Large memory is expensive • The larger the memory, the slower the access

Memory Hierarchy • CPU Registers • L1 cache (on chip) • L2 cache (on board) • Main memory • Disk cache • Disk • Optical • Tape • Moving down the hierarchy: size increases, access frequency decreases, access time increases, cost per bit decreases

Characteristics • Location • Capacity • Unit of transfer • Access method • Performance • Physical type • Physical characteristics • Organisation

Location • CPU • Registers • Internal: access directly from CPU • Cache • RAM • External: access through I/O module • Disks • CD-ROM, …

Capacity • Word size • The natural unit of organisation • Usually equal to the number of bits used to represent numbers or instructions • Typical word sizes: 8 bits, 16 bits, 32 bits • Number of words (or Bytes)
1 Byte = 8 bits = 2^3 bits
1 KByte (Kilo) = 2^10 Bytes = 2^10 x 2^3 bits = 1024 Bytes
1 MByte (Mega) = 2^10 KBytes = 1024 KBytes
1 GByte (Giga) = 2^10 MBytes = 2^30 Bytes
1 TByte (Tera) = 2^10 GBytes = 1024 GBytes

Unit of Transfer • Number of bits that can be read/written at the same time • Internal • Usually governed by data bus width • Bus width may be equal to word size or (often) larger • Typical bus width: 64, 128, 256 bits • External • Usually a block which is much larger than a word • A related concept: addressable unit • Smallest location which can be uniquely addressed • Word internally • Cluster on Microsoft (M$) disks

Access Methods (1) • Sequential • Start at the beginning and read through in order • Access time depends on location of data and previous location • e.g. tape • Direct • Individual blocks have unique address • Access is by jumping to vicinity plus sequential search • Access time depends on location and previous location • e.g. disk

Access Methods (2) • Random • Individual addresses identify locations exactly • Access time is independent of location or previous access • e.g. RAM • Associative • Data is located by a comparison with contents of a portion of the store • Access time is independent of location or previous access • e.g. cache

Performance • Access time • Time between presenting the address and getting the valid data • Memory cycle time • The memory may need time to “recover” before the next access • Cycle time = access time + recovery time • Transfer rate • Rate at which data can be moved • T_N = T_A + N/R, where N is the number of bits, T_A the access time, T_N the time needed to read N bits, and R the transfer rate
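
The transfer-time relation T_N = T_A + N/R can be tried out with a few lines of Python. This is only a sketch: the function name `transfer_time` and the sample figures (100 ns access time, 1 Gbit/s transfer rate, 4096 bits) are assumptions chosen for illustration, not values from the slides.

```python
def transfer_time(access_time_s: float, n_bits: int, rate_bps: float) -> float:
    """Return T_N = T_A + N / R, in seconds."""
    if rate_bps <= 0:
        raise ValueError("transfer rate must be positive")
    return access_time_s + n_bits / rate_bps


# Hypothetical figures: T_A = 100 ns, R = 1 Gbit/s, N = 4096 bits.
t_n = transfer_time(100e-9, 4096, 1e9)
print(f"T_N = {t_n * 1e6:.3f} microseconds")
```

For large N the N/R term dominates, so the access time matters mostly for short transfers.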

Physical Types • Semiconductor • RAM, ROM, EPROM, Cache • Magnetic • Disk & Tape • Optical • CD & DVD • Others • …

Semiconductor Memory • RAM (Random Access Memory) • Misnamed, as all semiconductor memories are random access • Read/Write • Volatile • Temporary storage • Static or dynamic • ROM (Read Only Memory) • Permanent storage • Read only

Dynamic RAM • Bits stored as charge in capacitors • Charges leak • Need refreshing even when powered • Simpler construction • Smaller per bit • Less expensive • Need refresh circuits • Slower • Main memory (static RAM would be too expensive)

Static RAM • Bits stored as on/off switches • No charges to leak • No refreshing needed when powered • More complex construction • Larger per bit • More expensive • Does not need refresh circuits • Faster • Cache (here the faster the better)

Read Only Memory (ROM) • Permanent storage • Microprogramming (see later) • Library subroutines • Systems programs (BIOS) • Function tables

Types of ROM • Written during manufacture • Very expensive for small runs • Programmable (once) • PROM • Needs special equipment to program • Read “mostly” • Erasable Programmable (EPROM) • Erased by UV (it can take up to 20 minutes) • Electrically Erasable (EEPROM) • Takes much longer to write than read • A single byte can be erased • Flash memory • Erases memory electrically “block-at-a-time”

Physical Characteristics • Decay (refresh time) • Volatility (needs power source) • Erasable • Power consumption

Organisation • Physical arrangement of bits into words • Not always obvious • e.g. interleaved

Basic Organization (1) • Basic element: memory cell • has 2 stable states: one represents 0, the other 1 • can be written at least once • can be read
[Figure: a memory cell during Write and during Read, each with Select and R/W Control lines; Input Data when writing, Output Data when reading]

Basic Organization (2) • Basic organization of a 512x512 bits chip
[Figure: array of memory cells (512x512) with timing and control; address lines A0 to A8 (9 bits) feed the row address decoder, address lines A9 to A17 (9 bits) feed the column address decoder; one data line D0 (1 bit) passes through the sense amplifier and I/O gate]

Module Organisation • Basic organization of a 256 KB module • Eight 512x512-bit chips (256 Kbit each) • For a 1 MB module, replicate this organization 4 times

Module Organisation (1 MByte)

Organisation for larger sizes • The larger the size, the higher the number of address pins • For 2^k words, k pins are needed • A solution to reduce the number of address pins • Multiplex row address and column address • k/2 pins to address 2^k Bytes • Adding one more pin doubles the range of both row and column values, so x4 capacity
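
The pin-count arithmetic above can be checked with a small Python sketch. The function names `address_pins` and `multiplexed_capacity` are hypothetical, and the sketch assumes the number of words is a power of two and that row and column addresses share the same pins.

```python
def address_pins(num_words: int, multiplexed: bool = False) -> int:
    """Address pins needed for num_words locations (a power of two)."""
    if num_words <= 0 or num_words & (num_words - 1):
        raise ValueError("num_words must be a positive power of two")
    k = num_words.bit_length() - 1  # num_words == 2**k
    # With row/column multiplexing the same pins are used twice,
    # so only half of the k address bits need their own pin.
    return (k + 1) // 2 if multiplexed else k


def multiplexed_capacity(pins: int) -> int:
    """Locations addressable when each pin carries one row and one column bit."""
    return 1 << (2 * pins)


print(address_pins(4 * 2**20))                    # 4M words, no multiplexing
print(address_pins(4 * 2**20, multiplexed=True))  # 4M words, multiplexed
print(multiplexed_capacity(12) // multiplexed_capacity(11))  # effect of one extra pin
```

With 4M locations the sketch gives 22 pins without multiplexing and 11 with it, and one extra multiplexed pin multiplies the capacity by 4.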

Typical 16 Mb DRAM (4M x 4)

Refreshing (Dynamic RAM) • Refresh circuit included on chip • Disable chip • Count through rows • Read & Write back • Takes time • Slows down apparent performance

Packaging

Error Correction • Hard Failure • Permanent defect • Soft Error • Random, non-destructive • No permanent damage to memory • Detected using Hamming error correcting code • it is able to detect and correct 1-bit errors

Error Correcting Code Function

A simple example of correction (1) • Correcting errors in 4-bit words • 3 control groups • In each control group add 1 parity bit
[Figure: Venn diagrams with three circles A, B, C, showing the 4 data bits before and after the parity bits are added]

A simple example of correction (2) • One of the bits changes value • Using the control bits, the correct value is restored
[Figure: Venn diagrams with three circles A, B, C, showing the word with one flipped bit and the word after correction]

Compare Circuit • it takes two K-length binary strings X, Y as input • X = X_K … X_1 • Y = Y_K … Y_1 • it returns a K-length binary string Z (syndrome) • Z = Z_K … Z_1 • Z_i = X_i ⊕ Y_i for each i = 1, …, K • Z = 0…0 means no error
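
As an illustration, the compare circuit is a bitwise XOR of the stored and the recomputed check bits. The sketch below models the two K-length strings as Python strings of '0' and '1'; the function name `syndrome` and the sample values are assumptions for the example.

```python
def syndrome(x: str, y: str) -> str:
    """Bitwise XOR of two equal-length binary strings (Z_i = X_i xor Y_i)."""
    if len(x) != len(y):
        raise ValueError("X and Y must have the same length K")
    if set(x + y) - {"0", "1"}:
        raise ValueError("inputs must contain only '0' and '1'")
    return "".join("1" if a != b else "0" for a, b in zip(x, y))


print(syndrome("0111", "0111"))  # all zeros: no error detected
print(syndrome("0111", "0001"))  # non-zero: an error is signalled
```

An all-zero result means the two strings agree; any '1' marks a position where they differ.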

Relation between M and K • Z may assume 2^K values • the value Z = 0…0 means no error • the error may be in any bit among the M+K bits • it must be 2^K - 1 ≥ M + K
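
The condition 2^K - 1 ≥ M + K fixes the smallest usable number of check bits for a given word length. A minimal sketch, with a hypothetical helper name `min_check_bits`, simply tries increasing values of K:

```python
def min_check_bits(m: int) -> int:
    """Smallest K such that 2**K - 1 >= M + K."""
    if m < 1:
        raise ValueError("M must be at least 1")
    k = 1
    while (1 << k) - 1 < m + k:
        k += 1
    return k


for m in (4, 8, 16, 32, 64):
    print(f"M = {m:2d} data bits -> K = {min_check_bits(m)} check bits")
```

The overhead K/M shrinks as the word gets longer, which is why wider words make single-error correction relatively cheaper.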

How to arrange the M+K bits • the M+K bits are arranged so that • if Z ≠ 0, the error occurred in the i-th bit, where i is the value (in binary) of Z

The case M=4
C1 = D1 ⊕ D2 ⊕ D4
C2 = D1 ⊕ D3 ⊕ D4
C4 = D2 ⊕ D3 ⊕ D4
[Figure: Venn diagram of the three check groups C1, C2, C4 and the data bits D1, D2, D3, D4 they cover]
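
The M=4 case can be exercised end to end in Python. This sketch assumes the usual Hamming arrangement in which check bit Cn sits at position n, so the 7-bit word is ordered C1 C2 D1 C4 D2 D3 D4 (positions 1 to 7); that ordering is consistent with the three parity equations but is an assumption here, as are the function names `encode` and `correct`.

```python
def encode(d1: int, d2: int, d3: int, d4: int) -> list[int]:
    """Build the 7-bit word [C1, C2, D1, C4, D2, D3, D4] (positions 1..7)."""
    c1 = d1 ^ d2 ^ d4
    c2 = d1 ^ d3 ^ d4
    c4 = d2 ^ d3 ^ d4
    return [c1, c2, d1, c4, d2, d3, d4]


def correct(word: list[int]) -> tuple[list[int], int]:
    """Return (corrected word, error position); position 0 means no error."""
    bits = list(word)
    c1, c2, d1, c4, d2, d3, d4 = bits
    # Recompute each check bit and compare it with the stored one.
    z1 = c1 ^ d1 ^ d2 ^ d4
    z2 = c2 ^ d1 ^ d3 ^ d4
    z4 = c4 ^ d2 ^ d3 ^ d4
    position = 4 * z4 + 2 * z2 + z1  # syndrome read as a binary number
    if position:
        bits[position - 1] ^= 1      # flip the faulty bit back
    return bits, position


stored = encode(1, 0, 1, 1)
damaged = list(stored)
damaged[4] ^= 1                      # flip position 5 (data bit D2)
fixed, pos = correct(damaged)
print(stored, damaged, fixed, pos)
```

In this run the syndrome evaluates to 5, the position of the flipped bit, and the corrected word equals the stored one. The scheme handles a single flipped bit only; two simultaneous errors would be miscorrected.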

Exercise • Design a Hamming error correcting code for 8-bit words • See the textbook for the solution

Cache • Small amount of fast memory • Sits between normal main memory and CPU • May be located on CPU chip or module

Cache operation - overview • CPU requests contents of memory location • Check cache for this data • If present (hit), get from cache (fast) • If not present (miss), read required block from main memory to cache • Then deliver from cache to CPU

Cache Performance • Cache access time: t = 1 • Memory access time: T = 10 • Hit probability: H
T_average_access = t*H + (T + t)*(1 - H) = t + (1 - H)*T
[Plot: average access time as a function of the hit probability H]
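
The average access time can be tabulated for a few hit probabilities. The sketch below uses the slide's values t = 1 and T = 10 as defaults; the function name `average_access_time` and the chosen values of H are assumptions for illustration.

```python
def average_access_time(hit_ratio: float, t_cache: float = 1.0,
                        t_memory: float = 10.0) -> float:
    """Average access time t + (1 - H) * T for hit probability H."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit probability must be between 0 and 1")
    return t_cache + (1.0 - hit_ratio) * t_memory


for h in (0.0, 0.5, 0.9, 0.99, 1.0):
    print(f"H = {h:.2f} -> average access time = {average_access_time(h):.2f}")
```

The time falls linearly from t + T at H = 0 to t at H = 1, so the benefit of the cache depends entirely on how often it hits.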

Locality of Reference (Denning ’68) • Spatial Locality • Memory cells physically close to those just accessed tend to be accessed • Temporal Locality • During the execution of a program, accesses to the same memory cells tend to be close in time • e.g. loops, arrays

An example
200   …
201   …
202   SUB X, Y
203   BRZ 211            (conditional branch)
…     …
210   BRA 202            (unconditional branch)
211   …
…     …
225   BRE R1, R2, 235    (conditional branch)
…     …
235   …

Typical Cache Organization

Cache Design • Size • Mapping Function • Replacement Algorithm • Write Policy • Block Size • Number of Caches

Size does matter • Cost • More cache is expensive • Speed • More cache is faster (up to a point) • Checking cache for data takes time

Cache-memory mapping • Main memory has M = 2^n / K blocks (2^n addressable words, K words per block) • The cache has C lines, with C << M • Each block is mapped to a cache line

A simple example of Direct Mapping
[Figure: 5-bit addresses 00000 to 11111, split into the fields tag (s-r), line (r) and word (w). Each pair of consecutive addresses forms one block, giving Blocks 0 to 15. Blocks map onto the four cache lines cyclically: Block 0 to Line 0, Block 1 to Line 1, Block 2 to Line 2, Block 3 to Line 3, Block 4 to Line 0, …, Block 15 to Line 3]

Direct Mapping (1) • Each block of main memory is mapped to a specific cache line • i.e. if a block is in cache, it must be in one specific place • In a cache of C lines, block j is stored into line i, where: i = j mod C
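
The rule i = j mod C is easy to tabulate. The sketch below uses 16 memory blocks and a 4-line cache, matching the sizes of the simple example; the function name `cache_line` is hypothetical.

```python
def cache_line(block: int, num_lines: int) -> int:
    """Direct mapping: block j is stored in line i = j mod C."""
    if block < 0 or num_lines <= 0:
        raise ValueError("block must be >= 0 and num_lines must be > 0")
    return block % num_lines


NUM_BLOCKS, NUM_LINES = 16, 4  # sizes chosen to match the simple example
for line in range(NUM_LINES):
    blocks = [j for j in range(NUM_BLOCKS) if cache_line(j, NUM_LINES) == line]
    print(f"line {line}: blocks {blocks}")
```

Each line is shared by every C-th block, so two blocks that land on the same line can never be in the cache at the same time.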

Direct Mapping (2) • Address is in two parts • w Least Significant Bits (LSB) identify unique word • s Most Significant Bits (MSB) specify one memory block • The MSBs are split into • a cache line field r (least significant) • a tag of s-r (most significant)

Direct Mapping: Summarizing • address length: n = s + w bits • number of addressable units (words): 2^(s+w) • block size = cache line size = 2^w words • number of memory blocks: 2^(s+w) / 2^w = 2^s • number of cache lines: C = 2^r • tag length: (s - r) bits
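
Splitting an address into its tag, line and word fields takes only shifts and masks. The following sketch assumes the field layout tag | line | word from most to least significant bit; the function name `split_address` and the sample sizes (w = 1, r = 2, s = 4, i.e. 5-bit addresses) are assumptions for illustration.

```python
def split_address(address: int, w: int, r: int, s: int) -> tuple[int, int, int]:
    """Split an (s + w)-bit address into (tag, line, word) fields."""
    if not 0 <= address < (1 << (s + w)):
        raise ValueError("address does not fit in s + w bits")
    word = address & ((1 << w) - 1)         # w least significant bits
    line = (address >> w) & ((1 << r) - 1)  # next r bits select the cache line
    tag = address >> (w + r)                # remaining s - r bits
    return tag, line, word


# Hypothetical sizes: 5-bit addresses, 2-word blocks, 4 cache lines.
W, R, S = 1, 2, 4
for addr in (0b00000, 0b01001, 0b11111):
    tag, line, word = split_address(addr, W, R, S)
    print(f"{addr:05b}: tag={tag:02b} line={line:02b} word={word:b}")
```

The line field is exactly the block number modulo 2^r, so this split and the rule i = j mod C describe the same mapping.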

Cache Line Mapping Table
Cache line    Main memory blocks held
0             0, C, 2C, …, 2^s - C
1             1, C+1, 2C+1, …, 2^s - C + 1
C-1           C-1, 2C-1, 3C-1, …, 2^s - 1
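
To see the mapping table and the hit/miss behaviour together, a toy direct-mapped cache can be simulated by storing one tag per line. This is a sketch under stated assumptions, not a model of any real cache: the class name `DirectMappedCache`, its fields, and the access sequence are all made up for the example, and only tags are tracked (no data).

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks tags only."""

    def __init__(self, w: int, r: int) -> None:
        self.w = w                      # word-offset bits (block = 2**w words)
        self.r = r                      # line-index bits (C = 2**r lines)
        self.tags: list[int | None] = [None] * (1 << r)
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, load the block into its line."""
        block = address >> self.w
        line = block % len(self.tags)   # i = j mod C
        tag = block >> self.r
        if self.tags[line] == tag:
            self.hits += 1
            return True
        self.tags[line] = tag           # replaces whatever block was there
        self.misses += 1
        return False


cache = DirectMappedCache(w=1, r=2)     # 2-word blocks, 4 lines
for addr in (0, 1, 2, 0, 8, 0):
    print(addr, "hit" if cache.access(addr) else "miss")
print(cache.hits, cache.misses)
```

In this sequence, addresses 0 and 8 belong to blocks 0 and 4, which share line 0, so they keep evicting each other even though the other lines are empty.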
