TDC 311 The Microarchitecture
Introduction
• As mentioned earlier in the class, one Java statement generates multiple machine code statements.
• Each machine code statement then generates one or more micro-code statements.
Introduction Continued
• For example, in Java:
    counter += 1;
• Might generate the following machine code:
    load reg1,counter
    inc reg1
    store reg1,counter
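The expansion above can be sketched as a toy register machine. This is only an illustration: the dictionaries `memory` and `registers` and the helper names `load`, `inc`, `store`, and `run_counter_increment` are assumptions made for this sketch, not a real instruction set.

```python
# Toy register machine (hypothetical names, not a real ISA).
memory = {"counter": 41}      # main memory, indexed by variable name
registers = {"reg1": 0}       # a single general-purpose register


def load(reg, var):
    """load reg,var: copy a value from memory into a register."""
    registers[reg] = memory[var]


def inc(reg):
    """inc reg: add one to the register contents."""
    registers[reg] += 1


def store(reg, var):
    """store reg,var: copy the register contents back to memory."""
    memory[var] = registers[reg]


def run_counter_increment():
    """The three machine instructions that 'counter += 1;' might become."""
    load("reg1", "counter")
    inc("reg1")
    store("reg1", "counter")
    return memory["counter"]
```

Calling `run_counter_increment()` once takes `counter` from 41 to 42, which is the effect of the single Java statement.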
[Figure: the data path. Labeled components: machine code instr, Control Store, MIR, Addr, Memory, read/write signals, A Bus Decoder, B Bus Decoder, C Bus (32 individual signals), ALU, ALU control.]
Register numbers (assume 31 registers; 0 means no register):
    PC = 1, MAR = 2, MDR = 3, Reg A = 4, Reg B = 5, Reg C = 6, …, Reg BB = 31
ALU control codes:
    Add = 0, Multiply = 1, Inc A = 2, Inc B = 3, Dec A = 4, Dec B = 5, AND = 6, OR = 7, Pass A = 8, TwosC A = 9
Clock Subcycles
• Subcycle 1: set up signals to drive data path
• Subcycle 2: drive A and B buses
• Subcycle 3: ALU operation
• Subcycle 4: drive C bus
[Figure: clock timing diagram showing subcycles 1 to 4. Labels: "Cycle starts here", "Registers loaded from C Bus", "Next microinstruction loaded from control store".]
Requires 2 complete clock cycles to perform a microinstruction.
Simple Example
• Java statement: counter += 1;
• What might the microinstructions look like?
• load reg1,counter (assume the address of counter is currently in Register C)
    • Rd=1; Wr=0; A=00110 (Reg C); B=00000; C=00010 (MAR); ALU=1000 (Pass A)
    • Rd=1; all else 0 (counter should now be sitting in MDR)
    • Rd=0; Wr=0; A=00011 (MDR); B=00000; C=00100 (Reg A, used as reg1); ALU=1000 (Pass A)
• inc reg1
    • Rd=0; Wr=0; A=00100 (Reg A); B=00000; C=00100 (Reg A); ALU=0010 (Inc A)
• store reg1,counter (assume the address of counter is still in MAR)
    • Rd=0; Wr=1; A=00100 (Reg A); B=00000; C=00011 (MDR); ALU=1000 (Pass A)
    • Rd=0; Wr=1; all else 0
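The six microinstructions above can be traced with a small simulation. This is a minimal sketch, not the actual hardware: it uses the register numbers and ALU codes from the data path figure, and it assumes (for simplicity) that the register transfer happens first and the memory read or write signal takes effect at the end of the same microinstruction. The names `step`, `run`, and `MICROPROGRAM` are made up for this example.

```python
# Register numbers and ALU codes as given in the data path figure.
PC, MAR, MDR, REG_A, REG_B, REG_C = 1, 2, 3, 4, 5, 6
ADD, MULTIPLY, INC_A, INC_B, DEC_A, DEC_B, AND, OR, PASS_A, TWOSC_A = range(10)
MASK = 0xFFFFFFFF  # keep results to 32 bits


def alu(op, a, b):
    """Compute the ALU result for one of the ten control codes."""
    results = {
        ADD: a + b, MULTIPLY: a * b, INC_A: a + 1, INC_B: b + 1,
        DEC_A: a - 1, DEC_B: b - 1, AND: a & b, OR: a | b,
        PASS_A: a, TWOSC_A: -a,
    }
    return results[op] & MASK


def step(regs, memory, rd, wr, a, b, c, op):
    """Execute one microinstruction (register number 0 means no register)."""
    a_val = regs[a] if a else 0          # drive the A bus
    b_val = regs[b] if b else 0          # drive the B bus
    if c:                                # load the C bus result into a register
        regs[c] = alu(op, a_val, b_val)
    # Assumption: memory signals act after the register transfer.
    if rd:
        regs[MDR] = memory.get(regs[MAR], 0)
    if wr:
        memory[regs[MAR]] = regs[MDR]


# (Rd, Wr, A, B, C, ALU) for load reg1,counter / inc reg1 / store reg1,counter
MICROPROGRAM = [
    (1, 0, REG_C, 0, MAR, PASS_A),
    (1, 0, 0, 0, 0, ADD),                # "all else 0"
    (0, 0, MDR, 0, REG_A, PASS_A),
    (0, 0, REG_A, 0, REG_A, INC_A),
    (0, 1, REG_A, 0, MDR, PASS_A),
    (0, 1, 0, 0, 0, ADD),                # "all else 0"
]


def run(counter_address, initial_value):
    """Run the microprogram with the address of counter preloaded in Reg C."""
    regs = {n: 0 for n in range(1, 32)}
    memory = {counter_address: initial_value}
    regs[REG_C] = counter_address
    for microinstruction in MICROPROGRAM:
        step(regs, memory, *microinstruction)
    return memory[counter_address]
```

For example, `run(100, 41)` leaves 42 in memory location 100.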
Design Issues
• Speed vs. cost
• Reduce the number of clock cycles needed to execute an instruction
• Simplify the organization so that the clock cycle can be shorter
• Overlap the execution of instructions
• Any way to improve upon the micro-architecture?
Design Issues
• Create independent units that fetch and process the instructions? (Double up on other things? Everything?)
• Pre-fetch one/two/three instructions?
• Perform pipelining?
Pipeline Problems
• Pipe stall: when a subsequent instruction must wait before it can proceed
• What causes stalls?
    • waiting for memory
    • waiting for the result of an earlier instruction
    • determining the next instruction
• What if you encounter a branch instruction?
• It also takes time to fill the pipeline
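The cost of filling the pipeline and of stalls can be estimated with a little arithmetic. The sketch below assumes an idealized pipeline in which every stage takes one clock cycle and each stall adds exactly one empty cycle; the function names and the stage count used in the example are assumptions, not figures from the class.

```python
def pipeline_cycles(num_instructions, num_stages, stall_cycles=0):
    """Cycles to finish a run of instructions on an idealized pipeline.

    The first instruction needs num_stages cycles to pass through (the fill
    time); each later instruction completes one cycle after the previous one.
    Every stall is assumed to add one extra cycle.
    """
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1) + stall_cycles


def unpipelined_cycles(num_instructions, num_stages):
    """Cycles when each instruction must finish before the next one starts."""
    return num_instructions * num_stages
```

With 10 instructions and a hypothetical 5-stage pipeline, `pipeline_cycles(10, 5)` gives 14 cycles against `unpipelined_cycles(10, 5)` at 50; three stalls raise the pipelined figure to 17.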
Design Issues
• Perform branch prediction?
• Perform out-of-order execution? For example, the sequence
    • add two register contents and store in register
    • increment counter by 1
    • start a write operation
  is changed to:
    • add two register contents and store in register
    • start a write operation
    • increment counter by 1
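Reordering is only safe when the two instructions being swapped do not depend on each other. The sketch below shows one way to check that, describing each instruction by the set of locations it reads and the set it writes. The register names (`r1`, `r2`, `r3`, `counter`) and the assumption that the write operation reads the result of the add are illustrative guesses; the slide does not say which registers are involved.

```python
def can_swap(first, second):
    """Return True if two adjacent instructions may safely trade places.

    Each instruction is a (reads, writes) pair of sets. Swapping is unsafe
    if the second reads what the first writes, the second writes what the
    first reads, or both write the same location.
    """
    reads1, writes1 = first
    reads2, writes2 = second
    read_after_write = writes1 & reads2
    write_after_read = reads1 & writes2
    write_after_write = writes1 & writes2
    return not (read_after_write or write_after_read or write_after_write)


# Hypothetical encodings of the three instructions in the example.
add_regs = ({"r1", "r2"}, {"r3"})           # r3 = r1 + r2
inc_counter = ({"counter"}, {"counter"})    # counter = counter + 1
start_write = ({"r3"}, set())               # write r3 out to memory
```

Under these assumptions `can_swap(inc_counter, start_write)` is true, so the write can start before the increment, while `can_swap(add_regs, start_write)` is false because the write needs the result of the add.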
Design Issues
• Perform speculative execution?
• Re-use registers that are no longer used?
• Have a large register set and keep all current values in registers?
• Use cache memory?
Cache Memory
• Memory references usually fall near recently referenced locations (locality principle)
• Program code should be in one location (if written by a good programmer) and data often in another (but grouped together)
• Bring the most recently referenced values into a high-speed cache
• How does the CPU know whether something is in the cache or not?
Direct-mapped Cache
• Simplest form of cache memory: each memory address maps to exactly one cache entry
• Let’s consider a cache which has 2048 entries, each entry holding 32 bytes (not bits) of data
• 2048 entries times 32 bytes per entry equals 64 KB
[Figure: direct-mapped cache layout. Each entry holds a V bit, a Tag (16 bits), and Data (32 bytes).]
Entry   Addresses that use this entry
2047    65504-65535, 131040-131071, …
2046    …
2045    …
:       :
2       64-95, 65600-65631, …
1       32-63, 65568-65599, …
0       0-31, 65536-65567, 131072-131103, …
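The address ranges in the figure follow from the cache size: entry n holds 32 bytes starting at n × 32, and the pattern repeats every 64 KB. A short sketch that reproduces them (the function name `address_ranges` is an assumption made for this example):

```python
ENTRIES = 2048                       # cache entries
LINE_BYTES = 32                      # bytes of data per entry
CACHE_BYTES = ENTRIES * LINE_BYTES   # 65536 bytes = 64 KB


def address_ranges(entry, count=3):
    """Return the first `count` (first, last) address ranges that map to an entry."""
    ranges = []
    for tag in range(count):
        start = tag * CACHE_BYTES + entry * LINE_BYTES
        ranges.append((start, start + LINE_BYTES - 1))
    return ranges
```

For instance, `address_ranges(0)` returns `[(0, 31), (65536, 65567), (131072, 131103)]`, matching the bottom row of the figure.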
Cache Address
• When a program generates a 32-bit address, it has the following form (most significant bits first):
    Tag: 16 bits | Line: 11 bits | Word: 3 bits | Byte: 2 bits
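The four fields can be pulled out of an address with shifts and masks. A minimal sketch, assuming the Tag occupies the most significant bits and the Byte field the least significant; the function name `split_address` is made up for this example:

```python
def split_address(address):
    """Split a 32-bit address into (tag, line, word, byte) fields."""
    byte = address & 0b11             # lowest 2 bits
    word = (address >> 2) & 0b111     # next 3 bits
    line = (address >> 5) & 0x7FF     # next 11 bits
    tag = (address >> 16) & 0xFFFF    # top 16 bits
    return tag, line, word, byte
```

For address 36 (hex 24), `split_address(36)` gives tag 0, line 1, word 1, byte 0.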
Cache Hit
• To see if a data item is in the cache, use the 11-bit LINE portion of the address to point to one of the 2048 cache entries
• Then the 16-bit TAG of the address is compared to the 16-bit TAG value in the cache entry
• If there is a match (and the entry's V bit marks it as valid), the data is there
Cache Hit
• If the data is there, use the 3-bit WORD portion of the address to tell you which of the 8 words (32 bytes) in the cache line should be fetched
• If necessary, the 2-bit BYTE portion of the address will tell you which one of the four bytes to fetch
Cache Memory
• Note that since this cache holds only 64 KB, it can hold the data for addresses 0 – 65535
• But the same entries may instead hold data for addresses 65536 – 131071 (or any higher 64 KB range)
• That is why you must compare the TAG fields to see if there is a match
Cache Miss
• If there is no match (of TAG fields), then there is a cache miss
• The CPU goes to main memory, fetches the block of data containing the requested address, and stores it in the cache (thus wiping out the old block in that cache entry)
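The hit and miss rules can be sketched as a small simulation. This is a simplified model under stated assumptions: it tracks only the V bit and the tag for each entry (no actual data), and the class name `DirectMappedCache` and its `access` method are invented for this example.

```python
class DirectMappedCache:
    """Bookkeeping-only model of a direct-mapped cache (no data stored)."""

    def __init__(self, entries=2048, line_bytes=32):
        self.entries = entries
        self.line_bytes = line_bytes
        self.valid = [False] * entries   # V bit per entry
        self.tags = [0] * entries        # tag per entry

    def access(self, address):
        """Look up an address; on a miss, replace the entry's old block."""
        block = address // self.line_bytes   # which 32-byte block
        line = block % self.entries          # LINE field selects the entry
        tag = block // self.entries          # TAG field identifies the block
        if self.valid[line] and self.tags[line] == tag:
            return "hit"
        # Miss: fetch the block from main memory, wiping out the old one.
        self.valid[line] = True
        self.tags[line] = tag
        return "miss"
```

Accessing address 36 and then 40 gives a miss followed by a hit, since both lie in the same 32-byte block. Accessing 65572 next is a miss that evicts that block (same line, different tag), so a further access to 36 misses again.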
Cache Example
• Consider that the CPU wants to fetch data from location 36 (decimal), or 00000024 in hex
• Tag = 0000 0000 0000 0000
• Line = 0000 0000 001
• Word = 001
• Byte = 00