## Computer Architecture The Anatomy of Modern Processors


**Processor Organization (Part 2)**

John Morris

**Speeding it up**

[Figure: datapath with main memory, MAR, MDR, PC, IFU, MIR, register H, B and C buses, ALU (ALU control; flags c, n, z, v) and shifter (shift control)]

- Observe that several operations are common to every instruction:
  - Fetch the next instruction
  - Increment the PC
  - See the microcode in Tanenbaum for actual examples
- Introduce an Instruction Fetch Unit (IFU)
  - Add an additional adder: an incrementer, which is slightly simpler than a general-purpose adder
- Some operations can now be performed in parallel:
  - Execute one instruction
  - Fetch the next
- The microcode is shorter
  - PC = PC + 1 is not needed: the IFU does it!

**Speeding it up**

[Figure: the same datapath with a third data bus, the A bus, added]

- Provide 3 data buses
  - More flexibility for instructions
  - No 'move-only' cycle is needed to bring an operand into H
- Question: what consequence does the addition of the A bus have on the microcode word width?

**Speeding it up**

[Figure: the datapath with pipeline registers added for the fetched instruction, the ALU operands and the ALU result]

- Add registers!
- Pipelined machine, with registers for the:
  - Fetched instruction
  - ALU operands
  - ALU result
- Latency of an instruction?
  - 4-stage pipeline, including write-back to memory
  - 4 clock cycles
- Throughput
  - 1 instruction completes per clock cycle
  - Up to 4 instructions 'in flight' at any time
  - Long-latency instructions, e.g. memory fetches, reduce this

**Speeding it up**
- There are many more things …
- SOFTENG 363 covers the most important tricks learnt in 40 years of computer architecture research!

**Computational Elements**

[Figure: 32 OR gates, one per bit position (0 to 31), each combining $a_i$ and $b_i$ into $c_i$; a, b and c are 32-bit buses]

- How does a circuit compute?
- Simple Boolean operations, e.g. $c = a \lor b$, are straightforward

**Computational Elements - Adders**
- How does a circuit compute something more complex, e.g. $c = a + b$?
- Adders are crucial!
  - ~25% of program instructions use one, for:
    - Arithmetic
    - Array addressing
    - String indexing
    - …
  - Program counter
    - PC = PC + 1 (logically)
    - Actually PC = PC + 4 in a 32-bit machine (with byte addressing and 4-byte instructions)
  - Relative jump
    - Jump to PC + n instructions

**Computational Elements - Adders**

[Figure: one-bit adder block with inputs $a_i$, $b_i$ and outputs $c_i$, $c_{out,i}$]

- Start by adding two bits: 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 10
- Observe that we need a carry from 1 + 1
- Circuit block: what happens to $c_{out,i}$?
  - It is fed to block i+1

**Computational Elements - Adders**

[Figure: adder block with inputs $a_i$, $b_i$, $c_{in,i}$ (= $c_{out,i-1}$) and outputs $sum_i$, $c_{out,i}$ (= $c_{in,i+1}$)]

- This adder circuit block is known as a Full Adder
- A Half Adder doesn't have a carry in

**Computational Elements - Adders**

[Figure: 32 full adders (FA) in a chain, with inputs $a_0 \ldots a_{31}$, $b_0 \ldots b_{31}$ and $c_{in}$, and outputs $sum_0 \ldots sum_{31}$ and carry]

- 32-bit adder
  - Note there's a carry out for the overflow bit
  - What do we do with the carry in?
- First solution: use a half adder, which doesn't have one!
- Second (usual) solution: set it to zero
- Why use a more complex full adder when a half adder will do?
  - We return to this later

**Full Adder**
- Truth table
- Observe that cout|sum, read as a binary number, counts how many of the input bits are 1

**Full Adder**

[Figure: gate-level full adder with inputs a, b, $c_{in}$ and outputs sum, $c_{out}$]

- Logic equations
  - $sum = a \oplus b \oplus c_{in}$
  - $c_{out} = (a \land b) \lor (a \land c_{in}) \lor (b \land c_{in})$
- Implementation

**Adder - Performance**

[Figure: the 32-bit chain of full adders]

- 32-bit adder
  - The $c_{in}$ of $FA_i$ is the $c_{out}$ of $FA_{i-1}$
  - So $FA_i$ can't produce a result until $FA_{i-1}$ has settled
  - Long $t_{pd}$: $t_{pd}(n \text{ bits}) = n \cdot t_{pd}(\text{full adder})$
- This is known as a Ripple Carry Adder
- Simple, regular, but sloooooooooooow …

**Adder - Performance**
- 32-bit adder: the ripple carry adder has a long propagation delay!
- Adders are crucial
  - Improving adders can make a big difference
- 40+ years of intense research
  - Just in binary arithmetic!!

**Carry Select Adder**

[Figure: 8-bit carry select adder built from three 4-bit ripple carry adders. The low block adds $a_{0..3}$, $b_{0..3}$ and $c_{in}$, giving $sum_{0..3}$ and $c_{out,3}$. Two high blocks each add $a_{4..7}$ and $b_{4..7}$, one with a carry in of 0 (giving $sum^0_{4..7}$, $c^0_{out,7}$) and one with a carry in of 1 (giving $sum^1_{4..7}$, $c^1_{out,7}$). Multiplexers select $sum_{4..7}$ and the final carry.]

- Here we build an 8-bit adder from 4-bit blocks
- The blocks are 'standard' n-bit ripple carry adders (n = any suitable value)

**Carry Select Adder**
- The low-order block adds the 4 low-order bits: after $4\,t_{pd}$ it will produce a carry out
- The two high-order blocks 'speculate' on the value of $c_{out,3}$: one assumes it will be 0, the other assumes 1

**Carry Select Adder**
- After $4\,t_{pd}$ we will have:
  - $sum_{0..3}$ (final sum bits) and $c_{out,3}$, from the low-order block
  - $sum^0_{4..7}$ and $c^0_{out,7}$, from the block assuming $c_{in} = 0$
  - $sum^1_{4..7}$ and $c^1_{out,7}$, from the block assuming $c_{in} = 1$

**Carry Select Adder**
- $c_{out,3}$ selects the correct $sum_{4..7}$ and carry out
- All 8 bits + carry are available after $4\,t_{pd}(FA) + t_{pd}(\text{multiplexor})$

**Carry Select Adder**
- This scheme can be generalized to any number of bits
  - Select a suitable block size (e.g. 4, 8)
  - Replicate all blocks except the first: one with $c_{in} = 0$, one with $c_{in} = 1$
  - Use the final $c_{out}$ from the preceding block to select the correct set of outputs for the current block

**Fast Adders**
- Many other fast adder schemes have been proposed, e.g.
  - Carry-skip
  - Manchester
  - Carry-save
  - Carry Look Ahead
- If implementing an adder (e.g. in programmable logic), do a little research first!

**What about that carry in?**
- In an ALU, we usually need to do more than just add!
- Subtractions are common too
- Observe that $c = a - b$ is equivalent to $c = a + (-b)$
- So we can use an adder for subtractions if we can negate the 2nd operand
- Negation in 2's complement arithmetic?

**Adder / Subtractor**
- Negation in 2's complement arithmetic?
- Rule: complement each bit, then add 1, e.g.:

| Step | Binary | Decimal |
|---|---|---|
| Value | 0001 | 1 |
| Complement | 1110 | |
| Add 1 | 1111 | -1 |
| Value | 0110 | 6 |
| Complement | 1001 | |
| Add 1 | 1010 | -6 |

**Adder / Subtractor**

[Figure: adder/subtractor: an add/subtract control selects each bit of b or its complement (multiplexer inputs 0 and 1) and also drives $c_{in}$ of the full adder (FA) chain, which produces c and carry]

- Using an adder:
  - Complement each bit using an inverter
  - Use the carry in to add 1!
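The pipeline slides give a latency of 4 clock cycles and a throughput of 1 instruction per cycle. The Python sketch below shows how those numbers combine, assuming an ideal 4-stage pipeline with no stalls; `pipeline_schedule` and `in_flight` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

# Sketch only: an ideal pipeline (no stalls); function names are illustrative.


def pipeline_schedule(n_instructions: int, stages: int = 4) -> list[tuple[int, int]]:
    """Return (issue_cycle, completion_cycle) for each instruction in an
    ideal pipeline where one new instruction enters every cycle.
    Cycles are numbered from 1."""
    schedule = []
    for i in range(n_instructions):
        issue = i + 1                  # instruction i enters the first stage in cycle i + 1
        complete = issue + stages - 1  # and leaves the last stage stages - 1 cycles later
        schedule.append((issue, complete))
    return schedule


def in_flight(schedule: list[tuple[int, int]], cycle: int) -> int:
    """Count the instructions occupying some pipeline stage during `cycle`."""
    return sum(1 for issue, done in schedule if issue <= cycle <= done)


if __name__ == "__main__":
    sched = pipeline_schedule(10)
    first_issue, first_done = sched[0]
    print("latency (cycles):", first_done - first_issue + 1)
    print("cycles for 10 instructions:", sched[-1][1])
    print("max in flight:", max(in_flight(sched, c) for c in range(1, sched[-1][1] + 1)))
```

For 10 instructions this reports a latency of 4 cycles, 13 cycles in total (4 + 9) and at most 4 instructions in flight. A long-latency memory fetch would delay every later completion, which this sketch does not model.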
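The Computational Elements slide notes that Boolean operations such as $c = a \lor b$ are straightforward: each output bit depends only on the matching input bits, so the 32 gates work independently with no carry between them. A minimal sketch of that structure; `or_32` is a hypothetical name.

```python
# Sketch only: `or_32` is an illustrative name, modelling 32 independent OR gates.


def or_32(a: int, b: int) -> int:
    """32-bit OR built from 32 one-bit OR gates.

    Gate i sees only a_i and b_i, so no bit waits on any other bit;
    bits above position 31 are ignored, as on a 32-bit bus."""
    c = 0
    for i in range(32):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        c |= (ai | bi) << i  # output of gate i drives c_i
    return c
```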
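The Full Adder slides give the logic equations $sum = a \oplus b \oplus c_{in}$ and $c_{out} = (a \land b) \lor (a \land c_{in}) \lor (b \land c_{in})$. This sketch evaluates them, alongside a half adder, and prints the truth table while checking the observation that cout|sum counts the inputs that are 1. The function names are hypothetical.

```python
from __future__ import annotations

from itertools import product

# Sketch only: function names are illustrative, not from the slides.


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two bits with no carry in. Returns (sum, carry_out)."""
    return a ^ b, a & b


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Add two bits plus a carry in. Returns (sum, carry_out)."""
    s = a ^ b ^ cin                          # sum = a xor b xor cin
    cout = (a & b) | (a & cin) | (b & cin)   # OR of the three pairwise ANDs
    return s, cout


if __name__ == "__main__":
    print(" a b cin | cout sum")
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        # cout|sum read as a 2-bit binary number equals the number of 1 inputs
        assert 2 * cout + s == a + b + cin
        print(f" {a} {b}  {cin}  |   {cout}   {s}")
```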
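The Ripple Carry Adder chains n full adders so that the $c_{in}$ of $FA_i$ is the $c_{out}$ of $FA_{i-1}$. The sketch models that chain on Python integers (bit 0 is least significant) and, following the slide's simplification that each stage costs one $t_{pd}$(full adder), counts the stages on the carry path. `ripple_carry_add` is a hypothetical name.

```python
from __future__ import annotations

# Sketch only: `full_adder` and `ripple_carry_add` are illustrative names.


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def ripple_carry_add(a: int, b: int, n: int = 32, cin: int = 0) -> tuple[int, int, int]:
    """Add two n-bit unsigned values with a chain of n full adders.

    Returns (sum, carry_out, delay), where delay counts full-adder
    propagation delays: stage i must wait for stage i-1's carry,
    so the final carry settles after n * t_pd(FA).
    """
    carry = cin
    result = 0
    delay = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        s, carry = full_adder(ai, bi, carry)  # cin of FA_i is cout of FA_(i-1)
        result |= s << i
        delay += 1                            # one more t_pd(FA) on the carry path
    return result, carry, delay
```

For example, `ripple_carry_add(0xFFFFFFFF, 1)` wraps to 0 with a carry out of 1, the overflow bit mentioned on the 32-bit adder slide, after 32 full-adder delays.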
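The Carry Select Adder slides build an 8-bit adder from 4-bit ripple blocks, computing each upper block for both possible carries and letting the preceding block's carry out drive a multiplexer. The sketch follows the generalized scheme on the last Carry Select slide. The delay model block_size $\cdot t_{pd}(FA)$ + (blocks - 1) $\cdot t_{pd}$(multiplexor) is our assumption for the general case (each multiplexer waits for the carry selected below it); for 8 bits in 4-bit blocks it gives the slide's $4\,t_{pd}(FA) + t_{pd}(\text{multiplexor})$. All names are hypothetical.

```python
from __future__ import annotations

# Sketch only: function names and the delay model are illustrative assumptions.


def ripple_block(a: int, b: int, width: int, cin: int) -> tuple[int, int]:
    """width-bit ripple carry adder on unsigned ints: returns (sum, carry_out)."""
    carry, result = cin, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return result, carry


def carry_select_add(a: int, b: int, n: int = 8, block: int = 4, cin: int = 0) -> tuple[int, int]:
    """n-bit carry select adder built from `block`-bit ripple blocks.

    The first block is a plain ripple adder. Every later block is computed
    twice, assuming carry in 0 and carry in 1 (in hardware both copies run
    in parallel); the preceding block's carry out selects between them.
    """
    if n % block:
        raise ValueError("sketch assumes n is a multiple of the block size")
    mask = (1 << block) - 1
    result, carry = 0, cin
    for j in range(n // block):
        aj = (a >> (j * block)) & mask
        bj = (b >> (j * block)) & mask
        if j == 0:
            s, carry = ripple_block(aj, bj, block, carry)
        else:
            s0, c0 = ripple_block(aj, bj, block, 0)     # speculate: carry in = 0
            s1, c1 = ripple_block(aj, bj, block, 1)     # speculate: carry in = 1
            s, carry = (s1, c1) if carry else (s0, c0)  # multiplexer
        result |= s << (j * block)
    return result, carry


def carry_select_delay(n: int, block: int, t_fa: float = 1.0, t_mux: float = 1.0) -> float:
    """Assumed model: one block of ripple delay, then one mux per later block."""
    return block * t_fa + (n // block - 1) * t_mux
```

With `t_fa` and `t_mux` both 1, `carry_select_delay(32, 4)` gives 4 + 7 = 11 units against 32 for a 32-bit ripple carry adder, paid for by replicating every block except the first and adding the multiplexers.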
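The negation rule on the Adder / Subtractor slide (complement each bit, then add 1) can be checked in a fixed word width. In this sketch the names `negate` and `to_signed` and the default 4-bit width are assumptions for illustration; masking keeps every result to `width` bits, as a hardware register would.

```python
# Sketch only: function names and the 4-bit default width are illustrative.


def negate(x: int, width: int = 4) -> int:
    """Two's complement negation of a width-bit pattern: invert, then add 1."""
    mask = (1 << width) - 1
    complemented = ~x & mask          # complement each bit
    return (complemented + 1) & mask  # add 1, discarding any carry out


def to_signed(x: int, width: int = 4) -> int:
    """Interpret a width-bit pattern as a two's complement signed value."""
    return x - (1 << width) if x & (1 << (width - 1)) else x


if __name__ == "__main__":
    for value in (0b0001, 0b0110):
        neg = negate(value)
        print(f"{value:04b} ({to_signed(value)}) -> {neg:04b} ({to_signed(neg)})")
```

This reproduces the two rows of the slide's example: 0001 (1) becomes 1111 (-1), and 0110 (6) becomes 1010 (-6).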
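The final Adder / Subtractor slide reuses the full adder chain: the add/subtract control inverts b and also drives the carry in, so subtraction computes $a + \bar{b} + 1 = a - b$. This is where a full adder in bit 0 pays off, since the carry in that a half adder lacks supplies the +1. The sketch works on n-bit patterns; `add_subtract` and its `subtract` flag are hypothetical names.

```python
from __future__ import annotations

# Sketch only: `add_subtract` and its parameters are illustrative names.


def add_subtract(a: int, b: int, subtract: bool, n: int = 4) -> tuple[int, int]:
    """n-bit adder/subtractor built from a ripple chain of full adders.

    subtract=False: a + b       (b passed through, cin = 0)
    subtract=True:  a + ~b + 1  (b complemented, cin = 1), i.e. a - b
    Returns (result, carry_out) as unsigned n-bit patterns.
    """
    ctrl = 1 if subtract else 0
    carry = ctrl                    # the add/subtract line also drives cin
    result = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ ctrl  # inverter/multiplexer: b_i or NOT b_i
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return result, carry
```

For instance, `add_subtract(3, 7, True)` returns the pattern 1100, which is -4 in 4-bit two's complement, while `add_subtract(7, 3, True)` returns 0100 (4) with a carry out of 1.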