Chapter 4: Arithmetic for Computers (Part 1)

Chapter 4: Arithmetic for Computers (Part 1) CS 447 Jason Bakos

Notes on Project 1 • There are two different ways the following two words can be stored in a computer memory… • word1 .byte 0,1,2,3 • word2 .half 0,1 • One way is big-endian, where the most significant byte of a value is stored at the lowest address… • word1: • word2: • Another way is little-endian, where the least significant byte of a value is stored at the lowest address… • word1: • word2: • Of course, this affects the way in which the lw instruction works…
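A minimal sketch of the difference, using Python's standard `struct` module instead of MIPS. The four bytes declared for word1 sit at consecutive addresses either way; what changes is the value a 32-bit load assembles from them. The variable names are illustrative only.

```python
import struct

# The four bytes declared by ".byte 0,1,2,3", in increasing address order.
word1_bytes = bytes([0, 1, 2, 3])

# A big-endian load treats the byte at the lowest address as most significant.
big = struct.unpack(">I", word1_bytes)[0]
# A little-endian load treats the byte at the lowest address as least significant.
little = struct.unpack("<I", word1_bytes)[0]

print(hex(big))     # 0x10203
print(hex(little))  # 0x3020100

# ".half 0,1" stores two 16-bit values; the bytes of each halfword follow the byte order.
half_big = struct.pack(">HH", 0, 1)
half_little = struct.pack("<HH", 0, 1)
print(half_big.hex(), half_little.hex())
```

Reading the same four bytes one at a time (as lb does) gives 0, 1, 2, 3 under either byte order.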

Notes on Project 1 • SPIM uses the byte order of the machine it runs on • Intel processors are little-endian, so we need to deal with that • This affects assignment 1 because the input data is stored as a series of bytes • If you use lw on your data set, the bytes will be loaded into your destination register in reverse order • Hint: Try the lb/sb instructions • These load/store a single byte at any address (no alignment required), so byte order is not an issue

Notes on Project 1 • Hint: Use SPIM’s breakpoint and single-step features to help debug your program • Also, make sure you use the registers and memory/stack displays • Hint: You may want to temporarily store your input set into a word array for sorting • Make sure you check Appendix A for additional useful instructions that I didn’t cover in class • Make sure you comment your code!

Goals of Chapter 4 • Data representation • Hardware mechanisms for performing arithmetic on data • Hardware implications on the instruction set design

Review of Binary Representation • Binary/Hex -> Decimal conversion • Decimal -> Binary/Hex conversion • Least/Most significant bits • Highest representable number/maximum number of unique representable symbols • Two’s complement representation • One’s complement • Finding signed number ranges ($-2^{n-1}$ to $2^{n-1}-1$) • Doing arithmetic with two’s complement • Sign extending with load half/byte • Unsigned loads • Signed/unsigned comparison
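As a quick refresher, here is a small sketch of the signed range and of what a sign-extending load does compared with an unsigned one. The helper names `signed_range`, `sign_extend`, and `zero_extend` are made up for illustration; they model the arithmetic, not the hardware.

```python
def signed_range(bits):
    """Smallest and largest two's complement values that fit in `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def zero_extend(value, bits):
    """Interpret the low `bits` bits of value as an unsigned number."""
    return value & ((1 << bits) - 1)

print(signed_range(8))         # (-128, 127)
print(sign_extend(0xFF, 8))    # -1, what a signed byte load would produce
print(zero_extend(0xFF, 8))    # 255, what an unsigned byte load would produce
```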

Binary Addition/Subtraction • Binary subtraction works exactly like addition, except the second operand is first converted to its two’s complement (negated) • Overflow in signed arithmetic occurs under the following conditions:
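One way to state the usual rule: an add overflows when both operands have the same sign and the result has the other sign; a subtract overflows when the operands have different signs and the result's sign differs from the first operand's. The sketch below checks this on 32-bit patterns; `add32` and `sub32` are illustrative names, not MIPS instructions.

```python
MASK = 0xFFFFFFFF
SIGN = 0x80000000

def add32(a, b):
    """Return (result, overflow) for a signed 32-bit add of two bit patterns."""
    r = (a + b) & MASK
    # Overflow: both operands differ in sign from the result.
    overflow = ((a ^ r) & (b ^ r) & SIGN) != 0
    return r, overflow

def sub32(a, b):
    """Return (result, overflow) for a - b, computed as a + two's complement of b."""
    r = (a + ((~b + 1) & MASK)) & MASK
    # Overflow: operands differ in sign and the result's sign differs from a's.
    overflow = ((a ^ b) & (a ^ r) & SIGN) != 0
    return r, overflow

print(add32(0x7FFFFFFF, 1))   # largest positive + 1 overflows
print(add32(0xFFFFFFFF, 1))   # -1 + 1 = 0, no overflow
print(sub32(0x80000000, 1))   # most negative - 1 overflows
```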

What Happens When Overflow Occurs? • MIPS signals signed overflow with an exception/interrupt • When an interrupt occurs, a branch occurs to code in the kernel at address 0x80000080, where special registers (BadVAddr, Status, Cause, and EPC) are used to handle the interrupt • SPIM has a simple interrupt handler built-in that deals with interrupts • We may come back to interrupts later

Review of Shift and Logical Operations • MIPS has operations for SLL, SRL, and SRA • We covered this in the last chapter • MIPS implements bit-wise AND, OR, and XOR logical operations • These operations perform a bit-by-bit parallel logical operation on two registers • In C, use << and >> for shifts (>> is a logical shift on unsigned values and, on most compilers, an arithmetic shift on signed values), and &, |, ^, and ~ for bitwise AND, OR, XOR, and NOT, respectively
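To make the difference between the three shifts concrete, here is a small model on 32-bit patterns. Python integers are unbounded, so the masking is an assumption of this sketch; the function names simply mirror the MIPS mnemonics.

```python
MASK = 0xFFFFFFFF

def sll(x, n):
    """Shift left logical: zeros enter on the right, high bits fall off."""
    return (x << n) & MASK

def srl(x, n):
    """Shift right logical: zeros enter on the left."""
    return (x & MASK) >> n

def sra(x, n):
    """Shift right arithmetic: copies of the sign bit enter on the left."""
    x &= MASK
    if x & 0x80000000:
        x -= 1 << 32          # reinterpret the pattern as a negative number
    return (x >> n) & MASK

print(hex(srl(0x80000000, 4)))  # 0x8000000
print(hex(sra(0x80000000, 4)))  # 0xf8000000
print(bin(0b1100 & 0b1010), bin(0b1100 | 0b1010), bin(0b1100 ^ 0b1010))
```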

Review of Logic Operations • The three main parts of a CPU • ALU (Arithmetic and Logic Unit) • Performs all logical, arithmetic, and shift operations • CU (Control Unit) • Controls the CPU – performs load/store, branch, and instruction fetch • Registers • Physical storage locations for data

Review of Logic Operations • In this chapter, our goal is to learn how the ALU is implemented • The ALU is entirely constructed using boolean functions as hardware building blocks • The 3 basic digital logic building blocks can be used to construct any digital logic system: AND, OR, and NOT • These functions can be directly implemented using electric circuits (wires and transistors)

Review of Logic Operations • These “combinational” logic devices can be assembled to create a much more complex digital logic system

Review of Logic Operations • We need another device to build an ALU… • This is called a multiplexor… it implements an if-then-else in hardware

A 1-bit ALU • Perform logic operations in parallel and mux the output • Next, we want to include addition, so let’s build a single-bit adder • Called a full adder

Full Adder • From the following table, we can construct the circuit for a full adder and link multiple full adders together to form a multi-bit adder • We can also add this adder to our ALU • How do we give subtraction ability to our adder? • How do we detect overflow and zero results?

Chapter 4: Arithmetic for Computers (Part 2) CS 447 Jason Bakos

Logic/Arithmetic • From the truth table for the mux, we can use sum-of-products to derive the logic equation • With sum-of-products, for each ‘1’ row for each output, we AND together all the inputs (inverting the input 0’s), then OR all the row products • To make it simpler, let’s add “don’t cares” to the table…

Logic/Arithmetic • This gives us the following equation • (A and (not D)) or (B and D) • We don’t need the inputs for the “don’t cares” in our partial products • This is one way to simplify our logic equation • Other ways include propositional calculus, Karnaugh Maps, and the Quine-McCluskey algorithm

Logic/Arithmetic • Here is a (crude) digital logic design for the 2-to-1 mux • Note that multiple muxes can be assembled in stages to implement multiple-input muxes
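The mux equation can be checked against the if-then-else behavior it is supposed to implement. This is only a sketch on 1-bit Python integers; `mux2` and `mux4` are illustrative names, and the 4-input version shows one possible way of staging 2-to-1 muxes.

```python
from itertools import product

def mux2(a, b, d):
    """2-to-1 mux from the SOP equation: (A and not D) or (B and D)."""
    return (a & (1 - d)) | (b & d)

def mux4(i0, i1, i2, i3, s1, s0):
    """4-to-1 mux built from three 2-to-1 muxes in two stages."""
    return mux2(mux2(i0, i1, s0), mux2(i2, i3, s0), s1)

# The equation must agree with "if D then B else A" for every input row.
for a, b, d in product((0, 1), repeat=3):
    assert mux2(a, b, d) == (b if d else a)

print(mux4(0, 0, 1, 0, 1, 0))  # select lines 10 pick input 2
```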

Logic/Arithmetic • For the adder, let’s minimize the logic using a Karnaugh Map… • For CarryOut, we need $2^3$ entries… • We can minimize this to • CarryOut = AB + CarryIn·A + CarryIn·B

Logic/Arithmetic • There’s no way to minimize the Sum equation, so we need the full sum of products: • Sum = (not A)(not B)CarryIn + ABCarryIn + (not A)B(not CarryIn) + A(not B)(not CarryIn)
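The standard full-adder equations can be checked by brute force and then chained into a multi-bit adder. The sketch below models each gate with 1-bit Python integers; `full_adder` and `ripple_add` are illustrative names, and the 8-bit default width is arbitrary.

```python
from itertools import product

def full_adder(a, b, cin):
    """Sum and CarryOut of one bit position, written as sum-of-products."""
    na, nb, nc = 1 - a, 1 - b, 1 - cin
    s = (na & nb & cin) | (a & b & cin) | (na & b & nc) | (a & nb & nc)
    cout = (a & b) | (cin & a) | (cin & b)
    return s, cout

def ripple_add(x, y, n=8, cin=0):
    """Link n full adders: each CarryOut feeds the next bit's CarryIn."""
    result, carry = 0, cin
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry

# Every row of the truth table must match ordinary addition.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert s + 2 * cout == a + b + cin

print(ripple_add(100, 55))   # (155, 0)
print(ripple_add(200, 100))  # (44, 1): the 8-bit sum wraps and the carry is set
```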

Logic/Arithmetic • In order to implement subtraction, we can invert the B input to the adder and set CarryIn to 1 • This can be implemented with a mux: select B or not B (call this input Binvert) • Now we can build a 1-bit ALU with AND, OR, addition, and subtraction operations • We can perform the AND, OR, and ADD in parallel and switch the results with a 4-input mux (Operation will be our D-input) • To make the adder a subtractor, we’ll need to set both Binvert and CarryIn to 1
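Why this works: inverting every bit of B and adding 1 produces the two's complement of B, so the adder computes A + (-B). A word-level sketch (not gate-level), with a hypothetical `add_sub` helper and an assumed 8-bit width:

```python
def add_sub(a, b, binvert, n=8):
    """Adder with a Binvert control; CarryIn is tied to Binvert."""
    mask = (1 << n) - 1
    b_in = (~b & mask) if binvert else (b & mask)   # the B / not-B mux
    total = (a & mask) + b_in + binvert             # CarryIn = Binvert
    return total & mask

print(add_sub(9, 5, 0))  # 14
print(add_sub(9, 5, 1))  # 4
print(add_sub(5, 9, 1))  # 252, the 8-bit two's complement pattern for -4
```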

Lecture 4: Arithmetic for Computers (Part 3) CS 447 Jason Bakos

Chapter 4 Review • So far, we’ve covered the following topics for this chapter • Binary representation of signed integers • 16 to 32 bit signed conversion • Binary addition/subtraction • Overflow detection/overflow exception handling • Shift and logical operations • Parts of the CPU • AND, OR, XOR, and inverter gates • Multiplexor (mux) and full adder • Sum-of-products logic equations (truth tables) • Logic minimization techniques • Don’t cares and Karnaugh Maps

1-bit ALU Design • A 1-bit ALU can be constructed • Components • AND, OR, and adder • 4-to-1 mux • “Binverter” (inverter and 2-to-1 mux) • Interface • Inputs: A, B, Binvert, Operation (2 bits), CarryIn, and Less • Outputs: CarryOut and Result • Digital functions are performed in parallel and the outputs are routed into a mux • The mux will also accept a Less input which we’ll accept from outside the 1-bit ALU • The select lines of the mux make up the “operation” input to the ALU

32-bit ALU • In order to create a multi-bit ALU, array 32 1-bit ALUs • Connect the CarryOut of each bit to the CarryIn of the next bit • A and B of each 1-bit ALU will be connected to each successive bit of the 32-bit A and B • The Result outputs of each 1-bit ALU will form the 32-bit result • We need to add an SLT unit and connect the output to the least significant 1-bit ALU’s Less input • Hardwire the other “Less” inputs to 0 • We need to add an Overflow unit • We need to add a Zero detection unit

SLT Unit • To compute SLT, we need to make sure that when the 1-bit ALU’s Operation is set to 11, a subtract operation is also being computed • With this happening, the SLT unit can compute Less based on the MSB (sign) of A, B, and Result

Overflow Unit • When doing signed arithmetic, we need to follow this table, as we covered previously… • How do we implement this in hardware?

Overflow Unit • We need a truth table… • Since we’ll be computing the logic equation with SOP, we only need the rows where the output is 1

Zero Detection Unit • “Or” together all the 1-bit ALU Result outputs and invert (a NOR); this is the Zero output of the ALU, which is 1 only when every Result bit is 0

32-bit ALU Operation • We need a 3-bit ALU Operation input into our 32-bit ALU • The two least significant bits can be routed into all the 1-bit ALUs internally • The most significant bit can be routed into the least significant 1-bit ALU’s CarryIn, and to Binvert of all the 1-bit ALUs

32-bit ALU Operation • Here’s the final ALU Operation table:

32-bit ALU • In the end, our ALU will have the following interface: • Inputs: • A and B (32 bits each) • ALU Operation (3 bits) • Outputs: • CarryOut (1 bit) • Zero (1 bit) • Result (32 bits) • Overflow (1 bit)
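Putting the pieces together, here is a bit-level sketch of the whole ALU. It assumes the textbook-style encoding 000 = AND, 001 = OR, 010 = add, 110 = subtract, 111 = slt, which matches the routing described earlier (top bit drives Binvert and the first CarryIn, low two bits drive the mux) but is an assumption of this sketch. The names `alu_bit` and `alu32` and the return order are illustrative.

```python
def alu_bit(a, b, binvert, carry_in, less, op):
    """One 1-bit ALU slice; op selects 0 AND, 1 OR, 2 adder, 3 Less."""
    b = b ^ binvert                                  # the "Binverter"
    total = a ^ b ^ carry_in                         # full-adder sum
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    result = (a & b, a | b, total, less)[op]         # 4-to-1 mux
    return result, carry_out, total

def alu32(a, b, operation):
    """Return (Result, Zero, Overflow, CarryOut) for 32-bit patterns a and b."""
    binvert, op = (operation >> 2) & 1, operation & 3
    carry = binvert                # top Operation bit drives CarryIn of bit 0
    bits, total = [], 0
    for i in range(32):
        r, carry, total = alu_bit((a >> i) & 1, (b >> i) & 1,
                                  binvert, carry, 0, op)
        bits.append(r)
    # Overflow unit: adder inputs share a sign but the MSB sum does not.
    a31, b31 = (a >> 31) & 1, ((b >> 31) & 1) ^ binvert
    overflow = int(a31 == b31 and total != a31)
    if op == 3:
        bits[0] = total ^ overflow  # SLT unit drives Less of the LSB slice
    result = sum(bit << i for i, bit in enumerate(bits))
    zero = int(result == 0)         # NOR of all Result bits
    return result, zero, overflow, carry

print(alu32(2, 3, 0b010))   # add
print(alu32(7, 7, 0b110))   # subtract, result is zero
print(alu32(5, 7, 0b111))   # slt, 5 < 7
```

The slt result uses the sign of the difference XORed with Overflow, so the comparison stays correct even when the subtraction itself overflows.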

Carry Lookahead • The adder architecture we previously looked at (ripple carry) requires 2n gate delays to compute its result in the worst case • The longest path that a digital signal must propagate through is called the “critical path” • This is way too slow! • There are other ways to build an adder that require only on the order of $\log_2 n$ gate delays • In principle, using SOP, we can build a circuit that computes ANY function in 2 gate delays (2 levels of logic) • But for a 64-input system, the resulting design would be far too big and too complex

Carry Lookahead • For example, we can easily see that the CarryIn for bits 1 and 2 is computed as: • $c_1 = a_0 b_0 + a_0 c_0 + b_0 c_0$ • $c_2 = a_1 b_1 + a_1 c_1 + b_1 c_1$ • Hardware executes in parallel, so using the following fast CarryIn computation, we can perform an add with 3 gate delays • $c_2 = a_1 b_1 + a_1 a_0 b_0 + a_1 a_0 c_0 + a_1 b_0 c_0 + b_1 a_0 b_0 + b_1 a_0 c_0 + b_1 b_0 c_0$ • I used the logical distributive law to compute this • As you can see, the CarryIn logic gets bigger and bigger for consecutive bits

Carry Lookahead • Carry Lookahead adders are faster than ripple-carry adders • Recall: • $c_{i+1} = a_i b_i + a_i c_i + b_i c_i$ • $c_i$ can be factored out… • $c_{i+1} = a_i b_i + (a_i + b_i) c_i$ • So… • $c_2 = a_1 b_1 + (a_1 + b_1)(a_0 b_0 + (a_0 + b_0) c_0)$

Carry Lookahead • Note the repeated appearance of $a_i b_i$ and $(a_i + b_i)$ • They are called generate ($g_i$) and propagate ($p_i$) • $g_i = a_i b_i$, $p_i = a_i + b_i$ • $c_{i+1} = g_i + p_i c_i$ • This means if $g_i = 1$, a CarryOut is generated • If $p_i = 1$, a CarryOut is propagated from CarryIn

Carry Lookahead • $c_1 = g_0 + p_0 c_0$ • $c_2 = g_1 + p_1 g_0 + p_1 p_0 c_0$ • $c_3 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0$ • $c_4 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_0$ • …This system will give us an adder with 5 gate delays but it is still too complex
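A brute-force check that the expanded lookahead equations produce the same carries as rippling through the bits one at a time. This is a sketch with 1-bit Python integers; both function names are illustrative, and bit lists are given least significant bit first.

```python
from itertools import product

def lookahead_carries(a, b, c0):
    """a, b: lists of 4 bits (LSB first). Returns [c1, c2, c3, c4]."""
    g = [x & y for x, y in zip(a, b)]   # generate
    p = [x | y for x, y in zip(a, b)]   # propagate
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]

def ripple_carries(a, b, c0):
    """Same carries computed one bit at a time, as a ripple-carry adder would."""
    carries, c = [], c0
    for x, y in zip(a, b):
        c = (x & y) | (x & c) | (y & c)
        carries.append(c)
    return carries

# All 2^9 combinations of a (4 bits), b (4 bits), and c0 agree.
for bits in product((0, 1), repeat=9):
    a, b, c0 = list(bits[0:4]), list(bits[4:8]), bits[8]
    assert lookahead_carries(a, b, c0) == ripple_carries(a, b, c0)
```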

Carry Lookahead • To solve this, we’ll build our adder using 4-bit adders with carry lookahead, and connect them using “super”-propagate and generate logic • The superpropagate is only true if all the bits propagate a carry • $P_0 = p_0 p_1 p_2 p_3$ • $P_1 = p_4 p_5 p_6 p_7$ • $P_2 = p_8 p_9 p_{10} p_{11}$ • $P_3 = p_{12} p_{13} p_{14} p_{15}$

Carry Lookahead • The supergenerate follows a similar equation: • $G_0 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$ • $G_1 = g_7 + p_7 g_6 + p_7 p_6 g_5 + p_7 p_6 p_5 g_4$ • $G_2 = g_{11} + p_{11} g_{10} + p_{11} p_{10} g_9 + p_{11} p_{10} p_9 g_8$ • $G_3 = g_{15} + p_{15} g_{14} + p_{15} p_{14} g_{13} + p_{15} p_{14} p_{13} g_{12}$ • The supergenerate and superpropagate logic for the four 4-bit Carry Lookahead adders is contained in a Carry Lookahead Unit • This yields a worst-case delay of 7 gate delays • Reason?
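A sketch of a 16-bit adder organized this way: four 4-bit blocks whose block-level carries come from the super-propagate and super-generate signals. The function name `cla16` is illustrative. For brevity the block carries and the carries inside each block are written as recurrences; the hardware would expand them into two-level logic like the $c_1$ to $c_4$ equations.

```python
def cla16(a, b, c0=0):
    """Add two 16-bit patterns. Returns (16-bit sum, carry out)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(16)]  # generate
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(16)]  # propagate

    # Super-propagate and super-generate for each 4-bit block.
    P, G = [], []
    for k in range(0, 16, 4):
        P.append(p[k] & p[k + 1] & p[k + 2] & p[k + 3])
        G.append(g[k + 3] | (p[k + 3] & g[k + 2])
                 | (p[k + 3] & p[k + 2] & g[k + 1])
                 | (p[k + 3] & p[k + 2] & p[k + 1] & g[k]))

    # Carry Lookahead Unit: carry into each block (C[4] is the final carry out).
    C = [c0]
    for j in range(4):
        C.append(G[j] | (P[j] & C[j]))

    # Each block forms its sum bits from its own carry-in.
    result = 0
    for j in range(4):
        c = C[j]
        for i in range(4 * j, 4 * j + 4):
            result |= (((a >> i) & 1) ^ ((b >> i) & 1) ^ c) << i
            c = g[i] | (p[i] & c)
    return result, C[4]

print(cla16(0x1234, 0x0FCC))  # (8704, 0), i.e. 0x2200
print(cla16(0xFFFF, 1))       # (0, 1)
```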

Carry Lookahead • We’ve covered all ALU functions except for the shifter • We’ll talk about the shifter later

Lecture 4: Arithmetic for Computers (Part 4) CS 447 Jason Bakos

Binary Multiplication • In multiplication, the first operand is called the multiplicand, and the second is called the multiplier • The result is called the product • Not counting the sign bits, if we multiply an n-bit multiplicand by an m-bit multiplier, we’ll get an (n+m)-bit product

Binary Multiplication • Binary multiplication works exactly like decimal multiplication • In fact, multiply 100101 by 111001 and pretend you’re using decimal numbers
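The paper-and-pencil method amounts to adding a shifted copy of the multiplicand for every 1 bit in the multiplier. A minimal sketch for unsigned values (`shift_add_multiply` is an illustrative name):

```python
def shift_add_multiply(multiplicand, multiplier):
    """Unsigned multiply by adding shifted copies of the multiplicand."""
    product = 0
    while multiplier:
        if multiplier & 1:              # this multiplier bit is 1:
            product += multiplicand     # add the current partial product
        multiplicand <<= 1              # next row is shifted one place left
        multiplier >>= 1                # move on to the next multiplier bit
    return product

result = shift_add_multiply(0b100101, 0b111001)
print(result, bin(result))  # 2109 0b100000111101
```

Here 100101 is 37 and 111001 is 57; the product 2109 needs 12 bits, matching the n+m rule for two 6-bit operands.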

First Hardware Design for Multiplier • Note that the multiplier is not routed into the ALU

Second Hardware Design for Multiplier • Architects realized that at least half of the bits in the multiplicand register were always 0 • Reduce the ALU to 32 bits and shift the product right instead of shifting the multiplicand left • In this case, the multiplicand register is only 32 bits

Second Hardware Design for Multiplier

Final Hardware Design for Multiplier • Let’s combine the product register with the multiplier register… • Put the multiplier in the right half of the product register and initialize the left half with zeros; when we’re done, the product fills the whole register (upper half on the left, lower half on the right)
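A sketch of how the combined register behaves, for unsigned operands only. The name `multiply_final` and the `n` parameter are illustrative. A Python integer stands in for the 2n-bit register, with one extra bit above it playing the role of the adder's carry out, which is shifted back in on the next step.

```python
def multiply_final(multiplicand, multiplier, n=32):
    """Unsigned n-bit multiply using one shared product/multiplier register."""
    mask = (1 << n) - 1
    # Left half starts at zero; right half holds the multiplier.
    product = multiplier & mask
    for _ in range(n):
        if product & 1:
            # Low bit is the current multiplier bit: add into the left half.
            product += (multiplicand & mask) << n
        # Shift right: drops the used multiplier bit, makes room on the left.
        product >>= 1
    return product

print(multiply_final(3, 5, n=4))             # 15
print(multiply_final(15, 15, n=4))           # 225
print(hex(multiply_final(0xFFFFFFFF, 2)))    # 0x1fffffffe
```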

Final Hardware Design for Multiplier

Final Hardware Design for Multiplier • For the first two designs, the multiplicand and the multiplier must first be converted to positive numbers • The signs need to be remembered so the product can be converted to whatever sign it needs to be • The third design will deal with signed numbers, as long as the sign bit is extended in the product register
