Advanced ALU

Advanced ALU Lecture 5 CS365

This Lecture: Roadmap
• Implementation of MIPS ALU
• Signed and unsigned numbers (Section 3.2)
• Addition and subtraction (Section 3.3)
• Constructing an arithmetic logic unit (Appendix B.5, B.6)
• Multiplication (Section 3.4)
• Division (Section 3.5)
• Floating point (Section 3.6)
CS465 F08

Unsigned Multiply
• Paper and pencil example (unsigned):
    Multiplicand          1000
    Multiplier          x 1001
                          1000
                         0000
                        0000
                       1000
    Product           01001000
• m bits x n bits = m+n bit product (at most)
• Binary makes it easy:
  • 0 => place 0 (0 x multiplicand)
  • 1 => place a copy (1 x multiplicand)
• 4 versions of multiply hardware & algorithm:
  • Successive refinement

Unsigned Combinational Multiplier
[Figure: 4-bit array multiplier. Starting from a row of 0s, each stage adds A3 A2 A1 A0 gated by one multiplier bit (B0..B3); the outputs are product bits P7..P0.]
• Stage i accumulates A * 2^i if Bi == 1

How Does It Work?
[Figure: the same array, with A3 A2 A1 A0 shifted one position left at each stage and selected by B0..B3, producing P7..P0.]
• At each stage shift A left (x 2)
• Use next bit of B to determine whether to add in shifted multiplicand
• Accumulate 2n bit partial product at each stage

Unsigned Multiplier (Version 1)
• Shift-add multiplier
• 64-bit Multiplicand reg, 64-bit ALU, 64-bit Product reg, 32-bit Multiplier reg
[Figure 3.5: Multiplicand register (64 bits, shift left) and Product register (64 bits, write) feed the 64-bit ALU; Multiplier register (32 bits, shift right); Control drives the shifts and the write.]

Multiply Algorithm Version 1 (Figure 3.6)
Start
1. Test Multiplier0
   • Multiplier0 = 1: 1a. Add multiplicand to product & place the result in Product register
   • Multiplier0 = 0: no add
2. Shift the Multiplicand register left 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd repetition?
   • No (< 32 repetitions): go back to step 1
   • Yes (32 repetitions): Done

Example: 0010 x 0011
    Product      Multiplier   Multiplicand
    0000 0000    0011         0000 0010
    0000 0010    0001         0000 0100
    0000 0110    0000         0000 1000
    0000 0110    0000         0001 0000
    0000 0110
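The Version 1 flowchart can be simulated in a few lines of Python. This is a minimal sketch, not the hardware itself: the function name `multiply_v1` and the register-width parameter `n` are assumptions made for illustration (n = 4 reproduces the 0010 x 0011 example, n = 32 matches the 64-bit registers on the slide).

```python
def multiply_v1(multiplicand, multiplier, n=32):
    """Shift-add multiply in the style of Version 1 (hypothetical helper).

    The multiplicand and product live in 2n-bit registers; the multiplier
    lives in an n-bit register that is shifted right each repetition.
    """
    mask = (1 << (2 * n)) - 1          # width of the Multiplicand/Product registers
    product = 0
    for _ in range(n):                 # n repetitions
        if multiplier & 1:             # 1. test Multiplier0
            product = (product + multiplicand) & mask   # 1a. add
        multiplicand = (multiplicand << 1) & mask       # 2. shift left
        multiplier >>= 1                                # 3. shift right
    return product


print(format(multiply_v1(0b0010, 0b0011, n=4), "08b"))
```

With n = 4 the printed product register is the final row of the example trace.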

Observations on Multiply Version 1
• One clock cycle per step => ~100 clocks per multiply (3 steps x 32 repetitions)
• Half of the bits in the multiplicand are always 0 => a 64-bit adder is wasted
• 0s are inserted when the multiplicand shifts left => the least significant bits of the product never change once formed
• Instead of shifting the multiplicand left, shift the product right?

How Does It Work?
[Figure: array with A3 A2 A1 A0 held in the same columns at every stage, selected by B0..B3, starting from 0 0 0 0 and producing P7..P0.]
• Multiplicand stays still and product moves right

Unsigned Multiplier Version 2
• Refined from the first version: 32-bit Multiplicand reg, 32-bit ALU, 64-bit Product reg, 32-bit Multiplier reg
[Figure: Multiplicand register (32 bits) feeds the 32-bit ALU; Product register (64 bits, shift right, write); Multiplier register (32 bits, shift right); Control.]

Multiply Algorithm Version 2
Start
1. Test Multiplier0
   • Multiplier0 = 1: 1a. Add multiplicand to the left half of product & place the result in the left half of Product register
   • Multiplier0 = 0: no add
2. Shift the Product register right 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd repetition?
   • No (< 32 repetitions): go back to step 1
   • Yes (32 repetitions): Done

Example: 0010 x 0011
    Step   Product      Multiplier   Multiplicand
           0000 0000    0011         0010
    1:     0010 0000    0011         0010
    2:     0001 0000    0011         0010
    3:     0001 0000    0001         0010
    1:     0011 0000    0001         0010
    2:     0001 1000    0001         0010
    3:     0001 1000    0000         0010
    1:     0001 1000    0000         0010
    2:     0000 1100    0000         0010
    3:     0000 1100    0000         0010
    1:     0000 1100    0000         0010
    2:     0000 0110    0000         0010
    3:     0000 0110    0000         0010

Observations on Multiply Version 2
• Useful portion of multiplier shrinks while valid portion of product grows in the process of multiplication
• Product register wastes space that exactly matches size of multiplier
• => Combine Multiplier register and Product register?

MULTIPLY Hardware Version 3
• 32-bit Multiplicand reg, 32-bit ALU, 64-bit Product reg, (0-bit Multiplier reg)
[Figure: Multiplicand register (32 bits) feeds the 32-bit ALU; Product register (64 bits, shift right, write) holds the Multiplier in its right half; Control.]

Multiply Algorithm Version 3
Start
1. Test Product0
   • Product0 = 1: 1a. Add multiplicand to the left half of product & place the result in the left half of Product register
   • Product0 = 0: no add
2. Shift the Product register right 1 bit.
32nd repetition?
   • No (< 32 repetitions): go back to step 1
   • Yes (32 repetitions): Done

Example: 0010 x 0011
    Multiplicand   Product
    0010           0000 0011
                   0010 0011
    0010           0001 0001
                   0011 0001
    0010           0001 1000
    0010           0000 1100
    0010           0000 0110
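A short Python sketch of Version 3 makes the shared register explicit. The name `multiply_v3` and the width parameter `n` are assumptions for illustration. One point the sketch glosses over: Python integers are unbounded, so the carry out of the left-half addition is kept automatically, whereas a fixed-width register needs to capture that carry before the shift.

```python
def multiply_v3(multiplicand, multiplier, n=32):
    """Version 3 style multiply: the multiplier starts in the right half
    of the 2n-bit Product register (hypothetical helper)."""
    product = multiplier & ((1 << n) - 1)   # left half = 0, right half = multiplier
    for _ in range(n):
        if product & 1:                     # 1. test Product0
            product += multiplicand << n    # 1a. add into the left half (carry kept)
        product >>= 1                       # 2. shift Product right 1 bit
    return product


print(format(multiply_v3(0b0010, 0b0011, n=4), "08b"))
```

Each repetition consumes one multiplier bit from the right end and frees one bit on the left for the growing product, which is why no separate Multiplier register is needed.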

Multiply Version 3
• Two steps per bit because Multiplier & Product combined
• MIPS multiplication (unsigned):
    multu $t1, $t2    # t1 * t2
• No destination register: product could be ~2^64; need two special registers to hold it
• Registers Hi and Lo are left and right half of Product
• To use the result:
    mfhi $t3
    mflo $t4

Signed Multiplication
• What about signed multiplication?
• Easiest solution is to make both positive & remember whether to complement product when done
• Apply definition of 2’s complement
  • Need to sign-extend partial products
• Booth’s algorithm is an elegant way to multiply signed and unsigned numbers using the same hardware and save cycles
  • Can handle multiple bits at a time

Motivation for Booth’s Algorithm
• Example 2 x 6 = 0010 x 0110:
          0010
        x 0110
      +   0000    shift (0 in multiplier)
      +  0010     add (1 in multiplier)
      + 0010      add (1 in multiplier)
      +0000       shift (0 in multiplier)
       00001100
• Basic idea: replace a string of 1s with an initial subtract on seeing the first 1, and an add after the last 1:
          0010
        x 0110
          0000    shift (0 in multiplier)
      –  0010     sub (first 1 in multiplier)    [0010 x (-2)]
        0000      shift (mid string of 1s)
      +0010       add (prior step had last 1)    [0010 x 8]
       00001100
• Can get the same result in more than one way:
    6 = –2 + 8
    0110 = –0010 + 1000

Booth’s Algorithm
    Current bit   Bit to right   Explanation           Example      Op
    1             0              Begins run of 1s      0001111000   sub
    1             1              Middle of run of 1s   0001111000   none
    0             1              End of run of 1s      0001111000   add
    0             0              Middle of run of 0s   0001111000   none
• Originally for speed when shift was faster than add
• Big advantage: it handles signed numbers
• Why does it work? A run of 1s is a power of two minus 1: –1 + 10000 = 01111

Booth’s Algorithm
1. Depending on the current and previous bits, do one of the following:
   • 00: Middle of a string of 0s, no arithmetic op
   • 01: End of a string of 1s, so add multiplicand to the left half of the product
   • 10: Beginning of a string of 1s, so subtract multiplicand from the left half of the product
   • 11: Middle of a string of 1s, so no arithmetic op
2. As in the previous algorithm, shift the Product register right (arithmetically) 1 bit
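The two steps above can be sketched in Python as follows. The function name `booth_multiply` and the parameter `n` are assumptions for illustration. The left half of the product is held as a signed Python integer, which quietly sidesteps the overflow a fixed n-bit left half can hit when the multiplicand is the most negative value; real hardware has to deal with that case separately.

```python
def booth_multiply(multiplicand, multiplier, n=32):
    """Booth's algorithm on signed n-bit operands (hypothetical helper).

    left  : left half of the Product register, kept as a signed int
    right : right half, initially the multiplier's n-bit pattern
    prev  : the extra bit to the right of the Product register
    """
    mask = (1 << n) - 1
    left, right, prev = 0, multiplier & mask, 0
    for _ in range(n):
        cur = right & 1
        if (cur, prev) == (1, 0):      # beginning of a run of 1s: subtract
            left -= multiplicand
        elif (cur, prev) == (0, 1):    # end of a run of 1s: add
            left += multiplicand
        prev = cur                     # 00 and 11: no arithmetic op
        # Arithmetic right shift of the combined register: the low bit of
        # the left half moves into the top bit of the right half.
        right = (right >> 1) | ((left & 1) << (n - 1))
        left >>= 1                     # Python's >> preserves the sign
    return left * (1 << n) + right


print(booth_multiply(2, 7, n=4), booth_multiply(2, -3, n=4))
```

The two calls correspond to the 2 x 7 and 2 x -3 worked examples with 4-bit operands.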

Booth’s Example (2 x 7)
    Step                Multiplicand   Product        Operation
    0. initial value    0010           0000 0111 0    10 -> sub
    1. P = P - m        1110          +1110
                                       1110 0111 0    shift P (sign ext)
    2.                  0010           1111 0011 1    11 -> nop, shift
    3.                  0010           1111 1001 1    11 -> nop, shift
    4.                  0010           1111 1100 1    01 -> add
                        0010          +0010
                                       0001 1100 1    shift
    done                0010           0000 1110 0

Booth’s Example (2 x -3)
    Step                Multiplicand   Product        Operation
    0. initial value    0010           0000 1101 0    10 -> sub
    1. P = P - m        1110          +1110
                                       1110 1101 0    shift P (sign ext)
    2.                  0010           1111 0110 1    01 -> add
                        0010          +0010
                                       0001 0110 1    shift P
    3.                  0010           0000 1011 0    10 -> sub
                        1110          +1110
                        0010           1110 1011 0    shift
    4.                  0010           1111 0101 1    11 -> nop
                                       1111 0101 1    shift
    done                0010           1111 1010 1

Outline
• Implementation of MIPS ALU
• Signed and unsigned numbers (Section 3.2)
• Addition and subtraction (Section 3.3)
• Constructing an arithmetic logic unit (Appendix B.5, B.6)
• Multiplication (Section 3.4)
• Booth’s algorithm: CD: 3.23 In More Depth
• Division (Section 3.5)
• Floating point (Section 3.6)

DIVIDE: Paper & Pencil
                      1001    Quotient
    Divisor 1000 | 1001010    Dividend
                  –1000
                      10
                      101
                      1010
                     –1000
                        10    Remainder (or Modulo result)
• See how big a number can be subtracted, creating quotient bit on each step
• Binary => 1 * divisor or 0 * divisor
• Dividend = Quotient x Divisor + Remainder
• 3 versions of divide, successive refinement

DIVIDE Hardware Version 1
• 64-bit Divisor reg, 64-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg
[Figure: Divisor register (64 bits, shift right) and Remainder register (64 bits, write) feed the 64-bit ALU; Quotient register (32 bits, shift left); Control.]

Divide Algorithm Version 1
• Takes n+1 iterations for n-bit Quotient & Remainder
Start: Place Dividend in Remainder
1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register.
Test Remainder
   • Remainder ≥ 0: 2a. Shift the Quotient register to the left, setting the new rightmost bit to 1.
   • Remainder < 0: 2b. Restore the original value by adding the Divisor register to the Remainder register, & place the sum in the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0.
3. Shift the Divisor register right 1 bit.
n+1 repetition?
   • No (< n+1 repetitions): go back to step 1
   • Yes (n+1 repetitions): Done (n = 4 here)

Example: 0000 0111 ÷ 0010
    Iteration   Step   Quot.   Divisor     Rem.
    0           init   0000    0010 0000   0000 0111
    1           1      0000    0010 0000   1110 0111
                2b     0000    0010 0000   0000 0111
                3      0000    0001 0000   0000 0111
    2           1      0000    0001 0000   1111 0111
                2b     0000    0001 0000   0000 0111
                3      0000    0000 1000   0000 0111
    3           1      0000    0000 1000   1111 1111
                2b     0000    0000 1000   0000 0111
                3      0000    0000 0100   0000 0111
    4           1      0000    0000 0100   0000 0011
                2a     0001    0000 0100   0000 0011
                3      0001    0000 0010   0000 0011
    5           1      0001    0000 0010   0000 0001
                2a     0011    0000 0010   0000 0001
                3      0011    0000 0001   0000 0001
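The restoring division loop of Version 1 can be sketched in Python for unsigned operands. The name `divide_v1` and the width parameter `n` are assumptions for illustration, and the sketch assumes a nonzero divisor.

```python
def divide_v1(dividend, divisor, n=32):
    """Restoring division in the style of Version 1 (hypothetical helper).

    The divisor starts in the left half of a 2n-bit register and is
    shifted right once per iteration; n + 1 iterations are needed.
    """
    remainder = dividend            # Start: place Dividend in Remainder
    div = divisor << n              # divisor in the left half
    quotient = 0
    for _ in range(n + 1):
        remainder -= div            # 1. subtract
        if remainder >= 0:
            quotient = (quotient << 1) | 1   # 2a. shift in a 1
        else:
            remainder += div                 # 2b. restore ...
            quotient <<= 1                   #     ... and shift in a 0
        div >>= 1                   # 3. shift Divisor right
    return quotient & ((1 << n) - 1), remainder


print(divide_v1(0b0111, 0b0010, n=4))
```

For the 7 ÷ 2 example with n = 4 the first three iterations all restore, and only the last two produce quotient bits of 1.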

Observations on Divide Version 1
• 1st step cannot produce a 1 in quotient bit (otherwise quotient is too big for the register) => switch order to shift first and then subtract => save 1 iteration
• 1/2 of the bits in divisor are always 0 => 1/2 of 64-bit adder is wasted => 1/2 of divisor is wasted
• Instead of shifting divisor to right, shift remainder to left?

DIVIDE Hardware Version 2
• 32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg
[Figure: Divisor register (32 bits) feeds the 32-bit ALU; Remainder register (64 bits, shift left, write); Quotient register (32 bits, shift left); Control.]

Divide Algorithm Version 2
Start: Place Dividend in Remainder
1. Shift the Remainder register left 1 bit.
2. Subtract the Divisor register from the left half of the Remainder register, & place the result in the left half of the Remainder register.
Test Remainder
   • Remainder ≥ 0: 3a. Shift the Quotient register to the left, setting the new rightmost bit to 1.
   • Remainder < 0: 3b. Restore the original value by adding the Divisor register to the left half of the Remainder register, & place the sum in the left half of the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0.
nth repetition?
   • No (< n repetitions): go back to step 1
   • Yes (n repetitions): Done

Observations on Divide Version 2
• Remainder shrinks while Quotient grows
• Eliminate Quotient register by combining with Remainder register
  • Start by shifting the Remainder left as before
  • Only shifting once for both Remainder (left half) and Quotient (right half): 2 steps each iteration
• Another consequence is that the remainder will be shifted left one more time
  • Thus the final correction step must shift back only the remainder in the left half of the register

DIVIDE HARDWARE Version 3
• 32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg, (0-bit Quotient reg)
[Figure: Divisor register (32 bits) feeds the 32-bit ALU; Remainder register (64 bits, shift left, write) with halves labeled “HI” and “LO”, the Quotient forming in the right half; Control.]

Divide Algorithm Version 3
Start: Place Dividend in Remainder
1. Shift the Remainder register left 1 bit. (Done once; the trace below has no step 2.1, 3.1, or 4.1.)
2. Subtract the Divisor register from the left half of the Remainder register, & place the result in the left half of the Remainder register.
Test Remainder
   • Remainder ≥ 0: 3a. Shift the Remainder register to the left, setting the new rightmost bit to 1.
   • Remainder < 0: 3b. Restore the original value by adding the Divisor register to the left half of the Remainder register, & place the sum in the left half of the Remainder register. Also shift the Remainder register to the left, setting the new least significant bit to 0.
nth repetition?
   • No (< n repetitions): go back to step 2
   • Yes (n repetitions; n = 4 here): Done. Shift left half of Remainder right 1 bit

Example: 0000 0111 ÷ 0010
    Step   Remainder    Div.
    0      0000 0111    0010
    1.1    0000 1110
    1.2    1110 1110
    1.3b   0001 1100
    2.2    1111 1100
    2.3b   0011 1000
    3.2    0001 1000
    3.3a   0011 0001
    4.2    0001 0001
    4.3a   0010 0011
    done   0001 0011
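A Python sketch of Version 3 shows the quotient forming in the right half of the Remainder register. The name `divide_v3` and the parameter `n` are assumptions for illustration, and the divisor is assumed nonzero. Because Python integers are unbounded, the bit that can spill out of the top of the left half during a shift is kept; a fixed 2n-bit register would need to preserve that carry some other way. The restore in step 3b is implicit here, since the subtraction result is simply not written back.

```python
def divide_v3(dividend, divisor, n=32):
    """Version 3 style restoring division with a combined
    Remainder/Quotient register (hypothetical helper)."""
    low_mask = (1 << n) - 1
    rem = (dividend & low_mask) << 1             # 1. initial shift left (once)
    for _ in range(n):
        left = (rem >> n) - divisor              # 2. subtract from left half
        if left >= 0:
            rem = (left << n) | (rem & low_mask) #    keep the difference
            rem = (rem << 1) | 1                 # 3a. shift left, new bit = 1
        else:
            rem = rem << 1                       # 3b. shift left, new bit = 0
    quotient = rem & low_mask                    # right half
    remainder = rem >> (n + 1)                   # left half, shifted right 1 bit
    return quotient, remainder


print(divide_v3(0b0111, 0b0010, n=4))
```

The final `rem >> (n + 1)` combines extracting the left half with the correction shift named in the "Done" box.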

More Division Issues
• Signed divides: simplest is to remember signs, make positive, and complement quotient and remainder if necessary
  • Satisfy Dividend = Quotient x Divisor + Remainder
  • Note: Dividend and Remainder must have same sign
  • Note: Quotient negated if Divisor sign & Dividend sign disagree, e.g., –7 ÷ 2 = –3, remainder = –1
  • Quotient = –4 and remainder = 1 also satisfies the equation but is not correct, because the remainder's sign differs from the dividend's
• Possible for quotient to be too large
  • If divide 64-bit integer by 1, quotient is 64 bits (called “saturation”)
  • MIPS ignores overflow; software must detect and handle it
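The sign rules can be checked with a small Python sketch. The helper name `signed_divide` is an assumption for illustration. Note that Python's own `//` and `%` operators round toward negative infinity, so they return the other pair (quotient –4, remainder 1) for –7 ÷ 2, which is exactly the combination these rules exclude.

```python
def signed_divide(dividend, divisor):
    """Divide magnitudes, then fix signs (hypothetical helper):
    quotient is negative if the operand signs disagree,
    remainder takes the sign of the dividend."""
    q = abs(dividend) // abs(divisor)
    r = abs(dividend) % abs(divisor)
    if (dividend < 0) != (divisor < 0):
        q = -q
    if dividend < 0:
        r = -r
    return q, r


for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    q, r = signed_divide(a, b)
    # The defining identity holds in every sign combination.
    assert a == q * b + r
    print(a, b, q, r)
```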

Recap
• Same hardware for Divide as Multiply: just need ALU to add or subtract, and 64-bit register to shift left or shift right
• Hi and Lo registers in MIPS combine to act as 64-bit register for multiply and divide
    div $s2, $s3    # Lo = $s2 / $s3, Hi = $s2 mod $s3
• Successive refinement to see final design
• Booth’s algorithm to handle signed multiplies
• Overflow
  • Both MIPS multiply and divide instructions ignore overflow
  • Software must check for overflow scenarios

Outline
• Implementation of MIPS ALU
• Signed and unsigned numbers (Section 3.2)
• Addition and subtraction (Section 3.3)
• Constructing an arithmetic logic unit (Appendix B.5, B.6)
• Multiplication (Section 3.4, CD: 3.23 In More Depth)
• Division (Section 3.5)
• Floating point (Section 3.6)

Floating-Point: Motivation
• What can be represented in N bits?
  • Unsigned: 0 to 2^N – 1
  • 2’s Complement: –2^(N-1) to 2^(N-1) – 1
  • 1’s Complement: –2^(N-1) + 1 to 2^(N-1) – 1
  • Binary Coded Decimal: 0 to 10^(N/4) – 1
• But, what about?
  • Very large numbers? 9,349,398,989,787,762,244,859,087,678
  • Very small numbers? 0.0000000000000000000000045691
  • Rational numbers: 2/3
  • Irrational numbers: pi

Recall Scientific Notation: Binary
• Normalized form: no leading 0s, exactly one digit to left of decimal/binary point
• Alternatives to represent 1/1,000,000,000
  • Normalized: 1.0 x 10^-9
  • Not normalized: 0.1 x 10^-8, 10.0 x 10^-10
• Computer arithmetic that supports scientific notation numbers is called floating point, because the binary point is not fixed, as it is for integers
• Example: 1.01 x 2^23, where 1.01 is the significand (mantissa), the "." is the binary point, 2 is the radix (base), and 23 is the exponent

FP Representation
• Normalized format: 1.xxxxxxxxxx_two x 2^(yyyy_two)
• Want it to occupy a whole number of words: 32 bits for single-precision and 64 bits for double-precision
• A simple single-precision representation:
    Bit 31    Bits 30-23    Bits 22-0
    S         Exponent      Fraction
    1 bit     8 bits        23 bits
  • S represents sign
  • Exponent represents y’s
  • Fraction represents x’s
• Represent numbers as small as 2.0 x 10^-38 to as large as 2.0 x 10^38

Double Precision Representation
• Next multiple of word size (64 bits)
    Bit 31    Bits 30-20    Bits 19-0
    S         Exponent      Fraction
    1 bit     11 bits       20 bits
    Second word: Fraction (cont’d), 32 bits
• Double precision (vs. single precision)
  • Represent numbers almost as small as 2.0 x 10^-308 to almost as large as 2.0 x 10^308
  • But primary advantage is greater accuracy due to larger fraction

IEEE 754 Standard (1/4)
• Regarding single precision, DP similar
• Sign bit: 1 means negative, 0 means positive
• Significand:
  • To pack more bits, leading 1 implicit for normalized numbers
  • Significand = 1 + fraction: 1 + 23 bits single, 1 + 52 bits double
  • Always true: 0 <= fraction < 1 (for normalized numbers), so 1 <= significand < 2
• Note: 0 has no leading 1, so reserve exponent value 0 just for number 0

IEEE 754 Standard (2/4)
• Exponent:
  • Need to represent positive and negative exponents
  • Also want to compare FP numbers as if they were integers, to help in value comparisons
  • What if we use 2’s complement for the exponent? E.g., 1.0 x 2^-1 versus 1.0 x 2^+1 (1/2 versus 2):
      1/2:  0 1111 1111 000 0000 0000 0000 0000 0000
      2:    0 0000 0001 000 0000 0000 0000 0000 0000
  • If we use integer comparison for these two words, we will conclude that 1/2 > 2!

Biased (Excess) Notation
• Remapping: shift all values by subtracting a bias from the unsigned representation
• Biased 7:
    Bit-pattern   Biased-7 value      Bit-pattern   Biased-7 value
    0000          -7                  1000          1
    0001          -6                  1001          2
    0010          -5                  1010          3
    0011          -4                  1011          4
    0100          -3                  1100          5
    0101          -2                  1101          6
    0110          -1                  1110          7
    0111          0                   1111          8
• All-zero pattern is the smallest; all-one pattern is the largest
• Now integer comparison can be used directly to compare exponents
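The biased-7 mapping is small enough to sketch directly in Python. The helper names `to_biased` and `from_biased` are assumptions for illustration.

```python
def to_biased(value, bias=7, bits=4):
    """Encode a signed value as a biased bit pattern (hypothetical helper)."""
    pattern = value + bias
    if not 0 <= pattern < (1 << bits):
        raise ValueError("value out of range for this bias and width")
    return format(pattern, f"0{bits}b")


def from_biased(pattern, bias=7):
    """Decode a biased bit pattern: unsigned value minus the bias."""
    return int(pattern, 2) - bias


# Bit patterns sort in the same order as the values they represent.
patterns = [to_biased(v) for v in range(-7, 9)]
assert patterns == sorted(patterns)
print(to_biased(-7), to_biased(0), to_biased(8))
```

Sorting the pattern strings gives the same order as sorting the values, which is the property that lets exponents be compared as plain unsigned integers.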

IEEE 754 Standard (3/4)
• Use biased notation for exponents, where bias is the number subtracted to get the real number
  • IEEE 754 uses bias of 127 for single precision: subtract 127 from Exponent field to get actual value for exponent
  • 1023 is the bias for double precision
      1/2:  0 0111 1110 000 0000 0000 0000 0000 0000
      2:    0 1000 0000 000 0000 0000 0000 0000 0000
• All-zero and all-one exponent patterns are reserved for special numbers (Fig 3.15)
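A quick Python check of the comparison property, using the standard `struct` module to obtain the 32-bit pattern of a single-precision value. The helper name `float_bits` and the sample values are assumptions for illustration, and the check is limited to non-negative numbers, since the sign bit is handled separately from the biased exponent.

```python
import struct


def float_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an unsigned int."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


values = [2.0, 0.5, 100.0, 0.75, 1.0]
# Sorting by the raw integer pattern gives the same order as sorting by value.
by_bits = sorted(values, key=float_bits)
print(by_bits, hex(float_bits(0.5)), hex(float_bits(2.0)))
```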

IEEE 754 Standard (4/4)
• Summary (single precision):
    Bit 31    Bits 30-23    Bits 22-0
    S         Exponent      Fraction
    1 bit     8 bits        23 bits
    (-1)^S x (1.Fraction) x 2^(Exponent - 127)
• Double precision identical, except longer fraction and exponent segments with exponent bias of 1023
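The summary formula translates directly into Python. This is a sketch for normalized numbers only; the function name `decode_single` is an assumption for illustration, and the reserved all-zero and all-one exponent fields are rejected rather than interpreted.

```python
def decode_single(word):
    """Apply (-1)^S x (1.Fraction) x 2^(Exponent - 127) to a 32-bit pattern."""
    sign = (word >> 31) & 0x1
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    if exponent in (0, 255):
        raise ValueError("reserved exponent: not a normalized number")
    significand = 1 + fraction / 2**23      # implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)


print(decode_single(0x3F000000), decode_single(0x40000000), decode_single(0xBF400000))
```

The three sample words are the patterns for 1/2, 2, and -0.75.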

Example: FP to Decimal
    0 0110 1000 101 0101 0100 0011 0100 0010
• Sign: 0 => positive
• Exponent:
  • 0110 1000_two = 104_ten
  • Bias adjustment: 104 - 127 = -23
• Fraction:
  • 1 + 2^-1 + 2^-3 + 2^-5 + 2^-7 + 2^-9 + 2^-14 + 2^-15 + 2^-17 + 2^-22 = 1.0 + 0.666115
• Represents: 1.666115_ten x 2^-23 ≈ 1.986 x 10^-7
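The worked example can be cross-checked with Python's standard `struct` module, which reinterprets the 32-bit pattern as a single-precision value. The variable names are assumptions for illustration.

```python
import struct

pattern = "0 0110 1000 101 0101 0100 0011 0100 0010"
word = int(pattern.replace(" ", ""), 2)          # the 32-bit word as an integer

exponent_field = (word >> 23) & 0xFF             # 8-bit biased exponent
fraction_field = word & 0x7FFFFF                 # 23-bit fraction
significand = 1 + fraction_field / 2**23         # implicit leading 1
value = struct.unpack(">f", struct.pack(">I", word))[0]

print(exponent_field, exponent_field - 127, round(significand, 6), value)
```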

Example 1: Decimal to FP
• Number = -0.75
    = -0.11_two x 2^0 (scientific notation)
    = -1.1_two x 2^-1 (normalized scientific notation)
• Sign: negative => 1
• Exponent:
  • Bias adjustment: -1 + 127 = 126
  • 126_ten = 0111 1110_two
• Result: 1 0111 1110 100 0000 0000 0000 0000 0000

Example 2: Decimal to FP
• A more difficult case: representing 1/3?
    = 0.33333..._ten
    = 0.0101010101..._two x 2^0
    = 1.0101010101..._two x 2^-2
• Sign: 0
• Exponent = -2 + 127 = 125_ten = 0111 1101_two
• Fraction = 0101010101...
• Result: 0 0111 1101 0101 0101 0101 0101 0101 010
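Both decimal-to-FP examples can be reproduced with Python's standard `struct` module. The helper name `single_bits` is an assumption for illustration. One difference is worth noticing: `struct` rounds to the nearest representable value, so for 1/3 the last fraction bit comes out as 1, whereas the pattern above simply truncates the repeating 0101... sequence after 23 bits.

```python
import struct


def single_bits(x):
    """Format x as 'sign exponent fraction' in IEEE 754 single precision."""
    word = struct.unpack(">I", struct.pack(">f", x))[0]
    s = format(word, "032b")
    return f"{s[0]} {s[1:9]} {s[9:]}"


print(single_bits(-0.75))
print(single_bits(1 / 3))
```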

Special Numbers
• What have we defined so far? (single precision)
    Exponent   Fraction   Object
    0          0          0
    0          nonzero    ???
    1-254      anything   +/- floating-point
    255        0          ???
    255        nonzero    ???
• Range:
  • Smallest: 1.0 x 2^-126 ≈ 1.2 x 10^-38. What if result too small? (> 0, < 1.2 x 10^-38 => Underflow!)
  • Largest: (2 – 2^-23) x 2^127 ≈ 3.4 x 10^38. What if result too large? (> 3.4 x 10^38 => Overflow!)

Denormalized Numbers
• For single-precision f. p. (page 217)
  • 0 is 00000000000000000000000000000000
  • Smallest positive normalized number is 1.0000 0000 0000 0000 0000 000_two x 2^-126
  • This is actually a pretty big gap...
• Gradual underflow: allow a number to degrade in significance until it becomes 0
  • By getting rid of implicit leading 1 in front of the fraction... we can represent a smaller number
  • We denote a denormalized number by 0 exponent and a non-zero fraction
  • The smallest denormalized number is 0.0000 0000 0000 0000 0000 001_two x 2^-126, or 1.0 x 2^-149
• For double-precision f. p.
  • Smallest denormalized number: 1.0 x 2^-1074
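These limits can be confirmed by building the bit patterns directly in Python with the standard `struct` and `math` modules. The helper name `bits_to_single` and the variable names are assumptions for illustration.

```python
import math
import struct


def bits_to_single(word):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


smallest_normal = bits_to_single(0x00800000)   # exponent field 1, fraction 0
smallest_denorm = bits_to_single(0x00000001)   # exponent field 0, fraction 0...01
smallest_double_denorm = struct.unpack(">d", struct.pack(">Q", 1))[0]

print(smallest_normal, smallest_denorm, smallest_double_denorm)
```

`math.ldexp(1.0, e)` computes 1.0 x 2^e exactly, which makes it a convenient reference for comparing against these values.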

Special Numbers
• What have we defined so far? (single precision)
    Exponent   Fraction   Object
    0          0          0
    0          nonzero    denorm
    1-254      anything   +/- floating-point
    255        0          ???
    255        nonzero    ???

Representation for +/- Infinity
• In FP, divide by zero should produce +/- infinity, not overflow
• Why?
  • OK to do further computations with infinity, e.g., X/0 > Y may be a valid comparison
• IEEE 754 represents +/- infinity by all-one exponent and all-zero fraction:
    S 1111 1111 0000 0000 0000 0000 0000 000
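The infinity encoding can be checked in Python by constructing the bit pattern with the standard `struct` module. The helper and variable names are assumptions for illustration. The pattern is built by hand because Python itself raises `ZeroDivisionError` for a float division by zero instead of returning infinity.

```python
import math
import struct


def bits_to_single(word):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


pos_inf = bits_to_single(0x7F800000)   # S = 0, exponent all ones, fraction 0
neg_inf = bits_to_single(0xFF800000)   # S = 1, exponent all ones, fraction 0

# Infinity still takes part in comparisons and arithmetic.
print(pos_inf, neg_inf, pos_inf > 3.4e38, pos_inf + 1 == pos_inf)
```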
