Chapter 4: Arithmetic for Computers (Part 1)

CS 447

Jason Bakos

Notes on Project 1
• There are two different ways the following two words can be stored in a computer memory…
• word1 .byte 0,1,2,3
• word2 .half 0,1
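• One way is big-endian, where the most significant byte of each value sits at the lowest address, so the bytes appear in memory in their written order…
• (Figure: big-endian memory layout of word1 and word2)
• Another way is little-endian, where the least significant byte of each value sits at the lowest address, so the bytes appear in memory in reverse order…
• (Figure: little-endian memory layout of word1 and word2)
• This affects what the lw instruction returns: the same four bytes in memory produce a different 32-bit register value under each byte order…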
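To make the effect on lw concrete, here is a minimal Python sketch (Python is an assumption of this example; the slides themselves use MIPS assembly). It uses the standard `struct` module to reinterpret the four bytes of word1 as a 32-bit word under each byte order, and to lay out the two halfwords of word2.

```python
import struct

# The four bytes produced by `word1: .byte 0,1,2,3`, in increasing address order.
word1_bytes = bytes([0, 1, 2, 3])

# Interpret the same four bytes as one 32-bit word under each byte order,
# which is what a word load would place in the destination register.
big = struct.unpack(">I", word1_bytes)[0]
little = struct.unpack("<I", word1_bytes)[0]
print(f"big-endian word load:    0x{big:08x}")
print(f"little-endian word load: 0x{little:08x}")

# `word2: .half 0,1` stores two 16-bit values; the bytes inside each
# halfword are ordered according to the machine's byte order.
word2_big = struct.pack(">HH", 0, 1)
word2_little = struct.pack("<HH", 0, 1)
print("word2 big-endian bytes:   ", list(word2_big))
print("word2 little-endian bytes:", list(word2_little))
```

Reading the data one byte at a time avoids the issue entirely, because a single byte has no internal ordering.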
Notes on Project 1
• SPIM uses the byte order of the machine it runs on
• Intel x86 is little-endian, so we need to deal with that
• This affects assignment 1 because the input data is stored as a series of bytes
• If you use lw on your data set, each group of four bytes will appear in your destination register in reverse order
• Hint: Try the lb/sb instructions
• These load/store a single byte at any address (no word alignment is required), so byte order never comes into play
Notes on Project 1
• Hint: Use SPIM’s breakpoint and single-step features to help debug your program
• Also, make sure you use the registers and memory/stack displays
• Hint: You may want to temporarily store your input set into a word array for sorting
• Make sure you check Appendix A for additional useful instructions that I didn’t cover in class
• Make sure you comment your code!
Goals of Chapter 4
• Data representation
• Hardware mechanisms for performing arithmetic on data
• Hardware implications on the instruction set design
Review of Binary Representation
• Binary/Hex -> Decimal conversion
• Decimal -> Binary/Hex conversion
• Least/Most significant bits
• Highest representable number/maximum number of unique representable symbols
• Two’s complement representation
• One’s complement
• Finding signed number ranges ($-2^{n-1}$ to $2^{n-1}-1$)
• Doing arithmetic with two’s complement
• Sign extending with load half/byte
• Signed/unsigned comparison
• Binary subtraction works exactly like addition, except the second operand is first negated (converted to its two’s complement)
• Overflow in signed arithmetic occurs under the following conditions:
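The table of conditions did not survive in this transcript. As a hedged sketch, the Python below encodes the usual textbook rules, stated here as an assumption: an addition overflows when both operands have the same sign and the result has the opposite sign, and a subtraction overflows when the operands have different signs and the result's sign differs from the first operand's. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Wrap an integer to 32 bits and reinterpret it as two's complement."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def add_overflows(a, b):
    """True if the signed 32-bit sum a + b does not fit in 32 bits."""
    result = to_signed32(a + b)
    return (a >= 0 and b >= 0 and result < 0) or (a < 0 and b < 0 and result >= 0)

def sub_overflows(a, b):
    """True if the signed 32-bit difference a - b does not fit in 32 bits."""
    result = to_signed32(a - b)
    return (a >= 0 and b < 0 and result < 0) or (a < 0 and b >= 0 and result >= 0)

print(add_overflows(2**31 - 1, 1))   # largest positive plus one
print(add_overflows(5, -3))          # mixed signs can never overflow on add
```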
What Happens When Overflow Occurs?
• MIPS detects overflow with an exception/interrupt
• When an interrupt occurs, control branches to kernel code at address 0x80000080, where special registers (BadVAddr, Status, Cause, and EPC) are used to handle the interrupt
• SPIM has a simple interrupt handler built-in that deals with interrupts
• We may come back to interrupts later
Review of Shift and Logical Operations
• MIPS has operations for SLL, SRL, and SRA
• We covered this in the last chapter
• MIPS implements bit-wise AND, OR, and XOR logical operations
• These operations perform a bit-by-bit parallel logical operation on two registers
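• In C, use << and >> for shifts (>> on an unsigned value is a logical shift; on a negative signed value the result is implementation-defined, usually an arithmetic shift), and &, |, ^, and ~ for bitwise AND, OR, XOR, and NOT, respectively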
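As an illustration of how the three MIPS shifts differ, here is a small Python sketch that models them on 32-bit values. The function names mirror the MIPS mnemonics but are otherwise hypothetical helpers; Python integers are unbounded, so the results are masked to 32 bits explicitly.

```python
MASK32 = 0xFFFFFFFF

def sll(x, n):
    """Shift left logical: zeros enter from the right."""
    return (x << n) & MASK32

def srl(x, n):
    """Shift right logical: zeros enter from the left."""
    return (x & MASK32) >> n

def sra(x, n):
    """Shift right arithmetic: copies of the sign bit enter from the left."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32          # reinterpret the bit pattern as a negative number
    return (x >> n) & MASK32  # Python's >> on a negative int is arithmetic

print(hex(srl(0x80000000, 4)))
print(hex(sra(0x80000000, 4)))
```

The only difference between srl and sra shows up when the most significant bit is 1.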
Review of Logic Operations
• The three main parts of a CPU
• ALU (Arithmetic and Logic Unit)
• Performs all logical, arithmetic, and shift operations
• CU (Control Unit)
• Controls the CPU – performs load/store, branch, and instruction fetch
• Registers
• Physical storage locations for data
Review of Logic Operations
• In this chapter, our goal is to learn how the ALU is implemented
• The ALU is entirely constructed using boolean functions as hardware building blocks
• The 3 basic digital logic building blocks can be used to construct any digital logic system: AND, OR, and NOT
• These functions can be directly implemented using electric circuits (wires and transistors)
Review of Logic Operations
• These “combinational” logic devices can be assembled to create a much more complex digital logic system
Review of Logic Operations
• We need another device to build an ALU…
• This is called a multiplexor… it implements an if-then-else in hardware
A 1-bit ALU
• Perform logic operations in parallel and mux the output
• Next, we want to include addition, so let’s build a single-bit adder
• From the following table, we can construct the circuit for a full adder and link multiple full adders together to form a multi-bit adder
• We can also add this input to our ALU
• How do we give subtraction ability to our adder?
• How do we detect overflow and zero results?

Chapter 4: Arithmetic for Computers (Part 2)


Logic/Arithmetic
• From the truth table for the mux, we can use sum-of-products to derive the logic equation
• With sum-of-products, for each ‘1’ row for each output, we AND together all the inputs (inverting the input 0’s), then OR all the row products
• To make it simpler, let’s add “don’t cares” to the table…
Logic/Arithmetic
• This gives us the following equation
• (A and (not D)) or (B and D)
• We don’t need the inputs for the “don’t cares” in our partial products
• This is one way to simplify our logic equation
• Other ways include propositional calculus, Karnaugh Maps, and the Quine-McCluskey algorithm
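The mux equation can be checked against the if-then-else behavior it is meant to implement. The following Python sketch enumerates all eight input combinations; `mux2` is a hypothetical helper name, and bits are modeled as the integers 0 and 1.

```python
from itertools import product

def mux2(a, b, d):
    """2-to-1 mux as sum of products: (A and (not D)) or (B and D)."""
    return (a & (1 - d)) | (b & d)

def mux_matches_if_then_else():
    """Compare the logic equation with 'if D then B else A' on every row."""
    for a, b, d in product((0, 1), repeat=3):
        expected = b if d else a
        if mux2(a, b, d) != expected:
            return False
    return True

print(mux_matches_if_then_else())
```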
Logic/Arithmetic
• Here is a (crude) digital logic design for the 2-to-1 mux
• Note that multiple muxes can be assembled in stages to implement multiple-input muxes
Logic/Arithmetic
• For the adder, let’s minimize the logic using a Karnaugh Map…
• For CarryOut, we need $2^3 = 8$ entries…
• We can minimize this to
• CarryOut = A·B + CarryIn·A + CarryIn·B
Logic/Arithmetic
• The Sum output cannot be minimized this way, so we need the full sum of products:
• Sum = (not A)(not B)CarryIn + A·B·CarryIn + (not A)·B·(not CarryIn) + A·(not B)·(not CarryIn)
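A quick way to sanity-check full-adder equations is to compare them with ordinary integer addition on all eight input rows. The sketch below uses the standard full-adder sum-of-products forms (CarryOut is 1 when at least two inputs are 1; Sum is 1 when an odd number of inputs are 1); `full_adder` is a hypothetical helper name.

```python
from itertools import product

def full_adder(a, b, cin):
    """Return (sum, carry_out) for one bit position using sum-of-products logic."""
    na, nb, ncin = 1 - a, 1 - b, 1 - cin
    carry_out = (a & b) | (cin & a) | (cin & b)
    total = (na & nb & cin) | (a & b & cin) | (na & b & ncin) | (a & nb & ncin)
    return total, carry_out

def full_adder_is_correct():
    """Check every truth-table row against a + b + cin."""
    for a, b, cin in product((0, 1), repeat=3):
        total, carry_out = full_adder(a, b, cin)
        if 2 * carry_out + total != a + b + cin:
            return False
    return True

print(full_adder_is_correct())
```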
Logic/Arithmetic
• In order to implement subtraction, we can invert the B input to the adder and set CarryIn to be 1
• This can be implemented with a mux: select B or not B (call this input Binvert)
• Now we can build a 1-bit ALU that supports AND, OR, addition, and subtraction
• We can perform the AND, OR, and ADD in parallel and select among the results with a 4-input mux (Operation will be our D-input)
• To make the adder a subtractor, we need to set both Binvert and CarryIn (of the least significant bit) to 1
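The subtraction trick works because, in two's complement, a - b equals a + (not b) + 1. Here is a minimal Python sketch of a ripple-carry adder with a Binvert control; the function names and the `width` parameter are assumptions of this example, and operands are passed as non-negative bit patterns.

```python
def ripple_add(a, b, carry_in=0, binvert=0, width=32):
    """Add two width-bit patterns one bit at a time; return (result, carry_out)."""
    result = 0
    carry = carry_in
    for i in range(width):
        abit = (a >> i) & 1
        bbit = ((b >> i) & 1) ^ binvert            # the "Binverter": B or not B
        result |= (abit ^ bbit ^ carry) << i       # sum bit for this position
        carry = (abit & bbit) | (carry & abit) | (carry & bbit)
    return result, carry

def subtract(a, b, width=32):
    """a - b computed as a + (not b) + 1 by setting Binvert and CarryIn to 1."""
    return ripple_add(a, b, carry_in=1, binvert=1, width=width)[0]

print(subtract(7, 5))
print(hex(subtract(5, 7)))   # negative results appear as two's complement patterns
```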

Lecture 4: Arithmetic for Computers (Part 3)


Chapter 4 Review
• So far, we’ve covered the following topics for this chapter
• Binary representation of signed integers
• 16 to 32 bit signed conversion
• Overflow detection/overflow exception handling
• Shift and logical operations
• Parts of the CPU
• AND, OR, XOR, and inverter gates
• Multiplexor (mux) and full adder
• Sum-of-products logic equations (truth tables)
• Logic minimization techniques
• Don’t cares and Karnaugh Maps
1-bit ALU Design
• A 1-bit ALU can be constructed
• Components
• 4-to-1 mux
• “Binverter” (inverter and 2-to-1 mux)
• Interface
• Inputs: A, B, Binvert, Operation (2 bits), CarryIn, and Less
• Outputs: CarryOut and Result
• Digital functions are performed in parallel and the outputs are routed into a mux
• The mux will also accept a Less input which we’ll accept from outside the 1-bit ALU
• The select lines of the mux make up the “operation” input to the ALU
32-bit ALU
• In order to create a multi-bit ALU, array 32 1-bit ALUs
• Connect the CarryOut of each bit to the CarryIn of the next bit
• A and B of each 1-bit ALU will be connected to each successive bit of the 32-bit A and B
• The Result outputs of each 1-bit ALU will form the 32-bit result
• We need to add an SLT unit and connect the output to the least significant 1-bit ALU’s Less input
• Hardwire the other “Less” inputs to 0
• We need to add an Overflow unit
• We need to add a Zero detection unit
SLT Unit
• To compute SLT, we need to make sure that when the 1-bit ALU’s Operation is set to 11, a subtract operation is also being computed
• With this happening, the SLT unit can compute Less based on the MSB (sign) of A, B, and Result
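One way to derive Less from the three sign bits, offered as a sketch rather than the exact circuit from the slides: if A and B have different signs, the negative one is smaller, so Less is the sign of A; if they have the same sign, the subtraction cannot overflow, so Less is the sign of the Result. The function name `slt` follows the MIPS mnemonic; operands are 32-bit patterns.

```python
MASK32 = 0xFFFFFFFF

def slt(a, b):
    """Signed set-on-less-than using only the MSBs of A, B, and A - B."""
    a &= MASK32
    b &= MASK32
    result = (a + (~b & MASK32) + 1) & MASK32   # A - B via invert-and-add-one
    a_msb, b_msb, r_msb = a >> 31, b >> 31, result >> 31
    if a_msb != b_msb:
        return a_msb      # signs differ: A < B exactly when A is negative
    return r_msb          # signs agree: no overflow, so the result sign decides

print(slt(1, 2), slt(2, 1), slt(0x80000000, 0x7FFFFFFF))
```

Checking the operand signs first is what keeps the comparison correct when A - B overflows.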
Overflow Unit
• When doing signed arithmetic, we need to follow this table, as we covered previously…
• How do we implement this in hardware?
Overflow Unit
• We need a truth table…
• Since we’ll be computing the logic equation with SOP, we only need the rows where the output is 1
Zero Detection Unit
• “Or” together all the 1-bit ALU Result outputs and invert (a NOR); the result is the Zero output of the ALU, which is 1 only when every result bit is 0
32-bit ALU Operation
• We need a 3-bit ALU Operation input into our 32-bit ALU
• The two least significant bits can be routed into all the 1-bit ALUs internally
• The most significant bit can be routed into the least significant 1-bit ALU’s CarryIn, and to Binvert of all the 1-bit ALUs
32-bit ALU Operation
• Here’s the final ALU Operation table:
32-bit ALU
• In the end, our ALU will have the following interface:
• Inputs:
• A and B (32 bits each)
• ALU Operation (3 bits)
• Outputs:
• CarryOut (1 bit)
• Zero (1 bit)
• Result (32 bits)
• Overflow (1 bit)
• The adder architecture we previously looked at requires n*2 gate delays to compute its result (worst case)
• The longest path that a digital signal must propagate through is called the “critical path”
• This is WAAAYYYY too slow!
• There are other ways to build an adder that require only on the order of lg n gate delays
• Obviously, using SOP, we can build a circuit that will compute ANY function in 2 gate delays (2 levels of logic)
• Obviously, in the case of a 64-input system, the resulting design will be too big and too complex
• For example, we can easily see that the CarryIn for bits 1 and 2 is computed as:
• $c_1 = (a_0 b_0) + (a_0 c_0) + (b_0 c_0)$
• $c_2 = (a_1 b_1) + (a_1 c_1) + (b_1 c_1)$
• Hardware executes in parallel, so using the following fast CarryIn computation, we can perform an add with 3 gate delays
• $c_2 = (a_1 b_1) + (a_1 a_0 b_0) + (a_1 a_0 c_0) + (a_1 b_0 c_0) + (b_1 a_0 b_0) + (b_1 a_0 c_0) + (b_1 b_0 c_0)$
• I used the logical distributive law to compute this
• As you can see, the CarryIn logic gets bigger and bigger for consecutive bits
• Recall:
• $c_{i+1} = (a_i b_i) + (a_i c_i) + (b_i c_i)$
• $c_i$ can be factored out…
• $c_{i+1} = (a_i b_i) + (a_i + b_i) c_i$
• So…
• $c_2 = (a_1 b_1) + (a_1 + b_1)((a_0 b_0) + (a_0 + b_0) c_0)$
• Note the repeated appearance of $(a_i b_i)$ and $(a_i + b_i)$
• They are called generate ($g_i$) and propagate ($p_i$)
• $g_i = a_i b_i$, $p_i = a_i + b_i$
• $c_{i+1} = g_i + p_i c_i$
• This means if $g_i = 1$, a CarryOut is generated
• If $p_i = 1$, a CarryOut is propagated from CarryIn
• $c_1 = g_0 + (p_0 c_0)$
• $c_2 = g_1 + (p_1 g_0) + (p_1 p_0 c_0)$
• $c_3 = g_2 + (p_2 g_1) + (p_2 p_1 g_0) + (p_2 p_1 p_0 c_0)$
• $c_4 = g_3 + (p_3 g_2) + (p_3 p_2 g_1) + (p_3 p_2 p_1 g_0) + (p_3 p_2 p_1 p_0 c_0)$
• …This system will give us an adder with 5 gate delays but it is still too complex
• To solve this, we’ll build our adder using 4-bit adders with carry lookahead, and connect them using “super”-propagate and generate logic
• The superpropagate is only true if all the bits propagate a carry
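• $P_0 = p_0 p_1 p_2 p_3$
• $P_1 = p_4 p_5 p_6 p_7$
• $P_2 = p_8 p_9 p_{10} p_{11}$
• $P_3 = p_{12} p_{13} p_{14} p_{15}$
• The supergenerate follows a similar equation:
• $G_0 = g_3 + (p_3 g_2) + (p_3 p_2 g_1) + (p_3 p_2 p_1 g_0)$
• $G_1 = g_7 + (p_7 g_6) + (p_7 p_6 g_5) + (p_7 p_6 p_5 g_4)$
• $G_2 = g_{11} + (p_{11} g_{10}) + (p_{11} p_{10} g_9) + (p_{11} p_{10} p_9 g_8)$
• $G_3 = g_{15} + (p_{15} g_{14}) + (p_{15} p_{14} g_{13}) + (p_{15} p_{14} p_{13} g_{12})$
• The supergenerate and superpropagate logic for the four 4-bit carry-lookahead adders is contained in a Carry Lookahead Unit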
• This yields a worst-case delay of 7 gate delays
• Reason?
• We’ve covered all ALU functions except for the shifter
• We’ll talk about the shifter later
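The carry-lookahead equations for one 4-bit group, covered earlier in this part, can be checked exhaustively against a plain ripple carry. The Python sketch below does that for every combination of two 4-bit operands and a carry-in, and also confirms that the group's carry out equals G + P·c0 for its super-generate G and super-propagate P. All function names are hypothetical; index 0 is the least significant bit.

```python
from itertools import product

def lookahead_carries(a_bits, b_bits, c0):
    """Return ([c1, c2, c3, c4], P, G) for one 4-bit group using lookahead logic."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate:  g_i = a_i AND b_i
    p = [a | b for a, b in zip(a_bits, b_bits)]   # propagate: p_i = a_i OR b_i
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    super_p = p[0] & p[1] & p[2] & p[3]
    super_g = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return [c1, c2, c3, c4], super_p, super_g

def ripple_carries(a_bits, b_bits, c0):
    """Reference: compute each carry from the previous one, bit by bit."""
    carries, c = [], c0
    for a, b in zip(a_bits, b_bits):
        c = (a & b) | (a & c) | (b & c)
        carries.append(c)
    return carries

def lookahead_matches_ripple():
    for bits in product((0, 1), repeat=9):
        a_bits, b_bits, c0 = bits[0:4], bits[4:8], bits[8]
        carries, super_p, super_g = lookahead_carries(a_bits, b_bits, c0)
        if carries != ripple_carries(a_bits, b_bits, c0):
            return False
        if carries[3] != (super_g | (super_p & c0)):
            return False
    return True

print(lookahead_matches_ripple())
```

The last check is what lets a Carry Lookahead Unit treat each 4-bit adder as a single "super bit" when computing carries between groups.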

Lecture 4: Arithmetic for Computers (Part 4)


Binary Multiplication
• In multiplication, the first operand is called the multiplicand, and the second is called the multiplier
• The result is called the product
• Not counting the sign bits, if we multiply an n-bit multiplicand by an m-bit multiplier, we’ll get an (n+m)-bit product
Binary Multiplication
• Binary multiplication works exactly like decimal multiplication
• In fact, multiply 100101 by 111001 and pretend you’re using decimal numbers
First Hardware Design for Multiplier

Note that the multiplier is not routed into the ALU

Second Hardware Design for Multiplier
• Architects realized that at least half of the bits in the 64-bit multiplicand register were always 0
• Reduce ALU to 32 bits, shift the product right instead of shifting the multiplicand left
• In this case, the multiplicand register is only 32 bits (the product register is still 64 bits)
Final Hardware Design for Multiplier
• Let’s combine the product register with the multiplier register…
• Put the multiplier in the right half of the product register and initialize the left half with zeros; when we’re done, the multiplier bits have been shifted out and the 64-bit product fills the whole register
Final Hardware Design for Multiplier
• For the first two designs, the multiplicand and the multiplier must be converted to positive
• The signs would need to be remembered so the product can be converted to whatever sign it needs to be
• The third design will deal with signed numbers, as long as the sign bit is extended in the product register
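As a rough software model of the final design, the Python sketch below keeps the multiplier in the right half of a double-width product register, adds the multiplicand into the left half whenever the low bit is 1, and shifts the whole register right once per step. It handles unsigned operands only; the function name and `width` parameter are assumptions of this example.

```python
def multiply_unsigned(multiplicand, multiplier, width=32):
    """Shift-and-add multiply with a combined product/multiplier register."""
    mask = (1 << width) - 1
    product = multiplier & mask              # right half: multiplier, left half: 0
    for _ in range(width):
        if product & 1:                      # low bit is the current multiplier bit
            product += (multiplicand & mask) << width   # add into the left half
        product >>= 1                        # shift product (and multiplier) right
    return product

# The slide's example operands, read as binary numbers.
print(multiply_unsigned(0b100101, 0b111001))
```

Python integers grow as needed, so the carry out of the left-half addition is kept automatically; in hardware the register needs one extra bit for it.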
Booth’s Algorithm
• Booth’s Algorithm starts with the observation that if we have the ability to both add and subtract, there are multiple ways to compute a product
• For every 0 in the multiplier, we shift the multiplicand
• For every 1 in the multiplier, we add the multiplicand to the product, then shift the multiplicand
Booth’s Algorithm
• Booth’s approach: subtract the multiplicand at the first 1 of a run, just shift for all the 1’s after this, until the first 0 is seen, then add
• The method was developed because in Booth’s era, shifters were faster than adders
Booth’s Algorithm
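• Example:

0010 == 2 (multiplicand)

x 0110 == 6 (multiplier)

0000 == 0: shift (bit 0 is 0)

-0010 == $-2 \times 2^1 = -4$: subtract (first 1)

0000 == 0: shift (second 1)

+0010 == $+2 \times 2^3 = +16$: add (first 0 after the 1’s)

$-4 + 16 = 12 = 2 \times 6$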
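The same procedure can be sketched in Python by scanning the multiplier from the least significant bit and comparing each bit with the one to its right. This is a simplified model that accumulates shifted copies of the multiplicand rather than simulating the product register; the function name, the `width` default of 4, and the step labels are assumptions of this example.

```python
def booth_multiply(multiplicand, multiplier, width=4):
    """Booth's algorithm on a width-bit two's complement multiplier.

    Returns (product, steps), where steps records the action taken per bit.
    """
    m = multiplier & ((1 << width) - 1)   # multiplier as a width-bit pattern
    product = 0
    prev = 0                              # imaginary bit to the right of bit 0
    steps = []
    for i in range(width):
        cur = (m >> i) & 1
        if cur == 1 and prev == 0:        # first 1 of a run: subtract
            product -= multiplicand << i
            steps.append(f"bit {i}: subtract")
        elif cur == 0 and prev == 1:      # first 0 after a run: add
            product += multiplicand << i
            steps.append(f"bit {i}: add")
        else:                             # middle of a run of 0s or 1s
            steps.append(f"bit {i}: shift only")
        prev = cur
    return product, steps

product, steps = booth_multiply(2, 6)
print(product)
print(steps)
```

Because a run that reaches the sign bit ends with a subtraction and no matching addition, the same loop also handles negative multipliers.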

Lecture 4: Arithmetic for Computers (Part 5)


Binary Division
• Again, let’s assume our numbers are base 10, but let’s only use 0’s and 1’s
Binary Division
• Recall:
• Dividend=Quotient*Divisor + Remainder
• Let’s assume that both the dividend and divisor are positive and hence the quotient and the remainder are nonnegative
• The division operands and both results are 32-bit values and we will ignore the sign for now
First Hardware Design for Divider

Initialize the Quotient register to 0, initialize the left-half of the Divisor register with the divisor, and initialize the Remainder register with the dividend (right-aligned)

Second Hardware Design for Divider

Much like with the multiplier, the divisor and ALU can be reduced to 32 bits if we shift the remainder left instead of shifting the divisor to the right

Also, the algorithm must be changed so the remainder is shifted left before the subtraction takes place

Third Hardware Design for Divider

Shift the bits of the quotient into the remainder register…

Also, the last step of the algorithm is to shift the left half of the remainder right 1 bit
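Here is a hedged Python sketch of the third design as a restoring divider: the dividend starts in the right half of a double-width remainder register, each step tries to subtract the divisor from the left half, and quotient bits are shifted in from the right. It assumes unsigned operands; the function name and `width` parameter are hypothetical.

```python
def divide_unsigned(dividend, divisor, width=32):
    """Restoring division with quotient bits shifted into the remainder register."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    mask = (1 << width) - 1
    rem = (dividend & mask) << 1              # initial shift left by one bit
    for _ in range(width):
        left = (rem >> width) - divisor       # trial subtraction on the left half
        if left >= 0:
            rem = (left << width) | (rem & mask)
            rem = (rem << 1) | 1              # keep the difference, shift in a 1
        else:
            rem = rem << 1                    # discard the difference, shift in a 0
    quotient = rem & mask                     # right half holds the quotient
    remainder = rem >> (width + 1)            # left half, shifted right one bit
    return quotient, remainder

print(divide_unsigned(7, 2, width=4))
```

The final one-bit right shift of the left half undoes the extra left shift the remainder picked up during the last iteration.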

Signed Division
• Simplest solution: remember the signs of the divisor and the dividend and then negate the quotient if the signs disagree
• The dividend and the remainder must have the same signs
Considerations
• The same hardware can be used for both multiply and divide
• Requirement: 64-bit register that can shift left or right and a 32-bit ALU that can add or subtract
Floating Point
• Floating point (also called real) numbers are used to represent values that are fractional or that are too big to fit in a 32-bit integer
• Floating point numbers are expressed in scientific notation (base 2) and are normalized (no leading 0’s)
• $1.xxxx_2 \times 2^{yyyy}$
• In this case, xxxx is the significand and yyyy is the exponent
Floating Point
• In MIPS, a floating point is represented in the following manner (IEEE 754 standard):
• bit 31: sign of significand
• bit 30..23 (8) exponent (stored in biased notation; see the bias discussion later in this part)
• bit 22..0 (23) significand
• Note that size of exponent and significand must be traded off... accuracy vs. range
• This allows us to represent signed numbers with magnitudes from about $2 \times 10^{-38}$ to $2 \times 10^{38}$
• Overflow and underflow must be detected
• Double-precision floating point numbers are 2 words... the significand is extended to 52 bits and the exponent to 11 bits
• Also, the first bit of the significand is implicit (only the fractional part is specified)
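• In order to represent 0 in a float, put 0 in both the exponent and significand fields
• So here’s the equation we use: $(-1)^S \times (1 + \text{Significand}) \times 2^E$
• Or: $(-1)^S \times (1 + (s_1 \times 2^{-1}) + (s_2 \times 2^{-2}) + (s_3 \times 2^{-3}) + (s_4 \times 2^{-4}) + \dots) \times 2^E$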
Considerations
• IEEE 754 sought to make floating-point numbers easier to sort
• sign is first bit
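• exponent comes before the significand
• But we want the smallest usable exponent field (all 0s plus 1) to represent the most negative exponent and the largest usable field to represent the most positive (the all-0 and all-1 patterns themselves are reserved)
• This is called biased notation, so we’ll use the following equation:
• $(-1)^S \times (1 + \text{Significand}) \times 2^{(\text{Exponent} - \text{Bias})}$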
• Bias is 127 for single-precision and 1023 for double-precision
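To see the biased formula at work, the Python sketch below splits a single-precision bit pattern into its three fields and rebuilds the value. It uses the standard `struct` module to obtain the bit pattern of a float; the helper names are hypothetical, and the decoder deliberately handles normalized numbers only (not zero, denormals, infinity, or NaN).

```python
import struct

BIAS = 127  # single-precision exponent bias

def float_to_bits(x):
    """Return the 32-bit IEEE 754 single-precision pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def decode_single(bits):
    """Split a pattern into (sign, exponent field, fraction field, value)."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    # (-1)^S x (1 + Significand) x 2^(Exponent - Bias), normalized numbers only
    value = (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - BIAS)
    return sign, exponent, fraction, value

bits = float_to_bits(-0.75)     # -0.75 = -1.1 (binary) x 2^-1
print(hex(bits))
print(decode_single(bits))
```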

Lecture 4: Arithmetic for Computers (Part 6)


Converting Decimal Floating Point to Binary
• Use the method I showed last lecture...
• Significand:
• Use the iterative method to convert the fractional part to binary
• Convert the integer part to binary using the “old-fashioned” method
• Shift the binary point to the left (or to the right, for values less than 1) until the number is normalized
• Drop the leading 1, and set the exponent to be the number of positions you shifted the binary point (negative if you shifted right)
• Adjust the exponent for bias (127/1023)
• Let’s add two decimal floating point numbers...
• Let’s try $9.999 \times 10^{1} + 1.610 \times 10^{-1}$
• Assume we can only store 4 digits of the significand and two digits of the exponent
• Match exponents for both operands by un-normalizing one of them
• Match to the exponent of the larger number
• Normalize result
• Round significand
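The addition steps can be traced with Python's `decimal` module, which keeps the intermediate digits exact. This is a sketch under stated assumptions: the function name is hypothetical, rounding is round-half-up to a four-digit significand, and the rare case where rounding itself forces a second normalization is not handled.

```python
from decimal import Decimal, ROUND_HALF_UP

def add_decimal_fp(sig_a, exp_a, sig_b, exp_b):
    """Add two decimal floating point numbers given as (significand, exponent)."""
    # Step 1: make (sig_a, exp_a) the operand with the larger exponent.
    if exp_b > exp_a:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    # Step 2: un-normalize the smaller operand so the exponents match.
    sig_b = sig_b.scaleb(-(exp_a - exp_b))
    # Step 3: add the significands.
    sig, exp = sig_a + sig_b, exp_a
    # Step 4: normalize so that exactly one digit is left of the decimal point.
    while abs(sig) >= 10:
        sig, exp = sig.scaleb(-1), exp + 1
    # Step 5: round the significand to four digits.
    sig = sig.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
    return sig, exp

print(add_decimal_fp(Decimal("9.999"), 1, Decimal("1.610"), -1))
```

For the slide's operands the aligned sum is 10.0151, which normalizes to 1.00151 with exponent 2 before rounding.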
Floating Point Multiplication
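• Example: $1.110 \times 10^{10} \times 9.200 \times 10^{-5}$
• Assume 4 digits for significand and 2 digits for exponent
• Calculate the exponent of the product by simply adding the exponents of the operands
• 10+(-5)=5
• Bias the exponents
• (10+127)+(-5+127)=137+122=259
• Something’s wrong! Each biased exponent already includes the bias, so the sum counts it twice; subtract one bias (259-127=132), which matches biasing the true exponent directly:
• 5+127=132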
Floating Point Multiplication
• Multiply the significands...
• 1.110 x 9.200=10.212000
• $1.0212 \times 10^{6}$
• Round significand to four digits
• 1.021
• Set sign based on signs of operands
• $= +1.021 \times 10^{6}$
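The multiplication steps can be traced the same way. This Python sketch uses the `decimal` module for exact intermediate digits; the function names are hypothetical, rounding is assumed to be round-half-up to a four-digit significand, and signs are carried by the significands themselves.

```python
from decimal import Decimal, ROUND_HALF_UP

BIAS = 127

def biased_product_exponent(biased_a, biased_b):
    """Add two biased exponents, removing the bias that was counted twice."""
    return biased_a + biased_b - BIAS

def mul_decimal_fp(sig_a, exp_a, sig_b, exp_b):
    """Multiply two decimal floating point numbers given as (significand, exponent)."""
    exp = exp_a + exp_b                     # add the exponents
    sig = sig_a * sig_b                     # multiply the significands
    while abs(sig) >= 10:                   # normalize to one leading digit
        sig, exp = sig.scaleb(-1), exp + 1
    sig = sig.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)  # four digits
    return sig, exp

print(biased_product_exponent(137, 122))
print(mul_decimal_fp(Decimal("1.110"), 10, Decimal("9.200"), -5))
```

Note that the exponent ends up as 6 rather than 5 because normalizing 10.212 moves the decimal point one place.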
Accurate Arithmetic
• Integers can represent every value between the largest and smallest possible values
• This is not the case with floating point
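• Double precision carries 53 bits of significand precision (52 stored plus the implicit leading 1), so only $2^{52}$ distinct values can be represented between any two consecutive powers of two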
• IEEE 754 always keeps 2 extra bits on the right of the significand during intermediate calculation called guard and round to minimize rounding errors
Accurate Arithmetic
• Since the worst case for rounding would be when the actual number is halfway between two floating point representations, accuracy is measured as number of least-significant error bits
• This is called units in the last place (ulp)
• IEEE 754 guarantees that the computed result is within 0.5 ulp of the exact result (using guard and round)
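A short Python sketch makes the gaps between representable values visible. Python floats are IEEE 754 double precision on common platforms, and `math.ulp` requires Python 3.9 or later; both are assumptions of this example.

```python
import math

# The spacing between adjacent doubles just above 1.0 is one ulp = 2^-52.
ulp_at_one = math.ulp(1.0)
print(ulp_at_one == 2.0 ** -52)

# Above 2^53 the spacing is 2, so adding 1 lands halfway between two
# representable values and the result is rounded back down.
big = float(2 ** 53)
print(big + 1.0 == big)

# Decimal fractions such as 0.1 are not exactly representable, so small
# rounding errors show up in ordinary arithmetic.
print(0.1 + 0.2 == 0.3)
```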