Chapter 4: Arithmetic for Computers (Part 1)

Presentation Transcript


  1. Chapter 4: Arithmetic for Computers (Part 1) CS 447 Jason Bakos

  2. Notes on Project 1 • There are two different ways the following two words can be stored in computer memory… • word1 .byte 0,1,2,3 • word2 .half 0,1 • One way is big-endian, where the word is stored in memory in its original byte order (word1 and word2 layouts shown on the slide) • Another way is little-endian, where the word is stored in memory in reverse byte order (again shown on the slide) • Of course, this affects the way in which the lw instruction works…

  3. Notes on Project 1 • MIPS uses the endianness of the architecture underneath it • Intel uses little-endian, so we need to deal with that • This affects assignment 1 because the input data is stored as a series of bytes • If you use lw instructions on your data set, the values will be loaded into your destination register in reverse byte order • Hint: Try the lb/sb instructions • These instructions load/store a byte from an unaligned address and perform the translation for you
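
For readers following along in C rather than MIPS assembly, here is a minimal sketch (not the project code) of the same effect. It assumes a little-endian host such as an Intel machine, where reading the four bytes as one word makes them appear reversed, while byte-at-a-time accesses are order-independent.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Roughly the C analogue of "word1 .byte 0,1,2,3". On a little-endian
       host, reading the four bytes as a single 32-bit word (like lw)
       yields 0x03020100, i.e. the bytes appear "reversed". */
    int main(void) {
        uint8_t word1[4] = {0, 1, 2, 3};   /* bytes in memory order */
        uint32_t as_word;
        memcpy(&as_word, word1, sizeof as_word);
        printf("loaded as a word: 0x%08X\n", as_word);
        /* Byte-at-a-time access (like lb/sb) is independent of endianness. */
        for (int i = 0; i < 4; i++)
            printf("byte %d = %d\n", i, word1[i]);
        return 0;
    }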

  4. Notes on Project 1 • Hint: Use SPIM’s breakpoint and single-step features to help debug your program • Also, make sure you use the registers and memory/stack displays • Hint: You may want to temporarily store your input set into a word array for sorting • Make sure you check Appendix A for additional useful instructions that I didn’t cover in class • Make sure you comment your code!

  5. Goals of Chapter 4 • Data representation • Hardware mechanisms for performing arithmetic on data • Hardware implications on the instruction set design

  6. Review of Binary Representation • Binary/Hex -> Decimal conversion • Decimal -> Binary/Hex conversion • Least/Most significant bits • Highest representable number/maximum number of unique representable symbols • Two’s complement representation • One’s complement • Finding signed number ranges (-2^(n-1) to 2^(n-1)-1) • Doing arithmetic with two’s complement • Sign extending with load half/byte • Unsigned loads • Signed/unsigned comparison
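
A small hedged C sketch of two of the review items above: the signed n-bit range and sign extension of a loaded byte (what lb does) versus an unsigned load (what lbu does).

    #include <stdio.h>
    #include <stdint.h>

    /* Illustrative only: the signed n-bit range -2^(n-1) .. 2^(n-1)-1 for
       n = 8, and sign extension (lb) vs. zero extension (lbu). */
    int main(void) {
        printf("8-bit signed range: %d .. %d\n", INT8_MIN, INT8_MAX);  /* -128 .. 127 */

        uint8_t byte = 0xFF;                      /* bit pattern 1111 1111 */
        int32_t  signed_load   = (int8_t)byte;    /* sign-extended: -1     */
        uint32_t unsigned_load = byte;            /* zero-extended: 255    */
        printf("signed load: %d, unsigned load: %u\n", signed_load, unsigned_load);
        return 0;
    }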

  7. Binary Addition/Subtraction • Binary subtraction works exactly like addition, except the second operand is first negated (converted to its two’s complement) • Overflow in signed arithmetic occurs under the following conditions:
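
The table of conditions is not reproduced in this transcript; as a rough C sketch of the rule it encodes, signed addition overflows exactly when both operands have the same sign and the result’s sign differs (the function name here is just illustrative).

    #include <stdio.h>
    #include <stdint.h>

    /* Overflow check for signed 32-bit addition: same-sign operands,
       opposite-sign result. (Subtraction a - b is a + (-b), so the same
       rule applies after negating b.) */
    int add_overflows(int32_t a, int32_t b) {
        int32_t sum = (int32_t)((uint32_t)a + (uint32_t)b);   /* wraparound add */
        return ((a < 0) == (b < 0)) && ((sum < 0) != (a < 0));
    }

    int main(void) {
        printf("%d\n", add_overflows(INT32_MAX, 1));   /* 1: overflow    */
        printf("%d\n", add_overflows(-5, 3));          /* 0: no overflow */
        return 0;
    }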

  8. What Happens When Overflow Occurs? • MIPS detects overflow with an exception/interrupt • When an interrupt occurs, a branch occurs to code in the kernel at address 0x80000080, where special registers (BadVAddr, Status, Cause, and EPC) are used to handle the interrupt • SPIM has a simple built-in interrupt handler that deals with interrupts • We may come back to interrupts later

  9. Review of Shift and Logical Operations • MIPS has operations for SLL, SRL, and SRA • We covered this in the last chapter • MIPS implements bit-wise AND, OR, and XOR logical operations • These operations perform a bit-by-bit parallel logical operation on two registers • In C, use << and >> for shifts, and &, |, ^, and ~ for bitwise AND, OR, XOR, and NOT, respectively
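
A quick illustration of the C operators mentioned above. Note that >> on a negative signed value is implementation-defined in C; most compilers treat it as an arithmetic shift, while >> on an unsigned value is a logical shift.

    #include <stdio.h>

    int main(void) {
        unsigned int u = 0xF0;
        int s = -16;
        printf("0x%X\n", u << 2);    /* like SLL: 0x3C0          */
        printf("0x%X\n", u >> 2);    /* like SRL: 0x3C           */
        printf("%d\n",  s >> 2);     /* like SRA (typically): -4 */
        printf("AND 0x%X  OR 0x%X  XOR 0x%X  NOT 0x%X\n",
               u & 0x0F, u | 0x0F, u ^ 0xFF, ~u & 0xFF);
        return 0;
    }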

  10. Review of Logic Operations • The three main parts of a CPU • ALU (Arithmetic and Logic Unit) • Performs all logical, arithmetic, and shift operations • CU (Control Unit) • Controls the CPU – performs load/store, branch, and instruction fetch • Registers • Physical storage locations for data

  11. Review of Logic Operations • In this chapter, our goal is to learn how the ALU is implemented • The ALU is entirely constructed using boolean functions as hardware building blocks • The 3 basic digital logic building blocks can be used to construct any digital logic system: AND, OR, and NOT • These functions can be directly implemented using electric circuits (wires and transistors)

  12. Review of Logic Operations • These “combinational” logic devices can be assembled to create a much more complex digital logic system

  13. Review of Logic Operations • We need another device to build an ALU… • This is called a multiplexor… it implements an if-then-else in hardware

  14. A 1-bit ALU • Perform the logic operations in parallel and mux the output • Next, we want to include addition, so let’s build a single-bit adder • Called a full adder

  15. Full Adder • From the following table, we can construct the circuit for a full adder and link multiple full adders together to form a multi-bit adder • We can also add this input to our ALU • How do we give subtraction ability to our adder? • How do we detect overflow and zero results?
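
The full-adder table itself is not reproduced in this transcript, but the functions it defines are standard; here is a small C sketch that prints the whole table (Sum is 1 for an odd number of 1 inputs, CarryOut when at least two inputs are 1).

    #include <stdio.h>

    /* One-bit full adder: Sum and CarryOut as functions of A, B, CarryIn. */
    void full_adder(int a, int b, int cin, int *sum, int *cout) {
        *sum  = a ^ b ^ cin;                        /* odd number of 1 inputs */
        *cout = (a & b) | (a & cin) | (b & cin);    /* at least two 1 inputs  */
    }

    int main(void) {
        for (int a = 0; a <= 1; a++)
            for (int b = 0; b <= 1; b++)
                for (int cin = 0; cin <= 1; cin++) {
                    int s, c;
                    full_adder(a, b, cin, &s, &c);
                    printf("A=%d B=%d CarryIn=%d -> Sum=%d CarryOut=%d\n",
                           a, b, cin, s, c);
                }
        return 0;
    }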

  16. Chapter 4: Arithmetic for Computers (Part 2) CS 447 Jason Bakos

  17. Logic/Arithmetic • From the truth table for the mux, we can use sum-of-products to derive the logic equation • With sum-of-products, for each ‘1’ row for each output, we AND together all the inputs (inverting the input 0’s), then OR all the row products • To make it simpler, let’s add “don’t cares” to the table…

  18. Logic/Arithmetic • This gives us the following equation • (A and (not D)) or (B and D) • We don’t need the inputs for the “don’t cares” in our product terms • This is one way to simplify our logic equation • Other ways include propositional calculus, Karnaugh Maps, and the Quine-McCluskey algorithm
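
As a sanity check of the simplified equation, here is a one-line C version of the 2-to-1 mux (names are illustrative; D is the select input):

    /* out = (A and (not D)) or (B and D); inputs are single bits (0 or 1). */
    int mux2(int a, int b, int d) {
        return (a & ~d & 1) | (b & d & 1);
    }
    /* mux2(a, b, 0) == a and mux2(a, b, 1) == b, i.e. the hardware if-then-else. */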

  19. Logic/Arithmetic • Here is a (crude) digital logic design for the 2-to-1 mux • Note that multiple muxes can be assembled in stages to implement multiple-input muxes

  20. Logic/Arithmetic • For the adder, let’s minimize the logic using a Karnaugh Map… • For CarryOut, we need 2^3 = 8 entries… • We can minimize this to • CarryOut = (A·B) + (A·CarryIn) + (B·CarryIn)

  21. Logic/Arithmetic • There’s no way to minimize this equation, so we need the full sum of products: • Sum = (not A)(not B)CarryIn + A·B·CarryIn + (not A)·B·(not CarryIn) + A·(not B)·(not CarryIn)

  22. Logic/Arithmetic • In order to implement subtraction, we can invert the B input to the adder and set CarryIn to 1 • This can be implemented with a mux: select B or not B (call this input Binvert) • Now we can build a 1-bit ALU with AND, OR, addition, and subtraction operations • We can perform the AND, OR, and ADD in parallel and switch the results with a 4-input mux (Operation will be our D-input) • To make the adder a subtractor, we set Binvert and CarryIn to 1
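
A hedged C sketch of the 1-bit ALU just described: AND, OR, and the adder are computed in parallel, Binvert optionally inverts B (so CarryIn = 1 turns add into subtract), and Operation plays the role of the mux select (the encoding below is only illustrative).

    /* 1-bit ALU slice: returns the selected Result and the adder's CarryOut. */
    typedef struct { int result; int carry_out; } alu1_out;

    alu1_out alu1(int a, int b, int binvert, int carry_in, int operation) {
        int bb   = binvert ? (~b & 1) : b;                       /* Binvert mux     */
        int sum  = a ^ bb ^ carry_in;                            /* full adder: Sum */
        int cout = (a & bb) | (a & carry_in) | (bb & carry_in);  /* CarryOut        */
        int res;
        switch (operation) {        /* 0: AND, 1: OR, 2: add (or subtract) */
            case 0:  res = a & b; break;
            case 1:  res = a | b; break;
            default: res = sum;   break;
        }
        alu1_out out = { res, cout };
        return out;
    }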

  23. Lecture 4: Arithmetic for Computers (Part 3) CS 447 Jason Bakos

  24. Chapter 4 Review • So far, we’ve covered the following topics for this chapter • Binary representation of signed integers • 16 to 32 bit signed conversion • Binary addition/subtraction • Overflow detection/overflow exception handling • Shift and logical operations • Parts of the CPU • AND, OR, XOR, and inverter gates • Multiplexor (mux) and full adder • Sum-of-products logic equations (truth tables) • Logic minimization techniques • Don’t cares and Karnaugh Maps

  25. 1-bit ALU Design • A 1-bit ALU can be constructed • Components • AND, OR, and adder • 4-to-1 mux • “Binverter” (inverter and 2-to-1 mux) • Interface • Inputs: A, B, Binvert, Operation (2 bits), CarryIn, and Less • Outputs: CarryOut and Result • Digital functions are performed in parallel and the outputs are routed into a mux • The mux also accepts a Less input, which comes from outside the 1-bit ALU • The select lines of the mux make up the “operation” input to the ALU

  26. 32-bit ALU • In order to create a multi-bit ALU, array 32 1-bit ALUs • Connect the CarryOut of each bit to the CarryIn of the next bit • A and B of each 1-bit ALU will be connected to each successive bit of the 32-bit A and B • The Result outputs of each 1-bit ALU will form the 32-bit result • We need to add an SLT unit and connect the output to the least significant 1-bit ALU’s Less input • Hardwire the other “Less” inputs to 0 • We need to add an Overflow unit • We need to add a Zero detection unit
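
A word-level C sketch of the ripple connection described above, assuming 32-bit operands: each bit’s CarryOut feeds the next bit’s CarryIn, and Binvert plus an initial CarryIn of 1 turns the chain into a subtractor (the SLT, Overflow, and Zero units are omitted here).

    #include <stdio.h>
    #include <stdint.h>

    /* 32 one-bit add/subtract slices chained through the carry. */
    uint32_t ripple_addsub(uint32_t a, uint32_t b, int binvert, int *carry_out) {
        uint32_t result = 0;
        int carry = binvert;                     /* CarryIn of bit 0 */
        for (int i = 0; i < 32; i++) {
            int ai = (a >> i) & 1;
            int bi = ((b >> i) & 1) ^ binvert;   /* Binvert applied to every bit */
            int sum = ai ^ bi ^ carry;
            carry   = (ai & bi) | (ai & carry) | (bi & carry);
            result |= (uint32_t)sum << i;
        }
        *carry_out = carry;
        return result;
    }

    int main(void) {
        int c;
        printf("%u\n", ripple_addsub(7, 5, 0, &c));   /* 12        */
        printf("%u\n", ripple_addsub(7, 5, 1, &c));   /* 2 = 7 - 5 */
        return 0;
    }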

  27. SLT Unit • To compute SLT, we need to make sure that when the 1-bit ALU’s Operation is set to 11, a subtract operation is also being computed • With this happening, the SLT unit can compute Less based on the MSB (sign) of A, B, and Result

  28. Overflow Unit • When doing signed arithmetic, we need to follow this table, as we covered previously… • How do we implement this in hardware?

  29. Overflow Unit • We need a truth table… • Since we’ll be computing the logic equation with SOP, we only need the rows where the output is 1
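
The truth table itself is not reproduced in this transcript. For reference, one common way to write the resulting SOP expression for addition uses only the sign bits (assuming A31, B31, and Result31 denote the sign bits of A, B, and the Result; for subtraction, B’s sign is taken after the Binvert inversion):

    Overflow = A31·B31·(not Result31) + (not A31)·(not B31)·Result31

That is, overflow occurs when two operands with the same sign produce a result of the opposite sign.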

  30. Zero Detection Unit • “Nor” together all the 1-bit ALU Result outputs (OR them, then invert) – the result is the Zero output of the ALU

  31. 32-bit ALU Operation • We need a 3-bit ALU Operation input into our 32-bit ALU • The two least significant bits can be routed into all the 1-bit ALUs internally • The most significant bit can be routed into the least significant 1-bit ALU’s CarryIn, and to Binvert of all the 1-bit ALUs

  32. 32-bit ALU Operation • Here’s the final ALU Operation table:
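
The table itself is not reproduced in this transcript. Assuming the usual encoding that matches the routing described on the previous slide (most significant bit = Binvert/initial CarryIn, low two bits = mux select), it would look like this:

    ALU Operation   Function
    000             AND
    001             OR
    010             add
    110             subtract
    111             set on less than (slt)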

  33. 32-bit ALU • In the end, our ALU will have the following interface: • Inputs: • A and B (32 bits each) • ALU Operation (3 bits) • Outputs: • CarryOut (1 bit) • Zero (1 bit) • Result (32 bits) • Overflow (1 bit)

  34. Carry Lookahead • The adder architecture we previously looked at requires 2n gate delays to compute its result (worst case) • The longest path that a digital signal must propagate through is called the “critical path” • This is WAAAYYYY too slow! • There are other ways to build an adder that require only lg n gate delays • Obviously, using SOP, we can build a circuit that will compute ANY function in 2 gate delays (2 levels of logic) • But in the case of a 64-input system, the resulting design will be too big and too complex

  35. Carry Lookahead • For example, we can easily see that the CarryIn for bit 1 is computed as: • c1 = (a0·b0) + (a0·c0) + (b0·c0) • c2 = (a1·b1) + (a1·c1) + (b1·c1) • Hardware executes in parallel, so using the following fast CarryIn computation, we can perform an add with 3 gate delays • c2 = (a1·b1) + (a1·a0·b0) + (a1·a0·c0) + (a1·b0·c0) + (b1·a0·b0) + (b1·a0·c0) + (b1·b0·c0) • I used the logical distributive law to compute this • As you can see, the CarryIn logic gets bigger and bigger for consecutive bits

  36. Carry Lookahead • Carry Lookahead adders are faster than ripple-carry adders • Recall: • ci+1 = (ai·bi) + (ai·ci) + (bi·ci) • ci can be factored out… • ci+1 = (ai·bi) + (ai + bi)·ci • So… • c2 = (a1·b1) + (a1 + b1)·((a0·b0) + (a0 + b0)·c0)

  37. Carry Lookahead • Note the repeated appearance of (ai·bi) and (ai + bi) • They are called generate (gi) and propagate (pi) • gi = ai·bi, pi = ai + bi • ci+1 = gi + pi·ci • This means if gi = 1, a CarryOut is generated • If pi = 1, a CarryOut is propagated from CarryIn

  38. Carry Lookahead • c1 = g0 + (p0·c0) • c2 = g1 + (p1·g0) + (p1·p0·c0) • c3 = g2 + (p2·g1) + (p2·p1·g0) + (p2·p1·p0·c0) • c4 = g3 + (p3·g2) + (p3·p2·g1) + (p3·p2·p1·g0) + (p3·p2·p1·p0·c0) • …This system will give us an adder with 5 gate delays, but it is still too complex
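
A C sketch of these one-level lookahead equations for a 4-bit block (illustrative only): all four carries are computed directly from the gi/pi signals and c0 rather than rippling.

    #include <stdio.h>

    /* 4-bit carry-lookahead adder: carries from g/p, sums from a ^ b ^ c. */
    void cla4(unsigned a, unsigned b, unsigned c0, unsigned *sum, unsigned *c4) {
        unsigned g[4], p[4], c[5];
        for (int i = 0; i < 4; i++) {
            unsigned ai = (a >> i) & 1, bi = (b >> i) & 1;
            g[i] = ai & bi;          /* generate  */
            p[i] = ai | bi;          /* propagate */
        }
        c[0] = c0 & 1;
        c[1] = g[0] | (p[0] & c[0]);
        c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
        c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c[0]);
        c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
                    | (p[3] & p[2] & p[1] & p[0] & c[0]);
        *sum = 0;
        for (int i = 0; i < 4; i++)
            *sum |= (((a >> i) ^ (b >> i) ^ c[i]) & 1u) << i;
        *c4 = c[4];
    }

    int main(void) {
        unsigned s, c4;
        cla4(0xB, 0x6, 0, &s, &c4);               /* 11 + 6 = 17     */
        printf("sum=0x%X carry=%u\n", s, c4);     /* sum=0x1 carry=1 */
        return 0;
    }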

  39. Carry Lookahead • To solve this, we’ll build our adder using 4-bit adders with carry lookahead, and connect them using “super”-propagate and generate logic • The superpropagate is only true if all the bits propagate a carry • P0 = p0·p1·p2·p3 • P1 = p4·p5·p6·p7 • P2 = p8·p9·p10·p11 • P3 = p12·p13·p14·p15

  40. Carry Lookahead • The supergenerate follows a similar equation: • G0 = g3 + (p3·g2) + (p3·p2·g1) + (p3·p2·p1·g0) • G1 = g7 + (p7·g6) + (p7·p6·g5) + (p7·p6·p5·g4) • G2 = g11 + (p11·g10) + (p11·p10·g9) + (p11·p10·p9·g8) • G3 = g15 + (p15·g14) + (p15·p14·g13) + (p15·p14·p13·g12) • The supergenerate and superpropagate logic for the four 4-bit carry-lookahead adders is contained in a Carry Lookahead Unit • This yields a worst-case delay of 7 gate delays • Reason?
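
A short C sketch of these second-level (“super”) signals for a 16-bit adder built from four 4-bit blocks, following the equations above (the array layout is just an illustration):

    /* Per-block super-propagate P[j] and super-generate G[j] from the
       per-bit p[] and g[] signals of block j (bits 4j .. 4j+3). */
    void super_pg(const unsigned p[16], const unsigned g[16],
                  unsigned P[4], unsigned G[4]) {
        for (int j = 0; j < 4; j++) {
            const unsigned *pb = p + 4 * j, *gb = g + 4 * j;
            P[j] = pb[0] & pb[1] & pb[2] & pb[3];
            G[j] = gb[3] | (pb[3] & gb[2]) | (pb[3] & pb[2] & gb[1])
                         | (pb[3] & pb[2] & pb[1] & gb[0]);
        }
    }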

  41. Carry Lookahead • We’ve covered all ALU functions except for the shifter • We’ll talk about the shifter later

  42. Lecture 4: Arithmetic for Computers (Part 4) CS 447 Jason Bakos

  43. Binary Multiplication • In multiplication, the first operand is called the multiplicand, and the second is called the multiplier • The result is called the product • Not counting the sign bits, if we multiply an n-bit multiplicand by an m-bit multiplier, we get an (n+m)-bit product

  44. Binary Multiplication • Binary multiplication works exactly like decimal multiplication • In fact, try multiplying 100101 by 111001 just as if they were decimal numbers
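
Worked out, the suggested example looks just like decimal long multiplication; each partial product is either the multiplicand or zero, shifted left by one more position:

              100101        (multiplicand = 37)
            x 111001        (multiplier  = 57)
            --------
              100101
             000000
            000000
           100101
          100101
         100101
        ------------
        100000111101        (product = 2109 = 37 x 57, a 12-bit result)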

  45. First Hardware Design for Multiplier • Note that the multiplier is not routed into the ALU

  46. Second Hardware Design for Multiplier • Architects realized that at least half of the bits in the multiplicand register are always 0 • Reduce the ALU to 32 bits and shift the product right instead of shifting the multiplicand left • In this case, the product is only 32 bits

  47. Second Hardware Design for Multiplier

  48. Final Hardware Design for Multiplier • Let’s combine the product register with the multiplier register… • Put the multiplier in the right half of the product register and initialize the left half with zeros – when we’re done, the product will be in the right half
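
A hedged C sketch of this final design for unsigned 32-bit operands: the 64-bit product register is modeled as an upper and a lower half, the lower half starts out holding the multiplier, and each of the 32 steps adds the multiplicand into the upper half when the low bit is 1 and then shifts the whole register right by one.

    #include <stdio.h>
    #include <stdint.h>

    /* Final multiplier design, word-level model (unsigned operands). */
    uint64_t multiply(uint32_t multiplicand, uint32_t multiplier) {
        uint64_t upper = 0;            /* left half of the product register */
        uint32_t lower = multiplier;   /* right half initially = multiplier */
        for (int step = 0; step < 32; step++) {
            if (lower & 1)
                upper += multiplicand;               /* add into the left half */
            /* shift the combined 64-bit register right by one */
            lower = (lower >> 1) | ((uint32_t)(upper & 1) << 31);
            upper >>= 1;
        }
        return (upper << 32) | lower;
    }

    int main(void) {
        printf("%llu\n", (unsigned long long)multiply(37, 57));   /* 2109 */
        return 0;
    }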

  49. Final Hardware Design for Multiplier

  50. Final Hardware Design for Multiplier • For the first two designs, the multiplicand and the multiplier must be converted to positive values • The signs need to be remembered so the product can be converted to whatever sign it needs to be • The third design deals with signed numbers, as long as the sign bit is extended in the product register
