CS 232: Computer Architecture II

Prof. Laxmikant (Sanjay) Kale

Floating point arithmetic

- We need a way to represent
- numbers with fractions, e.g., 3.1416
- very small numbers, e.g., .000000001
- very large numbers, e.g., 3.15576 * 10^9

- Representation:
- sign, exponent, significand: (–1)^sign * significand * 2^exponent
- more bits for the significand give more accuracy
- more bits for exponent increases range

- IEEE 754 floating point standard:
- single precision: 8 bit exponent, 23 bit significand
- double precision: 11 bit exponent, 52 bit significand

- The idea is to normalize all numbers, so the significand has exactly one digit to the left of the decimal point.
- 12345 = 1.2345 * 10^4
- .0000012345 = 1.2345 * 10^-6
- Do this in binary: 1.01110 x 2^(1011)
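
The same normalization can be tried out in binary. The following is a minimal sketch, assuming Python and its standard `math.frexp`; the helper name `normalize_binary` is made up for illustration.

```python
import math

def normalize_binary(x):
    """Return (significand, exponent) with x == significand * 2**exponent
    and exactly one binary digit (a 1) to the left of the binary point."""
    if x == 0:
        return 0.0, 0                # zero has no normalized form
    m, e = math.frexp(x)             # x == m * 2**e with 0.5 <= |m| < 1
    return m * 2, e - 1              # rescale so that 1 <= |significand| < 2

print(normalize_binary(0.75))        # (1.5, -1), i.e. 1.1 (binary) x 2^-1
sig, exp = normalize_binary(12345)
print(sig, exp, sig * 2**exp)
```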

- IEEE FP representation
- (+/-) 1.0101010101010101010101 * 2 ^ ( 10101010)
- This is single precision
- Double precision: 64 bits in all.

- Where does one need accuracy of that level?

- Representation issues:
- sign bit, exponent, significand
- Question: how to represent each field
- Question: which order to lay them out in a word?
- Factor: should be easy to do comparisons (for sorting)
- For arithmetic, we will have special hardware anyway

- Choice:
- Sign + magnitude representation
- Sign bit, followed by exponent, then significand (why?)
- exponent: represented with a “bias”: add 127 (1023 for double precision)
- significand: assume implicit 1. (so 00001 means 1.00001)

- So:
- (+/-) x (1 + significand) x 2 ^ (exponent - bias) is the value of a floating point number
- Example: 0 00001000 01010000000000000000000
- Example: convert -.41 to single precision form
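
The value formula above can be checked on the first example bit pattern. This is a rough sketch in Python, not a full decoder: `decode_single` is a hypothetical helper that handles only normalized numbers (it ignores the reserved all-0s and all-1s exponents), and the standard `struct` module is used to cross-check the result and to look at the fields of -.41.

```python
import struct

BIAS = 127  # single-precision bias

def decode_single(bits):
    """Decode a 32-character bit string (spaces allowed) as a normalized
    single-precision number: (-1)^sign * (1 + significand) * 2^(exponent - bias)."""
    bits = bits.replace(" ", "")
    sign = int(bits[0], 2)
    exponent = int(bits[1:9], 2)
    significand = int(bits[9:], 2) / 2**23   # fraction field as a value in [0, 1)
    return (-1) ** sign * (1 + significand) * 2.0 ** (exponent - BIAS)

def fields_of(x):
    """Return the sign, exponent and significand fields of x as single precision."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    b = format(word, "032b")
    return b[0], b[1:9], b[9:]

example = "0 00001000 01010000000000000000000"
print(decode_single(example))                # 1.3125 * 2^(8 - 127)
print(fields_of(-0.41))                      # sign, exponent, significand fields
```

Here the exponent field is 8, so the true exponent is 8 - 127 = -119, and the significand field 0101 0... contributes 0.3125 on top of the implicit 1.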

- Leading “1” bit of significand is implicit
- Exponent is “biased” to make sorting easier
- all 0s is the smallest exponent, all 1s is the largest
- bias of 127 for single precision and 1023 for double precision
- summary: (–1)^sign * (1 + significand) * 2^(exponent – bias)
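
One way to see why the biased exponent helps sorting: for non-negative numbers, comparing the raw 32-bit patterns as unsigned integers gives the same order as comparing the values. A small sketch in Python (the helper name `bits` and the sample values are arbitrary; negative numbers need extra care because of the sign-magnitude layout):

```python
import struct

def bits(x):
    """Return the single-precision bit pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

values = [2.5, 0.001, 1e10, 0.75, 1.0]        # non-negative only

by_value = sorted(values)
by_bit_pattern = sorted(values, key=bits)     # plain integer comparison
print(by_value == by_bit_pattern)             # True

for v in by_value:
    print(format(bits(v), "032b"), v)
```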

- Example:
- decimal: -.75 = -3/4 = -3/2^2
- binary: -.11 = -1.1 x 2^-1
- floating point: exponent = 126 = 01111110
- IEEE single precision: 10111111010000000000000000000000
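
The steps of this example can be reproduced mechanically. The sketch below assumes Python; `encode_single_by_hand` is a hypothetical helper that truncates extra bits instead of rounding and handles only normalized numbers, so it is exact only for values such as -.75 that fit in 23 fraction bits. The standard `struct` module provides the reference encoding.

```python
import math
import struct

def encode_single_by_hand(x, bias=127, frac_bits=23):
    """Build the 32-bit pattern for x step by step (no rounding, no special cases)."""
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))                  # abs(x) == m * 2**e, 0.5 <= m < 1
    significand, exponent = m * 2, e - 1       # normalize: 1 <= significand < 2
    biased = exponent + bias                   # e.g. -1 + 127 = 126
    fraction = int((significand - 1) * 2**frac_bits)  # drop the implicit leading 1
    return f"{sign}{biased:08b}{fraction:0{frac_bits}b}"

by_hand = encode_single_by_hand(-0.75)
reference = format(struct.unpack(">I", struct.pack(">f", -0.75))[0], "032b")
print(by_hand)                                 # 10111111010000000000000000000000
print(by_hand == reference)                    # True
```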

- The problem in addition: the exponents of the numbers being added may be different
- 2.0 * 10^1 + 3.0 * 10^(-1)
- 2.0 * 10^1 + .03 * 10^1: now we can add them
- 2.03 * 10^1
- But we are not necessarily done!
- E.g. 9.74 * 10^0 + 3.3 * 10^(-1)
- 10.07 * 10^0 is not in normalized form!
- Shift again to get the correct form: 1.007 * 10^1
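
The align, add, renormalize sequence can be sketched in decimal, as on the slide. This assumes Python's standard `decimal` module; `fp_add_decimal` is a made-up name, and unlike real hardware the sketch keeps every digit rather than rounding to a fixed significand width.

```python
from decimal import Decimal

def fp_add_decimal(sig_a, exp_a, sig_b, exp_b):
    """Add sig_a * 10**exp_a and sig_b * 10**exp_b, returning (significand, exponent)."""
    # Step 1: align. Make operand a the one with the larger exponent,
    # then shift operand b right until the exponents match.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b = sig_b.scaleb(exp_b - exp_a)        # e.g. 3.0 * 10^-1 -> .030 * 10^1
    # Step 2: add the significands.
    sig, exp = sig_a + sig_b, exp_a
    # Step 3: renormalize so exactly one nonzero digit is left of the point.
    while abs(sig) >= 10:
        sig, exp = sig.scaleb(-1), exp + 1
    while sig != 0 and abs(sig) < 1:
        sig, exp = sig.scaleb(1), exp - 1
    return sig, exp

print(fp_add_decimal(Decimal("2.0"), 1, Decimal("3.0"), -1))   # value 2.03 * 10^1
print(fp_add_decimal(Decimal("9.74"), 0, Decimal("3.3"), -1))  # value 1.007 * 10^1
```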

- A + B + C = A + (B+C) = (A+B) + C
- Right?
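
Not quite. A small experiment, assuming Python floats (IEEE double precision) and operand values chosen only for illustration, shows the two groupings giving different answers:

```python
a, b, c = 1e20, -1e20, 1.0

left = (a + b) + c     # a + b is exactly 0.0, then 0.0 + 1.0
right = a + (b + c)    # b + c rounds back to -1e20: the 1.0 is too small to register

print(left)            # 1.0
print(right)           # 0.0
print(left == right)   # False
```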

- Can you see a problem?
- When do you lose bits?
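
One case where bits are lost: when the operands are aligned, the low bits of the smaller number can be shifted past the end of the significand. The sketch below assumes Python and emulates single precision by rounding through the standard `struct` module; `to_single` is a hypothetical helper.

```python
import struct

def to_single(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

big = to_single(2.0 ** 24)         # 16777216.0, exactly representable
total = to_single(big + 1.0)       # exact sum 16777217 needs 25 significant bits

print(total)                       # 16777216.0
print(total == big)                # True: the added 1.0 was lost
```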

- To multiply: add the exponents, but subtract the bias (each stored exponent already includes it)
- Then multiply significands
- Then normalize
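
These three steps can be sketched field by field for single precision. This is an illustrative Python sketch, not how any particular hardware does it: `fields` and `fp_multiply` are made-up names, the result is truncated rather than rounded, and zero, denormals, infinities, NaNs and exponent overflow are not handled.

```python
import struct

BIAS = 127

def fields(x):
    """Split x (as single precision) into sign, biased exponent, fraction field."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF

def fp_multiply(x, y):
    sx, ex, fx = fields(x)
    sy, ey, fy = fields(y)
    sign = sx ^ sy
    # Both stored exponents include the bias, so the sum includes it twice.
    exponent = ex + ey - BIAS
    # Multiply the 24-bit significands (implicit 1 restored); result has 46 fraction bits.
    product = ((1 << 23) | fx) * ((1 << 23) | fy)
    # Normalize: the product lies in [1, 4); if it is >= 2, shift right once.
    if product >> 47:
        product >>= 1
        exponent += 1
    fraction = (product >> 23) & 0x7FFFFF      # keep 23 fraction bits (truncate)
    word = (sign << 31) | (exponent << 23) | fraction
    return struct.unpack(">f", struct.pack(">I", word))[0]

print(fp_multiply(1.5, -2.5))   # -3.75
print(fp_multiply(3.0, 3.0))    # 9.0 (needs the normalization shift)
```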