
CS 232: Computer Architecture II



Presentation Transcript


  1. CS 232: Computer Architecture II Prof. Laxmikant (Sanjay) Kale Floating point arithmetic

  2. Floating Point (a brief look) • We need a way to represent • numbers with fractions, e.g., 3.1416 • very small numbers, e.g., .000000001 • very large numbers, e.g., 3.15576 × 10^9 • Representation: • sign, exponent, significand: (–1)^sign × significand × 2^exponent • more bits for significand gives more accuracy • more bits for exponent increases range • IEEE 754 floating point standard: • single precision: 8 bit exponent, 23 bit significand • double precision: 11 bit exponent, 52 bit significand
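
To make these field widths concrete, here is a minimal C sketch (assuming 32-bit floats; the variable names are mine, not from the lecture) that unpacks the three fields of a single-precision value:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float x = 3.1416f;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* reinterpret the 32 bits of the float */

    uint32_t sign        = bits >> 31;           /* 1 bit   */
    uint32_t exponent    = (bits >> 23) & 0xFF;  /* 8 bits  */
    uint32_t significand = bits & 0x7FFFFF;      /* 23 bits */

    printf("sign=%u exponent=%u significand=0x%06X\n",
           (unsigned)sign, (unsigned)exponent, (unsigned)significand);
    return 0;
}
```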

  3. Floating point representation: • The idea is to normalize all numbers, so the significand has exactly one digit to the left of the decimal point. • 12345 = 1.2345 * 10^4 • .0000012345 = 1.2345 * 10^-6 • Do this in binary: 1.01110 x 2^(1011) • IEEE FP representation • (+/-) 1.0101010101010101010101 * 2^(10101010) • This is single precision • Double precision: 64 bits in all. • Where does one need accuracy of that level?
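
To make the normalization step concrete, here is a small C sketch (the helper name is mine, not from the slides) that scales a positive value until exactly one nonzero digit sits to the left of the binary point:

```c
#include <stdio.h>

/* Normalize a positive value into the form m * 2^e with 1.0 <= m < 2.0. */
static void normalize(double x, double *m, int *e) {
    *e = 0;
    while (x >= 2.0) { x /= 2.0; (*e)++; }
    while (x <  1.0) { x *= 2.0; (*e)--; }
    *m = x;
}

int main(void) {
    double m; int e;
    normalize(12345.0, &m, &e);
    printf("12345 = %f * 2^%d\n", m, e);  /* 1.506958 * 2^13 */
    return 0;
}
```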

  4. Floating point numbers • Representation issues: • sign bit, exponent, significand • Question: how to represent each field • Question: which order to lay them out in a word? • Factor: should be easy to do comparisons (for sorting) • For arithmetic, we will have special hardware anyway • Choice: • Sign + magnitude representation • Sign bit, followed by exponent, then significand (why?) • exponent: represented with a “bias”: add 127 (1023 for double precision) • significand: assume implicit 1. (so 00001 means 1.00001)
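
One way to see the sorting argument: with the sign bit first, the biased exponent next, and the significand last, the bit patterns of non-negative floats compare as unsigned integers in the same order as the values they represent. A small C sketch (the helper is mine, assuming 32-bit floats):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

static uint32_t bits_of(float x) {
    uint32_t b;
    memcpy(&b, &x, sizeof b);  /* raw bit pattern of the float */
    return b;
}

int main(void) {
    float a = 0.5f, b = 3.0f;
    /* The biased exponent sits in the high-order bits, so integer
       comparison of the patterns matches numeric comparison. */
    printf("a < b as floats: %d\n", a < b);                   /* 1 */
    printf("a < b as bits:   %d\n", bits_of(a) < bits_of(b)); /* 1 */
    return 0;
}
```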

  5. Floating point representation • So: • (–1)^sign × (1 + significand) × 2^(exponent - bias) is the value of a floating point number • Example: 0 00001000 01010000000000000000000 • Example: convert -.41 to single precision form (note: .41 has no finite binary expansion, so the significand must be rounded)
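
Working the first example by hand: sign = 0, exponent field = 00001000 (binary) = 8, fraction = .0101 (binary) = 0.3125, so the value is +1.3125 × 2^(8 - 127) = 1.3125 × 2^(-119). A C sketch (my own check, not from the lecture) that assembles the same bit pattern and compares it against the formula:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

int main(void) {
    /* sign=0, exponent=00001000 (8), significand=0101 followed by zeros */
    uint32_t bits = (0u << 31) | (8u << 23) | (0x5u << 19);
    float x;
    memcpy(&x, &bits, sizeof x);

    /* Apply the formula directly: (1 + fraction) * 2^(exponent - 127). */
    double expected = (1.0 + 0.3125) * pow(2.0, 8 - 127);
    printf("decoded = %g, formula = %g\n", x, expected);  /* both ~1.97e-36 */
    return 0;
}
```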

  6. IEEE 754 floating-point standard • Leading “1” bit of significand is implicit • Exponent is “biased” to make sorting easier • all 0s is smallest exponent, all 1s is largest • bias of 127 for single precision and 1023 for double precision • summary: (–1)^sign × (1 + significand) × 2^(exponent - bias) • Example: • decimal: -.75 = -3/4 = -3/2^2 • binary: -.11 = -1.1 x 2^(-1) • floating point: exponent = 126 = 01111110 • IEEE single precision: 1 01111110 10000000000000000000000
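
The final bit string, grouped as sign | exponent | significand, is 0xBF400000 in hex. A quick C check (mine, not from the slides) that reinterpreting those bits as a float recovers -0.75:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    uint32_t bits = 0xBF400000u;  /* 1 01111110 1000...0 */
    float x;
    memcpy(&x, &bits, sizeof x);  /* reinterpret the pattern as a float */
    printf("%f\n", x);            /* prints -0.750000 */
    return 0;
}
```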

  7. Floating point addition • The problem is: the exponents of the numbers being added may be different • 2.0 * 10^1 + 3.0 * 10^(-1) • 2.0 * 10^1 + .03 * 10^1 : Now we can add them • 2.03 * 10^1 • But we are not necessarily done! • E.g. 9.74 * 10^0 + 3.3 * 10^(-1) • 10.07 * 10^0 is not correct form! • Shift again to get the correct form: 1.007 * 10^1
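
The same three steps in a small C sketch, working in decimal scientific notation exactly as the slide does (the helper is mine, not the hardware algorithm): align the exponents by de-normalizing the smaller operand, add the significands, then renormalize.

```c
#include <stdio.h>

/* Add two positive numbers given as significand * 10^exponent. */
static void fp_add(double m1, int e1, double m2, int e2,
                   double *m, int *e) {
    /* Step 1: align exponents by shifting the smaller operand right. */
    while (e1 < e2) { m1 /= 10.0; e1++; }
    while (e2 < e1) { m2 /= 10.0; e2++; }
    /* Step 2: add the significands. */
    *m = m1 + m2;
    *e = e1;
    /* Step 3: renormalize so that 1 <= significand < 10. */
    while (*m >= 10.0) { *m /= 10.0; (*e)++; }
    while (*m > 0.0 && *m < 1.0) { *m *= 10.0; (*e)--; }
}

int main(void) {
    double m; int e;
    fp_add(9.74, 0, 3.3, -1, &m, &e);
    printf("9.74*10^0 + 3.3*10^-1 = %.3f * 10^%d\n", m, e);  /* 1.007 * 10^1 */
    return 0;
}
```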

  8. You can get different results • A + B + C = A + (B+C) = (A+B) + C • Right? • Can you see a problem? • When do you lose bits?
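
The answer is no: floating-point addition is not associative, because bits of the smaller operand are lost whenever it is shifted past the end of the significand during exponent alignment. A minimal C demonstration in single precision:

```c
#include <stdio.h>

int main(void) {
    float a = 1.0e8f, b = -1.0e8f, c = 1.0f;

    /* (a + b) + c: a and b cancel exactly, so c survives. */
    float left = (a + b) + c;

    /* a + (b + c): adding 1.0 to -1.0e8 loses the 1.0 during alignment,
       because it falls below the 24-bit significand of the larger value. */
    float right = a + (b + c);

    printf("(a+b)+c = %f\n", left);   /* 1.000000 */
    printf("a+(b+c) = %f\n", right);  /* 0.000000 */
    return 0;
}
```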

  9. Floating point multiplication • Add exponents, but subtract bias • Then multiply significands • Then normalize
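
The bias is subtracted because each stored exponent already includes it, so the sum of two stored exponents counts the bias twice. A C sketch of the steps using decimal significands and biased binary exponents (names and structure are mine, not the lecture's):

```c
#include <stdio.h>

#define BIAS 127

/* Multiply two numbers given as significand * 2^(stored - BIAS),
   where 'stored' is the biased exponent as kept in the IEEE format. */
static void fp_mul(double m1, int stored1, double m2, int stored2,
                   double *m, int *stored) {
    *stored = stored1 + stored2 - BIAS;  /* bias counted twice; remove one */
    *m = m1 * m2;                        /* multiply significands */
    while (*m >= 2.0) { *m /= 2.0; (*stored)++; }  /* renormalize */
}

int main(void) {
    /* 1.5 * 2^1 times 1.5 * 2^2, i.e., 3.0 * 6.0 = 18.0 */
    double m; int stored;
    fp_mul(1.5, 1 + BIAS, 1.5, 2 + BIAS, &m, &stored);
    printf("value = %f * 2^%d\n", m, stored - BIAS);  /* 1.125 * 2^4 */
    return 0;
}
```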
