CS 232: Computer Architecture II


Prof. Laxmikant (Sanjay) Kale. Floating point arithmetic.


CS 232: Computer Architecture II

Prof. Laxmikant (Sanjay) Kale

Floating point arithmetic

Floating Point (a brief look)
  • We need a way to represent
    • numbers with fractions, e.g., 3.1416
    • very small numbers, e.g., .000000001
    • very large numbers, e.g., 3.15576 × 10^9
  • Representation:
    • sign, exponent, significand: (–1)^sign × significand × 2^exponent
    • more bits for significand gives more accuracy
    • more bits for exponent increases range
  • IEEE 754 floating point standard:
    • single precision: 1 sign bit, 8-bit exponent, 23-bit significand (32 bits in all)
    • double precision: 1 sign bit, 11-bit exponent, 52-bit significand (64 bits in all)
Floating point representation:
  • The idea is to normalize all numbers, so the significand has exactly one digit to the left of the decimal point.
    • 12345 = 1.2345 * 10^4
    • .0000012345 = 1.2345 * 10^-6
    • Do this in binary: 1.01110 x 2^(1011)
  • IEEE FP representation
    • (+/-) 1.0101010101010101010101 * 2 ^ ( 10101010)
    • This is single precision: 32 bits in all.
    • Double precision: 64 bits in all.
  • Where does one need accuracy of that level?
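
To make the layout concrete, here is a minimal sketch that splits a number into its three single-precision fields. It assumes Python's standard `struct` module to reinterpret the 32 bits; the helper name `float_fields` is made up for this illustration.

```python
import struct

def float_fields(x):
    """Return (sign, biased_exponent, fraction) of x stored as an IEEE 754 single."""
    # Pack x as a big-endian 32-bit float, then reread the same bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, still biased
    fraction = bits & 0x7FFFFF        # 23 bits, the implicit leading 1 is not stored
    return sign, exponent, fraction

sign, exponent, fraction = float_fields(12345.0)
print(sign, format(exponent, "08b"), format(fraction, "023b"))
```

Note that the stored exponent is the biased one, so 1.0 shows an exponent field of 127 rather than 0.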
Floating point numbers
  • Representation issues:
    • sign bit, exponent, significand
    • Question: how to represent each field?
    • Question: which order to lay them out in a word?
    • Factor: should be easy to do comparisons (for sorting)
      • For arithmetic, we will have special hardware anyway
    • Choice:
      • Sign + magnitude representation
      • Sign bit, followed by exponent, then significand (why?)
      • exponent: represented with a “bias”: add 127 (1023 for double precision)
      • significand: assume implicit 1. (so 00001 means 1.00001)
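
One way to see why this field order helps comparisons: with the exponent above the significand and a biased (unsigned) exponent, the bit patterns of non-negative numbers sort in the same order as the numbers themselves. The sketch below checks this for a handful of positive values only; negative numbers need extra care because of sign + magnitude. The helper `bits_of` is a name invented for this example.

```python
import struct

def bits_of(x):
    """Return the 32-bit single-precision pattern of x as an unsigned integer."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits

# Positive sample values only (an assumption of this sketch).
values = [3.1416, 1e-9, 0.5, 12345.0, 1.0, 3.15576e9]

by_value = sorted(values)               # ordinary numeric sort
by_bits = sorted(values, key=bits_of)   # sort by raw bit pattern as an integer

print(by_value == by_bits)
```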
Floating point representation
  • So:
    • (–1)^sign x (1 + significand) x 2^(exponent - bias) is the value of a floating point number
    • Example: 0 00001000 01010000000000000000000
    • Example: convert -.41 to single precision form
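
The two examples can be checked with a short sketch. `decode_single` applies the value formula to a bit string by hand; `encode_single` leans on Python's `struct` module to produce the bits. Both function names are invented here, and the decoder assumes a normalized number (exponent field neither all 0s nor all 1s).

```python
import struct

BIAS = 127

def decode_single(bit_string):
    """Evaluate (-1)^sign x (1 + significand) x 2^(exponent - bias) for a normalized single."""
    bits = bit_string.replace(" ", "")
    sign = int(bits[0], 2)
    exponent = int(bits[1:9], 2)
    fraction = int(bits[9:], 2) / 2**23   # stored 23 bits as a fraction in [0, 1)
    return (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - BIAS)

def encode_single(x):
    """Return the single-precision bits of x as 'sign exponent significand'."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = format(bits, "032b")
    return f"{s[0]} {s[1:9]} {s[9:]}"

print(decode_single("0 00001000 01010000000000000000000"))  # 1.3125 x 2^(8 - 127)
print(encode_single(-0.41))
```

For -.41 the hand calculation goes the same way: .41 = 1.64 x 2^-2, so the exponent field is -2 + 127 = 125, and the significand field holds the first 23 bits of .64 in binary.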
IEEE 754 floating-point standard
  • Leading “1” bit of significand is implicit
  • Exponent is “biased” to make sorting easier
    • all 0s is the smallest exponent, all 1s is the largest
    • bias of 127 for single precision and 1023 for double precision
    • summary: (–1)signsignificand) 2exponent – bias
  • Example:
    • decimal: -.75 = -3/4 = -3/2^2
    • binary: -.11 = -1.1 x 2^-1
    • floating point: exponent = 126 = 01111110
    • IEEE single precision: 10111111010000000000000000000000
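
The steps of the -.75 example can be followed in code. This is a sketch under stated assumptions: it handles only nonzero normalized values, ignores special cases (zero, denormals, infinities, NaN), and the name `encode_by_hand` is made up for illustration. `math.frexp` is used to find the power of two.

```python
import math

def encode_by_hand(x, bias=127, frac_bits=23):
    """Build the single-precision bit string of a nonzero normalized x step by step."""
    sign = 1 if x < 0 else 0
    # frexp gives abs(x) = m * 2**e with 0.5 <= m < 1; rescale so that 1 <= significand < 2.
    m, e = math.frexp(abs(x))
    significand = m * 2
    exponent = e - 1
    # Drop the implicit leading 1 and keep frac_bits bits of the remainder.
    fraction = round((significand - 1) * 2**frac_bits)
    biased = exponent + bias
    return format(sign, "b") + format(biased, "08b") + format(fraction, f"0{frac_bits}b")

print(encode_by_hand(-0.75))
```

For -.75 this yields significand 1.1 (binary), exponent -1, biased exponent 126, which matches the 32-bit pattern in the example.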
Floating point addition
  • The problem is: the exponents of numbers being added may be different
    • 2.0 * 10^1 + 3.0 * 10^(-1)
    • 2.0 * 10^1 + .03 * 10^ 1 : Now we can add them
    • 2.03 * 10 ^1
    • But we are not necessarily done!
    • E.g. 9.74 * 10^0 + 3.3 * 10^(-1)
    • 10.07 * 10^0 is not in correct form!
    • Shift again to get the correct form: 1.007 * 10^1
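
The align, add, normalize sequence can be sketched in decimal, mirroring the examples above. This is an illustration only: it uses Python's `decimal` module, keeps every digit (real hardware works in binary and must round to a fixed number of bits), and `fp_add` is a name invented here.

```python
from decimal import Decimal

def fp_add(sig_a, exp_a, sig_b, exp_b):
    """Add sig_a * 10**exp_a and sig_b * 10**exp_b; return a normalized (significand, exponent)."""
    # Step 1: align. Make operand a the one with the larger exponent,
    # then shift b's significand right until the exponents match.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b = sig_b.scaleb(exp_b - exp_a)
    # Step 2: add the significands.
    total = sig_a + sig_b
    exp = exp_a
    # Step 3: normalize so exactly one nonzero digit sits left of the point.
    while abs(total) >= 10:
        total = total.scaleb(-1)
        exp += 1
    while total != 0 and abs(total) < 1:
        total = total.scaleb(1)
        exp -= 1
    return total, exp

print(fp_add(Decimal("2.0"), 1, Decimal("3.0"), -1))
print(fp_add(Decimal("9.74"), 0, Decimal("3.3"), -1))
```

The second call is the case where the sum overflows one digit and needs the extra shift.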
You can get different results
  • A + B + C = A + (B+C) = (A+B) + C
    • Right?
  • Can you see a problem?
  • When do you lose bits?
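
A small experiment shows the problem. The values are chosen for this illustration: when a tiny number is added to a huge one, the tiny number's bits fall off the end of the significand during alignment, so the grouping changes the answer. Python floats are IEEE double precision.

```python
a, b, c = 1e20, -1e20, 1.0

left = (a + b) + c    # a + b is exactly 0, then 0 + 1 gives 1.0
right = a + (b + c)   # b + c rounds back to -1e20 (the 1.0 is lost), so the sum is 0.0

print(left, right, left == right)
```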
Floating point multiplication
  • Add exponents, but subtract bias
  • Then multiply significands
  • Then normalize
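
The three steps can be sketched directly on the single-precision bit fields. This is a simplified illustration, not how any particular hardware does it: it assumes normalized, nonzero inputs, ignores overflow, underflow, infinities and NaN, and truncates instead of rounding. The names `to_bits`, `from_bits` and `fp_multiply` are invented for the example.

```python
import struct

BIAS = 127

def to_bits(x):
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits

def from_bits(bits):
    (x,) = struct.unpack(">f", struct.pack(">I", bits))
    return x

def fp_multiply(x, y):
    """Multiply two normalized singles field by field (truncating, no special cases)."""
    a, b = to_bits(x), to_bits(y)
    sign = (a >> 31) ^ (b >> 31)
    # Add the biased exponents, then subtract one bias so the result is biased only once.
    exponent = ((a >> 23) & 0xFF) + ((b >> 23) & 0xFF) - BIAS
    # Restore the implicit leading 1 and multiply the 24-bit significands.
    sig_a = (a & 0x7FFFFF) | 0x800000
    sig_b = (b & 0x7FFFFF) | 0x800000
    product = sig_a * sig_b          # represents a value in [1, 4), scaled by 2**46
    # Normalize: if the product reached 2 or more, shift right and bump the exponent.
    if product >= 1 << 47:
        product >>= 1
        exponent += 1
    fraction = (product >> 23) & 0x7FFFFF   # keep 23 bits, drop the leading 1
    return from_bits((sign << 31) | (exponent << 23) | fraction)

print(fp_multiply(1.5, -2.5), fp_multiply(1.5, 1.5))
```

The second call exercises the normalize step, since 1.5 x 1.5 = 2.25 has two digits left of the binary point.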