
Floating Point



Presentation Transcript


  1. Floating Point
     A number system corresponding to the decimal notation 1,837 * 10^exponent, where 1,837 is the significand. A great number of corresponding binary standards exist. There is one common standard: IEEE 754-1985 (IEC 559).

  2. IEEE 754-1985
     • Number representations:
       • Single precision (32 bits): sign 1 bit, exponent 8 bits, fraction 23 bits
       • Double precision (64 bits): sign 1 bit, exponent 11 bits, fraction 52 bits

  3. Single Precision Format
     Field layout (width in bits): S (1) | E (8) | M (23)
     • Sign S: 1 bit
     • Exponent E: excess-127 binary integer; the actual exponent is $e = E - 127$
     • Mantissa M: normalized binary significand with a hidden integer bit: $(1.M)_2$
     $N = (-1)^S \times (1.M)_2 \times 2^e$
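The field layout above can be checked with a minimal Python sketch. The helper names `decode_single` and `value_of` are hypothetical, introduced only for this illustration; the sketch assumes the value is a normalized number (no zero, denormal, infinity, or NaN) and uses the standard `struct` module to obtain the 32-bit pattern.

```python
import struct

def decode_single(x: float):
    """Split x, stored as IEEE 754 single precision, into the fields (S, E, M)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31                 # 1 sign bit
    e_field = (bits >> 23) & 0xFF  # 8 exponent bits, excess 127
    m = bits & 0x7FFFFF            # 23 fraction bits (hidden bit not stored)
    return s, e_field, m

def value_of(s: int, e_field: int, m: int) -> float:
    """Evaluate N = (-1)^S * (1.M) * 2^(E - 127) for a normalized number."""
    assert 0 < e_field < 255, "this sketch only handles normalized numbers"
    significand = 1 + m / 2**23    # restore the hidden integer bit
    return (-1) ** s * significand * 2.0 ** (e_field - 127)

fields = decode_single(6.5)        # 6.5 = 1.625 * 2^2
print(fields)                      # (0, 129, 5242880)
print(value_of(*fields))           # 6.5
```

Keeping the three fields as plain integers makes it easy to compare them against the bit widths listed on the slide.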

  4. Example 1
     S = 1, E = 01111110, M = 10000000000000000000000
     $e = E - 127 = 126 - 127 = -1$
     $N = (-1)^1 \times (1.1)_2 \times 2^{-1}$
     $N = -1 \times (0.11)_2$
     $N = -1 \times (2^{-1} \times 1 + 2^{-2} \times 1)$
     $N = -1 \times (0.5 + 0.25) = -0.75$
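As a cross-check of the worked example, the following short Python sketch concatenates the three bit strings from the slide and lets the standard `struct` module interpret the resulting 32-bit word as a single-precision number. The variable names are illustrative only.

```python
import struct

# The three fields exactly as given in the example.
sign, exponent, fraction = "1", "01111110", "10000000000000000000000"

bits = int(sign + exponent + fraction, 2)          # 32-bit pattern as an integer
(n,) = struct.unpack(">f", bits.to_bytes(4, "big"))

e = int(exponent, 2) - 127                         # remove the excess-127 bias
print(hex(bits), e, n)                             # 0xbf400000 -1 -0.75
```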

  5. Single Precision Range
     The magnitude of the normalized numbers that can be represented is in the range $2^{-126} \times 1.0$ to $2^{127} \times (2 - 2^{-23})$, which is approximately $1.2 \times 10^{-38}$ to $3.4 \times 10^{38}$.
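The two limits can be reproduced from their bit patterns. This is a small sketch, assuming the smallest normalized number has E = 1, M = 0 and the largest finite number has E = 254 with all fraction bits set; `from_bits` is a hypothetical helper name.

```python
import struct

def from_bits(bits: int) -> float:
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    (x,) = struct.unpack(">f", bits.to_bytes(4, "big"))
    return x

smallest_normal = from_bits(0x00800000)   # E = 1,   M = 0
largest_finite = from_bits(0x7F7FFFFF)    # E = 254, M = all ones

# Both values agree with the closed-form expressions for the range.
assert smallest_normal == 2.0 ** -126
assert largest_finite == (2 - 2.0 ** -23) * 2.0 ** 127

print(f"{smallest_normal:.4e} to {largest_finite:.4e}")
```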

  6. IEEE 754-1985
     • Single Precision (32 bits)
       • Fraction part: 23 bits; $0 \le x < 1$
       • Significand: 1 + fraction part. The “1” is not stored (“hidden bit”). Corresponds to about 7 decimal digits.
       • Exponent: 127 is added to the exponent (the bias). Corresponds to a range of roughly $10^{-38}$ to $10^{38}$.
     • Double Precision (64 bits)
       • Fraction part: 52 bits; $0 \le x < 1$
       • Significand: 1 + fraction part. The “1” is not stored (“hidden bit”). Corresponds to about 16 decimal digits.
       • Exponent: 1023 is added to the exponent (the bias). Corresponds to a range of roughly $10^{-308}$ to $10^{308}$.
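The decimal-digit figures follow from the number of significand bits: p bits (hidden bit included) carry about p * log10(2) decimal digits. The sketch below works this out and, assuming Python's `float` is an IEEE 754 double (true on common platforms, but an assumption here), reads the double-precision parameters from `sys.float_info`.

```python
import math
import sys

# 23 or 52 stored fraction bits plus the hidden bit.
single_digits = 24 * math.log10(2)
double_digits = 53 * math.log10(2)
print(f"single: {single_digits:.2f} digits, double: {double_digits:.2f} digits")

# Parameters of the platform's double-precision float.
print(sys.float_info.mant_dig)   # significand bits, hidden bit included
print(sys.float_info.max)        # largest finite double
print(sys.float_info.min)        # smallest positive normalized double
```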

  7. IEEE 754-1985
     • Special features:
       • Correct rounding of “halfway” results (to the even neighbor).
       • Includes special values:
         • NaN: not a number
         • +∞: infinity
         • -∞: negative infinity
       • Uses denormal numbers to represent magnitudes smaller than $2^{E_{\min}}$, the smallest normalized number.
       • Rounds to nearest by default; three other rounding modes exist.
       • Sophisticated exception handling.
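Several of these features can be observed directly. The sketch below uses Python's `float`, which is assumed to be an IEEE 754 double running in the default round-to-nearest mode, so the constants are the double-precision ones rather than the single-precision ones discussed on the earlier slides.

```python
import math

nan = float("nan")
inf = float("inf")

# NaN compares unequal to everything, including itself.
nan_is_unequal = nan != nan

# A result too large for the format becomes infinity.
overflowed = 1e308 * 10

# Denormals fill the gap between zero and the smallest normalized number.
smallest_normal = math.ldexp(1.0, -1022)
smallest_denormal = math.ldexp(1.0, -1074)

# Halfway cases round to the neighbor with an even last bit:
# 1 + 2^-53 lies halfway between 1 and 1 + 2^-52 and rounds down to 1,
# 1 + 3 * 2^-53 lies halfway between 1 + 2^-52 and 1 + 2^-51 and rounds up.
tie_down = 1.0 + math.ldexp(1.0, -53)
tie_up = 1.0 + 3 * math.ldexp(1.0, -53)

print(nan_is_unequal, overflowed, smallest_denormal)
print(tie_down == 1.0, tie_up == 1.0 + math.ldexp(1.0, -51))
```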

  8. Add / Sub
     $(s_1 \times 2^{e_1}) \pm (s_2 \times 2^{e_2}) = (s_1 \pm s_2) \times 2^{e_3} = s_3 \times 2^{e_3}$, once both summands have been aligned to the common exponent $e_3$.
     • $s = 1.s$: the hidden bit is used during the operation.
     1: Shift the summands so they have the same exponent:
       • e.g., if $e_2 < e_1$: shift $s_2$ right and increment $e_2$ until $e_1 = e_2$.
     2: Add/subtract the significands using the sign bits of $s_1$ and $s_2$.
       • Set the sign bit of the result accordingly.
     3: Normalize the result (sign bit kept separate):
       • Shift $s_3$ left and decrement $e_3$ until MSB = 1. If the addition carried out of the MSB, shift $s_3$ right once and increment $e_3$ instead.
     4: Round $s_3$ correctly.
       • More than 23 / 52 bits are used internally for the addition.
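The four steps can be sketched on the integer fields of two single-precision numbers. This is an illustrative sketch, not a complete implementation: it assumes normalized, finite, nonzero operands and a normalized result, it uses three extra low-order bits (guard, round, sticky) as one common choice for the "more than 23 bits" used internally, and the names `to_bits`, `from_bits`, `split`, and `add_single` are hypothetical.

```python
import struct

def to_bits(x: float) -> int:
    return struct.unpack(">I", struct.pack(">f", x))[0]

def from_bits(b: int) -> float:
    return struct.unpack(">f", struct.pack(">I", b))[0]

def split(bits: int):
    """Return (sign, biased exponent, 24-bit significand with hidden bit)."""
    e = (bits >> 23) & 0xFF
    assert 0 < e < 255, "sketch handles normalized numbers only"
    return bits >> 31, e, (1 << 23) | (bits & 0x7FFFFF)

def add_single(a_bits: int, b_bits: int) -> int:
    s1, e1, m1 = split(a_bits)
    s2, e2, m2 = split(b_bits)
    if (e1, m1) < (e2, m2):            # make operand 1 the larger magnitude
        s1, e1, m1, s2, e2, m2 = s2, e2, m2, s1, e1, m1
    m1 <<= 3                           # room for guard, round, sticky bits
    m2 <<= 3
    # Step 1: align; bits shifted out are remembered in the sticky bit.
    d = e1 - e2
    if d:
        lost = (m2 & ((1 << d) - 1)) != 0
        m2 = (m2 >> d) | int(lost)
    # Step 2: add or subtract magnitudes; the larger operand gives the sign.
    m3 = m1 + m2 if s1 == s2 else m1 - m2
    e3, sign = e1, s1
    if m3 == 0:
        return 0                       # exact cancellation gives +0.0
    # Step 3: normalize so the hidden bit sits at bit 26.
    if m3 >> 27:                       # carry out: one right shift
        m3 = (m3 >> 1) | (m3 & 1)
        e3 += 1
    while m3 < (1 << 26):              # cancellation: left shifts
        m3 <<= 1
        e3 -= 1
    # Step 4: round to nearest, ties to even, using the three extra bits.
    extra = m3 & 0b111
    m3 >>= 3
    if extra > 4 or (extra == 4 and m3 & 1):
        m3 += 1
        if m3 >> 24:                   # rounding overflowed the significand
            m3 >>= 1
            e3 += 1
    assert 0 < e3 < 255, "overflow and underflow are not handled here"
    return (sign << 31) | (e3 << 23) | (m3 & 0x7FFFFF)

print(from_bits(add_single(to_bits(1.5), to_bits(2.25))))    # 3.75
print(from_bits(add_single(to_bits(1.0), to_bits(-0.75))))   # 0.25
```

Ordering the operands by magnitude first keeps the subtraction non-negative, so the sign can stay separate from the significand as the slide describes.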

  9. Multiplication
     $(s_1 \times 2^{e_1}) \times (s_2 \times 2^{e_2}) = (s_1 \times s_2) \times 2^{e_1 + e_2}$
     So: multiply the significands and add the exponents.
     • Problem: the significand is coded in sign-and-magnitude form; use unsigned multiplication and take care of the sign separately.
     • Round the 2n-bit product significand to an n-bit significand.
     • Normalize the result and compute the new exponent with respect to the bias.
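A minimal Python sketch of this procedure for single precision (n = 24 significand bits) follows. It assumes normalized, finite, nonzero operands and a normalized result; the names `to_bits`, `from_bits`, `split`, and `mul_single` are hypothetical. Normalization is done before rounding here, which is one possible ordering of the steps.

```python
import struct

def to_bits(x: float) -> int:
    return struct.unpack(">I", struct.pack(">f", x))[0]

def from_bits(b: int) -> float:
    return struct.unpack(">f", struct.pack(">I", b))[0]

def split(bits: int):
    """Return (sign, biased exponent, 24-bit significand with hidden bit)."""
    e = (bits >> 23) & 0xFF
    assert 0 < e < 255, "sketch handles normalized numbers only"
    return bits >> 31, e, (1 << 23) | (bits & 0x7FFFFF)

def mul_single(a_bits: int, b_bits: int) -> int:
    s1, e1, m1 = split(a_bits)
    s2, e2, m2 = split(b_bits)
    sign = s1 ^ s2                 # sign handled apart from the magnitudes
    e3 = e1 + e2 - 127             # both inputs carry the bias; keep it once
    prod = m1 * m2                 # unsigned 48-bit product, in [2^46, 2^48)
    # Normalize: a product significand in [2, 4) needs one extra shift.
    if prod >> 47:
        shift = 24
        e3 += 1
    else:
        shift = 23
    kept = prod >> shift           # leading 24 bits
    rest = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    # Round the discarded bits to nearest, ties to even.
    if rest > half or (rest == half and kept & 1):
        kept += 1
        if kept >> 24:             # rounding overflowed the significand
            kept >>= 1
            e3 += 1
    assert 0 < e3 < 255, "overflow and underflow are not handled here"
    return (sign << 31) | (e3 << 23) | (kept & 0x7FFFFF)

print(from_bits(mul_single(to_bits(1.5), to_bits(2.25))))   # 3.375
print(from_bits(mul_single(to_bits(-3.0), to_bits(0.5))))   # -1.5
```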

  10. Division
     $(s_1 \times 2^{e_1}) / (s_2 \times 2^{e_2}) = (s_1 / s_2) \times 2^{e_1 - e_2}$
     So: divide the significands and subtract the exponents.
     • Problem: the significand is coded in sign-and-magnitude form; use unsigned division (different algorithms exist) and take care of the sign separately.
     • Round the (n + 2)-bit quotient significand (n bits plus guard and round bits) to an n-bit significand.
     • Compute the new exponent with respect to the bias.
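The division steps can be sketched in the same style. The sketch assumes normalized, finite, nonzero operands and a normalized result, and it uses Python's integer `divmod` as a stand-in for whatever unsigned division algorithm the hardware would use. The remainder serves as a sticky indication in addition to the guard and round bits; the names `to_bits`, `from_bits`, `split`, and `div_single` are hypothetical.

```python
import struct

def to_bits(x: float) -> int:
    return struct.unpack(">I", struct.pack(">f", x))[0]

def from_bits(b: int) -> float:
    return struct.unpack(">f", struct.pack(">I", b))[0]

def split(bits: int):
    """Return (sign, biased exponent, 24-bit significand with hidden bit)."""
    e = (bits >> 23) & 0xFF
    assert 0 < e < 255, "sketch handles normalized numbers only"
    return bits >> 31, e, (1 << 23) | (bits & 0x7FFFFF)

def div_single(a_bits: int, b_bits: int) -> int:
    s1, e1, m1 = split(a_bits)
    s2, e2, m2 = split(b_bits)
    sign = s1 ^ s2                 # sign handled apart from the magnitudes
    e3 = e1 - e2 + 127             # the biases cancel; add one back
    if m1 < m2:                    # quotient below 1: pre-shift the dividend
        m1 <<= 1
        e3 -= 1
    # m1 / m2 is now in [1, 2); compute 24 bits plus guard and round bits.
    q, r = divmod(m1 << 25, m2)    # q has exactly 26 bits
    kept = q >> 2
    extra = q & 0b11               # guard and round bits
    inexact_tail = r != 0          # nonzero remainder acts as a sticky bit
    # Round to nearest, ties to even.
    if extra == 3 or (extra == 2 and (inexact_tail or kept & 1)):
        kept += 1
        if kept >> 24:             # rounding overflowed the significand
            kept >>= 1
            e3 += 1
    assert 0 < e3 < 255, "overflow and underflow are not handled here"
    return (sign << 31) | (e3 << 23) | (kept & 0x7FFFFF)

print(hex(div_single(to_bits(1.0), to_bits(3.0))))          # 0x3eaaaaab
print(from_bits(div_single(to_bits(-7.5), to_bits(2.5))))   # -3.0
```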
