
IEEE 754 Floating Point

IEEE 754 Floating Point. Luddy Harrison, CS433G, Spring 2007. What is represented: real numbers such as 5.6745 and 1.23 × 10^19. Remember, however, that the representation is finite, so only a subset of the reals can be represented: no transcendentals, limited range, and limited precision (number of digits).


Presentation Transcript


  1. IEEE 754 Floating Point • Luddy Harrison • CS433G, Spring 2007

  2. What is represented • Real numbers • 5.6745 • 1.23 × 10^19 • Remember, however, that the representation is finite, so only a subset of the reals can be represented • No transcendentals (values such as π can only be approximated) • Limited range • Limited precision (number of digits)
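The limits listed on this slide can be observed directly. The following is a minimal sketch, assuming Python's `float` is an IEEE 754 double (true on mainstream platforms, but an assumption here); the variable names are illustrative only.

```python
from fractions import Fraction

# Assumption: Python's float is an IEEE 754 double, as on mainstream platforms.
stored = Fraction(0.1)                 # exact rational value of the stored double
is_exact = stored == Fraction(1, 10)   # limited precision: 0.1 is only approximated
sum_matches = (0.1 + 0.2 == 0.3)       # each operand is rounded, so the sum misses 0.3
overflowed = 1e308 * 10                # limited range: exceeds the largest finite double

print(is_exact, sum_matches, overflowed)
```

Even a short decimal fraction such as 0.1 falls outside the representable subset, because its binary expansion does not terminate.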

  3. Normalizing Numbers • In scientific notation, we generally choose one digit to the left of the decimal point • 13.25 × 10^10 becomes 1.325 × 10^11 • Normalizing means • Shifting the decimal point until we have the right number of digits to its left (normally one) • Adding to or subtracting from the exponent to reflect the shift
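The two normalizing steps (shift the point, adjust the exponent) can be sketched in a few lines. This is an illustration in decimal, not part of the lecture; `normalize` is a hypothetical helper name, and `Decimal` is used so that the shifts are exact.

```python
from decimal import Decimal

def normalize(mantissa: Decimal, exponent: int):
    """Shift the decimal point until one non-zero digit is to its left."""
    if mantissa == 0:
        return mantissa, exponent      # zero has no non-zero digit to shift to
    while abs(mantissa) >= 10:         # too many digits left of the point
        mantissa /= 10
        exponent += 1
    while abs(mantissa) < 1:           # no non-zero digit left of the point
        mantissa *= 10
        exponent -= 1
    return mantissa, exponent

result = normalize(Decimal("13.25"), 10)
print(result)
```

The same idea carries over to binary, where each shift moves the binary point by one bit and changes the exponent of 2 by one.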

  4. Binary Floating Point • A binary number in scientific notation is called a floating point number • Examples (mantissa digits are binary): • 1.001 × 2^17 • 0.001 × 2^-13

  5. Parts of a floating point number • ±1.mmmmmmm × B^(±eeee) • A signed fixed-point fraction (±1.mmmmmmm) called the mantissa • For non-zero mantissas, the leading 1 is implicit • That is, it is not present in the representation (bit pattern), but it is assumed to be there when interpreting the bit pattern • See the previous lecture for the meaning of fixed point fractions • An implicit base B • An unsigned integer (eeee) stored as the exponent • An implicit bias. The actual exponent is eeee - bias • Some bit patterns are reserved for special values • Not a Number (NaN) • ±∞

  6. About IEEE 754 • This standard defines several floating point types and the meaning of operations (+, ×, etc.) on them • Single • Double • Extended Precision • It deals at length with the thorny questions of • Erroneous and exceptional results • Rounding and conversion

  7. 32-bit Single Precision • Fields: S (1 bit), E (8 bits), M (23 bits) • Value: (-1)^S × 1.M × 2^(E - 127) • E is an unsigned integer (it is not stored in two's complement). A bias of 127 is used, so that the actual exponent is E - 127. • Exponents 00000000 and 11111111 are reserved for special purposes • The sign bit of the mantissa is separated from the magnitude bits of the mantissa. The mantissa is therefore an unsigned fixed point fraction with an implicit 1 to the left of the binary point. • All zero bits (S, E, and M) means zero (0). In this case there is no leading 1 mantissa bit implied.
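The field layout and value formula for single precision can be checked with a short sketch. The helper names (`float_to_bits`, `decode_single`, `value_of_normal`) are hypothetical, and the formula is applied only to normal numbers, i.e. exponent fields 1 through 254.

```python
import struct

def float_to_bits(x: float) -> int:
    """Round x to single precision and return its 32-bit pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def decode_single(bits: int):
    """Split a 32-bit pattern into the S, E and M fields."""
    s = (bits >> 31) & 0x1        # 1 sign bit
    e = (bits >> 23) & 0xFF       # 8 exponent bits, biased by 127
    m = bits & 0x7FFFFF           # 23 mantissa (fraction) bits
    return s, e, m

def value_of_normal(s: int, e: int, m: int) -> float:
    """(-1)^S x 1.M x 2^(E - 127); valid only for 1 <= e <= 254."""
    return (-1) ** s * (1 + m / 2**23) * 2.0 ** (e - 127)

fields = decode_single(float_to_bits(-6.5))
print(fields, value_of_normal(*fields))
```

Here -6.5 is -1.625 × 2^2, so the stored exponent field is 2 + 127 = 129 and the mantissa field begins with the bits 101.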

  8. Some examples (fields shown as S E M) • 0 0 0 = 0 (note that there is no implicit leading 1 here) • 1 100 1010…0000 = -1 × 1.101 × 2^(4-127) = -13/8 × 2^-123 • 0 11111110 0000…0000 = 1.0 × 2^(254-127) = 1 × 2^127
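These three examples can be reproduced by assembling the bit patterns and letting the machine interpret them. This sketch assumes the exponent field written as 100 on the slide stands for the 8-bit field 00000100 (E = 4); `bits_to_float` is a hypothetical helper.

```python
import struct
from fractions import Fraction

def bits_to_float(s: int, e: int, m: int) -> float:
    """Pack S (1 bit), E (8 bits), M (23 bits) and read the result as a single."""
    word = (s << 31) | (e << 23) | m
    return struct.unpack(">f", struct.pack(">I", word))[0]

zero = bits_to_float(0, 0, 0)
# Assumption: exponent field "100" means 00000100; mantissa is 101 then zeros.
second = bits_to_float(1, 0b00000100, 0b101 << 20)
third = bits_to_float(0, 0b11111110, 0)

print(zero, second, third)
```

Comparing with `Fraction` avoids any rounding in the check itself: `Fraction(second)` equals -13/8 divided by 2^123 exactly.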

  9. Denormalized Numbers • 0 00000000 0000…0001 = 0.0000…0001 × 2^-126 • An exponent field of zero is special; it indicates that there is no implicit leading 1 on the mantissa. This allows very small numbers to be represented. Note that we cannot normalize this value. (Why?) Zero is effectively a denorm (and it cannot be normalized; why?) • 0 11111110 0000…0001 = 1.0000…0001 × 2^(254-127) = 1.0000…0001 × 2^127 • Here, the mantissa has an implicit leading 1. If we wanted 0.0000…0001 × 2^127 we could obtain it by writing 1.0 × 2^104.
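The arithmetic on this slide can be verified numerically. With 23 mantissa bits, 0.0000…0001 is 2^-23, so the smallest denormal is 2^-23 × 2^-126 = 2^-149, and 2^-23 × 2^127 = 2^104. The sketch below uses a hypothetical helper `word_to_float` and relies on Python doubles holding these single-precision values exactly.

```python
import struct

def word_to_float(word: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

smallest_denorm = word_to_float(0x00000001)   # 0 00000000 0000...0001
smallest_normal = word_to_float(0x00800000)   # 0 00000001 0000...0000
with_leading_one = word_to_float(0x7F000001)  # 0 11111110 0000...0001
two_to_104 = word_to_float((104 + 127) << 23) # 1.0 x 2^104, exponent field 231

print(smallest_denorm, smallest_normal, with_leading_one, two_to_104)
```

The smallest denormal sits 23 binary orders of magnitude below the smallest normal number, which is the extra range that an exponent field of zero buys.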

  10. 64-bit Double Precision • Fields: S (1 bit), E (11 bits), M (52 bits) • Value: (-1)^S × 1.M × 2^(E - 1023) • E is an unsigned integer (it is not stored in two's complement). A bias of 1023 is used, so that the actual exponent is E - 1023. • As before, an exponent of all 0 bits or all 1 bits is reserved for special values. • As before, the mantissa is an unsigned fixed point fraction with an implicit 1 to the left of the binary point. The sign of the entire number is held separately in S. • A representation of all zero bits (S, E, and M) means zero (0). In this case there is no leading 1 mantissa bit implied.
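The double-precision layout differs from single precision only in field widths and bias. A minimal sketch, with hypothetical helper names and the formula restricted to normal numbers (exponent fields 1 through 2046):

```python
import struct

def decode_double(x: float):
    """Return the S (1 bit), E (11 bits), M (52 bits) fields of a double."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    s = word >> 63
    e = (word >> 52) & 0x7FF          # biased by 1023
    m = word & ((1 << 52) - 1)
    return s, e, m

def rebuild_normal(s: int, e: int, m: int) -> float:
    """(-1)^S x 1.M x 2^(E - 1023); valid only for 1 <= e <= 2046."""
    return (-1) ** s * (1 + m / 2**52) * 2.0 ** (e - 1023)

fields = decode_double(0.1)
print(fields, rebuild_normal(*fields) == 0.1)
```

Rebuilding the value from its fields and comparing it with the original is a quick way to confirm that the field boundaries and the bias are right.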

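  11. Infinity (single precision, fields shown as S E M) • 0 11111111 0 = +∞ • 1 11111111 0 = -∞ • That is, the exponent is all ones and the mantissa is all zeros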
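The two infinity patterns correspond to the 32-bit words 0x7F800000 and 0xFF800000. A short sketch, using a hypothetical helper `word_to_float`, confirms this in both directions:

```python
import math
import struct

def word_to_float(word: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

pos_inf = word_to_float(0x7F800000)   # 0 11111111 000...0
neg_inf = word_to_float(0xFF800000)   # 1 11111111 000...0

# Going the other way: packing infinity yields the same pattern.
packed = struct.unpack(">I", struct.pack(">f", float("inf")))[0]

print(pos_inf, neg_inf, hex(packed))
```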

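  12. Not a Number (NaN), fields shown as S E M • x 11111111 ≠0 = NaN (any sign, exponent all ones, non-zero mantissa) • x 11111111 1xxx…xxxx = Quiet NaN • x 11111111 0xxx…xxxx with mantissa ≠ 0 = Signalling NaN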
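The special-value rules can be collected into one classifier over raw 32-bit patterns. This is a sketch following the convention on this slide (top mantissa bit set means quiet); a few older architectures use the opposite convention. `classify_single` is a hypothetical name, and the function works on integers only, so no NaN ever passes through floating point hardware.

```python
def classify_single(word: int) -> str:
    """Classify a 32-bit single-precision pattern by its E and M fields."""
    e = (word >> 23) & 0xFF
    m = word & 0x7FFFFF
    if e != 0xFF:
        return "finite"               # zero, denormal or normal number
    if m == 0:
        return "infinity"             # exponent all ones, mantissa zero
    if m & 0x400000:                  # most significant mantissa bit is 1
        return "quiet NaN"
    return "signalling NaN"           # top bit 0, but mantissa non-zero

for pattern in (0x3F800000, 0x7F800000, 0x7FC00000, 0x7F800001):
    print(hex(pattern), classify_single(pattern))
```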
