
COMS 161 Introduction to Computing

Learn about the representation and limitations of real numbers in floating point format, using IEEE Standard 754. Explore single and double precision, special cases, and exception handling.

danaher


Presentation Transcript


  1. COMS 161 Introduction to Computing • Title: Numeric Processing • Date: November 05, 2004 • Lecture Number: 29

  2. Announcements • Homework 7 • Due November 8, 2004 • Research paper proposal 2 due • November 8, 2004

  3. Review • Integers • Big-endian • Little-endian • Overflow

  4. Outline • Real numbers • Representation • Limitations

  5. Real (Decimal) Number Storage • Real numbers are stored in floating point representation • IEEE Standard 754 • Allows using data on different machines • Each number is stored as three fields • A sign • An exponent • A mantissa, also called a significand (normalized fraction) • Single nonzero digit to the left of the point

  6. IEEE Standard 754 • Provides two floating point types • Single • 24 bits of significand precision • Double • 53 bits of significand precision • Five exceptions • Invalid operation • Division by zero • Overflow • Underflow • Inexact
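The five exceptions on slide 6 can be observed from C++ through the status flags in `<cfenv>`. The following is a minimal sketch, not part of the lecture: the helper name `flags_after_divide` is hypothetical, and it assumes a platform that implements the standard `FE_*` flags and a compiler that does not reorder the division around the flag calls (the `volatile` variables are there to discourage that).

```cpp
#include <cfenv>

// Hypothetical helper: performs num / den and returns the IEEE 754
// exception flags that the division raised.
int flags_after_divide(double num, double den) {
    std::feclearexcept(FE_ALL_EXCEPT);   // start with no flags set
    volatile double n = num;             // volatile: keep the division
    volatile double d = den;             // from being folded at compile time
    volatile double q = n / d;           // the operation under test
    (void)q;                             // result itself is not needed
    return std::fetestexcept(FE_ALL_EXCEPT);
}
```

Under those assumptions, `flags_after_divide(1.0, 0.0) & FE_DIVBYZERO` is nonzero, `0.0 / 0.0` raises `FE_INVALID`, and `1.0 / 3.0` raises `FE_INEXACT` because the quotient has to be rounded.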

  7. IEEE Standard 754 • Four rounding directions • Toward the nearest representable value • "even" values preferred whenever there are two nearest representable values • Toward negative infinity (down) • Toward positive infinity (up) • Toward 0 (chop)
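The four rounding directions on slide 7 correspond to the `FE_TONEAREST`, `FE_DOWNWARD`, `FE_UPWARD` and `FE_TOWARDZERO` constants in `<cfenv>`. As a rough illustration (the function name `round_in_mode` is made up for this sketch, and it assumes the platform supports all four modes), rounding a value to an integer under each direction might look like this:

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical helper: rounds x to an integer value using the given
// rounding direction, then restores the direction that was active before.
double round_in_mode(double x, int mode) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    volatile double input = x;                      // block compile-time folding
    volatile double result = std::nearbyint(input); // honors the current mode
    std::fesetround(saved);
    return result;
}
```

With round-to-nearest, 2.5 goes to 2.0 and 3.5 goes to 4.0, which is the "even values preferred" rule from the slide; `FE_UPWARD` sends 2.5 to 3.0 and `FE_TOWARDZERO` sends -2.5 to -2.0.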

  8. Single Precision • IEEE standard 754 • Floating point number representation • 32-bit s eeeeeeee fffffff ffffffffffffffff • Bit layout: s is bit 31, exponent is bits 30 to 23, significand is bits 22 to 0 • s: (1) sign bit • 0 means positive, 1 means negative

  9. Single Precision s eeeeeeee fffffff ffffffffffffffff • e: (8) exponent bits [-126 … 127] • A bias of 127 is added to the exponent • Exponent of 0 is stored as 127, stored exponent of 200 means actual exponent is (200 – 127) = 73 • Stored exponents of all zeros and all ones are reserved for special numbers • f: (24) fractional part [23 stored bits + 1 implied bit] • Since the digit to the left of the binary point of a normalized number is not zero, it must be a one • Saves a bit, the one is implied and does not need to be explicitly stored
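The field layout on slides 8 and 9 can be checked by copying a `float` into a 32-bit integer and masking out each field. This is only a sketch: the struct `SingleFields` and the function `decompose` are names invented here, and the code assumes that `float` is the 32-bit IEEE single format on the machine at hand.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical container for the three stored fields of a single.
struct SingleFields {
    std::uint32_t sign;      // 1 bit: 0 positive, 1 negative
    std::uint32_t stored;    // 8-bit exponent as stored (bias included)
    std::uint32_t fraction;  // 23 explicitly stored fraction bits
    int exponent;            // stored - 127, meaningful for normalized numbers
};

SingleFields decompose(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // reinterpret the bytes safely
    SingleFields f;
    f.sign = bits >> 31;                   // bit 31
    f.stored = (bits >> 23) & 0xFFu;       // bits 30..23
    f.fraction = bits & 0x7FFFFFu;         // bits 22..0
    f.exponent = static_cast<int>(f.stored) - 127;  // remove the bias
    return f;
}
```

For example, `decompose(1.0f)` gives a stored exponent of 127 (actual exponent 0) and a fraction of 0, since the leading one is implied rather than stored.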

  10. Special Single Cases • Two zeros • Signed zero • e = 0, f = 0 (exponent and fractional bits are all 0) • (-1)^s x 0.0 • 0000 0000 0000 0000 0000 0000 0000 0000 • 0x0000 0000 (+0) • 1000 0000 0000 0000 0000 0000 0000 0000 • 0x8000 0000 (-0)

  11. Special Single Cases • Positive infinity • +INF • s = 0, e = 255, f = 0 (all fractional bits are 0) • 0111 1111 1000 0000 0000 0000 0000 0000 • 0x7f80 0000 • Negative infinity • -INF • s = 1, e = 255, f = 0 (all fractional bits are 0) • 1111 1111 1000 0000 0000 0000 0000 0000 • 0xff80 0000

  12. Special Single Cases • Not-A-Number (NaN) • s = 0 | 1, e = 255, f != 0 (at least one fractional bit is NOT 0) • There are many representations for NaN • Here is one example • 0111 1111 1100 0000 0000 0000 0000 0000 • 0x7fc0 0000

  13. Special Single Cases • Maximum single number • 0111 1111 0111 1111 1111 1111 1111 1111 • 0x7f7f ffff • 3.40282347 x 10^38 • Minimum positive normalized single number • 0000 0000 1000 0000 0000 0000 0000 0000 • 0x0080 0000 • 1.17549435 x 10^-38 • To represent larger numbers, use double precision
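The bit patterns on slides 10 to 13 can be turned back into `float` values to confirm what they mean. The sketch below assumes a 32-bit IEEE `float`; `float_from_bits` and `classify_single` are illustrative names, and the "subnormal" case (e = 0, f != 0) is included for completeness even though the slides do not cover it.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Reinterprets a 32-bit pattern as a single-precision value.
float float_from_bits(std::uint32_t bits) {
    float x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Names the category of a pattern from its exponent and fraction fields.
const char* classify_single(std::uint32_t bits) {
    const std::uint32_t e = (bits >> 23) & 0xFFu;  // stored exponent
    const std::uint32_t f = bits & 0x7FFFFFu;      // fraction bits
    if (e == 255) return f == 0 ? "infinity" : "NaN";
    if (e == 0)   return f == 0 ? "zero" : "subnormal";
    return "normal";
}
```

With these helpers, `float_from_bits(0x7f7fffffu)` compares equal to `std::numeric_limits<float>::max()`, and `float_from_bits(0x00800000u)` compares equal to `std::numeric_limits<float>::min()`, matching the two decimal values on slide 13.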

  14. Double Precision • IEEE standard 754 • Floating point number representation • 64-bit: 1 sign bit (s), 11 exponent bits (e), 52 stored fraction bits (f) • Bit layout: s is bit 63, exponent is bits 62 to 52, significand is bits 51 to 0 • s: (1) sign bit • 0 means positive, 1 means negative

  15. Double Precision • e: (11) exponent bits [-1022 … 1023] • A bias of 1023 is added to the exponent • Exponent of 0 is stored as 1023, stored exponent of 2000 means actual exponent is (2000 – 1023) = 977 • Stored exponents of all zeros and all ones are reserved for special numbers • f: (53) fractional part [52 stored bits + 1 implied bit] • Since the digit to the left of the binary point of a normalized number is not zero, it must be a one • Saves a bit, the one is implied and does not need to be explicitly stored

  16. Real (Decimal) Number Storage • Double precision floating point numbers • Byte 0: s eeeeeee • Byte 1: eeee ffff • Bytes 2 to 7: the remaining 48 fraction bits • s: (1) sign bit • e: (11) exponent bits [-1022 … 1023] • f: (53) fractional part [52 stored bits + 1 implied bit]

  17. Special Double Cases • Two zeros • Signed zero • e = 0, f = 0 (exponent and fractional bits are all 0) • (-1)^s x 0.0 • 64 bits • 0000 0000 0000 0000 0000 0000 0000 … 0000 • 0x0000 0000 0000 0000 (+0) • 1000 0000 0000 0000 0000 0000 0000 … 0000 • 0x8000 0000 0000 0000 (-0)

  18. Special Double Cases • Positive infinity • +INF • s = 0, e = 2047, f = 0 (all fractional bits are 0) • 0111 1111 1111 0000 0000 0000 0000 … 0000 • 0x7ff0 0000 0000 0000 • Negative infinity • -INF • s = 1, e = 2047, f = 0 (all fractional bits are 0) • 1111 1111 1111 0000 0000 0000 0000 … 0000 • 0xfff0 0000 0000 0000

  19. Special Double Cases • Not-A-Number (NaN) • s = 0 | 1, e = 2047, f != 0 (at least one fractional bit is NOT 0) • There are many representations for NaN • Here is one example • 0111 1111 1111 1000 0000 0000 0000 … 0000 • 0x7ff8 0000 0000 0000

  20. Special Double Cases • Maximum double number • 0111 1111 1110 1111 1111 1111 1111 … 1111 • 0x7fef ffff ffff ffff • 1.7976931348623157 x 10^308 • Minimum positive normalized double number • 0000 0000 0001 0000 0000 0000 0000 … 0000 • 0x0010 0000 0000 0000 • 2.2250738585072014 x 10^-308
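The double precision patterns on slides 14 to 20 can be verified the same way with a 64-bit integer. This is a sketch under the assumption that `double` is the 64-bit IEEE format; `double_from_bits` and `stored_exponent` are names chosen for illustration only.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Reinterprets a 64-bit pattern as a double-precision value.
double double_from_bits(std::uint64_t bits) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch expects a 64-bit double");
    double x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Returns the 11-bit exponent field exactly as stored (bias of 1023 included).
int stored_exponent(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return static_cast<int>((bits >> 52) & 0x7FFu);  // bits 62..52
}
```

For instance, `stored_exponent(1.0)` is 1023 because an actual exponent of 0 is stored with the bias added, and `double_from_bits(0x7fefffffffffffffULL)` compares equal to `std::numeric_limits<double>::max()`.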

  21. Decimal to Float Conversion • Show -24.125_10 in IEEE single precision format • First, save sign (negative so 1) and convert to binary… • 24.125_10 = 11000.001_2 x 2^0 • Normalize… • = 1.1000001_2 x 2^4 • Strip 1 off the mantissa and extend to form significand • = .10000010000000000000000 • Bias the exponent… • Exp + Bias = 4 + 127 = 131 = 10000011_2

  22. Real (Decimal) Number Storage • Bit: 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 • Value: 1 1 0 0 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 • Fields: sign 1, exponent 10000011, fraction 10000010000000000000000 • Hex value: 0xC1C10000
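The conversion steps on slides 21 and 22 (save the sign, normalize, strip the leading one, bias the exponent) can be followed in code. The function `encode_single` below is a hypothetical sketch, not a general encoder: it assumes a finite, nonzero input whose magnitude lies in the normalized single range, and it chops extra fraction bits instead of rounding them.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical step-by-step encoder for normalized values only.
std::uint32_t encode_single(double value) {
    // Step 1: save the sign (1 for negative).
    const std::uint32_t sign = std::signbit(value) ? 1u : 0u;

    // Step 2: normalize. frexp gives |value| = m * 2^e with m in [0.5, 1),
    // so the 1.xxx form is (2m) * 2^(e - 1).
    int e = 0;
    const double m = std::frexp(std::fabs(value), &e);
    const double significand = m * 2.0;  // 1.xxx
    const int exponent = e - 1;

    // Step 3: strip the implied leading 1 and keep 23 fraction bits (chopped).
    const std::uint32_t fraction =
        static_cast<std::uint32_t>(std::ldexp(significand - 1.0, 23));

    // Step 4: bias the exponent.
    const std::uint32_t stored = static_cast<std::uint32_t>(exponent + 127);

    // Step 5: assemble sign | exponent | fraction.
    return (sign << 31) | (stored << 23) | fraction;
}
```

For -24.125 the intermediate values are significand 1.5078125 (binary 1.1000001), exponent 4, stored exponent 131, and the assembled word is 0xC1C10000, the hex value on slide 22.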

  23. Real (Decimal) Number Storage • Numbers have limited precision Compute 1

  24. Real (Decimal) Number Storage
      #include <iostream>
      using namespace std;
      int main()
      {
          cout << "precision example" << endl;
          cout << "Number of bytes in a float: " << sizeof(float) << endl;
          // Halve epsilon until adding it to 1 no longer changes the result.
          float epsilon = 1.0f, value = 0.0f;
          int iteration = 0;
          int maxIteration = 100;
          while (iteration < maxIteration)
          {
              epsilon /= 2.0f;
              value = 1.0f + epsilon;
              if (value == 1.0f) break;
              iteration++;
          } // end while(...)
          cout << "Iteration: " << iteration << " Epsilon: " << epsilon
               << " Value: " << value << endl << endl;
          // Same experiment in double precision.
          iteration = 0;
          double epsilonD = 1.0, valueD = 0.0;
          cout << "Number of bytes in a double: " << sizeof(double) << endl;
          while (iteration < maxIteration)
          {
              epsilonD /= 2.0;
              valueD = 1.0 + epsilonD;
              if (valueD == 1.0) break;
              iteration++;
          } // end while(...)
          cout << "Iteration: " << iteration << " Epsilon: " << epsilonD
               << " Value: " << valueD << endl;
          return 0;
      }
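The experiment on slide 24 is closely related to the value the C++ standard library reports as `std::numeric_limits<T>::epsilon()`, the gap between 1 and the next representable number. The template below is a sketch with an invented name, `machine_epsilon`; it assumes round-to-nearest is in effect and uses `volatile` so the sum is actually rounded to type `T` before the comparison.

```cpp
#include <limits>

// Hypothetical helper: returns the smallest power of two eps for which
// 1 + eps is still distinguishable from 1 in type T.
template <typename T>
T machine_epsilon() {
    T eps = T(1);
    while (true) {
        volatile T sum = T(1) + eps / T(2);  // would half of eps still count?
        if (sum == T(1)) break;              // no: eps is the last one that does
        eps /= T(2);
    }
    return eps;
}
```

Under those assumptions `machine_epsilon<float>()` matches `std::numeric_limits<float>::epsilon()` (2^-23), and the `double` version matches 2^-52. The loop on the slide stops one halving later, at the first epsilon that no longer changes the sum, so it should report a value half as large.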

  25. Real (Decimal) Number Storage • Numbers have limited precision • Most real numbers have an infinite decimal expansion

  26. Real Number Storage: Limited Range and Precision • There are three categories of numbers left out when floating point representation is used • Numbers out of range because their absolute value is too large (similar to integer overflow) • Numbers out of range because their absolute value is too small (numbers too near zero to be stored given the precision available) • Numbers whose binary representations require either an infinite number of binary digits or more binary digits than the bits available

  27. Real Number Storage: Limited Range and Precision Illustrated • With one bit to the right of the binary point, the only nonzero fraction that can be represented is 0.5.

  28. Real Number Storage: Limited Range and Precision Illustrated • Real numbers that can be represented with two bits • 0.25, 0.5, 0.75 • Real numbers that can be represented with three bits • 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875 • The holes correspond to all the unrepresented numbers: 0.126, 0.255, 0.3, …
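The lists on slides 27 and 28 follow a simple pattern: with n bits to the right of the binary point, the nonzero representable fractions are k / 2^n for k = 1 up to 2^n - 1. A small sketch that enumerates them (the function name `representable_fractions` is made up for this example, and it assumes a small positive bit count):

```cpp
#include <vector>

// Lists every nonzero fraction below 1 that fits in `bits` binary digits
// to the right of the binary point: k / 2^bits for k = 1 .. 2^bits - 1.
std::vector<double> representable_fractions(int bits) {
    std::vector<double> values;
    const int count = 1 << bits;               // 2^bits
    for (int k = 1; k < count; ++k) {
        values.push_back(static_cast<double>(k) / count);
    }
    return values;
}
```

Calling it with 1, 2 and 3 reproduces the three lists from the slides. Anything between two neighboring entries, such as 0.3 with three bits, falls into a hole and has to be rounded to one of them.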

  29. Limited Range and Precision: Some Consequences • Limited range will invalidate certain calculations • If integers are involved, this can often be avoided by switching to real numbers • For real number calculations, this problem arises infrequently and in those cases can sometimes be handled by special methods • It is not a common occurrence in non-scientific work
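What happens at the two ends of the range can be seen directly. The sketch below uses two hypothetical helpers and assumes IEEE single precision with the default round-to-nearest mode: going past the largest finite value produces +INF, and going below the smallest positive value produces zero.

```cpp
#include <cmath>
#include <limits>

// Doubling the largest finite float leaves the representable range.
bool overflow_gives_infinity() {
    volatile float big = std::numeric_limits<float>::max();
    volatile float result = big * 2.0f;   // too large to store
    const float r = result;
    return std::isinf(r);
}

// Halving the smallest positive float leaves nothing to store but zero.
bool underflow_gives_zero() {
    volatile float tiny = std::numeric_limits<float>::denorm_min();
    volatile float result = tiny / 2.0f;  // too close to zero to store
    const float r = result;
    return r == 0.0f;
}
```

Both functions are expected to return `true` under the stated assumptions. Neither case stops the program; the result is silently replaced by a special value, which is why out-of-range results need to be checked for explicitly.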

  30. Limited Range and Precision: Some Consequences • Limited precision for real numbers is very pervasive • Assume that most decimal calculations will, in fact, be in error! • Evaluate and use computer calculations with this in mind
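One practical way to "use computer calculations with this in mind" is to avoid testing real numbers for exact equality. The sketch below is one common approach, not the only one: `sum_of_tenths` and `nearly_equal` are illustrative names, and the relative tolerance of 1e-9 is an arbitrary assumption that would need to be chosen per application.

```cpp
#include <algorithm>
#include <cmath>

// Adds 0.1 to itself n times; 0.1 has no exact binary representation,
// so every addition carries a small roundoff error.
double sum_of_tenths(int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += 0.1;
    }
    return sum;
}

// Compares two values with a relative tolerance instead of ==.
bool nearly_equal(double a, double b, double rel_tol = 1e-9) {
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * scale;
}
```

With IEEE double arithmetic, `sum_of_tenths(10) == 1.0` is false even though the mathematical answer is exactly 1, while `nearly_equal(sum_of_tenths(10), 1.0)` is true.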

  31. Social Themes: Risks in Numerical Computing • Almost all computer calculations involve roundoff error (limited precision error) • If not monitored and planned for carefully, such errors can lead to unexpected and catastrophic results • Ariane 5 Rocket Failure • Patriot Missile Failure during Gulf War
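To get a feel for how small roundoff errors can grow when they are not monitored, consider a clock that counts time by repeatedly adding 0.1 second in single precision. This is a generic illustration only and does not model the software of either system named on the slide; the function `tick_drift` is hypothetical.

```cpp
#include <cmath>

// Adds 0.1f once per tick and returns how far the accumulated time (in
// seconds) has drifted from the exact value ticks * 0.1.
double tick_drift(long ticks) {
    float elapsed = 0.0f;
    for (long i = 0; i < ticks; ++i) {
        elapsed += 0.1f;                 // each addition is rounded to a float
    }
    const double exact = static_cast<double>(ticks) * 0.1;
    return static_cast<double>(elapsed) - exact;
}
```

After ten ticks the drift is negligible, but `tick_drift(3600000)` (100 hours of tenth-of-a-second ticks) is far larger than one second in magnitude, because once the running total is large each added 0.1 is rounded to a noticeably different amount.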

  32. Software for Numerical Work • Software Libraries • Spreadsheets • Mathematical Software • symbolic manipulation • data analysis • data visualization
