
Floating Point Numbers


Presentation Transcript


  1. Floating Point Numbers Material on Data Representation can be found in Chapter 2 of Computer Architecture (Nicholas Carter)

  2. Fractions • Binary fractions work like the decimal fractions we’re used to: digits to the right of the point stand for negative powers of the base (1/2, 1/4, 1/8, … instead of 1/10, 1/100, …)

  3. Converting decimal to binary II • 98.61 • Integer part • 98 / 2 = 49 remainder 0 • 49 / 2 = 24 remainder 1 • 24 / 2 = 12 remainder 0 • 12 / 2 = 6 remainder 0 • 6 / 2 = 3 remainder 0 • 3 / 2 = 1 remainder 1 • 1 / 2 = 0 remainder 1 • 1100010
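The repeated-division procedure above can be sketched in a few lines of Python; the function name int_to_binary is my own, not from the slides.

```python
# Convert a non-negative integer to binary by repeated division by 2,
# collecting the remainders (which come out least-significant first).
def int_to_binary(n):
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        n, r = divmod(n, 2)   # quotient and remainder
        bits.append(str(r))
    return "".join(reversed(bits))

print(int_to_binary(98))   # 1100010, matching the slide
```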

  4. Converting decimal to binary III • 98.61 • Fractional part • 0.61 × 2 = 1.22 • 0.22 × 2 = 0.44 • 0.44 × 2 = 0.88 • 0.88 × 2 = 1.76 • 0.76 × 2 = 1.52 • 0.52 × 2 = 1.04 • .100111
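The repeated-doubling steps for the fractional part can be sketched the same way; frac_to_binary is a hypothetical helper name.

```python
# Convert a fraction in [0, 1) to `digits` binary places by repeated
# doubling: the whole-number part of each product is the next bit.
def frac_to_binary(frac, digits):
    bits = []
    for _ in range(digits):
        frac *= 2
        bit = int(frac)       # 0 or 1
        bits.append(str(bit))
        frac -= bit           # keep only the fractional part
    return "." + "".join(bits)

print(frac_to_binary(0.61, 6))   # .100111, matching the slide
```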

  5. Another Example (Whole number part) • 123.456 • Integer part • 123 / 2 = 61 remainder 1 • 61 / 2 = 30 remainder 1 • 30 / 2 = 15 remainder 0 • 15 / 2 = 7 remainder 1 • 7 / 2 = 3 remainder 1 • 3 / 2 = 1 remainder 1 • 1 / 2 = 0 remainder 1 • 1111011

  6. Checking: Find Calculator on menu

  7. Put the calculator in Programmer view

  8. Enter the number (in decimal), read off the binary, or put the calculator into binary mode if you want to use copy/paste

  9. Another Example (fractional part) • 123.456 • Fractional part • 0.456 × 2 = 0.912 • 0.912 × 2 = 1.824 • 0.824 × 2 = 1.648 • 0.648 × 2 = 1.296 • 0.296 × 2 = 0.592 • 0.592 × 2 = 1.184 • 0.184 × 2 = 0.368 • … • .0111010…

  10. Convert to decimal mode, then

  11. Ctrl-C to copy the displayed number. Switch to Scientific View. Ctrl-V to paste

  12. Divide by 2 raised to the number of digits (in this case 7, including the leading zero)

  13. Divide by 2 raised to the number of digits (in this case 7, including the leading zero)

  14. Finally, hit the equals sign. In most cases the result will not be exact

  15. Other way around • Multiply the fraction by 2 raised to the desired number of digits in the fractional part. For example • .456 × 2^7 = 58.368 • Throw away the fractional part and represent the whole number • 58 → 111010 • But note that we specified 7 digits and the result above uses only 6. Therefore we need to put in the leading 0 • 0111010
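This one-step shortcut, including the left zero-padding, can be sketched as follows (frac_bits is my own name for it):

```python
# Multiply the fraction by 2**digits, truncate, and zero-pad on the left
# so the result always has exactly `digits` bits.
def frac_bits(frac, digits):
    whole = int(frac * 2**digits)              # e.g. int(0.456 * 128) = 58
    return format(whole, "0%db" % digits)      # pad with leading zeros

print(frac_bits(0.456, 7))   # 0111010, matching the slide
```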

  16. Fixed point • If one has a set number of bits reserved for representing the whole number part and another set number of bits reserved for representing the fractional part of a number, then one is said to be using fixed point representation. • The point dividing whole number from fraction has an unchanging (fixed) place in the number.

  17. Limits of the fixed point approach • Suppose you use 4 bits for the whole number part and 4 bits for the fractional part (ignoring sign for now). • The largest number would be 1111.1111 = 15.9375 • The smallest, non-zero number would be 0000.0001 = .0625
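The two limits quoted above can be checked directly; dividing the 8-bit pattern by 16 moves the point four places left.

```python
# 4.4 fixed point: 4 whole-number bits, 4 fractional bits.
largest = int("11111111", 2) / 16    # 1111.1111
smallest = int("00000001", 2) / 16   # 0000.0001

print(largest)    # 15.9375
print(smallest)   # 0.0625
```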

  18. Floating point representation • Floating point representation allows one to represent a wider range of numbers using the same number of bits. • It is like scientific notation.

  19. Scientific notation • Used to represent very large and very small numbers. • Ex. Avogadro’s number • ≈ 6.0221367 × 10^23 particles • = 602213670000000000000000 • Ex. Fundamental charge e • ≈ 1.60217733 × 10^-19 C • = 0.000000000000000000160217733 C

  20. Scientific notation: all of these are the same number • 12345.6789 = 12345.6789 × 10^0 • 1234.56789 × 10 = 1234.56789 × 10^1 • 123.456789 × 100 = 123.456789 × 10^2 • 12.3456789 × 10^3 • 1.23456789 × 10^4 • Rule: Shift the point to the left and increment the power of ten.

  21. Small numbers • 0.000001234 • 0.00001234 × 10^-1 • 0.0001234 × 10^-2 • 0.001234 × 10^-3 • 0.01234 × 10^-4 • 0.1234 × 10^-5 • 1.234 × 10^-6 • Rule: shift the point to the right and decrement the power.

  22. IEEE 754 standards • The standards for floating point numbers are known as IEEE 754. • Starting with the fixed point binary representation, shift the point and increase the power (of 2, now that we’re in binary). • Like scientific notation, shift so that the number has one non-zero whole-number digit (in binary, not 0, hence 1) and the remainder are fractional bits.

  23. Floats (98.61) • SHIFT the expression so it is between 1 and 2 and keep track of the number of shifts • 1100010.10011100001010001 • 1.10001010011100001010001 × 2^6 • Express the number of shifts in binary • 1.10001010011100001010001 × 2^00000110 • We’re not done yet, so this exponent will change.

  24. Mantissa and Exponent and Sign • 1.10001010011100001010001 × 2^00000110 • Mantissa (a.k.a. significand): 1.10001010011100001010001 • Exponent: 00000110 • The number may be negative, so there is a bit (the sign bit) reserved to indicate whether the number is positive or negative

  25. Small numbers • 0.000010101110 • 1.0101110 × 2^-5 • The power (a.k.a. the exponent) could be negative, so we have to be able to deal with that. • Floating point numbers use a procedure known as biasing to handle the negative exponent problem.

  26. Biasing • Actually, the exponent is not represented as shown previously. • There were 8 bits used to represent the exponent on the previous slide, which means there are 256 numbers that could be represented. • Since the exponent could be negative (to represent numbers less than 1), we choose roughly half of the range to be positive and half to be negative.

  27. Biasing (Cont.) • In biasing, one does not use 2’s complement or a sign bit. • Instead one adds a bias (equal to the magnitude of the most negative number) to the exponent and represents the result of that addition.

  28. Biasing (Cont.) • The exponent of all 1’s is reserved for special purposes – as is the exponent of all 0’s. • Thus with 8 bits, the bias is 127 (= 2^7 − 1, that is, 2 raised to one less than the number of exponent bits, minus 1). • In our previous example, we had to shift 6 times to the left, corresponding to an exponent of +6. • We add that shift to the bias: 127 + 6 = 133. • That is the number we put in the exponent portion: 133 → 10000101.
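The bias arithmetic for the 98.61 example is plain integer work:

```python
# 8-bit exponent field: bias is 2**(8-1) - 1 = 127.
BIAS = 127
shift = 6              # 98.61 was shifted 6 places to the left
stored = BIAS + shift  # the value actually written into the exponent field

print(stored)                  # 133
print(format(stored, "08b"))   # 10000101
```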

  29. Big floats – a quick comparison • Assume we use 8 bits, 4 for the mantissa and 4 for the exponent (neglecting sign). What is the largest float? • Mantissa: 1111 Exponent: 1111 • 0.9375 × 2^7 • = 120 • (Compare this to the largest fixed-point number using the same amount of space, 15.9375)

  30. Small floats – a quick comparison • Assume we use 8 bits, 4 for the mantissa and 4 for the exponent (neglecting sign). What is the smallest float? • Mantissa: 1000 Exponent: 0000 • 0.5 × 2^-8 • = 0.001953125 • (Compare this to the smallest fixed-point number using the same amount of space, .0625)

  31. Mantissa Storage • 1.10001010011100001010001 × 2^00000110 • Mantissa (a.k.a. significand) • Our rules have us starting with 1.something (there are a few exceptions). • The standards come from a time when storage was “expensive” – so why store a digit that is always 1? So the standard does not store the 1 – it is implied.

  32. The pieces • One bit for the sign • Eight bits for the exponent – biased by 127 • Twenty-three bits for the mantissa – which does not include the implied 1 • +98.61 • Sign: 0 • Exponent: 1000 0101 • Mantissa: 1000 1010 0111 0000 1010 001
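One way to check these fields is to pack 98.61 with Python’s struct module and slice the 32 bits apart. Note the stored mantissa comes out one bit higher than the hand result above: IEEE 754 rounds to nearest, while the slides truncate the binary expansion.

```python
import struct

# Pack 98.61 as a 32-bit float and pull the three fields back out.
bits = int.from_bytes(struct.pack(">f", 98.61), "big")
sign = bits >> 31
exponent = (bits >> 23) & 0xFF
mantissa = bits & 0x7FFFFF

print(sign)                      # 0
print(format(exponent, "08b"))   # 10000101
# One ulp above the truncated ...001, because pack() rounds to nearest:
print(format(mantissa, "023b"))  # 10001010011100001010010
```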

  33. https://www.h-schmidt.net/FloatConverter/IEEE754.html

  34. Adding Floats • Consider adding the following numbers expressed in scientific notation: 3.456789 × 10^3 and 1.212121 × 10^-2 • The first step is to re-express the number with the smaller magnitude so that it has the same exponent as the other number.

  35. Adding Floats (Cont.) • 1.212121 × 10^-2 • 0.1212121 × 10^-1 • 0.01212121 × 10^0 • 0.001212121 × 10^1 • 0.0001212121 × 10^2 • 0.00001212121 × 10^3 • The number was shifted 5 times (3 − (−2)).

  36. Adding Floats (Cont.) • When the exponents are equal the mantissas can be added. 3.456789 × 10^3 + 0.00001212121 × 10^3 • = 3.45680112121 × 10^3
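The alignment step can be sketched with the decimal module so the digits match the slides exactly; the variable names are my own.

```python
from decimal import Decimal

# Represent each number as (mantissa, exponent), align the exponents,
# then add the mantissas.
a_mant, a_exp = Decimal("3.456789"), 3
b_mant, b_exp = Decimal("1.212121"), -2

shift = a_exp - b_exp                   # 3 - (-2) = 5 shifts
b_mant = b_mant / Decimal(10) ** shift  # 0.00001212121

total = a_mant + b_mant                 # mantissas add once exponents match
print(f"{total} x 10^{a_exp}")          # 3.45680112121 x 10^3
```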

  37. Rounding • In a computer there are a finite number of bits used to represent a number. • When the smaller floating-point number is shifted to make the exponents equal, some of the less significant bits are lost. • This loss of information (precision) is known as rounding.

  38. One more fine point about floating-point representation • As discussed so far, the mantissa (significand) always starts with a 1. • When storage was expensive, designers opted not to represent this bit, since it is always 1. • It had to be inserted for various operations on the number (adding, multiplying, etc.), but it did not have to be stored.

  39. Still another fine point • When we assume that the mantissa must start with a 1, we lose 0. • Zero is too important a number to lose, so we interpret a mantissa of all zeros with an exponent of all zeros as zero – even though ordinarily we would assume an unstored leading one.

  40. Yet another fine point • In the IEEE 754 format for floats, you bias by one less (127) and reserve the exponents 00000000 and 11111111 for special purposes. • One of these special purposes is “Not a Number” (NaN). • Another is “Infinity”, which is the floating point version of overflow.
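The reserved exponents are easy to see with struct; the helper name exponent_field is my own.

```python
import struct

# Return the 8-bit exponent field of a value packed as a 32-bit float.
def exponent_field(value):
    bits = int.from_bytes(struct.pack(">f", value), "big")
    return format((bits >> 23) & 0xFF, "08b")

print(exponent_field(float("inf")))   # 11111111
print(exponent_field(float("nan")))   # 11111111
print(exponent_field(0.0))            # 00000000 (the other reserved exponent)
```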

  41. An example • Represent -9087.8735 as a float using 23 bits for the mantissa, 8 for the exponent and one for the sign. • The float stores 23 bits but there is an implied bit, so we will talk about 24. • Convert the whole number magnitude 9087 to binary: 10 0011 0111 1111 • That uses up 14 of the 24 bits for the mantissa (23 stored), leaving 10 for the fractional part.

  42. An example (Cont.) • Multiply the fractional part by 2^10, convert the whole-number part of that to binary, and make sure it uses 10 bits (add leading 0’s if it doesn’t). • .8735 × 2^10 = 894.464 • 894 → 1101111110

  43. An example (Cont.) • 10001101111111.1101111110 • 1.00011011111111101111110 × 2^13 • Mantissa: (1)00011011111111101111110 • Exponent: 13 + 127 = 140 → 10001100 • Sign bit: 1 (because the number was negative) • The actual order is sign–exponent–mantissa
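This hand conversion can also be checked against the standard library; here the truncated bits happen to match what round-to-nearest produces, since the next fraction bit is 0.

```python
import struct

# Pack -9087.8735 as a 32-bit float and inspect the three fields.
bits = int.from_bytes(struct.pack(">f", -9087.8735), "big")

print(bits >> 31)                          # 1 (sign)
print(format((bits >> 23) & 0xFF, "08b"))  # 10001100 (140 = 13 + 127)
print(format(bits & 0x7FFFFF, "023b"))     # 00011011111111101111110
```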

  44. Check

  45. Example 2 • 0.0076534 • No whole number part. Begin by using all 24 (sic) mantissa bits for the fractional part. • 0.0076534 × 2^24 = 128402.7449344 • 128402 → 11111010110010010 • This only uses 17 places, which means the number so far starts with 7 zeros. But float mantissas are supposed to start with 1. • .000000011111010110010010 • 1.1111010110010010 × 2^-8 • (But we need more digits for our mantissa)

  46. Example 2 (Cont.) • Shift by 24 + 7 = 31 bits instead: • 0.0076534 × 2^31 = 16435551.3516032 • 16435551 → 1111 1010 1100 1001 0101 1111 • The above is the mantissa • Exponent: 127 − 8 = 119 → 01110111 • Sign bit: 0 (positive number)

  47. Check

  48. Reverse • 10000111101010001100100111110101 • Split into fields: 1 | 00001111 | 01010001100100111110101 • Sign bit is 1, so the number is negative • Exponent: 00001111 = 15 → 15 − 127 (unbias) = −112 • Mantissa: 1.01010001100100111110101

  49. Reverse (Cont.) • 1.01010001100100111110101 × 2^23 / 2^23 • 101010001100100111110101 / 2^23 • 11061749 / 2^23

  50. Reverse (Cont.) • −1.31866323947906494140625 × 2^-112
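The whole decoding can be verified by unpacking the bit pattern with struct and rebuilding the value from the three fields.

```python
import struct

# Decode the 32-bit pattern from the Reverse slides back into a float.
pattern = int("10000111101010001100100111110101", 2)
value = struct.unpack(">f", pattern.to_bytes(4, "big"))[0]

sign = -1 if pattern >> 31 else 1
exponent = ((pattern >> 23) & 0xFF) - 127        # 15 - 127 = -112
mantissa = 1 + (pattern & 0x7FFFFF) / 2**23      # restore the implied 1

print(mantissa)                                  # 11061749 / 2**23
print(value == sign * mantissa * 2**exponent)    # True
```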
