COE 308

jtenney

Presentation Transcript


  1. COE 308: Floating Point

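  2. The World is not just Integers
     • $3.14159265\ldots_{ten}$ ($\pi$)
     • $2.71828\ldots_{ten}$ ($e$)
     • $0.000000001_{ten}$ or $1.0_{ten} \times 10^{-9}$
     • $3{,}155{,}760{,}000_{ten}$ or $3.15576_{ten} \times 10^{9}$
     Scientific notation: $A.AAAAA \times 10^{yyyy}$
     Correct (normalized) notation versus incorrect (un-normalized) notation:
     • $1.0_{ten} \times 10^{-9}$ (normalized), $0.1_{ten} \times 10^{-8}$ (un-normalized)
     • $3.15576_{ten} \times 10^{9}$ (normalized), $31.5576_{ten} \times 10^{8}$ (un-normalized)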
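The normalization rule on slide 2 (exactly one non-zero digit before the decimal point) can be illustrated with a short Python sketch. The helper name `normalize` is hypothetical and is not part of the original slides; it uses only the standard `decimal` module.

```python
from decimal import Decimal

def normalize(value: str) -> tuple[Decimal, int]:
    """Return (significand, exponent) such that
    value = significand x 10**exponent and 1 <= |significand| < 10.
    Hypothetical helper for illustration only."""
    d = Decimal(value)
    if d == 0:
        return Decimal(0), 0
    exponent = d.adjusted()             # power of ten of the leading digit
    significand = d.scaleb(-exponent)   # move the decimal point accordingly
    return significand, exponent

# Both the normalized and the un-normalized spellings reduce to the same pair.
print(normalize("3155760000"))   # significand 3.15576..., exponent 9
print(normalize("31.5576E8"))    # same value written un-normalized
print(normalize("0.1E-8"))       # significand 1, exponent -9
```

The un-normalized inputs are accepted as strings so that no binary rounding occurs before the decimal point is moved.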

  3. Floating Point
     Scientific notation in binary: $1.XXXXXXXXX \times 2^{yyyyyyy}$
     • Representation:
       • Sign: S
       • Exponent: E
       • Significand: F, with $0 \le F < 1.0$
     Value: $(-1)^{S} \times (1.0 + F) \times 2^{E}$

  4. Floating Point Representation
     GOAL: quickly compare two FP numbers by treating them as unsigned integers.
     • Options:
       • Significand:
         • Sign + Magnitude
         • Two's Complement
       • Exponent:
         • Sign + Magnitude
         • Two's Complement
         • Biased
     [Diagram: candidate field layouts built from Sign, Exponent and Significand: F]

  5. Floating Point Representation: Two's Complement
     Example: consider the following two numbers:
       A = $1.32 \times 2^{17}$
       B = $-1.22 \times 2^{17}$
     In binary:
       A = 10101000111101011100001 $\times 2^{17}$
       B = - 00111000011010001111010 $\times 2^{17}$
       B = 11000111100101110000110 $\times 2^{17}$ in 2's complement representation
     Stored fields:
       A: 0 00010001 10101000111101011100001
       B: 0 00010001 11000111100101110000110
     Although A > B (because A > 0 and B < 0), the two patterns above give the impression that B > A when they are read as 32-bit unsigned integers.
     Two's complement representation is unsuitable for quickly comparing two FP numbers.

  6. Floating Point Representation: Sign + Magnitude
     In Sign + Magnitude representation, the two numbers A = $1.32 \times 2^{17}$ and B = $-1.22 \times 2^{17}$ are represented as follows:
       A: 0 00010001 10101000111101011100001
       B: 1 00010001 00111000011010001111010
     A > B is decided by the sign bit alone (0 = positive, 1 = negative). When the signs are equal, the remaining bits can be compared as unsigned magnitudes.
     Sign + Magnitude representation is suitable for quickly comparing two FP numbers.

  7. How about the Exponent: Two's Complement
     In the previous examples, the exponents of A and B were positive. What if one of the exponents is negative?
     Example: A = $1.32 \times 2^{-17}$ and B = $1.22 \times 2^{17}$
     With the exponent in two's complement representation:
       A: 0 11101111 10101000111101011100001
       B: 0 00010001 00111000011010001111010
     Although A < B, the two patterns above give the impression that A > B.
     Two's complement representation of the exponent is unsuitable for quickly comparing two FP numbers.

  8. How about the Exponent: Biased Representation
     In a biased representation, we add an offset to the exponent so that the lowest negative exponent is represented with the value ONE (00000001).
     If a number K = $k \times 2^{e}$, the stored exponent is E = e + bias.
     For Single Precision, bias = 127; for Double Precision, bias = 1023.
       A: 0 01101110 10101000111101011100001    E = -17 + 127 = 110
       B: 0 10010000 00111000011010001111010    E = 17 + 127 = 144
     With A = $1.32 \times 2^{-17}$ and B = $1.22 \times 2^{17}$, the patterns now compare as A < B, which matches the true order, because of the biased representation of exponents.
     Biased representation is suitable.
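The comparison argument of slides 7 and 8 can be checked with a small Python sketch. It assumes IEEE-754 single precision as packed by the standard `struct` module; the helper name `bits` is hypothetical.

```python
import struct

def bits(x: float) -> int:
    """IEEE-754 single-precision pattern of x, read as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

a = 1.32 * 2.0 ** -17   # A from the slide
b = 1.22 * 2.0 ** 17    # B from the slide

exp_a = (bits(a) >> 23) & 0xFF   # stored (biased) exponent of A
exp_b = (bits(b) >> 23) & 0xFF   # stored (biased) exponent of B
print(exp_a, exp_b)              # -17 + 127 and 17 + 127

# Biased exponents keep the unsigned-integer order consistent with A < B.
print(bits(a) < bits(b))

# An 8-bit two's complement exponent would invert the order instead.
print((-17 & 0xFF) > (17 & 0xFF))
```

The ordering argument holds for two positive numbers; numbers of opposite sign are separated by the sign bit first.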

  9. Floating Point Representation: Summary
     Fields: Sign: S | Exponent: E | Significand: F
     • Sign + Magnitude is better than 2's complement
     • Exponent is represented in biased notation
     Value: $(-1)^{S} \times (1.0 + F) \times 2^{E - \text{bias}}$
     This is the IEEE-754 Standard.
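The summary formula can be applied directly to a 32-bit pattern. The following Python sketch assumes single precision (bias = 127) and handles only normalized numbers (stored exponent between 1 and 254); the function name `decode_single` is hypothetical.

```python
import struct

BIAS = 127  # single-precision bias

def decode_single(pattern: int) -> float:
    """Evaluate (-1)^S x (1.0 + F) x 2^(E - bias) for a normalized pattern."""
    s = pattern >> 31                     # 1 sign bit
    e = (pattern >> 23) & 0xFF            # 8 exponent bits
    f = (pattern & 0x7FFFFF) / 2 ** 23    # 23 fraction bits, 0 <= f < 1
    return (-1) ** s * (1.0 + f) * 2.0 ** (e - BIAS)

def reference(pattern: int) -> float:
    """Same pattern decoded by the standard library, for cross-checking."""
    return struct.unpack(">f", struct.pack(">I", pattern))[0]

print(decode_single(0x3FC00000))   # S=0, E=127, F=0.5
print(decode_single(0xC0200000))   # S=1, E=128, F=0.25
```

Zero, infinities, NaN and denormalized numbers use reserved exponent values and are deliberately left out of this sketch.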

  10. MIPS Floating Point Formats
      Follows the IEEE-754 floating point standard.
      • Single Precision: S: 1 bit | E: 8 bits | F: 23 bits
      • Double Precision: S: 1 bit | E: 11 bits | F: 20 bits + 32 bits (52 fraction bits, split across two 32-bit words)

  11. Conversion
      How to convert from decimal to FP?
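The transcript does not preserve the conversion procedure itself, so the following is only one possible sketch, not the method shown on the slide. It assumes single precision, a finite non-zero input within the normalized range, and round-to-nearest; the function name `encode_single` is hypothetical. The steps are: take the sign, rewrite the magnitude as 1.F x 2^e, add the bias, and pack the three fields.

```python
import math
import struct

def encode_single(x: float) -> int:
    """Build the 32-bit single-precision pattern of a normalized number."""
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))            # abs(x) = m * 2**e with 0.5 <= m < 1
    significand, exponent = m * 2, e - 1  # rewrite as 1.F * 2**exponent
    frac = round((significand - 1.0) * 2 ** 23)   # keep 23 fraction bits
    if frac == 2 ** 23:                  # rounding carried into the hidden 1
        frac, exponent = 0, exponent + 1
    return (sign << 31) | ((exponent + 127) << 23) | frac

# -0.75 = -1.1 (binary) x 2^-1, so S = 1, E = 126, F = 100...0
print(format(encode_single(-0.75), "032b"))
```

The result can be compared against `struct.pack(">f", x)` to confirm that the fields land in the right positions.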

  12. Representation of Zero
      • Zero is represented with:
        • Sign bit at 0
        • Exponent field with all bits at 0
        • Significand bits also all at 0
      0 00000000 00000000000000000000000

  13. Two Issues with FP
      • Overflow occurs when the exponent of the result is too large to fit in the exponent field.
      • Underflow occurs when the magnitude of the result is smaller than the smallest number that can be represented; the result then has a significand of all 0s.
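Both issues can be observed in a few lines of Python. Note the assumption: Python floats are IEEE-754 double precision, so the limits are those of the 11-bit exponent rather than the 8-bit single-precision one.

```python
import math

# Overflow: the true exponent (about 10^400) does not fit in the exponent
# field, so the result is replaced by infinity.
big = 1e200 * 1e200
print(big)

# Underflow: the true result (about 10^-400) is smaller than the smallest
# representable magnitude, so the result collapses to zero.
tiny = 1e-200 * 1e-200
print(tiny)
```

Multiplication is used here because Python reports overflow of `*` by returning infinity instead of raising an exception.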

  14. Representation of Exceptions
      • Infinity
        • Represented as a number with an exponent of 255 (Single Precision) or 2047 (Double Precision) and F = 0
        • The sign determines whether it is $+\infty$ or $-\infty$
          $+\infty$: 0 11111111 00000000000000000000000
          $-\infty$: 1 11111111 00000000000000000000000
      • NaN: Not a Number. Used to represent errors and exceptions
        • Represented with maximum E and F ≠ 0
        • Result of an invalid operation such as 0/0 or the square root of a negative number
        • An operation on a NaN results in a NaN.
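The reserved patterns for zero, infinity and NaN can be inspected with the standard `struct` module. This is a sketch assuming single precision; the helper name `fields` is hypothetical.

```python
import math
import struct

def fields(x: float) -> tuple[str, str, str]:
    """Return (sign, exponent, fraction) bit strings of x in single precision."""
    b = format(struct.unpack(">I", struct.pack(">f", x))[0], "032b")
    return b[0], b[1:9], b[9:]

print(fields(0.0))         # all three fields are zero
print(fields(math.inf))    # exponent all ones, fraction zero
print(fields(-math.inf))   # same, with the sign bit set

sign, exponent, fraction = fields(math.nan)
print(exponent, int(fraction, 2) != 0)   # maximum E and F != 0

# An operation on a NaN yields a NaN; inf - inf is itself an invalid operation.
print(math.isnan(math.nan + 1.0), math.isnan(math.inf - math.inf))
```

Python raises `ZeroDivisionError` for `0.0 / 0.0` rather than returning NaN, so `inf - inf` is used here as the invalid operation.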

  15. Floating Point Addition
      Example: add two numbers A = $1.85 \times 10^{25}$ and B = $1.45 \times 10^{17}$. How to proceed?
        $1.85 \times 10^{25} = 185000000.00 \times 10^{17}$
        $1.45 \times 10^{17} = \phantom{00000000}1.45 \times 10^{17}$
        ------------------------------------------------------
      The two significands need to be aligned so that both numbers have the same exponent; only then does addition become possible.
      • Alignment of significands in binary is performed by shifts
      • The number with the smaller exponent is shifted to the right

  16. Addition or Subtraction?
      • A and B may not be of the same sign
      • We also want the same hardware to provide subtraction
      • Need to select between A and B to determine which one needs to be shifted right
      • Need to select between A and B to know which one needs to be complemented (2's complement)
      • Need also to determine the sign of the result (whether to complement the result or not)

  17. Floating Point Addition
      • Compare exponents
      • Shift the smaller number to the right (incrementing its exponent) until its exponent matches the larger exponent
      • Complement one of the two operands (if needed)
      • Add the two significands
      • Loop to normalize the result:
        • Shift (left/right) to normalize the result
        • Detect overflow/underflow
        • Round the significand to the appropriate number of bits
      • Complement the result (if needed)
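The addition steps above can be sketched in Python on (sign, exponent, significand) triples. This is a simplified illustration under stated assumptions, not the circuit from the slides: significands are 24-bit integers that include the leading 1, exponents are unbiased, bits shifted out are truncated instead of rounded, and overflow/underflow detection is omitted. The names `fp_add`, `to_float` and `P` are hypothetical.

```python
P = 24  # significand width in bits, including the leading 1 (assumption)

def to_float(n):
    """Value of a triple: (-1)^sign x significand x 2^(exponent - (P - 1))."""
    sign, exp, sig = n
    return (-1) ** sign * sig * 2.0 ** (exp - (P - 1))

def fp_add(a, b):
    (sa, ea, ma), (sb, eb, mb) = a, b
    # Compare exponents: make a the operand with the larger exponent.
    if ea < eb:
        (sa, ea, ma), (sb, eb, mb) = (sb, eb, mb), (sa, ea, ma)
    # Align: shift the smaller-exponent significand right (truncating).
    mb >>= ea - eb
    # Complement negative operands (signed integers stand in for the
    # 2's complement step) and add the significands.
    total = (-ma if sa else ma) + (-mb if sb else mb)
    sign, mag, exp = (1 if total < 0 else 0), abs(total), ea
    if mag == 0:
        return (0, 0, 0)
    # Normalize: shift right or left until the leading 1 is in bit P - 1.
    while mag >= 2 ** P:
        mag, exp = mag >> 1, exp + 1
    while mag < 2 ** (P - 1):
        mag, exp = mag << 1, exp - 1
    return (sign, exp, mag)

x = (0, 3, 0x800000)   # +1.0 x 2^3
y = (1, 0, 0x800000)   # -1.0 x 2^0
print(to_float(fp_add(x, y)))   # 8 - 1
```

Using Python's signed integers hides the explicit complement units; a hardware design would instead complement an operand before the adder and possibly the result after it.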

  18. Floating Point Addition Circuit
      [Diagram; blocks: Exponent Compare Unit (-), Shift, complement (one per operand), adder (+), Sign Determination Unit, complement (result), Normalization and Rounding]

  19. Floating Point Multiplication
      • Add the two biased exponents
      • Subtract the bias once to get a biased exponent
        • Because E1 = e1 + bias and E2 = e2 + bias
        • E1 + E2 = e1 + e2 + 2 × bias
      • Multiply the significands
      • Loop to normalize the result:
        • Shift (left/right) to normalize the result
        • Detect overflow/underflow
        • Round the significand to the appropriate number of bits
      • Set the sign of the product
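The multiplication steps above can be sketched in Python on (sign, biased exponent, significand) triples. The sketch assumes single precision (bias = 127, 24-bit integer significands including the leading 1), truncates instead of rounding, and reports overflow/underflow with exceptions. The names `fp_mul`, `BIAS` and `P` are hypothetical.

```python
BIAS = 127  # single-precision bias
P = 24      # significand width in bits, including the leading 1 (assumption)

def fp_mul(a, b):
    (sa, Ea, ma), (sb, Eb, mb) = a, b
    # Add the biased exponents, then subtract the bias once:
    # (e1 + bias) + (e2 + bias) - bias = (e1 + e2) + bias.
    E = Ea + Eb - BIAS
    # Multiply the significands; the product has up to 2P bits,
    # so drop the low P - 1 bits (truncation, no rounding).
    prod = (ma * mb) >> (P - 1)
    # Normalize: the product of two values in [1, 2) lies in [1, 4),
    # so at most one right shift is needed.
    if prod >= 2 ** P:
        prod, E = prod >> 1, E + 1
    # Detect overflow/underflow of the stored exponent field.
    if E > 254:
        raise OverflowError("exponent overflow")
    if E < 1:
        raise ArithmeticError("exponent underflow")
    # The product is negative exactly when the operand signs differ.
    return (sa ^ sb, E, prod)

# (+1.5 x 2^1) x (-1.5 x 2^0) = -4.5 = -1.125 x 2^2
print(fp_mul((0, 128, 0xC00000), (1, 127, 0xC00000)))
```

Subtracting the bias before normalization keeps every intermediate exponent in the same biased form that is finally stored.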

  20. Floating Point Multiplication Circuit
      [Diagram; blocks: Exponent Addition Unit (+), Bias subtraction (-), Integer Multiplication Circuit, Sign Determination Unit, Normalization and Rounding]

  21. Floating Point Instructions in MIPS
      • 32 Floating Point Registers: $f0, …, $f31
      • add.s, sub.s, mul.s and div.s: single precision
      • add.d, sub.d, mul.d and div.d: double precision
      • lwc1, swc1: load/store FP from/to memory
      • bc1t, bc1f: branch if FP condition true/false
      • c.lt.s, c.lt.d: compare single/double precision
