
Advanced Computer Arithmetic: Floating Point Arithmetic, Week 3

CENG536, Computer Engineering Department, Çankaya University. Advanced Computer Arithmetic: Floating Point Arithmetic, Week 3.


Presentation Transcript


  1. CENG536 Computer Engineering Department, Çankaya University. Advanced Computer Arithmetic: Floating Point Arithmetic, Week 3

  2. Floating-Point Numbers. The problem with fixed-point representation is illustrated by two example values, a small number x and a large number y. The relative representation error due to truncation is quite significant for x, while it is much less severe for y. On the other hand, both $x^2$ and $y^2$ are unrepresentable, because their computations lead to underflow (number too small) and overflow (number too large), respectively. CENG 536 - Spring 2012-2013, Dr. Yuriy ALYEKSYEYENKOV

  3. Floating-Point Numbers. These numbers can be represented with an exponent of -5 (for x) and +7 (for y). The exponent essentially indicates the direction and amount by which the radix point must be moved to produce the corresponding fixed-point representation. Hence the name “floating-point numbers”.

  4. Floating-Point Numbers. A floating-point number has four components: the sign, the significand (mantissa) s, the exponent base b, and the exponent e; its value is $x = \pm s \times b^{e}$. The exponent base is usually a power of two, except in decimal arithmetic, where it is 10.

  5. Floating-Point Numbers. A typical floating-point format. A key point to observe is that two signs are involved in a floating-point number: the sign of the number itself (carried with the significand) and the sign of the exponent.

  6. Floating-Point Numbers. The use of a biased exponent format has virtually no effect on the speed or cost of exponent arithmetic (addition/subtraction), given the small number of bits involved. It does, however, facilitate zero detection (zero can be represented with the smallest biased exponent, 0, and an all-zero significand) and magnitude comparison (normalized floating-point numbers can be compared as if they were integers).
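The integer-comparison property can be checked with a short Python sketch. It assumes IEEE 754 single precision (1 sign bit, 8-bit biased exponent, 23-bit fraction) and uses the standard `struct` module to reinterpret a float's bytes as an unsigned integer; `float32_bits` is a hypothetical helper name. For non-negative numbers, ordering by bit pattern gives the same order as ordering by value.

```python
import struct

def float32_bits(x):
    """Reinterpret x, stored in IEEE single precision, as a 32-bit unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

# Non-negative values in increasing order, all exactly representable in single precision.
values = [0.0, 0.15625, 0.5, 1.0, 3.75, 1024.0]
patterns = [float32_bits(v) for v in values]

# The biased exponent sits above the fraction bits, so integer order matches numeric order.
print(patterns == sorted(patterns))
# Zero is the all-zero pattern: biased exponent 0 and an all-zero significand.
print(float32_bits(0.0) == 0)
```

Negative numbers use sign-magnitude, so their patterns sort in reverse; a full comparator would have to treat the sign bit separately.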

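  7. Floating-Point Numbers. The range of values in a floating-point number representation is composed of the intervals $[-\max, -\min]$ and $[\min, \max]$, where $\max = s_{\max} \times b^{e_{\max}}$ (largest significand, largest exponent) and $\min = s_{\min} \times b^{e_{\min}}$ (smallest normalized significand, smallest exponent).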
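A quick way to make the two intervals concrete is to compute max and min for real formats. This Python sketch assumes a binary format with a hidden leading 1 and normalized significands in [1, 2), as in IEEE 754; `fp_range` is a hypothetical helper, and the exponent limits passed to it are the IEEE single- and double-precision values. Subnormal numbers, which IEEE 754 adds below min, are ignored.

```python
import struct
import sys

def fp_range(frac_bits, e_max, e_min):
    """Return (max, min) for a binary format with a hidden leading 1."""
    s_max = 2.0 - 2.0 ** -frac_bits   # largest significand 1.11...1
    s_min = 1.0                       # smallest normalized significand 1.00...0
    return s_max * 2.0 ** e_max, s_min * 2.0 ** e_min

single_max, single_min = fp_range(23, 127, -126)
double_max, double_min = fp_range(52, 1023, -1022)

# Cross-check against the bit patterns of the largest and smallest normalized singles.
print(single_max == struct.unpack(">f", bytes.fromhex("7f7fffff"))[0])
print(single_min == struct.unpack(">f", bytes.fromhex("00800000"))[0])
# Python floats are doubles, so sys.float_info reports the double-precision limits.
print(double_max == sys.float_info.max, double_min == sys.float_info.min)
```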

  8. Floating-Point Numbers. Number distribution pattern and subranges in floating-point representations: there are three special or singular values, $-\infty$, $0$, and $+\infty$. Zero is special because it cannot be represented with a normalized significand (mantissa).
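The distribution pattern is easiest to see in a deliberately tiny format. The sketch below enumerates every positive normalized value of a hypothetical binary format with 2 fraction bits and exponents from -1 to 1 (`toy_values` and these parameters are illustrative only). Within one exponent the values are evenly spaced, the spacing doubles from one exponent to the next, and nothing lies between 0 and the smallest value, which is why zero needs special treatment.

```python
def toy_values(frac_bits, e_min, e_max):
    """All positive normalized values (1.f) x 2**e of a tiny hypothetical binary format."""
    values = []
    for e in range(e_min, e_max + 1):
        for f in range(2 ** frac_bits):
            values.append((1 + f / 2 ** frac_bits) * 2.0 ** e)
    return values

vals = toy_values(2, -1, 1)
gaps = [b - a for a, b in zip(vals, vals[1:])]   # distance between neighbouring values
print(vals)
print(gaps)
```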

  9. Floating-Point Numbers. Overflow occurs when a result is less than $-\max$ or greater than $+\max$. Underflow, on the other hand, occurs for results in the range $(-\min, 0)$ or $(0, \min)$.
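Python floats are IEEE double precision, so both conditions can be observed directly: squaring a large value overflows to infinity, and squaring a tiny one falls below min, echoing the $x^2$ and $y^2$ examples given for fixed point. `classify_product` is a hypothetical helper; it treats any nonzero true product whose computed magnitude ends up below `sys.float_info.min` as underflow, which also catches IEEE subnormal results.

```python
import math
import sys

def classify_product(a, b):
    """Classify a*b (finite, double-precision operands) by the ranges described above."""
    r = a * b
    if math.isinf(r):
        return "overflow"      # true result is below -max or above +max
    if a != 0.0 and b != 0.0 and abs(r) < sys.float_info.min:
        return "underflow"     # true result lies in (-min, 0) or (0, min)
    return "ok"

print(classify_product(1e200, 1e200))     # like y squared: too large
print(classify_product(1e-200, 1e-200))   # like x squared: too small, flushes to 0.0
print(classify_product(1e-160, 1e-160))   # lands among the subnormals, still below min
print(classify_product(3.0, 4.0))
```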

  10. Floating-Point Numbers. The equation for the value of a floating-point number suggests that the range $[-\max, \max]$ increases if we choose a larger exponent base b. A larger b also simplifies arithmetic operations on the exponents, since for a given range, smaller exponents must be dealt with. However, if the significand is to be kept in normalized form, effective precision decreases for larger b: with $b = 16$, for instance, a normalized significand may begin with up to three 0 bits. In the past, machines with b = 2, 8, 16, or 256 were built.
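To see why effective precision drops, the sketch below counts the leading 0 bits of a normalized significand when $b = 2^a$. It assumes the fraction-style normalization $1/b \le s < 1$ (leading base-b digit nonzero); `leading_zero_bits` is a hypothetical helper. With a = 4 (b = 16) up to three significand bits can be wasted on leading zeros, while with a = 1 (b = 2) none are.

```python
import math

def leading_zero_bits(x, a):
    """Leading 0 bits in the normalized significand of x > 0 when b = 2**a.

    Normalization here means 1/b <= s < 1, so that x = s * b**E.
    """
    m, e = math.frexp(x)    # x = m * 2**e with 1/2 <= m < 1 (first fraction bit is 1)
    E = -(-e // a)          # smallest integer E with a*E >= e (ceiling division)
    return a * E - e        # s = m * 2**-(a*E - e): that many 0 bits precede m's first 1

for x in (1.0, 2.0, 4.0, 8.0):
    print(x, leading_zero_bits(x, 4), leading_zero_bits(x, 1))
```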

  11. Floating-Point Numbers. The exponent sign is almost always encoded in a biased format. As for the sign of a floating-point number, alternatives to the currently dominant signed-magnitude format include the use of 1's or 2's complement representation. Several variations have been tried in the past, including complementing only the significand part and complementing the entire number (including the exponent part) when the number to be represented is negative.

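  12. The ANSI/IEEE Floating-Point Standard. The IEEE standard for binary floating-point numbers (ANSI/IEEE Std 754-1985) defines two representation formats: single precision (32 bits: 1 sign bit, 8 exponent bits with bias 127, 23 fraction bits) and double precision (64 bits: 1 sign bit, 11 exponent bits with bias 1023, 52 fraction bits).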
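The field layout of both formats can be inspected with a small Python sketch built on the standard `struct` module, which packs a value into its IEEE single (`"f"`) or double (`"d"`) byte representation. `fields_single` and `fields_double` are hypothetical helper names; each returns the sign bit, the biased exponent, and the stored fraction (hidden 1 excluded).

```python
import struct

def fields_single(x):
    """(sign, biased exponent, fraction) of x in IEEE 754 single precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def fields_double(x):
    """(sign, biased exponent, fraction) of x in IEEE 754 double precision."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

s, E, f = fields_single(0.75)
print(s, E - 127, hex(f))      # sign, unbiased exponent, fraction
s, E, f = fields_double(0.75)
print(s, E - 1023, hex(f))
```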

  13. The ANSI/IEEE Floating-Point Standard

  14. The ANSI/IEEE Floating-Point Standard. The standard defines extended formats that allow an implementation to carry higher precision internally to reduce the effect of accumulated errors. Two extended formats are defined: single extended and double extended.

  15. Value = N = (-1)s 2 E-127  (1.M) The decimal number 0.7510 is to be represented in the IEEE 754 single precision format: 0.7510 = 0.112 (converted to a binary number) = 1.1  2-1(normalized a binary number) hidden The mantissa is positive so the sign S is given by S = 0 The biased exponent E is given by E = e + 127 E = - 1 + 127 = 12610 = 0111 11102 Fractional part of mantissa M = .1000…..000 (in 23 bits) Floating-Point Conversion Example CENG 536 - Spring 2012-2013 Dr. Yuriy ALYEKSYEYENKOV

  16. Floating-Point Conversion Example. The IEEE 754 single-precision representation of 0.75 is: Sign (1 bit) = 0, Exponent (8 bits) = 0111 1110, Mantissa (23 bits) = 1000 0000 0000 0000 0000 000.

  17. Floating-Point Conversion Example. The decimal number $-2345.125_{10}$ is to be represented in the IEEE 754 single-precision format: $-2345.125_{10} = -1001\,0010\,1001.001_2$ (converted to binary) $= -1.0010\,0101\,0010\,01_2 \times 2^{11}$ (normalized binary; the leading 1 is the hidden bit). The number is negative, so the sign bit is S = 1. The biased exponent is $E = e + 127 = 11 + 127 = 138_{10} = 1000\,1010_2$. The fractional part of the significand is M = .0010 0101 0010 0100 … 000 (23 bits).

  18. Floating-Point Conversion Example. The IEEE 754 single-precision representation of -2345.125 is: Sign (1 bit) = 1, Exponent (8 bits) = 1000 1010, Mantissa (23 bits) = 0010 0101 0010 0100 0000 000.
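The hand conversion in the examples above can be mechanized. The Python sketch below follows the same steps (sign, normalize to $1.M \times 2^{e}$, add the bias 127, keep 23 fraction bits) using `math.frexp`, and cross-checks the assembled word against the standard `struct` module. `encode_single` and `pack_fields` are hypothetical names; the sketch truncates extra fraction bits instead of rounding, and it rejects zero, infinities, NaNs, and values outside the normalized single-precision range.

```python
import math
import struct

def encode_single(x):
    """Return (S, E, M) for x following the manual conversion steps."""
    S = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))        # |x| = m * 2**e with 1/2 <= m < 1
    s, e = 2.0 * m, e - 1            # normalized significand 1.M with 1 <= s < 2
    E = e + 127                      # biased exponent
    if x == 0 or not math.isfinite(x) or not 1 <= E <= 254:
        raise ValueError("sketch handles normalized single-precision values only")
    M = int((s - 1.0) * 2 ** 23)     # 23 fraction bits (hidden 1 dropped), truncated
    return S, E, M

def pack_fields(S, E, M):
    """Assemble sign | exponent | fraction into one 32-bit word."""
    return (S << 31) | (E << 23) | M

for value in (0.75, -2345.125):
    S, E, M = encode_single(value)
    print(S, format(E, "08b"), format(M, "023b"))
    word = pack_fields(S, E, M)
    print(hex(word), word == struct.unpack(">I", struct.pack(">f", value))[0])
```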

  19. Basic Floating-Point Algorithms. Basic arithmetic on floating-point numbers is conceptually simple. However, care must be taken in hardware implementation to ensure correctness and avoid undue loss of precision; in addition, it must be possible to handle any exceptions. Addition and subtraction are the most difficult of the elementary operations on floating-point operands. Here, we deal only with addition, since subtraction can be converted to addition by flipping the sign of the subtrahend.

  20. Basic Floating-Point Algorithms. Consider the addition $(\pm s_1 \times b^{e_1}) + (\pm s_2 \times b^{e_2}) = \pm s \times b^{e}$. Assuming $e_1 \ge e_2$, we begin by aligning the two operands through right-shifting of the significand (mantissa) $s_2$ of the number with the smaller exponent.

  21. Basic Floating-Point Algorithms. If the exponent base b and the number representation radix r are the same, we simply shift $s_2$ to the right by $e_1 - e_2$ digits. When $b = r^{a}$, the shift amount, which is computed through direct subtraction of the biased exponents, is multiplied by a. In either case, this step is referred to as alignment shift, or preshift (in contrast to normalization shift, or postshift, which is needed when the resulting significand s is unnormalized).

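  22. Basic Floating-Point Algorithms. We then perform the addition as follows: $(\pm s_1 \times b^{e_1}) + (\pm s_2 \times b^{e_2}) = (\pm s_1 \times b^{e_1}) + \left(\pm \frac{s_2}{b^{e_1 - e_2}}\right) \times b^{e_1} = \left(\pm s_1 \pm \frac{s_2}{b^{e_1 - e_2}}\right) \times b^{e_1} = \pm s \times b^{e}$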
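The alignment, addition, and normalization steps can be traced on a toy format. This Python sketch rests on simplifying assumptions: base b = 2 equal to the radix, both operands positive (like signs), significands stored as hypothetical 8-bit integers `m` with value $m / 2^{7} \in [1, 2)$, and bits shifted out during alignment simply truncated (no guard bits or rounding). `fp_add` and `to_float` are illustrative names.

```python
P = 8  # hypothetical toy precision: significand m holds P bits, value m / 2**(P - 1)

def to_float(m, e):
    """Value of the toy number (m, e), i.e. (m / 2**(P-1)) * 2**e."""
    return m / 2 ** (P - 1) * 2.0 ** e

def fp_add(m1, e1, m2, e2):
    """Add two positive normalized toy numbers."""
    if e1 < e2:                      # make operand 1 the one with the larger exponent
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 >>= e1 - e2                   # alignment shift (preshift); shifted-out bits are lost
    m, e = m1 + m2, e1               # add aligned significands: 1 <= s < 4
    if m >= 1 << P:                  # s in [2, 4): one-bit normalization shift (postshift)
        m >>= 1
        e += 1
    return m, e

print(to_float(*fp_add(192, 0, 192, 0)))   # 1.5 + 1.5
print(to_float(*fp_add(128, 0, 128, -2)))  # 1.0 + 0.25
print(to_float(*fp_add(128, 0, 128, -8)))  # 1.0 + 2**-8: the small addend is shifted out
```

The last call shows the precision loss that alignment can cause when the exponents differ by more than the significand width.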

  23. Basic Floating-Point Algorithms. Floating-point multiplication is simpler than floating-point addition; it is performed by multiplying the significands and adding the exponents: $(\pm s_1 \times b^{e_1}) \times (\pm s_2 \times b^{e_2}) = \pm (s_1 \times s_2) \times b^{e_1 + e_2}$. Postshifting may be needed, since the product $s_1 \times s_2$ of the two significands can be unnormalized. For example, with IEEE significands in $[1, 2)$ we have $1 \le s_1 \times s_2 < 4$, leading to the possible need for a single-bit right shift. Also, the computed exponent needs adjustment if the exponents are biased or if a normalization shift is performed. Overflow/underflow is possible during multiplication if $e_1$ and $e_2$ have like signs. Overflow is also possible due to normalization.
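A sketch of the multiplication steps on IEEE-like single-precision fields, under simplifying assumptions: positive operands, significands given as 24-bit integers `m` that include the hidden 1 (value $m / 2^{23}$), biased exponents with bias 127, truncation instead of rounding, and no overflow/underflow checks. `fp_mul` and `to_float` are hypothetical helpers. Note the bias correction: adding two biased exponents counts the bias twice.

```python
P = 24       # significand bits including the hidden 1
BIAS = 127   # single-precision exponent bias

def to_float(m, E):
    """Value of a (significand, biased exponent) pair."""
    return m / 2 ** (P - 1) * 2.0 ** (E - BIAS)

def fp_mul(m1, E1, m2, E2):
    """Multiply two positive normalized numbers given as (m, E) pairs."""
    prod = m1 * m2                    # value prod / 2**(2P-2), lies in [1, 4)
    E = E1 + E2 - BIAS                # remove the extra copy of the bias
    if prod >= 1 << (2 * P - 1):      # product in [2, 4): single-bit right shift
        prod >>= 1
        E += 1
    return prod >> (P - 1), E         # truncate back to P significand bits

m15 = 3 << 22                                  # significand 1.1 (binary) = 1.5
print(to_float(*fp_mul(m15, 127, m15, 127)))   # 1.5 * 1.5
print(to_float(*fp_mul(m15, 128, m15, 126)))   # 3.0 * 0.75
```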

  24. Basic Floating-Point Algorithms. Similarly, floating-point division is performed by dividing the significands and subtracting the exponents: $(\pm s_1 \times b^{e_1}) / (\pm s_2 \times b^{e_2}) = \pm (s_1 / s_2) \times b^{e_1 - e_2}$. Here, the problems to be dealt with are similar to those of multiplication. The ratio of the significands may have to be normalized. For example, with IEEE significands we have $1/2 < s_1 / s_2 < 2$, and a single-bit left shift is always adequate. The computed exponent needs adjustment if the exponents are biased or if a normalization shift is performed. Overflow/underflow is possible during division if $e_1$ and $e_2$ have unlike signs. Underflow due to normalization is also possible.

  25. Basic Floating-Point Algorithms. To extract the square root of a positive floating-point number, we first make its exponent even. This may require subtracting 1 from the exponent and multiplying the significand by b. We then use $\sqrt{s \times b^{e}} = \sqrt{s} \times b^{e/2}$. In the case of IEEE floating-point numbers, the adjusted significand will be in the range $1 \le s < 4$, which leads directly to a normalized significand for the result. Square-rooting never produces overflow or underflow.
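The exponent-evening step can be sketched in Python with `math.frexp` and `math.ldexp` for base b = 2. This is an illustration, not a hardware algorithm: the square root of the adjusted significand is delegated to `math.sqrt`, whereas hardware would use a digit-recurrence or iterative method. `fp_sqrt` is a hypothetical name, and only positive finite inputs are handled.

```python
import math

def fp_sqrt(x):
    """Square root via an even exponent: sqrt(s * 2**e) = sqrt(s) * 2**(e/2)."""
    if not (x > 0.0 and math.isfinite(x)):
        raise ValueError("sketch handles positive finite numbers only")
    m, e = math.frexp(x)          # x = m * 2**e with 1/2 <= m < 1
    s, e = 2.0 * m, e - 1         # IEEE-style significand: 1 <= s < 2
    if e % 2 != 0:                # odd exponent: subtract 1 and multiply s by b = 2
        s, e = 2.0 * s, e - 1     # now 1 <= s < 4
    return math.ldexp(math.sqrt(s), e // 2)   # sqrt(s) is already in [1, 2)

for value in (16.0, 0.25, 2.0, 10.0):
    print(value, fp_sqrt(value), math.sqrt(value))
```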

  26. Floating-Point Addition Algorithm

  27. Floating-Point Addition Algorithm Flowchart

  28. Floating-Point Addition Algorithm Example

  29. Floating-Point Addition Algorithm Notes

  30. Floating-Point Subtraction Algorithm

  31. Floating-Point Subtraction Flowchart

  32. Floating-Point Multiplication Algorithm

  33. Floating-Point Multiplication Flowchart

  34. Floating-Point Multiplication Example

  35. Floating-Point Multiplication Notes

  36. Floating-Point Division Algorithm

  37. Floating-Point Error Rounding

  38. Floating-Point Error Rounding Observations
