
Chapter 2: Floating point number systems and Round-off error

tadita

Presentation Transcript


  1. Chapter 2: Floating point number systems and Round-off error. Floating point number system = the set of real numbers that can be represented exactly with a finite word length. fl(x) = the machine number that represents the real number x, also called the “floating point” representation of x. |fl(x) – x| = the round-off error in fl(x). The distance between 1 and the next larger machine number is a common measure of round-off error.
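
A minimal sketch, assuming Python floats are IEEE 754 double precision (b = 2, p = 53, which holds on essentially all current platforms). It measures the distance between 1 and the next larger machine number, and the round-off error |fl(x) – x| for x = 0.1 using exact rational arithmetic. `math.nextafter` requires Python 3.9 or later.

```python
import math
import sys
from fractions import Fraction

# Gap between 1.0 and the next larger double.
gap = math.nextafter(1.0, 2.0) - 1.0
print(gap, sys.float_info.epsilon)  # both equal 2**-52 for IEEE doubles

# Round-off error |fl(x) - x| for x = 0.1, computed exactly with rationals:
# Fraction(0.1) is the exact value of the stored double, Fraction(1, 10) is the real 0.1.
x = Fraction(1, 10)
err = abs(Fraction(0.1) - x)
print(float(err), float(err / x))  # absolute and relative round-off error
```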

  2. Definition of a floating point number system: b = base; p = precision (number of digits $d_0, d_1, \dots, d_{p-1}$); L = lower limit of the exponent; U = upper limit of the exponent. All floating point numbers in the system can be written as $fl = \pm\left(d_0 + \frac{d_1}{b} + \frac{d_2}{b^2} + \dots + \frac{d_{p-1}}{b^{p-1}}\right) b^E$. A period (the radix point) is commonly placed between $d_0$ and $d_1$, and the base is commonly indicated by a subscript after the last digit $d_{p-1}$. Example: $(1.11)_2 \times 2^1 = (3.5)_{10}$. In a normalized system, $d_0 \ne 0$ and $0 \le d_i \le b-1$ for $i = 1, \dots, p-1$. E is any integer such that $L \le E \le U$.
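
The digit formula can be evaluated exactly with rational arithmetic. This is a sketch; `fp_value` and `is_normalized` are hypothetical helper names, not from the slides. It reproduces the example $(1.11)_2 \times 2^1 = 3.5$.

```python
from fractions import Fraction

def fp_value(digits, b, E, sign=1):
    """Exact value of sign * (d0 + d1/b + ... + d_{p-1}/b^{p-1}) * b^E."""
    if any(not 0 <= d < b for d in digits):
        raise ValueError("each digit must satisfy 0 <= d < b")
    mantissa = sum(Fraction(d, b**i) for i, d in enumerate(digits))
    return sign * mantissa * Fraction(b) ** E

def is_normalized(digits):
    """Normalized means the leading digit d0 is nonzero."""
    return digits[0] != 0

# (1.11)_2 x 2^1 = 1.75 * 2 = 3.5
print(fp_value([1, 1, 1], b=2, E=1))  # 7/2
```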

  3. A floating-point number system is finite and discrete. Number of normalized floating-point numbers: $2(b-1)b^{p-1}(U-L+1)$ (explain). Smallest positive normalized number: $\text{UFL} = b^L$ (explain). Largest number: $\text{OFL} = b^{U+1}(1-b^{-p})$ (prove). Special values: zero, Inf, and NaN.
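
The counting formula, UFL, and OFL can be checked by brute force. This sketch enumerates every normalized number of a small toy system; the parameters b = 2, p = 3, L = −1, U = 1 are an illustrative assumption, and `normalized_numbers` is a hypothetical helper. Zero is excluded because it is treated as a special value.

```python
from fractions import Fraction
from itertools import product

def normalized_numbers(b, p, L, U):
    """All normalized numbers of the system (zero excluded), as exact fractions."""
    values = set()
    for E in range(L, U + 1):
        for d0 in range(1, b):                       # normalized: d0 != 0
            for rest in product(range(b), repeat=p - 1):
                digits = (d0,) + rest
                m = sum(Fraction(d, b**i) for i, d in enumerate(digits))
                x = m * Fraction(b) ** E
                values.update({x, -x})               # both signs
    return values

b, p, L, U = 2, 3, -1, 1   # illustrative toy parameters (an assumption)
nums = normalized_numbers(b, p, L, U)
print(len(nums), 2 * (b - 1) * b**(p - 1) * (U - L + 1))  # prints 24 24
print(min(x for x in nums if x > 0), Fraction(b) ** L)    # prints 1/2 1/2 (UFL)
print(max(nums), Fraction(b) ** (U + 1) * (1 - Fraction(b) ** -p))  # prints 7/2 7/2 (OFL)
```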

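  4. Note the gap between 0 and $b^L$. It is filled by allowing $d_0$ to be zero when the exponent has its smallest value, $E = L$; these numbers are called “subnormals”. The smallest positive number then becomes $(0.01)_2 \times 2^{-1} = (0.125)_{10}$ (here b = 2, p = 3, L = −1). How many subnormals?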
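
A sketch that lists the subnormals of the illustrative toy system b = 2, p = 3, L = −1 (an assumption matching the $(0.01)_2 \times 2^{-1}$ example). `subnormals` is a hypothetical helper. Because the digits $d_1, \dots, d_{p-1}$ may be anything except all zeros, the count is $2(b^{p-1} - 1)$ including signs.

```python
from fractions import Fraction
from itertools import product

def subnormals(b, p, L):
    """Nonzero numbers with d0 = 0 and E = L; they fill the gap between 0 and b**L."""
    values = set()
    for rest in product(range(b), repeat=p - 1):
        if any(rest):                                # skip the all-zero digit string
            m = sum(Fraction(d, b**i) for i, d in enumerate(rest, start=1))
            x = m * Fraction(b) ** L
            values.update({x, -x})
    return values

subs = subnormals(b=2, p=3, L=-1)
print(sorted(x for x in subs if x > 0))  # [Fraction(1, 8), Fraction(1, 4), Fraction(3, 8)]
print(len(subs), 2 * (2 ** (3 - 1) - 1))  # prints 6 6
```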

  5. $\epsilon_{mach}$ (called the “unit roundoff”) ≡ upper bound on the relative round-off error in the floating point representation of x: $|(fl(x)-x)/x| \le \epsilon_{mach}$. Its value depends on the rounding method. Chop: drop the digits that do not fit in the word length, so fl(x) is the nearest machine number on the side of x toward zero (also called “round toward zero”); $\epsilon_{mach} = b^{1-p}$. Round to nearest: fl(x) is the nearest machine number to x on either side, with ties broken by choosing the floating point number whose last stored digit is even; $\epsilon_{mach} = \frac{1}{2}b^{1-p}$.
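
Python's standard `decimal` module supports both rounding methods, which makes the two bounds easy to check empirically. This sketch assumes a decimal system with b = 10 and p = 4 (an illustrative choice) and measures the worst relative error over random inputs, using exact fractions for the comparison.

```python
import random
from decimal import Context, ROUND_DOWN, ROUND_HALF_EVEN
from fractions import Fraction

b, p = 10, 4   # illustrative: 4 significant decimal digits (an assumption)
chop = Context(prec=p, rounding=ROUND_DOWN)          # round toward zero
nearest = Context(prec=p, rounding=ROUND_HALF_EVEN)  # round to nearest, ties to even

def rel_error(ctx, x):
    """|(fl(x) - x) / x|, where fl rounds the exact value of float x to ctx.prec digits."""
    fl_x = ctx.create_decimal_from_float(x)   # applies the context precision and rounding
    return abs((Fraction(fl_x) - Fraction(x)) / Fraction(x))

random.seed(1)
xs = [random.uniform(-1e6, 1e6) for _ in range(10_000)]
worst_chop = max(rel_error(chop, x) for x in xs)
worst_nearest = max(rel_error(nearest, x) for x in xs)
print(float(worst_chop), "<", b ** (1 - p))            # chop bound b^(1-p)
print(float(worst_nearest), "<=", 0.5 * b ** (1 - p))  # round-to-nearest bound
```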

  6. Analysis of a “toy” floating point number system will be part of your next quiz.

  7. Consider all numbers of the form $(0.d_1d_2d_3)_2 \times 2^k$ with $k = 0, \pm 1$. Which of them are excluded from a normalized system with b = 2, p = 4, L = −1, U = 1, with and without subnormals?
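
A brute-force way to check hand work on this exercise, as a sketch: it assumes k ranges over −1, 0, +1 and, for brevity, considers only nonnegative numbers. `representable` is a hypothetical helper that builds the system with or without subnormals.

```python
from fractions import Fraction
from itertools import product

b, p, L, U = 2, 4, -1, 1

def representable(with_subnormals):
    """Nonnegative numbers of the system b=2, p=4, L=-1, U=1 (signs omitted)."""
    vals = {Fraction(0)}
    for digits in product(range(b), repeat=p):
        m = sum(Fraction(d, b**i) for i, d in enumerate(digits))
        if digits[0] != 0:                           # normalized numbers, any E in [L, U]
            for E in range(L, U + 1):
                vals.add(m * Fraction(b) ** E)
        elif with_subnormals:                        # d0 = 0 is allowed only at E = L
            vals.add(m * Fraction(b) ** L)
    return vals

# Candidates (0.d1d2d3)_2 x 2^k, assuming k ranges over -1, 0, +1.
candidates = {sum(Fraction(d, 2**i) for i, d in enumerate(ds, start=1)) * Fraction(2) ** k
              for ds in product(range(2), repeat=3) for k in (-1, 0, 1)}

for flag in (False, True):
    excluded = sorted(candidates - representable(flag))
    print("with subnormals:" if flag else "without subnormals:", [str(x) for x in excluded])
```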

  8. All input to a floating point number system contains errors. Consequences of input error: x = true input value; f(x) = exact result for a given input; $\hat{x}$ = inexact input (round-off error, incomplete knowledge, previous calculation, etc.); $\hat{f}$ = numerical approximation to f for a given input. Total error $= \hat{f}(\hat{x}) - f(x) = \underbrace{[\hat{f}(\hat{x}) - f(\hat{x})]}_{\text{computational error}} + \underbrace{[f(\hat{x}) - f(x)]}_{\text{propagated data error}}$.
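
A small numeric instance of the decomposition. The example is an illustrative assumption, not from the slides: the true problem is $f = \sin$ at $x = \pi/8$, the inexact input is $\hat{x} = 3/8$, and the approximate algorithm is the small-angle rule $\hat{f}(t) = t$.

```python
import math

def f(t):
    """Exact problem: sin t."""
    return math.sin(t)

def f_hat(t):
    """Approximate algorithm (illustrative assumption): sin t ~ t for small t."""
    return t

x, x_hat = math.pi / 8, 3 / 8

total = f_hat(x_hat) - f(x)
computational = f_hat(x_hat) - f(x_hat)   # algorithm error, same (inexact) input
propagated = f(x_hat) - f(x)              # data error pushed through the exact f
print(f"total={total:.6f} computational={computational:.6f} propagated={propagated:.6f}")
```

The two pieces can have opposite signs and partly cancel, so the total error can be smaller than either contribution.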

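  9. The condition number measures the consequence of input error: cond = |relative change in output| / |relative change in input| $= \left|\frac{[f(\hat{x}) - f(x)]/f(x)}{(\hat{x} - x)/x}\right|$. A condition number ≈ 1 means “well conditioned”: the consequence of input error is of the same order of magnitude as the input error.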
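
The ratio above can be estimated by perturbing the input slightly. This is a crude finite-difference sketch; `relative_condition` and its `rel_step` parameter are hypothetical. It shows that $\sqrt{x}$ is well conditioned (cond ≈ 1/2) while $\tan x$ near $\pi/2$ is ill conditioned.

```python
import math

def relative_condition(f, x, rel_step=1e-7):
    """Finite-difference estimate of |relative output change| / |relative input change|."""
    x_hat = x * (1 + rel_step)
    rel_out = (f(x_hat) - f(x)) / f(x)
    rel_in = (x_hat - x) / x
    return abs(rel_out / rel_in)

print(relative_condition(math.sqrt, 2.0))   # about 0.5: well conditioned
print(relative_condition(math.tan, 1.57))   # large: tan is ill conditioned near pi/2
```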

  10. Example: intersection of 2 lines. [Figure: dotted lines show the magnitude of the input error; one panel with the lines nearly perpendicular (well conditioned), one with the lines nearly parallel (ill conditioned).]
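
The figure's point can be reproduced numerically. This sketch uses illustrative line coefficients (an assumption): both pairs intersect at (1, 1), and the same input error of 1e-4 in one right-hand side is applied to each pair. `intersect` and `shift` are hypothetical helpers; `math.dist` requires Python 3.8 or later.

```python
import math

def intersect(a1, b1, c1, a2, b2, c2):
    """Intersection of a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

def shift(line1, line2, dc2=1e-4):
    """Distance the intersection moves when c2 is perturbed by dc2 (the input error)."""
    p0 = intersect(*line1, *line2)
    a2, b2, c2 = line2
    p1 = intersect(*line1, a2, b2, c2 + dc2)
    return math.dist(p0, p1)

perpendicular = shift((1, 1, 2), (1, -1, 0))        # x + y = 2 and x - y = 0
parallel = shift((1, 1, 2), (1, 1.0001, 2.0001))    # x + y = 2 and x + 1.0001y = 2.0001
print(perpendicular, parallel)  # the nearly parallel pair moves far more
```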

  11. Errors in floating-point arithmetic due to insufficient precision. Let fl(x op y) be the floating point representation of the result of the instruction x op y. Recall that $\epsilon_{mach}$ ≡ upper bound on the relative round-off error, so the relative floating point error δ satisfies $|\delta| \le \epsilon_{mach}$. Rewrite this as $fl(x \text{ op } y) = (x \text{ op } y)(1+\delta)$, where $|\delta| \le \epsilon_{mach}$. Then:
     $fl(y+z) = (y+z)(1+\delta_1)$, where $|\delta_1| \le \epsilon_{mach}$
     $fl(x \cdot fl(y+z)) = x(y+z)(1+\delta_1)(1+\delta_2)$, where $|\delta_2| \le \epsilon_{mach}$
     $fl(x \cdot fl(y+z)) = x(y+z)\left(1+\delta_1+\delta_2+O(\delta_1\delta_2)\right)$
     $fl(x \cdot fl(y+z)) \approx x(y+z)(1+\delta)$, where $|\delta| \approx |\delta_1+\delta_2| \le 2\epsilon_{mach}$
     Does this suggest that the sheer volume of computation affects accuracy?
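
A sketch that tests the bound and the closing question, assuming IEEE doubles with round to nearest (unit roundoff $2^{-53}$). The first part compares the computed $x(y+z)$ against exact rationals; two rounded operations give a relative error of at most $(1+u)^2 - 1 = 2u + u^2$. The second part accumulates one million rounded additions of 0.1 and compares them with `math.fsum`, which returns the correctly rounded sum. `rel_error_x_times_sum` is a hypothetical helper.

```python
import math
import random
import sys
from fractions import Fraction

U = sys.float_info.epsilon / 2   # unit roundoff 0.5*b^(1-p) = 2**-53 for doubles

def rel_error_x_times_sum(x, y, z):
    """Relative error of the floating point result x*(y+z), measured against exact rationals."""
    computed = x * (y + z)                               # two rounded operations
    exact = Fraction(x) * (Fraction(y) + Fraction(z))    # no rounding at all
    return abs((Fraction(computed) - exact) / exact)

random.seed(0)
worst = max(rel_error_x_times_sum(*(random.uniform(1, 2) for _ in range(3)))
            for _ in range(10_000))
print(float(worst / U))   # observed worst case, in units of the unit roundoff

# Volume of computation: one million rounded additions of 0.1 versus a correctly
# rounded sum (math.fsum) of the same stored values.
n = 1_000_000
naive = 0.0
for _ in range(n):
    naive += 0.1
exact_sum = Fraction(0.1) * n
naive_err = abs(Fraction(naive) - exact_sum)
fsum_err = abs(Fraction(math.fsum([0.1] * n)) - exact_sum)
print(float(naive_err), float(fsum_err))
```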

  12. Loss of significance: when round-off error matters. Subtraction of two numbers of the same sign and nearly the same magnitude results in the loss of the most significant (i.e., leading) digits of the operands. Example: for small enough ε > 0 (in binary with round to nearest, any $0 < \epsilon \le \epsilon_{mach}/2$), $fl(1+\epsilon) - fl(1-\epsilon) = 1 - 1 = 0 \ne 2\epsilon$. Note that the subtraction itself is exact for its operands: the rounding error before the subtraction left the operands with inadequate precision, and the subtraction simply magnified the consequences of that rounding error. Example: calculating the standard deviation. $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \langle x\rangle)^2$ is the safe method; $s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\langle x\rangle^2\right)$ is sensitive to round-off error.
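
Both examples can be run directly, assuming IEEE doubles. The value ε = 2⁻⁵⁴ (half the double-precision unit roundoff) and the data set are illustrative choices; the data have a large offset and a small spread, with exact sample variance 30. `var_two_pass` and `var_one_pass` are hypothetical names for the safe and sensitive formulas.

```python
# Cancellation: with eps = 2**-54, both 1 + eps and 1 - eps round to 1.0 before subtracting.
eps = 2.0 ** -54
print((1 + eps) - (1 - eps), 2 * eps)   # 0.0 versus the true 2*eps

def var_two_pass(xs):
    """Safe method: subtract the mean first, then sum squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def var_one_pass(xs):
    """Sensitive method: sum of squares minus n*mean^2 (two huge, nearly equal terms)."""
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean * mean) / (n - 1)

data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(var_two_pass(data), var_one_pass(data))
```

In the one-pass version the squares are near $4 \times 10^{18}$, where adjacent doubles are 512 apart, so the difference of 90 between the two sums cannot be represented.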

  13. Example 2: solution of $ax^2 + bx + c = 0$ by the quadratic formula, for the root that is affected by loss of significance

  14. Example: loss of significance when $ax^2 + bx + c = 0$ is solved by the quadratic formula
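
A sketch contrasting the formula as written with a common rearrangement that avoids the cancellation. The rearrangement is a standard technique, not taken from the slides: compute the large-magnitude root first (where −b and the square root have the same sign), then recover the other from $x_1 x_2 = c/a$. The coefficients are illustrative and assume real roots with b ≠ 0.

```python
import math

def roots_textbook(a, b, c):
    """Quadratic formula exactly as written: (-b +/- sqrt(b^2 - 4ac)) / (2a)."""
    d = math.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

def roots_stable(a, b, c):
    """Avoid subtracting nearly equal numbers: compute the large-magnitude root first,
    then recover the other from the product of the roots, x1 * x2 = c / a."""
    d = math.sqrt(b * b - 4 * a * c)
    x1 = (-b - math.copysign(d, b)) / (2 * a)   # -b and -sign(b)*d have the same sign
    return c / (a * x1), x1

# Illustrative coefficients: roots are about -1e-8 and -1e8.
a, b, c = 1.0, 1e8, 1.0
small_textbook = roots_textbook(a, b, c)[0]
small_stable = roots_stable(a, b, c)[0]
print(small_textbook, small_stable)   # the textbook small root is badly wrong
```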

  15. Assignment 8, due 3-27-14. Problems from the text (6th edition): 2.1-18 (p. 56), 2.1-38 (p. 58), 2.2-4 (p. 68), 2.2-10 (p. 69), 2.2-26c (p. 70)

  16. Suggested problems from the textbook on errors: Chapter 2.1, Problem 11c (p. 56); Chapter 2.2, Examples 1 (p. 62) and 3 (p. 66), Problems 6 (p. 69) and 29 (p. 70)
