Chapter 2: Floating point number systems and Round-off error

Floating point number system = the set of real numbers that can be represented exactly with a finite word length.
fl(x) = the machine number that represents the real number x, also called the “floating point” representation of x.
|fl(x) - x| = round-off error in fl(x).
The distance between 1 and the next larger machine number is a common measure of round-off error.
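As a rough illustration of the last point, the sketch below finds the distance between 1 and the next larger machine number by repeated halving. It assumes Python floats are IEEE 754 double precision with round-to-nearest, which holds on common platforms but is an assumption here. The name `gap_above_one` is made up for this example.

```python
import sys

def gap_above_one():
    """Return the distance between 1.0 and the next larger float."""
    gap = 1.0
    # Halve the candidate gap until adding half of it no longer changes 1.0.
    while 1.0 + gap / 2 > 1.0:
        gap /= 2
    return gap

eps = gap_above_one()
print(eps)                            # distance from 1.0 to the next float
print(eps == sys.float_info.epsilon)  # compare with the value Python reports
```

The comparison with `sys.float_info.epsilon` is only a sanity check against the value the interpreter itself reports.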

Definition of a floating point number system
b = base
p = precision (number of digits: $d_0, d_1, \ldots, d_{p-1}$)
L = lower limit of exponent
U = upper limit of exponent
All floating point numbers in the system can be written as
$fl = \pm\left(d_0 + \frac{d_1}{b} + \frac{d_2}{b^2} + \cdots + \frac{d_{p-1}}{b^{p-1}}\right) b^E$
A period is commonly placed between $d_0$ and $d_1$. The base is commonly indicated by a subscript after the last digit $d_{p-1}$.
Example: $(1.11)_2 \times 2^1 = (3.5)_{10}$
In a normalized system, $d_0 \neq 0$, $0 \le d_i \le b-1$ for $i = 1, \ldots, p-1$, and E is any integer such that $L \le E \le U$.

A floating-point number system is finite and discrete:
number of normalized floating-point numbers (zero not counted) = $2(b-1)\,b^{p-1}\,(U-L+1)$ (explain)
smallest possible positive normalized number = UFL = $b^L$ (explain)
largest possible = OFL = $b^{U+1}(1-b^{-p})$ (prove)
special values: zero, Inf, and NaN
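One way to check these formulas is to enumerate a small system by brute force. The sketch below does this with exact rational arithmetic for a toy system with b = 2, p = 3, L = -1, U = 1. Those parameter values are chosen for illustration, and the function name `positive_normalized` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def positive_normalized(b, p, L, U):
    """List every positive normalized number of the system, exactly."""
    values = []
    for E in range(L, U + 1):
        for d0 in range(1, b):                        # normalized: d0 != 0
            for tail in product(range(b), repeat=p - 1):
                digits = (d0,) + tail
                mantissa = sum(Fraction(d, b**i) for i, d in enumerate(digits))
                values.append(mantissa * Fraction(b) ** E)
    return sorted(values)

b, p, L, U = 2, 3, -1, 1                              # toy system (assumed values)
pos = positive_normalized(b, p, L, U)
count = 2 * len(pos)                                  # both signs, zero excluded
formula = 2 * (b - 1) * b**(p - 1) * (U - L + 1)
ufl = Fraction(b) ** L
ofl = Fraction(b) ** (U + 1) * (1 - Fraction(1, b**p))
print(count, formula)                                 # enumerated vs. formula
print(pos[0], ufl)                                    # smallest positive vs. UFL
print(pos[-1], ofl)                                   # largest vs. OFL
```

Using `Fraction` keeps every value exact, so the comparison with the formulas is not itself clouded by round-off.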

Note the gap between 0 and $b^L$. It is filled by allowing $d_0$ to be zero when the exponent has its smallest value. These numbers are called “subnormals”. For b = 2, p = 3, L = -1, the smallest positive number becomes $(0.01)_2 \times 2^{-1} = (0.125)_{10}$. How many subnormals?
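The question can be explored by listing the subnormals directly. This is a minimal sketch for a toy system with b = 2, p = 3, L = -1 (assumed values), and `positive_subnormals` is a name invented here.

```python
from fractions import Fraction

def positive_subnormals(b, p, L):
    """Positive numbers with d0 = 0 and exponent E = L, as exact fractions."""
    # The trailing digits d1..d_{p-1} form an integer m with 0 < m < b**(p-1).
    return [Fraction(m, b**(p - 1)) * Fraction(b) ** L
            for m in range(1, b**(p - 1))]

subs = positive_subnormals(2, 3, -1)
print([float(s) for s in subs])   # values lying between 0 and b**L
print(2 * len(subs))              # count when both signs are included
```

The subnormals are evenly spaced, with the same spacing as the normalized numbers that share the smallest exponent.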

$\epsilon$ (called “unit round-off”) ≡ upper bound on the relative round-off error in the floating point representation of x:
$\left|\frac{fl(x)-x}{x}\right| \le \epsilon$
It is affected by the method of round-off.
Chop = drop digits when the word length is exhausted. fl(x) is the nearest machine number on the side of x toward zero. Also called “round toward zero”. $\epsilon = b^{1-p}$
Round to nearest means fl(x) is the nearest machine number to x on either side, with ties broken by choosing the floating point number whose last stored digit is even. $\epsilon = \frac{1}{2} b^{1-p}$
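The two bounds can be tried out on a toy decimal system using Python's standard `decimal` module. The sketch assumes b = 10 and p = 4, with `ROUND_DOWN` standing in for chopping and `ROUND_HALF_EVEN` for round to nearest. The sample inputs are arbitrary, and `relative_error` is a hypothetical helper.

```python
from decimal import Decimal, Context, ROUND_DOWN, ROUND_HALF_EVEN
from fractions import Fraction

P = 4                                               # precision of the toy system
CHOP = Context(prec=P, rounding=ROUND_DOWN)         # round toward zero
NEAREST = Context(prec=P, rounding=ROUND_HALF_EVEN) # round to nearest, ties to even

def relative_error(text, ctx):
    """Exact relative error of rounding the decimal string `text` under ctx."""
    x = Decimal(text)
    fl_x = ctx.create_decimal(x)                    # applies the context's rounding
    return abs(Fraction(fl_x) - Fraction(x)) / abs(Fraction(x))

eps_chop = Fraction(10) ** (1 - P)                  # b**(1-p)
eps_nearest = eps_chop / 2                          # 0.5 * b**(1-p)
samples = ["1.0009999", "1.0005", "9.9994", "-3.14159265"]
for text in samples:
    print(text, float(relative_error(text, CHOP)),
          float(relative_error(text, NEAREST)))
```

The first sample sits just below a machine number, so chopping comes close to its bound while rounding to nearest gives a much smaller error.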

Analysis of a “toy” floating point number system will be part of your next quiz.

Consider all numbers of the form $(0.d_1 d_2 d_3)_2 \times 2^k$ with $k = 0, \pm 1$. Which are excluded from a normalized system when b = 2, p = 4, L = -1, U = 1, with and without subnormals?

All input to a floating point number system contains errors.
Consequences of input error
$x$ = true input value
$f$ = exact result for a given input
$\hat{x}$ = inexact input (round-off error, incomplete knowledge, previous calculation, etc.)
$\hat{f}$ = numerical approximation for a given input
Total error $= \hat{f}(\hat{x}) - f(x) = \left[\hat{f}(\hat{x}) - f(\hat{x})\right] + \left[f(\hat{x}) - f(x)\right]$ = computational error + propagated data error
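A small numerical instance of this decomposition may help. In the sketch below the exact function is taken to be sin, the approximate algorithm keeps only the first Taylor term, and the inexact input 0.39 stands in for π/8. All three choices are illustrative assumptions, not taken from the slides.

```python
import math

def f(x):
    """Exact function (as exact as math.sin allows)."""
    return math.sin(x)

def f_hat(x):
    """Approximate algorithm: keep only the first Taylor term of sin."""
    return x

x = math.pi / 8          # "true" input
x_hat = 0.39             # inexact input (hypothetical rounded value)

total = f_hat(x_hat) - f(x)
computational = f_hat(x_hat) - f(x_hat)   # error of the algorithm on the same input
propagated = f(x_hat) - f(x)              # error caused by the input alone
print(total, computational, propagated)
```

Here the two contributions have opposite signs, so the total error is smaller in magnitude than the computational error alone.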

Condition number is a metric of the consequence of input error. Condition number ~ 1 means “well conditioned”: the consequence of input error is of the same order of magnitude as the input error.
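The slide does not give a formula. The sketch below assumes the common definition of the relative condition number, namely the ratio of the relative change in the output to the relative change in the input, and estimates it with a small perturbation. The function `relative_condition` and the two test problems are choices made for this example.

```python
import math

def relative_condition(f, x, rel_dx=1e-6):
    """Estimate |relative change in f(x)| / |relative change in x|."""
    x_hat = x * (1 + rel_dx)                        # slightly wrong input
    rel_change_out = abs((f(x_hat) - f(x)) / f(x))
    rel_change_in = abs((x_hat - x) / x)
    return rel_change_out / rel_change_in

cond_sqrt = relative_condition(math.sqrt, 2.0)               # well conditioned
cond_shift = relative_condition(lambda t: t - 1.0, 1.0001)   # ill conditioned
print(cond_sqrt, cond_shift)
```

The square root roughly halves a relative input error, while computing t - 1 near t = 1 magnifies it by about four orders of magnitude.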

Example: intersection of 2 lines
[Figure: two panels, “Lines are nearly perpendicular” and “Lines are nearly parallel”. Dotted lines show the magnitude of input error.]
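The contrast between the two cases can be reproduced numerically. The sketch below shifts the intercept of one line by a small amount and measures how far the intersection point moves. The slopes, intercepts, and perturbation size are made-up values, and both function names are hypothetical.

```python
def intersection_x(m1, c1, m2, c2):
    """x-coordinate where y = m1*x + c1 meets y = m2*x + c2."""
    return (c2 - c1) / (m1 - m2)

def shift_from_input_error(m1, m2, delta=0.01):
    """How far the intersection moves when the first intercept changes by delta."""
    x_true = intersection_x(m1, 0.0, m2, 1.0)
    x_perturbed = intersection_x(m1, delta, m2, 1.0)
    return abs(x_perturbed - x_true)

shift_perpendicular = shift_from_input_error(1.0, -1.0)   # slopes 1 and -1
shift_parallel = shift_from_input_error(1.0, 1.01)        # slopes 1 and 1.01
print(shift_perpendicular, shift_parallel)
```

The same input error moves the intersection of the nearly parallel lines far more than that of the perpendicular lines, which is the ill-conditioned case.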

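Errors in floating-point arithmetic due to insufficient precision
Let fl(x op y) be the floating point representation of the result of the instruction x op y.
Recall $\epsilon$ (unit round-off) ≡ upper bound on relative round-off error. The relative floating point error is
$\left|\frac{fl(x\ \mathrm{op}\ y) - (x\ \mathrm{op}\ y)}{x\ \mathrm{op}\ y}\right| = |\delta| \le \epsilon$
Rewrite as $fl(x\ \mathrm{op}\ y) = (x\ \mathrm{op}\ y)(1 + \delta)$, where $|\delta| \le \epsilon$.
$fl(y + z) = (y + z)(1 + \delta_1)$, where $|\delta_1| \le \epsilon$
$fl(x(y + z)) = x(y + z)(1 + \delta_1)(1 + \delta_2)$, where $|\delta_2| \le \epsilon$
$fl(x(y + z)) = x(y + z)(1 + \delta_1 + \delta_2) + O(\delta_1 \delta_2)$
$fl(x(y + z)) \approx x(y + z)(1 + \delta)$, where $|\delta| = |\delta_1 + \delta_2| \le 2\epsilon$
Does this suggest that sheer volume of computation affects accuracy?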
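The two-operation bound can be checked empirically. The sketch below computes x(y + z) in ordinary floats, recomputes it exactly with rational arithmetic, and compares the relative error with twice the unit round-off. It assumes IEEE 754 double precision with round to nearest, so the unit round-off is taken as 2^-53. The sample range and the helper name are arbitrary.

```python
from fractions import Fraction
import random

U_ROUND = Fraction(1, 2**53)   # assumed unit round-off: double, round to nearest

def relative_error_of_product_sum(x, y, z):
    """Exact relative error of the float computation x * (y + z)."""
    computed = x * (y + z)                               # two rounded operations
    exact = Fraction(x) * (Fraction(y) + Fraction(z))    # no rounding at all
    return abs(Fraction(computed) - exact) / abs(exact)

random.seed(0)
worst = max(
    relative_error_of_product_sum(random.uniform(1, 2),
                                  random.uniform(1, 2),
                                  random.uniform(1, 2))
    for _ in range(1000)
)
print(float(worst), float(2 * U_ROUND))   # observed worst case vs. the bound
```

The inputs are kept positive so that the exact result is never zero. The bound tested includes the second-order term that the derivation drops.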

Loss of significance: when round-off error matters
Subtraction of two numbers of the same sign and nearly the same magnitude results in the loss of the most significant (i.e. leading) digits of the operands.
Example: if $0 < e < \epsilon$, where $\epsilon$ is the unit round-off, then $fl(1 + e) - fl(1 - e) = 1 - 1 = 0 \neq 2e$
Note that the subtraction operation is exact for its operands. The rounding error prior to subtraction left the operands with inadequate precision. Subtraction has simply enhanced the implications of rounding error.
Example: calculating standard deviation (normalizing factor omitted)
$s^2 \propto \sum_i (x_i - \langle x \rangle)^2$ : safe method
$s^2 \propto \sum_i x_i^2 - n\langle x \rangle^2$ : sensitive to round-off error
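Both examples can be reproduced in double precision. The sketch below first shows the cancellation for a tiny e, then compares the two variance formulas on data with a large mean and a small spread. The data values, the choice e = 1e-17, and the n - 1 normalization are assumptions made for this illustration.

```python
def variance_two_pass(data):
    """Safe method: subtract the mean first, then square and sum."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def variance_one_pass(data):
    """Sensitive method: sum of squares minus n times the squared mean."""
    n = len(data)
    mean = sum(data) / n
    sum_sq = 0.0
    for x in data:
        sum_sq += x * x                 # each square is about 1e18 and gets rounded
    return (sum_sq - n * mean * mean) / (n - 1)

e = 1e-17                               # smaller than the unit round-off of a double
cancelled = (1.0 + e) - (1.0 - e)       # both operands round to 1.0
print(cancelled, 2 * e)

data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]   # exact sample variance is 30
safe = variance_two_pass(data)
sensitive = variance_one_pass(data)
print(safe, sensitive)
```

In the one-pass formula the two large terms agree in nearly all of their leading digits, so their difference consists mostly of rounding error.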

Example 2: solution of $ax^2 + bx + c = 0$ by the quadratic formula, for the root that is affected by loss of significance

Example: loss of significance when $ax^2 + bx + c = 0$ is solved by the quadratic formula
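The slides' worked numbers did not survive in this transcript, so the sketch below uses a made-up equation, x^2 - 10^8 x + 1 = 0, whose roots are close to 10^8 and 10^-8. It compares the textbook formula with a common rearrangement that avoids the cancelling subtraction. Both function names are hypothetical. Real roots, nonzero a, and b and c not both zero are assumed.

```python
import math

def roots_naive(a, b, c):
    """Both roots straight from the quadratic formula."""
    sq = math.sqrt(b * b - 4 * a * c)
    return (-b + sq) / (2 * a), (-b - sq) / (2 * a)

def roots_stable(a, b, c):
    """Avoid subtracting nearly equal numbers; get the second root from c / q."""
    sq = math.sqrt(b * b - 4 * a * c)
    q = -(b + math.copysign(sq, b)) / 2     # b and copysign(sq, b) share a sign
    return q / a, c / q

a, b, c = 1.0, -1e8, 1.0                    # roots near 1e8 and 1e-8
small_naive = min(roots_naive(a, b, c), key=abs)
small_stable = min(roots_stable(a, b, c), key=abs)
print(small_naive, small_stable)
```

The large root is computed accurately by either formula. Only the small root, which the naive formula obtains as a difference of two nearly equal numbers, loses its leading digits.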

Assignment 8, due 3-27-14
Problems from the text, 6th edition: 2.1-18 (p. 56), 2.1-38 (p. 58), 2.2-4 (p. 68), 2.2-10 (p. 69), 2.2-26c (p. 70)

Suggested problems from the textbook on errors
Chapter 2.1: Problem 11c (p. 56)
Chapter 2.2: Examples 1 (p. 62) and 3 (p. 66); Problems 6 (p. 69) and 29 (p. 70)
