**Arithmetic Systems** Kwang Hee Ko School of Mechatronics Gwangju Institute of Science and Technology

**Introduction** • Representation of numbers and associated operations in digital computers. • Various arithmetic systems are used. • Integer Arithmetic • Floating Point Arithmetic • Rational Arithmetic • Interval Arithmetic

**Floating Point Arithmetic** • Current CAD/Graphics systems operate in floating point arithmetic (FPA). • Basic arithmetic operations in FPA (especially division) can lead to significant numerical errors. • CAD systems frequently fail as a result of the limited precision inherent in the internal representation of floating point numbers. • Any sequence of operations on a digital computer is essentially equivalent to a finite sequence of manipulations on a discrete grid of points.

**Floating Point Arithmetic** • Representation of nonnegative real numbers • The standard way to represent a nonnegative real number in decimal form • An integer part, a fractional part, and a decimal point between them. • 37.458, 0.003947 • Normalized scientific notation • Shifting the decimal point and supplying appropriate powers of 10. • $37.458 = 0.37458 \times 10^{2}$

**Floating Point Arithmetic** • Normalized Floating-point Representation • In Decimal System • $x = \pm 0.d_1 d_2 d_3 \ldots \times 10^{n}$ • $d_1, d_2, \ldots$ are the decimal digits • In Binary System • $x = \pm 0.b_1 b_2 b_3 \ldots \times 2^{n}$ • $b_1, b_2, \ldots$ are the binary digits, 0 or 1. • In a digital computer, numbers are represented using the binary system.

**Floating Point Arithmetic** • Consider a floating point number $x = \pm 0.b_1 b_2 b_3 \ldots b_p \times 2^{n}$ • $b_1, b_2, \ldots, b_p$ are the digits of the mantissa with $b_1 \neq 0$, $p$ is the number of significant digits, and $n$ is an integer exponent. • Example: $p = 2$ and $-2 \le n \le 3$.
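
The example system above is small enough to list in full. The following is a minimal sketch that enumerates its positive values with exact fractions; the helper name `toy_floats` is hypothetical and introduced only for illustration.

```python
from fractions import Fraction


def toy_floats(p=2, e_min=-2, e_max=3):
    """Enumerate the positive values 0.b1...bp x 2^n with b1 = 1 (hypothetical helper)."""
    values = set()
    # With b1 = 1, the mantissa read as an integer ranges over 2^(p-1) .. 2^p - 1.
    for m in range(2 ** (p - 1), 2 ** p):
        for n in range(e_min, e_max + 1):
            values.add(Fraction(m, 2 ** p) * Fraction(2) ** n)
    return sorted(values)


positive = toy_floats()
gaps = [b - a for a, b in zip(positive, positive[1:])]

print([str(v) for v in positive])  # all representable positive values
print([str(g) for g in gaps])      # spacing between neighbours
```

The spacing between neighbouring values doubles each time the exponent increases, so the grid is dense near zero and sparse for large magnitudes.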

**Floating Point Arithmetic** • The resulting set of FP numbers is a finite subset of the rational numbers. • They are distributed non-uniformly on the real number axis. • Most real numbers cannot be represented exactly in a computer. • The result of a floating point calculation must often be rounded in order to fit back into its finite representation. • This is a characteristic feature of floating point computation. • Overflow/underflow occurs when a number is outside the range that the computer can represent.
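
A short check of the representability claim, assuming Python's `float` is an IEEE double (true on common platforms). `Fraction(0.1)` recovers the exact rational value that is actually stored for the literal `0.1`.

```python
from fractions import Fraction

stored = Fraction(0.1)            # exact value of the double nearest to 0.1
exact = Fraction(1, 10)           # the real number 1/10

print(stored)                     # a ratio whose denominator is a power of two
print(stored == exact)            # the stored value is not 1/10
print(0.1 + 0.2 == 0.3)           # rounding shows up in ordinary arithmetic
```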

**Floating Point Arithmetic** • Basic questions on floating point arithmetic • What is the result of an operation when the infinitely precise result is not representable in the computer system? • Are elementary operations like multiplication and addition commutative? • What happens when we multiply two very large numbers? • What if we divide a number by zero? • What if we attempt to compute the square root of a negative number?

**Floating Point Arithmetic** • IEEE Standard 754 specifies the result of each basic operation, so that operations yield the mathematically expected result (correctly rounded) with predictable properties. • It also ensures that exceptional cases yield specified results.
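
Some of the basic questions can be probed directly. This is a sketch in Python, assuming its `float` is an IEEE double. Note that Python itself raises `ZeroDivisionError` for float division by zero instead of returning the IEEE default result, so that case is caught explicitly here.

```python
import math

a, b, c = 0.1, 0.2, 0.3

# Addition and multiplication are commutative, but addition is not associative.
commutative = (a + b == b + a) and (a * b == b * a)
associative = ((a + b) + c == a + (b + c))

# A product that exceeds the double range overflows to infinity.
overflowed = 1e308 * 10.0

# inf - inf is an invalid operation and yields NaN (not a number).
invalid = overflowed - overflowed

# Python signals division by zero with an exception.
try:
    1.0 / 0.0
    division_raised = False
except ZeroDivisionError:
    division_raised = True

print(commutative, associative)
print(overflowed, invalid, division_raised)
```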

**IEEE Standard 754** • It specifies: • Two basic floating point formats: single and double • Single • Overall: 32 bits, Significand: 24 bits • Double • Overall: 64 bits, Significand: 53 bits • Two classes of extended floating point formats: single extended and double extended.
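
The format sizes can be inspected from Python, assuming the interpreter's `float` is the IEEE double format. A 53-bit significand means every integer up to $2^{53}$ is representable, but $2^{53} + 1$ is not.

```python
import struct
import sys

single_bits = struct.calcsize("<f") * 8   # storage size of the single format
double_bits = struct.calcsize("<d") * 8   # storage size of the double format
significand_bits = sys.float_info.mant_dig

# 2^52 + 1 still fits in 53 bits; 2^53 + 1 must be rounded.
fits = (2.0 ** 52 + 1.0) != 2.0 ** 52
rounded = (2.0 ** 53 + 1.0) == 2.0 ** 53

print(single_bits, double_bits, significand_bits)
print(fits, rounded)
```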

**IEEE Standard 754** • It specifies: • Accuracy requirements on floating point operations • Add, subtract, multiply, divide, square root, remainder, round to an integer value, compare, etc. • If the exact result cannot be represented, the operation must return the exact result modified as little as possible, according to the prescribed rounding mode. • Five types of IEEE floating point exceptions • Invalid operation, division by zero, overflow, underflow, inexact.

**IEEE Standard 754** • Note on “INEXACT” exception • This is signaled whenever the ideal result of an arithmetic operation would not fit into its intended destination, so the result had to be altered by rounding it off to fit.

**IEEE Standard 754** • It specifies: • Four rounding directions: • Toward the nearest representable value, toward negative infinity, toward positive infinity, toward 0. • Rounding precision • If a system delivers results in double extended format, the user should be able to specify that such results are to be rounded to the precision of either the single or double format.
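
The four rounding directions can be illustrated by analogy with Python's `decimal` module. This is only an analogy: it rounds decimal digits, whereas IEEE 754 binary formats round binary digits. `ROUND_HALF_EVEN` stands in for round-to-nearest, and the helper name `round_all` is hypothetical.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR, ROUND_HALF_EVEN

# Map each rounding direction to the corresponding decimal rounding mode.
MODES = {
    "nearest": ROUND_HALF_EVEN,
    "toward -inf": ROUND_FLOOR,
    "toward +inf": ROUND_CEILING,
    "toward 0": ROUND_DOWN,
}


def round_all(text, places="0.01"):
    """Round one decimal string in every direction (hypothetical helper)."""
    x = Decimal(text)
    return {name: str(x.quantize(Decimal(places), rounding=mode))
            for name, mode in MODES.items()}


print(round_all("2.676"))
print(round_all("-2.676"))
```

For a negative input, rounding toward negative infinity and rounding toward 0 give different results, which is why the directions are specified separately.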

**IEEE Standard 754 Formats** • Storage Formats • A format is a data structure specifying the fields that comprise a floating point numeral, the layout of those fields, and their arithmetic interpretation. • A storage format specifies how a floating point format is stored in memory. • IEEE precision • Single: float (C, C++), REAL or REAL*4 (Fortran) • Double: double (C, C++), DOUBLE PRECISION or REAL*8 (Fortran) • Double extended: long double (C, C++), REAL*16 (Fortran)

**IEEE Standard 754 Formats** • Single Format: one contiguous 32-bit word • 23-bit fraction, f • 8-bit biased exponent, e • 1-bit sign, s
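
The three fields can be extracted with the standard `struct` module. This is a minimal sketch; the helper names are hypothetical, and the exponent bias of 127 and the implicit leading 1 of normal numbers are properties of the IEEE single format that the slide does not state.

```python
import struct


def decode_single(x):
    """Return (s, e, f) for x stored in the single format (hypothetical helper)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31            # 1-bit sign
    e = (bits >> 23) & 0xFF   # 8-bit biased exponent
    f = bits & 0x7FFFFF       # 23-bit fraction
    return s, e, f


def value_of_normal(s, e, f):
    """Rebuild the value; valid only for normal numbers, 0 < e < 255."""
    return (-1) ** s * (1 + f / 2 ** 23) * 2.0 ** (e - 127)


s, e, f = decode_single(-6.5)
print(s, e, hex(f))
print(value_of_normal(s, e, f))
```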

**IEEE Standard 754 Format** • Double Format: two successive 32-bit words • 52-bit fraction, f • 11-bit biased exponent, e • 1-bit sign, s

**IEEE Standard 754 Format** • Double-Extended Format (for SPARC): four successive 32-bit words • 112-bit fraction, f • 15-bit biased exponent, e • 1-bit sign, s

**IEEE Standard 754 Format** • Double-Extended Format (for x86): ten successive 8-bit bytes (80 bits) • 63-bit fraction, f • 1-bit explicit leading significand bit, j • 15-bit biased exponent, e • 1-bit sign, s

**IEEE Standard 754 Format** • How many significant decimal digits does a number a have in each of the IEEE formats? In other words, how many decimal digits can be trusted as accurate when a is represented in an IEEE format?
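
One common way to answer this question, offered here as a sketch and not necessarily the derivation intended on the slide: a binary significand of $p$ bits always preserves at least $\lfloor (p-1)\log_{10} 2 \rfloor$ decimal digits. The significand widths below follow from the field sizes listed above (fraction bits plus the leading bit); the function name is hypothetical.

```python
import math
import sys


def guaranteed_decimal_digits(p):
    """Decimal digits always preserved by a p-bit binary significand."""
    return math.floor((p - 1) * math.log10(2))


# Significand widths in bits, including the leading bit.
widths = {
    "single": 24,
    "double": 53,
    "double extended (x86)": 64,
    "double extended (SPARC)": 113,
}

digits = {name: guaranteed_decimal_digits(p) for name, p in widths.items()}
print(digits)
print(sys.float_info.dig)  # the value Python reports for its own float type
```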

**IEEE Standard 754 Format** • Underflow • It occurs, roughly speaking, when the result of an arithmetic operation is so small that it cannot be stored in its intended destination format without suffering a rounding error that is larger than usual.
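
A small demonstration of the "larger than usual" rounding error, assuming Python's `float` is an IEEE double with subnormal numbers enabled. Near the bottom of the range the spacing between representable values becomes a fixed absolute step, so relative accuracy is lost.

```python
import sys

smallest_normal = sys.float_info.min   # smallest positive normal double
tiny = 5e-324                          # smallest positive subnormal double

# Halving the smallest subnormal cannot be represented and rounds to zero.
vanished = tiny / 2

# In the subnormal range, halving and doubling no longer round-trips.
a = 3 * tiny
round_trip = (a / 2) * 2

# For an ordinary normal number the same round trip is exact.
b = 3.0
normal_round_trip = (b / 2) * 2

print(vanished, round_trip == a, normal_round_trip == b)
```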

**Rational Arithmetic** • Numbers are represented in rational form. • The arithmetic is done with rational numbers without approximation. • It is often important to know the exact value, not an approximate one. • If arithmetic is done on fractions instead of on approximations to fractions, many computations can be done entirely without any accumulated rounding errors. • Exact value of 1/3 instead of 0.3333333…

**Rational Arithmetic** • Features of Rational Arithmetic • It is robust. • It is generally memory intensive and time consuming, • because the number of digits needed to represent the rational numbers grows during a computation.
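
Both features can be seen with Python's standard `fractions` module, used here as a stand-in for a rational arithmetic package. The sum of ten tenths is exact in rational arithmetic but not in floating point, while a running sum of simple fractions quickly needs many digits.

```python
from fractions import Fraction

# Robustness: no rounding error accumulates.
exact_sum = sum([Fraction(1, 10)] * 10)
float_sum = sum([0.1] * 10)

# Cost: the denominator of a running sum 1/1 + 1/2 + ... + 1/30 grows quickly.
total = Fraction(0)
for k in range(1, 31):
    total += Fraction(1, k)

print(exact_sum == 1, float_sum == 1.0)
print(len(str(total.denominator)), "digits in the denominator")
```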

**Rational Arithmetic** • Rational numbers can be represented as pairs of integers (u/u’), where u and u’ are relatively prime to each other and u’ > 0. • (0/1) = 0 • Basic Operators • (u/u’) = (v/v’) if and only if u = v and u’ = v’ • (u/u’) × (v/v’) = (w/w’), where w = uv/d, w’ = u’v’/d, d = gcd(uv, u’v’), and gcd denotes the greatest common divisor. • Division can be performed similarly.

**Rational Arithmetic** • Basic Operators • (u/u’) ± (v/v’) = ((uv’ ± u’v)/u’v’). Then reduce this fraction by using d = gcd(uv’ ± u’v,u’v’) • Example • (7/66) + (17/12) = (67/44) • Can rational arithmetic represent all numbers?
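
The operator rules above translate almost directly into code. This is a minimal sketch that stores a rational number as a tuple `(u, u_prime)`; the function names are hypothetical, and a production implementation would also validate that the denominator is positive.

```python
from math import gcd


def reduce_fraction(num, den):
    """Divide out the greatest common divisor; den is assumed positive."""
    d = gcd(num, den)
    return (num // d, den // d)


def add(a, b):
    """(u/u') + (v/v') = (uv' + u'v) / (u'v'), then reduce."""
    (u, up), (v, vp) = a, b
    return reduce_fraction(u * vp + up * v, up * vp)


def sub(a, b):
    """(u/u') - (v/v') = (uv' - u'v) / (u'v'), then reduce."""
    (u, up), (v, vp) = a, b
    return reduce_fraction(u * vp - up * v, up * vp)


def mul(a, b):
    """(u/u') x (v/v') = (uv/d) / (u'v'/d) with d = gcd(uv, u'v')."""
    (u, up), (v, vp) = a, b
    return reduce_fraction(u * v, up * vp)


print(add((7, 66), (17, 12)))  # the example from the slide
print(mul((2, 3), (3, 4)))
```

For the slide's example the unreduced sum is 1206/792, and dividing by gcd(1206, 792) = 18 gives 67/44.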

**Interval Arithmetic** • An interval number is defined to be an ordered pair of real numbers [a,b] with a ≤ b. • A set of numbers, [a,b] = {x | a ≤ x ≤ b} • Set operators are used. • [a,a]: degenerate interval • Equivalent to a real number • Interval arithmetic is also called range arithmetic. • Upper and lower bounds on each exact number are maintained during the calculations.
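
As a preview, the definition can be sketched as a small class. The endpoint rules for addition and multiplication used here are the standard ones and are an assumption with respect to this lecture, which defers the details. The sketch also ignores the outward rounding of endpoints that a rigorous floating point implementation needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with lo <= hi (illustrative sketch)."""
    lo: float
    hi: float

    def __post_init__(self):
        if self.lo > self.hi:
            raise ValueError("an interval requires lo <= hi")

    def __add__(self, other):
        # The sum of the lower bounds and the sum of the upper bounds.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The extremes of a product lie among the four endpoint products.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def __contains__(self, x):
        return self.lo <= x <= self.hi


print(Interval(1, 2) + Interval(3, 5))
print(Interval(-1, 2) * Interval(3, 4))
print(2.5 in Interval(2, 3))
```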

**Interval Arithmetic** • The details of interval arithmetic will be introduced in the next lecture.