
# Number Representation Part 2 Floating Point Representations Rounding


##### Presentation Transcript

1. ECE 645: Lecture 5 • Number Representation, Part 2 • Floating Point Representations • Rounding • Representation of the Galois Field elements

2. Required Reading • Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design • Chapter 17, Floating-Point Representations • Chapter 17.5, Rounding schemes • Rounding Algorithms 101, http://www.eetimes.com/document.asp?doc_id=1274515

3. Floating Point Representations

4. The ANSI/IEEE standard floating-point number representation formats • Originally IEEE 754-1985 • Superseded by the IEEE 754-2008 standard

5. Table 17.1 Some features of the ANSI/IEEE standard floating-point number representation formats

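6. Exponent Encoding • Exponent encoding in 8 bits for the single/short (32-bit) ANSI/IEEE format:

    | Decimal code | Hex code | Exponent value | Meaning |
    |---|---|---|---|
    | 0 | 00 | | f = 0: representation of ±0; f ≠ 0: representation of denormals, $0.f \times 2^{-126}$ |
    | 1 | 01 | –126 | $1.f \times 2^{e}$ |
    | 126 | 7E | –1 | $1.f \times 2^{e}$ |
    | 127 | 7F | 0 | $1.f \times 2^{e}$ |
    | 128 | 80 | +1 | $1.f \times 2^{e}$ |
    | 254 | FE | +127 | $1.f \times 2^{e}$ |
    | 255 | FF | | f = 0: representation of ±∞; f ≠ 0: representation of NaNs |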
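
The exponent encoding above can be checked by pulling a single-precision word apart. This is a minimal sketch using Python's standard `struct` module; the function name `decode_float32` and the returned tuple layout are illustrative choices, not part of the lecture.

```python
import struct

def decode_float32(x):
    """Round x to IEEE single precision and split the word into its fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    code = (bits >> 23) & 0xFF       # 8-bit exponent code (0..255)
    frac = bits & 0x7FFFFF           # 23-bit fraction field f
    if code == 0:
        kind = "zero" if frac == 0 else "denormal"
        exponent = None if frac == 0 else -126   # denormals use 0.f x 2^-126
    elif code == 255:
        kind = "infinity" if frac == 0 else "NaN"
        exponent = None
    else:
        kind = "normal"
        exponent = code - 127        # codes 1..254 map to -126..+127
    return sign, code, exponent, frac, kind

print(decode_float32(1.0))           # exponent code 127 (hex 7F), exponent 0
print(decode_float32(2.0 ** -127))   # below 2^-126, so stored as a denormal
```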

7. Fig. 17.4 Denormals in the IEEE single-precision format.

8. New IEEE 754-2008 Standard Basic Formats

9. New IEEE 754-2008 Standard Binary Interchange Formats

10. Requirements for Arithmetic • Results of the 4 basic arithmetic operations (+, −, ×, ÷) as well as square-rooting must match those obtained if all intermediate computations were infinitely precise • That is, a floating-point arithmetic operation should introduce no more imprecision than the error attributable to the final rounding of a result that has no exact representation (this is the best possible) • Example: $(1 + 2^{-1}) \times (1 + 2^{-23})$ • Exact result: $1 + 2^{-1} + 2^{-23} + 2^{-24}$ • Rounded result: $1 + 2^{-1} + 2^{-22}$ • Error = ½ ulp
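
The example on this slide can be reproduced numerically. The sketch below computes the exact product with `fractions.Fraction` and rounds it to single precision by a round trip through `struct`; it assumes the platform's double-to-float conversion uses the default round-to-nearest-even mode, which is the usual case. The helper name `to_float32` is illustrative.

```python
import struct
from fractions import Fraction

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Exact product (1 + 2^-1) * (1 + 2^-23); it needs 25 significant bits,
# so it fits in a double but not in a single-precision significand.
exact = Fraction(3, 2) * (1 + Fraction(1, 2 ** 23))
rounded = Fraction(to_float32(float(exact)))

ulp = Fraction(1, 2 ** 23)           # single-precision ulp for values in [1, 2)
error = rounded - exact
print(rounded == 1 + Fraction(1, 2) + Fraction(1, 2 ** 22), error / ulp)
```

The exact result lies midway between two representable values, so the tie rule decides the outcome; the neighbour with an even last bit is the one chosen here.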

11. Rounding 101

12. Rounding Modes • The IEEE 754-2008 standard includes five rounding modes: • Default: Round to nearest, ties to even (rtne) • Optional: • Round to nearest, ties away from 0 (rtna) • Round toward zero (inward) • Round toward +∞ (upward) • Round toward −∞ (downward)
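
The behaviour of the five modes at a tie and for negative inputs can be illustrated with Python's `decimal` module. This is only an analogy: `decimal` works in base 10, whereas the slide refers to binary formats, but the tie-breaking and direction rules are the same. The `MODES` mapping and the function name are illustrative.

```python
from decimal import (Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP,
                     ROUND_DOWN, ROUND_CEILING, ROUND_FLOOR)

# Mapping from the slide's mode names to decimal-module rounding constants.
MODES = {
    "rtne": ROUND_HALF_EVEN,        # nearest, ties to even
    "rtna": ROUND_HALF_UP,          # nearest, ties away from zero
    "toward zero": ROUND_DOWN,      # inward
    "toward +inf": ROUND_CEILING,   # upward
    "toward -inf": ROUND_FLOOR,     # downward
}

def round_to_integer(text, mode):
    """Round the decimal string `text` to an integer under the named mode."""
    return int(Decimal(text).quantize(Decimal("1"), rounding=MODES[mode]))

for mode in MODES:
    print(mode, round_to_integer("2.5", mode), round_to_integer("-2.5", mode))
```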

13. Rounding • Rounding occurs when we want to approximate a more precise number (i.e. more fractional bits L) with a less precise number (i.e. fewer fractional bits L') • Example 1: old: 000110.11010001 (K=6, L=8), new: 000110.11 (K'=6, L'=2) • Example 2: old: 000110.11010001 (K=6, L=8), new: 000111. (K'=6, L'=0) • The following pages show rounding from L>0 fractional bits to L'=0 bits, but the mathematics holds true for any L' < L • Usually, keep the number of integral bits the same: K'=K
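
Both examples can be reproduced with a few lines of code. The slide does not say which rounding rule produced them; the sketch below assumes round to nearest with ties rounded up on unsigned values, which yields both results. The function name `round_fixed` is illustrative.

```python
import math
from fractions import Fraction

def round_fixed(bits, new_frac_bits):
    """Round an unsigned fixed-point bit string 'iii.fff' to new_frac_bits."""
    whole, frac = bits.split(".")
    value = Fraction(int(whole + frac, 2), 2 ** len(frac))
    # Scale so the new LSB has weight 1, then round to nearest (ties up).
    rounded = math.floor(value * 2 ** new_frac_bits + Fraction(1, 2))
    digits = format(rounded, "0{}b".format(len(whole) + new_frac_bits))
    split = len(digits) - new_frac_bits
    return digits[:split] + "." + digits[split:]

print(round_fixed("000110.11010001", 2))
print(round_fixed("000110.11010001", 0))
```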

14. Rounding Equation • Input (whole part . fractional part): $x_{k-1}x_{k-2} \ldots x_1x_0 \,.\, x_{-1}x_{-2} \ldots x_{-l}$ • Output after rounding: $y_{k-1}y_{k-2} \ldots y_1y_0$ • y = round(x)

15. Rounding Techniques • There are different rounding techniques: • 1) truncation • results in round towards zero in signed magnitude • results in round towards -∞ in two's complement • 2) round to nearest number • 3) round to nearest even number (or odd number) • 4) round towards +∞ • Other rounding techniques • 5) jamming or von Neumann • 6) ROM rounding • Each of these techniques will differ in their error depending on representation of numbers i.e. signed magnitude versus two's complement • Error = round(x) – x

16. 1) Truncation • The simplest possible rounding scheme: chopping or truncation • $x_{k-1}x_{k-2} \ldots x_1x_0 \,.\, x_{-1}x_{-2} \ldots x_{-l} \xrightarrow{\text{trunc}} x_{k-1}x_{k-2} \ldots x_1x_0$, where the retained LSB $x_0$ has weight ulp • Truncation in signed-magnitude results in a number chop(x) whose magnitude is never larger than that of x. This is called round towards zero or inward rounding • 011.10 = $(3.5)_{10}$ → 011 = $(3)_{10}$ • Error = -0.5 • 111.10 = $(-3.5)_{10}$ → 111 = $(-3)_{10}$ • Error = +0.5 • Truncation in two's complement results in a number chop(x) that is never larger than x. This is called round towards -∞ or downward-directed rounding • 011.10 = $(3.5)_{10}$ → 011 = $(3)_{10}$ • Error = -0.5 • 100.10 = $(-3.5)_{10}$ → 100 = $(-4)_{10}$ • Error = -0.5
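
The four truncation examples can be verified by interpreting the same bit strings under each representation. This is a small sketch; the helper names are illustrative, and the bit strings are assumed to carry the sign (or sign bit) in the leftmost position.

```python
def value_signed_magnitude(bits):
    """Value of 's mmm.fff' where the first bit is the sign."""
    whole, _, frac = bits.partition(".")
    magnitude = int(whole[1:] + frac, 2) / 2 ** len(frac)
    return -magnitude if whole[0] == "1" else magnitude

def value_twos_complement(bits):
    """Value of a two's-complement fixed-point bit string."""
    whole, _, frac = bits.partition(".")
    raw = int(whole + frac, 2)
    if whole[0] == "1":
        raw -= 1 << (len(whole) + len(frac))   # subtract the weight of the sign bit
    return raw / 2 ** len(frac)

def chop(bits):
    """Truncation: keep the whole part, drop every fractional bit."""
    return bits.partition(".")[0]

for bits, value in [("111.10", value_signed_magnitude),
                    ("100.10", value_twos_complement)]:
    x, y = value(bits), value(chop(bits))
    print(bits, x, "->", chop(bits), y, "error", y - x)
```

The same chopping operation moves -3.5 toward zero in signed magnitude and toward -∞ in two's complement, which is why the error sign differs.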

17. Truncation Function Graph: chop(x) • Fig. 17.5 Truncation or chopping of a signed-magnitude number (same as round toward 0). • Fig. 17.6 Truncation or chopping of a 2’s-complement number (same as round to -∞).

18. Bias in two's complement truncation • Assuming all combinations of positive and negative values of x equally possible, average error is -0.375 (for L = 2 fractional bits truncated to L' = 0) • In general, average error = $-(2^{-L'} - 2^{-L})/2$, where L = original number of fractional bits and L' = new number of fractional bits
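
The average-error formula can be checked by enumerating every pattern of dropped bits. This is a minimal sketch under the slide's assumption that all patterns are equally likely; the function names are illustrative.

```python
from fractions import Fraction

def average_truncation_error(L, L_new):
    """Mean of chop(x) - x over all equally likely patterns of dropped bits."""
    count = 2 ** (L - L_new)
    # Dropped bits with integer value i are worth i / 2^L, and chopping loses them.
    errors = [-Fraction(i, 2 ** L) for i in range(count)]
    return sum(errors) / count

def closed_form(L, L_new):
    """The slide's formula: -(2^-L' - 2^-L) / 2."""
    return -(Fraction(1, 2 ** L_new) - Fraction(1, 2 ** L)) / 2

print(average_truncation_error(2, 0), closed_form(2, 0))
```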

19. Implementing truncation in hardware • Easy, just ignore (i.e. truncate) the fractional digits from L'+1 to L • For L' = 0: $x_{k-1}x_{k-2} \ldots x_1x_0 \,.\, x_{-1}x_{-2} \ldots x_{-L}$ → $y_{k-1}y_{k-2} \ldots y_1y_0\,.$ with the fractional digits ignored (i.e. truncated)

20. 2) Round to nearest number • Rounding to the nearest number is what we normally think of when we say "round" • 010.01 = $(2.25)_{10}$ → 010 = $(2)_{10}$ • Error = -0.25 • 010.11 = $(2.75)_{10}$ → 011 = $(3)_{10}$ • Error = +0.25 • 010.00 = $(2.00)_{10}$ → 010 = $(2)_{10}$ • Error = +0.00 • 010.10 = $(2.5)_{10}$ → 011 = $(3)_{10}$ • Error = +0.5 [round-half-up (arithmetic rounding)] • 010.10 = $(2.5)_{10}$ → 010 = $(2)_{10}$ • Error = -0.5 [round-half-down]

21. Round-half-up: dealing with negative numbers • Rounding to the nearest number is what we normally think of when we say "round" • 101.11 = $(-2.25)_{10}$ → 110 = $(-2)_{10}$ • Error = +0.25 • 101.01 = $(-2.75)_{10}$ → 101 = $(-3)_{10}$ • Error = -0.25 • 110.00 = $(-2.00)_{10}$ → 110 = $(-2)_{10}$ • Error = +0.00 • 101.10 = $(-2.5)_{10}$ → 110 = $(-2)_{10}$ • Error = +0.5 [asymmetric implementation] • 101.10 = $(-2.5)_{10}$ → 101 = $(-3)_{10}$ • Error = -0.5 [symmetric implementation]
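
The difference between the two implementations shows up only at negative midway values. The sketch below expresses both as functions on ordinary numbers; the names `rtn_asymmetric` and `rtn_symmetric` are illustrative.

```python
import math

def rtn_asymmetric(x):
    """Round half up: ties go toward +infinity for both signs."""
    return math.floor(x + 0.5)

def rtn_symmetric(x):
    """Round half away from zero: the negative side mirrors the positive side."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

for x in (-2.25, -2.75, -2.0, -2.5, 2.5):
    print(x, rtn_asymmetric(x), rtn_symmetric(x))
```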

22. Round to Nearest Function Graph: rtn(x), round-half-up version • Panels: symmetric implementation, asymmetric implementation

23. Bias in two's complement round to nearest: round-half-up, asymmetric implementation • Assuming all combinations of positive and negative values of x equally possible, average error is +0.125 • Smaller average error than truncation, but the error is still not symmetric • The problem is the midway value: a value exactly at 2.5 or -2.5 always leads to a positive error • Overflow is also possible if only K' = K integral bits are allocated • Example: rtn(011.10) → overflow • This overflow only occurs on positive numbers near the maximum positive value, not on negative numbers

24. Implementing round to nearest (rtn) in hardware: round-half-up, asymmetric implementation • Two methods • Method 1: Add '1' in the position one digit right of the new LSB (i.e. digit L'+1) and keep only L' fractional bits • For L' = 0: $x_{k-1}x_{k-2} \ldots x_1x_0 \,.\, x_{-1}x_{-2} \ldots x_{-L} + 0.1_2 = y_{k-1}y_{k-2} \ldots y_1y_0 \,.\, y_{-1} \ldots$, then ignore (i.e. truncate) the fractional digits • Method 2: Add the value of the digit one position to the right of the new LSB (i.e. digit L'+1) into the new LSB digit (i.e. digit L') and keep only L' fractional bits • For L' = 0: $x_{k-1}x_{k-2} \ldots x_1x_0 + x_{-1} = y_{k-1}y_{k-2} \ldots y_1y_0$, with the fractional digits ignored (i.e. truncated)
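
The two hardware methods can be modelled on integers. In this sketch a fixed-point value with L fractional bits is held as the Python integer `raw` (the value times 2^L); Python's `>>` on negative integers behaves like an arithmetic shift on a two's-complement word. Function and parameter names are illustrative, and at least one bit is assumed to be dropped.

```python
def rtn_method1(raw, L, L_new):
    """Method 1: add a 1 one position right of the new LSB, then truncate."""
    drop = L - L_new
    return (raw + (1 << (drop - 1))) >> drop

def rtn_method2(raw, L, L_new):
    """Method 2: truncate, then add the first dropped bit into the new LSB."""
    drop = L - L_new
    round_bit = (raw >> (drop - 1)) & 1
    return (raw >> drop) + round_bit

# 010.10 = 2.5 and 101.10 = -2.5 with L = 2 fractional bits, rounded to L' = 0.
print(rtn_method1(10, 2, 0), rtn_method2(10, 2, 0))
print(rtn_method1(-10, 2, 0), rtn_method2(-10, 2, 0))
```

Both methods give the same result for every input; method 2 only needs the adder on the retained bits.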

25. Round to Nearest Even Function Graph: rtne(x) • To solve the problem with the midway value we implement round to nearest-even number (or can round to nearest odd number) • Fig. 17.8 Rounding to the nearest even number. • Fig. 17.9 R* rounding or rounding to the nearest odd number.

26. Bias in two's complement round to nearest even (rtne) • average error is now 0 (ignoring the overflow) • cost: more hardware
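
The bias claim can be checked by brute force over a small word. The sketch below enumerates every two's-complement value with k = 3 integer bits and L = 2 fractional bits (an assumed word size chosen to match the earlier examples) and, like the slide, ignores overflow of the rounded result. Function names are illustrative.

```python
import math
from fractions import Fraction

def round_half_up(x):
    """Asymmetric round to nearest: ties go toward +infinity."""
    return math.floor(x + Fraction(1, 2))

def round_half_even(x):
    """Round to nearest; a tie goes to the even neighbour."""
    lower = math.floor(x)
    diff = x - lower
    if diff > Fraction(1, 2) or (diff == Fraction(1, 2) and lower % 2 == 1):
        return lower + 1
    return lower

def average_error(rounder, k=3, L=2):
    """Mean of round(x) - x over every value of a k.L two's-complement word."""
    half = 2 ** (k + L - 1)
    values = [Fraction(raw, 2 ** L) for raw in range(-half, half)]
    return sum(rounder(x) - x for x in values) / len(values)

print(average_error(round_half_up), average_error(round_half_even))
```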

27. 4) Rounding towards infinity • We may need computation errors to be in a known direction • Example: in computing upper bounds, larger results are acceptable, but results that are smaller than correct values could invalidate upper bound • Use upward-directed rounding (round toward +∞) • up(x) always larger than or equal to x • Similarly for lower bounds, use downward-directed rounding (round toward -∞) • down(x) always smaller than or equal to x • We have already seen that round toward -∞ in two's complement can be implemented by truncation

28. Rounding Toward Infinity Function Graph: up(x) and down(x) • Panels: up(x), down(x) • down(x) can be implemented by chop(x) in two's complement

29. Two's Complement Round to Zero • Function graph: inward(x) versus x • Two's complement round to zero (inward rounding) also exists

30. Other Methods • Note that in two's complement round to nearest (rtn) involves an addition which may have a carry propagation from LSB to MSB • Rounding may take as long as an adder takes • Can break the adder chain using the following two techniques: • Jamming or von Neumann • ROM-based

31. 5) Jamming or von Neumann • Chop and force the LSB of the result to 1 • Simplicity of chopping, with the near-symmetry of ordinary rounding • Max error is comparable to chopping (double that of rounding)
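
Jamming needs no adder, which a short model makes visible. In this sketch `raw` is the two's-complement value scaled by 2^drop, as a Python integer; the function name `jam` and the 3.2-bit word used for the statistics are illustrative assumptions.

```python
from fractions import Fraction

def jam(raw, drop):
    """Chop `drop` low-order bits and force the LSB of the result to 1."""
    return (raw >> drop) | 1

# Error statistics over every value of a word with 3 integer and 2 fractional bits.
errors = [jam(raw, 2) - Fraction(raw, 4) for raw in range(-16, 16)]
print(max(abs(e) for e in errors), sum(errors) / len(errors))
```

The largest error is one unit of the retained LSB, as with chopping, while the average error is much closer to zero than the chopping bias.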

32. 6) ROM Rounding • $x_{k-1} \ldots x_4x_3x_2x_1x_0 \,.\, x_{-1}x_{-2} \ldots x_{-l} \xrightarrow{\text{ROM}} x_{k-1} \ldots x_4y_3y_2y_1y_0$ • ROM address: $x_3x_2x_1x_0x_{-1}$; ROM data: $y_3y_2y_1y_0$ • Fig. 17.11 ROM rounding with an 8 × 2 table. • Example: Rounding with a 32 × 4 table • Rounding result is the same as that of the round to nearest scheme in 31 of the 32 possible cases, but a larger error is introduced when $x_3 = x_2 = x_1 = x_0 = x_{-1} = 1$
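
The 31-of-32 claim for the 32 × 4 table can be reproduced with a small model. The sketch assumes the table stores the round-half-up result of its 5 address bits, except for the all-ones address, where the carry out cannot be represented and the chopped value 1111 is stored instead. Function names are illustrative.

```python
def build_rom():
    """32 x 4 table: address is x3 x2 x1 x0 x-1, data is y3 y2 y1 y0."""
    rom = []
    for address in range(32):
        rounded = (address + 1) >> 1         # round to nearest, ties up
        rom.append(min(rounded, 0b1111))     # all-ones address: no carry out
    return rom

def rom_round(raw, rom):
    """raw holds x(k-1) ... x0 x-1; bits above x3 pass through unchanged."""
    upper = raw >> 5
    return (upper << 4) | rom[raw & 0b11111]

rom = build_rom()
mismatches = [a for a in range(32) if rom_round(a, rom) != (a + 1) >> 1]
print(len(mismatches), [format(a, "05b") for a in mismatches])
```

Because the upper bits bypass the table, no carry ever propagates into them, which is the point of breaking the adder chain.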

33. Representation of the Galois Field elements

34. Evariste Galois (1811-1832)

35. Evariste Galois (1811-1832) • Studied the problem of finding algebraic solutions for the general equations of degree ≥ 5, e.g., $f(x) = a_5x^5 + a_4x^4 + a_3x^3 + a_2x^2 + a_1x + a_0 = 0$ • Answered definitively the question of which specific equations of a given degree have algebraic solutions. • On the way, he developed group theory, one of the most important branches of modern mathematics.

36. Evariste Galois (1811-1832) • Galois submits his results for the first time to the French Academy of Sciences • Reviewer 1: Augustin-Louis Cauchy forgot or lost the communication. • 1830: Galois submits the revised version of his manuscript, hoping to enter the competition for the Grand Prize in mathematics • Reviewer 2: Joseph Fourier died shortly after receiving the manuscript. • 1831: Third submission to the French Academy of Sciences • Reviewer 3: Simeon-Denis Poisson did not understand the manuscript and rejected it.

37. Evariste Galois (1811-1832) • May 1832: Galois provoked into a duel • The night before the duel he wrote a letter to his friend containing the summary of his discoveries. • The letter ended with a plea: “Eventually there will be, I hope, some people who will find it profitable to decipher this mess.” • May 30, 1832: Galois was grievously wounded in the duel and died in the hospital the following day. • Galois manuscript rediscovered by Joseph Liouville • 1846: Galois manuscript published for the first time in a mathematical journal.

38. Field • Set F, and two operations typically denoted by (but not necessarily equivalent to) + and * • Set F, and definitions of these two operations must fulfill special conditions.

39. Examples of fields • Infinite fields: { R = set of real numbers, +: addition of real numbers, *: multiplication of real numbers } • Finite fields: { set $Z_p$ = {0, 1, 2, … , p-1} with p prime, + (mod p): addition modulo p, * (mod p): multiplication modulo p }
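
One of the "special conditions" a field must satisfy is that every nonzero element has a multiplicative inverse. The sketch below implements the two operations of $Z_p$ and checks that condition by exhaustive search, which shows why p has to be prime. Function names are illustrative.

```python
def add_mod(a, b, p):
    """Addition in Z_p."""
    return (a + b) % p

def mul_mod(a, b, p):
    """Multiplication in Z_p."""
    return (a * b) % p

def has_all_inverses(p):
    """True if every nonzero element of Z_p has a multiplicative inverse."""
    return all(any(mul_mod(a, b, p) == 1 for b in range(1, p))
               for a in range(1, p))

print(has_all_inverses(7), has_all_inverses(6))
```

With p = 6 the element 2 has no inverse, so the integers modulo 6 do not form a field.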

40. Finite Fields = Galois Fields • GF($p^m$): p – prime, $p^m$ – number of elements in the field • Most significant special cases: GF(p) and GF($2^m$) • GF(p): arithmetic operations present in many libraries • GF($2^m$), polynomial basis representation: fast in hardware • GF($2^m$), normal basis representation: fast squaring
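
As an illustration of the polynomial basis representation, an element of GF($2^m$) can be stored as an m-bit integer whose bits are the polynomial coefficients. The sketch below multiplies two such elements by shift-and-add with reduction. The field size m = 3 and the irreducible polynomial $x^3 + x + 1$ are assumptions chosen for the example, not taken from the slides, and the function name `gf2m_mul` is illustrative.

```python
def gf2m_mul(a, b, m=3, modulus=0b1011):
    """Multiply a and b in GF(2^m), polynomial basis (default x^3 + x + 1)."""
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a          # addition in GF(2^m) is bitwise XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & (1 << m):
            a ^= modulus         # reduce as soon as the degree reaches m
    return result

# x * x^2 = x^3, which reduces to x + 1 under the assumed modulus.
print(format(gf2m_mul(0b010, 0b100), "03b"))
```

Only shifts and XORs are involved, with no carries between bit positions, which fits the slide's remark that this representation is fast in hardware.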