Create Presentation
Download Presentation

525 Views
Download Presentation

Download Presentation
## Number Representation Part 2 Floating Point Representations Rounding

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**ECE 645: Lecture 5**Number Representation Part 2 Floating Point Representations Rounding Representation of the Galois Field elements**Required Reading**Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design Chapter 17, Floating-Point Representations Chapter 17.5, Rounding schemes Rounding Algorithms 101 http://www.eetimes.com/document.asp?doc_id=1274515**The ANSI/IEEE standard floating-point number representation**formats Originally IEEE 754-1985. Superseded by IEEE 754-2008 Standard. - -**Table 17.1 Some features of the ANSI/IEEE standard**floatingpoint number representation formats**1.f 2e**f = 0: Representation of f 0: Representation of NaNs f = 0: Representation of 0 f 0: Representation of denormals, 0.f 2–126 Exponent Encoding Exponent encoding in 8 bits for the single/short (32-bit) ANSI/IEEE format 0 1 126 127 128 254 255 Decimal code 00 01 7E 7F 80 FE FF Hex code Exponent value –126 –1 0 +1 +127**New IEEE 754-2008 Standard**Basic Formats**New IEEE 754-2008 Standard**Binary Interchange Formats**Exact result 1 + 2-1 + 2-23 + 2-24**Requirements for Arithmetic Results of the 4 basic arithmetic operations (+, -, , ) as well as square-rooting must match those obtained if all intermediate computations were infinitely precise That is, a floating-point arithmetic operation should introduce no more imprecision than the error attributable to the final rounding of a result that has no exact representation (this is the best possible) Example: (1 + 2-1) (1 + 2-23 ) Rounded result 1 + 2-1 + 2-22 Error = ½ ulp**Rounding Modes**The IEEE 754-2008 standard includes five rounding modes: Default: Round to nearest, ties to even (rtne) Optional: Round to nearest, ties away from 0 (rtna) Round toward zero (inward) Round toward + (upward) Round toward – (downward)**Rounding**Rounding occurs when we want to approximate a more precise number (i.e. more fractional bits L) with a less precise number (i.e. fewer fractional bits L') Example 1: old: 000110.11010001 (K=6, L=8) new: 000110.11 (K'=6, L'=2) Example 2: old: 000110.11010001 (K=6, L=8) new: 000111. (K'=6, L'=0) The following pages show rounding from L>0 fractional bits to L'=0 bits, but the mathematics hold true for any L' < L Usually, keep the number of integral bits the same K'=K**Whole part**Fractional part xk–1xk–2 . . . x1x0.x–1x–2 . . . x–lyk–1yk–2 . . . y1y0 Round Rounding Equation • y = round(x)**Rounding Techniques**• There are different rounding techniques: • 1) truncation • results in round towards zero in signed magnitude • results in round towards -∞ in two's complement • 2) round to nearest number • 3) round to nearest even number (or odd number) • 4) round towards +∞ • Other rounding techniques • 5) jamming or von Neumann • 6) ROM rounding • Each of these techniques will differ in their error depending on representation of numbers i.e. signed magnitude versus two's complement • Error = round(x) – x**xk–1xk–2 . . . x1x0.x–1x–2 . . . x–lxk–1xk–2**. . . x1x0 trunc ulp 1) Truncation • Truncation in signed-magnitude results in a number chop(x) that is always of smaller magnitude than x. This is called round towards zero or inward rounding • 011.10 (3.5)10 011 (3)10 • Error = -0.5 • 111.10 (-3.5)10 111 (-3)10 • Error = +0.5 • Truncation in two's complement results in a number chop(x) that is always smaller than x. This is called round towards -∞ or downward-directed rounding • 011.10 (3.5)10 011 (3)10 • Error = -0.5 • 100.10 (-3.5)10 100 (-4)10 • Error = -0.5 The simplest possible rounding scheme: chopping or truncation**x**chop( ) 4 4 3 3 2 2 1 1 x x – 4 – 3 – 2 – 1 1 2 3 4 – 4 – 3 – 2 – 1 1 2 3 4 – 1 – 1 – 2 – 2 – 3 – 3 – 4 – 4 Truncation Function Graph: chop(x) x chop( ) Fig. 17.5 Truncation or chopping of a signed-magnitude number (same as round toward 0). Fig. 17.6 Truncation or chopping of a 2’s-complement number (same as round to -∞).**Bias in two's complement truncation**• Assuming all combinations of positive and negative values of x equally possible, average error is -0.375 • In general, average error = -(2-L'-2-L )/2, where L' = new number of fractional bits**Implementation truncation in hardware**• Easy, just ignore (i.e. truncate) the fractional digits from L to L'+1 xk-1 xk-2 .. x1 x0. x-1 x-2 .. x-L = yk-1 yk-2 .. y1 y0. ignore (i.e. truncate the rest)**2) Round to nearest number**• Rounding to nearest number what we normally think of when say round • 010.01 (2.25)10 010 (2)10 • Error = -0.25 • 010.11 (2.75)10 011 (3)10 • Error = +0.25 • 010.00 (2.00)10 010 (2)10 • Error = +0.00 • 010.10 (2.5)10 011 (3)10 • Error = +0.5 [round-half-up (arithmetic rounding)] • 010.10 (2.5)10 010 (2)10 • Error = -0.5 [round-half-down]**Round-half-up: dealing with negative numbers**• Rounding to nearest number what we normally think of when say round • 101.11 (-2.25)10 110 (-2)10 • Error = +0.25 • 101.01 (-2.75)10 101 (-3)10 • Error = -0.25 • 110.00 (-2.00)10 110 (-2)10 • Error = +0.00 • 101.10 (-2.5)10 110 (-2)10 • Error = +0.5 [asymmetric implementation] • 101.10 (-2.5)10 101 (-3)10 • Error = -0.5 [symmetric implementation]**Round to Nearest Function Graph: rtn(x)Round-half-up version**Symmetric implementation Asymmetric implementation**Bias in two's complement round to nearestRound-half-up**asymmetric implementation • Assuming all combinations of positive and negative values of x equally possible, average error is +0.125 • Smaller average error than truncation, but still not symmetric error • We have a problem with the midway value, i.e. exactly at 2.5 or -2.5 leads to positive error bias always • Also have the problem that you can get overflow if only allocate K' = K integral bits • Example: rtn(011.10) overflow • This overflow only occurs on positive numbers near the maximum positive value, not on negative numbers**Implementing round to nearest (rtn) in hardware**Round-half-up asymmetric implementation • Two methods • Method 1: Add '1' in position one digit right of new LSB (i.e. digit L'+1) and keep only L' fractional bits xk-1 xk-2 .. x1 x0. x-1 x-2 .. x-L + 1 = yk-1 yk-2 .. y1 y0. y-1 • Method 2: Add the value of the digit one position to right of new LSB (i.e. digit L'+1) into the new LSB digit (i.e. digit L) and keep only L' fractional bits xk-1 xk-2 .. x1 x0. x-1 x-2 .. x-L + x-1 yk-1 yk-2 .. y1 y0. ignore (i.e. truncate the rest) ignore (i.e truncate the rest)**Fig. 17.9 R* rounding or rounding to the nearest odd**number. Round to Nearest Even Function Graph: rtne(x) • To solve the problem with the midway value we implement round to nearest-even number (or can round to nearest odd number) Fig. 17.8 Rounding to the nearest even number.**Bias in two's complement round to nearest even (rtne)**• average error is now 0 (ignoring the overflow) • cost: more hardware**4) Rounding towards infinity**• We may need computation errors to be in a known direction • Example: in computing upper bounds, larger results are acceptable, but results that are smaller than correct values could invalidate upper bound • Use upward-directed rounding (round toward +∞) • up(x) always larger than or equal to x • Similarly for lower bounds, use downward-directed rounding (round toward -∞) • down(x) always smaller than or equal to x • We have already seen that round toward -∞ in two's complement can be implemented by truncation**Rounding Toward Infinity Function Graph: up(x) and down(x)**up(x) down(x) down(x) can be implemented by chop(x) intwo's complement**x**inward( ) 4 3 2 1 x – 4 – 3 – 2 – 1 1 2 3 4 – 1 – 2 – 3 – 4 Two's Complement Round to Zero • Two's complement round to zero (inward rounding) also exists**Other Methods**• Note that in two's complement round to nearest (rtn) involves an addition which may have a carry propagation from LSB to MSB • Rounding may take as long as an adder takes • Can break the adder chain using the following two techniques: • Jamming or von Neumann • ROM-based**5) Jamming or von Neumann**Chop and force the LSB of the result to 1 Simplicity of chopping, with the near-symmetry or ordinary rounding Max error is comparable to chopping (double that of rounding) - - - - - - - - - -**xk–1 . . . x4x3x2x1x0.x–1x–2 . . . x–lxk–1 . . .**x4y3y2y1y0 ROM ROM address ROM data 6) ROM Rounding Fig. 17.11 ROM rounding with an 8 2 table. Example: Rounding with a 32 4 table Rounding result is the same as that of the round to nearest scheme in 31 of the 32 possible cases, but a larger error is introduced when x3 = x2 = x1 = x0 = x–1 = 1 - - - - - - - - - -**Representation**of the Galois Field elements**Evariste Galois (1811-1832)**Studied the problem of finding algebraic solutions for the general equations of the degree 5, e.g., f(x) = a5x5+ a4x4+ a3x3+ a2x2+ a1x+ a0 = 0 Answered definitely the question which specific equations of a given degree have algebraic solutions. On the way, he developed group theory, one of the most important branches of modern mathematics.**Evariste Galois (1811-1832)**• Galois submits his results for the first time to • the French Academy of Sciences • Reviewer 1 • Augustin-Luis Cauchy forgot or lost the communication. • 1830 Galois submits the revised version of his manuscript, • hoping to enter the competition for the Grand Prize • in mathematics • Reviewer 2 • Joseph Fourier – died shortly after receiving the manuscript. • 1831 Third submission to the French Academy of Sciences • Reviewer 3 • Simeon-Denis Poisson – did not understand the manuscript • and rejected it.**Evariste Galois (1811-1832)**• May 1832 Galois provoked into a duel • The night before the duel he wrote a letter to his friend • containing the summary of his discoveries. • The letter ended with a plea: • “Eventually there will be, I hope, some people who • will find it profitable to decipher this mess.” • May 30, 1832 Galois was grievously wounded in the duel and died • in the hospital the following day. • Galois manuscript rediscovered by Joseph Liouville • 1846 Galois manuscript published for • the first time in a mathematical journal.**Field**Set F, and two operations typically denoted by (but not necessarily equivalent to) + and * Set F, and definitions of these two operations must fulfill special conditions.**Examples of fields**Infinite fields { R= set of real numbers, + addition of real numbers * multiplication of real numbers } Finite fields { set Zp={0, 1, 2, … , p-1}, + (mod p): addition modulo p, * (mod p): multiplication modulo p }**Finite Fields = Galois Fields**p – prime pm – number of elements in the field GF(pm) GF(2m) GF(p) Most significant special cases Arithmetic operations present in many libraries Normal basis representation Polynomial basis representation Fast in hardware Fast squaring