CSE 246: Computer Arithmetic Algorithms and Hardware Design

Fall 2006, Lecture 10: Floating Point Number Rounding, Polynomial Expression
Instructor: Prof. Chung-Kuan Cheng

Topics:
• Rounding F.P. Numbers
• Polynomial Expression

Rounding the numbers
• Why we need the following bits:
  • Guard bit
  • Round bit
  • Sticky bit

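Example 1
  1.00000 x 2^4 - 1.10000 x 2^-3
Normalize according to exponent:
    1.00000000 x 2^4
  - 0.00000011 x 2^4
  = 0.11111101 x 2^4
Renormalize: 1.1111101 x 2^3
Take 5 bits after the binary point: 1.11111 | 01
• Round bit = 0 (first dropped bit)
• Sticky bit = 1 (OR of the remaining dropped bits)
Result = 1.11111 x 2^3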

Rounding
• We need only one guard bit for normalization after addition.
• Assumption: operands are normalized.
• Why?

Example 2
  1.00001 x 2^3 - 1.01011 x 2^-1
Normalize according to exponent:
    1.000010000 x 2^3
  - 0.000101011 x 2^3
  = 0.111100101 x 2^3
Renormalize: 1.11100101 x 2^2
Take 5 bits after the binary point: 1.11100 | 101
• Round bit = 1 (the bit on the boundary)
• Remaining bits (01) are non-zero => round up
Result = 1.11101 x 2^2

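Theory behind it
• When shifting right, we don't need to remember anything more than 3 bits below the least significant bit.
• This is a necessary and sufficient condition.
• The 3 bits are the guard bit (g), the round bit (r), and the sticky bit, which is the OR of all the other bits shifted out.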
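The rounding step in Examples 1 and 2 can be sketched in a few lines of Python. This is a minimal sketch, not the lecture's hardware: the function name `round_grs` and the integer encoding of the significand are illustrative assumptions, and ties are sent to even, which the slides do not state. It does not renormalize if rounding up overflows the significand.

```python
def round_grs(sig, frac_bits, keep):
    """Round a positive significand to `keep` fraction bits.

    `sig` is an integer holding the exact, renormalized value scaled by
    2**frac_bits. Only the round bit (first dropped bit) and the sticky
    bit (OR of every lower dropped bit) are examined.
    Assumption: round to nearest, ties to even.
    """
    drop = frac_bits - keep
    if drop <= 0:
        return sig << (-drop)          # nothing to drop
    kept = sig >> drop
    round_bit = (sig >> (drop - 1)) & 1
    sticky = int((sig & ((1 << (drop - 1)) - 1)) != 0)
    if round_bit and (sticky or (kept & 1)):
        kept += 1                      # round up
    return kept


# Example 1: 1.1111101 -> round bit 0, sticky 1 -> 1.11111
ex1 = round_grs(0b11111101, 7, 5)
# Example 2: 1.11100101 -> round bit 1, sticky 1 -> 1.11101
ex2 = round_grs(0b111100101, 8, 5)
print(format(ex1, "b"), format(ex2, "b"))
```

The printed bit strings are the significands with the binary point removed, so they can be compared directly with the two results above.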

Polynomial Approximation of Functions

Taylor Series
f(x) = f(x_0) + f'(x_0)(x - x_0) + f''(x_0)(x - x_0)^2/2! + ...
Example: sin(x) = x - x^3/3! + x^5/5! - x^7/7! + ...

Taylor Series
Given: P_N(x) = sum_{i=0}^{N} c_i x^i
              = c_0 + x(c_1 + x(c_2 + ... + x(c_{N-1} + x c_N)))
How to calculate the value of the function? Recursively: group common factors.
  R(N) = c_N
  R(i-1) = c_{i-1} + x R(i)
  ...
  P_N(x) = R(0)
This takes N multiplies and N adds.
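The recurrence R(N) = c_N, R(i-1) = c_{i-1} + x R(i) is commonly called Horner's rule. A minimal Python sketch follows; the function name `horner` and the list `SIN_COEFFS` (the sine series truncated at x^7) are illustrative names, not from the lecture.

```python
from math import factorial


def horner(coeffs, x):
    """Evaluate c_0 + c_1*x + ... + c_N*x**N, coeffs = [c_0, ..., c_N].

    Uses N multiplies and N adds, one pair per loop iteration.
    """
    r = coeffs[-1]                       # R(N) = c_N
    for c in reversed(coeffs[:-1]):
        r = c + x * r                    # R(i-1) = c_{i-1} + x * R(i)
    return r                             # P_N(x) = R(0)


# sin(x) ~= x - x^3/3! + x^5/5! - x^7/7!
SIN_COEFFS = [0, 1, 0, -1 / factorial(3), 0, 1 / factorial(5), 0, -1 / factorial(7)]

print(horner(SIN_COEFFS, 0.5))
```

Each iteration depends on the previous one, which is why a single multiplier and adder evaluate the polynomial strictly in series.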

Taylor Series
• 1 adder => do it in series
• Given more components => can we go faster?
• Take N = 7 as an example:
  c_7 x^7 + c_6 x^6 + c_5 x^5 + c_4 x^4 + c_3 x^3 + c_2 x^2 + c_1 x + c_0
• How to accelerate?

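Taylor Series
c_7 x^7 + c_6 x^6 + c_5 x^5 + c_4 x^4 + c_3 x^3 + c_2 x^2 + c_1 x + c_0
• Use 3 stages to generate x^k.
• Use x^k to generate the polynomial expression.
[Figure: multiplier tree generating the powers of x in 3 stages (x^2; then x^3 and x^4; then x^5, x^6, x^7), followed by the additions. Carry-save addition = constant time; the adder tree has log n levels.]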

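Taylor Series
Group in pairs:
  c_7 x + c_6,   c_5 x + c_4,   c_3 x + c_2,   c_1 x + c_0
Combine with x^2:
  x^2 (c_7 x + c_6) + (c_5 x + c_4),   x^2 (c_3 x + c_2) + (c_1 x + c_0)
Combine with x^4:
  x^4 [x^2 (c_7 x + c_6) + (c_5 x + c_4)] + [x^2 (c_3 x + c_2) + (c_1 x + c_0)]
Powers of x needed: x, x^2, x^4
• This is a bit faster. Only 2 stages.
• But what is the fastest and most energy-efficient way to produce the result? => minimize the number of multiplies.
• All this uses +'s and x's. We need to get rid of them. => Let's try table look-up.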
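The pairwise grouping for N = 7 (often called Estrin's scheme) can be written out directly. This is a sketch with a hypothetical function name `estrin7`; in software the operations still run one after another, so the comments only mark which ones are independent and could run in parallel in hardware.

```python
def estrin7(c, x):
    """Evaluate c[0] + c[1]*x + ... + c[7]*x**7 by pairwise grouping."""
    x2 = x * x              # powers of x: only x^2 and x^4 are needed
    x4 = x2 * x2

    # Four independent multiply-adds
    p0 = c[1] * x + c[0]
    p1 = c[3] * x + c[2]
    p2 = c[5] * x + c[4]
    p3 = c[7] * x + c[6]

    # Two independent multiply-adds using x^2
    q0 = x2 * p1 + p0
    q1 = x2 * p3 + p2

    # Final multiply-add using x^4
    return x4 * q1 + q0


coeffs = [1, 2, 3, 4, 5, 6, 7, 8]
print(estrin7(coeffs, 3))
```

The scheme uses more multiplies than the serial recurrence, which is the trade-off the last bullets point at: shorter latency, but not fewer operations.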

Taylor Series: Table look-up
• SRAM/DRAM => eat power
• ROM => better option
• Suppose there is a table organized as a binary tree.
• Let x = x_H + x_L and expand about x_0 = x_H:
  f(x_H + x_L) = f(x_H) + x_L f'(x_H) + x_L^2 f''(x_H)/2! + ...
Example: x = 110101
  x_H = 110000
  x_L = 000101

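Taylor Series: Table look-up
• 1st order: f(x_H + x_L) ~= f(x_H) + x_L f'(x_H)
  => Only 1 multiplication!
[Figure: x_H indexes Table-1, which returns f(x_H), and Table-2, which returns f'(x_H). A multiplier forms x_L x f'(x_H) and an adder produces f(x_H + x_L).]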
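A software sketch of the first-order scheme, using sin on [0, 1) as the function. The index width `K`, the table names, and the choice of sin are assumptions made for illustration; the slides do not fix any of them.

```python
import math

K = 6                       # bits of x_H used as the table index (assumption)
STEP = 2.0 ** -K

# Table-1 holds f(x_H); Table-2 holds f'(x_H).
TABLE_F = [math.sin(i * STEP) for i in range(1 << K)]
TABLE_DF = [math.cos(i * STEP) for i in range(1 << K)]


def sin_lookup(x):
    """First-order approximation f(x_H) + x_L * f'(x_H) for 0 <= x < 1."""
    idx = int(x / STEP)         # high K bits of x select the table entries
    x_l = x - idx * STEP        # low part, 0 <= x_L < 2**-K
    return TABLE_F[idx] + x_l * TABLE_DF[idx]   # the single multiplication


for x in (0.1, 0.5, 0.9):
    print(x, sin_lookup(x), math.sin(x))
```

Since the dropped term is x_L^2 f''/2 and |sin''| <= 1, the error of this sketch stays below 2^-(2K+1).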

Taylor Series
• With an extra order => 1 extra table and 1 extra multiplier
• To change the function, all you have to do is change the content of the table.
• Problem? => Now it's the size of the table: an L-bit index needs 2^L entries.

Taylor Series
• Let's split x into 3 sections (instead of the previous 2, high and low):
  x = x_1 + x_2 2^-k + x_3 2^-2k
  => f(x) = f(x_1 + x_2 2^-k) + x_3 2^-2k f'(x_1) + epsilon,  with epsilon ~= 2^-3k
• f(x) requires a 2^n x v_n table, where n is the number of bits of x and v_n is the number of bits of f(x).
• 32-bit x => 2^32 x 2^32 = 2^64 (64 bits) -> HUGE!!
• But do we really need all those numbers in the table?
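The three-section approximation can be checked numerically. In this sketch x is a 3k-bit fraction in [0, 1), and `f` and `df` are called directly where hardware would use a table indexed by the top 2k bits and a table indexed by the top k bits; the function name `three_part` and the use of sin are illustrative assumptions.

```python
import math


def three_part(f, df, n, k):
    """Approximate f(x) for x = n / 2**(3k), with n a 3k-bit integer.

    x = x_1 + x_2*2^-k + x_3*2^-2k, each x_i a k-bit fraction.
    """
    x1 = (n >> (2 * k)) / 2 ** k                    # top k bits
    x12 = (n >> k) / 2 ** (2 * k)                   # x_1 + x_2 * 2^-k
    x3_term = (n & ((1 << k) - 1)) / 2 ** (3 * k)   # x_3 * 2^-2k
    return f(x12) + x3_term * df(x1)


k = 4
worst = max(
    abs(three_part(math.sin, math.cos, n, k) - math.sin(n / 2 ** (3 * k)))
    for n in range(1 << (3 * k))
)
print(worst, 2.0 ** (-3 * k))
```

For sin, whose second derivative is bounded by 1, the worst-case error over all 3k-bit inputs comes out on the order of 2^-3k, matching the estimate above.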

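Taylor Series
Let E = epsilon (0 or 1), [ ] = lower limit (floor).
x*y = (x+y)^2/4 - (x-y)^2/4
    = ([(x+y)/2] + E/2)^2 - ([(x-y)/2] + E/2)^2
    = [(x+y)/2]^2 - [(x-y)/2]^2 + E*y
(E = 1 when x + y is odd, else 0. The last step uses [(x+y)/2] - [(x-y)/2] = y.)
• The content of the lower bits determines the lower bits of the result, but not the other bits!
[Figure: table of x^2 indexed by x.]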
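Multiplication through a table of squares can be verified exhaustively for small operands. In this sketch the operand width `N_BITS` and the names `SQUARES` and `mul_by_squares` are assumptions; expanding the floored terms gives a correction of + E*y, where y is the smaller operand and E = 1 when x + y is odd.

```python
N_BITS = 8

# One table of squares covers every value floor((x + y) / 2) can take.
SQUARES = [i * i for i in range(1 << N_BITS)]


def mul_by_squares(x, y):
    """Multiply two unsigned N_BITS-bit integers using only table look-ups,
    additions, subtractions, and shifts."""
    if x < y:
        x, y = y, x             # keep x - y non-negative
    e = (x + y) & 1             # x + y and x - y always share parity
    a = (x + y) >> 1            # floor((x + y) / 2)
    b = (x - y) >> 1            # floor((x - y) / 2)
    return SQUARES[a] - SQUARES[b] + (y if e else 0)


print(mul_by_squares(13, 11), 13 * 11)
```

The table has 2^n entries instead of the 2^n x 2^n entries a direct product table would need.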

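Taylor Series
• Table size: 2^n x v  vs.  2^n x (v-w) + 2^L x w
  2^n x (v-w) + 2^L x w = 2^n x v - (2^n x w - 2^L x w)
                        = 2^n x v - w (2^n - 2^L)
• Size of table is reduced by w (2^n - 2^L) bits.
[Figure: a single 2^n x v table, taking n-bit x and producing v-bit f(x), is replaced by a 2^n x (v-w) table producing v-w bits of f(x) and a 2^L x w table, indexed by L bits of x, producing the remaining w bits.]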
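The saving can be tabulated for concrete sizes. The sketch below also checks, for f(x) = x^2, that the low w bits of the result depend only on the low w bits of x, so that L = w works for that function; the sizes chosen and the function name `table_bits` are illustrative assumptions.

```python
def table_bits(n, v, w, L):
    """Return (full, split, saved) table sizes in bits."""
    full = (1 << n) * v
    split = (1 << n) * (v - w) + (1 << L) * w
    return full, split, full - split


n, v, w, L = 16, 32, 8, 8
full, split, saved = table_bits(n, v, w, L)
print(full, split, saved)

# For squaring, the low w bits of x^2 are a function of the low w bits of x.
mask = (1 << w) - 1
low_bits_ok = all(
    (x * x) & mask == ((x & mask) ** 2) & mask for x in range(1 << n)
)
print(low_bits_ok)
```

The saving is modest when w is small compared with v, since the 2^n x (v-w) table still dominates.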
