CSE 246: Computer Arithmetic Algorithms and Hardware Design

CSE 246: Computer Arithmetic Algorithms and Hardware Design. Fall 2006, Lecture 10: Floating Point Number Rounding, Polynomial Expression. Instructor: Prof. Chung-Kuan Cheng. Topics: Rounding F.P. Numbers; Polynomial Expression.

Presentation Transcript


  1. CSE 246: Computer Arithmetic Algorithms and Hardware Design. Fall 2006, Lecture 10: Floating Point Number Rounding, Polynomial Expression. Instructor: Prof. Chung-Kuan Cheng

  2. Topics: • Rounding F.P. Numbers • Polynomial Expression

  3. Rounding the numbers • Why we need the guard bit, the round bit, and the sticky bit

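  4. Example 1
       1.00000 x 2^4 - 1.10000 x 2^-3
     Normalize according to exponent (align to 2^4):
       1.00000000 x 2^4
     - 0.00000011 x 2^4
     = 0.11111101 x 2^4
     Renormalize: 1.1111101 x 2^3
     Take 5 bits after the binary point. The two bits beyond them are 01: the round bit is 0 and the sticky bit is 1, so the value is rounded down.
     Result = 1.11111 x 2^3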

  5. Rounding • We need only one guard bit for normalization after addition. • Assumption: Operands are normalized. • Why?

  6. Example 2
       1.00001 x 2^3 - 1.01011 x 2^-1
     Normalize according to exponent (align to 2^3):
       1.000010000 x 2^3
     - 0.000101011 x 2^3
     = 0.111100101 x 2^3
     Renormalize: 1.11100101 x 2^2
     Take 5 bits after the binary point. Bit on the boundary: the round bit is 1, and the remaining bits (01) are non-zero => round up.
     Result = 1.11101 x 2^2

  7. Theory behind it
     • When shifting right, we don't need to remember anything more than 3 bits below: the guard bit (g), the round bit (r), and the sticky bit.
     • The sticky bit is the OR of all the other bits.
     • This is a necessary and sufficient condition.
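
The two worked examples can be replayed with a short sketch. It assumes round-to-nearest with ties to even (the slides do not name a rounding mode) and treats the significand as an integer; the function name `round_to_nearest_even` and its parameters are illustrative, not from the lecture.

```python
def round_to_nearest_even(sig: int, drop: int) -> int:
    """Drop the `drop` low bits of the integer significand `sig`.

    Assumption: round to nearest, ties to even. Only two facts about the
    dropped bits are used: the first dropped bit (round bit) and the OR of
    all the bits below it (sticky bit).
    """
    if drop <= 0:
        return sig
    kept = sig >> drop
    round_bit = (sig >> (drop - 1)) & 1
    sticky = int((sig & ((1 << (drop - 1)) - 1)) != 0)
    # Round up when the dropped part is more than half an LSB,
    # or exactly half and the kept value is odd.
    if round_bit and (sticky or (kept & 1)):
        kept += 1
    return kept


# Example 1: 1.1111101 keeps 5 fraction bits -> 1.11111
example1 = round_to_nearest_even(0b11111101, 2)
# Example 2: 1.11100101 keeps 5 fraction bits -> 1.11101
example2 = round_to_nearest_even(0b111100101, 3)
```

If rounding up carries out of the top bit, the result needs one more renormalizing shift; the sketch leaves that step to the caller.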

  8. Polynomial Approximation of Functions

  9. Taylor Series
     f(x) = f(x0) + f'(x0)(x - x0) + f''(x0)(x - x0)^2/2! + ...
     Example: sin(x) = x - x^3/3! + x^5/5! - x^7/7! + ...

  10. Taylor Series
      How to calculate the value of the function? Recursively: group common factors.
      Given: P_N(x) = c0 + c1 x + c2 x^2 + ... + cN x^N
                    = c0 + x(c1 + x(c2 + ... + x(c_(N-1) + x cN)))
      R(N) = cN
      R(i-1) = c_(i-1) + x R(i)
      ...
      P_N(x) = R(0)
      Cost: N multiplies and N adds
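
The recurrence R(N) = cN, R(i-1) = c_(i-1) + x R(i) is Horner's rule. A minimal sketch, assuming the coefficients arrive as a list with `coeffs[i]` holding c_i (the function name `horner` is illustrative):

```python
def horner(coeffs, x):
    """Evaluate c0 + c1*x + ... + cN*x^N with N multiplies and N adds.

    coeffs[i] is c_i; the list must be non-empty.
    """
    r = coeffs[-1]                    # R(N) = cN
    for c in reversed(coeffs[:-1]):   # R(i-1) = c_(i-1) + x * R(i)
        r = c + x * r
    return r                          # P_N(x) = R(0)


# 1 + 2x + 3x^2 at x = 2
value = horner([1, 2, 3], 2)
```

Each loop iteration depends on the previous one, which is why this form is strictly serial.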

  11. Taylor Series
      • 1 adder => do it in series
      • Given more components => can we go faster?
      • Take N = 7 as an example: c7 x^7 + c6 x^6 + c5 x^5 + c4 x^4 + c3 x^3 + c2 x^2 + c1 x + c0
      How to accelerate?

  12. Taylor Series
      c7 x^7 + c6 x^6 + c5 x^5 + c4 x^4 + c3 x^3 + c2 x^2 + c1 x + c0
      • Use 3 stages to generate x^k
      • Use x^k to generate the polynomial expression.
      [Figure: x^2 through x^7 generated from x in multiplier stages, then summed by adders. Labels: "Carry-save = constant time", "Log n".]

  13. Taylor Series
      c7 x + c6,  c5 x + c4,  c3 x + c2,  c1 x + c0
      x^2(c7 x + c6) + c5 x + c4,  x^2(c3 x + c2) + c1 x + c0
      x^4[x^2(c7 x + c6) + c5 x + c4] + x^2(c3 x + c2) + c1 x + c0
      (Powers needed: x, x^2, x^4)
      • This is a bit faster. Only 2 stages
      • But what is the fastest way to produce the result, and the most energy-efficient? => minimize the number of multiplies
      • All of this uses adds and multiplies. We need to get rid of them. => Let's try table look-up
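
The grouping above (commonly known as Estrin's scheme) can be sketched for the degree-7 case. The function name `poly7_grouped` is illustrative; in software the statements still run one after another, so the sketch only shows which operations are independent and could run side by side in hardware.

```python
def poly7_grouped(c, x):
    """Evaluate c[0] + c[1]*x + ... + c[7]*x^7 by pairing terms.

    c must hold exactly 8 coefficients, with c[i] multiplying x^i.
    """
    # Powers shared by the groups.
    x2 = x * x
    x4 = x2 * x2
    # Four independent linear pieces.
    p76 = c[7] * x + c[6]
    p54 = c[5] * x + c[4]
    p32 = c[3] * x + c[2]
    p10 = c[1] * x + c[0]
    # Two independent cubic pieces.
    hi = x2 * p76 + p54
    lo = x2 * p32 + p10
    # Final combination.
    return x4 * hi + lo


coeffs = [1, 2, 3, 4, 5, 6, 7, 8]
grouped = poly7_grouped(coeffs, 2)
direct = sum(ci * 2**i for i, ci in enumerate(coeffs))
```

The multiply count is higher than with the serial recurrence (9 here versus 7), which is the trade-off the slide raises when it asks about energy efficiency.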

  14. Taylor Series – Table look-up
      • SRAM/DRAM => eat power
      • ROM => better option
      f(x) = f(x0) + f'(x0)(x - x0) + f''(x0)(x - x0)^2/2! + ...
      • Suppose there is a table as a binary tree.
      • Let x = xH + xL, x0 = xH
      f(xH + xL) = f(xH) + xL f'(xH) + xL^2 f''(xH)/2! + ...
      Example: x = 110101, xH = 110000, xL = 000101

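  15. Taylor Series – Table look-up
      • 1st order: f(xH + xL) ~= f(xH) + xL f'(xH) => Only 1 multiplication!
      [Figure: xH addresses Table-1, which outputs f(xH), and Table-2, which outputs f'(xH); a multiplier forms xL f'(xH) and an adder produces f(xH + xL).]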
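
A minimal sketch of the first-order look-up, under several assumptions that are not in the slides: f is sin, x is a 6-bit fixed-point fraction in [0, 1), and the top 2 bits form xH (matching the split 110000 / 000101 of the earlier example). The names `TABLE_F`, `TABLE_DF`, `split`, and `approx_sin` are illustrative.

```python
import math

N_BITS = 6                  # total bits of x (assumption: x is a fraction x_bits / 2^6)
H_BITS = 2                  # bits of xH, the table address
L_BITS = N_BITS - H_BITS    # bits of xL
SCALE = 1 << N_BITS

# Table-1 holds f(xH); Table-2 holds f'(xH). Here f = sin, so f' = cos.
TABLE_F = [math.sin((h << L_BITS) / SCALE) for h in range(1 << H_BITS)]
TABLE_DF = [math.cos((h << L_BITS) / SCALE) for h in range(1 << H_BITS)]


def split(x_bits):
    """Return (xH, xL) as bit patterns with xH + xL == x_bits."""
    x_h = (x_bits >> L_BITS) << L_BITS
    x_l = x_bits & ((1 << L_BITS) - 1)
    return x_h, x_l


def approx_sin(x_bits):
    """First-order estimate f(xH) + xL * f'(xH): two look-ups, one multiply, one add."""
    index = x_bits >> L_BITS
    x_l = (x_bits & ((1 << L_BITS) - 1)) / SCALE
    return TABLE_F[index] + x_l * TABLE_DF[index]


x_h, x_l = split(0b110101)
```

With these sizes xL is below 2^-2, so the dropped second-order term is at most xL^2/2 < 1/32 for sin. Widening xH shrinks the error but doubles both tables per added bit.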

  16. Taylor Series
      • With an extra order => 1 extra table and 1 extra multiplier
      • To change the function, all you have to do is change the content of the table
      • Problem? => Now it's the size of the table!
      [Figure: a table addressed by L bits has 2^L entries]

  17. Taylor Series
      • Let's split x into 3 sections (instead of the previous 2, high and low):
        x = x1 + x2 2^-k + x3 2^-2k
        => f(x) = f(x1 + x2 2^-k) + x3 2^-2k f'(x1) + E, where E (epsilon) ~= 2^-3k
      • f(x) requires a 2^n x V_n table (n: # of bits of x; V_n: # of bits of f(x))
      • 32-bit x => 2^32 x 2^32 = 2^64 (64 bits) -> HUGE!! -> but do we really need all those numbers in the table?

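  18. Taylor Series
      Let E = epsilon, [ ] = lower limit (floor)
      x*y = (x+y)^2/4 - (x-y)^2/4
          = ([(x+y)/2] + E/2)^2 - ([(x-y)/2] + E/2)^2
          = [(x+y)/2]^2 - [(x-y)/2]^2 + E*y
      Here E is 0 when x+y is even and 1 when it is odd; x-y has the same parity, so the same E appears in both terms.
      The content of the lower bits determines the lower bits of the result, but not the other bits!
      [Figure: the two floored halves address a table of squares (x^2)]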
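
The identity can be checked with a short sketch that multiplies two unsigned integers using only a table of squares, shifts, adds, and one conditional add. The names `make_square_table` and `table_multiply` are illustrative, and swapping the operands so that x >= y is an assumption made here to keep the table index non-negative.

```python
def make_square_table(n_bits):
    """Table of k^2 for every n-bit k; floor((x+y)/2) never exceeds 2^n - 1."""
    return [k * k for k in range(1 << n_bits)]


def table_multiply(x, y, squares):
    """x*y = [(x+y)/2]^2 - [(x-y)/2]^2 + E*y, with [ ] meaning floor."""
    if x < y:
        x, y = y, x          # keep x - y non-negative
    a = (x + y) >> 1         # floor((x+y)/2)
    b = (x - y) >> 1         # floor((x-y)/2)
    e = (x + y) & 1          # E: parity shared by x+y and x-y
    return squares[a] - squares[b] + e * y


SQUARES_4BIT = make_square_table(4)
product = table_multiply(13, 7, SQUARES_4BIT)
```

Since E is a single bit, the term E*y is a conditional add rather than a multiplication.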

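  19. Taylor Series
      • Table size: 2^n x v  vs.  2^n x (v-w) + 2^L x w
        2^n x (v-w) + 2^L x w = 2^n x v - (2^n x w - 2^L x w)
                              = 2^n x v - w(2^n - 2^L)
      • The size of the table is reduced by w(2^n - 2^L) bits.
      [Figure: a single table, n-bit x -> 2^n x v table -> v-bit f(x), compared with a split design: n-bit x -> 2^n x (v-w) table -> upper v-w bits of f(x), and L bits of x -> 2^L x w table -> lower w bits of f(x).]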
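
To get a feel for the saving, the size expressions can be evaluated for concrete widths. The values n = 16, v = 16, w = 4, L = 8 are hypothetical, chosen only for illustration, and the function names are not from the lecture.

```python
def single_table_bits(n, v):
    """One table: 2^n entries of v bits."""
    return (1 << n) * v


def split_table_bits(n, v, w, l):
    """Upper table of 2^n x (v-w) bits plus lower table of 2^L x w bits."""
    return (1 << n) * (v - w) + (1 << l) * w


def bits_saved(n, w, l):
    """Reduction w(2^n - 2^L) from the derivation above."""
    return w * ((1 << n) - (1 << l))


# Hypothetical widths: n = 16, v = 16, w = 4, L = 8
full = single_table_bits(16, 16)
split_size = split_table_bits(16, 16, 4, 8)
saved = bits_saved(16, 4, 8)
```

The saving grows with w and with the gap between n and L, so the split pays off most when many low-order result bits depend on only a few input bits.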
