
Numerical Error Minimizing Floating-Point to Fixed-Point ANSI C Compilation

Numerical Error Minimizing Floating-Point to Fixed-Point ANSI C Compilation. Tor Aamodt and Paul Chow University of Toronto. Presentation Outline. Background / Motivation Floating-to-Fixed-Point Conversion Architectural Support Experimental Results Summary / Future Directions.


Presentation Transcript


  1. Numerical Error Minimizing Floating-Point to Fixed-Point ANSI C Compilation Tor Aamodt and Paul Chow University of Toronto

  2. Presentation Outline • Background / Motivation • Floating-to-Fixed-Point Conversion • Architectural Support • Experimental Results • Summary / Future Directions

  3. Background: University of Toronto DSP Project • Motivation: DSP Compiler/Architecture Co-design • First Generation Silicon (Sean Peng’s M.A.Sc. Thesis) taped out Sept. 30, 1999: 108 pin PGA / 0.35 µm CMOS / 63 MHz • 16-bit Fixed-Point VLIW with Two-Level Instruction Fetching • Harvard Memory Architecture • 5 stage pipeline: IF1 → IF2 → ID → EX → WB • 7 function units: • 2 integer units: 16.0 multiply & 1.15 multiply operations • 2 address units: modulo addressing • 2 memory units: each tied to one data memory bank • 1 control unit

  4. Background: Fixed-Point versus Floating-Point • 32 bit Floating-Point (IEEE): sign bit, 8 bit exponent (excess 127), 23+1 bit normalized mantissa • Fixed-Point: sign bit, integer part (IWL bits), fractional part

  5. Background: Fixed-Point versus Floating-Point • Dynamic range of |x|: fixed-point $[0, 2^{\mathrm{IWL}})$; floating-point $(2^{-126}, 2^{127})$ • Precision of x, $|\Delta x / x|$: fixed-point $|x|^{-1} \cdot 2^{(1 + \mathrm{IWL} - \mathrm{WL})}$; floating-point $2^{-23}$ • Function unit cost: significantly less for fixed-point. This factor motivates us to find ways of coping with the shortcomings of fixed-point representations.

  6. Motivation • Why convert floating-point code to fixed-point code? Saves area and power. • Why automate the process? Manual conversion is time-consuming and error-prone. • What qualities are we looking for in an automated conversion system? Good signal quality*. Fast code.

  7. Background: Fixed-Point Numerical Representations in Signal Processing • Consider a program P with associated inputs $x(k) \in S_P$. Example: P an IIR filter, $S_P$ the set of all human speech samples x(k). • Signal Scaling: Integer Word Length (IWL) • Definition: stated per signal (input, program variable, intermediate result, output), over all definitions of that signal and all inputs x, with “+ an infinitesimally small number” added to the logarithm. Why? e.g. $\lceil \log_2 2 \rceil = 1$ (one integer bit cannot represent the value 2)
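The role of the “infinitesimally small number” can be illustrated with a short C sketch. It assumes the IWL of a signal is the smallest integer for which the largest observed magnitude lies in $[0, 2^{\mathrm{IWL}})$, the fixed-point dynamic range given on slide 5. The helper name `iwl_from_max_abs` is hypothetical and does not come from the slides.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper (not from the slides): returns the smallest IWL
 * such that max_abs lies in [0, 2^IWL). For max_abs == 2 this yields 2,
 * whereas a plain ceil(log2(2)) would yield 1. */
int iwl_from_max_abs(double max_abs)
{
    assert(max_abs > 0.0 && max_abs <= DBL_MAX); /* rejects 0, NaN, infinity */

    int iwl = 0;
    double bound = 1.0; /* invariant: bound == 2^iwl */

    while (max_abs >= bound) {      /* range too small: add an integer bit */
        bound *= 2.0;
        iwl++;
    }
    while (max_abs < bound / 2.0) { /* range too large: drop an integer bit */
        bound /= 2.0;
        iwl--;
    }
    return iwl;
}
```

Under this reading, a signal whose magnitude never reaches 0.5 gets a negative IWL, which leaves more bits for the fractional part.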

  8. Background: Fixed-Point Arithmetic Operations • Addition / Subtraction: operands A and B are aligned with >> n (binary point alignment); overflow guard bits: >> 1 (IWL + 1) • Multiplication: A has $\mathrm{IWL}_A$, B has $\mathrm{IWL}_B$, and A*B has $\mathrm{IWL}_A + \mathrm{IWL}_B$

  9. Presentation Outline • Background Material / Motivation • Floating-to-Fixed-Point Conversion • Architecture Support • Experimental Results • Summary / Future Directions

  10. Conversion Process: Previous Work • ‘Worst-Case Evaluation’: Markus Willems et al. FRIDGE: An Interactive Code Generation Environment for HW/SW CoDesign. ICASSP, April 1997. • A ‘Statistical’ Approach: Ki-Il Kum, Jiyang Kang, and Wonyong Sung. A Floating-Point to Fixed-Point C Converter for Fixed-Point Digital Signal Processors. In Proc. 2nd SUIF Compiler Workshop, August 1997.

  11. Conversion Process: Overview • “sin(x)” → “utdsp_sin(x)” • float *p, x, y, A[N], B[N]; for( int i=0; i < N; i++ ){ p = (condition) ? A : B; y += x*p[i]; } • float fubar( float *p ) { float sum = 0.0; for( int i=0; i < N; i++) sum += p[i]; return sum; }

  12. Conversion Process: Collecting Dynamic Range Information • Consider the ANSI C code: float a, b, x[N]; y = a*x[i] + b*x[i+1]; • Equivalent expression tree: y = (a * x[i]) + (b * x[i+1]) • ID assignment: “0”: y, “1”: tmp_1, “2”: tmp_2 • Code instrumentation: tmp_1 = a*x[i]; tmp_2 = b*x[i+1]; y = tmp_1 + tmp_2; profile(tmp_1,1); profile(tmp_2,2); profile(y,0);
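A minimal sketch of what the instrumentation could look like in C, assuming that `profile(value, id)` only needs to remember the largest magnitude seen for each ID. The slides do not show the body of `profile`, so the table size `NUM_IDS`, the accessor `profiled_max_abs`, and the wrapper `instrumented_step` are hypothetical names introduced for illustration.

```c
#include <assert.h>

#define NUM_IDS 3 /* hypothetical: one slot per ID ("0": y, "1": tmp_1, "2": tmp_2) */

static float max_abs_seen[NUM_IDS]; /* zero-initialized running maxima */

/* Assumed behaviour of profile(): track the largest |value| per ID. */
void profile(float value, int id)
{
    assert(id >= 0 && id < NUM_IDS);
    float magnitude = value < 0.0f ? -value : value;
    if (magnitude > max_abs_seen[id])
        max_abs_seen[id] = magnitude;
}

/* Hypothetical accessor used after the profiling run. */
float profiled_max_abs(int id)
{
    assert(id >= 0 && id < NUM_IDS);
    return max_abs_seen[id];
}

/* The instrumented statement, with the intermediate results
 * made explicit so each one can be profiled separately. */
float instrumented_step(float a, float b, const float *x, int i)
{
    float tmp_1 = a * x[i];
    float tmp_2 = b * x[i + 1];
    float y = tmp_1 + tmp_2;
    profile(tmp_1, 1);
    profile(tmp_2, 2);
    profile(y, 0);
    return y;
}
```

After running the instrumented code over representative inputs, the recorded maxima give the measured dynamic range of each intermediate result, from which a measured IWL per node can be derived.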

  13. Conversion Process: Desired Result • Continuation of previous example: float a, b, x[N]; y = a*x[i] + b*x[i+1]; • Desired result: int a, b, x[N]; y = (a•x[i] >> 2) + b•x[i+1]; • 1. Type conversion (float becomes int) 2. Scaling operations (>> 2) 3. Fractional fixed-point operations (•)
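The converted expression can be sketched in portable C under the assumption that “•” denotes a 1.15 fractional multiply (1 sign bit, 15 fraction bits) on 16-bit operands, as listed for the integer units on slide 3. The function names `fmul_q15` and `scaled_sum` are hypothetical, and the shift by 2 is taken from the slide rather than derived from profile data.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed meaning of the fractional multiply: keep the upper part of the
 * 32-bit product. Right-shifting a negative value is implementation-defined
 * in C; mainstream compilers use an arithmetic shift. The corner case
 * (-1.0) * (-1.0), which overflows 1.15, is not handled in this sketch. */
static int16_t fmul_q15(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b; /* 2.30 format */
    return (int16_t)(product >> 15);           /* back to 1.15 */
}

/* y = (a . x[i] >> 2) + b . x[i+1], with the scaling shift on the first term. */
int16_t scaled_sum(int16_t a, int16_t b, const int16_t *x, int i)
{
    assert(x != 0 && i >= 0);
    int32_t term_a = fmul_q15(a, x[i]) >> 2; /* align binary points */
    int32_t term_b = fmul_q15(b, x[i + 1]);
    return (int16_t)(term_a + term_b);
}
```

For example, with a = 16384, b = 8192 and both samples equal to 16384, the first product is 8192, the shift reduces it to 2048, the second product is 4096, and the sum is 6144.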

  14. Conversion Process: Type Conversion / Scaling Operation Generation • Type conversion: {float, double} → int • Scaling operations are added to expression trees using a post-order traversal... • Two previous algorithms from the literature for generating scaling operations... • Neither uses intermediate result profile data; instead, they combine range information from leaf nodes in a bottom-up fashion. • Is useful information lost?

  15. Conversion Process: IRP: Using Intermediate Result Profile Data • ‘Worst-Case Evaluation’: Markus Willems et al. FRIDGE: An Interactive Code Generation Environment for HW/SW CoDesign. ICASSP, April 1997. • A ‘Statistical’ Approach: Ki-Il Kum, Jiyang Kang, and Wonyong Sung. A Floating-Point to Fixed-Point C Converter for Fixed-Point Digital Signal Processors. In Proc. 2nd SUIF Compiler Workshop, August 1997. • UTDSP Algorithms: IRP, IRP-SA • Each node has a measured IWL and a current IWL • Measured: IWL as determined by profiling • Current: IWL due to scaling operations within the node

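  16. Scaling Operation Generation • Example: “A op B”: the converted sub-expressions A and B each carry a measured and a current IWL ($\mathrm{IWL}_A^{\mathrm{measured}}$, $\mathrm{IWL}_A^{\mathrm{current}}$, $\mathrm{IWL}_B^{\mathrm{measured}}$, $\mathrm{IWL}_B^{\mathrm{current}}$); the node “A op B” has $\mathrm{IWL}_{A\,op\,B}^{\mathrm{measured}}$ from profiling, while the scaling operations to insert, and hence $\mathrm{IWL}_{A\,op\,B}^{\mathrm{current}}$, are still to be determined (“?”)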

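  17. IRP: Additive Operations • “A ± B” → “(A << $n_A$) ± (B >> [$n - n_B$])” where: $n_A = \mathrm{IWL}_A^{\mathrm{current}} - \mathrm{IWL}_A^{\mathrm{measured}}$, $n_B = \mathrm{IWL}_B^{\mathrm{current}} - \mathrm{IWL}_B^{\mathrm{measured}}$, $n = \mathrm{IWL}_A^{\mathrm{measured}} - \mathrm{IWL}_B^{\mathrm{measured}}$ • For example, assume |A| > |B| and $\mathrm{IWL}_{A+B}^{\mathrm{measured}} \le \mathrm{IWL}_A^{\mathrm{measured}}$: B is shifted right by n to align its binary point with A, and $\mathrm{IWL}_{A+B}^{\mathrm{current}} = \mathrm{IWL}_A^{\mathrm{measured}}$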
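The shift amounts for the additive rule can be worked through in a small C sketch. It assumes $n_B$ is B's current IWL minus B's measured IWL, which is what the alignment needs: after A is shifted to its measured IWL, B must move from its current IWL to A's measured IWL, a right shift of $n - n_B$. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical result record for the additive rule "A +/- B". */
typedef struct {
    int shl_a;          /* left shift applied to A: n_A */
    int shr_b;          /* right shift applied to B: n - n_B
                           (a negative value means a left shift) */
    int result_current; /* current IWL of the scaled sum */
} additive_scaling;

/* Sketch of the rule for the case on the slide: A is the larger operand
 * and the sum needs no more integer bits than A. */
additive_scaling irp_additive(int a_measured, int a_current,
                              int b_measured, int b_current)
{
    assert(a_measured >= b_measured); /* the slide's |A| > |B| case */

    int n_a = a_current - a_measured; /* unused integer bits in A */
    int n_b = b_current - b_measured; /* unused integer bits in B (assumed definition) */
    int n   = a_measured - b_measured;

    additive_scaling s = { n_a, n - n_b, a_measured };
    return s;
}
```

With A at measured IWL 3 and current IWL 4, and B at measured and current IWL 1, A is shifted left by 1, B is shifted right by 2, and the sum has a current IWL of 3.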

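  18. IRP: Multiplication • “A • B” → “(A << $n_A$) • (B << $n_B$)” where: $n_A = \mathrm{IWL}_A^{\mathrm{current}} - \mathrm{IWL}_A^{\mathrm{measured}}$, $n_B = \mathrm{IWL}_B^{\mathrm{current}} - \mathrm{IWL}_B^{\mathrm{measured}}$ • Note: typo in notes! The notes give $\mathrm{IWL}_{A \cdot B}^{\mathrm{current}} = n_A + n_B$; the correct result is $\mathrm{IWL}_{A \cdot B}^{\mathrm{current}} = \mathrm{IWL}_A^{\mathrm{measured}} + \mathrm{IWL}_B^{\mathrm{measured}}$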
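The multiplicative rule can be sketched the same way. The sketch assumes each operand is normalized using its own IWLs, so that $n_B$ is B's current IWL minus B's measured IWL, and that the product's current IWL is the sum of the measured IWLs as the slide's correction states. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical result record for the multiplicative rule "A . B". */
typedef struct {
    int shl_a;          /* left shift applied to A: n_A */
    int shl_b;          /* left shift applied to B: n_B */
    int result_current; /* current IWL of the product */
} multiplicative_scaling;

multiplicative_scaling irp_multiply(int a_measured, int a_current,
                                    int b_measured, int b_current)
{
    /* Assumption of this sketch: an operand never carries fewer integer
     * bits than profiling says it needs, so both shifts are non-negative. */
    assert(a_current >= a_measured && b_current >= b_measured);

    int n_a = a_current - a_measured; /* remove redundant integer bits of A */
    int n_b = b_current - b_measured; /* remove redundant integer bits of B */

    multiplicative_scaling s = { n_a, n_b, a_measured + b_measured };
    return s;
}
```

Shifting both operands left before multiplying keeps as many significant bits as possible in the truncated product.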

  19. IRP-SA: Using ‘Shift Absorption’ • Problem: y = (a*x[i] + (b*x[i+1] >> 1)) << 1 • Question: Is information discarded unnecessarily here? • Answer: Yes! Consider the following alternative: y = (a*x[i] << 1) + b*x[i+1] • Assuming 2’s-complement arithmetic, this expression results in a more precise answer.
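The precision difference is easy to see with plain integers. In the sketch below, `t1` and `t2` stand for the already-computed products a*x[i] and b*x[i+1]; the function names are hypothetical. It uses 32-bit non-negative values so that no overflow or implementation-defined shift occurs, which means it illustrates only the lost low-order bit and not the wraparound behaviour that the 2’s-complement remark refers to.

```c
#include <assert.h>
#include <stdint.h>

/* Form without shift absorption: (t1 + (t2 >> 1)) << 1.
 * The right shift discards the least significant bit of t2. */
int32_t without_absorption(int32_t t1, int32_t t2)
{
    assert(t1 >= 0 && t2 >= 0);
    return (t1 + (t2 >> 1)) << 1;
}

/* Form with the final left shift absorbed into the first operand:
 * (t1 << 1) + t2. No bit of t2 is discarded. */
int32_t with_absorption(int32_t t1, int32_t t2)
{
    assert(t1 >= 0 && t2 >= 0);
    return (t1 << 1) + t2;
}
```

For t1 = 100 and t2 = 7, the exact value 2*t1 + t2 is 207. The first form returns 206 and the second returns 207.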

  20. Presentation Outline • Background Material / Motivation • Floating-to-Fixed-Point Conversion • Architecture Support • Experimental Results • Summary / Future Directions

  21. Architectural Support • Common occurrence (using IRP-SA): A•B << n • Fractional Multiplication with integrated Left Shift (FMLS) [Figure: operands A and B, the product A*B, and the left shift applied to the product]
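One plausible reading of the integrated left shift, sketched in C: rather than truncating the 32-bit product to 16 bits and then shifting left (which fills the low bits with zeros), the hardware selects a 16-bit window from the full product that sits n bits lower. This is an assumption about the operation, not a description of the UTDSP datapath, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fractional multiply followed by a separate left shift by n.
 * The shift is written as a multiplication because left-shifting a
 * negative signed value is undefined behaviour in C. */
int16_t fmul_then_shift(int16_t a, int16_t b, int n)
{
    assert(n >= 0 && n <= 15);
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)((product >> 15) * ((int32_t)1 << n));
}

/* Assumed FMLS behaviour: take the 16-bit result n bits lower in the
 * 32-bit product, so n extra low-order product bits survive.
 * Results that do not fit in 16 bits are not handled in this sketch;
 * right-shifting a negative value is implementation-defined. */
int16_t fmls(int16_t a, int16_t b, int n)
{
    assert(n >= 0 && n <= 15);
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)(product >> (15 - n));
}
```

With a = b = 1000 and n = 2, the product is 1000000. The two-step form gives (1000000 >> 15) * 4 = 120, while the integrated form gives 1000000 >> 13 = 122, keeping two bits the separate shift would have zeroed.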

  22. Presentation Outline • Background Material / Motivation • Floating-to-Fixed-Point Conversion • Architecture Support • Experimental Results • Summary / Future Directions

  23. Experimental Results • Four test-cases presented in paper: (1) 4th Order IIR Filter (2) 1024 Point Radix 2 Decimation in Time FFT (3) Nonlinear Feedback Control System (4) 16th Order Lattice Filter • Look at (1) in detail, summarize results for others. • Explore some interesting properties exhibited in (4) that are indicative of possible future improvements.

  24. Experimental Results: 4th Order IIR Filter • 4th Order Chebyshev Type II Low-Pass Filter • Designed using MATLAB’s cheby2 command • Transfer Function: [Figure: magnitude (dB) and phase (degrees) versus normalized frequency (×π rad/sample)]

  25. Experimental Results: 4th Order IIR Filter (cont’d) • Filter Realization: • MATLAB’s tf2sos command (pole-zero pairing) • 2 Cascaded Direct-Form IIR filters
      Algorithm | 14 Bit w/o FMLS | 14 Bit w/ FMLS | 16 Bit w/o FMLS | 16 Bit w/ FMLS
      SNU-4     | 44.7 dB         | 44.7 dB        | 56.4 dB         | 56.4 dB
      WC        | 45.6 dB         | 45.6 dB        | 57.1 dB         | 57.1 dB
      IRP       | 49.2 dB         | 49.3 dB        | 60.9 dB         | 62.0 dB
      IRP-SA    | 48.8 dB         | 53.5 dB        | 61.0 dB         | 66.9 dB

  26. Experimental Results: 4th Order IIR Filter (cont’d) • IRP: (A2[0]*t2 - A2[1]*D2[0] << 1) + (A2[2]*D2[1] << 1 ) << 2 • IRP-SA: (A2[0]*t2 << 3) - (A2[1]*D2[0] << 3) + (A2[2]*D2[1] << 3)

  27. Experimental Results: 1024-Point Radix-2 FFT
      Algorithm | 14 Bit w/o FMLS | 14 Bit w/ FMLS | 16 Bit w/o FMLS | 16 Bit w/ FMLS
      SNU-4     | 28.7 dB         | 28.7 dB        | 36.7 dB         | 36.7 dB
      WC        | 28.7 dB         | 28.7 dB        | 36.7 dB         | 36.7 dB
      IRP       | 28.7 dB         | 34.9 dB        | 36.7 dB         | 44.6 dB
      IRP-SA    | 28.7 dB         | 34.9 dB        | 36.7 dB         | 44.6 dB

  28. Experimental Results: Rotational Inverted Pendulum • U of T System Control Group • Non-linear Testbench

  29. Experimental Results: Rotational Inverted Pendulum
      Algorithm | 14 Bit w/o FMLS | 14 Bit w/ FMLS | 16 Bit w/o FMLS | 16 Bit w/ FMLS
      SNU-4     | 4.0 dB          | 42.7 dB        | 30.7 dB         | 54.9 dB
      WC        | 47.3 dB         | 54.3 dB        | 59.2 dB         | 66.1 dB
      IRP       | 53.1 dB         | 58.4 dB        | 65.8 dB         | 71.8 dB
      IRP-SA    | 52.8 dB         | 59.4 dB        | 64.4 dB         | 72.0 dB

  30. Experimental Results: Rotational Inverted Pendulum - 12-bit Controller Comparison • WC: 32.8 dB • IRP-SA: 41.1 dB • IRP-SA w/ FMLS: 48.0 dB

  31. Experimental Results: 16th Order Lattice Filter • 16th Order Elliptic Bandpass Filter Transfer Function [Figure: magnitude (dB) and phase (degrees) versus normalized frequency (×π rad/sample)]

  32. Experimental Results: Lattice Filter
      Algorithm | 32 Bit w/o Loop Unrolling, w/o FMLS | 32 Bit w/o Loop Unrolling, w/ FMLS | 16 Bit w/ Loop Unrolling, w/o FMLS | 16 Bit w/ Loop Unrolling, w/ FMLS
      SNU-4     | 22.8 dB | 22.8 dB | 47.1 dB | 47.0 dB
      WC        | 28.1 dB | 28.1 dB | 48.3 dB | 48.3 dB
      IRP       | 36.1 dB | 36.2 dB | 51.3 dB | 51.3 dB
      IRP-SA    | 36.1 dB | 36.2 dB | 51.3 dB | 50.9 dB

  33. Experimental Results: Lattice Filter
      #define N 16
      double state[N+1], K[N], V[N+1];
      double lattice( double x ) {
        double y = 0.0;
        for( int i=0; i < N; i++ ) {
          x = x - K[N-i-1] * state[N-i-1];
          state[N-i] = state[N-i-1] + K[N-i-1]*x;
          y = y + V[N-i]*state[N-i];
        }
        state[0] = x;
        return y + V[0]*state[0];
      }

  34. Experimental Results: Lattice Filter • Observation: Wide dynamic ranges of “state”, “V”, “x”, and “y” are due to ‘name dependencies’ of array elements and accumulators when assigning integer word lengths. • Can use loop unrolling + renaming to break dependencies and achieve far better results (iteration-dependent analysis is mentioned in the FRIDGE paper; however, no experimental results are reported)

  35. Presentation Outline • Background Material / Motivation • Floating-to-Fixed-Point Conversion • Architecture Support • Experimental Results • Summary / Future Directions

  36. Summary • Intermediate result profile data can be used to reduce the numerical error of fixed-point code. • A fractional multiply with integrated left shift operation can improve the results, especially when combined with the IRP-SA algorithm. • Improvements between 3.0 dB and 12.8 dB have been observed so far.

  37. Future Directions • Structural Transformations • Extended Precision Arithmetic • Overflows due to accumulated rounding error: use two profiling phases to estimate the effect of ‘second-order’ interactions.
