Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Tor Aamodt and Paul Chow
University of Toronto
{aamodt, pc}@eecg.utoronto.ca
3rd ACM International Conference on Compilers, Architectures and Synthesis for Embedded Systems, Nov. 17-18, 2000, San Jose, CA

What is this presentation about?
• FOCUS: Signal processing applications developed using high-level language representation and floating-point data types...
• WANT: Faster fixed-point software development...
• QUESTION: Are there “better” fixed-point DSP instruction sets in terms of runtime, power, or roundoff-noise performance?

Presentation Outline
• Motivation & Background
• Focus on…
  • Automatic Conversion to Fixed-Point
  • Architectural Enhancements
• Some Experimental Results
• Summary / Future Directions

Motivation
• 80% of DSPs in use are fixed-point. Why?
• Because fixed-point hardware is cheaper and uses less power…
• …however, it is much harder to develop signal-processing software for.

Background
• UTDSP Project: DSP Compiler/Architecture Co-design
  • Traditional DSP architectures are hard for compilers to generate efficient code for, e.g. extended-precision accumulators
  • First-generation silicon, Sept. 30, 1999: 108-pin PGA, 0.35 µm CMOS, 63 MHz (Sean Peng’s M.A.Sc.)
  • 16-bit fixed-point VLIW DSP with a novel 2-level instruction-fetching architecture (reduced pin count)
• June 2000: Synopsys CoCentric Fixed-Point Designer Tool
  • First commercial tool for transforming floating-point ANSI C programs into fixed-point ($20,000 US)

Background: Fixed-Point versus Floating-Point
• 32-bit Floating-Point (IEEE), explicit binary point: sign bit | 8-bit exponent (excess 127) | 23+1 bit normalized mantissa
• Fixed-Point, implied binary point: sign bit | integer part | fractional part
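
To make the implied binary point concrete, here is a minimal sketch in C. It assumes a 16-bit word (as on the UTDSP) and describes the binary-point position by the number of integer bits, `iwl`. The names `frac_bits`, `to_fixed`, and `to_real` are illustrative and do not come from the presentation.

```c
#include <stdint.h>

#define WORD_BITS 16  /* assumed word size */

/* Fractional bits left after the sign bit and `iwl` integer bits.
 * This sketch is only valid for 0 <= iwl <= 15. */
int frac_bits(int iwl)
{
    return WORD_BITS - 1 - iwl;
}

/* Quantize a real value to a 16-bit word with `iwl` integer bits.
 * Rounds to nearest; the caller must ensure the value is in range
 * because no saturation is performed here. */
int16_t to_fixed(double value, int iwl)
{
    double scaled = value * (double)(1L << frac_bits(iwl));
    return (int16_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Recover the real value that a raw word represents. */
double to_real(int16_t raw, int iwl)
{
    return (double)raw / (double)(1L << frac_bits(iwl));
}
```

The same raw word means different things under different `iwl` values: 16384 is 0.5 with `iwl = 0` but 1.0 with `iwl = 1`. The hardware never stores the binary point, so the compiler has to track it.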

Background: Using Fixed-Point Arithmetic
Floating-Point: $y_n = y_{n-1} + x_n$
Fixed-Point (with explicit scaling operations): $y_n = ((\,\cdot\, y_{n-1} \gg 3) + x_n) \ll 1$

Automatic Conversion Process
Traditional Optimizing Compiler: Input Program → Parser → Optimizer → Code Generator → Processor
• CONSTRAINT: Input/Output Invariance
• GOAL: Application Speedup
i.e. make code faster, but do not break anything!

Automatic Conversion Process
[Diagram: the traditional optimizing compiler flow (Input Program, Parser, Optimizer, Code Generator, Processor) extended with a Floating-Point to Fixed-Point Translator that takes Sample Inputs.]
• “RELAX” CONSTRAINTS…
• GOALS:
  • “Good” Input/Output Fidelity (e.g. good signal-to-noise ratio)
  • Fast/Low-Power Operation (10-500× faster than FP emulation)

Floating-Point to Fixed-Point Translation
Floating-point source:
float a, b, x[N];
y = a*x[i] + b*x[i+1];
Translation steps:
1. Type Conversion
2. Scaling Operations
3. Fractional Fixed-Point Operations
Fixed-point result (• denotes a fractional multiply):
int a, b, x[N];
y = (a•x[i] >> 2) + b•x[i+1];
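
The fixed-point result can be written out in plain C roughly as follows. This is a sketch under assumptions: a 16-bit word, a Q15-style fractional multiply that keeps the upper bits of the 32-bit product, and the shift amount of 2 copied from the slide, where it would come from profiling. The helper names `frac_mul` and `fixed_y` are hypothetical.

```c
#include <stdint.h>

/* Fractional multiply: 16x16 -> 32-bit product, then drop 15 fractional
 * bits. Right-shifting a negative value is implementation-defined in C;
 * this sketch assumes the usual arithmetic shift. */
int16_t frac_mul(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * (int32_t)b) >> 15);
}

/* y = (a . x[i] >> 2) + b . x[i+1], with the scaling shift made explicit.
 * The shift aligns the binary points of the two products before adding. */
int16_t fixed_y(int16_t a, int16_t b, const int16_t *x, int i)
{
    return (int16_t)((frac_mul(a, x[i]) >> 2) + frac_mul(b, x[i + 1]));
}
```

Note that plain C needs the parentheses around the shifted product, because `>>` binds less tightly than `+`.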

Floating-Point to Fixed-Point Translator
[Diagram, stages shown: SUIF Parser*, Optimizer, Identifier Assignment, Fixed-Point Conversion, Instrument Code, Profile (with Sample Inputs).]
*SUIF = Stanford University Intermediate Format. See: http://suif.stanford.edu

Collecting Dynamic Range Information
Consider the ANSI C code:
float a, b, x[N];
y = a*x[i] + b*x[i+1];
Code Instrumentation:
tmp_1 = a*x[i];
tmp_2 = b*x[i+1];
y = tmp_1 + tmp_2;
profile(tmp_1,1);
profile(tmp_2,2);
profile(y,0);
Equivalent Expression Tree with ID Assignment:
“0”: y = tmp_1 + tmp_2
  “1”: tmp_1 = a * x[i]
  “2”: tmp_2 = b * x[i+1]
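
One plausible shape for the profiling hook is sketched below. The call form `profile(value, id)` is taken from the slide; what it records is an assumption here (the largest magnitude seen for each identifier), and `profiled_max` and `instrumented_y` are hypothetical names added for illustration.

```c
#define NUM_IDS 3  /* identifiers "0", "1", "2" from the expression tree */

static float max_abs[NUM_IDS];  /* largest magnitude seen per identifier */

/* Record the dynamic range of one intermediate result. */
void profile(float value, int id)
{
    float mag = value < 0.0f ? -value : value;
    if (mag > max_abs[id])
        max_abs[id] = mag;
}

/* Accessor used after the profiling run. */
float profiled_max(int id)
{
    return max_abs[id];
}

/* Instrumented version of y = a*x[i] + b*x[i+1]. */
float instrumented_y(float a, float b, const float *x, int i)
{
    float tmp_1 = a * x[i];
    float tmp_2 = b * x[i + 1];
    float y = tmp_1 + tmp_2;
    profile(tmp_1, 1);
    profile(tmp_2, 2);
    profile(y, 0);
    return y;
}
```

Profiling the sum separately matters when the operands tend to cancel: the recorded range of `y` can then be much smaller than the range of either product.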

Generating Scaling Operations
• Signal Scaling: Integer Word Length (IWL)
• Definition: $\mathrm{IWL}[x] = \lfloor \log_2 \max(|x|) \rfloor + 1$
[Figure: fixed-point word layout: sign bit | integer part (IWL bits) | fractional part]
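
As a worked illustration, the IWL can be computed from a profiled maximum magnitude without any math library. The sketch assumes the definition reads floor(log2(max|x|)) + 1. The function name `iwl_from_max` is hypothetical, and returning 0 for an all-zero signal is an arbitrary choice made here.

```c
/* Integer word length for a signal whose largest magnitude is max_abs.
 * Assumes max_abs is finite. A negative result means the signal never
 * uses the top fractional bits either. */
int iwl_from_max(double max_abs)
{
    int iwl = 1;  /* correct when 1 <= max_abs < 2 */

    if (max_abs <= 0.0)
        return 0;  /* assumption: no range information available */

    while (max_abs >= 2.0) {  /* each halving adds one integer bit */
        max_abs /= 2.0;
        iwl++;
    }
    while (max_abs < 1.0) {   /* each doubling removes one integer bit */
        max_abs *= 2.0;
        iwl--;
    }
    return iwl;
}
```

For example, a signal that peaks at 3.5 needs two integer bits, while one that peaks at 0.75 needs none.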

Generating Scaling Operations
Example: “A op B”
[Figure: expression tree in which the operator “op” combines the converted sub-expressions A and B. A, B, and the result A op B are each annotated with a measured IWL and a current IWL.]

Automatic Conversion Process: IRP: Using Intermediate Result Profile Data
• Previous Algorithms:
  • ‘Worst-Case Evaluation’: Markus Willems et al. FRIDGE: An Interactive Code Generation Environment for HW/SW CoDesign. ICASSP, April 1997. (a.k.a. predecessor to the Synopsys CoCentric Fixed-Point Designer Tool)
  • A ‘Statistical’ Approach: Ki-Il Kum, Jiyang Kang, and Wonyong Sung. A Floating-Point to Fixed-Point C Converter for Fixed-Point Digital Signal Processors. In Proc. 2nd SUIF Compiler Workshop, August 1997.
• Neither uses intermediate result profile data; instead, they combine range information from leaf nodes. Is useful information lost?

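IRP: Additive Operations
For example, assume $|A| > |B|$ and $\mathrm{IWL}_{A \pm B,\,\mathrm{measured}} \le \mathrm{IWL}_{A,\,\mathrm{measured}}$. Then:
$$A \pm B \;\Rightarrow\; (A \ll n_A) \pm (B \gg [n - n_B])$$
where:
$$n_A = \mathrm{IWL}_{A,\,\mathrm{current}} - \mathrm{IWL}_{A,\,\mathrm{measured}}$$
$$n_B = \mathrm{IWL}_{B,\,\mathrm{current}} - \mathrm{IWL}_{B,\,\mathrm{measured}}$$
$$n = \mathrm{IWL}_{A,\,\mathrm{measured}} - \mathrm{IWL}_{B,\,\mathrm{measured}}$$
$$\mathrm{IWL}_{A \pm B,\,\mathrm{current}} = \mathrm{IWL}_{A,\,\mathrm{measured}}$$
[Figure: operands A and B of “A ± B” drawn as aligned words, with B shifted right by n bits.]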
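
The additive rule amounts to a small piece of bookkeeping, sketched below. It assumes that n_B is computed from B's own current and measured IWLs, by symmetry with n_A. The struct and function names are hypothetical.

```c
/* Current and measured integer word lengths of one sub-expression. */
typedef struct {
    int current;
    int measured;
} Iwl;

/* Shifts to apply to the operands of A +/- B, plus the resulting IWL. */
typedef struct {
    int shift_a_left;   /* n_A */
    int shift_b_right;  /* n - n_B; a negative value means a left shift */
    int result_iwl;     /* current IWL of A +/- B */
} AddScaling;

/* Assumes |A| > |B| and that the measured IWL of the result does not
 * exceed the measured IWL of A, as on the slide. */
AddScaling irp_add_scaling(Iwl a, Iwl b)
{
    AddScaling s;
    int n_a = a.current - a.measured;  /* unused leading bits in A */
    int n_b = b.current - b.measured;  /* unused leading bits in B */
    int n = a.measured - b.measured;   /* binary-point alignment gap */

    s.shift_a_left = n_a;
    s.shift_b_right = n - n_b;
    s.result_iwl = a.measured;
    return s;
}
```

For instance, with A at current/measured IWL 2/1 and B at 0/-1, A is shifted left by 1, B is shifted right by 1, and the sum carries an IWL of 1.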

IRP: Multiplication
$$A \cdot B \;\Rightarrow\; (A \ll n_A) \cdot (B \ll n_B)$$
where:
$$n_A = \mathrm{IWL}_{A,\,\mathrm{current}} - \mathrm{IWL}_{A,\,\mathrm{measured}}$$
$$n_B = \mathrm{IWL}_{B,\,\mathrm{current}} - \mathrm{IWL}_{B,\,\mathrm{measured}}$$
$$\mathrm{IWL}_{A \cdot B,\,\mathrm{current}} = \mathrm{IWL}_{A,\,\mathrm{measured}} + \mathrm{IWL}_{B,\,\mathrm{measured}}$$
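
A small numeric sketch of the multiplication rule follows. It assumes 16-bit words, a Q15-style fractional multiply, and non-negative shift counts that do not overflow the word. The helper names are hypothetical.

```c
#include <stdint.h>

/* Fractional multiply: keep the upper bits of the 32-bit product.
 * Assumes an arithmetic right shift for negative products. */
int16_t frac_mul(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * (int32_t)b) >> 15);
}

/* Normalize both operands by n_a and n_b bits, then multiply.
 * Multiplying by a power of two stands in for the left shift, which
 * is undefined in C for negative values. */
int16_t irp_mul(int16_t a, int n_a, int16_t b, int n_b)
{
    int16_t a_norm = (int16_t)(a * (1 << n_a));
    int16_t b_norm = (int16_t)(b * (1 << n_b));
    return frac_mul(a_norm, b_norm);
}

/* IWL of the product: the sum of the operands' measured IWLs. */
int mul_result_iwl(int iwl_a_measured, int iwl_b_measured)
{
    return iwl_a_measured + iwl_b_measured;
}
```

Take A = 0.25 stored with a current IWL of 1 (raw 4096) but a measured IWL of -1, and B = 0.5 stored with IWL 0 (raw 16384). Then `irp_mul(4096, 2, 16384, 0)` returns 8192, which at the product IWL of -1 (16 fractional bits) represents 8192 / 65536 = 0.125.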

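IRP: Division
$$A / B \;\Rightarrow\; (A \gg [n_{\mathrm{dividend}} - n_A]) \,/\, (B \ll n_B)$$
where:
$$n_A = \mathrm{IWL}_{A,\,\mathrm{current}} - \mathrm{IWL}_{A,\,\mathrm{measured}}$$
$$n_B = \mathrm{IWL}_{B,\,\mathrm{current}} - \mathrm{IWL}_{B,\,\mathrm{measured}}$$
$$n_{\mathrm{diff}} = \mathrm{IWL}_{A/B,\,\mathrm{measured}} - \mathrm{IWL}_{A,\,\mathrm{measured}} + \mathrm{IWL}_{B,\,\mathrm{measured}}$$
$$n_{\mathrm{dividend}} = \begin{cases} n_{\mathrm{diff}}, & \text{if } n_{\mathrm{diff}} \ge 0 \\ 0, & \text{otherwise} \end{cases}$$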

IRP-SA: Using ‘Shift Absorption’
Example: y = (a*x[i] + (b*x[i+1]>>1)) << 1
Question: Is information discarded unnecessarily here?
Consider the following alternative: y = (a*x[i]<<1) + b*x[i+1]
BUT: Can we really discard most significant bits and get roughly the same answer? YES!
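
One reason the answer can be “yes” is that two's-complement addition wraps modulo 2^16, so an intermediate overflow cancels out whenever the final sum fits in the word. The sketch below illustrates that property only; it is not taken from the presentation. It assumes 16-bit wraparound (non-saturating) arithmetic and an arithmetic right shift. Here `p` and `q` stand for the two products, and all function names are hypothetical.

```c
#include <stdint.h>

/* Reduce a wider value to 16 bits with two's-complement wraparound. */
int16_t wrap16(int32_t v)
{
    uint16_t u = (uint16_t)v;  /* well-defined: reduces modulo 2^16 */
    return (int16_t)(u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* Original form: (p + (q >> 1)) << 1. The right shift drops q's LSB. */
int16_t sum_shift_after(int16_t p, int16_t q)
{
    return wrap16(((int32_t)p + (q >> 1)) * 2);
}

/* Shift absorbed into the first operand: (p << 1) + q.
 * The doubling may overflow 16 bits, but the wraparound cancels out
 * when the final sum fits in the word. */
int16_t sum_shift_absorbed(int16_t p, int16_t q)
{
    int16_t p2 = wrap16((int32_t)p * 2);
    return wrap16((int32_t)p2 + (int32_t)q);
}
```

With p = 20480 and q = -24575, doubling p overflows to -24576, yet the absorbed form still returns the exact sum 16385. The original form returns 16384 because q's least significant bit was shifted out before the addition.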

Architectural Support
Common occurrence (using IRP-SA): A•B << n
Fractional Multiplication with internal Left Shift
[Figure: operand A with integer word length IWL_A and operand B with integer word length IWL_B; the product A*B has IWL_A + IWL_B integer bits, with a shift of n bits marked on it.]
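
A software model of the idea might look like the sketch below. It rests on an assumption about the operation's semantics: the output word is taken from the full 32-bit product n bits lower than a normal fractional multiply would take it, so n most significant bits are discarded and n extra least significant bits are kept. The names `fmls`, `frac_mul`, and `wrap16` are illustrative, not the UTDSP mnemonics.

```c
#include <stdint.h>

/* Reduce a wider value to 16 bits with two's-complement wraparound. */
int16_t wrap16(int32_t v)
{
    uint16_t u = (uint16_t)v;  /* well-defined: reduces modulo 2^16 */
    return (int16_t)(u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* Ordinary fractional multiply (assumes arithmetic right shift). */
int16_t frac_mul(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * (int32_t)b) >> 15);
}

/* Fractional multiply with internal left shift by n (0 <= n <= 15).
 * The shift is applied to the 32-bit product before truncation, so
 * the low-order product bits survive. */
int16_t fmls(int16_t a, int16_t b, int n)
{
    return wrap16(((int32_t)a * (int32_t)b) >> (15 - n));
}
```

Compare `frac_mul(12345, 6789) << 2`, which gives 10228 with its two low bits forced to zero, against `fmls(12345, 6789, 2)`, which gives 10230. The separate shift cannot recover bits that the multiply has already truncated.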

Experimental Results: Benchmarks
• 4th Order Cascaded/Parallel IIR Filter (IIR-C, IIR-P)
• (Normalized) Lattice Filter (LAT, NLAT)
• 128-Point Radix-2 Decimation-in-Time FFT (FFT-NR, FFT-MW)
• Levinson-Durbin Recursion (LEVDUR)
• 10x10 Matrix Multiply (MMUL10)
• Nonlinear Control (INVPEND)
• Trig Function (SIN)

SQNR Enhancement: FMLS and/or IRP-SA

What Is the Effect of “Shift Absorption”?

Experimental Results: Rotational Inverted Pendulum
U of T System Control Group Non-linear Testbench

Closed-Loop System Response: Rotational Inverted Pendulum
12-bit Controller Comparison
• WC: 32.8 dB
• IRP-SA: 41.1 dB
• IRP-SA w/ fmls: 48.0 dB
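
To put these figures in terms of word length, one can apply the usual rule of thumb that each additional bit of precision is worth about 6.02 dB of signal-to-quantization-noise ratio. This is a rough conversion, assuming the dB values above are SQNR figures measured on the same signal:

```latex
\begin{align*}
\Delta_{\text{IRP-SA}} &= 41.1 - 32.8 = 8.3\ \text{dB}
  \;\approx\; \frac{8.3}{6.02} \approx 1.4\ \text{bits} \\
\Delta_{\text{IRP-SA w/ fmls}} &= 48.0 - 32.8 = 15.2\ \text{dB}
  \;\approx\; \frac{15.2}{6.02} \approx 2.5\ \text{bits}
\end{align*}
```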

128-Point Radix-2 FFT (Generated by MATLAB Real-Time Workshop)

Speedup?
Rotational Inverted Pendulum: Fractional Multiply Output Shift Relative Frequencies

…Yup!

Speedup* Using FMLS

SQNR Enhancement for Various Output Shift Sets

Summary
• The Fractional Multiply with internal Left Shift (FMLS) operation can improve runtime and signal-to-noise performance: speedups of up to 35% and an SQNR enhancement equivalent to up to 2 bits, maybe even 4 bits (depending on how you choose to measure it).
• Easy VLSI implementation, and easy for the compiler to use.

Future Directions
• Higher-Level Transformations:
  • Automatic Generation of Block-Floating-Point...
  • Quantization Error Feedback…
  • BOTH need a signal-flow-graph representation… therefore probably need a better DSP language than ANSI C
• Variable Precision Arithmetic (How much precision does each operation need?)
