Common Subexpression Elimination Involving Multiple Variables for Linear DSP Synthesis

15th IEEE International Conference on Application Specific Architectures and Processors (ASAP)
Farzan Fallah, Advanced CAD Research, Fujitsu Labs. of America
Anup Hosangadi and Ryan Kastner, ECE Department, UCSB

Outline
• Introduction
• Arithmetic expressions and polynomial formulation
• Eliminating multiple variable common subexpressions
• Results
• Limitations of the proposed technique
• Conclusions

Introduction
• Multiplications by constants are encountered in many application areas
  • DSP transforms in audio, video, and image processing (DFT, DCT, IDCT, etc.)
  • Filtering operations in communications (FIR, IIR filters)
  • Multiple Input Multiple Output (MIMO) systems
  • Polynomials in computer graphics

Introduction
• Multiplication is expensive in hardware
• Decompose constant multiplications into shifts and additions
  • 13*X = $(1101)_2$*X = X + X<<2 + X<<3
• Signed digits can reduce the number of additions/subtractions
  • Canonical Signed Digits (CSD) (Knuth’74)
  • $(57)_{10} = (0111001)_2 = (100\bar{1}001)_{CSD}$
• Further reduction is possible by common subexpression elimination
  • Up to 50% reduction (R. Hartley, TCS’96)
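As a rough check of the CSD example, here is a minimal Python sketch of signed-digit recoding. It assumes the standard non-adjacent-form (NAF) construction, which yields the canonical signed-digit form. The helper names `to_csd`, `csd_string`, `csd_value`, and `adder_count` are hypothetical. `adder_count` assumes one adder or subtractor per non-zero digit after the first.

```python
def to_csd(c):
    """Canonical signed-digit (non-adjacent form) digits of a non-negative
    integer c, least significant digit first; each digit is -1, 0 or 1."""
    digits = []
    while c != 0:
        if c & 1:
            # c mod 4 == 1 -> digit +1, c mod 4 == 3 -> digit -1; either way
            # the remaining value is divisible by 4, so the next digit is 0.
            d = 2 - (c & 3)
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def csd_string(digits):
    """Most significant digit first, writing -1 as '-1' like the slides."""
    return "".join(str(d) for d in reversed(digits))


def csd_value(digits):
    """Value of a signed-digit vector (least significant digit first)."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def adder_count(digits):
    """Additions/subtractions needed: one per non-zero digit after the first."""
    return max(sum(1 for d in digits if d != 0) - 1, 0)


digits_57 = to_csd(57)
print(csd_string(digits_57))  # 100-1001
binary_57 = [int(b) for b in bin(57)[2:]]
print(adder_count(binary_57), adder_count(digits_57))  # 3 2
```

For 57, the plain binary form needs 3 additions and the CSD form needs 2 additions/subtractions.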

Introduction
• Common subexpressions = common digit patterns
  • F1 = 7*X = $(0111)_2$*X = X + X<<1 + X<<2
  • F2 = 13*X = $(1101)_2$*X = X + X<<2 + X<<3
  • Computed separately: 4 additions, 4 shifts
• The digit pattern "0101" (X + X<<2) is common to both constants:
  • D1 = X + X<<2
  • F1 = D1 + X<<1
  • F2 = D1 + X<<3
  • Cost: 3 additions, 3 shifts
• Good for a single variable: FIR filters (transposed form)
• Multiple variables? (DFT, DCT, etc.)
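To make the operation counts concrete, the following Python sketch evaluates F1 and F2 with and without the shared pattern D1 and counts adders and shifters along the way. `OpCounter`, `separate`, and `shared` are hypothetical names used only for this illustration.

```python
class OpCounter:
    """Counts the adders and shifters used while evaluating shift-add expressions."""

    def __init__(self):
        self.adds = 0
        self.shifts = 0

    def add(self, a, b):
        self.adds += 1
        return a + b

    def shl(self, a, k):
        self.shifts += 1
        return a << k


def separate(x, ops):
    # F1 = X + X<<1 + X<<2 and F2 = X + X<<2 + X<<3, computed independently
    f1 = ops.add(ops.add(x, ops.shl(x, 1)), ops.shl(x, 2))
    f2 = ops.add(ops.add(x, ops.shl(x, 2)), ops.shl(x, 3))
    return f1, f2


def shared(x, ops):
    # D1 = X + X<<2 (digit pattern "0101") is computed once and reused
    d1 = ops.add(x, ops.shl(x, 2))
    f1 = ops.add(d1, ops.shl(x, 1))
    f2 = ops.add(d1, ops.shl(x, 3))
    return f1, f2


for impl in (separate, shared):
    ops = OpCounter()
    print(impl.__name__, impl(11, ops), ops.adds, ops.shifts)
# separate (77, 143) 4 4
# shared (77, 143) 3 3
```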

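Introduction
• Matrix form of linear systems
$$\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \times \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}$$
[Figure: Potkonjak (TCAD’95); all distinct $S_{ij}X_j$ and $C_{ik}D_k$; outputs $Y_1$, $Y_2$, $Y_3$]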

Arithmetic expressions & polynomial formulation
• View linear systems as a set of arithmetic expressions
  • Expressions consisting of +, -, and << operators
• Develop a methodology for extracting common subexpressions
• Polynomial formulation, where $L$ denotes a left shift by one bit ($XL^i$ = X<<i):
$$C \times X = \sum_i \left(\pm X \times L^i\right)$$
$$(14)_{10} \times X = (1110)_2 \times X = X \ll 3 + X \ll 2 + X \ll 1 = XL^3 + XL^2 + XL^1 = (100\bar{1}0)_{CSD} \times X = XL^4 - XL$$
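A small Python sketch of the polynomial formulation follows. It assumes each term $\pm XL^i$ is stored as a `(sign, i)` pair and that $L^i$ is evaluated as a left shift by i bits. `binary_terms`, `evaluate`, and `format_terms` are hypothetical helpers, and the CSD terms for 14 are copied from the slide rather than computed.

```python
def binary_terms(c):
    """(sign, i) for every 1 bit of c, so that C*X = sum of sign * X*L^i."""
    return [(1, i) for i in range(c.bit_length()) if (c >> i) & 1]


def evaluate(terms, x):
    """Evaluate sum(sign * X*L^i), reading L^i as a left shift by i bits."""
    return sum(sign * (x << i) for sign, i in terms)


def format_terms(terms, var="X"):
    """Render terms highest exponent first, e.g. 'XL^4 - XL'."""
    parts = []
    for sign, i in sorted(terms, key=lambda t: -t[1]):
        mono = var if i == 0 else f"{var}L" if i == 1 else f"{var}L^{i}"
        parts.append(("- " if sign < 0 else "+ ") + mono)
    return " ".join(parts).lstrip("+ ")


bin_14 = binary_terms(14)     # (1110)_2
csd_14 = [(1, 4), (-1, 1)]    # (100-10)_CSD, digits copied from the slide
print(format_terms(bin_14))   # XL^3 + XL^2 + XL
print(format_terms(csd_14))   # XL^4 - XL
```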

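Arithmetic expressions and polynomial formulation
• Example linear system:
$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} 5 & 7 \\ 4 & 12 \end{bmatrix} \times \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$$
• Polynomial formulation (5 = 0101, 7 = 0111, 4 = 0100, 12 = 1100), with each term numbered:
$$Y_1 = {}^{(1)}X_1 + {}^{(2)}X_1L^2 + {}^{(3)}X_2 + {}^{(4)}X_2L + {}^{(5)}X_2L^2$$
$$Y_2 = {}^{(6)}X_1L^2 + {}^{(7)}X_2L^2 + {}^{(8)}X_2L^3$$
• Cost: 6 shifts, 6 additions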
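The term numbering and the 6-shift, 6-addition cost can be reproduced with the following Python sketch. It assumes binary (not CSD) digits, as on the slide. It also assumes, as the slide's count does, that every shifted term costs one shift, even when the same shift appears twice. `polynomial_terms` and `naive_cost` are hypothetical names.

```python
def polynomial_terms(matrix):
    """Expand Y = A*X into labelled terms (label, row, column, exponent).

    Each 1 bit i of A[row][column] contributes the term X_column * L^i.
    Labels run 1, 2, ... row by row, column by column, low exponent first."""
    terms, label = [], 1
    for r, row in enumerate(matrix):
        for j, c in enumerate(row):
            for i in range(c.bit_length()):
                if (c >> i) & 1:
                    terms.append((label, r, j, i))
                    label += 1
    return terms


def naive_cost(terms, n_rows):
    """(additions, shifts) with no sharing: each output with k terms needs
    k - 1 additions, and every term with a non-zero exponent needs a shift."""
    adds = sum(max(sum(1 for t in terms if t[1] == r) - 1, 0) for r in range(n_rows))
    shifts = sum(1 for t in terms if t[3] > 0)
    return adds, shifts


A = [[5, 7],
     [4, 12]]
terms = polynomial_terms(A)
print(naive_cost(terms, len(A)))   # (6, 6)
```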

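Digit pattern matching techniques
• Digits of the constants:
        X1    X2
  Y1   0101  0111
  Y2   0100  1100
• The pattern "11" appears in the X2 column of both rows:
  • D1 = X2 + X2<<1
  • Y1 = X1 + X1<<2 + D1 + X2<<2
  • Y2 = X1<<2 + D1<<2
• Cost: 5 shifts, 5 additions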

Algebraic techniques for factoring and eliminating common subexpressions
• Algebraic methods in multi-level logic synthesis (MLLS)
  • Reduce the literal count in a set of Boolean expressions
  • Factoring and decomposition are established algebraic techniques
• These methods can be applied to linear arithmetic expressions as well:
  • D1 = X1 + X2<<2
  • Y1 = D1 + D1<<3 + X1<<3
  • Y2 = D1 + X2<<2

Finding candidate common subexpressions (kernels)
• Terminology
  • Divisor: an expression having at least one term with a zero exponent of L
    • e.g. $X_1 + X_2L + X_3L^2$ is a divisor
    • $X_1L + X_2L^2 + X_3L^2$ is not a divisor
  • Kernel: a divisor obtained from the original expression by division by a power of L
  • Co-kernel: the power of L used to obtain the kernel
• Example
  • $P = X_1L^3 + X_2L^3 + X_2L^2 + X_3$
  • Division by $L^2$: kernel $= X_1L + X_2L + X_2$; co-kernel $= L^2$
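The following minimal Python sketch illustrates the divisor test and division by a power of L. It assumes an expression is a list of `(variable, exponent)` pairs. The helper names `is_divisor` and `divide` are hypothetical.

```python
def is_divisor(expr):
    """A divisor has at least one term with a zero exponent of L."""
    return any(exp == 0 for _, exp in expr)


def divide(expr, k):
    """Divide by L^k: keep terms with exponent >= k, lowering it by k.
    Terms with a smaller exponent are not divisible by L^k and are dropped."""
    return [(var, exp - k) for var, exp in expr if exp >= k]


# P = X1L^3 + X2L^3 + X2L^2 + X3, terms written as (variable, exponent of L)
P = [("X1", 3), ("X2", 3), ("X2", 2), ("X3", 0)]
kernel = divide(P, 2)       # co-kernel L^2
print(kernel)               # [('X1', 1), ('X2', 1), ('X2', 0)]
print(is_divisor(kernel))   # True
```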

Kernel generation algorithm
• Recursively divide by the smallest non-zero exponent of L
$$Y_1 = {}^{(1)}X_1 + {}^{(2)}X_1L^2 + {}^{(3)}X_2 + {}^{(4)}X_2L + {}^{(5)}X_2L^2$$
$$Y_2 = {}^{(6)}X_1L^2 + {}^{(7)}X_2L^2 + {}^{(8)}X_2L^3$$
• Divide $Y_2$ by $L^2$
• Divide again by $L$
• Divide $Y_1$ by $L$

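Kernel generation
• All kernels and co-kernels (co-kernel in brackets) for the example linear system:
  • $({}^{(1)}X_1 + {}^{(2)}X_1L^2 + {}^{(3)}X_2 + {}^{(4)}X_2L + {}^{(5)}X_2L^2)\,[1]$
  • $({}^{(2)}X_1L + {}^{(4)}X_2 + {}^{(5)}X_2L)\,[L]$
  • $({}^{(2)}X_1 + {}^{(5)}X_2)\,[L^2]$
  • $({}^{(6)}X_1L^2 + {}^{(7)}X_2L^2 + {}^{(8)}X_2L^3)\,[1]$
  • $({}^{(6)}X_1 + {}^{(7)}X_2 + {}^{(8)}X_2L)\,[L^2]$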
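The listing can be regenerated with the following Python sketch of the recursive division. It assumes each term is `(term_id, variable, exponent)` and keeps the expression itself with co-kernel 1, as the listing does. It also treats a single-term quotient as not being a kernel. The function name `kernels` is hypothetical, and this is an illustration rather than the authors' implementation.

```python
def kernels(expr):
    """All (kernel, co-kernel exponent) pairs of one expression.

    expr is a list of (term_id, var, exp). The expression itself is kept
    with co-kernel L^0 = 1. It is then divided repeatedly by the smallest
    non-zero exponent of L; each quotient with at least two terms is a
    kernel, and its co-kernel is the accumulated power of L."""
    result = [(expr, 0)]
    current, co = expr, 0
    while True:
        nonzero = [exp for _, _, exp in current if exp > 0]
        if not nonzero:
            break
        m = min(nonzero)
        quotient = [(tid, var, exp - m) for tid, var, exp in current if exp >= m]
        if len(quotient) < 2:
            break  # a single term is not a kernel
        co += m
        result.append((quotient, co))
        current = quotient
    return result


Y1 = [(1, "X1", 0), (2, "X1", 2), (3, "X2", 0), (4, "X2", 1), (5, "X2", 2)]
Y2 = [(6, "X1", 2), (7, "X2", 2), (8, "X2", 3)]

for expr in (Y1, Y2):
    for kernel, co in kernels(expr):
        print(f"L^{co}:", kernel)
```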

Importance of kernels
• Theorem: there exists a k-term common subexpression iff there is a k-term "non-overlapping" intersection between at least two kernels
• Proof
  • If: a non-overlapping k-term intersection gives a k-term common subexpression
  • Only if: suppose there are 2 instances of a k-term subexpression
    • Case 1 ("divisor"): each instance is part of some kernel expression
    • Case 2 ("non-divisor"): dividing by the smallest non-zero exponent of L converts it into a "divisor"

Kernel generation
• e.g. $10X = (1010)_2 X = {}^{(1)}XL + {}^{(2)}XL^3$ and $14X = (1110)_2 X = {}^{(3)}XL + {}^{(4)}XL^2 + {}^{(5)}XL^3$
• Common subexpression $= XL + XL^3 = (X + XL^2)L$
• Kernels involved in the intersection:
  • $({}^{(1)}X + {}^{(2)}XL^2)$
  • $({}^{(3)}X + {}^{(4)}XL + {}^{(5)}XL^2)$
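Here is a Python sketch of intersecting two kernels cube by cube. It assumes kernels are lists of `(term_id, variable, exponent)` and that a cube is the `(variable, exponent)` pair. `cube_map` and `intersect` are hypothetical helpers.

```python
def cube_map(kernel):
    """Map each kernel cube (var, exp) to the id of the original term it covers."""
    return {(var, exp): tid for tid, var, exp in kernel}


def intersect(k1, k2):
    """Cubes common to two kernels, with the term ids they cover in each one."""
    c1, c2 = cube_map(k1), cube_map(k2)
    return {cube: (c1[cube], c2[cube]) for cube in sorted(c1.keys() & c2.keys())}


# Kernels with co-kernel L: 10*X / L and 14*X / L
k10 = [(1, "X", 0), (2, "X", 2)]               # X + XL^2
k14 = [(3, "X", 0), (4, "X", 1), (5, "X", 2)]  # X + XL + XL^2
print(intersect(k10, k14))   # {('X', 0): (1, 3), ('X', 2): (2, 5)}
```

The two common cubes, X and $XL^2$, multiplied back by the co-kernel L, give the shared subexpression $XL + XL^3$.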

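Overlapping kernels
• Consider $(1001001)_2 \times X = {}^{(1)}XL^6 + {}^{(2)}XL^3 + {}^{(3)}X$
• Kernels
  • $[1]\ ({}^{(1)}XL^6 + {}^{(2)}XL^3 + {}^{(3)}X)$
  • $[L^3]\ ({}^{(1)}XL^3 + {}^{(2)}X)$
• Their intersection $X + XL^3$ uses term (2) in both kernels, so the two instances overlap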
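One way to detect the overlap is to check whether the intersection reuses an original term id. The following Python sketch does this under the same `(term_id, variable, exponent)` representation. The names `intersection_term_ids` and `is_overlapping` are hypothetical.

```python
def intersection_term_ids(k1, k2):
    """Original term ids used by the cubes that two kernels have in common."""
    c1 = {(var, exp): tid for tid, var, exp in k1}
    c2 = {(var, exp): tid for tid, var, exp in k2}
    common = sorted(c1.keys() & c2.keys())
    return [c1[c] for c in common] + [c2[c] for c in common]


def is_overlapping(k1, k2):
    """True if some original term would be used by both instances."""
    ids = intersection_term_ids(k1, k2)
    return len(ids) != len(set(ids))


# (1001001)_2 * X: kernel with co-kernel 1 and kernel with co-kernel L^3
k_1 = [(1, "X", 6), (2, "X", 3), (3, "X", 0)]
k_L3 = [(1, "X", 3), (2, "X", 0)]
print(intersection_term_ids(k_1, k_L3))  # [3, 2, 2, 1]
print(is_overlapping(k_1, k_L3))         # True
```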

Finding kernel intersections
• $Y_1 = X_1 + X_1L^2 + X_2 + X_2L + X_2L^2$
• $Y_2 = X_1L^2 + X_2L^2 + X_2L^3$
• Form the Kernel Cube Matrix (KCM)
  • One row for each kernel generated
  • One column for each distinct kernel cube
  • Each non-zero element represents a term
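The KCM for the running example can be built with the following Python sketch. It assumes the kernels listed earlier, stores the original term id in each cell, and uses 0 for an empty cell. `KERNELS` and `build_kcm` are hypothetical names.

```python
# Kernels of Y1 and Y2 from the kernel-generation step, labelled by
# expression and co-kernel; each term is (term_id, var, exp).
KERNELS = [
    ("Y1[1]",   [(1, "X1", 0), (2, "X1", 2), (3, "X2", 0), (4, "X2", 1), (5, "X2", 2)]),
    ("Y1[L]",   [(2, "X1", 1), (4, "X2", 0), (5, "X2", 1)]),
    ("Y1[L^2]", [(2, "X1", 0), (5, "X2", 0)]),
    ("Y2[1]",   [(6, "X1", 2), (7, "X2", 2), (8, "X2", 3)]),
    ("Y2[L^2]", [(6, "X1", 0), (7, "X2", 0), (8, "X2", 1)]),
]


def build_kcm(kernels):
    """Kernel Cube Matrix: one row per kernel, one column per distinct cube.

    Returns (row labels, column cubes, matrix); a cell holds the id of the
    original term it represents, or 0 if the kernel lacks that cube."""
    columns = sorted({(var, exp) for _, terms in kernels for _, var, exp in terms})
    col_index = {cube: c for c, cube in enumerate(columns)}
    rows = [label for label, _ in kernels]
    matrix = [[0] * len(columns) for _ in kernels]
    for r, (_, terms) in enumerate(kernels):
        for tid, var, exp in terms:
            matrix[r][col_index[(var, exp)]] = tid
    return rows, columns, matrix


rows, columns, matrix = build_kcm(KERNELS)
print(columns)
for label, cells in zip(rows, matrix):
    print(f"{label:8}", cells)
```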

Finding kernel intersections
• Each rectangle with non-overlapping terms = a common subexpression
  • Rectangle: a set of rows and columns such that all elements are non-zero ("1")
• Search only for prime rectangles
  • Prime rectangle: a rectangle that is not covered by any other rectangle
  • A prime rectangle may have overlapping terms
  • Find a non-overlapping rectangle within the prime rectangle (MIR = Maximum Irredundant Rectangle)
• Value of a rectangle (R = number of rows, C = number of columns)
  • Value = number of additions/subtractions saved by selecting the rectangle
  • $V(R, C) = (R-1)(C-1)$: the R instances each used C-1 additions, and after extraction the C-term subexpression is computed once with C-1 additions
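The rectangle checks and the value formula are illustrated in the following Python sketch on the running example's KCM. It assumes the KCM is a dict mapping each row to `{cube: term_id}`. `KCM`, `is_rectangle`, `is_non_overlapping`, and `value` are hypothetical names.

```python
# KCM of the running example: row -> {kernel cube: original term id}
KCM = {
    "Y1[1]":   {"X1": 1, "X1L^2": 2, "X2": 3, "X2L": 4, "X2L^2": 5},
    "Y1[L]":   {"X1L": 2, "X2": 4, "X2L": 5},
    "Y1[L^2]": {"X1": 2, "X2": 5},
    "Y2[1]":   {"X1L^2": 6, "X2L^2": 7, "X2L^3": 8},
    "Y2[L^2]": {"X1": 6, "X2": 7, "X2L": 8},
}


def is_rectangle(rows, cols):
    """Every (row, column) cell is non-zero."""
    return all(c in KCM[r] for r in rows for c in cols)


def is_non_overlapping(rows, cols):
    """No original term is covered twice by the rectangle."""
    terms = [KCM[r][c] for r in rows for c in cols]
    return len(terms) == len(set(terms))


def value(rows, cols):
    """V(R, C) = (R - 1) * (C - 1) additions saved."""
    return (len(rows) - 1) * (len(cols) - 1)


for rows, cols in [(["Y1[1]", "Y2[L^2]"], ["X1", "X2", "X2L"]),
                   (["Y1[1]", "Y1[L]", "Y2[L^2]"], ["X2", "X2L"])]:
    print(rows, cols, is_rectangle(rows, cols),
          is_non_overlapping(rows, cols), value(rows, cols))
```

Both rectangles have value 2. The first one ($X_1 + X_2 + X_2L$ over terms 1, 3, 4 and 6, 7, 8) is non-overlapping. The second covers term 4 twice.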

Finding kernel intersections
• Selecting common subexpressions
  • Greedy selection of the most valuable non-overlapping rectangle in each iteration
• Exhaustive enumeration is very expensive
  • Worst case: $O(2^{MN})$ prime rectangles to be considered
  • M = number of expressions; N = bit-width
• A heuristic is required (ping-pong)
  • Start with a seed row/column
  • Build the rectangle by intersecting with other rows/columns
  • Complexity is linear in the number of rows/columns
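To give a feel for seed-based growth, the following Python sketch is a simplified, hypothetical take on the ping-pong idea. It is not the published heuristic. It grows a rectangle from a seed row only, and it never alternates between rows and columns. It adds whichever row gives the largest $(R-1)(C-1)$ without overlapping terms. It also rescans all rows at each step, so it is quadratic rather than linear in the number of rows. `KCM` and `grow_from_seed` are hypothetical names.

```python
# KCM of the running example: row -> {kernel cube: original term id}
KCM = {
    "Y1[1]":   {"X1": 1, "X1L^2": 2, "X2": 3, "X2L": 4, "X2L^2": 5},
    "Y1[L]":   {"X1L": 2, "X2": 4, "X2L": 5},
    "Y1[L^2]": {"X1": 2, "X2": 5},
    "Y2[1]":   {"X1L^2": 6, "X2L^2": 7, "X2L^3": 8},
    "Y2[L^2]": {"X1": 6, "X2": 7, "X2L": 8},
}


def grow_from_seed(kcm, seed):
    """Greedy rectangle growth from one seed row (simplified sketch).

    Start from the seed row's columns, then repeatedly add the row whose
    intersection gives the largest value (R-1)*(C-1), skipping rectangles
    whose terms overlap. Stop when no remaining row increases the value."""
    rows, cols, best = [seed], set(kcm[seed]), 0
    while True:
        choice = None
        for r in kcm:
            if r in rows:
                continue
            new_rows, new_cols = rows + [r], cols & set(kcm[r])
            terms = [kcm[x][c] for x in new_rows for c in new_cols]
            if len(terms) != len(set(terms)):
                continue  # overlapping terms cannot form one common subexpression
            v = (len(new_rows) - 1) * (len(new_cols) - 1)
            if v > best:
                best, choice = v, (new_rows, new_cols)
        if choice is None:
            return rows, sorted(cols), best
        rows, cols = choice


print(grow_from_seed(KCM, "Y1[1]"))
# (['Y1[1]', 'Y2[L^2]'], ['X1', 'X2', 'X2L'], 2)
```

From the seed row $Y_1[1]$, this finds the rectangle for $D_1 = X_1 + X_2 + X_2L$, which saves 2 additions.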

Finding kernel intersections
[Figure: prime rectangle in the KCM and the MIR extracted from it]

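Extracting kernel intersections (1st iteration)
• Select $D_1 = X_1 + X_2 + X_2L$, covering terms (1), (3), (4) of $Y_1$ and (6), (7), (8) of $Y_2$; this saves 2 additions
• Rewritten expressions: $Y_1 = D_1 + X_1L^2 + X_2L^2$, $Y_2 = D_1L^2$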

Extracting kernel intersections (2nd iteration)
• Select $D_2 = X_1 + X_2$
• Final implementation:
  • D2 = X1 + X2
  • D1 = D2 + X2<<1
  • Y1 = D1 + D2<<2
  • Y2 = D1<<2
  • Cost: 3 shifts, 3 additions
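As a sanity check, the following Python sketch compares the digit-pattern network with the final kernel-based network. Both should compute $Y_1 = 5X_1 + 7X_2$ and $Y_2 = 4X_1 + 12X_2$. The function names are hypothetical, and the adder and shift counts in the docstrings come from counting the `+` and `<<` operators in each body.

```python
def digit_pattern_network(x1, x2):
    """Digit-pattern matching result: 5 additions, 5 shifts."""
    d1 = x2 + (x2 << 1)                    # D1 = X2 + X2<<1 = 3*X2
    y1 = x1 + (x1 << 2) + d1 + (x2 << 2)   # Y1 = 5*X1 + 7*X2
    y2 = (x1 << 2) + (d1 << 2)             # Y2 = 4*X1 + 12*X2
    return y1, y2


def kernel_network(x1, x2):
    """Rectangle-covering result: 3 additions, 3 shifts."""
    d2 = x1 + x2           # D2 = X1 + X2
    d1 = d2 + (x2 << 1)    # D1 = D2 + X2<<1 = X1 + 3*X2
    y1 = d1 + (d2 << 2)    # Y1 = D1 + D2<<2 = 5*X1 + 7*X2
    y2 = d1 << 2           # Y2 = D1<<2 = 4*X1 + 12*X2
    return y1, y2


# Both networks must compute Y = [[5, 7], [4, 12]] * X
for x1 in range(-8, 9):
    for x2 in range(-8, 9):
        expected = (5 * x1 + 7 * x2, 4 * x1 + 12 * x2)
        assert digit_pattern_network(x1, x2) == expected
        assert kernel_network(x1, x2) == expected
```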

Experimental setup
• Goals
  • Reduction in the number of additions/subtractions
  • Effect on area and latency after synthesis
• Transforms: DCT, IDCT, DFT, DST, DHT
  • 8x8 constant matrices
  • 16-digit precision (CSD representation)
• Compared with
  • Potkonjak (TCAD’95)
  • RESANDS (Nguyen et al., TVLSI’2000)

Experimental results

Experimental results • Synthesis results (Minimum Latency constraints)

Limitations of this technique
• Results depend on the initial representation of the constants
  • Mixed representation: too many choices, $O(3^N)$ per constant
• Factoring of constants
  • e.g. 105*X = 15*7*X = (16-1)*(8-1)*X: with D = (X<<4) - X, 105*X = (D<<3) - D
  • Factoring in general is very hard
• Common subexpressions with reversed signs
  • e.g. $(X_1 - X_2) = -(X_2 - X_1)$ cannot be detected
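The factoring example can be checked directly. In the following Python sketch (`times_105` is a hypothetical name), the factored form uses two subtractions. Both the binary form $(1101001)_2$ and the CSD form $(10\bar{1}01001)_{CSD}$ of 105 have four non-zero digits, so they need three additions/subtractions.

```python
def times_105(x):
    d = (x << 4) - x        # D = (16 - 1)*X = 15*X: one subtraction
    return (d << 3) - d     # (8 - 1)*D = 105*X: one more subtraction


# Without factoring, 105 = (1101001)_2 has four non-zero digits (three adders)
print(bin(105), bin(105).count("1"))   # 0b1101001 4
for x in range(-100, 101):
    assert times_105(x) == 105 * x
```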

Conclusions
• Contributions
  • Novel polynomial transformation
  • Adapting rectangle covering methods
  • Single-variable and multiple-variable subexpressions eliminated together => better results
• Future work
  • Addressing shortcomings of the current method
  • Optimization for timing and power

Conclusions • Thank you!! • Questions??
