
Common Subexpression Elimination Involving Multiple Variables for Linear DSP Synthesis

Common Subexpression Elimination Involving Multiple Variables for Linear DSP Synthesis. 15th IEEE International Conference on Application Specific Architectures and Processors (ASAP). Farzan Fallah, Advanced CAD Research, Fujitsu Labs. of America; Anup Hosangadi; Ryan Kastner.


Presentation Transcript


  1. Common Subexpression Elimination Involving Multiple Variables for Linear DSP Synthesis 15th IEEE International Conference on Application Specific Architectures and Processors (ASAP) Farzan Fallah Advanced CAD Research Fujitsu Labs. of America Anup Hosangadi Ryan Kastner ECE Department, UCSB

  2. Outline • Introduction • Arithmetic expressions and polynomial formulation • Eliminating multiple variable common subexpressions • Results • Limitations of proposed technique • Conclusions

  3. Introduction • Multiplications by constants are encountered in many application areas • DSP transforms in audio, video, and image processing (DFT, DCT, IDCT, etc.) • Filtering operations in communication (FIR, IIR filters) • Multiple Input Multiple Output (MIMO) systems • Polynomials in computer graphics

  4. Introduction • Multiplication is expensive in hardware • Decompose constant multiplications into shifts and additions • 13*X = (1101)_2*X = X + X<<2 + X<<3 • Signed digits can reduce the number of additions/subtractions • Canonical Signed Digits (CSD) (Knuth’74) • (57)_10 = (0111001)_2 = (100-1001)_CSD • Further reduction possible by common subexpression elimination • Up to 50% reduction (R. Hartley, TCS’96)
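
The shift-and-add decomposition and the CSD recoding on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `to_csd` and `shift_add` are hypothetical, and the recoding uses the standard non-adjacent-form procedure, which is assumed here to match the CSD representation the slide refers to.

```python
def to_csd(n):
    """Recode a non-negative integer into signed digits in {-1, 0, 1}.

    Digits are returned least significant first, and no two adjacent
    digits are both non-zero (the canonical signed digit property).
    """
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def shift_add(digits, x):
    """Multiply x by the constant encoded in digits using only shifts and adds."""
    return sum(d * (x << i) for i, d in enumerate(digits))


csd_57 = to_csd(57)            # 64 - 8 + 1, three non-zero digits
product = shift_add(csd_57, 3)  # same value as 57 * 3
```

Plain binary for 57 has four non-zero digits (three additions per multiplication), while the signed-digit form has three (two additions/subtractions), which is the saving the slide points at.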

  5. Introduction • Common subexpressions = common digit patterns • F1 = 7*X = (0111)*X = X + X<<1 + X<<2; F2 = 13*X = (1101)*X = X + X<<2 + X<<3 (cost: 4 +, 4 <<) • Common pattern “0101” => D1 = X + X<<2; F1 = D1 + X<<1; F2 = D1 + X<<3 (cost: 3 +, 3 <<) • Good for single variable: FIR filters (transposed form) • Multiple variables? (DFT, DCT, etc.)

  6. Introduction • Matrix form of linear systems: [Y1 Y2 Y3]^T = [a11 a12 a13; a21 a22 a23; a31 a32 a33] × [X1 X2 X3]^T • Potkonjak (TCAD’95) • [Figure: all distinct SijXj and CikDk; outputs Y1, Y2, Y3]

  7. Arithmetic expressions & Polynomial formulation • View linear systems as a set of arithmetic expressions • Expressions consisting of +, -, << operators • Develop methodology for extracting common subexpressions • Polynomial formulation: C×X = Σ_i (±X×L^i), where L^i denotes a left shift by i • (14)_10×X = (1110)_2×X = X<<3 + X<<2 + X<<1 = XL^3 + XL^2 + XL • (14)_10×X = (100-10)_CSD×X = XL^4 - XL

  8. Arithmetic expressions and Polynomial formulation • [Y1 Y2]^T = [5 7; 4 12] × [X1 X2]^T • Polynomial formulation • 5 = 0101, 7 = 0111, 4 = 0100, 12 = 1100 • Y1 = (1)X1 + (2)X1L^2 + (3)X2 + (4)X2L + (5)X2L^2 • Y2 = (6)X1L^2 + (7)X2L^2 + (8)X2L^3 • Cost: 6 <<, 6 +
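
As a rough illustration of the polynomial formulation, the sketch below expands a constant matrix into (variable, shift) terms and counts the operations. The names `polynomial_terms` and `cost` are hypothetical, and the code assumes plain binary constants with non-negative values; a CSD version would also need a sign per term.

```python
def polynomial_terms(matrix):
    """Expand each row of a non-negative integer matrix into (variable, shift) terms.

    A term (j, s) stands for Xj * L^s, i.e. Xj shifted left by s bits.
    """
    exprs = []
    for row in matrix:
        terms = []
        for j, c in enumerate(row):
            shift = 0
            while c:
                if c & 1:
                    terms.append((j + 1, shift))
                c >>= 1
                shift += 1
        exprs.append(terms)
    return exprs


def cost(exprs):
    """Count additions and shifts for a direct implementation of the expressions."""
    adds = sum(len(terms) - 1 for terms in exprs)
    shifts = sum(1 for terms in exprs for _, s in terms if s > 0)
    return adds, shifts


exprs = polynomial_terms([[5, 7], [4, 12]])
adds, shifts = cost(exprs)  # the slide's example: 6 additions, 6 shifts
```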

  9. Digit pattern matching techniques • Digit matrix (columns X1, X2): Y1 row = 0101, 0111; Y2 row = 0100, 1100 • D1 = X2 + X2<<1 • Y1 = X1 + X1<<2 + D1 + X2<<2 • Y2 = X1<<2 + D1<<2 • Cost: 5 <<, 5 +

  10. Algebraic techniques for factoring and eliminating common subexpressions • Algebraic methods in multi-level logic synthesis (MLLS) • Reducing literal count in a set of Boolean expressions • Factoring, decomposition: established algebraic techniques • Can be applied to linear arithmetic expressions as well • D1 = X1 + X2<<2; Y1 = D1 + D1<<3 + X1<<3; Y2 = D1 + X2<<2

  11. Finding candidate common subexpressions (kernels) • Terminology • Divisor: An expression having at least one term with a zero exponent of L • e.g. X1 + X2L + X3L^2 is a divisor • X1L + X2L^2 + X3L^2 is not a divisor • Kernel: Divisor obtained from the original expression by division by a power of L • Co-kernel: Power of L that is used to obtain the kernel • Example • P = X1L^3 + X2L^3 + X2L^2 + X3 • Division by L^2: kernel = X1L + X2L + X2; co-kernel = L^2

  12. Kernel generation algorithm • Recursively divide by the smallest non-zero exponent of L • Y1 = (1)X1 + (2)X1L^2 + (3)X2 + (4)X2L + (5)X2L^2 • Y2 = (6)X1L^2 + (7)X2L^2 + (8)X2L^3 • Divide Y1 by L » Divide again by L • Divide Y2 by L^2

  13. Kernel generation • All kernels and co-kernels for the example linear system • [1] ((1)X1 + (2)X1L^2 + (3)X2 + (4)X2L + (5)X2L^2) • [L] ((2)X1L + (4)X2 + (5)X2L) • [L^2] ((2)X1 + (5)X2) • [1] ((6)X1L^2 + (7)X2L^2 + (8)X2L^3) • [L^2] ((6)X1 + (7)X2 + (8)X2L)
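
The repeated division by the smallest non-zero exponent of L can be sketched as follows. This is a simplified reading of the slides, not the authors' implementation: the function name `kernels` is hypothetical, terms are (variable, exponent) pairs without signs or term numbers, the original expression is listed with co-kernel 1 as the slides do, and quotients with fewer than two terms are dropped on the assumption that a single term cannot form a shared subexpression.

```python
def kernels(terms):
    """Return (co_kernel_exponent, kernel_terms) pairs for one expression.

    terms is a list of (variable, exponent) pairs, where (v, e) means Xv * L^e.
    """
    result = [(0, sorted(terms))]  # the expression itself, co-kernel L^0 = 1
    current, co_kernel = list(terms), 0
    while True:
        nonzero = [e for _, e in current if e > 0]
        if not nonzero:
            break
        step = min(nonzero)
        # Dividing by L^step keeps only the terms with exponent >= step.
        current = [(v, e - step) for v, e in current if e >= step]
        co_kernel += step
        if len(current) < 2:
            break
        result.append((co_kernel, sorted(current)))
    return result


y1 = [(1, 0), (1, 2), (2, 0), (2, 1), (2, 2)]  # X1 + X1L^2 + X2 + X2L + X2L^2
y2 = [(1, 2), (2, 2), (2, 3)]                  # X1L^2 + X2L^2 + X2L^3
k1 = kernels(y1)  # co-kernels 1, L, L^2
k2 = kernels(y2)  # co-kernels 1, L^2
```

For the example system this reproduces the five kernel/co-kernel pairs listed on the slide.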

  14. Importance of Kernels • Theorem: There exists a k-term common subexpression iff there is a k-term “non-overlapping” intersection between at least two kernels • Proof • If: a non-overlapping k-term intersection => a k-term common subexpression • Only if: suppose there are 2 instances of a k-term subexpression • Case 1: “divisor” => each instance will be a part of some kernel expression • Case 2: “non-divisor” => dividing by the smallest non-zero exponent of L will convert it into a “divisor”

  15. Kernel generation • e.g. 10*X = (1010)*X = (1)XL + (2)XL^3; 14*X = (1110)*X = (3)XL + (4)XL^2 + (5)XL^3 • Common subexpression = XL + XL^3 = (X + XL^2)L • Kernels involved in the intersection: • ((1)X + (2)XL^2) • ((3)X + (4)XL + (5)XL^2)

  16. Overlapping kernels • Consider (1001001)*X • (1001001)*X = (1)XL^6 + (2)XL^3 + (3)X • Kernels • [1] ((1)XL^6 + (2)XL^3 + (3)X) • [L^3] ((1)XL^3 + (2)X)

  17. Finding kernel intersections • Y1 = X1 + X1L^2 + X2 + X2L + X2L^2 • Y2 = X1L^2 + X2L^2 + X2L^3 • Form Kernel Cube Matrix (KCM) • One row for each kernel generated • One column for each distinct kernel cube • Each non-zero element represents a term

  18. Finding kernel intersections • Each rectangle with non-overlapping terms = a common subexpression • Rectangle: Set of rows and columns such that all elements are ‘1’ • Search only for prime rectangles • Prime rectangle: Rectangle that is not covered by any other rectangle • Prime rectangle may have overlapping terms • Find a non-overlapping rectangle within the prime rectangle (MIR = Maximum Irredundant Rectangle) • Value of a rectangle (R = #Rows, C = #Cols) • Value = # of additions/subtractions saved by selecting rectangle • V(R,C) = (R-1)*(C-1)
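
A two-row rectangle and its value can be illustrated with set intersection. This is a minimal sketch under stated assumptions: the names `rectangle_value`, `y1_kernel`, and `y2_kernel` are hypothetical, the two kernels are taken from the example system (Y1 with co-kernel 1 and Y2 with co-kernel L^2), and no overlap check is performed because the two kernels come from different expressions.

```python
def rectangle_value(rows, cols):
    """Additions/subtractions saved by a rectangle with the given size."""
    return (rows - 1) * (cols - 1)


# Kernel cubes as (variable, exponent of L) pairs.
y1_kernel = {("X1", 0), ("X1", 2), ("X2", 0), ("X2", 1), ("X2", 2)}  # Y1, co-kernel 1
y2_kernel = {("X1", 0), ("X2", 0), ("X2", 1)}                        # Y2, co-kernel L^2

# Columns shared by both rows form a rectangle with 2 rows.
common = y1_kernel & y2_kernel
value = rectangle_value(2, len(common))  # (2 - 1) * (3 - 1)
```

The shared cubes are X1, X2, and X2L, so the rectangle has 2 rows and 3 columns and a value of 2.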

  19. Finding kernel intersections • Selecting common subexpressions • Greedy selection of the most valued non-overlapping rectangle in each iteration • This is very expensive • Worst case O(2^(MN)) prime rectangles to be considered • M = # of expressions; N = bit-width • Heuristic required (ping-pong) • Start with a seed row/column • Build rectangle by intersections with other rows/cols • Complexity = linear in #Rows/Columns

  20. Finding kernel intersections • [Figure not reproduced: KCM example with term labels 4, 5, 7, 8, showing alternative choices of MIR]

  21. Extracting kernel intersections (1st iteration) • Select D1 = X1 + X2 + X2L, saves 2 additions!

  22. Extracting kernel intersections (2nd iteration) • D2 = X1 + X2 • Final implementation • D2 = X1 + X2 • D1 = D2 + X2<<1 • Y1 = D1 + D2<<2 • Y2 = D1<<2 • Cost: 3 <<, 3 +
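
The final implementation can be checked numerically against the original system Y1 = 5*X1 + 7*X2, Y2 = 4*X1 + 12*X2. The sketch below uses hypothetical function names and takes Y2 as D1<<2, the shift needed to produce 4*X1 + 12*X2 from D1 = X1 + 3*X2.

```python
def original(x1, x2):
    """Reference result using ordinary constant multiplications."""
    return 5 * x1 + 7 * x2, 4 * x1 + 12 * x2


def optimized(x1, x2):
    """Shift-and-add version: 3 additions and 3 shifts in total."""
    d2 = x1 + x2         # shared by D1 and Y1
    d1 = d2 + (x2 << 1)  # X1 + 3*X2
    y1 = d1 + (d2 << 2)  # (X1 + 3*X2) + 4*(X1 + X2) = 5*X1 + 7*X2
    y2 = d1 << 2         # 4*(X1 + 3*X2) = 4*X1 + 12*X2
    return y1, y2


matches = all(
    original(a, b) == optimized(a, b)
    for a in range(-8, 9)
    for b in range(-8, 9)
)
```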

  23. Experimental Setup • Goal • Reduction in #additions/subtractions • Effect on area/latency on synthesis • Transforms: DCT, IDCT, DFT, DST, DHT • 8x8 constant matrices • 16 digits precision (CSD representation) • Compare with • Potkonjak (TCAD’95) • RESANDS (Nguyen et al., TVLSI’2000)

  24. Experimental results

  25. Experimental results • Synthesis results (Minimum Latency constraints)

  26. Limitations of this technique • Results dependent on initial representation of constants • Mixed representation • Too many: O(3^N) per constant • Factoring of constants • e.g. 105*X = 15*7*X = (16-1)*(8-1)*X; with D = (X<<4) - X, 105*X = (D<<3) - D • Factoring in general is very hard • Common subexpressions with reversed signs • e.g. (X1 - X2) = -(X2 - X1) cannot be detected
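
The factoring example can be made concrete by comparing a signed-digit expansion of 105 with the factored form. Both function names are hypothetical, and the signed-digit form 105 = 128 - 32 + 8 + 1 is one valid recoding chosen for illustration.

```python
def times_105_signed_digits(x):
    """105 = 128 - 32 + 8 + 1: three additions/subtractions."""
    return (x << 7) - (x << 5) + (x << 3) + x


def times_105_factored(x):
    """105 = 15 * 7 = (16 - 1) * (8 - 1): two subtractions."""
    d = (x << 4) - x     # 15 * x
    return (d << 3) - d  # 8 * 15 * x - 15 * x


both_correct = all(
    times_105_signed_digits(x) == times_105_factored(x) == 105 * x
    for x in range(-16, 17)
)
```

The factored form saves one operation, but finding such factorizations is outside what the kernel-based method searches for, which is the limitation the slide describes.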

  27. Conclusions • Contributions • Novel polynomial transformation • Adapting rectangle covering methods • Single var and multi-var subexpressions eliminated together => better results • Future work • Addressing shortcomings of current method • Optimization for timing, power

  28. Conclusions • Thank you!! • Questions??
