IEEE/ACM Asia South Pacific Design Automation Conference (ASP-DAC), Shanghai, 2005




Reducing Hardware Complexity of Linear DSP Systems by Iteratively Eliminating Two-Term Common Subexpressions

IEEE/ACM Asia South Pacific Design Automation Conference (ASP-DAC), Shanghai, 2005

Anup Hosangadi

Ryan Kastner

ECE Department, UCSB

Farzan Fallah

Advanced CAD Research

Fujitsu Labs of America

Outline
  • Introduction
  • Related Work
  • Polynomial transformation
  • Common Subexpression elimination
  • Results
  • Conclusions
Introduction
  • Multiplications by constants are encountered in many application areas
    • DSP transforms in Audio, Video, Image processing (DFT, DCT, IDCT, etc.)
    • Filtering operations in Communication (FIR, IIR filters)
    • Multiple Input Multiple Output (MIMO) systems
    • Polynomials in Computer graphics
Introduction
  • Multiplication is expensive in hardware
  • Decompose constant multiplications into shifts and additions
    • 13*X = (1101)_2*X = X + X<<2 + X<<3
  • Signed digits can reduce the number of additions/subtractions
    • Canonical Signed Digits (CSD) (Knuth’74)
    • (57)_10 = (0111001)_2 = (100-1001)_CSD = 64 - 8 + 1
  • Further reduction possible by common subexpression elimination
    • Up to 50% reduction (R.Hartley TCS’96)
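The signed-digit idea above can be sketched in a few lines. This is a minimal illustration, not the authors' code; the helper names `to_csd` and `nonzero_digits` are hypothetical. It converts a non-negative constant to CSD and compares the number of nonzero digits (each nonzero digit is one shifted operand) against plain binary.

```python
def to_csd(n: int) -> list[int]:
    """Canonical signed digits of n >= 0, most significant digit first."""
    digits = []
    while n != 0:
        if n % 2 == 0:
            d = 0
        else:
            d = 2 - (n % 4)  # +1 when n % 4 == 1, -1 when n % 4 == 3
        digits.append(d)
        n = (n - d) // 2
    return digits[::-1] or [0]


def nonzero_digits(digits: list[int]) -> int:
    """Count shifted operands; additions/subtractions = nonzero digits - 1."""
    return sum(1 for d in digits if d != 0)


binary_57 = [int(b) for b in format(57, "b")]  # 1 1 1 0 0 1
csd_57 = to_csd(57)                            # 1 0 0 -1 0 0 1
```

For 57 the binary form has four nonzero digits while the CSD form has three, so one addition is saved.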
Introduction
  • Common subexpressions = common digit patterns
    • F1 = 7*X = (0111)_2*X = X + X<<1 + X<<2
    • F2 = 13*X = (1101)_2*X = X + X<<2 + X<<3
    • Cost without sharing: 4+, 4<<
  • Common pattern “0101” => D1 = X + X<<2
    • F1 = D1 + X<<1
    • F2 = D1 + X<<3
    • Cost with sharing: 3+, 3<<
  • Good for single variable: FIR filters (transposed form)
  • Multiple variables? (DFT, DCT, etc.)
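As a quick check of the 7*X and 13*X example, the sketch below computes both products with and without the shared term D1. The function names are hypothetical, and the bitwise AND only finds patterns that line up at the same bit positions, which is a simplification of what the pattern-matching methods do.

```python
def f1_f2_direct(x: int) -> tuple[int, int]:
    f1 = x + (x << 1) + (x << 2)  # 7*X: 2 additions, 2 shifts
    f2 = x + (x << 2) + (x << 3)  # 13*X: 2 additions, 2 shifts
    return f1, f2


def f1_f2_shared(x: int) -> tuple[int, int]:
    d1 = x + (x << 2)             # shared pattern "0101": 1 addition, 1 shift
    f1 = d1 + (x << 1)            # 1 addition, 1 shift
    f2 = d1 + (x << 3)            # 1 addition, 1 shift
    return f1, f2


# Aligned common digit pattern of 0111 and 1101 (assumption: same bit positions)
common_pattern = 0b0111 & 0b1101
```

Here `common_pattern` is `0b0101`, which corresponds to D1 = X + X<<2.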

Related Work
  • Simple Bipartite matching (Potkonjak et al., TCAD’95)
    • (10101) and (01101) => common pattern = “101”
    • (10010) and (010010) => cannot detect pattern “1001”
  • Recursive Shift and Add (RESANDS) (H. Nguyen et al., TVLSI 2000)
    • (10010) and (010010) => common pattern “1001”
  • Exhaustive enumeration of all digit patterns (Pasko et al., TCAD’99)
    • (1011) => “0011”, “1001”, “1010”, “0101”, “1011”
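Related Work
  • Extending techniques for multiple variables

\[
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}
=
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\times
\begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}
\]

  • Potkonjak et al., TCAD’95
[Figure: all distinct SijXj and CikDk, with outputs Y1, Y2, Y3]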

Related Work
  • Multiple Variable Common Subexpression elimination (A. Hosangadi et al., ASAP’04)
    • Polynomial transformation of linear systems
    • Use rectangular covering methods
    • Cannot find subexpressions with reversed signs
      • e.g. (X1 – X2<<1) ≠ (X2<<1 – X1)
    • Common occurrence when signed digits are used
    • Rectangle covering has exponential complexity
    • Method to overcome these limitations?
Related Work
  • Algebraic methods in multi-level logic synthesis (MLLS)
    • Reducing literal count in a set of Boolean expressions
    • Factoring, decomposition: Established algebraic techniques
    • Typically used for thousands of variables and literals
  • Apply these methods to optimize linear systems?

D1 = X1 + X2<<2

Y1 = D1 + D1<<3 + X1<<3

Y2 = D1 + X2<<2

Linear systems and polynomial transformation
  • View linear systems as set of arithmetic expressions
    • Expressions consisting of +, -, << operators
    • Develop methodology for extracting common subexpressions
  • Polynomial formulation (L denotes a left shift: X<<i = XL^i)

\[
\begin{aligned}
C \times X &= \sum_i \left(\pm X L^{i}\right) \\
(14)_{10} \times X &= (1110)_2 \times X \\
&= X \ll 3 + X \ll 2 + X \ll 1 \\
&= XL^3 + XL^2 + XL^1 \\
&= (100\bar{1}0)_{\mathrm{CSD}} \times X = XL^4 - XL^1
\end{aligned}
\]
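The polynomial formulation can be sketched as a list of (sign, variable, exponent of L) terms. The representation and the names `terms_for_constant` and `evaluate` are assumptions made for illustration; the slides do not prescribe a data structure.

```python
Term = tuple[int, str, int]  # (sign, variable name, exponent of L)


def terms_for_constant(digits: list[int], var: str) -> list[Term]:
    """Turn signed digits (most significant first) into +/- var * L^i terms."""
    width = len(digits)
    return [(d, var, width - 1 - i) for i, d in enumerate(digits) if d != 0]


def evaluate(terms: list[Term], values: dict[str, int]) -> int:
    """Evaluate a polynomial by reading L^i as a left shift by i bits."""
    return sum(sign * (values[var] << exp) for sign, var, exp in terms)


binary_14 = terms_for_constant([1, 1, 1, 0], "X")   # XL^3 + XL^2 + XL^1
csd_14 = terms_for_constant([1, 0, 0, -1, 0], "X")  # XL^4 - XL^1
```

Both term lists evaluate to 14*X, but the CSD form needs one subtraction instead of two additions.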

Linear Systems and polynomial transformation
  • H.264 Integer Transform

\[
\begin{bmatrix} Y_0 \\ Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}
=
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} X_0 \\ X_1 \\ X_2 \\ X_3 \end{bmatrix}
\]

  • Decomposing constant multiplications
    Y0 = X0 + X1 + X2 + X3
    Y1 = X0<<1 + X1 - X2 - X3<<1
    Y2 = X0 - X1 - X2 + X3
    Y3 = X0 - X1<<1 + X2<<1 - X3
  • Cost: 12+, 4<<

Linear Systems and polynomial transformation
  • H.264 Integer Transform

\[
\begin{bmatrix} Y_0 \\ Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}
=
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} X_0 \\ X_1 \\ X_2 \\ X_3 \end{bmatrix}
\]

  • Polynomial transformation
    Y0 = X0 + X1 + X2 + X3
    Y1 = X0L + X1 - X2 - X3L
    Y2 = X0 - X1 - X2 + X3
    Y3 = X0 - X1L + X2L - X3
  • Cost: 12+, 4<<
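The expansion of the H.264 matrix into polynomial terms, and the 12+, 4<< operation count, can be reproduced with a short sketch. It assumes plain binary expansion of each coefficient's magnitude and a hypothetical (sign, input index, exponent of L) term format.

```python
H264 = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def row_to_terms(row):
    """Expand one matrix row into (sign, input index, exponent of L) terms."""
    terms = []
    for j, c in enumerate(row):
        sign = 1 if c > 0 else -1
        mag, exp = abs(c), 0
        while mag:
            if mag & 1:          # one term per set bit of |c|
                terms.append((sign, j, exp))
            mag >>= 1
            exp += 1
    return terms


def count_ops(expressions):
    """Return (additions/subtractions, shifts) for a list of term lists."""
    adds = sum(len(terms) - 1 for terms in expressions)
    shifts = sum(1 for terms in expressions for (_, _, e) in terms if e > 0)
    return adds, shifts


expressions = [row_to_terms(row) for row in H264]
```

Calling `count_ops(expressions)` gives the unoptimized cost of the transform.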

Fx algorithm
  • Concurrent Decomposition and Factorization of Boolean Expressions (J. Rajski et al., TCAD’92)
    • Popular as Fast-Extract (Fx) algorithm
    • Expression f = gh + r
      • g = (ab + c) => Double cube divisor
      • g = ab => Single cube divisor
    • Fx algorithm for Linear systems?
Two-term divisors
  • Obtained from every pair of terms in each expression
    • Divide by the minimum exponent of L
      • e.g. F = X1 + X2L + X3L^3
      • {+X2L, +X3L^3}: Divide by L => (X2 + X3L^2)
      • Divisors = (X1 + X2L), (X1 + X3L^3), (X2 + X3L^2)
  • Two divisors intersect if
    • They are equal, or equal after reversing all signs
    • The terms involved are distinct
    • (X1 – X2L) ∩ (X1 – X2L) ≠ φ
    • (X1 – X2L) ∩ (–X1 + X2L) ≠ φ (reversed signs allowed!)
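A minimal sketch of divisor generation for the example F = X1 + X2L + X3L^3 follows. The term format (sign, variable, exponent of L) and the function names are assumptions for illustration. `normalize` gives a divisor and its sign-reversed form the same key, which is one simple way to allow reversed signs.

```python
from itertools import combinations


def two_term_divisors(terms):
    """One divisor per pair of terms, divided by the minimum exponent of L."""
    divisors = []
    for (s1, v1, e1), (s2, v2, e2) in combinations(terms, 2):
        m = min(e1, e2)
        divisors.append(((s1, v1, e1 - m), (s2, v2, e2 - m)))
    return divisors


def normalize(divisor):
    """Canonical key: order terms by (variable, exponent), make the first positive."""
    a, b = sorted(divisor, key=lambda t: (t[1], t[2]))
    if a[0] < 0:
        a, b = (-a[0], a[1], a[2]), (-b[0], b[1], b[2])
    return a, b


F = [(1, "X1", 0), (1, "X2", 1), (1, "X3", 3)]
divisors_of_F = two_term_divisors(F)
```

With this key, (X1 – X2L) and (–X1 + X2L) normalize to the same tuple, so they can be counted as occurrences of one divisor.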

Two-term divisors
  • Theorem: A multiple-term common subexpression exists in a set of expressions iff there is a non-overlapping intersection among the two-term divisors
  • Many divisors with intersections, which one to choose?
    • Use greedy selection of the divisor with the largest number of intersections
  • Selecting divisors changes expressions
    • Perform concurrent decomposition of expressions
Algorithm (Step 1)
  • Creating set of divisors {Divisors};

{Divisors} = φ;
for each expression Pi
{
    {Dnew} = Divisors for Pi;
    {Divisors} = {Divisors} ∪ {Dnew};
    Update frequency statistics of {Divisors};
}
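Step 1 can be sketched as building a table from each normalized divisor to the places where it occurs. The table layout, the term format (sign, input index, exponent of L) and the helper names are assumptions, and the expressions are the H.264 polynomials used elsewhere in the talk.

```python
from collections import defaultdict
from itertools import combinations

# H.264 expressions Y0..Y3 as (sign, input index, exponent of L) terms
EXPRESSIONS = [
    [(1, 0, 0), (1, 1, 0), (1, 2, 0), (1, 3, 0)],
    [(1, 0, 1), (1, 1, 0), (-1, 2, 0), (-1, 3, 1)],
    [(1, 0, 0), (-1, 1, 0), (-1, 2, 0), (1, 3, 0)],
    [(1, 0, 0), (-1, 1, 1), (1, 2, 1), (-1, 3, 0)],
]


def divisor_key(t1, t2):
    """Divide the pair by its minimum exponent and fix the overall sign."""
    m = min(t1[2], t2[2])
    pair = [(t1[0], t1[1], t1[2] - m), (t2[0], t2[1], t2[2] - m)]
    a, b = sorted(pair, key=lambda t: (t[1], t[2]))
    if a[0] < 0:
        a, b = (-a[0], a[1], a[2]), (-b[0], b[1], b[2])
    return a, b


def collect_divisors(expressions):
    """Map each divisor to its occurrences (expression index, term, term)."""
    table = defaultdict(list)
    for i, terms in enumerate(expressions):
        for t1, t2 in combinations(terms, 2):
            table[divisor_key(t1, t2)].append((i, t1, t2))
    return table


table = collect_divisors(EXPRESSIONS)
frequent = sorted(key for key, occs in table.items() if len(occs) >= 2)
```

For these expressions, `frequent` holds the four divisors (X0 - X3), (X0 + X3), (X1 - X2) and (X1 + X2), each occurring in two expressions.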

Algorithm (Step 2) Common Subexpression Elimination

{Divisors} = Set of all 2-term divisors;
while (intersections present)
{
    Find Best_Divisor in {Divisors};
    {T} = Set of terms involved in intersection;
    {D} = Set of divisors involving any term in {T};
    {Divisors} = {Divisors} – {D};
    Rewrite Expressions;
    {Dnew} = New Divisors involving new terms;
    {Divisors} = {Divisors} ∪ {Dnew};
}
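The loop above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it rebuilds the divisor table from scratch in every iteration instead of updating {Divisors} incrementally, it assumes no expression contains two identical terms, and all names are hypothetical. Terms are (sign, variable index, exponent of L); extracted divisors get fresh variable indices.

```python
from collections import defaultdict
from itertools import combinations


def divisor_key(t1, t2):
    """Return (key, sign, shift) with occurrence = sign * key * L^shift."""
    shift = min(t1[2], t2[2])
    pair = [(t1[0], t1[1], t1[2] - shift), (t2[0], t2[1], t2[2] - shift)]
    a, b = sorted(pair, key=lambda t: (t[1], t[2]))
    sign = 1 if a[0] > 0 else -1
    return ((sign * a[0], a[1], a[2]), (sign * b[0], b[1], b[2])), sign, shift


def eliminate(expressions, n_inputs):
    exprs = [list(e) for e in expressions]
    defs = {}                      # new variable index -> two-term divisor
    next_var = n_inputs
    while True:
        table = defaultdict(list)
        for i, terms in enumerate(exprs):
            for t1, t2 in combinations(terms, 2):
                key, sign, shift = divisor_key(t1, t2)
                table[key].append((i, t1, t2, sign, shift))
        best_key, best_occ = None, []
        for key, occs in table.items():
            used, chosen = set(), []
            for occ in occs:       # keep only non-overlapping occurrences
                i, t1, t2 = occ[:3]
                if (i, t1) not in used and (i, t2) not in used:
                    used.update({(i, t1), (i, t2)})
                    chosen.append(occ)
            if len(chosen) > len(best_occ):
                best_key, best_occ = key, chosen
        if len(best_occ) < 2:      # no intersections left
            return exprs, defs
        defs[next_var] = best_key
        for i, t1, t2, sign, shift in best_occ:
            exprs[i].remove(t1)
            exprs[i].remove(t2)
            exprs[i].append((sign, next_var, shift))
        next_var += 1


def evaluate(terms, values):
    return sum(s * (values[v] << e) for s, v, e in terms)


H264 = [
    [(1, 0, 0), (1, 1, 0), (1, 2, 0), (1, 3, 0)],
    [(1, 0, 1), (1, 1, 0), (-1, 2, 0), (-1, 3, 1)],
    [(1, 0, 0), (-1, 1, 0), (-1, 2, 0), (1, 3, 0)],
    [(1, 0, 0), (-1, 1, 1), (1, 2, 1), (-1, 3, 0)],
]
new_exprs, divisor_defs = eliminate(H264, n_inputs=4)
```

On the H.264 expressions this sketch extracts four divisors and leaves each output as a two-term expression. The order in which ties are broken may differ from the order shown on the slides.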

Algorithm complexity
  • M×M constant matrix; N digits of precision
[Figure: M×M matrix of N-digit constants; row Y0 = 1111 1111 1011 1001, ..., row Y(M-1) = 1111 1110 0011 1010]
  • Y0 = X0 + X0L + ... + X(M-1)L^3 + X(M-1)
  • O(MN) terms in each expression
  • => O(M^2 N^2) divisors
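The counts behind these bounds can be worked out directly. The sketch assumes the worst case in which every digit of every constant is nonzero; the function name is hypothetical.

```python
from math import comb


def worst_case_counts(m: int, n: int) -> tuple[int, int, int]:
    """Terms per expression, divisors per expression, divisors over all rows."""
    terms_per_expr = m * n                        # M constants, N nonzero digits each
    divisors_per_expr = comb(terms_per_expr, 2)   # one divisor per pair of terms
    total_divisors = m * divisors_per_expr        # M expressions
    return terms_per_expr, divisors_per_expr, total_divisors


counts_8x8_16 = worst_case_counts(8, 16)
```

For an 8x8 matrix with 16 digits this gives 128 terms and 8128 divisors per expression, which grows as M^2 N^2, and 65024 divisors over all eight expressions, which grows as M^3 N^2.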

Algorithm (Step 1)
  • Creating set of divisors {Divisors};

{Divisors} = φ;
for each expression Pi
{
    {Dnew} = Divisors for Pi;
    {Divisors} = {Divisors} ∪ {Dnew};
    Update frequency statistics of {Divisors};
}

  • Complexity annotations: O(M^2 N^2) distinct divisors; O(M^2 N^2); O(M^3 N^2)

Algorithm (Step 2) Common Subexpression Elimination

{Divisors} = Set of all 2-term divisors;
while (intersections present)
{
    Find Best_Divisor in {Divisors};
    {T} = Set of terms involved in intersection;
    {D} = Set of divisors involving any term in {T};
    {Divisors} = {Divisors} – {D};
    Rewrite Expressions;
    {Dnew} = New Divisors involving new terms;
    {Divisors} = {Divisors} ∪ {Dnew};
}

  • Complexity annotations: O(M^2 N^2); O(M^2 N^2)

Algorithm
  • H.264 example
  • >> Select D0 = (X0 + X3)

Y0 = X0 + X1 + X2 + X3

Y1 = X0L + X1 - X2 - X3L

Y2 = X0 - X1 - X2 + X3

Y3 = X0 - X1L + X2L - X3

Algorithm
  • H.264 example
  • >> Select D1 = (X1 – X2)

Y0 = D0 + X1 + X2

Y1 = X0L + X1 - X2 - X3L

Y2 = D0 - X1 - X2

Y3 = X0 - X1L + X2L - X3

Algorithm
  • H.264 example
  • >> Select D2 = (X1 + X2)

Y0 = D0 + X1 + X2

Y1 = X0L + D1 - X3L

Y2 = D0 - X1 - X2

Y3 = X0 - D1L - X3

Algorithm
  • H.264 example
  • >> Select D3 = (X0 – X3)

Y0 = D0 + D2

Y1 = X0L + D1 - X3L

Y2 = D0 - D2

Y3 = X0 - D1L - X3

Final Implementation
  • Extracting 4 divisors
    D0 = X0 + X3    Y0 = D0 + D2
    D1 = X1 – X2    Y1 = D1 + D3L
    D2 = X1 + X2    Y2 = D0 - D2
    D3 = X0 - X3    Y3 = D3 – D1L
  • Cost: 8+, 2<<
  • Original: 12+, 4<<
  • Rectangle Covering: 10+, 3<<

Experimental Setup
  • Goal
    • Reduction in #additions/subtractions
    • Effect on area/latency on synthesis
    • Simulate designs to estimate power consumption
  • Transforms: DCT, IDCT, DFT, DST, DHT
  • 8x8 constant matrices
  • 16 digits precision (CSD representation)
  • Compare with
    • Potkonjak (TCAD’95)
    • RESANDS (Nguyen et al., TVLSI 2000)
    • Rectangle Covering (A. Hosangadi et al., ASAP’04)
Experimental Results
[Table: only the run-time row survives: Run Time 0.81s 0.08s]

Experimental results
  • Synthesis results (minimum latency constraints)
[Figure legend: (III) RESANDS, (IV) Rect. Covering, (V) 2-term CSE]

Experimental results
  • Power consumption
[Figure legend: (III) RESANDS, (IV) Rect. Covering, (V) 2-term CSE]
Conclusions
  • A new technique for eliminating common subexpressions in linear systems
  • Fewer operations than known methods
  • Much faster than rectangle covering
  • Combine with scheduling on given resources
Thank you
  • Questions?