Reducing Hardware Complexity of Linear DSP Systems by Iteratively Eliminating Two-Term Common Subexp...
IEEE/ACM Asia South Pacific Design Automation Conference (ASP-DAC), Shanghai, 2005 PowerPoint PPT Presentation


Reducing Hardware Complexity of Linear DSP Systems by Iteratively Eliminating Two-Term Common Subexpressions. IEEE/ACM Asia South Pacific Design Automation Conference (ASP-DAC), Shanghai, 2005. Anup Hosangadi Ryan Kastner ECE Department, UCSB. Farzan Fallah Advanced CAD Research

Presentation Transcript


Reducing Hardware Complexity of Linear DSP Systems by Iteratively Eliminating Two-Term Common Subexpressions

IEEE/ACM Asia South Pacific Design Automation Conference (ASP-DAC), Shanghai, 2005

Anup Hosangadi

Ryan Kastner

ECE Department, UCSB

Farzan Fallah

Advanced CAD Research

Fujitsu Labs of America


Outline

  • Introduction

  • Related Work

  • Polynomial transformation

  • Common Subexpression elimination

  • Results

  • Conclusions


Introduction

  • Multiplications by constants encountered in many application areas

    • DSP transforms in audio, video, and image processing (DFT, DCT, IDCT, etc.)

    • Filtering operations in Communication (FIR, IIR filters)

    • Multiple Input Multiple Output (MIMO) systems

    • Polynomials in Computer graphics


Introduction

  • Multiplication is expensive in hardware

  • Decompose constant multiplications into shifts and additions

    • 13*X = (1101)2*X = X + X<<2 + X<<3

  • Signed digits can reduce the number of additions/subtractions

    • Canonical Signed Digits (CSD) (Knuth’74)

    • (57)10 = (0111001)2 = (100-1001)CSD

  • Further reduction possible by common subexpression elimination

    • Up to 50% reduction (R. Hartley, TCS’96)
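The two decompositions on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `binary_shifts` and `csd_digits` are hypothetical, and the digit-recoding loop is one common way to obtain a CSD form for a non-negative integer, not necessarily the procedure the authors used.

```python
def binary_shifts(c):
    """Shift amounts k such that c == sum(1 << k), e.g. 13 -> [0, 2, 3]."""
    return [k for k in range(c.bit_length()) if (c >> k) & 1]


def csd_digits(c):
    """CSD digits of a non-negative integer, least significant digit first.

    Each digit is -1, 0 or +1, and no two adjacent digits are nonzero.
    """
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)  # +1 when c % 4 == 1, -1 when c % 4 == 3
            c -= d           # make c even again
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


# 13*X = X + X<<2 + X<<3
print(binary_shifts(13))
# 57 in CSD, printed most significant digit first
print(csd_digits(57)[::-1])
```

Fewer nonzero digits means fewer additions/subtractions: 57 has four nonzero binary digits but only three nonzero CSD digits.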


Introduction

  • Common subexpressions = common digit patterns

    • F1 = 7*X = (0111)*X = X + X<<1 + X<<2

      F2 = 13*X = (1101)*X = X + X<<2 + X<<3

      Cost: 4+, 4<<

    • Common pattern “0101” => D1 = X + X<<2

      F1 = D1 + X<<1

      F2 = D1 + X<<3

      Cost: 3+, 3<<

    • Good for single variable: FIR filters (transposed form)

    • Multiple variables? (DFT, DCT, etc.)
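A quick way to see that sharing the “0101” pattern does not change the results is to compute F1 and F2 both ways. The sketch below is only a sanity check; the function names are hypothetical.

```python
def f1_f2_direct(x):
    """7*x and 13*x without sharing: 4 additions, 4 shifts."""
    f1 = x + (x << 1) + (x << 2)
    f2 = x + (x << 2) + (x << 3)
    return f1, f2


def f1_f2_shared(x):
    """Same outputs with D1 = x + x<<2 shared: 3 additions, 3 shifts."""
    d1 = x + (x << 2)  # digit pattern "0101", i.e. 5*x
    return d1 + (x << 1), d1 + (x << 3)


for x in range(-4, 5):
    assert f1_f2_direct(x) == f1_f2_shared(x) == (7 * x, 13 * x)
```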


Related Work

  • Simple bipartite matching (Potkonjak et al., TCAD’95)

    • (10101) and (01101) => common pattern = “101”

    • (10010) and (010010) => cannot detect pattern “1001”

  • Recursive Shift and Add (RESANDS) (H. Nguyen et al., TVLSI 2000)

    • (10010) and (010010) => common pattern “1001”

  • Exhaustive enumeration of all digit patterns (Pasko et al., TCAD’99)

    • (1011) => “0011”, “1001”, “1010”, “0101”, “1011”


Related Work

  • Extending techniques for multiple variables

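$$
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\times
\begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}
$$

[Figure: Potkonjak et al., TCAD’95: all distinct SijXj and CikDk, with outputs Y1, Y2, Y3]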


Related Work

  • Multiple-variable common subexpression elimination (A. Hosangadi et al., ASAP’04)

    • Polynomial transformation of linear systems.

    • Use rectangular covering methods

    • Cannot find subexpressions with reversed signs

      e.g. (X1 – X2<<1) ≠ (X2<<1 – X1)

    • Common occurrence when signed digits are used

    • Rectangle covering has exponential complexity

    • Is there a method to overcome these limitations?


Related Work

  • Algebraic methods in multi-level logic synthesis (MLLS)

    • Reducing literal count in a set of Boolean expressions

    • Factoring, decomposition: Established algebraic techniques

    • Typically used for thousands of variables and literals

  • Apply these methods to optimize linear systems?

D1 = X1+ X2<<2

Y1 = D1 + D1<<3 + X1<<3

Y2 = D1 + X2<<2


Linear systems and polynomial transformation

  • View linear systems as set of arithmetic expressions

    • Expressions consisting of +,-,<< operators

    • Develop methodology for extracting common subexpressions

  • Polynomial formulation

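$$
C \times X = \sum_i \left( \pm X \times L^i \right)
$$

where $L^i$ denotes a left shift by $i$ bits. For example:

$$
\begin{aligned}
(14)_{10} \times X &= (1110)_2 \times X \\
&= X \ll 3 + X \ll 2 + X \ll 1 \\
&= XL^3 + XL^2 + XL^1 \\
&= (100\text{-}10)_{\mathrm{CSD}} \times X = XL^4 - XL^1
\end{aligned}
$$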
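One way to hold the polynomial form in code is a list of (sign, exponent) pairs, evaluated by substituting L = 2 (a left shift). The representation and the names `evaluate`, `show`, `BINARY_14` and `CSD_14` below are assumptions made for illustration only.

```python
def evaluate(terms, x):
    """Evaluate sum(sign * X * L**k) with L = 2, using left shifts."""
    return sum(sign * (x << k) for sign, k in terms)


def show(terms, var="X"):
    """Render terms in the slide's notation, e.g. 'XL^4 - XL^1'."""
    out = ""
    for i, (sign, k) in enumerate(terms):
        if i == 0:
            op = "-" if sign < 0 else ""
        else:
            op = " - " if sign < 0 else " + "
        out += op + var + (f"L^{k}" if k else "")
    return out


BINARY_14 = [(+1, 3), (+1, 2), (+1, 1)]  # (1110)2
CSD_14 = [(+1, 4), (-1, 1)]              # (100-10)CSD

print(show(BINARY_14))  # XL^3 + XL^2 + XL^1
print(show(CSD_14))     # XL^4 - XL^1
assert evaluate(BINARY_14, 5) == evaluate(CSD_14, 5) == 14 * 5
```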


Linear Systems and polynomial transformation

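  • H.264 integer transform

$$
\begin{bmatrix} Y_0 \\ Y_1 \\ Y_2 \\ Y_3 \end{bmatrix} =
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} X_0 \\ X_1 \\ X_2 \\ X_3 \end{bmatrix}
$$

  • Decomposing constant multiplications

Y0 = X0 + X1 + X2 + X3

Y1 = X0<<1 + X1 - X2 - X3<<1

Y2 = X0 - X1 - X2 + X3

Y3 = X0 - X1<<1 + X2<<1 - X3

Cost: 12+, 4<<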


Linear Systems and polynomial transformation

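  • H.264 integer transform

$$
\begin{bmatrix} Y_0 \\ Y_1 \\ Y_2 \\ Y_3 \end{bmatrix} =
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} X_0 \\ X_1 \\ X_2 \\ X_3 \end{bmatrix}
$$

  • Polynomial transformation

Y0 = X0 + X1 + X2 + X3

Y1 = X0L + X1 - X2 - X3L

Y2 = X0 - X1 - X2 + X3

Y3 = X0 - X1L + X2L - X3

Cost: 12+, 4<<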
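The transformation from a constant matrix to polynomial expressions can be sketched as follows. This is an assumption-laden illustration: terms are stored as (sign, variable index, exponent of L), the names `to_expression` and `op_count` are hypothetical, and plain binary digits of the magnitude are used, which coincides with CSD for the entries ±1 and ±2 of this matrix.

```python
H264 = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def to_expression(row):
    """Turn one matrix row into terms (sign, var_index, exponent_of_L)."""
    terms = []
    for j, c in enumerate(row):
        sign = -1 if c < 0 else 1
        mag = abs(c)
        for k in range(mag.bit_length()):
            if (mag >> k) & 1:
                terms.append((sign, j, k))
    return terms


def op_count(exprs):
    """(additions/subtractions, shifts) needed to evaluate the expressions."""
    adds = sum(len(e) - 1 for e in exprs)
    shifts = sum(1 for e in exprs for (_, _, k) in e if k > 0)
    return adds, shifts


exprs = [to_expression(row) for row in H264]
print(exprs[1])         # Y1 = X0L + X1 - X2 - X3L
print(op_count(exprs))  # (12, 4), matching "12+, 4<<"
```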


Fx algorithm

  • Concurrent Decomposition and Factorization of Boolean Expressions (J. Rajski et al., TCAD’92)

    • Popular as Fast-Extract (Fx) algorithm

    • Expression f = gh + r

      • g = (ab + c) => Double cube divisor

      • g = ab => Single cube divisor

    • Fx algorithm for Linear systems?


Two-term divisors

  • Obtained from every pair of terms in each expression

    • Divide by the minimum exponent of L

      • e.g. F = X1 + X2L + X3L^3

      • {+X2L, +X3L^3}: divide by L => (X2 + X3L^2)

      • Divisors = (X1 + X2L), (X1 + X3L^3), (X2 + X3L^2)

  • Two divisors intersect if

    • They are the same up to a reversal of both signs, and the terms involved are distinct

    • (X1 - X2L) ∩ (X1 - X2L) ≠ φ

      (X1 - X2L) ∩ (-X1 + X2L) ≠ φ (reversed signs allowed!)
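Divisor generation can be sketched as below. The term encoding (sign, variable, exponent) and the function name `two_term_divisors` are assumptions for illustration; the sign normalization (flip both signs when the leading term is negative) is one simple way to make reversed-sign pairs compare equal, and is not claimed to be the authors' exact data structure.

```python
from itertools import combinations


def two_term_divisors(expr):
    """Return (divisor, shift, negated) for every pair of terms in expr.

    A term is (sign, var, exp). Each pair is divided by its smaller power
    of L; if the leading term is then negative, both signs are flipped so
    that reversed-sign pairs map to the same divisor.
    """
    ordered = sorted(expr, key=lambda t: (t[1], t[2]))
    result = []
    for a, b in combinations(ordered, 2):
        shift = min(a[2], b[2])
        negated = a[0] < 0
        flip = -1 if negated else 1
        divisor = ((flip * a[0], a[1], a[2] - shift),
                   (flip * b[0], b[1], b[2] - shift))
        result.append((divisor, shift, negated))
    return result


# F = X1 + X2L + X3L^3
F = [(1, "X1", 0), (1, "X2", 1), (1, "X3", 3)]
for divisor, shift, negated in two_term_divisors(F):
    print(divisor, "shift:", shift, "negated:", negated)
```

With this normalization, (X1 - X2L) and (-X1 + X2L) yield the same divisor, with the `negated` flag recording which occurrence had its signs reversed.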


Two-term divisors

  • Theorem: A multiple-term common subexpression exists in a set of expressions iff there is a non-overlapping intersection among the set of two-term divisors

  • Many divisors with intersections, which one to choose?

    • Use greedy selection: pick the divisor with the largest number of intersections

  • Selecting divisors changes expressions

    • Perform concurrent decomposition of expressions


Algorithm (Step 1)

  • Creating set of divisors {Divisors};

    {Divisors} = φ;

    for each expression Pi

    {

        {Dnew} = Divisors for Pi;

        {Divisors} = {Divisors} ∪ {Dnew};

        Update frequency statistics of {Divisors};

    }


Algorithm (Step 2): Common Subexpression Elimination

{Divisors} = Set of all 2-term divisors;

while (intersections present)

{

    Find Best_Divisor in {Divisors};

    {T} = Set of terms involved in intersection;

    {D} = Set of divisors involving any term in {T};

    {Divisors} = {Divisors} – {D};

    Rewrite Expressions;

    {Dnew} = New Divisors involving new terms;

    {Divisors} = {Divisors} ∪ {Dnew};

}
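Steps 1 and 2 can be put together in a compact Python sketch. It is a simplification under stated assumptions, not the authors' implementation: terms are (sign, variable name, exponent of L); the divisor table is rebuilt from scratch in every iteration instead of being updated incrementally as in the pseudocode; ties between equally frequent divisors are broken by first appearance; and all function names are hypothetical.

```python
from itertools import combinations


def normalize(a, b):
    """Canonical divisor of two terms, plus the shift and sign flip applied."""
    a, b = sorted((a, b), key=lambda t: (t[1], t[2]))
    shift = min(a[2], b[2])
    flip = -1 if a[0] < 0 else 1
    div = ((flip * a[0], a[1], a[2] - shift), (flip * b[0], b[1], b[2] - shift))
    return div, shift, flip


def occurrences(exprs):
    """Step 1: map each divisor to its non-overlapping occurrences."""
    table = {}
    for name, terms in exprs.items():
        for i, j in combinations(range(len(terms)), 2):
            div, shift, flip = normalize(terms[i], terms[j])
            occ = table.setdefault(div, [])
            # Skip an occurrence that reuses a term of an earlier occurrence.
            if all(o[0] != name or {o[1], o[2]}.isdisjoint((i, j)) for o in occ):
                occ.append((name, i, j, shift, flip))
    return table


def eliminate(exprs):
    """Step 2: repeatedly extract the divisor with the most occurrences."""
    exprs = {name: list(terms) for name, terms in exprs.items()}
    count = 0
    while True:
        table = occurrences(exprs)
        div, occ = max(table.items(), key=lambda kv: len(kv[1]), default=(None, []))
        if len(occ) < 2:
            return exprs
        new = f"D{count}"
        count += 1
        used, added = {}, {}
        for name, i, j, shift, flip in occ:
            used.setdefault(name, set()).update((i, j))
            added.setdefault(name, []).append((flip, new, shift))
        for name in used:  # rewrite: drop the covered terms, add +/- D * L^shift
            kept = [t for k, t in enumerate(exprs[name]) if k not in used[name]]
            exprs[name] = kept + added[name]
        exprs[new] = list(div)


def op_count(exprs):
    """(additions/subtractions, shifts) over all expressions."""
    adds = sum(len(t) - 1 for t in exprs.values())
    shifts = sum(1 for t in exprs.values() for (_, _, k) in t if k > 0)
    return adds, shifts


def evaluate(exprs, name, inputs):
    """Evaluate one expression, resolving extracted divisors recursively."""
    total = 0
    for sign, var, exp in exprs[name]:
        value = inputs[var] if var in inputs else evaluate(exprs, var, inputs)
        total += sign * (value << exp)
    return total


H264_EXPRS = {
    "Y0": [(1, "X0", 0), (1, "X1", 0), (1, "X2", 0), (1, "X3", 0)],
    "Y1": [(1, "X0", 1), (1, "X1", 0), (-1, "X2", 0), (-1, "X3", 1)],
    "Y2": [(1, "X0", 0), (-1, "X1", 0), (-1, "X2", 0), (1, "X3", 0)],
    "Y3": [(1, "X0", 0), (-1, "X1", 1), (1, "X2", 1), (-1, "X3", 0)],
}

result = eliminate(H264_EXPRS)
print(op_count(H264_EXPRS), "->", op_count(result))
```

Each extraction with at least two occurrences lowers the total number of additions by at least one, so the loop terminates. The divisor names produced by this sketch may be numbered differently from those on the slides, since the tie-breaking order is an arbitrary choice here.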


Algorithm complexity

  • M×M constant matrix; N digits of precision

    Y0:    1111 1111 1011 1001

    ...

    YM-1:  1111 1110 0011 1010

    (M rows, M constants per row, N digits per constant)

    Y0 = X0 + X0L + ... + XM-1L^3 + XM-1

  • O(MN) terms per expression => O(M^2N^2) divisors
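As a rough worked count, take the experimental settings stated later in the talk (8×8 matrices, 16 digits) and assume the worst case in which every digit of every constant is nonzero. These are upper bounds only; CSD constants have fewer nonzero digits.

```latex
\begin{aligned}
\text{terms per expression} &\le MN = 8 \times 16 = 128 \\
\text{divisors per expression} &\le \binom{MN}{2} = \frac{128 \times 127}{2} = 8128 = O(M^2N^2) \\
\text{divisors generated over all } M \text{ expressions} &\le 8 \times 8128 = 65024 = O(M^3N^2)
\end{aligned}
```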


Algorithm (Step 1)

  • Creating set of divisors {Divisors};

    {Divisors} = φ;

    for each expression Pi

    {

        {Dnew} = Divisors for Pi;

        {Divisors} = {Divisors} ∪ {Dnew};

        Update frequency statistics of {Divisors};

    }

O(M^2N^2) distinct divisors

O(M^2N^2)

O(M^3N^2)


Algorithm (Step 2): Common Subexpression Elimination

O(M^2N^2)

{Divisors} = Set of all 2-term divisors;

while (intersections present)

{

    Find Best_Divisor in {Divisors};

    {T} = Set of terms involved in intersection;

    {D} = Set of divisors involving any term in {T};

    {Divisors} = {Divisors} – {D};

    Rewrite Expressions;

    {Dnew} = New Divisors involving new terms;

    {Divisors} = {Divisors} ∪ {Dnew};

}

O(M^2N^2)


Algorithm

  • H.264 example

  • >> Select D0 = (X0 + X3)

Y0 = X0 + X1 + X2 + X3

Y1 = X0L + X1 - X2 - X3L

Y2 = X0 - X1 - X2 + X3

Y3 = X0 - X1L + X2L - X3


Algorithm

  • H.264 example

  • >> Select D1 = (X1 – X2)

Y0 = D0 + X1 + X2

Y1 = X0L + X1 - X2 - X3L

Y2 = D0 - X1 - X2

Y3 = X0 - X1L + X2L - X3


Algorithm

  • H.264 example

  • >> Select D2 = (X1 + X2)

Y0 = D0 + X1 + X2

Y1 = X0L + D1 - X3L

Y2 = D0 - X1 - X2

Y3 = X0 - D1L - X3


Algorithm

  • H.264 example

  • >> Select D3 = (X0 – X3)

Y0 = D0 + D2

Y1 = X0L + D1 - X3L

Y2 = D0 - D2

Y3 = X0 - D1L - X3


Final Implementation

  • Extracting 4 divisors

D0 = X0 + X3

D1 = X1 - X2

D2 = X1 + X2

D3 = X0 - X3

Y0 = D0 + D2

Y1 = D1 + D3L

Y2 = D0 - D2

Y3 = D3 - D1L

  • Operation counts

    • Original: 12+, 4<<

    • Rectangle Covering: 10+, 3<<

    • Extracting the 4 divisors: 8+, 2<<
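The final implementation can be checked against the direct matrix product with a short Python sketch. The function names are hypothetical; the divisor and output equations are the ones listed on this slide.

```python
H264 = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def transform_direct(x):
    """Reference result: plain matrix-vector product."""
    return [sum(c * xi for c, xi in zip(row, x)) for row in H264]


def transform_shared(x):
    """Final implementation: 8 additions/subtractions and 2 shifts."""
    x0, x1, x2, x3 = x
    d0 = x0 + x3
    d1 = x1 - x2
    d2 = x1 + x2
    d3 = x0 - x3
    return [d0 + d2, d1 + (d3 << 1), d0 - d2, d3 - (d1 << 1)]


sample = [3, -1, 4, 2]
assert transform_shared(sample) == transform_direct(sample)
```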


Experimental Setup

  • Goal

    • Reduction in #additions/subtractions

    • Effect on area/latency on synthesis

    • Simulate designs to estimate power consumption

  • Transforms: DCT, IDCT, DFT, DST, DHT

  • 8x8 constant matrices

  • 16 digits precision (CSD representation)

  • Compare with

    • Potkonjak (TCAD’95)

    • RESANDS (Nguyen et al., TVLSI 2000)

    • Rectangle Covering (A. Hosangadi et al., ASAP’04)


Experimental Results

Run Time 0.81s 0.08s


Experimental results

  • Synthesis results (minimum latency constraints)

[Figure: synthesis results; legend: (III) RESANDS, (IV) Rect. Covering, (V) 2-term CSE]


Experimental results

  • Power consumption

[Figure: power consumption; legend: (III) RESANDS, (IV) Rect. Covering, (V) 2-term CSE]


Conclusions

  • A new technique for eliminating common subexpressions in linear systems

  • Fewer operations than known methods

  • Much faster than rectangle covering

  • Combine with scheduling on given resources


  • Thank you

  • Questions??

