
L8 : A Survey on Low Power Multiplication / Accumulation



Presentation Transcript


  1. L8 : A Survey on Low Power Multiplication / Accumulation

  2. Contents
     • Introduction
     • [1] Interlaced Accumulation Programming
     • [2] Operand Swapping
     • [3] Selective Coefficient Negation
     • [4] Coefficient Optimization
     • [5] Coefficient Reordering
     • Conclusion & Future Works

  3. Power Distribution of a DSP
     • Hirotsugu [ISLPED '96]: normalized power consumption (%) for each test program, with the variation due to data dependency
     [Figure: bar chart, y-axis from 10 to 40%; categories: Pin, Bus, Misc., Control, Memory, Clocking, Data Op., Address Generation, Peripheral]

  4. Multiplication and Accumulation: MAC
     • Major operation in DSP
     • [Modified Booth Encoding] One of 0, X, -X, 2X, -2X based on each 2 bits of Y
     • MUL > (5 * ALU)
     [Figure: MAC datapath with blocks labeled X, Y, MULT, CSA, CPA, PR, ALU, ACC]
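
The recoding named on this slide can be sketched in a few lines. The block below is the textbook radix-4 (modified) Booth recoding, not necessarily the circuit used in the DSP from the talk: each digit covers two bits of Y and also looks at the top bit of the previous pair. The function names and the 16-bit default width are assumptions made for illustration.

```python
def booth_radix4_digits(y, bits=16):
    """Recode a two's-complement value into radix-4 Booth digits, LSB first.

    Each digit is one of -2, -1, 0, 1, 2 and selects -2X, -X, 0, X or 2X.
    `bits` (assumed even) is the operand width.
    """
    y &= (1 << bits) - 1
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, bits=16):
    """Multiply x by the signed value encoded in y, one partial product per digit."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_radix4_digits(y, bits)))
```

Every nonzero digit costs one partial-product addition, which is why operands with few nonzero digits are cheaper to multiply.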

  5. Power Consumption by a Multiplier
     • Power consumption by data dependency
     • X: energy per cycle (nJ); Y: # of input transitions
     • Little correlation; average = 7 nJ
     [Figure: two scatter plots, panels "36-bit ALU" and "16x16 MPY"]

  6. Power Consumption by a Multiplier
     • What is an important input in terms of power?
     [Figure: two plots of energy (nJ); panels in slide order: "0x8000 x (random)" and "(random) x 0x8000"; averages in slide order: 1 nJ and 5 nJ]

  7. Power Consumption by a Multiplier
     • Booth encoding is a significant overhead.
     [Figure: two plots of energy (nJ); panels in slide order: "0x5555 x (random)" and "(random) x 0x5555"; averages in slide order: 4 nJ and 6 nJ]

  8. Interlaced Accumulation Programming (1/2)
     • Hirotsugu [ISLPED '96]
     • 3-tap FIR filter (n=3):
       Y(k)   = C0 * X(k)   + C1 * X(k-1) + C2 * X(k-2)
       Y(k+1) = C0 * X(k+1) + C1 * X(k)   + C2 * X(k-1)
       Y(k+2) = C0 * X(k+2) + C1 * X(k+1) + C2 * X(k)
     [Figure: the three equations are shown twice, with the products numbered in execution order for two different schedules; surviving order labels: 2 3 1 / 5 6 4 and 4 6 2 / 3 5 1]

  9. Interlaced Accumulation Programming (2/2)
     • More than 40% of the power is saved by
       • Keeping one operand of the multiplier constant
         • X is kept: 7 nJ -> 5 ~ 6 nJ
         • Y is kept: 7 nJ -> 1 ~ 3 nJ
       • Halving the number of memory accesses
         • Traditional: two memory operands
         • Interlaced: one memory operand (data re-use through a temporary register)
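
To make the memory-access argument concrete, here is a minimal sketch of one possible interlaced schedule. It is not claimed to be the exact ordering drawn on the slide: it computes Y(k) and Y(k+1) together with two accumulators and changes only one multiplier operand between consecutive MACs. The function names and the load counter are illustrative assumptions.

```python
def fir_traditional(c, x, k):
    """Return (Y(k), memory loads), fetching both operands for every MAC."""
    acc, loads = 0, 0
    for i in range(len(c)):
        coeff, sample = c[i], x[k - i]  # two memory operands per MAC
        loads += 2
        acc += coeff * sample
    return acc, loads


def fir_interlaced_pair(c, x, k):
    """Return (Y(k), Y(k+1), memory loads) using a zigzag schedule.

    Between consecutive MACs only one operand changes; the other stays in
    a temporary register. Requires len(c) - 1 <= k and k + 1 < len(x).
    """
    acc0 = acc1 = 0
    sample = x[k + 1]      # first sample
    loads = 1
    for i in range(len(c)):
        coeff = c[i]       # new coefficient, sample is reused
        loads += 1
        acc1 += coeff * sample   # C_i * X(k+1-i) goes to Y(k+1)
        sample = x[k - i]  # new sample, coefficient is reused
        loads += 1
        acc0 += coeff * sample   # C_i * X(k-i) goes to Y(k)
    return acc0, acc1, loads
```

For a 3-tap filter this schedule needs 7 loads for two outputs, where the traditional loop needs 12, which is close to the halving stated above.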

  10. Operand Swapping (1/2)
     • Weight = how many additions are needed?
     • Example: Weight = 2 for 00111100, since Booth encoding gives Y = 00X000X0
     • Measured current (mW) for both operand orders; each row lists two consecutive operand values (slide annotation: Low Weight, High Switching)

       A             B             A*B     B*A     Saving
       7FFF, 0001    AAAA, AAAA    10.0    22.0    54%
       7FFF, 0001    6666, AAAA    10.0    31.6    68%
       7FFF, 0001    AAAA, 0001    12.2    28.8    58%

  11. Operand Swapping (2/2)
     • For filter operations, one operand is usually constant. => Operand swapping at compile time.
     • Current (mA) by switching and weight of the X and Y operands:

                    Y: LowW -> LowW     Y: HighW -> HighW    Y: LowW -> HighW
                    LowS     HighS      LowS     HighS       HighS
       X: LowS      4.0      9.5        11.9     21.2        19.2
       X: HighS     7.7      13.0       21.6     31.2        27.5

     • LowS: low switching; HighS: high switching; LowW: low weight; HighW: high weight
     • Slide annotation: Candidate for Operand Swapping
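
The weight used on these two slides can be computed by counting nonzero Booth digits. The sketch below assumes radix-4 recoding and assumes that the second multiplier input (Y) is the Booth-encoded one, as slide 4 suggests; `booth_weight` and `place_operands` are hypothetical helper names, not part of any toolchain mentioned in the talk.

```python
def booth_weight(value, bits=16):
    """Count the nonzero radix-4 Booth digits of a two's-complement value.

    Each nonzero digit corresponds to one partial-product addition.
    """
    value &= (1 << bits) - 1
    weight, prev = 0, 0
    for i in range(0, bits, 2):
        b0 = (value >> i) & 1
        b1 = (value >> (i + 1)) & 1
        if b0 + prev - 2 * b1 != 0:
            weight += 1
        prev = b1
    return weight


def place_operands(a, b, bits=16):
    """Return (x, y) so that the lower-weight operand sits on the encoded input y."""
    if booth_weight(a, bits) < booth_weight(b, bits):
        return b, a
    return a, b
```

With these definitions, `booth_weight(0b00111100, bits=8)` is 2, matching the example on slide 10, and 0x7FFF (weight 2) is placed on Y when paired with 0xAAAA (weight 8). A compiler pass would apply the same comparison to the constant coefficient of each MAC.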

  12. Selective Coefficient Negation
     • To reduce toggling, store either Coeff[i] or -Coeff[i] in memory
     • According to the negation,
       • use 'multiply and add' (MAC+ instruction): ACC = ACC + (X * Y)
       • use 'multiply and sub' (MAC- instruction): ACC = ACC - (X * Y)
     • GSM Vocoder: 11% power reduction
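
A minimal sketch of the idea follows. The slide does not say how the sign of each coefficient is chosen, so the greedy rule here (pick whichever of Coeff[i] and -Coeff[i] toggles fewer bits relative to the previously stored word, starting from an assumed all-zero bus) is an assumption, as are the function names.

```python
def hamming(a, b, bits=16):
    """Number of differing bits between two values in a `bits`-wide word."""
    return bin((a ^ b) & ((1 << bits) - 1)).count("1")


def select_negations(coeffs, bits=16):
    """Return (stored words, subtract flags) chosen greedily to cut toggles."""
    stored, subtract = [], []
    prev = 0  # assumed initial bus value
    for c in coeffs:
        negate = hamming(prev, -c, bits) < hamming(prev, c, bits)
        word = -c if negate else c
        stored.append(word)
        subtract.append(negate)  # True means: use MAC- for this tap
        prev = word
    return stored, subtract


def mac_with_negation(stored, subtract, samples):
    """Accumulate with MAC+ or MAC- so the result matches the original filter."""
    acc = 0
    for word, negate, x in zip(stored, subtract, samples):
        acc = acc - word * x if negate else acc + word * x
    return acc
```

For coefficients 3, -3, 5, -6 the greedy rule stores 3, 3, 5, 6 and marks the second and fourth taps for MAC-, so the accumulated result is unchanged while the stored words differ in far fewer bits.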

  13. Coefficient Optimization
     • Mahesh [TVLSI '98]
     • The design of the finite wordlength FIR filter
     • Given N coefficients and constraints, find a new set of coefficients such that the total Hamming distance between successive coefficients is minimized.
       => using coefficient perturbation & an algorithm similar to simulated annealing
     • But Hamming distance is not a good cost function!
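
The cost function and the perturbation step can be sketched as follows. This is a plain hill-climbing loop, swapped in for the simulated-annealing-like algorithm of the cited paper to keep the example short. The filter constraints are represented by a caller-supplied predicate `meets_spec`, which is a hypothetical placeholder for the real frequency-response check.

```python
def total_hamming(coeffs, bits=16):
    """Total Hamming distance between successive coefficients."""
    mask = (1 << bits) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(coeffs, coeffs[1:]))


def perturb_coefficients(coeffs, meets_spec, max_delta=1, bits=16):
    """Greedily nudge coefficients while the spec holds and the cost drops.

    Terminates because the integer cost strictly decreases on every accepted move.
    """
    best = list(coeffs)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            for delta in range(-max_delta, max_delta + 1):
                if delta == 0:
                    continue
                trial = best.copy()
                trial[i] += delta
                if meets_spec(trial) and total_hamming(trial, bits) < total_hamming(best, bits):
                    best, improved = trial, True
    return best
```

Any other cost function can be dropped in place of `total_hamming`, which matters given the slide's remark that Hamming distance is a poor predictor of multiplier power.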

  14. Coefficient Ordering
     • MAC operation: commutative, associative
     • Finding a good ordering: N! cases for an N-tap filter
       Y(k) = C0 * X(k)   + C1 * X(k-1) + C2 * X(k-2)
       Y(k) = C1 * X(k-1) + C0 * X(k)   + C2 * X(k-2)
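
For small filters the N! orderings can simply be enumerated. The sketch below uses the total Hamming distance between successive coefficients as a stand-in cost, purely for illustration, since the talk itself argues that a better cost function is needed. The function names are assumptions.

```python
from itertools import permutations


def toggles(seq, bits=16):
    """Stand-in cost: total Hamming distance between successive words."""
    mask = (1 << bits) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(seq, seq[1:]))


def best_order(coeffs, bits=16):
    """Exhaustively search all N! tap orders and return the cheapest one."""
    taps = range(len(coeffs))
    return min(permutations(taps), key=lambda p: toggles([coeffs[i] for i in p], bits))


def fir_output(coeffs, x, k, order):
    """Evaluate Y(k) visiting the taps in the given order."""
    return sum(coeffs[i] * x[k - i] for i in order)
```

Because addition is commutative and associative, `fir_output` returns the same Y(k) for every order. Only the sequence of values seen by the multiplier changes. Exhaustive search is practical only for a handful of taps.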

  15. Conclusion & Future Works
     • Power characteristics of a multiplier
     • Some techniques for low power MACs
       • Interlaced accumulation programming
       • Operand swapping
       • Selective coefficient negation
       • Coefficient optimization & ordering
     • Find an accurate power model for a multiplier
       • Cost function for coefficient optimization & instruction-level power optimization
     • An implementation of a multiplier supporting selective 'operand swapping' & 'negation'
