1 / 13

Distributed Arithmetic

Distributed Arithmetic. Dr Sumam David S. Dept. of E&C, NITK Surathkal Courtesy for slides – Xilinx Professor’s Workshop Resources. Objective. Distributed arithmetic What ? Where ? How ?. What is DA?. Multiplication using LUT Used to implement multipliers in LUT rich FPGAs.

hugh
Download Presentation

Distributed Arithmetic

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Distributed Arithmetic Dr Sumam David S. Dept. of E&C, NITK Surathkal Courtesy for slides – Xilinx Professor’s Workshop Resources

  2. Objective • Distributed arithmetic • What ? • Where ? • How ?

  3. What is DA? • Multiplication using LUT • Used to implement multipliers in LUT rich FPGAs

  4. Twos Complement Multiplication One bit at a time:

  5. SDA 1-Tap FIR Filter 1 Z-1 A0 00000...0 0 C0 1 LUT contains two locations N BITS WIDE SAMPLE DATA Partial Product ROM A0 +/- X0 Parallel to serial converter Scaling Accumulator

  6. Partial products of equal weight are added together before being summed to next higher partial product weight Create look-up table of summed partial products Distributed Arithmeticfor a 2-Tap Filter -23 22 21 20 -23 22 21 20 C0 = 1 0 0 1 (-7) C1 = 0 1 1 0 ( 6) X X0 = 0 1 1 1 ( 7) X X1 = 0 1 0 1 ( 5) + + + + ( 1 0 0 1 ( 1 0 0 1 ( 1 0 0 1 (0 0 0 0 1 1 0 0 1 1 1 1 0 1 1 0) 0 0 0 0 ) 0 1 1 0 ) 0 0 0 0 ) 0 0 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 0 0 0 0 = 1 1 1 0 1 1 0 1 (-1) (-14) (-4) (0) (-19) (-49) ( 30) (Serial-Data / Tap-Parallel Multiply) = Sign Extension

  7. SDA 2-Tap FIR Filter 0000...0 C0 Z-1 C1 C0 + C1 1 00 01 10 11 LUT contains all possible sums of the partial products N BITS WIDE SAMPLE DATA Partial Product ROM A0 X0 +/- A1 X1 Scaling Accumulator

  8. SDA 4-Tap FIR Filter Z-1 +/- Scaling Accumulator N BITS WIDE SAMPLE DATA Partial Product ROM A0 0000...0 X0 C0 1 + A1 0000...0 X1 C1 1 + A2 0000...0 X2 C2 1 + A3 0000...0 X3 C3

  9. SDA 8-Tap FIR Filter 1 1 1 1 1 1 1 Z-1 N BITS WIDE SAMPLE DATA A0 Partial Product ROM X0 A1 X1 A2 Pre-Adder X2 A3 X3 + +/- A0 X4 Partial Product ROM Scaling Accumulator A1 X5 A2 X6 4 -input LUT contains all possible sums of the partial products A3 X7

  10. 60 Single MAC DA FIR B=8 50 DA FIR B=12 40 DA FIR B=16 Sample Rate (MSPS) 30 Serial FPGA FIR 20 10 0 0 50 100 150 200 250 Xilinx DA FIR Performance 6000 Dual MAC DA FIR B=8 5000 DA FIR B=12 4000 DA FIR B=16 3000 Performance (MMACs/s) Serial FPGA FIR 2000 1000 0 0 50 100 150 200 250 Filter Length (Taps) Filter Length (Taps) fclk = 200 MHz for both processor and FPGA B = data sample precision for FPGA

  11. Trade Clock Cycles for Logic Area Trade Clock Cycles for Logic Area Multi bits per clock cycle 160Ms/s 20Ms/s b7 b7 b7 Serial-DA Parallel-DA b4 b3 b0 Hardware Over-sampling = 4 b0 Hardware Over-sampling = 8 Hardware Over-sampling = 2 b0 b0 b7 b3 Hardware Over-sampling = 1 b4 b0 The sample is serialized and processed 1 bit per clock cycle. 8 clock cycles are thus required to process the whole sample The sample is serialized and processed 2 bits per clock cycle. 4 clock cycles are thus required to process the whole sample The sample is processed in parallel 8 bits per clock cycle The sample is serialized and processed 4 bits per clock cycle b0

  12. Conclusion • Efficiency of computation • Slow as its bit serial • Memory requirements

  13. References • The role of Distributed Arithmetic in FPGA based signal processing, www.xilinx.com

More Related