Implementation of Digital Filters in FPGA’s

Implementation of Digital Filters in FPGA’s. Ayaz Hasan. References. Chi-Jui Chou, Satish Mohanakrishnan, Joseph B. Evans, “FPGA Implementation of Digital Filters,” ICSPAT 1993; Uwe Meyer-Baese, “Digital Signal Processing with Field Programmable Gate Arrays,” 2003. Outline. Digital Filtering


Presentation Transcript


  1. Implementation of Digital Filters in FPGA’s Ayaz Hasan

  2. References • Chi-Jui Chou, Satish Mohanakrishnan, Joseph B. Evans, “FPGA Implementation of Digital Filters,” ICSPAT 1993 • Uwe Meyer-Baese, “Digital Signal Processing with Field Programmable Gate Arrays,” 2003

  3. Outline • Digital Filtering • Programmable Signal Processors vs. FPGA’s • Multiply Accumulate Units • Multipliers • Adders • Xilinx XC4000 implementations • FIR Filters • Pipelined MAC units

  4. Digital Filters • Modification of signal attributes in frequency or time domain • Linear Time-Invariant Filters • FIR Filters • Finite sum per output sample instant • IIR Filters • Infinite sum

  5. FIR Filters • Transfer Function • Lth order filter • Tapped Delay Structure • One of the multiplicands is an FIR coefficient • Non-recursive • No feedback • Finite Response
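
The tapped delay structure named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function name `fir_filter` and the convention that samples before the start of the input are zero are assumptions made here.

```python
from collections import deque


def fir_filter(coeffs, samples):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k].

    Samples before the start of the input are assumed to be zero.
    """
    # One delay element per tap; deque(maxlen=...) drops the oldest sample.
    delay_line = deque([0] * len(coeffs), maxlen=len(coeffs))
    output = []
    for x in samples:
        delay_line.appendleft(x)  # newest sample sits at tap 0
        # Non-recursive: the output depends only on inputs, never on past outputs.
        output.append(sum(c * d for c, d in zip(coeffs, delay_line)))
    return output


if __name__ == "__main__":
    # The impulse response of an FIR filter is its coefficient list, then zeros.
    print(fir_filter([1, 2, 3], [1, 0, 0, 0, 0, 0]))  # [1, 2, 3, 0, 0, 0]
```

Feeding in a unit impulse returns the coefficients followed by zeros, which is the "finite response" property listed on the slide.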

  6. IIR Filters • Transfer Function • Recursive Filter • Feedback • Canonical Filter • Has both recursive and non-recursive parts merged
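
A canonical (direct form II) IIR filter, where the recursive and non-recursive parts share one delay line, might look like the sketch below. The function name, the argument layout, and the sign convention y[n] = Σ b[k]·x[n−k] − Σ a[k]·y[n−k] are assumptions chosen for this illustration.

```python
def iir_direct_form_2(b, a, samples):
    """Canonical IIR filter sketch.

    b: feed-forward coefficients [b0, b1, ...]
    a: feedback coefficients [a1, a2, ...] (a0 is assumed to be 1)
    """
    order = max(len(b) - 1, len(a))
    b = list(b) + [0] * (order + 1 - len(b))
    a = list(a) + [0] * (order - len(a))
    w = [0] * order  # single delay line shared by both parts
    output = []
    for x in samples:
        # Recursive part: feedback from the delayed internal state.
        w0 = x - sum(ak * wk for ak, wk in zip(a, w))
        # Non-recursive part: weighted sum over the same delay line.
        output.append(b[0] * w0 + sum(bk * wk for bk, wk in zip(b[1:], w)))
        w = ([w0] + w)[:order]
    return output


if __name__ == "__main__":
    # y[n] = x[n] + 0.5 * y[n-1]: the impulse response halves forever.
    print(iir_direct_form_2([1], [-0.5], [1, 0, 0, 0]))  # [1.0, 0.5, 0.25, 0.125]
```

Because of the feedback term, the impulse response never reaches exactly zero, in contrast to the FIR case.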

  7. Programmable Signal Processors • Based on RISC architecture • At least one fast array multiplier (fixed or floating point) • Most algorithms MAC intensive • High MAC rates using multi-stage pipelined architecture • Cost effective

  8. FPGA’s • Can provide more bandwidth • Multiple MAC cells on a chip • Useful in high-bandwidth applications like wireless and multimedia • More efficient in implementing certain algorithms • CORDIC • Number Theoretic Transforms • Error-correction algorithms

  9. FPGA vs PDSP • PDSP • Complicated algorithms that contain several if-then-else constructs • FPGA • Front-end applications • FIR filters • CORDIC algorithms • FFT’s

  10. Target Device – Xilinx XC4000 • Basic logic element – Configurable Logic Block • Two separate 4-input, 1-output Lookup Tables • General purpose logic functions • Fast carry • One 3-input, 1-output LUT to combine the two LUTs • Two flip flops • Five levels of routing • From CLB to CLB to long lines spanning the entire chip • Important in issues of speed • LUTs can also be used as 16x2 or 32x1 RAM or ROM

  11. Xilinx XC4000 CLB

  12. Multiply Accumulate Units • DSP algorithms are MAC intensive • Several approaches • Array approach • Addition using ripple carry methods • Linear convolution sum • L consecutive multiplications • L – 1 addition operations per sample • N x N-bit multipliers need to be fused together with an accumulator • Full N x N product is 2N bits wide, 2N-1 for signed numbers

  13. MAC Unit • MAC Components • 8 x 8 bit combinatorial array multiplier • 16-bit accumulator • Word sizes constrained by FPGA density • Larger word sizes possible if MAC units per chip reduced

  14. Multiplier • One CLB per partial product bit • 2-input AND gate generates each partial product • Addition logic • 64 CLB’s used • Signed Multiplication • Basic Cell Structure • Sum • Carry • xi AND ai
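
As a rough illustration of why an 8 x 8 array multiplier uses 64 partial-product cells, the sketch below forms every partial-product bit with a 2-input AND and sums them by weight. It models unsigned operands only; the signed case mentioned on the slide is not covered, and the function name is an assumption.

```python
def array_multiply(x, a, n=8):
    """Unsigned n x n array multiplication from AND-gate partial products.

    Returns (product, number_of_partial_product_bits).
    """
    product = 0
    cells = 0
    for i in range(n):          # bit i of x
        for j in range(n):      # bit j of a
            bit = ((x >> i) & 1) & ((a >> j) & 1)  # one 2-input AND per cell
            product += bit << (i + j)              # weight 2^(i + j)
            cells += 1
    return product, cells


if __name__ == "__main__":
    print(array_multiply(13, 11))    # (143, 64)
    print(array_multiply(255, 255))  # (65025, 64)
```

In hardware the additions are done by the sum and carry logic in each cell rather than by a single accumulating variable; the Python sum only stands in for that adder array.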

  15. Multiplier Implementation • a_k ≠ 0 • Accumulation of X·2^k • a_k = 0 • No operation
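
The rule on this slide (accumulate X·2^k when bit k of the coefficient is set, otherwise do nothing) is the classic shift-and-add scheme. A minimal unsigned sketch, with the function name chosen here for illustration:

```python
def shift_add_multiply(x, a, n=8):
    """Multiply x by the n-bit unsigned value a, one coefficient bit at a time."""
    acc = 0
    for k in range(n):
        if (a >> k) & 1:   # a_k != 0: accumulate X * 2^k
            acc += x << k
        # a_k == 0: no operation
    return acc


if __name__ == "__main__":
    print(shift_add_multiply(13, 11))  # 143
```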

  16. Adder Implementation • 16-bits • 9 CLB’s, each configured as 2-bit adder • 7 for middle 14 bits • 1 each for MSB and LSB • Dedicated CLB carry logic • Improved efficiency of adders • Cout of a CLB can only be connected to a CLB above or below it • Vertical array • Delay of 20.5ns
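
The 9-CLB partition described above (one slice for the LSB, seven 2-bit slices for the middle 14 bits, one slice for the MSB) can be modeled as a ripple-carry chain. This is only a behavioral sketch; the slice ordering and the function names are assumptions, and no timing is modeled.

```python
def add_slice(a, b, carry_in, width):
    """Add two width-bit fields plus a carry; return (sum_bits, carry_out)."""
    total = a + b + carry_in
    return total & ((1 << width) - 1), total >> width


def ripple_add_16(a, b):
    """16-bit addition split into 9 slices: 1 + 7 * 2 + 1 bits."""
    widths = [1] + [2] * 7 + [1]
    result, carry, shift = 0, 0, 0
    for width in widths:
        mask = (1 << width) - 1
        s, carry = add_slice((a >> shift) & mask, (b >> shift) & mask, carry, width)
        result |= s << shift
        shift += width  # the carry ripples up to the next slice
    return result, carry


if __name__ == "__main__":
    print(ripple_add_16(1234, 4321))  # (5555, 0)
    print(ripple_add_16(0xFFFF, 1))   # (0, 1)
```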

  17. MAC Implementation • Performance • 100ns multiplier delay • 10 MHz • 73 CLB’s

  18. FIR Filter MAC Unit • MAC unit with 4 multipliers and an adder tree • Pipeline registers increase clock speed • 4 terms summed every clock cycle • 4 taps: sampling rate = clock frequency • 8 taps: sampling rate = clock frequency/2 • Maximum sampling frequency: f_max = M / (T · N) • M = number of multipliers • T = multiplier delay • N = number of filter taps

  19. FIR Filters • Performance • 100 ns multiplier delay • 22.5 ns adder delay • Routing delay may be up to 75 ns • 10 MHz clock • Sampling rates of 40/N MHz
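
The 40/N MHz figure is consistent with a maximum sampling rate of M / (T · N), where M is the number of multipliers, T the multiplier delay, and N the number of taps. That formula is inferred here from the slide's numbers rather than quoted from the source, and the helper below is a hypothetical name used only to check the arithmetic.

```python
def max_sampling_rate_mhz(num_multipliers, multiplier_delay_ns, num_taps):
    """Assumed relation f_max = M / (T * N), with T in ns and the result in MHz."""
    return num_multipliers * 1000 / (multiplier_delay_ns * num_taps)


if __name__ == "__main__":
    # 4 multipliers with 100 ns delay, as on the slide.
    for taps in (4, 8, 16):
        print(taps, max_sampling_rate_mhz(4, 100, taps))
    # 4 -> 10.0 MHz, 8 -> 5.0 MHz, 16 -> 2.5 MHz
```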

  20. Pipelined MAC Units • Multiplier delay is a major limitation on maximum sampling rates • Pipelined array multipliers • Execution of separate multiplications overlaps • Carry propagating addition delay in last row of multiplier can be minimized • High sampling frequency can be achieved • Can be applied to previously mentioned FIR filters

  21. Pipelined MAC Units • Basic cells identical to unpipelined ones • Include pipeline registers • To propagate multiplier and multiplicand bits to the destination • To propagate product bits that have been completed, done in parallel with new batch of product bits • N x N multiplier • Carry propagate adder replaced with N rows of half adders with pipeline registers between the rows • Allows carry propagation of only one position between any two consecutive rows • Clock speed depends only on the delay in multiplier cells

  22. Pipelined MAC Units • For multiple tap filters • Accumulation of results needed through feedback of past output • Done by a set of full adders immediately below the diagonal of the array, feeding back outputs of full adders to their inputs through a single register • Clock rate • Approaches 100MHz for XC4000

  23. 4 x 4 Multiplier 6-Bit Accumulator • 4 MSB’s of multiplier fed back for accumulation • Output clocked out and accumulator reset after process complete • Filter coefficients and delayed inputs fed to multiplier in synchronized data streams • Arrivals corresponding to basic clock rate • N tap filter requires N+1 clock cycles for computation of one output
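
The N+1 cycle count can be illustrated with a simple cycle-counting model of one MAC unit. The model assumes one multiply-accumulate per clock and one further clock to read out the result and reset the accumulator; that split of the extra cycle is an interpretation of the slide, not something it states, and word-length truncation is ignored.

```python
def mac_output(coeffs, window):
    """Compute one filter output on a single MAC unit and count clock cycles.

    coeffs and window are the synchronized coefficient and delayed-input streams.
    """
    acc = 0
    cycles = 0
    for c, x in zip(coeffs, window):
        acc += c * x   # one multiply-accumulate per clock
        cycles += 1
    cycles += 1        # assumed: one clock to output the sum and reset
    return acc, cycles


if __name__ == "__main__":
    print(mac_output([1, 2, 3, 4], [4, 3, 2, 1]))  # (20, 5): 4 taps, 5 cycles
```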

  24. FPGA Implementation • Routing delay critical • 3ns for output pipeline register to stabilize after clocking • Output then routed • Then 4.5ns delay in the next CLB • Total minimum delay 7.5ns • In addition, 3ns from pad to input • Some CLB’s can be used as registers between input pads and cells, preventing reduction of clock speed

  25. FPGA Implementation • 8 x 8 multiplier and 12-bit accumulator • 4.6ns worst case routing delay • 12.1ns worst case logic path delay • 80MHz clock rate • 2 MAC units can be accommodated in XC4013

  26. Conclusion • FPGA approach to digital filter implementation • Higher sampling rates than traditional DSP chips • Lower costs than ASICs for moderate volume • More flexibility • MAC units on a single FPGA • FIR Filter Implementation

  27. Questions
