
High-Throughput Programmable Systolic Array FFT Architecture and FPGA Implementations

J. Greg Nash, www.centar.net, jgregnash@centar.net, ICNC 2014


Presentation Transcript


  1. High-Throughput Programmable Systolic Array FFT Architecture and FPGA Implementations. J. Greg Nash, www.centar.net, jgregnash@centar.net, ICNC 2014

  2. Outline
     • Motivation for new FFT designs in wireless applications
     • Review of FFT architectures
     • New systolic FFT architecture
     • FPGA circuit performance comparisons
         • LTE SC-FDMA
         • Fixed-size power-of-two transforms
         • Variable-size transforms (LTE, WiMAX)
     • Conclusions

  3. Future Drivers for Wireless FFT Design
     • Algorithmic (OFDM)
         • Large transform sizes (LTE: 2048 points; DVB: 32K points)
         • Run-time scalable OFDMA (LTE: 128 to 2048 points)
         • Non-power-of-two transform sizes (LTE SC-FDMA: 35 sizes, 12 to 1296 points)
     • High performance (LTE-Advanced)
         • BW = 100 MHz with 8 MIMO streams (< 1.0 µs for a 2K FFT)
     • Critical system requirements
         • Power
         • Cost

  4. FFT Architecture Review (1): Pipelined
     • Figure: block diagram and signal flow graph of an 8-point DFT, with $W = e^{-2\pi I/N}$
     • The signal flow graph is collapsed onto pipelined hardware blocks
     • Features:
         • Fast
         • Hardware intensive
         • Non-programmable
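The following is a minimal Python/NumPy sketch of an iterative radix-2 decimation-in-time FFT. It illustrates what "collapse onto pipelined hardware blocks" means. This is the textbook algorithm, not the systolic design proposed in these slides. Each pass of the outer loop is one column of butterflies in the signal flow graph, and a pipelined FFT would build that column as its own hardware stage. An N-point transform therefore needs log2 N stages. The names `bit_reverse` and `radix2_fft` are illustrative, not taken from the presentation.

```python
import numpy as np


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def radix2_fft(x):
    """Iterative radix-2 DIT FFT; returns (spectrum, number of butterfly stages)."""
    a = np.asarray(x, dtype=complex)
    n = a.size
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Reorder the input into bit-reversed order (fancy indexing makes a copy).
    a = a[[bit_reverse(i, bits) for i in range(n)]]
    stages = 0
    size = 2
    while size <= n:
        # One stage = one column of butterflies in the signal flow graph.
        half = size // 2
        w = np.exp(-2j * np.pi * np.arange(half) / size)  # twiddles W_size^k
        for start in range(0, n, size):
            top = a[start:start + half].copy()
            bot = a[start + half:start + size] * w
            a[start:start + half] = top + bot
            a[start + half:start + size] = top - bot
        stages += 1
        size *= 2
    return a, stages


x = np.arange(8, dtype=complex)
X, stages = radix2_fft(x)
print(stages)                          # 3 stages for N = 8
print(np.allclose(X, np.fft.fft(x)))   # True
```

Because every stage does the same fixed butterfly pattern, a hardware pipeline is fast. It is also hardware intensive and tied to one transform size, which matches the features listed on the slide.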

  5. FFT Architecture Review (2): Memory Based
     • Traditional, features:
         • Programmable
         • Compact
         • Typically slow
     • Proposed systolic array, features:
         • Programmable
         • Faster than pipelined FFT
         • Scalable
         • Higher SQNR

  6. Matrix Form DFT (16-Point DFT)
     • $Z = C X$, where $C_{kn} = W^{kn}$ for $k, n = 0, \dots, N-1$
     • $W = e^{-2\pi I/N}$ ($N = 16$)
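Below is a small NumPy sketch that evaluates $Z = C X$ directly for N = 16. It builds C from the entries W^(kn) and checks the matrix-vector product against `numpy.fft.fft`. This direct form costs N² complex multiply-adds. That is the baseline that any FFT factorization of C improves on.

```python
import numpy as np

N = 16
k = np.arange(N)
# C[k, n] = W^(k*n) with W = exp(-2*pi*I/N)
C = np.exp(-2j * np.pi * np.outer(k, k) / N)

rng = np.random.default_rng(1)
X = rng.standard_normal(N) + 1j * rng.standard_normal(N)
Z = C @ X                              # Z = C X: N*N complex multiply-adds

print(np.allclose(Z, np.fft.fft(X)))   # True
```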

  7. Inputs X and Outputs Z in Bit-Reversed Form (N = 16)

$$
C_b = \begin{bmatrix}
P_0 \circ d_1 & P_0 \circ d_2 & P_0 \circ d_3 & P_0 \circ d_4 \\
P_1 \circ d_1 & P_1 \circ W d_2 & P_1 \circ W^{2} d_3 & P_1 \circ W^{3} d_4 \\
P_2 \circ d_1 & P_2 \circ W^{2} d_2 & P_2 \circ W^{4} d_3 & P_2 \circ W^{6} d_4 \\
P_3 \circ d_1 & P_3 \circ W^{3} d_2 & P_3 \circ W^{6} d_3 & P_3 \circ W^{9} d_4
\end{bmatrix}
$$

     where $P_r$ ($r = 0, 1, 2, 3$) is the 4 × 4 matrix whose four rows all equal $[\,1,\ (-I)^{r},\ (-I)^{2r},\ (-I)^{3r}\,]$, i.e. $[1\ \ 1\ \ 1\ \ 1]$, $[1\ \ {-I}\ \ {-1}\ \ I]$, $[1\ \ {-1}\ \ 1\ \ {-1}]$ and $[1\ \ I\ \ {-1}\ \ {-I}]$; “∘” = element-by-element multiply
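The block structure of the digit-reversed 16-point DFT matrix can be checked numerically. The sketch below reorders the rows and columns of the natural-order DFT matrix into base-4 digit-reversed order. For each 4 × 4 block (r, c), it then divides out two factors:

- a pattern matrix, every row of which equals [1, (W⁴)^r, (W⁴)^(2r), (W⁴)^(3r)], with W⁴ = -I;
- the twiddle W^(rc).

What remains depends only on the block column c, so one residual d_c serves every block row. The residuals found here have entries (W⁴)^(c·i), constant along each row i, with c counted from 0. They come from this derivation. The transcript does not define the slide's d1 to d4, so treating them as the same matrices is an assumption.

```python
import numpy as np

b = 4
N = b * b                     # the slide's 16-point case
W = np.exp(-2j * np.pi / N)
Wb = W ** b                   # W^4 = -I, the 4-point root of unity
idx = np.arange(N)
C = np.exp(-2j * np.pi * np.outer(idx, idx) / N)   # natural-order DFT matrix

# Base-b digit reversal: position b*hi + lo holds element lo*b + hi.
perm = (idx % b) * b + idx // b
Cb = C[np.ix_(perm, perm)]    # rows (outputs Z) and columns (inputs X) reordered

j = np.arange(b)
d = {}                        # d[c]: residual block for block column c (0-based)
for r in range(b):
    P_r = np.tile(Wb ** (r * j), (b, 1))        # every row = [1, Wb^r, Wb^2r, Wb^3r]
    for c in range(b):
        block = Cb[r * b:(r + 1) * b, c * b:(c + 1) * b]
        residual = block / (P_r * W ** (r * c))  # divide out pattern and twiddle
        if c in d:
            # The residual is the same in every block row r.
            assert np.allclose(residual, d[c])
        else:
            d[c] = residual

# Permuted matrix applied to permuted input gives permuted output.
X = np.random.default_rng(2).standard_normal(N)
print(np.allclose(Cb @ X[perm], np.fft.fft(X)[perm]))   # True
```

The three factors have a clear division of labour. The pattern matrices are 4-point DFT rows. The powers of W are the inter-stage twiddles. The d_c carry the remaining 4-point DFT over the other index digit.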

  8. New FFT Matrix Form (for b = 4); “∘” = element-by-element multiply

  9. “Base-b” FFT Architecture
     • Base-b DFT equations
     • Base-4 DFT architecture (figure panels: physical and virtual)

  10. Processing flow for a DFT of length $N = N_r N_c$
     1. $N_c$ column DFTs ($X_{c_i}$) of length $N_r$
     2. $N_r$ row DFTs ($X_{r_i}$) of length $N_c$
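The two-step flow above has the same shape as the classic Cooley-Tukey row/column ("four-step") decomposition. The following is a minimal NumPy sketch of that textbook formulation. The name `four_step_dft` is hypothetical, and the short DFTs are delegated to `numpy.fft.fft` for brevity. The textbook version multiplies by twiddle factors W_N^(k1·n2) between the column pass and the row pass. In the slide-7 matrix these show up as the powers of W that scale d2, d3 and d4.

```python
import numpy as np


def four_step_dft(x, Nr, Nc):
    """Length-(Nr*Nc) DFT via Nc column DFTs, twiddles, then Nr row DFTs."""
    x = np.asarray(x, dtype=complex)
    N = Nr * Nc
    if x.size != N:
        raise ValueError("len(x) must equal Nr * Nc")
    # Input index n = Nc*n1 + n2  ->  A[n1, n2]
    A = x.reshape(Nr, Nc)
    # Step 1: Nc column DFTs of length Nr (over n1)  ->  A[k1, n2]
    A = np.fft.fft(A, axis=0)
    # Twiddle factors W_N^(k1*n2) between the two passes
    k1 = np.arange(Nr)[:, None]
    n2 = np.arange(Nc)[None, :]
    A = A * np.exp(-2j * np.pi * k1 * n2 / N)
    # Step 2: Nr row DFTs of length Nc (over n2)  ->  A[k1, k2]
    A = np.fft.fft(A, axis=1)
    # Output index k = k1 + Nr*k2, so read A out column by column
    return A.T.reshape(N)


rng = np.random.default_rng(3)
x = rng.standard_normal(540) + 1j * rng.standard_normal(540)
print(np.allclose(four_step_dft(x, 36, 15), np.fft.fft(x)))   # True
```

Nothing in the decomposition requires powers of two. The same function handles 256 = 16 × 16, 1024 = 32 × 32, or a non-power-of-two LTE size such as 540 = 36 × 15.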

  11. Base-4 Array Architecture
     • 256-point FFT ($N_r = N_c = 16$)
     • 1024-point FFT ($N_r = N_c = 32$)
     • Figure: array processing elements

  12. Interconnection Delays
     • 65 nm technology, 256-point FFT: Altera pipelined FFT vs. systolic array (figure: critical paths)
     • Fmax: systolic array 537 MHz; Altera pipelined FFT 351 MHz

  13. LTE Uplink: Single-Carrier FDMA (SC-FDMA)
     • DFT spreading of data symbols in the frequency domain
     • Reduces PAPR in the uplink
     • Less sensitivity to frequency offset
     • 35 DFT sizes N (12 to 1296 points)
     • Run-time choice of DFT size
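The "35 DFT sizes, 12 to 1296 points" figure can be reproduced with a short Python sketch. This assumes the 3GPP TS 36.211 transform-precoding rule, under which an uplink allocation of M resource blocks of 12 subcarriers each is allowed only when M = 2^a · 3^b · 5^c. It also assumes M is capped at 110 resource blocks, the largest uplink bandwidth. The function name and its `max_rbs` parameter are illustrative.

```python
def lte_scfdma_dft_sizes(max_rbs: int = 110) -> list[int]:
    """DFT sizes 12*M for M <= max_rbs with M = 2^a * 3^b * 5^c (assumed rule)."""
    sizes = []
    for m in range(1, max_rbs + 1):
        r = m
        for p in (2, 3, 5):
            while r % p == 0:
                r //= p
        if r == 1:                  # m has no prime factors other than 2, 3, 5
            sizes.append(12 * m)    # 12 subcarriers per resource block
    return sizes


sizes = lte_scfdma_dft_sizes()
print(len(sizes), sizes[0], sizes[-1])   # 35 12 1296
```

Every size factors into 2s, 3s and 5s, so a mixed-radix decomposition built from small DFTs covers all of them. One example is 540 = 36 × 15.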

  14. LTE Systolic DFT
     • Array size uses base b = 6
     • Example: N = 540 points, computed with 36-point DFTs and 15-point DFTs (figure)
     • Use a subset of the physical array when the factors P, Q ≠ 6

  15. Programmability
     • Parameter list (Matlab):
         • Matrix factorization parameters (ax, by, cz, …)
         • Addresses for coefficients
     • Figure example: 240 points

  16. LTE DFT: FPGA Cycle Counts

  17. LTE DFT: FPGA Circuit Usage Comparisons (65nm Technology)

  18. LTE Systolic DFT: Performance Comparisons

  19. Fixed-Size FFT: Power-of-Two
     • Streaming (continuous data in/out)
     • Array size uses base b = 4
     • Altera Stratix III FPGAs (65 nm technology)

  20. Variable-Size FFT: Power-of-Two
     • Transform sizes: 128/256/512/1024/2048 points
     • Streaming (continuous data in/out)
     • Transform size selectable at run time
     • Array size uses base b = 4
     • Altera Stratix III FPGAs (65 nm technology)

  21. Conclusion: Better FFTs Are Possible
     • Improved performance
         • Algorithmic reduction in computation cycles
         • Localized interconnects for high clock speeds (> 500 MHz on 65 nm FPGA technologies)
         • Reduced usage of FPGA logic cells
     • Programmability
     • Throughput scalability due to the use of systolic algorithms
     • Higher dynamic range (smaller word lengths needed)
