FFT in Hardware and Software

- Core Algorithm
- Original Algorithm, the DFT (Discrete Fourier Transform), O(n^2) complexity
- New Algorithm, the FFT (Fast Fourier Transform), O(n log2(n)) depending on implementation.
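
To make the two complexity figures concrete, the sketch below compares rough operation counts for a few sizes. The function names `dft_ops` and `fft_ops` are illustrative only, and the counts ignore constant factors, so they show the growth rate rather than real cycle counts.

```python
def dft_ops(n):
    """Rough operation count for a direct DFT: n outputs, each summing n terms."""
    return n * n


def fft_ops(n):
    """Rough operation count for a radix-2 FFT: log2(n) stages of n operations."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return n * (n.bit_length() - 1)


# Print size, DFT count, FFT count and the ratio between them.
for n in (8, 1024, 65536):
    print(n, dft_ops(n), fft_ops(n), round(dft_ops(n) / fft_ops(n), 1))
```

For n = 1024 the ratio is already about 100, and it keeps growing with n.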

- A summation over the whole input array for every single element in the output array.
- A VERY computationally inefficient algorithm to implement.
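
A minimal sketch of that "summation over the whole input for every output element", written in plain Python. The sign convention (a negative exponent for the forward transform) is an assumption, since the slides do not state one.

```python
import cmath


def dft(x):
    """Direct DFT: every output element sums over the whole input array."""
    n = len(x)
    out = []
    for m in range(n):            # one pass per output element
        acc = 0j
        for k in range(n):        # full summation over the input
            acc += x[k] * cmath.exp(-2j * cmath.pi * m * k / n)
        out.append(acc)
    return out


print(dft([1, 1, 1, 1]))
```

The two nested loops over n are where the O(n^2) cost comes from.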

- A much more computationally efficient algorithm
- Works using the divide and conquer principle.
- Published by Cooley and Tukey in 1965 [2]
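
The divide and conquer idea can be sketched as a recursive radix-2 FFT: split the input into even and odd samples, transform each half, then combine. This is an illustrative textbook form rather than the implementation profiled in these slides, and it assumes a power-of-two length and a negative-exponent twiddle factor.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])           # transform of the even-indexed samples
    odd = fft(x[1::2])            # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor times odd term
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


print(fft([0, 1, 0, 0]))
```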

- Butterfly arrangement of computations
- Repeated on successive pairs of input data
- Then half as many times on alternating pairs
- Then half again as many times on every fourth element
- …
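
The pairing pattern described above can be listed explicitly. The sketch below assumes an in-place radix-2 layout (with the input already in bit-reversed order), which the slides do not state; `butterfly_pairs` is a hypothetical helper name.

```python
def butterfly_pairs(n):
    """Index pairs combined at each stage of an in-place radix-2 FFT of size n."""
    stages = []
    span = 1                      # distance between the two inputs of a butterfly
    while span < n:
        pairs = [(start + j, start + j + span)
                 for start in range(0, n, 2 * span)
                 for j in range(span)]
        stages.append(pairs)
        span *= 2                 # next stage pairs elements twice as far apart
    return stages


for stage, pairs in enumerate(butterfly_pairs(8)):
    print(stage, pairs)
```

For n = 8 the first stage pairs neighbours, the second pairs elements two apart, and the third pairs elements four apart. Each stage still contains n/2 butterflies; it is the number of independent groups that halves.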

Butterfly (inputs xe[n] and xo[n], twiddle factor W_N^n):

X[n]     = xe[n] + W_N^n * xo[n]
X[n+N/2] = xe[n] - W_N^n * xo[n]
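
A single butterfly with the labels used above can be sketched as one small function. The definition W_N = exp(-2*pi*i/N) is an assumption (the usual forward-transform convention); the slides only name the factor.

```python
import cmath


def butterfly(xe, xo, n, N):
    """One radix-2 butterfly: returns (X[n], X[n + N/2])."""
    w = cmath.exp(-2j * cmath.pi * n / N)   # twiddle factor W_N^n (assumed convention)
    t = w * xo                              # the single complex multiplication
    return xe + t, xe - t                   # one addition and one subtraction


print(butterfly(1, 1, 0, 2))
```

Only one complex multiplication is needed per butterfly, because the same product feeds both outputs with opposite signs.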

- Simple operations repeated many times

[Figure: FFT butterfly network from Input Array to Output, animated over several slides. Legend: multiplication by W factor; addition.]

- Even more speed for FFT
- Extremely parallelizable
- A whole layer can be done in two FPGA clock cycles
- 1 multiply cycle
- 1 add cycle
- (Assuming sufficient multipliers)
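
Taking the two-cycles-per-layer figure at face value, the latency of a fully parallel FFT can be estimated as below. This is a back-of-the-envelope sketch: the function names are hypothetical, and "one multiplier per butterfly" is an assumed reading of "sufficient multipliers".

```python
def fpga_fft_cycles(n, cycles_per_layer=2):
    """Cycles for an n-point FFT when every layer runs fully in parallel."""
    layers = n.bit_length() - 1   # log2(n) layers for a power-of-two n
    return layers * cycles_per_layer


def butterflies_per_layer(n):
    """Butterflies (and, by assumption, complex multipliers) needed per layer."""
    return n // 2


print(fpga_fft_cycles(1024), butterflies_per_layer(1024))
```

For 1024 points this gives 20 cycles, at the cost of 512 butterflies running side by side in each layer.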

- Complexity
- Input speed
- Output speed
- If the FPGA computes in 24.4ns but it takes 20s to transfer the input data, what gain is there?
- i.e. 20s (input) + 24.4ns (compute) + 20s (output) = ~40s!
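
The same arithmetic as a short sketch, using the figures from the bullet above. The helper name `total_time` is illustrative.

```python
def total_time(compute_s, transfer_in_s, transfer_out_s):
    """End-to-end time when input transfer, compute and output transfer run in sequence."""
    return transfer_in_s + compute_s + transfer_out_s


compute = 24.4e-9                 # FPGA compute time from the slide
total = total_time(compute, 20.0, 20.0)
print(total, compute / total)     # total seconds, fraction spent computing
```

With these numbers the compute time is a vanishing fraction of the total, so speeding up the FFT itself changes almost nothing until the transfers are addressed.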

- Use a faster bus
- AMD Opteron’s HyperTransport
- 20.8 GB/s (166.4 Gb/s) per link (version 3)
- Modules that fit into an AMD 64-bit Opteron socket
- http://www.drccomputer.com/pages/modules.html - Xilinx-based module [3]
- http://www.xtremedatainc.com/xd1000_brief.html - Altera-based module [4]
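
To get a feel for what the quoted link rate means, the sketch below estimates the time to move one FFT frame across the bus. The 8 bytes per complex point (two 32-bit values) is an assumption, and the estimate uses the peak rate only, ignoring latency and protocol overhead.

```python
LINK_BYTES_PER_S = 20.8e9         # 20.8 GB/s per link, as quoted above


def transfer_time_s(points, bytes_per_point=8, bandwidth=LINK_BYTES_PER_S):
    """Idealised time to move `points` complex samples at the peak link rate."""
    return points * bytes_per_point / bandwidth


print(20.8 * 8)                   # GB/s to Gb/s conversion
print(transfer_time_s(1024))      # seconds for a 1024-point frame
```

At this rate a 1024-point frame moves in well under a microsecond, which puts the transfer on a scale comparable to the computation instead of dwarfing it.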

- Put the FPGA on the die with the DSP
- Need silicon vendor support
- FPGA can access memory on a very wide bus (e.g. 128 bits per cycle)

- Implement the entire project in FPGA
- Time consuming to program
- Possibly insufficient room on the FPGA

[Figure: FFT butterfly network from Input Array to Output, animated over several slides. Legend: multiplication by W factor; addition.]

- Each butterfly must be done sequentially
- Only slight parallelism enabled by a DSP like the TigerSHARC
- Each Butterfly can be done in 2 cycles (after optimization).
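
With butterflies executed one after another at 2 cycles each, a rough cycle count for a radix-2 FFT on the DSP follows directly. This sketch ignores the "slight parallelism" mentioned above, as well as loop and memory overhead, so it is an estimate only; `dsp_fft_cycles` is a hypothetical name.

```python
def dsp_fft_cycles(n, cycles_per_butterfly=2):
    """Cycles for an n-point radix-2 FFT with strictly sequential butterflies."""
    stages = n.bit_length() - 1           # log2(n) stages for a power-of-two n
    butterflies = (n // 2) * stages       # n/2 butterflies in every stage
    return butterflies * cycles_per_butterfly


print(dsp_fft_cycles(1024))
```

For 1024 points that is 5120 butterflies, or 10240 cycles under these assumptions.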

- Linear Profiling of FFT Algorithm in C++

- Profiling of VHDL on FPGA
- Butterfly takes 24.377ns to execute
- 62% is computational, 38% is routing on FPGA
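
Splitting the measured butterfly time by the quoted percentages gives the absolute figures. The variable names are illustrative.

```python
BUTTERFLY_NS = 24.377             # measured butterfly time from the profile above

compute_ns = BUTTERFLY_NS * 0.62  # share spent in computation
routing_ns = BUTTERFLY_NS * 0.38  # share spent in FPGA routing

print(round(compute_ns, 3), round(routing_ns, 3))
```

Roughly 15.1 ns of each butterfly is computation and roughly 9.3 ns is routing delay.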

- Most DSP Vendors
- Many FPGA Vendors (IP – Intellectual Property)
- Microcontroller Vendors (e.g. Blackfin)
- FFTW – The Fastest Fourier Transform in the West
- AMD Math Core Library
- Intel Library
- Highly Optimized for the expected hardware

- The Radix 4 version delivers a 1 K points complex processing time of 25 microseconds at 200-MHz system speeds and uses only about 10 percent of the resources in a mid-range Stratix device. The Radix 2 is half the size of the Radix 4 and offers a 1 K points complex processing time of 50 microseconds at 200-MHz system speeds. Additional versions of the new cores are under development. [6]
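
The quoted processing times can be converted to clock cycles at the stated 200 MHz. The helper name `cycles` is illustrative and the conversion is plain arithmetic on the figures in the quote.

```python
def cycles(time_s, clock_hz):
    """Number of clock cycles that fit in `time_s` at `clock_hz`."""
    return time_s * clock_hz


radix4 = cycles(25e-6, 200e6)     # 1 K points, Radix 4 core
radix2 = cycles(50e-6, 200e6)     # 1 K points, Radix 2 core
print(round(radix4), round(radix2))
```

That works out to about 5000 cycles for the Radix 4 core and about 10000 for the Radix 2 core, i.e. roughly 5 and 10 cycles per point for a 1 K transform.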

[1] Signals Systems and Transforms

[2] James W. Cooley and John W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput. 19, 297–301 (1965).

[3] http://www.drccomputer.com/pages/modules.html - Xilinx-based module

[4] http://www.xtremedatainc.com/xd1000_brief.html - Altera-based module

[5] http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_2353,00.html

[6] http://www.us.design-reuse.com/news/news5650.html

[7] http://www.4dsp.com/fft.htm