# FFT: Accelerator Project


## FFT: Accelerator Project

Rohit Prakash

Anand Silodia

### Work done till now

• Studied various FFT algorithms

• Implemented radix-4, recursive and iterative algorithms

• Optimized these

• Compared the results with FFTW

Result:

• FFTW fares better than our implementation
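As an illustration of the kind of recursive radix-4 algorithm mentioned above, here is a minimal Python sketch. It is not the project's code (which is not shown in these slides); the function names `fft_radix4` and `dft_naive` are hypothetical, and the input length is assumed to be a power of 4. The naive DFT is included only as a correctness reference.

```python
import cmath

def dft_naive(x):
    """Direct O(n^2) DFT, used only as a reference for checking results."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft_radix4(x):
    """Recursive decimation-in-time radix-4 FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 4 != 0:
        raise ValueError("length must be a power of 4")
    # Transform the four interleaved subsequences x[4m + r], r = 0..3.
    sub = [fft_radix4(x[r::4]) for r in range(4)]
    q = n // 4
    out = [0j] * n
    for k in range(q):
        # Twiddle factors W_n^(r*k) applied to the r-th sub-transform.
        w1 = cmath.exp(-2j * cmath.pi * k / n)
        a = sub[0][k]
        b = w1 * sub[1][k]
        c = w1 * w1 * sub[2][k]
        d = w1 * w1 * w1 * sub[3][k]
        # Radix-4 butterfly: a length-4 DFT whose factors are 1, -1, j, -j.
        out[k] = a + b + c + d
        out[k + q] = a - 1j * b - c + 1j * d
        out[k + 2 * q] = a - b + c - d
        out[k + 3 * q] = a + 1j * b - c - 1j * d
    return out
```

Comparing `fft_radix4(x)` against `dft_naive(x)` on a short input is a quick sanity check before timing anything against FFTW.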

### Current Objectives

• Validate the number of complex calculations in our implementation against the theoretical number of computations

• Document the work done till now

• Make a website of the project

• Study FFTW code (also figure out the reasons for its efficiency)

• Compile and run the code with the Intel compiler (icc) and Visual C++

### Validating the computations

• The theoretical formula given on cnx.org is incorrect

• Theoretical formula for the number of complex computations:

• (11/4) · n · log₄(n) = 8960 (correct)

• (3/4) · n · log₄(n) = 3840 (incorrect)

• Actual count: 8960
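The figures above can be checked with a few lines of arithmetic. The slide does not state the transform size; the sketch below assumes n = 1024, because that is the power of 4 for which (3/4) · n · log₄(n) gives 3840. The helper names are hypothetical. Under that assumption, 8960 corresponds to a coefficient of 7/4, while 11/4 evaluates to 14080, so the coefficient or the size behind the first line may have been altered when the slide text was extracted.

```python
def log4_exact(n):
    """Return log base 4 of n, requiring n to be an exact power of 4."""
    stages = 0
    while n > 1:
        if n % 4:
            raise ValueError("n must be a power of 4")
        n //= 4
        stages += 1
    return stages

def op_count(num, den, n):
    """Evaluate (num/den) * n * log4(n) using integer arithmetic."""
    return num * n * log4_exact(n) // den

n = 1024  # assumption: the slide does not state n
print(op_count(3, 4, n))   # 3840
print(op_count(11, 4, n))  # 14080
print(op_count(7, 4, n))   # 8960
```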

### Documentation and website

• Website of the project –

• www.cse.iitd.ac.in/~cs1030186/btp

• Includes the details and results of our experiments (up to last week)

### Running on the Intel compiler (icc)

• No improvement

• Possible reasons –

• Tested on an Intel Pentium Mobile

• This processor does not support optimizations such as SSE3 instructions (-fast flag)

### FFTW code

• 56,489+ LOC (contains code written in OCaml and C)

• We decided to study why FFTW is so fast (before going into the code itself)

• Text we came across in this context –

• Design and implementation of FFTW3 (Matteo Frigo and Steven G. Johnson)

• Documentation of FFTW

### Why is FFTW fast?

• The transform is computed by an executor, composed of highly optimized, composable blocks of C code called codelets

• At runtime, a ‘planner’ finds an efficient way to compose codelets: it measures the speed of different plans and chooses the best using a dynamic programming algorithm

• The executor interprets the plan with negligible overhead

• Codelets are generated automatically and are fast
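The planner and executor idea can be sketched in miniature. The following toy Python example is not FFTW's planner or API. It assumes power-of-two sizes and only two candidate radices, and the names `execute` and `make_plan` are hypothetical. It builds a plan bottom-up, reusing the best choice already found for smaller sizes (the dynamic programming step), and picks each radix by measuring run time.

```python
import cmath
import time

def execute(x, plan):
    """Interpret `plan` (a dict mapping size -> radix) to transform the list x."""
    n = len(x)
    if n == 1:
        return list(x)
    radix = plan[n]
    q = n // radix
    # Recurse on the interleaved subsequences x[radix*m + r].
    subs = [execute(x[r::radix], plan) for r in range(radix)]
    out = [0j] * n
    for k in range(q):
        # Apply twiddle factors W_n^(r*k) to the r-th sub-transform.
        t = [cmath.exp(-2j * cmath.pi * r * k / n) * subs[r][k]
             for r in range(radix)]
        # A small DFT of length `radix` combines the twiddled values.
        for m in range(radix):
            out[k + m * q] = sum(
                t[r] * cmath.exp(-2j * cmath.pi * r * m / radix)
                for r in range(radix)
            )
    return out

def make_plan(n, radices=(2, 4), repeats=3):
    """Bottom-up planner: for each power of two up to n, time each radix
    (sub-sizes use the choices already stored) and keep the fastest.
    Assumes 2 is among the candidate radices."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    plan = {}
    size = 2
    while size <= n:
        x = [complex(i, -i) for i in range(size)]
        best_radix, best_time = None, float("inf")
        for radix in radices:
            if size % radix:
                continue  # this radix does not divide the current size
            trial = dict(plan)
            trial[size] = radix
            elapsed = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                execute(x, trial)
                elapsed = min(elapsed, time.perf_counter() - start)
            if elapsed < best_time:
                best_radix, best_time = radix, elapsed
        plan[size] = best_radix
        size *= 2
    return plan
```

A plan is built once with `plan = make_plan(1024)` and then reused for every transform of that size via `execute(x, plan)`, which mirrors the plan-once, execute-many usage pattern that makes measured planning worthwhile.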

### Contd…

• The executor implements the recursive divide and conquer Cooley Tukey FFT algorithm

• Basically, it adapts to hardware in order to maximize performance

• ‘Performance has little to do with the number of operations. Fast code must exploit instruction level parallelism of the processor. It is important to write the code in such a way that C compiler can schedule it efficiently’

### Contd…

• It uses some tricky optimizations like –

• It also exploits SIMD instructions

### Further plan?

• Since FFTW supports MPI and adapts itself to the given hardware architecture, we may use it as is.

### References

• www.fftw.org

• The Design and Implementation of FFTW3 (Matteo Frigo and Steven G. Johnson)

• The Fastest Fourier Transform in the West (Matteo Frigo and Steven G. Johnson)