FFT: Accelerator Project

Rohit Prakash

Anand Silodia

Work done till now

  • Studied various FFT algorithms

  • Implemented radix-4, recursive and iterative algorithms

  • Optimized these

  • Compared the results with FFTW


  • Result: FFTW fares better than our implementation
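As an illustration of the recursive radix-4 variant listed above, here is a minimal Python sketch. It is not the project's code, which the slides do not show. The names `dft` and `fft_radix4` are hypothetical, and the input length is assumed to be a power of 4.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT, used only as a correctness reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT (length must be a power of 4)."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 4 != 0:
        raise ValueError("length must be a power of 4")
    q = n // 4
    # Transform the four interleaved subsequences x[r], x[r+4], x[r+8], ...
    subs = [fft_radix4(x[r::4]) for r in range(4)]
    out = [0j] * n
    for k in range(q):
        w1 = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_n^k
        a = subs[0][k]
        b = w1 * subs[1][k]
        c = (w1 * w1) * subs[2][k]
        d = (w1 * w1 * w1) * subs[3][k]
        # 4-point butterfly: the only factors needed are 1, -1, j and -j
        out[k] = a + b + c + d
        out[k + q] = a - 1j * b - c + 1j * d
        out[k + 2 * q] = a - b + c - d
        out[k + 3 * q] = a + 1j * b - c - 1j * d
    return out


# Usage: compare against the direct DFT on a small input.
signal = [complex(j % 5, -(j % 3)) for j in range(16)]
max_error = max(abs(p - q) for p, q in zip(fft_radix4(signal), dft(signal)))
print("max deviation from direct DFT:", max_error)
```

Each butterfly uses three twiddle multiplications, which is where the (3/4) * n * log4(n) multiplication count for radix-4 comes from.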

Current Objectives

  • Validate the number of complex operations in our implementation against the theoretical count

  • Document the work done so far

  • Build a website for the project

  • Study the FFTW code and identify the reasons for its efficiency

  • Compile and run the code with the Intel compiler (icc) and Visual C++

Validating the computations

  • The theoretical formula given on cnx.org is incorrect

  • Theoretical formula for the number of complex computations:

    (11/4) * n * log4(n) = 8960 (correct)

    (3/4) * n * log4(n) = 3840 (incorrect)

    Actual count: 8960
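The figures above can be checked by direct arithmetic. The slide does not state the transform size n. The quoted 3840 equals (3/4) * n * log4(n) only for n = 1024, so the sketch below assumes that size (an inference, not something the slides say). The helper names `log4` and `op_count` are hypothetical.

```python
from fractions import Fraction


def log4(n):
    """Exact base-4 logarithm; n must be a power of 4."""
    stages = 0
    while n > 1:
        if n % 4 != 0:
            raise ValueError("n must be a power of 4")
        n //= 4
        stages += 1
    return stages


def op_count(coeff, n):
    """Evaluate coeff * n * log4(n) exactly."""
    return coeff * n * log4(n)


n = 1024  # assumed size, inferred from the quoted value 3840
print(op_count(Fraction(3, 4), n))    # value of the (3/4) formula
print(op_count(Fraction(11, 4), n))   # value of the (11/4) formula
print(Fraction(8960, n * log4(n)))    # coefficient implied by the count 8960
```

Under that assumption, the count 8960 corresponds to a coefficient of 7/4, while 11/4 would give 14080. Either the coefficient or the transform size on the slide may therefore differ from what is assumed here.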

Documentation and website

  • Project website:

    • www.cse.iitd.ac.in/~cs1030186/btp

  • Includes the details and results of our experiments (up to last week)

Running on the Intel compiler (icc)

  • No improvement

  • Possible reasons:

    • Tested on an Intel Pentium Mobile

    • This processor does not support optimizations that exploit SSE3 instructions (-fast flag)

FFTW code

  • More than 56,489 lines of code, written in OCaml and C

  • We decided to study why FFTW is so fast before going into the code itself

  • Texts we came across in this context:

    • The Design and Implementation of FFTW3 (Matteo Frigo and Steven G. Johnson)

    • Documentation of FFTW

Why is FFTW fast?

  • The transform is computed by an executor, composed of highly optimized, composable blocks of C code called codelets

    • At runtime, a ‘planner’ finds an efficient way to compose codelets: it measures the speed of different plans and chooses the best using a dynamic programming algorithm

    • The executor interprets the plan with negligible overhead

    • Codelets are generated automatically and are fast


  • The executor implements the recursive divide-and-conquer Cooley-Tukey FFT algorithm

  • In short, FFTW adapts to the hardware in order to maximize performance

  • ‘Performance has little to do with the number of operations. Fast code must exploit instruction level parallelism of the processor. It is important to write the code in such a way that the C compiler can schedule it efficiently’


  • It uses some tricky optimizations like –

  • It also exploits SIMD instructions
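The planner idea described above (measure candidate plans, keep the fastest, reuse the best sub-plan for smaller sizes) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FFTW's API or implementation. The names `execute`, `measure` and `make_plan` are hypothetical, a plan is simply a tuple of radices, and a generic butterfly loop stands in for FFTW's generated codelets.

```python
import cmath
import time


def execute(plan, x):
    """Run a Cooley-Tukey FFT; plan[i] is the radix used at recursion level i."""
    n = len(x)
    if not plan:
        if n != 1:
            raise ValueError("plan does not cover the input length")
        return list(x)
    r = plan[0]
    if n % r != 0:
        raise ValueError("radix does not divide the input length")
    q = n // r
    # Transform the r interleaved subsequences with the rest of the plan.
    subs = [execute(plan[1:], x[s::r]) for s in range(r)]
    out = [0j] * n
    for k in range(q):
        for m in range(r):
            idx = k + m * q
            # Generic r-point butterfly with twiddle factors folded in.
            out[idx] = sum(
                cmath.exp(-2j * cmath.pi * s * idx / n) * subs[s][k]
                for s in range(r)
            )
    return out


def measure(plan, n, repeats=3):
    """Best-of-`repeats` wall-clock time of one plan on a fixed test input."""
    x = [complex(j % 7, -(j % 3)) for j in range(n)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        execute(plan, x)
        best = min(best, time.perf_counter() - start)
    return best


def make_plan(n, radices=(2, 4, 8), memo=None):
    """Dynamic programming: best plan for n = some radix r + best plan for n // r."""
    if memo is None:
        memo = {1: ()}
    if n in memo:
        return memo[n]
    candidates = []
    for r in radices:
        if n % r == 0:
            sub = make_plan(n // r, radices, memo)
            if sub is not None:
                candidates.append((r,) + sub)
    # Keep the fastest measured candidate; None if n cannot be factored.
    memo[n] = min(candidates, key=lambda p: measure(p, n)) if candidates else None
    return memo[n]


# Usage: the chosen plan depends on the machine, so no fixed output is shown.
plan = make_plan(64)
print("chosen radices for n = 64:", plan)
```

The memo table is what makes this dynamic programming: each size is planned once, and larger sizes reuse the stored result, on the assumption that a good plan for n // r is also good when used inside a plan for n.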

Further plan?

  • Since FFTW supports MPI and adapts itself to the given hardware architecture, we may use it as is.


References

  • www.fftw.org

  • The Design and Implementation of FFTW3 (Matteo Frigo and Steven G. Johnson)

  • The Fastest Fourier Transform in the West (Matteo Frigo and Steven G. Johnson)