EADS: Accelerator Project

Presentation Transcript


  1. EADS: Accelerator Project • Rohit Prakash (2003CS10186) • Anand Silodia (2003CS50210)

  2. Speed up scientific application • Application • Candidate Partition • Performance Prediction • Choose next partition

  3. Timelines (tentative) • 28th January: Figure out the best FFT algorithm; compare the algorithms on the following parameters: - Execution time - No. of multiplications - No. of additions • 19th February: Study hardware implementation of FFT • ....

  4. Terminology • Radix: the "radix" is the size of an FFT decomposition • Twiddle factors: "twiddle factors" are the coefficients used to combine results from a previous stage to form inputs to the next stage
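
As a small illustration of the two terms, the sketch below lists the twiddle factors that one radix-4 stage of size m would use. The function name `twiddle_factors` is hypothetical, and the sign convention ω = e^(2πi/m) is assumed from the pseudocode on slide 8; the slides do not show how the authors generated these values.

```python
import cmath


def twiddle_factors(m):
    """Return w^j for j = 0 .. m/4 - 1, where w = exp(2*pi*i/m).

    Assumption: positive exponent, as in the pseudocode on slide 8.
    """
    return [cmath.exp(2j * cmath.pi * j / m) for j in range(m // 4)]


# For a radix-4 stage of size m = 16 there are 16/4 = 4 base factors.
for j, w in enumerate(twiddle_factors(16)):
    print(j, w)
```

Here the radix is 4 (each stage combines four sub-transforms), and the printed values are the coefficients applied before the four results are combined.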

  5. First Implementation • Implemented a recursive radix-4 FFT • Analysed it using gprof • Looked into other FFT implementations: iterative, parallel, split-radix
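
The slides do not include the recursive code, so the following is only a minimal sketch of what a recursive radix-4 decimation-in-time FFT can look like. The function name `fft_radix4_recursive` is hypothetical, and the positive-exponent convention is assumed from slide 8. It deliberately recomputes each twiddle factor and allocates new lists at every level, which is the kind of overhead a recursive version tends to carry.

```python
import cmath


def fft_radix4_recursive(a):
    """Recursive radix-4 FFT sketch; len(a) must be a power of 4.

    Assumption: w = exp(+2*pi*i/n), matching the slides' pseudocode.
    """
    n = len(a)
    if n == 1:
        return list(a)
    # Transform the four interleaved subsequences a[r], a[r+4], a[r+8], ...
    subs = [fft_radix4_recursive(a[r::4]) for r in range(4)]
    q = n // 4
    out = [0j] * n
    for j in range(q):
        w = cmath.exp(2j * cmath.pi * j / n)  # recomputed on every call
        t = subs[0][j]
        u = w * subs[1][j]
        v = w * w * subs[2][j]
        x = w * w * w * subs[3][j]
        # Radix-4 butterfly: combine the four partial results.
        out[j] = t + u + v + x
        out[j + q] = t + 1j * u - v - 1j * x
        out[j + 2 * q] = t - u + v - x
        out[j + 3 * q] = t - 1j * u - v + 1j * x
    return out


print(fft_radix4_recursive([1, 0, 0, 0]))
```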

  6. Analysis of the implementation • Considered FFT of 1024 random points (double) • Results from gprof: • No. of complex multiplications: 21760 • No. of complex additions: 7680 • (Each complex multiplication consists of 4 real multiplications and 2 real additions) • (Each complex addition/subtraction consists of 2 real additions/subtractions)
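
The two parenthetical rules above are enough to turn the complex operation counts into real ones. The helper below is a worked-arithmetic sketch; the name `real_operation_counts` is hypothetical and is not part of the authors' code.

```python
def real_operation_counts(complex_mults, complex_adds):
    """Convert complex operation counts to real ones using the slide's costs.

    One complex multiplication = 4 real multiplications + 2 real additions.
    One complex addition/subtraction = 2 real additions/subtractions.
    """
    real_mults = 4 * complex_mults
    real_adds = 2 * complex_mults + 2 * complex_adds
    return real_mults, real_adds


# Counts reported for the recursive implementation (slide 6).
print(real_operation_counts(21760, 7680))  # (87040, 58880)
# Counts reported for the iterative implementation (slide 9).
print(real_operation_counts(14080, 7680))  # (56320, 43520)
```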

  7. Problems with this implementation • Inefficient use of memory (recursive procedure) • Wasted computation (some factors are computed multiple times) • Most of the time is spent computing twiddle factors (complex number multiplications)

  8. 2nd Implementation • Radix-4 iterative in-place implementation

     iterativeFFT(a)
       BitReversal(a, A)
       n ← length(a)
       for (s ← 1 to log4(n))        // logarithm is of base 4
       {
         m ← 4^s
         ω ← e^(2πi/m)
         for (k ← 0 to n-1 by m)
         {
           τ ← 1
           for (j ← 0 to m/4 - 1)
           {
             t ← A[k+j]
             u ← τ A[k+j+m/4]
             v ← τ^2 A[k+j+2*m/4]
             x ← τ^3 A[k+j+3*m/4]
             A[k+j]       ← t + u + v + x
             A[k+j+m/4]   ← t + (i)u - v - (i)x
             A[k+j+2*m/4] ← t - u + v - x
             A[k+j+3*m/4] ← t - (i)u - v + (i)x
             τ ← τ * ω
           }
         }
       }
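
The pseudocode above can be sketched in runnable form as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the reordering step uses base-4 digit reversal (the slide only names a `BitReversal` routine without showing it), and the butterflies update a reordered copy `A` rather than the caller's list.

```python
import cmath


def digit_reverse_base4(index, num_digits):
    """Reverse the base-4 digits of index (num_digits digits wide)."""
    result = 0
    for _ in range(num_digits):
        result = (result << 2) | (index & 3)
        index >>= 2
    return result


def iterative_fft_radix4(a):
    """Iterative radix-4 FFT sketch; len(a) must be a power of 4."""
    n = len(a)
    stages = 0
    while (1 << (2 * stages)) < n:
        stages += 1
    if (1 << (2 * stages)) != n:
        raise ValueError("length must be a power of 4")
    # Reordering step: base-4 digit reversal (assumed for this sketch).
    A = [a[digit_reverse_base4(i, stages)] for i in range(n)]
    for s in range(1, stages + 1):
        m = 4 ** s
        q = m // 4
        omega = cmath.exp(2j * cmath.pi / m)
        for k in range(0, n, m):
            tau = 1 + 0j
            for j in range(q):  # j = 0 .. m/4 - 1
                t = A[k + j]
                u = tau * A[k + j + q]
                v = tau * tau * A[k + j + 2 * q]
                x = tau * tau * tau * A[k + j + 3 * q]
                A[k + j] = t + u + v + x
                A[k + j + q] = t + 1j * u - v - 1j * x
                A[k + j + 2 * q] = t - u + v - x
                A[k + j + 3 * q] = t - 1j * u - v + 1j * x
                tau *= omega  # twiddle factor updated by multiplication
    return A


print(iterative_fft_radix4([1, 0, 0, 0]))
```

Updating `tau` by repeated multiplication mirrors the pseudocode; it costs one extra complex multiplication per butterfly and accumulates rounding error for large n.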

  9. Analysis of this implementation • Considered FFT of 1024 random points (double) • Results from gprof: • No. of complex multiplications: 14080 • No. of complex additions/subtractions: 7680 • (Each complex multiplication consists of 4 real multiplications and 2 real additions) • (Each complex addition/subtraction consists of 2 real additions/subtractions)

  10. Improvements • Precompute twiddle factors • Trade multiplications for additions • (a complex multiplication can be done with 3 real multiplies and 5 real adds rather than the usual 4 real multiplies and 2 real adds) • Use compiler flags (10%-15% lower execution time on some systems) • -O3 • -march=pentiumpro • -ffast-math • -fomit-frame-pointer
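
One common way to reach the 3-multiply, 5-add count for a complex product is shown below. The slides do not say which variant the authors used, so this is an assumed formulation, and the name `complex_multiply_3m` is hypothetical.

```python
def complex_multiply_3m(a, b, c, d):
    """Compute (a + bi)(c + di) with 3 real multiplications and 5 real additions.

    Returns (real part, imaginary part).
    """
    k1 = c * (a + b)  # 1 addition, 1 multiplication
    k2 = a * (d - c)  # 1 subtraction, 1 multiplication
    k3 = b * (c + d)  # 1 addition, 1 multiplication
    # k1 - k3 = ac - bd, k1 + k2 = ad + bc
    return k1 - k3, k1 + k2


print(complex_multiply_3m(1.0, 2.0, 3.0, 4.0))  # (-5.0, 10.0)
```

When one operand is a fixed twiddle factor, the terms `d - c` and `c + d` can be stored ahead of time, leaving 3 multiplications and 3 additions per product at run time.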

  11. Some results • Precomputing twiddle factors: • No. of complex multiplications: 8960 • 5120 fewer complex multiplications • Trading multiplications for additions: • Did not show any appreciable decline in execution time • Using compiler flags: • Drastic improvement in execution time
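
A minimal sketch of how twiddle factors might be precomputed once and then looked up in each stage is given below. The table layout and the names `precompute_twiddles` and `stage_twiddles` are assumptions for illustration; the slides do not describe how the authors stored the factors.

```python
import cmath


def precompute_twiddles(n):
    """Build a table of exp(2*pi*i*k/n) for k = 0 .. n - 1 (computed once)."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]


def stage_twiddles(table, m, j):
    """Look up (tau, tau^2, tau^3) for butterfly j of a stage of size m.

    tau = exp(2*pi*i*j/m) sits at index j * (n/m) of the table, so its
    powers are plain lookups instead of complex multiplications.
    """
    step = len(table) // m
    return table[j * step], table[2 * j * step], table[3 * j * step]


table = precompute_twiddles(1024)
tau, tau2, tau3 = stage_twiddles(table, 16, 3)
print(tau, tau2, tau3)
```

With such a table, the per-butterfly products `τ * ω`, `τ^2` and `τ^3` from the iterative pseudocode are no longer computed inside the loops.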

  12. Comparative Analysis

  13. Further enhancements possible • Use a higher radix: 8, 16, 32, etc. • Use split-radix or Winograd algorithms • If the input data is real, roughly half the work of a complex FFT can be saved • Use the Fast Bit-Reversal method (IEEE D.M.W. Evans)

  14. Resources • Rivest, Cormen • Numerical Recipes in C • IEEE papers • Conversion of Digit-Reversed to Bit-Reversed order in FFT algorithms (Panos E. and C.S. Burrus) • The Design and Implementation of FFTW3 (Matteo Frigo and Steven G. Johnson) • cnx.org • Other fft implementations on the net • Best: fftw

  15. Thank You
