CS 584

natala

Presentation Transcript


  1. CS 584

  2. Fast Fourier Transform • Used in many scientific applications • Transforms a periodic signal into the frequency spectrum of the signal

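  3. FFT • Given a sequence <X[0], X[1], …, X[n-1]> • Transform into <Y[0], Y[1], …, Y[n-1]> • Where $Y[i] = \sum_{k=0}^{n-1} X[k]\,\omega^{ki}$ for $0 \le i < n$, and $\omega$ is a primitive complex n-th root of unity • Evaluating this sum directly takes O(n^2) operations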
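As a point of reference for the O(n^2) cost, the transform can be evaluated directly with two nested loops. The sketch below is a minimal illustration, not code from the slides: the function name `naive_dft` is made up, and the convention ω = exp(+2π√-1/n) is an assumption, since the transcript does not show which sign the course uses.

```python
import cmath


def naive_dft(x):
    """Evaluate Y[i] = sum_k X[k] * omega**(k*i) directly, in O(n^2) operations."""
    n = len(x)
    y = []
    for i in range(n):
        total = 0j
        for k in range(n):
            # Assumed convention: omega = exp(+2*pi*sqrt(-1)/n).
            # omega**n == 1, so the exponent k*i can be reduced mod n.
            total += x[k] * cmath.exp(2j * cmath.pi * ((k * i) % n) / n)
        y.append(total)
    return y
```

With this convention, a unit impulse at position 1 of a 4-point input transforms to the successive powers of ω: 1, √-1, -1, -√-1.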

  4. FFT • In 1965 Cooley and Tukey showed that the FFT equation could be evaluated in O(n log n) operations, resulting in: $Y[i] = \sum_{k=0}^{(n/2)-1} X[2k]\,\tilde{\omega}^{ki} + \omega^{i} \sum_{k=0}^{(n/2)-1} X[2k+1]\,\tilde{\omega}^{ki}$, where $\tilde{\omega} = \omega^{2}$
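The step from the direct sum to the split form can be written out as follows. This is a sketch that assumes the transform is defined as Y[i] = Σ X[k] ω^{ki} with ω a primitive n-th root of unity and n even; that definition is an assumption here, because the defining equation is not legible in the transcript.

```latex
\begin{align*}
Y[i] &= \sum_{k=0}^{n-1} X[k]\,\omega^{ki} \\
     &= \sum_{k=0}^{(n/2)-1} X[2k]\,\omega^{2ki}
      + \sum_{k=0}^{(n/2)-1} X[2k+1]\,\omega^{(2k+1)i} \\
     &= \sum_{k=0}^{(n/2)-1} X[2k]\,\tilde{\omega}^{ki}
      + \omega^{i} \sum_{k=0}^{(n/2)-1} X[2k+1]\,\tilde{\omega}^{ki},
      \qquad \tilde{\omega} = \omega^{2}.
\end{align*}
```

Each of the two sums is an (n/2)-point transform with root ω², and each is periodic in i with period n/2, so it only has to be computed for n/2 values of i. The cost therefore satisfies T(n) = 2T(n/2) + O(n), which gives O(n log n).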

  5. FFT
     Procedure RecursiveFFT(X, Y, n, ω)
       if (n == 1)
         Y[0] = X[0]
       else
         RecursiveFFT(<X[0], X[2], … X[n-2]>, <Q[0], Q[1], … Q[n/2-1]>, n/2, ω^2);
         RecursiveFFT(<X[1], X[3], … X[n-1]>, <T[0], T[1], … T[n/2-1]>, n/2, ω^2);
         for i = 0 to n-1
           Y[i] = Q[i mod (n/2)] + ω^i * T[i mod (n/2)];
     end
     • Optimization Opportunity
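The recursive procedure translates almost line for line into runnable code. The sketch below is one possible Python rendering under stated assumptions: the name `recursive_fft` is made up, the input length is taken to be a power of two, and the root ω = exp(+2π√-1/n) is an assumed convention, since the slides pass ω in as a parameter without fixing its sign.

```python
import cmath


def recursive_fft(x, omega=None):
    """Return the transform of x, following the RecursiveFFT pseudocode."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if omega is None:
        # Assumed convention for the top-level call.
        omega = cmath.exp(2j * cmath.pi / n)
    if n == 1:
        return [x[0]]
    # Transform the even- and odd-indexed halves with root omega**2.
    q = recursive_fft(x[0::2], omega * omega)
    t = recursive_fft(x[1::2], omega * omega)
    half = n // 2
    # Combine: Q and T are periodic with period n/2.
    return [q[i % half] + omega ** i * t[i % half] for i in range(n)]
```

Squaring ω at every level accumulates a small rounding error, which is acceptable for a sketch; production code would normally look the roots up in a table instead.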

  6. FFT • Does something look familiar?

  7. Parallelization of FFT • Parallelize by looking at the data patterns • Two algorithms • Binary Exchange • Matrix Transpose

  8. Binary Exchange FFT

  9. Binary Exchange FFT • Data exchange takes place between all pairs of processors whose labels differ in exactly one bit • One element per processor: easy • Multiple elements per processor: assign contiguous blocks to processors; same algorithm, just exchange blocks
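The one-element-per-processor case can be simulated sequentially to see the exchange pattern. The sketch below is an illustration under assumptions, not the course's code: it uses an iterative formulation that works from the most significant bit down and leaves the result in bit-reversed order, the names `bit_reverse` and `binary_exchange_fft` are made up, and ω = exp(+2π√-1/n) is an assumed convention. Each list position stands for one processor, and in step m processor p swaps its value with processor p XOR 2^(r-1-m).

```python
import cmath


def bit_reverse(value, width):
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def binary_exchange_fft(x):
    """Simulate an n-processor binary-exchange FFT, one element per processor."""
    n = len(x)
    r = n.bit_length() - 1
    if n == 0 or n != 1 << r:
        raise ValueError("length must be a power of two")
    held = list(x)  # held[p] is the value stored on processor p
    for m in range(r):
        shift = r - 1 - m
        mask = 1 << shift
        # Pairwise exchange between processors whose labels differ in one bit.
        received = [held[p ^ mask] for p in range(n)]
        updated = []
        for p in range(n):
            if p & mask:
                lower, upper = received[p], held[p]
            else:
                lower, upper = held[p], received[p]
            # Power of omega needed by processor p in this step.
            exponent = bit_reverse(p >> shift, m + 1) << shift
            twiddle = cmath.exp(2j * cmath.pi * exponent / n)
            updated.append(lower + twiddle * upper)
        held = updated
    # Processor p ends up holding Y[bit_reverse(p)]; undo the permutation.
    return [held[bit_reverse(q, r)] for q in range(n)]
```

With multiple elements per processor, the same loop applies to contiguous blocks: the steps that toggle a bit inside the block index need no communication, and the remaining steps exchange whole blocks.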

  10. Binary Exchange FFT

  11. Binary Exchange FFT • As n increases, so does communication: big bandwidth requirement • Powers of ω cannot be precalculated: ω^i is used at different times on different processors; duplicated computation
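To make the point about powers of ω concrete, the sketch below tabulates which exponent each processor needs in each step for the one-element-per-processor case. It assumes the same most-significant-bit-first iterative formulation as a textbook binary-exchange FFT; the helper names `bit_reverse` and `twiddle_schedule` are made up for this illustration.

```python
def bit_reverse(value, width):
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def twiddle_schedule(r):
    """Return schedule[m][p]: exponent of omega used by processor p in step m."""
    n = 1 << r
    schedule = []
    for m in range(r):
        shift = r - 1 - m
        schedule.append(
            [bit_reverse(p >> shift, m + 1) << shift for p in range(n)]
        )
    return schedule


if __name__ == "__main__":
    # One row per step, one column per processor, for n = 8.
    for m, row in enumerate(twiddle_schedule(3)):
        print(f"step {m}: {row}")
```

For n = 8 the exponent 4 is needed by processors 4 to 7 in step 0, by processors 2 and 3 in step 1, and by processor 1 in step 2, so the same power is computed in several places at different times.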

  12. The Transpose FFT • Assume that sqrt(n) is a power of 2 • The data is arranged in a sqrt(n) x sqrt(n) two-dimensional square array

  13. The Transpose FFT

  14. Parallelization of Transpose FFT • Notice that the first two iterations are columnwise and the last two iterations are rowwise • Rather than do an exchange, transpose the matrix halfway through the algorithm
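The idea can be checked with a small sequential simulation. The sketch below is an illustration under assumptions rather than the course's implementation: each of the sqrt(n) simulated processors owns one column of the row-major sqrt(n) x sqrt(n) array, the butterfly steps follow a most-significant-bit-first iterative FFT that leaves results in bit-reversed order, ω = exp(+2π√-1/n) is an assumed convention, and the names `local_butterflies` and `transpose_fft` are made up.

```python
import cmath


def bit_reverse(value, width):
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def local_butterflies(vec, global_idx, local_mask, m, r):
    """Apply step m to one locally stored vector; no communication needed."""
    n = 1 << r
    shift = r - 1 - m
    out = []
    for j, i in enumerate(global_idx):
        lower = vec[j & ~local_mask]
        upper = vec[j | local_mask]
        exponent = bit_reverse(i >> shift, m + 1) << shift
        out.append(lower + cmath.exp(2j * cmath.pi * exponent / n) * upper)
    return out


def transpose_fft(x):
    """Simulate the transpose FFT on a sqrt(n) x sqrt(n) array."""
    n = len(x)
    r = n.bit_length() - 1
    if n == 0 or n != 1 << r or r % 2:
        raise ValueError("sqrt(n) must be a power of two")
    h = r // 2
    s = 1 << h
    # cols[c][row] holds element (row, c); column c lives on processor c.
    cols = [[x[row * s + c] for row in range(s)] for c in range(s)]
    # First half: partners differ in a row bit, so they share a column.
    for m in range(h):
        mask = 1 << (h - 1 - m)
        cols = [local_butterflies(cols[c], [row * s + c for row in range(s)],
                                  mask, m, r) for c in range(s)]
    # Transpose halfway through (all-to-all communication on a real machine).
    rows = [[cols[c][row] for c in range(s)] for row in range(s)]
    # Second half: partners differ in a column bit, so they share a row.
    for m in range(h, r):
        mask = 1 << (r - 1 - m)
        rows = [local_butterflies(rows[row], [row * s + c for c in range(s)],
                                  mask, m, r) for row in range(s)]
    flat = [rows[i // s][i % s] for i in range(n)]
    return [flat[bit_reverse(q, r)] for q in range(n)]
```

For n = 16 this gives two local steps on columns, one transpose, and two local steps on rows, which matches the "first two, last two" pattern noted on the slide.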

  15. The Transpose FFT

  16. The Transpose FFT • Transposition of a striped-partitioned array requires all-to-all communication • Is it less expensive to just continue with the exchanges, or to do the transpose?

  17. Which is better? • It Depends • Architecture and amount of data play together to create tradeoffs. • Transpose algorithm is easy to generalize to higher dimensions

  18. Which is better?
