Using Vector Capabilities of GPUs to Accelerate FFT

Presentation Transcript


  1. Using Vector Capabilities of GPUs to Accelerate FFT
     Vasily Volkov and Brian Kazian, CS 258, Spring 2008

  2. Sun Niagara II Specs
     • 8 SPARC cores @ 1.4 GHz (up to 8 threads each)
     • 16 KB instruction / 8 KB data caches per core
     • 4 MB shared L2 cache
     • One FPU per core
     • Four dual-channel FBDIMM memory controllers
     • Theoretical peak of 11 Gflop/s across the 8 FPUs
     • Extremely large memory bandwidth (60 GB/s)
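
The 11 Gflop/s peak follows from the specs above if each FPU is assumed to complete one double-precision flop per cycle (an assumption; the slide does not state the issue rate). The same numbers give the machine balance, which is why 60 GB/s counts as large relative to compute:

```latex
\begin{aligned}
P_{\text{peak}} &= 8\ \text{FPUs} \times 1.4\times10^{9}\ \tfrac{\text{cycles}}{\text{s}} \times 1\ \tfrac{\text{flop}}{\text{cycle}} = 11.2\ \text{Gflop/s} \\
\frac{B}{P_{\text{peak}}} &= \frac{60\ \text{GB/s}}{11.2\ \text{Gflop/s}} \approx 5.4\ \text{bytes per flop}
\end{aligned}
```

In other words, under that assumption the memory system can deliver roughly five bytes for every flop the FPUs can execute at peak.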

  3. FFT on Niagara
     • Installed and benchmarked the FFTW library
     • Very similar in usage to CUFFT (create a plan, then execute it)
     • Offers competitive performance on a variety of platforms
     • Compiled on Niagara II with pthreads enabled
     • Uses double precision, as opposed to single precision on the G80
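
Double precision also halves how many points fit in cache. As a rough illustration, assuming an in-place transform whose only data is the complex array itself (no twiddle table or scratch space), and taking the 4 MB shared L2 as 2^22 bytes, a double-complex point (two 8-byte doubles) versus a single-complex point (two 4-byte floats) gives:

```latex
\begin{aligned}
N_{\max}^{\text{double}} &= \frac{2^{22}\ \text{B}}{16\ \text{B/point}} = 2^{18} = 262{,}144\ \text{points} \\
N_{\max}^{\text{single}} &= \frac{2^{22}\ \text{B}}{8\ \text{B/point}} = 2^{19} = 524{,}288\ \text{points}
\end{aligned}
```

Under these assumptions, double-precision transforms larger than about 2^18 points no longer fit in the Niagara II's L2.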

  4. Single FFT Comparison

  5. FFTW with Built-in Threading

  6. Batched FFTW

  7. Hybrid FFTW

  8. Results
     • The hybrid approach gave the best results
     • Thread count must be tuned to the problem size
     • Limited by the number of threads compared with CUDA
     • Issues with data alignment in cache
     • FFTW's out-of-the-box performance was not stellar
