# The FFT on a GPU

The FFT on a GPU. Graphics Hardware 2003, July 27, 2003. Kenneth Moreland (Sandia National Labs) and Edward Angel (U. of New Mexico).

Presentation Transcript

### The FFT on a GPU

Graphics Hardware 2003

July 27, 2003

Kenneth Moreland, Sandia National Labs

Edward Angel, U. of New Mexico

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Overview
• Introduction
• Motivation, FFT review.
• FFT Techniques
• Exploitable FFT properties.
• Implementation
• Results
• Performance, applications, conclusions.

Motivation
• The Fourier transform is a principal tool for digital image processing.
• Filtering.
• Correction.
• Compression.
• Classification.
• Generation.
• As such, should not our graphics hardware support such a tool?

The Discrete Fourier Transform
• Converts data in the spatial or temporal domain into the frequencies that the data comprise.

DFT

IDFT
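
The DFT and IDFT equations on this slide were images and did not survive in the transcript. As a stand-in, here is a minimal sketch of the standard definitions in plain Python. The placement of the 1/N scale factor on the inverse is an assumption and may differ from the convention used in the slides; the function names `dft` and `idft` are illustrative only.

```python
import cmath

def dft(f):
    """Naive O(N^2) forward DFT: F(u) = sum_x f(x) * exp(-2*pi*j*u*x/N)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]

def idft(F):
    """Naive inverse DFT; the 1/N scale is applied here (assumed convention)."""
    n = len(F)
    return [sum(F[u] * cmath.exp(2j * cmath.pi * u * x / n) for u in range(n)) / n
            for x in range(n)]

if __name__ == "__main__":
    spectrum = dft([1, 2, 3, 4])
    print(spectrum)        # frequency-domain values
    print(idft(spectrum))  # recovers the input up to rounding error
```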

The Discrete Fourier Transform
• 2D transform can be computed by applying the transform in one direction, then the other.
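
The row-then-column separability can be sketched as follows. This uses a naive 1D DFT for brevity (any 1D FFT could be substituted), and the names `dft_1d` and `dft_2d` are hypothetical.

```python
import cmath

def dft_1d(f):
    """Naive 1D DFT, used here only to keep the example self-contained."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]

def dft_2d(image):
    """2D DFT of a list of rows: transform every row, then every column."""
    row_pass = [dft_1d(row) for row in image]
    # zip(*...) transposes, so each "row" below is a column of row_pass.
    col_pass = [dft_1d(list(col)) for col in zip(*row_pass)]
    # Transpose back so the result is indexed [row frequency][column frequency].
    return [list(r) for r in zip(*col_pass)]

if __name__ == "__main__":
    print(dft_2d([[1, 2], [3, 4]]))
```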

The Fast Fourier Transform
• Divide and Conquer Algorithm
• Input sequence is divided into subsequences consisting of values from even and odd indices, respectively.
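
The even/odd split can be illustrated with a textbook recursive radix-2 FFT. This is a sketch of the general divide-and-conquer idea, not the GPU implementation described in the talk; it assumes the input length is a power of two, and `fft_recursive` is an illustrative name.

```python
import cmath

def fft_recursive(f):
    """Radix-2 decimation-in-time FFT; len(f) must be a power of two."""
    n = len(f)
    if n == 1:
        return [complex(f[0])]
    even = fft_recursive(f[0::2])  # values from even indices
    odd = fft_recursive(f[1::2])   # values from odd indices
    out = [0j] * n
    for u in range(n // 2):
        # Twiddle factor applied to the odd half before combining.
        t = cmath.exp(-2j * cmath.pi * u / n) * odd[u]
        out[u] = even[u] + t
        out[u + n // 2] = even[u] - t
    return out

if __name__ == "__main__":
    print(fft_recursive([1, 2, 3, 4]))
```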

Index Magic
• Do not use recursion.
• Use dynamic programming: iterate over entire array computing all values for each recursive depth together, like mergesort.
• Indexing is non-obvious.
• Unlike mergesort, recursive step does not divide array into contiguous chunks.
• At any iteration, what partition does a given index belong to, and where can one find the applicable values of the sub-partitions?

Index Magic
• Common solution: rearrange data by reversing the bits of indices.
• FFT can occur with contiguous partitions.
• Requires an extra data copy.
• Our solution: determine indexing in place.

Note that the paper has a typo.
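
For reference, the "common solution" mentioned above (bit-reversal reordering followed by passes over contiguous partitions) can be sketched like this. It is the standard iterative Cooley-Tukey scheme, not the authors' in-place indexing, whose exact formula is not given in this transcript. The names are illustrative and the input length is assumed to be a power of two.

```python
import cmath

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft_bit_reversal(f):
    """Iterative radix-2 FFT: reorder by bit-reversed index, then combine."""
    n = len(f)
    bits = n.bit_length() - 1
    # The extra data copy: element i comes from the bit-reversed position.
    a = [complex(f[bit_reverse(i, bits)]) for i in range(n)]
    size = 2
    while size <= n:  # one iteration per recursive depth
        half = size // 2
        for start in range(0, n, size):  # contiguous partitions
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)
                even = a[start + k]
                odd = w * a[start + k + half]
                a[start + k] = even + odd
                a[start + k + half] = even - odd
        size *= 2
    return a

if __name__ == "__main__":
    print(fft_bit_reversal([1, 2, 3, 4]))
```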

Fourier Symmetry of Real Sequences
• In general, even for real functions, the frequency spectra contain imaginary values.
• Captures magnitude and phase shift of sinusoids.
• Brute force FFT (treating real input as complex) doubles computation and storage costs.
• But Fourier transforms of real functions have conjugate symmetry: F(N – u) = F*(u).
• Values at u = 0 and u = N/2 are real (because they are conjugates of themselves).
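
The symmetry of real-input transforms is easy to check numerically. The sketch below uses a naive DFT and a hypothetical helper `is_conjugate_symmetric`; the sample values are arbitrary.

```python
import cmath

def dft(f):
    """Naive forward DFT, kept short so the example is self-contained."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]

def is_conjugate_symmetric(F, tol=1e-9):
    """True when F(N - u) equals the complex conjugate of F(u) for all u."""
    n = len(F)
    return all(abs(F[(n - u) % n] - F[u].conjugate()) < tol for u in range(n))

if __name__ == "__main__":
    F = dft([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
    print(is_conjugate_symmetric(F))  # real input: symmetry holds
    print(F[0].imag, F[4].imag)       # both are zero up to rounding error
```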

Fourier Transform of Real Functions
• Pick two functions, let them be f(x) and g(x).
• Let h(x) = f(x) + j g(x).
• Note that there is no loss of information.
• Can perform the FFT of h in half the time it takes to perform the brute force FFT of f and g individually.
• Simply point to one row of image as real components and another as imaginary components.

[Figure: two image rows, labeled f and g]

Untangling Fourier Transform Pairs
• Fourier transform is linear.
• H(u) = F(u) + j G(u)
• We can “untangle” using symmetry of F and G.
• Add and subtract H(u) and H(N – u) to cancel out conjugate terms of F and G.
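
Written out, the untangling step follows from H(u) = F(u) + j G(u) and H*(N – u) = F(u) – j G(u), so F(u) = (H(u) + H*(N – u)) / 2 and G(u) = (H(u) – H*(N – u)) / (2j). A minimal sketch, assuming the unscaled forward DFT convention and using a naive DFT in place of a real FFT; `transform_two_real` is an illustrative name:

```python
import cmath

def dft(h):
    """Naive forward DFT; any complex FFT could be substituted here."""
    n = len(h)
    return [sum(h[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]

def transform_two_real(f, g):
    """Transform two real sequences with a single complex transform."""
    n = len(f)
    # Tangle: f supplies the real components, g the imaginary components.
    H = dft([complex(a, b) for a, b in zip(f, g)])
    F, G = [], []
    for u in range(n):
        mirrored = H[(n - u) % n].conjugate()  # H*(N - u)
        F.append((H[u] + mirrored) / 2)        # the G terms cancel
        G.append((H[u] - mirrored) / 2j)       # the F terms cancel
    return F, G

if __name__ == "__main__":
    F, G = transform_two_real([1, 2, 3, 4], [0, 1, 0, -1])
    print(F)
    print(G)
```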

[Figure: packed array layout, with regions labeled "Real Values" and "Imaginary Values"]

Packing Transforms of Real Functions
• We can store Fourier transform in an array the same size as the input.
• Throw away conjugate duplicates.
• Throw away imaginary values known to be zero.
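
One possible packing is sketched below. The layout (real parts of F(0) through F(N/2) first, then imaginary parts of F(1) through F(N/2 – 1)) is an assumption chosen for illustration; the layout in the slide's figure is not recoverable from this transcript. The function names are hypothetical and N is assumed to be even.

```python
def pack_real_spectrum(F):
    """Pack a conjugate-symmetric spectrum of length N into N real numbers."""
    n = len(F)
    half = n // 2
    reals = [F[u].real for u in range(half + 1)]  # F(0) .. F(N/2)
    imags = [F[u].imag for u in range(1, half)]   # F(0), F(N/2) have none
    return reals + imags

def unpack_real_spectrum(packed):
    """Rebuild the full complex spectrum from the packed form."""
    n = len(packed)
    half = n // 2
    F = [0j] * n
    F[0] = complex(packed[0], 0.0)
    F[half] = complex(packed[half], 0.0)
    for u in range(1, half):
        value = complex(packed[u], packed[half + u])
        F[u] = value
        F[n - u] = value.conjugate()  # restore the discarded duplicate
    return F

if __name__ == "__main__":
    spectrum = [10 + 0j, -2 + 2j, -2 + 0j, -2 - 2j]  # DFT of [1, 2, 3, 4]
    packed = pack_real_spectrum(spectrum)
    print(packed)
    print(unpack_real_spectrum(packed))
```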

Column-wise FFT
• We have two columns with real values.
• Use same “tangled” approach.
• All other columns are complex numbers.
• Use regular FFT.

[Figure: column labels "Real", "Real" (paired), and "Complex"]

Packing 2D Transforms of Real Functions
• Rows transformed from complex values are already packed appropriately.
• The two rows transformed from real values are untangled and packed to follow suit.

[Figure: packed 2D layout, with regions labeled "Real Values" and "Imaginary Values"]

Available Resources
• nVidia GeForce FX 5800 Ultra.
• Full 32-bit floating point pipeline and frame buffers.
• Fully programmable vertex and fragment units.
• Cg
• High level language for vertex and fragment programs.
• Traditional CPU: 1.7 GHz Intel Xeon
• Freely available high performance FFT implementations.

Implementation
• Using a SIMD model for parallel computation.
• Draw quadrilateral parallel to screen.
• Rasterizer invokes the same fragment program “in parallel” over all pixels covered by quadrilateral.
• Inputs and outputs depend on the location of the pixel for which the fragment program is running.
• We require many rendering passes.
• Use “render to texture” extension.
• Use two frame buffers: one for retrieving values of last pass and one for storing results of current computation.
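
The multi-pass, two-buffer structure can be mimicked on a CPU. In the sketch below, each call to `fft_pass` stands in for one rendering pass: every output element is computed independently from the previous buffer, as a fragment program would compute one pixel. The index arithmetic is a Stockham-style autosort formulation chosen for illustration, not the indexing scheme from the paper. All names are hypothetical and the length is assumed to be a power of two.

```python
import cmath

def fft_pass(src, ns):
    """One pass: each output index gathers its two inputs from src."""
    n = len(src)
    dst = [0j] * n
    for i in range(n):  # stands in for one fragment per pixel
        k = i % ns                      # position within the sub-transform
        j = (i // (2 * ns)) * ns + k    # where the first input lives
        w = cmath.exp(-2j * cmath.pi * k / (2 * ns))
        a = src[j]
        b = w * src[j + n // 2]
        upper_half = (i // ns) % 2 == 1
        dst[i] = a - b if upper_half else a + b
    return dst

def fft_ping_pong(f):
    """Run log2(N) passes, alternating between two buffers."""
    n = len(f)
    buffers = [[complex(v) for v in f], [0j] * n]
    read = 0  # index of the buffer holding the last pass's results
    ns = 1
    while ns < n:
        buffers[1 - read] = fft_pass(buffers[read], ns)
        read = 1 - read  # swap roles for the next pass
        ns *= 2
    return buffers[read]

if __name__ == "__main__":
    print(fft_ping_pong([1, 2, 3, 4]))
```

Because each output element only reads from the source buffer, no pass ever reads a value it has just written, which is the property the two-frame-buffer arrangement provides.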

[Diagram: frame buffer contents across rendering passes. Recoverable labels: real and imaginary tangled buffers; "Scale" and "Pass" steps applied to the real and imaginary parts of F and G; real and imaginary untangled outputs.]

Implementation

[Diagram: pipeline between "Images" and "Frequency Spectra", built from alternating "FFT" and "Untangle" stages]

Fragment Programs
• Written in Cg, compiled for GeForce FX.

Applications
• Digital image filtering.

Applications
• Texture generation.
• Volume rendering.

Performance
• Computation speed: 2.5 GigaFLOPS
• Texture read rate: 3.4 GB/sec

Conclusions
• The Fourier transform on the GPU has many potential applications.
• A well-established FFT implementation on the CPU (FFTW) still has an edge over the GPU implementation.
• Both the software and the hardware of the GPU are first generation.
• Room for improvement.

Get the Cg Code