Introduction: A Fast Fourier Transform (FFT) is an efficient algorithm for computing the Discrete Fourier Transform (DFT) and its inverse. A DFT takes a sampled function in the time domain and converts it into a function in the frequency domain. MRI machines, such as the ones being developed by General Electric, use two-dimensional FFTs to map incoming data to high-resolution images. However, the hardware that processes this data is a bottleneck to faster and higher-resolution imaging. The IBM Cell processor is a new take on multi-core processors. Instead of combining two or three processors, the Cell incorporates eight synergistic processing units (SPUs) that are run by a power processing unit (PPU). FFTs lend themselves well to parallelism, so the Cell is a good fit for speeding up their computation and, in turn, for helping MRI machines work faster and better.
Senior Project – Computer Science – 2007
Mapping the FFT Algorithm to the IBM Cell Processor
Andrew Polidore
Advisors – Prof. Burns, Joe Czechowski
Data Movement: Because each SPU has limited local memory, the input data must be sent in parts. Each SPU is assigned a designated portion of the input, which it receives one piece at a time. A one-dimensional FFT is applied to each piece, and the result is moved back to main memory into a preallocated output area. Once all of the SPUs have finished their portions of the input, they are synchronized and repeat the same process on the output data they just produced. This second pass of FFTs completes the two-dimensional FFT of the input.
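The two-pass scheme can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's SPU code: the workers run sequentially instead of in parallel, the names `fft1d`, `fft_pass`, `transpose`, and `fft2d` are hypothetical, the input is a square matrix whose side is a power of two and divisible by the number of workers, and the column arrangement between passes is modelled as a plain transpose.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_pass(src, num_workers, chunk_rows):
    """One pass: each worker transforms its band of rows, a chunk at a time."""
    n = len(src)
    dst = [None] * n                      # preallocated output area
    rows_per_worker = n // num_workers    # assumes n % num_workers == 0
    for w in range(num_workers):          # sequential stand-in for parallel SPUs
        start = w * rows_per_worker
        end = start + rows_per_worker
        for c in range(start, end, chunk_rows):
            chunk = src[c:min(c + chunk_rows, end)]   # "receive" one piece
            for i, row in enumerate(chunk):
                dst[c + i] = fft1d(row)               # "send back" the result
    return dst


def transpose(m):
    return [list(col) for col in zip(*m)]


def fft2d(matrix, num_workers=2, chunk_rows=1):
    first = fft_pass(matrix, num_workers, chunk_rows)
    # Synchronization point: every worker has finished the first pass.
    second = fft_pass(transpose(first), num_workers, chunk_rows)
    return transpose(second)
```

Calling `fft2d(m, num_workers=2, chunk_rows=1)` on a 4x4 matrix `m` runs four one-row transfers per pass, split across two simulated workers, and returns the same values as a direct two-dimensional DFT.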
Buffering: Because small chunks of memory are repeatedly received and sent out, quad-buffering is used to keep each SPU busy. Four buffers handle all of the data movement and FFT processing in each SPU: FILL, which receives the incoming data; FFTin and FFTout, which hold the input and output of the actual computation; and OUT, which contains the processed data ready to be sent to main memory.
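The rotation of the four buffers can be illustrated with a small simulation. This is only a sketch under assumptions: the rotation order FILL, FFTin, FFTout, OUT is inferred from the buffer names, the function `quad_buffer_pipeline` and its parameters are hypothetical, and the steps run one after another, whereas on real hardware the transfers would overlap with the computation.

```python
def quad_buffer_pipeline(chunks, transform):
    """Stream chunks through FILL -> FFTin -> FFTout -> OUT.

    `transform` stands in for the 1D FFT. Chunks must not be None,
    because None marks an empty buffer in this sketch.
    """
    fill = fft_in = fft_out = out = None
    results = []                  # stands in for main memory
    pending = iter(chunks)
    while True:
        # Drain: whatever sits in OUT is sent back to main memory.
        if out is not None:
            results.append(out)
        # Rotate: each buffer hands its contents to the next stage.
        out = fft_out
        fft_out = transform(fft_in) if fft_in is not None else None
        fft_in = fill
        fill = next(pending, None)    # fetch the next incoming chunk
        if fill is None and fft_in is None and fft_out is None and out is None:
            break                     # pipeline is empty
    return results
```

For example, `quad_buffer_pipeline([[1, 2], [3, 4]], lambda c: [2 * v for v in c])` passes both chunks through all four stages and returns them in their original order.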
Data Striping: The 2D FFT function provided for the Cell is actually only a one-dimensional FFT that arranges its output in column form after the first transformation, which makes calling the second FFT easier. However, when the function is called on each SPU, the output “column” data sits in one contiguous chunk of memory. Therefore, the data needs to be striped back to main memory, as seen to the right, in order to orient it correctly.
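One way to picture the striping is as a strided copy from a worker's contiguous buffer into a flat, row-major array in main memory. The layout below is an assumption made for illustration, since the poster's figure is not reproduced here: the worker holds `rows` input rows starting at row `start`, its local buffer stores each output column as `rows` consecutive values, and main memory holds the transposed n-by-n result. The function name `stripe_to_main` is hypothetical.

```python
def stripe_to_main(main, local, n, start, rows):
    """Copy one worker's contiguous column data into its stripes in main.

    local[j * rows + i] holds the value for input row (start + i), column j.
    main is a flat row-major n x n array holding the transposed matrix,
    so that value belongs at main[j * n + (start + i)].
    """
    for j in range(n):
        dst = j * n + start
        main[dst:dst + rows] = local[j * rows:(j + 1) * rows]


# Two simulated workers, each owning 2 rows of a 4 x 4 matrix.
n, rows = 4, 2
main = [None] * (n * n)
for start in (0, 2):
    # Label each value r * 10 + c so its origin stays visible.
    local = [(start + i) * 10 + j for j in range(n) for i in range(rows)]
    stripe_to_main(main, local, n, start, rows)
```

Each worker writes `n` short stripes of `rows` values, and the stripes from different workers interleave so that every row of `main` ends up holding one complete column of the original matrix.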