FFT Accelerator Project

Overview

- Multiprocessor Implementation
  - Problems faced
  - Solutions
  - Results
- FPGA IO
  - Work done
  - Problems faced
  - Possible solutions

Multiprocessor FFT: Problems

- The previous code worked for some inputs but not all
- The program seemed to communicate correctly but was still error-prone
- Lots of segmentation faults (even after the results had been obtained)
- The serial debugger does not work
- Commercial debuggers are available, but evaluation is restricted to a single IP address for 30 days

Suggested solutions (lam-mpi / Google Groups)

- “Execution environment does not match the compile environment”
- The same code worked with MPICH version 2 and GCC
- The complex datatype is NOT supported in the C version (but MPI_2COMPLEX seemed to work for me)
- Finally rewrote the code in C++ using complex<float> and MPI::COMPLEX (this worked)

System Info (identical for all machines)

- Machine 1: Saveri
- Machine 2: Abhogi
- Machine 3: Sahana
- Machine 4: Jaunpuri
- System information:
- Intel Pentium 4, 3.4 GHz
- Cache size: 2048 KB
- RAM: 1 GB
- Operating system: Fedora Core 6
- Compiler: mpic++
- Flags: -O3 -march=pentium4
- FFT: radix 2

Theoretical Execution Time

- TN is the time taken to compute the FFT of an input of size N
- KN is the time taken to combine two N-point FFTs
- B is the network bandwidth (bytes/sec)

- For p processors, the total execution time is:
(TN/p) + (1 - 1/p)(2N/B + KN)

- p is a power of 2
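The model above can be evaluated directly. The following is a minimal sketch in Python (not the project's C++ code); the function name `total_time`, its parameter names, and the sample numbers are illustrative assumptions, and N is assumed to be expressed in the same units that B measures per second.

```python
def total_time(t_n, k_n, n, bandwidth, p):
    """Modelled run time: (TN / p) + (1 - 1/p) * (2N/B + KN)."""
    # The slides state that p is a power of 2; reject anything else.
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a power of 2")
    overhead = 2 * n / bandwidth + k_n   # transfer time plus combine time
    return t_n / p + (1 - 1 / p) * overhead


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the formula.
    n = 1 << 20
    t_n, k_n, bandwidth = 2.0, 0.1, 100e6
    for p in (1, 2, 4, 8):
        print(p, round(total_time(t_n, k_n, n, bandwidth, p), 4))
```

With p = 1 the second term vanishes and the function returns TN, which is a quick sanity check on the model.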

Nature of this function

- Sum of two functions:
  - (TN/p)
  - (1 - 1/p)(2N/B + KN)

- When (TN/p) dominates
- When (1 – 1/p)(2N/B + KN) dominates
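To see which of the two terms wins, it can help to rewrite the model. The short derivation below uses T_N, K_N and B as defined earlier (written TN and KN in the slides) and introduces C_N as a shorthand for the communication-plus-combine cost; that symbol is an assumption of this note, not part of the original slides.

```latex
\begin{align*}
C_N &= \frac{2N}{B} + K_N \\
T(p) &= \frac{T_N}{p} + \Bigl(1 - \frac{1}{p}\Bigr) C_N
      = C_N + \frac{T_N - C_N}{p} \\
T(p) - T(1) &= -\Bigl(1 - \frac{1}{p}\Bigr)\,(T_N - C_N)
\end{align*}
```

Under this model, for any p > 1 the parallel run is faster than the single-processor run exactly when T_N > C_N, and in that case T(p) keeps decreasing as p grows. When T_N < C_N, adding processors makes the modelled time longer.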

Inference

- An input size of 33554432 (2^25) is roughly the break-even point; beyond it we start to get speedup
- Below this point:
  - the execution time increases as the number of processors increases
  - the percentage of time spent on communication decreases as the number of processors increases

- Above this point:
  - the execution time decreases as the number of processors increases
  - the percentage of time spent on communication increases as the number of processors decreases

Possible errors

- The measurements use real (wall-clock) time, which is affected by the load on each processor
- Network communication latency affects the time taken to establish a synchronous handshake
- The pipeline is not actually “perfect”
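One common way to reduce the effect of machine load on timings is to repeat the measurement and keep the smallest value, and to record CPU time next to wall-clock time. The sketch below illustrates this in Python; it is not how the project measured time, and the helper name `best_of` is hypothetical.

```python
import time


def best_of(func, repeats=5):
    """Return the smallest wall-clock and CPU time over several runs."""
    wall, cpu = [], []
    for _ in range(repeats):
        w0, c0 = time.perf_counter(), time.process_time()
        func()
        wall.append(time.perf_counter() - w0)   # elapsed real time
        cpu.append(time.process_time() - c0)    # CPU time of this process
    return min(wall), min(cpu)


if __name__ == "__main__":
    wall, cpu = best_of(lambda: sum(i * i for i in range(200_000)))
    print(f"wall {wall:.4f} s, cpu {cpu:.4f} s")
```

A large gap between the two numbers suggests that the process was waiting (for other load, or for communication) rather than computing.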

4-processor pipelined layout

[Figure: timeline for processors P1 to P4. Each processor computes an FFT(N/4); data moves between processors through Send(k) and Recv(k) transfers, and partial results are merged in three Combine steps. Durations marked along the time axis: (KN/2B), (N/2B), (N/2B), (N/4B), (TN/4), (N/4B), (KN/4B).]

The time taken by these stages can surpass the boundaries shown.
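The arithmetic behind the layout (four FFT(N/4) computations followed by three combine steps) can be sketched in a single process. The Python code below assumes the input is split into four interleaved subsequences, which the slides do not state; it models only the computation, not the Send/Recv traffic, and all function names are illustrative.

```python
import cmath


def combine(even, odd):
    """Merge the FFTs of the even and odd samples into one FFT."""
    half = len(even)
    n = 2 * half
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle * odd term
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    if len(x) == 1:
        return list(x)
    return combine(fft(x[0::2]), fft(x[1::2]))


def fft_four_way(x):
    """Transform four interleaved quarters, then combine twice."""
    # len(x) must be a power of 2 and at least 4.
    parts = [fft(x[r::4]) for r in range(4)]   # one FFT(N/4) per worker
    even = combine(parts[0], parts[2])         # FFT of samples 0, 2, 4, ...
    odd = combine(parts[1], parts[3])          # FFT of samples 1, 3, 5, ...
    return combine(even, odd)                  # final N-point result


if __name__ == "__main__":
    print(fft_four_way([1, 2, 3, 4, 5, 6, 7, 8]))
```

In a distributed version each element of `parts` would be produced on a different processor, and every call to `combine` would be preceded by a transfer of one operand.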

Further Work

- Rewrite the code with the new data type in C
- Optimize the code
- Try with more processors?
- Analyze using profilers?
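For the C rewrite listed above, one possible representation (an assumption of this note, not a decision recorded in the slides) is to store each complex sample as two adjacent floats, so that a buffer of N complex values can be transferred as 2N plain floats. The Python sketch below only illustrates that memory layout; the helper names are hypothetical.

```python
from array import array


def pack_complex(values):
    """Flatten complex numbers into an interleaved single-precision buffer."""
    flat = array("f")
    for v in values:
        flat.append(v.real)   # real part first
        flat.append(v.imag)   # imaginary part second
    return flat


def unpack_complex(flat):
    """Rebuild complex numbers from interleaved (re, im) pairs."""
    return [complex(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


if __name__ == "__main__":
    samples = [1 + 2j, -0.5 + 0.25j]
    buf = pack_complex(samples)
    print(list(buf), unpack_complex(buf))
```

Values that are not exactly representable in single precision are rounded when packed, which mirrors what happens when complex<float> is used.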

FPGA: PCI IO

- Built and ran the admxrc2 demos
- Studied the wrapper and VHDL code
- Struct ADMXRC2_SPACE_INFO
  - The VirtualBase member is the address, in the application's address space, by which the region may be accessed using pointers.
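Pointer-style access to a mapped region can be illustrated without the card. In the Python sketch below an anonymous memory mapping stands in for the region that VirtualBase would point to; no ADM-XRC-II API is called, and the region size, word offsets and function name are hypothetical.

```python
import mmap

REGION_SIZE = 4096  # hypothetical size of the mapped region, in bytes


def demo_register_access():
    """Write and read back 32-bit words through a mapped buffer."""
    with mmap.mmap(-1, REGION_SIZE) as region:           # stand-in mapping
        with memoryview(region) as view, view.cast("I") as words:
            words[0] = 0x12345678        # like *(base + 0) = value in C
            words[1] = words[0] + 1      # read one word, write the next
            return words[0], words[1]


if __name__ == "__main__":
    print([hex(w) for w in demo_register_access()])
```

In the real application the base pointer would come from the driver rather than from an anonymous mapping, and each access would reach the card instead of ordinary memory.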

Mapping to logical space

- All the demo VHDL code is written using the names of the standard card signals as inputs and outputs
- This approach makes the VHDL code card-dependent

FPGA: Next step

- There is another approach that uses the ADMXRC2_Read and ADMXRC2_Write API calls
- Determine which of the two approaches is more useful and proceed with it
- DMA code of Parikshit Patidar (work on Hardware Accelerator for Ray Tracing)
