
FFT Accelerator Project

Rohit Prakash (2003CS10186)

Anand Silodia (2003CS50210)

September 27th, 2007


Overview

  • Multiprocessor Implementation

    • Problems faced

    • Solutions

    • Results

  • FPGA IO

    • Work done

    • Problems faced

    • Possible solutions


Multiprocessor FFT: Problems

  • The previous code worked for some inputs but not all

  • The program appeared to communicate correctly but was still error-prone

  • Many segmentation faults (even after the results had been obtained)

    • A serial debugger does not work

    • Commercial debuggers are available, but the evaluation licence is restricted to a single IP address for 30 days


Suggested solutions (lam-mpi / Google Groups)

  • “Execution environment does not match the compile environment”

  • The same code worked with MPICH version 2 and GCC

  • The complex datatype is NOT supported in the C version (although MPI_2COMPLEX seemed to work for me)

  • Finally rewrote the code in C++ using complex<float> and MPI::COMPLEX (this worked)
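One reason complex<float> is convenient for message passing is that C++ guarantees it is stored as two adjacent floats (real part, then imaginary part). The sketch below only illustrates that layout; it makes no MPI calls, and the helper names as_float_pairs and float_count are hypothetical, not taken from the project code.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// View a buffer of n complex<float> values as 2n consecutive floats
// laid out as (real, imag, real, imag, ...). The C++ standard guarantees
// this array-compatible layout for std::complex<float>.
inline float* as_float_pairs(std::vector<std::complex<float>>& buf) {
    return reinterpret_cast<float*>(buf.data());
}

// Number of floats a communication call would have to transfer
// if the buffer were sent as plain floats.
inline std::size_t float_count(const std::vector<std::complex<float>>& buf) {
    return 2 * buf.size();
}
```

A buffer viewed this way could in principle be sent as 2n floats by any transport that understands float, which may help when the code is ported back to C.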


System Info (Identical for all)

  • Machine 1: Saveri

  • Machine 2: Abhogi

  • Machine 3: Sahana

  • Machine 4: Jaunpuri

  • System configuration:

    • Intel Pentium 4, 3.4 GHz

    • Cache size: 2048 KB

    • RAM: 1 GB

    • Operating system: Fedora Core 6

    • Compiler: mpic++

    • Flags: -O3 -march=pentium4

    • FFT: radix 2
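A radix-2 FFT splits an N-point transform into two N/2-point transforms and then combines them with butterflies; this split is what allows the work to be divided among processors. The sketch below is a minimal single-process version, assuming a recursive decimation-in-time split into even and odd samples (the slides do not say which variant was used). The names cfloat, combine and fft are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Combine the FFTs of the even- and odd-indexed samples (each of length n/2)
// into one n-point FFT using radix-2 butterflies.
std::vector<cfloat> combine(const std::vector<cfloat>& even,
                            const std::vector<cfloat>& odd) {
    assert(even.size() == odd.size());
    const std::size_t half = even.size();
    const std::size_t n = 2 * half;
    const float pi = std::acos(-1.0f);
    std::vector<cfloat> out(n);
    for (std::size_t k = 0; k < half; ++k) {
        // Twiddle factor exp(-2*pi*i*k/n).
        const float angle =
            -2.0f * pi * static_cast<float>(k) / static_cast<float>(n);
        const cfloat w = std::polar(1.0f, angle);
        out[k] = even[k] + w * odd[k];
        out[k + half] = even[k] - w * odd[k];
    }
    return out;
}

// Recursive decimation-in-time FFT; x.size() must be a power of two.
std::vector<cfloat> fft(const std::vector<cfloat>& x) {
    const std::size_t n = x.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    if (n == 1) return x;
    std::vector<cfloat> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = x[2 * i];
        odd[i] = x[2 * i + 1];
    }
    return combine(fft(even), fft(odd));
}
```

In a multiprocessor version, the two recursive calls would run on different processors and combine would run after the partial results are received.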


Theoretical Execution Time

  • For p processors, the total execution time is:

    T_N/p + (1 - 1/p)(2N/B + K_N)

    • p is a power of 2

  • T_N is the time taken to compute the FFT of an input of size N

  • K_N is the time taken to combine two N-point FFTs

  • B is the network bandwidth (bytes/sec)


Nature of this function

  • It is the sum of two functions:

    • T_N/p

    • (1 - 1/p)(2N/B + K_N)

  • When T_N/p dominates

  • When (1 - 1/p)(2N/B + K_N) dominates
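The model above can be evaluated directly to see which of the two terms dominates for a given configuration. The following is a minimal sketch, assuming T_N, K_N, N and B are supplied by the caller in consistent units (seconds, and N in the same units B counts per second); the function names predicted_time and predicted_speedup are hypothetical.

```cpp
#include <cassert>

// Predicted execution time on p processors, following the slide's model:
//   T(p) = T_N/p + (1 - 1/p) * (2N/B + K_N)
// t_n: time to compute the FFT of size N on one processor
// k_n: time to combine two FFTs
// n:   input size, in the units that bandwidth b counts per second
// b:   network bandwidth
double predicted_time(double t_n, double k_n, double n, double b, int p) {
    assert(p >= 1 && (p & (p - 1)) == 0);  // the model assumes p is a power of 2
    assert(b > 0.0);
    const double inv_p = 1.0 / p;
    const double compute = t_n * inv_p;
    const double overhead = (1.0 - inv_p) * (2.0 * n / b + k_n);
    return compute + overhead;
}

// Speedup relative to the single-processor time T_N.
double predicted_speedup(double t_n, double k_n, double n, double b, int p) {
    return t_n / predicted_time(t_n, k_n, n, b, p);
}
```

For p = 1 the overhead term vanishes and the function returns T_N, which is a quick sanity check on any measured inputs plugged into it.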














Inference

  • An input size of 33554432 (2^25) is roughly the break-even point; beyond it we start getting speedup

  • Below this point

    • the execution time increases as the number of processors increases

    • the percentage of time spent on communication decreases as the number of processors increases

  • Above this point

    • the execution time decreases as the number of processors increases

    • the percentage of time spent on communication increases as the number of processors decreases
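The break-even behaviour is consistent with the theoretical execution-time model given earlier. The derivation below is a sketch that takes that model at face value, writing the slide's TN and KN as T_N and K_N and the predicted time on p processors as T(p). It suggests that whether extra processors help depends on how T_N compares with 2N/B + K_N, not on p itself.

```latex
\begin{align*}
T(p) &= \frac{T_N}{p} + \left(1 - \frac{1}{p}\right)\left(\frac{2N}{B} + K_N\right),
  \qquad T(1) = T_N \\
T(p) - T(1) &= \left(1 - \frac{1}{p}\right)\left(\frac{2N}{B} + K_N - T_N\right) \\
T(p) < T(1) &\iff \frac{2N}{B} + K_N < T_N \qquad (p > 1)
\end{align*}
```

Because the factor (1 - 1/p) grows with p, the model predicts that adding processors makes things steadily worse when 2N/B + K_N exceeds T_N and steadily better when it does not, which matches the two regimes listed above.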


Possible errors

  • We measure real (wall-clock) time, which is affected by the load on a particular processor

  • Network communication latency affects the time taken to establish a synchronous handshake

  • The pipeline is not actually “perfect”
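The first point can be made concrete by recording wall-clock time and CPU time side by side and by repeating a measurement to reduce the influence of other load. This is a minimal sketch using only the C++ standard library; the slides do not say which timer the project used, and the names Timing, time_once and min_wall_seconds are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <ctime>

struct Timing {
    double wall_seconds;  // elapsed real time, affected by other load
    double cpu_seconds;   // processor time consumed by this process
};

// Run `work` once and report both wall-clock and CPU time.
template <typename F>
Timing time_once(F&& work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();
    work();
    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();
    Timing t;
    t.wall_seconds =
        std::chrono::duration<double>(wall_end - wall_start).count();
    t.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return t;
}

// Repeat the measurement and keep the smallest wall-clock time, which is
// usually the run least disturbed by other processes.
template <typename F>
double min_wall_seconds(F&& work, int repeats) {
    assert(repeats >= 1);
    double best = time_once(work).wall_seconds;
    for (int i = 1; i < repeats; ++i) {
        best = std::min(best, time_once(work).wall_seconds);
    }
    return best;
}
```

A large gap between wall_seconds and cpu_seconds on a compute-only section would hint that the machine was busy with something else during the run.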


4-processor pipelined layout

[Figure: timeline of the pipeline across processors P1 to P4. Each row is built from Send(i), Recv(i), FFT(N/4) and Combine stages. The stage durations are labelled (KN/2B), (N/2B), (N/2B), (N/4B), (TN/4), (N/4B) and (KN/4B).]

Time taken by these can surpass the boundaries


Further Work

  • Rewrite the code with the new data type in C

  • Optimize the code

  • Try with more processors?

  • Analyze using profilers?


FPGA: PCI IO

  • Built and ran the admxrc2 demos

  • Studied the wrapper and VHDL code

  • struct ADMXRC2_SPACE_INFO

    • The VirtualBase member is the address, in the application's address space, by which the region may be accessed using pointers.
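Access through a base address like VirtualBase amounts to ordinary pointer arithmetic on a mapped region. The sketch below illustrates the idea only: SpaceInfoSketch is a hypothetical stand-in that copies just the one field named on the slide, it is not the real ADMXRC2_SPACE_INFO definition, and no ADMXRC2 API calls are used. The helpers write_word and read_word are likewise illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the one field the slide mentions.
struct SpaceInfoSketch {
    void* VirtualBase;  // address of the region in the application's address space
};

// Write a 32-bit word at a byte offset inside the mapped region.
// volatile keeps the compiler from caching or reordering device accesses.
inline void write_word(const SpaceInfoSketch& info, std::size_t byte_offset,
                       std::uint32_t value) {
    assert(byte_offset % sizeof(std::uint32_t) == 0);
    auto* base = static_cast<volatile std::uint32_t*>(info.VirtualBase);
    base[byte_offset / sizeof(std::uint32_t)] = value;
}

// Read a 32-bit word at a byte offset inside the mapped region.
inline std::uint32_t read_word(const SpaceInfoSketch& info,
                               std::size_t byte_offset) {
    assert(byte_offset % sizeof(std::uint32_t) == 0);
    auto* base = static_cast<volatile std::uint32_t*>(info.VirtualBase);
    return base[byte_offset / sizeof(std::uint32_t)];
}
```

For experimentation without the card, VirtualBase can point at an ordinary array, which is how the sketch can be exercised on any machine.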


Mapping to logical space

  • All the demo VHDL code is written using the names of the standard card signals as inputs and outputs

  • This approach makes the VHDL code card-dependent


FPGA: Next step

  • Another approach uses the ADMXRC2_Read and ADMXRC2_Write API calls

  • Determine which of the two approaches is more useful and work with it

  • DMA code of Parikshit Patidar (from the work on a Hardware Accelerator for Ray Tracing)


References

  • ADM-XRC-II user manual

  • www.forums.xilinx.com

  • www.fpga-faq.org


