
Parallel accelerator project

Parallel accelerator project. Final presentation, Summer 2008. Student: Vitaly Zakharenko. Supervisor: Inna Rivkin. Duration: one semester. System functionality, large picture: multiple signal sources share the same medium, and each source produces a periodic pulse sequence in the medium.





Presentation Transcript


  1. Parallel accelerator project. Final presentation, Summer 2008. Student: Vitaly Zakharenko. Supervisor: Inna Rivkin. Duration: one semester.

  2. System functionality: large picture
     • Multiple signal sources share the same medium.
     • Each source produces a periodic pulse sequence in the medium.
     • An observer of the medium senses the superposed pulse sequences plus noise.
     • A preprocessor detects pulses in the signal and stores each pulse as its TOA (time of arrival).
     • The pulse TOA array produced by the preprocessor is conveyed to the system.
     • The system separates the pulses into the original signals (i.e., into periodic pulse sequences).

  3. Figure: signal produced by source #1, signal produced by source #2, and the signal as seen by the observer (showing the missing-pulse effect), with the observed pulses labeled TOA1 to TOA11. System output: pulses separated by source. Data structure for signal representation: an array of pulse TOAs (TOA1, TOA2, ...).
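To make the TOA array concrete, the sketch below merges the pulse trains of several strictly periodic sources into one time-sorted TOA array, roughly what the preprocessor would hand to the system. The `source_t` type, the function `merge_pulse_trains`, integer time units and the 16-source limit are assumptions for illustration; noise, jitter and missing pulses are not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one periodic source (not from the project). */
typedef struct {
    uint32_t first_toa; /* TOA of the source's first pulse */
    uint32_t pri;       /* pulse repetition interval, assumed > 0 */
} source_t;

/* Merge the pulse trains of n_src sources into one ascending TOA array,
 * keeping only pulses with TOA < t_end. Returns the number of TOAs
 * written (at most cap), or 0 if more than 16 sources are given. */
size_t merge_pulse_trains(const source_t *src, size_t n_src,
                          uint32_t t_end, uint32_t *toa, size_t cap)
{
    uint64_t next[16];   /* next TOA of each source; 64-bit to avoid wrap */
    size_t n = 0;

    if (n_src > 16) return 0;
    for (size_t i = 0; i < n_src; i++) next[i] = src[i].first_toa;

    while (n < cap) {
        size_t best = n_src;  /* source with the earliest pending pulse */
        for (size_t i = 0; i < n_src; i++)
            if (next[i] < t_end && (best == n_src || next[i] < next[best]))
                best = i;
        if (best == n_src) break;          /* every source has passed t_end */
        toa[n++] = (uint32_t)next[best];
        next[best] += src[best].pri;
    }
    return n;
}
```

For example, three sources with PRI 11 starting at TOAs 0, 2 and 5, observed up to t_end = 55, give the 15 TOAs 0, 2, 5, 11, 13, 16, ..., 44, 46, 49.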

  4. System components
     • Simulator (on a PC): constructs datagrams.
     • Datagram switch (on the FPGA): manages the flow of datagrams between the simulator and the processing units.
     • Data processing units (on the FPGA): each unit processes datagrams.

  5. Main system components (diagram): the simulator runs on the PC; the switch and several processing units reside on the FPGA.

  6. Data processing units
     Each unit contains a Nios II processor and C2H-generated hardware accelerators.
     Diagram: the Nios II embedded processor connects through the Avalon switch fabric to two C2H-generated accelerators, the histogram builder and the sequence search.

  7. Data processing algorithm
     for {level} := 1 up to {maximum level} do
       1. Build the histogram of differences (SDIF) of level {level}.
       2. Add the SDIF to the cumulative histogram (CDIF).
       3. Find the lowest periodicity column of the CDIF above the threshold.
       4. if {column found} = TRUE then
            4.1. Detect all pulse sequences of that periodicity.
            4.2. Mark their pulses as associated.
          end if
       5. Check whether to break the loop.
     end for
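The loop above can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the project's code: the function names (`separate_pulses`, `build_sdif`, `find_seqs`), the constant threshold fraction, the 1-unit bin width, the zero PRI tolerance, the minimum sequence length and the break rule used for step 5 are all hypothetical. Histograms are built only over pulses not yet associated, and a missing pulse simply ends a chain.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBINS   64   /* histogram bins, 1 TOA unit wide (assumption) */
#define MAX_N   256  /* maximum pulses per call (assumption) */
#define MIN_SEQ 3    /* minimum pulses in a valid sequence (assumption) */
#define TOL     0u   /* allowed PRI deviation in TOA units (assumption) */

/* Hypothetical threshold: two thirds of the still-unassociated pulses.
 * The actual threshold function is not given in the slides. */
static uint32_t threshold(size_t n_free) { return (uint32_t)(2 * n_free / 3); }

static size_t count_free(const int *assoc, size_t n)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++) m += (assoc[i] < 0);
    return m;
}

/* Step 1 (BuildHist-like): histogram of TOA differences between each
 * unassociated pulse and the level-th next unassociated pulse. */
static void build_sdif(const uint32_t *toa, const int *assoc, size_t n,
                       size_t level, uint32_t *sdif)
{
    size_t idx[MAX_N], m = 0;
    memset(sdif, 0, NBINS * sizeof *sdif);
    for (size_t i = 0; i < n; i++)
        if (assoc[i] < 0) idx[m++] = i;
    for (size_t k = 0; k + level < m; k++) {
        uint32_t d = toa[idx[k + level]] - toa[idx[k]];
        if (d < NBINS) sdif[d]++;
    }
}

/* Steps 4.1-4.2 (FindSeqs-like): chain unassociated pulses spaced by pri
 * and label chains of at least MIN_SEQ pulses. Returns chains labelled. */
static int find_seqs(const uint32_t *toa, int *assoc, size_t n,
                     uint32_t pri, int first_id)
{
    int found = 0;
    for (size_t i = 0; i < n; i++) {
        if (assoc[i] >= 0) continue;
        size_t chain[MAX_N], len = 0;
        uint32_t expect = toa[i] + pri;
        chain[len++] = i;
        for (size_t j = i + 1; j < n; j++) {
            if (assoc[j] >= 0 || toa[j] + TOL < expect) continue; /* taken or too early */
            if (toa[j] > expect + TOL) break;                     /* gap ends the chain */
            chain[len++] = j;
            expect = toa[j] + pri;
        }
        if (len >= MIN_SEQ) {
            for (size_t k = 0; k < len; k++) assoc[chain[k]] = first_id + found;
            found++;
        }
    }
    return found;
}

/* Outer loop; toa must be sorted ascending. Returns the number of
 * sequences found (-1 if n > MAX_N); assoc[i] gets a sequence id or -1. */
int separate_pulses(const uint32_t *toa, int *assoc, size_t n, size_t max_level)
{
    uint32_t sdif[NBINS], cdif[NBINS] = {0};
    int n_seq = 0;
    if (n > MAX_N) return -1;
    for (size_t i = 0; i < n; i++) assoc[i] = -1;

    for (size_t level = 1; level <= max_level; level++) {
        uint32_t thr = threshold(count_free(assoc, n));
        build_sdif(toa, assoc, n, level, sdif);                 /* step 1 */
        for (size_t b = 0; b < NBINS; b++) cdif[b] += sdif[b];  /* step 2 */
        for (size_t b = 1; b < NBINS; b++) {                    /* step 3 */
            if (cdif[b] > 0 && cdif[b] >= thr) {                /* step 4 */
                n_seq += find_seqs(toa, assoc, n, (uint32_t)b, n_seq);
                break;
            }
        }
        if (count_free(assoc, n) < MIN_SEQ) break;              /* step 5 */
    }
    return n_seq;
}
```

With the 15 TOAs 0, 2, 5, 11, 13, 16, ..., 44, 46, 49 (three interleaved sources of PRI 11), no CDIF column reaches the threshold at levels 1 and 2; at level 3 the column at 11 does, and three sequences are labelled.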

  8. Data processing example
     Figure: signals of sources 1, 2 and 3 and the observed signal, whose consecutive pulse intervals are labeled a, b and c.

  9. Data processing example: cumulative histogram (CDIF) update
     Figure: SDIF (level = 1) of the observed signal, with columns at a, b and c, added to the CDIF.

  10. Data processing example: threshold crossing check
      Figure: CDIF columns a, b and c against the threshold function. No column crosses the threshold, so there is no periodicity candidate and no sequence search.

  11. Data processing example: cumulative histogram (CDIF) update
      Figure: SDIF (level = 2) of the observed signal, with columns at a+b, b+c and c+a, added to the CDIF, which now holds columns a, b, c, a+b, b+c and c+a.

  12. Data processing example: threshold crossing check
      Figure: CDIF columns a, b, c, a+b, b+c and c+a against the threshold function. Again there is no periodicity candidate, so no sequence search.

  13. Data processing example: cumulative histogram (CDIF) update
      Figure: SDIF (level = 3) of the observed signal, with a column at a+b+c, added to the CDIF.

  14. Data processing example: threshold crossing check
      Figure: CDIF columns against the threshold function. The threshold is satisfied by periodicity (a+b+c), so all sequences of periodicity (a+b+c) are searched for.

  15. Data processing example: sequence search results (final results)
      Detected sequences #1, #2 and #3.
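The example reduces to simple arithmetic. Assuming, as slides 8 to 15 suggest, that the three sources share one PRI T and that consecutive observed pulses with TOAs t_i are separated by a, b and c in turn, the differences at each level are:

```latex
\begin{aligned}
\text{level 1:}\quad & t_{i+1}-t_i \in \{a,\; b,\; c\} \\
\text{level 2:}\quad & t_{i+2}-t_i \in \{a+b,\; b+c,\; c+a\} \\
\text{level 3:}\quad & t_{i+3}-t_i = a+b+c = T
\end{aligned}
```

Every level-3 difference lands in the same CDIF column, which is why the threshold is first crossed at a+b+c. With the illustrative values a = 2, b = 3, c = 6, the level-1 columns are 2, 3 and 6, the level-2 columns are 5, 9 and 8, and all level-3 differences equal 11.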

  16. Input datagram format
      Fields: ID, control bits, Len, followed by TOA 1, TOA 2, ... TOA N; the figure shows the layout in 64-bit words.

  17. Output datagram format
      Field name                                   Size (bytes)
      Control fields set                           2
      Length                                       2
      ID                                           4
      Total pulses associated                      2
      Total sequences detected                     2
      Association of pulse 1                       1
      Association of pulse 2                       1
      …                                            …
      Association of pulse N                       1
      Total pulses associated with sequence 1      4
      PRI of sequence 1                            4
      Jitter of sequence 1                         4
      Confidence level 1 of sequence 1             4
      Confidence level 3 of sequence 1             4
      PRI of sequence 2                            4
      …                                            …
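The fixed part of the output datagram can be serialized as below. Field sizes follow the table; the `out_header_t` struct, the function names and the little-endian byte order are assumptions (Nios II itself is little-endian, but the slides do not state the wire order). The sketch stops after the per-pulse association bytes; the per-sequence records would follow them.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed header of the output datagram; field names are hypothetical,
 * sizes follow the output datagram table. */
typedef struct {
    uint16_t control;        /* control fields set (2 bytes) */
    uint16_t length;         /* length (2 bytes); units not specified */
    uint32_t id;             /* datagram ID (4 bytes) */
    uint16_t pulses_assoc;   /* total pulses associated (2 bytes) */
    uint16_t seqs_detected;  /* total sequences detected (2 bytes) */
} out_header_t;

/* Write the low n bytes of v in little-endian order (assumed wire order). */
static uint8_t *put_le(uint8_t *p, uint32_t v, int n)
{
    for (int i = 0; i < n; i++) *p++ = (uint8_t)(v >> (8 * i));
    return p;
}

/* Serialize the 12-byte header followed by one association byte per
 * pulse. buf must hold 12 + n_pulses bytes. Returns bytes written. */
size_t write_out_header(const out_header_t *h, const uint8_t *assoc,
                        size_t n_pulses, uint8_t *buf)
{
    uint8_t *p = buf;
    p = put_le(p, h->control, 2);
    p = put_le(p, h->length, 2);
    p = put_le(p, h->id, 4);
    p = put_le(p, h->pulses_assoc, 2);
    p = put_le(p, h->seqs_detected, 2);
    for (size_t i = 0; i < n_pulses; i++) *p++ = assoc[i];  /* 1 byte each */
    return (size_t)(p - buf);
}
```

Serializing byte by byte, rather than copying the struct with memcpy, keeps the layout independent of compiler padding and host byte order.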

  18. Implementation for Nios II: testing and profiling
      • In Visual Studio (VS), floating-point calculations were replaced by fixed-point arithmetic.
      • The C code of the algorithm was ported from VS to the Nios IDE.
      • The algorithm was profiled on Nios II.
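The slides do not say which fixed-point format replaced floating point. As a hedged sketch, assuming a signed Q16.16 format (16 integer bits, 16 fractional bits), the basic operations look like this; the `q16_*` names are hypothetical.

```c
#include <stdint.h>

/* Signed Q16.16 fixed point (assumed format): value = raw / 65536. */
typedef int32_t q16_t;
#define Q16_ONE 65536

/* Conversions, valid for |x| < 32768. q16_from_double truncates toward
 * zero; it is meant for host-side comparison runs against floating point. */
static inline q16_t q16_from_int(int32_t x)   { return (q16_t)(x * Q16_ONE); }
static inline q16_t q16_from_double(double x) { return (q16_t)(x * Q16_ONE); }
static inline double q16_to_double(q16_t x)   { return (double)x / Q16_ONE; }

/* Product and quotient via a 64-bit intermediate; C integer division
 * truncates toward zero. b must be non-zero in q16_div. */
static inline q16_t q16_mul(q16_t a, q16_t b)
{
    return (q16_t)(((int64_t)a * b) / Q16_ONE);
}

static inline q16_t q16_div(q16_t a, q16_t b)
{
    return (q16_t)(((int64_t)a * Q16_ONE) / b);
}
```

Keeping such helpers identical in the Visual Studio build and the Nios II build lets the two be compared bit for bit.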

  19. SoPC system generation
      • The hardware design was generated in the Altera SoPC Builder environment.

  20. SoPC system generation
      • Different SoPC system configurations were compared.
      • The SoPC system was optimized.
      • Multiple clock domains were provided.
      • Interconnect was minimized.
      • Different processor types were compared.

  21. C2H acceleration
      • C2H hardware accelerators were generated for two blocks of the algorithm:
        • the sequence search function (FindSeqs);
        • the histogram builder function (BuildHist).

  22. C2H accelerators: performance optimization
      • Sequence search (FindSeqs) function acceleration
        • The accelerator results were unsatisfactory.
        • It consumed a large amount of FPGA logic.
        • The acceleration gain was low (×4 at most).
        • It was discarded after much effort had been spent on optimization.

  23. C2H accelerators: performance optimization
      • Histogram builder (BuildHist) function acceleration
        • Good acceleration results
        • ×50 acceleration gain
        • Moderate FPGA logic consumption

  24. Design performance: FPGA resources
      • 6% logic consumption
      • 5% memory consumption

  25. Design performance: timing
      • Processing time of 1 to 7 ms
      • Three Nios systems significantly outperform a Pentium 4 processor.
