Parallel beam back projection implementation



Parallel Beam Back Projection: Implementation

Srdjan Coric

Miriam Leeser

Eric Miller


Outline

  • Annapolis Wildstar

  • “Simple Architecture”

    • algorithm

    • datapath

    • Performance

    • Results

  • Parallelism extraction

  • “Advanced Architecture 4x”

    • datapath

    • Performance

    • Results

    • Implementation issues

  • Future directions


Data Flow

  • Sinogram data address generation

  • Sinogram data retrieval

  • Sinogram data prefetch

  • Linear interpolation

  • Data accumulation

  • Data read

  • Data write
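The stages named on this slide can be illustrated with a minimal software sketch of pixel-driven parallel-beam back projection. It is not the authors' datapath: the function name `backproject`, the one-sample-per-pixel detector spacing, and the evenly spaced angles over 180 degrees are all assumptions made for illustration. The 3-bit interpolation factor follows the parameters listed in the slides.

```python
import math

def backproject(sinogram, size, frac_bits=3):
    """Pixel-driven parallel-beam back projection (illustrative sketch).

    sinogram  : list of projections, each a list of integer samples
    size      : side length of the square output image, in pixels
    frac_bits : bits kept for the interpolation factor (3 in the slides)

    Assumptions (not from the slides): detector spacing equals the pixel
    spacing, and projection angles are evenly spaced over 180 degrees.
    """
    n_proj = len(sinogram)
    n_samp = len(sinogram[0])
    scale = 1 << frac_bits
    centre_img = (size - 1) / 2.0
    centre_det = (n_samp - 1) / 2.0
    image = [[0] * size for _ in range(size)]

    for p, proj in enumerate(sinogram):
        theta = math.pi * p / n_proj
        c, s = math.cos(theta), math.sin(theta)
        for row in range(size):
            y = row - centre_img
            for col in range(size):
                x = col - centre_img
                # Sinogram address generation: detector coordinate of the pixel.
                t = x * c + y * s + centre_det
                t_fixed = math.floor(t * scale)
                idx = t_fixed >> frac_bits      # integer sample address
                frac = t_fixed & (scale - 1)    # quantized interpolation factor
                if 0 <= idx < n_samp - 1:
                    # Sinogram data retrieval: the two neighbouring samples.
                    a, b = proj[idx], proj[idx + 1]
                    # Linear interpolation (result scaled by 2**frac_bits),
                    # then read, accumulate, and write back the pixel value.
                    image[row][col] += a * (scale - frac) + b * frac
    return image
```

Keeping the accumulator in integers and deferring the division by `2**frac_bits` mirrors the fixed-point flavour of the hardware, but the exact word widths used on the board are not given in the transcript.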


Critical error-accumulation path

  • LUT1 starting position

  • LUT1 quantization error

  • Bit reduction error

  • LUT2 quantization error

  • LUT3 quantization error

  • Interpolation factor error

  • Corner starting position

LUT labels in the figure (digits as extracted): LUT1: 5, 10; LUT2: 15, 1; LUT3: 15, 2



Performance Results: Software vs. FPGA Hardware

  • Software - Floating point - 450 MHz Pentium : ~ 240 s

  • Software - Floating point - 1 GHz Dual Pentium : ~ 94 s

  • Software - Fixed point - 450 MHz Pentium : ~ 50 s

  • Software - Fixed point - 1 GHz Dual Pentium : ~ 28 s

  • Hardware - 50 MHz : ~ 5.4 s

Parameters:

  • 1024 projections

  • 1024 samples per projection

  • 512 × 512 pixel image

  • 9-bit sinogram data

  • 3-bit interpolation factor
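The hardware time can be sanity-checked from the parameters alone. The sketch below assumes, as a hypothesis not stated in the slides, that the pipelined datapath completes one pixel update per projection on every clock cycle.

```python
# Back-of-envelope estimate of the hardware run time.
# Assumption (not from the slides): one pixel update per clock cycle.
projections = 1024
image_side = 512
clock_hz = 50e6

updates = projections * image_side * image_side
seconds = updates / clock_hz
print(f"{updates} updates -> {seconds:.2f} s")  # 268435456 updates -> 5.37 s
```

Under that assumption the estimate lands close to the reported ~5.4 s.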


Figure: original image and hardware output image, zoom ~200%, grayscale range < pixel value range (heart features in focus)


Figure: original image and hardware output image, zoom ~200%, grayscale range < pixel value range (lung features in focus)



Parallelism Issues

  • Case 1: no parallelism extracted

  • Case 2: pixel-level parallelism extracted

  • Case 3: projection-level parallelism extracted

Memory bandwidth requirements at 50 MHz (for data accumulation):

  • Case 1: 0.4 GB/s

  • Case 2: 1.6 GB/s

  • Case 3: 0.4 GB/s

  • Memory bandwidth limit: 1.2 GB/s

Execution time T, where V1, V2 and V3 are the volumes marked in the figure (axes: projections, image rows, image columns):

  • T ~ k1*V1

  • T ~ k1*V2

  • T ~ k2*V3

  • k1 < k2, V2 = V3 = V1/4
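The bandwidth figures are consistent with a simple model in which every pixel update costs one accumulator read and one accumulator write per clock cycle. The 4-byte accumulator word and the helper name `accumulation_bandwidth` are assumptions chosen for illustration; the slides do not state the word width.

```python
CLOCK_HZ = 50e6     # clock rate from the slides
WORD_BYTES = 4      # assumed accumulator word size (not given in the slides)
LIMIT = 1.2e9       # memory bandwidth limit from the slides, in bytes/s

def accumulation_bandwidth(pixels_per_cycle):
    """Bytes per second for one read and one write per pixel per cycle."""
    return pixels_per_cycle * 2 * WORD_BYTES * CLOCK_HZ

cases = {
    "Case 1 (no parallelism)": 1,
    "Case 2 (4 pixels per cycle)": 4,
    # Four projections are summed into the same pixel before the write,
    # so the accumulation traffic stays at one pixel per cycle.
    "Case 3 (4 projections per pixel)": 1,
}

for name, pixels in cases.items():
    bw = accumulation_bandwidth(pixels)
    verdict = "within limit" if bw <= LIMIT else "exceeds limit"
    print(f"{name}: {bw / 1e9:.1f} GB/s, {verdict}")
```

Under this model only the projection-level scheme quarters the work while staying under the 1.2 GB/s limit, which matches the choice made for the advanced architecture.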


Simple Architecture

Advanced Architecture - Data Path

projection parallelism extracted


Performance Results: Software vs. FPGA Hardware

  • Software - Floating point - 450 MHz Pentium : ~ 240 s

  • Software - Floating point - 1 GHz Dual Pentium : ~ 94 s

  • Software - Fixed point - 450 MHz Pentium : ~ 50 s

  • Software - Fixed point - 1 GHz Dual Pentium : ~ 28 s

  • Hardware - 50 MHz : ~ 5.4 s

  • Hardware (Advanced Architecture) - 50 MHz : ~ 1.3 s

Parameters:

  • 1024 projections

  • 1024 samples per projection

  • 512 × 512 pixel image

  • 9-bit sinogram data

  • 3-bit interpolation factor
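The relative speedups follow directly from the reported times. The short script below only divides the numbers listed on this slide; the dictionary keys are shorthand labels introduced here.

```python
# Run times in seconds, as reported on the slide.
software = {
    "float, 450 MHz Pentium": 240.0,
    "float, 1 GHz dual Pentium": 94.0,
    "fixed, 450 MHz Pentium": 50.0,
    "fixed, 1 GHz dual Pentium": 28.0,
}
hardware = {"simple": 5.4, "advanced": 1.3}

# Speedup of each hardware design over each software run.
speedups = {
    (sw, hw): t_sw / t_hw
    for sw, t_sw in software.items()
    for hw, t_hw in hardware.items()
}
for (sw, hw), factor in speedups.items():
    print(f"{hw:8s} vs {sw:26s}: {factor:6.1f}x")

# Gain of the 4-way parallel design over the simple one.
hw_gain = hardware["simple"] / hardware["advanced"]
print(f"advanced vs simple hardware: {hw_gain:.2f}x")
```

The hardware-to-hardware ratio comes out slightly above 4, which is plausible given that both times are rounded approximations.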


Implementation Issues

- fanout -

prj_num(3)

fanout = 1565 !

routing delay = 7.913 ns (~39.99%)
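To see why this fanout matters, the routing figure can be related to the clock period. The calculation assumes, as an interpretation not confirmed by the slide, that the percentage is this net's share of the total delay on its path.

```python
# Reported for net prj_num(3) on the slide.
routing_delay_ns = 7.913
share_of_path = 0.3999   # assumed to be the fraction of total path delay

total_path_ns = routing_delay_ns / share_of_path
max_clock_mhz = 1000.0 / total_path_ns

print(f"total path delay ~ {total_path_ns:.2f} ns")
print(f"implied clock ceiling ~ {max_clock_mhz:.1f} MHz")
```

If that reading is right, the path leaves almost no slack at the 50 MHz operating frequency.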


Implementation Issues

- fanout -

odd_2_A_4[4]

fanout = 144 !


Memory Bridges Stuff

3 architectures implemented:

  • A. “Simple Architecture” = non-parallel (on slide 6)

  • B. “Advanced Architecture” = 4-way parallel (slide 12)

  • C. “Bridge Free Advanced Architecture” = same as B, but contains no memory bridges (all design buffers are in BlockRAMs). The bridges run from the PCI bus to the memory banks and are required for host-memory communication. They form a separate design that is downloaded before design C, so that input data can be stored to the memories on the WildStar board, and again after it, so that output data can be read back.

Virtex1000 resource utilization:

  • A: 11% logic, 90% BlockRAMs (with bridges)

  • B: 39% logic, 100% BlockRAMs

  • C: 21% logic, 100% BlockRAMs


Floorplan of the “Bridge Free Advanced Architecture”

(design C on the previous slide)


Future Directions

  • Graduate

