Parallel beam back projection implementation

Parallel Beam Back Projection: Implementation. Srdjan Coric, Miriam Leeser, Eric Miller. Outline: Annapolis Wildstar; “Simple Architecture” (algorithm, datapath, performance, results); parallelism extraction; “Advanced Architecture 4x” (datapath, performance, results, implementation issues).



Parallel Beam Back Projection: Implementation

Srdjan Coric

Miriam Leeser

Eric Miller


Outline

  • Annapolis Wildstar

  • “Simple Architecture”

    • algorithm

    • datapath

    • Performance

    • Results

  • Parallelism extraction

  • “Advanced Architecture 4x”

    • datapath

    • Performance

    • Results

    • Implementation issues

  • Future directions


Data Flow

[Figure: data flow diagram. Stage labels: sinogram data address generation, sinogram data retrieval, sinogram data prefetch, linear interpolation, data accumulation, data read, data write.]
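The stages named on this slide (address generation, sinogram retrieval, linear interpolation, read-accumulate-write) can be illustrated with a minimal software sketch of pixel-driven parallel-beam back projection. This is not the authors' hardware design: the function name `back_project`, the evenly spaced angles over 180 degrees, and the centred detector geometry are all assumptions made for illustration.

```python
import math

def back_project(sinogram, size):
    """Pixel-driven parallel-beam back projection with linear interpolation.

    sinogram[p][s] is sample s of projection p. Assumptions (not from the
    slides): projections are evenly spaced over 180 degrees and the detector
    is centred on the image centre.
    """
    n_proj = len(sinogram)
    n_samp = len(sinogram[0])
    image = [[0.0] * size for _ in range(size)]
    centre = (size - 1) / 2.0
    det_centre = (n_samp - 1) / 2.0
    for p, projection in enumerate(sinogram):
        theta = math.pi * p / n_proj
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for row in range(size):
            for col in range(size):
                # Address generation: detector coordinate hit by this pixel.
                t = (col - centre) * cos_t + (row - centre) * sin_t + det_centre
                idx = math.floor(t)
                if idx < 0 or idx >= n_samp - 1:
                    continue  # ray falls outside the detector
                frac = t - idx
                # Retrieval and linear interpolation between two neighbours.
                value = projection[idx] + frac * (projection[idx + 1] - projection[idx])
                # Read, accumulate and write back the pixel value.
                image[row][col] += value
    return image

# Usage: a constant sinogram adds one unit per projection to every pixel.
demo = back_project([[1.0] * 16 for _ in range(8)], 4)
```

The innermost statement touches each pixel once per projection, which is why the accumulation memory traffic dominates the bandwidth discussion later in the talk.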


“Simple Architecture” Datapath

[Figure: datapath diagram. Labels: LUT1 starting position, corner starting position, critical error-accumulation path, LUT1 quantization error, LUT2 quantization error, LUT3 quantization error, bit reduction error, interpolation factor error. Numeric annotations beside LUT1, LUT2 and LUT3 (5, 10, 15, 1, 15, 2) lost their layout in the transcript.]


Performance Results: Software vs. FPGA Hardware

  • Software - Floating point - 450 MHz Pentium : ~ 240 s

  • Software - Floating point - 1 GHz Dual Pentium : ~ 94 s

  • Software - Fixed point - 450 MHz Pentium : ~ 50 s

  • Software - Fixed point - 1 GHz Dual Pentium : ~ 28 s

  • Hardware - 50 MHz : ~ 5.4 s

Parameters:

  • 1024 projections

  • 1024 samples per projection

  • 512 × 512 pixel image

  • 9-bit sinogram data

  • 3-bit interpolation factor
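To make the 9-bit sinogram data and the 3-bit interpolation factor concrete, here is a small integer-only interpolation sketch. The helper names, the unsigned sample range, and the choice to truncate (rather than round) the factor are assumptions; the transcript does not state the exact number formats used in the hardware.

```python
FRAC_BITS = 3   # 3-bit interpolation factor (from the slide parameters)
DATA_BITS = 9   # 9-bit sinogram samples (from the slide parameters)

def quantize_sample(x):
    """Round and clamp a sample to an unsigned 9-bit integer (assumed format)."""
    return max(0, min((1 << DATA_BITS) - 1, int(round(x))))

def interpolate_fixed(s0, s1, t):
    """Interpolate between integer samples s0 and s1 at fraction t in [0, 1).

    The factor is truncated to FRAC_BITS bits (an assumption). The result is
    an integer scaled by 2**FRAC_BITS, so no division is needed per pixel.
    """
    scale = 1 << FRAC_BITS
    factor = int(t * scale)          # 0 .. 7
    return s0 * (scale - factor) + s1 * factor

# Usage: compare against exact interpolation for one pair of samples.
approx = interpolate_fixed(100, 108, 0.3) / (1 << FRAC_BITS)
exact = 100 + 0.3 * (108 - 100)
```

With a 3-bit factor the interpolation error is bounded by one eighth of the difference between the two neighbouring samples, which is one contribution to the quantization errors named on the datapath slide.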


[Figure: original image and hardware output image, zoom ~200%, grayscale range < pixel value range (heart features in focus).]

[Figure: original image and hardware output image, zoom ~200%, grayscale range < pixel value range (lung features in focus).]

[Figure: original image - hardware output image.]


Parallelism Issues

  • Case 1: no parallelism extracted

  • Case 2: pixel-level parallelism extracted

  • Case 3: projection-level parallelism extracted

Memory bandwidth requirements at 50 MHz (for data accumulation):

  • Case 1: 0.4 GB/s

  • Case 2: 1.6 GB/s

  • Case 3: 0.4 GB/s

  • Memory bandwidth limit: 1.2 GB/s

[Figure: three volumes V1, V2 and V3 drawn on axes labeled projections, image columns and image rows.]

Execution time T: T ~ k1*V1, T ~ k1*V2, T ~ k2*V3, where k1 < k2 and V2 = V3 = V1/4.
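The bandwidth figures on this slide can be reproduced with simple arithmetic. The sketch below assumes that each pixel update costs 8 bytes of traffic (for example a 4-byte read plus a 4-byte write); that value is inferred from 0.4 GB/s at 50 MHz and is not stated in the transcript. It also assumes that pixel-level parallelism updates four pixels per cycle, while projection-level parallelism folds four projections into one pixel update.

```python
CLOCK_HZ = 50e6              # 50 MHz clock (from the slide)
BYTES_PER_PIXEL_UPDATE = 8   # assumption: 4-byte read + 4-byte write
LIMIT_GB_S = 1.2             # memory bandwidth limit (from the slide)

def accumulation_bandwidth(pixel_updates_per_cycle):
    """Accumulation memory traffic in GB/s for a given update rate."""
    return CLOCK_HZ * BYTES_PER_PIXEL_UPDATE * pixel_updates_per_cycle / 1e9

# Assumed pixel updates per clock cycle for the three cases on the slide.
cases = {"Case 1": 1, "Case 2": 4, "Case 3": 1}

report = {
    name: (accumulation_bandwidth(n), accumulation_bandwidth(n) <= LIMIT_GB_S)
    for name, n in cases.items()
}

for name, (gb_s, fits) in report.items():
    print(f"{name}: {gb_s:.1f} GB/s, within limit: {fits}")
```

Under these assumptions only the pixel-level case exceeds the 1.2 GB/s limit, which matches the talk's choice of projection-level parallelism for the advanced architecture.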


Advanced Architecture - Data Path

[Figure: datapaths of the Simple Architecture and the Advanced Architecture (projection parallelism extracted).]


Performance Results: Software vs. FPGA Hardware

  • Software - Floating point - 450 MHz Pentium : ~ 240 s

  • Software - Floating point - 1 GHz Dual Pentium : ~ 94 s

  • Software - Fixed point - 450 MHz Pentium : ~ 50 s

  • Software - Fixed point - 1 GHz Dual Pentium : ~ 28 s

  • Hardware - 50 MHz : ~ 5.4 s

  • Hardware (Advanced Architecture) - 50 MHz : ~ 1.3 s

Parameters:

  • 1024 projections

  • 1024 samples per projection

  • 512 × 512 pixel image

  • 9-bit sinogram data

  • 3-bit interpolation factor
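The relative speedups implied by these timings can be computed directly. The dictionary keys below are illustrative labels, and since every timing on the slide is approximate ("~"), the ratios are approximate as well.

```python
# Approximate run times in seconds, copied from the slide.
timings_s = {
    "sw_float_450mhz": 240.0,
    "sw_float_1ghz_dual": 94.0,
    "sw_fixed_450mhz": 50.0,
    "sw_fixed_1ghz_dual": 28.0,
    "hw_simple_50mhz": 5.4,
    "hw_advanced_50mhz": 1.3,
}

def speedup(baseline, candidate):
    """How many times faster `candidate` is than `baseline`."""
    return timings_s[baseline] / timings_s[candidate]

for name in timings_s:
    if name != "hw_advanced_50mhz":
        print(f"{name}: {speedup(name, 'hw_advanced_50mhz'):.1f}x slower than advanced hardware")
```

The ratio between the two hardware designs is about 4.2, close to the 4-way parallelism of the advanced architecture.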


Implementation Issues: fanout

  • prj_num(3): fanout = 1565, routing delay = 7.913 ns (~39.99%)
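A rough reading of these numbers: if the 39.99% is the routing share of the total delay on this path (an assumption; the transcript does not say what the percentage refers to), the total path delay and the clock rate it would allow follow from a short calculation.

```python
routing_delay_ns = 7.913    # from the slide
routing_fraction = 0.3999   # assumption: routing share of the total path delay

# Total delay of the path under that assumption.
total_path_delay_ns = routing_delay_ns / routing_fraction

# Highest clock rate that this single path would allow, in MHz.
max_clock_mhz = 1e3 / total_path_delay_ns

print(f"total path delay ~ {total_path_delay_ns:.2f} ns")
print(f"max clock        ~ {max_clock_mhz:.1f} MHz")
```

Under that reading the path would sit right at the 50 MHz clock quoted in the performance results, which suggests why a high-fanout net is listed as an implementation issue.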


Implementation Issues: fanout

  • odd_2_A_4[4]: fanout = 144


Memory Bridges Stuff

3 architectures implemented:

  • A: “Simple Architecture” = non-parallel (slide 6)

  • B: “Advanced Architecture” = 4-way parallel (slide 12)

  • C: “Bridge Free Advanced Arch” = as B, but without the memory bridges (all design buffers in BlockRAMs) from the PCI bus to the memory banks that are required for host-memory communication. The bridges are a separate design that is downloaded before (after) design C so that input data can be stored to (output data read from) the memories on the WildStar board.

Virtex1000 resource utilization:

  • A: 11% logic, 90% BlockRAMs (with bridges)

  • B: 39% logic, 100% BlockRAMs

  • C: 21% logic, 100% BlockRAMs


Floorplan of the “Bridge Free Advanced Architecture” (design C on the previous slide)


Future Directions

  • Graduate

