Parallel beam back projection implementation
Parallel Beam Back Projection: Implementation

Srdjan Coric

Miriam Leeser

Eric Miller


Outline

  • Annapolis Wildstar

  • “Simple Architecture”

    • algorithm

    • datapath

    • Performance

    • Results

  • Parallelism extraction

  • “Advanced Architecture 4x”

    • datapath

    • Performance

    • Results

    • Implementation issues

  • Future directions


Data Flow

  • Sinogram data address generation

  • Sinogram data retrieval

  • Sinogram data prefetch

  • Linear interpolation

  • Data accumulation

  • Data read

  • Data write
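The stages named on this slide can be illustrated with a minimal software sketch of parallel beam back projection. It is not the authors' datapath: the function name `back_project`, the evenly spaced angles over 180 degrees, the image and detector centring, and the truncation of the interpolation factor are all assumptions made for illustration. Prefetching is not modelled.

```python
import math

def back_project(sinogram, size, frac_bits=3):
    """Accumulate every projection into a size x size image.

    sinogram[p][s] is sample s of projection p. Angles are assumed to be
    evenly spaced over 180 degrees (an assumption, not from the slides).
    """
    num_proj = len(sinogram)
    num_samples = len(sinogram[0])
    scale = 1 << frac_bits               # resolution of the interpolation factor
    centre_pix = (size - 1) / 2.0
    centre_det = (num_samples - 1) / 2.0
    image = [[0] * size for _ in range(size)]
    for p, proj in enumerate(sinogram):
        theta = math.pi * p / num_proj
        c, s = math.cos(theta), math.sin(theta)
        for row in range(size):
            y = centre_pix - row
            for col in range(size):
                x = col - centre_pix
                # Sinogram data address generation: detector coordinate.
                t = x * c + y * s + centre_det
                idx = math.floor(t)
                if idx < 0 or idx + 1 >= num_samples:
                    continue             # ray falls outside the detector
                # Interpolation factor truncated to frac_bits bits.
                f = int((t - idx) * scale)
                # Sinogram data retrieval: the two neighbouring samples.
                a, b = proj[idx], proj[idx + 1]
                # Linear interpolation in integer arithmetic.
                value = a * (scale - f) + b * f
                # Data accumulation: read the pixel, add, write it back.
                image[row][col] += value
    return image
```

The accumulated values are scaled by `2 ** frac_bits`, because the interpolation weights are kept as integers; a final shift or division would remove that scale.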


“Simple Architecture” Datapath

  • Figure labels: LUT1 starting position, corner starting position, critical error-accumulation path

  • Error sources marked in the figure: LUT1 quantization error, LUT2 quantization error, LUT3 quantization error, bit reduction error, interpolation factor error

  • Numeric annotations near the LUT1, LUT2, and LUT3 labels (in extraction order): 5, 10; 15, 1; 15, 2


Performance Results: Software vs. FPGA Hardware

  • Software - Floating point - 450 MHz Pentium: ~240 s

  • Software - Floating point - 1 GHz Dual Pentium: ~94 s

  • Software - Fixed point - 450 MHz Pentium: ~50 s

  • Software - Fixed point - 1 GHz Dual Pentium: ~28 s

  • Hardware - 50 MHz: ~5.4 s

Parameters:

  • 1024 projections

  • 1024 samples per projection

  • 512 × 512 pixel image

  • 9-bit sinogram data

  • 3-bit interpolation factor
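The ~5.4 s hardware figure is consistent with a simple throughput estimate. The sketch below assumes that the datapath completes one pixel update per clock cycle; that assumption is not stated on the slide, and the function name `estimated_runtime_s` is hypothetical.

```python
def estimated_runtime_s(projections, image_side, clock_hz, updates_per_cycle=1):
    """Estimate run time from the number of pixel updates.

    Assumes a fully pipelined datapath that retires `updates_per_cycle`
    pixel updates on every clock cycle (an assumption, not from the slides).
    """
    updates = projections * image_side * image_side
    return updates / (clock_hz * updates_per_cycle)

# Parameters from the slide: 1024 projections, 512 x 512 image, 50 MHz clock.
t = estimated_runtime_s(1024, 512, 50e6)
print(f"{t:.2f} s")  # prints "5.37 s"
```

Under this assumption the clock rate alone accounts for the measured time, which suggests the pipeline is kept busy on nearly every cycle.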


[Figure: original image and hardware output image. Zoom: ~200%. Grayscale range < pixel value range (heart features in focus).]


[Figure: original image and hardware output image. Zoom: ~200%. Grayscale range < pixel value range (lung features in focus).]


Original image - Hardware output image


Parallelism Issues

  • Case 1: No parallelism extracted

  • Case 2: Pixel level parallelism extracted

  • Case 3: Projection level parallelism extracted

Memory bandwidth requirements at 50 MHz (for data accumulation)

  • Case 1: 0.4 GB/s

  • Case 2: 1.6 GB/s

  • Case 3: 0.4 GB/s

  • Memory bandwidth limit: 1.2 GB/s

Execution time (figure axes: projections, image rows, image columns; V1, V2, V3 are the volumes marked in the figure)

  • T ~ k1*V1

  • T ~ k1*V2

  • T ~ k2*V3

  • k1 < k2, V2 = V3 = V1/4, T = execution time
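The bandwidth figures can be reproduced with a short calculation. The sketch assumes that each pixel accumulation reads one 4-byte word and writes one 4-byte word, 8 bytes in total; this value is chosen because it matches 0.4 GB/s at 50 MHz and is not stated on the slide. It also assumes that in Case 3 the four projections contribute to the same pixel before it is written back, so the accumulation traffic per cycle is unchanged.

```python
CLOCK_HZ = 50e6
BYTES_PER_UPDATE = 8        # assumed: 4-byte read + 4-byte write per pixel
BANDWIDTH_LIMIT = 1.2e9     # bytes/s, memory bandwidth limit from the slide

def accumulation_bandwidth(pixels_per_cycle):
    """Bytes per second moved to and from the accumulation memory."""
    return CLOCK_HZ * BYTES_PER_UPDATE * pixels_per_cycle

# Number of distinct pixels read and written per clock cycle in each case.
cases = {
    "Case 1 (no parallelism)": 1,
    "Case 2 (pixel level, 4 pixels per cycle)": 4,
    "Case 3 (projection level, 1 pixel per cycle)": 1,
}

for name, pixels in cases.items():
    bw = accumulation_bandwidth(pixels)
    verdict = "fits" if bw <= BANDWIDTH_LIMIT else "exceeds the limit"
    print(f"{name}: {bw / 1e9:.1f} GB/s, {verdict}")
```

Only Case 2 exceeds the 1.2 GB/s limit, which is in line with the slides choosing projection level parallelism for the advanced architecture.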


Simple Architecture

Advanced Architecture - Data Path

projection parallelism extracted


Performance Results: Software vs. FPGA Hardware

  • Software - Floating point - 450 MHz Pentium: ~240 s

  • Software - Floating point - 1 GHz Dual Pentium: ~94 s

  • Software - Fixed point - 450 MHz Pentium: ~50 s

  • Software - Fixed point - 1 GHz Dual Pentium: ~28 s

  • Hardware - 50 MHz: ~5.4 s

  • Hardware (Advanced Architecture) - 50 MHz: ~1.3 s

Parameters:

  • 1024 projections

  • 1024 samples per projection

  • 512 × 512 pixel image

  • 9-bit sinogram data

  • 3-bit interpolation factor
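The relative speedups follow directly from the timings above. The sketch below only divides the approximate figures from the slide; the dictionary keys and the helper name `speedups` are illustrative.

```python
# Approximate run times in seconds, copied from the slide.
timings_s = {
    "Software, floating point, 450 MHz Pentium": 240.0,
    "Software, floating point, 1 GHz Dual Pentium": 94.0,
    "Software, fixed point, 450 MHz Pentium": 50.0,
    "Software, fixed point, 1 GHz Dual Pentium": 28.0,
    "Hardware, 50 MHz": 5.4,
    "Hardware (Advanced Architecture), 50 MHz": 1.3,
}

def speedups(timings, reference):
    """Return how many times slower each entry is than the reference entry."""
    ref = timings[reference]
    return {name: t / ref for name, t in timings.items() if name != reference}

reference = "Hardware (Advanced Architecture), 50 MHz"
for name, factor in speedups(timings_s, reference).items():
    print(f"{name}: {factor:.1f}x the advanced architecture run time")
```

The ratio between the two hardware versions is about 4.2, close to the factor of 4 expected from a 4-way parallel design at the same clock rate.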


Implementation Issues - fanout

  • Signal: prj_num(3)

  • fanout = 1565!

  • routing delay = 7.913 ns (~39.99%)
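If the percentage is read as the routing share of the total path delay (an assumption; the slide does not say what the percentage refers to), the total delay and the clock rate it allows can be estimated. The helper name `path_delay_from_routing` is hypothetical.

```python
def path_delay_from_routing(routing_ns, routing_share):
    """Return (total path delay in ns, maximum clock in MHz).

    Assumes routing_share is the fraction of the total path delay that is
    spent in routing, which is an interpretation of the slide.
    """
    total_ns = routing_ns / routing_share
    max_clock_mhz = 1000.0 / total_ns
    return total_ns, max_clock_mhz

total_ns, fmax_mhz = path_delay_from_routing(7.913, 0.3999)
print(f"total path delay ~{total_ns:.2f} ns, max clock ~{fmax_mhz:.1f} MHz")
```

Under this reading the path is just under 20 ns long, which leaves almost no margin at the 50 MHz clock quoted in the performance results.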


Implementation Issues - fanout

  • Signal: odd_2_A_4[4]

  • fanout = 144!


Memory Bridges Stuff

3 architectures implemented:

  • A: “Simple Architecture” = non-parallel (on slide 6)

  • B: “Advanced Architecture” = 4-way parallel (slide 12)

  • C: “Bridge Free Advanced Architecture” = same as B, but contains no memory bridges (all design buffers in BlockRAMs) from the PCI bus to the memory banks, which are required for host-memory communication. The bridges are a separate design that is downloaded before design C so that input data can be stored to the memories on the WildStar board, and again after design C so that output data can be read from them.

Virtex1000 resource utilization:

  • A: 11% logic, 90% BlockRAMs (with bridges)

  • B: 39% logic, 100% BlockRAMs

  • C: 21% logic, 100% BlockRAMs


Floorplan of the

“Bridge Free Advanced Architecture”

(design C on the previous slide)


Future Directions

  • Graduate

