
Image Reconstruction on Multicore Processors





  1. Image Reconstruction on Multicore Processors • Graduate Students: Eric Fontaine and Viraj Paropkari • Faculty Members: Ada Gavrilovska and Hsien-Hsin S. Lee

  2. Agenda • Background • FDK algorithm: overview, parallelization method, current results • Katsevich algorithm: overview, parallelization method, current results • Future plans

  3. Background • 3-D CT scans are used to identify tumors and other defects inside the body. • Two common methods: • MRI: complex math and physics, but the main computational kernel is a simple inverse FFT (IFFT) • Filtered back-projection • Two common filtered back-projection algorithms: • FDK (Feldkamp-Davis-Kress): approximate but fast; uses projections taken on a circular path around the object; most accurate on the plane containing the circle • Katsevich: more accurate, but also more compute-intensive; uses projections taken on a helical path around the object; unlike the original FDK, it can reconstruct long objects • Both algorithms exhibit a high degree of data parallelism

  4. FDK Algorithm Overview • Cone-beam image reconstruction with the source on a helix and a flat detector • Steps to reconstruct the 3-D volume: • Initialize the helical source parameters • Compute or load the cone-beam projection data • Length-correction weighting • 1-D horizontal filtering • Linear pre-interpolation • Back-projection • Compare the results with a standard phantom

  5. Parallelization Strategy • Based on the FDK algorithm for general scanning paths such as a helix* • Each thread is assigned a subset of the projections and performs length-correction weighting, filtering, and back-projection on its assigned projections. • When all threads are done, an implicit barrier synchronizes them. Each thread is then assigned a subset of the total volume to reconstruct. • We use OpenMP • Subsets of the total volume are reconstructed in parallel, each sized to fit in an individual core's cache • The image is pieced together at the end, which reduces inter-core communication • [Figure: assign projections → per-thread length-correction weighting, filtering, back-projection → barrier → reconstructed image] *G. Wang, T.-H. Lin, P.-C. Cheng, and D. M. Shinozaki, "A general cone-beam reconstruction algorithm," IEEE Transactions on Medical Imaging, vol. 12, no. 3, pp. 486–496, Sept. 1993.
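A minimal OpenMP sketch of this two-phase structure follows. The arithmetic is a deliberately trivial placeholder (a scale factor `w` stands in for weighting and filtering, and a plain sum stands in for back-projection); `reconstruct_two_phase`, `slab`, and the row-major array layout are hypothetical. What the sketch is meant to show is the split of projections across threads, the implicit barrier at the end of the first `omp for`, and the slab-wise second phase.

```c
#include <assert.h>
#include <stddef.h>

/* proj: nproj rows of npix samples; vol: nz slices of npix voxels (row-major).
 * Placeholder work: phase 1 scales each projection by w, phase 2 sums all
 * processed projections into every voxel of each slice. */
void reconstruct_two_phase(float *proj, int nproj, int npix, float w,
                           float *vol, int nz, int slab)
{
    assert(slab > 0);
    #pragma omp parallel
    {
        /* Phase 1: each thread gets a subset of the projections. */
        #pragma omp for schedule(static)
        for (int p = 0; p < nproj; ++p)
            for (int i = 0; i < npix; ++i)
                proj[(size_t)p * npix + i] *= w;
        /* Implicit barrier at the end of the 'omp for' above: no thread enters
           phase 2 until every projection is processed, since phase 2 reads all. */

        /* Phase 2: each thread reconstructs whole slabs of slices; a slab is
           meant to be small enough for its voxels to stay in one core's cache. */
        #pragma omp for schedule(static)
        for (int z0 = 0; z0 < nz; z0 += slab) {
            int z1 = (z0 + slab < nz) ? z0 + slab : nz;
            for (int z = z0; z < z1; ++z)
                for (int i = 0; i < npix; ++i) {
                    float acc = 0.0f;
                    for (int p = 0; p < nproj; ++p)
                        acc += proj[(size_t)p * npix + i];
                    vol[(size_t)z * npix + i] = acc;
                }
        }
    }
}
```

Compiled with `-fopenmp` (GCC or Clang), both loops run in parallel; without it the `#pragma omp` lines are ignored and the same code runs serially, which gives a convenient single-thread baseline.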

  6. Single- and Dual-Thread Performance • [Figure: speedup (or slowdown) of the dual-thread OpenMP code; performance in seconds]

  7. FDK Analysis of Memory Behavior • [Figure panels: statistics for a single thread; statistics for two threads]

  8. Katsevich Algorithm Overview • Reconstructs a 3-D cylindrical volume exactly from 2-D projections [1]. • The inputs are projections (b) taken along a helical path surrounding the volume of interest (a). • Implemented the method of Noo et al. [2]: • The projections are differentiated and appropriately weighted (c). • They then undergo a 1-D Hilbert transform along the κ-lines: • First, remap to κ-line coordinates (d). • Perform a 1-D convolution with the filter kernel (e). • Remap back to projection coordinates (f). • To reconstruct the 3-D volume (g), each voxel's coordinates are back-projected onto the source projections. • The cumulative sum is taken over all projections belonging to the PI-interval containing that voxel. • Used a parallelization strategy similar to FDK: • Each thread processes a subset of the projections. • After synchronization, each thread reconstructs a subset of the total volume. • [Figure panels (a) through (g)] [1] A. Katsevich, "Theoretically exact FBP-type inversion algorithm for spiral CT," SIAM Journal on Applied Mathematics, vol. 62, pp. 2012–2026, 2002. [2] F. Noo, J. Pack, and D. Heuscher, "Exact helical reconstruction using native cone-beam geometries," Physics in Medicine and Biology, vol. 48, pp. 3787–3818, 2003.
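The filtering step (e) can be sketched as a direct 1-D convolution along each κ-line. This sketch assumes the remapping (d) has already produced a buffer whose row `l` holds `nsamp` samples along κ-line `l`, and it uses the ideal discrete Hilbert kernel h[n] = 2/(πn) for odd n and 0 for even n; the kernel, sampling, and any apodization actually used in [2] may differ, and `hilbert_filter_kappa_lines` is a hypothetical name. Under the parallel scheme described on the slide, a thread would call it once for each projection in its assigned subset.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Ideal discrete Hilbert-transform kernel: 2/(pi*n) for odd n, 0 for even n. */
static double hilbert_kernel(int n)
{
    return (n % 2 != 0) ? 2.0 / (M_PI * n) : 0.0;
}

/* Step (e): filter each kappa-line (one row of `in`) independently.
 * Direct O(nsamp^2) convolution per line; out must not alias in. */
void hilbert_filter_kappa_lines(const float *in, float *out, int nlines, int nsamp)
{
    assert(in != out);
    for (int l = 0; l < nlines; ++l) {
        const float *row = in + (size_t)l * nsamp;
        for (int i = 0; i < nsamp; ++i) {
            double acc = 0.0;
            for (int k = 0; k < nsamp; ++k)
                acc += row[k] * hilbert_kernel(i - k);
            out[(size_t)l * nsamp + i] = (float)acc;
        }
    }
}
```

The kernel is odd-symmetric, so an impulse produces values of opposite sign on either side of it and zeros at even offsets.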

  9. Results • Measured on an Intel Core 2 Duo at 2.66 GHz • Close to 2x speedup with two threads

  10. Image Quality • [Figure: 512³ reconstruction from 512 projections per turn, each projection 512 × 64 pixels; the original 512³ phantom for comparison]
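To give a sense of scale, the data sizes for this configuration can be worked out directly, assuming single-precision (4-byte) voxels and projection samples; the data type is not stated on the slide.

```latex
\begin{aligned}
\text{volume:}\quad & 512^3 \times 4\,\text{B} = 2^{27} \times 4\,\text{B} = 2^{29}\,\text{B} = 512\,\text{MiB}\\
\text{one projection:}\quad & 512 \times 64 \times 4\,\text{B} = 2^{17}\,\text{B} = 128\,\text{KiB}\\
\text{one turn (512 projections):}\quad & 512 \times 128\,\text{KiB} = 64\,\text{MiB}
\end{aligned}
```

Both the volume and one turn of projections are far larger than the few megabytes of L2 cache on a Core 2 Duo, which is consistent with reconstructing the volume in cache-sized subsets.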

  11. Benchmark • Compared against the published timing results in [3], which were obtained on 64-bit AMD Opteron processors. • We could not determine the exact parameters used by the authors of [3], so the comparison may not be like-for-like. [3] J. Deng, H. Yu, J. Ni, T. He, S. Zhao, L. Wang, and G. Wang, "A parallel implementation of the Katsevich algorithm for 3-D CT image reconstruction," Journal of Supercomputing, vol. 38, no. 1, pp. 35–47, Oct. 2006.

  12. Optimizations Used • Most of the time is spent in back-projection and in determining the PI-intervals. • The PI-intervals are constant for a particular helix: • They are precomputed and saved to a file. • They only need to be precomputed for one horizontal slice. • The PI-intervals for other horizontal slices can be determined by rotation. • An easy ~25% speedup
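The rotation trick follows from the screw symmetry of the helix. A short derivation, assuming a helix of radius R and pitch P parameterized by the source angle λ; these symbols, the screw motion S_θ, and the rotation Rot_θ about the z-axis are introduced here and do not come from the slides.

```latex
\begin{aligned}
\mathbf{a}(\lambda) &= \bigl(R\cos\lambda,\ R\sin\lambda,\ \tfrac{P\lambda}{2\pi}\bigr)\\
S_\theta(x,y,z) &= \bigl(\mathrm{Rot}_\theta(x,y),\ z + \tfrac{P\theta}{2\pi}\bigr)\\
S_\theta\bigl(\mathbf{a}(\lambda)\bigr) &= \mathbf{a}(\lambda+\theta)
  \;\Longrightarrow\; [\lambda_b,\lambda_t](S_\theta\mathbf{x}) = [\lambda_b,\lambda_t](\mathbf{x}) + \theta\\
[\lambda_b,\lambda_t](x,y,z) &= [\lambda_b,\lambda_t]\bigl(\mathrm{Rot}_{-\theta}(x,y),\,0\bigr) + \theta,
  \qquad \theta = \tfrac{2\pi z}{P}
\end{aligned}
```

S_θ is a rigid motion that maps the helix onto itself and lines onto lines while preserving λ_t − λ_b, so it carries the PI-line through x (the chord spanning less than one turn) to the PI-line through S_θ x. The rotated point Rot_{−θ}(x, y) generally falls between the grid points of the precomputed slice, so some interpolation or nearest-neighbour lookup is needed; the slides do not say which was used.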

  13. Optimizations Used • Next, focused on the back-projection inner loop: • Removed trivial lookup tables to save cache space (~10% speedup) • Used sine and cosine lookup tables (~15% speedup) • Moved the if statements that smooth the ends of the PI-interval outside the loop by duplicating the inner-loop code (~10% speedup) • Removed the if statements that bounds-check the back-projected coordinates, which required adding extra rows and columns of slack to the projection data (~3% speedup)

  14. Future Work • Explore memory layouts that reduce cache misses and page faults. • Implement the same algorithms on the Cell processor for a comparative analysis.
