Accelerating MATLAB Image Processing Toolbox Functions on GPUs

Jingfei Kong, Martin Dimitrov, Yi Yang, Janaka Liyanage, Lin Cao, Jacob Staples, Mike Mantor, Huiyang Zhou

Presentation Transcript


  1. Accelerating MATLAB Image Processing Toolbox Functions on GPUs • Jingfei Kong, Martin Dimitrov, Yi Yang, Janaka Liyanage, Lin Cao, Jacob Staples, Mike Mantor, Huiyang Zhou

  2. Motivation • With high memory bandwidth and teraflops of computing capability, Graphics Processing Units (GPUs) have become quite attractive for accelerating general-purpose applications • Developing high-performance GPU programs, however, requires a deep understanding of both the application algorithms and the GPU hardware architecture • A systematic way of dealing with a generic class of applications is missing University of Central Florida

  3. Our Contributions • Compare performance-critical hardware features in different GPUs • Develop high-quality open-source library code for some representative functions in MATLAB™ Image Processing Toolbox (IPT) • https://sites.google.com/site/iptatiproject/ [15] • Reveal insights on efficiently accelerating a wide range of image processing algorithms

  4. Presentation Outline • Motivation • Our Contributions • Implication of GPU hardware on GPGPU programming • A GPGPU library for IPT functions • categorization and optimization strategies • Case Studies • 2D convolution • dither • Conclusions

  5-10. Implication of GPU hardware on GPGPU programming

  11. Summary of the Library: MATLAB Image Processing Toolbox (IPT) Function Classification

  12. MATLAB IPT Function Classification and Optimization Strategies • Characteristics: straightforward one-to-one mapping between input and output pixels, abundant parallelism • Strategies: use memory bandwidth effectively by packing multiple pixels into each access; where possible, perform several such light-weight tasks per transfer to amortize the CPU-GPU data transfer overhead
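
The packing strategy can be illustrated on the CPU. The following is a minimal sketch, assuming 8-bit grayscale pixels packed four at a time into a 32-bit word; the names `pack4`, `unpack4`, and `invert_packed` are hypothetical and are not taken from the library. It shows one light-weight per-pixel operation (intensity inversion) applied to four pixels with a single word-wide operation.

```python
def pack4(pixels):
    """Pack four 8-bit pixels into one 32-bit word (first pixel in the low byte)."""
    assert len(pixels) == 4 and all(0 <= p <= 255 for p in pixels)
    word = 0
    for k, p in enumerate(pixels):
        word |= p << (8 * k)
    return word


def unpack4(word):
    """Split a 32-bit word back into its four 8-bit pixels."""
    return [(word >> (8 * k)) & 0xFF for k in range(4)]


def invert_packed(word):
    """Invert four pixels at once: 255 - p in every byte equals a bitwise NOT."""
    return ~word & 0xFFFFFFFF


pixels = [0, 17, 128, 255]
inverted = unpack4(invert_packed(pack4(pixels)))
print(inverted)  # [255, 238, 127, 0]
```

On a GPU the same idea would presumably be expressed with vector types or wider loads, so that each memory transaction carries several pixels.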

  13. MATLAB IPT Function Classification and Optimization Strategies • Characteristics: still a one-to-one mapping, but adjacent output pixels are computed from overlapping sets of input pixels • Strategies: data reuse, computation reuse
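
Computation reuse for overlapping windows can be sketched with a one-dimensional sliding-window sum. This is an illustrative example only, not a function from the library; `box_sum_naive` and `box_sum_reuse` are hypothetical names. Adjacent outputs share all but one input, so the previous result can be updated instead of recomputed.

```python
def box_sum_naive(row, width):
    """Each output re-reads and re-adds all `width` inputs in its window."""
    return [sum(row[i:i + width]) for i in range(len(row) - width + 1)]


def box_sum_reuse(row, width):
    """Reuse the previous window's sum: add the entering pixel, drop the leaving one."""
    if len(row) < width:
        return []
    running = sum(row[:width])
    out = [running]
    for i in range(width, len(row)):
        running += row[i] - row[i - width]
        out.append(running)
    return out


row = [3, 1, 4, 1, 5, 9, 2, 6]
print(box_sum_reuse(row, 3))  # [8, 6, 10, 15, 16, 17]
```

The naive version performs `width` additions per output; the reuse version performs two, regardless of the window size.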

  14. MATLAB IPT Function Classification and Optimization Strategies • Characteristics: lack of explicit parallelism • Strategies: rethink the algorithm, explore its inherent parallelism

  15. MATLAB IPT Function Classification and Optimization Strategies • Characteristics: lack of explicit parallelism, sequential nature with data dependencies and fine-grained communication requirements • Strategies: try it anyway; the result may be a pleasant surprise

  16-19. Summary of the Library: Performance Comparison against MATLAB CPU (single-threaded)

  20. 2D Convolution Overview [Figure: input pixels, a 3 x 3 filter, and the resulting output pixels]

  21. 2D Convolution Overview • Slide the filter over each pixel of the source image, then multiply and accumulate the overlapped input elements to generate an output pixel. [Figure: filter positioned over a pixel of the input image]
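
The multiply-and-accumulate step can be written out directly. This is a plain sequential sketch for reference, not the GPU code from the library; `conv2d_valid` is a hypothetical name, and the sketch assumes the output is computed only where the filter fully overlaps the image (no border padding).

```python
def conv2d_valid(image, kernel):
    """2D convolution over the region where the filter fully overlaps the image."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        out_row = []
        for x in range(iw - kw + 1):
            acc = 0
            for ky in range(kh):
                for kx in range(kw):
                    # Convolution flips the filter in both dimensions.
                    acc += image[y + ky][x + kx] * kernel[kh - 1 - ky][kw - 1 - kx]
            out_row.append(acc)
        out.append(out_row)
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
box = [[1, 1, 1]] * 3  # 3 x 3 filter of ones
print(conv2d_valid(image, box))  # [[54, 63], [90, 99]]
```

Every output pixel is independent of the others, which is what makes the operation a natural fit for one thread per output pixel.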

  22. 2D Convolution: Intra-Thread Data Reuse • Each thread computes multiple pixels along the column • Intra-thread reuse: for a 7 x 7 filter, each input pixel is reused up to 7 times [Figure: thread i moving down a column of the input image]
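
The effect of intra-thread reuse can be sketched by simulating one thread on the CPU and counting its loads. This is an illustration under stated assumptions, not the library's kernel: `column_strip` is a hypothetical name, the sliding window of rows stands in for per-thread registers, and the filter is applied without flipping (equivalent to convolution for symmetric filters).

```python
def column_strip(image, kernel, x, y0, count):
    """Compute `count` vertically adjacent outputs at column x, starting at row y0.

    Returns (outputs, loads). Each input row segment is loaded once and kept
    in a sliding window of the most recent rows.
    """
    kh, kw = len(kernel), len(kernel[0])
    window = []  # the kh most recent row segments
    loads = 0
    outputs = []
    for row in range(y0, y0 + count + kh - 1):
        window.append(image[row][x:x + kw])  # one load per input pixel
        loads += kw
        if len(window) > kh:
            window.pop(0)
        if len(window) == kh:
            outputs.append(sum(window[ky][kx] * kernel[ky][kx]
                               for ky in range(kh) for kx in range(kw)))
    return outputs, loads


image = [[(r * 16 + c) % 5 for c in range(16)] for r in range(16)]
kernel = [[1] * 7 for _ in range(7)]
outputs, loads = column_strip(image, kernel, x=0, y0=0, count=8)
print(len(outputs), loads, 8 * 7 * 7)  # 8 98 392
```

With a 7 x 7 filter, eight outputs need 98 loads instead of the 392 that eight independent computations would issue, because each loaded pixel contributes to up to seven vertically adjacent outputs.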

  23. 2D Convolution: Inter-Thread Data Reuse • Threads in the same warp/wavefront access the same row • Inter-thread reuse: the row is fetched into texture cache/shared memory and reused by different threads on subsequent accesses [Figure: threads 0 to 3 reading a reused row of the input image held in texture cache/shared memory]
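
The amount of sharing among threads can be estimated with a small count. This sketch assumes a warp of 32 threads, with adjacent threads handling adjacent columns and each thread reading a filter-width span of the same input row; `row_accesses` is a hypothetical helper, not part of the library.

```python
def row_accesses(num_threads, filter_width):
    """Thread t reads pixels t .. t + filter_width - 1 of one input row."""
    total = 0
    distinct = set()
    for t in range(num_threads):
        for k in range(filter_width):
            total += 1
            distinct.add(t + k)
    return total, len(distinct)


total, distinct = row_accesses(num_threads=32, filter_width=7)
print(total, distinct)  # 224 38
```

Under these assumptions the warp issues 224 reads but touches only 38 distinct pixels, so most reads can be served from on-chip storage once the row has been fetched.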

  24. 2D Convolution Performance • A 4096 x 4096 image with a 7 x 7 filter • Jacket 1.2.2 trial version (released on 1/4/2010) from AccelerEyes®: around 20 GFLOPS on GTX 280 • Ours: around 350 GFLOPS on GTX 280, around 733 GFLOPS on HD 5870
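
To relate these throughput figures to run time, the operation count can be worked out. This is a rough sketch under assumptions that the slides do not state: one multiply and one add are counted per filter tap, and every output pixel uses all 49 taps (borders ignored). The resulting times are implied by the quoted GFLOPS figures, not measurements.

```python
def conv_flops(width, height, fw, fh):
    """Floating-point operations, counting a multiply and an add per filter tap."""
    return 2 * fw * fh * width * height


flops = conv_flops(4096, 4096, 7, 7)
print(flops)  # 1644167168

quoted = [("Jacket, GTX 280", 20), ("ours, GTX 280", 350), ("ours, HD 5870", 733)]
for name, gflops in quoted:
    ms = flops / (gflops * 1e9) * 1e3
    print(f"{name}: {ms:.1f} ms")
# Jacket, GTX 280: 82.2 ms
# ours, GTX 280: 4.7 ms
# ours, HD 5870: 2.2 ms
```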

  25. Data Dependent Case Study: Dither

  26. Dither [Figure: input pixels and output pixels] • Input pixel: 230 • Test: 230 < 128? • Output (0/1?): 1 • Error = 230 – 128 = 102

  27. Dither – Data Dependency [Figure: pixel at (i, j) on axes i and j, with the label i+j]

  28. Dither – Parallel Processing Schedule (from P. Metaxas [8])
       1  2  3  4  5  6  7  8 ...
       3  4  5  6  7  8  9 10
       5  6  7  8  9 10 11 12
       7  8  9 10 11 12 13 14
       9 10 11 12 13 14 15 16
      11 12 13 14 15 16 17 18
      13 14 15 16 17 18 19 20
      15 16 17 18 19 20 21 22
      ...
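
A schedule of this kind can be generated and checked in a few lines. The sketch below rests on two assumptions not spelled out on the slide: the dither uses Floyd-Steinberg-style error diffusion, where a pixel receives error from its left neighbour and from the three neighbours in the row above, and the step for the pixel in row i, column j is 2*i + j + 1, which matches the last row on the slide (15 to 22). The function names are hypothetical.

```python
def schedule(rows, cols):
    """Step at which each pixel is processed: step(i, j) = 2*i + j + 1."""
    return [[2 * i + j + 1 for j in range(cols)] for i in range(rows)]


def respects_dependencies(steps):
    """Check that every pixel runs after the four neighbours that diffuse error into it."""
    rows, cols = len(steps), len(steps[0])
    for i in range(rows):
        for j in range(cols):
            # left, upper-left, upper, upper-right
            for di, dj in [(0, -1), (-1, -1), (-1, 0), (-1, 1)]:
                pi, pj = i + di, j + dj
                if 0 <= pi < rows and 0 <= pj < cols and steps[pi][pj] >= steps[i][j]:
                    return False
    return True


steps = schedule(8, 8)
for row in steps:
    print(" ".join(f"{s:2d}" for s in row))
print(respects_dependencies(steps))  # True
```

Pixels that share a step number have no dependencies on each other and can be processed in parallel; the offset of two per row comes from the upper-right neighbour, which must finish one step earlier.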

  29. Dither – Our GPU Implementation [Figure: processing schedule grids] • A relatively small number of thread blocks/threads are active at any given time • low resource utilization • synchronization overhead (among thread blocks/threads) • We still get up to 10.3x kernel speedup and 3.5x overall speedup!
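
The low utilization can be made concrete by counting how many pixels are eligible at each step. This sketch works at pixel granularity and assumes a wavefront schedule of the form step = 2*i + j for row i and column j; the actual implementation schedules thread blocks, so the numbers are only indicative. `active_per_step` is a hypothetical helper.

```python
from collections import Counter


def active_per_step(rows, cols):
    """Number of pixels that can be processed at each step of the 2*i + j schedule."""
    counts = Counter(2 * i + j for i in range(rows) for j in range(cols))
    return [counts[s] for s in range(2 * (rows - 1) + cols)]


active = active_per_step(8, 8)
print(active)
print(max(active), round(sum(active) / len(active), 2))  # 4 2.91
```

For an 8 x 8 grid, at most 4 of the 64 pixels are active in any step, and the schedule needs 22 steps, which is why synchronization overhead and idle hardware limit the achievable speedup.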

  30. Conclusions • We identify performance-critical hardware features for GPGPU programs • We present our experience and optimization strategies in developing high-performance GPU code for functions from the MATLAB Image Processing Toolbox

  31. Our Open-source Library • Project website: https://sites.google.com/site/iptatiproject/ [15] • You are more than welcome to contribute! • Thank you! Questions?