A Parallel Implementation of MSER detection


Presentation Transcript


  1. A Parallel Implementation of MSER Detection • GPGPU Final Project • Lin Cao

  2. Review • MSER stands for Maximally Stable Extremal Regions • Invariant to affine transformations, such as rotation, translation, and scale change • Denotes a set of stable connected components detected in a grayscale image

  3. Review • An MSER is a stable connected component of a thresholded image • All pixels inside an MSER have either higher or lower intensities than all pixels on its outer boundary • Regions are selected to be stable over a range of intensity thresholds
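
To make the thresholding view concrete, here is a minimal Python sketch (not taken from the slides), assuming a small 8-bit grayscale image stored as nested lists, 4-connectivity, and dark regions, i.e. components of pixels with intensity <= t. The helper `region_area` is hypothetical: it grows the connected component containing a seed pixel at threshold t, so the resulting areas show a region staying unchanged across a range of thresholds.

```python
from collections import deque


def region_area(img, seed, t):
    """Area of the 4-connected component of {p : img[p] <= t} containing seed.

    Returns 0 if the seed pixel itself lies above the threshold.
    """
    rows, cols = len(img), len(img[0])
    r0, c0 = seed
    if img[r0][c0] > t:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen and img[nr][nc] <= t):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)


# A dark 2x2 blob (value 10) on a bright background (value 200).
img = [
    [200, 200, 200, 200],
    [200,  10,  10, 200],
    [200,  10,  10, 200],
    [200, 200, 200, 200],
]

# The blob keeps area 4 from t = 10 up to t = 199, then the background joins.
areas = [region_area(img, (1, 1), t) for t in (5, 10, 100, 199, 200)]
print(areas)  # [0, 4, 4, 4, 16]
```

A region whose area barely changes over many thresholds, like the 2×2 blob here, is what the stability criterion favours.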

  4. Sequential and Parallel Approach
     Sequential {
         bucketSort();
         Find();
         Union();
         Update();
         GetRegion();
         computeVariation();
         leastVariation();
     }
     Parallel {
         buildDirectedGraph();
         blockReduction();
         parentCompression();  // regions are already available here
         computeVariation();
         findRoot();
         leastVariation();
     }
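
The sequential column matches the usual union-find formulation of MSER detection. The Python sketch below is an illustration under stated assumptions, not the project's code: 8-bit intensities, 4-connectivity, dark regions grown by increasing threshold, and hypothetical function names that only loosely mirror `bucketSort()`, `Find()`, `Union()` and `Update()`. `GetRegion()`, `computeVariation()` and `leastVariation()` are left out; the sketch only records the component areas at each intensity level, which is the input the variation step needs.

```python
def bucket_sort_pixels(flat, levels=256):
    """bucketSort(): order flat pixel indices by intensity, one bucket per level."""
    buckets = [[] for _ in range(levels)]
    for p, v in enumerate(flat):
        buckets[v].append(p)
    return [p for bucket in buckets for p in bucket]


def find(parent, x):
    """Find(): root of x's set, with path halving to keep trees shallow."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, area, a, b):
    """Union(): merge the sets of a and b by area; only roots hold valid areas."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        if area[ra] < area[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        area[ra] += area[rb]


def sequential_component_areas(img):
    """Grow dark regions by increasing threshold and record component areas.

    Returns {level: sorted areas of the 4-connected components of
    {p : img[p] <= level}}, taken after the last pixel of each level (Update()).
    """
    rows, cols = len(img), len(img[0])
    flat = [v for row in img for v in row]
    n = len(flat)
    parent = list(range(n))
    area = [1] * n
    done = [False] * n
    order = bucket_sort_pixels(flat)
    history = {}
    for i, p in enumerate(order):
        done[p] = True
        r, c = divmod(p, cols)
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and done[nr * cols + nc]:
                union(parent, area, p, nr * cols + nc)
        if i + 1 == n or flat[order[i + 1]] != flat[p]:
            # Scanning every pixel for roots is O(n) per level: clear, not fast.
            roots = {find(parent, q) for q in range(n) if done[q]}
            history[flat[p]] = sorted(area[x] for x in roots)
    return history


img = [
    [1, 3, 1],
    [3, 3, 3],
    [1, 3, 2],
]
print(sequential_component_areas(img))  # {1: [1, 1, 1], 2: [1, 1, 1, 1], 3: [9]}
```

Each pixel is merged only after every darker pixel, which is the strict ordering that makes this flow hard to parallelise directly.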

  5. buildDirectedGraph • The value of each pixel's parent must be no less than the pixel's own value • Local memory: visited, members • Shared memory
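
The slide states the constraint on parents but not how a parent is chosen. One hypothetical rule that satisfies it, sketched in Python: each pixel points to the 4-neighbour with the smallest (value, index) key that is still strictly larger than its own key, or to itself if there is none. Because keys strictly increase along parent pointers, the graph cannot contain cycles, and every pixel is handled independently, as one GPU thread per pixel would be.

```python
def build_directed_graph(img):
    """Hypothetical parent rule: point each pixel at its lowest 4-neighbour whose
    (value, flat index) key is strictly greater than its own, else at itself.

    Parent values are therefore never smaller than the pixel's own value.
    """
    rows, cols = len(img), len(img[0])
    parent = []
    for r in range(rows):
        for c in range(cols):
            key = (img[r][c], r * cols + c)
            best = None
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols:
                    nkey = (img[nr][nc], nr * cols + nc)
                    if nkey > key and (best is None or nkey < best):
                        best = nkey
            # No larger neighbour: this pixel becomes a local root.
            parent.append(best[1] if best is not None else key[1])
    return parent


img = [
    [1, 3, 1],
    [3, 3, 3],
    [1, 3, 2],
]
parent = build_directed_graph(img)
print(parent)  # [1, 4, 1, 4, 5, 5, 3, 7, 5]
```

A single local pass can leave several roots inside one connected region of equal intensity (pixels 5 and 7 both have value 3 and end up as separate roots), which is the kind of gap that later merging steps such as block reduction would have to close.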

  6. buildDirectedGraph • Memory usage: local memory (visited, members); shared memory • Also processes edges for the next step

  7. Block Reduction: 16×16 and 8×8 blocks

  8. Block Reduction: 16×16 and 8×8 blocks

  9. Block Reduction: 16×16 and 8×8 blocks

  10. Block Reduction • $\log_2 4 + \log_2 2 = 2 + 1 = 3$ iterations are needed in total
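
One reading of this slide, assumed here rather than confirmed by it, is that 4 and 2 are the numbers of blocks along the two image axes, and each reduction round merges neighbouring blocks pairwise along one axis. Under that assumption the iteration count is ceil(log2 bx) + ceil(log2 by), which this small Python helper (hypothetical name) counts directly by halving each axis.

```python
def reduction_iterations(blocks_x, blocks_y):
    """Pairwise merge rounds needed to reduce a blocks_x by blocks_y grid to one block.

    Each round merges neighbouring blocks in pairs along one axis, halving that
    axis (rounding up), so the total is ceil(log2 bx) + ceil(log2 by).
    """
    rounds = 0
    while blocks_x > 1:
        blocks_x = (blocks_x + 1) // 2
        rounds += 1
    while blocks_y > 1:
        blocks_y = (blocks_y + 1) // 2
        rounds += 1
    return rounds


print(reduction_iterations(4, 2))  # 3, i.e. log2(4) + log2(2)
```

Smaller blocks mean more blocks per axis and therefore more rounds, which is one of the trade-offs behind the 16×16 versus 8×8 comparison.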

  11. Block Reduction • Load edge information into each pixel • If (horizontal_pixelUpdate)

  12. Block Reduction • History buffer

  13. Parent Compression • Uses shared memory, exploiting parent locality
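
Parent compression is commonly done by pointer jumping: every node replaces its parent with its grandparent in parallel until nothing changes, which takes about log2 of the longest path length in rounds. The Python sketch below assumes that technique (the slides do not name it) and applies the updates synchronously, the way one GPU thread per node would. Compressing all the way to the root flattens the tree, so an MSER pipeline that needs the full region tree could not use this unrestricted form as-is.

```python
def compress_parents(parent):
    """Pointer jumping: every node replaces its parent with its grandparent,
    all at once, until no pointer changes.

    Returns the compressed parent array and the number of rounds that changed it.
    """
    parent = list(parent)
    rounds = 0
    while True:
        nxt = [parent[parent[i]] for i in range(len(parent))]
        if nxt == parent:
            return parent, rounds
        parent = nxt
        rounds += 1


# A chain 0 -> 1 -> ... -> 7, where node 7 is the root.
compressed, rounds = compress_parents([1, 2, 3, 4, 5, 6, 7, 7])
print(compressed, rounds)  # [7, 7, 7, 7, 7, 7, 7, 7] 3
```

For an 8-node chain, three rounds suffice, since each round doubles the distance every pointer covers.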

  14. FindRegion • FindRoot, so that each region's tree can be processed separately • Find each region's parent and child based on delta, so that the variation can be computed • var = (area(parent) - area(child)) / area(current region) • Send the region information to the CPU • Scan every region's tree and find the regions of minimal variation; these are the MSERs • Filter the regions
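
The variation formula can be sketched in Python for a single branch of the region tree, assuming (hypothetically) that `areas[t]` holds the area of the nested region at threshold t, so the child is the region at t - delta and the parent is the region at t + delta. `least_variation` then keeps the thresholds where the variation is a strict local minimum; the final filtering step is not modelled.

```python
def variations(areas, delta):
    """var[t] = (areas[t + delta] - areas[t - delta]) / areas[t] where both
    neighbours exist and the region is non-empty; None elsewhere."""
    out = [None] * len(areas)
    for t in range(delta, len(areas) - delta):
        if areas[t] > 0:
            out[t] = (areas[t + delta] - areas[t - delta]) / areas[t]
    return out


def least_variation(areas, delta):
    """Thresholds whose variation is a strict local minimum along the branch."""
    var = variations(areas, delta)
    best = []
    for t, v in enumerate(var):
        if v is None:
            continue
        left = var[t - 1] if t > 0 else None
        right = var[t + 1] if t + 1 < len(var) else None
        if (left is None or v < left) and (right is None or v < right):
            best.append(t)
    return best


# Region area at thresholds 0..6: it stays at 4 pixels for a while, then grows.
areas = [1, 2, 4, 4, 4, 9, 16]
print(least_variation(areas, delta=1))  # [3]
```

Threshold 3 wins because the region has the same area one step below and one step above it, giving a variation of 0.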

  15. Performance Analysis • For a 256×256 image

  16. Performance Analysis • For a 1024×768 image

  17. Performance Analysis • Why is 8×8 better than 16×16? • Local memory usage • Number of recursions • Block execution • Number of block reductions • Parent locality

  18. Performance Analysis • GPU vs. CPU timing • Intermediate values • Synchronization • Recording information • Memory transfer

  19. Conclusion • The data dependency is very large, but it can still be handled. • The approach should suit multicore microprocessors, whose individual cores are stronger than a single GPU thread. • The bottleneck is still memory.

  20. Future Work • More efficient block reduction (decoder and encoder) • Memory random access • GPU code efficiency
