
On Building an Accurate Stereo Matching System on Graphics Hardware

Xing Mei; Xun Sun; Mingcai Zhou; Shaohui Jiao; Haitao Wang; Xiaopeng Zhang

Samsung Advanced Institute of Technology, China Lab. Computer Vision Workshops, 2011 IEEE.





Presentation Transcript


  1. On Building an Accurate Stereo Matching System on Graphics Hardware • Xing Mei; Xun Sun; Mingcai Zhou; Shaohui Jiao; Haitao Wang; Xiaopeng Zhang • Samsung Advanced Institute of Technology, China Lab • Computer Vision Workshops, 2011 IEEE

  2. Outline • Introduction • Related Works • Algorithm • CUDA Implementation • Experimental Results • Conclusion

  3. Introduction

  4. Introduction Dense two-frame stereo matching • Compute a disparity map from stereo images. • Broad applications: 3D reconstruction, view interpolation

  5. Related Works

  6. Related Works • Local methods • Compute each pixel’s disparity independently over a local support region. • Fast but inaccurate. • Global methods • Solve the stereo problem as an energy minimization process. • Accurate but slow due to time-consuming global optimizers such as graph cuts (GC) and belief propagation (BP).

  7. Related Works • Propagation-based methods • Produce quasi-dense or dense disparity results from a set of seed pixels. • Relatively fast but sensitive to early wrong matches. • Some methods use segmented regions as the guided propagation unit, which is computationally expensive.

  8. Related Works • This work introduces a simple guided unit for propagation: pixel-wise 1D line segments. • No image segmentation is required. • Simple, fast and accurate.

  9. Algorithm

  10. Algorithm • Framework • Input: stereo images • Output: disparity map

  11. Algorithm • Input: stereo images • Output: disparity map

  12. Disparity Cost Computing • Cost measures: AD, BT, gradient-based measures, non-parametric transforms (rank/census [3]), ... • Combinations: SAD + gradient [6], AD + Census • AD (Absolute Difference) • Constant color assumption • Repetitive structures • Census • Encodes local image structures • Textureless regions [3] H. Hirschmuller and D. Scharstein. “Evaluation of stereo matching costs on images with radiometric differences.” IEEE TPAMI, 31(9), 2009. [6] A. Klaus, M. Sormann, and K. Karner. “Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure.” ICPR, 2006.

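  13. AD-Census Cost Initialization • Combined cost: $C(p,d) = \rho(C_{census}(p,d), \lambda_{census}) + \rho(C_{AD}(p,d), \lambda_{AD})$ • p: pixel • d: disparity level • pd = (x − d, y): the pixel corresponding to p = (x, y) in the right image • $C_{AD}(p,d) = \frac{1}{3}\sum_{i \in \{R,G,B\}} |I_i^{Left}(p) - I_i^{Right}(pd)|$ • $C_{census}(p,d)$: Hamming distance [22] between the census bit strings of p and pd • $\rho(c,\lambda) = 1 - \exp(-c/\lambda)$: a robust function on variable c [22] R. Zabih and J. Woodfill. “Non-parametric local transforms for computing visual correspondence.” In Proc. ECCV, 1994.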

  14. Census Transform Census transform window :

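  15. Census Hamming Distance • The census bit strings of the left-image pixel and the right-image pixel are combined with XOR. • The number of differing bits is the Hamming distance (3 in the slide’s example).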
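The census transform and the XOR-based Hamming distance can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the 3 × 3 window, the bit ordering, the treatment of out-of-image neighbours as 0 bits, and the function names `census_bits` and `hamming` are all assumptions of this sketch.

```python
def census_bits(img, x, y, half_w=1, half_h=1):
    """Census bit string of pixel (x, y), packed into an int.

    A neighbour contributes a 1 bit when it is darker than the centre pixel.
    Neighbours outside the image contribute 0 (an assumption of this sketch).
    """
    height, width = len(img), len(img[0])
    center = img[y][x]
    bits = 0
    for dy in range(-half_h, half_h + 1):
        for dx in range(-half_w, half_w + 1):
            if dx == 0 and dy == 0:
                continue  # the centre pixel itself is not encoded
            yy, xx = y + dy, x + dx
            inside = 0 <= yy < height and 0 <= xx < width
            bit = 1 if inside and img[yy][xx] < center else 0
            bits = (bits << 1) | bit
    return bits


def hamming(a, b):
    """Number of differing bits between two census strings (XOR, then popcount)."""
    return bin(a ^ b).count("1")


# Hypothetical 3 x 3 grayscale patches.
left = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
right = [[10, 20, 30], [40, 50, 10], [20, 80, 5]]
distance = hamming(census_bits(left, 1, 1), census_bits(right, 1, 1))
```

Because only the relative ordering against the centre pixel is encoded, adding a constant brightness offset to one patch leaves its bit string unchanged.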

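  16. AD-Census Cost Initialization • $\rho(c,\lambda) = 1 - \exp(-c/\lambda)$: a robust function on variable c that maps each cost measure to a value between 0 and 1, with λ controlling the influence of outliers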
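A minimal Python sketch of a combined AD-Census cost for one pixel pair follows. It assumes the robust function has the form 1 − exp(−c/λ) and that the AD term is averaged over the three color channels; the default values for `lam_ad` and `lam_census` are illustrative, and all function names are hypothetical.

```python
import math


def rho(c, lam):
    """Robust function: maps a non-negative cost into [0, 1)."""
    return 1.0 - math.exp(-c / lam)


def ad_cost(left_rgb, right_rgb):
    """Absolute difference averaged over the color channels."""
    return sum(abs(a - b) for a, b in zip(left_rgb, right_rgb)) / 3.0


def ad_census_cost(left_rgb, right_rgb, census_left, census_right,
                   lam_ad=10.0, lam_census=30.0):
    """Sum of the two robustified costs; lam_* values are assumptions."""
    c_ad = ad_cost(left_rgb, right_rgb)
    c_census = bin(census_left ^ census_right).count("1")  # Hamming distance
    return rho(c_census, lam_census) + rho(c_ad, lam_ad)


# Identical pixels cost nothing; very different pixels approach the upper bound of 2.
same = ad_census_cost((10, 20, 30), (10, 20, 30), 0b1010, 0b1010)
far = ad_census_cost((0, 0, 0), (255, 255, 255), 0b0000, 0b1111)
```

Squashing each term before adding them keeps either measure from dominating the sum, which is the usual motivation for this kind of normalization.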

  17. AD-Census Cost Initialization • AD-Census measure produces proper disparity results for both repetitive structures and textureless regions.

  18. Algorithm • Input: stereo images • Output: disparity map

  19. Cross-based Cost Aggregation [23] • Cross construction • The line ending points P1, P2 for P are located where rule 1 or rule 2 is violated: • R1: Color self-similarity in the line region (smooth depth assumption) • R2: Arm length limitation (avoids over-smoothing) [23] K. Zhang, J. Lu, and G. Lafruit. “Cross-based local stereo matching using orthogonal integral images.” IEEE TCSVT, 2009.
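The two rules can be sketched as a loop that extends one arm of the cross until either is violated. This is a simplified illustration: the color distance (maximum channel difference), the threshold `tau`, the length limit `max_len`, and the name `arm_length` are assumptions of this sketch rather than values taken from the slides.

```python
def color_diff(a, b):
    """Maximum absolute channel difference between two RGB tuples."""
    return max(abs(x - y) for x, y in zip(a, b))


def arm_length(row, x, step, tau=20, max_len=17):
    """Length of the arm starting at pixel x and walking in direction step (+1 or -1).

    The arm stops at the image border, when the color differs too much from
    the anchor pixel (R1), or when it reaches max_len (R2).
    """
    length = 0
    pos = x + step
    while (0 <= pos < len(row)
           and length < max_len
           and color_diff(row[pos], row[x]) < tau):
        length += 1
        pos += step
    return length


# Hypothetical scanline: three similar pixels, one bright outlier, one similar pixel.
row = [(10, 10, 10), (12, 11, 10), (15, 15, 15), (200, 200, 200), (14, 14, 14)]
left_arm = arm_length(row, 1, -1)
right_arm = arm_length(row, 1, +1)
```

Running the same function on columns instead of rows would give the vertical arms of the cross.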

  20. Cross-based Cost Aggregation

  21. Cross-based Cost Aggregation • Enhanced cross construction (using pixel p’s left arm and the endpoint pixel pl as an example)

  22. Cross-based Cost Aggregation • Cost aggregation • Run this step for 4 iterations to get stable cost values. • For iterations 1 and 3, costs are aggregated horizontally and then vertically. • For iterations 2 and 4, costs are aggregated vertically and then horizontally. • Alternating the order reduces the errors at depth discontinuities.
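The alternating aggregation order can be sketched on a single cost slice (one disparity level). This is a minimal sketch under stated assumptions: arm lengths are supplied as precomputed `(left, right)` and `(up, down)` pairs, and each pass averages over the arm span so values stay comparable across iterations. The averaging and the function names are choices of this sketch, not something the slides specify.

```python
def aggregate_h(cost, h_arms):
    """Average each pixel's cost over its horizontal span [x - left, x + right]."""
    out = []
    for row, arms in zip(cost, h_arms):
        new_row = []
        for x, (left, right) in enumerate(arms):
            span = row[x - left:x + right + 1]
            new_row.append(sum(span) / len(span))
        out.append(new_row)
    return out


def aggregate_v(cost, v_arms):
    """Average each pixel's cost over its vertical span [y - up, y + down]."""
    height, width = len(cost), len(cost[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            up, down = v_arms[y][x]
            span = [cost[yy][x] for yy in range(y - up, y + down + 1)]
            out[y][x] = sum(span) / len(span)
    return out


def aggregate(cost, h_arms, v_arms, iterations=4):
    """Odd iterations: horizontal then vertical. Even iterations: the reverse."""
    for it in range(1, iterations + 1):
        if it % 2 == 1:
            cost = aggregate_v(aggregate_h(cost, h_arms), v_arms)
        else:
            cost = aggregate_h(aggregate_v(cost, v_arms), h_arms)
    return cost


# Hypothetical 2 x 2 slice in which every arm reaches the image border.
cost = [[1.0, 2.0], [3.0, 4.0]]
h_arms = [[(0, 1), (1, 0)], [(0, 1), (1, 0)]]
v_arms = [[(0, 1), (0, 1)], [(1, 0), (1, 0)]]
smoothed = aggregate(cost, h_arms, v_arms)
```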

  23. Cross-based Cost Aggregation • Our aggregation method can better handle large textureless regions and depth discontinuities.

  24. Cross-based Cost Aggregation [21] K.-J. Yoon and I.-S. Kweon. “Adaptive support-weight approach for correspondence search.” IEEE TPAMI, 2006. [23] K. Zhang, J. Lu, and G. Lafruit. “Cross-based local stereo matching using orthogonal integral images.” IEEE TCSVT, 2009.

  25. Algorithm • Input: stereo images • Output: disparity map

  26. Scanline Optimization [2] • 4 scanline optimization processes are performed independently: • 2 horizontal directions • 2 vertical directions [2] H. Hirschmuller. “Stereo processing by semiglobal matching and mutual information.” IEEE TPAMI, 2008.

  27. Scanline Optimization • r: direction • p − r: the previous pixel along the same direction • 𝑃1, 𝑃2: penalties on disparity changes between neighboring pixels (𝑃1 ≤ 𝑃2) [8] [8] S. Mattoccia, F. Tombari, and L. D. Stefano. “Stereo vision enabling precise border localization within a scanline optimization framework.” In Proc. ACCV, pages 517–527, 2007.
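One scanline pass can be sketched with the path-cost recurrence commonly used in semi-global matching [2]: each cost is the pixel's own cost plus the cheapest transition from the previous pixel, minus the previous pixel's minimum to keep values bounded. This sketch assumes that recurrence and uses constant penalties `p1` and `p2` with illustrative values; an adaptive choice of penalties as in [8] is not modeled here.

```python
def scanline_pass(costs, p1=1.0, p2=3.0):
    """Path costs along one direction.

    costs: list over pixels on the scanline, each a list of per-disparity costs.
    p1 penalizes a disparity change of 1, p2 any larger change (p1 <= p2).
    """
    prev = list(costs[0])  # the first pixel has no predecessor
    out = [prev]
    for cur in costs[1:]:
        best_prev = min(prev)
        row = []
        for d, c in enumerate(cur):
            candidates = [prev[d], best_prev + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d + 1 < len(prev):
                candidates.append(prev[d + 1] + p1)
            # Subtracting best_prev keeps the values from growing along the path.
            row.append(c + min(candidates) - best_prev)
        out.append(row)
        prev = row
    return out


# Hypothetical two-pixel scanline with three disparity levels.
path = scanline_pass([[0, 5, 5], [5, 0, 5]])
```

The loop over pixels is inherently sequential, while the work for different scanlines is independent.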

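  28. Scanline Optimization • The final cost is the average of the path costs from the four directions: $C_2(p,d) = \frac{1}{4}\sum_{r} C_r(p,d)$ • The disparity with the minimum 𝐶2 value is selected as pixel p’s intermediate result.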
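The selection step can be sketched for a single pixel as follows, assuming the final cost is the mean of the directional path costs. The name `select_disparity` and the sample numbers are hypothetical.

```python
def select_disparity(path_costs):
    """Average per-direction costs and pick the disparity with the lowest result.

    path_costs: one list of per-disparity costs for each scanline direction.
    Returns (best_disparity, averaged_costs).
    """
    n = len(path_costs)
    c2 = [sum(per_dir) / n for per_dir in zip(*path_costs)]
    best = min(range(len(c2)), key=c2.__getitem__)
    return best, c2


# Hypothetical path costs from four directions for three disparity levels.
best_d, c2 = select_disparity([[4, 1, 3], [2, 1, 5], [3, 2, 2], [3, 0, 2]])
```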

  29. Algorithm • Input: stereo images • Output: disparity map

  30. Multi-step Disparity Refinement • Outlier Handling • Outlier Detection • Iterative Region Voting • Proper Interpolation • Depth Discontinuity Adjustment • Sub-pixel Enhancement

  31. Outlier Handling--Detection • Outliers: pixels with 𝐷𝐿(p) ≠ 𝐷𝑅(p − (𝐷𝐿(p), 0)) • Outliers are further classified into occlusion and mismatch points. • For an outlier p, the intersection of its epipolar line with 𝐷𝑅 is checked. • If there is no intersection, p is labelled as “occlusion”, otherwise “mismatch”.
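The left-right check and the occlusion/mismatch split can be sketched on a single row of integer disparities. The intersection test below is one common interpretation: a pixel counts as a mismatch if some disparity hypothesis maps it to a right-image pixel carrying that same disparity. That interpretation, the `max_disp` parameter, and the function name are assumptions of this sketch.

```python
def classify_outliers(d_left, d_right, max_disp):
    """Label each pixel of a row as 'ok', 'occlusion' or 'mismatch'."""
    width = len(d_left)
    labels = []
    for x, d in enumerate(d_left):
        xr = x - d
        if 0 <= xr < width and d_right[xr] == d:
            labels.append("ok")  # left and right disparity maps agree
            continue
        # Walk the epipolar line: does any hypothesis dd agree with the right map?
        hit = any(0 <= x - dd < width and d_right[x - dd] == dd
                  for dd in range(max_disp + 1))
        labels.append("mismatch" if hit else "occlusion")
    return labels


# Hypothetical 4-pixel row.
labels = classify_outliers([0, 1, 1, 2], [1, 1, 0, 0], max_disp=2)
```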

  32. Outlier Handling--Iterative Region Voting • Construct cross-based regions and apply a robust voting scheme. • Sp: the number of reliable pixels in the cross-based region of p • 𝜏𝑆, 𝜏𝐻: threshold values • 5 iterations
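A voting step for one outlier pixel might look like the sketch below. It assumes the vote is a histogram over the reliable disparities in the pixel's region, and that the winning disparity is accepted only when enough reliable pixels exist (more than `tau_s`) and the winner holds a large enough share of the votes (more than `tau_h`). The acceptance rule, the threshold defaults, and the name `vote` are assumptions.

```python
from collections import Counter


def vote(region_disps, tau_s=20, tau_h=0.4):
    """Return the voted disparity for an outlier, or None if the vote is too weak.

    region_disps: disparities of the pixels in the region, None for outliers.
    """
    reliable = [d for d in region_disps if d is not None]
    if not reliable:
        return None
    s_p = len(reliable)  # number of reliable pixels
    d_star, votes = Counter(reliable).most_common(1)[0]
    if s_p > tau_s and votes / s_p > tau_h:
        return d_star
    return None


# Hypothetical region: 15 votes for disparity 3, 8 for disparity 4, 5 outliers.
filled = vote([3] * 15 + [4] * 8 + [None] * 5)
```

Repeating the step lets pixels filled in one round act as reliable voters in the next, which is why the process is iterated.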

  33. Outlier Handling--Proper Interpolation • Occlusion points • The pixel with the lowest disparity value is selected for interpolation. • It most likely comes from the background. • Mismatch points • The pixel with the most similar color is selected for interpolation.
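The two interpolation strategies can be sketched as a single function over a list of nearby reliable pixels. How those candidates are gathered is not covered here; the candidate format, the color distance, and the name `interpolate` are assumptions of this sketch.

```python
def interpolate(label, pixel_color, candidates):
    """Pick a disparity for an outlier from nearby reliable pixels.

    candidates: list of (disparity, rgb_color) tuples for reliable pixels.
    """
    if label == "occlusion":
        # Occluded pixels most likely belong to the background: lowest disparity.
        return min(d for d, _ in candidates)

    def color_dist(color):
        return max(abs(a - b) for a, b in zip(color, pixel_color))

    # Mismatch: copy the disparity of the most similarly colored candidate.
    return min(candidates, key=lambda dc: color_dist(dc[1]))[0]


# Hypothetical candidates around a mid-gray outlier pixel.
cands = [(5, (100, 100, 100)), (2, (10, 10, 10)), (7, (52, 50, 49))]
occluded = interpolate("occlusion", (50, 50, 50), cands)
mismatched = interpolate("mismatch", (50, 50, 50), cands)
```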

  34. Depth Discontinuity Adjustment • For each pixel p on a disparity edge, two pixels p1, p2 from both sides of the edge are collected. • 𝐷𝐿(p) is replaced by 𝐷𝐿(p1) or 𝐷𝐿(p2) if one of the two pixels has a smaller matching cost than 𝐶2(p, 𝐷𝐿(p)).
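The adjustment for one edge pixel can be sketched as a comparison of three candidate disparities under that pixel's own cost. This sketch assumes the comparison uses the cost of p at each neighbour's disparity and picks the cheapest candidate; `cost_at` is a hypothetical callable standing in for a lookup of 𝐶2(p, d).

```python
def adjust_edge_pixel(d_p, d_p1, d_p2, cost_at):
    """Return the disparity among d_p, d_p1, d_p2 with the lowest cost at pixel p.

    cost_at(d) is assumed to return the matching cost of pixel p at disparity d.
    """
    best = d_p
    for candidate in (d_p1, d_p2):
        if cost_at(candidate) < cost_at(best):
            best = candidate
    return best


# Hypothetical costs of pixel p at three disparities.
costs = {3: 0.9, 2: 0.4, 5: 0.7}
adjusted = adjust_edge_pixel(3, 2, 5, costs.get)
```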

  35. Sub-pixel Enhancement [20] • Quadratic polynomial interpolation • Followed by a 3 × 3 median filter [20] Q. Yang, L. Wang, R. Yang, H. Stewenius, and D. Nister. “Stereo matching with color-weighted correlation, hierarchical belief propagation and occlusion handling.” IEEE TPAMI, 2009.
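Quadratic interpolation fits a parabola through the costs at d − 1, d and d + 1 and takes its minimum as the refined disparity. The sketch below uses the standard three-point parabola formula; the fallbacks for border disparities and for a non-convex fit are assumptions of this sketch, and the median filter is not included.

```python
def subpixel(costs, d):
    """Refine integer disparity d using the parabola through three neighboring costs."""
    if d <= 0 or d >= len(costs) - 1:
        return float(d)  # no neighbor on one side: keep the integer value
    c_minus, c0, c_plus = costs[d - 1], costs[d], costs[d + 1]
    denom = 2 * (c_plus + c_minus - 2 * c0)
    if denom <= 0:
        return float(d)  # not a strict minimum: keep the integer value
    return d - (c_plus - c_minus) / denom


# Costs sampled from (d - 2.25)^2, so the true minimum lies at 2.25.
samples = [5.0625, 1.5625, 0.0625, 0.5625]
refined = subpixel(samples, 2)
```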

  36. Multi-step Disparity Refinement • The average error percentages after performing each refinement step.

  37. CUDA Implementation

  38. CUDA Implementation • Compute Unified Device Architecture (CUDA) is a programming interface for parallel computation tasks on NVIDIA graphics hardware. • The computation task is coded into a kernel function. • The allocation of threads is controlled with two hierarchical concepts: grid and block. • A kernel creates a grid with multiple blocks, and each block consists of multiple threads.

  39. CUDA Implementation • Cost Initialization: • Parallelize with 𝑊 × 𝐻 threads. • Threads are organized into a 2D grid with a block size of 32 × 32. • Each thread computes a cost value for a pixel at a given disparity. • For the census transform, a square window is required for each pixel, which requires loading more data into shared memory for fast access.
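The launch-configuration arithmetic behind a 𝑊 × 𝐻 thread layout with 32 × 32 blocks can be sketched in Python. The grid needs enough blocks to cover the image, so each dimension is rounded up with ceiling division. The image size in the example and the helper name `grid_dims` are hypothetical.

```python
def grid_dims(width, height, block=(32, 32)):
    """Number of blocks per grid dimension needed to cover a width x height image."""
    bx, by = block
    return ((width + bx - 1) // bx, (height + by - 1) // by)


# Hypothetical 450 x 375 image.
blocks_x, blocks_y = grid_dims(450, 375)
threads_launched = blocks_x * blocks_y * 32 * 32
```

Because the grid is rounded up, more threads are launched than there are pixels, so a kernel using this layout would typically check its coordinates against the image bounds before doing any work.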

  40. CUDA Implementation • Cross-based Cost Aggregation: • A grid with 𝑊 × 𝐻 threads. • Cross construction: block size is 𝑊 or 𝐻 to efficiently handle a scanline. • Cost aggregation: block size is 32 × 32. • Data reuse with shared memory is considered in both steps.

  41. CUDA Implementation • Scanline Optimization: • This step is different because the process is sequential in the scanline direction and parallel in the orthogonal direction. • 𝑊 × 𝐷 or 𝐻 × 𝐷 threads • Disparity Refinement: • 𝑊 × 𝐻 threads

  42. Experimental Results

  43. Experimental Results • Device: a PC with a Core 2 Duo 2.20 GHz CPU and an NVIDIA GeForce GTX 480 graphics card • Parameter settings: • Sources: Middlebury (http://vision.middlebury.edu/stereo/), HHI database (book arrival), Microsoft i2i database (Ilkay)

  44. Experimental Results • The GPU-friendly system brings an impressive 140× speedup over the CPU implementation. • The average proportions of the GPU running time for the four computation steps are 1%, 70%, 28% and 1% respectively. • The iterative cost aggregation step and the scanline optimization process dominate the running time.

  45. Experimental Results • First row: disparity maps generated with our system. • Second row: disparity error maps with threshold 1. • Errors in unoccluded and occluded regions are marked in black and gray respectively.

  46. Experimental Results

  47. Experimental Results • video

  48. Experimental Results Snapshots on ’book arrival’ stereo video

  49. Experimental Results Snapshots on ’Ilkay’ stereo video

  50. Conclusion
