
Video-Based In Situ Tagging on Mobile Phones



  1. Wonwoo Lee, Youngmin Park, Vincent Lepetit, Woontack Woo IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 21, NO. 10, OCTOBER 2011 Video-Based In Situ Tagging on Mobile Phones

  2. Outline • Introduction • Online Target Learning • Detection and Tracking • Experimental Results • Conclusion

  3. Introduction • Objective: Augment a real-world scene with minimal user intervention on a mobile phone. “Anywhere Augmentation” • Considerations: • Avoid reconstruction of the 3D scene • Perspective patch recognition • Mobile phone processing power • Mobile phone accelerometers • Mobile phone Bluetooth connectivity • http://www.youtube.com/watch?v=Hg20kmM8R1A

  4. Introduction • The proposed method follows a standard procedure of target learning and detection.

  5. Introduction • The proposed method follows a standard procedure of target learning and detection.

  6. Online Target Learning • Input: Image of the target plane • Output: Patch data and camera poses • Assumptions • Known camera parameters • Horizontal or vertical surface

  7. Online Target Learning

  8. Frontal View Generation • We need a frontal view to create the patch data and their associated poses. Targets whose frontal views are available.

  9. Frontal View Generation • However, frontal views are not always available in the real world. Targets whose frontal views are NOT available.

  10. Frontal View Generation • Objective: Generate a fronto-parallel view image from the input image. • Approach: Exploit the phone’s built-in accelerometer. • Assumption: The patch lies on a horizontal or vertical surface.

  11. Frontal View Generation • The orientation of the target (horizontal or vertical) is recommended based on the current pose of the phone. • Figure: the gravity vector G detected by the accelerometer is compared against ±π/4 thresholds to decide between the horizontal (parallel to the ground) and vertical cases.
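The threshold above can be turned into a simple decision rule. A minimal sketch, assuming the rule compares the phone's viewing direction against the measured gravity vector (the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def suggest_orientation(gravity, view_axis=np.array([0.0, 0.0, 1.0])):
    """Suggest whether the tagged surface is horizontal or vertical from
    the gravity vector G measured by the accelerometer (camera frame)."""
    g = gravity / np.linalg.norm(gravity)
    # Angle between the camera's viewing direction and gravity.
    angle = np.arccos(np.clip(np.dot(view_axis, g), -1.0, 1.0))
    # Looking roughly along gravity (within pi/4 of straight down or up)
    # suggests a horizontal surface; otherwise a vertical one.
    if angle < np.pi / 4 or angle > 3 * np.pi / 4:
        return "horizontal"
    return "vertical"
```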

  12. Frontal View Generation • Under the 1-degree-of-freedom assumption: • Frontal view camera: [I|0] • Captured view camera: [R|c], i.e. translation t = -Rc • A plane-induced homography built from R and t warps the input image to the virtual frontal view [12]. [12] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge Univ. Press, 2000.
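As a concrete illustration, the warp to the virtual frontal view can be written as a plane-induced homography. A rough sketch, assuming calibrated intrinsics K, the rotation R estimated from the accelerometer, and a plane with normal n at distance d in the frontal frame (the function name and default values are illustrative):

```python
import numpy as np
import cv2

def warp_to_frontal(image, K, R, t, n=np.array([0.0, 0.0, 1.0]), d=1.0,
                    size=(128, 128)):
    """Warp the captured view to a virtual fronto-parallel view.

    Frontal camera:  K[I|0] looking at the plane n.X = d.
    Captured camera: K[R|t] with t = -R c (c = camera centre).
    Plane-induced homography (frontal -> captured):
        H = K (R + t n^T / d) K^-1
    """
    H = K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)
    # H maps frontal (output) pixels to captured (input) pixels, so it is
    # passed directly together with WARP_INVERSE_MAP.
    return cv2.warpPerspective(image, H, size,
                               flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
```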

  13. Frontal View Generation

  14. Online Target Learning

  15. Blurred Patch Generation • Objective: Learn the appearances of a target surface quickly. • Approach: Adopt the patch-learning approach of Gepard [6] • Real-time learning of a patch on a desktop computer. [6] S. Hinterstoisser, V. Lepetit, S. Benhimane, P. Fua, and N. Navab, “Learning real-time perspective patch rectification,” Int. J. Comput. Vis., vol. 91, pp. 107–130, Jan. 2011.

  16. Review: Gepard [6] • Fast patch learning by linearizing image warping with principal component analysis. • “Mean patch” as a patch descriptor. • Difficult to apply directly on the mobile phone platform: • Low performance of the mobile phone CPU • A large amount of pre-computed data is required (about 90 MB)
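For context, a brute-force sketch of the “mean patch” idea (Gepard itself avoids the explicit warps by linearizing the warping with PCA; the homography list and function name here are illustrative):

```python
import numpy as np
import cv2

def mean_patch(reference_patch, homographies, size=(32, 32)):
    """Average of the reference patch warped under poses sampled around a
    coarse viewpoint: the descriptor Gepard matches against at runtime."""
    acc = np.zeros(size[::-1], dtype=np.float32)
    for H in homographies:
        acc += cv2.warpPerspective(reference_patch, H, size).astype(np.float32)
    return acc / len(homographies)
```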

  17. Modified Gepard [6] • Remove the need for a fronto-parallel view • By using the phone’s accelerometer and limiting targets to two plane orientations • Skip the feature point detection step • Instead use larger patches for robustness • Replace how templates are constructed • By blurring instead • Add Bluetooth sharing of the AR configuration

  18. Blurred Patch Generation • Approach: Use a blurred patch instead of the mean patch

  19. Blurred Patch Generation • Generate blurred patches through multi-pass rendering on the GPU. • Faster image processing thanks to the GPU’s parallelism.

  20. Blurred Patch Generation • 1st Pass: Warping • Render the input patch as seen from a sampled viewpoint • Much faster than on the CPU

  21. Blurred Patch Generation • 2nd Pass: Radial blurring of the warped patch • Allows the blurred patch to cover a range of poses close to the exact pose

  22. Blurred Patch Generation • 3rd Pass: Gaussian blurring of the radially blurred patch • Makes the blurred patch robust to image noise
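The paper implements these passes as GPU shaders; for clarity, here is a CPU-side sketch of the 2nd and 3rd passes using OpenCV (the step count and helper names are assumptions; the 10-degree range and 11x11 kernel follow the settings reported later in the slides):

```python
import numpy as np
import cv2

def radial_blur(patch, max_angle_deg=10.0, steps=11):
    """Approximate radial blur: average copies of the (grayscale) patch
    rotated about its centre within +/- max_angle_deg, so the result
    covers a small range of in-plane rotations around the sampled pose."""
    h, w = patch.shape[:2]
    centre = (w / 2.0, h / 2.0)
    acc = np.zeros_like(patch, dtype=np.float32)
    for a in np.linspace(-max_angle_deg, max_angle_deg, steps):
        M = cv2.getRotationMatrix2D(centre, a, 1.0)
        acc += cv2.warpAffine(patch, M, (w, h)).astype(np.float32)
    return acc / steps

def make_blurred_patch(warped_patch):
    """2nd pass (radial blur) followed by 3rd pass (11x11 Gaussian blur)."""
    return cv2.GaussianBlur(radial_blur(warped_patch), (11, 11), 0)
```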

  23. Blurred Patch Generation • Fig. 7. Effectiveness of radial blur. Combining the radial blur and the Gaussian blur outperforms simple Gaussian blurring.

  24. Blurred Patch Generation • 4th Pass: Accumulation of blurred patches in a texture unit • Reduces the number of readbacks from GPU memory to CPU memory

  25. Online Target Learning

  26. Post-Processing • Downsampling blurred patches • 128×128 to 32×32 • Normalization • Zero mean and standard deviation of 1
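A minimal sketch of this post-processing step (the function name is illustrative):

```python
import numpy as np
import cv2

def post_process(blurred_patch):
    """Downsample the 128x128 blurred patch to 32x32 and normalise it to
    zero mean and unit standard deviation before storing it as a template."""
    small = cv2.resize(blurred_patch, (32, 32),
                       interpolation=cv2.INTER_AREA).astype(np.float32)
    return (small - small.mean()) / (small.std() + 1e-8)
```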

  27. Detection & Tracking • The user points at the target through the camera. • A square patch at the center of the image is used for detection.

  28. Detection & Tracking • The initial pose is retrieved by comparing the input patch with the learned mean patches. • ESM-Blur [20] is applied for further pose refinement. • NEON instructions are used for faster pose refinement. [20] Y. Park, V. Lepetit, and W. Woo, “ESM-blur: Handling and rendering blur in 3D tracking and augmentation,” in Proc. Int. Symp. Mixed Augment. Reality, 2009, pp. 163–166.
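Because all patches are normalised to zero mean and unit standard deviation, the comparison can be a simple correlation score; the best-scoring template provides the coarse pose that ESM-Blur then refines. A sketch under those assumptions (the template and pose data layout is illustrative):

```python
import numpy as np

def detect_initial_pose(input_patch, templates, poses):
    """Return the learned pose whose (normalised) patch best matches the
    normalised input patch extracted from the image centre.

    input_patch: 32x32 normalised patch.
    templates:   (N, 32, 32) array of learned, normalised patches.
    poses:       sequence of N camera poses, one per template.
    """
    scores = templates.reshape(len(templates), -1) @ input_patch.ravel()
    best = int(np.argmax(scores))  # nearest template by correlation
    return poses[best], scores[best]
```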

  29. Experimental Results • Patch size: 128 x 128 • Number of views used for learning: 225 • Maximum radial blur range: 10 degrees • Gaussian blur kernel: 11x11 • Memory requirement: 900 KB for a target

  30. Experimental Results

  31. Experimental Results

  32. Experimental Results

  33. Experimental Results

  34. Experimental Results

  35. Experimental Results • More views means more rendering. • The radial blur is slow on the mobile phone. • Possible speed improvement through shader optimization. • (Timing comparison: iPhone 3GS, iPhone 4, PC.)

  36. Experimental Results • Comparison with Gepard [6] Fig. 11. Planar targets used for evaluation. (a) Sign-1. (b) Sign-2. (c) Car. (d) Wall. (e) City. (f) Cafe. (g) Book. (h) Grass. (i) MacMini. (j) Board. The patches delimited by the yellow squares are used as reference patches. [6] S. Hinterstoisser, V. Lepetit, S. Benhimane, P. Fua, and N. Navab, “Learning real-time perspective patch rectification,” Int. J. Comput. Vis., vol. 91, pp. 107–130, Jan. 2011.

  37. Experimental Results • Our approach performs slightly worse in terms of recognition rates, but it is better adapted to mobile phones.

  38. Experimental Results • The mean patch comparison takes about 3 ms with 225 views. • The speed of pose estimation and tracking with ESM-Blur depends on the accuracy of the initial pose provided by patch detection.

  39. Limitations • Sensitive to repetitive textures and reflective surfaces. • Currently limited to a single target.

  40. Conclusion • Potential applications • AR tagging of the real world • AR apps “anywhere, anytime” • Future work • More optimization on mobile phones • Detection of multiple targets at the same time

  41. Video http://www.youtube.com/watch?v=DLegclJVa0E

  42. Thank you for listening.
