
Practical Modeling and Recognition using RGB-D Cameras

Practical Modeling and Recognition using RGB-D Cameras. Xiaofeng Ren, Dieter Fox, Intel Labs, University of Washington. Joint work with Liefeng Bo, Kevin Lai, Peter Henry, Evan Herbst, Mike Krainin, Hao Du and others @ University of Washington. June 27, 2011.


Presentation Transcript


  1. Practical Modeling and Recognition using RGB-D Cameras Xiaofeng Ren, Dieter Fox Intel Labs, University of Washington Joint work with Liefeng Bo, Kevin Lai, Peter Henry, Evan Herbst, Mike Krainin, Hao Du and others @ University of Washington June 27, 2011

  2. RGB-D Camera: Color+Depth 640x480, 30Hz, color + dense depth
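
Because the camera delivers a dense depth value registered to each color pixel, a frame can be back-projected into a colored 3D point cloud with the standard pinhole model. A minimal sketch follows; the intrinsics are illustrative defaults, not any particular camera's calibration.

```python
import numpy as np

# Illustrative intrinsics for a 640x480 RGB-D camera; real values come from calibration.
FX, FY, CX, CY = 525.0, 525.0, 319.5, 239.5

def depth_to_point_cloud(depth_m, rgb):
    """Back-project an HxW depth image (meters) and a registered HxWx3 color image
    into an Nx3 point cloud with per-point colors."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth_m > 0                      # missing depth is reported as 0
    z = depth_m[valid]
    x = (u[valid] - CX) * z / FX             # pinhole back-projection
    y = (v[valid] - CY) * z / FY
    points = np.stack([x, y, z], axis=1)
    colors = rgb[valid]
    return points, colors
```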

  3. At RGB-D 2010 Workshop:
  • 3D modeling of indoor environments: RGBD-ICP matching + loop closure; flythrough visualization
  • 3D modeling of everyday objects: robot in-hand modeling through real-time registration and modeling
  • Robust recognition of everyday objects: preliminary object dataset captured with RGB-D; preliminary results on sparse distance learning

  4. RGB-D Perception @ UW and Intel
  • 3D modeling of objects & environments
    Indoor Modeling: [Henry, Krainin, Herbst, Ren, Fox; ISER ’10]
    Interactive Modeling: [Hao, Henry, Ren, Fox, Seitz; Ubicomp ’11]
    Dynamic Scene Modeling: [Herbst, Ren, Fox; ICRA ’11, IROS ’11]
    Object Manipulation: [Krainin, Henry, Ren, Fox; IJRR ’10]
    Interactive 3D Visualization: [Cheng, Ren; ’11]
  • Robust recognition of everyday objects
    Egocentric recognition: [Ren, Gu; CVPR ’10]
    Joint object-pose recognition: [Gu, Ren; ECCV ’10]
    Kernel Descriptors: [Bo, Ren, Fox; NIPS ’10, IROS ’11]
    Hierarchical Kernel Descriptors: [Bo, Lai, Ren, Fox; CVPR ’11]
    RGB-D Benchmark: [Lai, Bo, Ren, Fox; ICRA ’11]
    Sparse distance learning: [Lai, Bo, Ren, Fox; ICRA ’11] (best vision paper)
    Scalable and hierarchical recognition: [Lai, Bo, Ren, Fox; AAAI ’11]

  5. RGB-D Perception @ UW and Intel
  • 3D modeling of objects & environments
    Indoor Modeling: [Henry, Krainin, Herbst, Ren, Fox; ISER ’10]
    Interactive Modeling: [Hao, Henry, Ren, Fox, Seitz; Ubicomp ’11]
    Dynamic Scene Modeling: [Herbst, Ren, Fox; ICRA ’11, IROS ’11]
    Object Manipulation: [Krainin, Henry, Ren, Fox; IJRR ’10]
    Interactive 3D Visualization: [Cheng, Ren; ’11]
  • Robust recognition of everyday objects
    Egocentric recognition: [Ren, Gu; CVPR ’10]
    Joint object-pose recognition: [Gu, Ren; ECCV ’10]
    Kernel Descriptors: [Bo, Ren, Fox; NIPS ’10]
    Hierarchical Kernel Descriptors: [Bo, Lai, Ren, Fox; CVPR ’11]
    RGB-D Benchmark: [Lai, Bo, Ren, Fox; ICRA ’11]
    Sparse distance learning: [Lai, Bo, Ren, Fox; ICRA ’11] (best vision paper)
    Scalable and hierarchical recognition: [Lai, Bo, Ren, Fox; AAAI ’11]

  6. RGB-D Mapping: Pipeline

  7. [Henry-Krainin-Herbst-Ren-Fox]
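
Frame-to-frame alignment in the mapping pipeline combines sparse visual-feature matching with dense ICP (RGBD-ICP), followed by loop closure and global optimization. Below is a minimal point-to-point ICP sketch illustrating only the dense alignment step; the actual system also uses visual features and a point-to-plane error, and the function name and iteration count here are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_point_to_point(source, target, iters=20):
    """Align an Nx3 'source' cloud to an Mx3 'target' cloud; returns a 4x4 transform
    mapping source coordinates into the target frame."""
    tree = cKDTree(target)
    T = np.eye(4)
    src = source.copy()
    for _ in range(iters):
        _, idx = tree.query(src)                 # nearest-neighbor correspondences
        tgt = target[idx]
        mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
        H = (src - mu_s).T @ (tgt - mu_t)        # 3x3 cross-covariance
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T                           # optimal rotation (Kabsch)
        if np.linalg.det(R) < 0:                 # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        src = src @ R.T + t                      # apply incremental transform
        step = np.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        T = step @ T
    return T
```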

  8. Comparing to Laser-based Mapping

  9. From RGB-D to Interactive Modeling [Du-Henry-Ren-Fox-Goldman-Seitz; Ubicomp 11]

  10. Discovering and Learning Objects [Herbst-Henry-Ren-Fox; ICRA 2011]

  11. Discovering and Learning Objects
  • (Robot) capturing scenes in RGB-D over an extended period of time
  • 3D scene reconstruction for efficient representation
  • Proper sensor models for both color and depth
  • Pairwise scene differencing with sensor models and MRF clean-up (see the sketch below)
  [Herbst-Henry-Ren-Fox; ICRA 2011]
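
A minimal sketch of the differencing step under a Gaussian depth-noise assumption. The actual work uses calibrated per-pixel sensor models for both color and depth and an MRF solved with graph cuts; a median filter stands in for that clean-up here, and the function name and thresholds are illustrative.

```python
import numpy as np
from scipy.ndimage import median_filter

def depth_change_mask(expected_depth, observed_depth, sigma=0.02, tau=3.0):
    """Flag pixels whose observed depth (HxW, meters) disagrees with the depth
    expected from the previous scene reconstruction, then spatially clean up."""
    valid = (expected_depth > 0) & (observed_depth > 0)
    z = np.zeros_like(expected_depth)
    z[valid] = np.abs(observed_depth[valid] - expected_depth[valid]) / sigma
    changed = (z > tau) & valid                                 # per-pixel change evidence
    cleaned = median_filter(changed.astype(np.uint8), size=5)   # crude MRF stand-in
    return cleaned.astype(bool)
```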

  12. Discovering and Learning Objects
  • Handling change detections across multiple visits with a multi-label MRF
  • Matching potential objects by movements and appearance
  • ICP for shape matching
  • Color image recognition with kernel descriptors
  • Spectral clustering for object discovery (see the sketch below)
  [Herbst-Ren-Fox; IROS 2011]
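
The discovery step can be pictured as spectral clustering over a segment-to-segment affinity that blends shape similarity (e.g. ICP fit quality) with appearance similarity (e.g. kernel-descriptor scores). A hedged sketch, with the blending weight and input scores as illustrative placeholders:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def discover_objects(shape_sim, appearance_sim, n_objects, w=0.5):
    """shape_sim, appearance_sim: NxN segment-to-segment similarities in [0, 1]."""
    affinity = w * shape_sim + (1.0 - w) * appearance_sim
    labels = SpectralClustering(
        n_clusters=n_objects, affinity="precomputed", random_state=0
    ).fit_predict(affinity)
    return labels  # segments sharing a label are hypothesized to be the same object
```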

  13. Discovering and Learning Objects [Herbst-Ren-Fox; IROS 2011]

  14. Object Learning through Manipulation [Krainin-Henry-Ren-Fox IJRR 2011]

  15. Next-Best-View Planning [Krainin-Curless-Fox ICRA 2011]

  16. RGB-D Perception @ UW and Intel
  • 3D modeling of objects & environments
    Indoor Modeling: [Henry, Krainin, Herbst, Ren, Fox; ISER ’10]
    Interactive Modeling: [Hao, Henry, Ren, Fox, Seitz; Ubicomp ’11]
    Dynamic Scene Modeling: [Herbst, Ren, Fox; ICRA ’11, IROS ’11]
    Object Manipulation: [Krainin, Henry, Ren, Fox; IJRR ’10]
    Interactive 3D Visualization: [Cheng, Ren; ’11]
  • Robust recognition of everyday objects
    Egocentric recognition: [Ren, Gu; CVPR ’10]
    Joint object-pose recognition: [Gu, Ren; ECCV ’10]
    Kernel Descriptors: [Bo, Ren, Fox; NIPS ’10]
    Hierarchical Kernel Descriptors: [Bo, Lai, Ren, Fox; CVPR ’11]
    RGB-D Benchmark: [Lai, Bo, Ren, Fox; ICRA ’11]
    Sparse distance learning: [Lai, Bo, Ren, Fox; ICRA ’11] (best vision paper)
    Scalable and hierarchical recognition: [Lai, Bo, Ren, Fox; AAAI ’11]

  17. RGB-D Object Dataset: 300 objects from 51 categories, 250,000 RGB-D views, plus cluttered scenes. http://www.cs.washington.edu/rgbd-dataset/ (search “rgbd” + “dataset”) [Lai-Bo-Ren-Fox; ICRA 2011]

  18. Benchmarking RGB-D Recognition Category-Level Recognition (51 categories) Instance-Level Recognition (303 instances) [Lai-Bo-Ren-Fox; ICRA 2011]

  19. RGB-D Object Recognition: the standard pipeline runs image → patch features (SIFT or HOG) → image features (Bag-of-Words, Sparse Coding (LLC, LCC), Spatial Pyramid Matching (SPM), Efficient Match Kernel (EMK), feed-forward networks, or your favorite model) → recognition.
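
To make the baseline concrete, here is a minimal sketch of the bag-of-words branch of that pipeline. Local descriptor extraction (e.g. SIFT or HOG on a dense grid), spatial pyramid pooling, and match kernels are omitted; the function names, codebook size, and classifier settings are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def encode(descs, codebook):
    """Quantize local descriptors against the codebook; return an L1-normalized histogram."""
    words = codebook.predict(descs)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def train_bow_classifier(train_descs, train_labels, codebook_size=1000):
    """train_descs: list of (n_i x d) local-descriptor arrays, one per training image."""
    codebook = KMeans(n_clusters=codebook_size, n_init=4).fit(np.vstack(train_descs))
    X = np.array([encode(d, codebook) for d in train_descs])
    clf = LinearSVC(C=1.0).fit(X, train_labels)
    return codebook, clf
```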

  20. Kernel Descriptors: Generalizing SIFT. A linear kernel on SIFT descriptors is a product of two histograms, which can be rewritten as a sum over all pairs of pixels of products of normalized gradient magnitudes and kernels on gradient orientation and pixel coordinates: a gradient match kernel between image patches.
  • Includes SIFT as a special case
  • Avoids any “binning” issues in histogram features
  [Bo-Ren-Fox; NIPS 2010]
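
In the notation of the NIPS ’10 paper, the gradient match kernel between two patches P and Q has roughly the following form, with \tilde{m} the normalized gradient magnitude, \tilde{\theta} the gradient orientation, and Gaussian kernels k_o and k_p over orientation and pixel position:

```latex
% Gradient match kernel between patches P and Q (sketch of the Bo-Ren-Fox formulation)
K_{\mathrm{grad}}(P, Q)
  = \sum_{z \in P} \sum_{z' \in Q}
      \tilde{m}(z)\, \tilde{m}(z')\,
      k_o\!\big(\tilde{\theta}(z), \tilde{\theta}(z')\big)\,
      k_p(z, z')
```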

  21. Kernel Descriptors: Image Recognition
  • Low-dimensional approximations of match kernels (sketched below)
  • Explicitly compute descriptors/features from patches
  • Easily generalize gradient features to color, binary shape, etc.
  • Outperform SIFT and sophisticated feature-learning techniques
  Scene-15: KDES 86.7%, SIFT 82.2%
  Caltech-101: KDES 76.4%, CDBN[2] 65.5%, SPM[1] 64.4%, LCC[4] 73.4%
  CIFAR10: KDES 76.0%, LCC[4] 74.5%, mcRBM-DBN[3] 71.0%, TCNN[5] 73.1%
  [1] Lazebnik, Schmid, Ponce, CVPR ’06. [2] Lee, Grosse, Ranganath, Ng, ICML ’09. [3] Ranzato & Hinton, CVPR ’10. [4] Yu & Zhang, ICML ’10. [5] Le, Ngiam, Chen, Chia, Koh & Ng, NIPS ’10.
  [Bo-Ren-Fox; NIPS 2010]
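
A hedged sketch of the approximation idea: each pixel contributes a weighted kernel evaluation against a fixed set of sampled basis vectors, so the double sum in the match kernel collapses into an explicit finite-dimensional descriptor. The real method samples joint orientation-position bases and compresses them with kernel PCA; the names, the uniform basis, and the kernel width below are illustrative.

```python
import numpy as np

def kernel_descriptor(attributes, weights, basis, gamma=5.0):
    """attributes: N x d per-pixel attributes (e.g. [sin, cos] of gradient orientation);
    weights: length-N per-pixel weights (e.g. normalized gradient magnitude);
    basis: B x d sampled basis vectors. Returns a B-dimensional patch descriptor."""
    # Gaussian kernel between every pixel attribute and every basis vector (N x B)
    diff = attributes[:, None, :] - basis[None, :, :]
    k = np.exp(-gamma * (diff ** 2).sum(axis=-1))
    return weights @ k   # weighted sum over pixels
```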

  22. Kernel Descriptors: RGB-D Recognition Category-Level Recognition (51 categories) Instance-Level Recognition (303 instances) [Bo-Lai-Ren-Fox; CVPR 2011; IROS 2011]

  23. Toward Practical Recognition • A mug? • Kevin’s mug? • A mug facing right? • A mug with orientation (90,15,0) • … …

  24. Scalable and Hierarchical Recognition: 8 discrete views, continuous angles [Lai-Bo-Ren-Fox; AAAI 2011]

  25. Joint Recognition with Object-Pose Tree
  • Tree structure enables efficient joint recognition (see the sketch below)
  • Object-Pose tree outperforms nearest-neighbor and one-vs-all baselines
  • Joint tree-based learning outperforms separate learning
  • Promising pose estimation results on generic objects
  • Natural tree structure of category-instance-pose works really well
  RGB-D Dataset: 300 objects, 51 categories, 250,000 color-depth pairs
  [Lai-Bo-Ren-Fox; AAAI 2011]
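
A hedged sketch of the top-down evaluation such a tree enables: at each level (category, then instance, then discrete view), only the children of the chosen node are scored, so classification cost grows with tree depth rather than with the total number of leaves. The node structure and linear scores below are illustrative; in the actual system the classifiers are learned jointly over the tree.

```python
import numpy as np

class Node:
    """A tree node holding a label and a linear classifier (weights w, bias b)."""
    def __init__(self, label, w, b, children=()):
        self.label, self.w, self.b, self.children = label, w, b, list(children)

def classify(x, root):
    """Descend the object-pose tree greedily on feature vector x; returns the labels
    along the chosen path, e.g. ['mug', 'kevins_mug', 'view_90']."""
    path, node = [], root
    while node.children:
        scores = [child.w @ x + child.b for child in node.children]
        node = node.children[int(np.argmax(scores))]
        path.append(node.label)
    return path
```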

  26. Application: Interactive LEGO RGB-D used for object recognition and hand tracking [Ziola-Harrison-Powledge-Lai-Bo-Ren-Fox]

  27. Application: Chess Playing Robot [Matuszek-Mayton-Aimi-Bo-Deisenroth-Chu-Kung-LeGrand-Smith-Fox]

  28. RGB-D Perception: Summary
  • RGB-D cameras provide synchronized color and depth, making visual perception both robust and efficient.
  • RGB-D mapping generates detailed 3D maps at near real-time and enables on-the-fly user interaction and feedback.
  • Kernel descriptors provide a principled way to extract rich features from pixel attributes, outperforming SIFT and leading to robust RGB-D recognition.
  • Robust RGB-D recognition and modeling enable interesting scenarios for object-aware interactions and applications.

  29. RGB-D Perception: The Future?
  • Will RGB-D have a deep impact on vision applications? YES! It’s already happening, faster than we can track.
  • Will RGB-D start a revolution in vision applications? NO. We still need to solve recognition, segmentation, tracking, scene understanding, etc. YES! RGB-D helps address two BIG issues in computer vision: loss of 3D from projection, and lighting conditions. RGB-D helps “abstract away” many low-level problems.
  • Is RGB-D the future for smart vision-based systems? Why not? At $50 today and $10 tomorrow.

  30. THANK YOU
