1 / 1

A Large-Scale Hierarchical Multi-View RGB-D Object Dataset

A Large-Scale Hierarchical Multi-View RGB-D Object Dataset. Kevin Lai 1 Liefeng Bo 1 Xiaofeng Ren 2 Dieter Fox 1,2 1 University of Washington, Seattle WA 2 Intel Labs Seattle, Seattle WA. Summary

eddy
Download Presentation

A Large-Scale Hierarchical Multi-View RGB-D Object Dataset

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Large-Scale Hierarchical Multi-View RGB-D Object Dataset Kevin Lai1 Liefeng Bo1 Xiaofeng Ren2 Dieter Fox1,2 1University of Washington, Seattle WA 2Intel Labs Seattle, Seattle WA • Summary • A large-scale, hierarchical multi-view object dataset collected using a Kinect-style (RGB-D) camera that provides both RGB and depth images. • Contains 300 objects organized into 51 categories according to WordNet hypernym / hyponym relations. • 250,000 640x480 RGB and depth image pairs. • 8 home and office scenes with ground truth object annotations. • Experiments demonstrate that combining color and depth information substantially improves quality of object recognition and localization. • Available at http://www.cs.washington.edu/rgbd-dataset Fig. 1. Sample objects from the RGB-D Object Dataset. • 2. RGB-D Object Dataset • Video sequences of each object being rotated on a motorized turntable. • Camera mounted at three different heights, making angles 30˚, 45˚, and 60˚ with the horizon. • Objects are segmented using a combination of depth and vision. • A video annotation tool that uses 3D scenes reconstructed from videos (Henry et al. ISER’10) to enable easy labeling of objects. Fig. 2. Labeling two objects in a kitchen scene using a 3D reconstruction. • 3. Object Recognition Experiments • Category (e.g. coffee mug vs. soda can) and Instance (Amelia’s coffee mug vs. Bob’s coffee mug). • Compare performance of shape features (Spin images, bounding box dimensions), visual features (SIFT, color histograms, texton histograms), and a combination of both. • Benchmark state-of-the-art classifiers on the RGB-D Object Dataset: linear SVM (linSVM), Gaussian kernel SVM (kSVM), and Random Forest (RF). Table 1. Category and instance recognition performance of various classifiers. Classifier Shape Vision All Category Recognition LinSVM 53.1±1.7 74.3±3.3 81.9±2.8 kSVM 64.7±2.2 74.5±3.1 83.8±3.5 RF 66.8±2.5 74.7±3.6 79.6±4.0 Instance (Alternating contiguous frames) LinSVM 32.4±0.5 90.9±0.5 90.2±0.6 kSVM 51.2±0.8 91.0±0.5 90.6±0.6 RF 52.7±1.0 90.1±0.8 90.5±0.4 Instance (Leave-sequence-out) LinSVM 32.3 59.3 73.9 kSVM 46.2 60.7 74.8 RF 45.5 59.9 73.1 • 4. Object Detection Experiments • Sliding window detector over image pyramid: • Features: Histogram of Oriented Gradients (HOG) on color image, HOG on depth image, and normalized depth histograms. • Linear SVM detectors trained on the RGB-D Object Dataset: Fig. 4. Object detection in a very complex scene. Contact email: kevinlai@cs.washington.edu Paper ID: 1661 This work was funded in part by an Intel grant, by ONR MURI grants N00014-07-1-0749 and N00014-09-1-1052, by the NSF under contract IIS-0812671, and through the Robotics Consortium sponsored by the U.S. Army Research Laboratory under Cooperative Agreement W911NF-10-2-0016.

More Related