
From Virtual to Reality: Fast Adaptation of Virtual Object Detectors to Real Domains


Presentation Transcript


  1. From Virtual to Reality: Fast Adaptation of Virtual Object Detectors to Real Domains. Baochen Sun and Kate Saenko, UMass Lowell

  2. Conventional approach: collect real training images to learn each detector (e.g. a backpack model and a bicycle model) and apply it to test images. Our approach: generate the 2D detector directly from a 3D model (a virtual backpack model, a virtual bicycle model) and adapt it to the real domain.

  3. Motivation

  4. Where is the "mug"? The detector fails! Solution #1: get more training data from the web. Problem: limited data, rare poses.

  5. Where is the "mug"? The detector fails! Solution #1: get more training data from the web. Solution #2: synthetic data from crowdsourced CAD models: many models are available, and any pose is possible! …but can we generate realistic images?

  6. real or fake?

  7. real fake

  8. CAD models + photorealistic rendering: example images from Project Cars.

  9. Bad news: photorealism is hard to achieve. We must model BRDF, albedo, lighting, etc., and also the scene (cast shadows, etc.), and most free models don't include this information. Good news: we don't need photorealism! Model the information we have and ignore the rest*: model 3D shape and pose; ignore scene, textures, and color. *Obviously this will not work for all objects, but it will work for many.

  10. Conventional approach vs. our approach (roadmap, repeated from slide 2): generate the 2D detector directly from a 3D model, then adapt it to the real domain.

  11. Virtual Data Generation

  12. Trimble 3D Warehouse (search results shown).

  13. Example models • Collected models for the 31 object categories in the "Office" dataset • Selected 2 models per category manually from the first page of search results

  14. Virtual data generation • For each CAD model, 15 random poses are generated by rotating the original 3D model by a random angle between 0 and 20 degrees about each of the three axes • Note that we do not attempt to model multiview components in this paper, so pose variation is intentionally small • Two types of background and texture are then applied to each pose to obtain the final 2D virtual image: Virtual or Virtual-Gray

  15. Two types of virtual data are generated: "Virtual" (random background and texture taken from ImageNet) and "Virtual-Gray" (uniform gray texture on a white background).
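A minimal sketch of the data-generation step described on slides 14-15, assuming a generic renderer callback; `render_fn`, the CAD model handle, and the list of ImageNet background paths are placeholders, not the authors' actual pipeline:

```python
import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)

def sample_poses(n_poses=15, max_angle_deg=20.0):
    """Small random rotations: a uniform angle in [0, 20] degrees about each
    of the three axes, as described on slide 14."""
    angles = rng.uniform(0.0, max_angle_deg, size=(n_poses, 3))
    return [Rotation.from_euler("xyz", a, degrees=True) for a in angles]

def generate_virtual_images(cad_model, imagenet_paths, render_fn):
    """Render each sampled pose twice: once as 'Virtual' (random ImageNet
    background and texture) and once as 'Virtual-Gray' (gray texture on a
    white background). render_fn is a hypothetical renderer interface;
    imagenet_paths is a list of background image file paths."""
    images = []
    for pose in sample_poses():
        images.append(render_fn(cad_model, pose,
                                background=rng.choice(imagenet_paths),
                                texture="random"))                     # "Virtual"
        images.append(render_fn(cad_model, pose,
                                background="white", texture="gray"))   # "Virtual-Gray"
    return images
```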

  16. Conventional approach vs. our approach (roadmap, repeated from slide 2): generate the 2D detector directly from a 3D model, then adapt it to the real domain.

  17. Virtual-to-Real Adaptation

  18. Goal: model what we have, ignore the rest. The factors at play include specularity, light spectrum, light direction, 3D shape, color, viewpoint, background, camera noise, etc. Question: how do we ignore the difference between the resulting non-realistic images and the real test images? Answer: domain adaptation.

  19. Detection approach • We start with the standard discriminative scoring function s(I, b) = w^T φ(I, b) for an image I and some region b • Regions with a high score are kept • In our case, w is learned via Linear Discriminant Analysis (LDA) • The features φ are HOG (others could be used)
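A sketch of this scoring step: a fixed-size region is described by HOG features and scored with a linear detector w, and only high-scoring regions are kept. The template size, threshold, and use of skimage's `hog` are assumptions (the slide only specifies HOG and notes that other features could be used):

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def score_regions(image, regions, w, bias=0.0, threshold=0.0):
    """Score each candidate region b of a grayscale image I with
    s(I, b) = w . phi(I, b) and keep the regions whose score is high."""
    kept = []
    for (x0, y0, x1, y1) in regions:
        patch = resize(image[y0:y1, x0:x1], (64, 64))      # fixed template size (assumption)
        phi = hog(patch, orientations=9,
                  pixels_per_cell=(8, 8), cells_per_block=(2, 2))
        score = float(np.dot(w, phi)) + bias
        if score > threshold:
            kept.append(((x0, y0, x1, y1), score))
    return kept
```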

  20. Hariharan, Malik and Ramanan, ECCV 2012 • Inspired by this work • Assume a generative model: the class-conditional feature distribution is Gaussian, P(φ | class c) = N(μ_c, S) • Assume the covariance S is the same for all classes • Then the classifier is efficiently computed as w = S^-1 (μ_pos - μ_neg). Bharath Hariharan, Jitendra Malik, and Deva Ramanan. Discriminative decorrelation for clustering and classification. In Computer Vision–ECCV 2012.
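In the Hariharan et al. formulation cited above, only the positive mean is class-specific; the background mean and the shared covariance S are estimated once, so training each new detector reduces to a single linear solve. A minimal numpy sketch (the ridge term is an added assumption for numerical stability):

```python
import numpy as np

def lda_detector(pos_feats, mu_neg, S, reg=1e-2):
    """w = S^-1 (mu_pos - mu_neg): the LDA / 'whitened HOG' detector.
    pos_feats: (n, d) HOG features of positive examples (here, virtual images).
    mu_neg, S: background mean and covariance shared across all classes."""
    mu_pos = pos_feats.mean(axis=0)
    S_reg = S + reg * np.eye(S.shape[0])            # regularize (assumption)
    return np.linalg.solve(S_reg, mu_pos - mu_neg)  # avoids forming S^-1 explicitly
```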

  21. Main idea • Source points x: labeled virtual images • Target points u: unlabeled real-domain images

  22. Main idea (illustration): (a) source; (b) source decorrelated with source covariance; (c) target decorrelated with source covariance; (d) decorrelated target.

  23. Main idea • If T is the target covariance and S is the source covariance, then the adapted detector is computed as shown on the slide (the formula image is not in the transcript) • Note that this corresponds to a different whitening operation • If source and target are the same, it reduces to the original detector

  24. Main idea (same content as slide 23).
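The formula images on slides 23-24 did not survive the transcript. A plausible reading, consistent with the panels on slide 22 and the "reduces to" remark, is that the adaptation whitens with the target covariance T, estimated from unlabeled real images, in place of the source covariance S, so the adapted detector recovers the standard one when S = T. Treat the sketch below as that assumption, not as the paper's exact formula:

```python
import numpy as np

def adapted_detector(mu_pos, mu_neg, target_feats, reg=1e-2):
    """Hypothetical adapted detector: decorrelate with the covariance T of
    unlabeled target-domain (real) features instead of the source (virtual)
    covariance S. If target and source statistics match (T == S), this
    reduces to the standard LDA detector w = S^-1 (mu_pos - mu_neg)."""
    T = np.cov(target_feats, rowvar=False)   # estimate T from unlabeled real features
    T_reg = T + reg * np.eye(T.shape[0])     # regularize the estimate (assumption)
    return np.linalg.solve(T_reg, mu_pos - mu_neg)
```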

  25. Effect of mismatched statistics

  26. Bike model example: synthetic vs. real. The synthetic "bike" detector performs better on webcam target images!

  27. Experiments

  28. Data • Training: example images shown on the slide • Test: webcam images* • *[Saenko et al., ECCV 2010]

  29. Results: comparison to [10]. [10] Bharath Hariharan, Jitendra Malik, and Deva Ramanan. Discriminative decorrelation for clustering and classification. In Computer Vision–ECCV 2012.

  30. Results: comparison to [7]. [10] Bharath Hariharan, Jitendra Malik, and Deva Ramanan. Discriminative decorrelation for clustering and classification. In Computer Vision–ECCV 2012. [7] Daniel Goehring, Judy Hoffman, Erik Rodner, Kate Saenko, and Trevor Darrell. Interactive adaptation of real-time object detectors. In International Conference on Robotics and Automation (ICRA), 2014.

  31. Training on ImageNet

  32. Related Work

  33. Related work: synthetic data in computer vision • Kaneva '11: local features • Optic flow benchmarks. Biliana Kaneva, Antonio Torralba, and William T. Freeman. Evaluation of Image Features Using a Photorealistic Virtual World. ICCV 2011.

  34. Related work: CAD models for object detection • Wohlkinger, Walter, et al. "3DNet: Large-scale object class recognition from CAD models." ICRA 2012. (Point cloud test data.) • Stark, Michael, Michael Goesele, and Bernt Schiele. "Back to the Future: Learning Shape Models from 3D CAD Data." BMVC 2010. (Limited to cars.)

  35. Future work

  36. Next Steps: more factors • Generate multi-component models • Pose variation • Intra-category variation • Analytic model of illumination? • illumination manifolds can be computed from just 3 linearly independent sources (Nayar and Murase ‘96) • Model of other reconstructive factors?
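On the illumination point above: for a Lambertian scene without cast shadows or clamping, every image under a distant light source is a linear combination of the images taken under three linearly independent sources, which is what makes a low-dimensional illumination model computable. A toy numpy illustration of that linearity (synthetic data, my illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic Lambertian scene: per-pixel albedo and unit surface normals.
n_pixels = 1000
albedo = rng.uniform(0.2, 1.0, n_pixels)
normals = rng.normal(size=(n_pixels, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

def render(light):
    """I(x) = albedo(x) * (n(x) . light); shadows and clamping ignored."""
    return albedo * (normals @ light)

# Images under three linearly independent lights span the space of all such images.
basis = np.stack([render(l) for l in np.eye(3)], axis=1)     # shape (n_pixels, 3)

target = render(np.array([0.3, -0.5, 0.8]))                  # image under a new light
coeffs, *_ = np.linalg.lstsq(basis, target, rcond=None)
print(np.allclose(basis @ coeffs, target))                   # True: lies in the 3-D subspace
```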

  37. Next steps: "go deep" with R-CNN (60,000 images of 20 categories)

  38. Thanks!
