
3D Scene Models 6.870 Object recognition and scene understanding Krista Ehinger



Presentation Transcript


  1. 3D Scene Models 6.870 Object recognition and scene understanding Krista Ehinger

  2. Questions • What makes a good 3D scene model? How accurate does it need to be? • How far can you get with automatic surface detection? Where do you need human input?

  3. Modelling the scene • Real scenes have way too many surfaces

  4. Modelling the scene • Option 1: Diorama world

  5. Tour Into the Picture (TIP) • Model the scene as 5 planes + foreground objects • Easy implementation: planes/objects defined by humans Y. Horry, K.I. Anjyo and K. Arai. "Tour Into the Picture: Using a spidery mesh user interface to make animation from a single image". ACM SIGGRAPH 1997

  6. TIP Implementation • User defines vanishing point, rear wall of the scene (inner rectangle) • Given some assumptions about the camera, position/size of all planes can be computed... Y. Horry, K.I. Anjyo and K. Arai. "Tour Into the Picture: Using a spidery mesh user interface to make animation from a single image". ACM SIGGRAPH 1997

  7. Defining the box • Define planes: Floor -> y=0, Ceiling -> y=H • Given horizon (vanishing point), corners of floor, ceiling can be computed from 2D image position Y. Horry, K.I. Anjyo and K. Arai. "Tour Into the Picture: Using a spidery mesh user interface to make animation from a single image". ACM SIGGRAPH 1997
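A minimal sketch of how a floor or ceiling corner could be recovered from its 2D image position. It is an illustration under stated assumptions, not the TIP authors' code: a pinhole camera at a known height looking horizontally down +z, image coordinates measured from the vanishing point with v pointing up, and a known focal length. The function name `backproject_to_plane` and all numeric values are hypothetical.

```python
def backproject_to_plane(u, v, plane_y, cam_height, focal):
    """Intersect the viewing ray through image point (u, v) with the plane y = plane_y.

    Assumed setup: camera centre at (0, cam_height, 0), optical axis along +z,
    (u, v) measured from the vanishing point, v positive upward.
    """
    dy = plane_y - cam_height
    # The ray direction is (u, v, focal); it reaches the plane when cam_height + t*v == plane_y.
    if v == 0 or dy / v <= 0:
        raise ValueError("ray does not hit the plane in front of the camera")
    t = dy / v
    return (t * u, plane_y, t * focal)


H = 3.0        # assumed ceiling height
cam_h = 1.5    # assumed camera height
f = 500.0      # assumed focal length in pixels

# A rear-wall corner below the horizon lands on the floor (y = 0),
# the matching corner above the horizon lands on the ceiling (y = H).
floor_corner = backproject_to_plane(-200.0, -150.0, 0.0, cam_h, f)
ceiling_corner = backproject_to_plane(-200.0, 150.0, H, cam_h, f)
print(floor_corner, ceiling_corner)
```

Points on the horizon (v = 0) never meet either plane, which is why the vanishing point has to be fixed before the box can be defined.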

  8. Defining the box • Once the positions of the planes are known, compute the texture of the planes Y. Horry, K.I. Anjyo and K. Arai. "Tour Into the Picture: Using a spidery mesh user interface to make animation from a single image". ACM SIGGRAPH 1997

  9. What about foreground objects? • Assume a quadrangle attached to floor, compute attachment points, upper points • Hierarchical model of foreground objects Y. Horry, K.I. Anjyo and K. Arai. "Tour Into the Picture: Using a spidery mesh user interface to make animation from a single image". ACM SIGGRAPH 1997
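The quadrangle idea on this slide can be sketched as follows, under the same assumed pinhole setup (camera at a known height looking along +z, image coordinates measured from the vanishing point, v up). The two attachment points are back-projected onto the floor, and the upper points are placed directly above them. `billboard_corners` and the numbers are hypothetical, and the hierarchical model is not covered.

```python
def billboard_corners(bottom_left, bottom_right, top_v, cam_height, focal):
    """Return (floor_pts, upper_pts) for a vertical quadrangle standing on the floor y = 0.

    bottom_left / bottom_right: image points (u, v) where the object meets the floor.
    top_v: image v-coordinate of the object's top edge.
    """
    floor_pts = []
    for u, v in (bottom_left, bottom_right):
        if v >= 0:
            raise ValueError("attachment points must lie below the horizon")
        t = -cam_height / v              # ray parameter where the ray reaches y = 0
        floor_pts.append((t * u, 0.0, t * focal))
    # The top edge is assumed to sit at the depth of the left attachment point.
    depth = floor_pts[0][2]
    height = cam_height + top_v * depth / focal
    upper_pts = [(x, height, z) for x, _, z in floor_pts]
    return floor_pts, upper_pts


floor_pts, upper_pts = billboard_corners(
    (-50.0, -150.0), (50.0, -150.0), 30.0, cam_height=1.5, focal=500.0
)
print(floor_pts, upper_pts)
```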

  10. Extracting foreground objects • Foreground objects removed, added to mask • Holes in background filled in using photo completion software Y. Horry, K.I. Anjyo and K. Arai. "Tour Into the Picture: Using a spidery mesh user interface to make animation from a single image". ACM SIGGRAPH 1997

  11. TIP Demonstration

  12. TIP Discussion • Pros: • Accurate model (due to human input) • Deals with foreground objects, occlusions • Cons: • Requires human input, not automatic • Model too simple for many real-world scenes

  13. Modelling the scene • Option 2: Pop-up book world

  14. Automatic Photo Pop-Up • Three classes of surface: ground, sky, vertical • Not just a box: can model more kinds of scenes • Automatic classification, no labeling D. Hoiem, A.A. Efros, and M. Hebert, "Automatic Photo Pop-up", ACM SIGGRAPH 2005.

  15. Photo Pop-Up Implementation • Pixels -> superpixels -> constellations • Automatic labeling of constellations as ground, vertical, or sky • Define angles of vertical planes (using attachment to ground) • Map textures to vertical planes (as in TIP) D. Hoiem, A.A. Efros, and M. Hebert, "Automatic Photo Pop-up", ACM SIGGRAPH 2005.

  16. Superpixels, constellations • Superpixels are neighboring pixels that have nearly the same color (Tao et al., 2001) • Superpixels assigned to constellations according to how likely they are to share a label (ground, vertical, sky) based on difference between feature vectors

  17. Feature vectors • Color features: RGB, hue, saturation • Texture features: Difference of oriented Gaussians, Textons • Location (absolute and percentile) • N superpixels in constellation • Line and intersection detectors • Not used: constellation shape (contiguous, N sides), some texture features

  18. Training process • For each of 82 labeled training images • Compute superpixels, features, pairwise likelihoods • Form a set of N constellations (N = 3 to 25), each labeled with ground truth • Compute constellation features • Compute constellation label, homogeneity likelihood:

  19. Training process • AdaBoost weak classifiers learn to estimate whether superpixels have same label (based on feature vector) • Another set of AdaBoost weak classifiers learns constellation label, homogeneity likelihood (expressed as percent ground, vertical, sky, mixed) • Emphasis on classifying larger constellations
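To make the boosting step concrete, here is a minimal sketch of discrete AdaBoost with decision stumps, deciding whether a pair of superpixels shares a label from the absolute difference of their feature vectors. The stump weak learner, the two toy features (color difference, texture difference) and the data are assumptions for illustration; the exact boosting variant and weak learners used in Photo Pop-Up are not given on the slides.

```python
import math


def stump_predict(x, j, thr, pol):
    """Weak classifier: vote pol if feature j is at least thr, otherwise -pol."""
    return pol if x[j] >= thr else -pol


def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                err = sum(wi for wi, x, yi in zip(w, X, y)
                          if stump_predict(x, j, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, pol = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Increase the weight of misclassified pairs, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(x, j, thr, pol))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    score = sum(a * stump_predict(x, j, thr, pol) for a, j, thr, pol in model)
    return 1 if score >= 0 else -1


# Toy pairs: (color difference, texture difference); +1 = same label, -1 = different.
X = [(0.05, 0.10), (0.10, 0.05), (0.08, 0.12),
     (0.60, 0.50), (0.70, 0.40), (0.55, 0.65)]
y = [1, 1, 1, -1, -1, -1]
model = adaboost(X, y)
print([predict(model, x) for x in X])
```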

  20. Building the 3D model • Along vertical/ground boundary, fit line segments (Hough transform) – goal is to find simplest shape (fewest lines) • Project lines up from corners of boundary lines, cut and fold D. Hoiem, A.A. Efros, and M. Hebert, "Automatic Photo Pop-up", ACM SIGGRAPH 2005.
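The line-fitting step can be illustrated with a bare-bones Hough transform over boundary points. This is only a sketch of the voting idea: the bin sizes, the `hough_lines` name and the sample points are assumptions, and it does not include the simplification toward the fewest lines that the slide mentions.

```python
import math
from collections import Counter


def hough_lines(points, n_theta=180, rho_step=1.0, min_votes=3):
    """Vote in (theta, rho) space, where a line is x*cos(theta) + y*sin(theta) = rho.

    Returns (theta, rho, votes) tuples, strongest first.
    """
    votes = Counter()
    for x, y in points:
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(k, round(rho / rho_step))] += 1
    return [(math.pi * k / n_theta, r * rho_step, v)
            for (k, r), v in votes.most_common() if v >= min_votes]


# Hypothetical ground/vertical boundary pixels lying on the horizontal line y = 5.
points = [(10 * i, 5) for i in range(10)]
lines = hough_lines(points)
best_theta, best_rho, best_votes = lines[0]
print(best_theta, best_rho, best_votes)
```

Each boundary point votes for every line that could pass through it, so collinear points pile up in one bin; the strongest bins become the fold lines from which vertical planes are projected upward.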

  21. Photo Pop-Up Demonstration D. Hoiem, A.A. Efros, and M. Hebert, "Automatic Photo Pop-up", ACM SIGGRAPH 2005.

  22. Photo Pop-Up Discussion • Pros: • Automatic • Can handle a variety of scenes, not just boxes • Cons: • No handling of foreground objects • Misclassification leads to very strange models • Only 2 kinds of surface: ground, vertical D. Hoiem, A.A. Efros, and M. Hebert, "Automatic Photo Pop-up", ACM SIGGRAPH 2005.

  23. Modelling the scene • Option 3: Actually try to model surface angles

  24. 3D Scene Structure from Still Image • Compute surface normal for each surface • No right-angle assumptions; surfaces can have any angle • Automatic (trained on images with known depth maps)

  25. 3D Scene Implementation • Segment image into superpixels • Estimate surface normal of each superpixel (using Markov Random Field model) • Optional: Detect and extract foreground objects • Map textures to planes [Figure: original image and modeled depth map] A. Saxena, M. Sun, A. Y. Ng. "Learning 3-D Scene Structure from a Single Still Image". In ICCV workshop on 3D Representation for Recognition (3dRR-07), 2007

  26. Image features • Superpixel features (x_i) • Color and texture features as in Photo Pop-Up • Vector also includes features of neighboring superpixels • Boundary features (x_ij) • Color difference, texture difference, edge detector

  27. Markov Random Field Model • First term: model planes in terms of image features of superpixels • Second term: model planes in terms of pairs of superpixels, with constraints... A. Saxena, M. Sun, A. Y. Ng. "Learning 3-D Scene Structure from a Single Still Image". In ICCV workshop on 3D Representation for Recognition (3dRR-07), 2007

  28. Model constraints • Connected structure: except where there is an occlusion, neighboring superpixels are likely to be connected • Coplanar structure: except where there are folds, neighboring superpixels are likely to lie on the same plane • Co-linearity: long straight lines in the image correspond to straight lines in 3D
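A rough sketch of how the connected and coplanar constraints could be scored for two neighboring superpixels. It assumes each superpixel's plane is described by a parameter vector alpha with alpha . q = 1 for every 3D point q on the plane, so the depth along a unit viewing ray r is 1 / (alpha . r). The function names and the example planes are hypothetical, and the learned weights and confidences of the actual MRF are left out.

```python
import math


def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def depth_along_ray(alpha, ray):
    """Depth at which a unit viewing ray meets the plane {q : alpha . q = 1}."""
    denom = sum(a * r for a, r in zip(alpha, ray))
    if denom <= 0:
        raise ValueError("ray does not meet the plane in front of the camera")
    return 1.0 / denom


def depth_gap(alpha_i, alpha_j, ray):
    """Disagreement between two planes along one ray.

    Evaluated on a shared boundary ray this acts like a connected-structure penalty;
    evaluated on a ray through the interior of one superpixel it acts like a
    coplanarity penalty.
    """
    return abs(depth_along_ray(alpha_i, ray) - depth_along_ray(alpha_j, ray))


# Camera at the origin, y up. Floor plane y = -1.5 and rear wall z = 5 (assumed values).
floor = (0.0, -1.0 / 1.5, 0.0)
wall = (0.0, 0.0, 0.2)

boundary_ray = unit((0.0, -1.5, 5.0))    # through the floor/wall fold line
interior_ray = unit((0.0, -0.75, 5.0))   # through the middle of the wall region

connected = depth_gap(floor, wall, boundary_ray)
coplanar = depth_gap(floor, wall, interior_ray)
print(connected, coplanar)
```

The floor and wall agree along the fold (gap near zero, so they are connected) but disagree inside the wall region (large gap, so they are not coplanar), which is the situation the "except where there are folds" clause allows for.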

  29. Foreground objects • Automatically-detected foreground objects may be removed from model (for example: pedestrians, using Dalal & Triggs detector) • Detected objects add 3D cues (pedestrians are basically vertical, occlude other surfaces)

  30. 3D Scene Demonstration

  31. Results A. Saxena, M. Sun, A. Y. Ng. "Learning 3-D Scene Structure from a Single Still Image". In ICCV workshop on 3D Representation for Recognition (3dRR-07), 2007

  32. 3D Scene Discussion • Pros: • Handles a variety of scene types • Fairly accurate (about 2/3 of scenes correct) • Automatic • Handles foreground objects • Cons: • Still fails on 1/3 of scenes

  33. Discussion • Simple 3D models are adequate for many scenes • You can get pretty far without human input (though human annotation of scenes would still give better results) • Extensions? • Use photo completion techniques to handle occlusions? • Massive training sets -> better 3D models?
