Recognition of 3D Objects or, 3D Recognition of Objects


Alec Rivers

Overview
  • 3D object recognition was dead, now it’s coming back
    • These papers are within the last 2 years
  • Doesn’t really work yet, but it’s just a beginning
Papers
  • The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects
    • CVPR 2006
  • 3D LayoutCRF for Multi-View Object Class Recognition and Segmentation
    • CVPR 2007
  • 3D Generic Object Categorization, Localization and Pose Estimation
    • ICCV 2007
The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects

John Winn, Microsoft Research Cambridge

Jamie Shotton, University of Cambridge

Introduction
  • Needed to understand next paper
    • It’s 2D
  • What does it try to solve?
    • Recognize one class of object at one pose and one scale, but with occlusions
  • Does it work?
    • Yes, really well, especially given occlusions
Introduction
  • What is interesting about it?
    • Segments objects
    • Interesting methods
      • No sliding windows
    • Multiple instances for free
Overview
  • Instead of sparse parts at features, use a densely covering part grid

[Fischler & Elschlager 73]

[Winn & Shotton 06]

Recognizing New Image – Overview
  • Walk through an example
Recognizing a New Image – Overview

1. Pixels guess their part

Recognizing a New Image – Overview

2. Maximize layout consistency

Layout Consistency
  • Defined pairwise between two pixels:

P_I, P_J => Bool

  • Means pixels I, J could be part of one instance
  • Toy example:

Object: 1,2,3,4,5

Image:

2,3,4,5,0,0,1,2,3,4,5,2,3,4,5,0,0

[Figure: the toy image from the previous slide, annotated with instance 1, instance 2, instance 3 and an occlusion]

Layout Consistency
  • In 2D, two pixels are layout consistent iff their relative part assignments could exist in a deformed regular grid
  • Formally:
Overview

2. Maximize layout consistency

Layout Consistency

3. Find consistent regions; create instances

Possible due to layout inconsistency at occluding borders
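
A minimal 1D sketch of steps 2 and 3 on the toy example from the Layout Consistency slide. The consistency rule here is a simplification assumed for illustration, not the paper's 2D definition: two neighbouring pixels are consistent if they carry the same label, or if the right one carries the next part of the object. The names `layout_consistent` and `split_instances` are hypothetical.

```python
def layout_consistent(left, right):
    """Can two neighbouring labels belong to one instance? 0 is background."""
    if left == 0 or right == 0:
        return left == right                      # background only agrees with background
    return right == left or right == left + 1     # same part (stretched) or next part


def split_instances(labels):
    """Cut the label sequence wherever neighbours are inconsistent."""
    runs = []
    current = []
    for i, label in enumerate(labels):
        if current and not layout_consistent(labels[i - 1], label):
            runs.append(current)
            current = []
        current.append(label)
    if current:
        runs.append(current)
    # Keep only object runs; background runs start with 0.
    return [run for run in runs if run[0] != 0]


image = [2, 3, 4, 5, 0, 0, 1, 2, 3, 4, 5, 2, 3, 4, 5, 0, 0]
print(split_instances(image))
```

The cut between the second and third run falls where a 5 is followed directly by a 2, which is the kind of inconsistency at an occluding border that the slide refers to.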

Overview

1. Pixels guess parts

2. Maximize layout consistency

3. Create instances

[Winn & Shotton 06]

Implementation Details
  • Trained on manually segmented data
  • Crux of the algorithm is a conditional distribution over labelings given the image
    • Like a probability, or a score, for each possible labeling
  • The algorithm just finds the labeling that maximizes it
Part Appearance
  • Each pixel prefers parts that match surrounding image data
  • Randomized decision trees
    • Multiple trees, each trained on a subset of the data
    • Each node is the maximal-information-gain binary test on two nearby pixels’ intensities
    • Each leaf holds a histogram of part possibilities
    • Actual preference is average over all trees
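
The tree-based part preference can be sketched roughly as follows. This is a simplified illustration, not the paper's implementation: each "tree" is a depth-one stump, candidate tests are drawn at random, and the sample format `(image, x, y, part)` and all function names are assumptions made for the example.

```python
import math
import random
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def pixel_test(image, x, y, test):
    """Binary test comparing the intensities of two pixels near (x, y)."""
    dx1, dy1, dx2, dy2, threshold = test
    h, w = len(image), len(image[0])

    def at(dx, dy):  # clamp coordinates to the image border
        return image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]

    return at(dx1, dy1) - at(dx2, dy2) > threshold


def histogram(parts, n_parts):
    if not parts:
        return [1.0 / n_parts] * n_parts          # empty leaf: uniform fallback
    counts = Counter(parts)
    return [counts[p] / len(parts) for p in range(n_parts)]


def train_stump(samples, n_parts, rng, n_candidates=30, radius=2):
    """Pick the random candidate test with the highest information gain."""
    base = entropy([part for _, _, _, part in samples])
    best_gain, best_test, best_split = -1.0, None, None
    for _ in range(n_candidates):
        offsets = tuple(rng.randint(-radius, radius) for _ in range(4))
        test = offsets + (rng.uniform(-0.5, 0.5),)
        split = {True: [], False: []}
        for image, x, y, part in samples:
            split[pixel_test(image, x, y, test)].append(part)
        remainder = sum(len(v) / len(samples) * entropy(v)
                        for v in split.values() if v)
        if base - remainder > best_gain:
            best_gain, best_test, best_split = base - remainder, test, split
    leaves = {side: histogram(parts, n_parts) for side, parts in best_split.items()}
    return best_test, leaves


def train_forest(samples, n_parts, n_trees=5, seed=0):
    rng = random.Random(seed)
    subset_size = max(1, len(samples) // 2)       # each tree sees a subset of the data
    return [train_stump(rng.sample(samples, subset_size), n_parts, rng)
            for _ in range(n_trees)]


def part_preference(forest, image, x, y):
    """Average the leaf histograms reached in every tree."""
    votes = [leaves[pixel_test(image, x, y, test)] for test, leaves in forest]
    return [sum(v[p] for v in votes) / len(votes) for p in range(len(votes[0]))]
```

A real forest would grow deeper trees by applying the same split selection recursively at each child node.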
Deformed Training Part Labelings
  • Fits parts tighter

1. Label by grid

2. Learn from data

3. Apply to data

4. Set guesses as truth

5. Relearn

Part Layout
  • Preference for layout consistency plus additional pairwise costs:
  • Helps remove noise
  • Align edges along image edges
Part Layout
  • Return to toy example

Just appearance:

1,2,0,4,5,0,0,1,2,3,3,4,0,0,1,0

With layout costs:

1,2,3,4,5,0,0,1,2,3,3,4,0,0,0,0

instance 1

instance 2
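
To make the effect of layout costs concrete, here is a small sketch that minimizes appearance plus layout cost over the 1D toy sequence. Because the toy is a chain, the exact minimum can be found by dynamic programming (Viterbi); this is not the method used in the paper for 2D grids. All cost values are made up for illustration, so the resulting labeling depends on them and need not match the slide exactly.

```python
N_PARTS = 5  # object parts are 1..5, background is 0


def pairwise_cost(left, right):
    """Made-up layout cost for two neighbouring pixels on a 1D chain."""
    if left == right or (left != 0 and right == left + 1):
        return 0.0   # layout consistent
    if (left, right) in ((0, 1), (N_PARTS, 0)):
        return 0.0   # clean start or end of an object
    if left == 0 or right == 0:
        return 1.5   # object/background edge in the middle of the object
    return 3.0       # two object parts that cannot be neighbours


def energy(labels, observed):
    """Appearance cost (1 per label that disagrees with the guess) plus layout cost."""
    unary = sum(1.0 for lab, obs in zip(labels, observed) if lab != obs)
    pairwise = sum(pairwise_cost(a, b) for a, b in zip(labels, labels[1:]))
    return unary + pairwise


def viterbi(observed):
    """Exact minimum-energy labeling of the chain."""
    states = range(N_PARTS + 1)
    cost = [0.0 if s == observed[0] else 1.0 for s in states]
    back = []
    for obs in observed[1:]:
        step, new_cost = [], []
        for s in states:
            prev = min(states, key=lambda p: cost[p] + pairwise_cost(p, s))
            step.append(prev)
            new_cost.append(cost[prev] + pairwise_cost(prev, s)
                            + (0.0 if s == obs else 1.0))
        back.append(step)
        cost = new_cost
    label = min(states, key=lambda s: cost[s])
    labels = [label]
    for step in reversed(back):      # walk the back-pointers to recover the path
        label = step[label]
        labels.append(label)
    return labels[::-1]


appearance_only = [1, 2, 0, 4, 5, 0, 0, 1, 2, 3, 3, 4, 0, 0, 1, 0]
smoothed = viterbi(appearance_only)
print(smoothed, energy(smoothed, appearance_only))
```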

Instance Layout
  • Apply weak force trying to keep parts at sane positions relative to instance data (centroid, L/R flip)
  • Toy example: 0,1,1,1,1,1,2,3,4,5 is bad!
Implementation
  • Theoretically, finding the global maximum of the conditional distribution
  • This is “MAP” estimation
    • MAP = Maximum A Posteriori
  • In reality, using tricks to find a local maximum
    • α-expansion, annealed expansion move
Approximating MAP Estimation
  • Global maximum is intractable
  • α-expansion
    • Start with given configuration
    • For a given new label, ask each pixel: do you want to switch?
    • Can be solved efficiently with graph cuts
  • Repeat over all part labels
  • Annealed expansion move
    • Relabel grid, but offset to avoid local maxima
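
The expansion loop described above can be sketched on a 1D chain. One assumption differs from the slide: on a chain, the binary "keep or switch to α" subproblem can be solved exactly by dynamic programming, so this sketch uses that in place of graph cuts. The annealed variant is not shown. The function names and the cost functions passed in are hypothetical.

```python
def total_energy(labels, unary, pairwise):
    return (sum(unary(i, lab) for i, lab in enumerate(labels))
            + sum(pairwise(a, b) for a, b in zip(labels, labels[1:])))


def expansion_move(labels, alpha, unary, pairwise):
    """Best labeling in which every pixel either keeps its label or takes alpha."""
    options = [(lab, alpha) for lab in labels]    # per pixel: (keep, switch)
    cost = [unary(0, o) for o in options[0]]
    back = []
    for i in range(1, len(labels)):
        step, new_cost = [], []
        for o in options[i]:
            k = min((0, 1), key=lambda j: cost[j] + pairwise(options[i - 1][j], o))
            step.append(k)
            new_cost.append(cost[k] + pairwise(options[i - 1][k], o) + unary(i, o))
        back.append(step)
        cost = new_cost
    k = min((0, 1), key=lambda j: cost[j])
    result = [options[-1][k]]
    for i in range(len(labels) - 1, 0, -1):       # follow back-pointers
        k = back[i - 1][k]
        result.append(options[i - 1][k])
    return result[::-1]


def alpha_expansion(labels, all_labels, unary, pairwise):
    """Repeat expansion moves over all labels until none lowers the energy."""
    labels = list(labels)
    improved = True
    while improved:
        improved = False
        for alpha in all_labels:
            candidate = expansion_move(labels, alpha, unary, pairwise)
            if total_energy(candidate, unary, pairwise) < total_energy(labels, unary, pairwise):
                labels, improved = candidate, True
    return labels


observed = [1, 1, 2, 1, 1, 0, 0, 3, 0, 0]
unary = lambda i, label: 0.0 if label == observed[i] else 1.0
pairwise = lambda a, b: 0.0 if a == b else 0.8
print(alpha_expansion(observed, [0, 1, 2, 3], unary, pairwise))
```

Each accepted move strictly lowers the energy, so the loop terminates, but the result is only guaranteed to be a local optimum with respect to expansion moves.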
Results

Oh, snap!

Thoughts
  • Bottom-up system is great
    • No sliding windows
    • Multiple instances for free
  • Information about segment boundaries: occlusion vs. completion
    • Reason about complete segment boundaries?
3D LayoutCRF for Multi-View Object Class Recognition and Segmentation

Derek Hoiem, Carnegie Mellon University

Carsten Rother and John Winn, Microsoft Research Cambridge

Introduction
  • What does it try to solve?
    • Extend LayoutCRF to be pose and scale invariant
  • Does it work?
    • Improvements to LayoutCRF work; 3D information does little
  • What is interesting about it?
    • One method for combining 2D methods with a 3D framework
    • The improvements to 2D are good
Overview
  • Generate rough 3D model of class
  • Parts created over 3D model
Overview
  • Probability distribution
Refinements
  • Part layout, instance layout take into account 3D position
Refinements
  • New term: Instance cost
Instance Cost
  • Eliminates false positives
    • LayoutCRF: object-background cost
  • Explain multiple groups with one instance
Refinements
  • New term: Instance appearance
Instance appearance
  • Learn color distribution for each instance
  • Separate groups of pixels: definitely object, definitely background
  • Use these to learn colors
  • Apply cost to non-standard-color pixels

This would fail…
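
One way to picture the instance appearance term is sketched below. The slides do not say how the color distribution is modeled, so the quantized RGB histogram, the add-one smoothing, and the equal-prior cost are all assumptions chosen for brevity; the function names are hypothetical.

```python
import math
from collections import Counter

BINS = 4  # assumed quantization: 4 levels per RGB channel


def quantise(rgb):
    return tuple(channel * BINS // 256 for channel in rgb)


def learn_colour_model(pixels):
    """Smoothed histogram over quantized colors (add-one smoothing)."""
    counts = Counter(quantise(p) for p in pixels)
    total = len(pixels) + BINS ** 3
    return lambda rgb: (counts[quantise(rgb)] + 1) / total


def object_cost(rgb, p_object, p_background):
    """Cost of labeling a pixel as object: -log posterior under equal priors."""
    po, pb = p_object(rgb), p_background(rgb)
    return -math.log(po / (po + pb))


# Pixels that are "definitely object" and "definitely background".
p_obj = learn_colour_model([(200, 10, 10)] * 50)
p_bg = learn_colour_model([(10, 200, 10)] * 50)
print(object_cost((200, 10, 10), p_obj, p_bg))   # low: typical object color
print(object_cost((10, 200, 10), p_obj, p_bg))   # high: non-standard color for the object
```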

Implementation Details
  • Parts are learned separately for each 45° viewing range, and for different scales
  • Instance layout is also discretized by viewpoint
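
A tiny sketch of the viewpoint discretization: with 45° ranges there are eight bins around the object. It assumes that the viewpoint is described by a single azimuth angle and that the bins start at 0°, neither of which is stated on the slide; `view_bin` and `select_model` are hypothetical names.

```python
VIEW_RANGE_DEG = 45  # from the slide: one set of parts per 45° viewing range


def view_bin(azimuth_deg):
    """Index (0..7) of the 45° range that contains the given azimuth."""
    return int((azimuth_deg % 360) // VIEW_RANGE_DEG)


def select_model(models_by_view, azimuth_deg):
    """Pick the part model learned for the matching viewing range."""
    return models_by_view[view_bin(azimuth_deg)]


models = [f"parts_view_{i}" for i in range(8)]
print(select_model(models, 100))
```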
Results – Comparison to LCRF
  • A little better (+8% recall)
  • BUT they actually turn off 3D information for this comparison
  • Better segmentation
Results – PASCAL 2006
  • 61% precision-recall
    • Previous best: 45%
    • But, reduced test set
  • Without 3D: -5%
  • Without color: -5%
Thoughts
  • Color, instance costs very nice
  • Shoehorns LCRF into 3D without much success
  • LCRF is already somewhat viewpoint-invariant: segments can stretch
3D Generic Object Categorization, Localization and Pose Estimation

Silvio Savarese, University of Illinois at Urbana-Champaign

Fei-Fei Li, Princeton University

Introduction
  • What does it try to solve?
    • Multiclass pose-invariant, scale-invariant object recognition
  • Does it work?
    • Not well. But it may be due to implementation
  • Why is it interesting?
    • Attempts to learn the actual 3D structure of an object
    • Interesting data structure for 3D info
Overview – Data Structure
  • Decompose object into large parts; find “canonical view”
  • Relate parts by mutual appearance
Related Work – Aspect Graphs
  • Represent stable views rather than parts

Aspect graph of a cube:

Image [Khoh & Kovesi, 99]

Data Structure for Cube

[Figure: cube faces labeled Top, Back, Left, Front, Right, Bottom]

Related Work
  • Constellation models
  • Similar, but wraps around in 3D

vs.

Implementation – Links
  • Link from canonical P_I to P_J consists of a matrix H_IJ
  • Matrix defines transformation to observe P_J when P_I is viewed canonically
  • A_IJ is skew, t_IJ is translation
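
A small sketch of how such a link could be stored and applied. It assumes the link has the homogeneous form [[A, t], [0, 1]], with A a 2x2 matrix and t a 2-vector; the slide's own equation did not survive extraction, so this form is an assumption. `make_link` and `apply_link` are hypothetical names.

```python
def make_link(A, t):
    """3x3 homogeneous matrix [[A, t], [0, 1]] from a 2x2 matrix A and a 2-vector t."""
    return [[A[0][0], A[0][1], t[0]],
            [A[1][0], A[1][1], t[1]],
            [0.0, 0.0, 1.0]]


def apply_link(H, point):
    """Map a 2D point through H using homogeneous coordinates."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


# Example link: a horizontal skew followed by a translation.
H_ij = make_link([[1.0, 0.5], [0.0, 1.0]], [2.0, 3.0])
print(apply_link(H_ij, (1.0, 2.0)))
```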
Implementation – Links

[Figure: H_IJ, relating the canonical view of Part I to the canonical view of Part J]

Implementation – Links

[Figure: H_JI, relating the canonical view of Part J to the canonical view of Part I]

Overview
  • Learn data structure from images (unsupervised)
  • Apply to new image by recognizing parts and selecting model that best accounts for their appearances
Implementation – Learning Parts
  • Tricky implementation!
  • Part = collection of SIFT features

For each pair of images of the same instance:

1. Find set M of shared SIFT features

2. RANSAC M to find a group of pairs that transform together

3. Group close-together parts of M into candidate parts

Background: What is RANSAC?
  • Finds subset of data that is accounted for by some model; ignores outliers

1. Guess points

2. Fit model

3. Select matching points

4. Calculate error

Repeat!
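
The four steps can be sketched with the simplest possible model, a 2D line, rather than the homography used in the paper. The iteration count, tolerance, and the choice to score a model by its number of inliers are illustrative assumptions.

```python
import random


def ransac_line(points, n_iterations=200, tolerance=0.1, seed=0):
    """Fit y = m*x + b to the largest consistent subset of points."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)        # 1. guess points
        if x1 == x2:
            continue                                       # vertical pair: skipped in this sketch
        m = (y2 - y1) / (x2 - x1)                          # 2. fit model
        b = y1 - m * x1
        inliers = [(x, y) for x, y in points               # 3. select matching points
                   if abs(y - (m * x + b)) <= tolerance]
        if len(inliers) > len(best_inliers):               # 4. score, here by inlier count
            best_model, best_inliers = (m, b), inliers
    return best_model, best_inliers


points = [(x, 2 * x + 1) for x in range(10)] + [(2, 20), (5, -7), (8, 30)]
model, inliers = ransac_line(points)
print(model, len(inliers))
```

The three outliers never gather enough support to win, so they are ignored without having to be identified in advance.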

RANSAC
  • In our case: find points for which a homographic transformation of the points in image I yields the points in image J
Implementation – Canonical Views
  • Goal: front-facing view of part
  • Construct directed graph
    • Direction means “more front-facing”
  • Traverse to find canonical view
  • How to go from pairwise-defined to graph?
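
The traversal itself can be sketched as follows, assuming the directed graph has already been built. How the pairwise comparisons are turned into that graph is left open on the slide, so the edge list here is simply given, and following the first outgoing edge is an arbitrary choice. The view names are invented for the example.

```python
def canonical_view(edges, start):
    """Follow 'more front-facing' edges from start until no edge leaves the view."""
    successors = {}
    for view, better in edges:
        successors.setdefault(view, []).append(better)
    current, visited = start, {start}
    while current in successors:
        nxt = next((v for v in successors[current] if v not in visited), None)
        if nxt is None:
            break  # only already-visited views remain; stop rather than loop forever
        visited.add(nxt)
        current = nxt
    return current


# An edge (a, b) means view b shows the part more front-on than view a.
edges = [("side", "three_quarter"), ("three_quarter", "front"), ("back_left", "side")]
print(canonical_view(edges, "back_left"))
```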
Implementation
  • Upshot: a collection of parts with canonical views and links
Recognizing a New Image

1. Extract SIFT features

2. Use scanning windows to get 5 best canonical part matches

3. For every pair of found parts, for each model, score how well the model accounts for their relative appearances

4. Select the model with the best score
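
Steps 3 and 4 can be illustrated with a heavily simplified scoring function. Here each model link is reduced to an expected 2D offset between two parts, whereas the paper's links are full transformations, and the score and the penalty for missing links are invented for the example. Part and model names are hypothetical.

```python
import math
from itertools import combinations


def model_score(model_links, detections, missing_penalty=100.0):
    """Negative total mismatch between observed and predicted part offsets."""
    error = 0.0
    for a, b in combinations(sorted(detections), 2):       # every pair of found parts
        if (a, b) not in model_links:
            error += missing_penalty                        # model cannot explain this pair
            continue
        dx, dy = model_links[(a, b)]
        (ax, ay), (bx, by) = detections[a], detections[b]
        error += math.hypot((bx - ax) - dx, (by - ay) - dy)
    return -error


def best_model(models, detections):
    """Select the model that best accounts for the relative part positions."""
    return max(models, key=lambda name: model_score(models[name], detections))


models = {
    "model_a": {("p1", "p2"): (10.0, 5.0)},
    "model_b": {("p1", "p2"): (0.0, 30.0)},
}
detections = {"p1": (0.0, 0.0), "p2": (10.0, 5.0)}
print(best_model(models, detections))
```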

Results
  • Not stellar
  • New test set
    • Overfit?
    • Comparison?
Thoughts
  • Low performance may make it useless as a system, but the data structure is very nice
  • Implementation has a lot of tricky parts
    • Doesn’t seem to select great canonical parts
    • I wonder if there’s a simpler way
    • Are SIFT features the right choice?
Extremely Confusing Figure
  • “Each dashed box indicates a particular view. A subset of the canonical parts is presented for each view. Part relationships are denoted by arrows.”
Overall Conclusions
  • 3D is just starting out. Doesn’t work too well right now, but neither did MV at the beginning.
  • LayoutCRF:
    • Nice method to learn 2D patches
  • 3D Object Categorization:
    • Nice conceptual model relating 3D parts
  • Possible to combine strengths of both?