# Stereo and Projective Structure from Motion


04/13/10. Stereo and Projective Structure from Motion. Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem. Many slides adapted from Lana Lazebnik, Silvio Savarese, Steve Seitz. This class. Recap of epipolar geometry Recovering structure


04/13/10

## Stereo and Projective Structure from Motion

Computer Vision

CS 543 / ECE 549

University of Illinois

Derek Hoiem

Many slides adapted from Lana Lazebnik, Silvio Savarese, Steve Seitz

### This class

• Recap of epipolar geometry

• Recovering structure

• Generally, how can we estimate 3D positions for matched points in two images? (triangulation)

• If we have a moving camera, how can we recover 3D points? (projective structure from motion)

• If we have a calibrated stereo pair, how can we get dense depth estimates? (stereo fusion)

### Basic Questions

• Why can’t we get depth if the camera doesn’t translate?

• Why can’t we get a nice panorama if the camera does translate?

### Recap: Epipoles

• Point x in left image corresponds to epipolar line l’ in right image

• Epipolar line passes through the epipole (the intersection of the cameras’ baseline with the image plane)

### Recap: Fundamental Matrix

• Fundamental matrix maps from a point in one image to a line in the other: $l' = Fx$ and $l = F^T x'$

• If x and x’ correspond to the same 3D point X: $x'^T F x = 0$
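The epipolar constraint can be checked numerically on a toy configuration. The sketch below assumes identity intrinsics, no rotation, and a second camera translated by `t`, so that P = [I | 0], P' = [I | t] and F reduces to the cross-product matrix of `t`. The setup and all helper names are illustrative assumptions, not part of the slides.

```python
def cross_matrix(t):
    """Skew-symmetric matrix [t]_x such that [t]_x v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]

def mat_vec(M, v):
    """Product of a 3x3 matrix with a 3-vector."""
    return [sum(M[r][c] * v[c] for c in range(3)) for r in range(3)]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Assumed toy setup: K = K' = I, R = I, so F = [t]_x.
t = [1.0, 0.2, 0.1]
F = cross_matrix(t)

X = [0.5, -0.3, 4.0]                        # 3D point in front of both cameras
x = [X[0] / X[2], X[1] / X[2], 1.0]         # projection in image 1 (homogeneous)
Xp = [X[i] + t[i] for i in range(3)]        # the same point in camera-2 coordinates
xp = [Xp[0] / Xp[2], Xp[1] / Xp[2], 1.0]    # projection in image 2 (homogeneous)

line = mat_vec(F, x)                        # epipolar line l' = F x in image 2
residual = dot(xp, line)                    # x'^T F x, zero up to rounding
epipole_residual = dot(t, line)             # the epipole e' ~ t lies on the same line
```

Both residuals vanish up to floating-point error: the matching point and the epipole lie on the same epipolar line.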

### Recap: Automatically Relating Projections

Assume we have matched points x ↔ x’ with outliers

Homography (No Translation)

• Correspondence relation: $x' = Hx$, i.e. $x' \times Hx = 0$

• Normalize image coordinates: $\tilde{x} = Tx$, $\tilde{x}' = T'x'$

• RANSAC with 4 points

• De-normalize: $H = T'^{-1} \tilde{H} T$

Fundamental Matrix (Translation)

• Correspondence relation: $x'^T F x = 0$

• Normalize image coordinates: $\tilde{x} = Tx$, $\tilde{x}' = T'x'$

• RANSAC with 8 points

• Enforce $\det(\tilde{F}) = 0$ by SVD

• De-normalize: $F = T'^T \tilde{F} T$
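The slides do not say how image coordinates are normalized. A common choice, assumed here, is Hartley-style normalization: translate the points so their centroid is at the origin and scale them so the mean distance from the origin is sqrt(2). The function name and the sample points below are hypothetical.

```python
import math

def normalizing_transform(points):
    """Return (T, normalized_points) for a list of (x, y) image points.

    T is the 3x3 similarity that moves the centroid to the origin and
    scales the mean distance from the origin to sqrt(2).
    """
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    mean_dist = sum(math.hypot(p[0] - cx, p[1] - cy) for p in points) / n
    s = math.sqrt(2) / mean_dist
    T = [[s, 0.0, -s * cx],
         [0.0, s, -s * cy],
         [0.0, 0.0, 1.0]]
    normalized = [(s * (x - cx), s * (y - cy)) for x, y in points]
    return T, normalized

# Hypothetical pixel coordinates
pts = [(100.0, 200.0), (300.0, 250.0), (400.0, 100.0), (250.0, 400.0)]
T, pts_norm = normalizing_transform(pts)
```

The estimate computed from normalized points is mapped back with the two transforms, one per image, which is why T is returned alongside the points.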

### Recap

• We can get projection matrices P and P’ up to a projective ambiguity

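• Code:

    function P = vgg_P_from_F(F)
    % Returns the second camera P'; the first camera is [I | 0]
    [U,S,V] = svd(F);
    e = U(:,3);                  % left null vector of F (e'*F = 0): epipole in image 2
    P = [-vgg_contreps(e)*F e];  % HZ canonical form [[e']_x F | e'], defined up to scale

See HZ p. 255-256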
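For readers without MATLAB, the same recipe can be sketched in plain Python. This version swaps the SVD for a cross product of two columns of F, which gives the left null vector only when F is exactly rank 2; with a noisy F the SVD route is the appropriate one. All function names here are hypothetical.

```python
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def cross_matrix(e):
    """Skew-symmetric matrix [e]_x."""
    return [[0.0, -e[2], e[1]],
            [e[2], 0.0, -e[0]],
            [-e[1], e[0], 0.0]]

def matmul3(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def cameras_from_F(F):
    """Canonical pair P = [I | 0], P' = [[e']_x F | e'] for an exactly rank-2 F."""
    cols = [[F[r][c] for r in range(3)] for c in range(3)]
    # e' satisfies e'^T F = 0, so it is orthogonal to every column of F;
    # take the best-conditioned cross product of two columns.
    candidates = [cross(cols[a], cols[b]) for a, b in [(0, 1), (0, 2), (1, 2)]]
    e = max(candidates, key=lambda v: sum(c * c for c in v))
    M = matmul3(cross_matrix(e), F)
    P1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    P2 = [M[r] + [e[r]] for r in range(3)]
    return P1, P2

# Toy rank-2 fundamental matrix (assumed): F = [t]_x with t = (1, 0.2, 0.1)
F = cross_matrix([1.0, 0.2, 0.1])
P1, P2 = cameras_from_F(F)
```

A quick sanity check is that the fundamental matrix rebuilt from the pair, [e']_x M with P' = [M | e'], is a scalar multiple of the input F.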

### Recap

• Fundamental matrix song

### Triangulation: Linear Solution

[Figure: rays from camera centers C and C’ through image points x and x’ toward the 3D point X]

• Generally, rays Cx and C’x’ will not exactly intersect

• Can solve via SVD, finding a least squares solution to a system of equations

### Triangulation: Linear Solution

Given P, P’, x, x’

• Precondition points and projection matrices

• Create matrix A

• [U, S, V] = svd(A)

• X = V(:, end)

Pros and Cons

• Works for any number of corresponding images

• Not projectively invariant

Code: http://www.robots.ox.ac.uk/~vgg/hzbook/code/vgg_multiview/vgg_X_from_xP_lin.m
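The steps above can be sketched in plain Python. Each view contributes two rows to A, u·p3 − p1 and v·p3 − p2, where p1, p2, p3 are the rows of P. To stay within the standard library, this sketch replaces the SVD with a Jacobi eigen-decomposition of AᵀA, whose smallest eigenvector equals the last right singular vector of A. The preconditioning step is skipped and all names are hypothetical.

```python
import math

def smallest_eigenvector(M, sweeps=30):
    """Eigenvector of a symmetric matrix M with the smallest eigenvalue (cyclic Jacobi)."""
    n = len(M)
    A = [row[:] for row in M]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                diff = A[q][q] - A[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):      # A <- A J (rotate columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):      # A <- J^T A (rotate rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):      # V <- V J (accumulate eigenvectors)
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    i = min(range(n), key=lambda d: A[d][d])
    return [V[k][i] for k in range(n)]

def triangulate(cameras, points):
    """Linear triangulation from any number of views; returns (X, Y, Z)."""
    rows = []
    for P, (u, v) in zip(cameras, points):
        rows.append([u * P[2][k] - P[0][k] for k in range(4)])
        rows.append([v * P[2][k] - P[1][k] for k in range(4)])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    Xh = smallest_eigenvector(AtA)
    return [Xh[k] / Xh[3] for k in range(3)]

def project(P, X):
    Xh = list(X) + [1.0]
    u, v, w = (sum(P[r][k] * Xh[k] for k in range(4)) for r in range(3))
    return (u / w, v / w)

# Toy two-view setup (assumed): P = [I | 0], P' = [I | (-1, 0, 0)]
P1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
P2 = [[1.0, 0.0, 0.0, -1.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
X_true = [0.5, -0.3, 4.0]
X_est = triangulate([P1, P2], [project(P1, X_true), project(P2, X_true)])
```

With noise-free projections the estimate reproduces the original point; with noisy matches it is the algebraic least-squares solution, which is why the slides list "not projectively invariant" as a drawback.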

### Triangulation: Non-linear Solution

• Minimize reprojection error $d(x, \hat{x})^2 + d(x', \hat{x}')^2$ while satisfying $\hat{x}'^T F \hat{x} = 0$

• The solution is found among the roots of a degree-6 polynomial in a single parameter t

### Projective structure from motion

[Figure: a 3D point $X_j$ projected by cameras $P_1$, $P_2$, $P_3$ to image points $x_{1j}$, $x_{2j}$, $x_{3j}$]

• Given: m images of n fixed 3D points

• $x_{ij} = P_i X_j$, i = 1, …, m, j = 1, …, n

• Problem: estimate m projection matrices $P_i$ and n 3D points $X_j$ from the mn corresponding points $x_{ij}$

• With no calibration info, cameras and points can only be recovered up to a 4x4 projective transformation Q:

• $X \to QX$, $P \to PQ^{-1}$

• We can solve for structure and motion when

• $2mn \geq 11m + 3n - 15$

• For two cameras, at least 7 points are needed

Slides from Lana Lazebnik
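The counting argument is easy to check by hand or in code: mn image points give 2mn measurements, each camera has 11 degrees of freedom, each 3D point has 3, and the 4x4 projective ambiguity removes 15. A small sketch (function names are hypothetical) finds the minimum number of points for a given number of cameras:

```python
def enough_constraints(m, n):
    """True when 2mn measurements cover 11m + 3n - 15 unknowns."""
    return 2 * m * n >= 11 * m + 3 * n - 15

def min_points(m):
    """Smallest n satisfying the counting inequality for m >= 2 cameras."""
    if m < 2:
        raise ValueError("need at least two cameras")
    n = 1
    while not enough_constraints(m, n):
        n += 1
    return n

min_two_views = min_points(2)    # 4n >= 3n + 7   ->  n = 7
min_three_views = min_points(3)  # 6n >= 3n + 18  ->  n = 6
```

For two cameras the inequality reduces to n ≥ 7, matching the 7-point minimum for the fundamental matrix.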

### Sequential structure from motion

• Initialize motion from two images using fundamental matrix

• Initialize structure by triangulation

• Determine projection matrix of new camera using all the known 3D points that are visible in its image – calibration

• Refine and extend structure: compute new 3D points, re-optimize existing points that are also seen by this camera – triangulation

• Refine structure and motion: bundle adjustment

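[Figure: diagram with axes labeled "points" and "cameras"]

### Bundle adjustment

• Non-linear method for refining structure and motion

• Minimizing reprojection error: $E(P, X) = \sum_{i=1}^{m} \sum_{j=1}^{n} D(x_{ij}, P_i X_j)^2$

[Figure: 3D point $X_j$, its observed image points $x_{1j}$, $x_{2j}$, $x_{3j}$, and the reprojections $P_1 X_j$, $P_2 X_j$, $P_3 X_j$ in cameras $P_1$, $P_2$, $P_3$]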
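The quantity that bundle adjustment minimizes can be written down directly. The sketch below only evaluates the total squared reprojection error over the observations that exist (not every point is visible in every camera); the non-linear minimization itself, typically done with a Levenberg-Marquardt style solver, is not shown. The data layout and names are assumptions made for illustration.

```python
def project(P, X):
    """Project a 3D point X = (X, Y, Z) with a 3x4 camera matrix P."""
    Xh = list(X) + [1.0]
    u, v, w = (sum(P[r][k] * Xh[k] for k in range(4)) for r in range(3))
    return (u / w, v / w)

def reprojection_error(cameras, points, observations):
    """Sum of squared image distances between observed and reprojected points.

    observations maps (camera index i, point index j) to the observed (u, v).
    """
    total = 0.0
    for (i, j), (u, v) in observations.items():
        pu, pv = project(cameras[i], points[j])
        total += (u - pu) ** 2 + (v - pv) ** 2
    return total

# Toy example (assumed): two cameras, two points, noise-free observations
cameras = [
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    [[1.0, 0.0, 0.0, -1.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
]
points = [(0.5, -0.3, 4.0), (-1.0, 0.4, 5.0)]
observations = {(i, j): project(cameras[i], points[j])
                for i in range(2) for j in range(2)}

error_exact = reprojection_error(cameras, points, observations)
perturbed = [(0.6, -0.3, 4.0), (-1.0, 0.4, 5.0)]
error_perturbed = reprojection_error(cameras, perturbed, observations)
```

The error is zero for the exact structure and grows as soon as a point is moved, which is the signal a non-linear optimizer follows when refining cameras and points jointly.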

### Self-calibration

• Self-calibration (auto-calibration) is the process of determining intrinsic camera parameters directly from uncalibrated images

• For example, when the images are acquired by a single moving camera, we can use the constraint that the intrinsic parameter matrix remains fixed for all the images

• Compute initial projective reconstruction and find 3D projective transformation matrix Q such that all camera matrices are in the form $P_i = K [R_i | t_i]$

• Can use constraints on the form of the calibration matrix: zero skew

### Summary so far

• From two images, we can:

• Recover fundamental matrix F

• Recover canonical cameras P and P’ from F

• Estimate 3d position values X for corresponding points x and x’

• For a moving camera, we can:

• Initialize by computing F, P, X for two images

• Sequentially add new images, computing new P, refining X, and adding points

• Auto-calibrate assuming fixed calibration matrix to upgrade to similarity transform

Photosynth

Noah Snavely, Steven M. Seitz, Richard Szeliski, "Photo tourism: Exploring photo collections in 3D," SIGGRAPH 2006

http://photosynth.net/

### 3D from multiple images

Building Rome in a Day: Agarwal et al. 2009

### Plug: Steve Seitz Talk

• Steve Seitz will talk about “Reconstructing the World from Photos on the Internet”

• Monday, April 26th, 4pm in Siebel Center

### Special case: Dense binocular stereo

• Fuse a calibrated binocular stereo pair to produce a depth image

[Figure: image 1, image 2, and the resulting dense depth map]

Many of these slides adapted from Steve Seitz and Lana Lazebnik

### Basic stereo matching algorithm

• For each pixel in the first image

• Find corresponding epipolar line in the right image

• Examine all pixels on the epipolar line and pick the best match

• Triangulate the matches to get depth information

• Simplest case: epipolar lines are scanlines

• When does this happen?

### Simplest Case: Parallel images

• Image planes of cameras are parallel to each other and to the baseline

• Camera centers are at same height

• Focal lengths are the same

• Then, epipolar lines fall along the horizontal scan lines of the images

### Special case of fundamental matrix

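Epipolar constraint: $x'^T E x = 0$, with $E = [t]_\times R$

For R = I, t = (T, 0, 0):

$$E = [t]_\times = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -T \\ 0 & T & 0 \end{bmatrix}$$

With $x = (u, v, 1)^T$ and $x' = (u', v', 1)^T$:

$$x'^T E x = (u', v', 1) \begin{pmatrix} 0 \\ -T \\ Tv \end{pmatrix} = T(v - v') = 0 \;\Rightarrow\; v = v'$$

[Figure: corresponding points x and x’ in two parallel image planes related by the translation t]

The y-coordinates of corresponding points are the same!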

### Depth from disparity

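[Figure: point X at depth z seen by cameras with optical centers O and O’, focal length f, and baseline B, projecting to image coordinates x and x’]

$$\text{disparity} = x - x' = \frac{B \cdot f}{z}$$

Disparity is inversely proportional to depth!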
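For a rectified pair, similar triangles give x − x’ = B·f / z, so depth follows from disparity once the baseline and focal length are known. A minimal sketch, with made-up numbers and a hypothetical function name:

```python
def depth_from_disparity(x_left, x_right, focal_length, baseline):
    """Depth z = f * B / (x - x') for a rectified stereo pair.

    focal_length is in pixels and baseline in metres, so z is in metres.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length * baseline / disparity

# Assumed values: f = 700 px, B = 0.1 m
z_near = depth_from_disparity(420.0, 400.0, focal_length=700.0, baseline=0.1)  # disparity 20 px
z_far = depth_from_disparity(410.0, 400.0, focal_length=700.0, baseline=0.1)   # disparity 10 px
```

Halving the disparity doubles the depth, which is also why depth resolution degrades quickly for distant points.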

### Stereo image rectification

• Reproject image planes onto a common plane parallel to the line between optical centers

• Pixel motion is horizontal after this transformation

• Two homographies (3x3 transform), one for each input image reprojection

• C. Loop and Z. Zhang. Computing Rectifying Homographies for Stereo Vision. IEEE Conf. Computer Vision and Pattern Recognition, 1999.

### Basic stereo matching algorithm

• If necessary, rectify the two stereo images to transform epipolar lines into scanlines

• For each pixel x in the first image

• Find corresponding epipolar scanline in the right image

• Examine all pixels on the scanline and pick the best match x’

• Compute disparity x−x’ and set depth(x) = B·f/(x−x’), where B is the baseline and f is the focal length

### Correspondence search

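[Figure: left and right images, with a reference window in the left image and a search window sliding along the corresponding scanline in the right image]

• Slide a window along the right scanline and compare contents of that window with the reference window in the left image

• Matching cost: SSD or normalized correlation

[Figure: matching cost plotted against disparity along the scanline, shown for SSD and for normalized correlation]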
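Window-based search with an SSD cost can be sketched in a few lines. The example assumes rectified images stored as lists of rows and a synthetic pair in which the left image is the right image shifted by 3 pixels; the function names and test data are illustrative only.

```python
import random

def ssd(left, right, row, col_left, col_right, half):
    """Sum of squared differences between two (2*half+1)^2 windows."""
    total = 0.0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            diff = left[row + dr][col_left + dc] - right[row + dr][col_right + dc]
            total += diff * diff
    return total

def match_scanline(left, right, row, col, half, max_disp):
    """Disparity d = x - x' minimizing SSD along the same scanline of the right image."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        col_right = col - d
        if col_right - half < 0:
            break                      # window would leave the image
        cost = ssd(left, right, row, col, col_right, half)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Synthetic rectified pair (assumed): random texture, true disparity of 3 pixels
rng = random.Random(0)
rows, cols, shift = 5, 20, 3
right = [[rng.randrange(256) for _ in range(cols)] for _ in range(rows)]
left = [[right[r][c - shift] if c >= shift else 0 for c in range(cols)]
        for r in range(rows)]

disparity = match_scanline(left, right, row=2, col=10, half=1, max_disp=6)
```

On this textured synthetic pair the SSD minimum falls at the true shift. On real images the same search is where textureless regions, repetition, and occlusions produce wrong matches.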

### Effect of window size

• Smaller window: more detail, more noise

• Larger window: smoother disparity maps, less detail

[Figure: disparity maps computed with window sizes W = 3 and W = 20]

### Failures of correspondence search

Occlusions, repetition

Textureless surfaces

Non-Lambertian surfaces, specularities

### Results with window search

[Figure: input data, window-based matching result, and ground truth]

### How can we improve window-based matching?

• So far, matches are independent for each point

• What constraints or priors can we add?

### Stereo constraints/priors

• Uniqueness

• For any point in one image, there should be at most one matching point in the other image

• Ordering

• Corresponding points should be in the same order in both views

[Figure: example where the ordering constraint doesn’t hold]

### Non-local constraints

• Uniqueness

• For any point in one image, there should be at most one matching point in the other image

• Ordering

• Corresponding points should be in the same order in both views

• Smoothness

• We expect disparity values to change slowly (for the most part)

### Stereo matching as energy minimization

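[Figure: images $I_1$ and $I_2$ and disparity map D; window $W_1(i)$ in $I_1$ is matched to window $W_2(i + D(i))$ in $I_2$]

$$E(D) = \sum_i \left(W_1(i) - W_2(i + D(i))\right)^2 + \lambda \sum_{\text{neighbors } i,j} \rho\left(D(i) - D(j)\right)$$

• Energy functions of this form can be minimized using graph cuts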

• Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001
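To make the energy concrete, the sketch below evaluates a data term plus a smoothness term on a single scanline. It is a simplification under stated assumptions: windows are single pixels, neighbors are adjacent pixels on the scanline, and the penalty ρ is the absolute difference. It only scores a given disparity map; it does not perform the graph-cut minimization.

```python
def stereo_energy(I1, I2, D, lam=1.0):
    """Energy of disparity map D on one scanline.

    Data term: squared difference between I1[i] and I2[i + D[i]].
    Smoothness term: lam * |D[i] - D[i+1]| over adjacent pixels (assumed rho).
    """
    data = sum((I1[i] - I2[i + D[i]]) ** 2 for i in range(len(D)))
    smooth = sum(abs(D[i] - D[i + 1]) for i in range(len(D) - 1))
    return data + lam * smooth

# Toy scanlines (assumed): I1 is I2 shifted by two pixels
I2 = [5, 1, 9, 3, 7, 2, 8, 4]
I1 = [9, 3, 7, 2, 8, 4]

D_true = [2, 2, 2, 2, 2, 2]
D_noisy = [2, 2, 0, 2, 2, 2]

E_true = stereo_energy(I1, I2, D_true)     # perfect match, constant disparity
E_noisy = stereo_energy(I1, I2, D_noisy)   # one wrong pixel pays both terms
```

The outlier at index 2 costs 4 in the data term and 4 in the smoothness term, so the smooth, correct disparity map has strictly lower energy.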

### Many of these constraints can be encoded in an energy function and solved using graph cuts

[Figure: ground truth compared with the graph cuts result]

• Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001

• For the latest and greatest: http://www.middlebury.edu/stereo/

### Summary

• Recap of epipolar geometry

• Epipoles are intersection of baseline with image planes

• Matching point in second image is on a line passing through its epipole

• Fundamental matrix maps from a point in one image to an epipolar line in the other

• Can recover canonical camera matrices from F (with projective ambiguity)

• Recovering structure

• Triangulation to recover 3D position of two matched points in images with known projection matrices

• Sequential algorithm to recover structure from a moving camera, followed by auto-calibration by assuming fixed K

• Get depth from stereo pair by rectifying the images via homographies and searching along scanlines for matches; depth is inversely proportional to disparity

### Next class

• KLT tracking

• Elegant SFM method using tracked points, assuming orthographic projection

• Optical flow