04/13/10

Stereo and Projective Structure from Motion

Computer Vision

CS 543 / ECE 549

University of Illinois

Derek Hoiem

Many slides adapted from Lana Lazebnik, Silvio Savarese, Steve Seitz

- Recap of epipolar geometry
- Recovering structure
  - Generally, how can we estimate 3D positions for matched points in two images? (triangulation)
  - If we have a moving camera, how can we recover 3D points? (projective structure from motion)
  - If we have a calibrated stereo pair, how can we get dense depth estimates? (stereo fusion)

- Why can’t we get depth if the camera doesn’t translate?
- Why can’t we get a nice panorama if the camera does translate?

- Point x in left image corresponds to epipolar line l’ in right image
- Epipolar line passes through the epipole (the intersection of the cameras’ baseline with the image plane)

- Fundamental matrix maps from a point in one image to a line in the other
- If x and x’ correspond to the same 3D point X: $x'^{\top} F x = 0$
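As a small illustration of the two statements above, the sketch below computes the epipolar line l’ = Fx and the residual x’ᵀFx. The matrix F (a camera translating along the x-axis, up to scale), the sample points, and the helper names are assumptions made for this example.

```python
def mat_vec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(M[r][c] * v[c] for c in range(3)) for r in range(3)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def epipolar_line(F, x):
    """Line l' = F x in the second image, as (a, b, c) with a*u + b*v + c = 0."""
    return mat_vec(F, x)

def epipolar_residual(F, x, x_prime):
    """x'^T F x; zero when x and x' satisfy the epipolar constraint."""
    return dot(x_prime, epipolar_line(F, x))

# Assumed example: F for a pure translation along the x-axis (up to scale).
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]

x = [120.0, 45.0, 1.0]          # homogeneous point in the first image
line = epipolar_line(F, x)      # the horizontal line v = 45
on_line = [80.0, 45.0, 1.0]     # same row: satisfies the constraint
off_line = [80.0, 50.0, 1.0]    # different row: violates it

print(line)
print(epipolar_residual(F, x, on_line), epipolar_residual(F, x, off_line))
```

With real data the residual is never exactly zero, so a threshold on the point-to-line distance is the usual inlier test.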

Assume we have matched points x ↔ x’ with outliers

Homography (No Translation)

- Correspondence relation: $x' = Hx$ (up to scale), i.e. $x' \times Hx = 0$
- Normalize image coordinates: $\tilde{x} = Tx$, $\tilde{x}' = T'x'$
- RANSAC with 4 points
- De-normalize: $H = T'^{-1} \tilde{H} T$

Fundamental Matrix (Translation)

- Correspondence relation: $x'^{\top} F x = 0$
- Normalize image coordinates: $\tilde{x} = Tx$, $\tilde{x}' = T'x'$
- RANSAC with 8 points
- Enforce $\det(\tilde{F}) = 0$ by SVD
- De-normalize: $F = T'^{\top} \tilde{F} T$
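The "enforce by SVD" and "de-normalize" steps for the fundamental matrix can be sketched with NumPy as follows. This assumes the usual convention that normalized points are Tx and T’x’, so that F = T’ᵀ F̃ T. The matrices `F_est`, `T`, and `T_prime` are made-up toy values, and the function names are hypothetical.

```python
import numpy as np

def enforce_rank2(F_tilde):
    """Closest rank-2 matrix in Frobenius norm: zero the smallest singular value."""
    U, S, Vt = np.linalg.svd(F_tilde)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt

def denormalize_F(F_tilde, T, T_prime):
    """Undo the normalization x~ = T x, x~' = T' x':  F = T'^T F~ T."""
    return T_prime.T @ F_tilde @ T

# Assumed toy input: a full-rank estimate and simple normalizing transforms.
F_est = np.array([[0.0, -1.0, 0.5],
                  [1.0, 0.0, -2.0],
                  [-0.4, 2.1, 0.1]])
T = np.array([[0.01, 0.0, -1.0],
              [0.0, 0.01, -1.0],
              [0.0, 0.0, 1.0]])
T_prime = T.copy()

F_rank2 = enforce_rank2(F_est)
F = denormalize_F(F_rank2, T, T_prime)
print(np.linalg.matrix_rank(F_rank2, tol=1e-10))
```

The rank constraint is applied before de-normalizing, matching the order of the steps listed above.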

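- We can get projection matrices P and P’ up to a projective ambiguity: $P = [I \mid 0]$, $P' = [[e']_\times F \mid e']$, where $e'^{\top} F = 0$
- Code:
function P = vgg_P_from_F(F)
% Returns the second canonical camera P' for F; the first camera is [I 0]
[U,S,V] = svd(F);
e = U(:,3);                   % left null vector of F: the epipole e' in the second image
P = [-vgg_contreps(e)*F e];   % canonical form, cf. P' = [[e']_x F | e']

See HZ p. 255-256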
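For readers without the VGG toolbox, a rough NumPy counterpart of the snippet above might look like this. It is a sketch, not a port of the toolbox: `skew` and `canonical_cameras` are hypothetical names, and the test matrix F is built from an assumed rotation and translation with identity intrinsics.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def canonical_cameras(F):
    """Return P = [I | 0] and P' = [[e']_x F | e'] for a rank-2 F."""
    U, _, _ = np.linalg.svd(F)
    e_prime = U[:, 2]                       # left null vector: e'^T F = 0
    P = np.hstack([np.eye(3), np.zeros((3, 1))])
    P_prime = np.hstack([skew(e_prime) @ F, e_prime.reshape(3, 1)])
    return P, P_prime

# Assumed example: F = [t]_x R for a small rotation about the y-axis (K = I).
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
F = skew(np.array([1.0, 0.2, 0.1])) @ R

P, P_prime = canonical_cameras(F)
# F is compatible with the pair (P, P') when P'^T F P is skew-symmetric.
M = P_prime.T @ F @ P
print(np.allclose(M + M.T, 0.0))
```

The recovered pair generally differs from the true cameras by a 4x4 projective transformation, which is the ambiguity the bullet above refers to.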

- Fundamental matrix song

- Generally, rays Cx and C’x’ will not exactly intersect
- Can solve via SVD, finding a least squares solution to a system of equations

[Figure: 3D point X and its image points x and x’]

Further reading: HZ p. 312-313

Given P, P’, x, x’

- Precondition points and projection matrices
- Create matrix A
- [U, S, V] = svd(A)
- X = V(:, end)

Pros and Cons

- Works for any number of corresponding images
- Not projectively invariant

Code: http://www.robots.ox.ac.uk/~vgg/hzbook/code/vgg_multiview/vgg_X_from_xP_lin.m
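The linear method above can be sketched in NumPy as follows. Each view contributes two rows to A, taken from x × (PX) = 0, and X is the right singular vector for the smallest singular value. This sketch skips the preconditioning step, and the cameras and test point are assumed synthetic values; `triangulate_linear` and `project` are hypothetical names, not the VGG function linked above.

```python
import numpy as np

def triangulate_linear(points, cameras):
    """Linear (DLT) triangulation from two or more views.

    points:  list of (u, v) image coordinates, one per view
    cameras: list of 3x4 projection matrices, in the same order
    Returns the inhomogeneous 3D point.
    """
    rows = []
    for (u, v), P in zip(points, cameras):
        rows.append(u * P[2] - P[0])    # two independent rows of x cross (P X) = 0
        rows.append(v * P[2] - P[1])
    A = np.vstack(rows)
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                          # singular vector of the smallest singular value
    return X[:3] / X[3]

def project(P, X):
    """Project homogeneous X with P and return (u, v)."""
    x = P @ X
    return x[:2] / x[2]

# Assumed synthetic setup: two cameras one unit apart along x, identity intrinsics.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0, 1.0])

X_hat = triangulate_linear([project(P1, X_true), project(P2, X_true)], [P1, P2])
print(X_hat)
```

Because the loop just stacks two rows per view, the same function accepts any number of corresponding images, which is the first "pro" listed above.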

- Minimize reprojection error while satisfying $\hat{x}'^{\top} F \hat{x} = 0$
- Solution is found among the roots of a degree-6 polynomial in t, minimizing $d(x, l(t))^2 + d(x', l'(t))^2$

Further reading: HZ p. 318

[Figure: 3D point $X_j$ projected to $x_{1j}$, $x_{2j}$, $x_{3j}$ by cameras $P_1$, $P_2$, $P_3$]

- Given: m images of n fixed 3D points
- $x_{ij} = P_i X_j$, $i = 1, \ldots, m$, $j = 1, \ldots, n$
- Problem: estimate m projection matrices $P_i$ and n 3D points $X_j$ from the mn corresponding points $x_{ij}$
- With no calibration info, cameras and points can only be recovered up to a 4x4 projective transformation Q:
  - $X \to QX$, $P \to PQ^{-1}$
- We can solve for structure and motion when
  - $2mn \ge 11m + 3n - 15$
- For two cameras, at least 7 points are needed

Slides from Lana Lazebnik
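The counting condition can be checked directly. On the usual reading, 2mn is the number of measured coordinates, 11m + 3n the number of unknowns, and 15 the degrees of freedom of the projective ambiguity. The helper names below are hypothetical.

```python
def enough_measurements(m, n):
    """True when 2mn >= 11m + 3n - 15 for m cameras and n points."""
    return 2 * m * n >= 11 * m + 3 * n - 15

def min_points(m):
    """Smallest n satisfying the counting condition, for m >= 2 cameras."""
    n = 1
    while not enough_measurements(m, n):
        n += 1
    return n

for m in (2, 3, 4):
    print(m, min_points(m))   # prints "2 7", "3 6", "4 6"
```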

- Initialize motion from two images using fundamental matrix
- Initialize structure by triangulation
- For each additional view:
  - Determine projection matrix of new camera using all the known 3D points that are visible in its image (calibration)
  - Refine and extend structure: compute new 3D points, re-optimize existing points that are also seen by this camera (triangulation)
- Refine structure and motion: bundle adjustment

[Figure: diagram with axes labeled points and cameras]

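- Non-linear method for refining structure and motion
- Minimizing reprojection error: $E(P, X) = \sum_{i=1}^{m} \sum_{j=1}^{n} D(x_{ij}, P_i X_j)^2$

[Figure: 3D point $X_j$ observed at $x_{1j}$, $x_{2j}$, $x_{3j}$ by cameras $P_1$, $P_2$, $P_3$, with reprojections $P_1 X_j$, $P_2 X_j$, $P_3 X_j$]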
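The quantity that bundle adjustment minimizes can be evaluated with a few lines of Python. This sketch only computes the reprojection error for given cameras and points and does not perform the non-linear optimization. The data layout (a dictionary keyed by camera and point index) and the toy values are assumptions.

```python
def project(P, X):
    """Project homogeneous 3D point X (4 values) with a 3x4 matrix P to (u, v)."""
    x = [sum(P[r][c] * X[c] for c in range(4)) for r in range(3)]
    return x[0] / x[2], x[1] / x[2]

def reprojection_error(cameras, points, observations):
    """Sum of squared distances between observed x_ij and projected P_i X_j.

    observations maps (i, j) to (u, v); pairs that are absent (point j not
    visible in camera i) are skipped.
    """
    total = 0.0
    for (i, j), (u, v) in observations.items():
        pu, pv = project(cameras[i], points[j])
        total += (u - pu) ** 2 + (v - pv) ** 2
    return total

# Assumed toy data: two cameras, two points, one noisy observation.
cameras = [
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    [[1.0, 0.0, 0.0, -1.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
]
points = [[0.0, 0.0, 2.0, 1.0], [1.0, 1.0, 4.0, 1.0]]
observations = {
    (0, 0): (0.0, 0.0),
    (1, 0): (-0.5, 0.0),
    (0, 1): (0.25, 0.25),
    (1, 1): (0.0, 0.5),    # exact projection is (0.0, 0.25): off by 0.25 in v
}
print(reprojection_error(cameras, points, observations))
```

A non-linear least-squares solver would adjust all entries of `cameras` and `points` jointly to reduce this value.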

- Self-calibration (auto-calibration) is the process of determining intrinsic camera parameters directly from uncalibrated images
- For example, when the images are acquired by a single moving camera, we can use the constraint that the intrinsic parameter matrix remains fixed for all the images
- Compute initial projective reconstruction and find 3D projective transformation matrix Q such that all camera matrices are in the form $P_i = K [R_i \mid t_i]$

- Can use constraints on the form of the calibration matrix: zero skew

- From two images, we can:
  - Recover fundamental matrix F
  - Recover canonical cameras P and P’ from F
  - Estimate 3D position values X for corresponding points x and x’

- For a moving camera, we can:
  - Initialize by computing F, P, X for two images
  - Sequentially add new images, computing new P, refining X, and adding points
  - Auto-calibrate assuming fixed calibration matrix to upgrade to similarity transform

Photosynth

Noah Snavely, Steven M. Seitz, Richard Szeliski, "Photo tourism: Exploring photo collections in 3D," SIGGRAPH 2006

http://photosynth.net/

Building Rome in a Day: Agarwal et al. 2009

- Steve Seitz will talk about “Reconstructing the World from Photos on the Internet”
- Monday, April 26th, 4pm in Siebel Center

- Fuse a calibrated binocular stereo pair to produce a depth image

[Figure: image 1, image 2, and the resulting dense depth map]

Many of these slides adapted from Steve Seitz and Lana Lazebnik

- For each pixel in the first image
- Find corresponding epipolar line in the right image
- Examine all pixels on the epipolar line and pick the best match
- Triangulate the matches to get depth information

- Simplest case: epipolar lines are scanlines
- When does this happen?
  - Image planes of cameras are parallel to each other and to the baseline
  - Camera centers are at same height
  - Focal lengths are the same
- Then, epipolar lines fall along the horizontal scan lines of the images

Epipolar constraint:

R = I, t = (T, 0, 0)

[Figure: image points x and x’, with translation t between the cameras]

The y-coordinates of corresponding points are the same!
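Written out with the essential matrix $E = [t]_\times R$ (a standard identity), the constraint for this configuration reduces to equal y-coordinates. The coordinate names $x = (u, v, 1)^{\top}$ and $x' = (u', v', 1)^{\top}$ are notation assumed for this derivation.

```latex
E = [t]_\times R =
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -T \\ 0 & T & 0 \end{pmatrix},
\qquad
x'^{\top} E\, x =
\begin{pmatrix} u' & v' & 1 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -T \\ 0 & T & 0 \end{pmatrix}
\begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
= T\,(v - v') = 0
\;\Longrightarrow\; v = v'
```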

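[Figure: 3D point X at depth z, image points x and x’ at focal length f, camera centers O and O’ separated by baseline B]

Disparity is inversely proportional to depth: $x - x' = \frac{B f}{z}$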
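By similar triangles in the rectified setup, x − x’ = B·f / z, so depth follows from disparity once f and B are known. A minimal sketch, with the focal length in pixels and the baseline in meters as assumed example values:

```python
def depth_from_disparity(disparity, focal_length, baseline):
    """Depth z = f * B / (x - x') for a rectified pair.

    Returns None when the disparity is not positive (no valid depth).
    """
    if disparity <= 0:
        return None
    return focal_length * baseline / disparity

# Assumed example values: f in pixels, B in meters.
f, B = 700.0, 0.12
for d in (84.0, 42.0, 21.0):
    print(d, depth_from_disparity(d, f, B))
```

Halving the disparity doubles the estimated depth, so a fixed matching error in pixels costs more depth accuracy for far points than for near ones.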

- Reproject image planes onto a common plane parallel to the line between optical centers
- Pixel motion is horizontal after this transformation
- Two homographies (3x3 transform), one for each input image reprojection
- C. Loop and Z. Zhang. Computing Rectifying Homographies for Stereo Vision. IEEE Conf. Computer Vision and Pattern Recognition, 1999.

- If necessary, rectify the two stereo images to transform epipolar lines into scanlines
- For each pixel x in the first image
  - Find corresponding epipolar scanline in the right image
  - Examine all pixels on the scanline and pick the best match x’
  - Compute disparity x−x’ and set depth(x) = B·f/(x−x’), i.e. depth proportional to 1/(x−x’)


- Slide a window along the right scanline and compare contents of that window with the reference window in the left image
- Matching cost: SSD or normalized correlation
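The window search described above can be sketched on a single rectified scanline. The functions below are illustrative only: the names, the winner-take-all selection, and the toy intensity rows are assumptions, and disparity is taken as x − x’ (left position minus right position).

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length windows."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def normalized_correlation(a, b):
    """Zero-mean normalized correlation in [-1, 1]; 0.0 if a window is flat."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da, db = [p - ma for p in a], [q - mb for q in b]
    denom = (sum(p * p for p in da) * sum(q * q for q in db)) ** 0.5
    return sum(p * q for p, q in zip(da, db)) / denom if denom else 0.0

def scanline_disparity(left, right, half_window, max_disparity):
    """Winner-take-all disparity for each pixel of a rectified left scanline.

    For left pixel x, the candidates in the right scanline are x - d for
    d = 0..max_disparity; the d with the lowest SSD wins. Pixels too close
    to the border are left as None.
    """
    n = len(left)
    result = [None] * n
    for x in range(half_window, n - half_window):
        ref = left[x - half_window:x + half_window + 1]
        best_d, best_cost = None, None
        for d in range(max_disparity + 1):
            xr = x - d
            if xr - half_window < 0:
                break                      # window would leave the image
            cost = ssd(ref, right[xr - half_window:xr + half_window + 1])
            if best_cost is None or cost < best_cost:
                best_d, best_cost = d, cost
        result[x] = best_d
    return result

# Assumed toy scanlines: the right row is the left row shifted by 2 pixels.
left = [0, 0, 3, 9, 4, 7, 1, 8, 2, 6, 0, 0]
right = left[2:] + [0, 0]
disparities = scanline_disparity(left, right, half_window=1, max_disparity=3)
print(disparities)
```

Swapping `ssd` for `1 - normalized_correlation` as the cost makes the match insensitive to gain and offset changes between the two images, at the price of more arithmetic per window.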

[Figures: left and right images with a scanline; matching cost versus disparity along the scanline, shown for SSD and for normalized correlation]

- Smaller window (W = 3)
  - More detail
  - More noise

- Larger window (W = 20)
  - Smoother disparity maps
  - Less detail

Occlusions, repetition

Textureless surfaces

Non-Lambertian surfaces, specularities

[Figure panels: data, window-based matching, ground truth]

- So far, matches are independent for each point
- What constraints or priors can we add?

- Uniqueness
  - For any point in one image, there should be at most one matching point in the other image

- Ordering
  - Corresponding points should be in the same order in both views
  - [Figure: example where the ordering constraint doesn’t hold]

- Smoothness
  - We expect disparity values to change slowly (for the most part)

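$E(D) = \sum_i \left( W_1(i) - W_2(i + D(i)) \right)^2 + \lambda \sum_{\text{neighbors } i,j} \rho\left( D(i) - D(j) \right)$

- Energy functions of this form can be minimized using graph cuts

[Figure: window $W_1(i)$ in image $I_1$, window $W_2(i + D(i))$ in image $I_2$, and disparity map D with value D(i)]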

- Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001
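To make the two competing terms concrete, the sketch below evaluates a data-plus-smoothness energy for a candidate disparity assignment on one scanline. It does not implement graph cuts. It assumes single-pixel windows, adjacent pixels as neighbors, and the absolute difference as the smoothness penalty; the toy scanlines and the weight `lam` are made up.

```python
def stereo_energy(I1, I2, D, lam):
    """Data term plus weighted smoothness term for disparities D on one scanline.

    Simplifications assumed in this sketch: windows are single pixels,
    neighbors are adjacent pixels, and the penalty is |D(i) - D(i+1)|.
    """
    data = sum((I1[i] - I2[i + D[i]]) ** 2 for i in range(len(D)))
    smooth = sum(abs(D[i] - D[i + 1]) for i in range(len(D) - 1))
    return data + lam * smooth

# Assumed toy scanlines: I2 is I1 shifted right by one pixel.
I1 = [5, 1, 7, 3, 9, 2]
I2 = [0, 5, 1, 7, 3, 9, 2]

D_smooth = [1, 1, 1, 1, 1, 1]    # correct, constant disparity
D_noisy = [1, 1, 0, 1, 2, 1]     # two wrong labels

print(stereo_energy(I1, I2, D_smooth, lam=2.0))
print(stereo_energy(I1, I2, D_noisy, lam=2.0))
```

The noisy assignment is penalized twice: its wrong labels raise the data term, and each jump between neighbors adds to the smoothness term.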

[Figure panels: ground truth, graph cuts]

- For the latest and greatest: http://www.middlebury.edu/stereo/

- Recap of epipolar geometry
  - Epipoles are the intersections of the baseline with the image planes
  - Matching point in second image is on a line passing through its epipole
  - Fundamental matrix maps from a point in one image to an epipolar line in the other
  - Can recover canonical camera matrices from F (with projective ambiguity)

- Recovering structure
  - Triangulation to recover 3D position of two matched points in images with known projection matrices
  - Sequential algorithm to recover structure from a moving camera, followed by auto-calibration by assuming fixed K
  - Get depth from stereo pair by aligning via homography and searching across scanlines to match; depth is inversely proportional to disparity

- KLT tracking
- Elegant SFM method using tracked points, assuming orthographic projection
- Optical flow