Stanford CS223B Computer Vision, Winter 2006 Lecture 8 Structure From Motion

Professor Sebastian Thrun. CAs: Dan Maynes-Aminzade, Mitul Saha, Greg Corrado. Slides by: Gary Bradski, Intel Research and Stanford SAIL

Structure From Motion [figure: features, camera] • Recover: structure (feature locations), motion (camera extrinsics)

Structure From Motion (1) [Tomasi & Kanade 92]

Structure From Motion (4a): Images [Marc Pollefeys]

Structure From Motion (4b) [Marc Pollefeys]

Structure From Motion • Problem 1: • Given $n$ points $p_{ij} = (x_{ij}, y_{ij})$ in $m$ images • Reconstruct structure: 3-D locations $P_j = (x_j, y_j, z_j)$ • Reconstruct camera positions (extrinsics) $M_i = (A_i, b_i)$ • Problem 2: • Establish correspondence: $c(p_{ij})$

SFM: General Formulation [figure: pinhole camera geometry with labels O, X, -x, Z, f]
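The slide's equation did not survive. A standard way to write the general (perspective) formulation, assuming camera $i$ has rotation $R_i$, translation $t_i$, and a known focal length $f$ (these symbols are introduced here, not taken from the slide), is:

$$
\begin{pmatrix} x_{ij} \\ y_{ij} \end{pmatrix}
= \frac{f}{Z'_{ij}} \begin{pmatrix} X'_{ij} \\ Y'_{ij} \end{pmatrix},
\qquad
\begin{pmatrix} X'_{ij} \\ Y'_{ij} \\ Z'_{ij} \end{pmatrix} = R_i P_j + t_i .
$$

This uses the upright (virtual) image plane in front of the center of projection $O$; on the physical image plane behind $O$ the image coordinates flip sign ($-x$).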

SFM: Bundle Adjustment [figure: pinhole camera geometry with labels O, X, -x, Z, f]

Bundle Adjustment • SFM = Nonlinear Least Squares problem • Minimize through • Gradient Descent • Conjugate Gradient • Gauss-Newton • Levenberg-Marquardt (!) • Prone to local minima
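The quantity that bundle adjustment minimizes is the summed squared reprojection error. The sketch below computes it in plain Python. It assumes a pinhole model with one known focal length `f` shared by all cameras, and a world-to-camera pose `(R, t)` per camera. The function names are hypothetical. An optimizer such as Levenberg-Marquardt would adjust every pose and point to drive this sum down.

```python
def rotate(R, P):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[r][c] * P[c] for c in range(3)) for r in range(3)]


def project(R, t, f, P):
    """Pinhole projection of world point P into a camera with pose (R, t)."""
    xc, yc, zc = (a + b for a, b in zip(rotate(R, P), t))  # camera coordinates
    return (f * xc / zc, f * yc / zc)


def reprojection_error(cameras, points, observations, f):
    """Bundle-adjustment objective: sum of squared image residuals.

    cameras: list of (R, t); points: list of 3-D points;
    observations: dict mapping (i, j) -> observed (x, y) of point j in image i.
    Points missing from an image are simply absent from the dict.
    """
    total = 0.0
    for (i, j), (x_obs, y_obs) in observations.items():
        R, t = cameras[i]
        x, y = project(R, t, f, points[j])
        total += (x - x_obs) ** 2 + (y - y_obs) ** 2
    return total


# Usage: one camera 5 units in front of a point on the x-axis.
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
cams = [(I3, [0.0, 0.0, 5.0])]
pts = [[1.0, 0.0, 0.0]]
err = reprojection_error(cams, pts, {(0, 0): (0.3, 0.0)}, f=1.0)
```

Storing observations in a dictionary keeps the objective well defined when not every point is seen in every image.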

Count #Constraints vs. #Unknowns • m camera poses • n points • 2mn point constraints • 6m + 3n unknowns • Suggests: need $2mn \ge 6m + 3n$ • But: can we really recover all parameters???

How Many Parameters Can’t We Recover? We can recover all but… Place Your Bet!

Count #Constraints vs. #Unknowns • m camera poses • n points • 2mn point constraints • 6m + 3n unknowns • Suggests: need $2mn \ge 6m + 3n$ • But: can we really recover all parameters??? • Can’t recover origin, orientation (6 params) • Can’t recover scale (1 param) • Thus, we need $2mn \ge 6m + 3n - 7$
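Here is a worked instance of this count. It is a necessary condition only, so satisfying it does not by itself guarantee a unique solution. Two images ($m = 2$) give

$$
2 \cdot 2 \cdot n \ge 6 \cdot 2 + 3n - 7
\;\Longrightarrow\; 4n \ge 3n + 5
\;\Longrightarrow\; n \ge 5,
$$

while three images give $6n \ge 3n + 11$, i.e. $n \ge 11/3$, so at least 4 points.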

Are we done? • No: bundle adjustment has many local minima.

The “Trick Of The Day” • Replace Perspective by Orthographic Geometry • Replace Euclidean Geometry by Affine Geometry • Solve SFM linearly (“closed” form, globally optimal) • Post-Process to make solution Euclidean • Post-Process to make solution perspective By Tomasi and Kanade, 1992

Orthographic Camera Model • Extrinsic parameters: rotation • Orthographic projection as the limit of the pinhole model

Orthographic Projection • Orthographic projection as the limit of the pinhole model
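The limit can be derived in a few lines. Assume a point at camera coordinates $(X, Y, Z)$ with depth $Z = Z_0 + \Delta Z$ around a reference depth $Z_0$ ($Z_0$ and $\Delta Z$ are introduced here for illustration). Then

$$
x = \frac{f X}{Z_0 + \Delta Z}
= \frac{f}{Z_0} \cdot \frac{X}{1 + \Delta Z / Z_0}
\;\approx\; \frac{f}{Z_0}\, X
\quad \text{when } |\Delta Z| \ll Z_0 ,
$$

and likewise for $y$. Letting $f \to \infty$ and $Z_0 \to \infty$ with $f / Z_0 = 1$ gives the orthographic projection $(x, y) = (X, Y)$, in which depth drops out.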

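The Orthographic SFM Problem: minimize $\sum_{i,j} \lVert p_{ij} - A_i P_j - b_i \rVert^2$ subject to $A_i A_i^\top = I_{2 \times 2}$ for every camera $i$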

The Affine SFM Problem: minimize $\sum_{i,j} \lVert p_{ij} - A_i P_j - b_i \rVert^2$, i.e., drop the constraints $A_i A_i^\top = I$

Count #Constraints vs. #Unknowns • m camera poses • n points • 2mn point constraints • 8m + 3n unknowns • Suggests: need $2mn \ge 8m + 3n$ • But: can we really recover all parameters???

How Many Parameters Can’t We Recover? We can recover all but… Place Your Bet!

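The Answer is (at least): 12 • Any invertible 3×3 matrix $C$ (9 parameters) and translation $d$ (3 parameters) can be applied to all points and undone in every camera without changing the images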

Points for Solving the Affine SFM Problem • m camera poses • n points • Need to have: $2mn \ge 8m + 3n - 12$
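Both counting conditions have the form $(2m - 3)\,n \ge k\,m - g$, where $k$ is the number of parameters per camera and $g$ is the number of unrecoverable (gauge) parameters. The small helper below computes the smallest $n$ that passes the count. `min_points` is a hypothetical name, and the count is only a necessary condition, not a proof of uniqueness.

```python
def min_points(m: int, params_per_camera: int, gauge: int) -> int:
    """Smallest n with 2*m*n >= params_per_camera*m + 3*n - gauge.

    Rearranged: (2m - 3) * n >= params_per_camera*m - gauge, valid for m >= 2.
    """
    if m < 2:
        raise ValueError("this counting argument needs at least two images")
    numerator = params_per_camera * m - gauge
    denominator = 2 * m - 3
    return max(0, -(-numerator // denominator))  # integer ceiling division


# Euclidean/perspective count: 6 params per camera, 7 gauge parameters.
print(min_points(2, 6, 7), min_points(3, 6, 7))
# Affine count: 8 params per camera, 12 gauge parameters.
print(min_points(2, 8, 12), min_points(10, 8, 12))
```

For the affine case the bound is $(8m - 12)/(2m - 3) = 4$ for every $m \ge 2$, so the count never asks for more than four points.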

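Affine SFM • Fix the coordinate system by making the first point $P_0$ the origin; then $b_i = p_{i0}$ and $q_{ij} = p_{ij} - p_{i0} = A_i P_j$ • Rank Theorem: the $2m \times n$ matrix $Q$ of all $q_{ij}$ has rank 3 (at most 3 in general, exactly 3 for non-degenerate cameras and points) • Proof: $Q = \begin{bmatrix} A_1 \\ \vdots \\ A_m \end{bmatrix} \begin{bmatrix} P_1 & \cdots & P_n \end{bmatrix}$ is the product of a $2m \times 3$ and a $3 \times n$ matrix, so its rank cannot exceed 3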
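The rank theorem is easy to check numerically. The sketch below generates random affine cameras and points, builds the measurement matrix, and subtracts the image of point 0 in every image, which is equivalent to placing $P_0$ at the origin. It assumes numpy, and `measurement_matrix` is a hypothetical helper name.

```python
import numpy as np


def measurement_matrix(A, P, b):
    """Stack p_ij = A_i P_j + b_i into a 2m x n matrix, then subtract point 0.

    A: (m, 2, 3) affine cameras, P: (3, n) points as columns, b: (m, 2).
    Returns Q with entries q_ij = p_ij - p_i0.
    """
    m = A.shape[0]
    W = np.vstack([A[i] @ P + b[i][:, None] for i in range(m)])  # 2m x n
    return W - W[:, [0]]  # register every image on point 0


rng = np.random.default_rng(0)
m, n = 5, 20
A = rng.standard_normal((m, 2, 3))  # random affine cameras A_i
b = rng.standard_normal((m, 2))     # random translations b_i
P = rng.standard_normal((3, n))     # random 3-D points P_j
Q = measurement_matrix(A, P, b)
rank = np.linalg.matrix_rank(Q)
```

With noise-free data the rank is 3 even though `Q` is 10 × 20. With noisy measurements the fourth and later singular values become small but nonzero.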

The Rank Theorem [figure: dimension labels 2m elements, n elements]

Singular Value Decomposition

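Affine Solution to Orthographic SFM • Truncating the SVD of $Q$ to its three largest singular values gives the best rank-3 approximation of $Q$ in the least-squares sense • Hence it also gives the optimal affine reconstruction under noise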
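A minimal sketch of the SVD step follows, assuming numpy and assuming every point is visible in every image. It registers each image on its centroid rather than on $p_{i0}$, the usual choice when measurements are noisy (Tomasi and Kanade register on the centroid). Splitting the singular values as square roots between motion and structure is an arbitrary choice, since any invertible 3×3 matrix can be moved between the two factors. `affine_factorization` is a hypothetical name.

```python
import numpy as np


def affine_factorization(W):
    """Affine SFM by rank-3 SVD truncation.

    W: 2m x n matrix of image coordinates (rows 2i, 2i+1 are x, y of image i).
    Returns motion M (2m x 3), structure S (3 x n), and per-row offsets t (2m,)
    with W ~= M @ S + t[:, None]. M and S are only defined up to an
    invertible 3x3 matrix (the affine ambiguity).
    """
    t = W.mean(axis=1)                      # centroid of each image row
    Q = W - t[:, None]                      # registered measurements
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    root = np.sqrt(s[:3])                   # keep the 3 largest singular values
    M = U[:, :3] * root                     # scale columns of U
    S = root[:, None] * Vt[:3]              # scale rows of V^T
    return M, S, t


# Usage on noise-free synthetic data from random affine cameras.
rng = np.random.default_rng(0)
m, n = 4, 12
A = rng.standard_normal((m, 2, 3))
b = rng.standard_normal((m, 2))
P = rng.standard_normal((3, n))
W = np.vstack([A[i] @ P + b[i][:, None] for i in range(m)])
M, S, t = affine_factorization(W)
```

On noise-free data `M @ S + t[:, None]` reproduces `W` up to rounding. The recovered `M` generally differs from the stacked `A` by the affine ambiguity.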

Back To Orthographic Projection • Find $C$ and $d$ for which the orthographic constraints $A_i A_i^\top = I$ are met • Search in a 12-dimensional space (instead of $8m + 3n - 12$)
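One common way to find $C$ is shown below as a sketch, assuming numpy and an affine motion matrix from a factorization step. For each camera's rows $a, c$ of the corrected matrix, the constraints $a^\top L a = 1$, $c^\top L c = 1$, $a^\top L c = 0$ are linear in the symmetric matrix $L = C C^\top$ (6 unknowns). So the code solves for $L$ by least squares and then factors it with Cholesky. The translation $d$ does not enter these constraints. `metric_upgrade` is a hypothetical name.

```python
import numpy as np


def metric_upgrade(M):
    """Find C (3x3) so that each camera's two rows in M @ C are orthonormal.

    M: 2m x 3 affine motion matrix (rows 2i, 2i+1 belong to camera i).
    Raises numpy.linalg.LinAlgError if the fitted L is not positive definite.
    """
    def coeffs(a, c):
        # a^T L c written in the 6 unique entries (L00, L01, L02, L11, L12, L22)
        return [a[0] * c[0], a[0] * c[1] + a[1] * c[0], a[0] * c[2] + a[2] * c[0],
                a[1] * c[1], a[1] * c[2] + a[2] * c[1], a[2] * c[2]]

    G, rhs = [], []
    for i in range(M.shape[0] // 2):
        a, c = M[2 * i], M[2 * i + 1]
        G += [coeffs(a, a), coeffs(c, c), coeffs(a, c)]
        rhs += [1.0, 1.0, 0.0]              # unit-length rows, orthogonal to each other
    l, *_ = np.linalg.lstsq(np.array(G), np.array(rhs), rcond=None)
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    return np.linalg.cholesky(L)            # lower-triangular C with L = C C^T


# Usage: distort true orthographic cameras by an affine ambiguity, then undo it.
rng = np.random.default_rng(1)
m = 6
rows = []
for _ in range(m):
    R, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal matrix
    rows.append(R[:2])                                 # two orthonormal rows
M_true = np.vstack(rows)
H = rng.standard_normal((3, 3)) + 3 * np.eye(3)        # well-conditioned distortion
M_affine = M_true @ H
C = metric_upgrade(M_affine)
M_metric = M_affine @ C
```

$C$ is recovered only up to a rotation, since $C R$ satisfies the same constraints for any rotation $R$. This matches the earlier point that the global orientation cannot be recovered. With noisy data, $L$ can fail to be positive definite, and Cholesky then raises an error.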

Back To Projective Geometry [figure: orthographic (in the limit) vs. projective]

Back To Projective Geometry [figure: pinhole camera geometry with labels O, X, -x, Z, f] • Optimize the perspective model, using the orthographic solution as the starting point

The Correspondence Problem [figure: View 1, View 2, View 3]

Correspondence: Solution 1 • Track features (e.g., optical flow) • …but fails when images are taken from widely different poses

Correspondence: Solution 2 • Start with random solution A, b, P • Compute soft correspondence: p(c|A,b,P) • Plug soft correspondence into SFM • Reiterate • See Dellaert/Seitz/Thorpe/Thrun, Machine Learning Journal, 2003
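The "compute soft correspondence" step can be sketched as an E-step. The sketch assumes isotropic Gaussian image noise with standard deviation `sigma` and a uniform prior over which model point produced each measurement, and it normalizes each measurement's weights independently. A full treatment would also enforce that no two measurements in one image share the same 3-D point; this sketch ignores that constraint. `soft_correspondence` is a hypothetical name.

```python
import math


def soft_correspondence(observed, predicted, sigma):
    """Weights w[k][j] approximating p(c_k = j | A, b, P) for one image.

    observed: measured 2-D points in the image.
    predicted: 2-D projections A_i P_j + b_i of the current 3-D points.
    """
    weights = []
    for (u, v) in observed:
        logits = [-((u - x) ** 2 + (v - y) ** 2) / (2 * sigma ** 2)
                  for (x, y) in predicted]
        top = max(logits)                            # subtract max for stability
        expw = [math.exp(z - top) for z in logits]
        total = sum(expw)
        weights.append([e / total for e in expw])    # each row sums to 1
    return weights


w = soft_correspondence([(0.0, 0.0), (10.0, 0.0)],
                        [(0.1, 0.0), (9.9, 0.0)], sigma=1.0)
```

In the "plug into SFM" step, these weights would act as per-pair weights in the least-squares objective, and the loop would then recompute A, b, P and the weights again.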

Example

Results: Cube

Animation

Tomasi’s Benchmark Problem

Reconstruction with EM

3-D Structure

Correspondence: Alternative Approach • RANSAC [Fischler/Bolles] = RANdom SAmple Consensus
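RANSAC repeatedly fits a model to a random minimal sample of putative matches and keeps the hypothesis that the most matches agree with. The sketch below uses a deliberately simple model for illustration: a pure 2-D translation $q = p + t$, for which one match is a minimal sample. In SFM the model would be a multi-view relation instead. `ransac_translation`, the threshold, and the iteration count are hypothetical choices.

```python
import random


def ransac_translation(matches, threshold, iterations=100, seed=0):
    """Estimate a 2-D translation from putative matches [(p, q), ...].

    Returns the translation refit on the largest consensus set,
    together with the indices of that set.
    """
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        (px, py), (qx, qy) = rng.choice(matches)      # minimal sample: one match
        tx, ty = qx - px, qy - py                      # hypothesis
        inliers = [k for k, ((ax, ay), (bx, by)) in enumerate(matches)
                   if (bx - ax - tx) ** 2 + (by - ay - ty) ** 2 <= threshold ** 2]
        if len(inliers) > len(best):
            best = inliers
    # Least-squares refit on the consensus set: the mean displacement.
    tx = sum(matches[k][1][0] - matches[k][0][0] for k in best) / len(best)
    ty = sum(matches[k][1][1] - matches[k][0][1] for k in best) / len(best)
    return (tx, ty), best


pts = [(0, 0), (1, 3), (4, 1), (2, 7), (6, 6), (8, 2), (3, 9), (7, 5)]
matches = [((x, y), (x + 5, y - 2)) for (x, y) in pts]             # true matches
matches += [((0, 0), (30, 30)), ((1, 1), (-20, 4)), ((2, 5), (9, 40))]  # wrong matches
t, inliers = ransac_translation(matches, threshold=1.0)
```

Once the consensus set is known, the wrong matches can be discarded before running SFM on the remaining correspondences.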

Summary SFM • Problem • Determine feature locations (= structure) • Determine camera extrinsics (= motion) • Two Principal Solutions • Bundle adjustment (nonlinear least squares, local minima) • SVD (through orthographic approximation, affine geometry) • Correspondence • (RANSAC) • Expectation Maximization
