Rigid Structure from Video

## Rigid Structure from Video


**Rigid Structure from Video**
Pedro M. Q. Aguiar

**Outline**
• Other methods - limitations
• Proposed approach
• Problem formulation
• Algorithms
• Experiments
• Motivation
• Segmentation of 2D rigid moving objects
• Inference of 3D rigid structure

**Content-based video representation**
Applications: compression, non-linear editing, virtual reality, etc.
Motivation
• Video
• Generative Video (GV) [Jasinschi & Moura, 95]
• flat scenario
• flat moving objects
• PROBLEM: Segmentation of 2D rigid moving objects
• 3D content-based representation
• 3D rigid shape
• 3D motion
• PROBLEM: Inference of 3D rigid structure (shape and motion)

**Motion segmentation in low texture**
With low texture, segmentation fails!
• Two-frame motion-based segmentation
• No prior knowledge about shape, texture
• [Diehl, 91]
Time-consuming algorithms!
• Possible solution - smoothing
• Statistical regularization [Dubuisson & Jain, 95]
• Combine motion with other attributes [Bouthemy & François, 93]
• Proposed approach - exploit rigidity over a set of frames
• Explicit modeling of occlusion
• Feasible implementation of MLE

**Observation model**
Terms of the model: background, camera window, camera position, object template (modeling of occlusion), object position, object texture, noise.

**Maximum Likelihood estimation**
• Given:
  • set of F frames
• Estimate:
  • background texture
  • object texture
  • object template
  • camera motion
  • object motion
• ML cost function over all frames and pixels
• ML estimate

**Minimization procedure**
• ML estimation
Quadratic in O and B.
Average of the observations, after registration.
• Object and background estimates
Linear in T.
Average of the observations, in the regions not occluded by the object.
Nonlinear in T.
• Decouple the estimation of the position vectors
• Motion is estimated on a frame-by-frame basis [Bergen et al, 92]

**Minimization procedure - two-step iterative method**
• Replacing the object and background estimates in the ML cost function: nonlinear minimization!
• Replacing only one of the two estimates in the ML cost function
• minimize using a two-step iterative method:
  • solve for O with fixed T (quadratic, closed-form solution)
  • solve for T with fixed O (linear, closed-form solution)

**Minimization procedure - segmentation matrix**
Segmentation matrix
• Template estimate
• Replacing only one of the two estimates in the ML cost function
Accumulated differences between each pair of co-registered frames.
Accumulated differences between each frame and the background.
• regions where the test is inconclusive with the available F frames
Linear in T!

**Experiment**
Figure panels: moving object; background; three frames from the image sequence.

**Experiment**
Figure panels: background estimate; template estimate (two-step method).

**Experiment**
Figure panels: four frames from a video sequence; moving objects; background estimate.

**3D structure from 2D video**
• Motivation: 3D content-based video representation (application areas go well beyond digital video)
• Key step: recovery of 3D shape and 3D motion from an image sequence
• Strongest cue: motion of the brightness pattern
• Structure From Motion:
  • Step 1. Compute the 2D motion on the image plane
  • Step 2. Recover the 3D motion and the depth

**Two-frame SFM - common problem**
• Two-frame SFM fails when the object is far from the camera
• Solution: exploit rigidity - multi-frame SFM
• Multi-frame Structure From Motion:
  • step 1. track feature points across a set of frames
  • step 2. recover relative depth and set of 3D positions

**Factorization method**
• Factorization [Tomasi & Kanade, 92] (an expeditious method):
  • uses linear subspace constraints
  • 3D structure is estimated by factorizing a measurement matrix R whose entries are the trajectories of feature point projections
  • without noise, R is rank 3.
An SVD is used to factorize matrix R
• Multi-frame SFM - hard problem:
  • non-linear
  • large set of unknowns (due to the entire set of 3D positions)
• Problems:
  • track a large set of features: computationally very heavy, if possible
  • cost of SVD: high for large number of features or frames

**Proposed approach: surface-based factorization**
• Describe the 3D shape by a local parameterization
• Induces a parametric description for the 2D motion in the image plane
• Recover the 3D shape and 3D motion parameters from the 2D motion parameters by further exploiting linear subspace constraints:
  • surface-based factorization
  • rank 1 factorization: uses a fast algorithm to compute only the largest singular value
  • weighted factorization: computes the weighted estimate without additional computational cost

**Maximum Likelihood formulation**
• Observations: the images in the sequence. Unknowns: object texture, 3D shape, 3D motion
• Through ML, 3D structure is recovered:
  • Exploiting object rigidity over a set of frames
  • Directly from the image intensity values
So, where do SFM and factorization come from?
• Minimization procedure:
  • Minimize with respect to the texture in terms of 3D shape and 3D motion
  • After replacing the texture estimate, the ML cost function depends on the 3D structure only through the 2D motion in the image plane
  • Estimate 3D motion by inferring SFM (factorization). Plug in the 3D motion estimates
  • Minimize the ML cost function with respect to the relative depth
Rather than the two components of the motion, local depth is a single unknown.
• Local 2D motion estimation is ill-posed - aperture problem. Direct methods:
  • Infer 3D structure by using the brightness change constraint between two frames [Horn & Weldon, 88]
  • Kalman filter to update estimates over time [J. Heel, 90]

**Observation model**
• Observation model: texture, shape, 3D position
• Unknowns:

**Texture estimate**
• Texture estimate - weighted average
• ML estimate

**SFM as an approximation to MLE**
• The ML cost function depends on the 3D structure only through the 2D motion induced in the image plane (no approximations involved)
• Insert the texture estimate into the cost function
• 3D structure estimation:
  • 3D motion estimation:
    • Compute 2D motion
    • SFM: rank 1 surface-based factorization
  • 3D shape estimation:
    • Plug the 3D motion estimate into the ML cost function
    • Then, minimize with respect to the shape
• (The estimates can be refined by minimizing the ML cost function in two alternate steps, but initialization is the key problem)

**Feature-based SFM**
Translation estimate:
Define:

**Rank 1 factorization**
• Decomposition (minimize without constraints)
Define:
• Normalization (computes by approximating the constraints)
Define:

**Rank 1 factorization - experiment**
Figure: three largest singular values of R. The matrix is well described by its largest singular value.

**Rank 1 factorization - experiment**
All trajectories have equal shape - it depends only on the 3D motion.
The scaling factor depends on the 3D shape (relative depth).
3D shape and 3D motion are observed in a coupled way through the feature trajectories.

**Surface-based factorization**
• Orthographic projection (easily extended to scaled-orthographic and para-perspective projections)
• 2D motion in the image plane is affine
Relation between the parameters:
• Rank 1 factorization
Multi-frame SFM:
• Piecewise planar 3D shapes

**Surface-based factorization - experiment**
Figure panels: image sequence (smooth texture); image motion parameters.

**Surface-based factorization - experiment**
Figure panels: motion; shape.

**Weighted factorization**
Observation noise
• rank 1 factorization

**Weighted factorization - experiment**
Figure panels: feature trajectories; non-weighted estimates; weighted estimates; two components of translation; six entries of the rotation matrix.

**ML estimate of the 3D shape**
• Image motion:
Known motion parameters.
Affine mapping that depends only on the 3D motion.
• Define a sequence:
• Motion of the affine mapped sequence:
Unknown relative depth.
Shape of the trajectory of s (known from 3D motion).
Magnitude of the trajectory of s (unknown relative depth).
• Plug the 3D motion estimate into the ML cost function
• Estimating the relative depth after plugging in the 3D motion is more constrained than estimating the image motion
• Motivation for the minimization procedure

**Minimization procedure - multiresolution**
• Multiresolution continuation-type method
  • coarse-to-fine as more images are being taken into account
  • each stage minimizes the ML cost function by using a Gauss-Newton method
Components of the image gradient.
• Region R - constant relative depth z

**Experiment**
• Image sequence:
• and motion:
• Shape

**Experiment**
Affine mapped image sequence:
• Shape:

**Experiment**
Without smoothing.
Multiresolution continuation-type method.
Shape estimate:

**Experiment**
• Synthesizing different views:

**Application - video compression**
Figure panels: Original; Compressed 317:1; Compressed 575:1.
Texture patches JPEG compressed.

**Major contributions and extensions**
• Explicit modeling of occlusion
• Multiframe motion segmentation algorithm (two-step)
• Surface-based factorization
• Rank 1 factorization
• Weighted factorization
• extension: contour model
• extensions:
  • other projection models
  • multibody
  • occlusion
  • 3D deformable shape from a set of cameras
  • subspace constraints for image motion estimation
• Multiresolution algorithm for direct inference of 3D shape
• extension: parameterized surface model

**Experiment**
Multiresolution continuation-type method.
Shape estimate:
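
The rank 1 factorization listed among the contributions is described in the slides as using a fast algorithm that computes only the largest singular value. The slides do not say which algorithm that is, so the following is only a minimal sketch that assumes plain power iteration; the function name `rank1_factorization`, the helper `_normalize`, and the toy `motion` and `depth` vectors are hypothetical and chosen for illustration. The toy matrix mimics the observation that all trajectories share one shape (set by the 3D motion) and differ by a scaling factor (set by the relative depth).

```python
import math
import random


def _normalize(x):
    """Return x scaled to unit Euclidean norm, together with the original norm."""
    n = math.sqrt(sum(a * a for a in x))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [a / n for a in x], n


def rank1_factorization(R, iters=100, seed=0):
    """Dominant singular triple (sigma, u, v) of R by power iteration.

    Assumption: power iteration stands in for the unspecified fast
    algorithm mentioned in the slides. No full SVD is computed.
    """
    rows, cols = len(R), len(R[0])
    rng = random.Random(seed)
    v, _ = _normalize([rng.uniform(-1.0, 1.0) for _ in range(cols)])
    for _ in range(iters):
        # One step of power iteration on R^T R: v <- normalize(R^T (R v)).
        u = [sum(R[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(R[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        v, _ = _normalize(w)
    # Left singular vector and singular value from the converged v.
    u, sigma = _normalize(
        [sum(R[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    )
    return sigma, u, v


# Toy data (hypothetical): one trajectory shape per frame, one scale per feature.
motion = [1.0, 2.0, 2.0]   # common trajectory shape, set by the 3D motion
depth = [3.0, 4.0]         # per-feature scaling, set by the relative depth
R = [[m * z for z in depth] for m in motion]

sigma, u, v = rank1_factorization(R)
approx = [[sigma * ui * vj for vj in v] for ui in u]
```

Here `u` recovers the common trajectory shape and `sigma * v` the per-feature scaling, each up to a shared sign. With noisy data `R` is only approximately rank 1, and the product `sigma * u * v^T` is then its best rank 1 fit in the least-squares sense.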