Rigid Structure from Video

# Rigid Structure from Video

Presentation Transcript

Rigid Structure from Video

Pedro M. Q. Aguiar

Outline
• Motivation
• Segmentation of 2D rigid moving objects
• Inference of 3D rigid structure
• Other methods - limitations
• Proposed approach
• Problem formulation
• Algorithms
• Experiments
Content-based video representation

Applications: compression, non-linear editing, virtual reality, etc.

Motivation
• Video
• Generative Video (GV) [Jasinschi & Moura, 95]
• flat scenario
• flat moving objects
• PROBLEM: Segmentation of 2D rigid moving objects
• 3D content-based representation
• 3D rigid shape
• 3D motion
• PROBLEM: Inference of 3D rigid structure (shape and motion)
Motion segmentation in low texture

With low texture, segmentation fails!

• Two-frame motion-based segmentation
• No prior knowledge about shape, texture
• [Diehl, 91]

Time-consuming algorithms!

• Possible solution - smoothing
• Statistical regularization [Dubuisson & Jain, 95]
• Combine motion with other attributes [Bouthemy & François, 93]
• Proposed approach - exploit rigidity over a set of frames
• Explicit modeling of occlusion
• Feasible implementation of MLE
Observation model

background

camera window

camera position

camera position

object template

(modeling of occlusion)

object position

object texture

noise
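
The labels above can be read as a rendering rule: where the object template is 1 the pixel shows the object texture, elsewhere it shows the background as seen through the camera window, plus noise. A minimal 1D sketch of that reading follows. The function name, the integer translations, and the convention that the object position is measured in window coordinates are assumptions made for illustration, not taken from the slides.

```python
import random

def synthesize_frame(background, obj_texture, template, cam_pos, obj_pos,
                     width, noise_std=0.0, rng=None):
    """Render one 1D frame: the object occludes the background where template == 1."""
    rng = rng if rng is not None else random
    frame = []
    for x in range(width):
        u = x - obj_pos  # pixel coordinate in the object frame (assumed convention)
        on_object = 0 <= u < len(template) and template[u] == 1
        value = obj_texture[u] if on_object else background[x + cam_pos]
        frame.append(value + rng.gauss(0.0, noise_std))
    return frame

background = [10 * i for i in range(12)]  # world-fixed background texture
obj_texture = [5, 6, 7]
template = [1, 1, 0]                      # third object pixel is transparent
frame = synthesize_frame(background, obj_texture, template,
                         cam_pos=2, obj_pos=4, width=8)
```

With `noise_std=0.0` the frame shows the background everywhere except at the two pixels covered by the template.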

Maximum Likelihood estimation
• Given
• set of F frames
• Estimate
• background texture
• object texture
• object template
• camera motion
• object motion
• ML cost function

over all frames and pixels

• ML estimate
Minimization procedure
• ML estimation

quadratic in O and B

average of the observations,

after registration

• Object and background estimates

linear in T

average of the observations, in the

regions not occluded by the object

nonlinear in T

• Decouple the estimation of the position vectors
• Motion is estimated on a frame by frame basis [Bergen et al, 92]
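
The background estimate described above (average of the observations, after registration, in the regions not occluded by the object) can be sketched in 1D as follows. This assumes integer translations, known camera and object positions, and a known template. The function name and the toy data are hypothetical.

```python
def estimate_background(frames, cam_positions, obj_positions, template, bg_length):
    """Average, per background pixel, the registered observations not occluded by the object."""
    sums = [0.0] * bg_length
    counts = [0] * bg_length
    for frame, p, q in zip(frames, cam_positions, obj_positions):
        for x, value in enumerate(frame):
            u = x - q  # coordinate of this pixel in the object frame
            occluded = 0 <= u < len(template) and template[u] == 1
            if not occluded:
                sums[x + p] += value   # register the pixel to background coordinates
                counts[x + p] += 1
    # None marks background pixels that were never observed.
    return [s / c if c else None for s, c in zip(sums, counts)]

# Toy data: background 0, 10, ..., 70 and a two-pixel object with texture 99, 98.
frames = [[99, 98, 20, 30], [10, 20, 99, 98], [30, 99, 98, 60]]
background_estimate = estimate_background(
    frames, cam_positions=[0, 1, 3], obj_positions=[0, 2, 1],
    template=[1, 1], bg_length=8)
```

Pixels that the object covers in every frame, or that never enter the camera window, stay undetermined.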
Minimization procedure - two-step iterative method
• Replacing the estimates of O and B in the ML cost function

nonlinear minimization!

• Replacing only the estimate of B in the ML cost function
• minimize using a two-step iterative method:
• solve for O with T fixed
• solve for T with O fixed

(linear, closed-form solution)

Minimization procedure - segmentation matrix

Segmentation

matrix

• Template estimate
• Replacing only the estimate of B in the ML cost function

Accumulated differences between each pair of co-registered frames

Accumulated differences between each frame and the background

• regions where the test is inconclusive

with the available F frames

linear in T!
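
One way to read the template test is as a per-pixel comparison of two accumulated squared errors: how well the co-registered frames agree with each other, versus how well each frame agrees with the background. The 1D sketch below follows that reading. It assumes integer translations, known positions, and a given background; the function name and data are hypothetical, and the exact form of the segmentation matrix in the slides is not recoverable from the transcript.

```python
def estimate_template(frames, cam_positions, obj_positions, background, obj_width):
    """Per object pixel, decide whether the object or the background explains the data better."""
    template, texture = [], []
    for u in range(obj_width):
        # Observations of object pixel u after registering every frame to the object.
        samples = [frame[u + q] for frame, q in zip(frames, obj_positions)]
        mean = sum(samples) / len(samples)  # object texture estimate at pixel u
        # Spread about the mean: proportional to the accumulated squared
        # differences between all pairs of co-registered frames.
        object_cost = sum((s - mean) ** 2 for s in samples)
        # Accumulated squared differences between each frame and the background.
        background_cost = sum(
            (frame[u + q] - background[u + q + p]) ** 2
            for frame, p, q in zip(frames, cam_positions, obj_positions))
        template.append(1 if object_cost < background_cost else 0)
        texture.append(mean)
    return template, texture

background = [10 * i for i in range(10)]
frames = [[99, 98, 20, 30, 40, 50],
          [10, 20, 99, 98, 50, 60],
          [30, 99, 98, 60, 70, 80]]
template, texture = estimate_template(
    frames, cam_positions=[0, 1, 3], obj_positions=[0, 2, 1],
    background=background, obj_width=3)
```

When the two costs are equal, for example over a uniform background that matches the object, the test is inconclusive with the available frames; this sketch then defaults to 0.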

Experiment

moving object

three frames from the image sequence

background

Experiment

background estimate

Two-step method

template estimate

Experiment

background estimate

moving objects

four frames from a video sequence

3D structure from 2D video
• Motivation: 3D content-based video representation (application areas go well beyond digital video)
• Key step: recovery of 3D shape and 3D motion from an image sequence
• Strongest cue: motion of the brightness pattern
• Structure From Motion:
• Step 1. Compute the 2D motion on the image plane
• Step 2. Recover the 3D motion and the depth
Two-frame SFM - common problem
• Two-frame SFM fails when the object is far from the camera
• Solution: exploit rigidity - multi-frame SFM
• Multi-frame Structure From Motion:
• step 1. track feature points across a set of frames
• step 2. recover relative depth and set of 3D positions
Factorization method

expedite method

• Factorization [Tomasi & Kanade, 92]:
• uses linear subspace constraints
• 3D structure is estimated by factorizing a measurement matrix R whose entries are the trajectories of feature point projections
• without noise, R is rank 3. An SVD is used to factorize matrix R
• Multi-frame SFM - hard problem:
• non-linear
• large set of unknowns (due to the entire set of 3D positions)
• Problems:
• track a large set of features: computationally very heavy, if possible
• cost of SVD: high for large number of features or frames
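
As a reminder of why the rank is 3, the standard argument can be written out. The notation below is assumed for illustration: F frames, N feature points, orthographic projection, and trajectories registered by subtracting the translation, so that R is 2F × N.

```latex
R = M\,S,
\qquad M \in \mathbb{R}^{2F \times 3},
\quad S \in \mathbb{R}^{3 \times N}
\quad\Longrightarrow\quad \operatorname{rank}(R) \le 3,
\qquad
R \approx U_3\,\Sigma_3\,V_3^{T}
```

Here M stacks the first two rows of the rotation matrix of each frame, S holds the 3D coordinates of the feature points, and the truncated SVD keeps the three largest singular values. The factors M and S are then read off from the truncated SVD up to an invertible 3 × 3 matrix, which the orthonormality of the rotation rows resolves.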
Proposed approach: surface-based factorization
• Describe the 3D shape by a local parameterization
• This induces a parametric description for the 2D motion in the image plane
• Recover the 3D shape and 3D motion parameters from the 2D motion parameters by further exploiting linear subspace constraints:
• surface-based factorization
• rank 1 factorization: uses a fast algorithm to compute only the largest singular value
• weighted factorization: computes the weighted estimate without additional computational cost
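
The slides do not say which fast algorithm is used for the largest singular value. Power iteration is one standard choice, and the sketch below uses it as a stand-in. The function name and the toy matrix are hypothetical.

```python
import math

def rank1_factorization(R, iterations=100):
    """Return (sigma, u, v) with R ~ sigma * u * v^T, by power iteration on R^T R."""
    rows, cols = len(R), len(R[0])
    # Start vector; it must not be orthogonal to the dominant right singular vector.
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        u = [sum(R[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = R v
        w = [sum(R[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # w = R^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(R[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))  # largest singular value
    u = [x / sigma for x in u]
    return sigma, u, v

# Exactly rank 1 toy matrix: outer product of (1, 2, 2) and (3, 4).
R = [[3.0, 4.0], [6.0, 8.0], [6.0, 8.0]]
sigma, u, v = rank1_factorization(R)
reconstruction = [[sigma * ui * vj for vj in v] for ui in u]
```

Each iteration costs two matrix-vector products, which is far cheaper than a full SVD when only the dominant singular triplet is needed.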
Maximum Likelihood formulation

rather than the two components of the motion, local depth is a single unknown

• Observations: the images in the sequence. Unknowns: object texture, 3D shape, 3D motion
• Through ML, 3D structure is recovered:
• Exploiting object rigidity over a set of frames
• Directly from the image intensity values

So, where do SFM and factorization come from?

• Minimization procedure :
• Minimize with respect to the texture in terms of 3D shape and 3D motion
• After replacing the texture estimate, the ML cost function depends on the 3D structure only through the 2D motion in the image plane
• Estimate 3D motion by inferring SFM (factorization). Plug in the 3D motion estimates
• Minimize the ML cost function with respect to the relative depth
• Local 2D motion estimation is ill-posed - aperture problem. Direct methods:
• Infer 3D structure by using the brightness change constraint between two frames [Horn & Weldon, 88]
• Kalman filter to update estimates over time [J. Heel, 90]
Observation model
• Observation model

texture

shape

3D position

• Unknowns:
Texture estimate
• Texture estimate - weighted average
• ML estimate
SFM as an approximation to MLE
• The ML cost function depends on the 3D structure only through the 2D motion induced in the image plane (no approximations involved)
• Insert the texture estimate into the cost function
• 3D structure estimation:
• 3D motion estimation:
• Compute 2D motion
• SFM: rank 1 surface-based factorization
• 3D shape estimation:
• Plug in the 3D motion estimate into the ML cost function
• Then, minimize with respect to the shape
• (The estimates can be refined by minimizing the ML cost function in two alternating steps, but initialization is the key problem)
Feature-based SFM

Translation estimate:

Define:

Rank 1 factorization
• Decomposition (minimize without constraints)

Define:

• Normalization (computes by approximating the constraints)

Define:

Rank 1 factorization - experiment

three largest singular values of R

matrix is well described

by its largest singular value

Rank 1 factorization - experiment

all trajectories have equal shape - it depends only on the 3D motion. The scaling factor depends on the 3D shape (relative depth)

3D shape and 3D motion are observed in a coupled way through the feature trajectories

Surface-based factorization
• Orthographic projection

(easily extended to scaled-orthographic and para-perspective projections)

• 2D motion in the image plane is affine

Relation between the parameters:

• Rank 1 factorization

Multi-frame SFM:

• Piecewise planar 3D shapes
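
The claim that planar patches move affinely in the image under orthographic projection can be checked numerically. In the sketch below the plane is z = a·x + b·y + c, with (x, y) taken as the image coordinates in a reference frame. The parameter names, the rotation convention, and the test values are assumptions for illustration.

```python
import math

def rotation(alpha, beta):
    """Rotation about the x axis by alpha, followed by rotation about the y axis by beta."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    rx = [[1, 0, 0], [0, ca, -sa], [0, sa, ca]]
    ry = [[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]]
    return [[sum(ry[i][k] * rx[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def project(R, t, point):
    """Orthographic projection of a rotated and translated 3D point."""
    x, y, z = point
    return tuple(R[i][0] * x + R[i][1] * y + R[i][2] * z + t[i] for i in range(2))

def affine_parameters(R, t, plane):
    """2D affine map (A, d) induced on the image by the plane z = a*x + b*y + c."""
    a, b, c = plane
    A = [[R[i][0] + a * R[i][2], R[i][1] + b * R[i][2]] for i in range(2)]
    d = [c * R[i][2] + t[i] for i in range(2)]
    return A, d

R = rotation(0.2, -0.3)
t = (1.0, -2.0)
plane = (0.5, -0.25, 3.0)
A, d = affine_parameters(R, t, plane)
points = [(x, y, plane[0] * x + plane[1] * y + plane[2])
          for x in (-1.0, 0.0, 2.0) for y in (-1.0, 1.5)]
max_error = max(
    abs(project(R, t, p)[i] - (A[i][0] * p[0] + A[i][1] * p[1] + d[i]))
    for p in points for i in range(2))
```

The six affine parameters depend on the 3D motion (R, t) and on the three plane parameters, which is the kind of relation between parameters the slide refers to.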
Surface-based factorization - experiment

smooth texture

image motion

parameters

image sequence

Weighted factorization

observation noise

• rank 1 factorization
Weighted factorization - experiment

non-weighted estimates

weighted estimates

two components of translation

six entries of the rotation matrix

feature trajectories

ML estimate of the 3D shape
• Image motion:

known motion parameters

affine mapping that depends only on the 3D motion

• Define a sequence:
• Motion of the affine mapped sequence:

unknown relative depth

shape of the trajectory of s (known from 3D motion)

magnitude of the trajectory of s (unknown relative depth)

• Plug in the 3D motion estimate into the ML cost function
• Estimating the relative depth after plugging in the 3D motion is more constrained than estimating the image motion
• Motivation for the minimization procedure
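
To see why the depth problem is more constrained, consider a simplified version that works on displacements rather than on image intensities (the slides estimate depth directly from intensities, so this is only an illustration). If the shape of a trajectory is known from the 3D motion, the only unknown is its magnitude, and least squares gives it in closed form. All names and values below are hypothetical.

```python
def estimate_relative_depth(shapes, displacements):
    """Least-squares scale z such that displacement_f ~ z * shape_f for every frame f."""
    num = sum(ax * dx + ay * dy
              for (ax, ay), (dx, dy) in zip(shapes, displacements))
    den = sum(ax * ax + ay * ay for ax, ay in shapes)
    return num / den

# Trajectory shape per frame (known from the 3D motion) and observed displacements.
shapes = [(0.1, 0.0), (0.2, -0.1), (0.3, -0.1)]
true_depth = 2.5
displacements = [(true_depth * ax, true_depth * ay) for ax, ay in shapes]
depth = estimate_relative_depth(shapes, displacements)
```

A single scalar is fitted to 2F measurements, instead of 2F free motion components.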
Minimization procedure - multiresolution
• Multiresolution continuation-type method
• coarse-to-fine as more images are being taken into account
• each stage minimizes the ML cost function by using a Gauss-Newton method

components of the image gradient

• Region R - constant relative depth z
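
A 1D sketch of the idea follows: a region with constant relative depth z, a Gauss-Newton update built from the image gradient, and a continuation loop that brings in one more frame at each stage. It assumes a known analytic texture and known per-frame motion coefficients, and it does not model the change of image resolution. None of the names or values come from the slides.

```python
import math

def texture(x):
    """Assumed known 1D texture."""
    return math.sin(0.2 * x) + 0.5 * math.sin(0.05 * x + 1.0)

def texture_gradient(x):
    """Derivative of the texture, playing the role of the image gradient."""
    return 0.2 * math.cos(0.2 * x) + 0.025 * math.cos(0.05 * x + 1.0)

def gauss_newton_depth(frames, motions, z, pixels, iterations=10):
    """Refine the depth z of a region by Gauss-Newton on the squared intensity residuals."""
    for _ in range(iterations):
        jr = jj = 0.0
        for frame, a in zip(frames, motions):
            for x in pixels:
                residual = frame[x] - texture(x - z * a)
                jacobian = a * texture_gradient(x - z * a)  # d(residual)/dz
                jr += jacobian * residual
                jj += jacobian * jacobian
        if jj == 0.0:
            break
        z -= jr / jj
    return z

pixels = range(100)
true_depth = 2.0
motions = [0.5, 1.0, 1.5]  # known motion coefficient of each frame
frames = [[texture(x - true_depth * a) for x in pixels] for a in motions]

# Continuation: start with one frame (small displacement), then add frames one by one.
z = 0.0
for count in range(1, len(frames) + 1):
    z = gauss_newton_depth(frames[:count], motions[:count], z, pixels)
```

Starting with the frames that show the smallest displacement keeps each Gauss-Newton stage close to its solution, which is the point of a continuation scheme.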
Experiment
• Image sequence:
• and motion:
• Shape
Experiment

Affine mapped image sequence:

• Shape:
Experiment

without smoothing

Multiresolution continuation-type method. Shape estimate:

Experiment
• Synthesizing different views:
Application - video compression

Original

Compressed 317:1

Compressed 575:1

Texture patches JPEG compressed

Major contributions and extensions
• Explicit modeling of occlusion
• Multiframe motion segmentation algorithm (two-step)
• Surface-based factorization
• Rank 1 factorization
• Weighted factorization
• extension: contour model
• extensions:
• other projection models
• multibody
• occlusion
• 3D deformable shape from a set of cameras
• subspace constraints for image motion estimation
• Multiresolution algorithm for direct inference of 3D shape
• extension: parameterized surface model
Experiment

Multiresolution continuation-type method. Shape estimate: