
Rigid Structure from Video

Pedro M. Q. Aguiar


• Other methods - limitations

• Proposed approach

• Problem formulation

• Algorithms

• Experiments

• Motivation

• Segmentation of 2D rigid moving objects

• Inference of 3D rigid structure

Applications of content-based video representation: compression, non-linear editing, virtual reality, etc.

Motivation

• Video

• Generative Video (GV) [Jasinschi & Moura, 95]

• flat scenario

• flat moving objects

• PROBLEM: Segmentation of 2D rigid moving objects

• 3D content-based representation

• 3D rigid shape

• 3D motion

• PROBLEM: Inference of 3D rigid structure (shape and motion)

With low texture, segmentation fails!

• Two-frame motion-based segmentation

• No prior knowledge about shape, texture

• [Diehl, 91]

Time-consuming algorithms!

• Possible solution - smoothing

• Statistical regularization [Dubuisson & Jain, 95]

• Combine motion with other attributes [Bouthemy & François, 93]

• Proposed approach - exploit rigidity over a set of frames

• Explicit modeling of occlusion

• Feasible implementation of MLE

Equation labels: background, camera window, camera position, object template (modeling of occlusion), object position, object texture, noise

• Given

• set of F frames

• Estimate

• background texture

• object texture

• object template

• camera motion

• object motion

• ML cost function

over all frames and pixels

• ML estimate

• ML estimation

quadratic in O and B

average of the observations, after registration

• Object and background estimates

linear in T

average of the observations in the regions not occluded by the object

nonlinear in T
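The background estimate described above (an average of the registered observations over the pixels the object does not occlude) can be sketched as follows. This is only an illustration under strong assumptions, not the author's implementation: the camera motion is taken to be a known integer translation, registration uses circular shifts, and the occlusion mask of each frame is given. The name `estimate_background` and all test data are hypothetical.

```python
import numpy as np

def estimate_background(frames, camera_shifts, object_masks):
    """Average co-registered frames over pixels not occluded by the object.

    frames:        (F, H, W) array of observed images
    camera_shifts: F integer (dy, dx) camera translations, assumed known
    object_masks:  (F, H, W) boolean array, True where the object hides
                   the background in the observed frame
    """
    total = np.zeros(frames.shape[1:], dtype=float)
    count = np.zeros(frames.shape[1:], dtype=float)
    for frame, (dy, dx), mask in zip(frames, camera_shifts, object_masks):
        # Undo the camera motion (circular shift: a simplification).
        reg_frame = np.roll(frame, shift=(-dy, -dx), axis=(0, 1))
        reg_mask = np.roll(mask, shift=(-dy, -dx), axis=(0, 1))
        visible = ~reg_mask
        total[visible] += reg_frame[visible]
        count[visible] += 1
    # Pixels that are occluded in every frame stay undetermined (NaN).
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)

# Synthetic example: a 2x2 object moving two pixels per frame over a
# background seen through a translating camera window.
rng = np.random.default_rng(0)
background = rng.random((8, 8))
camera_shifts = [(0, 0), (0, 1), (1, 2)]
frames, masks = [], []
for f, (dy, dx) in enumerate(camera_shifts):
    frame = np.roll(background, shift=(dy, dx), axis=(0, 1))
    mask = np.zeros((8, 8), dtype=bool)
    mask[2:4, 2 * f + 1:2 * f + 3] = True
    frame[mask] = 1.0  # hypothetical object texture
    frames.append(frame)
    masks.append(mask)

estimate = estimate_background(np.array(frames), camera_shifts, np.array(masks))
```

In this noise-free example every background pixel is visible in at least one frame, so the average recovers the background. If the object moved together with the camera window, some pixels would be occluded in all frames and stay undetermined.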

• Decouple the estimation of the position vectors

• Motion is estimated on a frame by frame basis [Bergen et al, 92]

• Replacing the estimates of O and B in the ML cost function

nonlinear minimization!

• Replacing only the estimate of O in the ML cost function

• minimize using a two-step iterative method:

• solve for B with fixed T (quadratic, closed-form solution)

• solve for T with fixed B (linear, closed-form solution)

Segmentation

matrix

• Template estimate

• Replacing only the estimate of O in the ML cost function

Accumulated differences between each pair of co-registered frames

Accumulated differences between each frame and the background

• regions where the test is inconclusive

with the available F frames

linear in T !

moving object

three frames from the image sequence

background

background estimate

Two-step method

template estimate

background estimate

moving objects

four frames from a video sequence

• Motivation: 3D content-based video representation (application areas go well beyond digital video)

• Key step: recovery of 3D shape and 3D motion from an image sequence

• Strongest cue: motion of the brightness pattern

• Structure From Motion:

• Step 1. Compute the 2D motion on the image plane

• Step 2. Recover the 3D motion and the depth

• step 1. track feature points across a set of frames

• step 2. recover relative depth and set of 3D positions

• Two-frame SFM fails when the object is far from the camera

3D

• Solution: exploit rigidity - multi-frame SFM

• Multi-frame Structure From Motion:

expedite method

• Factorization [Tomasi & Kanade, 92]:

• uses linear subspace constraints

• 3D structure is estimated by factorizing a measurement matrix R whose entries are the trajectories of feature point projections

• without noise, R is rank 3. An SVD is used to factorize matrix R
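The rank 3 property of the measurement matrix can be checked numerically with a small sketch. The setup below is an assumption made for illustration (synthetic random shape, random rotations, orthographic projection, no noise), and the metric normalization of the original factorization method is omitted, so the two factors are recovered only up to an invertible 3x3 matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
F, P = 6, 20                          # number of frames and feature points
shape = rng.standard_normal((3, P))   # synthetic 3D feature positions

rows = []
for f in range(F):
    # Random rotation: orthonormalize a random matrix, fix the sign of det.
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]
    # Orthographic projection keeps the first two rows; add a 2D translation.
    rows.append(Q[:2] @ shape + rng.standard_normal((2, 1)))
R = np.vstack(rows)                   # 2F x P matrix of feature trajectories

# Subtracting each row's mean removes the translation component.
R_centered = R - R.mean(axis=1, keepdims=True)

U, s, Vt = np.linalg.svd(R_centered, full_matrices=False)
motion = U[:, :3] * np.sqrt(s[:3])              # 2F x 3 factor
structure = np.sqrt(s[:3])[:, None] * Vt[:3]    # 3 x P factor
```

Without noise, the fourth singular value is zero up to rounding and the product `motion @ structure` reproduces the centered measurement matrix.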

• Multi-frame SFM - hard problem:

• non-linear

• large set of unknowns (due to the entire set of 3D positions)

• Problems:

• track a large set of features: computationally very heavy, if possible at all

• cost of SVD: high for large number of features or frames

• Induces a parametric description for the 2D motion in the image plane

• Recover the 3D shape and 3D motion parameters from the 2D motion parameters by further exploiting linear subspace constraints:

• surface-based factorization

• rank 1 factorization

• weighted factorization

uses a fast algorithm to compute only the largest singular value

computes the weighted estimate without additional computational cost
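One common way to compute only the largest singular value is power iteration. The sketch below uses it as a stand-in: the slides do not say which fast algorithm is used, so this choice, the function name `largest_singular_triplet`, and the synthetic rank 1 data are all assumptions.

```python
import numpy as np

def largest_singular_triplet(A, iters=200, tol=1e-12, seed=0):
    """Power iteration for the dominant singular value and vectors of A.

    Assumes A is nonzero. Returns (sigma, u, v) with A approximately
    equal to sigma * outer(u, v) when A is close to rank 1.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    sigma = 0.0
    for _ in range(iters):
        u = A @ v
        sigma_new = np.linalg.norm(u)
        u /= sigma_new
        v = A.T @ u
        v /= np.linalg.norm(v)
        converged = abs(sigma_new - sigma) <= tol * sigma_new
        sigma = sigma_new
        if converged:
            break
    return sigma, u, v

# Synthetic rank 1 matrix plus small noise: one common trajectory shape
# scaled by a different factor (standing in for relative depth) per column.
rng = np.random.default_rng(3)
trajectory_shape = rng.standard_normal(12)
relative_depth = rng.uniform(0.5, 2.0, size=30)
A = np.outer(trajectory_shape, relative_depth)
A += 1e-3 * rng.standard_normal(A.shape)

sigma, u, v = largest_singular_triplet(A)
```

Each iteration costs two matrix-vector products, which is much cheaper than a full SVD when the matrix has many rows or columns.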

• Describe the 3D shape by a local parameterization

rather than the two components of the motion, local depth is a single unknown

• Observations: the images in the sequence. Unknowns: object texture, 3D shape, 3D motion

• Through ML, 3D structure is recovered:

• Exploiting object rigidity over a set of frames

• Directly from the image intensity values

so, where do SFM and factorization come from ?

• Minimization procedure :

• Minimize with respect to the texture in terms of 3D shape and 3D motion

• After replacing the texture estimate, the ML cost function depends on the 3D structure only through the 2D motion in the image plane

• Estimate 3D motion by inferring SFM (factorization). Plug in the 3D motion estimates

• Minimize the ML cost function with respect to the relative depth

• Local 2D motion estimation is ill-posed - aperture problem. Direct methods:

• Infer 3D structure by using the brightness change constraint between two frames [Horn & Weldon, 88]

• Kalman filter to update estimates over time [J. Heel, 90]

• Observation model

texture

shape

3D position

• Unknowns:

• Texture estimate - weighted average

• ML estimate

• The ML cost function depends on the 3D structure only through the 2D motion induced in the image plane (no approximations involved)

• Insert the texture estimate into the cost function

• 3D structure estimation:

• 3D motion estimation:

• Compute 2D motion

• SFM: rank 1 surface-based factorization

• 3D shape estimation:

• Plug the 3D motion estimate into the ML cost function

• Then, minimize with respect to the shape

• (The estimates can be refined by minimizing the ML cost function in two alternating steps, but initialization is the key problem)

Translation estimate:

Define:

• Decomposition (minimize without constraints)

Define:

• Normalization (computes by approximating the constraints)

Define:

three largest singular values of R

matrix is well described by its largest singular value

all trajectories have equal shape - it depends only on the 3D motion. The scaling factor depends on the 3D shape (relative depth)

3D shape and 3D motion are observed in a coupled way through the feature trajectories

• Orthographic projection

(easily extended to scaled-orthographic and para-perspective projections)

• 2D motion in the image plane is affine

Relation between the parameters:
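For a planar patch, one way to write such a relation can be checked numerically. The sketch below is a derivation under stated assumptions, not the formula from the original slide: the patch is z = c0 + c1*x + c2*y in the object frame, the first frame sees (x, y) directly, and a later frame applies a rotation and a 2D translation before orthographic projection. All names and values are hypothetical.

```python
import numpy as np

def rotation(alpha, beta):
    """Rotation about the x axis by alpha, then about the y axis by beta."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, ca, -sa], [0.0, sa, ca]])
    ry = np.array([[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]])
    return ry @ rx

rng = np.random.default_rng(2)

# Planar patch: z = c0 + c1*x + c2*y (shape parameters).
c0, c1, c2 = 2.0, 0.3, -0.5
xy = rng.standard_normal((50, 2))
z = c0 + c1 * xy[:, 0] + c2 * xy[:, 1]
points = np.column_stack([xy, z])

# 3D motion seen by an orthographic camera: six entries of the rotation
# matrix (its first two rows) and two components of translation.
rot = rotation(0.2, -0.35)
t = np.array([0.4, -0.2])
projected = points @ rot[:2].T + t

# Affine image motion predicted from the shape and motion parameters:
# substituting z into the projection gives a 2x2 matrix and an offset.
A = rot[:2, :2] + np.outer(rot[:2, 2], [c1, c2])
d = rot[:2, 2] * c0 + t
predicted = xy @ A.T + d
```

The arrays `projected` and `predicted` agree, so for each planar patch the six affine motion parameters in `A` and `d` carry the information about the 3D motion and the patch shape.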

• Rank 1 factorization

Multi-frame SFM:

• Piecewise planar 3D shapes

smooth texture

image motion parameters

image sequence

motion

shape

observation noise

• rank 1 factorization

non-weighted estimates

weighted estimates

two components of translation

six entries of the rotation matrix

feature trajectories

• Image motion:

known motion parameters

affine mapping that depends only on the 3D motion

• Define a sequence:

• Motion of the affine mapped sequence:

unknown relative depth

shape of the trajectory of s (known from 3D motion)

magnitude of the trajectory of s (unknown relative depth)

• Plug the 3D motion estimate into the ML cost function

• Estimating the relative depth after plugging-in the 3D motion is more constrained than estimating the image motion

• Motivation for the minimization procedure

• Multiresolution continuation-type method

• coarse-to-fine as more images are being taken into account

• each stage minimizes the ML cost function by using a Gauss-Newton method
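A single Gauss-Newton stage for one scalar unknown can be sketched on a 1D toy problem. Everything here is an assumption made for illustration: the texture is a known analytic signal, frame f shows it shifted by z * a[f], the factors a[f] stand in for what the 3D motion would provide, and z plays the role of the unknown relative depth of a region. The multiresolution, coarse-to-fine part of the method is not shown.

```python
import numpy as np

def texture(x):
    """Hypothetical smooth 1D texture."""
    return np.sin(x) + 0.5 * np.sin(3.0 * x)

def texture_dx(x):
    """Derivative of the texture (the 1D analogue of the image gradient)."""
    return np.cos(x) + 1.5 * np.cos(3.0 * x)

def gauss_newton_depth(frames, x, a, z0=0.0, iters=20):
    """Minimize sum_f ||frame_f - texture(x - z * a_f)||^2 over scalar z."""
    z = z0
    for _ in range(iters):
        num = 0.0
        den = 0.0
        for frame, a_f in zip(frames, a):
            predicted = texture(x - z * a_f)
            residual = frame - predicted
            jac = -a_f * texture_dx(x - z * a_f)  # d(predicted)/dz
            num += jac @ residual
            den += jac @ jac
        z += num / den  # Gauss-Newton update for a single unknown
    return z

x = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
a = np.array([0.1, 0.2, 0.3, 0.4])  # known per-frame motion factors
z_true = 0.8
frames = [texture(x - z_true * a_f) for a_f in a]

z_hat = gauss_newton_depth(frames, x, a)
```

Because all frames constrain the same single unknown, the update pools gradient information over every frame and pixel of the region, which is the sense in which estimating the depth is more constrained than estimating the image motion frame by frame.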

components of the image gradient

• Region R - constant relative depth z

• Image sequence:

• and motion:

• Shape

Affine mapped image sequence:

• Shape:

without smoothing

Multiresolution continuation-type method. Shape estimate:

• Synthesizing different views:

Original

Compressed 317:1

Compressed 575:1

Texture patches JPEG compressed

• Explicit modeling of occlusion

• Multiframe motion segmentation algorithm (two-step)

• Surface-based factorization

• Rank 1 factorization

• Weighted factorization

• extension: contour model

• extensions:

• other projection models

• multibody

• occlusion

• 3D deformable shape from a set of cameras

• subspace constraints for image motion estimation

• Multiresolution algorithm for direct inference of 3D shape

• extension: parameterized surface model
