Layered scene representations
Download
1 / 87

Layered Scene Representations - PowerPoint PPT Presentation


  • 133 Views
  • Uploaded on

Layered Scene Representations. Vision for Graphics CSE 590SS, Winter 2001 Richard Szeliski. Motion representations. How can we describe this scene?. Block-based motion prediction. Break image up into square blocks Estimate translation for each block

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Layered Scene Representations' - amethyst-webster


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Layered scene representations

Layered Scene Representations

Vision for GraphicsCSE 590SS, Winter 2001Richard Szeliski


Motion representations
Motion representations

  • How can we describe this scene?

Vision for Graphics


Block based motion prediction
Block-based motion prediction

  • Break image up into square blocks

  • Estimate translation for each block

  • Use this to predict next frame, code difference (MPEG-2)

Vision for Graphics


Layered motion
Layered motion

  • Break image sequence up into “layers”:

  •  =

  • Describe each layer’s motion

Vision for Graphics


Outline
Outline

  • Why layers?

  • 2-D layers [Wang & Adelson 94; Weiss 97]

  • 3-D layers [Baker et al. 98]

  • Layered Depth Images [Shade et al. 98]

  • Transparency [Szeliski et al. 00]

  • Bayesian estimation [Torr et al. 99]

Vision for Graphics


Layered motion1
Layered motion

  • Advantages:

  • can represent occlusions / disocclusions

  • each layer’s motion can be smooth

  • video segmentation for semantic processing

  • Difficulties:

  • how do we determine the correct number?

  • how do we assign pixels?

  • how do we model the motion?

Vision for Graphics


Layers for video summarization
Layers for video summarization

Vision for Graphics


Background modeling mpeg 4
Background modeling (MPEG-4)

  • Convert masked images into a background sprite for layered video coding

  • + + +

  • =

Vision for Graphics


What are layers
What are layers?

  • [Wang & Adelson, 1994]

  • intensities

  • alphas

  • velocities

Vision for Graphics


How do we composite them
How do we composite them?

Vision for Graphics


How do we form them
How do we form them?

Vision for Graphics


How do we form them1
How do we form them?

Vision for Graphics


How do we estimate the layers
How do we estimate the layers?

  • compute coarse-to-fine flow

  • estimate affine motion in blocks (regression)

  • cluster with k-means

  • assign pixels to best fitting affine region

  • re-estimate affine motions in each region…

Vision for Graphics


Layer synthesis
Layer synthesis

  • For each layer:

  • stabilize the sequence with the affine motion

  • compute median value at each pixel

  • Determine occlusion relationships

Vision for Graphics


Results
Results

Vision for Graphics


What if the motion is not affine
What if the motion is not affine?

  • Use a “regularized” (smooth) motion field

  • [Weiss, CVPR’97]

Vision for Graphics


A layered approach to stereo reconstruction

A Layered Approach To Stereo Reconstruction

Simon Baker, Richard Szeliski and P. Anandan

CVPR’98


Volumetric approaches to stereo

z

y

x

Camera 2

Camera 1

Volumetric Approaches to Stereo

  • Examples:

    • Disparity-Spaces [Intille and Bobick, ‘94] [Scharstein and Szeliski, ‘96]

    • Space-Coloring [Seitz and Dyer, ‘97]

    • Maximum-Flow Stereo [Roy and Cox, ‘98]

  • Advantages:

    • Modeling occlusions [Intille and Bobick, ‘94]

    • Mixed pixels + transparency [Szeliski and Golland, ‘98]

    • Equal treatment of many images [Collins, ‘96]

Vision for Graphics


2 5 d layered approach

Layer 1

Layer 2

Layer 3

Camera 2

Camera 1

2.5-D Layered Approach

  • Additional advantages over volumetric approaches:

    • Fewer degrees of freedom

    • Less resampling artifacts

    • Robustness of global model + local correction

      • c.f. “Plane + Parallax” and “Model-Based Stereo”

    • Output particularly suitable for certain applications

      • e.g. Image-based rendering and interactive editing

Vision for Graphics


Layered stereo

layers (“sprites”)

Layered Stereo

  • Use arbitrarily oriented sprites

  • Estimate 3D plane equation for each sprite

Vision for Graphics


Layer representation

World point

Plane vector

n= (n , n , n , n )

T

x= (x, y, z, 1)

T

x

y

z

d

l

Plane equation

n . x = 0

l

u

Layer sprite

L = (a . r , a . g , a . b , a)

v

l

T

(u, v, 1)

World origin

= Q

x

l

Residual depth Z

l

Coordinate frame defined by

u =

Layer Representation

Vision for Graphics


Image formation

Image I

k

v

u

Camera P

k

Image Formation

Layer l

Scene

v

u

Boolean mask B

k l

v

u

Masked image M

k l

Vision for Graphics


Overview

Input: Images I & Cameras P

k

k

Initialize layer assignment B

kl

Estimate plane vectors n

Re-assign pixelslayers B

l

kl

Estimate residual depth Z

Estimate sprite images Ll

l

Refine Layer Sprites L

l

Output: n , L , & Z

l

l

l

Overview

Vision for Graphics


Layer initialization alternatives
Layer Initialization Alternatives

  • Iterate dominant motion estimation

    • e.g. [Irani et al., ‘95]

  • Apply simple stereo algorithm + fit planes

  • Color segmentation

    • e.g. [Sawhney and Ayer, ‘94]

  • Human initialization

    • e.g. [Debevec et al., ‘96]

Vision for Graphics


Estimation of plane equations

M

M

M

M

M

M

M

kl

kl

kl

jl

il

jl

jl

o

o

o

l

l

l

l

l

Warped images , , … functions of n only

H

H

H

H

H

ik

ik

ij

ij

ij

l

Minimize image variance using hierarchical gradient descent

Estimation of Plane Equations

Layer l

l

H

ik

Camera P

k

Camera P

j

o

Camera P

i

Vision for Graphics


Estimation of layer sprites

M

M

M

kl

il

jl

“Blend” the masked images, warped onto the layer plane

Estimation of Layer Sprites

Plane n

l

Camera P

k

Camera P

j

Camera P

i

Vision for Graphics


Estimation of residual depth
Estimation of Residual Depth

  • Per-pixel residual depth estimation

    • plane plus parallax[Anandan et al.]

    • model-based stereo[Debevec et al.]

    • better accuracy / fidelity

    • makes forward warping more difficult

Vision for Graphics


Estimation of residual depth1

T

Perturbed Plane n + (0,0,0,d)

l

M

il

Estimation of Residual Depth

  • Warp masked images onto perturbed plane

  • Compute variance image

  • For each pixel, choose d that minimizes variance

  • Smooth, incorporating confidence weighting [Szeliski & Golland, ‘98]

  • Recompute sprite using “Plane + Parallax” warp

Camera P

k

Camera P

M

jl

M

j

kl

Camera P

i

Vision for Graphics


Pixel assignment
Pixel Assignment

Sprite L

l

  • Warp masked image onto

  • each layer plane

Plane n

l

  • Compute difference images

  • Un-warp difference images

  • For each pixel, choose the

  • best difference across layers

Un-warped

difference

image

  • Smooth pixel assignment

M

il

Camera P

i

Vision for Graphics


Flower garden results

Image 1

Image 9

Grey coded planar depth

Initial Segmentation

Flower Garden Results

Vision for Graphics


Flower garden results1
Flower Garden Results

Recovered Sprite: Without residual depth estimation

Recovered Sprite: With residual depth estimation

Vision for Graphics


Graphics symposium results

Image 1 of 5

Initial segmentation

Grey coded planar depth

Residual depth

Graphics Symposium Results

Vision for Graphics


Graphics symposium results1
Graphics Symposium Results

  • Resulting sprite collection

Vision for Graphics


Graphics symposium results2

Original image 3

Re-synthesized image 3

Novel view without residual depth

Novel view with residual depth

Graphics Symposium Results

Vision for Graphics


Layered stereo demo
Layered Stereo Demo

  • SpriteViewer: renders sprites with depth

Vision for Graphics


Discussion
Discussion

  • Layer initialization:

    • Can tolerate bad initial plane estimates

    • Residual depth estimation:

      • Plane sweep algorithm, similar to

        [Szeliski and Golland, ‘98]

  • Pixel assignment:

    • Combine color and residual depth estimates

    • Currently under investigation

Vision for Graphics


Summary
Summary

  • New approach to stereo matching:

    • represent scene as collection of layers

    • each layer has a 3-D plane equation, an alpha-matted color image, and an optional residual depth

    • generalizes layered motion to 3-D

  • Computation:

    • plane eqns. by warping mosaics of masked images

    • residual depth by perturbing planes

    • iteratively refine color values and pixel assignments

Vision for Graphics


Layered depth images

Layered Depth Images

Jonathan Shade Steven Gortler

Li-wei He Richard Szeliski

SIGGRAPH’98


How to render a layer parallax
How to render a layer + parallax?

  • Can’t use inverse warping [Laveau 94]

Vision for Graphics


3d sprites with depth
3D Sprites with Depth

  • 3D sprite consists of:

  • alpha-matted image I1(x1,y1)

  • 4×4 camera matrix C1[ w1x1 w1y1 w1d1w1]T = C1 [X Y Z 1]T

  • plane equation AX + BY + CZ + D = 0(forms third row of C1 )

  • optional per-pixel depth d1(x1,y1)

Vision for Graphics


Sprites with depth
Sprites with Depth

  • Store d1(x1,y1) (scaled displacement) along with each sprite image I1(x1,y1)

  • I1 d1 I1 d1

Vision for Graphics


3d sprites reprojection
3D Sprites — Reprojection

  • sprites new view

  • use standard texture mapping (projective warp)

Vision for Graphics


Forward mapping

Forward Mapping

  • Mapping equation with per-pixel depth d1:[ w2x2 w2y2 w2 ]T = H1,2 [ x1 y1 1 ]T +d1 e1,2

I1 d1(I2 ) I2

  • Problems: gaps and aliasing

Vision for Graphics


Inverse mapping

Inverse Mapping

  • Reverse order of images 1 & 2:[ w1x1 w1y1 w1 ]T = H2,1 [ x2 y2 1 ]T +d2 e2,1

I1(I2)d2 I2

  • Problem: we don’t know d2!

Vision for Graphics


Crude perspective map
Crude perspective map

  • How to map d1 d2?

  • Simple idea: use perspective transform H2,1

I1 d1 d2 I2

  • Works well for small amounts of motion

Vision for Graphics


Better forward map
Better forward map

  • How to map d1 d2?

  • Better idea: use full H1,2x1+d1e1,2 fwd. map 

I1 d1 d2 I2

  • Works better for moderate amounts of motion

Vision for Graphics


2 pass mapping
2-pass Mapping

  • Why is 2-pass mapping (d1 d2 forward followed by I1 I2 backward) a good idea?

    • can tolerate bigger errors in d1 mapping (since d1 is typically smooth)

    • can store/process d1 at lower resolution

    • can use better filtering on color image

Vision for Graphics


Sprites with depth demo
Sprites with Depth — Demo

  • Demo

Vision for Graphics


Refinements
Refinements

  • Only forward map d1 with parallax component

  • Use affine approximation to parallax flow

  • Better gap filling

  • Forward map (u,v)flow instead of d1 depth

Vision for Graphics


Layered depth images ldis
Layered Depth Images (LDIs)

  • Store multiple (color,z) values at each pixel

  • Similar to [sparse] volumetric representation

  • Render with forward warp (splat)

Vision for Graphics


Layer extraction from multiple images containing reflections and transparency

Layer extraction from multiple images containing reflections and transparency

Richard Szeliski

Shai Avidan

P. Anandan

CVPR’2000


Transparent motion
Transparent motion and transparency

  • Photograph (Lee) and reflection (Michael)

Vision for Graphics


Previous work
Previous work and transparency

  • Physics-based vision and polarization[Shafer et al.; Wolff; Nayar et al.]

  • Perception of transparency [Adelson…]

  • Transparent motion estimation[Shizawa & Mase; Bergen et al.; Irani et al.; Darrell & Simoncelli]

  • 3-frame layer recovery [Bergen et al.]

Vision for Graphics


Problem formulation
Problem formulation and transparency

MotionX,i( )

X

+

MotionY,i( )

Y

Vision for Graphics


Image formation model
Image formation model and transparency

  • Pure additive mixing of positive signals

  • mk(x) = lWklfl(x)

  • or

  • mk = lWklfl

  • Assume motion is planar (perspective transform, aka homography)

Vision for Graphics


Two processing stages
Two processing stages and transparency

Estimate the motions and initial layer estimates

Compute optimal layer estimates (for known motion)

Vision for Graphics


Dominant motion estimation
Dominant motion estimation and transparency

  • Stabilize sequence by dominant motionrobust affine [Bergen et al. 92; Szeliski & Shum]

Vision for Graphics


Dominant layer estimate

Intensity and transparency

Time

Dominant layer estimate

  • How do we form composite (estimate)?

Vision for Graphics


Average

Intensity and transparency

Time

Average?

Vision for Graphics


Median

Intensity and transparency

Time

Median?

  • Hint: all layers are non-negative

Vision for Graphics


Min composite

Intensity and transparency

Time

Min-composite

  • Smallest value is over-estimate of layer

Vision for Graphics


Difference sequence
Difference sequence and transparency

  • Subtract min-composite from original image =

original - min composite = difference image

Vision for Graphics


Min composite1
Min composite and transparency

Intensity

Time

(overestimate of background layer)

Vision for Graphics


Difference sequence1

Intensity and transparency

Time

Difference sequence

(underestimate of foreground layer)

Vision for Graphics


Stabilizing secondary motion

Intensity and transparency

Time

How do we form composite (estimate)?

Stabilizing secondary motion

Vision for Graphics


Max composite

Intensity and transparency

Time

Max-composite

Largest value is under-estimate of layer

Vision for Graphics


Min max alternation
Min-max alternation and transparency

  • Subtract secondary layer (under-estimate) from original sequence

  • Re-compute dominant motion and better min-composite

  • Iterate …

  • Does this process converge?

Vision for Graphics


Min max alternation1
Min-max alternation and transparency

  • Does this process converge?

    • in theory: yes

    • each iteration reduces number of mis-estimated pixels (tightens the bounds) — proof in paper

Vision for Graphics


Min max alternation2
Min-max alternation and transparency

  • Does this process converge?

    • in practice: no

    • resampling errors and noise both lead to divergence — discussion in paperresampling error noisy

Vision for Graphics


Two processing stages1
Two processing stages and transparency

Estimate the motions and initial layer estimates

Compute optimal layer estimates (for known motion)

Vision for Graphics


Optimal estimation
Optimal estimation and transparency

  • Recall: additive mixing of positive signals

  • mk = lWklfl

  • Use constrained least squares(quadratic programming)

  • min k | lWklfl – mk |2 s.t. fl 0

Vision for Graphics


Least squares example

blue: least squares and transparencyred: constrained LS

Least squares example

background foreground

Vision for Graphics


Uniqueness of solution
Uniqueness of solution and transparency

  • If any layer does not have a “black” region, i.e., if flc, then can add this offset to another layer (and subtract it from fl)

background foreground

Vision for Graphics


Degeneracies in solution

and transparencyrecovered 

 scaled errors 

Degeneracies in solution

  • If motion is degenerate (e.g., horizontal), regions (scanlines) decouple (w/o MRF)

mixed

Vision for Graphics


Noise sensitivity
Noise sensitivity and transparency

  • In general, low-frequency components hard to recover for small motions

 recovered 

mixed

 scaled errors 

Vision for Graphics


Three layer example
Three-layer example and transparency

  • 3 layers with general motion works well = + +

Vision for Graphics


Complete algorithm
Complete algorithm and transparency

Dominant motion with min-composites

Difference (residual) images

Non-dominant motion on differences

Improve the motion estimates

Unconstrained least-squares problem

Constrained least-squares problem

Vision for Graphics


Complete example

min-composite and transparency

Complete example

original

stabilized

Vision for Graphics


Complete example1

max-composite and transparency

Complete example

difference

stabilized

Vision for Graphics


Final results
Final Results and transparency

= +

Vision for Graphics


Another example
Another example and transparency

  • original stabilized min-comp. resid. 2

Vision for Graphics


Results anne and books
Results: Anne and books and transparency

= +

original background foreground (photo)

Vision for Graphics


Transparent layer recovery
Transparent layer recovery and transparency

  • Pure (additive) mixing of intensities

    • simple constrained least squares problem

    • degeneracies for simple or small motions

  • Processing stages

    • dominant motion estimation

    • min- and max-composites to initialize

    • optimization of motion and layers

Vision for Graphics


Future work
Future work and transparency

  • Mitigating degeneracies (regularization)

  • Opaque layers ( estimation)

  • Non-planar geometry (parallax)

Vision for Graphics


Bibliography
Bibliography and transparency

  • J. Y. A. Wang and E. H. Adelson. Representing moving images with layers. IEEE Transactions on Image Processing, 3(5):625--638, September 1994.

  • Y. Weiss and E. H. Adelson. A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'96), pages 321--326, San Francisco, California, June 1996.

  • Y. Weiss. Smoothness in layers: Motion segmentation using nonparametric mixture estimation. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'97), pages 520--526, San Juan, Puerto Rico, June 1997.

  • P. R. Hsu, P. Anandan, and S. Peleg. Accurate computation of optical flow by using layered motion representations. In Twelfth International Conference on Pattern Recognition (ICPR'94), pages 743--746, Jerusalem, Israel, October 1994. IEEE Computer Society Press

Vision for Graphics


Bibliography1
Bibliography and transparency

  • T. Darrell and A. Pentland. Cooperative robust estimation using layers of support. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(5):474--487, May 1995.

  • S. X. Ju, M. J. Black, and A. D. Jepson. Skin and bones: Multi-layer, locally affine, optical flow and regularization with transparency. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'96), pages 307--314, San Francisco, California, June 1996.

  • M. Irani, B. Rousso, and S. Peleg. Computing occluding and transparent motions. International Journal of Computer Vision, 12(1):5--16, January 1994.

  • H. S. Sawhney and S. Ayer. Compact representation of videos through dominant multiple motion estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):814--830, August 1996.

  • M.-C. Lee et al. A layered video object coding system using sprite and affine motion model. IEEE Transactions on Circuits and Systems for Video Technology, 7(1):130--145, February 1997.

Vision for Graphics


Bibliography2
Bibliography and transparency

  • S. Baker, R. Szeliski, and P. Anandan. A layered approach to stereo reconstruction. In IEEE CVPR'98, pages 434--441, Santa Barbara, June 1998.

  • R. Szeliski, S. Avidan, and P. Anandan. Layer extraction from multiple images containing reflections and transparency. In IEEE CVPR'2000, volume 1, pages 246--253, Hilton Head Island, June 2000.

  • J. Shade, S. Gortler, L.-W. He, and R. Szeliski. Layered depth images. In Computer Graphics (SIGGRAPH'98) Proceedings, pages 231--242, Orlando, July 1998. ACM SIGGRAPH.

  • S. Laveau and O. D. Faugeras. 3-d scene representation as a collection of images. In Twelfth International Conference on Pattern Recognition (ICPR'94), volume A, pages 689--691, Jerusalem, Israel, October 1994. IEEE Computer Society Press.

  • P. H. S. Torr, R. Szeliski, and P. Anandan. An integrated Bayesian approach to layer extraction from image sequences. In Seventh ICCV'98, pages 983--990, Kerkyra, Greece, September 1999.

Vision for Graphics


ad