1 / 87

Layered Scene Representations

Layered Scene Representations. Vision for Graphics CSE 590SS, Winter 2001 Richard Szeliski. Motion representations. How can we describe this scene?. Block-based motion prediction. Break image up into square blocks Estimate translation for each block

Download Presentation

Layered Scene Representations

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Layered Scene Representations Vision for GraphicsCSE 590SS, Winter 2001Richard Szeliski

  2. Motion representations • How can we describe this scene? Vision for Graphics

  3. Block-based motion prediction • Break image up into square blocks • Estimate translation for each block • Use this to predict next frame, code difference (MPEG-2) Vision for Graphics

  4. Layered motion • Break image sequence up into “layers”: •  = • Describe each layer’s motion Vision for Graphics

  5. Outline • Why layers? • 2-D layers [Wang & Adelson 94; Weiss 97] • 3-D layers [Baker et al. 98] • Layered Depth Images [Shade et al. 98] • Transparency [Szeliski et al. 00] • Bayesian estimation [Torr et al. 99] Vision for Graphics

  6. Layered motion • Advantages: • can represent occlusions / disocclusions • each layer’s motion can be smooth • video segmentation for semantic processing • Difficulties: • how do we determine the correct number? • how do we assign pixels? • how do we model the motion? Vision for Graphics

  7. Layers for video summarization Vision for Graphics

  8. Background modeling (MPEG-4) • Convert masked images into a background sprite for layered video coding • + + + • = Vision for Graphics

  9. What are layers? • [Wang & Adelson, 1994] • intensities • alphas • velocities Vision for Graphics

  10. How do we composite them? Vision for Graphics

  11. How do we form them? Vision for Graphics

  12. How do we form them? Vision for Graphics

  13. How do we estimate the layers? • compute coarse-to-fine flow • estimate affine motion in blocks (regression) • cluster with k-means • assign pixels to best fitting affine region • re-estimate affine motions in each region… Vision for Graphics

  14. Layer synthesis • For each layer: • stabilize the sequence with the affine motion • compute median value at each pixel • Determine occlusion relationships Vision for Graphics

  15. Results Vision for Graphics

  16. What if the motion is not affine? • Use a “regularized” (smooth) motion field • [Weiss, CVPR’97] Vision for Graphics

  17. A Layered Approach To Stereo Reconstruction Simon Baker, Richard Szeliski and P. Anandan CVPR’98

  18. z y x Camera 2 Camera 1 Volumetric Approaches to Stereo • Examples: • Disparity-Spaces [Intille and Bobick, ‘94] [Scharstein and Szeliski, ‘96] • Space-Coloring [Seitz and Dyer, ‘97] • Maximum-Flow Stereo [Roy and Cox, ‘98] • Advantages: • Modeling occlusions [Intille and Bobick, ‘94] • Mixed pixels + transparency [Szeliski and Golland, ‘98] • Equal treatment of many images [Collins, ‘96] Vision for Graphics

  19. Layer 1 Layer 2 Layer 3 Camera 2 Camera 1 2.5-D Layered Approach • Additional advantages over volumetric approaches: • Fewer degrees of freedom • Less resampling artifacts • Robustness of global model + local correction • c.f. “Plane + Parallax” and “Model-Based Stereo” • Output particularly suitable for certain applications • e.g. Image-based rendering and interactive editing Vision for Graphics

  20. layers (“sprites”) Layered Stereo • Use arbitrarily oriented sprites • Estimate 3D plane equation for each sprite Vision for Graphics

  21. World point Plane vector n= (n , n , n , n ) T x= (x, y, z, 1) T x y z d l Plane equation n . x = 0 l u Layer sprite L = (a . r , a . g , a . b , a) v l T (u, v, 1) World origin = Q x l Residual depth Z l Coordinate frame defined by u = Layer Representation Vision for Graphics

  22. Image I k v u Camera P k Image Formation Layer l Scene v u Boolean mask B k l v u Masked image M k l Vision for Graphics

  23. Input: Images I & Cameras P k k Initialize layer assignment B kl Estimate plane vectors n Re-assign pixelslayers B l kl Estimate residual depth Z Estimate sprite images Ll l Refine Layer Sprites L l Output: n , L , & Z l l l Overview Vision for Graphics

  24. Layer Initialization Alternatives • Iterate dominant motion estimation • e.g. [Irani et al., ‘95] • Apply simple stereo algorithm + fit planes • Color segmentation • e.g. [Sawhney and Ayer, ‘94] • Human initialization • e.g. [Debevec et al., ‘96] Vision for Graphics

  25. M M M M M M M kl kl kl jl il jl jl o o o l l l l l Warped images , , … functions of n only H H H H H ik ik ij ij ij l Minimize image variance using hierarchical gradient descent Estimation of Plane Equations Layer l l H ik Camera P k Camera P j o Camera P i Vision for Graphics

  26. M M M kl il jl “Blend” the masked images, warped onto the layer plane Estimation of Layer Sprites Plane n l Camera P k Camera P j Camera P i Vision for Graphics

  27. Estimation of Residual Depth • Per-pixel residual depth estimation • plane plus parallax[Anandan et al.] • model-based stereo[Debevec et al.] • better accuracy / fidelity • makes forward warping more difficult Vision for Graphics

  28. T Perturbed Plane n + (0,0,0,d) l M il Estimation of Residual Depth • Warp masked images onto perturbed plane • Compute variance image • For each pixel, choose d that minimizes variance • Smooth, incorporating confidence weighting [Szeliski & Golland, ‘98] • Recompute sprite using “Plane + Parallax” warp Camera P k Camera P M jl M j kl Camera P i Vision for Graphics

  29. Pixel Assignment Sprite L l • Warp masked image onto • each layer plane Plane n l • Compute difference images • Un-warp difference images • For each pixel, choose the • best difference across layers Un-warped difference image • Smooth pixel assignment M il Camera P i Vision for Graphics

  30. Image 1 Image 9 Grey coded planar depth Initial Segmentation Flower Garden Results Vision for Graphics

  31. Flower Garden Results Recovered Sprite: Without residual depth estimation Recovered Sprite: With residual depth estimation Vision for Graphics

  32. Image 1 of 5 Initial segmentation Grey coded planar depth Residual depth Graphics Symposium Results Vision for Graphics

  33. Graphics Symposium Results • Resulting sprite collection Vision for Graphics

  34. Original image 3 Re-synthesized image 3 Novel view without residual depth Novel view with residual depth Graphics Symposium Results Vision for Graphics

  35. Layered Stereo Demo • SpriteViewer: renders sprites with depth Vision for Graphics

  36. Discussion • Layer initialization: • Can tolerate bad initial plane estimates • Residual depth estimation: • Plane sweep algorithm, similar to [Szeliski and Golland, ‘98] • Pixel assignment: • Combine color and residual depth estimates • Currently under investigation Vision for Graphics

  37. Summary • New approach to stereo matching: • represent scene as collection of layers • each layer has a 3-D plane equation, an alpha-matted color image, and an optional residual depth • generalizes layered motion to 3-D • Computation: • plane eqns. by warping mosaics of masked images • residual depth by perturbing planes • iteratively refine color values and pixel assignments Vision for Graphics

  38. Layered Depth Images Jonathan Shade Steven Gortler Li-wei He Richard Szeliski SIGGRAPH’98

  39. How to render a layer + parallax? • Can’t use inverse warping [Laveau 94] Vision for Graphics

  40. 3D Sprites with Depth • 3D sprite consists of: • alpha-matted image I1(x1,y1) • 4×4 camera matrix C1[ w1x1 w1y1 w1d1w1]T = C1 [X Y Z 1]T • plane equation AX + BY + CZ + D = 0(forms third row of C1 ) • optional per-pixel depth d1(x1,y1) Vision for Graphics

  41. Sprites with Depth • Store d1(x1,y1) (scaled displacement) along with each sprite image I1(x1,y1) • I1 d1 I1 d1 Vision for Graphics

  42. 3D Sprites — Reprojection •  • sprites new view •  • use standard texture mapping (projective warp) Vision for Graphics

  43. Forward Mapping • Mapping equation with per-pixel depth d1:[ w2x2 w2y2 w2 ]T = H1,2 [ x1 y1 1 ]T +d1 e1,2 •  I1 d1(I2 ) I2 • Problems: gaps and aliasing Vision for Graphics

  44. Inverse Mapping • Reverse order of images 1 & 2:[ w1x1 w1y1 w1 ]T = H2,1 [ x2 y2 1 ]T +d2 e2,1 I1(I2)d2 I2 • Problem: we don’t know d2! Vision for Graphics

  45. Crude perspective map • How to map d1 d2? • Simple idea: use perspective transform H2,1 I1 d1 d2 I2 • Works well for small amounts of motion Vision for Graphics

  46. Better forward map • How to map d1 d2? • Better idea: use full H1,2x1+d1e1,2 fwd. map  I1 d1 d2 I2 • Works better for moderate amounts of motion Vision for Graphics

  47. 2-pass Mapping • Why is 2-pass mapping (d1 d2 forward followed by I1 I2 backward) a good idea? • can tolerate bigger errors in d1 mapping (since d1 is typically smooth) • can store/process d1 at lower resolution • can use better filtering on color image Vision for Graphics

  48. Sprites with Depth — Demo • Demo Vision for Graphics

  49. Refinements • Only forward map d1 with parallax component • Use affine approximation to parallax flow • Better gap filling • Forward map (u,v)flow instead of d1 depth Vision for Graphics

  50. Layered Depth Images (LDIs) • Store multiple (color,z) values at each pixel • Similar to [sparse] volumetric representation • Render with forward warp (splat) Vision for Graphics

More Related