
Computer Vision – Lecture 18


Presentation Transcript


  1. Computer Vision – Lecture 18 Motion and Optical Flow 24.01.2012 Bastian Leibe RWTH Aachen http://www.mmp.rwth-aachen.de leibe@umic.rwth-aachen.de Many slides adapted from K. Grauman, S. Seitz, R. Szeliski, M. Pollefeys, S. Lazebnik

  2. Announcements (2) • Exercise sheet 7 is out! • Optical flow • Tracking with a Kalman filter • Application: tracking billiard balls • The exercise session will be on Thu, 02.02. • Submit your results by the night of 01.02.

  3. Course Outline • Image Processing Basics • Segmentation & Grouping • Object Recognition • Local Features & Matching • Object Categorization • 3D Reconstruction • Motion and Tracking • Motion and Optical Flow • Tracking with Linear Dynamic Models • Repetition

  4. Recap: Structure from Motion • Given: m images of n fixed 3D points, xij = Pi Xj, i = 1, …, m, j = 1, …, n • Problem: estimate the m projection matrices Pi and the n 3D points Xj from the mn correspondences xij [Figure: a 3D point Xj projected to x1j, x2j, x3j by cameras P1, P2, P3] B. Leibe Slide credit: Svetlana Lazebnik

  5. Recap: Structure from Motion Ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices by a factor of 1/k, the projections of the scene points in the image remain exactly the same. • More generally: if we transform the scene using a transformation Q and apply the inverse transformation to the camera matrices, then the images do not change. B. Leibe Slide credit: Svetlana Lazebnik

  6. Recap: Hierarchy of 3D Transformations • With no constraints on the camera calibration matrix or on the scene, we get a projective reconstruction. • Need additional information to upgrade the reconstruction to affine, similarity, or Euclidean. • Projective (15 dof): preserves intersection and tangency • Affine (12 dof): preserves parallelism and volume ratios • Similarity (7 dof): preserves angles and ratios of lengths • Euclidean (6 dof): preserves angles and lengths B. Leibe Slide credit: Svetlana Lazebnik

  7. Recap: Affine Structure from Motion • Let’s create a 2m×n data (measurement) matrix D by stacking the centered image coordinates of all n points across all m views. • The measurement matrix factors as D = MS, with cameras M (2m×3) and points S (3×n), so D must have rank 3! C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992. B. Leibe Slide credit: Svetlana Lazebnik

  8. Recap: Affine Factorization • Obtaining a factorization from SVD: D = U W V^T; keeping the three largest singular values gives M = U3 W3^(1/2) and S = W3^(1/2) V3^T. • This decomposition minimizes |D − MS|². Slide credit: Martial Hebert
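
A minimal NumPy sketch of this SVD-based factorization (my illustration, not code from the lecture). It assumes the 2m×n measurement matrix D has already been registered by subtracting each image's point centroid, and it splits the singular values symmetrically between motion and shape, which is one common convention:

```python
import numpy as np

def affine_factorization(D):
    """Factor a registered 2m x n measurement matrix D into affine
    motion M (2m x 3) and shape S (3 x n) via a rank-3 truncated SVD,
    in the spirit of Tomasi & Kanade (1992)."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    U3, s3, Vt3 = U[:, :3], s[:3], Vt[:3, :]   # keep top 3 singular values
    M = U3 * np.sqrt(s3)                # cameras, 2m x 3
    S = np.sqrt(s3)[:, None] * Vt3      # points,  3 x n
    return M, S
```

By the Eckart–Young theorem, this rank-3 truncation is exactly what minimizes |D − MS|² in the Frobenius sense; note the result is only unique up to a 3×3 affine ambiguity A, since (MA)(A⁻¹S) = MS.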

  9. Projective Structure from Motion • Given: m images of n fixed 3D points, zij xij = Pi Xj, i = 1, …, m, j = 1, …, n • Problem: estimate the m projection matrices Pi and the n 3D points Xj from the mn correspondences xij • With no calibration info, cameras and points can only be recovered up to a 4×4 projective transformation Q: X → QX, P → PQ⁻¹ • We can solve for structure and motion when 2mn ≥ 11m + 3n − 15 (the mn image points give 2mn equations; each camera has 11 dof, each point 3, minus 15 for the projective ambiguity) • For two cameras, at least 7 points are needed. B. Leibe Slide credit: Svetlana Lazebnik

  10. Projective Factorization • D = MS has rank 4, with cameras M (3m×4) and points S (4×n). • If we knew the depths z, we could factorize D to estimate M and S. • If we knew M and S, we could solve for z. • Solution: iterative approach (alternate between the two steps above; see the sketch below). B. Leibe Slide credit: Svetlana Lazebnik
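
A rough Python sketch of this alternation (my own illustration, not the lecture's code). It assumes homogeneous image points with last coordinate 1 and that every point is visible in every view; practical implementations also rebalance the depths between iterations for numerical stability:

```python
import numpy as np

def projective_factorization(x, n_iters=10):
    """x: (m, n, 3) homogeneous image points, x[i, j] = (xi, yi, 1).
    Alternates rank-4 factorization of the depth-scaled measurement
    matrix with re-estimation of the projective depths z[i, j]."""
    m, n, _ = x.shape
    z = np.ones((m, n))                               # initial depths
    for _ in range(n_iters):
        # Stack z_ij * x_ij into the 3m x n measurement matrix D = MS
        D = (z[:, :, None] * x).transpose(0, 2, 1).reshape(3 * m, n)
        U, s, Vt = np.linalg.svd(D, full_matrices=False)
        M = U[:, :4] * s[:4]                          # cameras, 3m x 4
        S = Vt[:4, :]                                 # points,  4 x n
        # Since z_ij * x_ij = P_i X_j, the new depth estimate is the
        # third coordinate of each reprojected point
        proj = (M @ S).reshape(m, 3, n).transpose(0, 2, 1)
        z = proj[..., 2]
    return M.reshape(m, 3, 4), S
```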

  11. Sequential Structure from Motion • Initialize motion from two images using the fundamental matrix • Initialize structure • For each additional view: • Determine the projection matrix of the new camera using all the known 3D points that are visible in its image – calibration B. Leibe Slide credit: Svetlana Lazebnik

  12. Sequential Structure from Motion • Initialize motion from two images using the fundamental matrix • Initialize structure • For each additional view: • Determine the projection matrix of the new camera using all the known 3D points that are visible in its image – calibration • Refine and extend structure: compute new 3D points, re-optimize existing points that are also seen by this camera – triangulation B. Leibe Slide credit: Svetlana Lazebnik

  13. Sequential Structure from Motion • Initialize motion from two images using the fundamental matrix • Initialize structure • For each additional view: • Determine the projection matrix of the new camera using all the known 3D points that are visible in its image – calibration • Refine and extend structure: compute new 3D points, re-optimize existing points that are also seen by this camera – triangulation • Refine structure and motion: bundle adjustment B. Leibe Slide credit: Svetlana Lazebnik

  14. Bundle Adjustment • Non-linear method for refining structure and motion • Minimizing the mean-square reprojection error: E(M, S) = Σi Σj d(xij, Pi Xj)² [Figure: predicted projections P1Xj, P2Xj, P3Xj compared against measured points x1j, x2j, x3j] B. Leibe Slide credit: Svetlana Lazebnik
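
To make the objective concrete, here is a hedged sketch using SciPy's generic least-squares solver; the parameter packing and the `observations` list are assumptions of this example, and real bundle adjusters exploit the sparse structure of the Jacobian:

```python
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(params, m, n, observations):
    """Residuals x_ij - proj(P_i, X_j). params packs m projection
    matrices (12 numbers each) followed by n 3D points (3 each);
    observations is a list of (i, j, x, y) tuples."""
    P = params[:12 * m].reshape(m, 3, 4)
    X = params[12 * m:].reshape(n, 3)
    res = []
    for i, j, x, y in observations:
        p = P[i] @ np.append(X[j], 1.0)      # project homogeneous point
        res.extend([x - p[0] / p[2], y - p[1] / p[2]])
    return np.asarray(res)

# Hypothetical refinement call, given initial P0 (m,3,4) and X0 (n,3):
# params0 = np.concatenate([P0.ravel(), X0.ravel()])
# result = least_squares(reprojection_residuals, params0,
#                        args=(m, n, observations))
```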

  15. Projective Ambiguity • If we don’t know anything about the camera or the scene, the best we can get is a reconstruction up to a projective ambiguity Q. • This can already be useful. • E.g., we can answer questions like “at what point does a line intersect a plane?” • If we want to convert this to a “true” reconstruction, we need a Euclidean upgrade. • Need to put in additional knowledge about the camera (calibration) or about the scene (e.g., from markers). • Several methods are available (see F&P Chapter 13.5 or H&Z Chapter 19). B. Leibe Images from Hartley & Zisserman

  16. Practical Considerations (1) • Role of the baseline • Small baseline: large depth error • Large baseline: difficult search problem • Solution • Track features between frames until the baseline is sufficient. [Figure: small vs. large baseline] B. Leibe Slide adapted from Steve Seitz

  17. Practical Considerations (2) • There will still be many outliers • Incorrect feature matches • Moving objects ⇒ Apply RANSAC to get robust estimates based on the inlier points. • Estimation quality depends on the point configuration • Points that are close together in the image produce less stable solutions. ⇒ Subdivide the image into a grid and try to extract about the same number of features per grid cell. B. Leibe
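
For instance, OpenCV's fundamental-matrix estimator has a RANSAC mode that returns an inlier mask; a minimal sketch, where `pts1` and `pts2` are assumed to be matched Nx2 float32 feature coordinates from your own matcher:

```python
import cv2

# Robustly fit F; the mask flags which matches are RANSAC inliers
# (1.0 px reprojection threshold, 99% confidence)
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
inliers1 = pts1[mask.ravel() == 1]
inliers2 = pts2[mask.ravel() == 1]
```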

  18. Some Commercial Software Packages • boujou (http://www.2d3.com/) • PFTrack (http://www.thepixelfarm.co.uk/) • MatchMover (http://www.realviz.com/) • SynthEyes (http://www.ssontech.com/) • Icarus (http://aig.cs.man.ac.uk/research/reveal/icarus/) • Voodoo Camera Tracker (http://www.digilab.uni-hannover.de/) B. Leibe

  19. Applications: Matchmoving • Putting virtual objects into real-world videos • Original sequence • Tracked features • SfM results • Final video B. Leibe Videos from Stefan Hafeneger

  20. Applications: Large-Scale SfM from Flickr S. Agarwal, N. Snavely, I. Simon, S.M. Seitz, R. Szeliski. Building Rome in a Day. ICCV 2009. (Video from http://grail.cs.washington.edu/rome/) B. Leibe

  21. Topics of This Lecture • Introduction to Motion • Applications, uses • Motion Field • Derivation • Optical Flow • Brightness constancy constraint • Aperture problem • Lucas-Kanade flow • Iterative refinement • Global parametric motion • Coarse-to-fine estimation • Motion segmentation • KLT Feature Tracking B. Leibe

  22. Video • A video is a sequence of frames captured over time • Now our image data is a function of space (x, y) and time (t) B. Leibe Slide credit: Svetlana Lazebnik

  23. Applications of Segmentation to Video • Background subtraction • A static camera is observing a scene. • Goal: separate the static background from the moving foreground. • How can we estimate the background frame without access to an “empty” scene? (One simple answer is sketched below.) B. Leibe Slide credit: Svetlana Lazebnik, Kristen Grauman
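
One simple answer, sketched in NumPy (an illustration under stated assumptions, not the lecture's method): maintain an exponential running average of the frames, so the background estimate adapts without ever seeing an empty scene:

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Exponential running average: slow adaptation lets the static
    background dominate even though moving objects pass through."""
    return (1 - alpha) * bg + alpha * frame.astype(np.float32)

def foreground_mask(bg, frame, thresh=25):
    """Pixels far from the background model are labeled foreground
    (assumes H x W x 3 color frames)."""
    diff = np.abs(frame.astype(np.float32) - bg)
    return diff.max(axis=-1) > thresh
```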

  24. Applications of Segmentation to Video • Background subtraction • Shot boundary detection • Commercial video is usually composed of shots or sequences showing the same objects or scene. • Goal: segment video into shots for summarization and browsing (each shot can be represented by a single keyframe in a user interface). • Difference from background subtraction: the camera is not necessarily stationary. B. Leibe Slide credit: Svetlana Lazebnik

  25. Applications of Segmentation to Video • Background subtraction • Shot boundary detection • For each frame, compute the distance between the current frame and the previous one: • Pixel-by-pixel differences • Differences of color histograms • Block comparison • If the distance is greater than some threshold, classify the frame as a shot boundary. B. Leibe Slide credit: Svetlana Lazebnik
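
A minimal OpenCV sketch of the color-histogram variant (the bin counts and the threshold are arbitrary choices of this example, not values from the lecture):

```python
import cv2

def is_shot_boundary(prev_frame, frame, thresh=0.4):
    """Compare color histograms of consecutive frames; a large
    distance suggests a cut."""
    hists = []
    for f in (prev_frame, frame):
        h = cv2.calcHist([f], [0, 1, 2], None, [8, 8, 8],
                         [0, 256, 0, 256, 0, 256])
        hists.append(cv2.normalize(h, None))
    # Bhattacharyya distance: 0 = identical histograms, 1 = disjoint
    d = cv2.compareHist(hists[0], hists[1], cv2.HISTCMP_BHATTACHARYYA)
    return d > thresh
```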

  26. Applications of Segmentation to Video • Background subtraction • Shot boundary detection • Motion segmentation • Segment the video into multiple coherently moving objects B. Leibe Slide credit: Svetlana Lazebnik

  27. Motion and Perceptual Organization • Sometimes, motion is the only cue… B. Leibe Slide credit: Svetlana Lazebnik

  28. Motion and Perceptual Organization • Sometimes, motion is the foremost cue B. Leibe Slide credit: Kristen Grauman

  29. Motion and Perceptual Organization • Even “impoverished” motion data can evoke a strong percept B. Leibe Slide credit: Svetlana Lazebnik

  30. Motion and Perceptual Organization • Even “impoverished” motion data can evoke a strong percept B. Leibe Slide credit: Svetlana Lazebnik

  31. Uses of Motion • Estimating 3D structure • Directly from optic flow • Indirectly to create correspondences for SfM • Segmenting objects based on motion cues • Learning dynamical models • Recognizing events and activities • Improving video quality (motion stabilization) B. Leibe Slide adapted from Svetlana Lazebnik

  32. Motion Estimation Techniques • Direct methods • Directly recover image motion at each pixel from spatio-temporal image brightness variations • Dense motion fields, but sensitive to appearance variations • Suitable for video and when image motion is small • Feature-based methods • Extract visual features (corners, textured areas) and track them over multiple frames • Sparse motion fields, but more robust tracking • Suitable when image motion is large (10s of pixels) B. Leibe Slide credit: Steve Seitz

  33. Topics of This Lecture • Introduction to Motion • Applications, uses • Motion Field • Derivation • Optical Flow • Brightness constancy constraint • Aperture problem • Lucas-Kanade flow • Iterative refinement • Global parametric motion • Coarse-to-fine estimation • Motion segmentation • KLT Feature Tracking B. Leibe

  34. Motion Field • The motion field is the projection of the 3D scene motion into the image B. Leibe Slide credit: Svetlana Lazebnik

  35. Motion Field and Parallax • P(t) is a moving 3D point • Velocity of the scene point: V = dP/dt • p(t) = (x(t), y(t)) is the projection of P in the image. • Apparent velocity v in the image: given by components vx = dx/dt and vy = dy/dt • These components are known as the motion field of the image. [Figure: P(t) moving to P(t+dt) with velocity V; projections p(t), p(t+dt) with image velocity v] B. Leibe Slide credit: Svetlana Lazebnik

  36. Motion Field and Parallax • To find the image velocity v, differentiate the projection p = f P / Z with respect to t, using the quotient rule D(f/g) = (g f′ − g′ f)/g²: v = dp/dt = f (Z V − Vz P) / Z² • Image motion is a function of both the 3D motion (V) and the depth of the 3D point (Z). B. Leibe Slide credit: Svetlana Lazebnik

  37. Motion Field and Parallax • Pure translation: V is constant everywhere B. Leibe Slide credit: Svetlana Lazebnik

  38. Motion Field and Parallax • Pure translation: V is constant everywhere • Vz is nonzero: • Every motion vector points toward (or away from) v0, the vanishing point of the translation direction. B. Leibe Slide credit: Svetlana Lazebnik

  39. Motion Field and Parallax • Pure translation: V is constant everywhere • Vz is nonzero: • Every motion vector points toward (or away from) v0, the vanishing point of the translation direction. • Vz is zero: • Motion is parallel to the image plane, and all the motion vectors are parallel. • The length of the motion vectors is inversely proportional to the depth Z. B. Leibe Slide credit: Svetlana Lazebnik
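
Written out (reconstructed from the standard derivation, not copied verbatim from the slides), both cases follow from the motion field equation of slide 36:

```latex
% Pure translation V = (V_x, V_y, V_z), focal length f, image point p:
\[
  \mathbf{v} \;=\; \frac{f\,(V_x, V_y) - V_z\,\mathbf{p}}{Z}
  \;=\; \frac{V_z}{Z}\bigl(\mathbf{v}_0 - \mathbf{p}\bigr),
  \qquad
  \mathbf{v}_0 = \Bigl(\frac{f V_x}{V_z},\; \frac{f V_y}{V_z}\Bigr).
\]
% For V_z != 0, every vector points toward (or away from) v_0;
% for V_z = 0, v = f (V_x, V_y) / Z: parallel vectors whose length
% is inversely proportional to the depth Z.
```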

  40. Topics of This Lecture • Introduction to Motion • Applications, uses • Motion Field • Derivation • Optical Flow • Brightness constancy constraint • Aperture problem • Lucas-Kanade flow • Iterative refinement • Global parametric motion • Coarse-to-fine estimation • Motion segmentation • KLT Feature Tracking B. Leibe

  41. Optical Flow • Definition: optical flow is the apparent motion of brightness patterns in the image. • Ideally, optical flow would be the same as the motion field. • Have to be careful: apparent motion can be caused by lighting changes without any actual motion. • Think of a uniform rotating sphere under fixed lighting vs. a stationary sphere under moving illumination. B. Leibe Slide credit: Svetlana Lazebnik

  42. Apparent Motion ≠ Motion Field B. Leibe Figure from Horn book Slide credit: Kristen Grauman

  43. Estimating Optical Flow • Given two subsequent frames, estimate the apparent motion field u(x,y) and v(x,y) between them. • Key assumptions • Brightness constancy: projection of the same point looks the same in every frame. • Small motion: points do not move very far. • Spatial coherence: points move like their neighbors. I(x,y,t–1) I(x,y,t) B. Leibe Slide credit: Svetlana Lazebnik
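
To make the setup concrete, here is a hedged OpenCV sketch using the pyramidal Lucas-Kanade tracker (the method itself is covered later in this lecture); `prev_gray` and `gray` are assumed to be two consecutive grayscale frames:

```python
import cv2

# Pick corners in the first frame, then track them into the second
p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                             qualityLevel=0.01, minDistance=7)
p1, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None,
                                           winSize=(15, 15), maxLevel=3)
good_new = p1[status.ravel() == 1]     # tracked points in the new frame
good_old = p0[status.ravel() == 1]
flow = good_new - good_old             # sparse motion vectors (u, v)
```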

  44. The Brightness Constancy Constraint • Brightness Constancy Equation: I(x, y, t−1) = I(x + u(x,y), y + v(x,y), t) • Linearizing the right-hand side using a Taylor expansion: I(x + u, y + v, t) ≈ I(x, y, t) + Ix·u + Iy·v • Hence: Ix·u + Iy·v + It ≈ 0, with spatial derivatives Ix, Iy and temporal derivative It = I(x, y, t) − I(x, y, t−1) B. Leibe Slide credit: Svetlana Lazebnik
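
A minimal NumPy sketch of how the three derivatives in this constraint can be approximated from two frames (simple finite differences for illustration; real implementations usually smooth first):

```python
import numpy as np

def brightness_constancy_terms(I0, I1):
    """Spatial and temporal derivatives for Ix*u + Iy*v + It ~ 0,
    from two grayscale frames given as float arrays."""
    Iy, Ix = np.gradient(I0)   # np.gradient returns (rows, cols) order
    It = I1 - I0               # forward difference in time
    return Ix, Iy, It
```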

  45. The Brightness Constancy Constraint • How many equations and unknowns per pixel? • One equation, two unknowns • Intuitively, what does this constraint mean? • The component of the flow perpendicular to the gradient (i.e., parallel to the edge) is unknown. • If (u, v) satisfies the equation, so does (u+u′, v+v′) whenever ∇I · (u′, v′) = 0. [Figure: gradient vs. edge direction, with flows (u, v) and (u+u′, v+v′)] B. Leibe Slide credit: Svetlana Lazebnik
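
Written out (a standard reformulation, not copied from the slide), the constraint pins down only the "normal flow" along the gradient:

```latex
% Only the flow component along the image gradient is determined:
\[
  \mathbf{u}_\perp \;=\; -\,\frac{I_t}{\lVert \nabla I \rVert^{2}}\,\nabla I,
  \qquad
  \nabla I \cdot (u', v')^{\top} = 0
  \;\Longrightarrow\;
  (u + u',\, v + v') \text{ satisfies the constraint as well.}
\]
```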

  46. The Aperture Problem Perceived motion B. Leibe Slide credit: Svetlana Lazebnik

  47. The Aperture Problem Actual motion B. Leibe Slide credit: Svetlana Lazebnik

  48. The Barber Pole Illusion http://en.wikipedia.org/wiki/Barberpole_illusion B. Leibe Slide credit: Svetlana Lazebnik

  49. The Barber Pole Illusion http://en.wikipedia.org/wiki/Barberpole_illusion B. Leibe Slide credit: Svetlana Lazebnik

  50. The Barber Pole Illusion http://en.wikipedia.org/wiki/Barberpole_illusion B. Leibe Slide credit: Svetlana Lazebnik
