
Space-time interest points



  1. Space-time interest points Ivan Laptev and Tony Lindeberg Computational Vision and Active Perception Laboratory (CVAP) Dept of Numerical Analysis and Computer Science KTH (Royal Institute of Technology) SE-100 44 Stockholm, Sweden

  2. General motivation • Spatio-temporal image data contains rich information about the external world. • Traditional methods for video analysis include • optical flow estimation; • tracking of features/models over time. • Observation: Events in video are often characterised by non-constant motion and non-constant appearance.

  3. Idea: detect points with high spatio-temporal variation of image values • Direct method for event detection [Figure: spatio-temporal image data]

  4. Why local features in time? Non-constant motion in images may be an indication of • physical interaction between objects in the world (a ball bouncing off the ground, a car crash, etc.) • non-rigid motion, e.g. relative motion of body parts, gestures, etc. • occlusions/disocclusions in the field of view. Goal: • make a sparse and informative representation of complex motion patterns; • obtain robustness w.r.t. missing data (occlusions) and outliers (dynamic, complex backgrounds)

  5. Interest points in space • (Harris and Stephens 1988): image points with high variation of values in both image directions • Large eigenvalues of the second-moment matrix integrated over a local neighbourhood, μ = g(·; σᵢ²) ∗ [ Lx², LxLy ; LxLy, Ly² ], where Lx, Ly are Gaussian derivatives • Select points at positive maxima of the corner function H = det(μ) − k·trace²(μ)
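As a rough illustration of the spatial detector above, the following Python fragment computes the second-moment matrix from Gaussian derivatives and evaluates the corner function; the function name, scale values and k are assumptions for the sketch, not taken from the slides.

```python
# Illustrative sketch of the spatial Harris measure above; scale values and k
# are arbitrary choices.
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_spatial(image, sigma_d=1.0, sigma_i=2.0, k=0.04):
    """Corner function H = det(mu) - k * trace(mu)^2 of the second-moment matrix."""
    f = np.asarray(image, dtype=float)
    # Gaussian derivatives L_x, L_y at the differentiation scale sigma_d
    Lx = gaussian_filter(f, sigma_d, order=(0, 1))
    Ly = gaussian_filter(f, sigma_d, order=(1, 0))
    # Integrate the products over a local neighbourhood (integration scale sigma_i)
    mu_xx = gaussian_filter(Lx * Lx, sigma_i)
    mu_xy = gaussian_filter(Lx * Ly, sigma_i)
    mu_yy = gaussian_filter(Ly * Ly, sigma_i)
    det = mu_xx * mu_yy - mu_xy ** 2
    trace = mu_xx + mu_yy
    return det - k * trace ** 2   # keep positive local maxima as interest points
```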

  6. Interest points in space-time • High variation of image values in both space and time • extend the Harris corner function to the 3D spatio-temporal domain; compute the second-moment matrix μ = g(·; σᵢ², τᵢ²) ∗ [ Lx², LxLy, LxLt ; LxLy, Ly², LyLt ; LxLt, LyLt, Lt² ], where Lx, Ly, Lt are Gaussian derivatives in space-time obtained by spatio-temporal convolution L(x, y, t; σ², τ²) = g(x, y, t; σ², τ²) ∗ f(x, y, t) with the anisotropic Gaussian kernel g(x, y, t; σ², τ²) = exp(−(x² + y²)/(2σ²) − t²/(2τ²)) / √((2π)³ σ⁴ τ²)

  7. Interest points in space-time • Points with high space-time variation of image values correspond to positive maxima of the corner function H = det(μ) − k·trace³(μ) = λ₁λ₂λ₃ − k(λ₁ + λ₂ + λ₃)³, where λ₁, λ₂, λ₃ are the eigenvalues of μ. • distinct scale parameters for the spatial scale σ² and the temporal scale τ²: the spatial and temporal extents of events are independent in general. • Convolution with Gaussian kernels violates the causality constraint of the temporal domain. Alternative (recursive) kernels can be used to address this problem (Koenderink 1988; Lindeberg and Fagerström 1996; Florack 1997)
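A corresponding sketch of the space-time extension is given below; it assumes the video volume is ordered (t, y, x), and the integration scales and k are illustrative choices, not values from the slides.

```python
# Illustrative sketch of the space-time (3D) corner function on a (t, y, x) volume.
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_spacetime(volume, sigma=2.0, tau=2.0, k=0.005):
    """Corner function H = det(mu) - k * trace(mu)^3 over space-time."""
    f = np.asarray(volume, dtype=float)
    s = (tau, sigma, sigma)                      # separate temporal and spatial scales
    Lt = gaussian_filter(f, s, order=(1, 0, 0))  # Gaussian derivative in t
    Ly = gaussian_filter(f, s, order=(0, 1, 0))  # ... in y
    Lx = gaussian_filter(f, s, order=(0, 0, 1))  # ... in x
    si = tuple(2 * v for v in s)                 # integration scales (assumption: 2x)
    mxx = gaussian_filter(Lx * Lx, si)
    mxy = gaussian_filter(Lx * Ly, si)
    mxt = gaussian_filter(Lx * Lt, si)
    myy = gaussian_filter(Ly * Ly, si)
    myt = gaussian_filter(Ly * Lt, si)
    mtt = gaussian_filter(Lt * Lt, si)
    det = (mxx * (myy * mtt - myt ** 2)
           - mxy * (mxy * mtt - myt * mxt)
           + mxt * (mxy * myt - myy * mxt))
    trace = mxx + myy + mtt
    return det - k * trace ** 3   # interest points: positive local maxima over (x, y, t)
```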

  8. Experiments with synthetic sequences [Figures: a spatio-temporal ”corner”; Collision I]

  9. 2=16 2=8 2=16 2=8 Experiments with synthetic sequences Collision II

  10. 2=2 2=8 2=2 2=2 2=2 2=8 2=8 2=8 Motivation for scale selection

  11. Motivation for velocity adaptation [Figures: detections for image velocities vx = −0.8, vx = 0.0 and vx = 1.4]

  12. Spatio-temporal scale selection • Estimate the spatio-temporal extent of image structures • Local scale estimation has been investigated and applied previously in the spatial domain (Lindeberg IJCV’98; Chomat et al. ECCV’00; Mikolajczyk and Schmid ICCV’01) • Here: extend scale selection to the spatio-temporal domain; estimate the spatial and temporal scale parameters σ² and τ² • Task: find normalisation parameters (a, b, c, d) of the scale-normalised Laplacian ∇²norm L = σ^{2a} τ^{2b} (Lxx + Lyy) + σ^{2c} τ^{2d} Ltt such that the normalised derivatives attain extrema at scales corresponding to the extents of image structures in space-time

  13. Spatio-temporal scale selection • Analyse a prototype spatio-temporal Gaussian blob g(x, y, t; σ₀², τ₀²) • Extrema constraints: require ∂(∇²norm L)/∂σ² = 0 and ∂(∇²norm L)/∂τ² = 0 at σ² = σ₀², τ² = τ₀² • This gives the parameter values a = 1, b = 1/4, c = 1/2, d = 3/4

  14. Spatio-temporal scale selection ⇒ The normalised spatio-temporal Laplacian operator ∇²norm L = σ² τ^{1/2} (Lxx + Lyy) + σ τ^{3/2} Ltt attains extreme values at positions and scales corresponding to the centre and the spatio-temporal extent of a Gaussian blob
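The fragment below sketches the scale-normalised Laplacian above together with a brute-force scale search at a single point; the candidate scale sets are illustrative, and a full implementation would search for local extrema over both position and scale.

```python
# Sketch of the scale-normalised space-time Laplacian and a simple scale search.
import numpy as np
from scipy.ndimage import gaussian_filter

def normalized_laplacian(volume, sigma, tau):
    """sigma^2 tau^(1/2) (Lxx + Lyy) + sigma tau^(3/2) Ltt on a (t, y, x) volume."""
    f = np.asarray(volume, dtype=float)
    s = (tau, sigma, sigma)
    Ltt = gaussian_filter(f, s, order=(2, 0, 0))
    Lyy = gaussian_filter(f, s, order=(0, 2, 0))
    Lxx = gaussian_filter(f, s, order=(0, 0, 2))
    return sigma ** 2 * tau ** 0.5 * (Lxx + Lyy) + sigma * tau ** 1.5 * Ltt

def select_scales(volume, point, sigmas=(1, 2, 4), taus=(1, 2, 4)):
    """Pick the (sigma, tau) maximising |normalised Laplacian| at a (t, y, x) point."""
    t, y, x = point
    scores = {(s, u): abs(normalized_laplacian(volume, s, u)[t, y, x])
              for s in sigmas for u in taus}
    return max(scores, key=scores.get)
```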

  15. Velocity adaptation • Want to adapt point neighbourhoods to the direction of motion and obtain invariance w.r.t. first-order motion • A stationary pattern observed from a moving camera undergoes first-order motion described by the Galilean transformation p′ = G p with p = (x, y, t)ᵀ and G = [ 1, 0, vx ; 0, 1, vy ; 0, 0, 1 ] • It follows that the covariance of the smoothing kernel and the second-moment matrix transform as Σ′ = G Σ Gᵀ and μ′ = G⁻ᵀ μ G⁻¹

  16. Velocity adaptation • Expanding this relation gives an estimate of the velocity (vx, vy) from the components of μ. However, this scheme needs the estimate of (vx, vy) in advance in order to adapt the smoothing filter kernel. ⇒ Iteratively estimate the velocity and adapt the filter kernel until the fixed-point condition (a vanishing residual velocity estimate) is reached (similar approach to affine shape adaptation in space, Lindeberg)
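A minimal sketch of this iteration is given below; since the slide's formula is not reproduced in the transcript, the residual velocity is assumed to come from a least-squares estimate based on the mixed space-time components of μ, and compute_mu is a hypothetical callable returning the components of μ measured with a kernel adapted to the current velocity.

```python
# Minimal sketch of the velocity-adaptation fixed-point iteration (assumptions noted above).
import numpy as np

def estimate_velocity(mu):
    """Velocity (vx, vy) that makes the mixed components mu_xt, mu_yt vanish."""
    A = np.array([[mu['xx'], mu['xy']],
                  [mu['xy'], mu['yy']]])
    b = np.array([mu['xt'], mu['yt']])
    return -np.linalg.solve(A, b)

def velocity_adapt(compute_mu, v0=(0.0, 0.0), tol=1e-3, max_iter=10):
    """Iterate: adapt the kernel with v, recompute mu, update v (fixed point)."""
    v = np.asarray(v0, dtype=float)
    for _ in range(max_iter):
        mu = compute_mu(v)             # mu measured with a velocity-adapted kernel
        dv = estimate_velocity(mu)     # residual velocity after adaptation
        v = v + dv
        if np.linalg.norm(dv) < tol:   # fixed-point condition: residual ~ 0
            break
    return v
```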

  17. Scale and velocity adaptation • Find interest points p = (x, y, t, σ², τ², vx, vy) that • are maxima of the corner function H over (x, y, t); • are maxima of the normalised Laplacian over (σ², τ²); • satisfy the velocity fixed-point condition. Approach: 1. Find interest points P for a set of sampled (σ², τ², vx, vy). 2. For each pᵢ in P: 3. select new scales (σ², τ²) at (x, y, t) that maximise the Laplacian in the local scale neighbourhood; 4. estimate the velocity (vx, vy); 5. re-detect the interest point for the new scales and velocities; 6. if (σ², τ², vx, vy) changed ⇒ repeat from 3, else i = i + 1. (Similar in the spatial domain: Mikolajczyk and Schmid ICCV’01, ECCV’02)
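The overall adaptation loop could be organised as follows; detect, select_scales and estimate_velocity stand for the detection, scale-selection and velocity-estimation steps above and are placeholder callables, and the tuple layout of p and the convergence tolerance are assumptions.

```python
# High-level sketch of the scale- and velocity-adaptation loop (placeholders as noted).
def adapt_interest_points(detect, select_scales, estimate_velocity,
                          init_points, max_iter=10, eps=1e-2):
    """Iteratively refine each point p = (x, y, t, sigma2, tau2, vx, vy)."""
    adapted = []
    for p in init_points:
        for _ in range(max_iter):
            x, y, t, s2, t2, vx, vy = p
            s2n, t2n = select_scales(x, y, t, vx, vy)        # maximise norm. Laplacian
            vxn, vyn = estimate_velocity(x, y, t, s2n, t2n)  # new velocity estimate
            p_new = detect(x, y, t, s2n, t2n, vxn, vyn)      # re-detect with new params
            converged = max(abs(a - b) for a, b in zip(p_new[3:], p[3:])) < eps
            p = p_new
            if converged:
                break
        adapted.append(p)
    return adapted
```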

  18. Scale- and velocity-adapted interest points

  19. Experiments [Figures: detections for a stabilised camera and a stationary camera]

  20. Experiments [Figures: detections with no adaptation, with scale adaptation, and with scale and velocity adaptation, for a stabilised camera and a stationary camera]

  21. Experiments Invariance with respect to size changes

  22. Experiments Selection of temporal scales captures the temporal extents of events

  23. Applications of interest points (preliminary results) • Classify detected interest points using their spatio-temporal neighbourhoods • Represent video data by a set of classified interest points (features) • Align video sequences by matching spatio-temporal features • Recognise motion patterns using probability distributions of features derived from training sequences

  24. Classification of events • Describe each interest point pᵢ, i = 1, ..., n, by the local responses of spatio-temporal Gaussian derivatives (the local jet) jᵢ = (Lx, Ly, Lt, Lxx, Lxy, ..., Ltttt), and normalise the descriptors w.r.t. their covariance • When analysing periodic motion such as the gait pattern, interest points with similar spatio-temporal structure are likely to correspond to the interesting events, while the others are more likely to be caused by noise • Group similar points in the space of normalised descriptors using k-means clustering • Select significant clusters and represent each of them by its mean and covariance matrix
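As an illustration of this step, the sketch below whitens the jet descriptors, groups them with scikit-learn's KMeans and keeps the largest clusters; the number of clusters and other concrete values are arbitrary choices, not values from the slides.

```python
# Illustrative sketch of descriptor whitening, k-means grouping and cluster selection.
import numpy as np
from sklearn.cluster import KMeans

def cluster_descriptors(jets, k=10, n_significant=4):
    """jets: (n_points, d) array of spatio-temporal jet descriptors."""
    X = np.asarray(jets, dtype=float)
    # normalise w.r.t. the covariance (whitening); assumes a non-singular covariance
    mean = X.mean(axis=0)
    L = np.linalg.cholesky(np.cov(X, rowvar=False))
    Xw = (X - mean) @ np.linalg.inv(L).T
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Xw)
    # keep the clusters with most points; summarise each by its mean and covariance
    counts = np.bincount(labels, minlength=k)
    significant = np.argsort(counts)[::-1][:n_significant]
    models = {c: (Xw[labels == c].mean(axis=0), np.cov(Xw[labels == c], rowvar=False))
              for c in significant}
    return labels, models
```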

  25. K-means clustering • For the gait pattern, the four significant clusters (the clusters with most points) correspond to distinct spatio-temporal events [Figures: clustering and classification results for clusters c1–c4]

  26. Application I: Sequence matching • Problem: find walking people and estimate their poses from image sequences • Match a model sequence to data sequences using spatio-temporal interest points ⇒ • Represent the model sequence and the test sequence by sets of classified spatio-temporal points. • Find a valid transformation of the model that brings model features into correspondence with data features. Note: the feature matching is defined in a 3D spatio-temporal window

  27. Walking model • Represent the gait pattern by classified spatio-temporal points corresponding to one gait cycle • Define the state X of the model at the moment t0 by the position, the size, the phase and the velocity of the person • Associate each phase of the gait cycle with a silhouette of the person extracted from the original sequence

  28. Sequence alignment • Given a data sequence and the current moment t0, detect and classify interest points in a time window (t0 − tw, t0) of length tw • Transform the model features according to X and, for each model feature fm,i = (xm,i, ym,i, tm,i, σm,i, τm,i, cm,i), compute its distance dᵢ to the closest data feature fd,j of the same class, cd,j = cm,i • Define the ”fit function” D of a model configuration X as the sum of the distances of all features, weighted w.r.t. their ”age” (t0 − tm,i) such that recent features have more influence on the matching
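A hedged sketch of the fit function D follows; the Euclidean space-time feature distance and the exponential age weighting are assumptions, since the slide only states the principle that recent features should count more.

```python
# Sketch of the fit function D over classified space-time features (assumptions noted above).
import numpy as np

def fit_function(model_features, data_features, t0, decay=0.5):
    """Sum of distances from each (transformed) model feature to the closest
    data feature of the same class, weighted so that recent features count more."""
    D = 0.0
    for fm in model_features:                      # features: dicts with keys x, y, t, c
        same_class = [fd for fd in data_features if fd['c'] == fm['c']]
        if not same_class:
            continue
        d_i = min(np.sqrt((fd['x'] - fm['x']) ** 2 +
                          (fd['y'] - fm['y']) ** 2 +
                          (fd['t'] - fm['t']) ** 2)
                  for fd in same_class)            # distance to the closest data feature
        w_i = np.exp(-decay * (t0 - fm['t']))      # recent features get more weight
        D += w_i * d_i
    return D
```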

  29. Sequence alignment • At each moment t0, minimise D with respect to X using a standard Gauss-Newton minimisation method [Figure: data features vs. model features]

  30. Experiments

  31. Experiments

  32. Application II: Action recognition • Detect spatio-temporal velocity- and scale-adapted interest points and compute their jet descriptors • Cluster all the descriptors using k-means • Compute distributions of points over the detected clusters for each sequence separately [Figures: example sequences for Walking, Exercise, Running, Cycling]

  33. Model histograms [Figures: cluster-id histograms for Walking, Exercise, Running and Cycling] (related to Leung and Malik, IJCV’01)

  34. Test sequences [Figures: Walking, Cycling, Exercise, Background, Running]

  35. Classification 1. Detect interest points and classify their jet responses w.r.t. the cluster means cᵢ. 2. Compute the distribution of cluster labels and classify the sequence as an action if the distance between its histogram and the corresponding model histogram falls below a decision threshold. [Table: confusion matrix over Walking, Exercise, Running and Cycling for the test walking, test exercise, test running, test cycling and test background sequences]
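The classification step might look as follows; the chi-square histogram distance and the decision threshold are assumptions consistent with the "histogram-distance measures" and "decision threshold" mentioned on the next slide.

```python
# Sketch of histogram-based action classification (distance measure and threshold assumed).
import numpy as np

def label_histogram(labels, k):
    """Normalised distribution of cluster labels for one sequence."""
    h = np.bincount(labels, minlength=k).astype(float)
    return h / h.sum()

def chi_square(h1, h2, eps=1e-10):
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def classify_sequence(labels, model_histograms, k, threshold=0.5):
    """Assign the action whose model histogram is closest, if below the threshold."""
    h = label_histogram(labels, k)
    dists = {action: chi_square(h, hm) for action, hm in model_histograms.items()}
    best = min(dists, key=dists.get)
    return best if dists[best] < threshold else 'background'
```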

  36. Classification [Figure: ROC curve (% correct vs. % false) corresponding to changes of the decision threshold when classifying 37 sequences using different histogram-distance measures]

  37. Performance comparison [Figure: results for velocity- and scale-adapted space-time interest points, non-adapted space-time interest points, and spatial interest points]

  38. Back-projection of points [Figures: test walking, test running, test exercise, test cycling]

  39. Summary Interest point detection • Points with high variation of image values in space-time are detected • Direct approach for event detection (no tracking needed) • invariant treatment of events at different spatial and temporal scales; invariance w.r.t. camera motion Applications • Classified space-time features provide a compact representation of video information • Interpretation of scenes with complex, non-stationary backgrounds Future work: contrast and orientation invariant descriptors, large-scale action recognition experiments, integration of multi-local constraints, on-line implementation.
