
Recognizing Action at a Distance

Recognizing Action at a Distance. A.A. Efros, A.C. Berg, G. Mori, J. Malik, UC Berkeley. Looking at People: far field (the 3-pixel man; blob tracking, vast surveillance literature), near field (the 300-pixel man; limb tracking, e.g. Yacoob & Black, Rao & Shah, etc.), and medium-field recognition.


Presentation Transcript


  1. Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley

  2. Looking at People • Far field: the 3-pixel man; blob tracking (vast surveillance literature) • Near field: the 300-pixel man; limb tracking (e.g. Yacoob & Black, Rao & Shah, etc.)

  3. Medium-field Recognition The 30-Pixel Man

  4. Appearance vs. Motion Jackson Pollock Number 21 (detail)

  5. Goals • Recognize human actions at a distance • Low resolution, noisy data • Moving camera, occlusions • Wide range of actions (including non-periodic)

  6. Our Approach • Motion-based approach • Non-parametric; use large amount of data • Classify a novel motion by finding the most similar motion from the training set. Related Work • Periodicity analysis: Polana & Nelson; Seitz & Dyer; Bobick et al.; Cutler & Davis; Collins et al. • Model-free: Temporal Templates [Bobick & Davis]; Orientation histograms [Freeman et al.; Zelnik & Irani] • Using MoCap data [Zhao & Nevatia; Ramanan & Forsyth]
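The non-parametric idea on this slide (label a novel motion with the label of its most similar training motion) can be sketched as a nearest-neighbor lookup. This is a minimal illustration, not the authors' code: the descriptors are toy vectors, and squared Euclidean distance is an assumption standing in for the motion similarity the talk develops on later slides. The name `nearest_neighbor_label` is hypothetical.

```python
from typing import List, Sequence, Tuple


def nearest_neighbor_label(query: Sequence[float],
                           training: List[Tuple[Sequence[float], str]]) -> str:
    """Return the label of the training descriptor closest to `query`.

    Assumption: squared Euclidean distance is a placeholder for the
    motion-similarity measure described later in the talk.
    """
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, best_label = min(training, key=lambda item: sq_dist(query, item[0]))
    return best_label


# Toy training set: (descriptor, action label) pairs.
training = [([0.9, 0.1], "run"), ([0.2, 0.8], "walk left"), ([0.5, 0.5], "jog")]
print(nearest_neighbor_label([0.8, 0.2], training))
```

Because nothing is fitted, adding more labeled clips to `training` is the only way the classifier improves, which matches the slide's emphasis on using a large amount of data.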

  7. Gathering action data • Tracking • Simple correlation-based tracker • User-initialized
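A simple correlation-based tracker of the kind named on this slide could look roughly like the sketch below. It assumes grayscale frames stored as nested lists, a template cut from the user-initialized first frame, and normalized cross-correlation as the match score; the slide does not say which correlation measure or search strategy was used, so `ncc`, `track`, and the `radius` parameter are illustrative choices.

```python
from math import sqrt
from typing import List, Tuple

Image = List[List[float]]


def ncc(patch: Image, template: Image) -> float:
    """Normalized cross-correlation between two equally sized patches."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p, mean_t = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = sqrt(sum((a - mean_p) ** 2 for a in p) * sum((b - mean_t) ** 2 for b in t))
    return num / den if den > 0 else 0.0  # flat patches carry no information


def track(frame: Image, template: Image, prev: Tuple[int, int],
          radius: int = 2) -> Tuple[int, int]:
    """Find the best-matching top-left corner (row, col) within `radius` of `prev`."""
    h, w = len(template), len(template[0])
    best_score, best_pos = -2.0, prev
    for r in range(max(0, prev[0] - radius), min(len(frame) - h, prev[0] + radius) + 1):
        for c in range(max(0, prev[1] - radius), min(len(frame[0]) - w, prev[1] + radius) + 1):
            patch = [row[c:c + w] for row in frame[r:r + h]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos


# Toy frame: the 2x2 template sits at row 2, column 3 of an otherwise empty image.
template = [[1.0, 2.0], [3.0, 4.0]]
frame = [[0.0] * 5 for _ in range(5)]
frame[2][3], frame[2][4], frame[3][3], frame[3][4] = 1.0, 2.0, 3.0, 4.0
print(track(frame, template, prev=(1, 2)))
```

Calling `track` once per frame, feeding each result back in as `prev`, gives the per-frame figure positions that the next slide's stabilization relies on.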

  8. Figure-centric Representation • Stabilized spatio-temporal volume • No translation information • All motion caused by person’s limbs • Good news: indifferent to camera motion • Bad news: hard! • Good test to see if actions, not just translation, are being captured
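The stabilized spatio-temporal volume can be pictured as a stack of fixed-size windows, each cut around the tracked figure so that translation disappears and only limb motion remains. The sketch below assumes the tracker supplies one center per frame; `crop_centered`, `stabilize`, and the zero padding at image borders are illustrative assumptions rather than details from the talk.

```python
from typing import List, Tuple

Image = List[List[float]]


def crop_centered(frame: Image, center: Tuple[int, int], half: int) -> Image:
    """Cut a (2*half+1)-square window around `center` (row, col), zero-padded."""
    rows, cols = len(frame), len(frame[0])
    cr, cc = center
    return [[frame[r][c] if 0 <= r < rows and 0 <= c < cols else 0.0
             for c in range(cc - half, cc + half + 1)]
            for r in range(cr - half, cr + half + 1)]


def stabilize(frames: List[Image], centers: List[Tuple[int, int]],
              half: int = 1) -> List[Image]:
    """Build a figure-centric volume: one window per frame, centered on the figure."""
    return [crop_centered(f, c, half) for f, c in zip(frames, centers)]


def blank(size: int = 5) -> Image:
    return [[0.0] * size for _ in range(size)]


# A single bright pixel translates from column 1 to column 3 between frames.
frame_a, frame_b = blank(), blank()
frame_a[2][1] = 1.0
frame_b[2][3] = 1.0
volume = stabilize([frame_a, frame_b], centers=[(2, 1), (2, 3)])
print(volume[0] == volume[1])  # the translation has been removed
```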

  9. Remembrance of Things Past • “Explain” novel motion sequence by matching to previously seen video clips • For each frame, match based on some temporal extent • Challenge: how to compare motions? [Figure: an input sequence goes through motion analysis and is matched against a database of clips labeled run, jog, swing, walk right, walk left]

  10. How to describe motion? • Appearance: not preserved across different clothing • Gradients (spatial, temporal): same problem (e.g. contrast reversal) • Edges/Silhouettes: too unreliable • Optical flow: explicitly encodes motion; least affected by appearance; …but too noisy

  11. Spatial Motion Descriptor [Figure panels: image frame, optical flow, blurred]
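One way to turn noisy optical flow into a blurred descriptor is sketched below. The slide only names the stages (image frame, optical flow, blurred); splitting the flow into four non-negative half-wave rectified channels before blurring is an assumption made here so that opposite motions do not cancel under the blur, and the 3x3 box filter is a stand-in for whatever smoothing kernel was actually used. The flow field itself is taken as given.

```python
from typing import List

Channel = List[List[float]]


def half_wave_rectify(flow_x: Channel, flow_y: Channel) -> List[Channel]:
    """Split signed flow into four non-negative channels: x+, x-, y+, y-."""
    def pos(ch: Channel) -> Channel:
        return [[max(v, 0.0) for v in row] for row in ch]

    def neg(ch: Channel) -> Channel:
        return [[max(-v, 0.0) for v in row] for row in ch]

    return [pos(flow_x), neg(flow_x), pos(flow_y), neg(flow_y)]


def box_blur(channel: Channel) -> Channel:
    """3x3 box blur; the window is clipped at the image borders."""
    rows, cols = len(channel), len(channel[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            vals = [channel[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))]
            out_row.append(sum(vals) / len(vals))
        out.append(out_row)
    return out


def spatial_descriptor(flow_x: Channel, flow_y: Channel) -> List[Channel]:
    """Blurred, rectified flow channels for a single frame."""
    return [box_blur(ch) for ch in half_wave_rectify(flow_x, flow_y)]


# Two neighboring pixels move in opposite horizontal directions.
flow_x = [[1.0, -1.0], [0.0, 0.0]]
flow_y = [[0.0, 0.0], [0.0, 0.0]]
descriptor = spatial_descriptor(flow_x, flow_y)
print(descriptor[0][0][0], descriptor[1][0][0])
```

Blurring the signed flow directly would average the two opposite motions to zero; keeping them in separate channels preserves both.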

  12. Spatio-temporal Motion Descriptor [Figure: Sequence A and Sequence B over time t, compared over a temporal extent E; panels: frame-to-frame similarity matrix, I matrix, blurry I, motion-to-motion similarity matrix]
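The step from a frame-to-frame similarity matrix to a motion-to-motion similarity matrix can be sketched as summing frame similarities along diagonals over a temporal window, so that two frames count as similar only if their temporal neighbors also match. The version below uses uniform weights over the window; the "blurry I" panel in the figure suggests a softened kernel, which is not reproduced here. `motion_similarity` and `half_extent` are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]


def motion_similarity(frame_sim: Matrix, half_extent: int) -> Matrix:
    """Aggregate frame-to-frame similarity over a temporal window.

    out[i][j] = sum over k in [-half_extent, half_extent] of frame_sim[i+k][j+k],
    skipping terms that fall outside either sequence.
    """
    n, m = len(frame_sim), len(frame_sim[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = 0.0
            for k in range(-half_extent, half_extent + 1):
                if 0 <= i + k < n and 0 <= j + k < m:
                    total += frame_sim[i + k][j + k]
            out[i][j] = total
    return out


# Frames of sequence A match frames of sequence B one-to-one, in order.
frame_sim = [[1.0, 0.0, 0.0],
             [0.0, 1.0, 0.0],
             [0.0, 0.0, 1.0]]
for row in motion_similarity(frame_sim, half_extent=1):
    print(row)
```

With `half_extent=6` the window spans 13 frames, the descriptor length quoted for the football data later in the talk.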

  13. Football Actions: matching [Video panels: input sequence, matched frames]

  14. Football Actions: classification 10 actions; 4500 total frames; 13-frame motion descriptor

  15. Classifying Ballet Actions 16 Actions; 24800 total frames; 51-frame motion descriptor. Men used to classify women and vice versa.

  16. Classifying Tennis Actions 6 actions; 4600 frames; 7-frame motion descriptor Woman player used as training, man as testing.

  17. Classifying Tennis • Red bars show classification results

  18. Querying the Database [Figure: an input sequence is matched against a database of clips labeled run, jog, swing, walk right, walk left; outputs: action recognition (a label per matched frame) and joint positions]

  19. 2D Skeleton Transfer • We annotate database with 2D joint positions • After matching, transfer data to novel sequence • Adjust the match for best fit [Video panels: input sequence, transferred 2D skeletons]
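Skeleton transfer as described on this slide amounts to copying the annotation of the best-matching database frame onto each input frame. The sketch below assumes a precomputed similarity matrix (rows are input frames, columns are database frames) and joint annotations stored as dictionaries; it leaves out the "adjust the match for best fit" step, and `transfer_skeletons` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Joints = Dict[str, Tuple[int, int]]


def transfer_skeletons(similarity: List[List[float]],
                       database_joints: List[Joints]) -> List[Joints]:
    """For each input frame, copy the joints of its most similar database frame."""
    transferred = []
    for row in similarity:
        best = max(range(len(row)), key=row.__getitem__)
        transferred.append(database_joints[best])
    return transferred


# Two input frames, two annotated database frames (toy coordinates).
similarity = [[0.1, 0.9],
              [0.8, 0.2]]
database_joints = [{"head": (5, 10), "left_knee": (4, 30)},
                   {"head": (6, 11), "left_knee": (7, 29)}]
print(transfer_skeletons(similarity, database_joints))
```

The same lookup transfers any per-frame annotation, such as an action label or a 3D pose, as long as the database carries it.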

  20. 3D Skeleton Transfer • We populate database with rendered stick figures from 3D Motion Capture data • Matching as before, we get 3D joint positions (kind of)! [Video panels: input sequence, transferred 3D skeletons]

  21. “Do as I Do” Motion Synthesis • Matching two things: • Motion similarity across sequences • Appearance similarity within sequence (like VideoTextures) • Dynamic Programming [Figure: input sequence, synthetic sequence]
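The two matching criteria and the dynamic programming step can be sketched as a Viterbi-style search: pick one target frame per input frame so that the total of motion similarity (across sequences) plus a transition score (within the target sequence) is maximal. The cost form, the `weight` parameter, and the toy `smoothness` matrix below are assumptions for illustration; the slide does not give the actual objective.

```python
from typing import List

Matrix = List[List[float]]


def synthesize(motion_sim: Matrix, smoothness: Matrix, weight: float = 1.0) -> List[int]:
    """Choose one target frame per input frame by dynamic programming.

    motion_sim[t][j]: similarity of input frame t to target frame j.
    smoothness[i][j]: how well target frame j follows target frame i.
    Returns the index path maximizing the summed score.
    """
    n_in, n_tgt = len(motion_sim), len(motion_sim[0])
    score = [list(motion_sim[0])]
    back: List[List[int]] = []
    for t in range(1, n_in):
        row, ptr = [], []
        for j in range(n_tgt):
            prev = max(range(n_tgt),
                       key=lambda i: score[-1][i] + weight * smoothness[i][j])
            row.append(score[-1][prev] + weight * smoothness[prev][j] + motion_sim[t][j])
            ptr.append(prev)
        score.append(row)
        back.append(ptr)
    # Trace the best path backwards from the best final frame.
    j = max(range(n_tgt), key=score[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return path[::-1]


motion_sim = [[1.0, 0.0, 0.0],
              [0.0, 0.4, 0.5],
              [0.0, 0.0, 1.0]]
# Toy transition score: reward playing target frames in their original order.
smoothness = [[1.0 if j == i + 1 else 0.0 for j in range(3)] for i in range(3)]
print(synthesize(motion_sim, smoothness, weight=0.0))  # motion term only
print(synthesize(motion_sim, smoothness, weight=1.0))  # motion plus smooth transitions
```

With `weight=0.0` the middle frame jumps to whichever target frame matches the motion best; raising the weight trades a little motion similarity for a sequence that plays back smoothly.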

  22. “Do as I Do” [Video panels: Source Motion; Source Appearance; 3400 Frames; Result]

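  23. “Do as I Say” Synthesis • Synthesize given action labels • e.g. video game control [Figure: a sequence of requested labels (run, walk left, swing, walk right, jog) is matched against a database labeled run, jog, swing, walk right, walk left to produce a synthetic sequence]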

  24. “Do as I Say” • Red box shows when constraint is applied

  25. Actor Replacement SHOW VIDEO (GregWorldCup.avi, DivX)

  26. Conclusions • In the medium field, action is about motion • What we propose: a way of matching motions at coarse scale • What we get out: action recognition; skeleton transfer; synthesis (“Do as I Do” & “Do as I Say”) • What we learned: a lot to be said for the “little guy”!
