
Beyond Actions: Discriminative Models for Contextual Group Activities

M.Sc. Thesis Defense. Beyond Actions: Discriminative Models for Contextual Group Activities. Tian Lan, School of Computing Science, Simon Fraser University. August 12, 2010. Outline: Group Activity Recognition with Context; Structure-level (latent structures)

Presentation Transcript


  1. M.Sc. Thesis Defense. Beyond Actions: Discriminative Models for Contextual Group Activities. Tian Lan, School of Computing Science, Simon Fraser University. August 12, 2010

  2. Outline • Group Activity Recognition with Context • Structure-level (latent structures) • Feature-level (Action Context descriptor) • Introduction • Experiments

  3. Activity Recognition • Goal: enable computers to analyze and understand human behavior. [Figure examples: answering a phone, kissing]

  4. Action vs. Activity • Activity: a group of people forming a queue • Action: standing in a queue and facing left

  5. Activity Recognition • Activity recognition is important: HCI, surveillance, sport, entertainment • Activity recognition is difficult: intra-class variation, background clutter, partial occlusion, etc.

  6. Group Activity Recognition • Motivation: human actions are rarely performed in isolation; the actions of individuals in a group can serve as context for each other. • Goal: explore the benefit of contextual information for group activity recognition in challenging real-world applications.

  7. Group Activity Recognition Context

  8. Group Activity Recognition • Two types of context: group-person interaction and person-person interaction [Figure example: Talk]

  9. Latent Structured Model [Figure: graphical model. Activity layer: activity class y. Action (hidden) layer: action classes h1, h2, …, hn. Feature layer: person features x1, x2, …, xn and image feature x0.]

  10. Latent Structured Model [Figure: graphical model with activity class y, action classes h1, h2, …, hn, person features x1, x2, …, xn and image feature x0. Annotations: group-person interaction, person-person interaction, structure-level, feature-level.]

  11. Difference from Previous Work • Group Activity Recognition • Previous work • Single-person action recognition (Schuldt et al., ICPR 04) • Relatively simple activity recognition (Vaswani et al., CVPR 03) • Datasets in controlled conditions • Our work • Group activity recognition in realistic videos • Two new types of contextual information • A unified framework

  12. Difference from Previous Work • Latent Structured Models • Previous work: a pre-defined structure for the hidden layer, e.g. a tree (HCRF) (Quattoni et al., PAMI 07; Felzenszwalb et al., CVPR 08) • Our work: a latent structure for the hidden layer, inferred automatically during learning and inference.

  13. Outline • Group Activity Recognition with Context • Structure-level (latent structures) • Feature-level (Action Context descriptor) • Introduction • Experiments

  14. Structure-level Approach [Figure: graphical model with activity class y, action classes h1, h2, …, hn, person features x1, x2, …, xn and image feature x0. Annotations: person-person interaction, structure-level, feature-level.]

  15. Structure-level Approach • Latent Structure [Figure labels: Queue, ?, Talk, Talk]

  16. Model Formulation • Input: image-label pair (x, h, y) • Potentials: Image-Action, Action-Activity, Image-Activity, Action-Action [Figure: graphical model over y, h1, h2, …, hn, x1, x2, …, xn, x0]
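
The four potentials named on slide 16 (Image-Action, Action-Activity, Image-Activity, Action-Action) can be read as the terms of a linear scoring function. The form below is a sketch under that reading. The weight vectors $w_0, \dots, w_3$, the feature maps $\phi_0, \dots, \phi_3$ and the edge set $E$ of the hidden layer are notation assumed here, not taken from the slide.

```latex
F_w(x, h, y) =
  \underbrace{w_0^\top \phi_0(x_0, y)}_{\text{image-activity}}
  + \sum_{j=1}^{n} \underbrace{w_1^\top \phi_1(x_j, h_j)}_{\text{image-action}}
  + \sum_{j=1}^{n} \underbrace{w_2^\top \phi_2(h_j, y)}_{\text{action-activity}}
  + \sum_{(j,k) \in E} \underbrace{w_3^\top \phi_3(y, h_j, h_k)}_{\text{action-action}}
```

Under this reading, the first three terms do not depend on which people are connected; only the last sum changes with the structure of the hidden layer.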

  17. Inference • Score an image x with activity label y • Infer the latent variables: NP-hard!

  18. Inference • Holding Gy fixed, optimize hy: loopy BP • Holding hy fixed, optimize Gy: ILP
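
The alternation on slide 18 (fix the graph and update the action labels, then fix the labels and update the graph) can be sketched in a few lines. This is an illustration under stated assumptions, not the thesis implementation: the label step uses simple coordinate-wise updates in place of loopy BP, the graph step uses a greedy degree-capped edge selection in place of the ILP, and the names `unary`, `pairwise` and `max_degree` are hypothetical. `unary[j, a]` is assumed to hold the score of giving person `j` action `a` under one fixed activity label, and `pairwise[a, b]` a symmetric score for connecting two people with actions `a` and `b`.

```python
import itertools
import numpy as np

def total_score(unary, pairwise, h, edges):
    """Score of action labels h and graph edges for one fixed activity label."""
    s = sum(unary[j, h[j]] for j in range(len(h)))
    return s + sum(pairwise[h[j], h[k]] for j, k in edges)

def update_labels(unary, pairwise, h, edges):
    """Label step with the graph fixed: one sweep of coordinate-wise updates
    (a simple stand-in for the loopy BP step named on the slide)."""
    h = list(h)
    for j in range(len(h)):
        nbrs = [q if p == j else p for p, q in edges if j in (p, q)]
        # pairwise is assumed symmetric, so a column gives the edge scores
        local = unary[j] + sum(pairwise[:, h[k]] for k in nbrs)
        h[j] = int(np.argmax(local))
    return h

def update_graph(pairwise, h, max_degree):
    """Graph step with labels fixed: greedily keep the best positive-scoring
    edges under a per-node degree cap (a stand-in for the ILP on the slide)."""
    n = len(h)
    cand = sorted(itertools.combinations(range(n), 2),
                  key=lambda e: pairwise[h[e[0]], h[e[1]]], reverse=True)
    degree, edges = [0] * n, []
    for j, k in cand:
        if pairwise[h[j], h[k]] <= 0:
            break  # remaining candidates cannot raise the score
        if degree[j] < max_degree and degree[k] < max_degree:
            edges.append((j, k))
            degree[j] += 1
            degree[k] += 1
    return edges

def alternate_inference(unary, pairwise, max_degree=2, max_iters=20):
    """Alternate the two steps until the score stops improving.
    Returns (score, labels, edges); this is a local optimum only."""
    n = unary.shape[0]
    h = [int(a) for a in np.argmax(unary, axis=1)]
    edges = list(itertools.combinations(range(n), 2))  # start fully connected
    best = (-np.inf, h, [])
    for _ in range(max_iters):
        h = update_labels(unary, pairwise, h, edges)
        edges = update_graph(pairwise, h, max_degree)
        score = total_score(unary, pairwise, h, edges)
        if score <= best[0]:
            break
        best = (score, h, edges)
    return best
```

To classify an image with such a routine, one would run it once per candidate activity label, add the image-level score for that label, and keep the label with the highest total.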

  19. Learning with Latent SVM • Optimization: non-convex bundle method (Do & Artieres, ICML 09)
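
For orientation, a latent SVM objective in its generic form is sketched below. Here $z$ stands for whatever is hidden during training (the slides treat the structure of the hidden layer as latent), $F_w$ is the model score, $\Delta$ is a loss between activity labels, and $C$ is a trade-off constant. All of this notation is assumed for illustration rather than taken from the slide.

```latex
\min_{w} \; \frac{1}{2}\lVert w \rVert^2
  + C \sum_{i=1}^{N} \Big[
      \max_{y,\, z} \big( \Delta(y, y^i) + F_w(x^i, z, y) \big)
      - \max_{z} F_w(x^i, z, y^i)
    \Big]
```

The second maximum enters with a negative sign, so the objective is a difference of convex functions rather than a convex function, which is why a non-convex solver is called for.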

  20. Feature-level Approach [Figure: graphical model with activity class y, action classes h1, h2, …, hn, person features x1, x2, …, xn and image feature x0. Annotations: person-person interaction, structure-level, feature-level.]

  21. Feature-level Approach • Model [Figure: graphical model with activity class y, action classes h1, h2, …, hn, Action Context Descriptors x1, x2, …, xn and image feature x0]

  22. Action Context Descriptor [Figure, panels (a)-(c): a focal person and its context region; labels: τ, z, action]

  23. Action Context Descriptor [Figure: feature descriptor (e.g. HOG by Dalal & Triggs) → multi-class SVM → one score per action class → max]
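
One way to read slides 22 and 23 is that each person first gets a vector of per-action-class SVM scores, and the descriptor for a focal person then combines its own scores with a per-class maximum over the people in its context. The sketch below follows that reading under simplifying assumptions: a single circular spatial context region (no sub-regions and no temporal extent), and an all-zero context part when nobody is nearby. The function name and the parameters `scores`, `positions`, `focal` and `radius` are hypothetical.

```python
import numpy as np

def action_context_descriptor(scores, positions, focal, radius):
    """Concatenate the focal person's action scores with the per-class
    maximum of the scores of the other people inside the context region.

    scores:    (n_people, n_actions) classifier scores, one row per person
    positions: (n_people, 2) image coordinates of each person
    focal:     index of the focal person
    radius:    size of the (assumed circular) context region
    """
    scores = np.asarray(scores, dtype=float)
    positions = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(positions - positions[focal], axis=1)
    in_context = dist <= radius
    in_context[focal] = False  # the focal person is not its own context
    if in_context.any():
        context = scores[in_context].max(axis=0)
    else:
        context = np.zeros(scores.shape[1])  # assumption: empty context -> zeros
    return np.concatenate([scores[focal], context])
```

With K action classes the result has 2K entries; splitting the context into several sub-regions would add one block of K entries per sub-region.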

  24. Outline • Group Activity Recognition with Context • Structure-level (latent structures) • Feature-level (Action Context descriptor) • Introduction • Experiments

  25. Dataset • Collective Activity Dataset (Choi et al. VS 09) • 5 action categories: crossing, waiting, queuing, walking, talking. (per person) • 44 video clips

  26. Collective Activity Dataset

  27. Dataset • Nursing Home Dataset • 2 activity categories: fall, non-fall (per image) • 5 action categories: walking, standing, sitting, bending, and falling (per person) • 22 video clips in total (2990 frames); 8 clips for testing, the rest for training • 1/3 are labeled as fall

  28. Nursing Home Dataset

  29. Baselines • root (x0) + SVM (no structure) • No connection • Min-spanning tree • Complete graph within r [Figure: hidden-layer graphs over h1, h2, h3, h4 for each structure and for the structure-level approach; r marks the radius]
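
The three fixed hidden-layer structures listed on slide 29 can be built directly from person positions. The sketch below assumes each person is represented by a 2-D point and that distances are Euclidean; the function names are hypothetical, and the minimum spanning tree uses Prim's algorithm, which the slides do not specify.

```python
import itertools
import math

def no_connection(points):
    """Baseline: no edges between people."""
    return set()

def complete_within_radius(points, r):
    """Baseline: connect every pair of people at most r apart."""
    return {(j, k)
            for j, k in itertools.combinations(range(len(points)), 2)
            if math.dist(points[j], points[k]) <= r}

def minimum_spanning_tree(points):
    """Baseline: minimum spanning tree over Euclidean distances (Prim)."""
    n = len(points)
    if n == 0:
        return set()
    in_tree, edges = {0}, set()
    while len(in_tree) < n:
        # cheapest edge leaving the current tree
        j, k = min(((a, b) for a in in_tree
                    for b in range(n) if b not in in_tree),
                   key=lambda e: math.dist(points[e[0]], points[e[1]]))
        edges.add((min(j, k), max(j, k)))
        in_tree.add(k)
    return edges
```

Each function returns a set of index pairs `(j, k)` with `j < k`, so the outputs can be compared directly or passed to a pairwise scoring routine.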

  30. System Overview • Pipeline: Video → Person Detector → Person Descriptor → Model • Person detector: pedestrian detection by Felzenszwalb et al.; background subtraction • Person descriptor: HOG by Dalal & Triggs; LST by Loy et al. at CVPR 09

  31. Results – Collective Activity Dataset

  32. Results – Correct Examples

  33. Results – Incorrect Examples Crossing Waiting

  34. Walking Talking Queuing

  35. Results – Nursing Home Dataset

  36. Results – Correct Examples

  37. Results – Incorrect Examples

  38. Conclusion • A discriminative model for group activity recognition with context • Two new types of contextual information: • Group-person interaction • Person-person interaction • Structure-level: latent structure • Feature-level: Action Context descriptor • Experimental results demonstrate the effectiveness of the proposed model

  39. Future Work • Modeling complex structures • Temporal dependencies among actions • Contextual feature descriptors • How to encode discriminative context? • Weakly supervised learning • e.g. multiple instance learning for fall detection

  40. Thank you!

  41. Pairwise Weight hj y hk

  42. Pairwise Weight

  43. Pairwise Weight

  44. Infer the graph structures

  45. Results – Nursing Home Dataset 0/1 loss – optimize overall accuracy

  46. Results – Nursing Home Dataset new loss – optimize mean per-class accuracy
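
Slides 45 and 46 contrast optimizing overall accuracy with optimizing mean per-class accuracy. The difference matters when classes are imbalanced, as with fall versus non-fall. The sketch below only computes the two evaluation measures; it is not the training loss used in the thesis, and the function names are hypothetical.

```python
from collections import defaultdict

def overall_accuracy(y_true, y_pred):
    """Fraction of examples labeled correctly (what a 0/1 loss targets)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_per_class_accuracy(y_true, y_pred):
    """Average of the per-class accuracies, so each class counts equally."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += (t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)
```

For instance, with one third of the examples labeled fall, a classifier that always predicts non-fall is right on two thirds of the examples overall, yet its mean per-class accuracy is only one half, because it gets every fall wrong.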

  47. Person Detectors • Collective Activity Dataset: pedestrian detector (Felzenszwalb et al., CVPR 08) • Nursing Home Dataset: background subtraction (video → moving regions)

  48. Person Descriptors • Collective Activity Dataset: HOG • Nursing Home Dataset: Local Spatial Temporal (LST) descriptor (Loy et al., ICCV 09)
