
Presentation Transcript


  1. Univ. of Texas at San Antonio Human Action Recognition Hong Lin

  2. Univ. of Texas at San Antonio Outline • Research Background • Method • Experiment • Current work

  3. Univ. of Texas at San Antonio Research Background • Human action recognition: • automatically analyze ongoing activities in an unknown video; • essential for visual surveillance, human-computer interaction, video retrieval, etc. • Two categories of methods: • Single-view: high variation of appearances and shapes; potential occlusions; • Multi-view: difficulty in discovering correlations among multiple views;

  4. Univ. of Texas at San Antonio Research Background: Roughly, we divide single-view activity recognition techniques into two categories: • Model-based methods [1][2] rely on human body tracking or pose estimation in order to model the dynamics of individual body parts for action recognition • Appearance-based methods [3][4] employ appearance features for action recognition: 1. global space-time shape templates 2. local spatiotemporal interest points [1] C. Fanti, L. Zelnik-Manor, and P. Perona, “Hybrid models for human motion recognition,” in Proc. IEEE CVPR, pp. 1166–1173, Jun. 2005. [2] A. Yilmaz, “Recognizing human actions in videos acquired by uncalibrated moving cameras,” in Proc. IEEE ICCV, pp. 150–157, Oct. 2005. [3] L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri, “Actions as space-time shapes,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 12, pp. 2247–2253, Dec. 2007. [4] C. Schuldt, I. Laptev, and B. Caputo, “Recognizing human actions: A local SVM approach,” in Proc. ICPR, pp. 32–36, Aug. 2004.

  5. Univ. of Texas at San Antonio Research Background: Single-view • Local space-time feature-based methods: • Advantage: capture local salient characteristics of appearance and motion; robust to spatiotemporal shifts and scales, background clutter, and multiple motions. • Framework of local space-time feature-based methods. Local space-time feature extraction: 1. Detector: selects spatio-temporal interest points in a video by maximizing specific saliency functions 2. Descriptor: captures shape and motion in the neighborhoods of the selected points using image measurements
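
A minimal NumPy/SciPy sketch of the detector step above, using the Harris3D saliency function (the spatio-temporal extension of the Harris corner measure); the parameter values are illustrative, and the final step of selecting local maxima of the response as interest points is omitted:

import numpy as np
from scipy import ndimage

def harris3d_response(video, sigma=2.0, tau=1.5, k=0.005):
    # video: (T, H, W) grayscale array; sigma/tau are spatial/temporal scales.
    L = ndimage.gaussian_filter(video.astype(np.float64), (tau, sigma, sigma))
    Lt, Ly, Lx = np.gradient(L)          # spatio-temporal gradients
    s = (2 * tau, 2 * sigma, 2 * sigma)  # integration scale for averaging
    # Entries of the averaged 3x3 spatio-temporal structure tensor mu.
    Mxx = ndimage.gaussian_filter(Lx * Lx, s)
    Myy = ndimage.gaussian_filter(Ly * Ly, s)
    Mtt = ndimage.gaussian_filter(Lt * Lt, s)
    Mxy = ndimage.gaussian_filter(Lx * Ly, s)
    Mxt = ndimage.gaussian_filter(Lx * Lt, s)
    Myt = ndimage.gaussian_filter(Ly * Lt, s)
    # Saliency H = det(mu) - k * trace(mu)^3; interest points are its local maxima.
    det = (Mxx * (Myy * Mtt - Myt * Myt)
           - Mxy * (Mxy * Mtt - Myt * Mxt)
           + Mxt * (Mxy * Myt - Myy * Mxt))
    trace = Mxx + Myy + Mtt
    return det - k * trace ** 3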

  6. Univ. of Texas at San Antonio Research Background: Single-view BoW+SVM framework: • Bag-of-Words (BoW) [5]: space-time interest point features are quantized into visual words; a video is then represented as a frequency histogram over the visual words. • SVM classification for modeling and recognition • Wang et al. [6] gave a comprehensive evaluation of popular local feature detectors and descriptors for the standard BoW+SVM framework References: [5] A. Klaser, M. Marszalek, and C. Schmid, “A spatio-temporal descriptor based on 3D-gradients,” in Proc. BMVC, 2008. [6] H. Wang, M. M. Ullah, A. Klaser, I. Laptev, and C. Schmid, “Evaluation of local spatio-temporal features for action recognition,” in Proc. BMVC, 2009.
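
A minimal scikit-learn sketch of this BoW+SVM pipeline, assuming precomputed local descriptors (e.g., HoG/HoF); the codebook size and the RBF kernel are stand-in assumptions, not the settings reported later in the slides:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def build_codebook(train_descriptors, k=100):
    # Quantize local space-time descriptors into k visual words.
    return KMeans(n_clusters=k, n_init=10).fit(train_descriptors)

def bow_histogram(codebook, descriptors):
    # Represent one video as a normalized frequency histogram over visual words.
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def bow_svm(train_videos, y_train, test_videos, k=100):
    # train_videos/test_videos: lists of per-video descriptor arrays.
    codebook = build_codebook(np.vstack(train_videos), k)
    X_train = np.array([bow_histogram(codebook, d) for d in train_videos])
    X_test = np.array([bow_histogram(codebook, d) for d in test_videos])
    return SVC(kernel="rbf").fit(X_train, y_train).predict(X_test)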

  7. Univ. of Texas at San Antonio Method • Motivation: exploit information about the structure of the human body

  8. Univ. of Texas at San Antonio Method • Partwise BoW + Graph-based Multi-task Learning • Partwise BoW representation: discover information about human body structure • Multi-task learning: discover the latent correlation among part-wise visual features (from single-task classification to part-induced multi-task classification)

  9. Univ. of Texas at San Antonio Method: Partwise BoW + graph-based MTL • Partwise bag-of-words (PBoW) representation • Local space-time feature extraction: Harris3D, HoG/HoF • Body part localization: part model, skeleton information • PBoW generation, 7 components: Level 0: limb-wise, head-wise, leg-wise, and foot-wise BoW; Level 1: upper body-wise and lower body-wise BoW; Level 2: full body-wise BoW
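
A minimal sketch of the PBoW generation step, assuming each interest point has already been quantized to a visual word and assigned to a level-0 body part by the part model/skeleton; the grouping of level-0 parts into upper/lower body is an assumption, and the helper names are hypothetical:

import numpy as np

LEVEL0 = ["limb", "head", "leg", "foot"]   # level-0 parts from the slide
GROUPS = {"upper body": ["limb", "head"],  # assumed level-1 grouping
          "lower body": ["leg", "foot"],
          "full body": LEVEL0}             # level 2

def pbow(word_ids, part_labels, vocab_size=100):
    # word_ids: visual-word index of each interest point in one video.
    # part_labels: level-0 part assigned to each interest point.
    word_ids = np.asarray(word_ids)
    part_labels = np.asarray(part_labels)
    def hist(mask):
        h = np.bincount(word_ids[mask], minlength=vocab_size).astype(float)
        return h / max(h.sum(), 1.0)
    feats = {p: hist(part_labels == p) for p in LEVEL0}
    feats.update({g: hist(np.isin(part_labels, ps)) for g, ps in GROUPS.items()})
    return feats  # 7 component histograms: 4 + 2 + 1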

  10. Univ. of Texas at San Antonio Method: Partwise BoW + graph-based MTL • Graph-based Multi-task Learning (GMTL) • Objective: convert individual BoW-based single-task learning into joint multi-task learning over the multiple components of the PBoW • Formulation: encode the latent relatedness between part-wise features.
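
The transcript does not give the GMTL objective itself; a common graph-regularized multi-task form consistent with the description, with one linear classifier w_p per PBoW component and an edge set E encoding the chosen part graph, would be:

\min_{\{w_p\}} \; \sum_{p=1}^{7} \sum_{i=1}^{N} \ell\left(y_i,\; w_p^{\top} x_i^{(p)}\right) \;+\; \lambda \sum_{(p,q) \in E} \lVert w_p - w_q \rVert_2^2 \;+\; \gamma \sum_{p=1}^{7} \lVert w_p \rVert_2^2

where x_i^{(p)} is the p-th PBoW component of video i and \ell is a classification loss (e.g., hinge). The graph penalty pulls the classifiers of connected body parts toward each other, which is how relatedness between part-wise features would be encoded.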

  11. Univ. of Texas at San Antonio Experiment Result • Evaluation on KTH • KTH Dataset: • 6 kinds of actions: walking, jogging, running, boxing, hand waving, and hand clapping. • Each of the 6 actions was performed four times by 25 subjects in 4 different scenarios. • All videos were recorded with a static camera at a 25 fps frame rate. The sequences were downsampled to a spatial resolution of 160×120 pixels and have a length of four seconds on average.

  12. Univ. of Texas at San Antonio Experiment Result • Evaluation on KTH • Baseline: BoW+SVM We implemented the standard BoW+SVM framework on KTH. The best accuracy, 91.0%, was obtained with the 4000-D codebook; the worst results were obtained with the 100-D codebook. Reference: [7] An-An Liu, Yuting Su, Hong Lin, et al., “Single/Multi-view Human Action Recognition via Regularized Multi-Task Learning,” Neurocomputing, 2013 (under review).

  13. Univ. of Texas at San Antonio Experiment Result • Evaluation on KTH Further, we implemented the BoW+SVM framework with the individual part-wise BoW features. The results show that a 100-D partwise BoW achieves competitive performance (89.0%) against the best result (91.0%) obtained with the 4000-D feature in the standard BoW+SVM framework. Reference: [7] An-An Liu, Yuting Su, Hong Lin, et al., “Single/Multi-view Human Action Recognition via Regularized Multi-Task Learning,” Neurocomputing, 2013 (under review).

  14. Univ. of Texas at San Antonio Experiment Result • Evaluation on KTH • Performance by PBoW (100-D) + GMTL Based on the human body structure, we implemented seven kinds of graph structures to formulate the 3 levels of part-wise BoW features into one multi-task learning problem, hoping to encode the latent relatedness between part-wise features.

  15. Univ. of Texas at San Antonio Experiment Result • Evaluation on MV-TJU • Performance by PBoW (100-D) + GMTL Analysis: • The graph penalty can further facilitate common knowledge discovery by MTL. • The overall accuracy of MTL with graph structure R6 is promising. • The R6 structure is important for effective relatedness transfer.

  16. Univ. of Texas at San Antonio Thank you! Hong Lin
