
Video Event Recognition Using Kernel Methods with Multilevel Temporal Alignment

Dong Xu, Member, IEEE, and Shih-Fu Chang, Fellow, IEEE.


Presentation Transcript


  1. Video Event Recognition Using Kernel Methods with Multilevel Temporal Alignment. Dong Xu, Member, IEEE, and Shih-Fu Chang, Fellow, IEEE

  2. Outline: Introduction; Scene-Level Concept Score Feature; Single-Level Earth Mover’s Distance in the Temporal Domain; Temporally Aligned Pyramid Matching; Experiments; Contributions and Conclusion.

  3. 1. Introduction: Previous work on video event recognition can be roughly classified as either activity recognition or abnormal event recognition.

  4. Model-based approaches. Abnormal event recognition: Zhang et al. [1] propose a semisupervised adapted Hidden Markov Model (HMM) framework. Activity recognition: HMM, coupled HMM, Dynamic Bayesian Network.

  5. Appearance-based approaches. Abnormal event recognition: Boiman and Irani [7]. Activity recognition: Ke et al. [8], Efros et al. [9], and others.

  6. Event recognition in broadcast news video: rich information; emerging applications of open source intelligence; online video search.

  7. LSCOM ontology: the Large-Scale Concept Ontology for Multimedia defines 56 event/activity concepts. Manual annotation of these event concepts has been completed for a large data set in TRECVID 2005 [15].

  8. Challenges of events in news video: large variations of scenes and activities. It is difficult to reliably track moving objects, detect the salient spatiotemporal interest regions, and extract the spatial-temporal features.

  9. Addressing the challenges of news video: Ebadollahi et al. [17]; midlevel concept score (CS) feature; nonparametric approach; bag-of-words model.

  10. Bag-of-words model: represent one video clip as a bag of orderless features extracted from all of the frames. Earth Mover’s Distance (EMD) [21]; Single-Level EMD (SLEMD); Support Vector Machine (SVM); Temporally Aligned Pyramid Matching (TAPM).

  11. Temporally Aligned Pyramid Matching (TAPM)

  12. 2. Scene-Level Concept Score Feature: holistic features represent the content of the constituent image frames; a multilevel temporal alignment framework matches the temporal characteristics of various events.

  13. Three low-level global features: Grid Color Moment, Gabor Texture, Edge Direction Histogram.

  14. We used these features because they can be efficiently extracted over the large video corpus, are effective for detecting several concepts, and are suitable for capturing the characteristics of scenes.

  15. 3. Single-Level Earth Mover’s Distance in the Temporal Domain: One video clip P can be represented as a signature $P = \{(p_1, w_{p_1}), \ldots, (p_m, w_{p_m})\}$, where $m$ is the total number of frames, $p_i$ is the feature extracted from the $i$th frame, and $w_{p_i}$ is the weight of the $i$th frame. Another video clip Q is likewise represented as a signature $Q = \{(q_1, w_{q_1}), \ldots, (q_n, w_{q_n})\}$, where $n$ is the total number of frames.

  16. $d_{ij}$ is the ground distance between $p_i$ and $q_j$.
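
The transcript does not preserve the EMD formula from this slide, so the following is only a minimal sketch of the standard Earth Mover's Distance between two clip signatures. It assumes uniform frame weights (1/m and 1/n), a Euclidean ground distance, and non-empty clips; none of these choices is confirmed by the slides. The function name `emd` is hypothetical, and the transportation problem is solved with a small successive-shortest-path routine rather than whatever solver the authors used.

```python
from math import dist, gcd

def emd(P, Q):
    """EMD between two non-empty clips, each a list of per-frame feature vectors.

    Assumptions (not from the slides): uniform weights 1/m and 1/n,
    Euclidean ground distance d_ij.
    """
    m, n = len(P), len(Q)
    L = m * n // gcd(m, n)      # scale weights to integers: L/m and L/n
    N = m + n + 2               # source, m supplier nodes, n consumer nodes, sink
    S, T = 0, N - 1
    edges, adj = [], [[] for _ in range(N)]

    def add_edge(u, v, cap, cost):
        # forward edge at an even index, its residual twin at index ^ 1
        adj[u].append(len(edges)); edges.append([v, cap, cost])
        adj[v].append(len(edges)); edges.append([u, 0, -cost])

    for i in range(m):
        add_edge(S, 1 + i, L // m, 0.0)
        for j in range(n):
            add_edge(1 + i, 1 + m + j, L, dist(P[i], Q[j]))  # ground distance d_ij
    for j in range(n):
        add_edge(1 + m + j, T, L // n, 0.0)

    total_cost, remaining = 0.0, L
    while remaining > 0:
        # Bellman-Ford: cheapest augmenting path in the residual graph
        best = [float("inf")] * N
        via = [-1] * N
        best[S] = 0.0
        for _ in range(N - 1):
            changed = False
            for u in range(N):
                if best[u] == float("inf"):
                    continue
                for e in adj[u]:
                    v, cap, cost = edges[e]
                    if cap > 0 and best[u] + cost < best[v] - 1e-12:
                        best[v], via[v], changed = best[u] + cost, e, True
            if not changed:
                break
        # find the bottleneck capacity on the path, then push that much flow
        push, v = remaining, T
        while v != S:
            push = min(push, edges[via[v]][1])
            v = edges[via[v] ^ 1][0]
        v = T
        while v != S:
            e = via[v]
            edges[e][1] -= push
            edges[e ^ 1][1] += push
            v = edges[e ^ 1][0]
        total_cost += push * best[T]
        remaining -= push
    return total_cost / L       # total flow is 1 after rescaling
```

Because the signature is an orderless bag of frames, `emd([[0.0], [10.0]], [[10.0], [0.0]])` is zero: reordering the frames of a clip does not change the distance.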

  17. SVM classification
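
Only the slide title survives here, so the kernel actually used is not recoverable from the transcript. One common way to plug a distance such as EMD into an SVM is an exponential kernel of the form K = exp(-D / A). The sketch below assumes that form and, as a further assumption, defaults the normalization factor `A` to the mean off-diagonal distance; the function name `emd_kernel` is hypothetical.

```python
import math

def emd_kernel(D, A=None):
    """Turn a symmetric matrix of pairwise clip distances into a kernel matrix.

    Assumption: K[i][j] = exp(-D[i][j] / A), with A defaulting to the mean
    off-diagonal distance. Neither choice is confirmed by the slides.
    """
    n = len(D)
    if A is None:
        off_diagonal = [D[i][j] for i in range(n) for j in range(n) if i != j]
        A = sum(off_diagonal) / len(off_diagonal)
    if A <= 0:
        raise ValueError("normalization factor A must be positive")
    return [[math.exp(-D[i][j] / A) for j in range(n)] for i in range(n)]
```

A matrix built this way can be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.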

  18. 4. Temporally Aligned Pyramid Matching: Spatial Pyramid Matching (SPM); Pyramid Match Kernel (PMK); Temporally Constrained Hierarchical Agglomerative Clustering (T-HAC).

  19. T-HAC
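
The T-HAC slide content is not preserved, so the following is a rough sketch of what temporally constrained agglomerative clustering could look like: every frame starts as its own segment, and only temporally adjacent segments may be merged. The centroid linkage, the Euclidean distance, and the function name `t_hac` are all assumptions made for illustration.

```python
from math import dist

def t_hac(frames, num_subclips):
    """Split a clip into contiguous subclips by merging adjacent segments.

    frames: list of per-frame feature vectors, in temporal order.
    Returns a list of (start, end) index ranges, end exclusive.
    Assumption: centroid linkage with Euclidean distance.
    """
    segments = [(i, i + 1) for i in range(len(frames))]
    dim = len(frames[0])

    def centroid(segment):
        start, end = segment
        return [sum(frames[t][k] for t in range(start, end)) / (end - start)
                for k in range(dim)]

    while len(segments) > num_subclips:
        # temporal constraint: only neighbouring segments are merge candidates
        costs = [dist(centroid(segments[i]), centroid(segments[i + 1]))
                 for i in range(len(segments) - 1)]
        i = costs.index(min(costs))
        segments[i:i + 2] = [(segments[i][0], segments[i + 1][1])]
    return segments
```

With two subclips requested at level-1 and four at level-2, the same routine would yield the temporal pyramid that the later slides refer to as L1 and L2.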

  20. Alignment of Different Subclips: Principal Component Analysis (PCA)

  21. Integer-value-constrained EMD
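
When the flow values are constrained to integers and each clip has the same number of equally weighted subclips, the EMD reduces to a one-to-one assignment between subclips. The sketch below illustrates that reduction by brute force over permutations, which is practical only for the small subclip counts (two or four) a temporal pyramid uses. The function name `align_subclips` and the averaging by the number of subclips are assumptions, not details taken from the slides.

```python
from itertools import permutations

def align_subclips(D):
    """One-to-one alignment of subclips from a square distance matrix.

    D[r][c]: distance between subclip r of clip P and subclip c of clip Q.
    Returns (matching, distance): matching[r] is the subclip of Q aligned
    with subclip r of P, and distance is the mean matched distance.
    """
    R = len(D)

    def total(perm):
        return sum(D[r][perm[r]] for r in range(R))

    best = min(permutations(range(R)), key=total)
    return list(best), total(best) / R
```

For example, `align_subclips([[5.0, 1.0], [1.0, 5.0]])` matches the first subclip of P with the second subclip of Q, which is how events whose stages occur in a different temporal order can still be aligned.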

  22. Fusion of Information from Different Levels: h_l is the weight for level-l.
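
The fusion formula itself is missing from the transcript. A weighted sum over pyramid levels is the simplest reading of "h_l is the weight for level-l", and the sketch below assumes exactly that. Whether the fused quantities are distances or classifier outputs is not stated in the slides, so `scores` is deliberately generic, and the function name `fuse_levels` is hypothetical.

```python
def fuse_levels(scores, weights):
    """Weighted sum of per-level scores: sum over l of h_l * score_l.

    scores[l]:  score for one clip (or clip pair) at pyramid level l.
    weights[l]: the level weight h_l.
    Assumption: plain, unnormalized weighted sum.
    """
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per level")
    return sum(h * s for h, s in zip(weights, scores))
```

The experiments slide lists the weight settings that were compared, such as h0 = h1 = h2 = 1 and h0 = h1 = 1, h2 = 2; the latter would correspond to `fuse_levels(scores, [1, 1, 2])`.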

  23. TAPM

  24. 5. Experiments: (1) the SLEMD algorithm versus the simplistic detector that uses a single keyframe or multiple keyframes; (2) multilevel TAPM versus the SLEMD method; (3) the midlevel CS feature versus three low-level features.

  25. Single-Level EMD versus Keyframe-Based Algorithm: SLEMD algorithm, i.e., TAPM at level-0; keyframe-based algorithm (KF-CS); multiframe-based representation (MF-CS).

  26. Multilevel Matching versus Single-Level EMD: level-0 (L0), level-1 (L1), level-2 (L2); combination of L0 and L1 (L0+L1), with h0 = h1 = 1; combination of L0, L1, and L2 (L0+L1+L2), with h0 = h1 = h2 = 1; combination of L0, L1, and L2 (L0+L1+L2-d), with h0 = h1 = 1, h2 = 2.

  27. Sensitivity to Clustering Method and Boundary Precision

  28. The Effect of Temporal Alignment

  29. Algorithmic Complexity Analysis and Speedup

  30. Concept Score Feature versus Low-Level Features

  31. 6. Contributions and Conclusion: first systematic study of diverse visual event recognition in the unconstrained broadcast news domain, with clear performance improvements.
