
Action Recognition Robust to Occlusion Using an Efficient Part-Based Approach



Presentation Transcript


  1. Action Recognition Robust to Occlusion Using an Efficient Part-Based Approach Student: Jih-Sheng Tsai Advisor: Prof. Li-Chen Fu 1

  2. Outline • Introduction • Part-Based Representation • Action Recognition • Experiments • Conclusion & Future Work 2

  3. Outline • Introduction • Part-Based Representation • Action Recognition • Experiments • Conclusion & Future Work 3

  4. Introduction ─ Background • Human action recognition has become popular, with many applications • Human-robot interaction • Surveillance • Video games • A natural interaction style is preferable: over the past 10 years interfaces have evolved from intrusive to cumbersome to natural 4

  5. Introduction ─ Motivation • Human action recognition in complex environments • Occlusion is an important issue for vision-based systems • Occlusion is a challenging problem [1─3] [1] S. Vishwakarma and A. Aggarwal, “A Survey on Activity Recognition and Behavior Understanding in Video Surveillance,” The Visual Computer, 2012. [2] J. Aggarwal and M. S. Ryoo, “Human activity analysis: A review,” ACM Computing Surveys (CSUR), 2011. [3] Weinland et al. “A survey of vision-based methods for action representation, segmentation and recognition,” Computer Vision and Image Understanding, 2010. 5

  6. Introduction ─ Challenges • Occlusion can be categorized into • Self occlusion • Partial occlusion • Temporary complete occlusion 6

  7. Introduction ─ Related Work * T.C.: Temporary complete; O: Yes; ─: No; ∆: Not mentioned [1] Ke et al. “Event detection in crowded videos,” IEEE International Conference on Computer Vision, 2007. [2] Ahad et al. “Analysis of motion self-occlusion problem due to motion overwriting for human activity recognition,” Journal of Multimedia, 2010. [3] Weinland et al. “Making action recognition robust to occlusions and viewpoint changes,” European Conference on Computer Vision, 2010. [4] Wang et al. “Robust 3d action recognition with random occupancy patterns,” European Conference on Computer Vision, 2012. 7

  8. Introduction ─ Objective • Build an efficient vision-based action recognition system • Accurately recognize human actions • Robustly handle occlusion • Design an efficient part-based approach • Correctly spot continuous actions 8

  9. Introduction ─ System Overview • Off-line training phase: RGB and depth image sequences → Part-Based Representation → SVM Training → sub-action classifiers, action classifiers, and action prior database • On-line testing phase: RGB and depth image sequences → Part-Based Representation → Action Spotting → Action Recognition (using the trained classifiers and the action prior database) → result 9

  10. Outline • Introduction • Part-Based Representation • Action Recognition • Experiments • Conclusion & Future Work 10

  11. Part-Based Representation ─ Flowchart • RGB and depth image sequences → Preprocessing → Feature Extraction → Part Assignment → Temporal-Pyramid BoW (BoW: Bag-of-Words) 11

  12. Part-Based Representation ─ Preprocessing & Feature Extraction • Human segmentation ▪ To remove complex background ▪ To detect occlusion • Noise removal ▪ To eliminate features outside the human segment • Spatio-temporal interest points [1] ▪ Local features are attractive for handling occlusion ▪ Shape and motion are two important features of an action 12 [1] Laptev et al. “Learning realistic human actions from movies,” IEEE Conference on Computer Vision and Pattern Recognition, 2008.

  13. Part-Based Representation─ Part Assignment (1/3) • Define part based on skeleton provided by OpenNI • With physical meaning • Take within-class variation into consideration ■ Head ■ Torso ■ Hand ■ Foot 13

  14. Part-Based Representation ─ Part Assignment (2/3) • Some joints of the skeleton are vulnerable when there is occlusion (out of camera view, self occlusion, unreasonable depth) • Every pair of parts is independent in our definition ■ Head ■ Torso ■ Hand ■ Foot 14

  15. Part-Based Representation ─ Part Assignment (3/3) • Every feature is assigned to one part in a nearest-neighbor scheme: a feature receives the part label of the valid joint closest to it, i.e., label(f) = label(argmin over j in J of d(x_f, x_j)), where J is the set of valid joints, d is the Euclidean distance function, x_f is the 2D position of the feature, x_j is the 2D position of joint j, and label(j) is the part label of joint j ■ Head ■ Torso ■ Hand ■ Foot 15
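As a sketch, the nearest-neighbor assignment above can be written in a few lines of Python; the joint positions and part names here are illustrative placeholders, not values from the thesis:

```python
import math

def assign_parts(features, valid_joints):
    """Assign each 2D feature point the part label of its nearest valid joint."""
    labels = []
    for fx, fy in features:
        best_part, best_d = None, float("inf")
        for (jx, jy), part in valid_joints:
            d = math.hypot(fx - jx, fy - jy)   # Euclidean distance
            if d < best_d:
                best_d, best_part = d, part
        labels.append(best_part)
    return labels

# Hypothetical valid joints: (2D position, part label)
joints = [((100, 40), "head"), ((100, 120), "torso"),
          ((40, 110), "hand"), ((95, 220), "foot")]
print(assign_parts([(42, 112), (101, 45)], joints))   # ['hand', 'head']
```

Because only valid joints are passed in, features near an occluded joint automatically fall to the nearest surviving part, which is what makes the scheme occlusion-tolerant.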

  16. Part-Based Representation ─ Action Representation • An action is represented by a set of RGB-D Temporal-Pyramid BoWs [1] (BoW: Bag-of-Words): one global BoW plus one BoW per part (part 1, …, part k), obtained via part assignment 16 [1] J.-S. Tsai and L.-C. Fu, “An Efficient Part-Based Approach to Action Recognition from RGB-D Video with BoW-Pyramid Representation,” IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013.

  17. Part-Based Representation ─ Temporal-Pyramid Bag-of-Words (1/2) • Bag-of-Words (BoW) • Independent of the number of features • Loses temporal layout 1. Generate a codebook from training features during the training phase 2. Represent each feature of a given sample by its type 3. Count the frequency of each feature type 17
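The three steps can be sketched as follows; the codebook is hard-coded here, whereas in training it would come from clustering (e.g. k-means) over all training descriptors:

```python
def bow_histogram(descriptors, codebook):
    """Represent each descriptor by its nearest codeword (feature type)
    and count frequencies, normalized so the histogram is independent
    of the number of features."""
    hist = [0] * len(codebook)
    for d in descriptors:
        # step 2: nearest codeword by squared Euclidean distance
        dists = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in codebook]
        hist[dists.index(min(dists))] += 1   # step 3: count frequencies
    total = sum(hist)
    return [h / total for h in hist]

# step 1 (stand-in): a fixed 2-word codebook of 2D prototype descriptors
codebook = [(0.0, 0.0), (1.0, 1.0)]
print(bow_histogram([(0.1, 0.2), (0.9, 1.1), (1.0, 0.8)], codebook))
# [0.3333333333333333, 0.6666666666666666]
```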

  18. Part-Based Representation ─ Temporal-Pyramid Bag-of-Words (2/2) • Temporal-Pyramid BoW: build BoW histograms over the RGB and depth image sequences at temporal levels 0, 1, …, L-1 and concatenate them into the RGB-D Temporal-Pyramid BoW • Can distinguish actions with reversed temporal orders 18
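A minimal sketch of the pyramid, assuming level l splits the frame sequence into 2**l equal segments (a common pyramid layout; the thesis does not spell out the split rule here). The demo shows that a sequence and its reversal share the level-0 histogram but differ at level 1:

```python
def _bow(descs, codebook):
    """Unnormalized BoW: count nearest codewords."""
    hist = [0] * len(codebook)
    for d in descs:
        dists = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in codebook]
        hist[dists.index(min(dists))] += 1
    return hist

def temporal_pyramid_bow(frames, codebook, levels=2):
    """Concatenate BoW histograms over a temporal pyramid:
    level l splits the frame sequence into 2**l equal segments."""
    pyramid, n = [], len(frames)
    for l in range(levels):
        seg = 2 ** l
        for s in range(seg):
            lo, hi = s * n // seg, (s + 1) * n // seg
            descs = [d for f in frames[lo:hi] for d in f]   # pool segment features
            pyramid += _bow(descs, codebook)
    return pyramid

cb = [(0.0,), (1.0,)]
fwd = [[(0.0,)], [(0.0,)], [(1.0,)], [(1.0,)]]   # one descriptor per frame
print(temporal_pyramid_bow(fwd, cb))          # [2, 2, 2, 0, 0, 2]
print(temporal_pyramid_bow(fwd[::-1], cb))    # [2, 2, 0, 2, 2, 0]
```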

  19. Outline • Introduction • Part-Based Representation • Action Recognition • Experiments • Conclusion & Future Work 19

  20. Action Recognition ─ Training (1/2) • Two kinds of SVMs are trained for each class (SVM: Support Vector Machine) • A global SVM • A local SVM for each part • Each SVM is trained in two stages [1] Step 1: Train a preliminary SVM on the training examples Step 2: Search the training database for false positive (hard) examples and re-train a new version of the SVM with them 20 [1] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” IEEE Conference on Computer Vision and Pattern Recognition, 2005.
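The two-stage (hard-example mining) loop can be illustrated as below; a nearest-centroid rule stands in for the SVM purely to keep the example self-contained, and the data points are made up — the loop structure is the point:

```python
def train(positives, negatives):
    """'Train' the stand-in classifier: store the two class centroids."""
    centroid = lambda pts: tuple(sum(c) / len(pts) for c in zip(*pts))
    return centroid(positives), centroid(negatives)

def predict(model, x):
    pos_c, neg_c = model
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(x, c))
    return dist(pos_c) < dist(neg_c)   # True -> classified positive

pos = [(2.0, 2.0), (3.0, 3.0)]
neg = [(0.0, 0.0), (0.5, 0.0), (2.5, 2.0)]   # last negative is confusable

# Step 1: train the preliminary classifier
model1 = train(pos, neg)
# Search the training set for false positives: these are the hard examples
hard = [x for x in neg if predict(model1, x)]
# Step 2: re-train with the hard examples emphasized (duplicated here)
model2 = train(pos, neg + hard * 10)
print(hard, predict(model2, (2.5, 2.0)))   # [(2.5, 2.0)] False
```

After re-training, the previously misclassified negative is pushed back to the correct side, mirroring why the two-stage scheme improves the preliminary detector.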

  21. Action Recognition ─ Training (2/2) • An action prior database is constructed to help action recognition • Some actions are strongly associated with particular parts (e.g., boxing, running, kicking) • For each action class, the prior is computed over the training database from the features belonging to each part, the number of examples labeled as that class, and a small positive constant ■ Head ■ Torso ■ Hand ■ Foot 21

  22. Action Recognition ─ Action Spotting (1/2) • Starting frame detection: sliding window-based approach • Get the next window and apply the sub-action classifiers; if no action start is detected, slide to the next window, otherwise proceed to ending frame detection 22

  23. Action Recognition ─ Action Spotting (2/2) • Ending frame detection: sequential-based approach • Get the next frame and apply the action classifiers; when an action end is detected, a new candidate is created • Once the time buffer is filled, winner selection determines the final result 23
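One possible reading of the spotting loop, with the sub-action and action classifiers stubbed as a per-frame activity score and thresholds chosen arbitrarily (the actual classifiers and the winner-selection step are omitted):

```python
def spot_actions(scores, win=3, start_th=0.5, end_th=0.2):
    """Return (start, end) frame intervals.
    Start: first sliding window whose mean score exceeds start_th.
    End: first subsequent frame whose score drops below end_th."""
    intervals, t, n = [], 0, len(scores)
    while t + win <= n:
        if sum(scores[t:t + win]) / win > start_th:   # start detected
            e = t + win
            while e < n and scores[e] >= end_th:      # scan sequentially for the end
                e += 1
            intervals.append((t, e))
            t = e                                     # resume after the action
        else:
            t += 1                                    # slide the window
    return intervals

scores = [0.0, 0.0, 0.6, 0.8, 0.9, 0.7, 0.1, 0.0, 0.0]
print(spot_actions(scores))   # [(2, 6)]
```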

  24. Action Recognition ─ Action Recognition (1/2) • A recognition score is computed for each action class: the input example is decomposed by part assignment into a global (part 0) RGB-D Temporal-Pyramid BoW and one BoW per part (part 1, …, part k); each is scored by its SVM, and the SVM scores are combined in a weighted sum together with the action prior information from the action prior database 24

  25. Action Recognition ─ Action Recognition (2/2) • The weight of each part is associated with its degree of occlusion: the ratio of valid joints over all joints belonging to that part in a time segment, computed from the number of joints composing the part and the number of its valid joints at each frame • Actions are thus recognized mainly by less occluded parts • The action is recognized as the class with the maximal recognition score 25
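Assuming the weighted sum is a normalized weighted mean of per-part SVM scores (the transcript does not preserve the exact formula), the occlusion-aware scoring might look like this:

```python
def recognition_score(svm_scores, valid_ratios, priors=None):
    """Combine per-part SVM scores (index 0 = global classifier) with
    weights given by each part's ratio of valid joints over the segment,
    optionally multiplied by the action prior for that part."""
    priors = priors or [1.0] * len(svm_scores)
    weights = [r * p for r, p in zip(valid_ratios, priors)]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, svm_scores)) / total

# Hand part fully occluded (ratio 0.0): its negative score is ignored,
# so the class is judged mainly by the less occluded parts.
scores = [0.8, 0.9, -0.5]        # global, torso, hand (illustrative values)
ratios = [1.0, 1.0, 0.0]
print(recognition_score(scores, ratios))   # ≈ 0.85
```

Running this for every class and taking the argmax yields the recognized action.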

  26. Outline • Introduction • Part-Based Representation • Action Recognition • Experiments • Conclusion & Future Work 26

  27. Experiments─ Experimental Setting (1/2) • Experimental platform 27

  28. Experiments ─ Experimental Setting (2/2) • Measurement • Action recognition and action spotting are measured by precision and recall, computed from true positives (TP), false positives (FP), and false negatives (FN); a detection counts as correct when it has more than 50% overlap with the ground truth • Classifier type • Non-linear SVM with RBF kernel (RBF: Radial Basis Function) 28
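A sketch of the spotting measurement, interpreting "more than 50% overlap" as temporal intersection-over-union of the detected and ground-truth intervals (one plausible reading; the thesis may define overlap differently):

```python
def evaluate_spotting(detections, ground_truth, th=0.5):
    """Precision/recall for spotted intervals: a detection is a true
    positive when its overlap with an unmatched ground-truth interval
    exceeds th.  Intervals are (start, end) frame pairs."""
    matched, tp = set(), 0
    for ds, de in detections:
        for i, (gs, ge) in enumerate(ground_truth):
            inter = max(0, min(de, ge) - max(ds, gs))
            union = max(de, ge) - min(ds, gs)   # span = union when they overlap
            if i not in matched and union > 0 and inter / union > th:
                matched.add(i)
                tp += 1
                break
    fp = len(detections) - tp                  # unmatched detections
    fn = len(ground_truth) - tp                # missed ground-truth actions
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

dets = [(10, 20), (40, 44)]
gt = [(11, 21), (60, 70)]
print(evaluate_spotting(dets, gt))   # (0.5, 0.5)
```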

  29. Experiments ─ Temporal-Pyramid BoW Evaluation (1/3) • Datasets • KTH [1]: 6 types of actions, 4 scenarios (outdoors, outdoors with scale variation, outdoors with different clothes, indoors) • RGB-D HuDaAct [2]: 12 types of human daily activities [1] Schuldt et al. “Recognizing human actions: a local SVM approach,” International Conference on Pattern Recognition, 2004. [2] Ni et al. “RGBD-HuDaAct: A color-depth video database for human daily activity recognition,” IEEE International Conference on Computer Vision Workshops, 2011. 29

  30. Experiments ─ Temporal-Pyramid BoW Evaluation (2/3) • Validation scheme • Leave-one-subject-out cross validation • Result on KTH 30 [1] Laptev et al. “Learning realistic human actions from movies,” IEEE Conference on Computer Vision and Pattern Recognition, 2008.

  31. Experiments ─ Temporal-Pyramid BoW Evaluation (3/3) • Result on RGB-D HuDaAct: Temporal-Pyramid BoW compared with Ni et al. [1] and Zhao et al. [2] [1] Ni et al. “RGBD-HuDaAct: A color-depth video database for human daily activity recognition,” IEEE International Conference on Computer Vision Workshops, 2011. [2] Zhao et al. “Combing RGB and Depth Map Features for human activity recognition,” Signal & Information Processing Association Annual Summit and Conference, Asia-Pacific, 2012. 31

  32. Experiments ─ Recognition Performance (1/4) • Dataset • 8 types of actions: baseball striking, boxing, jumping, kicking, tennis serving, swimming, running, basketball shooting • 3 cases of occlusion 32

  33. Experiments ─ Recognition Performance (2/4) • Non-occlusion test • Manually segmented actions • Continuous actions • Measured by precision and recall, computed from true positives, false positives, and false negatives 33

  34. Experiments ─ Recognition Performance (3/4) • Occlusion test • Manually segmented actions • Exc/Clean ▪ Training data: all non-occlusion data except those whose subject also appears in the testing data ▪ Testing data: non-occlusion data • Exc/Occlusion ▪ Training data: the same ▪ Testing data: occlusion data [1] Weinland et al. “Making action recognition robust to occlusions and viewpoint changes,” European Conference on Computer Vision, 2010. 34

  35. Experiments─ Recognition Performance (4/4) • Occlusion test • Continuous actions 35

  36. Outline • Introduction • Part-Based Representation • Action Recognition • Experiments • Conclusion & Future Work 36

  37. Conclusion • We proposed an action recognition system that robustly handles occlusion • Occlusion is handled robustly by relying on the reliable parts • We proposed an efficient part-based approach with an effective Temporal-Pyramid BoW representation • Action prior information is used to help action recognition 37

  38. Future Work • The performance of action spotting has room for improvement • Parallel computing techniques can be exploited to reduce latency 38

  39. Thank you for listening!! Q&A 39

  40. Appendix ─ Preprocessing & Feature Extraction • Human segmentation • Align the depth image to the RGB image • Normalize depth values to the range [0, 255] • Segment the human according to depth and motion
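The depth-normalization step can be sketched as a linear rescaling into [0, 255]; the raw depth range here is illustrative (real sensor depth would be in millimeters):

```python
def normalize_depth(depth, lo=None, hi=None):
    """Linearly map raw depth values (2D list) into integer [0, 255].
    If lo/hi are not given, use the min/max of the image itself."""
    flat = [v for row in depth for v in row]
    lo = min(flat) if lo is None else lo
    hi = max(flat) if hi is None else hi
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    return [[int(round((v - lo) * scale)) for v in row] for row in depth]

print(normalize_depth([[500, 1000], [1500, 2000]]))
# [[0, 85], [170, 255]]
```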

  41. Appendix ─ Preprocessing & Feature Extraction • Spatio-temporal interest point [1] • Interest point detection: construct a scale space by convolving the input image with a spatio-temporal separable Gaussian kernel, then compute a response function involving a constant factor [1] Laptev et al. “Learning realistic human actions from movies,” IEEE Conference on Computer Vision and Pattern Recognition, 2008.

  42. Appendix ─ Preprocessing & Feature Extraction • Spatio-temporal interest point [1] • Use HOG and HOF as feature descriptors (HOG: Histogram of Oriented Gradients; HOF: Histogram of Optical Flow) • Extract features from the RGB image sequence and the depth image sequence separately [1] Laptev et al. “Learning realistic human actions from movies,” IEEE Conference on Computer Vision and Pattern Recognition, 2008.

  43. Appendix ─ Preprocessing & Feature Extraction • Noise removal • An interest point is considered as noise if there are many non-human pixels around it • Compute the ratio of pixels with zero-valued depth within a window
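The zero-depth-ratio test can be sketched as below; the window radius and the 50% threshold are assumed values for illustration, not taken from the thesis:

```python
def is_noise(depth, x, y, win=2, max_zero_ratio=0.5):
    """Flag an interest point at column x, row y as noise when the ratio
    of zero-depth (non-human) pixels inside a (2*win+1)^2 window,
    clipped to the image, exceeds max_zero_ratio."""
    h, w = len(depth), len(depth[0])
    zeros = total = 0
    for yy in range(max(0, y - win), min(h, y + win + 1)):
        for xx in range(max(0, x - win), min(w, x + win + 1)):
            total += 1
            zeros += depth[yy][xx] == 0
    return zeros / total > max_zero_ratio

# Toy segmented depth map: left columns are background (depth 0)
depth = [[0, 0, 0, 80, 80] for _ in range(5)]
print(is_noise(depth, 1, 2))   # True  (point sits in the background)
print(is_noise(depth, 4, 2))   # False (point sits on the human segment)
```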

  44. Appendix ─ Part Assignment • Invalid joints • Illegitimate joint: a joint with an illegitimate projective position or an unreasonable depth value • Occluded joint: detected from the difference between the regularized depth value and the depth intensity of the joint, considering its adjacent joints and two thresholds

  45. Appendix ─ Part Assignment • Complexity • Our approach takes time proportional to the number of features times the number of joints • Approaches based on part filters take time proportional to the number of parts times the frame size times the window size of the part filter

  46. Appendix ─ Training • Support Vector Machine (SVM) • Linear SVM: find the maximum-margin hyperplane w·x + b = 0 separating the two classes

  47. Appendix ─ Training • Support Vector Machine (SVM) • Non-linear SVM • Transform the original input space into a high-dimensional space by a transform function φ; the kernel computes the inner product φ(x)·φ(y) in that space without evaluating φ explicitly
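The RBF kernel used in the experiments realizes exactly this implicit transform; K(x, y) = exp(-γ·||x − y||²) corresponds to an inner product in an infinite-dimensional feature space (γ = 0.5 below is an arbitrary choice for illustration):

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2): the inner product
    phi(x) . phi(y) in the implicitly transformed space."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

print(rbf_kernel((0.0, 0.0), (0.0, 0.0)))   # 1.0 (identical inputs)
print(rbf_kernel((0.0, 0.0), (1.0, 1.0)))   # exp(-1), decays with distance
```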
