
Monocular 3D Pose Estimation and Tracking by Detection


Presentation Transcript


  1. Monocular 3D Pose Estimation and Tracking by Detection. Mykhaylo Andriluka¹,², Stefan Roth¹, Bernt Schiele¹,². ¹Department of Computer Science, TU Darmstadt; ²MPI Informatics, Saarbrücken

  2. Outline • Motivation • Approach • Related Work • Results • Conclusion • Future Work

  3. Motivation • Goal: find people and estimate their 3D poses from a monocular, potentially moving camera

  4. Challenges: • Real-world scenarios, such as crowded street scenes • Tracking of multiple people • Ambiguity in lifting 2D poses to 3D

  5. Approach • Three stages: Single Frame Detection → 2D-Tracklet Detection → 2D-to-3D Lifting • Two contributions: a novel approach to human pose estimation and a new pedestrian detection approach

  6. Related Work • Single Frame Detection: People Detection [Leibe et al., CVPR'05], [Dalal & Triggs, CVPR'05], [Wojek et al., CVPR'09]; 2D Pose Estimation [Felzenszwalb & Huttenlocher, IJCV'05], [Sigal & Black, CVPR'06], [Eichner & Ferrari, BMVC'09], [Andriluka et al., CVPR'09]

  7. Related Work • 2D-Tracklet Detection: People Tracking [Wu & Nevatia, IJCV'07], [Andriluka et al., CVPR'08], [Breitenstein et al., ICCV'09] • 2D-to-3D Lifting: 3D Pose Estimation [Urtasun et al., CVPR'06], [Rogez et al., CVPR'08], [Gall et al., IJCV'10]

  8. 1st stage: Single Frame Detection • Basic pictorial structure model: p(L_m | D_m) ∝ p(D_m | L_m) p(L_m), with L_m = {l_m^0, l_m^1, …, l_m^N} and l_m^i = {x_m^i, y_m^i, θ_m^i, s_m^i} • D_m: single-frame image evidence; m: current frame index; N: number of body parts; x, y: image position; θ: absolute orientation; s: scale

  9. Likelihood factorizes over the parts: p(D_m | L_m) = ∏_{i=0..N} p(d_m^i | l_m^i), with part likelihoods computed from local features by AdaBoost classifiers • Tree-structured prior: p(L_m) = p(l_m^0) ∏_{(i,j)∈K} p(l_m^i | l_m^j), where K is the set of edges representing kinematic relationships between parts • (Slide figure: local features → AdaBoost part likelihoods → estimated pose)
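To make the factorization above concrete, here is a minimal Python sketch of scoring one configuration in log space. `unary`, `pairwise`, and the toy chain tree are illustrative stand-ins, not the authors' implementation; the root prior p(l^0) is absorbed into the unary term for brevity.

```python
import numpy as np

def log_posterior(parts, unary, pairwise, edges):
    """parts: list of part states (x, y, theta, s); index 0 is the root.
    unary(i, l_i): log p(d^i | l^i); pairwise(i, j, l_i, l_j): log p(l^i | l^j);
    edges: kinematic tree as (child, parent) index pairs."""
    # Likelihood factorizes over parts: p(D|L) = prod_i p(d^i | l^i)
    log_p = sum(unary(i, l) for i, l in enumerate(parts))
    # Tree-structured prior: p(L) = p(l^0) prod_{(i,j) in K} p(l^i | l^j)
    for child, parent in edges:
        log_p += pairwise(child, parent, parts[child], parts[parent])
    return log_p

# Toy usage with Gaussian stand-ins for the learned terms:
rng = np.random.default_rng(0)
parts = [rng.normal(size=4) for _ in range(8)]   # 8 body parts
edges = [(i, i - 1) for i in range(1, 8)]        # chain as a toy kinematic tree
unary = lambda i, l: -0.5 * float(np.sum(l ** 2))
pairwise = lambda i, j, li, lj: -0.5 * float(np.sum((li - lj) ** 2))
print(log_posterior(parts, unary, pairwise, edges))
```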

  10. Combining Pictorial Structure Models • Model is composed of 8 body parts • A dataset of 2,336 images for training and 496 for validation and testing • Viewpoint annotations, bounding boxes and body-joint positions are assumed to be given

  11. Two strategies for combining the outputs of the bank of detectors • For detection: train an SVM classifier using the outputs of the PS models as features • For viewpoint classification: train a 1-vs-all SVM classifier for each of the 8 viewpoints
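As a rough illustration of these two strategies, the sketch below uses scikit-learn on random stand-in scores; the feature layout (one score per viewpoint-specific PS model per hypothesis) is an assumption for illustration, not the paper's exact feature set.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Hypothetical features: one score per viewpoint-specific PS model
# per hypothesis (random stand-ins here).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y_det = rng.integers(0, 2, size=500)    # person vs. background labels
y_view = rng.integers(0, 8, size=500)   # one of 8 discrete viewpoints

# Strategy 1 (detection): a single SVM on the stacked PS model outputs.
detector = LinearSVC().fit(X, y_det)

# Strategy 2 (viewpoint): a 1-vs-all linear SVM over the 8 viewpoints.
view_clf = OneVsRestClassifier(LinearSVC()).fit(X, y_view)

print(detector.predict(X[:3]), view_clf.predict(X[:3]))
```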

  12. Results: single-frame detection

  13. Without training dataset (top); with training dataset (middle); combining the output of viewpoint-specific detectors with a linear SVM (bottom)

  14. Viewpoint classification accuracy: 42.2% when classifying into one of the 8 viewpoints • Examples of false positives

  15. 2nd stage: 2D-Tracklet Detection • Find consistent sequences of 2D hypotheses with the Viterbi algorithm

  16. Bounding-box hypotheses H_m = {h_m^1, …, h_m^N}, with h_m^i = {hx_m^i, hy_m^i, hs_m^i} • Transition probabilities between states h_m^i and h_{m-1}^j: p_trans(h_m^i, h_{m-1}^j) = N(h_m^i | h_{m-1}^j, Σ_pos) · N(d_app(h_m^i, h_{m-1}^j) | 0, σ_app²), where Σ_pos = diag(σ_x², σ_y², σ_s²), σ_x = σ_y = 5, σ_s = 0.1, σ_app = 0.05 • d_app(h_m^i, h_{m-1}^j) is the Euclidean distance between RGB color histograms computed over the bounding rectangle of each hypothesis
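The transition model above and the Viterbi search from slide 15 can be sketched as follows, assuming SciPy; the bounding-box and color-histogram representations are simplified assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

SIGMA_POS = np.diag([5.0 ** 2, 5.0 ** 2, 0.1 ** 2])  # sigma_x = sigma_y = 5, sigma_s = 0.1
SIGMA_APP = 0.05

def log_trans(h_cur, h_prev, hist_cur, hist_prev):
    """h_*: (x, y, s) box hypotheses; hist_*: RGB color histograms."""
    lp_pos = multivariate_normal.logpdf(h_cur, mean=h_prev, cov=SIGMA_POS)
    d_app = np.linalg.norm(hist_cur - hist_prev)   # Euclidean histogram distance
    return lp_pos + norm.logpdf(d_app, loc=0.0, scale=SIGMA_APP)

def viterbi(unary, trans):
    """unary: (M, K) log detection scores; trans[m][j, i]: log transition
    from hypothesis j at frame m to hypothesis i at frame m + 1."""
    M, K = unary.shape
    score, back = unary[0].copy(), np.zeros((M, K), dtype=int)
    for m in range(1, M):
        cand = score[:, None] + trans[m - 1]       # rows: previous, cols: current
        back[m] = np.argmax(cand, axis=0)
        score = cand[back[m], np.arange(K)] + unary[m]
    path = [int(np.argmax(score))]                 # backtrack the best sequence
    for m in range(M - 1, 0, -1):
        path.append(int(back[m][path[-1]]))
    return path[::-1]                              # best hypothesis per frame

# Toy usage on random scores: 5 frames, 3 hypotheses per frame.
rng = np.random.default_rng(0)
print(viterbi(rng.normal(size=(5, 3)), rng.normal(size=(4, 3, 3))))
```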

  17. People detection based on single frames (top); tracklets found by the 2D tracking algorithm (bottom)

  18. 3rd stage: 2D-to-3D Lifting • Bayesian formulation: p(Q_{1:M} | E_{1:M}) ∝ p(E_{1:M} | Q_{1:M}) p(Q_{1:M}), with Q_m = {q_m, φ_m, h_m} • q_m: parameters of the body joints; φ_m: rotation of the body in world coordinates; h_m = {hx_m, hy_m, hs_m}: position and scale of the person projected into the image; E_m: evidence from stages 1 and 2

  19. Likelihood of the 3D parameters: p(E_m | Q_m) = p(ω_m | Q_m) p(D_m | Q_m) • Projection of the 3D pose is evaluated under the 2D body-part posteriors: p(D_m | Q_m) = ∏_i p(l_m^i(Q_m) | D_m), where l_m^i(Q_m) is the image projection of part i • Body orientation is restricted to be close to the estimated viewpoint: p(ω_m | Q_m) = N(ω_m | φ_m, σ_ω²)
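A compact sketch of this likelihood in log space, where `project`, `part_log_posterior`, and the value of `sigma_w` are hypothetical placeholders rather than the paper's implementation:

```python
from scipy.stats import norm

def log_likelihood(Q, w_m, project, part_log_posterior, sigma_w=0.3):
    """Q: dict with 3D joint parameters 'q', body rotation 'phi',
    and projected image position/scale 'h'; w_m: estimated viewpoint."""
    # Project each 3D body part into the image and score it under the
    # 2D part posteriors from single-frame pose estimation (stage 1).
    lp_parts = sum(part_log_posterior(i, l2d)
                   for i, l2d in enumerate(project(Q)))
    # Keep the body rotation close to the estimated viewpoint w_m.
    lp_view = norm.logpdf(w_m, loc=Q["phi"], scale=sigma_w)
    return lp_parts + lp_view
```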

  20. Find local maxima of the posterior with scaled conjugate gradients • Initialization: 2D-to-3D lifting by finding sequences of 3D exemplars that fit well to the 2D tracklet observations
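SciPy ships no scaled conjugate gradients, so the sketch below substitutes plain nonlinear CG on a toy quadratic to show the shape of this refinement step; the objective is a placeholder for the real negative log posterior -log p(Q_{1:M} | E_{1:M}).

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(theta, theta_lifted):
    # Placeholder objective pulling toward the exemplar-based initialization.
    return 0.5 * np.sum((theta - theta_lifted) ** 2)

theta_lifted = np.zeros(30)                  # e.g. stacked 3D joint angles
res = minimize(neg_log_posterior, theta_lifted + 0.1,
               args=(theta_lifted,), method="CG")
print(res.success, res.x[:5])
```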

  21. Initial pose sequence after 2D-to-3D lifting (top); pose sequence after optimization of the 3D pose posterior (bottom)

  22. Results: 2D-to-3D lifting

  23. Subject S2 / Camera C1 (top); Subject S2 / Camera C1 (bottom) • 2D results are in pixels and 3D results are in millimeters

  24. Conclusion • Novel approach to monocular 3D human pose estimation and tracking • Leverages recent results in people detection, 2D pose estimation and human motion modeling • State-of-the-art performance in a lab setting • Applicable in uncontrolled street conditions

  25. Future Work • Broader class of motions and more detailed evaluation
