Monocular 3D Pose Estimation and Tracking by Detection Mykhaylo Andriluka1,2, Stefan Roth1, Bernt Schiele1,2 1Department of Computer Science, TU Darmstadt 2MPI Informatics, Saarbrücken
Outline • Motivation • Approach • Related Work • Results • Conclusion • Future Work
Motivation • Goal: find people and estimate their 3D poses from a monocular, potentially moving camera
Challenges: • Real-world scenarios, such as crowded street scenes • Tracking of multiple people • Ambiguity in 2D-to-3D lifting
Approach • Three stages: Single Frame Detection → 2D-Tracklet Detection → 2D-to-3D Lifting • Two contributions: • Novel approach to human pose estimation • New pedestrian detection approach
Related Work • Single Frame Detection People Detection [Leibe et al., CVPR’05] [Dalal&Triggs, CVPR’05] [Wojek et al., CVPR’09] 2D Pose Estimation [Felzenszwalb&Huttenlocher, IJCV’05] [Sigal&Black, CVPR’06], [Eichner&Ferrari, BMVC’09], [Andriluka et al., CVPR’09]
Related Work • 2D-Tracklet Detection • 2D-to-3D Lifting People Tracking [Wu&Nevatia, IJCV’07] [Andriluka et al., CVPR’08] [Breitenstein et al., ICCV’09] 3D Pose Estimation [Urtasun et al., CVPR’06] [Rogez et al., CVPR’08] [Gall et al., IJCV’10]
1st stage: Single Frame Detection • Basic pictorial structures model:

p(L_m | D_m) ∝ p(D_m | L_m) p(L_m)
L_m = {l_m0, l_m1, …, l_mN}
l_mi = {x_mi, y_mi, θ_mi, s_mi}

D_m: single-frame image evidence; m: current frame; N: number of body parts; x, y: image position; θ: absolute orientation; s: scale
• Likelihood factorizes over parts; prior factorizes over the kinematic tree:

p(D_m | L_m) = ∏_i p(d_mi | l_mi)
p(L_m) = p(l_m0) ∏_{(i,j)∈K} p(l_mi | l_mj)

K: the set of edges representing kinematic relationships between parts. Part likelihoods p(d_mi | l_mi) are computed from local features with AdaBoost part/orientation classifiers. (Slide figure: local features → AdaBoost classifiers → part likelihoods → estimated pose.)
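The factorization above can be sketched in code. This is a minimal, illustrative evaluation of the unnormalized log-posterior of a part configuration; the callables standing in for the detector likelihoods and kinematic priors are hypothetical, not the authors' implementation, and inference in the actual model is done by sum-product on the tree rather than by scoring a single configuration.

```python
# Hedged sketch: score one configuration L = {l_0, ..., l_N} under the
# pictorial-structures posterior p(L|D) ∝ p(D|L) p(L).
# part_loglik(i, state)      -> log p(d_i | l_i)   (appearance term)
# pairwise_logprior(li, lj)  -> log p(l_i | l_j)   (one factor per edge in K)
# root_logprior(state)       -> log p(l_0)         (root of the kinematic tree)

def log_posterior(parts, part_loglik, kinematic_edges, pairwise_logprior, root_logprior):
    """parts: dict part_id -> state tuple (x, y, theta, scale)."""
    # appearance term: sum of per-part detector log-likelihoods
    score = sum(part_loglik(i, parts[i]) for i in parts)
    # prior term: root prior plus one factor per kinematic edge (i, j) in K
    score += root_logprior(parts[0])
    score += sum(pairwise_logprior(parts[i], parts[j]) for (i, j) in kinematic_edges)
    return score
```

Because everything is in log space, the appearance and prior factors simply add, and normalizers that do not depend on L can be dropped when only the maximum matters.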
Combining Pictorial Structure Models • The model is composed of 8 body parts • Dataset: 2,336 images for training, 496 for validation and testing • Assumes viewpoint annotations, bounding boxes, and body-joint annotations
Two strategies for combining the outputs of the bank of detectors: • For detection: train an SVM classifier using the outputs of the PS models as features • For viewpoint classification: train a 1-vs-all SVM classifier for each of the 8 viewpoints
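The detection strategy above treats the scores of the 8 viewpoint-specific PS models as an 8-dimensional feature vector fed to a linear classifier. The slides specify an SVM; to keep this sketch dependency-free, a perceptron-style linear combiner stands in for it here, and all data shapes and names are illustrative.

```python
import numpy as np

def train_linear_combiner(features, labels, epochs=50, lr=0.1):
    """features: (n, 8) array of PS-model scores; labels: +1 (person) / -1 (background)."""
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            if y * (w @ x + b) <= 0:  # misclassified -> perceptron update
                w += lr * y * x
                b += lr * y
    return w, b

def predict(w, b, x):
    # sign of the learned linear combination of viewpoint-specific detector scores
    return 1 if w @ x + b > 0 else -1
```

The 1-vs-all viewpoint classifier works the same way, with one such combiner trained per viewpoint on the same 8-dimensional score vectors.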
Without training dataset (top); with training dataset (middle); combining the output of viewpoint-specific detectors with a linear SVM (bottom)
• Viewpoint classification accuracy: 42.2% when classifying into one of the 8 viewpoints • Examples of false positives:
2nd stage: 2D-Tracklet Detection • Find consistent sequences of 2D hypotheses with the Viterbi algorithm
• Box hypotheses H_m = [h_m1, …, h_mN], h_mi = {hx_mi, hy_mi, hs_mi}
• Transition probability between states h_mi and h_{m-1,j}:

p_trans(h_mi, h_{m-1,j}) = N(h_mi | h_{m-1,j}, Σ_pos) · N(d_app(h_mi, h_{m-1,j}) | 0, σ_app²)
Σ_pos = diag(σ_x², σ_y², σ_s²), σ_x = σ_y = 5, σ_s = 0.1, σ_app = 0.05

d_app(h_mi, h_{m-1,j}) is the Euclidean distance between RGB color histograms computed over the bounding rectangle of each hypothesis
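The tracklet search above is standard Viterbi dynamic programming over per-frame detection hypotheses. The sketch below uses the σ values from the slide for the position/scale and appearance transition terms; the unary scoring function and the hypothesis representation (x, y, scale, color histogram) are simplified stand-ins for the paper's detector output.

```python
import numpy as np

def log_gauss(d, sigma):
    # unnormalized log of N(d | 0, sigma^2); normalizers cancel in the argmax
    return -0.5 * (d / sigma) ** 2

def log_transition(h, h_prev):
    # position/scale smoothness (sigma_x = sigma_y = 5, sigma_s = 0.1)
    # plus appearance consistency via color-histogram distance (sigma_app = 0.05)
    (x, y, s, hist), (xp, yp, sp, histp) = h, h_prev
    score = log_gauss(x - xp, 5.0) + log_gauss(y - yp, 5.0) + log_gauss(s - sp, 0.1)
    d_app = np.linalg.norm(np.asarray(hist) - np.asarray(histp))
    return score + log_gauss(d_app, 0.05)

def viterbi(frames, log_unary):
    """frames: list over time of hypothesis lists; each hypothesis is (x, y, scale, rgb_hist)."""
    prev = frames[0]
    scores = [log_unary(h) for h in prev]
    back = []
    for hyps in frames[1:]:
        ptrs, new_scores = [], []
        for h in hyps:
            cand = [scores[j] + log_transition(h, prev[j]) for j in range(len(prev))]
            j = int(np.argmax(cand))
            ptrs.append(j)
            new_scores.append(cand[j] + log_unary(h))
        back.append(ptrs)
        scores, prev = new_scores, hyps
    # backtrack the highest-scoring hypothesis sequence
    path = [int(np.argmax(scores))]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]
```

A spatially stable detection with a consistent color histogram outscores a hypothesis that jumps across the image, which is exactly what makes the recovered sequences "consistent" tracklets.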
People detection based on single frames (top); tracklets found by the 2D tracking algorithm (bottom)
3rd stage: 2D-to-3D Lifting • Bayesian formulation:

p(Q_{1:M} | E_{1:M}) ∝ p(E_{1:M} | Q_{1:M}) p(Q_{1:M})
Q_m = {q_m, φ_m, h_m}

q_m: the parameters of the body joints; φ_m: the rotation of the body in world coordinates; h_m = {hx_m, hy_m, hs_m}: the position and scale of the person projected into the image; E_m: evidence from stages 1 and 2
• Likelihood of the 3D parameters: p(E_m | Q_m) = p(w_m | Q_m) p(D_m | Q_m)
• Projection of the 3D pose evaluated under the 2D body-part posteriors: p(D_m | Q_m) = ∏_i p(l_mi(Q_m) | D_m)
• Orientation restricted to be close to the estimated viewpoint w_m: p(w_m | Q_m) = N(w_m | φ_m, σ_w²)
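The two likelihood terms can be sketched as follows. This is a hedged, simplified stand-in: the projection here just drops the depth coordinate, the `part_posterior` lookup is a placeholder for the stage-1 part posteriors, and `sigma_w` is an assumed value, not one taken from the paper.

```python
import numpy as np

def log_likelihood(parts_3d, phi, viewpoint_w, part_posterior, sigma_w=0.3):
    """Unnormalized log of p(E|Q) = p(w|Q) p(D|Q) for one frame (illustrative).

    parts_3d:       list of (x, y, z) part positions implied by pose Q
    phi:            body rotation in world coordinates
    viewpoint_w:    viewpoint estimated by the stage-1 detector
    part_posterior: callable (part_index, x, y) -> 2D part posterior value
    """
    # viewpoint term: N(w | phi, sigma_w^2), keeping phi close to the estimate
    score = -0.5 * ((viewpoint_w - phi) / sigma_w) ** 2
    # projection term: project each 3D part into the image (orthographic
    # stand-in: drop z) and evaluate it under its 2D part posterior
    for i, (x, y, z) in enumerate(parts_3d):
        score += np.log(part_posterior(i, x, y) + 1e-12)
    return score
```

The small epsilon guards against taking the log of a zero posterior where a projected part lands in a region with no 2D support.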
• Find local maxima with scaled conjugate gradients • Initialization: 2D-to-3D lifting by finding sequences of 3D exemplars that fit the 2D tracklet observations well
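The refinement step above starts from the exemplar-based initialization and ascends to a local maximum of the 3D pose posterior. SciPy provides plain (not scaled) conjugate gradients, so this sketch uses `method='CG'` on a toy negative log-posterior purely to illustrate the structure of the step; the objective and names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def refine_pose(initial_q, neg_log_posterior):
    """Descend from the 2D-to-3D lifting initialization to a local optimum.

    initial_q:         pose parameter vector from the exemplar-based lifting
    neg_log_posterior: callable mapping a pose vector to -log p(Q | E)
    """
    # conjugate-gradient descent on the negative log-posterior
    # (maximizing the posterior = minimizing its negative log)
    result = minimize(neg_log_posterior, initial_q, method='CG')
    return result.x
```

Because the posterior is multimodal, the quality of the exemplar-based initialization matters: conjugate gradients only finds the local maximum in whose basin the initialization lands.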
Initial pose sequence after 2D-to-3D lifting (top); pose sequence after optimization of the 3D pose posterior (bottom)
Subject S2/Camera C1 (top); Subject S2/Camera C1 (bottom). 2D results are in pixels; 3D results are in millimeters
Conclusion • Novel approach to monocular 3D human pose estimation and tracking • Leverages recent results in people detection, 2D pose estimation, and human motion modeling • State-of-the-art performance in lab settings • Applicable in uncontrolled street conditions
Future Work • Broader class of motions and more detailed evaluation