Monocular 3D Pose Estimation and Tracking by Detection Mykhaylo Andriluka1,2, Stefan Roth1, Bernt Schiele1,2 1Department of Computer Science, TU Darmstadt 2MPI Informatics, Saarbrücken
Outline • Motivation • Approach • Related Work • Results • Conclusion • Future Work
Motivation • Goal: find people and estimate their 3D poses from a monocular, potentially moving camera
Challenges: • Real-world scenarios, such as crowded street scenes • Tracking of multiple people • Ambiguity in 2D-to-3D lifting
Approach • Three stages: Single Frame Detection → 2D-Tracklet Detection → 2D-to-3D Lifting • Two contributions: • Novel approach to human pose estimation • New pedestrian detection approach
Related Work • Single Frame Detection People Detection [Leibe et al., CVPR’05] [Dalal&Triggs, CVPR’05] [Wojek et al., CVPR’09] 2D Pose Estimation [Felzenszwalb&Huttenlocher, IJCV’05] [Sigal&Black, CVPR’06], [Eichner&Ferrari, BMVC’09], [Andriluka et al., CVPR’09]
Related Work • 2D-Tracklet Detection • 2D-to-3D Lifting People Tracking [Wu&Nevatia, IJCV’07] [Andriluka et al., CVPR’08] [Breitenstein et al., ICCV’09] 3D Pose Estimation [Urtasun et al., CVPR’06] [Rogez et al., CVPR’08] [Gall et al., IJCV’10]
1st stage: Single Frame Detection • Basic pictorial structures model:

p(L_m | D_m) ∝ p(D_m | L_m) p(L_m)
L_m = {l_m0, l_m1, …, l_mN}
l_mi = {x_mi, y_mi, θ_mi, s_mi}

D_m: single-frame image evidence; m: current frame; N: number of body parts; x, y: image position; θ: absolute orientation; s: scale
• Likelihood factorizes over parts; prior factorizes over the kinematic tree:

p(D_m | L_m) = ∏_i p(d_mi | l_mi)
p(L_m) = p(l_m0) ∏_{(i,j)∈K} p(l_mi | l_mj)

K: the set of edges representing kinematic relationships between parts. Part likelihoods p(d_mi | l_mi) are computed from local features with AdaBoost part/orientation classifiers. (Slide figure: local features → AdaBoost classifiers → part likelihoods → estimated pose.)
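The factorization above can be sketched in code. This is a minimal, illustrative evaluation of the unnormalized log-posterior of a part configuration; the callables standing in for the detector likelihoods and kinematic priors are hypothetical, not the authors' implementation, and inference in the actual model is done by sum-product on the tree rather than by scoring a single configuration.

```python
# Hedged sketch: score one configuration L = {l_0, ..., l_N} under the
# pictorial-structures posterior p(L|D) ∝ p(D|L) p(L).
# part_loglik(i, state)      -> log p(d_i | l_i)   (appearance term)
# pairwise_logprior(li, lj)  -> log p(l_i | l_j)   (one factor per edge in K)
# root_logprior(state)       -> log p(l_0)         (root of the kinematic tree)

def log_posterior(parts, part_loglik, kinematic_edges, pairwise_logprior, root_logprior):
    """parts: dict part_id -> state tuple (x, y, theta, scale)."""
    # appearance term: sum of per-part detector log-likelihoods
    score = sum(part_loglik(i, parts[i]) for i in parts)
    # prior term: root prior plus one factor per kinematic edge (i, j) in K
    score += root_logprior(parts[0])
    score += sum(pairwise_logprior(parts[i], parts[j]) for (i, j) in kinematic_edges)
    return score
```

Because everything is in log space, the appearance and prior factors simply add, and normalizers that do not depend on L can be dropped when only the maximum matters.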
Combining Pictorial Structure Models • The model is composed of 8 body parts • Dataset: 2,336 images for training, 496 for validation and testing • Assumes viewpoint annotations, bounding boxes, and body-joint annotations
Two strategies for combining the outputs of the bank of detectors: • For detection: train an SVM classifier using the outputs of the PS models as features • For viewpoint classification: train a 1-vs-all SVM classifier for each of the 8 viewpoints
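The detection strategy above treats the scores of the 8 viewpoint-specific PS models as an 8-dimensional feature vector fed to a linear classifier. The slides specify an SVM; to keep this sketch dependency-free, a perceptron-style linear combiner stands in for it here, and all data shapes and names are illustrative.

```python
import numpy as np

def train_linear_combiner(features, labels, epochs=50, lr=0.1):
    """features: (n, 8) array of PS-model scores; labels: +1 (person) / -1 (background)."""
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            if y * (w @ x + b) <= 0:  # misclassified -> perceptron update
                w += lr * y * x
                b += lr * y
    return w, b

def predict(w, b, x):
    # sign of the learned linear combination of viewpoint-specific detector scores
    return 1 if w @ x + b > 0 else -1
```

The 1-vs-all viewpoint classifier works the same way, with one such combiner trained per viewpoint on the same 8-dimensional score vectors.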
Without training dataset (top); with training dataset (middle); combining the output of viewpoint-specific detectors with a linear SVM (bottom)
• Viewpoint classification accuracy: 42.2% when classifying into one of the 8 viewpoints • Examples of false positives:
2nd stage: 2D-Tracklet Detection • Find consistent sequences of 2D hypotheses with the Viterbi algorithm
• Box hypotheses H_m = [h_m1, …, h_mN], h_mi = {hx_mi, hy_mi, hs_mi}
• Transition probability between states h_mi and h_{m-1,j}:

p_trans(h_mi, h_{m-1,j}) = N(h_mi | h_{m-1,j}, Σ_pos) · N(d_app(h_mi, h_{m-1,j}) | 0, σ_app²)
Σ_pos = diag(σ_x², σ_y², σ_s²), σ_x = σ_y = 5, σ_s = 0.1, σ_app = 0.05

d_app(h_mi, h_{m-1,j}) is the Euclidean distance between RGB color histograms computed over the bounding rectangle of each hypothesis
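The tracklet search above is standard Viterbi dynamic programming over per-frame detection hypotheses. The sketch below uses the σ values from the slide for the position/scale and appearance transition terms; the unary scoring function and the hypothesis representation (x, y, scale, color histogram) are simplified stand-ins for the paper's detector output.

```python
import numpy as np

def log_gauss(d, sigma):
    # unnormalized log of N(d | 0, sigma^2); normalizers cancel in the argmax
    return -0.5 * (d / sigma) ** 2

def log_transition(h, h_prev):
    # position/scale smoothness (sigma_x = sigma_y = 5, sigma_s = 0.1)
    # plus appearance consistency via color-histogram distance (sigma_app = 0.05)
    (x, y, s, hist), (xp, yp, sp, histp) = h, h_prev
    score = log_gauss(x - xp, 5.0) + log_gauss(y - yp, 5.0) + log_gauss(s - sp, 0.1)
    d_app = np.linalg.norm(np.asarray(hist) - np.asarray(histp))
    return score + log_gauss(d_app, 0.05)

def viterbi(frames, log_unary):
    """frames: list over time of hypothesis lists; each hypothesis is (x, y, scale, rgb_hist)."""
    prev = frames[0]
    scores = [log_unary(h) for h in prev]
    back = []
    for hyps in frames[1:]:
        ptrs, new_scores = [], []
        for h in hyps:
            cand = [scores[j] + log_transition(h, prev[j]) for j in range(len(prev))]
            j = int(np.argmax(cand))
            ptrs.append(j)
            new_scores.append(cand[j] + log_unary(h))
        back.append(ptrs)
        scores, prev = new_scores, hyps
    # backtrack the highest-scoring hypothesis sequence
    path = [int(np.argmax(scores))]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]
```

A spatially stable detection with a consistent color histogram outscores a hypothesis that jumps across the image, which is exactly what makes the recovered sequences "consistent" tracklets.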
People detection based on single frames (top); tracklets found by the 2D tracking algorithm (bottom)
3rd stage: 2D-to-3D Lifting • Bayesian formulation:

p(Q_{1:M} | E_{1:M}) ∝ p(E_{1:M} | Q_{1:M}) p(Q_{1:M})
Q_m = {q_m, φ_m, h_m}

q_m: the parameters of the body joints; φ_m: the rotation of the body in world coordinates; h_m = {hx_m, hy_m, hs_m}: the position and scale of the person projected into the image; E_m: evidence from stages 1 and 2
• Likelihood of the 3D parameters: p(E_m | Q_m) = p(w_m | Q_m) p(D_m | Q_m)
• Projection of the 3D pose evaluated under the 2D body-part posteriors: p(D_m | Q_m) = ∏_i p(l_mi(Q_m) | D_m)
• Orientation restricted to be close to the estimated viewpoint w_m: p(w_m | Q_m) = N(w_m | φ_m, σ_w²)
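The two likelihood terms can be sketched as follows. This is a hedged, simplified stand-in: the projection here just drops the depth coordinate, the `part_posterior` lookup is a placeholder for the stage-1 part posteriors, and `sigma_w` is an assumed value, not one taken from the paper.

```python
import numpy as np

def log_likelihood(parts_3d, phi, viewpoint_w, part_posterior, sigma_w=0.3):
    """Unnormalized log of p(E|Q) = p(w|Q) p(D|Q) for one frame (illustrative).

    parts_3d:       list of (x, y, z) part positions implied by pose Q
    phi:            body rotation in world coordinates
    viewpoint_w:    viewpoint estimated by the stage-1 detector
    part_posterior: callable (part_index, x, y) -> 2D part posterior value
    """
    # viewpoint term: N(w | phi, sigma_w^2), keeping phi close to the estimate
    score = -0.5 * ((viewpoint_w - phi) / sigma_w) ** 2
    # projection term: project each 3D part into the image (orthographic
    # stand-in: drop z) and evaluate it under its 2D part posterior
    for i, (x, y, z) in enumerate(parts_3d):
        score += np.log(part_posterior(i, x, y) + 1e-12)
    return score
```

The small epsilon guards against taking the log of a zero posterior where a projected part lands in a region with no 2D support.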
• Find local maxima with scaled conjugate gradients • Initialization: 2D-to-3D lifting by finding sequences of 3D exemplars that fit the 2D tracklet observations well
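The refinement step above starts from the exemplar-based initialization and ascends to a local maximum of the 3D pose posterior. SciPy provides plain (not scaled) conjugate gradients, so this sketch uses `method='CG'` on a toy negative log-posterior purely to illustrate the structure of the step; the objective and names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def refine_pose(initial_q, neg_log_posterior):
    """Descend from the 2D-to-3D lifting initialization to a local optimum.

    initial_q:         pose parameter vector from the exemplar-based lifting
    neg_log_posterior: callable mapping a pose vector to -log p(Q | E)
    """
    # conjugate-gradient descent on the negative log-posterior
    # (maximizing the posterior = minimizing its negative log)
    result = minimize(neg_log_posterior, initial_q, method='CG')
    return result.x
```

Because the posterior is multimodal, the quality of the exemplar-based initialization matters: conjugate gradients only finds the local maximum in whose basin the initialization lands.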
Initial pose sequence after 2D-to-3D lifting (top); pose sequence after optimization of the 3D pose posterior (bottom)
Subject S2/Camera C1 (top); Subject S2/Camera C1 (bottom). 2D results are in pixels; 3D results are in millimeters
Conclusion • Novel approach to monocular 3D human pose estimation and tracking • Leverages recent results in people detection, 2D pose estimation, and human motion modeling • State-of-the-art performance in lab settings • Applicable in uncontrolled street conditions
Future Work • Broader class of motions and more detailed evaluation