Video-Based People Tracking
Brubaker, Sigal and Fleet
Presented by Patrick Davis
Introduction
• The process of tracking people:
  • Process the image
  • Extract the skeleton
  • Estimate the posture/movement
• Challenges
  • Things that cover the bones
    • Muscle
    • Fat
    • Clothes
  • Things that clutter the image
• Lots of formulas…
  • I will not cover those
• Many models are used to simplify processing
Xbox Kinect
• Even though this paper was written in 2010, it does not discuss the Kinect.
• Fair warning
  • If you ask about the Kinect or how it works, I will not know the answer.
• Most of what the paper is concerned with is single-camera processing; the Kinect is not just one camera.
Tracking as Inference
• Tracking boils down to finding a sequence of probable poses
• The formulas define what the authors call a likelihood
  • The likelihood is used to approximate the current body position
• The results are then combined to find the most probable motion
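The likelihood-times-prediction idea above can be sketched as a discrete Bayes filter step. This is a minimal illustration, not the paper's method: the three-pose state space, transition matrix, and likelihood values are all made up.

```python
def bayes_filter_step(belief, likelihood, transition):
    """One predict/update step of a discrete Bayes filter.

    belief: probabilities over a discretized set of candidate poses
    likelihood: p(observation | pose) for each pose
    transition: transition[i][j] = p(pose j at t | pose i at t-1)
    """
    n = len(belief)
    # Predict: propagate the belief through the motion model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Update: multiply by the image likelihood and renormalize.
    posterior = [predicted[j] * likelihood[j] for j in range(n)]
    z = sum(posterior)
    return [p / z for p in posterior]

# Toy example: three candidate "poses"; the motion model favors staying
# put, and the image likelihood strongly favors pose 2.
belief = [1 / 3, 1 / 3, 1 / 3]
transition = [[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]]
likelihood = [0.05, 0.15, 0.80]
belief = bayes_filter_step(belief, likelihood, transition)
```

Combining such steps over frames yields the "most probable motion" the slide mentions.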
Posing with Kinematic Parameterization
• The skeleton can be treated as an articulated structure
  • Think of a tree with a hinge at each branch
  • Each type of hinge can move in certain ways (its degrees of freedom)
• This allows a simple way of connecting, say, an arm to a shoulder in a logical way
• This can lead to gimbal lock
  • Gimbal lock is the loss of one degree of freedom in a three-dimensional space.
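The tree-with-hinges idea can be sketched as forward kinematics on a single chain. This is a simplified 2-D illustration (each hinge has one degree of freedom); the link lengths and angles below are made-up values.

```python
import math

def forward_kinematics(lengths, angles):
    """Position of each joint of a planar kinematic chain.

    Each joint is a 1-DOF hinge whose angle is relative to its parent
    link, mirroring the tree-of-hinges structure (here, one branch).
    """
    x, y, theta = 0.0, 0.0, 0.0
    points = [(x, y)]
    for length, angle in zip(lengths, angles):
        theta += angle                      # hinge rotates relative to parent
        x += length * math.cos(theta)
        y += length * math.sin(theta)
        points.append((x, y))
    return points

# Upper arm (0.3 m) and forearm (0.25 m): shoulder at +90°, elbow at -90°.
joints = forward_kinematics([0.3, 0.25], [math.pi / 2, -math.pi / 2])
```

Gimbal lock only appears in 3-D chains, where composing three rotations (e.g. Euler angles) can align two axes and lose a degree of freedom; the 2-D sketch avoids that issue.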
Forming the Body Image
• Shape parameters can be used to improve posture estimation
  • Computationally expensive
• The SCAPE model uses a small number of parameters to drive a body mesh; it is only practical with offline processing
• Clothing is assumed to be tight fitting
• Parameters can be entered via calibration targets
Image Measurements
• 2D images
  • If one can identify and track the joint locations, one can infer the likelihood of the current pose
  • This can be simplified with some pre-initialization
• Creating a logical green screen
  • By removing the background we can more easily see the foreground person
  • Requires the background to be static
• Appearance
  • Processing the foreground is harder because of shading differences, but this is helped by the likelihood of the pose
• Edges and gradient-based features
  • Edge detection is used to find edges and the distances between points; once this is done, later calculations are measured only at edge points
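The "logical green screen" bullet is background subtraction, which can be sketched with simple per-pixel differencing against a static background. The 4×4 grey-level images and threshold below are made-up toy data; real systems use statistical background models rather than a fixed threshold.

```python
def background_subtract(background, frame, threshold=25):
    """Per-pixel foreground mask: 1 where the frame differs from the
    static background by more than `threshold` grey levels."""
    return [[1 if abs(f - b) > threshold else 0
             for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

# 4x4 grey-level images; bright "person" pixels enter the lower right.
background = [[10] * 4 for _ in range(4)]
frame = [row[:] for row in background]
frame[2][2] = frame[2][3] = frame[3][2] = frame[3][3] = 200

mask = background_subtract(background, frame)
```

The mask isolates the foreground person, which is why the technique requires the background to stay static.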
Motion
• Joint limits
  • While your knees do not bend forward, joint limits alone do not provide enough knowledge to infer the body position
• Smoothness and linear dynamical models
  • Human motion is, in theory, smooth: each pose equals the previous pose plus some noise (a Markov model)
• Activity-specific models
  • If you know the type of motion (or the person) being tracked, you can apply stronger models to the movement (Principal Component Analysis is used to approximate the pose from mean poses)
  • We can tell a computer what sitting in a chair looks like, and the computer can then more easily recognize someone sitting in a chair
• Physics-based motion models
  • These model forces such as muscle forces and gravity to constrain the allowed motions; they can get very complex very quickly
  • Just think of all the forces on your body as you sat down in your chair.
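The smoothness prior ("each pose equals the previous pose plus some noise") can be sampled directly. This is a toy sketch of the first-order Markov model the slide describes; the joint-angle vector and noise scale are made-up values.

```python
import random

def smooth_motion_rollout(pose0, steps, sigma=0.05, rng=None):
    """Sample a pose trajectory under the smoothness prior
    pose_t = pose_{t-1} + noise (a first-order Markov model).

    pose0: list of joint angles (radians); sigma: noise std. dev.
    """
    rng = rng or random.Random()
    trajectory = [list(pose0)]
    for _ in range(steps):
        prev = trajectory[-1]
        trajectory.append([angle + rng.gauss(0.0, sigma) for angle in prev])
    return trajectory

# Three joint angles, ten frames of smooth random drift (seeded for
# reproducibility).
traj = smooth_motion_rollout([0.0, 0.5, -0.2], steps=10,
                             rng=random.Random(0))
```

Because each frame differs from the last by only small Gaussian noise, sampled trajectories stay smooth; a tracker uses the same model in reverse, penalizing pose hypotheses that jump too far between frames.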
Inference
• The goal is to compute an approximation of the likely body position
• This can be done by tracking the movement of each pixel
• Markov Chain Monte Carlo filtering
  • Produce a sequence of particles (samples) that, as time (or frames) advances, move closer to the target configuration.
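The particles-converging-on-the-target idea can be sketched as a minimal sequential importance resampling (particle) filter. This is a 1-D toy, not the paper's sampler: the drift-by-one motion model, noise levels, and particle count are all assumptions for illustration.

```python
import math
import random

def particle_filter(observations, n_particles=500, motion_std=0.5,
                    obs_std=0.5, rng=None):
    """Minimal particle filter for a 1-D state that drifts by +1 per
    frame.  Each particle is a state hypothesis; resampling duplicates
    likely particles so the cloud concentrates near the target."""
    rng = rng or random.Random()
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Predict: move each particle with the motion model plus noise.
        particles = [p + 1.0 + rng.gauss(0.0, motion_std) for p in particles]
        # Weight: likelihood of the observation given each particle.
        weights = [math.exp(-(y - p) ** 2 / (2 * obs_std ** 2))
                   for p in particles]
        # Resample: keep likely particles, drop unlikely ones.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        estimates.append(sum(particles) / n_particles)
    return estimates

# True state is 1, 2, ..., 10; observations add a little noise.
rng = random.Random(1)
obs = [t + rng.gauss(0.0, 0.3) for t in range(1, 11)]
est = particle_filter(obs, rng=rng)
```

In people tracking, each particle would be a full body pose rather than a single number, and the weights would come from the image likelihoods discussed earlier.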
Initialization and Failure Recovery
• Discriminative methods for pose estimation
  • These are usually derived from a set of training examples assumed to be fair samples from the joint distribution over states and measurements.
• Nearest neighbor
  • Given a set of known poses, the observed body is mapped to the closest example
• Drawbacks
  • Large training sets: all training examples must be stored and searched
  • It can produce ambiguities (such as the difference between sitting down in a chair and squatting at the gym)
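The nearest-neighbor lookup above can be sketched in a few lines. The training set, labels, and joint-angle feature vectors below are made up; real systems use thousands of examples and indexed search rather than a linear scan.

```python
import math

def nearest_neighbor_pose(query, examples):
    """Return the label of the training pose whose feature vector is
    closest (Euclidean distance) to the query."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(examples, key=lambda item: dist(query, item[1]))[0]

# Tiny made-up training set: (label, joint-angle feature vector).
examples = [
    ("standing", [0.0, 0.0, 0.0]),
    ("sitting",  [1.5, 1.5, 0.1]),
    ("walking",  [0.3, -0.3, 0.2]),
]
label = nearest_neighbor_pose([1.4, 1.6, 0.0], examples)
```

The linear scan over `examples` also shows the storage/retrieval drawback: every query must touch the whole training set, and two activities with similar joint angles (sitting vs. squatting) can land on the same neighbor.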