Segmentation and Tracking of Multiple Humans in Crowded Environments

Tao Zhao, Ram Nevatia, Bo Wu • IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 30, NO. 7, JULY 2008

Outline • Introduction • Overview • Probabilistic modeling • Computing MAP by efficient MCMC • Experimental results • Conclusion

Introduction • Segmentation and tracking of multiple humans in crowded situations is made difficult by interobject occlusion.

Introduction • The method is feasible for a crowded scene: • It handles persistent and temporarily heavy occlusion. • It does not require that humans be isolated when they first enter the scene. • More complex shape models are needed. • Joint reasoning about the collection of objects is needed.

Introduction • Main features of this work: • A three-dimensional part-based human body model, which enables the segmentation and tracking of humans in 3D and makes the inference of interobject occlusion natural. • A Bayesian framework that integrates segmentation and tracking based on a joint likelihood for the appearance of multiple objects.

Introduction • The design of an efficient Markov chain dynamics, directed by proposal probabilities based on image cues. • The incorporation of a color-based background model in a mean-shift tracking step.

Overview • The prior models: • Background model: • Based on a background model, the foreground blobs are extracted as the basic observation. • 3D human shape model: • Since the hypotheses are in 3D, occlusion reasoning is straightforward. • Camera model & Ground Plane • Multiple 3D human hypotheses are projected onto the image plane and matched with the foreground blobs.

Overview • The segmentation and tracking are integrated in a unified framework and interoperate over time: • Segment the foreground blobs into multiple humans and associate the segmented humans with the existing trajectories. • The tracks are used to propose human hypotheses in the next frame.

Overview • We formulate the problem as one of Bayesian inference to find the best interpretation given the image observations, the prior model, and the estimates from the previous frame analysis. • That is, maximum a posteriori (MAP) estimation.

Overview • The state to be estimated at each frame: • The number of objects • Their correspondences to the objects in the previous frame (if any). • Their parameters (for example, position) • Uncertainty of the parameters • …

Probabilistic modeling • Our goal is to estimate the state at time t, θ(t), given the image observations I(1), …, I(t). • θ: the state of the objects. • Θ: the solution space.

Probabilistic modeling • A state containing n objects can be written as θ = {(k1, m1), …, (kn, mn)} ∈ Θn, where ki is the unique identity of the ith object, whose parameters are mi, and Θn is the solution space of exactly n objects. • The entire solution space is Θ = ∪n Θn, the union of the subspaces Θn over all object counts n.

3D human shape model • The parameters of an individual human, m, are defined based on a 3D human shape model. • The model does not attempt to capture the detailed shape and articulation parameters of the human body. • It consists of a head, a torso, and legs, with a fixed spatial relationship.

3D human shape model • The parameters (mi) that describe a 3D human hypothesis: • size (hi): the 3D height of the model, which also controls the overall scaling of the object in the three directions. • thickness (fi): captures extra scaling in the horizontal directions. • position (ui, or (xi, yi)): the image position of the head.

3D human shape model • orientation (oi): the 3D orientation of the body • Orientations of the models are quantized into a few levels for computational efficiency. • inclination (ii): the 2D inclination of the body • The body may be inclined slightly.
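The state and per-object parameters listed above can be sketched as a small data structure. This is only an illustration: the class name, field names, units, and example values are assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HumanHypothesis:
    """One (k_i, m_i) pair; field names are hypothetical."""
    identity: int                        # k_i, unique identity of the object
    height: float                        # h_i, 3D height (assumed in meters)
    thickness: float                     # f_i, extra horizontal scaling
    head_position: Tuple[float, float]   # (x_i, y_i), image position of the head
    orientation: str                     # o_i, quantized, e.g. "frontal" or "profile"
    inclination: float                   # i_i, 2D inclination (assumed in radians)


# A state is a variable-length collection of hypotheses, so its
# dimension changes whenever an object is added or removed.
State = List[HumanHypothesis]


def identities(state: State) -> List[int]:
    """Return the identities k_1..k_n present in the state."""
    return [h.identity for h in state]


state: State = [
    HumanHypothesis(1, 1.75, 1.0, (120.0, 40.0), "frontal", 0.0),
    HumanHypothesis(2, 1.62, 1.1, (200.0, 55.0), "profile", 0.05),
]
```

Because the state is a plain list, a state with n objects lives in a different subspace than a state with n+1 objects, which is what makes the later jump dynamics necessary.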

Object appearance model • We use a color histogram of the object, defined within the object shape. • It helps establish correspondence in tracking because it is insensitive to the nonrigidity of human motion. • There exist efficient algorithms, for example, the mean-shift technique, to optimize a histogram-based objective function.
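A color histogram restricted to an object's shape might be computed as in the sketch below. The flat pixel list, the boolean mask, and the choice of 4 bins per RGB channel are assumptions made for illustration; the slides do not state the color space or bin count.

```python
def color_histogram(pixels, mask, bins_per_channel=4):
    """Normalized RGB histogram over the pixels whose mask entry is True.

    pixels: list of (r, g, b) tuples with values in 0..255
    mask:   list of booleans of the same length (the object shape)
    """
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for (r, g, b), inside in zip(pixels, mask):
        if not inside:
            continue
        # Clamp so the top values always land in the last bin.
        rb = min(r // step, bins_per_channel - 1)
        gb = min(g // step, bins_per_channel - 1)
        bb = min(b // step, bins_per_channel - 1)
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
        count += 1
    if count == 0:
        return hist  # empty mask: all-zero histogram
    return [h / count for h in hist]


pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (0, 255, 0)]
mask = [True, True, True, False]
hist = color_histogram(pixels, mask)
```

Since the histogram discards pixel positions inside the mask, it changes little when limbs move, which is the insensitivity to nonrigid motion that the slide refers to.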

Background appearance model • The probability of pixel j being from the background is denoted Pb(Ij).

The prior distribution • The first term : • is independent of time and is defined by • Si is the projected image of the ith object and |Si| is its area.

The prior distribution • P(o = frontal) = P(o = profile) = 1/2 • P(xi, yi) is a uniform distribution over the region where a human head is plausible • P(hi) is a Gaussian distribution N(μh, σh²) truncated to the range [hmin, hmax] • P(fi) is a Gaussian distribution N(μf, σf²) truncated to the range [fmin, fmax] • P(ii) is a Gaussian distribution N(μi, σi²)
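As a rough illustration of a truncated Gaussian prior such as the one on height, the sketch below renormalizes a Gaussian density over a bounded range. The numeric values (mean 1.7, standard deviation 0.1, range 1.5 to 1.9) are invented for the example and are not the paper's settings.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def truncated_gaussian_pdf(x, mu, sigma, lo, hi):
    """Density of N(mu, sigma^2) restricted to [lo, hi] and renormalized."""
    if x < lo or x > hi:
        return 0.0
    pdf = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    mass = normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
    return pdf / mass


# Hypothetical height prior: mean 1.7, std 0.1, truncated to [1.5, 1.9].
p_typical = truncated_gaussian_pdf(1.7, 1.7, 0.1, 1.5, 1.9)
p_too_tall = truncated_gaussian_pdf(2.3, 1.7, 0.1, 1.5, 1.9)
```

Truncation gives zero prior probability to implausible heights, so hypotheses outside the range are rejected regardless of how well they match the image.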

The prior distribution • The second term • We approximate it by rearranging θ(t) and θ(t-1) so that each object falls under one of three cases, covered by the terms Passoc, Pnew, and Pdead.

The prior distribution • Passoc • We assume that the position and the inclination of an object follow constant velocity models with Gaussian noise.

The prior distribution • The height and thickness follow a Gaussian distribution. • We use Kalman filters for temporal estimation. • Pnew & Pdead • the likelihood of the initialization of a new track • the likelihood of the termination of an existing track • They are set empirically according to the distance of the object to the entrances/exits.
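The constant-velocity assumption with Gaussian noise and Kalman filtering can be sketched for a single coordinate as follows. This is a textbook one-dimensional filter written out by hand, not the authors' implementation; the noise values and the initial covariance are assumptions.

```python
def kf_predict(x, P, q_pos=1e-3, q_vel=1e-3):
    """Constant-velocity prediction for state x = [position, velocity]."""
    p, v = x
    (a, b), (c, d) = P
    x_pred = [p + v, v]
    # P_pred = F P F^T + Q with F = [[1, 1], [0, 1]] and diagonal Q.
    P_pred = [[a + b + c + d + q_pos, b + d],
              [c + d, d + q_vel]]
    return x_pred, P_pred


def kf_update(x, P, z, r=0.1):
    """Correct the prediction with a position measurement z (variance r)."""
    p, v = x
    (a, b), (c, d) = P
    innovation = z - p
    s = a + r                     # innovation variance
    k0, k1 = a / s, c / s         # Kalman gain
    x_new = [p + k0 * innovation, v + k1 * innovation]
    # P_new = (I - K H) P with H = [1, 0].
    P_new = [[(1.0 - k0) * a, (1.0 - k0) * b],
             [c - k1 * a, d - k1 * b]]
    return x_new, P_new


# Track an object that moves 2 units per frame.
x, P = [0.0, 0.0], [[10.0, 0.0], [0.0, 10.0]]
for frame in range(20):
    x, P = kf_predict(x, P)
    x, P = kf_update(x, P, z=2.0 * frame)
```

The predicted position and its variance give a Gaussian over where a tracked object should appear in the next frame, which is the kind of quantity an association term needs.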

Joint image likelihood for multiple objects and the background • The visible part of each object: • determined by the depth order of all of the objects, which can be inferred from their 3D positions and the camera model. • The non-object region
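Occlusion reasoning from depth order can be illustrated with pixel sets: objects nearer to the camera claim their pixels first, and farther objects keep only what remains. The input format (a depth value plus a set of pixel indices per object) is an assumption for this sketch.

```python
def visible_parts(objects):
    """Split projected object masks into visible parts by depth order.

    objects: list of (depth, pixel_set); smaller depth is nearer the camera.
    Returns (visible, occupied): the visible pixel set of each object in
    input order, and the union of all object pixels.
    """
    order = sorted(range(len(objects)), key=lambda i: objects[i][0])
    occupied = set()
    visible = [set() for _ in objects]
    for i in order:
        _, pixels = objects[i]
        visible[i] = pixels - occupied   # hide pixels claimed by nearer objects
        occupied |= pixels
    return visible, occupied


image_pixels = set(range(6))
objects = [(5.0, {1, 2, 3}), (2.0, {3, 4})]   # second object is nearer
visible, occupied = visible_parts(objects)
non_object_region = image_pixels - occupied
```

The visible parts are disjoint by construction, so each image pixel is explained by exactly one object or by the background.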

Joint image likelihood for multiple objects and the background • The joint likelihood P(I|θ) consists of two terms: • The first term: • Background exclusion: the likelihood favors an object hypothesis that differs from the background. • Object attraction: the likelihood favors an object hypothesis that is similar to its corresponding object in the previous frame.

Joint image likelihood for multiple objects and the background • di is the color histogram of the background image within the visibility mask of object i. • pi is the color histogram of the object. • The Bhattacharyya coefficient reflects the similarity of two histograms.
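For reference, the Bhattacharyya coefficient of two normalized histograms is the sum over bins of the square root of the product of the bin values. The sketch below computes only the coefficient; how it enters the likelihood term is not reproduced here.

```python
import math


def bhattacharyya_coefficient(p, d):
    """Similarity of two normalized histograms: 1 if identical, 0 if disjoint."""
    if len(p) != len(d):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(pi * di) for pi, di in zip(p, d))


same = bhattacharyya_coefficient([0.5, 0.5, 0.0], [0.5, 0.5, 0.0])
disjoint = bhattacharyya_coefficient([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
partial = bhattacharyya_coefficient([0.5, 0.5, 0.0], [0.0, 0.5, 0.5])
```

A low coefficient between an object's histogram and the background histogram under its visibility mask means the object stands out from the background, which is what background exclusion rewards.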

Joint image likelihood for multiple objects and the background • The second term is: • ej = log(Pb(Ij)) is the log-probability of pixel j belonging to the background model. • The likelihood penalizes the difference from the background model.

Computing MAP by efficient MCMC • Computing the MAP is an optimization problem. • Optimization is challenging: • Because the number of objects is unknown, the solution space contains subspaces of varying dimension. • It includes both discrete and continuous variables. • We adapt a data-driven Markov chain Monte Carlo (MCMC) approach to explore this complex solution space.

Computing MAP by efficient MCMC • An MCMC method with jump/diffusion dynamics is used to sample the posterior probability. • Jumps: cause the Markov chain to move between subspaces with different dimensions and traverse the discrete variables. • Diffusions: make the Markov chain sample the continuous variables. • In the process of sampling, the best solution is recorded and the uncertainty associated with the solution is also obtained.

Computing MAP by efficient MCMC • MCMC method: • We want to design a Markov chain with stationary distribution P(θ). • At the gth iteration, we sample a candidate state θ' from a proposal distribution q(θg|θg-1). • If the candidate state θ' is accepted, θg = θ'. • Otherwise, θg = θg-1.

Computing MAP by efficient MCMC • A Markov chain constructed in this way has its stationary distribution equal to P(θ), independent of the choice of the proposal probability q(·) and the initial state θ0. • The choice of the proposal probability q(·) can affect the efficiency of MCMC significantly. • Using more informed proposal probabilities, for example, as in data-driven MCMC, makes the Markov chain traverse the solution space more efficiently. Therefore, the proposal distribution is written as q(θg|θg-1, I).
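The accept/reject loop can be sketched with the standard Metropolis-Hastings acceptance rule, assumed here because the transcript does not show the acceptance probability. The one-dimensional target, the random-walk proposal, and all function names are toy stand-ins for the posterior over human hypotheses.

```python
import math
import random


def metropolis_hastings(log_target, propose, log_q, theta0, n_iter, rng):
    """Sample from exp(log_target) and record the best state visited.

    log_q(a, b) must return log q(a | b), the log proposal density of a given b.
    """
    theta, cur_lp = theta0, log_target(theta0)
    best, best_lp = theta, cur_lp
    for _ in range(n_iter):
        cand = propose(theta, rng)
        cand_lp = log_target(cand)
        # log of P(cand) q(theta | cand) / (P(theta) q(cand | theta))
        log_ratio = (cand_lp + log_q(theta, cand)) - (cur_lp + log_q(cand, theta))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta, cur_lp = cand, cand_lp      # accept the candidate
        if cur_lp > best_lp:                   # keep the best (MAP) sample
            best, best_lp = theta, cur_lp
    return theta, best


# Toy target: unnormalized Gaussian centered at 3.
log_target = lambda x: -0.5 * (x - 3.0) ** 2
propose = lambda x, rng: x + rng.gauss(0.0, 1.0)
log_q = lambda a, b: -0.5 * (a - b) ** 2       # symmetric random walk

last, best = metropolis_hastings(log_target, propose, log_q, 0.0, 5000, random.Random(0))
```

Only ratios of the target appear in the acceptance test, so the posterior never needs to be normalized, and tracking the best sample yields the MAP estimate as a by-product of sampling.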

Markov chain dynamics • The dynamics correspond to a proposal distribution with a mixture density $q(\theta' \mid \theta_{g-1}, I) = \sum_{a \in A} p_a\, q_a(\theta' \mid \theta_{g-1}, I)$, where A = {add, remove, establish, break, exchange, diff} is the set of all dynamics. • We assume that we have the sample θg-1 from the (g-1)th iteration, and now propose a candidate θ' for the gth iteration.
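Sampling from a mixture proposal amounts to first picking one dynamic at random and then drawing the candidate from that dynamic's own proposal. The sketch below shows only the first step; the mixing weights are invented for illustration and are not the paper's values.

```python
import random

DYNAMICS = ["add", "remove", "establish", "break", "exchange", "diff"]

# Hypothetical mixing weights; they must be nonnegative and are
# normalized internally by random.choices.
WEIGHTS = {"add": 0.1, "remove": 0.1, "establish": 0.1,
           "break": 0.1, "exchange": 0.1, "diff": 0.5}


def sample_dynamic(weights, rng):
    """Pick which dynamic generates the next candidate state."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


rng = random.Random(1)
draws = [sample_dynamic(WEIGHTS, rng) for _ in range(2000)]
counts = {name: draws.count(name) for name in DYNAMICS}
```

Giving the diffusion move a larger weight, as in this toy setting, spends most iterations refining continuous parameters while still allowing occasional jumps that change the number of objects or their identities.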

Markov chain dynamics • Dynamics: • object hypothesis addition • Sample the parameters of a new human hypothesis (kn+1, mn+1) and add it to θg-1. • object hypothesis removal • establish correspondence

Markov chain dynamics • break correspondence • exchange identity • parameter update

Experimental results • Evaluation on an outdoor scene

Experimental results • There are 20 occlusion events overall, nine of which are heavy occlusions. • We use 500 iterations per frame. • Trajectory-based errors: • Trajectories of three objects are broken once (ID 28 -> ID 35, ID 31 -> ID 32, ID 30 -> ID 41) • Trajectory initialization: • Some trajectories start when the objects are only partially inside the scene. • Only the initialization of three objects (objects 31, 50, and 52) is noticeably delayed. • Partial occlusion and/or the lack of contrast with the background are the causes of the delays. • The detection rate and the false-alarm rate are 98.13 percent and 0.27 percent, respectively.

Conclusion • A principled approach to simultaneously detect and track humans in a crowded scene. • We formulate the problem as a Bayesian MAP estimation problem. • The inference is performed by an MCMC-based approach to explore the joint solution space. • The success lies in the integration of the top-down Bayesian formulation following the image formation process and the bottom-up features that are directly extracted from images.
