
Tracking in Large Crowd Scenes




  1. Tracking in Large Crowd Scenes Why is it difficult? What is it good for? Two completely different approaches to this problem. Presented by Limor & Muly Gottlieb

  2. Tracking in Crowds – The Challenge • A particularly hard instance of the tracking problem • Small number of pixels on target • Many interactions among targets • Many occlusions • “Normal” appearance and shape models don’t scale well to large crowd scenes • Background subtraction and other pre-processing techniques are not good enough here

  3. Tracking in Crowds – Motivation • Many real-life scenes are crowded • Security applications: surveillance cameras • Sporting events, rallies, etc. • Non-human crowds (bees, ants, etc.) • Generally less studied than general tracking

  4. Tracking in Crowds – The Approaches • We will see two completely different approaches • One is feature-based, the other is appearance-based • One is suitable only for human crowds, the other for any crowded scene • One uses a probabilistic approach, the other uses force fields • Many other differences (later)

  5. Floor Fields for Tracking in High Density Crowd Scenes Saad Ali, Mubarak Shah, ECCV 2008. Presented by Muly & Limor Gottlieb (some slides are adapted from the authors’ ECCV presentation)

  6. Approach Observation: “Behavior of an individual in a crowded situation is a function of collective behavioral patterns evolving from the space-time interaction of a large number of individuals among themselves, and with the structure of the scene” This means: the natural crowd flow and the scene constraints that influence a person’s behavior in a dense crowd can be used as priors that impose high-level direction on the tracker.

  7. Approach Overview • Treat the crowd flow as a collection of mutually interacting particles • Reasonable because when people are densely packed, individual movement is restricted • Model the instantaneous movement of an individual with a matrix of preferences • The probability of moving in a certain direction takes into consideration multiple sources of information: the target individual’s appearance, the crowd flow, and the scene structure

  8. Approach Overview – Cont’ The concept of floor fields: model the interaction between individuals and their preferred direction of movement by transforming long-range forces into local ones. For instance: • A long-range force that draws an individual towards the exit door can be converted into a local force • The probability that an individual moves in a given direction depends on the strength of the floor field in his/her neighborhood

  9. Agenda • Tracking framework • Static Floor Field • Dynamic Floor Field • Boundary Floor Field • Recap • Results

  10. Tracking Framework • Treat the crowd as a collection of mutually interacting particles: • The image space is discretized into cells • Each cell contains a single particle o_xi • A particle represents all the pixels in its cell • The target individual is represented by a set of particles P = [..., o_xi, ...]

  11. Tracking Framework – Cont’ Movement computation: • The target moves from one cell to the next according to a transition probability that determines the likely direction of motion • This transition probability is associated with the centroid, although its computation uses information from all the particles in the set P • The transition probability is determined by two factors: the similarity between the appearance templates at the current location and the next, and the influence generated by the floor fields

  12. Tracking Framework – Cont’ The probability of moving from cell i to a neighboring cell j is computed as p_ij = (1/Z) · exp(k_D · D_ij) · exp(k_S · S_ij) · exp(k_B · B_ij) · R_ij, where D_ij, S_ij, B_ij are the dynamic (DFF), static (SFF), and boundary (BFF) floor-field terms, k_D, k_S, k_B are the respective field weights, R_ij is the appearance-similarity measurement, and 1/Z is a normalization constant.
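
To make the combination concrete, here is a minimal Python/NumPy sketch of evaluating such a transition probability over a 3x3 neighborhood; the exponential weighting of the field terms and all function and variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def transition_probabilities(D, S, B, R, kD=1.0, kS=1.0, kB=1.0):
    """Combine floor-field terms and appearance similarity into a
    normalized transition probability over a 3x3 neighborhood of cell i.

    D, S, B : (3, 3) arrays of the dynamic/static/boundary terms
              (D_ij, S_ij, B_ij) for each neighboring cell j.
    R       : (3, 3) array of appearance-template similarities R_ij.
    """
    p = np.exp(kD * D) * np.exp(kS * S) * np.exp(kB * B) * R
    return p / p.sum()  # the 1/Z normalization: probabilities sum to 1

# Usage: the tracker moves towards the neighbor with highest probability.
# p = transition_probabilities(D, S, B, R)
# j_star = np.unravel_index(np.argmax(p), p.shape)
```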

  13. Agenda • Tracking framework • Static Floor Field • Dynamic Floor Field • Boundary Floor Field • Recap • Results

  14. Static Floor Field (SFF) • Captures regions of the scene which are more attractive in nature, for instance exit locations • Captures static properties of the scene • Computed only once, during the learning period • Computation involves: a point flow field and sink seeking

  15. SFF – Point Flow Field Point flow: the instantaneous motion at any location • Optical flow is computed between consecutive frames of the first M video frames • For each cell (or pixel) i, a point-flow vector Z_i = (X_i, V_i) is computed, which includes both the location X_i = (x_i, y_i) and the optical-flow vector V_i = (vx_i, vy_i), where V_i is the mean optical flow over the first M frames • The flow vectors averaged over M frames form the ‘point flow field’, which represents the smoothed motion information of the video in that interval and assists in computing the dominant properties of the scene
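
As a rough sketch of this step, assuming OpenCV's dense Farneback flow as the per-frame optical-flow estimator (the paper does not commit to a specific flow method):

```python
import cv2
import numpy as np

def point_flow_field(frames):
    """Mean dense optical flow over the first M frames.

    frames : list of M grayscale images (uint8, same size).
    Returns an (H, W, 2) array V with the mean flow vector
    (vx_i, vy_i) at every pixel/cell location X_i = (x_i, y_i).
    """
    acc = np.zeros(frames[0].shape + (2,), dtype=np.float64)
    for prev, nxt in zip(frames, frames[1:]):
        acc += cv2.calcOpticalFlowFarneback(
            prev, nxt, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    return acc / (len(frames) - 1)
```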

  16. SFF – Sink Seeking • The idea behind this: the behavior of large crowds in locations such as sporting events and train stations can be described as goal-directed; attractive regions = sinks • If we have knowledge of the sinks, then for any point in the scene we can compute the tendency of the individual at that point to move towards the sink • The point-flow field is used to discover the sinks in the scene

  17. SFF – Sink Seeking Process • Local force = a function of the shortest distance to the sink, in terms of the appropriate distance metric [Figure: a point-flow state following the sink path towards the sink]

  18. Sink Seeking Process – Cont’ The process, for each point in the point-flow field, where the state of point i at step t is Z_i,t = (X_i,t, V_i,t): • Compute Z_i,t+1: the new position depends on the location and velocity at the previous state, X_i,t+1 = X_i,t + V_i,t; the new velocity V_i,t+1 depends on the previous velocity and the observed velocities of its neighbors, combined using neighbor weights (kernel density method) • If all neighbor weights fall below a threshold, stop the process: the sink is found • At the end of the process we have one sink path per point in the point-flow field
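
A rough sketch of this iteration; the Gaussian kernel over position and velocity differences below stands in for the paper's kernel-density neighbor weights, and the bandwidth h, threshold w_min, and step cap are assumed parameters:

```python
import numpy as np

def sink_seek(x0, V, h=1.0, w_min=1e-3, max_steps=500):
    """Follow the point-flow field from x0 until a sink is reached.

    V  : (H, W, 2) point-flow field (mean optical flow per cell).
    x0 : starting location (x, y).
    Returns the number of 'sink steps' taken and the sink path.
    """
    H, W, _ = V.shape
    x = np.asarray(x0, dtype=float)
    v = V[int(x[1]), int(x[0])].astype(float)
    path = [x.copy()]
    for _ in range(max_steps):
        x = x + v  # new position from the previous location and velocity
        c, r = int(round(x[0])), int(round(x[1]))
        if not (1 <= c < W - 1 and 1 <= r < H - 1):
            break
        nbrs = [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
        # Kernel weights over position and velocity differences (assumed
        # Gaussian form, standing in for the paper's kernel density method).
        w = np.array([np.exp(-((rr - x[1]) ** 2 + (cc - x[0]) ** 2
                               + np.sum((V[rr, cc] - v) ** 2)) / (2 * h * h))
                      for rr, cc in nbrs])
        if w.max() < w_min:  # all neighbor weights below the threshold:
            break            # the sink has been found
        # New velocity: weighted mean of the neighbors' observed velocities.
        v = np.sum(w[:, None] * np.array([V[rr, cc] for rr, cc in nbrs]),
                   axis=0) / w.sum()
        path.append(x.copy())
    return len(path) - 1, path
```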

  19. SFF Generation For each point in the image space: • Place at that location the value of its ‘sink steps’ (the number of steps taken during the sink-seeking process to reach the sink); this generates the SFF • To compute the term S_ij, we use the difference between the SFF value at point i and the SFF value at point j. This acts as steepest descent: high S_ij values indicate a high probability of moving from cell i to cell j, and vice versa
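
Continuing the sketch above (it reuses sink_seek and the point-flow field V), the SFF is just the sink-steps count recorded at every cell, and S_ij falls out as a difference of SFF values:

```python
import numpy as np

# Assumes sink_seek() and the point-flow field V from the previous sketches.
def static_floor_field(V):
    H, W, _ = V.shape
    sff = np.zeros((H, W))
    for r in range(H):
        for c in range(W):
            sff[r, c], _ = sink_seek((c, r), V)  # 'sink steps' per cell
    return sff

def S_ij(sff, i, j):
    """High when cell j is closer (in sink steps) to the sink than cell i;
    i, j are (row, col) tuples."""
    return sff[i] - sff[j]
```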

  20. SFF Examples

  21. Agenda • Tracking framework • Static Floor Field • Dynamic Floor Field • Boundary Floor Field • Recap • Results

  22. Dynamic Floor Field (DFF) Objective: • Determine the behavior of the crowd around the individual being tracked • Provides instantaneous information about the direction of motion • Based on particle advection

  23. DFF – Cont’ The process: • Compute optical flow between consecutive frames and stack the flows together • Advect particles through the stacked flow and count the number of particles that pass through locations i and j (particle advection) • The number of common particles gives the strength of the association between i and j, D_ij
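
A sketch of the particle-advection bookkeeping, assuming the per-frame flows are already stacked in a (T, H, W, 2) array; the cell size and the nearest-neighbor flow lookup are simplifying assumptions:

```python
import numpy as np

def advect(flows, pts):
    """Advect particles through a stack of dense per-frame flows.

    flows : (T, H, W, 2) stacked optical-flow fields.
    pts   : (N, 2) initial particle positions (x, y).
    Returns the (T+1, N, 2) particle tracks.
    """
    H, W = flows.shape[1:3]
    tracks = [pts.astype(float)]
    for F in flows:
        p = tracks[-1]
        r = np.clip(np.round(p[:, 1]).astype(int), 0, H - 1)
        c = np.clip(np.round(p[:, 0]).astype(int), 0, W - 1)
        tracks.append(p + F[r, c])  # nearest-neighbor flow lookup
    return np.stack(tracks)

def dff_strength(tracks, cell_i, cell_j, cell=10):
    """D_ij: number of particles whose tracks pass through both cell i
    and cell j (cells are 'cell'-pixel squares centered on the points)."""
    def visits(center):
        inside = (np.abs(tracks[..., 0] - center[0]) < cell / 2) & \
                 (np.abs(tracks[..., 1] - center[1]) < cell / 2)
        return inside.any(axis=0)  # per particle: ever inside this cell?
    return int((visits(cell_i) & visits(cell_j)).sum())
```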

  24. DFF Example The peak represents the location where most particles end up if they pass through the cell containing the yellow particle.

  25. Agenda • Tracking framework • Static Floor Field • Dynamic Floor Field • Boundary Floor Field • Recap • Results

  26. Boundary Floor Field (BFF) • Captures influences generated by barriers or walls in the scene (repulsive in nature) The process: • Compute a crowd-flow segmentation (based on a previous algorithm by the authors) • Use the segmentation map to compute an edge map by retaining only the boundary pixels of each segment • Compute the shortest distance between the wall/barrier and each pixel (when the distance is larger than a threshold, the barrier effect vanishes) • The difference between the values at cells j and i gives the measure B_ij
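
A minimal sketch of this computation from a given flow-segmentation label map, using SciPy's Euclidean distance transform; the saturation threshold d_max is an assumed parameter:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def boundary_floor_field(seg, d_max=30.0):
    """seg : (H, W) integer label map from the crowd-flow segmentation."""
    # Edge map: keep only pixels whose segment label differs from a neighbor's.
    edges = np.zeros(seg.shape, dtype=bool)
    edges[:-1, :] |= seg[:-1, :] != seg[1:, :]
    edges[:, :-1] |= seg[:, :-1] != seg[:, 1:]
    # Shortest distance from every pixel to the nearest boundary pixel;
    # beyond d_max the barrier effect vanishes (the distance saturates).
    dist = distance_transform_edt(~edges)
    return np.minimum(dist, d_max)

# B_ij is then the difference of BFF values between neighboring cells j and i.
```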

  27. BFF Example [Figure: segmentation map and the corresponding edge map]

  28. Agenda • Tracking framework • Static Floor Field • Dynamic Floor Field • Boundary Floor Field • Recap • Results

  29. Recap The probability of moving from cell i to a neighboring cell j is computed as p_ij = (1/Z) · exp(k_D · D_ij) · exp(k_S · S_ij) · exp(k_B · B_ij) · R_ij, where D_ij, S_ij, B_ij are the dynamic (DFF), static (SFF), and boundary (BFF) floor-field terms, k_D, k_S, k_B are the respective field weights, R_ij is the appearance-similarity measurement, and 1/Z is a normalization constant.

  30. Agenda • Tracking framework • Static Floor Field • Dynamic Floor Field • Boundary Floor Field • Recap • Results

  31. Results VIDEO!

  32. Unsupervised Bayesian Detection of Independent Motion in Crowds Gabriel J. Brostow and Roberto Cipolla, University of Cambridge, CVPR 2006. Presented by Limor & Muly Gottlieb

  33. Introduction • Goal: detection of individual entities in crowd scenes • Method: tracking features between frames (optical flow) and clustering the features into groups representing different individuals • Main underlying observation: “A pair of points that appears to move together is likely to be part of the same individual” • No appearance model is used: only the motion data itself is used for detection and tracking in dense scenes

  34. Approach • Feature-based tracking: using the available features in each frame (no appearance model) • Automated clustering of features in an unsupervised Bayesian framework

  35. Algorithm overview (in brief) • Find useful features (which features to use? what counts as a “useful” feature?) • Track the features forward and backward in time for as long as they exist (optical flow), to assemble trajectories • Probabilistically cluster the trajectories • Find a probabilistic framework for considering pair-wise cluster merges, implement it, and decide on a discriminant function • Find an efficient way to use pair-wise decisions to build clusters (without computing ALL combinations!)

  36. Agenda • Bayesian framework • Finding good features • Spatial prior • Coherent motion likelihood • Evidence • Discriminant function • Results

  37. Bayesian Framework • Each image feature x traces out a trajectory X • We want to find the most probable clustering of features {x} into clusters C_i (a very large number of possibilities!) • We denote by X_Ci the trajectories associated with a single cluster C_i • Define some distance measure (defined later) between two trajectories, d(X, Y); D_ij collects these distances between the trajectories of clusters i and j • Seek the probability of merging two clusters given their distance measure: P(C_ij | D_ij), where C_ij is the hypothesis that clusters i and j belong together • Main assumption: pair-wise decisions for or against merging clusters will reveal the clustering

  38. Bayesian Framework – Cont’ • We want to compute P(C_ij | D_ij) • Recall Bayes’ law: P(A | B) = P(B | A) · P(A) / P(B) • So using Bayes’ law we get P(C_ij | D_ij) = P(D_ij | C_ij) · P(C_ij) / P(D_ij), where P(C_ij) is the spatial prior, P(D_ij | C_ij) is the coherent motion likelihood, and P(D_ij) is the evidence (normalization term) • We need to model each of the three terms • We need to find some threshold or discriminant function for the decision • Making it tractable: using a single pair-wise pass

  39. Agenda • Bayesian framework • Finding good features • Spatial prior • Coherent motion likelihood • Evidence • Discriminant function • Results

  40. Finding Good Features • Appearance is not used, in order to evaluate the results using the motion alone (combining both: possible future work) • For a detected feature to be useful, it must be trackable with a high degree of confidence both forward and backward in time • Start by finding corners in each frame (using Rosten-Drummond ’06 and Tomasi-Kanade ’91) • Track all features two frames forward (using the hierarchical optical-flow algorithm of Lucas-Kanade ’81) • For each frame f we consider the set of detected corners D_f

  41. Finding Good Features – Cont’ • For a feature to be “good”, independent feature detections in two subsequent frames must agree to within one pixel • Define a function W(D_f, n) which returns the image coordinates obtained by projecting the corners in D_f along their optical-flow paths forward by n frames • Take only those features in D_f that lie within one pixel of a corner projected forward from the previous frame, W(D_{f-1}, 1)
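
A sketch of the agreement test, with OpenCV's Shi-Tomasi corners and pyramidal Lucas-Kanade standing in for the paper's Rosten-Drummond / Tomasi-Kanade detectors and hierarchical optical flow; corner counts and thresholds are assumptions:

```python
import cv2
import numpy as np

def good_features(f_prev, f_curr, max_err=1.0):
    """Keep corners in the current frame whose independently detected
    counterparts in the previous frame, projected one frame forward along
    their optical flow (the W(D, n) operation), land within one pixel."""
    d_prev = cv2.goodFeaturesToTrack(f_prev, 2000, 0.01, 3)
    d_curr = cv2.goodFeaturesToTrack(f_curr, 2000, 0.01, 3).reshape(-1, 2)
    # W(D_{f-1}, 1): project the previous frame's corners one frame forward.
    proj, status, _ = cv2.calcOpticalFlowPyrLK(f_prev, f_curr, d_prev, None)
    proj = proj[status.ravel() == 1].reshape(-1, 2)
    # Agreement test: keep corners within one pixel of a projected corner.
    dists = np.linalg.norm(d_curr[:, None, :] - proj[None, :, :], axis=2)
    return d_curr[dists.min(axis=1) <= max_err]
```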

  42. Finding Good Features – Cont’ • Each feature which satisfies the above condition is treated as a “good” feature x, and is then tracked forward through all subsequent frames until lost, and then backwards through previous frames until lost (in a second pass over the video in reverse), to yield a trajectory X • To compare two trajectories X and Y which extend in time over frame ranges [s_X, e_X] and [s_Y, e_Y] respectively, we consider only the overlapping range of frames, [max(s_X, s_Y), min(e_X, e_Y)]

  43. Agenda • Bayesian framework • Finding good features • Spatial prior • Coherent motion likelihood • Evidence • Discriminant function • Results

  44. Spatial Prior • In each frame f we take all active trajectories X and sample them over +/- 30 frames • If a trajectory’s data runs out, we extrapolate using its last known velocity • We compute the maximal Euclidean distance between each pair of trajectories over those frames: D(X, Y) = max_t ||X(t) − Y(t)|| • We build a distance tree and split it into c clusters (c is chosen manually as 3-5 times the number of bodies that could fit in the view) • The prior P(C_ij) is then computed from this spatial clustering
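
A sketch of the distance-tree step, assuming the trajectories have already been sampled and extrapolated onto a common window of frames; complete-linkage hierarchical clustering is an assumed stand-in for the paper's distance tree:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def spatial_clusters(T, c):
    """Split trajectories into c spatial clusters.

    T : (N, F, 2) trajectories sampled over a common window of F frames
        (+/- 30 around the current frame, extrapolated where data ran out).
    c : number of clusters, ~3-5x the bodies that could fit in the view.
    """
    # Maximal Euclidean distance between each pair of trajectories.
    diffs = T[:, None, :, :] - T[None, :, :, :]       # (N, N, F, 2)
    dmax = np.linalg.norm(diffs, axis=3).max(axis=2)  # (N, N)
    # Distance tree, cut into at most c clusters.
    tree = linkage(squareform(dmax, checks=False), method='complete')
    return fcluster(tree, t=c, criterion='maxclust')  # label per trajectory
```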

  45. Agenda • Bayesian framework • Finding good features • Spatial prior • Coherent motion likelihood • Evidence • Discriminant function • Results

  46. Coherent Motion Likelihood • From clusters C_i and C_j we take all trajectories X_Ci and X_Cj (all original samples, unlike the prior) • We seek a term that relates the two sets of trajectories in proportion to the probability that all their points moved together on one body (which is exactly what we want to model in the likelihood P(D_ij | C_ij))

  47. Coherent Motion Likelihood – Cont’ • We use the assumption that two individual features X_u and X_v are more likely to come from the same body if the variance of the distance between them is small: σ²_uv = Var_t(||X_u(t) − X_v(t)||), taken over their overlapping frames • Using a decreasing function of σ²_uv as a measure of the probability that X_u and X_v moved together rigidly, we can compute the likelihood term over all pairs of points in X_Ci and X_Cj
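
A sketch of the pairwise measure and its aggregation; the exponential mapping of the variance and the mean over cross-cluster pairs are assumed choices:

```python
import numpy as np

def pair_rigidity(Xu, Xv):
    """Measure of X_u and X_v moving together rigidly: near 1 when the
    inter-feature distance stays constant over the overlapping frames.
    Xu, Xv : (F, 2) trajectory samples over the same F frames."""
    d = np.linalg.norm(Xu - Xv, axis=1)  # per-frame distance
    return np.exp(-np.var(d))            # assumed mapping of the variance

def motion_likelihood(Xci, Xcj):
    """Aggregate the pairwise measure over all cross-cluster pairs
    (mean aggregation is an assumed choice)."""
    return float(np.mean([pair_rigidity(u, v) for u in Xci for v in Xcj]))
```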

  48. Agenda • Bayesian framework • Finding good features • Spatial prior • Coherent motion likelihood • Evidence • Discriminant function • Results

  49. Evidence Term • The normalization term P(D_ij) should represent the unconditional probability of observing all the features X_Ci and X_Cj moving together rigidly, among all other clustering possibilities of clusters moving together rigidly • To approximate this, we compute the ratio between the sum of pairwise rigidity measures taken over the pairs spanning clusters i and j, and the same sum taken over all pairs of trajectories in the scene

  50. Evidence Term – Cont’ • This normalization term represents the fraction of “good” feature-to-feature pairings just between clusters i and j, relative to the number found throughout the whole network of X’s • We get the posterior for the merge decision: P(C_ij | D_ij) = P(D_ij | C_ij) · P(C_ij) / P(D_ij), with the three terms modeled as above
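
Continuing the sketch (reusing pair_rigidity from above), one way to realize this fraction; the "good pairing" threshold is an assumed parameter:

```python
import numpy as np

def evidence(Xci, Xcj, all_X, thresh=0.5):
    """Fraction of 'good' feature-to-feature pairings between clusters i
    and j, relative to good pairings over the whole network of X's."""
    between = sum(pair_rigidity(u, v) > thresh for u in Xci for v in Xcj)
    total = sum(pair_rigidity(u, v) > thresh
                for a, u in enumerate(all_X) for v in all_X[a + 1:])
    return between / max(total, 1)

# Merge decision, per Bayes' law with the three terms modeled above:
# posterior = motion_likelihood(Xci, Xcj) * prior / evidence(Xci, Xcj, all_X)
```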
