
VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING

PhD thesis by Alex Leykin, Indiana University.


Presentation Transcript


  1. VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING • PhD thesis by: Alex Leykin, Indiana University

  2. Motivation • Automated tracking and activity recognition are missing from marketing research • The hardware is already there • Visual information can reveal a lot about how people interact with each other • Helps in making intelligent marketing decisions

  3. Goals • Process visual information to get a formal representation of human locations (Visual Tracking) • Extract semantic information from the tracks (Activity Analysis)

  4. Related Work: Detection and Tracking • Yacoob and Davis, “Learned models for estimation of rigid and articulated human motion from stationary or moving camera”, IJCV 2000 • Zhao and Nevatia, “Tracking multiple humans in crowded environment”, CVPR 2004 • Haritaoglu, Harwood, and Davis, “W-4: Real-time surveillance of people and their activities”, PAMI 2000 • Deutscher, North, Bascle, and Blake, “Tracking through singularities and discontinuities by random sampling”, ICCV 1999 • Elgammal and Davis, “Probabilistic framework for segmenting people under occlusion”, ICCV 2001 • Isard and MacCormick, “BraMBLe: a Bayesian multiple-blob tracker”, ICCV 2001

  5. Related Work: Activity Recognition • Haritaoglu and Flickner, “Detection and tracking of shopping groups in stores”, CVPR 2001 • Oliver, Rosario, and Pentland, “A Bayesian computer vision system for modeling human interactions”, PAMI 2000 • Buzan, Sclaroff, and Kollios, “Extraction and clustering of motion trajectories in video”, ICPR 2004 • Hongeng, Nevatia, and Bremond, “Video-based event recognition: activity representation and probabilistic recognition methods”, CVIU 2004 • Bobick and Ivanov, “Action recognition using probabilistic parsing”, CVPR 1998

  6. System Components

  7. Background Modeling • Codebook: a set of codewords • Each codeword stores: color μRGB, Ilow, Ihigh

  8. Adaptive Background Update • Match pixel p to the codebook b using: I(p) > Ilow, I(p) < Ihigh, (RGB(p)∙μRGB) < TRGB, t(p)/thigh > Tt1, t(p)/tlow > Tt2 • If there is no match: if the codebook is saturated, then the pixel is foreground; else create a new codeword • Else update the codeword with the new pixel information • If >1 matches, then merge the matching codewords
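The matching and update rules on slide 8 can be sketched roughly as follows for a single pixel. This is a minimal illustration, not the thesis implementation: the `Codeword` class, the `slack`, `t_rgb` and `max_codewords` parameters, the cosine-based reading of the (RGB(p)∙μRGB) < TRGB test, and the choice to report a freshly created codeword as background are all assumptions. The thermal tests on t(p) are left out.

```python
import numpy as np
from dataclasses import dataclass


@dataclass(eq=False)  # identity-based equality, so list.remove() is safe
class Codeword:
    mu_rgb: np.ndarray  # running mean color of the codeword
    i_low: float        # lower intensity bound
    i_high: float       # upper intensity bound
    n: int = 1          # number of pixels absorbed so far


def matches(cw, rgb, t_rgb=0.05):
    """Intensity must lie inside the bounds and the color must be close."""
    intensity = float(np.linalg.norm(rgb))
    if not (cw.i_low < intensity < cw.i_high):
        return False
    denom = intensity * float(np.linalg.norm(cw.mu_rgb))
    if denom == 0.0:
        return True  # both colors are black; nothing to compare
    # Assumed color test: 1 - cosine similarity below a threshold.
    return 1.0 - float(rgb @ cw.mu_rgb) / denom < t_rgb


def update_pixel(codebook, rgb, max_codewords=5, slack=10.0):
    """Update the codebook in place; return True if the pixel is foreground."""
    rgb = np.asarray(rgb, dtype=float)
    hits = [cw for cw in codebook if matches(cw, rgb)]
    if not hits:
        if len(codebook) >= max_codewords:  # codebook is saturated
            return True
        i = float(np.linalg.norm(rgb))
        codebook.append(Codeword(rgb.copy(), i - slack, i + slack))
        return False  # assumption: a new codeword counts as background
    first = hits[0]
    for other in hits[1:]:  # more than one match: merge the codewords
        total = first.n + other.n
        first.mu_rgb = (first.mu_rgb * first.n + other.mu_rgb * other.n) / total
        first.i_low = min(first.i_low, other.i_low)
        first.i_high = max(first.i_high, other.i_high)
        first.n = total
        codebook.remove(other)
    # Fold the new pixel into the surviving codeword.
    first.mu_rgb = (first.mu_rgb * first.n + rgb) / (first.n + 1)
    first.n += 1
    return False
```

In a full system one codebook would be kept per pixel, and the foreground flags would form the mask used for background subtraction.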

  9. Background Subtraction

  10. Head Detection • Vanishing Point Projection (VPP) Histogram • Vanishing point in Z-direction

  11. Camera Setup • Two camera types: perspective and spherical • Mixtures of indoor and outdoor scenes • Color and thermal image sensors • Varying lighting conditions (daylight, cloud cover, incandescent, etc.)

  12. Camera Modeling • Perspective projection: X, Y, Z from [sx; sy; s] = P [X; Y; Ż; 1] using SVD, where P is the 3x4 projection matrix • Spherical projection (figure axes: Lat, Lon; camera at [Xc, Yc, Zc]): X = cos(θ) tan(π-φ)(Zc-Ż), Y = sin(θ) tan(π-φ)(Zc-Ż), Z = Ż • Assumption: floor plane Zf = 0
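For the perspective case, recovering X and Y from an image point when the height Ż is known reduces to a small linear system. The sketch below is one way to set that up, assuming a known 3x4 matrix `P`; the function names are illustrative, and `np.linalg.lstsq` (which is SVD-based) stands in for whatever SVD routine the thesis used.

```python
import numpy as np


def project(P, xyz):
    """Apply [sx; sy; s] = P [X; Y; Z; 1] and return the image point (x, y)."""
    s = P @ np.append(np.asarray(xyz, dtype=float), 1.0)
    return s[:2] / s[2]


def backproject_to_plane(P, xy, z=0.0):
    """Recover world (X, Y, Z) for image point xy, given a known height z.

    From x = (P[0] . Xh) / (P[2] . Xh) we get (P[0] - x * P[2]) . Xh = 0,
    and likewise for y. With Z fixed this leaves two equations in X and Y.
    """
    x, y = xy
    rows = np.array([P[0] - x * P[2], P[1] - y * P[2]])
    A = rows[:, :2]                      # coefficients of X and Y
    b = -(rows[:, 2] * z + rows[:, 3])   # known terms moved to the right side
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.array([sol[0], sol[1], z])
```

With z = 0 this maps a detected foot position in the image onto the floor plane, which matches the floor-plane assumption on the slide.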

  13. Tracking • Goal: find a correspondence between the bodies already detected in the current frame and the bodies that appear in the next frame • Apply Markov Chain Monte Carlo (MCMC) to estimate the next state xt from the previous state xt-1 and the observation zt • Moves: add body, delete body, recover deleted, change size, move

  14. Tracking • The location of each pedestrian is estimated probabilistically based on: the current image, the previous state of the system, and physical constraints • The goal of our tracking system is to find the candidate state x’ (a set of bodies along with their parameters) which, given the last known state x, will best fit the current observation z • P(x’| z, x) ∝ L(z|x’) · P(x’|x), where L(z|x’) is the observation likelihood and P(x’|x) is the state prior probability
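The search for a state that maximizes likelihood times prior can be illustrated with a plain Metropolis sampler. This is a deliberately reduced sketch: it only uses the "move" proposal (a symmetric Gaussian step on one body), so the add, delete, recover and resize jumps of the actual tracker are not covered. The state layout (a non-empty list of (x, y) positions) and the `log_likelihood` and `log_prior` callables are assumptions supplied by the caller.

```python
import math
import random


def metropolis_step(state, log_post, propose, rng):
    """Accept the candidate with probability min(1, post(cand) / post(state))."""
    cand = propose(state, rng)
    log_ratio = log_post(cand) - log_post(state)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return cand
    return state


def track_frame(prev_state, log_likelihood, log_prior, n_iter=500, step=1.0, seed=0):
    """Return the best-scoring state visited, starting from the previous state."""
    rng = random.Random(seed)

    def log_post(cand):
        # log of L(z | x') * P(x' | x); the observation z lives inside log_likelihood
        return log_likelihood(cand) + log_prior(cand, prev_state)

    def propose(cur, rng):
        # "Move" proposal: jitter one randomly chosen body (symmetric proposal)
        i = rng.randrange(len(cur))
        x, y = cur[i]
        new = list(cur)
        new[i] = (x + rng.gauss(0.0, step), y + rng.gauss(0.0, step))
        return new

    cur = list(prev_state)
    best, best_lp = cur, log_post(cur)
    for _ in range(n_iter):
        cur = metropolis_step(cur, log_post, propose, rng)
        lp = log_post(cur)
        if lp > best_lp:
            best, best_lp = cur, lp
    return best
```

Moves that change the number of bodies would need the proposal ratio in the acceptance test, which is why they are left out of this symmetric-proposal sketch.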

  15. Tracking: Priors • Constraints on the body parameters: N(hμ, hσ2) and N(wμ, wσ2) for body height and width • Temporal continuity: d(wt, wt−1) and d(ht, ht−1), variation from the previous size • d(xt, x’t−1) and d(yt, y’t−1): variation from the Kalman-predicted position • Body coordinates are weighted uniformly within the rectangular region R of the floor map: U(x)R and U(y)R • N(μdoor, σdoor): distance to the closest door (for new bodies)
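One way to combine these priors is to add their log densities. The sketch below does that for a single body; it assumes Gaussian penalties for the d(·,·) terms, and every numeric default (mean height, standard deviations, and so on) is a made-up placeholder rather than a value from the thesis. The door prior for new bodies is omitted.

```python
import math


def log_normal(v, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at v."""
    return -0.5 * ((v - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_prior(body, ref, region,
              h_mu=1.7, h_sigma=0.1, w_mu=0.5, w_sigma=0.1,
              pos_sigma=0.3, size_sigma=0.05):
    """body = (x, y, w, h); ref = (Kalman-predicted x, y, previous w, h);
    region = (x0, y0, x1, y1) is the rectangle R on the floor map."""
    x, y, w, h = body
    px, py, pw, ph = ref
    x0, y0, x1, y1 = region
    if not (x0 <= x <= x1 and y0 <= y <= y1):
        return -math.inf  # uniform prior: zero probability outside R
    lp = -math.log((x1 - x0) * (y1 - y0))                             # U(x)R, U(y)R
    lp += log_normal(h, h_mu, h_sigma) + log_normal(w, w_mu, w_sigma)  # body size
    lp += log_normal(x, px, pos_sigma) + log_normal(y, py, pos_sigma)  # Kalman term
    lp += log_normal(w, pw, size_sigma) + log_normal(h, ph, size_sigma)  # continuity
    return lp
```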

  16. Tracking Likelihoods: Distance weight plane • Problem: blob trackers ignore blob position in 3D (see Zhao and Nevatia, CVPR 2004) • Solution: employ a “distance weight plane” Dxy = |Pxyz, Cxyz|, where P and C are the world coordinates of the camera and the reference point, respectively, and

  17. Tracking Likelihoods: Z-buffer • 0 = background, 1 = furthermost body, 2 = next closest body, etc.
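The labeling scheme on this slide can be reproduced by painting bodies from farthest to nearest, so that nearer bodies overwrite farther ones. The sketch assumes each body is an axis-aligned image rectangle with a known distance to the camera; that tuple layout is an illustration only.

```python
import numpy as np


def build_z_buffer(shape, bodies):
    """bodies: list of (top, left, bottom, right, distance_to_camera).

    Returns an integer image where 0 is background, 1 is the farthest body,
    2 the next closest, and so on.
    """
    z = np.zeros(shape, dtype=np.int32)
    # Sort farthest first so that closer bodies are painted last and win overlaps.
    order = sorted(range(len(bodies)), key=lambda i: bodies[i][4], reverse=True)
    for label, i in enumerate(order, start=1):
        top, left, bottom, right, _ = bodies[i]
        z[top:bottom, left:right] = label
    return z
```

Each pixel then names the body that is visible there, which lets occlusions between bodies be handled in a single pass over the image.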

  18. Tracking Likelihoods: Color Histogram • Color observation likelihood is based on the Bhattacharyya distance between candidate and observed color histograms • Implementing the z-buffer (Z) and the distance weight plane (D) allows a multiple-body configuration to be computed in one computationally efficient step • Let: I = set of all blob pixels, O = set of body pixels
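A small sketch of the histogram comparison follows. It uses the form d = sqrt(1 - BC), where BC is the Bhattacharyya coefficient; the slide does not say which variant of the distance the thesis uses, so that choice, the 8-bins-per-channel histogram, and the function names are assumptions.

```python
import numpy as np


def color_histogram(pixels, bins=8):
    """Normalized joint RGB histogram of an (N, 3) array of 0..255 pixels."""
    hist, _ = np.histogramdd(np.asarray(pixels, dtype=float),
                             bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    return hist / hist.sum()


def bhattacharyya_distance(h1, h2):
    """sqrt(1 - BC), with BC = sum(sqrt(p * q)) over normalized histograms."""
    p = np.asarray(h1, dtype=float).ravel()
    q = np.asarray(h2, dtype=float).ravel()
    p = p / p.sum()
    q = q / q.sum()
    bc = float(np.sum(np.sqrt(p * q)))
    return float(np.sqrt(max(0.0, 1.0 - bc)))
```

The distance is 0 for identical histograms and 1 for histograms with no shared bins, so a likelihood can be made to decay with it.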

  19. Tracking: Anisotropic Weighted Mean Shift • Figure panels: Classic Mean-Shift, Our Mean-Shift (labels: H, t-1, t)

  20. Actors and events • Shopper groups are formed by individual shoppers who shop together for some amount of time • More than fleeting crossing of paths • Dwelling together • Splitting and uniting after a period of time

  21. Swarming • Shopper groups are detected based on the “swarming” idea in reverse • Swarming is used in graphics to generate flocking behaviour in animations • Rules define flocking behaviour: • Avoid collisions with the neighbors • Maintain fixed distance with neighbors • Coordinate velocity vector with neighbors

  22. Tracking Customer Groups • We treat customers as swarming agents, acting according to simple rules (e.g. stay together with swarm members) • Figure: customer groups

  23. Terminology • Actors: shoppers (bodies detected in tracking) • (x, y, id) • Swarming events are defined as short-time activity sequences of multiple agents interacting with each other • Could be fleeting (crossing paths) • Later analysis sorts this out and ignores chance encounters

  24. Swarming • The actors that best fit this model signal a swarming event • Multiple swarming events are further clustered with fuzzy weights to find shoppers who stay in the same group over long periods

  25. Event detection • Two actors come sufficiently close according to some distance measure over: relative position pi=(xi, yi) of actor i on the floor; body orientation αi; dwelling state δi={T,F} • Distance between two agents is a linear combination of co-location, co-ordination and co-dwelling
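The linear combination can be sketched as below. The slide does not give the individual terms or their weights, so the Euclidean co-location term, the wrapped-angle co-ordination term, the 0/1 co-dwelling term and the default weights are all assumptions chosen for illustration.

```python
import math


def actor_distance(a, b, w_loc=1.0, w_ori=0.5, w_dwell=0.5):
    """a, b = (x, y, alpha, dwelling): floor position, orientation in radians,
    and dwelling flag of each actor. Smaller values mean "more together"."""
    xa, ya, alpha_a, dwell_a = a
    xb, yb, alpha_b, dwell_b = b
    co_location = math.hypot(xa - xb, ya - yb)
    # Smallest angle between the two orientations, scaled to [0, 1].
    diff = abs(alpha_a - alpha_b) % (2.0 * math.pi)
    co_ordination = min(diff, 2.0 * math.pi - diff) / math.pi
    co_dwelling = 0.0 if dwell_a == dwell_b else 1.0
    return w_loc * co_location + w_ori * co_ordination + w_dwell * co_dwelling
```

Two actors standing side by side, facing the same way and both dwelling get a distance near zero, while actors passing in opposite directions are pushed apart by the orientation term even if they are briefly close.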

  26. Event detection • Perform agglomerative clustering of actors a into clusters C • Initialize: N singleton clusters • Do: merge two closest clusters • While not: validity index I reaches its maximum • I consists of isolation Ini and compactness Inc
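The merge loop can be sketched as follows. The transcript does not preserve the formulas for Ini and Inc, so the mean silhouette score is swapped in here as a stand-in validity index (it also trades off compactness against isolation). Single-linkage merging, plain Euclidean distance between floor positions, and scanning every merge level to pick the best one are further assumptions.

```python
import math
from itertools import combinations


def silhouette(clusters, dist):
    """Stand-in validity index: mean silhouette, with 0 for singleton members."""
    if len(clusters) < 2:
        return 0.0
    total, n = 0.0, 0
    for ci, members in enumerate(clusters):
        for i in members:
            n += 1
            if len(members) == 1:
                continue
            # a: compactness (mean distance inside the cluster)
            a = sum(dist(i, j) for j in members if j != i) / (len(members) - 1)
            # b: isolation (mean distance to the nearest other cluster)
            b = min(sum(dist(i, j) for j in other) / len(other)
                    for oi, other in enumerate(clusters) if oi != ci)
            if max(a, b) > 0:
                total += (b - a) / max(a, b)
    return total / n


def cluster_actors(points):
    """Merge the two closest clusters repeatedly; keep the best-scoring level."""
    def dist(i, j):
        return math.dist(points[i], points[j])

    def linkage(pair):
        return min(dist(i, j) for i in clusters[pair[0]] for j in clusters[pair[1]])

    clusters = [[i] for i in range(len(points))]  # N singleton clusters
    best, best_score = [list(c) for c in clusters], silhouette(clusters, dist)
    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2), key=linkage)
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        score = silhouette(clusters, dist)
        if score > best_score:
            best, best_score = [sorted(c) for c in clusters], score
    return sorted(best)
```

The function returns lists of actor indices; each list with more than one member is a candidate swarming event.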

  27. Event detection • Figure: final events (axis: iteration #)

  28. Activity Detection • The shopper group detection is accomplished by clustering the short term events over long time periods. • The events could be separated in time, but they will be part of the same shopper group if the actors are the same (the first term).

  29. Activity detection • Higher level activities (shopper groups) detected using these events as building blocks over longer time periods • Some definitions: • Bei={bei} the set of all bodies taking part in an event ei. • τei and τej are the average times of events ei and ej happening.

  30. Activity detection • Define a measure of similarity between two events, combining: overlap between the two sets of actors; separation in time
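Since the similarity formula itself is not in the transcript, the following is only a guess at its shape: a weighted sum of an actor-overlap term and a time-separation term. The Jaccard overlap, the exponential time decay, the weights and `time_scale` are all hypothetical.

```python
import math


def event_similarity(actors_i, tau_i, actors_j, tau_j,
                     w_overlap=0.5, w_time=0.5, time_scale=60.0):
    """Similarity in [0, 1] between events e_i and e_j.

    actors_*: ids of the bodies taking part in each event (the sets B_ei, B_ej)
    tau_*:    average time of each event, in seconds
    """
    a, b = set(actors_i), set(actors_j)
    union = a | b
    overlap = len(a & b) / len(union) if union else 0.0   # first term: same actors
    closeness = math.exp(-abs(tau_i - tau_j) / time_scale)  # second term: time gap
    return w_overlap * overlap + w_time * closeness
```

With a sum rather than a product, two events far apart in time still score well when they share the same actors, which is the behaviour described for shopper groups.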

  31. Activity detection • Perform fuzzy agglomerative clustering • Minimize objective function, where wij are fuzzy weights and ρ and ψ are asymmetric variants of Tukey’s biweight estimators: • ρ(.) is the loss function from robust statistics • ψ(.) is the weight function • Adaptively choose only strong fuzzy clusters • Label remaining clusters as activities
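For reference, the textbook symmetric Tukey biweight looks like this; the asymmetric variants used in the thesis are not given in the transcript and are not reproduced here. The tuning constant c = 4.685 is the conventional default, and `robust_mean` is a hypothetical helper that shows how the weight function suppresses outliers (residuals are used unscaled for simplicity).

```python
def tukey_rho(r, c=4.685):
    """Tukey biweight loss: grows like r^2 / 2 near 0, flat at c^2 / 6 beyond c."""
    if abs(r) >= c:
        return c * c / 6.0
    u = 1.0 - (r / c) ** 2
    return (c * c / 6.0) * (1.0 - u ** 3)


def tukey_weight(r, c=4.685):
    """Weight function: 1 at r = 0, falling smoothly to 0 at |r| >= c."""
    if abs(r) >= c:
        return 0.0
    return (1.0 - (r / c) ** 2) ** 2


def robust_mean(values, c=4.685, n_iter=10):
    """Iteratively reweighted mean, started from the (upper) median."""
    est = sorted(values)[len(values) // 2]
    for _ in range(n_iter):
        weights = [tukey_weight(v - est, c) for v in values]
        total = sum(weights)
        if total == 0:
            break
        est = sum(w * v for w, v in zip(weights, values)) / total
    return est
```

Points whose residual exceeds c receive zero weight, which is the property that lets weak or spurious memberships drop out of a fuzzy cluster.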

  32. Results: Swarming activities detected in space-time • Dot location: average event location • Dot size: validity • Dots of same color: belong to same activity

  33. Group Detection Results

  34. Quantitative Results

  35. Tracking

  36. Group Detection • Ground truth (manually determined) • Partially identified groups (≥2 people in the group correctly identified) • False positives • False negatives (groups missed)

  37. Qualitative Assessments • Longer paths provide better group detection (pval << 1) • Two-people groups are easiest to detect • Simple one-step clustering of trajectories is not sufficient for long-term group detection • Employee tracks pose a significant problem and have to be excluded • Several groups were missed by the operator in the initial ground truth • After inspection of the results, the system was found to have caught groups missed by the human expert

  38. Contributions • BG subtraction based on codebook (RGB+thermal) • Introduced head candidate selection method based on VPP histogram • Resolving track initialization ambiguity and non-unique body-blob correspondence • Informed jump-diffuse transitions in MCMC tracker • Weight plane and z-buffer improve likelihood estimation • Anisotropic mean-shift with obstacle model • Two-layer formal framework for high-level activity detection • Implemented robust fuzzy clustering to group events into activities

  39. Future Work • Improved Tracking (via feature points) • Demographical analysis • Focus of Attention • Sensor Fusion • Other Types of Swarming Activities

  40. Questions? Thank you!
