
The Science of Silly Walks


Presentation Transcript


  1. The Science of Silly Walks. Hedvig Sidenbladh, Royal Inst. of Technology (KTH), Stockholm, Sweden, http://www.nada.kth.se/~hedvig. Michael J. Black, Department of Computer Science, Brown University, http://www.cs.brown.edu/~black

  2. Collaborators David Fleet, Xerox PARC Nancy Pollard, Brown University Dirk Ormoneit and Trevor Hastie Dept. of Statistics, Stanford University Allan Jepson, University of Toronto

  3. The (Silly) Problem

  4. Inferring 3D Human Motion * Infer 3D human motion from 2D image properties. * No special clothing. * Monocular, grayscale sequences (archival data). * Unknown, cluttered environment. * Incremental estimation.

  5. Why is it Hard? Singularities in viewing direction. Unusual viewpoints. Self-occlusion. Low contrast.

  6. Clothing and Lighting

  7. Large Motions Limbs move rapidly with respect to their width. Non-linear dynamics. Motion blur.

  8. Ambiguities Where is the leg? Which leg is in front?

  9. Ambiguities Accidental alignment

  10. Ambiguities Occlusion Whose legs are whose?

  11. Inference/Issues. Bayesian formulation: p(model | cues) = p(cues | model) p(model) / p(cues). 1. Need a constraining likelihood model that is also invariant to variations in human appearance. 2. Need a prior model of how people move. 3. Need an effective way to explore the model space (very high dimensional) and represent ambiguities.

  12. Simple Body Model * Limbs are truncated cones. * The state is a parameter vector, φ, of joint angles and angular velocities.
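To make that state concrete, here is a minimal sketch in Python; the class and field names are hypothetical, since the slide does not spell out the exact parameterization:

    # Minimal sketch of the body state: joint angles plus angular
    # velocities, flattened into one parameter vector for inference.
    # Names and shapes here are illustrative, not the authors' code.
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class BodyState:
        angles: np.ndarray      # joint angles, shape (n_joints,)
        velocities: np.ndarray  # angular velocities, same shape

        def as_vector(self) -> np.ndarray:
            return np.concatenate([self.angles, self.velocities])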

  13. Key Idea #1 (Likelihood) 1. Use the 3D model to predict the location of limb boundaries (not necessarily features) in the scene. 2. Compute various filter responses steered to the predicted orientation of the limb. 3. Compute the likelihood of the filter responses using a statistical model learned from examples.

  14. Example Training Images

  15. Edge Filters. Normalized derivatives of Gaussians (Lindeberg; Granlund and Knutsson; Perona; Freeman & Adelson; …). Edge filter response steered to the limb orientation (figure: filter responses steered to the arm orientation).
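A minimal sketch of such a steered response, assuming the standard steerable pair of first-derivative-of-Gaussian filters; the talk's exact normalization (e.g. Lindeberg-style scale normalization) is omitted:

    # Steer a Gaussian-derivative edge filter to a predicted limb
    # orientation theta: first-derivative filters steer exactly as a
    # cos/sin combination of the x- and y-derivative responses.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def steered_edge_response(img, theta, sigma=2.0):
        gx = gaussian_filter(img, sigma, order=(0, 1))  # d/dx
        gy = gaussian_filter(img, sigma, order=(1, 0))  # d/dy
        return np.cos(theta) * gx + np.sin(theta) * gy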

  16. Distribution of Edge Filter Responses: p_on(F) and p_off(F). The likelihood ratio, p_on / p_off, is used for edge detection (Geman & Jedynak; Konishi, Yuille & Coughlan). Here: object-specific statistics.
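A sketch of how such object-specific statistics could be learned and applied; the binning and Laplace smoothing are illustrative choices, not the authors' exact procedure:

    # Learn histograms of steered filter responses on limb boundaries
    # (p_on) and elsewhere (p_off); score new responses by the log
    # likelihood ratio log(p_on / p_off).
    import numpy as np

    def learn_hist(samples, bins):
        counts, _ = np.histogram(samples, bins=bins)
        return (counts + 1) / (counts.sum() + len(counts))  # smoothed

    bins = np.linspace(-1.0, 1.0, 33)
    # p_on  = learn_hist(responses_on_limb_edges, bins)
    # p_off = learn_hist(responses_off_limb_edges, bins)

    def log_ratio(f, p_on, p_off, bins):
        i = np.clip(np.digitize(f, bins) - 1, 0, len(p_on) - 1)
        return np.log(p_on[i]) - np.log(p_off[i])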

  17. Other Cues (figure): motion, relating I(x, t) to I(x+u, t+1), and ridges.

  18. Key Idea #2 (Likelihood) “Explain” the entire image: p(image | foreground, background). Generic, unknown background; foreground person. The foreground should explain what the background can’t.

  19. Likelihood: steered edge filter responses, with the crude assumption that filter responses are independent across scale (see the sketch below).
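Under that independence assumption, the pose log-likelihood reduces to a sum of on/off log ratios over the predicted limb boundaries at each scale; a sketch, reusing log_ratio from the edge-statistics sketch above:

    # Sum log(p_on/p_off) over predicted limb-boundary pixels and over
    # scales. Because only the foreground/background ratio enters, the
    # generic background model cancels over the rest of the image.
    def pose_log_likelihood(responses_per_scale, p_on, p_off, bins):
        # responses_per_scale: one array of steered responses per
        # scale, sampled at the predicted limb boundaries.
        total = 0.0
        for f in responses_per_scale:
            total += log_ratio(f, p_on, p_off, bins).sum()
        return total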

  20. Learning Human Motion (plot: joint angles over time) * Constrain the posterior to likely & valid poses/motions. * Model the variability. * 3D motion-capture data: a database with multiple actors and a variety of motions (from M. Gleicher).

  21. Key Idea #3 (Prior) Problem: * insufficient data to learn a probabilistic model of human motion. Alternative: * the data represents all we know * replace representation and learning with search (the search has to be fast) * De Bonet & Viola, Efros & Leung, Efros & Freeman, Pasztor & Freeman, Hertzmann et al., … Example: Efros & Freeman ’01.

  22. Implicit Empirical Distribution • Off-line: • learn a low-dimensional model of every n-frame sequence of joint angles and angular velocities (Leventon & Freeman, Ormoneit et al., …) • project the training data onto the model to get a small number of coefficients describing each time instant • build a tree-structured representation (sketched below)
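A sketch of this off-line stage, assuming PCA for the low-dimensional model and a k-d tree for the tree-structured index; the slides do not specify which tree the authors used:

    # Slide an n-frame window over the mocap joint angles, learn a PCA
    # basis for the windows, and index the coefficients in a k-d tree.
    import numpy as np
    from scipy.spatial import cKDTree

    def build_motion_index(angles, n=10, d=8):
        # angles: (T, n_joints) array of joint angles over time.
        T = angles.shape[0]
        windows = np.stack([angles[t:t + n].ravel() for t in range(T - n)])
        mean = windows.mean(axis=0)
        _, _, Vt = np.linalg.svd(windows - mean, full_matrices=False)
        basis = Vt[:d]                        # d principal directions
        coeffs = (windows - mean) @ basis.T   # low-dim coefficients
        return cKDTree(coeffs), basis, mean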

  23. “Textural” Model • On-line: given an n-frame input motion • project onto the low-dimensional model. • index in log time using the coefficients. • return the best k approximate matches (and form a “proposal” distribution). • sample from them and return the (n+1)st pose.
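And a sketch of this on-line step, building on the index above; the distance-based proposal weights are an illustrative choice:

    # Project the current n-frame motion, fetch the k nearest training
    # windows (logarithmic-time k-d tree query), sample one according
    # to a distance-based proposal, and return its successor pose.
    import numpy as np

    def propose_next_pose(recent, tree, basis, mean, angles,
                          n=10, k=5, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        c = (recent.ravel() - mean) @ basis.T   # project query window
        dists, idx = tree.query(c, k=k)         # k approximate matches
        w = np.exp(-dists)
        w /= w.sum()                            # proposal distribution
        t = idx[rng.choice(k, p=w)]             # pick a training window
        return angles[t + n]                    # its (n+1)st pose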

  24. Synthetic Walker * Colors indicate different training sequences.

  25. Synthetic Swing Dancer

  26. Bayesian Formulation. Posterior over model parameters given an image sequence: p(φ_t | images_1:t) ∝ p(image_t | φ_t) ∫ p(φ_t | φ_t-1) p(φ_t-1 | images_1:t-1) dφ_t-1, where p(image_t | φ_t) is the likelihood of observing the image given the model parameters, p(φ_t | φ_t-1) is the temporal model (prior), and p(φ_t-1 | images_1:t-1) is the posterior from the previous time instant.

  27. Key Idea #4 (Ambiguity) * Represent a multi-modal posterior probability distribution over the model parameters: - sampled representation - each sample is a pose and its probability - predict over time using a particle filtering approach. (Figure: samples from a distribution over 3D poses.)

  28. Particle Filter (diagram): sample from the posterior at the previous time step, propagate the samples through the temporal dynamics, weight them by the likelihood, and normalize to form the new posterior.
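A minimal particle-filter sketch of this update; dynamics and log_likelihood stand in for the learned temporal model and the image likelihood described earlier:

    # One step of the particle filter: resample from the previous
    # posterior, propagate with stochastic dynamics, re-weight by the
    # image likelihood, and normalize the weights.
    import numpy as np

    def particle_filter_step(particles, weights, dynamics,
                             log_likelihood, rng):
        N = len(particles)
        idx = rng.choice(N, size=N, p=weights)          # resample
        particles = np.array([dynamics(particles[i], rng)
                              for i in idx])            # propagate
        logw = np.array([log_likelihood(p) for p in particles])
        w = np.exp(logw - logw.max())                   # stabilize
        return particles, w / w.sum()                   # normalize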

  29. What does the posterior look like? Shoulder: 3 dof. Elbow: 1 dof. (The elbow bends.)

  30. Stochastic 3D Tracking Preliminary result * 2500 samples, multiple cues.

  31. Conclusions Inferring human motion, silly or not, from video is challenging. We have tackled three important parts of the problem: 1. Probabilistically modeling human appearance in a generic, yet useful, way. 2. Representing the range of possible motions using techniques from texture modeling. 3. Dealing with ambiguities and non-linearities using particle filtering for Bayesian inference.

  32. Learned Walking Model * mean walker

  33. Learned Walking Model * sample with small e

  34. Learned Walking Model * sample with moderate e

  35. Learned Walking Model (Silly-Walk Generator) * sample with very large e

  36. Tracking with Occlusion Preliminary result 1500 samples, ~2 minutes/frame.

  37. Moving Camera Preliminary result 1500 samples, ~2 minutes/frame.

  38. Ongoing and Future Work. Hybrid Monte Carlo tracker (Choo and Fleet ’01): * analytic, differentiable likelihood. Learned dynamics. Correlation across scale. Estimate background motion. Statistical models of color and texture. Automatic initialization. Training data and likelihood models to be made available on the web.

  39. Lessons Learned * Probabilistic (Bayesian) framework allows - integration of information over time - modeling of priors * Particle filtering allows - multi-modal distributions - tracking with ambiguities and non-linear models * Learning image statistics and combining cues improves robustness and reduces computation

  40. Outlook 5 years: - Relatively reliable people tracking in monocular video. - Path is pretty clear. Next step: Beyond person-centric - people interacting with object/world … solve the vision problem. Beyond that: Recognizing action - goals, intentions, ... … solve the AI problem.

  41. Conclusions * Generic, learned model of appearance that combines multiple cues. * Exploits work on image statistics. * Uses the 3D model to predict features. * Principled way to choose filters. * The model of foreground and background is incorporated into the tracking framework: it exploits the ratio between the foreground and background likelihoods, and improves tracking.

  42. Motion Blur

  43. Requirements 1. Represent uncertainty and multiple hypotheses. 2. Model non-linear dynamics of the body. 3. Exploit image cues in a robust fashion. 4. Integrate information over time. 5. Combine multiple image cues.

  44. What Image Cues? Pixels? Temporal differences? Background differences? Edges? Color? Silhouettes?

  45. Brightness Constancy: I(x, t+1) = I(x+u, t) + η. The image motion of the foreground is a function of the 3D motion of the body. Problem: no fixed model of appearance (drift).
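A sketch of how brightness constancy can be scored, assuming the 3D body motion has already been converted into a per-pixel flow field (u, v); bilinear warping via scipy is an implementation choice:

    # Warp frame t by the flow induced by the hypothesized body motion
    # and compare with frame t+1; brightness constancy says the
    # residual should be small (up to noise) on the foreground.
    import numpy as np
    from scipy.ndimage import map_coordinates

    def brightness_residual(img_t, img_t1, u, v):
        H, W = img_t.shape
        yy, xx = np.mgrid[0:H, 0:W].astype(float)
        warped = map_coordinates(img_t, [yy + v, xx + u], order=1)
        return img_t1 - warped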

  46. What do people look like? Changing background Varying shadows Occlusion Deforming clothing Low contrast limb boundaries What do non-people look like?

  47. Edges as a Cue? • Probabilistic model? • Under/over-segmentation, thresholds, …

  48. Contrast Normalization? Lee, Mumford & Huang

  49. Contrast Normalization • Maximize the difference between the distributions, e.g. using the Bhattacharyya distance (sketched below).
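A sketch of the Bhattacharyya distance between the two normalized response histograms; candidate filters or contrast normalizations can then be ranked by how well they separate p_on from p_off:

    # Bhattacharyya distance between two discrete distributions p, q
    # (a larger distance means better separation of the on- and
    # off-edge response statistics).
    import numpy as np

    def bhattacharyya_distance(p, q, eps=1e-12):
        bc = np.sum(np.sqrt(p * q))   # Bhattacharyya coefficient
        return -np.log(bc + eps)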

  50. Local Contrast Normalization
