
Inference in generative models of images and video

John Winn, MSR Cambridge, May 2004.


Presentation Transcript


  1. Inference in generative models of images and video John Winn MSR Cambridge May 2004

  2. Overview • Generative vs. conditional models • Combined approach • Inference in the flexible sprite model • Extending the model

  3. Generative vs. conditional models We have an image I and latent variables H which we wish to infer, e.g. object position, orientation, class. There will also be other sources of variability, e.g. illumination, parameterised by θ. Generative model: P(H, θ, I). Conditional model: P(H, θ | I) or P(H | I).
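
The two kinds of model are linked by the product rule and marginalisation. A short worked relation using only the slide's symbols; treating H as discrete and θ as continuous is an assumption made here for notation only:

```latex
\begin{align}
P(H, \theta \mid I) &= \frac{P(H, \theta, I)}{P(I)},
  & P(I) &= \sum_{H} \int P(H, \theta, I)\, d\theta, \\
P(H \mid I) &= \int P(H, \theta \mid I)\, d\theta .
\end{align}
```

A generative model therefore determines the conditional distributions in principle, but evaluating P(I) requires summing over all latent configurations, which is where the cost of inference comes from.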

  4. Conditional models use features Features are functions of I which aim to be informative about H but invariant to θ. Examples: edge features, corner features, blob features.

  5. Conditional models Using features f(I), train a conditional model, e.g. using labelled data. Example: Viola & Jones face detection using rectangle features and AdaBoost.
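
As an illustration of what a rectangle feature f(I) can look like, here is a minimal sketch of a two-rectangle feature computed from an integral image (summed-area table), in the style associated with Viola & Jones. The function names and the image-as-list-of-lists representation are hypothetical choices for this example, not taken from the talk.

```python
def integral_image(img):
    """Return a summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half of a rectangle (width should be even)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


img = [[1, 2, 3, 4],
       [5, 6, 7, 8]]
ii = integral_image(img)
feature = two_rect_feature(ii, 0, 0, 2, 4)
```

Each feature costs a fixed number of lookups regardless of rectangle size, which is one reason inference with such features is fast.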

  6. Conditional models Advantages • Simple - only model variables of interest • Inference is fast - due to use of features and simple model Disadvantages • Non-robust • Difficult to compare different models • Difficult to combine different models

  7. Generative models A generative model defines a process of generating the image pixels I from the latent variables H and θ, giving a joint distribution over all variables: P(H, θ, I). Learning and inference are carried out using standard machine learning techniques, e.g. Expectation Maximisation, MCMC, variational methods. No features!

  8. Generative models Example: image modeled as layers of ‘flexible’ sprites.

  9. Generative models Advantages • Accurate – as the entire image is modeled • Can compare different models • Can combine different models • Can generate new images Disadvantages • Inference is difficult due to local minima • Inference is slower due to complex model • Limitations on model complexity

  10. Combined approach Use a generative model, but speed up inference using proposal distributions given by a conditional model. A proposal R(X) suggests a new distribution over some of the latent variables X ⊆ {H, θ}. Inference is extended to allow accepting or rejecting the proposal, e.g. depending on whether it improves the model evidence.
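
The accept/reject step can be sketched for a single discrete latent variable. This is a toy illustration under stated assumptions, not the talk's implementation: distributions are plain dictionaries, `log_joint` holds hypothetical values of log P(x, t), and the acceptance test compares the standard variational lower bound on the log evidence.

```python
import math


def lower_bound(q, log_joint):
    """Variational bound: sum_t q(t) * (log p(x, t) - log q(t))."""
    return sum(p * (log_joint[t] - math.log(p)) for t, p in q.items() if p > 0)


def consider_proposal(q_current, r_proposal, log_joint):
    """Keep the proposal only if it raises the bound on the model evidence."""
    if lower_bound(r_proposal, log_joint) > lower_bound(q_current, log_joint):
        return r_proposal, True
    return q_current, False


# Hypothetical joint over three values of a latent variable t.
log_joint = {0: math.log(0.1), 1: math.log(0.6), 2: math.log(0.3)}
q_uniform = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}

good = {1: 0.7, 2: 0.3}   # concentrates mass on plausible values
bad = {0: 1.0}            # concentrates mass on an implausible value

q_after_good, accepted_good = consider_proposal(q_uniform, good, log_joint)
q_after_bad, accepted_bad = consider_proposal(q_uniform, bad, log_joint)
```

Because a poor proposal is simply rejected, the conditional model only needs to be right often enough to save time; it cannot make the final answer worse under this rule.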

  11. Using proposals in an MCMC framework Generative model: textured regions combined with face and text models Conditional model: face and text detector using AdaBoost (Viola & Jones) Proposals for text and faces Accepted proposals From Tu et al, 2003

  12. Using proposals in an MCMC framework Generative model: textured regions combined with face and text models Conditional model: face and text detector using AdaBoost (Viola & Jones) Proposals for text and faces Reconstructed image From Tu et al, 2003

  13. Proposals in the flexible sprite model

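  14. Flexible sprite model [Graphical model: x, a set of images, e.g. frames from a video]

  15. Flexible sprite model [Graphical model: x]

  16. Flexible sprite model [Graphical model: f, π (sprite shape and appearance); x]

  17. Flexible sprite model [Graphical model: f, π (sprite shape and appearance); T (sprite transform for this image, discretised); m (transformed mask instance for this image); x]

  18. Flexible sprite model [Graphical model: b (background); f, π (sprite shape and appearance); T (sprite transform); m (transformed mask instance); x]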
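
To make the roles of b, f, π, T, m and x concrete, here is a minimal sketch of one way an image could be generated from these variables. The slides do not give the equations, so the details are assumptions common to layered sprite models: the mask m is sampled per pixel from the transformed π, T is a pure translation, and each pixel shows the transformed sprite where m is 1 and the background elsewhere, plus Gaussian noise.

```python
import random


def shift(grid, dy, dx, fill):
    """Translate a 2-D grid by (dy, dx), padding uncovered cells with `fill`."""
    h, w = len(grid), len(grid[0])
    return [[grid[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else fill
             for x in range(w)]
            for y in range(h)]


def generate_image(f, pi, b, T, rng, noise_std=0.0):
    """Sample a mask m and an image x given sprite (f, pi), background b, transform T."""
    dy, dx = T
    f_t = shift(f, dy, dx, 0.0)    # transformed appearance
    pi_t = shift(pi, dy, dx, 0.0)  # transformed mask probabilities
    h, w = len(b), len(b[0])
    m = [[1 if rng.random() < pi_t[y][x] else 0 for x in range(w)]
         for y in range(h)]
    x_img = [[(f_t[y][x] if m[y][x] else b[y][x]) + rng.gauss(0.0, noise_std)
              for x in range(w)]
             for y in range(h)]
    return m, x_img


f = [[5, 0, 0], [0, 0, 0]]           # sprite appearance
pi = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # sprite shape (mask probabilities)
b = [[1, 1, 1], [1, 1, 1]]           # background
m, x_img = generate_image(f, pi, b, T=(1, 1), rng=random.Random(0))
```

Inference runs this process in reverse: given only the images x, recover the sprite, the background and a transform for each image.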

  19. Inference method & problems • Apply variational inference with factorised Q distribution. • Slow – since we have to search the entire discrete transform space. • Limited size of transform space, e.g. translations only (160×120). • Many local minima.
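
The cost of searching the whole transform space can be seen in a small sketch that scores every translation of a sprite against an image. It is a deliberately simplified stand-in for the talk's model: it has no mask or background, uses cyclic shifts, and assumes a uniform prior with an isotropic Gaussian pixel likelihood. The function name and `noise_var` parameter are hypothetical.

```python
import math


def translation_posterior(image, sprite, noise_var=1.0):
    """Return {(dy, dx): probability} over all cyclic translations of `sprite`."""
    h, w = len(image), len(image[0])
    log_lik = {}
    for dy in range(h):            # every vertical shift
        for dx in range(w):        # every horizontal shift
            sq_err = 0.0
            for y in range(h):     # every pixel, for every shift
                for x in range(w):
                    diff = image[y][x] - sprite[(y - dy) % h][(x - dx) % w]
                    sq_err += diff * diff
            log_lik[(dy, dx)] = -0.5 * sq_err / noise_var
    top = max(log_lik.values())    # subtract the max for numerical stability
    weights = {t: math.exp(v - top) for t, v in log_lik.items()}
    total = sum(weights.values())
    return {t: wgt / total for t, wgt in weights.items()}


sprite = [[1, 0, 0], [0, 0, 0]]
image = [[0, 0, 0], [0, 0, 1]]     # the sprite shifted by (1, 2)
posterior = translation_posterior(image, sprite)
best = max(posterior, key=posterior.get)
```

The work grows as (number of transforms) × (number of pixels). For a 160×120 translation space that is 19,200 candidate transforms per image on every iteration, before rotation or scale is considered.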

  20. Proposals in the flexible sprite model • We wish to create a proposal R(T). • Cannot use features of the image directly until the object appearance has been found. • Use features of the inferred mask. [Diagram labels: π, proposal, T, m]

  21. Moment-based features Use the first and second moments of the inferred mask as features. Learn a proposal distribution R(T). Can also use R to get a probabilistic bound on T. [Figure labels: centre of gravity (C-of-G) of mask; true location; contour of proposal distribution over object location]
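
The feature computation itself is straightforward. A minimal sketch, assuming the inferred mask is a 2-D grid of non-negative weights; the function name is hypothetical, and the learned mapping from these features to R(T) is not shown because the slides do not specify it.

```python
def mask_moments(mask):
    """Return the centre of gravity and 2x2 covariance of a (soft) mask.

    First moments: weighted mean position (mean_y, mean_x).
    Second moments: weighted central covariance ((var_y, cov), (cov, var_x)).
    """
    total = sum(sum(row) for row in mask)
    if total == 0:
        raise ValueError("mask has no mass")
    mean_y = sum(y * v for y, row in enumerate(mask) for v in row) / total
    mean_x = sum(x * v for row in mask for x, v in enumerate(row)) / total
    var_y = var_x = cov = 0.0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            dy, dx = y - mean_y, x - mean_x
            var_y += v * dy * dy
            var_x += v * dx * dx
            cov += v * dy * dx
    return (mean_y, mean_x), ((var_y / total, cov / total),
                              (cov / total, var_x / total))


mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0]]
centre, covariance = mask_moments(mask)
```

The centre of gravity suggests where the object is, and the covariance gives a rough indication of its extent and orientation, which is what makes these moments plausible inputs for a proposal over T.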

  22. Iteration #1

  23. Iteration #2

  24. Iteration #3

  25. Iteration #4

  26. Iteration #5

  27. Iteration #6

  28. Iteration #7

  29. Results on scissors video. [Panels: original, reconstruction, foreground only] • On average, ~1% of transform space searched. • Always converges, independent of initialisation.

  30. Beyond translation

  31. Extended transform space Original Reconstruction

  32. Extended transform space Original Reconstruction

  33. Extended transform space Learned sprite appearance Normalised video

  34. Corner features Learned sprite appearance Masked normalised image

  35. Corner feature proposals

  36. Preliminary results

  37. Future directions

  38. Extensions to the generative model Very wide range of possible extensions: • Local appearance model e.g. patch-based • Multiple layered objects • Object classes • Illumination modelling • Incorporation of object-specific models e.g. faces • Articulated models

  39. Further investigation of using proposals Investigate other bottom-up features, including: • Optical flow • Color/texture • Use of standard invariant features e.g. SIFT • Discriminative models for particular object classes e.g. faces, text

  40. [Graphical model of the flexible sprite model, labelled: b, f, π, T, m, x, N]
