
Object Recognition

Object Recognition. Jeremy Wyatt. Plan: David Marr and the model based approach to vision; model based approaches (Geons, model fitting); appearance based approaches (PCA, SIFT, implicit shape model); psychological evidence (view dependent vs. view independent recognition); summary: who is right?


Presentation Transcript


  1. Object Recognition Jeremy Wyatt

  2. Plan • David Marr: the model based approach to vision • Model based approaches: Geons, Model Fitting • Appearance based approaches: PCA, SIFT, implicit shape model • Psychological Evidence: View dependent vs. view independent recognition • Summary: who is right?

  3. Model based vision • David Marr was a brilliant young British vision researcher who defined a coherent approach to the study of vision during the 1970s • According to one tradition coming out of Marr’s work: • Vision is the process of reconstructing the 3d scene from 2d information • The vision system has representations of 3d geometric structures • Visual pipeline: intensity image → primal sketch → 2.5d sketch → model selection • So selecting models and recovering their parameters from image data is a key task in vision

  4. Model based vision • There is an infinite variety of objects. How do we represent, store and access models of them efficiently? • One suggestion was the use of a small library of 3d parts from which many complex models can be constructed • There are many schemes: generalised cylinders, Geons, Superquadrics • Vision researchers set about applying them
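To make the idea of a small parametric part library concrete, here is a minimal sketch of one such part, a superquadric, written as the standard inside-outside function. The parameter names (`a1`, `a2`, `a3` for size, `e1`, `e2` for shape), the two example parts and the helper `inside` are illustrative assumptions, not taken from the slides.

```python
def superquadric_f(x, y, z, a1, a2, a3, e1, e2):
    """Inside-outside function: < 1 inside, 1 on the surface, > 1 outside."""
    xy = (abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)) ** (e2 / e1)
    return xy + abs(z / a3) ** (2.0 / e1)

# Five numbers describe a whole family of shapes (hypothetical example parts):
# e1 = e2 = 1 gives an ellipsoid, small exponents give a box-like shape.
SPHERE = dict(a1=1.0, a2=1.0, a3=1.0, e1=1.0, e2=1.0)
BOXY = dict(a1=1.0, a2=1.0, a3=1.0, e1=0.1, e2=0.1)

def inside(point, part):
    """True if the 3d point lies strictly inside the part."""
    return superquadric_f(*point, **part) < 1.0

# A point near the corner of the unit cube is inside the box-like part
# but outside the unit sphere.
corner = (0.9, 0.9, 0.9)
corner_in_box = inside(corner, BOXY)
corner_in_sphere = inside(corner, SPHERE)
```

Model fitting then amounts to searching for the size and shape parameters (plus a pose, omitted here) that place the observed 3d points on the surface, i.e. where the function is close to 1.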

  5. Models vs Appearances • But the part based models didn’t work very well … • By the early 1990s people were experimenting with statistical techniques, e.g. PCA • These learn a statistical summary of the appearance of each view of an object (Figure labels: Appearance, Model)
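A minimal sketch of the per-view statistical summary, assuming tiny 4-pixel "images" and keeping only the first principal component of each view (found by power iteration so that no external library is needed). The view names, the training data and the helpers `fit_view`, `reconstruction_error` and `recognise` are hypothetical.

```python
import math

def fit_view(samples, iterations=50):
    """Return (mean, first principal direction) for one view's training images."""
    n, dim = len(samples), len(samples[0])
    mu = [sum(col) / n for col in zip(*samples)]
    centred = [[v - m for v, m in zip(s, mu)] for s in samples]
    w = [1.0] * dim
    for _ in range(iterations):
        # Multiply w by the (unnormalised) covariance without forming the matrix
        proj = [sum(c * wi for c, wi in zip(row, w)) for row in centred]
        w = [sum(p * row[j] for p, row in zip(proj, centred)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return mu, w

def reconstruction_error(image, model):
    """Distance between the image and its projection onto the view's subspace."""
    mu, w = model
    centred = [v - m for v, m in zip(image, mu)]
    coeff = sum(c * wi for c, wi in zip(centred, w))
    residual = [c - coeff * wi for c, wi in zip(centred, w)]
    return math.sqrt(sum(r * r for r in residual))

def recognise(image, models):
    """Pick the view whose appearance model explains the image best."""
    return min(models, key=lambda name: reconstruction_error(image, models[name]))

# Hypothetical training data: one model has to be learned per view
models = {
    "front": fit_view([[4, 4, 1, 1], [5, 5, 1, 1], [6, 6, 1, 1]]),
    "side": fit_view([[1, 1, 4, 4], [1, 1, 5, 5], [1, 1, 6, 6]]),
}
best_view = recognise([7, 7, 1, 1], models)
```

The query image is brighter than any training image but still lies along the variation learned for the front view, so that view reconstructs it with the smallest error.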

  6. Appearance based recognition: SIFT • These statistical approaches characterise some aspects of the appearance of an object that can be used to recognise it • But this means they are (largely) view dependent: you have to learn a different statistical model for each view • e.g. SIFT based recognition (David Lowe, UBC) • Find interest points in scale space • Re-describe the interest points so that they are: • Robust to image translation, scaling and rotation • Partially invariant to illumination changes and to affine and 3d projection changes
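The recognition step that follows description can be sketched as nearest-neighbour matching of descriptors. The sketch below assumes 4-dimensional toy descriptors (real SIFT descriptors have 128 dimensions), normalises each to unit length so that a uniform change in contrast does not affect the match, and applies the distance-ratio test from Lowe's SIFT paper (threshold 0.8) to reject ambiguous matches. The function names and data are hypothetical.

```python
import math

def normalise(desc):
    """Scale a descriptor to unit length (cancels uniform contrast changes)."""
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match(query, database, ratio=0.8):
    """Index of the best database match, or None if the match is ambiguous."""
    q = normalise(query)
    dists = sorted((distance(q, normalise(d)), i) for i, d in enumerate(database))
    (best, idx), (second, _) = dists[0], dists[1]
    # Accept only if the best match is clearly closer than the runner-up
    return idx if best < ratio * second else None

# Hypothetical stored descriptors for three interest points on a known object
database = [[8, 1, 0, 0], [0, 0, 5, 5], [1, 9, 1, 0]]

same_point_brighter = match([16, 2, 0, 0], database)  # first entry, doubled contrast
ambiguous = match([1, 1, 0, 0], database)             # roughly between entries 0 and 2
```

Here the first query matches entry 0 despite the contrast change, while the second is rejected because two stored descriptors are almost equally close.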

  7. Category level recognition (Thanks to Bastian Leibe)

  8. Category level recognition (Thanks to Bastian Leibe)

  9. Category level recognition (Thanks to Bastian Leibe)

  10. Constellation Model (Thanks to Bastian Leibe)

  11. Constellation Model (Thanks to Bastian Leibe)

  12. Implicit Shape Model (Thanks to Bastian Leibe)

  13. Implicit Shape Model (Thanks to Bastian Leibe)

  14. Implicit Shape Model (Thanks to Bastian Leibe)

  15. Implicit Shape Model (Thanks to Bastian Leibe)

  16. Implicit Shape Model (Thanks to Bastian Leibe)

  17. Implicit Shape Model (Thanks to Bastian Leibe)

  18. Implicit Shape Model (Thanks to Bastian Leibe)

  19. Implicit Shape Model (Thanks to Bastian Leibe)

  20. Implicit Shape Model (Thanks to Bastian Leibe)

  21. Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts Aleš Leonardis and Sanja Fidler University of Ljubljana Faculty of Computer and Information Science Visual Cognitive Systems Laboratory Reproduced with permission

  22. Framework • Main properties of the framework: • Computational plausibility • Hierarchical representation • Compositionality (parts composed of parts) • Indexing & matching recognition scheme • Statistics driven learning (unsupervised learning) • Fast, incremental (continuous) learning

  23. Recognition: Indexing and matching • Gradually limiting the search (Figure labels: image, hypotheses, verification, LEARN; candidate categories: motorcycle, dog, person, car)
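One way to read the indexing and matching scheme is as a two-stage search: detected parts first index into the categories that contain them (hypotheses), and only the few best hypotheses are then verified against their full models. The sketch below is a toy illustration under that reading; the part labels, category models, threshold and function names are all hypothetical and are not taken from the framework itself.

```python
from collections import Counter

# Hypothetical category models: each category is a set of part labels
MODELS = {
    "car": {"wheel", "window", "bumper", "door"},
    "motorcycle": {"wheel", "handlebar", "seat", "engine"},
    "person": {"head", "torso", "arm", "leg"},
    "dog": {"head", "tail", "paw", "torso"},
}

def build_index(models):
    """Inverted index: part label -> categories whose model contains it."""
    index = {}
    for category, parts in models.items():
        for part in parts:
            index.setdefault(part, set()).add(category)
    return index

def recognise(detected, models, index, max_hypotheses=2, threshold=0.75):
    # Indexing: every detected part votes for the categories that contain it
    votes = Counter(c for part in detected for c in index.get(part, ()))
    hypotheses = [c for c, _ in votes.most_common(max_hypotheses)]
    # Matching: verify only the shortlisted hypotheses against the full model
    accepted = [
        c for c in hypotheses
        if len(models[c] & detected) / len(models[c]) >= threshold
    ]
    return hypotheses, accepted

INDEX = build_index(MODELS)
hypotheses, accepted = recognise(
    {"wheel", "window", "door", "handlebar"}, MODELS, INDEX
)
```

Indexing narrows four categories down to two hypotheses, and verification keeps only the one whose model is mostly present, which is the sense in which the search is gradually limited.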

  24. Overview of the architecture • Starts with simple, local features and learns more and more complex compositions • Learns layer after layer to exploit the regularities in natural images as efficiently and compactly as possible • Builds computationally feasible layers of parts by selecting only the most statistically significant compositions of specific granularity • Learns lower layers in a category independent way (to obtain optimally sharable parts) and category specific higher layers which contain only a small number of highly generalizable parts for each category • New categories can efficiently and continuously be added to the representation without the need to restructure the complete hierarchy • Implements parts in a robust, layered interplay of indexing & matching
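The layer-by-layer learning described above can be illustrated with a toy sketch: take the parts detected by one layer, count which pairs co-occur at a similar relative offset across images, and keep only the most frequent pairs as parts of the next layer. This is a simplified stand-in for statistics driven selection, not the authors' algorithm; the part labels (`"h"`, `"v"` for horizontal and vertical edges), the parameters `keep`, `cell` and `radius`, and the data are assumptions.

```python
from collections import Counter
from itertools import combinations

def learn_layer(images, keep=2, cell=2, radius=4):
    """images: one list of (part_label, x, y) detections per image.

    Returns the `keep` most frequent compositions, each encoded as
    (label_a, label_b, quantised_dx, quantised_dy).
    """
    counts = Counter()
    for detections in images:
        seen = set()
        # Sorting gives every pair a canonical order
        for (p, px, py), (q, qx, qy) in combinations(sorted(detections), 2):
            dx, dy = qx - px, qy - py
            if max(abs(dx), abs(dy)) > radius:
                continue  # only nearby parts may be composed
            # Coarse offsets tolerate small spatial variability
            seen.add((p, q, round(dx / cell), round(dy / cell)))
        counts.update(seen)  # count each composition once per image
    return [comp for comp, _ in counts.most_common(keep)]

# Hypothetical layer-1 detections in three images
images = [
    [("h", 0, 0), ("v", 2, 0), ("h", 10, 10)],
    [("h", 5, 5), ("v", 7, 5)],
    [("h", 1, 1), ("v", 3, 1), ("v", 1, 4)],
]
layer2 = learn_layer(images, keep=1)
```

The composition "a vertical edge just to the right of a horizontal edge" appears in all three images and is kept. Compositions seen only once are discarded, which keeps the next layer small.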

  25. Part based appearance recognition (Fidler & Leonardis 07)

  26. Results • Learned hierarchy for faces and cars (first three layers are the same; links show compositionality for each of the categories; spatial variability of parts is not shown)

  27. Part based appearance recognition (Fidler & Leonardis 07)

  28. Results - Detections

  29. Results - Specific categories, faces • Detection of Layer5 parts

  30. Results - Specific categories, faces

  31. Evidence from biology • Is human object recognition view dependent? • Shepherd & Miller • Pinker & Tarr • There is quite a large body of experimental data that supports the view dependent camp. • Appearance based approaches fit neatly with this camp.

  32. Summary • This is not a resolved debate • There is evidence for both sides • Structural 3d information is almost certainly extracted by the brain too • Model based: how do we extract good enough low level features (e.g. a depth map)? • Appearance based: only seems to be good for recognition, which is a small part of the vision problem.
