
Grammars in computer vision


Presentation Transcript


  1. Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu

  2. Context in computer vision. Cues for recognition range, with object size, from intrinsic features inside the object (pixels, parts, global appearance) to contextual features outside the object (local context, global context). Representative work: Kruppa & Schiele (03); Fink & Perona (03); Carbonetto, de Freitas & Barnard (03); Kumar & Hebert (03); He, Zemel & Carreira-Perpinan (04); Moore, Essa, Monson & Hayes (99); Strat & Fischler (91); Torralba (03); Murphy, Torralba & Freeman (03); Agarwal & Roth (02); Moghaddam & Pentland (97); Turk & Pentland (91); Vidal-Naquet & Ullman (03); Heisele et al. (01); Krempp, Geman & Amit (02); Dorko & Schmid (03); Fergus, Perona & Zisserman (03); Fei-Fei, Fergus & Perona (03); Schneiderman & Kanade (00); Lowe (99); etc.

  3. Why grammars? Early work: Guzman (SEE), 1968; Noton and Stark, 1971; Hansen & Riseman (VISIONS), 1978; Barrow & Tenenbaum, 1978; Brooks (ACRONYM), 1979; Marr, 1982; Ohta & Kanade, 1978; Yakimovsky & Feldman, 1973. [Ohta & Kanade 1978]

  4. Why grammars?

  5. Why grammars?

  6. Which papers? • F. Han and S.C. Zhu, Bottom-up/Top-down Image Parsing with Attribute Grammar, 2005. • Zijian Xu, A hierarchical compositional model for representation and sketching of high-resolution human images, PhD thesis, 2007. • Song-Chun Zhu and David Mumford, A stochastic grammar of images, 2007. • L. Lin, S. Peng, J. Porway, S.C. Zhu, and Y. Wang, An empirical study of object category recognition: sequential testing with generalized samples, 2007.

  7. Datasets

  8. Large-scale image labeling

  9. Our Goal:

  10. Three projects using And-Or graphs • Modeling an environment with rectangles • Creating sketches

  11. Commonalities • They use context-sensitive grammars: called And-Or graphs in these papers; these provide both top-down and bottom-up influence, and most are generative all the way down to the pixel level. • Configuration matters: the parts are not assumed independent given the parent, and the constraints among them can take the form of an MRF.
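Since And-Or graphs recur throughout these slides, a minimal node structure may help fix ideas. This is purely illustrative, not the papers' actual data structure; the fields kind, symbol, attributes, and chosen are assumptions made here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class AndOrNode:
    # kind: "and" (decompose into all children), "or" (choose one alternative child),
    # or "terminal" (grounded directly in image primitives).
    kind: str
    symbol: str                                        # grammar symbol, e.g. "scene" or "rectangle"
    children: List["AndOrNode"] = field(default_factory=list)
    attributes: Dict[str, object] = field(default_factory=dict)  # geometry/appearance passed up and down
    chosen: Optional[int] = None                       # for OR nodes: index of the selected alternative

    def active_children(self) -> List["AndOrNode"]:
        """Children that participate in the current parse."""
        if self.kind == "or":
            return [] if self.chosen is None else [self.children[self.chosen]]
        return list(self.children)
```

Sibling constraints (the MRF mentioned above) would sit on top of this structure rather than inside a single node.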

  12. Challenges • Objects have large within-category variations • Scenes also vary widely

  13. Challenges • Describing people involves large variation

  14. Grammar definition

  15. And-or graphs

  16. Modeling with rectangles

  17. Modeling with rectangles

  18. Six production rules

  19. Two examples

  20. Three phases • Bottom-up detection: compute edge segments and a number of vanishing points; the vanishing points group the segments into line sets, and rectangle hypotheses are generated from them with RANSAC, yielding a set of bottom-up rectangle proposals. • Initialize the terminal nodes greedily: pick the most promising hypotheses, i.e. those with the heaviest weight as measured by the increase in posterior probability. • Incorporate top-down influence: each step of the algorithm picks the most promising proposal among the five candidate rules by increase in posterior probability; when a new non-terminal node is accepted, (1) insert it and create new proposals from it, (2) re-weight the existing proposals, and (3) pass attributes between the node and its parent.
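A compact way to read the three phases is as a driver loop. The sketch below is a hypothetical outline only; every name in it (detect_edge_segments, estimate_vanishing_points, propose_rectangles, ParseState, init_terminal_nodes, build_parse_graph) is an assumption, not the authors' API. The later slides describe the individual phases.

```python
def parse_scene(image):
    # Phase 1: bottom-up detection of edge segments, vanishing points, and rectangle proposals.
    segments = detect_edge_segments(image)
    vanishing_points = estimate_vanishing_points(segments)
    candidates = propose_rectangles(segments, vanishing_points)   # RANSAC-based proposals

    # Phase 2: greedily accept the heaviest-weight rectangle hypotheses as terminal nodes.
    state = ParseState(image)
    init_terminal_nodes(candidates, state)

    # Phase 3: top-down/bottom-up application of production rules to build the parse graph.
    return build_parse_graph(state)
```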

  21. Probability Models • p(C_free) follows the primal sketch model. • p(G) is the probability of the parse tree • p(I | G) is the reconstruction likelihood
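A hedged reading of how these terms fit together (the precise formulation is in the Han & Zhu paper; this is only a plausible sketch): the parse graph G explains the rectangular structure of the image, and whatever structure it leaves unexplained is absorbed by the primal-sketch term.

```latex
p(G \mid I) \;\propto\; p(I \mid G)\, p(G),
\qquad \text{with the non-rectangle residue modeled by } p(C_{\text{free}}) \text{ (primal sketch).}
```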

  22. Probability Models • p(l) is the probability of a rule. • p(n | l) is the probability of the number of components given the type of rule. • p(X | l, n) is the probability of the geometry of A (e.g. each square should look reasonable). • p(X(B) | X(A)) enforces regularities between the geometries, e.g. that aligned rectangles have almost the same shape, or, for the line rule, that everything lines up.
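One plausible way these four terms combine for a production A -> (B_1, ..., B_n) of rule type l and arity n; this factorization is pieced together from the bullets above as an assumption, not quoted from the paper.

```latex
p\big(A \rightarrow (B_1,\dots,B_n)\big)
  \;=\; p(l)\; p(n \mid l)\; p\big(X(A) \mid l, n\big)\;
        \prod_{i=1}^{n} p\big(X(B_i) \mid X(A)\big)
```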

  23. Probability Models • Primal sketch model

  24. Inference: bottom-up detection of rectangles • RANSAC is run to propose a number of rectangles using vanishing points
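A rough sketch of what such a proposal step could look like, assuming line segments are first grouped by the vanishing point they converge to and rectangles are hypothesized from pairs of lines drawn from two different groups. vp_consistency, rectangle_from_lines, and edge_support are hypothetical helpers, and the iteration count and support threshold are illustrative.

```python
import random

def propose_rectangles(line_segments, vanishing_points, n_iters=1000, min_support=4):
    # Group each segment with the vanishing point it is most consistent with.
    groups = {i: [] for i in range(len(vanishing_points))}
    for seg in line_segments:
        best_vp = max(range(len(vanishing_points)),
                      key=lambda i: vp_consistency(seg, vanishing_points[i]))
        groups[best_vp].append(seg)

    # Two families of converging lines give the two edge directions of a rectangle.
    usable = [k for k, segs in groups.items() if len(segs) >= 2]
    if len(usable) < 2:
        return []

    proposals = []
    for _ in range(n_iters):
        i, j = random.sample(usable, 2)
        a1, a2 = random.sample(groups[i], 2)
        b1, b2 = random.sample(groups[j], 2)
        rect = rectangle_from_lines(a1, a2, b1, b2)   # hypothetical: intersect the four lines
        # Keep the hypothesis only if enough edge evidence supports its boundary.
        if rect is not None and edge_support(rect, line_segments) >= min_support:
            proposals.append(rect)
    return proposals
```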

  25. Inference: initialize terminal nodes • Input: candidate set of rectangles from the previous phase • Output: a set of non-terminal nodes representing rectangles • While (not done): re-compute the weights, greedily select the rectangle with the highest weight, and create a new non-terminal node for it in the grammar.
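A minimal sketch of this greedy loop, assuming a posterior_gain(state, rect) helper that scores the increase in posterior probability and an accept_rectangle(state, rect) helper that adds the corresponding node; both names are hypothetical.

```python
def init_terminal_nodes(candidates, state, max_nodes=50):
    remaining = list(candidates)
    accepted = []
    while remaining and len(accepted) < max_nodes:
        # Re-compute the weights: gain in posterior probability if the candidate is accepted.
        weights = [posterior_gain(state, rect) for rect in remaining]
        best = max(range(len(remaining)), key=lambda k: weights[k])
        if weights[best] <= 0:          # no remaining candidate improves the posterior
            break
        rect = remaining.pop(best)
        accept_rectangle(state, rect)   # create a new node for the rectangle in the grammar
        accepted.append(rect)
    return accepted
```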

  26. Inference: build the parse graph • Input: non-terminal rectangles from the previous step • Output: a parse graph • While (not done): re-compute the weights, greedily select the highest-weight candidate rule, and add the rule to the parse graph along with any top-down predictions. • Weights are computed similarly to before.
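The same greedy pattern applied to rule proposals might look like the following sketch; enumerate_rule_proposals, posterior_gain, and apply_rule are hypothetical helpers, not the authors' API.

```python
def build_parse_graph(state, max_steps=100):
    for _ in range(max_steps):
        proposals = enumerate_rule_proposals(state)          # candidate rule applications
        if not proposals:
            break
        # Re-compute weights: gain in posterior probability for each proposal.
        scored = [(posterior_gain(state, p), p) for p in proposals]
        gain, best = max(scored, key=lambda t: t[0])
        if gain <= 0:                                        # no rule improves the posterior
            break
        # Accepting a non-terminal node: insert it, add top-down predictions for
        # missing children, and pass attributes between the node and its parent.
        apply_rule(state, best)
    return state
```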

  27. Example of top-down/bottom-up inference

  28. Results

  29. Results

  30. Results

  31. Results

  32. ROC curve

  33. Generating sketches • Additional semantics

  34. Challenges • Geometric deformations: clothes are very flexible • Photometric variability: large variety of colors, shading and texture • Topological configurations: combinatorial number of clothing designs

  35. Decomposing a sketch

  36. And-Or graph • “In a computing and recognition phase, we first activate some sub-templates in a bottom-up step. For example, we can detect the face and skin color to locate the coarse position of some components, which help to predict the positions of other components by context.”
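The quoted computing strategy could be sketched as follows, with detect_face and detect_skin_color standing in for whatever bottom-up detectors are actually used, and the relative offsets being purely illustrative numbers rather than values from the thesis.

```python
def activate_subtemplates(image):
    # Bottom-up step: detect easy, distinctive components first.
    face_box = detect_face(image)                 # hypothetical detector, returns (x, y, w, h) or None
    skin_regions = detect_skin_color(image)

    # Contextual prediction: use the face position to propose coarse search windows
    # for the other components (offsets below are made-up placeholders).
    predictions = {}
    if face_box is not None:
        x, y, w, h = face_box
        predictions["hair"]       = (x - 0.2 * w, y - 0.8 * h, 1.4 * w, 1.0 * h)
        predictions["collar"]     = (x - 0.3 * w, y + 1.0 * h, 1.6 * w, 0.6 * h)
        predictions["upper_body"] = (x - 1.0 * w, y + 1.2 * h, 3.0 * w, 3.0 * h)

    return {"face": face_box, "skin": skin_regions, "predicted_windows": predictions}
```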

  37. Sketch sub-parts

  38. Example grammar

  39. Sub-templates

  40. Probability model

  41. Overview of the algorithm

  42. Sketch results

  43. Sketch results

  44. Conclusions • A grammar-based model was presented for generating sketches. • Markov random fields were used at the lowest level. • Top-down/bottom-up inference was performed.
