
Parsing Images with Context/Content Sensitive Grammars

Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang


Presentation Transcript


  1. Parsing Images with Context/Content Sensitive Grammars • Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

  2. Structured Representation in Neural Systems • Vision is Hard • Why is Vision Hard? • Hierarchies of Reusable Parts • Demonstration System: Reading License Plates • Generalization: Face Detection

  3. Artificial Intelligence • Knowledge Engineering engineer everything, learn nothing • Learning Theory engineer nothing, learn everything • Both Lack Model

  4. Natural Intelligence • Strong Representation simulation and semantics • Hierarchy and Reusability ventral visual pathway, linguistics, compositionality

  5. Structured Representation in Neural Systems • Vision is Hard • Why is Vision Hard? • Hierarchies of Reusable Parts • Demonstration System: Reading License Plates • Generalization: Face Detection

  6. License plate images from Logan Airport • Machines still can’t reliably read license plates

  7. Wafer IDs • Machines can’t read fixed-font, fixed-scale characters as well as humans

  8. Super Bowl • Machines can’t find the bad guys at the Super Bowl

  9. Structured Representation in Neural Systems • Vision is Hard • Why is Vision Hard? • Hierarchies of Reusable Parts • Demonstration System: Reading License Plates • Generalization: Face Detection

  10. Vision is content sensitive • Instantiation • Figure labels: same Empire style table twins

  11. Human Interactive Proofs • “Clutter” • Background is structured, and made of the same stuff!

  12. Structured Representation in Neural Systems • Vision is Hard • Why is Vision Hard? • Hierarchies of Reusable Parts • Demonstration System: Reading License Plates • Generalization: Face Detection

  13. Hierarchy of Reusable Parts • e.g. animals, trees, rocks • e.g. contours, intermediate objects • “Bricks” • e.g. linelets, curvelets, T-junctions • e.g. discontinuities, gradient

  14. Hierarchy of Disjunctions of Conjunctions

  15. Hierarchy of Disjunctions of Conjunctions

  16. Hierarchy of Disjunctions of Conjunctions

  17. Hierarchy of Disjunctions of Conjunctions

  18. Hierarchy of Disjunctions of Conjunctions

  19. Hierarchy of Disjunctions of Conjunctions

  20. Hierarchy of Disjunctions of Conjunctions

  21. Interpretations and Probabilities • Interpretation: selected subgraph

  22. Interpretations and Probabilities • Interpretation: selected subgraph

  23. Interpretations and Probabilities • GRAPHICAL MODEL (Markov) × LIKELIHOOD RATIO (non-Markov)

  24. Generative (Bayesian) Model

  25. Structured Representation in Neural Systems • Vision is Hard • Why is Vision Hard? • Hierarchies of Reusable Parts • Demonstration System: Reading License Plates • Generalization: Face Detection

  26. Test set: 385 images, mostly from Logan Airport • Courtesy of Visics Corporation

  27. Architecture • license plates • license numbers (3 digits + 3 letters, 4 digits + 2 letters) • plate boundaries, strings (2 letters, 3 digits, 3 letters, 4 digits) • generic letter, generic number, L-junctions of sides • characters, plate sides • parts of characters, parts of plate sides
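The architecture above can be read as a hierarchy of disjunctions of conjunctions: a brick picks one of several options (disjunction), and each option requires all of its children together (conjunction). The following is a minimal sketch of that idea, not the authors' implementation. The `Brick` class, the `expansions` helper, and the reduction of the hierarchy to generic `letter` and `digit` terminals are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Brick:
    """Hypothetical node: a terminal, or a choice among child conjunctions."""
    name: str
    # Each option is one conjunction: a tuple of child bricks that must all be present.
    options: list = field(default_factory=list)


def expansions(brick):
    """Enumerate every terminal-level expansion of an active brick."""
    if not brick.options:  # terminal brick: nothing below it
        return [(brick.name,)]
    results = []
    for conjunction in brick.options:  # disjunction: pick one option
        child_expansions = [expansions(child) for child in conjunction]
        for combo in product(*child_expansions):  # conjunction: all children together
            results.append(tuple(leaf for part in combo for leaf in part))
    return results


# Toy version of the slide's levels (names follow the slide, structure is assumed).
letter = Brick("letter")
digit = Brick("digit")
two_letters = Brick("2 letters", [(letter, letter)])
three_letters = Brick("3 letters", [(letter, letter, letter)])
three_digits = Brick("3 digits", [(digit, digit, digit)])
four_digits = Brick("4 digits", [(digit, digit, digit, digit)])
license_number = Brick(
    "license number",
    [(three_digits, three_letters), (four_digits, two_letters)],
)

for expansion in expansions(license_number):
    print(" ".join(expansion))
```

Each printed line corresponds to one way of instantiating the top brick, which is a crude stand-in for selecting a subgraph of the hierarchy as an interpretation. Reuse shows up in the fact that the same `letter` and `digit` bricks appear under several parents.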

  28. Image interpretation • Panels: original image, top object, top 10 objects, top 25 objects

  29. Image interpretation • Panels: test image, top objects

  30. Performance • 385 images • Six plates read with mistakes (>98% of plates read correctly) • Approx. 99.5% of characters read correctly • Zero false positives
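The plate-level figure on this slide can be checked with simple arithmetic. The sketch below assumes one plate per image and six characters per plate (from the 3 digits + 3 letters and 4 digits + 2 letters formats on the Architecture slide). Both assumptions are mine, so the character count is only a rough estimate.

```python
images = 385
plates_with_mistakes = 6

# Fraction of plates read without any mistake.
plate_accuracy = (images - plates_with_mistakes) / images

# Assumptions (not stated on the slide): one plate per image, six characters per plate.
chars_per_plate = 6
total_chars = images * chars_per_plate

# Rough number of misread characters implied by "approx. 99.5%" character accuracy.
approx_char_errors = round(total_chars * (1 - 0.995))

print(f"plate accuracy: {plate_accuracy:.2%}")
print(f"characters (assumed): {total_chars}, implied misreads: about {approx_char_errors}")
```

Under these assumptions the plate accuracy comes out just above 98%, consistent with the slide.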

  31. Efficient discrimination: Markov versus content-sensitive distribution • Panels: original image, zoomed license region, top object under Markov distribution, top object under content-sensitive distribution

  32. Efficient discrimination: testing objects against their parts • Panels: test image, 9 active “8” bricks under whole model, 1 active “8” brick under parts model

  33. Summary • Vision is Content Sensitive: Non-Markovian probability models • Background is Structured, and Made of the Same Stuff: Objects come equipped with their own background models

  34. Structured Representation in Neural Systems • Vision is Hard • Why is Vision Hard? • Hierarchies of Reusable Parts • Demonstration System: Reading License Plates • Generalization: Face Detection

  35. Plates versus Face Detection • Rigid versus Deformable • “Black/White” Data Model versus Intensity Model • Hand-Crafted Probabilities versus Learned Probabilities

  36. Face Hierarchy

  37. Sampling from Data Model

  38. Sampling faces from the distribution

  39. PATTERN SYNTHESIS = PATTERN RECOGNITION (Ulf Grenander)
