
Computer Vision, Part 2

Computer Vision, Part 2: object recognition and scene “understanding”. What makes object recognition a hard task for computers?

elom

Presentation Transcript


  1. Computer Vision, Part 2 Object recognition and scene “understanding”

  2. What makes object recognition a hard task for computers?

  3. HMAX
     Riesenhuber, M. & Poggio, T. (1999), “Hierarchical Models of Object Recognition in Cortex”
     Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., and Poggio, T. (2006), “Robust Object Recognition with Cortex-Like Mechanisms”
     • HMAX: A hierarchical neural-network model of object recognition.
     • Meant to model the “immediate recognition” capabilities of the human ventral visual pathway, independent of attention or other top-down processes.
     • Also called the “Standard Model” (because it incorporates the “standard model” of visual cortex).
     • Inspired by the earlier “Neocognitron” model of Fukushima (1980).

  4. General ideas behind model
     • “Immediate” visual processing is feedforward and hierarchical: low levels detect simple features, which are combined hierarchically into increasingly complex features to be detected.
     • Layers of the hierarchy alternate between “sensitivity” (to detecting features) and “invariance” (to position, scale, orientation).
     • Size of receptive fields increases along the hierarchy.
     • Degree of invariance increases along the hierarchy.

  5–12. The HMAX model for object recognition (Riesenhuber, Poggio, Serre, et al.), built up incrementally across these slides. From top to bottom of the finished diagram:
     • Classification layer: object or image classification
     • C2 layer: max activation over each prototype
     • S2 layer: prototypes (small image patches)
     • C1 layer: max over local S1 units
     • S1 layer: edge detectors
     • Image (gray-scale)
     Layers alternate between “specificity” and “invariance” over position, scale, orientation.
     The job of HMAX is to produce a higher-level representation of an image that will be useful for classification.

  13. S1 layer: edge detectors (4 orientations, 16 scales), applied to the gray-scale image.

  14. One S1 receptive field (figure), etc., for each of the 16 scales.
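The S1 slides describe edge detectors at 4 orientations and 16 scales. The sketch below shows one way such a filter bank could be built, assuming Gabor filters as the edge detectors. The slides do not name the filter type, and the sizes, wavelength, `sigma`, and `gamma` values here are illustrative choices rather than the model's published parameters.

```python
import math

def gabor_kernel(size, theta, wavelength, sigma, gamma=0.3):
    """Build a size x size Gabor kernel (list of rows) at orientation theta (radians)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinate frame by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    # Subtract the mean so that uniform image regions give (near) zero response.
    mean = sum(sum(row) for row in kernel) / (size * size)
    return [[v - mean for v in row] for row in kernel]

def s1_response(image, kernel, row, col):
    """Absolute filter response centered at (row, col); image is a list of rows."""
    half = len(kernel) // 2
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += kernel[dy + half][dx + half] * image[row + dy][col + dx]
    return abs(total)

# 4 orientations x 16 filter sizes (illustrative values, not the published ones).
ORIENTATIONS = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
SIZES = list(range(7, 39, 2))  # 7, 9, ..., 37

bank = {
    (theta, size): gabor_kernel(size, theta, wavelength=size / 2.0, sigma=size / 4.0)
    for theta in ORIENTATIONS
    for size in SIZES
}
```

Each S1 unit then corresponds to one kernel from `bank` applied at one image position, so a full S1 layer is 64 response maps over the image.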

  15. C1 layer: max activation over local S1 units (local position, scale); 4 orientations, 8 scales. Each C1 unit takes the MAX over a local group of S1 units.
     S1 layer: edge detectors; 4 orientations, 16 scales.
     Image (gray-scale)
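The C1 step can be sketched as a max over a local neighborhood of positions followed by a max across adjacent scales, which is one way to get from 16 S1 scales to 8 C1 scales. The pairing of two adjacent scales per band, the pooling size, and the stride are assumptions made for illustration; both input maps are assumed to have the same dimensions and the same orientation.

```python
def local_max_pool(response_map, pool_size, stride):
    """Max over pool_size x pool_size neighborhoods of a 2D map (list of rows)."""
    rows, cols = len(response_map), len(response_map[0])
    pooled = []
    for r in range(0, rows - pool_size + 1, stride):
        pooled_row = []
        for c in range(0, cols - pool_size + 1, stride):
            pooled_row.append(max(
                response_map[r + dr][c + dc]
                for dr in range(pool_size)
                for dc in range(pool_size)
            ))
        pooled.append(pooled_row)
    return pooled

def c1_band(s1_scale_a, s1_scale_b, pool_size, stride):
    """Combine two adjacent S1 scales (same orientation) into one C1 band."""
    pooled_a = local_max_pool(s1_scale_a, pool_size, stride)
    pooled_b = local_max_pool(s1_scale_b, pool_size, stride)
    # Max across scale at each pooled position.
    return [[max(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(pooled_a, pooled_b)]

# Toy S1 response maps for one orientation at two adjacent scales.
s1_a = [[0.1, 0.2, 0.0, 0.4],
        [0.3, 0.9, 0.1, 0.0],
        [0.0, 0.2, 0.5, 0.1],
        [0.6, 0.1, 0.0, 0.2]]
s1_b = [[0.0, 0.0, 0.7, 0.0],
        [0.1, 0.1, 0.1, 0.1],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.8, 0.0, 0.0]]

c1 = c1_band(s1_a, s1_b, pool_size=2, stride=2)  # [[0.9, 0.7], [0.8, 0.5]]
```

Because only the maximum survives, a strong edge response can shift by a pixel or change size slightly without changing the C1 output, which is the “invariance” half of the alternation.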

  16–18. S2 layer: calculate similarity to prototype (radial basis function); 4 orientations, 8 scales.
     • Prototypes: ~1000, chosen from image collection, translated to C1 features.
     • S2 unit: calculate similarity to prototype for each “pooled” position in C1 layer.
     • Similarity: radial basis function [formula not preserved in transcript].
     C1 layer: max activation over local S1 units (local position, scale); 4 orientations, 8 scales.
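The similarity formula did not survive in the transcript. A common form of radial basis function, assumed here, is a Gaussian of the squared Euclidean distance between a C1 patch X and a prototype P: exp(-beta * ||X - P||^2). The sketch below uses that form; `beta` and the flattened-vector representation of patches are illustrative assumptions.

```python
import math

def rbf_similarity(patch, prototype, beta=1.0):
    """Gaussian RBF similarity between a C1 patch and a prototype (flat lists of equal length)."""
    sq_dist = sum((x - p) ** 2 for x, p in zip(patch, prototype))
    return math.exp(-beta * sq_dist)

def s2_map(c1_patches, prototype, beta=1.0):
    """One S2 response per pooled C1 position for a single prototype."""
    return [rbf_similarity(patch, prototype, beta) for patch in c1_patches]

# Toy example: three C1 patches compared against one prototype.
prototype = [0.9, 0.1, 0.4]
patches = [
    [0.9, 0.1, 0.4],  # identical to the prototype -> similarity 1.0
    [0.8, 0.2, 0.4],  # close
    [0.0, 0.9, 0.0],  # far
]
responses = s2_map(patches, prototype)
```

The response is 1.0 for an exact match and decays toward 0 as the patch moves away from the prototype, with `beta` controlling how quickly.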

  19. C2 layer: max activation over position, orientation, scale. Each S2 map (S2_1, S2_2, …) is reduced by MAX to 1 value.
     S2 layer: calculate similarity to prototype (radial basis function); 4 orientations, 8 scales.
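The C2 step reduces all S2 responses for a given prototype to a single number, so an image ends up represented by one value per prototype. A minimal sketch, assuming S2 responses are stored per prototype as a list of scale bands, each a flat list of responses over positions (this data layout is hypothetical):

```python
def c2_features(s2_responses):
    """Global max per prototype.

    s2_responses[i] is a list of scale bands for prototype i;
    each band is a flat list of S2 responses over positions.
    """
    return [max(max(band) for band in bands) for bands in s2_responses]

# Toy S2 responses for three prototypes, two scale bands each.
s2 = [
    [[0.02, 0.11], [0.05, 0.07]],
    [[0.40, 0.10], [0.78, 0.33]],
    [[0.32], [0.01, 0.20]],
]
features = c2_features(s2)  # [0.11, 0.78, 0.32]
```

With about 1000 prototypes, the resulting feature vector would have about 1000 entries regardless of image size.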

  20. Support Vector Machine classification (e.g., dog / not dog).
     C2 layer: max over position, orientation, scale … (example outputs: .11, .78, .32)
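To make the last step concrete, the sketch below applies the decision function of an already-trained linear SVM to a C2 feature vector. The slide does not say which kernel is used, so the linear form is an assumption, and the weights and bias are hypothetical values chosen for illustration, not learned from data.

```python
def svm_decision(features, weights, bias):
    """Signed score of a linear SVM: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def classify(features, weights, bias, positive="dog", negative="not dog"):
    """Label is decided by the sign of the decision score."""
    return positive if svm_decision(features, weights, bias) > 0 else negative

c2_vector = [0.11, 0.78, 0.32]  # example C2 values from the slide
weights = [-1.0, 2.0, 0.5]      # hypothetical trained weights
bias = -1.0                     # hypothetical trained bias

score = svm_decision(c2_vector, weights, bias)  # about 0.61
label = classify(c2_vector, weights, bias)      # "dog"
```

In practice the weights would come from training on C2 vectors of labeled example images.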

  21. Streetscenes “scene understanding” system (Bileschi, 2006). Use HMAX + SVM to identify object classes: Car, Pedestrian, Bicycle, Building, Tree.

  22. How Streetscenes Works (Bileschi, 2006)
     1. Densely tile the image with windows of different sizes.
     2. C1 and C2 features are computed in each window.
     3. The features in each window are given as input to each of five trained support vector machines.
     4. If any return a classification with score above a learned threshold, that object is said to be “detected”. …
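The four steps above can be sketched as a sliding-window loop. This is a rough illustration under stated assumptions, not the Streetscenes implementation: `tile_windows`, `detect`, the stride rule, and the stub feature and scoring functions are all hypothetical, and only two of the five object classes are stubbed in.

```python
def tile_windows(image_w, image_h, window_sizes, stride_fraction=0.5):
    """Yield (x, y, size) for square windows densely covering the image."""
    for size in window_sizes:
        step = max(1, int(size * stride_fraction))
        for y in range(0, image_h - size + 1, step):
            for x in range(0, image_w - size + 1, step):
                yield (x, y, size)

def detect(windows, featurize, scorers, thresholds):
    """Score every window with every class scorer; keep scores above threshold."""
    detections = []
    for window in windows:
        features = featurize(window)  # stands in for the C1/C2 features of the window
        for label, scorer in scorers.items():
            score = scorer(features)  # stands in for a trained SVM's decision score
            if score > thresholds[label]:
                detections.append((label, window, score))
    return detections

# Stub feature extractor and scorers, for illustration only.
windows = list(tile_windows(4, 4, window_sizes=[2, 4]))
featurize = lambda w: [float(w[0]), float(w[1])]
scorers = {
    "car": lambda f: f[0] - f[1],
    "tree": lambda f: -1.0,
}
thresholds = {"car": 1.5, "tree": 0.0}

found = detect(windows, featurize, scorers, thresholds)
```

Here `found` contains a single “car” detection, at the one window whose stub score exceeds the car threshold.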

  23. Object detection (here, “car”) with HMAX model (Bileschi, 2006)

  24. Sample of results from HMAX model (Serre et al., 2006)
