Computer Vision, Part 2. Object recognition and scene “understanding”. What makes object recognition a hard task for computers?
Computer Vision, Part 2 Object recognition and scene “understanding”
HMAX
Riesenhuber, M. & Poggio, T. (1999), “Hierarchical Models of Object Recognition in Cortex”
Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., and Poggio, T. (2006), “Robust Object Recognition with Cortex-Like Mechanisms”
• HMAX: A hierarchical neural-network model of object recognition.
• Meant to model human vision at the level of the “immediate recognition” capabilities of the ventral visual pathway, independent of attention or other top-down processes.
• Also called the “Standard Model” (because it incorporates the “standard model” of visual cortex)
• Inspired by the earlier “Neocognitron” model of Fukushima (1980)
General ideas behind model
• “Immediate” visual processing is feedforward and hierarchical: low levels detect simple features, which are combined hierarchically into increasingly complex features to be detected
• Layers of the hierarchy alternate between “sensitivity” (detecting features) and “invariance” (to position, scale, orientation)
• Size of receptive fields increases along the hierarchy
• Degree of invariance increases along the hierarchy
The HMAX model for object recognition (Riesenhuber, Poggio, Serre, et al.)
The job of HMAX is to produce a higher-level representation of an image that will be useful for classification.
Layers alternate between “specificity” and “invariance” over position, scale, orientation.
Layers, from top (output) to bottom (input):
• Classification layer: object or image classification
• C2 layer: max activation over each prototype
• S2 layer: prototypes (small image patches)
• C1 layer: max over local S1 units
• S1 layer: edge detectors
• Image (gray-scale)
S1 layer
• Edge detectors: 4 orientations, 16 scales
• Input: image (gray-scale)
[Figure: one S1 receptive field, and so on for each of the 16 scales]
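The S1 stage can be sketched as a small bank of oriented filters. The slides only say “edge detectors”; this sketch assumes Gabor filters, which is what Serre et al. describe. The function names, the 7 × 7 size, and the `wavelength`, `sigma`, and `gamma` values are illustrative assumptions, and `SIZES` is simply one way to obtain 16 scales.

```python
import math

# Four orientations, as on the slide (0, 45, 90, 135 degrees).
ORIENTATIONS = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
# Assumed odd filter sizes; chosen only so that there are 16 scales.
SIZES = list(range(7, 38, 2))


def gabor_kernel(size, wavelength, sigma, theta, gamma=0.3):
    """Return a zero-mean size x size Gabor kernel as a list of rows."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the cosine varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr)
                                / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]


def s1_response(image, kernel, top, left):
    """Absolute filter response for the patch with top-left corner (top, left)."""
    total = 0.0
    for i, row in enumerate(kernel):
        for j, weight in enumerate(row):
            total += weight * image[top + i][left + j]
    return abs(total)


# A 7 x 7 image containing a bright vertical bar in the middle column.
bar = [[1.0 if col == 3 else 0.0 for col in range(7)] for _ in range(7)]
k_vertical = gabor_kernel(7, wavelength=3.5, sigma=2.8, theta=0.0)
k_horizontal = gabor_kernel(7, wavelength=3.5, sigma=2.8, theta=math.pi / 2)
r_vertical = s1_response(bar, k_vertical, 0, 0)
r_horizontal = s1_response(bar, k_horizontal, 0, 0)
```

With these assumed parameters the unit tuned to vertical structure (`theta=0.0`) responds much more strongly to the bar than the unit rotated by 90 degrees, which is the orientation selectivity the S1 layer is meant to provide.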
C1 layer
• Max activation over local S1 units (local position, scale): 4 orientations, 8 scales
• Input: S1 layer (edge detectors, 4 orientations, 16 scales), computed from the gray-scale image
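The C1 pooling step can be sketched as a max over a local neighbourhood and over two adjacent scales, which is one way 16 S1 scales reduce to 8 C1 scales. The function name `c1_pool`, the `pool_size` and `step` parameters, and the requirement that both S1 maps have the same shape are assumptions made for illustration.

```python
def c1_pool(s1_a, s1_b, pool_size, step):
    """Max over pool_size x pool_size neighbourhoods and two adjacent scales.

    s1_a and s1_b are S1 response maps (lists of rows) for one orientation
    at two neighbouring scales; they are assumed to have the same shape.
    """
    rows, cols = len(s1_a), len(s1_a[0])
    pooled = []
    for top in range(0, rows - pool_size + 1, step):
        out_row = []
        for left in range(0, cols - pool_size + 1, step):
            best = max(
                max(s1_a[r][c], s1_b[r][c])
                for r in range(top, top + pool_size)
                for c in range(left, left + pool_size)
            )
            out_row.append(best)
        pooled.append(out_row)
    return pooled


scale_1 = [[1, 2, 0, 0],
           [3, 4, 0, 0],
           [0, 0, 5, 0],
           [0, 0, 0, 1]]
scale_2 = [[0, 0, 9, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0],
           [7, 0, 0, 0]]
c1_map = c1_pool(scale_1, scale_2, pool_size=2, step=2)  # [[4, 9], [7, 5]]
```

Because only the maximum in each neighbourhood survives, a feature can shift slightly in position or scale without changing the C1 output.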
S2 layer
• Prototypes: ~1000, chosen from an image collection and translated to C1 features
• S2 unit: calculates similarity to a prototype (radial basis function) for each “pooled” position in the C1 layer; 4 orientations, 8 scales
• Similarity: radial basis function (the formula was shown as an image on the slide)
• Input: C1 layer (max activation over local S1 units: local position, scale; 4 orientations, 8 scales)
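The slide's similarity formula did not survive extraction. A minimal sketch, assuming the Gaussian radial basis function given in Serre et al., r = exp(−β‖X − P‖²), where X is a C1 patch and P a stored prototype of the same size. The function name, the flattened-list representation, and the default `beta` are illustrative assumptions.

```python
import math


def s2_response(c1_patch, prototype, beta=1.0):
    """Gaussian RBF similarity between a flattened C1 patch and a prototype.

    Returns 1.0 for an exact match and decays towards 0 as the squared
    Euclidean distance grows; beta (assumed value) controls the sharpness.
    """
    if len(c1_patch) != len(prototype):
        raise ValueError("patch and prototype must have the same length")
    squared_distance = sum((x - p) ** 2 for x, p in zip(c1_patch, prototype))
    return math.exp(-beta * squared_distance)


prototype = [0.2, 0.5, 0.9, 0.1]
exact = s2_response([0.2, 0.5, 0.9, 0.1], prototype)   # identical patch
near = s2_response([0.2, 0.6, 0.9, 0.1], prototype)    # small difference
far = s2_response([0.9, 0.0, 0.1, 0.8], prototype)     # large difference
```

Evaluating this function at every pooled C1 position and scale gives one S2 map per prototype.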
C2 layer
• Max activation over position, orientation, scale: one MAX value per prototype (S2_1, S2_2, …)
• Input: S2 layer (similarity to prototype, radial basis function; 4 orientations, 8 scales)
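The C2 step reduces every prototype's S2 responses to a single number, so an image ends up described by one value per prototype. A minimal sketch, assuming a hypothetical nested layout `s2_responses[prototype][scale_band][row][column]`:

```python
def c2_features(s2_responses):
    """Return one value per prototype: the global max of its S2 responses."""
    features = []
    for per_prototype in s2_responses:       # one entry per prototype
        best = max(
            value
            for band in per_prototype        # scale bands
            for row in band                  # rows of positions
            for value in row
        )
        features.append(best)
    return features


# Three prototypes, each with two scale bands of different map sizes.
s2_responses = [
    [[[0.02, 0.11], [0.05, 0.01]], [[0.03]]],
    [[[0.10, 0.40], [0.78, 0.20]], [[0.30]]],
    [[[0.00, 0.01], [0.02, 0.03]], [[0.32]]],
]
c2 = c2_features(s2_responses)  # [0.11, 0.78, 0.32]
```

The output length depends only on the number of prototypes, not on the image size, which is what makes it usable as a fixed-length input to a classifier.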
Support Vector Machine classification (e.g., dog / not dog)
• Input: C2 layer (max over position, orientation, scale), e.g., the feature values … .11 .78 .32
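Once trained, a support vector machine classifies a C2 vector by the sign of a decision score. The slides do not say which kernel is used; this sketch assumes a linear SVM, and the weights and bias are made-up values standing in for parameters that training would produce.

```python
def svm_decision(features, weights, bias):
    """Signed score of a linear SVM; a positive score means the target class."""
    return sum(w * x for w, x in zip(weights, features)) + bias


c2 = [0.11, 0.78, 0.32]        # example C2 values from the slide
weights = [-1.0, 2.0, 0.5]     # hypothetical learned weights
bias = -1.0                    # hypothetical learned bias

score = svm_decision(c2, weights, bias)
label = "dog" if score > 0 else "not dog"
```

With these assumed parameters the second prototype dominates the score, so the example vector falls on the “dog” side of the decision boundary.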
Streetscenes “scene understanding” system (Bileschi, 2006)
Use HMAX + SVM to identify object classes: Car, Pedestrian, Bicycle, Building, Tree
How Streetscenes Works (Bileschi, 2006)
1. Densely tile the image with windows of different sizes.
2. C1 and C2 features are computed in each window.
3. The features in each window are given as input to each of five trained support vector machines.
4. If any return a classification with score above a learned threshold, that object is said to be “detected”.
…
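The steps above can be sketched as a sliding-window loop. Square windows, the half-window stride, and the function names are assumptions. The per-class scoring functions are placeholders for “compute C1/C2 features in the window, then apply that class's trained SVM”, which is not reproduced here.

```python
def tile_windows(width, height, sizes, stride_fraction=0.5):
    """Yield (left, top, size) for square windows densely covering the image."""
    for size in sizes:
        stride = max(1, int(size * stride_fraction))
        for top in range(0, height - size + 1, stride):
            for left in range(0, width - size + 1, stride):
                yield left, top, size


def detect(width, height, sizes, score_fns, thresholds):
    """Return (class_name, window, score) for every score above its threshold.

    score_fns maps a class name to a function window -> score; thresholds
    maps the same names to that class's learned threshold.
    """
    detections = []
    for window in tile_windows(width, height, sizes):
        for name, score_fn in score_fns.items():
            score = score_fn(window)
            if score > thresholds[name]:
                detections.append((name, window, score))
    return detections


# Toy stand-ins for two of the five classifiers (hypothetical scores).
score_fns = {
    "car": lambda window: 0.9 if window == (0, 0, 4) else 0.1,
    "tree": lambda window: 0.2,
}
thresholds = {"car": 0.5, "tree": 0.5}
windows = list(tile_windows(4, 4, sizes=[2, 4]))
found = detect(4, 4, [2, 4], score_fns, thresholds)
```

In this toy run a 4 × 4 image yields nine 2 × 2 windows and one 4 × 4 window, and only the “car” scorer exceeds its threshold, on the full-size window.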
Object detection (here, “car”) with HMAX model (Bileschi, 2006)
Sample of results from HMAX model (Serre et al., 2006)