On Visual Recognition

Presentation Transcript


  1. On Visual Recognition Jitendra Malik UC Berkeley

  2. From Pixels to Perception (figure: a tiger scene annotated with the labels Water, Grass, Sand, Tiger (×2), shadow, back, head, eye, legs, tail, mouse; scene-level labels: outdoor, wildlife)

  3. Object Category Recognition

  4. Defining Categories • What is a “visual category”? • Not semantic • Working hypothesis: Two instances of the same category must have “correspondence” (i.e. one can be morphed into the other) • e.g. Four-legged animals • Biederman’s estimate of 30,000 basic visual categories

  5. Facts from Biological Vision • Timing • Abstraction/Generalization • Taxonomy and Partonomy

  6. Detection can be very fast • On a task of judging animal vs. no animal, humans can make mostly correct saccades in 150 ms (Kirchner & Thorpe, 2006) • Comparable to the accumulated synaptic delays along the retina → LGN → V1 → V2 → V4 → IT pathway • Doesn’t rule out feedback, but shows that feedforward processing alone is very powerful

  7. As Soon as You Know It Is There, You Know What It Is (Grill-Spector & Kanwisher, Psychological Science, 2005)

  8. Abstraction/Generalization • Configurations of oriented contours • Considerable tolerance for small deformations

  9. Attneave’s Cat (1954) • Line drawings convey most of the information

  10. Taxonomy and Partonomy • Taxonomy: e.g. cats are in the family Felidae, which in turn is in the class Mammalia • Recognition can be at multiple levels of categorization, or be identification at the level of specific individuals, as in faces. • Partonomy: Objects have parts, which have subparts, and so on. The human body contains the head, which in turn contains the eyes. • These notions apply equally well to scenes and to activities. • Psychologists have argued that there is a “basic level” at which categorization is fastest (Eleanor Rosch et al.). • In a partonomy, each level contributes useful information for recognition.
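Both a taxonomy and a partonomy are trees. The minimal Python sketch below uses hypothetical, hand-written example data (the dictionaries `TAXONOMY_PARENT` and `PARTONOMY` and both helper functions are illustrative, not from the talk) to show how a category can be walked up to its more general levels, and how a part can be located inside the wholes that contain it.

```python
# Hypothetical taxonomy: each category maps to its more general parent category.
TAXONOMY_PARENT = {
    "house cat": "Felidae",
    "tiger": "Felidae",
    "Felidae": "Carnivora",
    "Carnivora": "Mammalia",
}

# Hypothetical partonomy: each part maps to a dict of its own subparts.
PARTONOMY = {
    "human body": {
        "head": {"eyes": {}, "nose": {}, "mouth": {}},
        "torso": {},
        "arms": {"hands": {}},
        "legs": {"feet": {}},
    }
}


def taxonomy_path(category):
    """Return the category followed by each more general category above it."""
    path = [category]
    while path[-1] in TAXONOMY_PARENT:
        path.append(TAXONOMY_PARENT[path[-1]])
    return path


def part_path(tree, target, prefix=()):
    """Depth-first search for `target`; return the chain of enclosing parts, or None."""
    for part, subparts in tree.items():
        here = prefix + (part,)
        if part == target:
            return list(here)
        found = part_path(subparts, target, here)
        if found is not None:
            return found
    return None


print(taxonomy_path("tiger"))        # ['tiger', 'Felidae', 'Carnivora', 'Mammalia']
print(part_path(PARTONOMY, "eyes"))  # ['human body', 'head', 'eyes']
```

Recognition "at multiple levels" then corresponds to answering with any entry of the taxonomy path, and each level of the part path is a place where evidence can be gathered.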

  11. Matching with Exemplars • Use exemplars as templates • Find correspondences between features of the query and the exemplar • Evaluate a similarity score (figure: a query image and a database of templates)

  12. Matching with Exemplars • Use exemplars as templates • Find correspondences between features of the query and the exemplar • Evaluate a similarity score (figure: a query image and a database of templates; the best matching template is a helicopter)
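A minimal sketch of exemplar matching, assuming each image has already been reduced to a small set of feature vectors (the 2-D "features" and the labels below are made up for illustration). Each query feature is put in correspondence with its nearest template feature, and the mean correspondence distance serves as the matching cost, so a lower cost means a higher similarity. The helper names `match_cost` and `best_template` are hypothetical.

```python
import math


def match_cost(query_feats, template_feats):
    """Match each query feature to its nearest template feature; return the mean distance."""
    total = 0.0
    for q in query_feats:
        total += min(math.dist(q, t) for t in template_feats)
    return total / len(query_feats)


def best_template(query_feats, database):
    """Return (label, cost) for the template with the lowest matching cost."""
    scored = [(label, match_cost(query_feats, feats)) for label, feats in database.items()]
    return min(scored, key=lambda pair: pair[1])


# Made-up 2-D "features" standing in for real local descriptors.
database = {
    "helicopter": [(0.0, 1.0), (1.0, 1.0), (2.0, 0.5)],
    "car": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
}
query = [(0.1, 0.9), (1.1, 1.0), (1.9, 0.6)]
label, cost = best_template(query, database)
print(label)  # helicopter
```

Nearest-neighbour correspondence lets several query features map to the same template feature; a stricter scheme would enforce a one-to-one assignment.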

  13. 3D objects using multiple 2D views • View selection algorithm from Belongie, Malik & Puzicha (2001)
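The slides do not spell out the view selection algorithm. As one plausible approach (an assumption, not a reproduction of Belongie, Malik & Puzicha), the sketch below picks k prototype views by k-medoids clustering over a precomputed matrix of pairwise view distances: each view is assigned to its closest prototype, and each prototype is then replaced by the cluster member with the smallest total distance to the rest of its cluster. The function name `k_medoids` and the toy 1-D "views" are hypothetical.

```python
def k_medoids(dist, k, max_iter=100):
    """Choose k prototype indices from a symmetric pairwise distance matrix `dist`."""
    n = len(dist)
    medoids = list(range(k))  # simple deterministic start: the first k views
    for _ in range(max_iter):
        # Assignment step: attach each view to its closest current prototype.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new prototype of a cluster is the member with the
        # smallest total distance to the other members.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # degenerate case (duplicate views): keep the old prototype
                new_medoids.append(m)
                continue
            new_medoids.append(min(members, key=lambda c: sum(dist[c][j] for j in members)))
        if sorted(new_medoids) == sorted(medoids):
            break  # converged: prototypes no longer change
        medoids = new_medoids
    return sorted(medoids)


# Toy example: "views" are 1-D positions, and distance is the absolute difference.
views = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
dist = [[abs(a - b) for b in views] for a in views]
print(k_medoids(dist, 2))  # [1, 4]
```

With real data, `dist[i][j]` would be a shape matching cost between stored views i and j, and adding prototypes trades storage for lower error, which is the trade-off an error-vs-number-of-views curve measures.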

  14. Error vs. Number of Views

  15. Three Big Ideas • Correspondence based on local shape/appearance descriptors • Deformable Template Matching • Machine learning for finding discriminative features

  16. Three Big Ideas • Correspondence based on local shape/appearance descriptors • Deformable Template Matching • Machine learning for finding discriminative features

  17. Comparing Pointsets

  18. Shape Context • Count the number of points inside each bin of a log-polar histogram centered on a sample point (e.g. Count = 4, ..., Count = 10) • Compact representation of the distribution of points relative to each point (Belongie, Malik & Puzicha, 2001)
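A minimal Python sketch of a shape context for one point, assuming 2-D point coordinates. Distances are normalized by the mean pairwise distance, and the histogram uses log-spaced radial bins and uniform angular bins; the bin counts (5 radial × 12 angular) and the radial range (1/8 to 2 mean distances) are assumed defaults for this sketch, not values taken from the slide. Points falling outside the radial range are simply not counted.

```python
import math


def shape_context(points, index, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Log-polar histogram of where the other points lie relative to points[index].

    Radii are divided by the mean pairwise distance, making the descriptor scale invariant.
    Returns an n_r x n_theta list of counts.
    """
    px, py = points[index]
    pair_dists = [math.dist(a, b) for i, a in enumerate(points)
                  for j, b in enumerate(points) if i != j]
    mean_d = sum(pair_dists) / len(pair_dists)
    log_lo, log_hi = math.log(r_inner), math.log(r_outer)
    hist = [[0] * n_theta for _ in range(n_r)]
    for j, (qx, qy) in enumerate(points):
        if j == index:
            continue
        r = math.hypot(qx - px, qy - py) / mean_d
        if not (r_inner <= r < r_outer):
            continue  # outside the histogram's radial range: not counted
        # Radial bins are uniform in log(r), so nearby points get finer bins.
        r_bin = min(int((math.log(r) - log_lo) / (log_hi - log_lo) * n_r), n_r - 1)
        theta = math.atan2(qy - py, qx - px) % (2 * math.pi)
        t_bin = int(theta / (2 * math.pi) * n_theta) % n_theta
        hist[r_bin][t_bin] += 1
    return hist


square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
h = shape_context(square, 0)
print(sum(map(sum, h)))  # 3: every other corner lands in some bin
```

Every sample point gets its own histogram, so a shape with n sampled points is described by n such histograms.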

  19. Shape Context
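To compare two point sets, shape contexts from the two shapes must be put in correspondence. A common choice of matching cost, assumed here, is the χ² statistic between histograms; the correspondence is then the one-to-one assignment with the lowest total cost. For brevity this sketch finds that assignment by brute force over permutations, which only works for a handful of points; a practical implementation would use a polynomial-time assignment solver such as the Hungarian algorithm instead. Both function names are hypothetical.

```python
from itertools import permutations


def chi2_cost(h1, h2):
    """Chi-squared statistic between two 2-D histograms: 0.5 * sum((g - h)^2 / (g + h))."""
    total = 0.0
    for row1, row2 in zip(h1, h2):
        for g, h in zip(row1, row2):
            if g + h > 0:  # bins empty in both histograms contribute nothing
                total += (g - h) ** 2 / (g + h)
    return 0.5 * total


def best_correspondence(cost):
    """Minimum-cost one-to-one assignment for a small square cost matrix (brute force)."""
    n = len(cost)
    best = min(permutations(range(n)), key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best), sum(cost[i][best[i]] for i in range(n))


# Cost between 3 points on shape A (rows) and 3 points on shape B (columns), e.g.
# cost[i][j] = chi2_cost(context_a[i], context_b[j]); these values are made up.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
print(best_correspondence(cost))  # ([1, 0, 2], 5)
```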

  20. Geometric Blur (Local Appearance Descriptor), Berg & Malik '01 • Compute sparse channels from the image • Extract a patch in each channel • Apply spatially varying blur and sub-sample • Descriptor is robust to small affine distortions (figure: geometric blur descriptor of an idealized signal)
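A 1-D illustration of the spatially varying blur step, under stated assumptions: the input list stands in for one sparse channel, and the blur at offset x from the feature point is a Gaussian whose standard deviation grows linearly with distance, alpha·|x| + beta (the linear form and the default values of `alpha` and `beta` are assumptions for this sketch). Samples far from the center are blurred more, so a small shift of distant structure changes the descriptor only slightly, which is the intuition behind robustness to small affine distortions.

```python
import math


def geometric_blur_1d(signal, center, offsets, alpha=0.5, beta=1.0):
    """Sample a sparse 1-D signal around `center`, blurring more at larger offsets.

    At offset x the signal is convolved with a Gaussian of standard deviation
    alpha * |x| + beta and sampled at center + x (alpha, beta are assumed values).
    """
    descriptor = []
    for x in offsets:
        sigma = alpha * abs(x) + beta
        pos = center + x
        value = 0.0
        for i, s in enumerate(signal):
            if s:  # sparse channel: only non-zero entries contribute
                weight = math.exp(-((i - pos) ** 2) / (2 * sigma ** 2))
                value += s * weight / (sigma * math.sqrt(2 * math.pi))
        descriptor.append(value)
    return descriptor


signal = [0.0] * 21
signal[10] = 1.0  # a single edge response at the feature point
desc = geometric_blur_1d(signal, center=10, offsets=[-4, 0, 4])
print(desc)
```

Sub-sampling corresponds to the choice of `offsets`: because the blur is wide far from the center, sparse samples there lose little information.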

  21. Three Big Ideas • Correspondence based on local shape/appearance descriptors • Deformable Template Matching • Machine learning for finding discriminative features

  22. Modeling shape variation in a category • D’Arcy Thompson: On Growth and Form, 1917 • studied transformations between shapes of organisms

  23. Matching Example (figure: model and target)
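Deformable template matching maps a model shape onto a target once correspondences are known. As a deliberately simplified stand-in for a full deformation model (a real system would allow non-linear warps), the sketch below fits only a least-squares affine transformation to corresponding model and target points and reports the mean residual, a crude measure of how much non-affine deformation remains. All names and example points are hypothetical.

```python
import math


def solve3(a, b):
    """Solve the 3x3 linear system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_affine(model, target):
    """Least-squares affine map sending model points onto corresponding target points.

    Returns (px, py) with u ~ px[0]*x + px[1]*y + px[2] and v ~ py[0]*x + py[1]*y + py[2].
    """
    ata = [[0.0] * 3 for _ in range(3)]  # normal-equation matrix P^T P, rows P_i = (x, y, 1)
    atu = [0.0] * 3
    atv = [0.0] * 3
    for (x, y), (u, v) in zip(model, target):
        row = (x, y, 1.0)
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return solve3(ata, atu), solve3(ata, atv)


def mean_residual(model, target, params):
    """Mean distance between transformed model points and their target points."""
    px, py = params
    err = 0.0
    for (x, y), (u, v) in zip(model, target):
        err += math.hypot(px[0] * x + px[1] * y + px[2] - u,
                          py[0] * x + py[1] * y + py[2] - v)
    return err / len(model)


model = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
target = [(2 * x + 3, 2 * y - 1) for x, y in model]  # scaled and shifted copy
params = fit_affine(model, target)
print(params, mean_residual(model, target, params))  # residual is ~0 for a purely affine change
```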

  24. Handwritten Digit Recognition (test error rates)
     • MNIST, 600 000 training examples (with distortions): LeNet 5 0.8%; SVM 0.8%; Boosted LeNet 4 0.7%
     • MNIST, 60 000 training examples: linear 12.0%; 40 PCA + quadratic 3.3%; 1000 RBF + linear 3.6%; K-NN 5%; K-NN (deskewed) 2.4%; K-NN (tangent distance) 1.1%; SVM 1.1%; LeNet 5 0.95%
     • MNIST, 20 000 training examples: K-NN with shape context matching 0.63%
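The K-NN results listed above differ mainly in the distance function used to compare digits (Euclidean, after deskewing, tangent distance, shape context matching). A minimal sketch, assuming each training example is a (features, label) pair, makes that choice pluggable: any dissimilarity can be passed as `distance`. The function names and the toy 2-D data below are made up and are not MNIST.

```python
from collections import Counter


def squared_euclidean(a, b):
    """Default dissimilarity: squared Euclidean distance between feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(query, train, k=3, distance=squared_euclidean):
    """Label `query` by majority vote among its k nearest (example, label) training pairs.

    `distance` can be any dissimilarity function, e.g. a tangent distance or a
    shape context matching cost, without changing the classifier itself.
    """
    neighbours = sorted(train, key=lambda pair: distance(query, pair[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Made-up 2-D feature vectors standing in for digit images.
train = [((0.0, 0.0), "0"), ((0.0, 1.0), "0"),
         ((5.0, 5.0), "1"), ((5.0, 6.0), "1"), ((6.0, 5.0), "1")]
print(knn_classify((0.2, 0.3), train))  # 0
print(knn_classify((5.5, 5.5), train))  # 1
```

Because the classifier only ever calls `distance`, an expensive matching cost such as shape context matching is what dominates run time, which is one reason smaller training sets are attractive with such distances.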

  25. EZ-Gimpy Results • 171 of 192 images correctly identified: 92% (example words: horse, spade, smile, join, canvas, here)

  26. Three Big Ideas • Correspondence based on local shape/appearance descriptors • Deformable Template Matching • Machine learning for finding discriminative features

  27. Discriminative learning (Frome, Singer & Malik, 2006) • Learn weights on patch features in training images • These give distance functions from training images to any other image • Applications: browsing, retrieval, classification

  28. Learning from triplets of images i, j, k • Learn from relative similarity: compare image-to-image distances (e.g. we want image i to be closer to image j than to image k) • Image-to-image distances are based on feature-to-image distances

  29. Image i is the focal image • $d_{ij}$ and $d_{ik}$ hold the distances from each feature of image i to images j and k • $x_{ijk} = d_{ik} - d_{ij}$, e.g. $(0.8, 0.2, \ldots) - (0.3, 0.4, \ldots) = (0.5, -0.2, \ldots)$
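In the notation of slides 28 to 30 as we read them, $d_{ij}$ is the vector of distances from each patch feature of the focal image i to image j, and $x_{ijk} = d_{ik} - d_{ij}$. With a weight vector $w$ on the focal image's features, the learned distances are $w \cdot d_{ij}$ and $w \cdot d_{ik}$, and requiring image k to be farther than a more similar image j by a margin of 1 amounts to $w \cdot x_{ijk} \ge 1$. The sketch below computes this with the example distances recoverable from the slide (0.3, 0.4 to image j and 0.8, 0.2 to image k); the weight vectors and function names are made up.

```python
def triplet_vector(d_ij, d_ik):
    """x_ijk = d_ik - d_ij, computed element-wise over the focal image's patch features."""
    return [b - a for a, b in zip(d_ij, d_ik)]


def satisfies_margin(w, x_ijk, margin=1.0):
    """True if w . d_ik exceeds w . d_ij by at least `margin`, i.e. w . x_ijk >= margin."""
    return sum(wi * xi for wi, xi in zip(w, x_ijk)) >= margin


d_ij = [0.3, 0.4]  # distances from focal image i's features to image j (more similar)
d_ik = [0.8, 0.2]  # distances from focal image i's features to image k (less similar)
x_ijk = triplet_vector(d_ij, d_ik)
print(x_ijk)                                # approximately [0.5, -0.2]
print(satisfies_margin([4.0, 1.0], x_ijk))  # True:  4*0.5 + 1*(-0.2) = 1.8
print(satisfies_margin([1.0, 1.0], x_ijk))  # False: 0.5 - 0.2 = 0.3
```

The second feature is closer to image k than to image j, so raising its weight works against the constraint; learning has to find weights that favor the discriminative first feature.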

  30. large-margin formulation • slack variables like soft-margin SVM • w constrained to be positive • L2 regularization
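The bullets above describe a problem of the form: minimize $\frac{1}{2}\lVert w \rVert^2 + C \sum \xi_{ijk}$ subject to $w \cdot x_{ijk} \ge 1 - \xi_{ijk}$, $\xi_{ijk} \ge 0$ and $w \ge 0$ (reading "positive" as non-negative). Eliminating the slacks gives the hinge-loss objective $\frac{1}{2}\lVert w \rVert^2 + C \sum \max(0, 1 - w \cdot x_{ijk})$. The slides do not say how it is solved; the sketch below uses projected subgradient descent, a choice made here for illustration, clipping w at zero after every step to enforce the constraint. The step size, epoch count, function names and toy triplet vectors are assumptions.

```python
def train_weights(triplets, C=1.0, lr=0.01, epochs=500):
    """Projected subgradient descent on 0.5*||w||^2 + C * sum(max(0, 1 - w . x)) with w >= 0.

    `triplets` is a list of x_ijk vectors (all the same length).
    """
    dim = len(triplets[0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = w[:]  # gradient of the L2 regularizer 0.5*||w||^2
        for x in triplets:
            if sum(wi * xi for wi, xi in zip(w, x)) < 1.0:  # hinge active (slack > 0)
                for d in range(dim):
                    grad[d] -= C * x[d]
        # Gradient step, then projection onto w >= 0 by clipping negatives to zero.
        w = [max(0.0, wi - lr * gi) for wi, gi in zip(w, grad)]
    return w


def objective(w, triplets, C=1.0):
    """Regularizer plus C times the total hinge loss (the eliminated slack variables)."""
    reg = 0.5 * sum(wi * wi for wi in w)
    hinge = sum(max(0.0, 1.0 - sum(wi * xi for wi, xi in zip(w, x))) for x in triplets)
    return reg + C * hinge


toy_triplets = [[0.5, -0.2], [0.4, 0.1], [0.3, -0.1]]  # made-up x_ijk vectors
w = train_weights(toy_triplets)
print(w, objective(w, toy_triplets), objective([0.0, 0.0], toy_triplets))
```

In this toy problem the second feature is, on balance, closer to the dissimilar images, so the projection keeps its weight at exactly zero rather than letting it go negative.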

  31. Caltech-101 [Fei-Fei et al. 04] • 102 classes, 31-300 images/class

  32. Retrieval results (figure: retrieval example for a query image)

  33. Caltech 101 classification results (see Manik Verma’s talks for the best results yet)

  34. 15 training images per class: 63.2% accuracy

  35. Conclusion • Correspondence based on local shape/appearance descriptors • Deformable Template Matching • Machine learning for finding discriminative features • Integrating Perceptual Organization and Recognition
