1 / 41

Recognition by Association: ask not “What is it?” ask “What is it like ?”

Recognition by Association: ask not “What is it?” ask “What is it like ?”. Tomasz Malisiewicz and Alyosha Efros CMU. CVPR’08. Understanding an Image. slide by Fei Fei, Fergus & Torralba. Object naming -> Object categorization. sky. building. flag. face. banner. wall.

Download Presentation

Recognition by Association: ask not “What is it?” ask “What is it like ?”

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Recognition by Association:ask not “What is it?” ask“What is it like?” Tomasz Malisiewicz and Alyosha Efros CMU CVPR’08

  2. Understanding an Image slide by Fei Fei, Fergus & Torralba

  3. Object naming -> Object categorization sky building flag face banner wall street lamp bus bus cars slide by Fei Fei, Fergus & Torralba

  4. Object categorization sky building flag face banner wall street lamp bus bus cars

  5. Classical View of Categories • Dates back to Plato & Aristotle 1. Categories are defined by a list of properties shared by all elements in a category 2. Category membership is binary 3. Every member in the category is equal

  6. Problems with Classical View • Humans don’t do this! • People don’t rely on abstract definitions / lists of shared properties (Rosch 1973) • e.g. Are curtains furniture? • Typicality • e.g. Chicken -> bird, but bird -> eagle, pigeon, etc. • Intransitivity • e.g. car seat is chair, chair is furniture, but … • Not language-independent • e.g. “Women, Fire, and Dangerous Things” category is Australian aboriginal language (Lakoff 1987) • Doesn’t work even in human-defined domains • e.g. Is Pluto a planet?

  7. Different views of same object can be visually dis-similar Car Chair Problems with Visual Categories • A lot of categories are functional

  8. Categorization in Modern Psychology • Prototype Theory (Rosch 1973) • One or more summary representations (prototypes) for each category • Humans compute similarity between input and prototypes • Exemplar Theory (Medin & Schaffer 1978, Nosofsky 1986, Krushke 1992) • categories represented in terms of remembered objects (exemplars) • Similarity is measured between input and all exemplars • think non-parametric density estimation 8

  9. Different way of looking at recognition Input Image Building Car Car Car Road

  10. Different way of looking at recognition

  11. What is the ultimate goal? • Parsing Images • A “what is it like?” machine • A kind of “visual memex”

  12. LabelMe Dataset 12,905 Object Exemplars 171 unique ‘labels’ http://labelme.csail.mit.edu/ Recognition as Association

  13. Our Contributions • Posing Recognition as Association • Use large number of object exemplars • Learning Object Similarity • Different distance function per exemplar • Recognition-Based Object Segmentation • Use multiple segmentation approach 13

  14. Measuring Similarity

  15. Exemplar Representation Segment from LabelMe

  16. Bounding Box Dimensions Pixel Area Centered Mask Obj ~Obj Shape

  17. top,bot,left,right boundary Interior: Bag-of-Words Textons Texture

  18. Mean Color Color std Color Histogram Color

  19. Bounding Box Dimensions top,bot,left,right boundary Interior: Bag-of-Words Pixel Area 1 Centered Mask Textons Absolute Position Mask Mean Color Color std Color Histogram .8 Top Height Obj !Obj Bot Height .42 0 Location

  20. Distance “Similarity” Functions • Positive Linear Combinations of Elementary Distances Computed Over 14 Features Building e Distance Function Building e

  21. Learning Object Similarity • Learn a different distance function for each exemplar in training set • Formulation is similar to Frome et al [1,2] [1] Andrea Frome, Yoram Singer, Jitendra Malik. "Image Retrieval and Recognition Using Local Distance Functions." In NIPS, 2006. [2] Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik. "Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification." In ICCV, 2007.

  22. Class 1 Class 2 Class 3 Non-parametric density estimation Shape Dimension Color Dimension

  23. Class 1 Class 2 Class 3 Non-parametric density estimation Shape Dimension Color Dimension

  24. Class 1 Class 2 Class 3 Non-parametric density estimation Shape Dimension Color Dimension

  25. Learning Distance Functions Dcolor 25 Focal Exemplar Dshape

  26. Learning Distance Functions “similar” side “dissimilar” side Decision Boundary Dcolor Don’t Care 26 Focal Exemplar Dshape

  27. Query Top Neighbors with Tex-Hist Dist Query Top Neighbors with Learned Dist Visualizing Distance Functions (Training Set)

  28. Visualizing Distance Functions (Training Set)

  29. Visualizing Distance Functions (Training Set)

  30. Visualizing Distance Functions (Training Set)

  31. person person person person person Visualizing Distance Functions (Training Set)

  32. Different Label on “similar” side of distance function Visualizing Distance Functions (Training Set) standing person woman person person person person 32

  33. Labels Crossing Boundary

  34. “Conventional” Recognition in Test Set • Compute the similarity between an input and all exemplars • All exemplars with D < 1 are “associated” with the input • Most occurring label from associations is propagated onto input • Association confidence score favors more associations and smaller distances 34

  35. Performance on labeling perfect segments (test set)

  36. Object Segmentation via Recognition • Generate Multiple Segmentations (Hoiem 2005, Russell 2006, Malisiewicz 2007) • Mean-Shift and Normalized Cuts • Use pairs and triplets of adjacent segments • Generate about 10,000 segments per image • Enhance training with bad segments • Apply learned distance functions to bottom-up segments

  37. Example Associations Bottom-Up Segments

  38. Quantitative Evaluation OS(A,B) = Overlap Score = intersection(A,B) / union(A,B) Object hypothesis is correct if labels match and OS > .5 *We do not penalize for multiple correct overlapping associations 38

  39. Toward Image Parsing 39

  40. Conclusion: Main Points • Object Association: defining an object in terms of a set of visually similar objects. Trying to get away from classes. • Learning per-examplar-distances: each object gets to decide on its own distance function. Suddenly, NN distances are meaningful! • Using multiple segmentations: partition the input image into manageable chunks than can then be matched

  41. Thank You Questions? 41

More Related