1 / 40

Using the Forest to see the Trees: A computational model relating features, objects and scenes

Using the Forest to see the Trees: A computational model relating features, objects and scenes. Antonio Torralba CSAIL-MIT. Joint work with. Aude Oliva, Kevin Murphy, William Freeman Monica Castelhano, John Henderson. SceneType 2 {street, office, …}. S. O 1. O 1. O 1. O 1. O 2. O 2.

thor
Download Presentation

Using the Forest to see the Trees: A computational model relating features, objects and scenes

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Using the Forest to see the Trees: A computational model relating features, objects and scenes Antonio Torralba CSAIL-MIT Joint work with Aude Oliva, Kevin Murphy, William Freeman Monica Castelhano, John Henderson

  2. SceneType 2 {street, office, …} S O1 O1 O1 O1 O2 O2 O2 O2 Object localization Local features L L L L Riesenhuber & Poggio (99); Vidal-Naquet & Ullman (03); Serre & Poggio, (05); Agarwal & Roth, (02), Moghaddam, Pentland (97), Turk, Pentland (91),Vidal-Naquet, Ullman, (03) Heisele, et al, (01), Agarwal & Roth, (02), Kremp, Geman, Amit (02), Dorko, Schmid, (03) Fergus, Perona, Zisserman (03), Fei Fei, Fergus, Perona, (03), Schneiderman, Kanade (00), Lowe (99) From objects to scenes I Image

  3. G Global gist features From scenes to objects SceneType 2 {street, office, …} S O1 O1 O1 O1 O2 O2 O2 O2 Object localization Local features L L L L I Image

  4. G Global gist features From scenes to objects SceneType 2 {street, office, …} S O1 O1 O1 O1 O2 O2 O2 O2 Object localization Local features L L L L I Image

  5. The context challenge What do you think are the hidden objects? 1 2 Biederman et al 82; Bar & Ullman 93; Palmer, 75;

  6. The context challenge What do you think are the hidden objects? Chance ~ 1/30000 Answering this question does not require knowing how the objects look like. It is all about context.

  7. G Global gist features From scenes to objects SceneType 2 {street, office, …} S Local features L L L L I Image

  8. Scene categorization Office Corridor Street Oliva & Torralba, IJCV’01; Torralba, Murphy, Freeman, Mark, CVPR 03.

  9. Place identification Office 610 Office 615 Draper street 59 other places… Scenes are categories, places are instances

  10. Supervised learning Vg , Office} { Vg , { Office} Classifier Vg , { Corridor} Vg , { Street} …

  11. Supervised learning Vg , Office} { Vg , { Office} Classifier Vg , { Corridor} Vg , { Street} Which feature vector for a whole image? …

  12. Global features (gist) First, we propose a set of features that do not encode specific object information Oliva & Torralba, IJCV’01; Torralba, Murphy, Freeman, Mark, CVPR 03.

  13. Global features (gist) First, we propose a set of features that do not encode specific object information V = {energy at each orientation and scale} = 6 x 4 dimensions 80 features | vt | PCA G Oliva & Torralba, IJCV’01; Torralba, Murphy, Freeman, Mark, CVPR 03.

  14. Example visual gists I I’ Global features (I) ~ global features (I’) Cf. “Pyramid Based Texture Analysis/Synthesis”, Heeger and Bergen, Siggraph, 1995

  15. Office 610 Office 617 Corridor 6b Corridor 6c Learning to recognize places We use annotated sequences for training • Hidden states = location (63 values) • Observations = vGt (80 dimensions) • Transition matrix encodes topology of environment • Observation model is a mixture of Gaussians centered on prototypes (100 views per place)

  16. Wearable test-bed v1 Kevin Murphy

  17. Wearable test-bed v2

  18. Place/scene recognition demo

  19. G Global gist features From scenes to objects SceneType 2 {street, office, …} S O1 O1 O1 O1 O2 O2 O2 O2 Object localization Local features L L L L I Image

  20. Global scene features predicts object location New image vg Image regions likely to contain the target

  21. Global scene features predicts object location Training set (cars) 1 } Vg , { X1 2 } Vg , { X2 The goal of the training is to learn the association between the location of the target and the global scene features 3 } Vg , { X3 4 } Vg , { X4 …

  22. Results for predicting the vertical location of people True Y Estimated Y Global scene features predicts object location vg X Results for predicting the horizontal location of people True X Estimated X

  23. The layered structure of scenes p(x) p(x2|x1) In a display with multiple targets present, the location of one target constraints the ‘y’ coordinate of the remaining targets, but not the ‘x’ coordinate.

  24. Global scene features predicts object location vg X Stronger contextual constraints can be obtained using other objects.

  25. 1

  26. 1

  27. Attentional guidance Saliency Local features Saliency models: Koch & Ullman, 85; Wolfe 94;Itti, Koch, Niebur, 98; Rosenholtz, 99

  28. Attentional guidance Saliency Local features Global features Scene prior TASK Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003

  29. Attentional guidance Saliency Local features Object model Global features Scene prior TASK

  30. Comparison regions of interest Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003

  31. 30% 20% 10% Comparison regions of interest Saliency predictions Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003

  32. 30% 20% 10% Comparison regions of interest Saliency and Global scene priors Saliency predictions Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003

  33. Comparison regions of interest 30% 20% 10% Saliency predictions Dots correspond to fixations 1-4 Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003

  34. Comparison regions of interest 30% 20% 10% Saliency predictions Saliency and Global scene priors Dots correspond to fixations 1-4 Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003

  35. 100 100 90 90 80 80 70 70 60 60 50 50 Results Scenes without people Scenes with people % of fixations inside the region 1 2 3 4 1 2 3 4 Fixation number Fixation number Contextual Region Saliency Region Chance level: 33 %

  36. Task modulation Saliency Local features Global features Scene prior TASK Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003

  37. Task modulation Saliency predictions Saliency and Global scene priors Mug search Painting search

  38. Discussion • From the computational perspective, scene context can be derived from global image properties and predict where objects are most likely to be. • Scene context considerably improves predictions of fixation locations. A complete model of attention guidance in natural scenes requires both saliency and contextual pathways

More Related