
Capturing Human Insight for Visual Learning

Capturing Human Insight for Visual Learning. Kristen Grauman, Department of Computer Science, University of Texas at Austin. Frontiers in Computer Vision Workshop, MIT, August 22, 2011.


Presentation Transcript


  1. Capturing Human Insight for Visual Learning. Kristen Grauman, Department of Computer Science, University of Texas at Austin. Frontiers in Computer Vision Workshop, MIT, August 22, 2011. Work with Sudheendra Vijayanarasimhan, Adriana Kovashka, Devi Parikh, Prateek Jain, Sung Ju Hwang, and Jeff Donahue.

  2. Problem: how to capture human insight about the visual world? • The point-and-label “mold” is restrictive • Human effort is expensive. [Diagram: an annotator facing the complex space of visual objects, activities, and scenes; tiny image montage by Torralba et al.]

  3. Problem: how to capture human insight about the visual world? • Our approach: Ask (actively learn) and Listen (explanations, comparisons, implied cues, …). [Diagram: an annotator facing the complex space of visual objects, activities, and scenes; tiny image montage by Torralba et al.]

  4. Deepening human communication to the system. Example queries: What is this? Is it ‘furry’? What’s worth mentioning? How do you know? What property is changing here? Do you find him attractive? Why? Which is more ‘open’? [Donahue & Grauman, ICCV 2011; Hwang & Grauman, BMVC 2010; Parikh & Grauman, ICCV 2011, CVPR 2011; Kovashka et al., ICCV 2011]

  5. Soliciting rationales • We propose to ask the annotator not just what, but also why. Example questions: Is her form perfect? Is the team winning? Is it a safe route? In each case: how can you tell?

  6. Soliciting rationales. Annotation task: Is her form perfect? How can you tell? Annotators give spatial rationales (image regions they point to, e.g., pointed toes, balanced pose, knee angled, falling) and attribute rationales (e.g., “balanced” supports good form; “falling” and “knee angled” support bad form). Each rationale is turned into a synthetic contrast example that influences the classifier [Zaidan et al., HLT 2007]. [Donahue & Grauman, ICCV 2011]
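To make the contrast-example idea concrete, here is a minimal Python sketch of one common simplification of Zaidan-et-al.-style rationale constraints: each example's rationale-highlighted features are used to build a weighted pseudo-example that pushes the classifier to rely on the highlighted evidence. The function name `add_contrast_examples`, the margin parameter `mu`, and the toy data are all hypothetical, and this weighted-augmentation approximation is a stand-in for, not a reproduction of, the constrained SVM used in the papers.

```python
import numpy as np
from sklearn.svm import LinearSVC

def add_contrast_examples(X, y, rationale_masks, mu=2.0, weight=0.5):
    """Augment (X, y) with synthetic contrast pseudo-examples.

    For each example x with rationale mask r (1 = feature the annotator
    highlighted), the contrast example v is x with those features zeroed.
    Adding (x - v) / mu with the same label and a smaller weight nudges
    the learned w to separate x from v by a margin -- a simplified
    stand-in for the rationale-constrained SVM.
    """
    contrasts = (X * rationale_masks) / mu        # equals (x - v) / mu
    X_aug = np.vstack([X, contrasts])
    y_aug = np.concatenate([y, y])
    w_aug = np.concatenate([np.ones(len(y)), weight * np.ones(len(y))])
    return X_aug, y_aug, w_aug

# Toy usage: 4 examples, 5 features; rationales highlight feature 0 or 3.
X = np.array([[1.0, 0, 0, 0.0, 1.0], [0.9, 0, 0, 0.1, 0.0],
              [0.0, 0, 0, 1.0, 0.2], [0.1, 0, 0, 0.8, 0.0]])
y = np.array([1, 1, -1, -1])
masks = np.array([[1, 0, 0, 0, 0], [1, 0, 0, 0, 0],
                  [0, 0, 0, 1, 0], [0, 0, 0, 1, 0]])

X_aug, y_aug, w_aug = add_contrast_examples(X, y, masks)
clf = LinearSVC(C=1.0).fit(X_aug, y_aug, sample_weight=w_aug)
```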

  7. Rationale results. Rationales were collected from hundreds of MTurk workers for three tasks: • Scene Categories: How can you tell the scene category? • Hot or Not: What makes them hot (or not)? • Public Figures: What attributes make them (un)attractive? [Donahue & Grauman, ICCV 2011]

  8. Rationale results. [Chart: mean average precision (AP) comparison.] [Donahue & Grauman, ICCV 2011]

  9. Learning what to mention • Issue: presence of objects ≠ significance • Our idea: learn a cross-modal representation that accounts for “what to mention.” Visual cues: texture, scene, color, … Textual cues: word frequency, relative order, mutual proximity. [Example tag lists: Cow, Birds, Architecture, Water, Sky / Birds, Architecture, Water, Cow, Sky, Tiles.] Training data: human-given descriptions.
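As a rough illustration of the textual cues listed above (not the paper's exact features), the hypothetical helper below computes a per-tag corpus frequency, relative order, and a simple rank-distance proxy for mutual proximity from an ordered tag list; all names and toy counts are invented for the example.

```python
from collections import Counter

def tag_features(tag_list, corpus_counts, total_images):
    """Per-tag textual cues of the kind listed on the slide:
    corpus frequency, relative order in this image's tag list, and
    average rank distance to the other tags (a proximity proxy)."""
    n = len(tag_list)
    feats = {}
    for i, tag in enumerate(tag_list):
        frequency = corpus_counts[tag] / total_images   # how common the tag is overall
        relative_order = i / max(n - 1, 1)              # 0.0 = mentioned first
        proximity = sum(abs(i - j) for j in range(n) if j != i) / max(n - 1, 1)
        feats[tag] = (frequency, relative_order, proximity)
    return feats

# Toy usage with invented corpus statistics.
corpus = Counter({"cow": 120, "birds": 300, "water": 800,
                  "sky": 900, "architecture": 150})
print(tag_features(["cow", "birds", "architecture", "water", "sky"],
                   corpus, total_images=1000))
```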

  10. Learning what to mention. [Diagram: the visual view (x) and the textual view (y) are projected into a shared, importance-aware semantic space.] [Hwang & Grauman, BMVC 2010]
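The shared space in the paper is learned with kernel CCA over the two views; as a hedged sketch of the idea only, the snippet below uses plain linear CCA from scikit-learn on hypothetical visual and textual feature matrices and then retrieves neighbors in the learned space.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Hypothetical stand-ins: visual descriptors (view x) and tag/importance
# descriptors (view y) for the same 200 training images.
X_visual = rng.normal(size=(200, 50))
Y_textual = rng.normal(size=(200, 20))

# Project both views into a shared semantic space (the paper uses kernel
# CCA; plain linear CCA is shown here for brevity).
cca = CCA(n_components=10)
cca.fit(X_visual, Y_textual)
Xc, Yc = cca.transform(X_visual, Y_textual)

# Retrieval: embed a new image and rank the training images' textual
# embeddings by distance in the shared space.
query = rng.normal(size=(1, 50))
q = cca.transform(query)
dists = np.linalg.norm(Yc - q, axis=1)
nearest = np.argsort(dists)[:5]
```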

  11. Learning what to mention: results. [Figure: retrievals for each query image, comparing a visual-only baseline, a words+visual baseline, and our method.] [Hwang & Grauman, BMVC 2010]

  12. Problem: how to capture human insight about the visual world? • Our approach: Ask (actively learn) and Listen (explanations, comparisons, implied cues). [Diagram: an annotator facing the complex space of visual objects, activities, and scenes; tiny image montage by Torralba et al.]

  13. Traditional active learning • At each cycle, obtain a label for the most informative or uncertain example. [Mackay 1992; Freund et al. 1997; Tong & Koller 2001; Lindenbaum et al. 2004; Kapoor et al. 2007; …] [Diagram: the current model actively selects an example from the unlabeled pool, the annotator labels it, and the labeled data updates the model.]
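A minimal sketch of this traditional loop, assuming a linear SVM and simple margin-based uncertainty sampling (the cited works differ in their exact selection criteria); `oracle` stands in for the human annotator and is hypothetical, and `X_pool` is assumed to be a NumPy array.

```python
import numpy as np
from sklearn.svm import LinearSVC

def active_learning_loop(X_seed, y_seed, X_pool, oracle, n_rounds=10):
    """Pool-based active learning: at each cycle, retrain the current
    model and request a label for the most uncertain unlabeled example
    (here, the one closest to the SVM decision boundary)."""
    X_l, y_l = list(X_seed), list(y_seed)     # seed labeled set (both classes assumed present)
    unlabeled = list(range(len(X_pool)))
    clf = None
    for _ in range(n_rounds):
        clf = LinearSVC(C=1.0).fit(np.array(X_l), np.array(y_l))
        margins = np.abs(clf.decision_function(X_pool[unlabeled]))
        pick = unlabeled[int(np.argmin(margins))]   # most uncertain example
        X_l.append(X_pool[pick])
        y_l.append(oracle(pick))                    # ask the annotator
        unlabeled.remove(pick)
    return clf
```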

  14. Challenges in active visual learning • Annotation tasks vary in cost and informativeness • Multiple annotators working in parallel • Massive unlabeled pools of data. [Diagram: the active-learning loop, with varying dollar costs attached to candidate annotations.] [Vijayanarasimhan & Grauman, NIPS 2008, CVPR 2009; Vijayanarasimhan et al., CVPR 2010, CVPR 2011; Kovashka et al., ICCV 2011]
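One way to make the first challenge concrete is to rank candidate questions by estimated benefit per unit of annotation effort. This is only an illustrative stand-in: the cited work formulates a more principled value-of-information criterion that trades predicted risk reduction against predicted cost, and every name and number below is invented.

```python
import numpy as np

def select_question(candidates, predicted_risk_reduction, annotation_cost):
    """Rank candidate (example, annotation-type) questions by expected
    benefit per unit of annotator effort; both arrays are assumed to be
    estimates from the current model and a cost predictor."""
    scores = predicted_risk_reduction / annotation_cost
    return candidates[int(np.argmax(scores))]

# Toy usage: a full segmentation is informative but costly; a quick
# image-level tag may be the better question to ask.
candidates = ["segment image 17", "tag image 42", "tag image 7"]
benefit = np.array([0.9, 0.35, 0.30])   # estimated risk reduction
cost = np.array([50.0, 5.0, 5.0])       # estimated seconds of effort
print(select_question(candidates, benefit, cost))   # -> "tag image 42"
```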

  15. Sub-linear time active selection. We propose a novel hashing approach to identify the most uncertain examples in sub-linear time: the current classifier’s hyperplane is hashed into a table built over the unlabeled data to retrieve the actively selected examples. For 4.5 million unlabeled instances, this takes about 10 minutes of machine time per iteration, versus roughly 60 hours for a naïve scan. [Jain, Vijayanarasimhan & Grauman, NIPS 2010]
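Below is a simplified sketch in the spirit of the two-bit hyperplane hash family from the cited NIPS 2010 paper: database points are hashed with (sign(u·x), sign(v·x)) while the classifier’s hyperplane normal w is hashed with (sign(u·w), sign(−v·w)), so points lying near the decision boundary tend to collide with the query. For clarity the snippet brute-forces the bucket comparison; a real implementation would use actual hash tables to get the sub-linear lookup, and all sizes here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def h_hash_bits(Z, U, V, is_query=False):
    """Two-bit codes per (u, v) pair: database points use
    (sign(u.x), sign(v.x)); a hyperplane query with normal w uses
    (sign(u.w), sign(-v.w)), so points nearly perpendicular to w
    (i.e., near the decision boundary) collide more often."""
    a = Z @ U.T > 0                                   # (n, k) first bit
    b = (Z @ (V.T if not is_query else -V.T)) > 0     # (n, k) second bit
    return np.stack([a, b], axis=-1)                  # (n, k, 2) boolean codes

d, k, n = 64, 16, 100_000
U, V = rng.normal(size=(k, d)), rng.normal(size=(k, d))
X_unlabeled = rng.normal(size=(n, d))
w = rng.normal(size=(1, d))        # current classifier's hyperplane normal

db_codes = h_hash_bits(X_unlabeled, U, V)
q_code = h_hash_bits(w, U, V, is_query=True)

# Candidates = points whose codes collide with the query on the most
# (u, v) pairs; then rank the short list by true distance to the boundary.
matches = (db_codes == q_code).all(axis=-1).sum(axis=1)
candidates = np.argsort(-matches)[:100]
margins = np.abs(X_unlabeled[candidates] @ w.ravel())
most_uncertain = candidates[int(np.argmin(margins))]
```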

  16. Live active learning results on a Flickr test set: live active learning outperforms the status quo data collection approach. [Vijayanarasimhan & Grauman, CVPR 2011]

  17. Summary • Humans are not simply “label machines” • Widen access to visual knowledge • New forms of input, often requiring associated new learning algorithms • Manage large-scale annotation efficiently • Cost-sensitive active question asking • Live learning: moving beyond canned datasets
