Visual Element Discovery as Discriminative Mode Seeking

Carl Doersch (CMU), Abhinav Gupta (CMU), Alexei A. Efros (UCB)

The need for mid-level representations

6 billion images

70 billion images

1 billion images served daily

10 billion images

60 hours uploaded per minute

Almost 90% of web traffic is visual!

Discriminative patches
  • Visual words are too simple
  • Objects are too difficult
  • Something in the middle?

(Felzenszwalb et al. 2008)

(Singh et al. 2012)

Mid-level “Visual Elements”
  • Simple enough to be detected easily
  • Complex enough to be meaningful
    • “Meaningful” as measured by weak labels

(Singh et al. 2012)

(Doersch et al. 2012)

Mid-level “Visual Elements”
  • Doersch et al. 2012
  • Singh et al. 2012
  • Jain et al. 2013
  • Endres et al. 2013
  • Juneja et al. 2013
  • Li et al. 2013
  • Sun et al. 2013
  • Wang et al. 2013
  • Fouhey et al. 2013
  • Lee et al. 2013
Our goal
  • Formulate visual element discovery as a mathematical optimization
  • Improve performance of mid-level representations.
What if the labels are weak?
  • E.g. image has horse/no-horse
  • (Or even weaker, like Paris/not-Paris)
  • Idea: Label all of these patches as “horse”

  • Problem: 10,000 patches per image, most of which are unclassifiable.
The weaker the label, the bigger the problem.

Task: Learn to classify Paris from Not-Paris

Paris

Also Paris

Other approaches
  • Latent SVM:
    • Assumes we have one instance per positive image
  • Multiple instance learning
    • Not clear how to define the bags
What if the labels are weak?
  • Negatives are negatives, positives might not be positive
  • Most of our data can be ignored
  • First: how to cluster without clustering everything

(Singh et al. 2012)

(Doersch et al. 2012)

Patch distances

Input

Nearest neighbor

Min distance: 2.59e-4

Max distance: 1.22e-4

Discriminative Mode Seeking
  • Find local optima of an estimate of the density ratio
  • Allow an adaptive bandwidth
  • Be extremely fast
    • Minimize the number of passes through the data
Discriminative Mode Seeking
  • Mean shift: maximize (w.r.t. w) [formula not recovered]
  • Terms annotated in the formula: centroid w, bandwidth b, patch feature, distance
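The formula on this slide did not survive extraction. As a rough illustration of the terms it annotates, the sketch below sums a triangular kernel over the patches, so only patches within the bandwidth b of the centroid w contribute. The triangular kernel, the Euclidean distance, and the names `euclidean` and `mean_shift_objective` are assumptions made for this example, not taken from the slide.

```python
import math

def euclidean(x, w):
    """Euclidean distance between a patch feature x and a centroid w (assumed metric)."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))

def mean_shift_objective(patches, w, b, distance=euclidean):
    """Sum of triangular-kernel responses (assumed kernel shape).

    Each patch contributes b - distance(x, w) if it lies within the
    bandwidth b of the centroid w, and nothing otherwise.
    """
    return sum(max(b - distance(x, w), 0.0) for x in patches)

# Toy data: two patches near the origin, one far away.
patches = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
value = mean_shift_objective(patches, w=[0.0, 0.0], b=2.0)
print(value)
```

Under these assumptions, mean shift moves w to increase this sum; the far-away patch has no influence because it falls outside the bandwidth.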

Discriminative Mode Seeking

B(w) is the value of b satisfying:

Discriminative Mode Seeking
  • Distance metric: Normalized Correlation

[Formula not recovered: optimize … s.t. …]
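The objective and constraint on these slides were lost, so the following is only a sketch of how an adaptive bandwidth B(w) could be computed. It assumes that patch features are normalized so that normalized correlation reduces to a dot product w^T x, and that the constraint fixes the summed response over all patches to a constant `beta`. Both assumptions, and every function name here, are illustrative rather than the authors' formulation.

```python
def dot(w, x):
    """Dot product, standing in for normalized correlation on normalized features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def hinge_sum(scores, b):
    """Total response above the threshold b."""
    return sum(max(s - b, 0.0) for s in scores)

def solve_bandwidth(scores, beta, iters=60):
    """Find b with hinge_sum(scores, b) == beta by bisection.

    hinge_sum is continuous and non-increasing in b, so a bracket
    [lo, hi] with hinge_sum(lo) >= beta >= hinge_sum(hi) suffices.
    """
    lo = min(scores) - beta   # every score exceeds lo by at least beta
    hi = max(scores)          # no score exceeds hi, so the sum is 0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if hinge_sum(scores, mid) > beta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def element_objective(w, positives, all_patches, beta):
    """Response on the positive patches at the bandwidth implied by beta."""
    b = solve_bandwidth([dot(w, x) for x in all_patches], beta)
    return hinge_sum([dot(w, x) for x in positives], b), b

# Toy example with hypothetical 2-D features.
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[0.8, 0.2], [0.0, 1.0]]
objective, b = element_objective([1.0, 0.0], positives, positives + negatives, beta=0.3)
print(objective, b)
```

The point of the sketch is that b is not a free parameter: once w and beta are chosen, the constraint determines it, which is what makes the bandwidth adapt to the local density of patches.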

Optimization
  • Initialization is straightforward
  • For each element, just keep ~500 patches where w^T x - b > 0
  • Trivially parallelizable in MapReduce.
  • Optimization is piecewise quadratic
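The bookkeeping in the second bullet can be sketched as a filter over patch scores. The example below assumes the retained patches are the highest-scoring ones among those with a positive margin; the slide does not say how the ~500 are chosen, and `active_patches` is a hypothetical helper name.

```python
import heapq

def active_patches(patches, w, b, keep=500):
    """Return up to `keep` (margin, index) pairs with w.x - b > 0, largest margin first.

    Patches with a non-positive margin do not contribute to the
    objective, so they can be dropped between passes over the data.
    """
    margins = (
        (sum(wi * xi for wi, xi in zip(w, x)) - b, i)
        for i, x in enumerate(patches)
    )
    return heapq.nlargest(keep, (m for m in margins if m[0] > 0))

# Toy 1-D features; with b = 0.5 only patches 0, 2 and 3 are active.
patches = [[1.0], [0.2], [0.6], [0.9]]
kept = active_patches(patches, w=[1.0], b=0.5, keep=2)
print([index for _, index in kept])
```

Because each element only needs its own short list of active patches, the per-element work is independent, which fits the MapReduce remark above.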


Evaluation via Purity-Coverage Plot
  • Analogous to Precision-Recall Plot
Low Purity

[Figure: Elements 1-5]

High purity, Low Coverage

[Figure: Elements 1-5]

Purity-Coverage Curve

[Plot: Purity vs. Coverage (x1e4 pixels), with Paris and Not Paris examples]

Purity-Coverage Curve
  • Coverage for multiple elements is simply the union.
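A minimal sketch of how such a curve could be traced, by analogy with precision-recall. It assumes purity is the fraction of the top-ranked detections that come from the positive set, and coverage is the number of distinct pixels in the union of those positive detections. These definitions, the detection tuple layout, and the function name are assumptions for illustration; the slides do not spell them out.

```python
def purity_coverage_curve(detections):
    """Trace (coverage, purity) points over a ranked detection list.

    detections: list of (score, is_positive, pixels), where pixels is a
    set of hashable pixel identifiers covered by that detection.
    Detections may come from several elements; coverage is the union.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    covered = set()
    positives = 0
    curve = []
    for k, (_, is_positive, pixels) in enumerate(ranked, start=1):
        if is_positive:
            positives += 1
            covered |= pixels          # union, so overlaps are not double-counted
        curve.append((len(covered), positives / k))
    return curve

# Toy detections with hypothetical pixel ids.
detections = [
    (0.9, True, {1, 2, 3}),
    (0.8, True, {3, 4}),
    (0.5, False, {10}),
    (0.4, True, {4, 5}),
]
curve = purity_coverage_curve(detections)
print(curve)
```

Taking the union means that two elements firing on the same region add no extra coverage, so redundant elements are not rewarded.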
Purity-Coverage

[Plot, two panels: Top 25 Elements and Top 200 Elements. Axes: Purity (0.8 to 1) vs. Coverage (fraction of positive dataset). Curves: This work; This work, no inter-element; SVM Retrained 5x (Doersch et al. 2012); LDA Retrained 5x; LDA Retrained; Exemplar LDA (Hariharan et al. 2012)]

Results on Indoor 67 Scenes

[Figure panels: Kitchen, Grocery, Bowling, Bakery, Bathroom, Elevator]

Indoor67: Error Analysis

Guess: staircase

Guess: grocery store

GT: corridor

Ground Truth (GT): deli

GT: laundromat

GT: museum

Guess: garage

Guess: closet

Thank you!

More results at

http://graphics.cs.cmu.edu/projects/discriminativeModeSeeking/

Paris Elements • Indoor 67 Elements • Indoor 67 Heatmaps • Source code (soon)
