Visual Element Discovery as Discriminative Mode Seeking

Carl Doersch (CMU), Abhinav Gupta (CMU), Alexei A. Efros (UCB)


The need for mid-level representations

6 billion images

70 billion images

1 billion images served daily

10 billion images

60 hours uploaded per minute


Almost 90% of web traffic is visual!


Discriminative patches

  • Visual words are too simple

  • Objects are too difficult

  • Something in the middle?

(Felzenszwalb et al. 2008)

(Singh et al. 2012)


Mid-level “Visual Elements”

  • Simple enough to be detected easily

  • Complex enough to be meaningful

    • “Meaningful” as measured by weak labels

(Singh et al. 2012)

(Doersch et al. 2012)


Mid-level “Visual Elements”

  • Doersch et al. 2012

  • Singh et al. 2012

  • Jain et al. 2013

  • Endres et al. 2013

  • Juneja et al. 2013

  • Li et al. 2013

  • Sun et al. 2013

  • Wang et al. 2013

  • Fouhey et al. 2013

  • Lee et al. 2013

(Singh et al. 2012)

(Doersch et al. 2012)


Our goal

  • Provide a mathematical optimization for visual elements

  • Improve performance of mid-level representations.



What if the labels are weak?

  • E.g. image has horse/no-horse

  • (Or even weaker, like Paris/not-Paris)

  • Idea: Label these all as “horse”

  • Problem: 10,000 patches per image, most of which are unclassifiable.


The weaker the label, the bigger the problem.

Task: Learn to classify Paris from Not-Paris

Paris

Also Paris


Other approaches

  • Latent SVM:

    • Assumes we have one instance per positive image

  • Multiple instance learning

    • Not clear how to define the bags


What if the labels are weak?

  • Negatives are negatives, positives might not be positive

  • Most of our data can be ignored

  • First: how to cluster without clustering everything

(Singh et al. 2012)

(Doersch et al. 2012)





Patch distances

Input

Nearest neighbor

Min distance: 2.59e-4

Max distance: 1.22e-4



Negative Set

[Figure; labels: Paris, Not Paris]


Density Ratios

[Figure; labels: Paris, Not Paris]


Adaptive Bandwidth

[Figure; labels: Positive, Negative, Bandwidth]


Discriminative Mode Seeking

  • Find local optima of an estimate of the density ratio

  • Allow an adaptive bandwidth

  • Be extremely fast

    • Minimize the number of passes through the data


Discriminative Mode Seeking

  • Mean shift: maximize (w.r.t. w)

[Equation not preserved; annotated terms: w (centroid), b (bandwidth), patch feature, distance]
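The equation on this slide did not survive in the transcript. A minimal sketch, assuming the common formulation of mean shift as maximizing the sum of truncated-linear kernel responses max(b - d(x, w), 0) over all patch features x, with centroid w, bandwidth b, and Euclidean distance d. The function names and the toy data are hypothetical:

```python
import math


def euclidean(x, w):
    """Euclidean distance between a patch feature x and a centroid w."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))


def mean_shift_objective(patches, w, b):
    """Sum of truncated-linear kernel responses max(b - d(x, w), 0).

    Assumed form of the objective; patches farther than the bandwidth b
    from the centroid w contribute nothing.
    """
    return sum(max(b - euclidean(x, w), 0.0) for x in patches)


# Toy 2-D "patch features": two close together, one far away.
patches = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
score_near = mean_shift_objective(patches, w=(0.0, 0.5), b=1.0)
score_far = mean_shift_objective(patches, w=(3.0, 3.0), b=1.0)
```

A centroid placed between the two nearby patches scores higher than one that is far from every patch, so local maxima of this objective sit at modes of the patch density.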


Discriminative Mode Seeking

B(w) is the value of b satisfying:
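The constraint itself is missing from the transcript. One reading, assumed here, is that b is chosen so that the total kernel response over the negative set equals a fixed constant: the sum over negatives of max(b - d, 0) equals beta. The name beta and the function below are illustrative and not taken from the slides. Because the left-hand side is piecewise linear and increasing in b, it can be solved exactly after sorting the distances:

```python
def adaptive_bandwidth(neg_distances, beta):
    """Return b with sum(max(b - d, 0) for d in neg_distances) == beta.

    neg_distances: distances from the centroid w to each negative patch.
    beta: target kernel mass on the negative set (hypothetical parameter).
    """
    if beta <= 0 or not neg_distances:
        raise ValueError("need beta > 0 and at least one negative distance")
    ds = sorted(neg_distances)
    prefix = 0.0
    for k, d in enumerate(ds, start=1):
        prefix += d
        # If exactly the k nearest negatives are active: k*b - prefix == beta.
        b = (beta + prefix) / k
        next_d = ds[k] if k < len(ds) else float("inf")
        if b <= next_d:
            return b
    raise AssertionError("unreachable: the last segment is unbounded")


b = adaptive_bandwidth([4.0, 1.0, 2.0], beta=1.5)
```

Under this assumption the bandwidth shrinks where negatives are dense around w and grows where they are sparse.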


Discriminative Mode Seeking

  • Distance metric: Normalized Correlation

[Equation not preserved: an objective to optimize subject to (s.t.) a constraint]
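A small sketch of why this metric is convenient, assuming normalized correlation here means the dot product of mean-subtracted, unit-norm feature vectors (the helper names are hypothetical). Under that assumption, similarity to w is a plain dot product, which fits the w^T x - b form used on the Optimization slide:

```python
import math


def normalize(x):
    """Subtract the mean and scale to unit Euclidean norm."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0.0:
        raise ValueError("constant vector has no defined normalization")
    return [v / norm for v in centered]


def normalized_correlation(x, y):
    """Dot product of the normalized vectors; lies in [-1, 1]."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))


same_shape = normalized_correlation([1.0, 2.0, 3.0], [12.0, 14.0, 16.0])
opposite = normalized_correlation([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

The score ignores brightness offsets and contrast scaling of a patch feature, so two patches with the same pattern correlate at 1 even when their raw values differ.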


Discriminative Mode Seeking

[Figure; labels: Positive, Negative, w. The accompanying equation (optimize ... s.t. ...) is not preserved]


Optimization

  • Initialization is straightforward

  • For each element, just keep ~500 patches where w^T x - b > 0

  • Trivially parallelizable in MapReduce.

  • Optimization is piecewise quadratic

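The "keep ~500 patches" step can be sketched as follows. This assumes each patch is a plain feature tuple and that, when more than the budget satisfy w^T x - b > 0, the highest-scoring ones are kept. The slides do not say how the ~500 are chosen, and all names here are hypothetical:

```python
import heapq


def dot(w, x):
    """Inner product w^T x for equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def active_patches(w, b, patches, keep=500):
    """Return up to `keep` (score, index) pairs with w^T x - b > 0.

    Keeping the top scorers is an assumption made for this sketch.
    """
    scored = ((dot(w, x) - b, i) for i, x in enumerate(patches))
    positive = [(s, i) for s, i in scored if s > 0]
    return heapq.nlargest(keep, positive)


patches = [(1.0, 0.0), (0.0, 1.0), (2.0, 0.0), (0.75, 0.0)]
kept = active_patches(w=(1.0, 0.0), b=0.5, patches=patches, keep=2)
kept_indices = [i for _, i in kept]
```

Since each element only touches its own small active set, the per-element work is independent, which is consistent with the slide's remark about parallelizing in MapReduce.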


Evaluation via Purity-Coverage Plot

  • Analogous to Precision-Recall Plot


Low Purity

[Figure; panels: Element 1 to Element 5]


High purity, Low Coverage

[Figure; panels: Element 1 to Element 5]


Purity-Coverage Curve

[Figure; labels: Paris, Not Paris; axis labels: Purity, Coverage; scale note: x1e4 pixels]


Purity-Coverage Curve

  • Coverage for multiple elements is simply the union.
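A minimal sketch of the union rule, assuming each element's detections are represented as a set of (image_id, row, col) pixel tuples from the positive set. This representation and the function name are assumptions for illustration:

```python
def union_coverage(detections_per_element, total_positive_pixels):
    """Fraction of positive-set pixels covered by at least one element."""
    if total_positive_pixels <= 0:
        raise ValueError("total_positive_pixels must be positive")
    covered = set()
    for pixels in detections_per_element:
        covered |= pixels  # overlapping pixels are counted once
    return len(covered) / total_positive_pixels


element_a = {(0, 0, 0), (0, 0, 1)}
element_b = {(0, 0, 1), (0, 0, 2)}
cov = union_coverage([element_a, element_b], total_positive_pixels=10)
```

Because the pixel shared by both elements is counted once, adding a near-duplicate element barely raises coverage.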


Purity-Coverage

[Figure: two panels, Top 25 Elements and Top 200 Elements; y-axis: Purity; x-axis: Coverage (fraction of positive dataset)]

Methods compared:

  • This work

  • This work, no inter-element

  • SVM Retrained 5x (Doersch et al. 2012)

  • LDA Retrained 5x

  • LDA Retrained

  • Exemplar LDA (Hariharan et al. 2012)


Results on Indoor 67 Scenes

Kitchen

Grocery

Bowling

Bakery

Bathroom

Elevator




Indoor67: Error Analysis

Guess: staircase

Guess: grocery store

GT: corridor

Ground Truth (GT): deli

GT: laundromat

GT: museum

Guess: garage

Guess: closet


Thank you!

More results at

http://graphics.cs.cmu.edu/projects/discriminativeModeSeeking/

Paris Elements • Indoor 67 Elements • Indoor 67 Heatmaps • Source code (soon)
