Visual element discovery as discriminative mode seeking

Visual Element Discovery as Discriminative Mode Seeking

Carl Doersch (CMU), Abhinav Gupta (CMU), Alexei A. Efros (UCB)


The need for mid-level representations

6 billion images

70 billion images

1 billion images served daily

10 billion images

60 hours uploaded per minute

Almost 90% of web traffic is visual!


Discriminative patches

  • Visual words are too simple

  • Objects are too difficult

  • Something in the middle?

(Felzenszwalb et al. 2008)

(Singh et al. 2012)


Mid-level “Visual Elements”

  • Simple enough to be detected easily

  • Complex enough to be meaningful

    • “Meaningful” as measured by weak labels

(Singh et al. 2012)

(Doersch et al. 2012)


Mid-level “Visual Elements”

  • Doersch et al. 2012

  • Singh et al. 2012

  • Jain et al. 2013

  • Endres et al. 2013

  • Juneja et al. 2013

  • Li et al. 2013

  • Sun et al. 2013

  • Wang et al. 2013

  • Fouhey et al. 2013

  • Lee et al. 2013

(Singh et al. 2012)

(Doersch et al. 2012)


Our goal

  • Provide a mathematical optimization for visual elements

  • Improve performance of mid-level representations.


Elements as Patch Classifiers


What if the labels are weak?

  • E.g. image has horse/no-horse

  • (Or even weaker, like Paris/not-Paris)

  • Idea: Label these all as “horse”

  • Problem: 10,000 patches per image, most of which are unclassifiable.


The weaker the label, the bigger the problem.

Task: Learn to classify Paris from Not-Paris

Paris

Also Paris


Other approaches

  • Latent SVM:

    • Assumes we have one instance per positive image

  • Multiple instance learning

    • Not clear how to define the bags


What if the labels are weak?

  • Negatives are negatives, positives might not be positive

  • Most of our data can be ignored

  • First: how to cluster without clustering everything

(Singh et al. 2012)

(Doersch et al. 2012)


Mean shift
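
As a reminder of the baseline, here is a minimal sketch of mean shift with a flat kernel. It assumes Euclidean distance and a fixed bandwidth; the function names and the flat kernel are illustrative choices, not taken from the slides.

```python
import numpy as np

def mean_shift_step(center, points, bandwidth):
    """Move `center` to the mean of the points within `bandwidth` of it."""
    dists = np.linalg.norm(points - center, axis=1)
    inside = points[dists <= bandwidth]
    # If the window is empty, stay put instead of averaging nothing.
    return inside.mean(axis=0) if len(inside) else center

def mean_shift(start, points, bandwidth, tol=1e-6, max_iter=100):
    """Iterate window-mean updates until the centre stops moving."""
    points = np.asarray(points, dtype=float)
    center = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        new_center = mean_shift_step(center, points, bandwidth)
        shift = np.linalg.norm(new_center - center)
        center = new_center
        if shift < tol:
            break
    return center
```

Each start point only touches the data inside its own window, which is why mode seeking can find clusters without clustering everything.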


Patch distances

Input

Nearest neighbor

Min distance: 2.59e-4

Max distance: 1.22e-4


Mean shift


Paris

Not Paris

Negative Set


Paris

Not Paris

Density Ratios


Adaptive Bandwidth

[Figure labels: Positive, Negative, Bandwidth]


Discriminative Mode Seeking

  • Find local optima of an estimate of the density ratio

  • Allow an adaptive bandwidth

  • Be extremely fast

    • Minimize the number of passes through the data
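
To make "an estimate of the density ratio" concrete, here is a rough sketch that scores a candidate centre by the ratio of kernel mass from positive patches to kernel mass from negative patches. The triangular kernel, the Euclidean distance, the `eps` guard, and all names are assumptions made for illustration.

```python
import numpy as np

def kernel_mass(center, points, bandwidth):
    """Sum of a triangular kernel max(bandwidth - distance, 0) over `points`."""
    dists = np.linalg.norm(np.asarray(points, dtype=float) - center, axis=1)
    return np.maximum(bandwidth - dists, 0.0).sum()

def density_ratio(center, positives, negatives, bandwidth, eps=1e-12):
    """Positive kernel mass divided by negative kernel mass at `center`.

    `eps` (an assumption) avoids division by zero when no negative
    patch falls inside the window.
    """
    center = np.asarray(center, dtype=float)
    pos = kernel_mass(center, positives, bandwidth)
    neg = kernel_mass(center, negatives, bandwidth)
    return pos / (neg + eps)
```

A mode of this ratio is a region that is dense in the positive set and sparse in the negative set, rather than merely dense overall.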


Discriminative Mode Seeking

  • Mean shift: maximize (w.r.t. w)

[Equation annotations: w is the centroid, b is the bandwidth; the remaining terms are the patch feature and the distance]
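
One way to write a mean-shift objective that matches these annotations, assuming a triangular kernel. The symbols x_i (patch feature), d (distance), and n (number of patches) are introduced here for illustration and may differ from the slide.

```latex
\max_{w} \; \sum_{i=1}^{n} \max\bigl(b - d(x_i, w),\; 0\bigr)
```

Under this reading, only patches closer to the centroid w than the bandwidth b contribute to the sum.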


Discriminative Mode Seeking

B(w) is the value of b satisfying:
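
The condition itself is not recoverable from this transcript. As a sketch only, assume the bandwidth is chosen so that the negative set contributes a fixed kernel mass `beta` around the centroid; `beta`, the triangular kernel, and the function names are all assumptions. Because the mass grows monotonically with b, bisection finds that value.

```python
import numpy as np

def negative_mass(b, neg_dists):
    """Triangular-kernel mass of the negatives for bandwidth b."""
    return np.maximum(b - neg_dists, 0.0).sum()

def adaptive_bandwidth(neg_dists, beta, iters=60):
    """Find b such that negative_mass(b) == beta (assumed beta > 0).

    neg_dists: distances from the centroid to every negative patch.
    """
    neg_dists = np.asarray(neg_dists, dtype=float)
    lo = neg_dists.min()          # mass is 0 at the nearest negative
    hi = neg_dists.min() + beta   # nearest negative alone gives beta here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if negative_mass(mid, neg_dists) < beta:
            lo = mid
        else:
            hi = mid
    return hi
```

Under this assumption the window is narrow where negatives are dense and wide where they are sparse, which is the adaptive behaviour the earlier slide asks for.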


Discriminative Mode Seeking

  • Distance metric: Normalized Correlation
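
A small sketch of normalized correlation between a patch feature and a centroid, assuming both are mean-centred and scaled to unit length before the dot product (the helper names are illustrative). With normalized vectors, similarity reduces to a plain inner product w^T x, so a larger inner product means a smaller distance.

```python
import numpy as np

def normalize(x):
    """Subtract the mean and scale to unit Euclidean norm.

    Undefined for constant vectors, whose centred norm is zero.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return x / np.linalg.norm(x)

def normalized_correlation(x, w):
    """Inner product of the two normalized vectors, in [-1, 1]."""
    return float(normalize(x) @ normalize(w))
```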

optimize

s.t.


Discriminative Mode Seeking

[Figure labels: Positive, Negative, w]

optimize

s.t.


Optimization

  • Initialization is straightforward

  • For each element, just keep around ~500 patches where w^T x - b > 0

  • Trivially parallelizable in MapReduce.

  • Optimization is piecewise quadratic
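
The bookkeeping in the bullets above can be sketched as a bounded min-heap per element: stream over batches of patches and retain only the highest-scoring ones with w^T x - b > 0. The heap, the batch interface, and the names are assumptions for illustration; this is not the authors' MapReduce code.

```python
import heapq
import numpy as np

def update_top_patches(heap, w, b, patches, patch_ids, k=500):
    """Keep the k best (score, patch_id) pairs with score = w^T x - b > 0.

    heap: list used as a min-heap, so heap[0] is the weakest kept patch.
    patches: array of shape (num_patches, dim); patch_ids: matching ids.
    """
    scores = np.asarray(patches, dtype=float) @ np.asarray(w, dtype=float) - b
    for score, pid in zip(scores, patch_ids):
        if score <= 0:
            continue  # patch does not fire for this element
        item = (float(score), pid)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heappushpop(heap, item)  # evict the current weakest
    return heap
```

Each element has its own heap and each batch can be scored independently, so per-batch heaps can be built in parallel and merged afterwards, for example with `heapq.nlargest(k, heap_a + heap_b)`.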


Evaluation via Purity-Coverage Plot

  • Analogous to Precision-Recall Plot


Low Purity

[Figure: Element 1 through Element 5]


High purity, Low Coverage

[Figure: Element 1 through Element 5]


Purity-Coverage Curve

[Plot: axes Purity and Coverage; scale label x1e4 pixels; image labels: Paris, Not Paris]


Purity-Coverage Curve

  • Coverage for multiple elements is simply the union.
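
A sketch of how such a curve could be computed, under assumed definitions: purity is the fraction of the top-ranked detections that come from positive images, and coverage is the fraction of positive-set pixels inside the union of the positive detections so far. The input format and names are hypothetical.

```python
def purity_coverage_curve(detections, total_positive_pixels):
    """Return a list of (coverage, purity) points, one per detection.

    detections: iterable of (score, is_positive, pixels), where `pixels`
    is a set of ids of the pixels the detection covers.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    covered = set()   # union of pixels covered in positive images
    n_positive = 0
    curve = []
    for rank, (_, is_positive, pixels) in enumerate(ranked, start=1):
        if is_positive:
            n_positive += 1
            covered |= pixels  # union, so overlaps are counted once
        curve.append((len(covered) / total_positive_pixels, n_positive / rank))
    return curve
```

Taking the union means that overlapping detections, whether from one element or several, never count the same pixel twice.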


Purity-Coverage

[Plots: purity (0.8 to 1) against coverage (fraction of positive dataset), one panel for the Top 25 Elements (coverage 0 to 0.5) and one for the Top 200 Elements (coverage 0 to 0.8). Methods compared: This work; This work, no inter-element; SVM Retrained 5x (Doersch et al. 2012); LDA Retrained 5x; LDA Retrained; Exemplar LDA (Hariharan et al. 2012)]


Results on Indoor 67 Scenes

Kitchen • Grocery • Bowling • Bakery • Bathroom • Elevator


Results on Indoor 67 Scenes


Qualitative Indoor67 Results


Indoor67: Error Analysis

Guess: staircase

Guess: grocery store

GT: corridor

Ground Truth (GT): deli

GT: laundromat

GT: museum

Guess: garage

Guess: closet


Thank you!

More results at

http://graphics.cs.cmu.edu/projects/discriminativeModeSeeking/

Paris Elements • Indoor 67 Elements

Indoor 67 Heatmaps • Source code (soon)


Some New Paris Elements

