Visual Element Discovery as Discriminative Mode Seeking

Carl Doersch (CMU), Abhinav Gupta (CMU), Alexei A. Efros (UCB)


The need for mid-level representations

  • 6 billion images

  • 70 billion images

  • 1 billion images served daily

  • 10 billion images

  • 60 hours uploaded per minute

Almost 90% of web traffic is visual!


Discriminative patches

  • Visual words are too simple

  • Objects are too difficult

  • Something in the middle?

(Felzenszwalb et al. 2008)

(Singh et al. 2012)


Mid-level “Visual Elements”

  • Simple enough to be detected easily

  • Complex enough to be meaningful

    • “Meaningful” as measured by weak labels

(Singh et al. 2012)

(Doersch et al. 2012)


Mid-level “Visual Elements”

  • Doersch et al. 2012

  • Singh et al. 2012

  • Jain et al. 2013

  • Endres et al. 2013

  • Juneja et al. 2013

  • Li et al. 2013

  • Sun et al. 2013

  • Wang et al. 2013

  • Fouhey et al. 2013

  • Lee et al. 2013

(Singh et al. 2012)

(Doersch et al. 2012)


Our goal

  • Formulate visual element discovery as a well-defined mathematical optimization

  • Improve the performance of mid-level representations


Elements as Patch Classifiers
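As a minimal sketch of this idea, assume an element is a linear classifier over patch feature vectors: a weight vector `w` and a threshold `b`, firing on a patch `x` when `w^T x - b > 0` (the same condition used on the Optimization slide). The class and function names below are hypothetical, and the toy feature values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Element:
    """Hypothetical visual element: a linear classifier over patch features."""
    w: Sequence[float]  # weight vector, same length as a patch feature
    b: float            # detection threshold

    def score(self, x: Sequence[float]) -> float:
        # Linear detection score w^T x - b.
        return sum(wi * xi for wi, xi in zip(self.w, x)) - self.b

    def fires(self, x: Sequence[float]) -> bool:
        # The element "detects" the patch when the score is strictly positive.
        return self.score(x) > 0


def detections(element: Element, patches: Sequence[Sequence[float]]) -> list[int]:
    """Indices of the patches on which the element fires."""
    return [i for i, x in enumerate(patches) if element.fires(x)]


elem = Element(w=[1.0, 0.0], b=0.5)
toy_patches = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.6]]
print(detections(elem, toy_patches))  # [0, 2]
```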


What if the labels are weak?

  • E.g., each image is labeled horse / no horse

  • (Or even weaker, like Paris / not Paris)

  • Idea: label all of these patches as “horse”

  • Problem: 10,000 patches per image, most of which are unclassifiable.


The weaker the label, the bigger the problem.

Task: learn to distinguish Paris from not-Paris images

(Figure panels: “Paris” and “Also Paris”)


Other approaches

  • Latent SVM:

    • Assumes we have one instance per positive image

  • Multiple instance learning

    • Not clear how to define the bags


What if the labels are weak?

  • Negatives are truly negative, but positives might not be positive

  • Most of our data can be ignored

  • First: how to cluster without clustering everything

(Singh et al. 2012)

(Doersch et al. 2012)


Mean shift
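For reference, classic mean shift starts a centroid `w` somewhere in feature space and repeatedly moves it to the mean of the data points within bandwidth `b` of it, which climbs to a local mode of the data density. The sketch below assumes a flat (uniform) kernel and Euclidean distance purely for illustration; it is generic mean shift, not the discriminative variant these slides build toward.

```python
import math
from typing import Sequence


def mean_shift(points: Sequence[Sequence[float]], start: Sequence[float],
               bandwidth: float, max_iter: int = 100,
               tol: float = 1e-9) -> list[float]:
    """Move `start` uphill to a local density mode (flat kernel, Euclidean distance)."""
    w = list(start)
    for _ in range(max_iter):
        # Points inside the window of radius `bandwidth` around the centroid.
        window = [p for p in points if math.dist(p, w) <= bandwidth]
        if not window:
            break  # empty window: no data to shift toward
        # Shift the centroid to the mean of the points in the window.
        new_w = [sum(coords) / len(window) for coords in zip(*window)]
        if math.dist(new_w, w) < tol:
            return new_w  # converged to a fixed point (a local mode)
        w = new_w
    return w


# Two clusters in 1-D: each start converges to the nearby mode.
data = [[0.0], [0.1], [0.2], [5.0], [5.1]]
print(mean_shift(data, [0.5], bandwidth=1.0))  # approximately [0.1]
print(mean_shift(data, [4.5], bandwidth=1.0))  # approximately [5.05]
```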


Patch Distances

(Figure panels: input patch, nearest neighbor)

Min distance: 2.59e-4

Max distance: 1.22e-4


Mean shift


Negative Set

(Figure panels: Paris, Not Paris)



Density Ratios

(Figure panels: Paris, Not Paris)
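One hedged way to make "density ratio" concrete: near a candidate centroid `w`, compare how much kernel mass the positive (Paris) patches have against the negative (not-Paris) patches. The sketch assumes a triangular kernel `max(b - d, 0)` with Euclidean distance `d` and a fixed bandwidth `b`; the kernel, the distance, and the small `eps` guard are illustrative choices, not details recovered from the slides. Modes of this ratio sit where positives are dense and negatives are rare, rather than simply where patches are common.

```python
import math
from typing import Sequence

Points = Sequence[Sequence[float]]


def kernel_mass(points: Points, w: Sequence[float], b: float) -> float:
    """Unnormalized kernel density at w: sum of max(b - d(x, w), 0) over points."""
    return sum(max(b - math.dist(x, w), 0.0) for x in points)


def density_ratio(positives: Points, negatives: Points, w: Sequence[float],
                  b: float, eps: float = 1e-12) -> float:
    """Positive kernel mass divided by negative kernel mass around w."""
    return kernel_mass(positives, w, b) / (kernel_mass(negatives, w, b) + eps)


pos = [[0.0], [0.1]]   # toy "Paris" patch features
neg = [[0.0], [3.0]]   # toy "not Paris" patch features
print(density_ratio(pos, neg, [0.0], b=1.0))  # about 1.9
```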



Adaptive Bandwidth

(Figure panels: Positive, Negative; bandwidth)


Discriminative Mode Seeking

  • Find local optima of an estimate of the density ratio

  • Allow an adaptive bandwidth

  • Be extremely fast

    • Minimize the number of passes through the data


Discriminative Mode Seeking

  • Mean shift: maximize (w.r.t. w)

(Equation figure; labeled terms: w = centroid, b = bandwidth, patch feature, distance)


Discriminative Mode Seeking

B(w) is the value of b satisfying: (equation figure)
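The equation defining B(w) did not survive extraction. As a hedged sketch, assume it is an adaptive-bandwidth rule that fixes the total margin of the negative patches, sum_j max(w^T x_j^- - b, 0) = beta, for a chosen budget beta > 0, and that the element then scores the positives by sum_i max(w^T x_i^+ - B(w), 0). The negative margin is continuous and non-increasing in b, so B(w) can be found exactly by sorting the negative scores and solving on the linear piece that contains beta. The names `solve_bandwidth`, `objective`, and `beta` are hypothetical, and any unit-norm constraint on `w` is left to the caller.

```python
from typing import Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def negative_margin(neg_scores: Sequence[float], b: float) -> float:
    """Sum over negatives of max(score - b, 0); non-increasing in b."""
    return sum(max(s - b, 0.0) for s in neg_scores)


def solve_bandwidth(neg_scores: Sequence[float], beta: float) -> float:
    """Return b with negative_margin(neg_scores, b) == beta (assumed form of B(w))."""
    if beta <= 0 or not neg_scores:
        raise ValueError("need beta > 0 and at least one negative score")
    s = sorted(neg_scores, reverse=True)
    prefix = 0.0
    for k in range(1, len(s) + 1):
        prefix += s[k - 1]
        # On the piece where exactly the top-k negatives exceed b,
        # the margin equals prefix - k * b; solve prefix - k * b = beta.
        b = (prefix - beta) / k
        if k == len(s) or b >= s[k]:
            return b
    raise AssertionError("unreachable: the last piece always returns")


def objective(w: Sequence[float], positives, negatives, beta: float) -> float:
    """Assumed objective: total positive margin above the adaptive threshold B(w)."""
    b = solve_bandwidth([dot(w, x) for x in negatives], beta)
    return sum(max(dot(w, x) - b, 0.0) for x in positives)


negs = [[3.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
poss = [[3.0, 0.0], [2.5, 1.0]]
print(objective([1.0, 0.0], poss, negs, beta=1.0))  # 1.5 (B(w) = 2.0 here)
```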


Discriminative Mode Seeking

  • Distance metric: Normalized Correlation

(Equation figure: an objective to optimize, subject to (s.t.) a constraint)
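Normalized correlation between two feature vectors can be sketched as: subtract each vector's mean, scale it to unit length, and take the dot product, giving a value in [-1, 1] that is unchanged by adding a constant to, or positively rescaling, either vector. Turning it into a distance as `1 - correlation` is one common choice and an assumption here; any decreasing function of the correlation ranks neighbors the same way. The function names are hypothetical.

```python
import math
from typing import Sequence


def standardize(v: Sequence[float]) -> list[float]:
    """Center v to zero mean and scale it to unit Euclidean norm."""
    mean = sum(v) / len(v)
    centered = [x - mean for x in v]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("a constant vector has no defined correlation")
    return [c / norm for c in centered]


def normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of standardized vectors; lies in [-1, 1]."""
    return sum(x * y for x, y in zip(standardize(a), standardize(b)))


def nc_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed distance: 0 for perfectly correlated vectors, 2 for anti-correlated."""
    return 1.0 - normalized_correlation(a, b)


print(normalized_correlation([1, 2, 3], [12, 14, 16]))  # about 1.0: offset and scale ignored
print(normalized_correlation([1, 2, 3], [3, 2, 1]))     # about -1.0
```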


Discriminative Mode Seeking

(Figure panels: Positive, Negative, w; equation: an objective to optimize, subject to (s.t.) a constraint)


Optimization

  • Initialization is straightforward

  • For each element, keep only the ~500 patches where w^T x - b > 0

  • Trivially parallelizable in MapReduce.

  • Optimization is piecewise quadratic
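The retention step in the second bullet can be sketched as a single pass: score every patch, keep only those with w^T x - b > 0, and cap the retained set at roughly 500 by keeping the highest scorers. Because each element's retained set depends only on its own (w, b), a map step can run per shard of patches and a reduce step just merges the per-shard lists, which is one way to read "trivially parallelizable in MapReduce". The shard layout, the cap, and all function names are illustrative assumptions.

```python
import heapq
from typing import Iterable, Sequence

Patch = tuple[int, Sequence[float]]  # (patch id, feature vector)
Scored = tuple[float, int]           # (score w^T x - b, patch id)


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) - b


def map_shard(w: Sequence[float], b: float, shard: Iterable[Patch],
              cap: int = 500) -> list[Scored]:
    """Map step: score one shard, keep firing patches (score > 0), top `cap` only."""
    scored = ((score(w, b, x), pid) for pid, x in shard)
    return heapq.nlargest(cap, (s for s in scored if s[0] > 0))


def reduce_shards(partials: Iterable[list[Scored]], cap: int = 500) -> list[Scored]:
    """Reduce step: merge the per-shard lists and keep the overall top `cap`."""
    return heapq.nlargest(cap, (s for part in partials for s in part))


shards = [[(0, [0.9]), (1, [0.1])], [(2, [2.0]), (3, [0.6])]]
kept = reduce_shards((map_shard([1.0], 0.5, sh, cap=2) for sh in shards), cap=2)
print([pid for _, pid in kept])  # [2, 0]
```

Keeping the top `cap` per shard before merging is safe: any patch in the global top `cap` is necessarily in its own shard's top `cap`.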


Evaluation via Purity-Coverage Plot

  • Analogous to Precision-Recall Plot
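Reading the analogy literally, purity plays the role of precision: of an element's highest-scoring detections, what fraction come from the positive set (e.g. Paris)? A minimal sketch, assuming each detection is reduced to a `(score, is_positive)` pair; the names and the toy data are hypothetical.

```python
from typing import Sequence

Detection = tuple[float, bool]  # (detector score, came from a positive image?)


def purity_curve(detections: Sequence[Detection]) -> list[float]:
    """Purity after each detection, taken in descending score order (like precision@k)."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    curve: list[float] = []
    hits = 0
    for k, (_, positive) in enumerate(ranked, start=1):
        hits += int(positive)
        curve.append(hits / k)
    return curve


dets = [(0.9, True), (0.8, False), (0.7, True), (0.1, False)]
print(purity_curve(dets))  # [1.0, 0.5, 0.6666666666666666, 0.5]
```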


Low Purity

(Figure panels: Elements 1-5)


High Purity, Low Coverage

(Figure panels: Elements 1-5)


Purity-Coverage Curve

(Figure panels: Paris, Not Paris; axes: purity vs. coverage, in units of 1e4 pixels)



Purity-Coverage Curve

  • Coverage for multiple elements is simply the union.
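A hedged sketch of "coverage is the union": represent each detection as a box of pixels in some positive image, collect the pixels every element covers, and divide the size of the union by the total number of positive pixels. Overlapping detections are therefore counted once, so two elements firing on the same region add nothing extra. The box format and names are assumptions; a set of pixels is simple but memory-hungry, so a larger-scale version would more likely use per-image masks.

```python
from typing import Iterable

Box = tuple[int, int, int, int, int]  # (image id, x0, y0, x1, y1), half-open pixel ranges
Pixel = tuple[int, int, int]          # (image id, x, y)


def box_pixels(box: Box) -> set[Pixel]:
    img, x0, y0, x1, y1 = box
    return {(img, x, y) for x in range(x0, x1) for y in range(y0, y1)}


def coverage(element_detections: Iterable[Iterable[Box]],
             total_positive_pixels: int) -> float:
    """Fraction of positive pixels covered by the union of all elements' detections."""
    covered: set[Pixel] = set()
    for boxes in element_detections:
        for box in boxes:
            covered |= box_pixels(box)
    return len(covered) / total_positive_pixels


elem_a = [(0, 0, 0, 2, 2)]  # 4 pixels in image 0
elem_b = [(0, 1, 1, 3, 3)]  # 4 pixels, overlapping elem_a at (1, 1)
print(coverage([elem_a], 100))          # 0.04
print(coverage([elem_a, elem_b], 100))  # 0.07 (union, not 0.08)
```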


Purity-Coverage

(Figure: purity vs. coverage (fraction of positive dataset), plotted separately for the Top 25 Elements and the Top 200 Elements. Methods compared:)

  • This work

  • This work, no inter-element

  • SVM Retrained 5x (Doersch et al. 2012)

  • LDA Retrained 5x

  • LDA Retrained

  • Exemplar LDA (Hariharan et al. 2012)


Results on Indoor 67 Scenes

(Figure panels: Kitchen, Grocery, Bowling, Bakery, Bathroom, Elevator)


Results on Indoor 67 Scenes


Qualitative Indoor67 Results


Indoor67: Error Analysis

(Figure: four misclassified images. Guesses: staircase, grocery store, garage, closet. Ground truth (GT): corridor, deli, laundromat, museum.)


Thank you!

More results at:

http://graphics.cs.cmu.edu/projects/discriminativeModeSeeking/

  • Paris Elements

  • Indoor 67 Elements

  • Indoor 67 Heatmaps

  • Source code (soon)


Some New Paris Elements

