A generic model to compose vision modules for holistic scene understanding

Adarsh Kowdle*, Congcong Li*,

Ashutosh Saxena, and Tsuhan Chen

Cornell University, Ithaca, NY, USA

* indicates equal contribution


Outline

  • Motivation

  • Model

  • Algorithm

  • Results and Discussions

  • Conclusions

Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen


Motivation


Motivation 1

Scene Understanding

Vision tasks are highly related.

But how do we connect them?

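[Figure: the related vision tasks Scene Categorization (S), Event Categorization (E), Object Detection (O), Depth Estimation (D), Spatial Layout (L), and Saliency Detection, with a “?” marking how they should be connected]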



Motivation 2

[Figure: examples from prior work: Li et al., CVPR’09; Hoiem et al., CVPR’08; Sudderth et al., CVPR’06; Saxena et al., IJCV’07]



Motivation 3

  • A generic model that treats each classifier as a “black box” and composes the classifiers to incorporate information from the other tasks automatically

[Figure: the tasks S, E, O, D, and L composed through a generic model, shown as “?”]



Motivation 4

Visual attributes: Lampert et al., CVPR’09; Farhadi et al., CVPR’09; Wang et al., ICCV’09; Ferrari et al., NIPS’07


Motivation 5

  • Attributes for scene understanding?

  • A model which can compose the “black-box” classifiers and automatically exploit attributes for scene understanding

[Figure: a Bocce image illustrating an “opencountry-like scene” attribute]



Motivation 6

  • A model whose first layer is trained not to achieve the best independent performance, but to achieve the best performance at the final output.

Cascaded classifier model (CCM)

Heitz, Gould, Saxena and Koller, NIPS’08

[Figure: CCM. Features φE(X), φS(X), φD(X), φSal(X) feed a first level of classifiers (Depth, Saliency, Scene, Event), whose outputs (marked “?”) feed forward to a second-level Event classifier that produces the final output]
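To make the feed-forward composition concrete, here is a minimal Python sketch. It assumes each first-level module is a black box that maps its own features to an output vector, and that the second level sees the target task's own features plus the outputs of all first-level modules. The names `second_level_input` and `linear_score` and the toy modules are hypothetical. In the CCM setting the first-level modules would each be trained independently on their own labels; the sketch only shows how their outputs are composed.

```python
from typing import Callable, Dict, List

Vector = List[float]
# A "black-box" module: maps its own features phi_T(X) to an output vector.
Module = Callable[[Vector], Vector]


def second_level_input(own_features: Vector,
                       first_level: Dict[str, Module],
                       features: Dict[str, Vector]) -> Vector:
    """Feed-forward step of a two-level cascade: append the output of every
    first-level module to the target task's own features. Modules are visited
    in sorted name order so the layout of the vector is fixed."""
    x = list(own_features)
    for name in sorted(first_level):
        x.extend(first_level[name](features[name]))
    return x


def linear_score(weights: Vector, x: Vector) -> float:
    """Hypothetical second-level classifier: a linear score over the
    concatenated input (positive score -> positive class)."""
    if len(weights) != len(x):
        raise ValueError("weight and input lengths differ")
    return sum(w * v for w, v in zip(weights, x))


if __name__ == "__main__":
    # Toy stand-ins for first-level modules, each used only as a black box.
    modules = {
        "D": lambda f: [sum(f) / len(f)],   # e.g. a depth summary
        "S": lambda f: [max(f)],            # e.g. the best scene score
        "Sal": lambda f: [min(f)],          # e.g. a saliency summary
    }
    feats = {"D": [2.0, 4.0], "S": [0.1, 0.7], "Sal": [0.3, 0.5]}
    x = second_level_input([1.0], modules, feats)
    print(x)  # [1.0, 3.0, 0.7, 0.3]
    print(linear_score([0.5, -0.1, 1.0, 0.0], x))
```

Because the second level only consumes module outputs, any module can be swapped for another implementation with the same input/output shape without touching the rest of the cascade.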



Model



  • The proposed generic model composes “black-box” classifiers

  • Feedback results in the first layer learning “attributes” rather than labels

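[Figure: the proposed model. Features φE(X), φS(X), φD(X), φSal(X) feed the first level of classifiers (Depth, Saliency, Scene, Event); their outputs feed forward to the second-level Event classifier, which produces the final output, and feedback from the second level turns the first level into an attribute learner]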



Algorithm


Algorithm 1

Optimization Goal

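[Figure: for each training image k, features φE(X(k)), φS(X(k)), φD(X(k)), φSal(X(k)) feed the first-level classifiers Event (θE), Depth (θD), Saliency (θSal), and Scene (θS), whose outputs TE, TD, TSal, TS feed forward to the second-level Event classifier (ωE) with output YE(k); a feedback path runs from the second level back to the first]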
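One way to write this goal, as a sketch under stated assumptions rather than a transcription of the slide: treat the first-level outputs T^(k) = (TE, TD, TSal, TS) of image k as unobserved, collect the first-level parameters in θ = (θE, θD, θSal, θS), and maximize the likelihood of the second-level output YE(k) over the training images.

```latex
\begin{aligned}
&\max_{\theta,\,\omega_E}\ \sum_{k} \log P\big(Y_E^{(k)} \mid X^{(k)};\, \theta, \omega_E\big),
  \qquad \theta = (\theta_E, \theta_D, \theta_{Sal}, \theta_S),\\
&P\big(Y_E^{(k)} \mid X^{(k)};\, \theta, \omega_E\big)
  = \sum_{T^{(k)}} P\big(Y_E^{(k)} \mid T^{(k)}, X^{(k)};\, \omega_E\big)\,
    P\big(T^{(k)} \mid X^{(k)};\, \theta\big),
  \qquad T^{(k)} = \big(T_E^{(k)}, T_D^{(k)}, T_{Sal}^{(k)}, T_S^{(k)}\big).
\end{aligned}
```

The sum becomes an integral when the first-level outputs are continuous (e.g. depth maps). Because T^(k) is not observed, the log of this sum does not split into separate per-module terms, which is the usual situation that motivates an EM-style alternation.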



Algorithm 2

  • Our solution: an approach motivated by the Expectation-Maximization (EM) algorithm that alternates two steps:

    • Parameter learning: fix the required first-level outputs and estimate the model parameters

    • Latent variable estimation: fix the model parameters and estimate the latent variables (the first-level outputs)

[Figure: the cascade from the optimization goal, with the two alternating steps marked: parameter learning updates θE, θD, θSal, θS, and ωE; latent variable estimation updates the first-level outputs TE, TD, TSal, TS given the output YE(k)]
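The alternation can be sketched as coordinate ascent on a joint objective (a "hard EM" variant), under heavy simplifying assumptions: a single first-level module that is a 1-D linear regressor (θ = slope and intercept), a single logistic second-level classifier (ω), and a latent step that picks each first-level output by grid search, trading closeness to the first-level prediction against the second-level log-likelihood of the final label. All function names, the weight `lam`, and the toy data are hypothetical; this illustrates the idea, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Params = Tuple[float, float]


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def log_sigmoid(t: float) -> float:
    """Numerically stable log(sigmoid(t))."""
    if t >= 0:
        return -math.log1p(math.exp(-t))
    return t - math.log1p(math.exp(t))


def fit_first_level(xs: Sequence[float], zs: Sequence[float]) -> Params:
    """Parameter learning, first level (theta): least-squares fit z ~ a*x + b."""
    n = len(xs)
    mx, mz = sum(xs) / n, sum(zs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    a = sxz / sxx if sxx > 0 else 0.0
    return (a, mz - a * mx)


def fit_second_level(zs: Sequence[float], ys: Sequence[int],
                     steps: int = 500, lr: float = 0.5) -> Params:
    """Parameter learning, second level (omega): logistic regression of y on z
    by batch gradient ascent on the log-likelihood."""
    w0 = w1 = 0.0
    n = len(zs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for z, y in zip(zs, ys):
            err = y - sigmoid(w0 + w1 * z)
            g0 += err
            g1 += err * z
        w0 += lr * g0 / n
        w1 += lr * g1 / n
    return (w0, w1)


def log_lik(y: int, z: float, omega: Params) -> float:
    """log P(y | z; omega) under the second-level logistic model."""
    t = omega[0] + omega[1] * z
    return log_sigmoid(t) if y == 1 else log_sigmoid(-t)


def estimate_latent(x: float, y: int, theta: Params, omega: Params,
                    lam: float = 1.0) -> float:
    """Latent variable estimation: with parameters fixed, pick the first-level
    output z near the first-level prediction that also helps the second level
    explain the final label y (grid search over offsets in [-1, 1])."""
    prior = theta[0] * x + theta[1]
    candidates = [prior + d / 20.0 for d in range(-20, 21)]
    return max(candidates,
               key=lambda z: -(z - prior) ** 2 + lam * log_lik(y, z, omega))


def train(xs: Sequence[float], first_level_labels: Sequence[float],
          ys: Sequence[int], iters: int = 5) -> Tuple[Params, Params]:
    """Alternate parameter learning and latent estimation, starting the latent
    first-level outputs from the first-level task labels."""
    zs: List[float] = list(first_level_labels)
    for _ in range(iters):
        theta = fit_first_level(xs, zs)
        omega = fit_second_level(zs, ys)
        zs = [estimate_latent(x, y, theta, omega) for x, y in zip(xs, ys)]
    return theta, omega


def predict(x: float, theta: Params, omega: Params) -> float:
    """Test time: feed forward through the first level, then the second."""
    z = theta[0] * x + theta[1]
    return sigmoid(omega[0] + omega[1] * z)


if __name__ == "__main__":
    xs = [i / 10.0 for i in range(-10, 11)]         # toy 1-D features
    depth_labels = list(xs)                         # hypothetical first-level labels
    event_labels = [1 if x > 0 else 0 for x in xs]  # hypothetical final labels
    theta, omega = train(xs, depth_labels, event_labels)
    print(predict(1.0, theta, omega), predict(-1.0, theta, omega))
```

Starting the latents from the first-level task labels mirrors training each module on its own labels first; the latent step then lets the first-level outputs drift toward values that are more useful to the final classifier, which is one way to read "learning attributes rather than labels".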



Results and Discussion


Experiments

  • Scene Categorization: Oliva et al., IJCV’01

  • Event Categorization: Li et al., ICCV’07

  • Depth Estimation (Make3D): Saxena et al., IJCV’07

  • Saliency Detection: Achanta et al., CVPR’09

[Figure: each dataset shown with the four modules D, S, E, and Sal]



Results

Improvement on every task with the same algorithm!



Results: Visual improvements

Depth Estimation

[Figure panels: original image, base model, CCM [Heitz et al.], our proposed model, ground truth]

Saliency Detection

[Figure panels: original image, base model, CCM [Heitz et al.], our proposed model, ground truth]



Discussion: Attributes of the scene

Maps of the weights given to the depth maps for the scene categorization task

[Figure: modules D, S, E, Sal; weight maps on the depth outputs used by the second-level scene classifier (S)]
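A weight map of this kind can be produced by reshaping second-level weights into the image grid. The sketch below assumes, hypothetically, that the second-level scene classifier holds one weight per cell of a coarse depth-map grid for a given scene class, stored row-major; `weight_map` and `render` are illustrative helpers, not functions from the original work.

```python
from typing import List


def weight_map(weights: List[float], rows: int, cols: int) -> List[List[float]]:
    """Reshape a flat, row-major weight vector (one weight per depth-map
    cell) into a rows x cols grid."""
    if len(weights) != rows * cols:
        raise ValueError("expected rows * cols weights")
    return [weights[r * cols:(r + 1) * cols] for r in range(rows)]


def render(grid: List[List[float]], eps: float = 1e-3) -> str:
    """Coarse text view: '+' positive weight, '-' negative, '.' near zero."""
    def sym(w: float) -> str:
        if w > eps:
            return "+"
        if w < -eps:
            return "-"
        return "."
    return "\n".join("".join(sym(w) for w in row) for row in grid)


if __name__ == "__main__":
    # Hypothetical weights for one scene class over a 2 x 3 depth grid.
    w = [0.8, 0.5, 0.9, -0.2, 0.0, -0.4]
    print(render(weight_map(w, 2, 3)))
    # prints:
    # +++
    # -.-
```

Reading the map this way shows which image regions' depth values push the second level toward (positive) or away from (negative) a scene class.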



Discussion: Attributes of the scene

Weights given to event and scene attributes for event categorization

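[Figure: modules D, S, E, Sal; weights on the event and scene attributes used by the second-level event classifier (E)]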



Conclusions



  • Generic model to compose multiple vision tasks to aid holistic scene understanding

    • “Black-box”

  • Feedback results in learning meaningful “attributes” instead of just the “labels”

  • Handles heterogeneous datasets

  • Improved performance on each of the tasks over the state of the art, using the same learning algorithm

  • Joint optimization of all the tasks

    • Congcong Li, Adarsh Kowdle, Ashutosh Saxena, and Tsuhan Chen, Feedback Enabled Cascaded Classification Models for Scene Understanding, NIPS 2010



Thank you

Questions?

