Misc-read presentation: Jonathan Huang (jch1@cs.cmu.edu), 4/19/2006
Describing Visual Scenes using Transformed Dirichlet Processes

Erik B. Sudderth, Antonio Torralba, William T. Freeman, and Alan S. Willsky. In Adv. in Neural Information Processing Systems, 2005.

Misc-read presentation: Jonathan Huang (jch1@cs.cmu.edu)

4/19/2006


Paper Contributions

  • An extension of the idea of using LDA on a visual bag-of-words by incorporating spatial structure into a generative model

  • An approach to handling uncertainty about the number of instances of an object class within a scene


Outline

  • Review Latent Dirichlet Allocation and application to visual scenes

  • Dirichlet Processes

  • Hierarchical Dirichlet Processes

  • Transformed Dirichlet Processes

  • Application to Visual Scenes

  • Results


Latent Dirichlet Allocation (LDA)

  • In LDA, every document/image is a mixture of topics, where the mixture proportions are drawn from a Dirichlet prior.

j ranges over the documents

i ranges over the words in each document
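
The generative story above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the two topics, the three-word vocabulary, the prior values, and all function names are hypothetical and do not come from the paper.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one point on the simplex from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Return an index drawn with the given probabilities."""
    u = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off

def generate_document(num_words, topic_prior, topic_word_probs, rng):
    """Generate one document j: draw its topic proportions, then each word i."""
    proportions = sample_dirichlet(topic_prior, rng)
    words = []
    for _ in range(num_words):
        topic = sample_categorical(proportions, rng)             # topic of word i
        word = sample_categorical(topic_word_probs[topic], rng)  # word given its topic
        words.append((topic, word))
    return proportions, words

rng = random.Random(0)
# Hypothetical toy setup: 2 topics over a 3-word vocabulary.
topic_word_probs = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
proportions, words = generate_document(20, [0.5, 0.5], topic_word_probs, rng)
```

Each call to generate_document plays the role of one document (or image) j, and the inner loop ranges over its words i.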


Latent Dirichlet Allocation (LDA)

(Figure: example image with regions labeled Cow, Sky, Grass, and Water)


Some Questions

  • How do we choose the number of topics for LDA?

  • How can we put spatial structure into this model?


Dirichlet Distributions

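  • The Dirichlet distribution is defined on the probability simplex: the set of K-dimensional vectors with non-negative entries that sum to 1.

  • It can be thought of as a distribution over distributions: each draw is itself a probability distribution over a random variable that can take K possible values.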
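
For reference, the standard Dirichlet density can be written as follows. The symbols π (a point on the simplex) and α_1, …, α_K (the positive parameters) are notation assumed here, since the slide's own formula did not survive in the transcript.

```latex
p(\pi_1,\dots,\pi_K \mid \alpha_1,\dots,\alpha_K)
  = \frac{\Gamma\!\left(\sum_{k=1}^{K}\alpha_k\right)}{\prod_{k=1}^{K}\Gamma(\alpha_k)}
    \prod_{k=1}^{K}\pi_k^{\alpha_k-1},
\qquad \pi_k \ge 0,\quad \sum_{k=1}^{K}\pi_k = 1,\quad \alpha_k > 0 .
```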


Dirichlet Processes (DP)

  • The Dirichlet Process can be thought of as the infinite dimensional version of the Dirichlet Distribution. It is a distribution on the space of all distributions (a measure over measures if you prefer).

  • Definition of a Dirichlet Process:

    • The parameters of a DP are a positive number α and a base distribution G0 on some measurable space Θ.

    • If a distribution G ~ DP(α, G0), then for any finite measurable partition (A1,…,AK) of Θ, (G(A1),…,G(AK)) ~ Dir(αG0(A1),…,αG0(AK)).

    • Intuitively, this means that a draw G from a DP wants to look like the base distribution G0. In fact, the expectation of DP(α, G0) is exactly G0, and as α increases, it becomes more likely that G looks like G0.

  • Important fact: samples from a DP are discrete distributions with probability 1.


Dirichlet Processes (DP)

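  • It is often easier to think about samples drawn from G, where G is itself drawn from a DP, than about G directly.

  • The Polya urn sampling scheme (Blackwell and MacQueen, 1973) gives a way to draw from G without ever specifying G directly. Given the previous draws θ1, θ2, …, θi-1 from G, the next draw θi is a fresh sample from G0 with probability α/(i-1+α), and otherwise equals one of the previous draws chosen uniformly at random (each with probability 1/(i-1+α)). Here α is the positive parameter of the DP.

  • The Polya urn scheme:

    • Is important if we want to use MCMC in models with a Dirichlet Process.

    • Shows the clustering property of DPs: draws repeat earlier values, so they fall into groups.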
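
A minimal sketch of the Polya urn scheme, assuming a standard normal base distribution G0 purely for illustration. The function name, the parameter name alpha (the DP's positive parameter), and the choice of G0 are assumptions, not taken from the slides.

```python
import random

def polya_urn_draws(num_draws, alpha, base_sampler, rng):
    """Draw theta_1..theta_n sequentially without ever constructing G."""
    draws = []
    for i in range(num_draws):
        # With i previous draws, a fresh value from G0 has probability alpha / (i + alpha).
        if rng.random() < alpha / (i + alpha):
            draws.append(base_sampler(rng))
        else:
            # Otherwise repeat a previous draw chosen uniformly at random.
            draws.append(rng.choice(draws))
    return draws

rng = random.Random(1)
draws = polya_urn_draws(200, 2.0, lambda r: r.gauss(0.0, 1.0), rng)
num_clusters = len(set(draws))  # number of distinct values among the draws
```

Because later draws often repeat earlier ones, the number of distinct values is typically far smaller than the number of draws, which is the clustering property in action.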


Chinese Restaurant Processes

  • The Polya urn scheme is closely related to the Chinese Restaurant Process.

  • Consider a restaurant with infinitely many tables

    • Customers i enter one at a time, choosing to either sit at a table with other customers, or to start a new table.

      • A customer starts a new table with probability proportional to α, and sits at an existing table with probability proportional to the number of people already at that table.
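
Normalizing these proportions gives explicit seating probabilities for customer i. Here α is the DP's positive parameter and n_t (notation assumed here) is the number of customers already seated at table t, so the n_t sum to i − 1.

```latex
P(\text{customer } i \text{ starts a new table}) = \frac{\alpha}{i - 1 + \alpha},
\qquad
P(\text{customer } i \text{ joins table } t) = \frac{n_t}{i - 1 + \alpha}
```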


DP Mixture Models

  • A DP mixture model is the limit of a finite mixture model as the number of mixture components tends to infinity.

  • Gaussian mixture model example:
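
The slide's own example did not survive in the transcript. One common way to write a DP mixture of Gaussians, given here as an assumed sketch rather than the slide's exact formulation, is:

```latex
G \sim \mathrm{DP}(\alpha, G_0), \qquad
\theta_i \mid G \sim G, \qquad
x_i \mid \theta_i \sim \mathcal{N}(\mu_i, \Sigma_i), \quad \theta_i = (\mu_i, \Sigma_i)
```

Since G is discrete with probability 1, many of the θ_i coincide, and data points sharing the same θ_i belong to the same mixture component.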


DP Mixture Models (Inference)

  • Inference in these models generally uses MCMC or variational methods.

    • Inference is much easier when the base distribution G0 and the data model are conjugate to each other.
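
To see why conjugacy helps, consider a collapsed Gibbs sampler for the cluster assignment z_i of data point x_i. This is a standard scheme for DP mixtures, shown as an illustration and not necessarily the method behind the plots below. The notation z_i, n_k^{-i} (the size of cluster k excluding point i), and f (the data likelihood) is assumed here.

```latex
\begin{aligned}
P(z_i = k \mid z_{-i}, x) &\;\propto\; n_k^{-i} \int f(x_i \mid \theta)\,
    p\bigl(\theta \mid \{x_l : z_l = k,\ l \neq i\}\bigr)\, d\theta, \\
P(z_i = \text{new} \mid z_{-i}, x) &\;\propto\; \alpha \int f(x_i \mid \theta)\, dG_0(\theta).
\end{aligned}
```

When G0 is conjugate to f, both integrals have closed forms, so each Gibbs update is cheap to compute.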

(Plot: DP fits as a function of iterations within a variational inference procedure, figure from Michael Jordan tutorial)

(Plot: DP fits as the number of points increases, figure from Michael Jordan tutorial)


Hierarchical Dirichlet Processes (HDP)

  • What happens if we put a prior on a Dirichlet Process?

    • Why would we want to?

      • We might have a collection of related documents or images, each of which is a mixture of Gaussians, and want the groups to share mixture components.


Hierarchical Dirichlet Processes (HDP)

  • Chinese Restaurant Franchise

    • Now consider a franchise with infinitely many restaurants

    • People come into each restaurant as in the Chinese Restaurant Process, but now:

      • The first person to sit at a table gets to choose a dish for all further people at that table to share.

    • All restaurants share the same set of (possibly infinite) dishes

    • Popular dishes get more popular under this distribution
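
The franchise metaphor can be simulated directly. The sketch below is a toy illustration: the parameter names alpha (new-table weight within a restaurant) and gamma (new-dish weight across the franchise), and the function names, are assumptions rather than notation from the slides.

```python
import random

def choose_index(weights, rng):
    """Sample an index with probability proportional to its weight."""
    u = rng.random() * sum(weights)
    cumulative = 0.0
    for index, w in enumerate(weights):
        cumulative += w
        if u < cumulative:
            return index
    return len(weights) - 1  # guard against floating-point round-off

def simulate_franchise(customers_per_restaurant, alpha, gamma, rng):
    """Seat customers in every restaurant.

    Returns (tables, dish_tables): tables[j] is a list of [dish, num_customers]
    pairs for restaurant j, and dish_tables[k] counts the tables across all
    restaurants that serve dish k.
    """
    dish_tables = []
    tables = []
    for num_customers in customers_per_restaurant:
        restaurant = []
        for _ in range(num_customers):
            # Existing tables are weighted by occupancy, a new table by alpha.
            weights = [count for _, count in restaurant] + [alpha]
            t = choose_index(weights, rng)
            if t == len(restaurant):
                # The first customer at a new table picks the dish: existing dishes
                # are weighted by how many tables serve them, a new dish by gamma.
                k = choose_index(dish_tables + [gamma], rng)
                if k == len(dish_tables):
                    dish_tables.append(0)
                dish_tables[k] += 1
                restaurant.append([k, 1])
            else:
                restaurant[t][1] += 1
        tables.append(restaurant)
    return tables, dish_tables

tables, dish_tables = simulate_franchise([30, 40, 50], 1.0, 1.0, random.Random(2))
```

Because dishes are chosen in proportion to the number of tables already serving them, dishes that are popular in one restaurant tend to reappear in the others.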


Hierarchical Dirichlet Processes (HDP)

(Figure: HDP graphical model shown next to the LDA graphical model)

t_ji is the table at which the ith customer (word) of the jth restaurant (document) sits.

k_jt is the dish served at table t of the jth restaurant (document).


Transformed Dirichlet Processes (TDP)

  • In the TDP, the global mixture components (the θk’s) undergo a set of random transformations for each group (document/image).

(Figure: LDA, HDP, and TDP graphical models)

  • This is a twist on the Chinese Restaurant Franchise:

    • Now, the first customer at a table not only gets to order a dish, but gets to season it in some way.


TDP on Visual Scenes

  • Groups (Restaurants) correspond to training or test images

  • O is a fixed number of object categories

  • Every cluster (object class instantiation) has a “canonical” mean and variance given by θk, and is allowed to translate by ρjt

(Figure: LDA, HDP, TDP, and visual scene TDP graphical models)


Transformed Dirichlet Processes (TDP)

  • Gaussian Mixture example:
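
A toy one-dimensional sketch of the TDP idea: global clusters with canonical means are shared across groups, and each table in a group applies its own random translation. This is not the authors' model or inference code. The Gaussian base measure, the Gaussian translations, and the names alpha, gamma, shift_std, and noise_std are all assumptions made for illustration.

```python
import random

def choose_index(weights, rng):
    """Sample an index with probability proportional to its weight."""
    u = rng.random() * sum(weights)
    cumulative = 0.0
    for index, w in enumerate(weights):
        cumulative += w
        if u < cumulative:
            return index
    return len(weights) - 1  # guard against floating-point round-off

def sample_tdp_group(num_points, alpha, gamma, global_means, dish_tables, rng,
                     shift_std=5.0, noise_std=0.5):
    """Generate one group (image) of 1-D points.

    global_means and dish_tables are shared across groups and updated in place.
    Returns (points, tables), where each table is [cluster k, translation rho, count].
    """
    tables = []
    points = []
    for _ in range(num_points):
        t = choose_index([tab[2] for tab in tables] + [alpha], rng)
        if t == len(tables):
            # New table: order a dish (global cluster) ...
            k = choose_index(dish_tables + [gamma], rng)
            if k == len(dish_tables):
                dish_tables.append(0)
                global_means.append(rng.gauss(0.0, 1.0))  # toy canonical mean
            dish_tables[k] += 1
            # ... and "season" it with a table-specific translation.
            rho = rng.gauss(0.0, shift_std)
            tables.append([k, rho, 0])
        k, rho, _ = tables[t]
        tables[t][2] += 1
        points.append(rng.gauss(global_means[k] + rho, noise_std))
    return points, tables

rng = random.Random(3)
global_means, dish_tables = [], []
groups = [sample_tdp_group(50, 1.0, 1.0, global_means, dish_tables, rng)
          for _ in range(3)]
```

Two tables in the same group can serve the same global cluster with different translations, which is how the model can represent several instances of one object class in a single image.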


Local Image Features

  • SIFT descriptors are computed over local elliptical regions and vector quantized to form 1800 visual words.
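
Vector quantization maps each descriptor to the index of its nearest codebook entry. The sketch below uses tiny 2-D vectors and a 3-entry codebook as hypothetical stand-ins for real SIFT descriptors and the 1800-word vocabulary. How the codebook itself is built is not covered here.

```python
def quantize(descriptors, codebook):
    """Map each descriptor to the index of its nearest codebook vector
    (squared Euclidean distance)."""
    words = []
    for d in descriptors:
        distances = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in codebook]
        words.append(distances.index(min(distances)))
    return words

# Hypothetical toy data: 2-D "descriptors" and a 3-word codebook.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.1), (0.9, -0.1), (0.2, 0.8)]
words = quantize(descriptors, codebook)  # -> [0, 1, 2]
```

After this step, every image feature is a discrete word index, which is what bag-of-words models such as LDA consume.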


Results

  • Dataset:

    • 250 training images and 75 test images from the MIT-CSAIL database

    • Images contain buildings, side-views of cars, roads.

  • Training is semi-supervised, in the sense that some parts of each training image are labeled.

  • For training: 100 rounds of blocked Gibbs sampling.

  • For testing: 50 rounds of blocked Gibbs sampling with 10 random restarts.


Results

  • Remarks:

    • TDP can estimate the number of object instantiations in each scene

    • TDP “discovered” that buildings are large, and cars are small horizontal things.


Conclusion

  • As claimed,

    • This method goes beyond bag-of-words models to use spatial information

    • And models the multiple instantiations of an object class within an image

  • The results might be more convincing if more than three object classes were considered.


Thanks!

  • References:

    • Erik B. Sudderth, Antonio Torralba, William T. Freeman, and Alan S. Willsky. Describing Visual Scenes using Transformed Dirichlet Processes. In Adv. in Neural Information Processing Systems, 2005.

    • Erik B. Sudderth, Antonio Torralba, William T. Freeman, and Alan S. Willsky. Depth from Familiar Objects. To appear in CVPR 2006.

    • Michael Jordan. Dirichlet Processes, Chinese Restaurant Processes and All That. NIPS 2005 tutorial slides.

