Misc-read presentation: Jonathan Huang (jch1@cs.cmu) 4/19/2006


Describing Visual Scenes using Transformed Dirichlet Processes

Erik B. Sudderth, Antonio Torralba, William T. Freeman, and Alan S. Willsky. In Adv. in Neural Information Processing Systems, 2005.

Misc-read presentation: Jonathan Huang (jch1@cs.cmu.edu)

4/19/2006

- An extension of the idea of using LDA on a visual bag-of-words by incorporating spatial structure into a generative model
- An approach to handling uncertainty about the number of instances of an object class within a scene

- Review Latent Dirichlet Allocation and application to visual scenes
- Dirichlet Processes
- Hierarchical Dirichlet Processes
- Transformed Dirichlet Processes
- Application to Visual Scenes
- Results

- In LDA, every document/image is a mixture of topics, where the mixture proportions are drawn from a Dirichlet prior.

j ranges over the documents

i ranges over the words in each document
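
The LDA generative process described above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the function name `sample_lda_corpus`, the symmetric Dirichlet parameter `alpha`, and the randomly generated `topic_word` matrix are all assumptions made for the example.

```python
import numpy as np

def sample_lda_corpus(n_docs, n_words, alpha, topic_word, seed=0):
    """Sample documents from an LDA model with fixed topic-word distributions.

    topic_word has shape (n_topics, vocab_size); each row sums to 1.
    """
    rng = np.random.default_rng(seed)
    n_topics, vocab_size = topic_word.shape
    docs = []
    for j in range(n_docs):                       # j ranges over the documents
        # Mixture proportions for document j, drawn from a Dirichlet prior.
        pi_j = rng.dirichlet(np.full(n_topics, alpha))
        # Topic assignment for each word i of document j.
        z = rng.choice(n_topics, size=n_words, p=pi_j)
        # Each word is drawn from the distribution of its assigned topic.
        w = np.array([rng.choice(vocab_size, p=topic_word[k]) for k in z])
        docs.append((pi_j, z, w))
    return docs

# Hypothetical setup: 3 topics over a vocabulary of 6 (visual) words.
rng = np.random.default_rng(1)
topic_word = rng.dirichlet(np.ones(6), size=3)
docs = sample_lda_corpus(n_docs=4, n_words=20, alpha=0.5, topic_word=topic_word)
```

For images, the "words" would be quantized local descriptors rather than text tokens.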

(Figure: example scene with regions labeled Cow, Sky, Grass, and Water)

- How do we choose the number of topics for LDA?
- How can we put spatial structure into this model?

Dirichlet Processes

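- The Dirichlet distribution is defined on the probability simplex of $K$-vectors $\pi = (\pi_1, \dots, \pi_K)$ with $\pi_k \ge 0$ and $\sum_k \pi_k = 1$: $p(\pi \mid \alpha_1, \dots, \alpha_K) = \frac{\Gamma(\sum_k \alpha_k)}{\prod_k \Gamma(\alpha_k)} \prod_{k=1}^{K} \pi_k^{\alpha_k - 1}$.
- It can be thought of as a distribution over distributions on a random variable that takes $K$ possible values.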

- The Dirichlet Process (DP) can be thought of as the infinite-dimensional version of the Dirichlet distribution. It is a distribution on the space of all distributions (a measure over measures, if you prefer).
- Definition of a Dirichlet Process:
  - The parameters of a DP are a positive number $\alpha$ (the concentration parameter) and a base distribution $G_0$ on some measurable space $\Theta$.
  - If $G \sim \mathrm{DP}(\alpha, G_0)$, then for any finite measurable partition $(A_1, \dots, A_K)$ of $\Theta$, $(G(A_1), \dots, G(A_K)) \sim \mathrm{Dir}(\alpha G_0(A_1), \dots, \alpha G_0(A_K))$.
- Intuitively, this means that a draw $G$ from a DP wants to look like the base distribution $G_0$. In fact, the expectation of $\mathrm{DP}(\alpha, G_0)$ is exactly $G_0$, and as $\alpha$ increases, it becomes more likely that $G$ looks like $G_0$.
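
The effect of the concentration parameter can be checked numerically on a single fixed partition, where the DP definition reduces to an ordinary Dirichlet distribution. The sketch below writes `alpha` for the concentration parameter and uses a hypothetical three-cell partition with base probabilities `base_probs`; both names are assumptions of this example.

```python
import numpy as np

def partition_deviation(alpha, base_probs, n_draws=2000, seed=0):
    """Mean absolute deviation of (G(A_1..A_K)) from (G0(A_1..A_K)).

    For a fixed partition, G(A_1..A_K) ~ Dirichlet(alpha * G0(A_1..A_K)).
    """
    rng = np.random.default_rng(seed)
    base_probs = np.asarray(base_probs, dtype=float)
    draws = rng.dirichlet(alpha * base_probs, size=n_draws)
    return float(np.mean(np.abs(draws - base_probs)))

base_probs = [0.5, 0.3, 0.2]          # hypothetical G0 masses of the cells
deviations = {a: partition_deviation(a, base_probs) for a in (1.0, 10.0, 1000.0)}
for a, dev in deviations.items():
    print(f"alpha={a:g}: mean |G(A_k) - G0(A_k)| = {dev:.3f}")
```

The printed deviation shrinks as `alpha` grows, which is the sense in which a draw "looks like" the base distribution.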

- Important fact: samples from a DP are discrete distributions with probability 1.

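- Rather than work with $G$ itself, it is easier to think about the distribution of samples drawn from a $G$ that was itself sampled from a DP.
- The Polya urn sampling scheme (Blackwell/MacQueen 1973) gives a way to draw from $G$ without ever specifying $G$ directly. Given a sequence $\theta_1, \theta_2, \dots, \theta_{i-1}$ of previous i.i.d. draws from $G$, $\theta_i \mid \theta_1, \dots, \theta_{i-1} \sim \frac{\alpha}{\alpha + i - 1} G_0 + \frac{1}{\alpha + i - 1} \sum_{j=1}^{i-1} \delta_{\theta_j}$.
- The Polya urn scheme:
  - is important if we want to use MCMC in models with a Dirichlet Process;
  - shows the clustering property of DPs: a new draw repeats an earlier value with positive probability.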
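
A minimal sketch of the Polya urn scheme follows, assuming a concentration parameter `alpha` and a user-supplied `base_sampler` that draws from the base distribution (a standard normal here, chosen only for illustration). With `i` previous draws, a fresh value is drawn with probability `alpha / (alpha + i)`; otherwise one of the previous draws is repeated uniformly at random.

```python
import numpy as np

def polya_urn_draws(n, alpha, base_sampler, seed=0):
    """Draw n values from a DP-distributed G without ever representing G."""
    rng = np.random.default_rng(seed)
    draws = []
    for i in range(n):                      # i = number of previous draws
        if rng.random() < alpha / (alpha + i):
            draws.append(base_sampler(rng))          # new value from G0
        else:
            draws.append(draws[rng.integers(i)])     # repeat an old value
    return draws

draws = polya_urn_draws(200, alpha=2.0, base_sampler=lambda rng: rng.normal())
print(len(set(draws)), "distinct values among", len(draws), "draws")
```

Although the base distribution is continuous, only a handful of distinct values appear, which illustrates both the discreteness of DP samples and the clustering property.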

- The Polya urn scheme is closely related to the Chinese Restaurant Process.
- Consider a restaurant with infinitely many tables.
  - Customers $i = 1, 2, \dots$ enter one at a time, choosing either to sit at a table with other customers or to start a new table.
  - A customer starts a new table with probability proportional to $\alpha$, and sits at an occupied table with probability proportional to the number of people already at that table.
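
The seating rule can be simulated directly. This is a sketch under the assumption that the new-table weight is a concentration parameter called `alpha`; the function and variable names are illustrative.

```python
import numpy as np

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Return each customer's table index and the final table occupancies."""
    rng = np.random.default_rng(seed)
    table_counts = []                 # number of customers at each table
    seating = []
    for _ in range(n_customers):
        # Occupied tables weighted by occupancy; a new table weighted by alpha.
        weights = np.array(table_counts + [alpha], dtype=float)
        table = int(rng.choice(len(weights), p=weights / weights.sum()))
        if table == len(table_counts):
            table_counts.append(1)    # start a new table
        else:
            table_counts[table] += 1  # join an existing table
        seating.append(table)
    return seating, table_counts

seating, table_counts = chinese_restaurant_process(100, alpha=1.0)
print("table occupancies:", table_counts)
```

Tables play the role of distinct values in the Polya urn: customers at the same table share one draw from the base distribution.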


- DP mixture models arise as the limit of finite mixture models as the number of mixture components tends to infinity.
- Gaussian mixture model example:
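
The original slide's formula did not survive extraction. A standard way to write a DP mixture of Gaussians is sketched below; the symbols $\alpha$ (concentration), $G_0$ (base distribution over means and covariances), and $N$ (number of observations) are notation assumed for this sketch.

```latex
\begin{align*}
G &\sim \mathrm{DP}(\alpha, G_0) \\
\theta_i = (\mu_i, \Sigma_i) \mid G &\sim G, \qquad i = 1, \dots, N \\
x_i \mid \theta_i &\sim \mathcal{N}(\mu_i, \Sigma_i)
\end{align*}
```

Because $G$ is discrete with probability 1, many of the $\theta_i$ coincide, and observations that share a value of $\theta_i$ form one mixture component.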

- There are various ways to do inference in these models which generally use MCMC or variational methods.
- Inference is much easier when the base distribution G0 and the data model are conjugate to each other.
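
One place where conjugacy helps is the standard collapsed Gibbs sampler for DP mixtures, sketched below. This is the textbook update, not necessarily the sampler used in the paper; $z_i$ (cluster assignment of observation $x_i$), $n_{-i,k}$ (size of cluster $k$ excluding $x_i$), and $f$ (the data likelihood) are notation assumed here.

```latex
\begin{align*}
p(z_i = k \mid z_{-i}, x) &\propto n_{-i,k}
  \int f(x_i \mid \theta)\, p\bigl(\theta \mid \{x_l : z_l = k,\ l \neq i\}\bigr)\, d\theta \\
p(z_i = \text{new} \mid z_{-i}, x) &\propto \alpha
  \int f(x_i \mid \theta)\, dG_0(\theta)
\end{align*}
```

When $G_0$ is conjugate to $f$, both integrals are available in closed form, so each assignment can be resampled without representing the cluster parameters explicitly.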

(Plot: DP fits as a function of iterations within a variational inference procedure, figure from Michael Jordan tutorial)

(Plot: DP fits as the number of points increases, figure from Michael Jordan tutorial)

Hierarchical Dirichlet Processes

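- What happens if we put a prior on a Dirichlet Process, that is, draw its base distribution from another DP?
  - Why would we want to?
    - We might have a collection of related documents or images, each of which is a mixture of Gaussians, and want the groups to share mixture components.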
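
In symbols, the hierarchical construction is usually written as follows. The notation ($\gamma$ and $\alpha$ for the two concentration parameters, $H$ for the top-level base distribution, $F$ for the data model) is assumed for this sketch and may differ from the original slides.

```latex
\begin{align*}
G_0 \mid \gamma, H &\sim \mathrm{DP}(\gamma, H) \\
G_j \mid \alpha, G_0 &\sim \mathrm{DP}(\alpha, G_0), \qquad j = 1, \dots, J \\
\theta_{ji} \mid G_j &\sim G_j \\
x_{ji} \mid \theta_{ji} &\sim F(\theta_{ji})
\end{align*}
```

Because $G_0$ is itself discrete, every group-level $G_j$ places its mass on the same set of atoms, which is what lets the groups share mixture components.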


- Chinese Restaurant Franchise
  - Now consider a franchise of restaurants, one per group (document or image), each with infinitely many tables.
  - People come into each restaurant as in the Chinese Restaurant Process, but now:
    - The first person to sit at a table gets to choose a dish for all further people at that table to share.

- All restaurants share the same (possibly infinite) set of dishes.
- Popular dishes get more popular under this distribution.
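
The franchise metaphor can be simulated as below. This is a sketch under assumed notation: `alpha` weights a new table within a restaurant, `gamma` weights a new dish on the shared menu, and a dish's popularity is the number of tables (across all restaurants) already serving it.

```python
import numpy as np

def chinese_restaurant_franchise(customers_per_restaurant, alpha, gamma, seed=0):
    """Simulate table and dish assignments for a franchise of restaurants."""
    rng = np.random.default_rng(seed)
    dish_table_counts = []     # tables serving each dish, over all restaurants
    franchise = []
    for n_customers in customers_per_restaurant:
        table_counts, table_dishes = [], []
        for _ in range(n_customers):
            w = np.array(table_counts + [alpha], dtype=float)
            t = int(rng.choice(len(w), p=w / w.sum()))
            if t < len(table_counts):
                table_counts[t] += 1          # join an existing table
                continue
            # New table: its first customer orders a dish for the whole table.
            d = np.array(dish_table_counts + [gamma], dtype=float)
            k = int(rng.choice(len(d), p=d / d.sum()))
            if k == len(dish_table_counts):
                dish_table_counts.append(0)   # a brand-new dish on the menu
            dish_table_counts[k] += 1
            table_counts.append(1)
            table_dishes.append(k)
        franchise.append({"table_counts": table_counts,
                          "table_dishes": table_dishes})
    return franchise, dish_table_counts

franchise, dish_table_counts = chinese_restaurant_franchise(
    [30, 30, 30], alpha=1.0, gamma=1.0)
print("tables per dish:", dish_table_counts)
```

Dishes correspond to global mixture components, so a dish ordered in one restaurant can reappear in the others.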

(Figures: HDP graphical model and LDA graphical model)

$t_{ji}$ is the table at which the $i$th customer (word) of the $j$th document sits.

$k_{jt}$ is the dish served at table $t$ of the $j$th document.

Transformed Dirichlet Processes

- In the TDP, the global mixture components (the $\theta_k$'s) undergo a set of random transformations for each group (document/image).

(Figures: LDA, HDP, and TDP graphical models)

- This is a twist on the Chinese Restaurant Franchise:
- Now, the first customer at a table not only gets to order a dish, but gets to season it in some way.

Application to Visual Scenes

- Groups (Restaurants) correspond to training or test images
- O is a fixed number of object categories
- Every cluster (object class instantiation) has a “canonical” mean and variance given by $\theta_k$, and is allowed to translate by $\rho_{jt}$.
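
The idea of a canonical cluster that is translated per instance can be illustrated with a small sketch. All numbers and names below are hypothetical: a 2-D Gaussian with a canonical mean and covariance stands in for one object class, and each instance in an image shifts that Gaussian by its own translation vector.

```python
import numpy as np

def sample_transformed_instance(canonical_mean, canonical_cov, translation,
                                n_features, rng):
    """Draw 2-D feature positions from a canonical Gaussian shifted by `translation`."""
    shifted_mean = (np.asarray(canonical_mean, dtype=float)
                    + np.asarray(translation, dtype=float))
    return rng.multivariate_normal(shifted_mean, canonical_cov, size=n_features)

rng = np.random.default_rng(0)
# Hypothetical canonical "car" cluster: wider than it is tall.
car_mean, car_cov = np.zeros(2), np.diag([4.0, 1.0])
# Two instances of the same class in one image, each with its own translation.
translations = [np.array([10.0, 5.0]), np.array([-20.0, 5.0])]
instances = [sample_transformed_instance(car_mean, car_cov, rho, 500, rng)
             for rho in translations]
for rho, pts in zip(translations, instances):
    print("translation", rho, "-> empirical mean", pts.mean(axis=0).round(2))
```

Both instances share the same shape parameters, so what is learned about the class in one location transfers to any other location.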

(Figures: LDA, HDP, TDP, and visual scene TDP graphical models)

- Gaussian Mixture example:

- SIFT descriptors are computed over local elliptical regions and vector quantized to form 1800 visual words.
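
Vector quantization maps each descriptor to the index of its nearest codebook entry. The sketch below assumes 128-dimensional descriptors and uses a random codebook purely as a stand-in; how the actual 1800-word codebook was built is not stated here.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Assign each descriptor the index of its nearest codeword (visual word)."""
    # Squared Euclidean distances, shape (n_descriptors, n_words).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(1800, 128))       # hypothetical codebook
# Three descriptors that are slightly perturbed copies of known codewords.
descriptors = codebook[[3, 42, 1799]] + 0.01 * rng.normal(size=(3, 128))
words = quantize(descriptors, codebook)
print(words)
```

After this step each image is a set of (visual word, position) pairs, which is the input the spatial model works with.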

Results

- Dataset:
- 250 training images and 75 test images from the MIT-CSAIL database
- Images contain buildings, side-views of cars, roads.

- Training is semi-supervised, in the sense that some parts of each training image are labeled.
- For training: 100 rounds of blocked Gibbs sampling.
- For testing: 50 rounds of blocked Gibbs sampling with 10 random restarts.

- Remarks:
- TDP can estimate the number of object instantiations in each scene
- TDP “discovered” that buildings are large, and cars are small horizontal things.

- As claimed, this method:
  - goes beyond bag-of-words models to use spatial information
  - models multiple instantiations of an object class within an image

- The results might be more convincing if more than three object classes were considered.

- References:
- Erik B. Sudderth, Antonio Torralba, William T. Freeman, and Alan S. Willsky. Describing Visual Scenes using Transformed Dirichlet Processes. In Adv. in Neural Information Processing Systems, 2005.
- Erik B. Sudderth, Antonio Torralba, William T. Freeman, and Alan S. Willsky. Depth from Familiar Objects. To appear in CVPR 2006.
- Michael Jordan. Dirichlet Processes, Chinese Restaurant Processes and All That. NIPS 2005 tutorial slides.