
  1. Describing Visual Scenes using Transformed Dirichlet Processes. Erik B. Sudderth, Antonio Torralba, William T. Freeman, and Alan S. Willsky. In Adv. in Neural Information Processing Systems, 2005. Misc-read presentation: Jonathan Huang (jch1@cs.cmu.edu), 4/19/2006

  2. Paper Contributions • An extension of the idea of using LDA on a visual bag-of-words by incorporating spatial structure into a generative model • An approach to handling uncertainty about the number of instances of an object class within a scene

  3. Outline • Review Latent Dirichlet Allocation and application to visual scenes • Dirichlet Processes • Hierarchical Dirichlet Processes • Transformed Dirichlet Processes • Application to Visual Scenes • Results

  4. Latent Dirichlet Allocation (LDA) • In LDA, every document/image is a mixture of topics, where the mixture proportions are drawn from a Dirichlet prior. • Notation: j ranges over the documents; i ranges over the words in each document.
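A minimal statement of the generative process this slide describes, in standard LDA notation (α is the Dirichlet prior and β collects the topic-word distributions):

```latex
\theta_j \mid \alpha \sim \mathrm{Dir}(\alpha)
  % topic proportions for document j
z_{ji} \mid \theta_j \sim \mathrm{Mult}(\theta_j)
  % topic assignment for word i of document j
w_{ji} \mid z_{ji}, \beta \sim \mathrm{Mult}(\beta_{z_{ji}})
  % the observed word, drawn from its topic's distribution
```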

  5. Latent Dirichlet Allocation (LDA) (Figure: an example scene whose regions are labeled by inferred topics: cow, sky, grass, water)

  6. Some Questions • How do we choose the number of topics for LDA? • How can we put spatial structure into this model?

  7. Outline • Review Latent Dirichlet Allocation and application to visual scenes • Dirichlet Processes • Hierarchical Dirichlet Processes • Transformed Dirichlet Processes • Application to Visual Scenes • Results

  8. Dirichlet Distributions • The Dirichlet distribution is defined on the K-dimensional simplex: Dir(π | α) ∝ ∏_k π_k^(α_k − 1), where π_k ≥ 0 and ∑_k π_k = 1. • This can be thought of as a distribution on the space of distributions over random variables which can take K possible values.
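A quick numerical illustration (a NumPy sketch; the dimension and concentration values are arbitrary choices, not from the slides): small concentrations give sparse draws, large concentrations give near-uniform draws.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5  # each draw is a distribution over K outcomes, i.e. a point on the simplex

sparse = rng.dirichlet(np.full(K, 0.1))       # alpha << 1: mass piles onto few components
uniformish = rng.dirichlet(np.full(K, 10.0))  # alpha >> 1: draws concentrate near uniform

print(sparse.round(3), "sums to", sparse.sum())
print(uniformish.round(3), "sums to", uniformish.sum())
```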

  9. Dirichlet Processes (DP) • The Dirichlet process can be thought of as the infinite-dimensional version of the Dirichlet distribution. It is a distribution on the space of all distributions (a measure over measures, if you prefer). • Definition of a Dirichlet process: • The parameters of a DP are a positive concentration parameter α and a base distribution G0 on some measurable space Θ. • If a distribution G ~ DP(α, G0), then for any finite partition (A1, …, AK) of Θ, (G(A1), …, G(AK)) ~ Dir(αG0(A1), …, αG0(AK)). • Intuitively, this means that a draw G from a DP wants to look like the base distribution G0. In fact, the expectation of DP(α, G0) is exactly G0, and as α increases, it becomes more likely that G looks like G0. • Important fact: samples from a DP are discrete distributions with probability 1.

  10. Dirichlet Processes (DP) • It is easier to think of the distribution we get by sampling from some G which is itself first sampled from a DP. • The Polya urn sampling scheme (Blackwell/MacQueen 1973) gives a way to draw from G (where G is never directly specified). Given a sequence θ1, θ2, …, θ(i−1) of previous draws from G (i.i.d. given G), the next draw satisfies θi | θ1, …, θ(i−1) ~ (α / (α + i − 1)) G0 + (1 / (α + i − 1)) ∑_(j<i) δ_(θj). • The Polya urn scheme: • is important if we want to use MCMC in models with a Dirichlet process • shows the clustering property of DPs

  11. Chinese Restaurant Processes • The Polya urn scheme is closely related to the Chinese Restaurant Process (CRP). • Consider a restaurant with infinitely many tables. • Customers enter one at a time, choosing either to sit at a table with other customers or to start a new table. • A customer starts a new table with probability proportional to α, and sits at an occupied table with probability proportional to the number of people already at that table.
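A minimal simulation of the table-assignment process (a sketch; the customer count and α are arbitrary choices):

```python
import random

def crp(num_customers, alpha, seed=0):
    """Simulate table assignments under a Chinese Restaurant Process."""
    random.seed(seed)
    counts = []          # counts[t] = number of customers at table t
    assignments = []
    for i in range(num_customers):
        # Sit at table t w.p. counts[t]/(i+alpha); open a new table w.p. alpha/(i+alpha).
        weights = counts + [alpha]
        table = random.choices(range(len(weights)), weights=weights)[0]
        if table == len(counts):
            counts.append(1)   # open a new table
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments, counts

assignments, counts = crp(100, alpha=2.0)
print(len(counts), "tables; sizes:", counts)
```

Running it shows the rich-get-richer effect: a handful of large tables plus a tail of small ones, with the number of occupied tables growing only roughly logarithmically in the number of customers.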

  12. DP Mixture Models • The infinite limit of finite mixture models as the number of mixture components tends to infinity. • Gaussian mixture model example: draw G ~ DP(α, G0), then for each data point draw parameters θi ~ G and an observation xi ~ N(θi).
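A generative sketch of such a model via the CRP representation (the base distribution, α, and the variances are illustrative choices, not the paper's):

```python
import numpy as np

def sample_dp_gmm(n, alpha=1.0, base_std=5.0, obs_std=0.5, seed=0):
    """Draw n points from a 1-D DP Gaussian mixture using the CRP."""
    rng = np.random.default_rng(seed)
    means, counts, data = [], [], []
    for _ in range(n):
        weights = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(means):
            # New cluster: its mean comes from the base distribution G0 = N(0, base_std^2).
            means.append(rng.normal(0.0, base_std))
            counts.append(0)
        counts[k] += 1
        data.append(rng.normal(means[k], obs_std))  # observation from cluster k
    return np.array(data), means

data, means = sample_dp_gmm(200)
print(len(means), "clusters; means:", np.round(means, 2))
```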

  13. DP Mixture Models (Inference) • There are various ways to do inference in these models, generally using MCMC or variational methods. • Inference is much easier when the base distribution G0 and the data model are conjugate to each other. (Plot: DP fits as a function of iterations within a variational inference procedure; figure from Michael Jordan's tutorial) (Plot: DP fits as the number of points increases; figure from Michael Jordan's tutorial)
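To make the conjugacy point concrete, here is a sketch of collapsed Gibbs sampling for a 1-D DP Gaussian mixture with known observation variance (in the spirit of Neal's collapsed algorithms, not the paper's blocked sampler; all parameter values are illustrative):

```python
import math
import numpy as np

def dpgmm_collapsed_gibbs(x, alpha=1.0, sigma2=0.5, tau2=5.0, iters=50, seed=0):
    """Collapsed Gibbs for a DP mixture of 1-D Gaussians: known observation
    variance sigma2, conjugate base distribution G0 = N(0, tau2)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    z = np.zeros(n, dtype=int)             # start with all points in one cluster
    counts, sums = [n], [float(x.sum())]   # per-cluster sufficient statistics

    def predictive(xi, cnt, s):
        # Posterior-predictive density of xi under a cluster holding cnt points summing
        # to s (cnt = 0 gives the prior predictive for a brand-new cluster).
        post_var = 1.0 / (1.0 / tau2 + cnt / sigma2)
        post_mean = post_var * s / sigma2
        var = sigma2 + post_var
        return math.exp(-0.5 * (xi - post_mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

    for _ in range(iters):
        for i in range(n):
            c = z[i]                        # remove x[i] from its current cluster
            counts[c] -= 1
            sums[c] -= x[i]
            if counts[c] == 0:              # drop the emptied cluster and relabel
                counts.pop(c)
                sums.pop(c)
                z[z > c] -= 1
            probs = [cnt * predictive(x[i], cnt, s) for cnt, s in zip(counts, sums)]
            probs.append(alpha * predictive(x[i], 0, 0.0))   # open a new cluster
            probs = np.array(probs)
            c = rng.choice(len(probs), p=probs / probs.sum())
            if c == len(counts):
                counts.append(0)
                sums.append(0.0)
            counts[c] += 1
            sums[c] += x[i]
            z[i] = c
    return z, counts

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(m, 0.7, 50) for m in (-4.0, 0.0, 4.0)])
z, counts = dpgmm_collapsed_gibbs(data)
print(len(counts), "clusters found; sizes:", counts)
```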

  14. Outline • Review Latent Dirichlet Allocation and application to visual scenes. • Dirichlet Processes • Hierarchical Dirichlet Processes • Transformed Dirichlet Processes • Application to Visual Scenes • Results

  15. Hierarchical Dirichlet Processes (HDP) • What happens if we put a prior on a Dirichlet process? • Why would we want to? • We might have a collection of related documents or images, each modeled as its own mixture (e.g., of Gaussians), with mixture components shared across the collection.
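In symbols, the hierarchy the slides build (standard HDP notation; H is the top-level base distribution):

```latex
G_0 \mid \gamma, H \sim \mathrm{DP}(\gamma, H)
  % global measure, shared by all groups
G_j \mid \alpha, G_0 \sim \mathrm{DP}(\alpha, G_0)
  % one measure per document/image
\theta_{ji} \mid G_j \sim G_j, \qquad x_{ji} \mid \theta_{ji} \sim F(\theta_{ji})
  % per-observation parameters and data
```

Because a draw from a DP is discrete with probability 1, the group-level measures G_j reuse the atoms of G_0, which is exactly the shared-dishes property of the franchise on the next slide.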

  16. Hierarchical Dirichlet Processes (HDP) • Chinese Restaurant Franchise • Now consider a franchise with infinitely many restaurants. • Customers enter each restaurant as in the Chinese Restaurant Process, but now: • The first person to sit at a table gets to choose a dish that all further people at that table share. • All restaurants share the same (possibly infinite) menu of dishes. • Popular dishes get more popular under this distribution.
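A sketch extending the CRP simulation above to the franchise (α and γ are the table- and dish-level concentrations; the values are arbitrary):

```python
import random

def crf(customers_per_restaurant, alpha=1.0, gamma=1.0, seed=0):
    """Chinese Restaurant Franchise: per-restaurant tables, globally shared dishes."""
    random.seed(seed)
    dish_tables = []   # dish_tables[k] = number of tables (franchise-wide) serving dish k
    franchise = []     # per restaurant: the dish served at each of its tables
    for n in customers_per_restaurant:
        table_counts, table_dish = [], []
        for _ in range(n):
            w = table_counts + [alpha]
            t = random.choices(range(len(w)), weights=w)[0]
            if t == len(table_counts):          # open a new table...
                table_counts.append(0)
                dw = dish_tables + [gamma]      # ...and order: existing dish prop. to its
                k = random.choices(range(len(dw)), weights=dw)[0]  # tables, new dish to gamma
                if k == len(dish_tables):
                    dish_tables.append(0)
                dish_tables[k] += 1
                table_dish.append(k)
            table_counts[t] += 1
        franchise.append(table_dish)
    return franchise, dish_tables

franchise, dish_tables = crf([60, 60, 60])
print("dishes used per restaurant:", [sorted(set(td)) for td in franchise])
```

The dish-level CRP couples the restaurants: a dish that is popular in one restaurant becomes more likely to be ordered in the others.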

  17. Hierarchical Dirichlet Processes (HDP) (Figure: HDP and LDA graphical models side by side) • t_ji denotes the table chosen by the ith customer (word) of the jth document. • k_jt denotes which dish (global topic) is served at table t of the jth document.

  18. Outline • Review Latent Dirichlet Allocation and application to visual scenes. • Dirichlet Processes • Hierarchical Dirichlet Processes • Transformed Dirichlet Processes • Application to Visual Scenes • Results

  19. Transformed Dirichlet Processes (TDP) • In the TDP, the global mixture components (the θ_k's) undergo a set of random transformations for each group (document/image). (Figure: LDA, HDP, and TDP graphical models side by side) • This is a twist on the Chinese Restaurant Franchise: • Now, the first customer at a table not only gets to order a dish, but also gets to season it in some way.
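A toy sketch of the "seasoning" idea for the visual case: each table draws its own translation of a shared cluster, so two instances of the same object class can appear at different image locations (the class names, canonical means, and variances here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative global clusters ("dishes"): canonical 2-D shapes shared by all images.
global_means = {"car": np.zeros(2), "building": np.zeros(2)}
cluster_cov = 0.1 * np.eye(2)

def sample_transformed_cluster(dish, n_points, transform_std=3.0):
    """Each table applies its own random translation rho to the shared dish."""
    rho = rng.normal(0.0, transform_std, size=2)   # per-table transformation
    points = rng.multivariate_normal(global_means[dish] + rho, cluster_cov, size=n_points)
    return rho, points

# Two tables serving the same dish = two instances of one class in one image.
rho1, car1 = sample_transformed_cluster("car", 20)
rho2, car2 = sample_transformed_cluster("car", 20)
print("two 'car' instances translated by", rho1.round(2), "and", rho2.round(2))
```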

  20. Outline • Review Latent Dirichlet Allocation and application to visual scenes. • Dirichlet Processes • Hierarchical Dirichlet Processes • Transformed Dirichlet Processes • Application to Visual Scenes • Results

  21. TDP on Visual Scenes • Groups (restaurants) correspond to training or test images. • O is a fixed number of object categories. • Every cluster (object class instantiation) has a “canonical” mean and variance given by its global parameters θ_k, and is allowed to translate by a per-table transformation ρ_jt. (Figure: LDA, HDP, TDP, and visual-scene TDP graphical models)

  22. Transformed Dirichlet Processes (TDP) • Gaussian mixture example: (Figure: TDP fit to a toy Gaussian mixture)

  23. Local Image Features • SIFT descriptors are computed over local elliptical regions and vector quantized to form a vocabulary of 1800 visual words.
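A sketch of the vector-quantization step with k-means (the descriptors here are random stand-ins, and the codebook is kept small so the example runs quickly; the paper's vocabulary has 1800 words):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-ins for 128-D SIFT descriptors pooled from the training images.
descriptors = rng.normal(size=(2000, 128))

# Cluster descriptor space; each cluster center is one "visual word".
codebook = KMeans(n_clusters=200, n_init=10, random_state=0).fit(descriptors)

# A new image's descriptors map to the indices of their nearest codewords.
visual_words = codebook.predict(rng.normal(size=(50, 128)))
print(visual_words[:10])
```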

  24. Outline • Review Latent Dirichlet Allocation and application to visual scenes. • Dirichlet Processes • Hierarchical Dirichlet Processes • Transformed Dirichlet Processes • Application to Visual Scenes • Results

  25. Results • Dataset: • 250 training images and 75 test images from the MIT-CSAIL database. • Images contain buildings, side views of cars, and roads. • Training is semi-supervised, in the sense that some parts of each training image are labeled. • Training: 100 rounds of blocked Gibbs sampling. • Testing: 50 rounds of blocked Gibbs sampling with 10 random restarts.

  26. Results • Remarks: • TDP can estimate the number of object instantiations in each scene. • TDP “discovered” that buildings are large and that cars are small, horizontally extended objects.

  27. Results

  28. Conclusion • As claimed, this method: • goes beyond bag-of-words models by using spatial information • models multiple instantiations of an object class within an image • The results might be more convincing, though, if more than three object classes were considered.

  29. Thanks! • References: • Erik B. Sudderth, Antonio Torralba, William T. Freeman, and Alan S. Willsky. Describing Visual Scenes using Transformed Dirichlet Processes. In Advances in Neural Information Processing Systems, 2005. • Erik B. Sudderth, Antonio Torralba, William T. Freeman, and Alan S. Willsky. Depth from Familiar Objects. To appear in CVPR 2006. • Michael I. Jordan. Dirichlet Processes, Chinese Restaurant Processes and All That. NIPS 2005 tutorial slides.
