
Variational Inference for the Indian Buffet Process


Presentation Transcript


  1. Variational Inference for the Indian Buffet Process Finale Doshi-Velez, Kurt T. Miller, Jurgen Van Gael and Yee Whye Teh AISTATS 2009 Presented by: John Paisley, Duke University, Dept. of ECE

  2. Introduction • This paper derives variational inference equations for the stick-breaking construction of the Indian buffet process (IBP). In addition, it bounds the error of a truncated stick-breaking approximation relative to the infinite IBP. • Outline of Presentation • Review of the IBP and its stick-breaking construction • Variational inference for the IBP • Truncation error bounds for variational inference • Results on a linear-Gaussian model for toy and real data

  3. Indian Buffet Process • The first customer selects Poisson(\alpha) dishes (features). • The ith customer selects each previously sampled dish k with probability m_k / i, the fraction of all customers so far who selected that dish. • The ith customer then selects Poisson(\alpha / i) new dishes. The resulting probability of the binary matrix Z is

P(Z) = \frac{\alpha^K}{\prod_{h=1}^{2^N - 1} K_h!} \exp(-\alpha H_N) \prod_{k=1}^{K} \frac{(N - m_k)!\,(m_k - 1)!}{N!},

where m_k is the number of customers selecting dish k, H_N = \sum_{i=1}^{N} 1/i, and K_h counts dishes with identical selection history h. The \alpha^K term governs the probability of the number of dishes K; the \prod_h K_h! term accounts for permutations of equivalent columns.
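A minimal sketch of this generative process (Python; the function name sample_ibp and its interface are illustrative, not from the paper):

```python
import numpy as np

def sample_ibp(N, alpha, rng=None):
    """Draw a binary feature matrix Z from the Indian buffet process.

    N: number of customers (rows); alpha: IBP concentration parameter.
    Returns Z as an (N, K) array, where K is the number of dishes sampled.
    """
    rng = np.random.default_rng(rng)
    dishes = []  # dishes[k] = list of customers (row indices) that took dish k
    for i in range(1, N + 1):
        # Existing dish k is taken with probability m_k / i, where m_k is
        # the number of previous customers who took it.
        for takers in dishes:
            if rng.random() < len(takers) / i:
                takers.append(i - 1)
        # Customer i then tries Poisson(alpha / i) brand-new dishes.
        for _ in range(rng.poisson(alpha / i)):
            dishes.append([i - 1])
    Z = np.zeros((N, len(dishes)), dtype=int)
    for k, takers in enumerate(dishes):
        Z[takers, k] = 1
    return Z
```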

  4. The Stick-Breaking Construction of the IBP • Rather than marginalizing out the feature probabilities \pi_k (\pi_k being the probability of selecting dish k), a stick-breaking construction can be used:* v_k ~ Beta(\alpha, 1), \pi_k = \prod_{l=1}^{k} v_l, z_{nk} ~ Bernoulli(\pi_k). (Note: the generative process above is written by the presenter; the paper presents the probabilities in decreasing order, \pi_{(1)} > \pi_{(2)} > \cdots.) • This stick-breaking representation holds for this specific parameterization of the beta distribution. * Y.W. Teh, D. Görür & Z. Ghahramani (2007). Stick-breaking construction for the Indian buffet process. 11th AISTATS.
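A sketch of a truncated draw from this construction (Python; names are illustrative):

```python
import numpy as np

def sample_ibp_stick_breaking(N, alpha, K, rng=None):
    """Truncated stick-breaking draw from the IBP prior:
    v_k ~ Beta(alpha, 1), pi_k = prod_{l<=k} v_l, z_nk ~ Bernoulli(pi_k)."""
    rng = np.random.default_rng(rng)
    v = rng.beta(alpha, 1.0, size=K)   # stick proportions
    pi = np.cumprod(v)                 # feature probabilities, strictly decreasing
    Z = (rng.random((N, K)) < pi).astype(int)  # pi broadcasts across the N rows
    return Z, pi
```

Because each v_k < 1, the products \pi_k are automatically decreasing, which is why a finite truncation level K can capture almost all of the feature mass (quantified on slide 6).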

  5. VB Inference for the Stick-Breaking Construction Focus on inference for the stick variables v, the binary matrix Z, and the model parameters A under a mean-field posterior q(v) q(Z) q(A). A lower-bound approximation is needed for one term, E_q[\ln(1 - \prod_{l=1}^{k} v_l)], which arises from the likelihood of z_{nk} = 0. Writing 1 - \prod_{m=1}^{k} v_m = \sum_{m=1}^{k} (1 - v_m) \prod_{l<m} v_l, the authors introduce an auxiliary multinomial distribution q and apply Jensen's inequality:

E[\ln(1 - \prod_{m=1}^{k} v_m)] \ge \sum_{m=1}^{k} q_m \big( E[\ln(1 - v_m)] + \sum_{l=1}^{m-1} E[\ln v_l] - \ln q_m \big),

and the bound is tightest at q_m \propto \exp( E[\ln(1 - v_m)] + \sum_{l<m} E[\ln v_l] ). This handles the likelihood of z; the posterior of v is more complicated. Using this multinomial lower bound, "terms decompose independently for each v_m and we get a closed form exponential family update."
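A sketch of the optimal auxiliary multinomial under Beta variational posteriors q(v_m) = Beta(a_m, b_m) (Python; the function name and interface are my own, not the paper's):

```python
import numpy as np
from scipy.special import digamma

def multinomial_bound_q(a, b, k):
    """Optimal multinomial q and the resulting lower bound on
    E[log(1 - prod_{m<=k} v_m)] under q(v_m) = Beta(a_m, b_m)."""
    a, b = np.asarray(a, float)[:k], np.asarray(b, float)[:k]
    E_log_v = digamma(a) - digamma(a + b)     # E_q[log v_m]
    E_log_1mv = digamma(b) - digamma(a + b)   # E_q[log(1 - v_m)]
    # prefix[m] = sum_{l < m} E[log v_l]  (zero for m = 1)
    prefix = np.concatenate(([0.0], np.cumsum(E_log_v)[:-1]))
    log_q = E_log_1mv + prefix
    log_q -= np.max(log_q)                    # stabilize before normalizing
    q = np.exp(log_q) / np.sum(np.exp(log_q))
    # Jensen lower bound, including the entropy term -sum_m q_m log q_m
    bound = np.sum(q * (E_log_1mv + prefix)) - np.sum(q * np.log(q + 1e-300))
    return q, bound
```

The digamma terms are the standard expectations of log v and log(1 - v) under a Beta distribution, which is what makes the update closed-form in the exponential family.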

  6. Truncation Error for VB Inference Given a truncation of the stick-breaking construction at level K, how close are we to the infinite model? A bound is derived with the same approach Ishwaran & James* used for the Dirichlet process: the L1 distance between the truncated and infinite marginal distributions of the data is controlled by the probability that any of the N customers uses a feature beyond level K. Since E[\pi_k] = (\alpha/(1+\alpha))^k, the resulting upper bound decays geometrically in the truncation level K and grows with N and \alpha. The paper compares this bound against a Monte Carlo estimate of the true value over 1000 simulations with N = 30, \alpha = 5. * H. Ishwaran & L.F. James (2001). Gibbs sampling methods for stick-breaking priors. JASA.
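To make the comparison concrete, here is a hedged sketch of that Monte Carlo estimate (Python; the function name and the simple union bound N \alpha^{K+1}/(1+\alpha)^K are my own reconstruction, not necessarily the paper's exact bound):

```python
import numpy as np

def tail_mass_mc(N, alpha, K, K_big=500, n_sims=1000, rng=None):
    """Monte Carlo estimate of Pr(any of N customers uses a feature
    beyond truncation level K); K_big approximates the infinite tail."""
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(n_sims):
        v = rng.beta(alpha, 1.0, size=K_big)
        pi_tail = np.cumprod(v)[K:]             # pi_k for k > K
        p_none = np.prod((1.0 - pi_tail) ** N)  # no customer uses the tail
        total += 1.0 - p_none
    return total / n_sims

# A simple union bound on the same quantity: E[pi_k] = (alpha/(1+alpha))^k, so
# Pr(tail used) <= N * sum_{k>K} E[pi_k] = N * alpha**(K+1) / (1+alpha)**K.
N, alpha, K = 30, 5.0, 40
print(tail_mass_mc(N, alpha, K), N * alpha ** (K + 1) / (1 + alpha) ** K)
```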

  7. Results: Synthetic Data (lower left) Data were randomly generated, and test-data log-likelihoods under the inferred models were computed as a function of run time; variational inference reaches higher test log-likelihood in less time than Gibbs sampling. (right) Additional timing results for the toy problem.

  8. Results: Two Real Datasets • Yale Faces: 721 32 x 32 images of 14 people under different expressions and lighting. • Speech data: 245 observations from 10 microphones recording 5 speakers. • At right, the variational inference methods outperform Gibbs sampling and run faster on the Yale Faces. • Performance and speed are worse on the speech dataset. One reason is that this dataset is only 10-dimensional, while Yale is 1024-dimensional; in such a low-dimensional setting MCMC inference is fast, and the looseness of the VB approximation becomes apparent.
