
Stick-breaking Construction for the Indian Buffet Process




  1. Stick-breaking Construction for the Indian Buffet Process Yee Whye Teh, Dilan Görür, and Zoubin Ghahramani, AISTATS 2007 Duke University Machine Learning Group Presented by Kai Ni July 27, 2007

  2. Outline • Introduction • Indian Buffet Process (IBP) • Stick-breaking construction for IBP • Slice samplers • Results

  3. Introduction • Indian Buffet Process (IBP) • A distribution over binary matrices consisting of N rows (objects) and an unbounded number of columns (features); • 1/0 in entry (i,k) indicates feature k present/absent from object i. • An example • Objects are movies –“Terminator 2”, “Shrek” and “Shanghai Knights”; • Features are –“action”, “comedy”, “stars Jackie Chan”; • The matrix can be [101; 010; 110].
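A minimal sketch of the slide's example in Python (NumPy assumed; the row/column ordering simply follows the matrix [101; 010; 110] above):

```python
import numpy as np

movies = ["Terminator 2", "Shrek", "Shanghai Knights"]
features = ["action", "comedy", "stars Jackie Chan"]

# Rows are objects (movies), columns are latent features;
# entry (i, k) = 1 means movie i possesses feature k.
Z = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0]])

for i, movie in enumerate(movies):
    present = [f for k, f in enumerate(features) if Z[i, k]]
    print(f"{movie}: {', '.join(present)}")
```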

  4. Relationship to CRP • IBP and CRP are both tools for defining nonparametric Bayesian models with latent variables. • CRP – each object belongs to exactly one of infinitely many latent classes. • IBP – each object can possess potentially any combination of infinitely many latent features. • The previous Gibbs sampler for the IBP is based on the CRP. In this paper the authors derive a stick-breaking representation for the IBP and develop efficient slice samplers.

  5. Indian Buffet Process • Let Z be a random binary N x K matrix, and denote entry (i,k) of Z by zik. For each feature k, let uk be the prior probability that feature k is present in an object. • Let α be the strength parameter of the IBP; the full finite model is uk ~ Beta(α/K, 1), zik | uk ~ Bernoulli(uk), independently over i and k. • Integrating out uk and taking the limit K -> infinity yields the IBP, in a derivation analogous to that of the CRP.
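A small sketch of this finite model (not the paper's code; the function name is ours). For large K, the number of active columns behaves like the IBP:

```python
import numpy as np

def sample_finite_model(N, K, alpha, rng):
    """Finite beta-Bernoulli model from this slide:
    u_k ~ Beta(alpha/K, 1), z_ik | u_k ~ Bernoulli(u_k)."""
    u = rng.beta(alpha / K, 1.0, size=K)        # per-feature probabilities
    Z = (rng.random((N, K)) < u).astype(int)    # independent Bernoulli entries
    return Z, u

rng = np.random.default_rng(0)
Z, _ = sample_finite_model(N=10, K=5000, alpha=2.0, rng=rng)
print("active features:", int(Z.any(axis=0).sum()))
```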

  6. Gibbs sampler for IBP • For existing features, zik is resampled from its conditional, whose prior part is P(zik = 1 | z-i,k) = m-i,k / N, where m-i,k is the number of other objects possessing feature k. • For new features: each object i proposes Poisson(α/N) brand-new features.
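A prior-only sketch of one sweep (function name is ours; in a real model each update is additionally weighted by the data likelihood, omitted here):

```python
import numpy as np

def gibbs_sweep_prior(Z, alpha, rng):
    """One sweep of the CRP-based IBP Gibbs sampler, prior terms only:
    existing features use P(z_ik = 1 | z_-i,k) = m_-i,k / N, and each
    object then proposes Poisson(alpha / N) brand-new features."""
    N = Z.shape[0]
    for i in range(N):
        m = Z.sum(axis=0) - Z[i]            # counts excluding object i
        keep = m > 0                        # drop features used by nobody else
        Z = Z[:, keep]
        Z[i] = (rng.random(Z.shape[1]) < m[keep] / N).astype(int)
        new = np.zeros((N, rng.poisson(alpha / N)), dtype=int)
        new[i] = 1                          # new features owned only by object i
        Z = np.hstack([Z, new])
    return Z
```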

  7. Stick-breaking construction for IBP • Order the feature probabilities decreasingly, u(1) >= u(2) >= …; then u(k) = vk · u(k-1) = v1 v2 … vk, where the vk ~ Beta(α, 1) are i.i.d.

  8. Derivation • cdf and pdf for each uk in the finite model: P(uk <= u) = u^(α/K), with density (α/K) u^(α/K - 1). • cdf and pdf for u(1) = max_k uk: P(u(1) <= u) = (u^(α/K))^K = u^α, with density α u^(α-1), i.e. u(1) ~ Beta(α, 1). • Repeating the argument conditionally for u(2), u(3), … gives u(k) = vk u(k-1) with vk ~ Beta(α, 1) i.i.d.
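The construction the derivation yields, as a short sketch:

```python
import numpy as np

def stick_lengths(K, alpha, rng):
    """u_(k) = v_1 * ... * v_k with v_k ~ Beta(alpha, 1) i.i.d.,
    so the sequence u_(1) >= u_(2) >= ... decreases."""
    return np.cumprod(rng.beta(alpha, 1.0, size=K))

rng = np.random.default_rng(0)
print(stick_lengths(K=8, alpha=2.0, rng=rng))  # monotonically decreasing
```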

  9. Relation to DP • In the DP's stick-breaking construction, the weight π(k) is the piece broken off the stick at step k; in the IBP construction, u(k) is the length of the stick remaining after k breaks.
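A sketch of the correspondence, assuming the identification vk = 1 - βk (so that βk ~ Beta(1, α) are the usual DP stick-breaking proportions):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, K = 2.0, 8
beta = rng.beta(1.0, alpha, size=K)   # DP stick-breaking proportions

# DP keeps the piece broken off at each step ...
pi = beta * np.cumprod(np.concatenate(([1.0], 1.0 - beta[:-1])))
# ... while the IBP keeps the length of the remaining stick,
# the same as cumprod of v_k = 1 - beta_k ~ Beta(alpha, 1).
u = np.cumprod(1.0 - beta)
```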

  10. Stick-breaking for IBP (2) • In the truncated stick-breaking construction for the IBP, let K* be the truncation level: set u(k) = 0 for k > K*, which forces zik = 0 for all k > K*.
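A sketch of sampling Z from the truncated construction (function name is ours):

```python
import numpy as np

def sample_Z_truncated(N, K_star, alpha, rng):
    """Keep only the first K* sticks; u_(k) = 0, hence z_ik = 0,
    for all k > K*."""
    u = np.cumprod(rng.beta(alpha, 1.0, size=K_star))
    return (rng.random((N, K_star)) < u).astype(int)

rng = np.random.default_rng(0)
print(sample_Z_truncated(N=5, K_star=10, alpha=2.0, rng=rng))
```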

  11. Slice Sampler • Rather than fixing a truncation level, introduce an auxiliary slice variable s with s | Z, u ~ Uniform[0, u*], where u* = min(1, min over active features k of u(k)) is the smallest stick length among currently active features. • Adaptive rejection sampling (ARS) is used to draw the stick lengths from their conditional distributions.

  12. Sampling • 1. Update s: if the new s enlarges K* (the largest k with u(k) > s), iteratively break new sticks u(k) until u(K*') falls below s. • 2. Update Z: given s, only the entries zik with k <= K* need updating. • 3. Update the feature parameters for k = 1, …, K*. • 4. Update u(k) for k = 1, …, K*.
[Slide figure: sticks u(1) >= u(2) >= … >= u(K*) >= … >= u(K*'); the uniform distribution for s ranges from 0 up to the smallest active stick, and lowering s from its old to its new value activates the additional features K*+1, …, K*'.]
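A simplified sketch of step 1 (extending the representation when s drops). For brevity, new sticks are drawn from the prior, u(k+1) = u(k) · Beta(α, 1); the paper instead draws them from their conditional given that the new features are unused, via ARS:

```python
import numpy as np

def extend_sticks(u, s, alpha, rng):
    """Break further sticks until the newest length u_(K*') falls
    below the slice level s (prior draws only; see note above)."""
    u = list(u)
    while u[-1] >= s:
        u.append(u[-1] * rng.beta(alpha, 1.0))
    return np.array(u)
```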

  13. Change of Representations • IBP – ignores the ordering on features; • Stick-breaking IBP – enforces an ordering with decreasing weights. • Stick-breaking -> IBP: drop the stick lengths and the inactive features, leaving only the K+ active feature columns along with the corresponding parameters. • IBP -> stick-breaking: draw the stick lengths of the active features, order the features by decreasing stick length, and introduce Ko inactive features (with their stick lengths) until the smallest stick length is small enough for the sampler (below the slice level s). A sketch of the first direction follows.
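A sketch of the stick-breaking -> IBP direction (function and argument names are ours):

```python
import numpy as np

def stick_breaking_to_ibp(Z, u, params):
    """Drop the stick lengths u and the inactive columns, keeping
    only the K+ active feature columns and their parameters."""
    active = Z.any(axis=0)   # features possessed by at least one object
    return Z[:, active], params[active]
```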

  14. Semi-ordered Stick-breaking • The stick lengths uk+ on active features are unordered and are drawn from their conditional given Z, uk | Z ~ Beta(mk, N - mk + 1), where mk is the number of objects possessing feature k. • The stick lengths on inactive features follow the ordered stick-breaking IBP construction. • The auxiliary variable s determines how many inactive features need to be added.
[Slide figure: the Ko ordered inactive sticks continue below min(u(k)) of the K+ unordered active sticks; the uniform distribution for s ranges from 0 up to that minimum.]
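A sketch of the active-feature update in the semi-ordered scheme, assuming the Beta(mk, N - mk + 1) conditional stated above:

```python
import numpy as np

def resample_active_sticks(Z, rng):
    """Active stick lengths are unordered; their conditional given Z
    is u_k | Z ~ Beta(m_k, N - m_k + 1), with m_k = sum_i z_ik."""
    N = Z.shape[0]
    m = Z.sum(axis=0)
    return rng.beta(m, N - m + 1.0)   # broadcasts over features
```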

  15. Results • The conjugate linear-Gaussian binary latent feature model was used to compare the performance of the different samplers. Each data point is modeled using a spherical Gaussian with mean zi,:A and variance σx^2.
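A sketch of the generative side of that model (the parameter names sigma_A and sigma_x are ours):

```python
import numpy as np

def sample_data(Z, D, sigma_A, sigma_x, rng):
    """Linear-Gaussian latent feature model: X = Z A + noise, with
    A_kd ~ Normal(0, sigma_A^2) and spherical observation noise of
    variance sigma_x^2, matching the mean z_i,: A on the slide."""
    N, K = Z.shape
    A = rng.normal(0.0, sigma_A, size=(K, D))   # latent feature values
    X = Z @ A + rng.normal(0.0, sigma_x, size=(N, D))
    return X, A
```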

  16. Demonstration • Applied the semi-ordered slice sampler to 1000 examples of handwritten images of 3s from the MNIST dataset.

  17. Conclusion • The authors derived novel stick-breaking representations of the Indian buffet process. • Based on these representations, new MCMC (slice) samplers are proposed that are easy to implement and work for more general (e.g. non-conjugate) models than the standard Gibbs sampler.
