Hierarchical Beta Process and the Indian Buffet Process. by R. Thibaux and M. I. Jordan. Discussion led by Qi An. Outline. Introduction Indian buffet process (IBP) Beta process (BP) Connections between IBP and BP Hierarchical beta process (hBP) Application to document classification
by R. Thibaux and M. I. Jordan
Discussion led by
Each data is drawn from one mixture component
Number of mixture components is not set a prior
Distribution over partitions
Each data is associated with a set of latent Bernoulli variables
Cardinality of the set of features can vary
A “featural” description of objects
A natural way to define interesting topologies on cluster
May be appropriate for large number of clustersIntroduction
Beta process is a special case of independent increment process, or Levy process,
Levy process can be characterized by Levy measure. For beta process, it is
If we draw a set of points from a Poisson process with base measure v, then
As the representation shows, B is discrete with probability one.
When the base measure B0 is discrete: , then B has atoms at the same locations with
Here, Ω can be viewed as a set of potential features and the random measure B defines the probability that X can possess particular feature.
In Indian buffet process, X is the customer and its features are the dishes the customer taste.
It is proven that the observations from a beta process satisfy
The first customer will try Poi(γ) number of dishes (feature). After that , the new observation can taste previous dish j with probability and then try a number of new features
where is the total mass
As a result, beta process is a two-parameter (c, γ) generalization of the Indian buffet process.
This quantity becomes Poi(γ) if c0 (all customers share the same dishes) or Poi(n γ) if c∞ (no sharing).
Authors propose to generate an approximation, , of B
Let For each step n≥1
Consider a document classification problem. We have a training data set X, which is a list of documents. Each document is classified by one of n topics. We model a document by the set of words it contains. We assume document Xi,j is generated by including each word w independently with a probability pjw specific to topic j. These probabilities form a discrete measure Aj over all word space Ω. We can put a beta process BP(cj,B) prior on Aj.
Since we want the sharing across different topics, B has to be discrete. We thus put a beta process prior BP(c0,B0) on B, which allows sharing the same atoms among topics.
The HBP model can be summarized as:
This model can be solved with Monte Carlo inference algorithm.