
Infinite Hierarchical Hidden Markov Models

AISTATS 2009

Katherine A. Heller, Yee Whye Teh and Dilan Görür

Lu Ren

[email protected] University

Nov 23, 2009

Outline



• Hierarchical structure learning for sequential data

• Hierarchical hidden Markov model (HHMM)

• Infinite hierarchical hidden Markov model (IHHMM)

• Inference and learning

• Experimental results and demonstrations

• Related work and extensions

[Figure: generated sequential data, and the sampled “states” used to generate it]

• Aim: infer correlations between observations that span long periods of the observation sequence.

• Potential applications: multi-resolution structure learning for language, video structure discovery, activity detection, etc.

Hierarchical Hidden Markov Models (HHMM)

Multiscale models of sequences in which each level of the model is a separate HMM that emits lower-level HMMs recursively.

[Figure: the generative process of an example HHMM [2]]

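2. The entire set of parameters

With a fixed model structure, the model is characterized by the following parameters [1]:

$$\Pi^{q^d} = \{\pi^{q^d}(q_i^{d+1})\} = \{P(q_i^{d+1} \mid q^d)\}$$

with $\pi^{q^d}(q_i^{d+1})$ the probability that the internal state $q^d$ initially activates its substate $q_i^{d+1}$ (vertical transition);

$$A^{q^d} = (a_{ij}^{q^d}), \quad a_{ij}^{q^d} = P(q_j^{d+1} \mid q_i^{d+1})$$

with $a_{ij}^{q^d}$ the probability of a horizontal transition from the $i$-th to the $j$-th substate of $q^d$;

$$B^{q^D} = \{b^{q^D}(k)\}, \quad b^{q^D}(k) = P(\sigma_k \mid q^D)$$

with $b^{q^D}(k)$ the probability that the production state $q^D$ emits the symbol $\sigma_k$.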
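A minimal Python sketch of this recursive generative process, assuming a toy two-level structure: the dictionary `MODEL`, the state names, and all probability values are hypothetical, and the end state of [1] is modeled as an extra final column in each transition row. Internal states use `pi` for vertical transitions and `A` for horizontal ones; production states emit symbols from the hypothetical `EMIT` table.

```python
import random

# Hypothetical two-level HHMM: root -> internal states "X", "Y" -> production states.
# For each internal node: its children, the initial (vertical) distribution pi,
# and horizontal transition rows A over [child_0, ..., child_{n-1}, END].
MODEL = {
    "root": {"children": ["X", "Y"], "pi": [0.5, 0.5],
             "A": [[0.0, 0.9, 0.1], [0.9, 0.0, 0.1]]},
    "X": {"children": ["x1", "x2"], "pi": [1.0, 0.0],
          "A": [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]},
    "Y": {"children": ["y1", "y2"], "pi": [1.0, 0.0],
          "A": [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]},
}
# Emission distributions of the production (leaf) states over output symbols.
EMIT = {"x1": {"a": 1.0}, "x2": {"b": 1.0}, "y1": {"c": 1.0}, "y2": {"d": 1.0}}


def choose(rng, probs):
    """Sample an index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate(node, rng, out, max_len):
    """Recursively activate `node`; each production state emits one symbol."""
    if node in EMIT:
        symbols = list(EMIT[node])
        out.append(symbols[choose(rng, [EMIT[node][s] for s in symbols])])
        return
    spec = MODEL[node]
    n = len(spec["children"])
    i = choose(rng, spec["pi"])            # vertical transition into a child
    while len(out) < max_len:
        generate(spec["children"][i], rng, out, max_len)
        j = choose(rng, spec["A"][i])      # horizontal transition, or END
        if j == n:                         # END: control returns to the parent
            return
        i = j
```

For example, `out = []; generate("root", random.Random(0), out, max_len=20)` fills `out` with symbols. With these values each internal state emits a fixed two-symbol word ("ab" or "cd") and the root alternates between them, so the output has structure at two time scales.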

3. Representing the HHMM as a DBN [2]

• For simplicity, assume all production states are at the bottom of the hierarchy; the state of the HMM at level $d$ and time $t$ is represented by $Q_t^d$.

• $Q_t^{1:D}$ specifies the complete “path” from the root to the leaf state.

• The indicator variable $F_t^d$ controls completion of the HHMM at level $d$ and time $t$.

[Figure: an HHMM represented as a DBN [2]]

IHHMM: extends the HHMM so that its hierarchy can have a potentially infinite number of levels.

• Observation: State:

• A state-transition indicator variable is also introduced:

• indicate whether there is a completion of the HHMM at level right before time ;

• indicate presence of a state transition from to

• The conditional probability of is:

• There is an opportunity to transition at level only if there was a transition at level .

• Properties implied by this structure:

• The number of transitions at level before a transition at level occurs is geometrically distributed with a mean .

• This implies that the expected number of time steps for which a state at level persists in its current value is .

• The states at higher levels persist longer.

• The first non-transitioning level at time , has the distribution

• is geometrically distributed with parameter if all

• The IHHMM allows for a potentially infinite number of levels.
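The properties above can be checked by simulation. The symbols were not preserved here, so the following Python sketch uses its own hypothetical parameterization: a dummy level 0 that transitions at every step, and a single propagation probability `p` shared by all levels (the equal-parameter case). Under that assumption the first non-transitioning level has mean `1/(1 - p)`, and a state at level `l` persists for `1/p**l` steps on average, so higher levels persist longer. The function names are illustrative only.

```python
import random


def first_nontransitioning_level(p, rng, max_levels=60):
    """Hypothetical rule: level 0 transitions at every step; given a transition
    at level l - 1, level l transitions with probability p, otherwise it and
    every level above it keep their states. Returns the first level L >= 1
    with no transition, so levels 1..L-1 transition at this step."""
    level = 1
    while level <= max_levels and rng.random() < p:
        level += 1
    return level


def expected_first_nontransitioning(p):
    # Geometric on {1, 2, ...}: P(L = l) = p**(l - 1) * (1 - p)
    return 1.0 / (1.0 - p)


def expected_persistence(p, level):
    # Level `level` transitions at a given step with probability p**level,
    # so its state persists for a geometric number of steps with this mean.
    return 1.0 / p ** level


def simulated_persistence(p, level, steps, rng):
    """Average run length of the state at `level` over `steps` time steps."""
    changes = sum(first_nontransitioning_level(p, rng) > level for _ in range(steps))
    return steps / max(changes, 1)
```

For instance, with `p = 0.5` the expected persistence is 2, 4 and 8 steps at levels 1, 2 and 3, and `simulated_persistence(0.5, 3, 100_000, random.Random(0))` should land close to 8.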

The generative process for given is similar to the HHMM:

For the levels down to , the state is generated according to

The emissions matrix:

for the levels
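The exact conditional distributions are not reproduced here, so the following Python sketch of generation given the transition indicators rests on stated assumptions: a level that transitions draws its new state from a distribution indexed by the current state of the level above and its own previous state; a level that does not transition copies its previous state; and observations are emitted from the bottom level. Levels are indexed from 0 (bottom) to `L - 1` (top); `A`, `B` and `generate_given_indicators` are illustrative names, not the paper's notation.

```python
import numpy as np


def generate_given_indicators(z, A, B, s0, rng):
    """Hypothetical top-down generation of states and observations.

    z  : (T, L) 0/1 array; z[t, l] = 1 if level l transitions at time t
         (assumed consistent: a transition at l implies one at l - 1).
    A  : list of L arrays; A[l][parent, prev] is a distribution over the new
         state at level l. The top level uses a single dummy parent (index 0).
    B  : (K, V) emission matrix for bottom-level states.
    s0 : length-L states before time 0.
    """
    T, L = z.shape
    K = B.shape[0]
    s = np.zeros((T, L), dtype=int)
    y = np.zeros(T, dtype=int)
    prev = np.asarray(s0, dtype=int)
    for t in range(T):
        for l in range(L - 1, -1, -1):                # top level down to bottom
            if z[t, l]:
                parent = s[t, l + 1] if l + 1 < L else 0
                s[t, l] = rng.choice(K, p=A[l][parent, prev[l]])
            else:
                s[t, l] = prev[l]                     # no transition: keep state
        y[t] = rng.choice(B.shape[1], p=B[s[t, 0]])   # emit from the bottom level
        prev = s[t]
    return s, y
```

Because a level only changes state when every level below it also transitions, the sampled state sequences change more slowly the higher the level, matching the persistence property described earlier.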

• Inference in the IHHMM is performed using Gibbs sampling and a modified forward-backtrack algorithm.

• It iterates between the following two steps:

1. Sampling state values for each level, with the parameters fixed:

• Compute forward messages from to :

replace with for

• Resample and along the backward pass from to :

• When the top level is reached, a new level is created above it, with all of its states set to 1;

• If the level below the current top level has no state transitions, it becomes the new top level.

2. Sampling parameters given the current state:

• Parameters are initialized as draws from the Dirichlet priors;

• Posteriors are calculated based on the counts of state transitions and emissions in the previous step.
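The two steps above have standard single-level building blocks, sketched below in Python for an ordinary HMM: forward filtering, backward sampling of the state sequence (forward-backtrack), and resampling of the transition and emission rows from their Dirichlet posteriors given counts. This is a simplified stand-in, not the IHHMM sampler: it keeps the initial distribution `pi` fixed, handles a single level, and omits the cross-level message replacement and the creation or removal of top levels. The symmetric prior strengths `a_prior` and `b_prior` and all function names are assumptions.

```python
import numpy as np


def forward_filter(y, pi, A, B):
    """Normalized forward messages: alpha[t, k] = p(s_t = k | y_0..y_t)."""
    T, K = len(y), len(pi)
    alpha = np.zeros((T, K))
    a = pi * B[:, y[0]]
    alpha[0] = a / a.sum()
    for t in range(1, T):
        a = (alpha[t - 1] @ A) * B[:, y[t]]   # predict, then weight by emission
        alpha[t] = a / a.sum()
    return alpha


def backward_sample(alpha, A, rng):
    """Backtrack: draw the last state from its message, then each s_t given s_{t+1}."""
    T, K = alpha.shape
    s = np.zeros(T, dtype=int)
    s[-1] = rng.choice(K, p=alpha[-1])
    for t in range(T - 2, -1, -1):
        w = alpha[t] * A[:, s[t + 1]]         # p(s_t | y_0..y_t) * p(s_{t+1} | s_t)
        s[t] = rng.choice(K, p=w / w.sum())
    return s


def count_and_resample(y, s, K, V, rng, a_prior=1.0, b_prior=1.0):
    """Count transitions and emissions, then draw each row from its Dirichlet posterior."""
    trans = np.zeros((K, K))
    emit = np.zeros((K, V))
    for prev, nxt in zip(s[:-1], s[1:]):
        trans[prev, nxt] += 1
    for state, obs in zip(s, y):
        emit[state, obs] += 1
    A = np.vstack([rng.dirichlet(a_prior + row) for row in trans])
    B = np.vstack([rng.dirichlet(b_prior + row) for row in emit])
    return A, B, trans, emit


def gibbs_sweep(y, pi, A, B, rng):
    """One sweep: sample states with parameters fixed, then parameters given states."""
    alpha = forward_filter(y, pi, A, B)
    s = backward_sample(alpha, A, rng)
    A, B, _, _ = count_and_resample(y, s, A.shape[0], B.shape[1], rng)
    return s, A, B
```

Repeatedly calling `gibbs_sweep` alternates the two steps; the conjugate Dirichlet-multinomial pairing is what makes the parameter step a direct draw from counts.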

Predicting new observations given the current state of the IHHMM:

1. Assuming the top level learned by the IHHMM is , compute the following recursions from to :

2. Compute the probability of observing from :
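For a single-level HMM the same idea reduces to a forward pass followed by one prediction step. The Python sketch below is that simplified analogue; it does not reproduce the level-by-level IHHMM recursion. `predictive_prob` is an illustrative name, and taking the log of its result gives a per-symbol predictive log probability, the kind of quantity compared in the experiments.

```python
import numpy as np


def predictive_prob(y, y_next, pi, A, B):
    """p(y_{T+1} = y_next | y_1..y_T) for a single-level HMM."""
    alpha = pi * B[:, y[0]]
    alpha = alpha / alpha.sum()
    for obs in y[1:]:
        alpha = (alpha @ A) * B[:, obs]     # forward recursion
        alpha = alpha / alpha.sum()         # normalize to p(s_t | y_1..y_t)
    next_state = alpha @ A                  # p(s_{T+1} | y_1..y_T)
    return float(next_state @ B[:, y_next])
```

For a deterministic alternating chain (`A = [[0, 1], [1, 0]]`, identity emissions, start in state 0), the observations `[0, 1, 0]` give probability 1.0 for the next symbol being 1.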

1. Generated data samples

[Figure: generated sequential data, and the sampled “states” used to generate it]

2. Demonstrating that the model can capture hierarchical structure

• The first data set consists of repeats of the integers increasing from 1 to 7, followed by repeats of the integers decreasing from 5 to 1, with this whole pattern repeated twice.

• The second data set is the first one concatenated with another series of repeated increasing and decreasing integer sequences.

• Seven states are used in the model at every level.
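One reading of the first data set, as a Python sketch: each ascending run 1..7 and each descending run 5..1 is repeated several times, and the whole pattern occurs twice. The repeat counts `n_up` and `n_down` and the function name are hypothetical, since the slides do not give them.

```python
def make_first_dataset(n_up, n_down, n_cycles=2):
    """Hypothetical reconstruction: n_up repeats of 1..7, then n_down repeats
    of 5..1, with the whole pattern repeated n_cycles times."""
    up = list(range(1, 8)) * n_up            # 1 2 3 4 5 6 7 1 2 3 ...
    down = list(range(5, 0, -1)) * n_down    # 5 4 3 2 1 5 4 3 ...
    return (up + down) * n_cycles
```

In such a sequence the integers 2 to 5 occur in both phases, so predicting the next value requires knowing whether the current phase is ascending or descending, which is the kind of information a higher level can carry.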


The predictive log probability of the next integer:

HMM: 0.25; IHHMM: 0.31; HHMM (2 to 4 levels): 0.30

3. Spectral data from Handel’s Hallelujah chorus

4. Alice in Wonderland letters data set.

[Figure: difference in log predictive likelihood between the IHHMM and a one-level HMM learned by Gibbs sampling]

[Figure: difference in log predictive likelihood between the IHHMM and an HMM learned by EM]

• The mean differences in both plots are positive, indicating that the IHHMM outperforms both HMM baselines on this data.

• The long tails indicate that some letters are predicted better using the higher levels of the hierarchy.

Relation to the HHMM:

The IHHMM is a nonparametric extension of the HHMM to an unbounded hierarchy depth;

The completion of an internal HHMM is governed by an independent process.

Other related work:

Probabilistic context-free grammars with multi-scale structure learning;

Infinite HMM, infinite factorial HMM;

Future work:

Make the number of states at each level infinite as well, as in the infinite HMM;

Higher-order Markov chains;

More efficient inference algorithms.

[1] S. Fine, Y. Singer, and N. Tishby. The hierarchical hidden Markov model: Analysis and applications. Machine Learning, 32:41-62, 1998.

[2] K. Murphy and M. A. Paskin. Linear time inference in hierarchical HMMs. In Advances in Neural Information Processing Systems (NIPS), 2001.