Infinite Hierarchical Hidden Markov Models

AISTATS 2009

Katherine A. Heller, Yee Whye Teh and Dilan Görür

Lu Ren

[email protected] University

Nov 23, 2009

Outline

- Hierarchical structure learning for sequential data
- Hierarchical hidden Markov model (HHMM)
- Infinite hierarchical hidden Markov model (IHHMM)
- Inference and learning
- Experiment results and demonstrations
- Related work and extensions

Multi-scale Structure

Sequential data generated

The sampled “states” used to generate the data

- Goal: infer correlations between observations that span long periods of the observation sequence.
- Potential applications: multi-resolution structure learning for language, video structure discovery, activity detection, etc.

Hierarchical HMM (HHMM)

Hierarchical Hidden Markov Models (HHMM)

Multiscale models of sequences where each level of the model is a separate HMM emitting lower level HMMs in a recursive manner.

The generative process of one HHMM example [2]

Hierarchical HMM (HHMM)

2. The entire set of parameters

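With a fixed model structure, the model is characterized by the following parameters [1]:

$\lambda = \{ \{A^{q^d}\}, \{\Pi^{q^d}\}, \{B^{q^D}\} \}$

$A^{q^d} = (a^{q^d}_{ij})$ with $a^{q^d}_{ij} = P(q^{d+1}_j \mid q^{d+1}_i)$

$\Pi^{q^d} = \{\pi^{q^d}(q^{d+1}_i)\}$ with $\pi^{q^d}(q^{d+1}_i) = P(q^{d+1}_i \mid q^d)$

$B^{q^D} = \{b^{q^D}(k)\}$ with $b^{q^D}(k) = P(\sigma_k \mid q^D)$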

- 3. Representing the HHMM as a DBN [2]
- Simply assume all production states are at the bottom, and represent the state of the HMM at level $d$ and time $t$ by $Q_t^d$.
- The vector $Q_t^{1:D}$ specifies the complete “path” from the root to the leaf state.
- Indicator variables $F_t^d$ control completion of the HHMM at level $d$ and time $t$.

Hierarchical HMM (HHMM)

An HHMM represented as a DBN [2]

Infinite Hierarchical HMM (IHHMM)

IHHMM: allows the HHMM hierarchy to have a potentially infinite number of levels.

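- Observation: $y_t$. State at level $l$ and time $t$: $s_t^l$.
- A state transition indicator variable $z_t^l \in \{0, 1\}$ is also introduced:
- $z_t^l$ indicates whether there is a completion of the HHMM at level $l-1$ right before time $t$;
- $z_t^l = 1$ indicates the presence of a state transition from $s_{t-1}^l$ to $s_t^l$.
- The conditional probability of $z_t^{l+1}$ is: $P(z_t^{l+1} = 1 \mid z_t^l = 1) = \alpha_l$ and $P(z_t^{l+1} = 1 \mid z_t^l = 0) = 0$, with $z_t^1 = 1$ at the lowest level.
- There is an opportunity to transition at level $l+1$ only if there was a transition at level $l$.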
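
The indicator chain above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `sample_transition_indicators`, the finite list `alphas`, and the convention that the lowest level transitions at every time step are assumptions made for the example.

```python
import random

def sample_transition_indicators(alphas, rng):
    """Sample the transition indicators of one time step, lowest level first.

    alphas[k] is the assumed probability that the level at list position
    k + 1 transitions, given that the level at position k transitioned.
    """
    z = [1]  # assumption: the lowest level transitions at every time step
    for alpha in alphas:
        if z[-1] == 1 and rng.random() < alpha:
            z.append(1)
        else:
            z.append(0)  # once a level stays put, every level above it does too
    return z

rng = random.Random(0)
for t in range(5):
    print(t, sample_transition_indicators([0.5, 0.5, 0.5], rng))
```

Every sampled vector is a run of ones followed by a run of zeros, which is the "opportunity to transition only if the level below transitioned" constraint.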

Infinite Hierarchical HMM (IHHMM)

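- The property implied by the structure:
- The number of transitions at level $l$ before a transition at level $l+1$ occurs is geometrically distributed with mean $1/\alpha_l$, where $\alpha_l$ is the probability of a transition at level $l+1$ given a transition at level $l$.
- With the lowest level transitioning at every time step, this implies that the expected number of time steps for which a state at level $l$ persists in its current value is $\prod_{i=1}^{l-1} 1/\alpha_i$.
- The states at higher levels persist longer.
- The first non-transitioning level at time $t$, $L_t$, has the distribution $P(L_t = l) = (1 - \alpha_{l-1}) \prod_{i=1}^{l-2} \alpha_i$ for $l \ge 2$.
- $L_t - 1$ is geometrically distributed with parameter $1 - \alpha$ if all $\alpha_l = \alpha$.
- The IHHMM allows for a potentially infinite number of levels.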
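
The persistence properties listed above can be checked numerically. The sketch below assumes a finite list `alphas`, where `alphas[i]` is the probability that level `i + 2` transitions given that level `i + 1` does, and that level 1 transitions at every time step; the function names are hypothetical.

```python
import math

def first_nontransition_pmf(alphas):
    """Return {level: P(first non-transitioning level == level)}.

    Levels are 1-based; level 1 is assumed to transition always, so the
    first non-transitioning level is at least 2.
    """
    pmf, reach = {}, 1.0  # reach = probability that all lower levels moved
    for i, a in enumerate(alphas):
        pmf[i + 2] = reach * (1.0 - a)
        reach *= a
    return pmf

def expected_persistence(alphas, level):
    """Expected number of time steps a state at `level` keeps its value."""
    return math.prod(1.0 / a for a in alphas[: level - 1])

print(first_nontransition_pmf([0.5, 0.5, 0.5]))  # {2: 0.5, 3: 0.25, 4: 0.125}
print(expected_persistence([0.5, 0.5, 0.5], 3))  # 4.0
```

With equal `alphas` the probabilities halve from one level to the next, which is the geometric behaviour described above, and the expected persistence doubles per level.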

Infinite Hierarchical HMM (IHHMM)

The generative process for the states $s_t^l$ given $L_t$, the first non-transitioning level at time $t$, is similar to the HHMM:

For the levels $l = L_t - 1$ down to $1$, the state $s_t^l$ is generated according to

The emissions matrix:

for the levels
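
As a rough illustration of this generative process, the sketch below samples observations from a hierarchy truncated to a finite number of levels. It is not the authors' implementation. The parameter layout (`A`, `pi`, `E`), the choice to restart a level from `pi` when its parent has just transitioned, the emission from the lowest-level state only, and the dummy parent index 0 for the top level are all assumptions made for the example.

```python
import random

def sample_categorical(probs, rng):
    """Draw an index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate(T, alphas, A, pi, E, rng):
    """Generate T observations from a truncated hierarchy (0-based levels).

    alphas[l]          : P(level l+1 transitions | level l transitions)
    A[l][parent][prev] : next-state distribution when the parent stayed put
    pi[l][parent]      : start distribution when the parent just transitioned
    E[s]               : emission distribution of lowest-level state s
    """
    n_levels = len(A)
    states = [0] * n_levels  # arbitrary initial path through the hierarchy
    ys = []
    for _ in range(T):
        # Sample transition indicators bottom-up; level 0 always transitions.
        z = [1]
        for l in range(1, n_levels):
            z.append(1 if z[-1] == 1 and rng.random() < alphas[l - 1] else 0)
        # Update transitioning levels top-down so each parent is known first.
        for l in reversed(range(n_levels)):
            if z[l] == 0:
                continue  # this level keeps its previous state
            has_parent = l + 1 < n_levels
            parent = states[l + 1] if has_parent else 0
            parent_moved = has_parent and z[l + 1] == 1
            dist = pi[l][parent] if parent_moved else A[l][parent][states[l]]
            states[l] = sample_categorical(dist, rng)
        ys.append(sample_categorical(E[states[0]], rng))
    return ys

# Toy parameters: 2 levels, 3 states per level, 4 symbols, all uniform.
K, n_levels, V = 3, 2, 4
uniform = [1.0 / K] * K
A = [[[uniform for _ in range(K)] for _ in range(K)] for _ in range(n_levels)]
pi = [[uniform for _ in range(K)] for _ in range(n_levels)]
E = [[1.0 / V] * V for _ in range(K)]
print(generate(20, [0.3], A, pi, E, random.Random(1)))
```

Updating the levels from the top down mirrors the "levels down to 1" ordering in the slide: a child needs its parent's new state before it can draw its own.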

Inference and Learning

- Inference in the IHHMM is performed using Gibbs sampling and a modified forward-backtrack algorithm.
- It iterates between the following two steps:
- 1. Sampling state values with fixed parameters, for each level:
- Compute forward messages from $t = 1$ to $t = T$, where $T$ is the sequence length:

replace with for

- Resample the states and transition indicators along the backward pass from $t = T$ to $t = 1$:

Inference and Learning

- When the top level is reached, a new level is created above it with all of its states set to 1;
- If the level below the current top level has no state transitions, it becomes the new top level.
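
The per-level state sampling builds on a forward pass followed by a backward sampling pass. The sketch below shows that pattern for an ordinary one-level discrete HMM only; it does not include the IHHMM-specific modifications (transition indicators, parent states), and the names `ffbs`, `init`, `trans` and `emit` are assumptions made for the example.

```python
import random

def ffbs(obs, init, trans, emit, rng):
    """Sample a state path from P(states | obs) for a discrete HMM.

    init[k]     : P(s_1 = k)
    trans[j][k] : P(s_t = k | s_{t-1} = j)
    emit[k][o]  : P(y_t = o | s_t = k)
    """
    T, K = len(obs), len(init)
    # Forward pass: fwd[t][k] is proportional to P(s_t = k | obs[0..t]).
    fwd = []
    for t in range(T):
        row = []
        for k in range(K):
            if t == 0:
                prior = init[k]
            else:
                prior = sum(fwd[t - 1][j] * trans[j][k] for j in range(K))
            row.append(prior * emit[k][obs[t]])
        total = sum(row)
        fwd.append([v / total for v in row])
    # Backward pass: sample the last state, then each state given its successor.
    path = [0] * T
    path[T - 1] = rng.choices(range(K), weights=fwd[T - 1], k=1)[0]
    for t in range(T - 2, -1, -1):
        weights = [fwd[t][j] * trans[j][path[t + 1]] for j in range(K)]
        path[t] = rng.choices(range(K), weights=weights, k=1)[0]
    return path

obs = [0, 1, 1, 0, 1]
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.3, 0.7]]
print(ffbs(obs, init, trans, emit, random.Random(0)))
```

Normalising each forward message keeps the numbers in a safe range for long sequences without changing the sampled distribution.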

2. Sampling parameters given the current state:

- Parameters are initialized as draws from the Dirichlet priors;
- Posteriors are calculated based on the counts of state transitions and emissions in the previous step.
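
The parameter step can be illustrated for a single transition matrix: with a Dirichlet prior on each row, the posterior given the transition counts is again Dirichlet, with the counts added to the prior. The sketch below assumes a symmetric prior with a hypothetical concentration `prior` and uses only the standard library; the helper names are illustrative.

```python
import random

def transition_counts(path, K):
    """Count transitions j -> k along a sampled state path."""
    counts = [[0] * K for _ in range(K)]
    for j, k in zip(path, path[1:]):
        counts[j][k] += 1
    return counts

def sample_dirichlet(conc, rng):
    """Draw from Dirichlet(conc) by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in conc]
    total = sum(draws)
    return [d / total for d in draws]

def resample_rows(counts, prior, rng):
    """Sample each row from its posterior Dirichlet(prior + counts)."""
    return [sample_dirichlet([prior + c for c in row], rng) for row in counts]

rng = random.Random(0)
path = [0, 0, 1, 1, 1, 0, 2, 2, 0]
counts = transition_counts(path, 3)
print(counts)
print(resample_rows(counts, 1.0, rng))
```

The same update applies to emission rows, with emission counts in place of transition counts.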

Predicting new observations given the current state of the IHHMM:

1. Assume the top level learned from the IHHMM is , then calculate the following recursions from to :

Inference and Learning

2. Compute the probability of observing from :

Experiment Results

1. Data generated: sample samplesample

Sequential data generated

The sampled “states” used to generate the data

Experiment Results

2. Demonstrate the model can capture the hierarchical structure

- The first data set consists of repeats of integers increasing from 1 to 7, followed by repetitions of integers decreasing from 5 to 1, repeated twice.
- The second data set is the first one concatenated with another series of repeated increasing and decreasing sequences of integers.
- Seven states are used in the model at all levels.
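
A sequence with the shape of the first data set can be produced as follows. The repeat counts did not survive in the slide text, so `n_up` and `n_down` are hypothetical parameters, and the values used in the call are placeholders rather than the settings of the experiment.

```python
def first_data_set(n_up, n_down, n_outer=2):
    """Repeated runs 1..7, then repeated runs 5..1, all repeated n_outer times."""
    up = list(range(1, 8))         # 1, 2, ..., 7
    down = list(range(5, 0, -1))   # 5, 4, ..., 1
    return (up * n_up + down * n_down) * n_outer

# Placeholder repeat counts, chosen only to show the shape of the data.
data = first_data_set(n_up=2, n_down=3)
print(len(data), data[:14])
```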


Experiment Results

The predictive log probability of the next integer is calculated:

HMM: 0.25; IHHMM: 0.31; HHMM: 0.30 (for 2-4 levels)

3. Spectral data from Handel’s Hallelujah chorus

Experiment Results

4. Alice in Wonderland letters data set.

The difference in log predictive likelihood between the IHHMM and a one-level HMM learned by Gibbs sampling

The difference in log predictive likelihood between the IHHMM and an HMM learned by EM

- The mean differences in both plots are positive, demonstrating that the IHHMM gives superior performance on this data.
- The long tails signify that some letters are better predicted with the higher hierarchical levels.

Final discussions

Relation to the HHMM:

IHHMM is a nonparametric extension of the HHMM for an unbounded hierarchy depth;

The completion of an internal HHMM is governed by an independent process.

Other related work:

Probabilistic context free grammars with multi-scale structure learning;

Infinite HMM, infinite factorial HMM;

Future work:

Make the number of states at each level infinite as well, as in the infinite HMM;

Higher order Markov chains;

More efficient inference algorithms.

Cited References

[1] S. Fine, Y. Singer, and N. Tishby. The hierarchical hidden Markov model: Analysis and applications. Machine Learning, 32: 41-62, 1998.

[2] K. Murphy and M.A. Paskin. Linear time inference in hierarchical HMMs. In Neural Information Processing Systems, 2001.