## Hidden Markov Models


Hidden Markov Models

So far: we have considered systems for making a single decision (e.g. discriminant functions or estimation of class-conditional densities).

Now we consider: the problem of sequential decision making.

Example: Automatic Speech Recognition (ASR). In ASR, we need to determine the sequence of phonemes (like vowels and consonants) that makes up the observed speech sound.

For this we will introduce Hidden Markov Models (HMMs).

P01760 Advanced Concepts in Signal Processing

First-order Markov Models

NOTE: in a first-order model, the state at time t+1 depends only on the state at time t (the previous state).
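A minimal sketch of what the first-order assumption buys us: the probability of a whole state sequence factorises into a product of one-step transition probabilities. The transition matrix `A`, the initial state, and the function name `sequence_probability` are illustrative assumptions, not values taken from the slides.

```python
def sequence_probability(states, A, initial_state):
    """P(states) for a first-order Markov chain that starts in initial_state."""
    prob = 1.0
    prev = initial_state
    for s in states:
        # First-order assumption: each factor depends only on the previous state.
        prob *= A[prev][s]
        prev = s
    return prob


# Hypothetical 3-state model; A[i][j] = P(state j at t+1 | state i at t).
A = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
]

# Sequence 1, 1, 2, 0 starting from state 0: 0.3 * 0.6 * 0.3 * 0.2
p = sequence_probability([1, 1, 2, 0], A, initial_state=0)
print(p)
```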


Markov Model State Transition Graph


Calculating the model probability


Calculating (cont)


Basic Markov Model: Example


Markov: Example 2


Hidden Markov Model


Hidden Markov model

This model shows all state transitions as being possible: not always the case.


Left-to-Right Models
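As the term is commonly used, a left-to-right model forbids transitions back to earlier states, so its transition matrix is upper triangular. The sketch below contrasts that with a fully connected model; both matrices and the helper name `is_left_to_right` are hypothetical and chosen only for illustration.

```python
def is_left_to_right(A):
    """True if no transition goes back to a lower-numbered state."""
    return all(A[i][j] == 0.0 for i in range(len(A)) for j in range(i))


# Hypothetical fully connected model: every transition is possible.
fully_connected = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
]

# Hypothetical left-to-right model: zeros below the diagonal.
left_to_right = [
    [0.6, 0.3, 0.1],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]

print(is_left_to_right(fully_connected), is_left_to_right(left_to_right))
```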


Probability Parameters


3 central issues


Evaluation


Evaluation (cont)


Recursive calculation of P(V^T)

Let us write P(V^T) as:

However, we don't have to do the calculation in this order!

Re-ordering, we get:
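The slide's own equations did not survive in the transcript, so the following is a sketch of the two forms in standard HMM notation. The assumptions are a known initial state ω(0), transition probabilities P(ω(t) | ω(t-1)), and emission probabilities P(v(t) | ω(t)). The first line sums over every possible state sequence; the second pushes each sum as far to the right as it will go, which is what makes a recursive evaluation possible.

```latex
\begin{align*}
P(V^T) &= \sum_{\omega(1),\dots,\omega(T)} \; \prod_{t=1}^{T}
          P(v(t)\mid\omega(t))\, P(\omega(t)\mid\omega(t-1)) \\
       &= \sum_{\omega(T)} P(v(T)\mid\omega(T))
          \sum_{\omega(T-1)} P(\omega(T)\mid\omega(T-1))\, P(v(T-1)\mid\omega(T-1))
          \;\cdots \\
       &\qquad \cdots \sum_{\omega(1)} P(\omega(2)\mid\omega(1))\,
          P(v(1)\mid\omega(1))\, P(\omega(1)\mid\omega(0))
\end{align*}
```

With c states, the first form visits on the order of c^T sequences, while the re-ordered form needs on the order of c^2 T multiplications.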


Recursive calculation of P(V^T)

Graphically we can illustrate this as follows (N.B. this is not a state transition diagram). We observe {v(1), v(2), v(3), …}.

[Figure: trellis of states unrolled along the time axis.]


HMM Forward Algorithm


HMM Forward Algorithm (cont)


Forward Algorithm Step


Evaluation Example


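Evaluation Example (cont)

[Figure: trellis of forward probabilities for the observed sequence v1, v3, v2, v0, starting in state ω1 at t=0. The transitions out of the initial state are labelled 0.2×0 (to ω0), 0.3×0.3 (to ω1), 0.1×0.1 (to ω2) and 0.4×0.5 (to ω3).]

| State | t=0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| ω0 | 0 | 0 | 0 | 0 | 0.0011 |
| ω1 | 1 | 0.09 | 0.0052 | 0.0024 | 0 |
| ω2 | 0 | 0.01 | 0.0077 | 0.0002 | 0 |
| ω3 | 0 | 0.2 | 0.0057 | 0.0007 | 0 |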
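The forward recursion behind this example can be sketched in a few lines. The transcript does not preserve the model's parameter matrices, so `A` and `B` below are assumed values, chosen because they reproduce the trellis numbers of the example; the function name `forward` and the indexing convention (known state at t=0, no emission at t=0) are likewise assumptions.

```python
def forward(A, B, observations, initial_state):
    """Return alpha[t][j] for t = 0..T, given a known state at t = 0."""
    n = len(A)
    # t = 0: all probability mass sits on the initial state; nothing is emitted yet.
    alpha = [[1.0 if j == initial_state else 0.0 for j in range(n)]]
    for v in observations:
        prev = alpha[-1]
        # alpha_j(t) = b_j(v(t)) * sum_i alpha_i(t-1) * a_ij
        alpha.append([
            B[j][v] * sum(prev[i] * A[i][j] for i in range(n))
            for j in range(n)
        ])
    return alpha


# ASSUMED parameters (not shown in the transcript).
# A[i][j] = P(state j at t+1 | state i at t); state 0 is absorbing.
A = [
    [1.0, 0.0, 0.0, 0.0],
    [0.2, 0.3, 0.1, 0.4],
    [0.2, 0.5, 0.2, 0.1],
    [0.8, 0.1, 0.0, 0.1],
]
# B[j][k] = P(symbol v_k | state j); only state 0 can emit v0.
B = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.4, 0.1, 0.2],
    [0.0, 0.1, 0.1, 0.7, 0.1],
    [0.0, 0.5, 0.2, 0.1, 0.2],
]

alpha = forward(A, B, [1, 3, 2, 0], initial_state=1)  # observe v1, v3, v2, v0
p_sequence = sum(alpha[-1])  # about 0.0011, the t=4 entry of the trellis
print(p_sequence)
```

Each time step costs one pass over all state pairs, so the whole evaluation is quadratic in the number of states and linear in the sequence length.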


Making Decisions

Given the ability to calculate the probability of an observed sequence, we can now compare different HMMs. This is just Bayesian decision theory revisited!

Recall:

$$P(\theta \mid V^T) = \frac{P(V^T \mid \theta)\,P(\theta)}{P(V^T)}$$

Hence, given models θ1 and θ2, we select θ1 if:

$$P(V^T \mid \theta_1)\,P(\theta_1) > P(V^T \mid \theta_2)\,P(\theta_2)$$

Example: suppose θ1 = 'y'-'e'-'s' and θ2 = 'n'-'o'. If we expect that the answer is more likely to be 'yes', we weight the priors accordingly.
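As a small sketch of this decision rule: pick the model whose likelihood times prior is largest. The likelihood and prior values below are made up for illustration, and `select_model` is a hypothetical helper name; in practice each likelihood would come from evaluating the observed sequence under that word's HMM.

```python
def select_model(likelihoods, priors):
    """Return the model name that maximises P(V | theta) * P(theta)."""
    return max(likelihoods, key=lambda name: likelihoods[name] * priors[name])


# Hypothetical values for illustration only.
likelihoods = {"yes": 2.0e-4, "no": 3.0e-4}  # P(V^T | theta) for each word model
priors = {"yes": 0.7, "no": 0.3}             # we expect 'yes' more often
equal_priors = {"yes": 0.5, "no": 0.5}

choice = select_model(likelihoods, priors)
choice_equal = select_model(likelihoods, equal_priors)
print(choice, choice_equal)
```

With these numbers the prior tips the decision towards 'yes' even though 'no' has the larger likelihood; with equal priors the likelihood alone decides.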


An Alternative Recursion

Alternatively, given:

we can re-order as:


The Backward Algorithm


Backward Algorithm
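A minimal sketch of a backward recursion, which runs over the same trellis from the last time step to the first. It uses the boundary condition beta_i(T) = 1 for every state, which is one common textbook convention; the slides may define the final step differently. The matrices `A` and `B` are assumed values (the transcript does not show the model parameters), and `backward` is a hypothetical function name.

```python
def backward(A, B, observations):
    """Return beta[t][i] = P(v(t+1), ..., v(T) | state i at time t), t = 0..T."""
    n = len(A)
    T = len(observations)
    # Boundary condition (an assumed convention): nothing left to explain at t = T.
    beta = [[1.0] * n for _ in range(T + 1)]
    for t in range(T - 1, -1, -1):
        v = observations[t]  # the symbol observed at time t+1
        # beta_i(t) = sum_j a_ij * b_j(v(t+1)) * beta_j(t+1)
        beta[t] = [
            sum(A[i][j] * B[j][v] * beta[t + 1][j] for j in range(n))
            for i in range(n)
        ]
    return beta


# ASSUMED parameters (not shown in the transcript).
A = [
    [1.0, 0.0, 0.0, 0.0],
    [0.2, 0.3, 0.1, 0.4],
    [0.2, 0.5, 0.2, 0.1],
    [0.8, 0.1, 0.0, 0.1],
]
B = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.4, 0.1, 0.2],
    [0.0, 0.1, 0.1, 0.7, 0.1],
    [0.0, 0.5, 0.2, 0.1, 0.2],
]

beta = backward(A, B, [1, 3, 2, 0])  # observe v1, v3, v2, v0
p_sequence = beta[0][1]              # probability of the sequence when starting in state 1
print(p_sequence)
```

Under these assumptions the backward pass arrives at the same sequence probability as a forward pass started in state 1, which is a useful consistency check when implementing both.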


Decoding Problem

The problem is to choose the most likely state sequence, ω^T, for a given observation sequence V^T. Unlike the evaluation problem, this one is not uniquely defined. For example, at time t we could find:

However, this only finds the states that are individually most likely; hence the resulting sequence, ω^T, may not be viable.


Viterbi Algorithm


Viterbi: is this possible?

[Figure: trellis over states ω1, ω2, ω3 for t = 0, 1, 2, 3, 4, showing two paths: the optimal sequence for t = 1,2,3 and the optimal sequence for t = 1,2,3,4.]

If not, why not?


Viterbi Algorithm


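Decoding Example

[Figure: Viterbi trellis for the observed sequence v1, v3, v2, v0, starting in state ω1 at t=0. Edge labels recoverable from the figure: 0.2×0, 0×0, 0.3×0.3, 0.1×0.1, 0.4×0.5, 0.09×0.3, 0.01×0.5, 0.2×0.1.]

| State | t=0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| ω0 | 0 | 0 | 0 | 0 | 0.0004032 |
| ω1 | 1 | 0.09 | 0.0027 | 0.00126 | 0 |
| ω2 | 0 | 0.01 | 0.0063 | 0.000126 | 0 |
| ω3 | 0 | 0.2 | 0.0036 | 0.000504 | 0 |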
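A minimal sketch of Viterbi decoding: it has the same shape as the forward recursion, but replaces the sum over predecessor states with a max and records which predecessor won. The transcript does not show the model parameters, so `A` and `B` are assumed values. With them, the sketch reproduces the t=1 and t=2 columns of the decoding example (0.09, 0.01, 0.2 and 0.0027, 0.0063, 0.0036); the later columns depend on parameters that cannot be confirmed from the transcript. The function name `viterbi` and its return values are illustrative choices.

```python
def viterbi(A, B, observations, initial_state):
    """Return (best state path for t = 1..T, its probability, trellis of maxima)."""
    n = len(A)
    delta = [1.0 if j == initial_state else 0.0 for j in range(n)]
    trellis = [delta]
    backpointers = []
    for v in observations:
        new_delta, pointers = [], []
        for j in range(n):
            # Keep only the best predecessor instead of summing over all of them.
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            pointers.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][v])
        delta = new_delta
        trellis.append(delta)
        backpointers.append(pointers)

    # Trace back from the most probable final state.
    state = max(range(n), key=lambda j: delta[j])
    best_prob = delta[state]
    path = [state]
    for pointers in reversed(backpointers[1:]):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best_prob, trellis


# ASSUMED parameters (not shown in the transcript).
A = [
    [1.0, 0.0, 0.0, 0.0],
    [0.2, 0.3, 0.1, 0.4],
    [0.2, 0.5, 0.2, 0.1],
    [0.8, 0.1, 0.0, 0.1],
]
B = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.4, 0.1, 0.2],
    [0.0, 0.1, 0.1, 0.7, 0.1],
    [0.0, 0.5, 0.2, 0.1, 0.2],
]

path, best_prob, trellis = viterbi(A, B, [1, 3, 2, 0], initial_state=1)
print(path, best_prob)
```

Because every entry of the trellis stores the best way of reaching that state, the traced-back path is always a connected sequence of allowed transitions, unlike a sequence of individually most likely states.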


The Learning Problem (Briefly)

The 3rd problem is the most difficult. Aim: to learn the parameters, a_ij and b_jk, from a set of training data.

Obvious approach: Maximum Likelihood Learning

However we have a familiar problem:

That is: we must marginalize out the state sequences, ωT.


The Learning Problem (cont.)

The solution is similar to learning prior probability weights in MoGs, i.e. using EM (a.k.a. Baum-Welch or Forward-Backward): we iteratively estimate the transition probabilities, a_ij, and the emission probabilities, b_jk.

The key ingredient is the following quantity:

i.e. it can be calculated from the Forward and Backward steps and the current estimates for a_ij and b_jk.
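The slide's formula is not preserved in the transcript. In one common textbook form, the quantity in question is the posterior probability of a transition from state i at time t-1 to state j at time t, given the whole observed sequence. The symbol γ_ij(t), the forward variable α, and the backward variable β are assumed names here:

```latex
\gamma_{ij}(t) \;=\; \frac{\alpha_i(t-1)\; a_{ij}\; b_j\big(v(t)\big)\; \beta_j(t)}{P(V^T \mid \theta)}
```

The numerator multiplies the probability of reaching state i while emitting the first t-1 symbols, the transition i→j, the emission of v(t) from state j, and the probability of the remaining symbols given state j; dividing by the sequence probability turns it into a posterior.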


The Learning Problem

Updating a_ij requires the estimated prob. of moving from state i to state j, hence:

$$\hat{a}_{ij} = \frac{\text{expected number of transitions from } i \to j}{\text{expected number of transitions from } i \to \text{anywhere}}$$

Updating b_jk requires the estimated prob. of emitting visible symbol v_k when in state j, hence:

$$\hat{b}_{jk} = \frac{\text{expected number of occurrences of state } j \text{ emitting } v_k}{\text{expected number of occurrences of state } j}$$
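The two ratios above can be sketched as a single re-estimation step for one training sequence. This is an illustrative implementation under stated assumptions, not the slides' own: the state at t=0 is known and emits nothing, beta_i(T) = 1 for every state, and the toy parameters and the names `baum_welch_step` and `normalise` are hypothetical.

```python
def normalise(row, fallback):
    """Scale expected counts to sum to 1; keep the old row if the state was never visited."""
    total = sum(row)
    return [x / total for x in row] if total > 0 else list(fallback)


def baum_welch_step(A, B, obs, initial_state):
    """One EM update of A and B; also returns P(obs) under the OLD parameters."""
    n, m, T = len(A), len(B[0]), len(obs)
    # Forward pass: alpha[t][j] for t = 0..T.
    alpha = [[1.0 if j == initial_state else 0.0 for j in range(n)]]
    for v in obs:
        prev = alpha[-1]
        alpha.append([B[j][v] * sum(prev[i] * A[i][j] for i in range(n))
                      for j in range(n)])
    # Backward pass: beta[t][i] for t = 0..T.
    beta = [[1.0] * n for _ in range(T + 1)]
    for t in range(T - 1, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t]] * beta[t + 1][j] for j in range(n))
                   for i in range(n)]
    likelihood = sum(alpha[T])

    trans = [[0.0] * n for _ in range(n)]  # expected number of transitions i -> j
    emit = [[0.0] * m for _ in range(n)]   # expected number of times state j emits v_k
    for t in range(1, T + 1):
        v = obs[t - 1]
        for i in range(n):
            for j in range(n):
                # Posterior probability of the transition i -> j at time t.
                g = alpha[t - 1][i] * A[i][j] * B[j][v] * beta[t][j] / likelihood
                trans[i][j] += g
                emit[j][v] += g
    new_A = [normalise(row, A[i]) for i, row in enumerate(trans)]
    new_B = [normalise(row, B[j]) for j, row in enumerate(emit)]
    return new_A, new_B, likelihood


# Hypothetical 2-state, 2-symbol model and training sequence.
A0 = [[0.6, 0.4], [0.3, 0.7]]
B0 = [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 0, 1, 1, 0, 1, 1, 1]

A1, B1, lik0 = baum_welch_step(A0, B0, obs, initial_state=0)
A2, B2, lik1 = baum_welch_step(A1, B1, obs, initial_state=0)
print(lik0, lik1)
```

Row sums of `trans` and `emit` play the role of the two denominators, and repeating the step should never decrease the likelihood of the training sequence, which is the usual EM guarantee.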


HMMs for speech recognition

- In ASR the observed data is usually a measure of the short-term spectral properties of the speech. There are two popular approaches:
  - Continuous Density observations: the finite states ω(t) are mapped into a continuous feature space using a MoG density model.
  - VQ observations: the continuous feature space is discretized into a finite symbol set using vector quantization.
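The VQ approach can be sketched as a nearest-codeword lookup: each continuous feature vector is replaced by the index of the closest entry in a codebook, and those indices become the discrete symbols the HMM observes. The codebook and feature vectors below are made-up 2-D values, and `quantize` is a hypothetical helper; in a real system the codebook would first be learned from training data (for example by clustering).

```python
def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codeword."""
    def sq_dist(x, y):
        # Squared Euclidean distance between two vectors of equal length.
        return sum((a - b) ** 2 for a, b in zip(x, y))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
        for f in features
    ]


# Hypothetical codebook with three codewords and four feature vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
features = [[0.1, -0.2], [0.9, 1.3], [3.5, 0.4], [0.6, 0.6]]

symbols = quantize(features, codebook)
print(symbols)
```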

An example of an isolated word HMM recognition system:

[Figure: speech signal → LPC feature analysis & vector quantization → HMM for word 1, HMM for word 2, …, HMM for word N (in parallel) → select max. → output word]
