## Hidden Markov Models


**Hidden Markov Models.** So far we have considered systems for making a single decision (e.g. discriminant functions or estimation of class-conditional densities). Now we consider the problem of sequential decision making. Example: Automatic Speech Recognition (ASR). In ASR we need to determine the sequence of phonemes (vowels and consonants) that makes up the observed speech sound. For this we introduce Hidden Markov Models (HMMs).

**First-order Markov Models.** NOTE: in a first-order model the current state depends only on the previous state.

**Markov Model State Transition Graph**

**Calculating the model probability**

**Calculating (cont)**

**Basic Markov Model: Example**

**Markov: Example 2**

**Hidden Markov Model**

**Hidden Markov Model.** This model shows all state transitions as being possible; this is not always the case.

**Left-to-Right Models**

**Probability Parameters.** The model is specified by the transition probabilities aij and the emission probabilities bjk.

**3 central issues.** Evaluation, decoding and learning.

**Evaluation**

**Evaluation (cont)**

**Recursive calculation of P(VT).** Let us write P(VT) as a sum over all possible hidden state sequences: P(VT) = Σ_ωT P(VT | ωT) P(ωT). However, we do not have to do the calculation in this order. Re-ordering it gives a recursion over time, which is the basis of the Forward algorithm (sketched in code below).

**Recursive calculation of P(VT) (cont).** Graphically we can illustrate this as a trellis unrolled over time (N.B. this is not a state transition diagram). We observe {v(1), v(2), v(3), ...}.

**HMM Forward Algorithm**

**HMM Forward Algorithm (cont)**

**Forward Algorithm Step**

**Evaluation Example**

**Evaluation Example (cont).** [Trellis diagram for the evaluation example: observation sequence v1 v3 v2 v0, states w0–w3 with initial state w0, showing the forward probabilities computed at t = 0, 1, 2, 3, 4.]

**Making Decisions.** Given the ability to calculate the probability of an observed sequence, we can now compare different HMMs. This is just Bayesian decision theory revisited: given models θ1 and θ2, we select θ1 if P(VT | θ1) P(θ1) > P(VT | θ2) P(θ2). Example: suppose θ1 = 'y'-'e'-'s' and θ2 = 'n'-'o'. If we expect that the answer is more likely to be 'yes', we weight the priors accordingly.

**An Alternative Recursion.** Alternatively, the same sum can be re-ordered so that the recursion runs backwards in time.

**The Backward Algorithm**

**Backward Algorithm**

**Decoding Problem.** The problem is to choose the most likely state sequence, ωT, for a given observation sequence VT. Unlike the evaluation problem, this one is not uniquely defined. For example, at each time t we could pick the individually most likely state given the observations; however, this only finds the states that are individually most likely, so the resulting sequence ωT may not be viable.
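To make the Forward and Backward recursions concrete, here is a minimal sketch in Python/NumPy. It assumes a discrete-observation HMM specified by a transition matrix `A` (entries aij), an emission matrix `B` (entries bjk) and an initial state distribution `pi`; the array and function names are illustrative, not taken from the slides.

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward algorithm: returns P(V^T), the probability of the observed sequence.

    A[i, j] : transition probability a_ij (state i -> state j)
    B[j, k] : emission probability b_jk (state j emits symbol k)
    pi[i]   : probability of starting in state i
    obs     : observed symbol indices v(1), ..., v(T)
    """
    alpha = pi * B[:, obs[0]]              # alpha_j(1) = pi_j * b_j(v(1))
    for v in obs[1:]:
        alpha = (alpha @ A) * B[:, v]      # alpha_j(t) = [sum_i alpha_i(t-1) a_ij] * b_j(v(t))
    return alpha.sum()                     # P(V^T) = sum_j alpha_j(T)

def backward(A, B, obs):
    """Backward algorithm: returns beta_i(1) = P(v(2), ..., v(T) | omega(1) = i)."""
    beta = np.ones(A.shape[0])             # beta_i(T) = 1
    for v in reversed(obs[1:]):
        beta = A @ (B[:, v] * beta)        # beta_i(t) = sum_j a_ij * b_j(v(t+1)) * beta_j(t+1)
    return beta
```

With such a routine, the Making Decisions step amounts to comparing `forward(A1, B1, pi1, obs) * prior1` against `forward(A2, B2, pi2, obs) * prior2` for two candidate word models. A practical implementation would rescale alpha at each step, or work with log probabilities, to avoid numerical underflow on long sequences.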
**Viterbi Algorithm**

**Viterbi: is this possible?** [Trellis diagram, states w0–w3, t = 0, 1, 2, 3, 4, comparing the optimal sequence for t = 1, 2, 3 with the optimal sequence for t = 1, 2, 3, 4.] If not, why not?

**Viterbi Algorithm** (a code sketch of Viterbi decoding appears at the end of this section).

**Decoding Example.** [Trellis diagram for the decoding example: observation sequence v1 v3 v2 v0, states w0–w3 with initial state w0, showing the best-path probabilities at t = 0, 1, 2, 3, 4.]

**The Learning Problem (Briefly).** The 3rd problem is the most difficult. Aim: to learn the parameters aij and bjk from a set of training data. The obvious approach is Maximum Likelihood learning. However, we have a familiar problem: the likelihood P(VT | θ) is a sum over all hidden state sequences, that is, we must marginalize out the state sequences ωT.

**The Learning Problem (cont.).** The solution is similar to learning the prior probability weights in MoGs, i.e. using EM (a.k.a. Baum-Welch or Forward-Backward): we iteratively re-estimate the transition probabilities aij and the emission probabilities bjk. The key ingredient is the probability of making a transition from state i to state j at time t, given the whole observed sequence; it can be calculated from the Forward and Backward steps and the current estimates of aij and bjk.

**The Learning Problem.** Updating aij requires the estimated probability of moving from state i to state j, hence: aij = (expected number of transitions from i→j) / (expected number of transitions from i→anywhere). Updating bjk requires the estimated probability of emitting visible symbol vk when in state j, hence: bjk = (expected number of occurrences of state j emitting vk) / (expected number of occurrences of state j).

**HMMs for speech recognition.** In ASR the observed data is usually a measure of the short-term spectral properties of the speech. There are two popular approaches:

- Continuous density observations – the finite states ω(t) are mapped into a continuous feature space using a MoG density model.
- VQ observations – the continuous feature space is discretized into a finite symbol set using vector quantization.

An example of an isolated word HMM recognition system: [block diagram: speech signal → LPC feature analysis & vector quantization → HMM for word 1 … HMM for word N → select the word whose HMM gives the maximum output probability.]
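Continuing the same illustrative conventions (transition matrix `A`, emission matrix `B`, initial distribution `pi`, observation indices `obs`), a minimal sketch of Viterbi decoding might look as follows; it keeps, for every state, the probability of the best partial path ending there, plus back-pointers to recover the sequence.

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Viterbi decoding: the single most likely state sequence for obs.

    A[i, j] = a_ij, B[j, k] = b_jk, pi[i] = initial state probability.
    """
    T, N = len(obs), A.shape[0]
    delta = pi * B[:, obs[0]]                      # best path probability ending in each state
    psi = np.zeros((T, N), dtype=int)              # back-pointers
    for t in range(1, T):
        scores = delta[:, None] * A                # scores[i, j] = delta_i(t-1) * a_ij
        psi[t] = scores.argmax(axis=0)             # best predecessor for each state j
        delta = scores.max(axis=0) * B[:, obs[t]]  # delta_j(t)
    # Backtrack from the most likely final state
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], delta.max()
```

The learning updates can be sketched under the same assumptions. This is one EM (Baum-Welch) re-estimation step from a single training sequence; a practical version would rescale alpha and beta (or work in log space) and accumulate the expected counts over many sequences.

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One Baum-Welch re-estimation of A and B (all arrays float)."""
    T, N = len(obs), A.shape[0]
    # Full forward and backward lattices
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    evidence = alpha[-1].sum()                         # P(V^T)
    # xi[t, i, j]: prob. of state i at time t and state j at time t+1, given obs
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / evidence
    gamma = alpha * beta / evidence                    # prob. of state i at time t, given obs
    # a_ij <- expected i->j transitions / expected transitions out of i
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    # b_jk <- expected occurrences of state j emitting v_k / occurrences of state j
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return A_new, B_new
```

Iterating this step until the likelihood stops improving gives the usual EM training loop; each iteration is guaranteed not to decrease P(VT | θ).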