Hidden Markov Models


fiona
Presentation Transcript

  1. Hidden Markov Models. So far we have considered systems for making a single decision (e.g. discriminant functions or estimation of class-conditional densities). Now we consider the problem of sequential decision making. Example: Automatic Speech Recognition (ASR). In ASR, we need to determine the sequence of phonemes (like vowels and consonants) that makes up the observed speech sound. For this we will introduce Hidden Markov Models (HMMs). P01760 Advanced Concepts in Signal Processing

  2. First-order Markov Models. Note: in a first-order model, the next state depends only on the previous state.

  3. Markov Model State Transition Graph

  4. Calculating the model probability

  5. Calculating (cont)
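
A minimal sketch of this calculation, assuming it refers to the probability of one particular state sequence under a first-order Markov model. The three-state transition table, the initial distribution and the function name `sequence_probability` are hypothetical and chosen only for illustration.

```python
# Hypothetical 3-state first-order Markov model (illustrative values only).
# A[i][j] = P(next state = j | current state = i); each row sums to 1.
A = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
]
initial = [0.5, 0.3, 0.2]  # P(first state), also an assumption


def sequence_probability(states, initial, A):
    """Probability of one particular state sequence under the model."""
    prob = initial[states[0]]
    for prev, nxt in zip(states[:-1], states[1:]):
        # First-order property: each factor depends only on the previous state.
        prob *= A[prev][nxt]
    return prob


# Sequence 0 -> 0 -> 1 -> 2 has probability 0.5 * 0.7 * 0.2 * 0.2.
p = sequence_probability([0, 0, 1, 2], initial, A)
```

The product has one factor per transition, so the cost grows linearly with the sequence length.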

  6. Basic Markov Model: Example

  7. Markov: Example 2

  8. Hidden Markov Model

  9. Hidden Markov model: this model shows all state transitions as being possible, which is not always the case.

  10. Left-to-Right Models

  11. Probability Parameters
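
For reference, one common way to write the two parameter sets, assuming the a_ij and b_jk that appear later in the transcript denote transition and emission probabilities respectively. The state symbol ω and visible symbol v follow the transcript; the exact form on the original slide is not known.

```latex
\begin{aligned}
a_{ij} &= P\big(\omega_j(t+1) \mid \omega_i(t)\big), &\qquad \sum_{j} a_{ij} &= 1 \quad \text{for all } i,\\
b_{jk} &= P\big(v_k(t) \mid \omega_j(t)\big),        &\qquad \sum_{k} b_{jk} &= 1 \quad \text{for all } j.
\end{aligned}
```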

  12. 3 central issues: evaluation, decoding and learning.

  13. Evaluation

  14. Evaluation (cont)

  15. Recursive calculation of P(V^T): P(V^T) can be written as a sum over all possible state sequences. However, we don't have to do the calculation in this order! The sums can be re-ordered so that each one is carried out one time step at a time.

  16. Recursive calculation of P(V^T): graphically we can illustrate this as follows (N.B. this is not a state transition diagram). We observe {v(1), v(2), v(3), …}. [Figure: trellis of states against time.]
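
The re-ordering can be sketched as follows, assuming the standard HMM factorization with a known initial state ω(0). The slide's own equations are not in the transcript, so this is a reconstruction under that assumption rather than the original formula.

```latex
\begin{aligned}
P(V^T) &= \sum_{\omega(1),\dots,\omega(T)} \; \prod_{t=1}^{T} P\big(v(t) \mid \omega(t)\big)\, P\big(\omega(t) \mid \omega(t-1)\big) \\
       &= \sum_{\omega(T)} P\big(v(T) \mid \omega(T)\big) \sum_{\omega(T-1)} P\big(\omega(T) \mid \omega(T-1)\big)\, P\big(v(T-1) \mid \omega(T-1)\big) \;\cdots\; \sum_{\omega(1)} P\big(\omega(2) \mid \omega(1)\big)\, P\big(v(1) \mid \omega(1)\big)\, P\big(\omega(1) \mid \omega(0)\big)
\end{aligned}
```

The first line sums over every state sequence, which is exponential in T; the second evaluates the innermost sum first and reuses it, which is what makes a recursive calculation possible.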

  17. HMM Forward Algorithm

  18. HMM Forward Algorithm (cont)

  19. Forward Algorithm Step

  20. Evaluation Example

  21. Evaluation Example (cont). Observed sequence: v1, v3, v2, v0. Initial state: the node with value 1 at t = 0. Trellis values:

            t=0    t=1    t=2      t=3      t=4
      ω0    0      0      0        0        0.0011
      ω1    1      0.09   0.0052   0.0024   0
      ω2    0      0.01   0.0077   0.0002   0
      ω3    0      0.2    0.0057   0.0007   0

      Edge labels on the first step: 0.2 × 0, 0.3 × 0.3, 0.1 × 0.1, 0.4 × 0.5.
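
A minimal sketch of the forward recursion, α_j(t) = b_j(v(t)) Σ_i α_i(t-1) a_ij, applied to this example. The transition table `A` and emission table `B` are an assumption: they are not in the transcript, but they are the values of a widely used textbook example and they reproduce the trellis values of the evaluation example. The function name `forward` is hypothetical.

```python
# Assumed parameters (NOT in the transcript). State 0 is an absorbing final
# state and symbol v0 is emitted only there.
# A[i][j] = P(state j at t+1 | state i at t)
A = [
    [1.0, 0.0, 0.0, 0.0],
    [0.2, 0.3, 0.1, 0.4],
    [0.2, 0.5, 0.2, 0.1],
    [0.8, 0.1, 0.0, 0.1],
]
# B[j][k] = P(symbol v_k | state j)
B = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.4, 0.1, 0.2],
    [0.0, 0.1, 0.1, 0.7, 0.1],
    [0.0, 0.5, 0.2, 0.1, 0.2],
]


def forward(A, B, observations, start_state):
    """Return alpha[t][j] for t = 0..T, given a known state at t = 0."""
    n = len(A)
    alpha = [[1.0 if i == start_state else 0.0 for i in range(n)]]
    for k in observations:
        prev = alpha[-1]
        # One forward step: sum over predecessors, then weight by the emission.
        alpha.append([
            B[j][k] * sum(prev[i] * A[i][j] for i in range(n))
            for j in range(n)
        ])
    return alpha


# Observed sequence v1, v3, v2, v0, starting in state 1 at t = 0.
alpha = forward(A, B, [1, 3, 2, 0], start_state=1)
likelihood = sum(alpha[-1])  # about 0.0011, as in the trellis
```

Each step costs on the order of c² operations for c states, so the whole evaluation is linear in the sequence length rather than exponential.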

  22. Making Decisions: given the ability to calculate the probability of an observed sequence, we can now compare different HMMs. This is just Bayesian decision theory revisited! Recall that, by Bayes' rule, the posterior probability of a model is proportional to the likelihood of the observations under that model times the model's prior. Hence, given models θ1 and θ2, we select θ1 if its posterior probability is the larger of the two. Example: suppose θ1 = 'y'-'e'-'s' and θ2 = 'n'-'o'. If we expect that the answer is more likely to be 'yes', we weight the priors accordingly.
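
Written out, the decision rule described here takes the following form, assuming the usual Bayes' rule for model posteriors. The slide's own equations are not in the transcript.

```latex
\begin{aligned}
P(\theta \mid V^T) &= \frac{P(V^T \mid \theta)\, P(\theta)}{P(V^T)}, \\
\text{select } \theta_1 \text{ if}\quad P(V^T \mid \theta_1)\, P(\theta_1) &> P(V^T \mid \theta_2)\, P(\theta_2).
\end{aligned}
```

The evidence P(V^T) is common to both models, so it cancels from the comparison.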

  23. An Alternative Recursion: alternatively, the same sum can be re-ordered so that the recursion runs backwards in time.

  24. The Backward Algorithm

  25. Backward Algorithm
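
A minimal sketch of a backward recursion, β_i(t) = Σ_j a_ij b_j(v(t+1)) β_j(t+1), using the common convention β_i(T) = 1 for every state; the slides may initialize it differently. The tables `A` and `B` are assumed textbook values that are not in the transcript, and the function name `backward` is hypothetical.

```python
# Assumed parameters (NOT in the transcript), as in a widely used textbook
# example: state 0 is absorbing and is the only state that emits v0.
A = [
    [1.0, 0.0, 0.0, 0.0],
    [0.2, 0.3, 0.1, 0.4],
    [0.2, 0.5, 0.2, 0.1],
    [0.8, 0.1, 0.0, 0.1],
]
B = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.4, 0.1, 0.2],
    [0.0, 0.1, 0.1, 0.7, 0.1],
    [0.0, 0.5, 0.2, 0.1, 0.2],
]


def backward(A, B, observations):
    """Return beta[t][i] for t = 0..T, where observations[t] is v(t+1)."""
    n = len(A)
    beta = [[1.0] * n]  # beta at t = T
    for t in range(len(observations) - 1, -1, -1):
        k = observations[t]  # symbol observed at time t + 1
        nxt = beta[0]        # beta at time t + 1
        beta.insert(0, [
            sum(A[i][j] * B[j][k] * nxt[j] for j in range(n))
            for i in range(n)
        ])
    return beta


# Observed sequence v1, v3, v2, v0; the model starts in state 1 at t = 0.
beta = backward(A, B, [1, 3, 2, 0])
likelihood = beta[0][1]  # probability of the whole sequence from the start state
```

Under these assumptions the value read off at t = 0 for the start state is about 0.0011, the same sequence probability that a forward pass gives.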

  26. Decoding Problem: the problem is to choose the most likely state sequence, ω^T, for a given observation sequence V^T. Unlike the evaluation problem, this one is not uniquely defined. For example, at each time t we could pick the state that is most likely at that time given the observations. However, this only finds the states that are individually most likely; hence the resulting sequence, ω^T, may not be viable.

  27. Viterbi Algorithm

  28. Viterbi: is this possible? [Figure: trellis over states ω0 to ω3 and t = 0 to 4, with one path labelled "Optimal sequence for t = 1,2,3,4" and another labelled "Optimal sequence for t = 1,2,3".] If not, why not?

  29. Viterbi Algorithm

  30. Decoding Example. Observed sequence: v1, v3, v2, v0. Initial state: the node with value 1 at t = 0. Trellis values:

            t=0    t=1    t=2      t=3        t=4
      ω0    0      0      0        0          0.0004032
      ω1    1      0.09   0.0027   0.00126    0
      ω2    0      0.01   0.0063   0.000126   0
      ω3    0      0.2    0.0036   0.000504   0

      Edge labels on the first step: 0.2 × 0, 0.3 × 0.3, 0.1 × 0.1, 0.4 × 0.5. Edge labels on the second step: 0 × 0, 0.09 × 0.3, 0.01 × 0.5, 0.2 × 0.1.
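
A minimal sketch of Viterbi decoding, which replaces the sum over predecessors in the forward step with a maximum and keeps a back-pointer to the best predecessor. The tables `A` and `B` are assumed textbook values that are not in the transcript. With them the sketch reproduces the t = 1 and t = 2 columns and the ω1 and ω2 entries at t = 3 of the decoding example; the remaining entries depend on the transition values on the original slide, which the transcript does not preserve. The function name `viterbi` is hypothetical.

```python
# Assumed parameters (NOT in the transcript).
A = [
    [1.0, 0.0, 0.0, 0.0],
    [0.2, 0.3, 0.1, 0.4],
    [0.2, 0.5, 0.2, 0.1],
    [0.8, 0.1, 0.0, 0.1],
]
B = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.4, 0.1, 0.2],
    [0.0, 0.1, 0.1, 0.7, 0.1],
    [0.0, 0.5, 0.2, 0.1, 0.2],
]


def viterbi(A, B, observations, start_state):
    """Return (delta, path): best-path scores per step and the decoded states."""
    n = len(A)
    delta = [[1.0 if i == start_state else 0.0 for i in range(n)]]
    backptr = []
    for k in observations:
        prev = delta[-1]
        scores, ptrs = [], []
        for j in range(n):
            # Best predecessor of state j (max instead of the forward sum).
            best_i = max(range(n), key=lambda i: prev[i] * A[i][j])
            ptrs.append(best_i)
            scores.append(prev[best_i] * A[best_i][j] * B[j][k])
        delta.append(scores)
        backptr.append(ptrs)
    # Trace back from the best final state to recover a consistent path.
    state = max(range(n), key=lambda j: delta[-1][j])
    path = [state]
    for ptrs in reversed(backptr):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return delta, path


delta, path = viterbi(A, B, [1, 3, 2, 0], start_state=1)
```

Because the path is recovered by following back-pointers, every consecutive pair of decoded states is joined by a transition the model allows, which is not guaranteed when each state is picked independently.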

  31. The Learning Problem (Briefly): the 3rd problem is the most difficult. Aim: to learn the parameters, a_ij and b_jk, from a set of training data. Obvious approach: Maximum Likelihood learning. However, we have a familiar problem: the likelihood of the observations depends on the hidden states. That is, we must marginalize out the state sequences, ω^T.

  32. The Learning Problem (cont.): the solution is similar to learning the prior probability weights in MoGs, i.e. using EM (a.k.a. Baum-Welch or Forward-Backward): we iteratively estimate the transition probabilities, a_ij, and the emission probabilities, b_jk. The key ingredient is the probability of a transition from state i to state j at a given time step, given the observations; it can be calculated from the Forward and Backward steps and the current estimates for a_ij and b_jk.

  33. The Learning Problem: updating a_ij requires the estimated prob. of moving from state i to state j, hence: a_ij = (expected number of transitions from i→j) / (expected number of transitions from i→anywhere). Updating b_jk requires the estimated prob. of emitting visible symbol v_k when in state j, hence: b_jk = (expected number of occurrences of state j emitting v_k) / (expected number of occurrences of state j).
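
One standard way to write these quantities, assuming forward variables α, backward variables β with β_j(T) = 1, and the shorthand b_j(v(t)) for the emission probability of the symbol observed at time t. The symbol γ and the exact indexing are assumptions; the slide's own equations are not in the transcript.

```latex
\begin{aligned}
\gamma_{ij}(t) &= \frac{\alpha_i(t-1)\, a_{ij}\, b_j\big(v(t)\big)\, \beta_j(t)}{P(V^T \mid \theta)}, \\[4pt]
\hat{a}_{ij}   &= \frac{\sum_{t=1}^{T} \gamma_{ij}(t)}{\sum_{t=1}^{T} \sum_{l} \gamma_{il}(t)}, \\[4pt]
\hat{b}_{jk}   &= \frac{\sum_{t \,:\, v(t) = v_k} \alpha_j(t)\, \beta_j(t)}{\sum_{t=1}^{T} \alpha_j(t)\, \beta_j(t)}.
\end{aligned}
```

Here γ_ij(t) is the posterior probability of moving from state i at time t-1 to state j at time t, so summing it over t gives the expected number of i→j transitions; α_j(t) β_j(t) is proportional to the posterior probability of occupying state j at time t, and the common factor P(V^T | θ) cancels in the last ratio.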

  34. HMMs for speech recognition: in ASR the observed data is usually a measure of the short-term spectral properties of the speech. There are two popular approaches:
      • Continuous density observations: the finite states ω(t) are mapped into a continuous feature space using a MoG density model.
      • VQ observations: the continuous feature space is discretized into a finite symbol set using vector quantization.
      An example of an isolated-word HMM recognition system: the speech signal passes through LPC feature analysis and vector quantization, the result is scored by the HMM for word 1, the HMM for word 2, ..., the HMM for word N, and the word with the maximum output is selected.