1 / 29

Hidden Markov Model in Automatic Speech Recognition

Hidden Markov Model in Automatic Speech Recognition. Z. Fodroczi Pazmany Peter Catholic. Univ. Outline. Discrete Markov process Hidden Markov Model Vitterbi algorithm Forward algorithm Parameter estimation Type of them HMMs in ASR Systems Popular models State of the art and limitation.

thyra
Download Presentation

Hidden Markov Model in Automatic Speech Recognition

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Hidden Markov Model in Automatic Speech Recognition Z. Fodroczi Pazmany Peter Catholic. Univ.

  2. Outline • Discrete Markov process • Hidden Markov Model • Vitterbi algorithm • Forward algorithm • Parameter estimation • Type of them • HMMs in ASR Systems • Popular models • State of the art and limitation

  3. Discrete Markov Processes States={S1,S2,S3} P(qt=Si|qt-1=Sj)=aij Sum(j=1..N)(aij)=1 a12 S2 S1 a11 a21 a22 • Consider the following model of weather: • S1: rain or snow • S2: cloudy • S3: sunny a23 a13 a32 a31 S3 a33 What is the probability that the weather for the next 3 days will be “sun,sun,rain”. O={S3,S3,S1} – observation sequence P(O|Model)=P(S3,S3,S1|Model)=P(S3)*P(S3|S3)*P(S3|S1)=1*0.8*0.3=0.24

  4. Hidden Markov Model The observation is a probabilistic function of the state. Teacher-mood-model Situation: Your school teacher gave three dierent types of daily homework assignments: • A: took about 5 minutes to complete • B: took about 1 hour to complete • C: took about 3 hours to complete Your teacher did not reveal openly his mood to you daily, but you knew that your teacher was either in a bad, neutral, or a good mood for a whole day. Mood changes occurred only overnight. Question: How were his moods related to the homework type assigned that day?

  5. Hidden Markov Model • Model parameters: • S - states {good, neutral, bad} • akl,- probability that state l is followed by k • Σ – alphabet {A,B,C} • ek(x) –probability that state k emit symbol x €Σ

  6. Hidden Markov Model One week, your teacher gave the following homework assignments: • Monday: A • Tuesday: C • Wednesday: B • Thursday: A • Friday: C Questions: • What did his mood curve look like most likely that week? (Searching for the most probable path – Viterbi algorithm) • What is the probability that he would assign this order of homework assignments? (Probability of a sequence - Forward algorithm) • How do we adjust the model parameters λ(S,aij,ei(x)) to maximize P(O| λ) (create a HMM for a given sequence set)

  7. HMM – Viterbi algorithm Given: • Hidden Markov model: S, akl ,Σ ,ek(x) • Observed symbol sequence E = x1x2, … xn. Most probable path of states that resulted in symbol sequence E. Let vk(i) be the probability of the most probable path of the symbol sequence x1, x2, …. xi ending in state k. Then:

  8. HMM – Viterbi algorithm Matrix vk(i), where k€Sand 1 <= I <= n. Initialization: vk(1) = ek(x1)/#states for all states k€S. vl(i) = el(xi)maxk(vk(i - 1)akl) for all states k€S, i >=2. Algorithm • Iteratively build up matrix vk(i). • Store pointers to chosen path. • Probability of most probable path in maximum entry in last column. • Reconstruct path along pointers.

  9. HMM – Viterbi algorithm • Empty table

  10. HMM – Viterbi algorithm

  11. HMM – Viterbi algorithm

  12. HMM – Viterbi algorithm

  13. HMM – Viterbi algorithm

  14. HMM – Viterbi algorithm

  15. HMM – Viterbi algorithm Question: What did his mood curve look like most likely that week? Answer: Most probable mood curve: Day: Mon Tue Wed Thu Fri Assignment: A C B A C Mood: good bad neutral good bad

  16. HMM – Forward algorithm Used to test the probability of a sequence. Given: • Hidden Markov model: S, akl, , el(x). • Observed symbol sequence E = x1; : : : ; xn. What is the probability of E. Let fk(i) be the probability of the symbol sequence x1, x2, …. xi ending in state k. Then:

  17. HMM – Forward algorithm Matrix fk(i), where k€Sand 1 <= i <= n. • Initialization: fk(1) = ek(x1)=#states for all states k€S. • fl(i) = el(xi)Pk(fk(i - 1)akl) for all states k€S, i <=2. Algorithm • Iteratively build up matrix fk(i). • Probability of symbol sequence is sum of entries in last column.

  18. HMM – Forward algorithm

  19. HMM – Forward algorithm

  20. HMM – Forward algorithm

  21. HMM – Parameter estimation

  22. HMM – Parameter estimation

  23. HMM – Parameter estimation

  24. Type of HMMs • Till now I was speaking about HMMs with discrete output (finite |Σ|). Extensions: • continuous observation probability density function • mixture of Gaussian pdfs

  25. HMMs in ASR Feature extraction Separation HMM decoder Understanding Application

  26. HMMs in ASR How HMM can used to classify feature sequences to known classes. Make a HMM to each class. By determineing the probability of a sequence to the HMMs, we can decide which HMM could most probable generate the sequence. There are several idea what to model: • Isolated word recognition ( HMM for each known word) Usable just on small dictionaries. (digit recognition etc.) Number of states usually >=4. Left-to-rigth HMM • Monophone acoustic model ( HMM for each phone) ~50 HMM • Triphone acoustic model (HMM for each three phone sequence) 50^3 = 125000 triphones each triphone has 3 state

  27. HMMs in ASR • Hierarchical system of HMMs HMM of a triphone HMM of a triphone HMM of a triphone Higher level HMM of a word Language model

  28. ASR state of the art Typical state-of-the-art large-vocabulary ASR system: - speaker independent - 64k word vocabulary - trigram (2-word context) language model - multiple pronunciations for each word - triphone or quinphone HMM-based acoustic model - 100-300X real-time recognition - WER 10%-50%

  29. HMM Limitations • Data intensive • Computationally intensive • 50 phones = 125000 possible triphones • 3 states per triphone • 3 Gaussian mixture for each state • 64k word vocabulary • 262 trillion trigrams • 2-20 phonemes per word in 64k vocabulary • 39 dimensional feature vector sampled every 10ms • 100 frame per second

More Related