
Fast Inference and Learning in Large-State-Space HMMs


  1. Fast Inference and Learning in Large-State-Space HMMs Sajid M. Siddiqi Andrew W. Moore The Auton Lab Carnegie Mellon University Siddiqi and Moore, www.autonlab.org

  2. HMM Overview • Reducing quadratic complexity in the number of states • The model • Algorithms for fast evaluation and inference • Algorithms for fast learning • Results • Speed • Accuracy • Conclusion

  4. Hidden Markov Models [Figure: hidden states q0 → q1 → q2 → q3 → q4 emitting observations O0 … O4; transition probabilities of 1/3]

  5. Transition Model [Figure: hidden states q0 → q1 → q2 → q3 → q4, each arc governed by a transition table with probabilities of 1/3]

  6. Transition Model Notation: aij = P(qt+1 = sj | qt = si). Each of these probability tables is identical. [Figure: hidden states q0 → q1 → q2 → q3 → q4 with one transition table per arc]

  7. Observation Model [Figure: hidden states q0 … q4, each emitting an observation O0 … O4]

  8. Observation Model Notation: bi(Ot) = P(Ot | qt = si). [Figure: hidden states q0 … q4, each emitting an observation O0 … O4]
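To make the notation concrete, here is a minimal sketch (my own, not from the talk) of the parameters π, aij, and bi(O) in Python/NumPy, assuming a discrete observation alphabet for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 3, 4                   # N hidden states, M observation symbols (assumed sizes)
pi = np.full(N, 1 / N)        # initial state distribution
A = np.full((N, N), 1 / N)    # A[i, j] = a_ij = P(q_t+1 = s_j | q_t = s_i); uniform 1/3 as in the figure
B = rng.dirichlet(np.ones(M), size=N)  # B[i, o] = b_i(o) = P(O_t = o | q_t = s_i)

def sample(T):
    """Draw a state/observation sequence of length T from the model."""
    states, obs = [], []
    q = rng.choice(N, p=pi)
    for _ in range(T):
        states.append(q)
        obs.append(rng.choice(M, p=B[q]))   # emit from the current state
        q = rng.choice(N, p=A[q])           # then transition
    return states, obs
```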

  9. Some Famous HMM Tasks Question 1: State Estimation What is P(qT = si | O1O2…OT)?

  12. Some Famous HMM Tasks Question 1: State Estimation What is P(qT = si | O1O2…OT)? Question 2: Most Probable Path Given O1O2…OT, what is the most probable path that I took?

  14. Some Famous HMM Tasks Question 1: State Estimation What is P(qT = si | O1O2…OT)? Question 2: Most Probable Path Given O1O2…OT, what is the most probable path that I took? (e.g. “Woke up at 8.35, got on bus at 9.46, sat in lecture 10.05–11.22…”)

  15. Some Famous HMM Tasks Question 1: State Estimation What is P(qT = si | O1O2…OT)? Question 2: Most Probable Path Given O1O2…OT, what is the most probable path that I took? Question 3: Learning HMMs: Given O1O2…OT, what is the maximum-likelihood HMM that could have produced this string of observations?

  17. Some Famous HMM Tasks Question 1: State Estimation What is P(qT = si | O1O2…OT)? Question 2: Most Probable Path Given O1O2…OT, what is the most probable path that I took? Question 3: Learning HMMs: Given O1O2…OT, what is the maximum-likelihood HMM that could have produced this string of observations? [Figure: three-state HMM (Eat, Bus, Walk) unrolled over timesteps t−1, t, t+1, with transition probabilities aAA, aAB, aBA, aBB, aBC, aCB, aCC and observation densities bA(Ot−1), bB(Ot), bC(Ot+1)]

  18. Basic Operations in HMMs For an observation sequence O = O1…OT (T = # timesteps, i.e. datapoints; N = # states), the three basic HMM operations are: • Evaluation: compute P(O | λ) — forward algorithm, O(TN²) • Inference: compute the most likely state sequence argmaxQ P(Q | O, λ) — Viterbi algorithm, O(TN²) • Learning: find the maximum-likelihood λ — Baum-Welch (EM), O(TN²) per iteration • This talk: a simple approach to reducing the complexity in N

  20. HMM Overview • Reducing quadratic complexity • The model • Algorithms for fast evaluation and inference • Algorithms for fast learning • Results • Speed • Accuracy • Conclusion

  21. Reducing Quadratic Complexity in N Why does it matter? • Quadratic HMM algorithms become a bottleneck when N is large • Several promising applications for efficient large-state-space HMM algorithms in • topic modeling • speech recognition • real-time HMM systems, such as activity monitoring • … and more

  22. Idea One: Sparse Transition Matrix • Only K << N non-zero next-state probabilities

  24. Idea One: Sparse Transition Matrix • Only K << N non-zero next-state probabilities • Only O(TNK)!

  25. Idea One: Sparse Transition Matrix • Only K << N non-zero next-state probabilities • Only O(TNK)! • But can get very badly confused by “impossible transitions” • Cannot learn the sparse structure (once a transition probability is zero, EM keeps it zero)
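As a sketch of why sparsity buys O(TNK) (my own illustration, not code from the talk): store each row of the transition matrix as its nonzero successors, and the forward update touches only those entries.

```python
import numpy as np

def forward_sparse(pi, rows, B, obs):
    """Forward pass when each state has at most K nonzero successors.

    rows[i] is a list of (j, a_ij) pairs with a_ij > 0, so each timestep
    costs O(N*K) instead of the dense O(N^2).
    """
    N = len(pi)
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        nxt = np.zeros(N)
        for i, successors in enumerate(rows):
            for j, a_ij in successors:      # only K entries per state
                nxt[j] += alpha[i] * a_ij
        alpha = nxt * B[:, o]
    return alpha                            # alpha[i] = P(O1..OT ∧ qT = si)
```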

  26. Dense-Mostly-Constant (DMC) Transitions • K non-constant probabilities per row • DMC HMMs comprise a richer and more expressive class of models than sparse HMMs [Figure: a DMC transition matrix with K = 2]

  27. Dense-Mostly-Constant (DMC) Transitions • The transition model for state i now consists of: • K = the number of non-constant values per row • NCi = { j : si → sj is a non-constant transition probability } • ci = the constant transition probability from si to all states not in NCi • aij = the non-constant transition probability for si → sj • Example (K = 2): NC3 = {2, 5}, c3 = 0.05, a32 = 0.25, a35 = 0.6
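A small sketch of the slide's example row in Python (the dict-plus-constant representation is my own choice; N = 5 is implied by the probabilities summing to 1). Expanding the row to dense form makes the check explicit:

```python
import numpy as np

# Slide example: row 3 of a 5-state DMC matrix with K = 2 non-constant entries.
N = 5
NC3 = {2: 0.25, 5: 0.6}   # a_32 = 0.25, a_35 = 0.6 (1-indexed state labels, as on the slide)
c3 = 0.05                  # constant probability to every other state

def expand_row(nc, c, N):
    """Expand a DMC row (non-constant dict + constant value) to a dense row."""
    row = np.full(N, c)
    for j, a in nc.items():
        row[j - 1] = a     # convert 1-indexed state label to 0-indexed column
    return row

row = expand_row(NC3, c3, N)
assert np.isclose(row.sum(), 1.0)   # 0.25 + 0.6 + 3 * 0.05 = 1.0
```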

  28. HMM Overview • Reducing quadratic complexity in the number of states • The model • Algorithms for fast evaluation and inference • Algorithms for fast learning • Results • Speed • Accuracy • Conclusion

  29. Evaluation in Regular HMMs P(qt = si | O1, O2 … Ot) = αt(i) / Σj αt(j), where αt(i) = P(O1 O2 … Ot ∧ qt = si). Then α1(i) = πi bi(O1) and αt+1(j) = bj(Ot+1) Σi αt(i) aij. The αt(i) are called the “forward variables”.

  33. Cost: O(TN²)
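In code, the forward recursion is only a few lines (a standard implementation sketch, not the talk's); each of the T steps does N² work in the matrix product:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Standard forward pass, O(T * N^2).

    Returns the table of forward variables,
    alpha[t, i] = P(O1..Ot ∧ qt = si).
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]   # N^2 work per step
    return alpha

# P(qt = si | O1..Ot) is then alpha[t] / alpha[t].sum().
```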

  34. Similarly, the “backward variables” are βt(i) = P(Ot+1 Ot+2 … OT | qt = si), computed by βT(i) = 1 and βt(i) = Σj aij bj(Ot+1) βt+1(j). Also costs O(TN²).

  36. Fast Evaluation in DMC HMMs Split the forward recursion into a constant part and corrections for the non-constant entries: αt+1(j) = bj(Ot+1) [ Σi αt(i) ci + Σ{i : j ∈ NCi} αt(i) (aij − ci) ]. The first sum is O(N), but only computed once per row of the α table! The correction is O(K) for each αt(j) entry. • This yields O(TNK) complexity for the evaluation problem
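Here is how I read that update as code (a sketch; the paper's exact bookkeeping may differ). The constant parts of all rows enter through one shared dot product per timestep, and only the ≈ NK non-constant entries need individual corrections:

```python
import numpy as np

def forward_dmc(pi, c, nc, B, obs):
    """DMC forward pass in O(T * N * K).

    c[i] is state i's constant transition probability; nc is a list of
    (i, j, a_ij) triples for the non-constant entries (j in NC_i), 0-indexed.
    """
    N = len(pi)
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        shared = alpha @ c                  # O(N), once per row of the table
        nxt = np.full(N, shared)
        for i, j, a_ij in nc:               # O(K) corrections per alpha entry
            nxt[j] += alpha[i] * (a_ij - c[i])
        alpha = nxt * B[:, o]
    return alpha
```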

  38. Fast Inference in DMC HMMs O(N²) recursion in regular model: δt+1(j) = bj(Ot+1) maxi [ δt(i) aij ]. O(NK) recursion in DMC model: the constant part of the max is O(N), but only computed once per row of the δ table, plus O(K) work for each δt(j) entry.
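The max needs slightly more care than the sum, because the constant-part maximum must exclude predecessors whose transition into j is non-constant. One way to hit the O(NK) bound (my own construction; the paper may do this differently) keeps the K + 1 largest δt(i)·ci candidates, since at most K of them can be excluded for any given j:

```python
import numpy as np

def viterbi_step_dmc(delta, c, nc_by_dest, b_obs, K):
    """One O(N*K) Viterbi update for a DMC model (sketch, assumes K + 1 <= N).

    delta: current Viterbi scores; c[i]: constant probability of row i;
    nc_by_dest[j]: list of (i, a_ij) non-constant transitions into state j.
    """
    N = len(delta)
    scores = delta * c
    # K + 1 best constant-part candidates, best first: at most K predecessors
    # of any j are non-constant, so at least one candidate always survives.
    top = np.argpartition(scores, -(K + 1))[-(K + 1):]
    top = top[np.argsort(scores[top])[::-1]]
    new_delta = np.empty(N)
    for j in range(N):
        excluded = {i for i, _ in nc_by_dest[j]}
        const_best = next(scores[i] for i in top if i not in excluded)
        nc_best = max((delta[i] * a_ij for i, a_ij in nc_by_dest[j]), default=0.0)
        new_delta[j] = b_obs[j] * max(const_best, nc_best)
    return new_delta
```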

  41. HMM Overview • Reducing quadratic complexity in the number of states • The model • Algorithms for fast evaluation and inference • Algorithms for fast learning • Results • Speed • Accuracy • Conclusion

  42. Learning a DMC HMM • Idea One: • Ask user to tell us the DMC structure • Learn the parameters using EM • Simple! • But in general, we don’t know the DMC structure

  45. Learning a DMC HMM • Idea Two: Use EM to learn the DMC structure also • 1. Guess DMC structure • 2. Find expected transition counts and observation parameters, given current model and observations • 3. Find maximum likelihood DMC model given counts • 4. Go to 2 • In fact, just start with an all-constant transition model • DMC structure can (and does) change!
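A sketch of what step 3 might look like for one row (my own reading; the paper's exact re-estimation may differ): the K destinations with the largest expected counts become the non-constant entries, and the leftover probability mass is spread uniformly as the constant. Because the top-K set is recomputed every iteration, the DMC structure can change as the slide says.

```python
import numpy as np

def mstep_dmc_row(counts_row, K):
    """Re-fit one DMC transition row from expected transition counts.

    Hypothetical sketch: the K largest-count destinations form the
    non-constant set NC_i, and the remaining mass is shared uniformly
    by the other N - K states as the constant c_i.
    """
    N = len(counts_row)
    total = counts_row.sum()
    top = np.argpartition(counts_row, -K)[-K:]       # structure can change here
    nc = {int(j): counts_row[j] / total for j in top}
    c = (1.0 - sum(nc.values())) / (N - K)
    return nc, c
```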
