
HMMs


Presentation Transcript


  1. HMMs Tamara Berg CS 590-133 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore, Percy Liang, Luke Zettlemoyer, Rob Pless, Killian Weinberger, Deva Ramanan

  2. 2048 • Game AI (Minimax with alpha/beta pruning)

  3. Announcements • HW2 graded • mean = 14.40, median = 16 • Grades to date after 2 HWs, 1 midterm (60% / 40%) • mean = 77.42, median = 81.36

  4. Announcements • HW4 will be online tonight (due April 3) • Midterm2 next Thursday • Review will be Monday, 5pm in FB009

  5. Thursday March 20, 5:30-6:30, SN011: Information Session on the BS/MS Program

  6. Recap from before spring break

  7. Bayes Nets • A directed graph G = (X, E) with nodes X and edges E • Each node is associated with a random variable

  8. Example Joint Probability:

  9. Independence in a BN • Important question about a BN: • Are two nodes independent given certain evidence? • If yes, can prove using algebra (tedious in general) • If no, can prove with a counter example • Example: • Question: are X and Z necessarily independent? • Answer: no. Example: low pressure causes rain, which causes traffic. • X can influence Z, Z can influence X (via Y) • Addendum: they could be independent: how? X Y Z
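The claim on this slide, that X can influence Z via Y but becomes independent of Z once Y is observed, can be checked numerically. Below is a sketch for the low-pressure → rain → traffic chain; all the probability tables are made-up illustrative numbers, not values from the lecture:

```python
# Hypothetical CPTs for the chain X (low pressure) -> Y (rain) -> Z (traffic).
P_x = {0: 0.5, 1: 0.5}                      # P(X)
P_y_given_x = {0: {0: 0.8, 1: 0.2},         # P(Y | X)
               1: {0: 0.3, 1: 0.7}}
P_z_given_y = {0: {0: 0.9, 1: 0.1},         # P(Z | Y)
               1: {0: 0.4, 1: 0.6}}

def joint(x, y, z):
    # Chain-rule factorization of the BN: P(x, y, z) = P(x) P(y|x) P(z|y)
    return P_x[x] * P_y_given_x[x][y] * P_z_given_y[y][z]

# Without evidence: does observing X change the distribution of Z?
p_z1 = sum(joint(x, y, 1) for x in (0, 1) for y in (0, 1))
p_z1_given_x1 = (sum(joint(1, y, 1) for y in (0, 1))
                 / sum(joint(1, y, z) for y in (0, 1) for z in (0, 1)))
print(p_z1, p_z1_given_x1)   # they differ, so X and Z are dependent

# With Y observed: P(Z | X, Y) no longer depends on X.
def p_z_given_xy(x, y, z):
    return joint(x, y, z) / sum(joint(x, y, zz) for zz in (0, 1))

print(p_z_given_xy(0, 1, 1), p_z_given_xy(1, 1, 1))   # equal: X ⊥ Z | Y
```

This is exactly the "addendum" on the slide: independence depends on the numbers as well as the graph, and conditioning on the middle node of a chain blocks the influence.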

  10. Bayes Ball • Shade all observed nodes. Place balls at the starting node, let them bounce around according to some rules, and ask whether any of the balls reach the goal node. • We need to know what happens when a ball arrives at a node on its way to the goal node.

  11. Inference • Inference: calculating some useful quantity from a joint probability distribution • Examples: • Posterior probability • Most likely explanation • (Slide illustrates with the burglary network: B, E, A, J, M)

  12. Inference Algorithms • Exact algorithms • Inference by Enumeration • Variable Elimination algorithm • Approximate (sampling) algorithms • Forward sampling • Rejection sampling • Likelihood weighting sampling

  13. Example BN – Naïve Bayes • C – class, F – features • We only specify (parameters): the prior over class labels, and how each feature depends on the class

  14. HW4!

  15. Slide from Dan Klein

  16. Slide from Dan Klein

  17. Slide from Dan Klein

  18. Percentage of documents in training set labeled as spam/ham Slide from Dan Klein

  19. In the documents labeled as spam, occurrence percentage of each word (e.g. # times “the” occurred/# total words). Slide from Dan Klein

  20. In the documents labeled as ham, occurrence percentage of each word (e.g. # times “the” occurred/# total words). Slide from Dan Klein

  21. Parameter estimation • Parameter estimate: P(word | spam) = (# of word occurrences in spam messages) / (total # of words in spam messages) • Parameter smoothing: dealing with words that were never seen or seen too few times • Laplacian smoothing: pretend you have seen every vocabulary word one more time (or m more times) than you actually did: P(word | spam) = (# of word occurrences in spam messages + 1) / (total # of words in spam messages + V), where V is the total number of unique words
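The smoothed estimate above can be sketched in a few lines. The function and variable names here are illustrative, not from the slides; the tiny spam corpus is made up to show that an unseen word still gets nonzero probability:

```python
from collections import Counter

def smoothed_word_probs(spam_docs, vocab):
    """Laplacian-smoothed P(word | spam): add 1 to every count and V to
    the denominator, where V is the vocabulary size (add-one smoothing)."""
    counts = Counter(w for doc in spam_docs for w in doc)
    total = sum(counts.values())
    V = len(vocab)
    return {w: (counts[w] + 1) / (total + V) for w in vocab}

probs = smoothed_word_probs([["free", "money", "free"]],
                            vocab={"free", "money", "the"})
# "the" never appears in the spam docs, yet P("the" | spam) = 1 / (3 + 3),
# and the smoothed probabilities still sum to 1 over the vocabulary.
print(probs)
```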

  22. Classification • The class that maximizes: P(class) · Πi P(wi | class)

  23. Summary of model and parameters • Naïve Bayes model: • Model parameters: prior P(spam), P(¬spam); likelihood of spam P(w1 | spam), P(w2 | spam), …, P(wn | spam); likelihood of ¬spam P(w1 | ¬spam), P(w2 | ¬spam), …, P(wn | ¬spam)

  24. Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow

  25. Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow • Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities.

  26. Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow • Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities. • Since log is a monotonic function, the class with the highest score does not change.

  27. Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow • Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities. • Since log is a monotonic function, the class with the highest score does not change. • So, what we usually compute in practice is: argmax over classes of log P(class) + Σi log P(wi | class)
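The log-space decision rule built up over the last four slides can be sketched as follows. The class names, priors, and word likelihoods are illustrative placeholders:

```python
import math

def classify(doc, priors, likelihoods):
    """Score each class with log P(c) + sum_i log P(w_i | c) and return the
    argmax. Because log is monotonic, the winning class is the same as with
    raw probabilities, but summing logs avoids floating point underflow."""
    best, best_score = None, -math.inf
    for c in priors:
        score = math.log(priors[c]) + sum(math.log(likelihoods[c][w])
                                          for w in doc)
        if score > best_score:
            best, best_score = c, score
    return best

# Made-up parameters for a two-class spam filter.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {"spam": {"free": 0.05,  "meeting": 0.001},
               "ham":  {"free": 0.001, "meeting": 0.02}}
print(classify(["free", "free"], priors, likelihoods))   # "spam"
```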

  28. Today: Adding time!

  29. Reasoning over Time • Often, we want to reason about a sequence of observations • Speech recognition • Robot localization • Medical monitoring • DNA sequence recognition • Need to introduce time into our models • Basic approach: hidden Markov models (HMMs) • More general: dynamic Bayes’ nets

  30. Reasoning over time • Basic idea: • Agent maintains a belief state representing probabilities over possible states of the world • From the belief state and a transition model, the agent can predict how the world might evolve in the next time step • From observations and an emission model, the agent can update the belief state

  31. Markov Model

  32. Markov Models • A Markov model is a chain-structured BN: X1 → X2 → X3 → X4, with P(Xt | Xt-1) at each step • Value of X at a given time is called the state • Parameters: initial probabilities and transition probabilities or dynamics (specify how the state evolves over time)

  33. Conditional Independence • Basic conditional independence: past and future are independent given the present • Each time step only depends on the previous • This is called the (first order) Markov property • Note that the chain is just a (growing) BN • We can always use generic BN reasoning on it if we truncate the chain at a fixed length

  34. Example: Markov Chain • Weather: • States: X = {rain, sun} • Transitions: P(sun | sun) = 0.9, P(rain | sun) = 0.1, P(rain | rain) = 0.9, P(sun | rain) = 0.1 • Initial distribution: 1.0 sun • What’s the probability distribution after one step?

  35. Mini-Forward Algorithm • Question: What’s P(X) on some day t? • Answer: forward simulation, pushing the distribution through the transition model one step at a time (slide shows a trellis of sun/rain states unrolled over time)
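The mini-forward algorithm can be sketched directly for the weather chain. The 0.9/0.1 numbers are from the previous slide; reading them as a symmetric transition matrix is an assumption:

```python
# Transition model P(X_t | X_{t-1}) for the two-state weather chain.
T = {"sun":  {"sun": 0.9, "rain": 0.1},
     "rain": {"sun": 0.1, "rain": 0.9}}

def forward(p0, t):
    """P(X_t) by repeated one-step prediction:
    P(x') = sum_x P(x' | x) * P(x)."""
    p = dict(p0)
    for _ in range(t):
        p = {x2: sum(T[x1][x2] * p[x1] for x1 in p) for x2 in p}
    return p

# Initial distribution: 1.0 sun. After one step: 0.9 sun, 0.1 rain.
p1 = forward({"sun": 1.0, "rain": 0.0}, 1)
print(p1)
```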

  36. Example • From initial observation of sun • From initial observation of rain • (Slide shows tables of P(X1), P(X2), P(X3), …, P(X∞) for each starting observation)

  37. Stationary Distributions • If we simulate the chain long enough: • What happens? • Uncertainty accumulates • Eventually, we have no idea what the state is! • Stationary distributions: • For most chains, the distribution we end up in is independent of the initial distribution • Called the stationary distribution of the chain • Usually, can only predict a short time out
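The claim that the long-run distribution is independent of where you start can be checked by simulating the chain to convergence (again assuming the symmetric 0.9/0.1 transition matrix from the earlier slide):

```python
T = {"sun":  {"sun": 0.9, "rain": 0.1},
     "rain": {"sun": 0.1, "rain": 0.9}}

def stationary(p, tol=1e-12):
    """Iterate the one-step update until the distribution stops changing."""
    while True:
        q = {x2: sum(T[x1][x2] * p[x1] for x1 in p) for x2 in p}
        if all(abs(q[x] - p[x]) < tol for x in p):
            return q
        p = q

# Two very different starting beliefs converge to the same distribution,
# roughly {sun: 0.5, rain: 0.5} for this symmetric chain.
print(stationary({"sun": 1.0, "rain": 0.0}))
print(stationary({"sun": 0.0, "rain": 1.0}))
```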

  38. Hidden Markov Model

  39. Hidden Markov Models • Markov chains not so useful for most agents • Eventually you don’t know anything anymore • Need observations to update your beliefs • Hidden Markov models (HMMs) • Underlying Markov chain over states S • You observe outputs (effects) at each time step • As a Bayes’ net: a chain X1 → X2 → X3 → X4 → X5, with an observed output Ei attached to each state Xi

  40. Example • An HMM is defined by: • Initial distribution: P(X1) • Transitions: P(Xt | Xt-1) • Emissions: P(Et | Xt)

  41. Conditional Independence • HMMs have two important independence properties: • Markov hidden process: future depends on past via the present • Current observation independent of all else given current state • Quiz: does this mean that observations are independent given no evidence? • [No, correlated by the hidden state]

  42. Real HMM Examples • Speech recognition HMMs: • Observations are acoustic signals (continuous valued) • States are specific positions in specific words (so, tens of thousands) • Machine translation HMMs: • Observations are words (tens of thousands) • States are translation options • Robot tracking: • Observations are range readings (continuous) • States are positions on a map (continuous)

  43. Filtering / Monitoring • Filtering, or monitoring, is the task of tracking the distribution B(X) (the belief state) over time • We start with B(X) in an initial setting, usually uniform • As time passes, or we get observations, we update B(X) • The Kalman filter was invented in the 60’s and first implemented as a method of trajectory estimation for the Apollo program
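The filtering loop, alternating a prediction step through the transition model with an update step through the emission model, can be sketched as below. The transition and emission numbers (a rain/sun chain with an umbrella observation) are illustrative, not from the slides:

```python
# Illustrative HMM: hidden weather states, observed umbrella/no-umbrella.
T = {"rain": {"rain": 0.7, "sun": 0.3},        # P(X_t | X_{t-1})
     "sun":  {"rain": 0.3, "sun": 0.7}}
E = {"rain": {"umbrella": 0.9, "none": 0.1},   # P(E_t | X_t)
     "sun":  {"umbrella": 0.2, "none": 0.8}}

def filter_beliefs(B, observations):
    """Track the belief state B(X) over time."""
    for e in observations:
        # Predict: push beliefs through the transition model.
        B = {x2: sum(T[x1][x2] * B[x1] for x1 in B) for x2 in B}
        # Update: weight by the emission probability of e, then normalize.
        B = {x: E[x][e] * B[x] for x in B}
        z = sum(B.values())
        B = {x: B[x] / z for x in B}
    return B

# Starting from a uniform belief, two umbrella sightings shift the belief
# strongly toward "rain".
B = filter_beliefs({"rain": 0.5, "sun": 0.5}, ["umbrella", "umbrella"])
print(B)
```

The robot-localization slides that follow are exactly this loop: the motion model plays the role of T, the sensor model plays the role of E, and the probability map is B(X).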

  44. Example: Robot Localization (example from Michael Pfeiffer) • t=0 • Sensor model: never more than 1 mistake • Motion model: may not execute action with small prob. • (Figure: probability map on a color scale from 0 to 1)

  45. Example: Robot Localization • t=1

  46. Example: Robot Localization • t=2

  47. Example: Robot Localization • t=3

  48. Example: Robot Localization • t=4

  49. Example: Robot Localization • t=5
