
Corpora and Statistical Methods Lecture 8


Presentation Transcript


  1. Corpora and Statistical Methods, Lecture 8 Albert Gatt

  2. Part 2 Markov and Hidden Markov Models: Conceptual Introduction

  3. In this lecture • We focus on (Hidden) Markov Models • conceptual intro to Markov Models • relevance to NLP • Hidden Markov Models • algorithms

  4. Acknowledgement • Some of the examples in this lecture are taken from a tutorial on HMMs by Wolfgang Maass

  5. Talking about the weather • Suppose we want to predict tomorrow’s weather. The possible predictions are: • sunny • foggy • rainy • We might decide to predict tomorrow’s outcome based on earlier weather • if it’s been sunny all week, it’s likelier to be sunny tomorrow than if it had been rainy all week • how far back do we want to go to predict tomorrow’s weather?

  6. Statistical weather model • Notation: • S: the state space, a set of possible values for the weather: {sunny, foggy, rainy} • (each state is identifiable by an integer i) • X: a sequence of random variables, each taking a value from S • these model weather over a sequence of days • t is an integer standing for time • (X1, X2, X3, ... XT) models the value of a series of random variables • each takes a value from S with a certain probability P(X=si) • the entire sequence tells us the weather over T days

  7. Statistical weather model • If we want to predict the weather for day t+1, our model might look like this: P(Xt+1 = sk | X1, …, Xt) • E.g. P(weather tomorrow = sunny), conditional on the weather in the past t days. • Problem: the larger t gets, the more calculations we have to make.

  8. Markov Properties I: Limited horizon • The probability that we’re in state si at time t+1 only depends on where we were at time t: P(Xt+1 = si | X1, …, Xt) = P(Xt+1 = si | Xt) • Given this assumption, the probability of any sequence is just: P(X1, …, XT) = P(X1) P(X2 | X1) P(X3 | X2) … P(XT | XT-1)

  9. Markov Properties II: Time invariance • The probability of being in state si given the previous state does not change over time: P(Xt+1 = sj | Xt = si) = P(X2 = sj | X1 = si), for all t

  10. Concrete instantiation • This is essentially a transition matrix, which gives us the probabilities of going from one state to another. • We can denote state transition probabilities as aij (prob. of going from state i to state j)
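A transition matrix like this can be written as a nested dict, one row per current state. The probabilities below are illustrative placeholders, not the lecture's actual figures:

```python
# Transition matrix A for the three-state weather model.
# Numbers are invented for illustration.
A = {
    "sunny": {"sunny": 0.8, "rainy": 0.05, "foggy": 0.15},
    "rainy": {"sunny": 0.2, "rainy": 0.6,  "foggy": 0.2},
    "foggy": {"sunny": 0.2, "rainy": 0.3,  "foggy": 0.5},
}

# Each row must sum to 1: from any state i we must move to *some* state j.
for state, row in A.items():
    assert abs(sum(row.values()) - 1.0) < 1e-9

# a_ij: the probability of going from state i to state j
print(A["sunny"]["rainy"])
```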

  11. Graphical view • Components of the model: • states (s) • transitions • transition probabilities • initial probability distribution for states Essentially, a non-deterministic finite state automaton.

  12. Example continued • If the weather today (Xt) is sunny, what’s the probability that tomorrow (Xt+1) is sunny and the day after (Xt+2) is rainy? • By the Markov assumption: P(Xt+1 = sunny, Xt+2 = rainy | Xt = sunny) = P(Xt+1 = sunny | Xt = sunny) × P(Xt+2 = rainy | Xt+1 = sunny)
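With the Markov assumption, the query reduces to a product of two one-step transitions. A minimal sketch, with invented transition probabilities:

```python
# Illustrative transitions out of "sunny" (placeholders, not the
# lecture's figures).
A = {
    "sunny": {"sunny": 0.8, "rainy": 0.05, "foggy": 0.15},
}

# P(X_{t+1} = sunny, X_{t+2} = rainy | X_t = sunny)
#   = P(sunny | sunny) * P(rainy | sunny)
p = A["sunny"]["sunny"] * A["sunny"]["rainy"]
print(p)  # ~0.04
```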

  13. Formal definition • A Markov Model is a triple (S, Π, A) where: • S is the set of states • Π are the probabilities of being initially in some state • A are the transition probabilities

  14. Hidden Markov Models

  15. A slight variation on the example • You’re locked in a room with no windows • You can’t observe the weather directly • You only observe whether the guy who brings you food is carrying an umbrella or not • Need a model telling you the probability of seeing the umbrella, given the weather • distinction between observations and their underlying emitting state. • Define: • Ot as an observation at time t • K = {+umbrella, -umbrella} as the possible outputs • We’re interested in P(Ot=k|Xt=si) • i.e. the probability of a given observation at t, given that the underlying weather state at t is si

  16. Symbol emission probabilities This is the hidden model, telling us the probability that Ot = k given that Xt = si We assume that each underlying state Xt = si emits an observation with a given probability.
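An emission model can be written the same way as the transition matrix: one row of observation probabilities per hidden state. The values below are invented for illustration (an umbrella is likely when it rains):

```python
# Symbol emission probabilities B: P(O_t = k | X_t = s_i).
# Numbers are illustrative placeholders.
B = {
    "sunny": {"+umbrella": 0.1, "-umbrella": 0.9},
    "rainy": {"+umbrella": 0.8, "-umbrella": 0.2},
    "foggy": {"+umbrella": 0.3, "-umbrella": 0.7},
}

# For each hidden state, the emission probabilities over K sum to 1.
for state, emissions in B.items():
    assert abs(sum(emissions.values()) - 1.0) < 1e-9
```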

  17. Using the hidden model • Model gives:P(Ot=k|Xt=si) • Then, by Bayes’ Rule we can compute: P(Xt=si|Ot=k) • Generalises easily to an entire sequence
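The Bayes inversion can be sketched directly: given a prior over states and the emission probabilities, marginalise over states to get P(Ot=k), then normalise. Priors and emission values are illustrative:

```python
# Bayes' rule inverts the emission model:
#   P(X_t = s_i | O_t = k) = P(O_t = k | X_t = s_i) P(X_t = s_i) / P(O_t = k)
# Prior and emission probabilities are invented for illustration.
prior = {"sunny": 0.5, "rainy": 0.3, "foggy": 0.2}
B = {
    "sunny": {"+umbrella": 0.1},
    "rainy": {"+umbrella": 0.8},
    "foggy": {"+umbrella": 0.3},
}

obs = "+umbrella"
# P(O_t = k): marginalise over all hidden states
p_obs = sum(B[s][obs] * prior[s] for s in prior)
posterior = {s: B[s][obs] * prior[s] / p_obs for s in prior}
# Seeing an umbrella makes "rainy" the most probable hidden state.
```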

  18. HMM in graphics • Circles indicate states • Arrows indicate probabilistic dependencies between states

  19. HMM in graphics • Green nodes are hidden states • Each hidden state depends only on the previous state (Markov assumption)

  20. Why HMMs? • HMMs are a way of thinking of underlying events probabilistically generating surface events. • Example: Parts of speech • a POS is a class or set of words • we can think of language as an underlying Markov Chain of parts of speech from which actual words are generated (“emitted”) • So what are our hidden states here, and what are the observations?

  21. HMMs in POS Tagging • Hidden layer (constructed through training), with states DET, ADJ, N, V • Models the sequence of POSs in the training corpus

  22. HMMs in POS Tagging • Hidden states DET, ADJ, N, V emit the words the, tall, lady, is • Observations are words. • They are “emitted” by their corresponding hidden state. • Each state depends on its previous state.
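The slide's tagged sentence can be scored as a tiny HMM fragment: multiply tag-to-tag transitions with tag-to-word emissions. All probabilities below are invented; real values would be estimated from a training corpus:

```python
# Tag-to-tag transitions (invented probabilities).
A = {"DET": {"ADJ": 0.4, "N": 0.6},
     "ADJ": {"N": 0.9, "ADJ": 0.1},
     "N":   {"V": 1.0}}
# Tag-to-word emissions (invented; each tag emits one word here).
B = {"DET": {"the": 1.0},
     "ADJ": {"tall": 1.0},
     "N":   {"lady": 1.0},
     "V":   {"is": 1.0}}

# P("the tall lady is", DET ADJ N V), given that we start in DET:
p = (B["DET"]["the"] * A["DET"]["ADJ"] * B["ADJ"]["tall"]
     * A["ADJ"]["N"] * B["N"]["lady"] * A["N"]["V"] * B["V"]["is"])
print(p)  # ~0.36
```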

  23. Why HMMs • There are efficient algorithms to train HMMs using Expectation Maximisation • General idea: • training data is assumed to have been generated by some HMM (parameters unknown) • try to learn the unknown parameters from the data • A similar idea is used in finding the parameters of some n-gram models, especially those that use interpolation.

  24. Formalisation of a Hidden Markov model

  25. Crucial ingredients (familiar) • Underlying states: S = {s1,…,sN} • Output alphabet (observations): K = {k1,…,kM} • State transition probabilities: A = {aij}, i,j ∈ S • State sequence: X = (X1,…,XT+1), plus a function mapping each Xt to a state s • Output sequence: O = (O1,…,OT) • where each Ot ∈ K

  26. Crucial ingredients (additional) • Initial state probabilities: Π = {πi}, i ∈ S (tell us the initial probability of each state) • Symbol emission probabilities: B = {bijk}, i,j ∈ S, k ∈ K (tell us the probability b of seeing observation Ot=k, given that Xt=si and Xt+1 = sj)
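The ingredients can be bundled into one parameter set μ = (A, B, Π). Note a simplification: the slides define arc emissions bijk (conditioned on both Xt and Xt+1), while the sketch below uses the common state-emission form bi(k) = P(Ot = k | Xt = si). All names and numbers are illustrative:

```python
from dataclasses import dataclass

@dataclass
class HMM:
    states: list      # S = {s1, ..., sN}
    alphabet: list    # K = {k1, ..., kM}
    pi: dict          # initial state probabilities
    A: dict           # transition probabilities a_ij
    B: dict           # emission probabilities b_i(k), a simplification
                      # of the lecture's arc-emission b_ijk

# Illustrative two-state weather HMM (invented numbers).
weather_hmm = HMM(
    states=["sunny", "rainy"],
    alphabet=["+umbrella", "-umbrella"],
    pi={"sunny": 0.6, "rainy": 0.4},
    A={"sunny": {"sunny": 0.8, "rainy": 0.2},
       "rainy": {"sunny": 0.4, "rainy": 0.6}},
    B={"sunny": {"+umbrella": 0.1, "-umbrella": 0.9},
       "rainy": {"+umbrella": 0.8, "-umbrella": 0.2}},
)
```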

  27. Trellis diagram of an HMM • [Figure: states s1, s2, s3, with transition probabilities a1,1, a1,2, a1,3 shown on the arcs out of s1]

  28. Trellis diagram of an HMM • [Figure: the same states, now with observation sequence o1, o2, o3 aligned to times t1, t2, t3]

  29. Trellis diagram of an HMM • [Figure: emission probabilities b1,1,k, b1,2,k, b1,3,k added to the arcs]

  30. The fundamental questions for HMMs • Given a model μ = (A, B, Π), how do we compute the likelihood of an observation, P(O|μ)? • Given an observation sequence O and a model μ, which state sequence (X1,…,XT+1) best explains the observations? • This is the decoding problem • Given an observation sequence O, and a space of possible models μ = (A, B, Π), which model best explains the observed data?
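Question 1 can be answered efficiently by summing over state sequences one observation at a time rather than enumerating them all (the forward algorithm, one of the algorithms covered next). A minimal sketch, using the state-emission simplification and invented parameters:

```python
def forward(obs, states, pi, A, B):
    """Compute P(O | mu) by summing over all state sequences."""
    # Initialise: alpha_1(s) = pi(s) * b_s(o1)
    alpha = {s: pi[s] * B[s][obs[0]] for s in states}
    # Induction: alpha_{t+1}(j) = sum_i alpha_t(i) a_ij * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = {j: sum(alpha[i] * A[i][j] for i in states) * B[j][o]
                 for j in states}
    # Terminate: sum over final states
    return sum(alpha.values())

# Illustrative two-state weather HMM (invented numbers).
states = ["sunny", "rainy"]
pi = {"sunny": 0.6, "rainy": 0.4}
A = {"sunny": {"sunny": 0.8, "rainy": 0.2},
     "rainy": {"sunny": 0.4, "rainy": 0.6}}
B = {"sunny": {"+umbrella": 0.1, "-umbrella": 0.9},
     "rainy": {"+umbrella": 0.8, "-umbrella": 0.2}}

likelihood = forward(["+umbrella", "-umbrella", "+umbrella"],
                     states, pi, A, B)
```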

  31. Application of question 1 (ASR) • Given a model μ = (A, B, Π), how do we compute the likelihood of an observation P(O| μ)? • Input of an ASR system: a continuous stream of sound waves, which is ambiguous • Need to decode it into a sequence of phones. • is the input the sequence [n iy d] or [n iy]? • which sequence is the most probable?

  32. Application of question 2 (POS Tagging) • Given an observation sequence O and a model μ, which state sequence (X1,…,XT+1) best explains the observations? • this is the decoding problem • Consider a POS tagger • Input observation sequence: • I can read • need to find the most likely sequence of underlying POS tags: • e.g. is can a modal verb, or a noun? • how likely is it that can is a noun, given that the previous word is a pronoun?

  33. Summary • HMMs are a way of representing: • sequences of observations arising from • sequences of states • states are the variables of interest, giving rise to the observations • Next up: • algorithms for answering the fundamental questions about HMMs
