
Foundations of Statistical NLP Chapter 9. Markov Models


Presentation Transcript


  1. Foundations of Statistical NLP Chapter 9. Markov Models • 한 기 덕

  2. Contents • Introduction • Markov Models • Hidden Markov Models • Why use HMMs • General form of an HMM • The Three Fundamental Questions for HMMs • Implementation, Properties, and Variants

  3. Introduction • Markov Model • Markov processes/chains/models were first developed by Andrei A. Markov • First linguistic use: modeling the letter sequences in Russian literature (1913) • Current use: a general statistical tool • VMM (Visible Markov Model) • The sequence of words in a sentence depends on its syntax. • HMM (Hidden Markov Model) • Operates at a higher level of abstraction by postulating additional “hidden” structure.

  4. Markov Models • Markov assumption • Future elements of the sequence are independent of past elements, given the present element. • Limited Horizon • X_t = the sequence of random variables • s_k = the states in the state space • Time invariant (stationary) • Both conditions are written out below.
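
Written out, the two defining properties are the standard ones (limited horizon, then time invariance):

P(X_{t+1} = s_k | X_1, ..., X_t) = P(X_{t+1} = s_k | X_t)
P(X_{t+1} = s_k | X_t) = P(X_2 = s_k | X_1)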

  5. Markov Models (Cont'd) • Notation • A = the stochastic transition matrix • Π = the probabilities of the different initial states • Applications: linear sequences of events • modeling valid phone sequences in speech recognition • sequences of speech acts in dialog systems
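
With the transition matrix and initial probabilities defined as a_{ij} = P(X_{t+1} = s_j | X_t = s_i) and π_i = P(X_1 = s_i), the probability of a particular state sequence is simply:

P(X_1, ..., X_T) = π_{X_1} · a_{X_1 X_2} · a_{X_2 X_3} · ... · a_{X_{T-1} X_T}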

  6. Markov Chain • Circles: states and their names • Arrows connecting states: possible transitions • Arc labels: the probability of each transition

  7. Visible Markov Model • We know which states the machine passes through. • mth-order Markov model • For n ≥ 3, an n-gram model violates the Limited Horizon condition. • Any n-gram model can be reformulated as a visible Markov model by simply encoding the preceding (n-1)-gram as the state (see the sketch below).
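
As a minimal illustration of this encoding, the sketch below recasts a trigram model as a first-order (visible) Markov model whose states are the preceding bigrams; the probabilities are invented placeholders, not estimates from any corpus.

```python
# Recasting a trigram model as a first-order Markov model: the state is the
# preceding bigram (w1, w2), and emitting word w moves the chain to the new
# bigram state (w2, w). The trigram probabilities are illustrative only.
trigram_p = {
    (("in", "the"), "house"): 0.4,
    (("in", "the"), "garden"): 0.6,
}

def step(state, word):
    """Return the next bigram state and the transition probability."""
    p = trigram_p.get((state, word), 0.0)
    return (state[1], word), p

state = ("in", "the")
state, p = step(state, "garden")
print(state, p)  # ('the', 'garden') 0.6
```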

  8. Hidden Markov Model • We don't know the state sequence that the model passes through, only some probabilistic function of it. • Example 1: the crazy soft drink machine • Two states: cola preferring (CP) and iced tea preferring (IP) • VMM: the machine would always put out a cola when in CP • HMM: emission probabilities • the probability of each output, given the state the machine is in

  9. Crazy soft drink machine • Problem • What is the probability of seeing the output sequence {lem, ice_t} if the machine always starts off in the cola-preferring state?

  10. Crazy soft drink machine (Cont'd)
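
A small brute-force sketch of this calculation: the transition and emission probabilities below follow the usual presentation of the crazy soft drink machine example and should be checked against the original figure; outputs are assumed to depend on the current state.

```python
# Brute-force calculation of P(lem, ice_t | machine starts in CP).
trans = {"CP": {"CP": 0.7, "IP": 0.3},
         "IP": {"CP": 0.5, "IP": 0.5}}
emit  = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
         "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}

def prob_of_observation(start, obs):
    """Sum P(obs, path) over every state path (fine for two output symbols)."""
    paths = [(start,)]
    for _ in obs[1:]:
        paths = [p + (s,) for p in paths for s in trans]
    total = 0.0
    for path in paths:
        p = emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total

print(prob_of_observation("CP", ["lem", "ice_t"]))
# 0.3 * (0.7 * 0.1 + 0.3 * 0.7) = 0.084
```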

  11. Why use HMMs? • Underlying events probabilistically generate surface events • e.g., parts of speech generate the words in a text • Linear interpolation of n-gram models • Hidden state: the choice of whether to use the unigram, bigram, or trigram probabilities • Two key points • The conversion to an HMM works by adding epsilon transitions. • The interpolation parameters are tied, rather than adjusted separately for each state. • The interpolated estimate is written out below.
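
For concreteness, the interpolated trigram estimate has the usual form, with the hidden state of the HMM corresponding to the choice of which term generated the word:

P_li(w_n | w_{n-2}, w_{n-1}) = λ_1 P_1(w_n) + λ_2 P_2(w_n | w_{n-1}) + λ_3 P_3(w_n | w_{n-2}, w_{n-1}),   with λ_1 + λ_2 + λ_3 = 1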

  12. Notation • S = {s_1, ..., s_N}: set of states • K = {k_1, ..., k_M}: output alphabet • Π = {π_i}: initial state probabilities • A = {a_ij}: state transition probabilities • B = {b_ijk}: symbol emission probabilities • X = (X_1, ..., X_{T+1}): state sequence • O = (o_1, ..., o_T): output sequence

  13. General form of an HMM • Arc-emission HMM • the symbol emitted at time t depends on both the state at time t and the state at time t+1 • State-emission HMM (e.g., the crazy drink machine) • the symbol emitted at time t depends only on the state at time t • Figure 9.4: A program for a Markov process.
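
In symbols, the difference between the two forms is where the emission probability is conditioned:

Arc-emission: b_{ijk} = P(o_t = k | X_t = s_i, X_{t+1} = s_j)
State-emission: b_{ik} = P(o_t = k | X_t = s_i)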

  14. The Three Fundamental Questions for HMMs • Given a model μ = (A, B, Π), how do we efficiently compute how likely a certain observation is, i.e., P(O | μ)? • Given the observation sequence O and a model μ, how do we choose a state sequence (X_1, ..., X_{T+1}) that best explains the observations? • Given an observation sequence O and a space of possible models, how do we find the model that best explains the observed data?

  15. Finding the probability of an observation
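
Decomposing over state sequences (in the arc-emission notation above), the direct evaluation is:

P(O | μ) = Σ_X P(O | X, μ) P(X | μ) = Σ_{X_1 ⋯ X_{T+1}} π_{X_1} ∏_{t=1}^{T} a_{X_t X_{t+1}} b_{X_t X_{t+1} o_t}

which requires on the order of (2T + 1) · N^{T+1} multiplications and is therefore impractical; the forward procedure on the next slide avoids this by dynamic programming.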

  16. The forward procedure • A cheap algorithm that requires only 2N²T multiplications (see the sketch below)
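
A minimal sketch of the forward procedure, written here in the state-emission form for simplicity; the function and variable names are ours.

```python
# Forward procedure, state-emission form.
# alpha[t][j] = P(o_1 ... o_t, X_t = j | mu); total cost is O(N^2 * T).
def forward(obs, states, pi, trans, emit):
    alpha = [{j: pi[j] * emit[j][obs[0]] for j in states}]
    for t in range(1, len(obs)):
        alpha.append({
            j: sum(alpha[t - 1][i] * trans[i][j] for i in states) * emit[j][obs[t]]
            for j in states
        })
    return sum(alpha[-1].values())  # P(O | mu)

# With the trans/emit tables of the soft drink machine sketch above:
# forward(["lem", "ice_t"], ["CP", "IP"], {"CP": 1.0, "IP": 0.0}, trans, emit) -> 0.084
```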

  17. The backward procedure • Backward variables give the total probability of seeing the rest of the observation sequence from a given state. • Using a combination of forward and backward probabilities is vital for solving the third problem, parameter reestimation. • Backward variables and the forward-backward combination are given below.
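
In the arc-emission notation, the forward and backward variables and their combination are:

α_i(t) = P(o_1 ⋯ o_{t-1}, X_t = i | μ)
β_i(t) = P(o_t ⋯ o_T | X_t = i, μ)
P(O | μ) = Σ_{i=1}^{N} α_i(t) β_i(t),   for any t with 1 ≤ t ≤ T + 1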

  18. Finding the best state sequence • There is more than one way to choose a state sequence that explains the observations. • One option: for each t, find the X_t that maximizes P(X_t | O, μ). • This may yield a quite unlikely state sequence overall. • The Viterbi algorithm is more efficient.

  19. Viterbi algorithm • Finds the most likely complete path, argmax_X P(X | O, μ) • For a fixed O, it is sufficient to maximize P(X, O | μ) • Definition of the Viterbi variables (see the sketch below)
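
A minimal sketch of the Viterbi recursion, again in the state-emission form; the names are ours.

```python
# Viterbi algorithm, state-emission form.
# delta[t][j] = probability of the best path that emits o_1..o_t and ends in j;
# psi[t][j] remembers the best predecessor for backtracing.
def viterbi(obs, states, pi, trans, emit):
    delta = [{j: pi[j] * emit[j][obs[0]] for j in states}]
    psi = [{}]
    for t in range(1, len(obs)):
        delta.append({})
        psi.append({})
        for j in states:
            best = max(states, key=lambda i: delta[t - 1][i] * trans[i][j])
            delta[t][j] = delta[t - 1][best] * trans[best][j] * emit[j][obs[t]]
            psi[t][j] = best
    last = max(states, key=lambda j: delta[-1][j])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(psi[t][path[-1]])
    return list(reversed(path)), delta[-1][last]

# With the machine above:
# viterbi(["lem", "ice_t", "cola"], ["CP", "IP"], {"CP": 1.0, "IP": 0.0}, trans, emit)
```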

  20. Variable calculations for O = (lem, ice_t, cola)

  21. Parameter estimation • Given a certain observation sequence • Find the values of the model parameters μ = (A, B, Π) • Using Maximum Likelihood Estimation • Locally maximize P(O | μ) by an iterative hill-climbing algorithm (Baum-Welch, also called forward-backward), which is usually effective for HMMs

  22. Parameter estimation (Cont'd)

  23. Parameter estimation (Cont'd)
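
For reference, the Baum-Welch (forward-backward) reestimation step, in the arc-emission notation used above, computes expected counts and then updates the parameters:

p_t(i, j) = α_i(t) a_{ij} b_{ij o_t} β_j(t+1) / Σ_m α_m(t) β_m(t)
γ_i(t) = Σ_j p_t(i, j)
π̂_i = γ_i(1)
â_{ij} = Σ_{t=1}^{T} p_t(i, j) / Σ_{t=1}^{T} γ_i(t)
b̂_{ijk} = Σ_{t : o_t = k} p_t(i, j) / Σ_{t=1}^{T} p_t(i, j)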

  24. Implementation, Properties, Variants • Implementation • The obvious issue: repeatedly multiplying very small numbers leads to underflow → use logarithms (see the sketch below) • Variants • The large number of parameters is hard to estimate reliably; tying parameters helps • Multiple input observations • Initialization of parameter values • Try to start the search near the global maximum
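
A small sketch of the log-space trick: products of probabilities become sums of logs, and the sums that appear in the forward recursion become a stable log-add.

```python
import math

# Working in log space avoids underflow from multiplying many small numbers.
# A product log(p1 * p2 * ... * pn) becomes log(p1) + log(p2) + ... + log(pn);
# a sum of probabilities needs the log-sum-exp trick below.
def log_add(log_x, log_y):
    """Return log(exp(log_x) + exp(log_y)) without leaving log space."""
    if log_x < log_y:
        log_x, log_y = log_y, log_x
    return log_x + math.log1p(math.exp(log_y - log_x))

print(log_add(math.log(0.021), math.log(0.063)))  # log(0.084)
```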
