Foundations of Statistical NLP, Chapter 9: Markov Models (한기덕)
Contents
• Introduction
• Markov Models
• Hidden Markov Models
• Why use HMMs?
• General form of an HMM
• The Three Fundamental Questions for HMMs
• Implementation, Properties, and Variants
Introduction
• Markov Model
  • Markov processes/chains/models were first developed by Andrei A. Markov.
  • First linguistic use: modeling the letter sequences in Russian literature (1913).
  • Current use: a general statistical tool.
• VMM (Visible Markov Model)
  • The states are directly observable, e.g. the words in a sentence, whose sequence is constrained by syntax.
• HMM (Hidden Markov Model)
  • Operates at a higher level of abstraction by postulating additional "hidden" structure.
Markov Models
• Markov assumption: future elements of the sequence are conditionally independent of past elements, given the present element.
• Let X_1, ..., X_T be a sequence of random variables taking values in the state space {s_1, ..., s_N}.
• Limited Horizon: P(X_{t+1} = s_k | X_1, ..., X_t) = P(X_{t+1} = s_k | X_t)
• Time invariant (stationary): this probability does not depend on t, i.e. it equals P(X_2 = s_k | X_1).
Markov Models (cont'd)
• Notation
  • A = {a_ij}: stochastic transition matrix, a_ij = P(X_{t+1} = s_j | X_t = s_i)
  • Π = {π_i}: probabilities of the different initial states, π_i = P(X_1 = s_i)
• Applications: linear sequences of events
  • modeling valid phone sequences in speech recognition
  • modeling sequences of speech acts in dialogue systems
Markov Chain
• Circles: states, labeled with their state names
• Arrows connecting states: possible transitions
• Arc labels: the probability of each transition
Visible Markov Model
• We know which states the machine passes through.
• mth-order Markov model: the next state depends on the previous m states.
• For n ≥ 3, an n-gram model violates the Limited Horizon condition.
• However, any n-gram model can be reformulated as a (first-order) visible Markov model by simply encoding each (n-1)-gram as a single state, as sketched below.
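A minimal sketch of that encoding (not from the slides, and the trigram probabilities are invented purely for illustration): each bigram of preceding words becomes one state, and emitting the next word moves the chain to a new bigram state.

```python
# Illustrative sketch: a trigram model recast as a first-order visible
# Markov model whose states are the preceding bigrams.
trigram = {
    ("the", "old"): {"man": 0.6, "dog": 0.4},
    ("old", "man"): {"walks": 1.0},
    ("old", "dog"): {"barks": 1.0},
}

# Each (n-1)-gram, here a bigram (w1, w2), becomes a single state; emitting
# w3 with probability P(w3 | w1, w2) moves the chain to state (w2, w3).
transitions = {}
for (w1, w2), nexts in trigram.items():
    for w3, p in nexts.items():
        transitions[((w1, w2), (w2, w3))] = p

for (src, dst), p in sorted(transitions.items()):
    print(src, "->", dst, p)
```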
Hidden Markov Model
• We do not know the state sequence that the model passes through, only some probabilistic function of it.
• Example 1: the crazy soft drink machine
  • Two states: cola preferring (CP) and iced tea preferring (IP)
  • As a VMM, the machine would always put out a cola in the CP state.
  • As an HMM, each state has emission probabilities: the probability of each output given the state it is emitted from.
Crazy soft drink machine
• Problem: what is the probability of seeing the output sequence {lem, ice_t} if the machine always starts off in the cola-preferring state?
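A small sketch of the brute-force answer, summing the joint probability over every possible hidden state sequence. The transition and emission tables below are assumed to match the textbook's crazy soft drink machine example; if the book's numbers differ, substitute them.

```python
# Brute-force P(lem, ice_t | start in CP): enumerate all hidden state
# sequences and sum their joint probabilities with the observed outputs.
from itertools import product

states = ["CP", "IP"]
# a[i][j] = P(next state j | current state i)
a = {"CP": {"CP": 0.7, "IP": 0.3},
     "IP": {"CP": 0.5, "IP": 0.5}}
# b[i][k] = P(output k | current state i)
b = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}

def prob_of_output(outputs, start="CP"):
    """Sum P(outputs, state sequence) over all hidden state sequences."""
    total = 0.0
    # The state at time 1 is fixed to `start`; enumerate states at times 2..T.
    for rest in product(states, repeat=len(outputs) - 1):
        seq = (start,) + rest
        p = 1.0
        for t, out in enumerate(outputs):
            p *= b[seq[t]][out]              # emit from the current state
            if t + 1 < len(outputs):
                p *= a[seq[t]][seq[t + 1]]   # then move to the next state
        total += p
    return total

print(prob_of_output(["lem", "ice_t"]))  # 0.3*(0.7*0.1 + 0.3*0.7) = 0.084
```

The forward procedure later in the slides computes the same value without enumerating every state sequence.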
Why use HMMs?
• Underlying events probabilistically generate surface events, e.g. hidden parts of speech generating the words in a text.
• Linear interpolation of n-gram models can itself be expressed as an HMM.
  • Hidden state: the choice of whether to use the unigram, bigram, or trigram probabilities.
• Two key points:
  • The conversion works by adding epsilon (non-emitting) transitions.
  • With separate interpolation parameters λ_i^{ab} for each history ab, we do not have to adjust them separately by hand.
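A tiny sketch of the interpolated estimate itself; the weights and component probabilities below are invented. In the HMM view, the hidden state records which of the three component models generated the word.

```python
# Linear interpolation of unigram, bigram, and trigram estimates.
# All numbers below are made up for illustration.
lambdas = (0.2, 0.3, 0.5)               # weights, summing to 1
p_uni, p_bi, p_tri = 0.01, 0.05, 0.20   # component probabilities of a word

p_interp = sum(l * p for l, p in zip(lambdas, (p_uni, p_bi, p_tri)))
print(p_interp)  # 0.2*0.01 + 0.3*0.05 + 0.5*0.20 = 0.117
```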
Notation
• S = {s_1, ..., s_N}: set of states
• K = {k_1, ..., k_M}: output alphabet
• Π = {π_i}: initial state probabilities
• A = {a_ij}: state transition probabilities
• B = {b_ijk}: symbol emission probabilities
• X = (X_1, ..., X_{T+1}): state sequence
• O = (o_1, ..., o_T): output sequence
General form of an HMM
• Arc-emission HMM: the symbol emitted at time t depends on both the state at time t and the state at time t+1.
• State-emission HMM (e.g. the crazy drink machine): the symbol emitted at time t depends only on the state at time t.
• Figure 9.4: A program for a Markov process.
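A minimal sketch of such a program for a state-emission HMM, reusing the assumed drink-machine tables from above (illustrative values, not taken from the slides): start in a state, emit a symbol from it, move to the next state, repeat.

```python
# Generate an output sequence from a state-emission HMM by alternating
# "emit from the current state" and "transition to the next state".
import random

a = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
b = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}

def sample(dist):
    """Draw one outcome from a {outcome: probability} dictionary."""
    r, acc = random.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r < acc:
            return outcome
    return outcome  # guard against floating-point round-off

def generate(T, state="CP"):
    outputs = []
    for _ in range(T):
        outputs.append(sample(b[state]))  # emit a symbol from this state
        state = sample(a[state])          # then move to the next state
    return outputs

print(generate(5))
```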
The forward procedure
• Forward variables α_i(t) sum the probability of all ways of producing the observations seen so far and ending in state i, so the trellis can be filled in efficiently from left to right.
• A cheap algorithm: it requires only 2N²T multiplications, instead of the exponentially many needed to sum over all state sequences directly.
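A sketch of the forward recursion for a state-emission HMM, again with the assumed drink-machine numbers. One convention choice to note: alpha[t] here includes the symbol emitted at time t, which differs slightly from the book's indexing but yields the same P(O | μ).

```python
# Forward procedure: alpha[t][i] = P(o_1 .. o_t, X_t = i | mu),
# filled in left to right; summing the last column gives P(O | mu).
a = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
b = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}
pi = {"CP": 1.0, "IP": 0.0}   # assume the machine starts in CP

def forward(obs):
    states = list(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [{i: pi[i] * b[i][obs[0]] for i in states}]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({j: sum(prev[i] * a[i][j] for i in states) * b[j][o]
                      for j in states})
    return alpha, sum(alpha[-1].values())   # trellis and P(O | mu)

alpha, p_obs = forward(["lem", "ice_t"])
print(p_obs)  # 0.084 with these numbers, matching the brute-force sum
```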
The backward procedure
• Backward variables β_i(t) give the total probability of seeing the rest of the observation sequence, given state i at time t.
• The use of a combination of forward and backward probabilities is vital for solving the third problem, parameter reestimation.
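The matching backward recursion, under the same assumed parameters. Combining β at time 1 with the initial emission recovers the same P(O | μ) as the forward pass, which is the consistency that reestimation relies on.

```python
# Backward procedure: beta[t][i] = P(o_{t+1} .. o_T | X_t = i, mu),
# filled in from the end of the observation sequence.
a = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
b = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}
pi = {"CP": 1.0, "IP": 0.0}

def backward(obs):
    states = list(pi)
    beta = [{i: 1.0 for i in states}]   # beta_T(i) = 1
    for o in reversed(obs[1:]):         # work backwards through the outputs
        nxt = beta[0]
        beta.insert(0, {i: sum(a[i][j] * b[j][o] * nxt[j] for j in states)
                        for i in states})
    return beta

obs = ["lem", "ice_t"]
beta = backward(obs)
p_obs = sum(pi[i] * b[i][obs[0]] * beta[0][i] for i in pi)
print(p_obs)  # 0.084, the same value the forward procedure gives
```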
Finding the best state sequence
• There is more than one way to define the state sequence that best explains the observations.
• One option: for each t, find the state X_t that maximizes P(X_t | O, μ), as sketched below.
• Chosen individually like this, the states may combine into a quite unlikely sequence overall.
• The Viterbi algorithm, which maximizes over complete paths, is more efficient and is the usual choice.
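A short sketch of that per-state choice (posterior decoding) with the assumed parameters used throughout: compute P(X_t = i | O, μ) from the forward and backward probabilities, then take the argmax at each time step independently.

```python
# Posterior decoding: gamma[t][i] = P(X_t = i | O, mu) = alpha*beta / P(O),
# then pick the most probable state at each time step separately.
import numpy as np

A  = np.array([[0.7, 0.3], [0.5, 0.5]])            # CP, IP transitions
B  = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])  # cola, ice_t, lem
pi = np.array([1.0, 0.0])
O  = [2, 1, 0]                                      # lem, ice_t, cola

T, N = len(O), len(pi)
alpha = np.zeros((T, N)); beta = np.ones((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(1, T):                    # forward pass
    alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
for t in range(T - 2, -1, -1):           # backward pass
    beta[t] = A @ (B[:, O[t+1]] * beta[t+1])

gamma = alpha * beta / alpha[-1].sum()   # posterior state probabilities
print(gamma.argmax(axis=1))              # 0 = CP, 1 = IP at each time step
```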
Viterbi algorithm
• Finds the most likely complete path, argmax_X P(X | O, μ).
• For a fixed O, it is sufficient to maximize the joint probability P(X, O | μ).
• Definition: δ_j(t) is the probability of the most probable path that ends in state j at time t; it is computed by dynamic programming with backpointers (see the sketch below).
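A sketch of the Viterbi recursion with the same assumed drink-machine parameters: delta stores the probability of the best path into each state, psi stores backpointers, and the best path is read off backwards at the end.

```python
# Viterbi algorithm: delta[t][j] is the probability of the best path that
# ends in state j after emitting o_1..o_t; psi stores the backpointers.
a = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
b = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}
pi = {"CP": 1.0, "IP": 0.0}

def viterbi(obs):
    states = list(pi)
    delta = [{j: pi[j] * b[j][obs[0]] for j in states}]
    psi = []
    for o in obs[1:]:
        prev = delta[-1]
        d, back = {}, {}
        for j in states:
            best = max(states, key=lambda i: prev[i] * a[i][j])
            back[j] = best
            d[j] = prev[best] * a[best][j] * b[j][o]
        delta.append(d)
        psi.append(back)
    # Backtrace from the best final state.
    last = max(states, key=lambda j: delta[-1][j])
    path = [last]
    for back in reversed(psi):
        path.insert(0, back[path[0]])
    return path, delta[-1][last]

print(viterbi(["lem", "ice_t", "cola"]))
```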
Parameter estimation
• Given a training observation sequence, find the values of the model parameters μ = (A, B, Π).
• Use Maximum Likelihood Estimation: choose μ to maximize P(O_training | μ).
• There is no known analytic solution, so P(O | μ) is locally maximized by an iterative hill-climbing algorithm (the forward-backward or Baum-Welch algorithm, an instance of EM), which is usually effective for HMMs.
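A compact sketch of one Baum-Welch (forward-backward/EM) re-estimation step, using numpy with made-up starting parameters and observations; this is an assumption-laden illustration, not the textbook's code. Iterating the step climbs toward a local maximum of P(O | μ), not necessarily the global one.

```python
# One EM re-estimation step for a state-emission HMM.
import numpy as np

A  = np.array([[0.7, 0.3], [0.5, 0.5]])   # initial transition guesses
B  = np.array([[0.6, 0.1, 0.3],           # initial emission guesses:
               [0.1, 0.7, 0.2]])          # columns = cola, ice_t, lem
pi = np.array([0.5, 0.5])
O  = [2, 1, 0, 1]                          # an observed symbol-index sequence

def reestimate(A, B, pi, O):
    N, T = len(pi), len(O)
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):                  # forward pass
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):         # backward pass
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])
    p_obs = alpha[-1].sum()
    gamma = alpha * beta / p_obs           # P(X_t = i | O, mu)
    # xi[t, i, j] = P(X_t = i, X_{t+1} = j | O, mu)
    xi = np.array([np.outer(alpha[t], B[:, O[t+1]] * beta[t+1]) * A / p_obs
                   for t in range(T - 1)])
    new_pi = gamma[0]
    new_A  = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B  = np.zeros_like(B)
    for k in range(B.shape[1]):            # expected counts of each symbol
        mask = np.array([o == k for o in O])
        new_B[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, new_pi, p_obs

A, B, pi, p = reestimate(A, B, pi, O)
print("P(O | mu) before this update:", p)
```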
Implementation, Properties, Variants
• Implementation
  • An obvious issue: repeatedly multiplying very small probabilities underflows, so use log probabilities (or rescale), as illustrated below.
• Variants
  • Models with many more parameters can still be estimated, but they require correspondingly more training data.
  • Training from multiple input observation sequences.
  • Initialization of parameter values: choose starting values that make it likely training will end up near the global maximum.
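A tiny demonstration of the underflow problem and the log-probability fix (the numbers are arbitrary): the raw product of many small probabilities collapses to 0.0, while the sum of their logs stays usable.

```python
# Multiplying thousands of small probabilities underflows to 0.0;
# summing their logarithms does not.
import math

p = 1e-5
product = 1.0
log_sum = 0.0
for _ in range(100_000):
    product *= p            # underflows to 0.0 long before the loop ends
    log_sum += math.log(p)  # stays finite

print(product)   # 0.0
print(log_sum)   # about -1151292.5
```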