Foundations of Statistical NLP, Chapter 9: Markov Models (한기덕)
Contents
• Introduction
• Markov Models
• Hidden Markov Models
• Why use HMMs?
• General form of an HMM
• The Three Fundamental Questions for HMMs
• Implementation, Properties, and Variants
Introduction
• Markov Model
  • Markov processes/chains/models were first developed by Andrei A. Markov.
  • First linguistic use: modeling the letter sequences in Russian literature (1913).
  • Current use: a general statistical tool.
• VMM (Visible Markov Model)
  • The states are directly observable, e.g. the words in a sentence, whose sequence is constrained by syntax.
• HMM (Hidden Markov Model)
  • Operates at a higher level of abstraction by postulating additional "hidden" structure.
Markov Models
• Markov assumption: future elements of the sequence are conditionally independent of past elements, given the present element.
• Let X_1, ..., X_T be a sequence of random variables taking values in the state space {s_1, ..., s_N}.
• Limited Horizon: P(X_{t+1} = s_k | X_1, ..., X_t) = P(X_{t+1} = s_k | X_t)
• Time invariant (stationary): this probability does not depend on t, i.e. it equals P(X_2 = s_k | X_1).
Markov Models (cont'd)
• Notation
  • A = {a_ij}: stochastic transition matrix, a_ij = P(X_{t+1} = s_j | X_t = s_i)
  • Π = {π_i}: probabilities of the different initial states, π_i = P(X_1 = s_i)
• Applications: linear sequences of events
  • modeling valid phone sequences in speech recognition
  • modeling sequences of speech acts in dialogue systems
Markov Chain
• Circles: states, labeled with their state names
• Arrows connecting states: possible transitions
• Arc labels: the probability of each transition
Visible Markov Model
• We know which states the machine passes through.
• mth-order Markov model: the next state depends on the previous m states.
• For n ≥ 3, an n-gram model violates the Limited Horizon condition.
• However, any n-gram model can be reformulated as a (first-order) visible Markov model by simply encoding each (n-1)-gram as a single state, as sketched below.
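A minimal sketch of that encoding (not from the slides, and the trigram probabilities are invented purely for illustration): each bigram of preceding words becomes one state, and emitting the next word moves the chain to a new bigram state.

```python
# Illustrative sketch: a trigram model recast as a first-order visible
# Markov model whose states are the preceding bigrams.
trigram = {
    ("the", "old"): {"man": 0.6, "dog": 0.4},
    ("old", "man"): {"walks": 1.0},
    ("old", "dog"): {"barks": 1.0},
}

# Each (n-1)-gram, here a bigram (w1, w2), becomes a single state; emitting
# w3 with probability P(w3 | w1, w2) moves the chain to state (w2, w3).
transitions = {}
for (w1, w2), nexts in trigram.items():
    for w3, p in nexts.items():
        transitions[((w1, w2), (w2, w3))] = p

for (src, dst), p in sorted(transitions.items()):
    print(src, "->", dst, p)
```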
Hidden Markov Model
• We do not know the state sequence that the model passes through, only some probabilistic function of it.
• Example 1: the crazy soft drink machine
  • Two states: cola preferring (CP) and iced tea preferring (IP)
  • As a VMM, the machine would always put out a cola in the CP state.
  • As an HMM, each state has emission probabilities: the probability of each output given the state it is emitted from.
Crazy soft drink machine
• Problem: what is the probability of seeing the output sequence {lem, ice_t} if the machine always starts off in the cola-preferring state?
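A small sketch of the brute-force answer, summing the joint probability over every possible hidden state sequence. The transition and emission tables below are assumed to match the textbook's crazy soft drink machine example; if the book's numbers differ, substitute them.

```python
# Brute-force P(lem, ice_t | start in CP): enumerate all hidden state
# sequences and sum their joint probabilities with the observed outputs.
from itertools import product

states = ["CP", "IP"]
# a[i][j] = P(next state j | current state i)
a = {"CP": {"CP": 0.7, "IP": 0.3},
     "IP": {"CP": 0.5, "IP": 0.5}}
# b[i][k] = P(output k | current state i)
b = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}

def prob_of_output(outputs, start="CP"):
    """Sum P(outputs, state sequence) over all hidden state sequences."""
    total = 0.0
    # The state at time 1 is fixed to `start`; enumerate states at times 2..T.
    for rest in product(states, repeat=len(outputs) - 1):
        seq = (start,) + rest
        p = 1.0
        for t, out in enumerate(outputs):
            p *= b[seq[t]][out]              # emit from the current state
            if t + 1 < len(outputs):
                p *= a[seq[t]][seq[t + 1]]   # then move to the next state
        total += p
    return total

print(prob_of_output(["lem", "ice_t"]))  # 0.3*(0.7*0.1 + 0.3*0.7) = 0.084
```

The forward procedure later in the slides computes the same value without enumerating every state sequence.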
Why use HMMs?
• Underlying events probabilistically generate surface events, e.g. hidden parts of speech generating the words in a text.
• Linear interpolation of n-gram models can itself be expressed as an HMM.
  • Hidden state: the choice of whether to use the unigram, bigram, or trigram probabilities.
• Two key points:
  • The conversion works by adding epsilon (non-emitting) transitions.
  • With separate interpolation parameters λ_i^{ab} for each history ab, we do not have to adjust them separately by hand.
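A tiny sketch of the interpolated estimate itself; the weights and component probabilities below are invented. In the HMM view, the hidden state records which of the three component models generated the word.

```python
# Linear interpolation of unigram, bigram, and trigram estimates.
# All numbers below are made up for illustration.
lambdas = (0.2, 0.3, 0.5)               # weights, summing to 1
p_uni, p_bi, p_tri = 0.01, 0.05, 0.20   # component probabilities of a word

p_interp = sum(l * p for l, p in zip(lambdas, (p_uni, p_bi, p_tri)))
print(p_interp)  # 0.2*0.01 + 0.3*0.05 + 0.5*0.20 = 0.117
```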
Notation
• S = {s_1, ..., s_N}: set of states
• K = {k_1, ..., k_M}: output alphabet
• Π = {π_i}: initial state probabilities
• A = {a_ij}: state transition probabilities
• B = {b_ijk}: symbol emission probabilities
• X = (X_1, ..., X_{T+1}): state sequence
• O = (o_1, ..., o_T): output sequence
General form of an HMM
• Arc-emission HMM: the symbol emitted at time t depends on both the state at time t and the state at time t+1.
• State-emission HMM (e.g. the crazy drink machine): the symbol emitted at time t depends only on the state at time t.
• Figure 9.4: A program for a Markov process.
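A minimal sketch of such a program for a state-emission HMM, reusing the assumed drink-machine tables from above (illustrative values, not taken from the slides): start in a state, emit a symbol from it, move to the next state, repeat.

```python
# Generate an output sequence from a state-emission HMM by alternating
# "emit from the current state" and "transition to the next state".
import random

a = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
b = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}

def sample(dist):
    """Draw one outcome from a {outcome: probability} dictionary."""
    r, acc = random.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r < acc:
            return outcome
    return outcome  # guard against floating-point round-off

def generate(T, state="CP"):
    outputs = []
    for _ in range(T):
        outputs.append(sample(b[state]))  # emit a symbol from this state
        state = sample(a[state])          # then move to the next state
    return outputs

print(generate(5))
```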
The forward procedure
• Forward variables α_i(t) sum the probability of all ways of producing the observations seen so far and ending in state i, so the trellis can be filled in efficiently from left to right.
• A cheap algorithm: it requires only 2N²T multiplications, instead of the exponentially many needed to sum over all state sequences directly.
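A sketch of the forward recursion for a state-emission HMM, again with the assumed drink-machine numbers. One convention choice to note: alpha[t] here includes the symbol emitted at time t, which differs slightly from the book's indexing but yields the same P(O | μ).

```python
# Forward procedure: alpha[t][i] = P(o_1 .. o_t, X_t = i | mu),
# filled in left to right; summing the last column gives P(O | mu).
a = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
b = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}
pi = {"CP": 1.0, "IP": 0.0}   # assume the machine starts in CP

def forward(obs):
    states = list(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [{i: pi[i] * b[i][obs[0]] for i in states}]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({j: sum(prev[i] * a[i][j] for i in states) * b[j][o]
                      for j in states})
    return alpha, sum(alpha[-1].values())   # trellis and P(O | mu)

alpha, p_obs = forward(["lem", "ice_t"])
print(p_obs)  # 0.084 with these numbers, matching the brute-force sum
```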
The backward procedure
• Backward variables β_i(t) give the total probability of seeing the rest of the observation sequence, given state i at time t.
• The use of a combination of forward and backward probabilities is vital for solving the third problem, parameter reestimation.
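The matching backward recursion, under the same assumed parameters. Combining β at time 1 with the initial emission recovers the same P(O | μ) as the forward pass, which is the consistency that reestimation relies on.

```python
# Backward procedure: beta[t][i] = P(o_{t+1} .. o_T | X_t = i, mu),
# filled in from the end of the observation sequence.
a = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
b = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}
pi = {"CP": 1.0, "IP": 0.0}

def backward(obs):
    states = list(pi)
    beta = [{i: 1.0 for i in states}]   # beta_T(i) = 1
    for o in reversed(obs[1:]):         # work backwards through the outputs
        nxt = beta[0]
        beta.insert(0, {i: sum(a[i][j] * b[j][o] * nxt[j] for j in states)
                        for i in states})
    return beta

obs = ["lem", "ice_t"]
beta = backward(obs)
p_obs = sum(pi[i] * b[i][obs[0]] * beta[0][i] for i in pi)
print(p_obs)  # 0.084, the same value the forward procedure gives
```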
Finding the best state sequence
• There is more than one way to define the state sequence that best explains the observations.
• One option: for each t, find the state X_t that maximizes P(X_t | O, μ), as sketched below.
• Chosen individually like this, the states may combine into a quite unlikely sequence overall.
• The Viterbi algorithm, which maximizes over complete paths, is more efficient and is the usual choice.
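A short sketch of that per-state choice (posterior decoding) with the assumed parameters used throughout: compute P(X_t = i | O, μ) from the forward and backward probabilities, then take the argmax at each time step independently.

```python
# Posterior decoding: gamma[t][i] = P(X_t = i | O, mu) = alpha*beta / P(O),
# then pick the most probable state at each time step separately.
import numpy as np

A  = np.array([[0.7, 0.3], [0.5, 0.5]])            # CP, IP transitions
B  = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])  # cola, ice_t, lem
pi = np.array([1.0, 0.0])
O  = [2, 1, 0]                                      # lem, ice_t, cola

T, N = len(O), len(pi)
alpha = np.zeros((T, N)); beta = np.ones((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(1, T):                    # forward pass
    alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
for t in range(T - 2, -1, -1):           # backward pass
    beta[t] = A @ (B[:, O[t+1]] * beta[t+1])

gamma = alpha * beta / alpha[-1].sum()   # posterior state probabilities
print(gamma.argmax(axis=1))              # 0 = CP, 1 = IP at each time step
```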
Viterbi algorithm
• Finds the most likely complete path, argmax_X P(X | O, μ).
• For a fixed O, it is sufficient to maximize the joint probability P(X, O | μ).
• Definition: δ_j(t) is the probability of the most probable path that ends in state j at time t; it is computed by dynamic programming with backpointers (see the sketch below).
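A sketch of the Viterbi recursion with the same assumed drink-machine parameters: delta stores the probability of the best path into each state, psi stores backpointers, and the best path is read off backwards at the end.

```python
# Viterbi algorithm: delta[t][j] is the probability of the best path that
# ends in state j after emitting o_1..o_t; psi stores the backpointers.
a = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
b = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}
pi = {"CP": 1.0, "IP": 0.0}

def viterbi(obs):
    states = list(pi)
    delta = [{j: pi[j] * b[j][obs[0]] for j in states}]
    psi = []
    for o in obs[1:]:
        prev = delta[-1]
        d, back = {}, {}
        for j in states:
            best = max(states, key=lambda i: prev[i] * a[i][j])
            back[j] = best
            d[j] = prev[best] * a[best][j] * b[j][o]
        delta.append(d)
        psi.append(back)
    # Backtrace from the best final state.
    last = max(states, key=lambda j: delta[-1][j])
    path = [last]
    for back in reversed(psi):
        path.insert(0, back[path[0]])
    return path, delta[-1][last]

print(viterbi(["lem", "ice_t", "cola"]))
```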
Parameter estimation
• Given a training observation sequence, find the values of the model parameters μ = (A, B, Π).
• Use Maximum Likelihood Estimation: choose μ to maximize P(O_training | μ).
• There is no known analytic solution, so P(O | μ) is locally maximized by an iterative hill-climbing algorithm (the forward-backward or Baum-Welch algorithm, an instance of EM), which is usually effective for HMMs.
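A compact sketch of one Baum-Welch (forward-backward/EM) re-estimation step, using numpy with made-up starting parameters and observations; this is an assumption-laden illustration, not the textbook's code. Iterating the step climbs toward a local maximum of P(O | μ), not necessarily the global one.

```python
# One EM re-estimation step for a state-emission HMM.
import numpy as np

A  = np.array([[0.7, 0.3], [0.5, 0.5]])   # initial transition guesses
B  = np.array([[0.6, 0.1, 0.3],           # initial emission guesses:
               [0.1, 0.7, 0.2]])          # columns = cola, ice_t, lem
pi = np.array([0.5, 0.5])
O  = [2, 1, 0, 1]                          # an observed symbol-index sequence

def reestimate(A, B, pi, O):
    N, T = len(pi), len(O)
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):                  # forward pass
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):         # backward pass
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])
    p_obs = alpha[-1].sum()
    gamma = alpha * beta / p_obs           # P(X_t = i | O, mu)
    # xi[t, i, j] = P(X_t = i, X_{t+1} = j | O, mu)
    xi = np.array([np.outer(alpha[t], B[:, O[t+1]] * beta[t+1]) * A / p_obs
                   for t in range(T - 1)])
    new_pi = gamma[0]
    new_A  = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B  = np.zeros_like(B)
    for k in range(B.shape[1]):            # expected counts of each symbol
        mask = np.array([o == k for o in O])
        new_B[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, new_pi, p_obs

A, B, pi, p = reestimate(A, B, pi, O)
print("P(O | mu) before this update:", p)
```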
Implementation, Properties, Variants
• Implementation
  • An obvious issue: repeatedly multiplying very small probabilities underflows, so use log probabilities (or rescale), as illustrated below.
• Variants
  • Models with many more parameters can still be estimated, but they require correspondingly more training data.
  • Training from multiple input observation sequences.
  • Initialization of parameter values: choose starting values that make it likely training will end up near the global maximum.
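A tiny demonstration of the underflow problem and the log-probability fix (the numbers are arbitrary): the raw product of many small probabilities collapses to 0.0, while the sum of their logs stays usable.

```python
# Multiplying thousands of small probabilities underflows to 0.0;
# summing their logarithms does not.
import math

p = 1e-5
product = 1.0
log_sum = 0.0
for _ in range(100_000):
    product *= p            # underflows to 0.0 long before the loop ends
    log_sum += math.log(p)  # stays finite

print(product)   # 0.0
print(log_sum)   # about -1151292.5
```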