
### Hidden Markov Models

Richard Golden

(following approach of Chapter 9 of Manning and Schutze, 2000)

REVISION DATE: April 15 (Tuesday), 2003

HMM Notation

- State Sequence Variables: X1, …, XT+1
- Output Sequence Variables: O1, …, OT
- Set of Hidden States (S1, …, SN)
- Output Alphabet (K1, …, KM)
- Initial State Probabilities (π1, …, πN): πi = p(X1 = Si), i = 1, …, N
- State Transition Probabilities (aij), i, j ∈ {1, …, N}: aij = p(Xt+1 = Sj | Xt = Si), t = 1, …, T
- Emission Probabilities (bij), i ∈ {1, …, N}, j ∈ {1, …, M}: bij = p(Ot = Kj | Xt = Si), t = 1, …, T

HMM State-Emission Representation

[Slide diagram: the example two-state HMM with states S1, S2 and output alphabet K1, K2, K3. Initial state probabilities π1 = 1, π2 = 0; transition probabilities a11 = 0.7, a12 = 0.3, a21 = 0.5, a22 = 0.5; emission probabilities b11 = 0.6, b12 = 0.1, b13 = 0.3, b21 = 0.1, b22 = 0.7, b23 = 0.2.]
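For readers who want to experiment, here is a minimal sketch (not from the original slides) of how this example HMM could be encoded in Python with NumPy; the names pi, A, and B are my own choices.

```python
import numpy as np

# Example two-state HMM from the slides: states S1, S2; outputs K1, K2, K3.
pi = np.array([1.0, 0.0])             # initial state probabilities (pi_1, pi_2)
A  = np.array([[0.7, 0.3],            # A[i, j] = a_ij = p(X_{t+1} = S_j | X_t = S_i)
               [0.5, 0.5]])
B  = np.array([[0.6, 0.1, 0.3],       # B[i, k] = b_ik = p(O_t = K_k | X_t = S_i)
               [0.1, 0.7, 0.2]])
```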

Arc-Emission Representation

- Note that sometimes a Hidden Markov Model is represented by having the emission arrows come off the arcs.
- In this situation there are many more emission arrows, because there are many more arcs…
- But the transition and emission probabilities are the same… it just takes longer to draw on your PowerPoint presentation (self-conscious presentation).

Fundamental Questions for HMMs

- MODEL FIT
- How can we compute the likelihood of observations and hidden states given known emission and transition probabilities?

Compute: p("Dog"/NOUN, "is"/VERB, "Good"/ADJ | {aij},{bkm})

- How can we compute the likelihood of observations given known emission and transition probabilities? p("Dog", "is", "Good" | {aij},{bkm})

Fundamental Questions for HMMs

- INFERENCE
- How can we infer the sequence of hidden states given the observations and the known emission and transition probabilities?
- Maximize:
- p("Dog"/?, "is"/?, "Good"/? | {aij},{bkm}) with respect to the unknown labels

Fundamental Questions for HMMs

- LEARNING
- How can we estimate the emission and transition probabilities given observations, assuming that the hidden states are observable during the learning process?
- How can we estimate the emission and transition probabilities given observations only?

Direct Calculation of Model Fit (note use of "Markov" Assumptions), Part 1

Follows directly from the definition of a conditional probability: p(o,x) = p(o|x) p(x)

EXAMPLE: p("Dog"/NOUN, "is"/VERB, "Good"/ADJ | {aij},{bij}) =

p("Dog", "is", "Good" | NOUN, VERB, ADJ, {aij},{bij}) × p(NOUN, VERB, ADJ | {aij},{bij})
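Spelled out with the Markov assumptions (and the convention used on the later slides, where the word at time t is emitted by the state at time t), both factors reduce to products of the model parameters. This worked expansion is mine, not taken verbatim from the slides:

```latex
p(o_1,\dots,o_T,\; x_1,\dots,x_{T+1})
  = p(o_1,\dots,o_T \mid x_1,\dots,x_{T+1})\, p(x_1,\dots,x_{T+1})
  = \Bigl[\prod_{t=1}^{T} b_{x_t o_t}\Bigr]
    \Bigl[\pi_{x_1}\prod_{t=1}^{T} a_{x_t x_{t+1}}\Bigr]
```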

Direct Calculation of Likelihood of Labeled Observations (note use of "Markov" Assumptions), Part 2

EXAMPLE: Compute p("Dog"/NOUN, "is"/VERB, "good"/ADJ | {aij},{bkm})

[Slide diagram: the example two-state HMM (states S1, S2, outputs K1, K2, K3, plus an initial node S0), annotated with the same transition and emission probabilities as above.]

Graphical Algorithm Representation of Direct Calculation of Likelihood of Observations and Hidden States (not hard!)

(Note that "good" is the name of the dog here, so it is a NOUN!)

The likelihood of a particular "labeled" sequence of observations (e.g., p("Dog"/NOUN, "is"/VERB, "Good"/NOUN | {aij},{bkm})) may be computed with the "direct calculation" method using the following simple graphical algorithm.

Specifically, p(K3/S1, K2/S2, K1/S1 | {aij},{bkm}) = π1 b13 a12 b22 a21 b11
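As a quick numeric check of this product with the example parameters above (a throwaway snippet, not part of the original slides):

```python
# p(K3/S1, K2/S2, K1/S1) = pi_1 * b13 * a12 * b22 * a21 * b11
pi1 = 1.0
b13, a12, b22, a21, b11 = 0.3, 0.3, 0.7, 0.5, 0.6

print(pi1 * b13 * a12 * b22 * a21 * b11)   # 0.0189
```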

Extension to the case where the likelihood of the observations given the parameters is needed (e.g., p("Dog", "is", "good" | {aij},{bij}))

KILLER EQUATION!!!!! Sum the labeled-sequence likelihood over every possible hidden-state sequence:

p(o1, …, oT | {aij},{bkm}) = Σ over all state sequences X1, …, XT+1 of πX1 ∏t bXt,ot aXt,Xt+1

Efficiency of Calculations is Important (e.g., Model-Fit)

- Assume 1 multiplication per microsecond
- Assume N=1000 word vocabulary and T=7 word sentence.
- (2T+1)·N^(T+1) multiplications by "direct calculation" yields (2·7+1)·1000^(7+1) ≈ 1.5 × 10^25 multiplications, which is about 475,000 million years of computer time!!!
- 2N²T multiplications using the "forward method" is about 14 seconds of computer time!!! (see the quick check below)
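A quick arithmetic check of these two figures (my own back-of-the-envelope script, assuming exactly one multiplication per microsecond):

```python
T, N = 7, 1000                    # sentence length and number of states from the slide
usec = 1e-6                       # seconds per multiplication

direct  = (2 * T + 1) * N ** (T + 1)    # direct calculation: ~1.5e25 multiplications
forward = 2 * N ** 2 * T                # forward method: 1.4e7 multiplications

print(direct * usec / (3600 * 24 * 365))   # ~4.8e11 years (about 475,000 million years)
print(forward * usec)                      # ~14 seconds
```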

Forward, Backward, and Viterbi Calculations

- Forward calculation methods are thus very useful.
- Forward, Backward, and Viterbi Calculations will now be discussed.

Forward Calculations – Overview

[Slide diagram: trellis over TIME 2, TIME 3, and TIME 4, with states S1 and S2 at each time step, the possible outputs K1, K2, K3 at each step, and the example HMM's transition and emission probabilities on the arcs.]

Forward Calculations – Time 2 (1 word example)

NOTE that α1(2) + α2(2) is the likelihood of the observation/word "K3" in this "1 word example".

[Slide diagram: the first trellis step at TIME 2, from the initial probabilities π1, π2 through the transitions a11, a12, a21, a22, with emission probabilities b13 = 0.3 and b23 = 0.2 for the observed output K3.]
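Worked out with the example parameters (my own arithmetic, assuming the convention that αi(t) = p(o1, …, ot-1, Xt = Si) and that the word at time t is emitted by the state at time t):

```latex
\alpha_1(2) = \pi_1 b_{13} a_{11} + \pi_2 b_{23} a_{21} = (1)(0.3)(0.7) + 0 = 0.21
\alpha_2(2) = \pi_1 b_{13} a_{12} + \pi_2 b_{23} a_{22} = (1)(0.3)(0.3) + 0 = 0.09
\alpha_1(2) + \alpha_2(2) = 0.30 = b_{13} = p(K_3)
```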

Forward Calculations – Time 3 (2 word example)

[Slide diagram: the trellis extended to TIME 3; the forward variables at time 3 (e.g., α1(3)) are accumulated from the time-2 values via the transition and emission probabilities.]

Forward Calculations – Time 4 (3 word example)

[Slide diagram: the full trellis over TIME 2–4; the forward variables at time 4 are accumulated from the time-3 values via the transition and emission probabilities.]

Backward Calculations – Overview

[Slide diagram: the same trellis over TIME 2, TIME 3, and TIME 4, with states S1 and S2, outputs K1, K2, K3, and the example HMM's transition and emission probabilities, now traversed from right to left.]

Backward Calculations – Time 2

NOTE that β1(2) + β2(2) is the likelihood of the observation/word sequence "K2, K1" in this "2 word example".

[Slide diagram: the trellis over TIME 2–4 with the backward variables at time 2 accumulated from the later time steps via the transition and emission probabilities.]

Backward Calculations – Time 1

[Slide diagram: the full trellis over TIME 2–4 with the backward variables propagated back to time 1.]

The Forward-Backward Method

- Note the forward method computes: αi(t) = p(o1, …, ot-1, Xt = Si)
- Note the backward method computes (t > 1): βi(t) = p(ot, …, oT | Xt = Si)
- We can use the forward-backward method, which computes p(o1, …, oT) using the formula (for any choice of t = 1, …, T+1!): p(o1, …, oT) = Σi αi(t) βi(t) (a minimal implementation sketch follows)
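Here is a minimal sketch of the forward and backward procedures in Python/NumPy (my own code, not from the slides), using the convention above that the word at time t is emitted by the state at time t; the function and variable names are illustrative only.

```python
import numpy as np

def forward(pi, A, B, obs):
    """alpha[t, i] ~ p(o_1 ... o_t, X_{t+1} = S_i); alpha[0] = pi.
    Returns all forward variables and p(o_1 ... o_T)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T + 1, N))
    alpha[0] = pi
    for t in range(T):
        alpha[t + 1] = (alpha[t] * B[:, obs[t]]) @ A
    return alpha, float(alpha[T].sum())

def backward(pi, A, B, obs):
    """beta[t, i] ~ p(o_{t+1} ... o_T | X_{t+1} = S_i); beta[T] = 1.
    Returns all backward variables and p(o_1 ... o_T)."""
    T, N = len(obs), len(pi)
    beta = np.zeros((T + 1, N))
    beta[T] = 1.0
    for t in range(T - 1, -1, -1):
        beta[t] = B[:, obs[t]] * (A @ beta[t + 1])
    return beta, float(pi @ beta[0])

# Example HMM from the slides and the observation sequence K3, K2, K1 (0-based indices):
pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.5, 0.5]])
B = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])
obs = [2, 1, 0]

alpha, p_fwd = forward(pi, A, B, obs)
beta, p_bwd = backward(pi, A, B, obs)
print(p_fwd, p_bwd)                      # both ~0.0315
print((alpha * beta).sum(axis=1))        # the forward-backward identity holds for every t
```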

Solution to Problem 1

- The "hard part" of the 1st Problem was to find the likelihood of the observations for an HMM.
- We can now do this using either the forward, backward, or forward-backward method.

Solution to Problem 2: Viterbi Algorithm(Computing “Most Probable” Labeling)

- Consider direct calculation of labeled observations.
- Previously we summed these likelihoods together across all possible labelings to solve the first problem, which was to compute the likelihood of the observations given the parameters (hard part of HMM Question 1!).
- We solved this problem using the forward or backward method.
- Now we want to compute all possible labelings and their respective likelihoods and pick the labeling which is the largest!

EXAMPLE: Compute p("Dog"/NOUN, "is"/VERB, "good"/ADJ | {aij},{bkm})

Efficiency of Calculations is Important (e.g., Most Likely Labeling Problem)

- Just as in the forward-backward calculations, we can solve the problem of computing the likelihood of every one of the N^T possible labelings efficiently (the Viterbi algorithm, sketched below).
- Instead of millions of years of computing time we can solve the problem in several seconds!!
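A minimal sketch of the Viterbi algorithm in the same style (my own code, using the same conventions and example arrays as the forward sketch above):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most probable hidden-state sequence X_1 ... X_{T+1} for the observations,
    with the word at time t emitted by X_t (as in the forward sketch)."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T + 1, N))            # best path probability ending in each state
    psi = np.zeros((T + 1, N), dtype=int)   # backpointers to the best predecessor
    delta[0] = pi
    for t in range(T):
        scores = (delta[t] * B[:, obs[t]])[:, None] * A   # scores[i, j]: via state i into j
        psi[t + 1] = scores.argmax(axis=0)
        delta[t + 1] = scores.max(axis=0)
    # Backtracking: start from the best final state and follow the backpointers.
    path = [int(delta[T].argmax())]
    for t in range(T, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[T].max())

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.5, 0.5]])
B = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])

path, prob = viterbi(pi, A, B, [2, 1, 0])   # observations K3, K2, K1
print(path, prob)   # [0, 1, 0, 0], i.e. S1, S2, S1 (plus a final S1), probability ~0.0132
```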

Viterbi Algorithm – Overview (same setup as forward algorithm)

[Slide diagram: the same trellis over TIME 2–4 as in the forward calculations, with states S1, S2, outputs K1, K2, K3, and the example HMM's transition and emission probabilities.]

Forward Calculations – Time 2 (1 word example)

[Slide diagram: the first trellis step, as before, with π1 = 1, π2 = 0 and the transition and emission probabilities for the observed output K3.]

Backtracking – Time 2 (1 word example)

[Slide diagram: the same one-step trellis, now with the backtracking (best-predecessor) arrows highlighted.]

Forward Calculations – (2 word example)

[Slide diagram: the trellis extended to TIME 3 with the example HMM's transition and emission probabilities.]

BACKTRACKING – (2 word example)

[Slide diagram: the two-step trellis with the backtracking (best-predecessor) arrows highlighted.]

Forward Calculations – Time 4 (3 word example)

[Slide diagram: the full trellis over TIME 2–4 with the example HMM's transition and emission probabilities.]

Backtracking to Obtain Labeling for 3 word case

[Slide diagram: the full trellis over TIME 2–4 with the backtracking arrows highlighted, recovering the most probable hidden-state labeling for the 3-word observation sequence.]

Third Fundamental Question:Parameter Estimation

- Make an initial guess for {aij} and {bkm}.
- Compute the probability that one hidden state follows another, given {aij}, {bkm}, and the sequence of observations (computed using the forward-backward algorithm).
- Compute the probability of an observed state given a hidden state, given {aij}, {bkm}, and the sequence of observations (computed using the forward-backward algorithm).
- Use these computed probabilities to make an improved guess for {aij} and {bkm}.
- Repeat this process until convergence (a minimal sketch of this procedure appears below).
- It can be shown that this algorithm does in fact converge to the correct choice for {aij} and {bkm}, assuming that the initial guess was close enough.
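This iterative procedure is the Baum-Welch (EM) algorithm. Below is a compact sketch of one way to implement it (my own code, with the same conventions and illustrative names as the earlier snippets), re-estimating the parameters from a single observation sequence:

```python
import numpy as np

def baum_welch(pi, A, B, obs, n_iter=20):
    """Re-estimate (pi, A, B) from one observation sequence by expectation-maximization."""
    pi, A, B = np.array(pi, float), np.array(A, float), np.array(B, float)
    obs = np.asarray(obs)
    T, N, M = len(obs), len(pi), B.shape[1]
    for _ in range(n_iter):
        # E-step: forward and backward variables under the current parameters.
        alpha = np.zeros((T + 1, N)); alpha[0] = pi
        for t in range(T):
            alpha[t + 1] = (alpha[t] * B[:, obs[t]]) @ A
        beta = np.zeros((T + 1, N)); beta[T] = 1.0
        for t in range(T - 1, -1, -1):
            beta[t] = B[:, obs[t]] * (A @ beta[t + 1])
        p_obs = alpha[T].sum()                     # p(o_1 ... o_T | current parameters)
        gamma = alpha * beta / p_obs               # gamma[t, i] = p(X_{t+1} = S_i | O)
        xi = np.zeros((T, N, N))                   # xi[t, i, j] = p(X_{t+1}=S_i, X_{t+2}=S_j | O)
        for t in range(T):
            xi[t] = (alpha[t] * B[:, obs[t]])[:, None] * A * beta[t + 1] / p_obs
        # M-step: improved guesses for pi, A, and B from the expected counts.
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:T].sum(axis=0)[:, None]
        for k in range(M):
            B[:, k] = gamma[:T][obs == k].sum(axis=0)
        B /= gamma[:T].sum(axis=0)[:, None]
    return pi, A, B
```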
