Hidden Markov Models

Richard Golden

(following the approach of Chapter 9 of Manning and Schütze, 2000)

REVISION DATE: April 15 (Tuesday), 2003

VMM (Visible Markov Model)

[Diagram: a two-state visible Markov model with start state S0, states S1 and S2, initial probabilities π1 and π2, and transition probabilities a11 = 0.7, a12 = 0.3, a21 = 0.5, a22 = 0.5.]
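As a concrete illustration (not part of the original slides), the diagram above can be simulated directly. The sketch below assumes the initial probabilities π1 = 1, π2 = 0 used on the later HMM slides, since the VMM slide itself does not give them; states are 0-indexed (S1 is 0, S2 is 1).

```python
# Minimal sketch: sampling a state sequence from the two-state VMM above.
import random

pi = [1.0, 0.0]            # initial state probabilities (pi1, pi2) - assumed, see lead-in
A = [[0.7, 0.3],           # a11, a12
     [0.5, 0.5]]           # a21, a22

def sample_states(T, rng=random.Random(0)):
    """Draw a length-T state sequence of 0-based state indices."""
    states = [rng.choices([0, 1], weights=pi)[0]]
    for _ in range(T - 1):
        states.append(rng.choices([0, 1], weights=A[states[-1]])[0])
    return states

print(sample_states(5))    # prints a list of five state indices, starting in S1
```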
HMM Notation
  • State Sequence Variables: X1, …, XT+1
  • Output Sequence Variables: O1, …, OT
  • Set of Hidden States (S1, …, SN)
  • Output Alphabet (K1, …, KM)
  • Initial State Probabilities (π1, …, πN): πi = p(X1=Si), i=1,…,N
  • State Transition Probabilities (aij), i,j ∈ {1,…,N}: aij = p(Xt+1=Sj | Xt=Si), t=1,…,T
  • Emission Probabilities (bij), i ∈ {1,…,N}, j ∈ {1,…,M}: bij = p(Ot=Kj | Xt=Si), t=1,…,T (the example model's values are collected in the sketch below)
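To make the later calculations concrete, here is the example model used throughout these slides written out as arrays (an illustrative sketch, assuming numpy; states S1, S2 map to indices 0, 1 and symbols K1, K2, K3 to 0, 1, 2). The numbers are the ones shown on the state-emission diagram that follows.

```python
# The example HMM from these slides, as arrays.
import numpy as np

pi = np.array([1.0, 0.0])            # pi_i = p(X1 = Si)
A  = np.array([[0.7, 0.3],           # a_ij = p(X_{t+1} = Sj | X_t = Si)
               [0.5, 0.5]])
B  = np.array([[0.6, 0.1, 0.3],      # b_ij = p(O_t = Kj | X_t = Si)
               [0.1, 0.7, 0.2]])
```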
HMM State-Emission Representation

[Diagram: start state S0, hidden states S1 and S2 emitting output symbols K1, K2, K3; initial probabilities π1 = 1, π2 = 0; transition probabilities a11 = 0.7, a12 = 0.3, a21 = 0.5, a22 = 0.5; emission probabilities b11 = 0.6, b12 = 0.1, b13 = 0.3, b21 = 0.1, b22 = 0.7, b23 = 0.2.]
Arc-Emission Representation
  • Note that sometimes a Hidden Markov Model is represented by having the emission arrows come off the arcs.
  • In this situation there are many more emission arrows, because there are many more arcs…
  • But the transition and emission probabilities are the same… it just takes longer to draw in your PowerPoint presentation (self-conscious presentation).
Fundamental Questions for HMMs
  • MODEL FIT
    • How can we compute the likelihood of observations and hidden states given known emission and transition probabilities?

Compute: p(“Dog”/NOUN, “is”/VERB, “Good”/ADJ | {aij},{bkm})

    • How can we compute the likelihood of observations given known emission and transition probabilities? p(“Dog”, “is”, “Good” | {aij},{bkm})
Fundamental Questions for HMMs
  • INFERENCE
    • How can we infer the sequence of hidden states given the observations and the known emission and transition probabilities?
    • Maximize: p(“Dog”/?, “is”/?, “Good”/? | {aij},{bkm}) with respect to the unknown labels
Fundamental Questions for HMMs
  • LEARNING
    • How can we estimate the emission and transition probabilities given observations, assuming that the hidden states are observable during the learning process?
    • How can we estimate the emission and transition probabilities given observations only?
Direct Calculation of Model Fit (note use of “Markov” Assumptions) Part 1

Follows directly from the definition of a conditional probability: p(o,x) = p(o|x)p(x)

EXAMPLE: p(“Dog”/NOUN, “is”/VERB, “Good”/ADJ | {aij},{bij}) =
p(“Dog”, “is”, “Good” | NOUN, VERB, ADJ, {aij},{bij}) × p(NOUN, VERB, ADJ | {aij},{bij})

Direct Calculation of Likelihood of Labeled Observations (note use of “Markov” Assumptions) Part 2

EXAMPLE: Compute p(“Dog”/NOUN, “is”/VERB, “good”/ADJ | {aij},{bkm})

Graphical Algorithm Representation of Direct Calculation of Likelihood of Observations and Hidden States (not hard!)

[Diagram: the state-emission HMM unrolled over three time steps, with the labeled path S1 → S2 → S1 emitting K3, K2, K1. Callout: note that “good” is the name of the dog, so it is a Noun!]

The likelihood of a particular “labeled” sequence of observations (e.g., p(“Dog”/NOUN, “is”/VERB, “Good”/NOUN | {aij},{bkm})) may be computed with the “direct calculation” method, using the following simple graphical algorithm.

Specifically, p(K3/S1, K2/S2, K1/S1 | {aij},{bkm}) = π1·b13·a12·b22·a21·b11
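Here is a brief illustrative sketch of that same direct calculation in Python, using the example parameters from the notation slide; the helper name labeled_likelihood is made up for illustration.

```python
# Direct calculation of a labeled-sequence likelihood:
# p(K3/S1, K2/S2, K1/S1 | {aij},{bkm}) = pi1 * b13 * a12 * b22 * a21 * b11
import numpy as np

pi = np.array([1.0, 0.0])
A  = np.array([[0.7, 0.3], [0.5, 0.5]])
B  = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])

def labeled_likelihood(states, outputs):
    """p(o_1..o_T, x_1..x_T) for 0-based state and output index sequences."""
    p = pi[states[0]]
    for t, (x, o) in enumerate(zip(states, outputs)):
        p *= B[x, o]                      # emit o_t from state x_t
        if t + 1 < len(states):
            p *= A[x, states[t + 1]]      # move to the next state
    return p

# "Dog"/NOUN, "is"/VERB, "Good"/NOUN  ->  states S1, S2, S1; outputs K3, K2, K1
print(labeled_likelihood([0, 1, 0], [2, 1, 0]))   # 1 * 0.3 * 0.3 * 0.7 * 0.5 * 0.6 ≈ 0.0189
```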

Extension to the case where the likelihood of the observations given the parameters is needed (e.g., p(“Dog”, “is”, “good” | {aij},{bij}))

KILLER EQUATION!!!!! Sum the labeled-sequence likelihood over every possible labeling:
p(o1,…,oT | {aij},{bkm}) = Σ over all hidden state sequences X1,…,XT of p(o1/X1, …, oT/XT | {aij},{bkm})
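To show why this equation is a “killer”, here is a brute-force sketch (not from the slides; the function name is hypothetical) that literally sums over all N^T hidden-state sequences, which is exactly the exponential cost the next slide warns about.

```python
# Brute-force likelihood: sum the labeled likelihood over every hidden-state sequence.
from itertools import product
import numpy as np

pi = np.array([1.0, 0.0])
A  = np.array([[0.7, 0.3], [0.5, 0.5]])
B  = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])

def likelihood_brute_force(outputs, N=2):
    total = 0.0
    for states in product(range(N), repeat=len(outputs)):   # all N**T labelings
        p = pi[states[0]]
        for t, (x, o) in enumerate(zip(states, outputs)):
            p *= B[x, o]
            if t + 1 < len(states):
                p *= A[x, states[t + 1]]
        total += p
    return total

print(likelihood_brute_force([2, 1, 0]))   # p("Dog", "is", "Good") for the example model
```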

Efficiency of Calculations is Important (e.g., Model-Fit)
  • Assume 1 multiplication per microsecond.
  • Assume an N=1000 word vocabulary and a T=7 word sentence.
  • (2T+1)·N^(T+1) multiplications by “direct calculation”: (2·7+1)·1000^(7+1) ≈ 1.5×10^25 multiplications, which is about 475,000 million years of computer time!!!
  • 2N²T multiplications using the “forward method”: 2·1000²·7 = 1.4×10^7 multiplications, which is about 14 seconds of computer time!!! (A quick check of these numbers follows below.)
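A back-of-the-envelope check of those two figures (an illustrative sketch, not from the slides):

```python
# Rough check of the slide's timing estimates: one multiplication per microsecond, N = 1000, T = 7.
N, T = 1000, 7
direct  = (2 * T + 1) * N ** (T + 1)              # multiplications for direct calculation
forward = 2 * N ** 2 * T                          # multiplications for the forward method

years = direct / 1e6 / (3600 * 24 * 365)
print(f"direct:  ~{years:.1e} years")             # ~4.8e11 years, i.e. ~475,000 million years
print(f"forward: ~{forward / 1e6:.0f} seconds")   # ~14 seconds
```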
Forward, Backward, and Viterbi Calculations
  • Forward calculation methods are thus very useful.
  • Forward, Backward, and Viterbi Calculations will now be discussed.
Forward Calculations – Overview

[Diagram: the example HMM unrolled into a trellis over Times 2–4, with states S1 and S2 at each time, output symbols K1, K2, K3, and the transition and emission probabilities from the earlier slides.]
Forward Calculations – Time 2 (1 word example)

[Diagram: the first trellis step, computing α1(2) and α2(2) from π1, π2 and the emission of K3.]

NOTE: α1(2) + α2(2) is the likelihood of the observation/word “K3” in this “1 word example”.

Forward Calculations – Time 3 (2 word example)

[Diagram: the second trellis step, computing α1(3) and α2(3) from the Time 2 values, the transition probabilities, and the emission of the second word.]

Forward Calculations – Time 4 (3 word example)

[Diagram: the third trellis step, computing α1(4) and α2(4), whose sum is the likelihood of the full 3-word observation sequence.]
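The trellis figures themselves are not reproduced in this transcript, so here is a compact forward-recursion sketch for the example model (an illustration, not the slides' own code). Note that the 0-based alpha[t] below holds the probability of the first t+1 outputs ending in a given state, so its indexing is shifted by one relative to the slides' αi(t); the row sums are the same running likelihoods either way.

```python
# Forward algorithm for the example HMM.
import numpy as np

pi = np.array([1.0, 0.0])
A  = np.array([[0.7, 0.3], [0.5, 0.5]])
B  = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])

def forward(outputs):
    T, N = len(outputs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, outputs[0]]                      # initialise with pi_i * b_i(o1)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, outputs[t]]  # propagate, then emit o_t
    return alpha

alpha = forward([2, 1, 0])     # observations K3, K2, K1
print(alpha.sum(axis=1))       # running likelihoods; the last entry is p(K3, K2, K1) ≈ 0.0315
```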

Backward Calculations – Overview

[Diagram: the same trellis as in the forward overview, now used to compute the backward probabilities β, working from Time 4 back to Time 1.]

Backward Calculations – Time 4

[Diagram: initialization of the backward pass at Time 4 for states S1 and S2.]

Backward Calculations – Time 3

[Diagram: one backward step, computing the Time 3 backward probabilities from the Time 4 values.]

Backward Calculations – Time 2

[Diagram: the next backward step, computing β1(2) and β2(2).]

NOTE: β1(2) + β2(2) is the likelihood of the observation/word sequence “K2, K1” in this “2 word example”.

Backward Calculations – Time 1

[Diagram: the final backward step, carrying the backward probabilities back to Time 1.]
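A matching backward-recursion sketch (again illustrative, since the per-time figures are omitted from this transcript). Here beta[t] holds the probability of the outputs after step t+1 given the state at step t+1, so, as with the forward sketch, the indexing is shifted by one relative to the slides' βi(t).

```python
# Backward algorithm for the example HMM.
import numpy as np

pi = np.array([1.0, 0.0])
A  = np.array([[0.7, 0.3], [0.5, 0.5]])
B  = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])

def backward(outputs):
    T, N = len(outputs), len(pi)
    beta = np.ones((T, N))                               # beta at the final step is 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, outputs[t + 1]] * beta[t + 1])
    return beta

beta = backward([2, 1, 0])                  # observations K3, K2, K1
print((pi * B[:, 2]) @ beta[0])             # same likelihood as the forward pass, ≈ 0.0315
```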

The Forward-Backward Method
  • Note the forward method computes: αi(t) = p(o1, …, ot−1, Xt = Si | {aij},{bkm})
  • Note the backward method computes (t>1): βi(t) = p(ot, …, oT | Xt = Si, {aij},{bkm})
  • We can use the forward-backward method, which computes the likelihood of the observations via the formula (for any choice of t = 1,…,T+1!): p(o1,…,oT | {aij},{bkm}) = Σi αi(t)·βi(t) (a numerical check follows below)
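A small numerical check of that identity on the 3-word example (an illustrative sketch, not from the slides; as noted earlier, the code's 0-based indexing differs slightly from the slides' αi(t) and βi(t), but the products still sum to the same sequence likelihood at every step):

```python
# Forward-backward identity: for every time step, sum_i alpha_i * beta_i gives
# the same value, the likelihood of the whole observation sequence.
import numpy as np

pi = np.array([1.0, 0.0])
A  = np.array([[0.7, 0.3], [0.5, 0.5]])
B  = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])
o  = [2, 1, 0]                                  # observations K3, K2, K1
T, N = len(o), len(pi)

alpha = np.zeros((T, N)); beta = np.ones((T, N))
alpha[0] = pi * B[:, o[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, o[t]]
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, o[t + 1]] * beta[t + 1])

print((alpha * beta).sum(axis=1))   # one identical value (≈ 0.0315) for every time step
```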
Solution to Problem 1
  • The “hard part” of the 1st Problem was to find the likelihood of the observations for an HMM.
  • We can now do this using either the forward, backward, or forward-backward method.
Solution to Problem 2: Viterbi Algorithm (Computing the “Most Probable” Labeling)
  • Consider the direct calculation of labeled observations.
  • Previously we summed these likelihoods together across all possible labelings to solve the first problem, which was to compute the likelihood of the observations given the parameters (the hard part of HMM Question 1!).
    • We solved this problem using the forward or backward method.
  • Now we want to compute all possible labelings and their respective likelihoods, and pick the labeling whose likelihood is largest!

EXAMPLE: Compute p(“Dog”/NOUN, “is”/VERB, “good”/ADJ | {aij},{bkm})

Efficiency of Calculations is Important (e.g., Most Likely Labeling Problem)
  • Just as in the forward-backward calculations, we can solve the problem of finding the most likely of the N^T possible labelings efficiently.
  • Instead of millions of years of computing time we can solve the problem in several seconds!!
Viterbi Algorithm – Overview (same setup as forward algorithm)

[Diagram: the same three-step trellis used for the forward calculations, now used to track the most probable path into each state.]

Forward Calculations – Time 2 (1 word example)

[Diagram: the Time 2 trellis step (π1 = 1, π2 = 0), repeated here as the first step of the Viterbi pass.]
Backtracking – Time 2 (1 word example)

[Diagram: backtracking from the best state at Time 2 in the 1-word case.]

Forward Calculations – (2 word example)

[Diagram: the Time 3 trellis step for the 2-word case.]
Backtracking – (2 word example)

[Diagram: backtracking through the best predecessors for the 2-word case.]

Forward Calculations – Time 4 (3 word example)

[Diagram: the Time 4 trellis step for the full 3-word case.]

Backtracking to Obtain Labeling for 3 word case

[Diagram: backtracking from the best state at Time 4 to recover the most probable labeling of the 3-word sequence.]
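Since the per-step figures are not reproduced here, the following is a standard Viterbi sketch for the 3-word example (illustrative code, not the slides' own): delta holds the best path probabilities and psi the backpointers used in the backtracking slides.

```python
# Viterbi algorithm with backtracking for the example HMM.
import numpy as np

pi = np.array([1.0, 0.0])
A  = np.array([[0.7, 0.3], [0.5, 0.5]])
B  = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])

def viterbi(outputs):
    T, N = len(outputs), len(pi)
    delta = np.zeros((T, N))                 # best path probability ending in each state
    psi   = np.zeros((T, N), dtype=int)      # backpointers
    delta[0] = pi * B[:, outputs[0]]
    for t in range(1, T):
        scores   = delta[t - 1][:, None] * A          # scores[i, j]: best path into i, then i -> j
        psi[t]   = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, outputs[t]]
    path = [int(delta[-1].argmax())]                  # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return list(reversed(path)), float(delta[-1].max())

path, prob = viterbi([2, 1, 0])      # observations K3, K2, K1
print(path, prob)                    # [0, 1, 0] -> S1, S2, S1, with probability ≈ 0.0189
```

The recovered labeling S1, S2, S1 and its probability match the direct calculation on the earlier slide (π1·b13·a12·b22·a21·b11 ≈ 0.0189).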

Third Fundamental Question: Parameter Estimation
  • Make an initial guess for {aij} and {bkm}.
  • Compute the probability that one hidden state follows another, given {aij}, {bkm}, and the sequence of observations (computed using the forward-backward algorithm).
  • Compute the probability of an observed symbol given a hidden state, given {aij}, {bkm}, and the sequence of observations (computed using the forward-backward algorithm).
  • Use these computed probabilities to make an improved guess for {aij} and {bkm} (see the sketch after this list).
  • Repeat this process until convergence.
  • It can be shown that this algorithm does in fact converge to the correct choice of {aij} and {bkm}, assuming that the initial guess was close enough.
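The slides do not spell out the update formulas, but the procedure described above is the standard Baum-Welch (EM) re-estimation. Below is a hedged one-iteration sketch for a single observation sequence; the function name and the starting guesses are made up for illustration.

```python
# One Baum-Welch re-estimation step: E-step with forward/backward passes,
# then an improved guess for pi, A, B.
import numpy as np

def baum_welch_step(pi, A, B, o):
    T, N = len(o), len(pi)
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = pi * B[:, o[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, o[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, o[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()
    gamma = alpha * beta / likelihood                     # p(state i at step t | observations)
    xi = np.array([alpha[t][:, None] * A * B[:, o[t + 1]] * beta[t + 1]
                   for t in range(T - 1)]) / likelihood   # p(state i at t, state j at t+1 | observations)
    new_pi = gamma[0]
    new_A  = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B  = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[np.array(o) == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, likelihood

# Start from a rough guess and iterate until the likelihood stops improving.
pi = np.array([0.6, 0.4])
A  = np.array([[0.6, 0.4], [0.5, 0.5]])
B  = np.array([[0.4, 0.3, 0.3], [0.2, 0.5, 0.3]])
for _ in range(10):
    pi, A, B, L = baum_welch_step(pi, A, B, [2, 1, 0])
print(L)    # the likelihood of the training sequence increases across iterations
```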