HMM (I)

LING 570

Fei Xia

Week 7: 11/5-11/7/07

HMM
  • Definition and properties of HMM
    • Two types of HMM
  • Three basic questions in HMM
Hidden Markov Models
  • There are n states s1, …, sn in an HMM, and the states are connected.
  • The output symbols are produced by the states or edges in HMM.
  • An observation O=(o1, …, oT) is a sequence of output symbols.
  • Given an observation, we want to recover the hidden state sequence.
  • An example: POS tagging
    • States are POS tags
    • Output symbols are words
    • Given an observation (i.e., a sentence), we want to discover the tag sequence.
Same observation, different state sequences

[Figure: the sentence "time flies like an arrow" shown with two different state (tag) sequences: time/N flies/N like/V an/DT arrow/N vs. time/N flies/V like/P an/DT arrow/N]
two types of hmms
Two types of HMMs
  • State-emission HMM (Moore machine):
    • The output symbol is produced by states:
      • By the from-state
      • By the to-state
  • Arc-emission HMM (Mealy machine):
    • The output symbol is produced by the edges, i.e., by the (from-state, to-state) pairs.
Formal definition of PFA

A PFA is a tuple (Q, Σ, δ, I, F, P):

  • Q: a finite set of N states
  • Σ: a finite set of input symbols
  • δ ⊆ Q × Σ × Q: the transition relation between states
  • I: Q → R+ (initial-state probabilities)
  • F: Q → R+ (final-state probabilities)
  • P: δ → R+ (transition probabilities)
Constraints on the functions:

  • ∑_q I(q) = 1
  • For each state q: F(q) + ∑_{a, q'} P(q, a, q') = 1

Probability of a string x1…xk (summing over all paths q0 q1 … qk through the PFA):

  P(x1…xk) = ∑ I(q0) · P(q0, x1, q1) · … · P(qk-1, xk, qk) · F(qk)

An example of PFA

[Figure: two states q0 and q1; an arc q0 → q1 labeled a:1.0 and a self-loop on q1 labeled b:0.8]

I(q0) = 1.0   I(q1) = 0.0

F(q0) = 0     F(q1) = 0.2

P(ab^n) = I(q0) · P(q0, ab^n, q1) · F(q1)

        = 1.0 · 1.0 · 0.8^n · 0.2
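
To make the example concrete, here is a small Python sketch (not from the original slides) that encodes this two-state PFA as dictionaries and sums over all accepting paths; the representation is an assumption chosen for illustration.

# Hypothetical encoding of the example PFA, for illustration only.
I = {"q0": 1.0, "q1": 0.0}                 # initial-state probabilities
F = {"q0": 0.0, "q1": 0.2}                 # final-state probabilities
P = {("q0", "a", "q1"): 1.0,               # transition probabilities P(from, symbol, to)
     ("q1", "b", "q1"): 0.8}

def string_prob(symbols):
    """P(x) = sum over state sequences of I(q0) * prod_i P(q_{i-1}, x_i, q_i) * F(q_k)."""
    # probs[q] = total probability of reaching state q after the prefix consumed so far
    probs = dict(I)
    for sym in symbols:
        new_probs = {}
        for (frm, a, to), p in P.items():
            if a == sym and probs.get(frm, 0.0) > 0.0:
                new_probs[to] = new_probs.get(to, 0.0) + probs[frm] * p
        probs = new_probs
    return sum(prob * F.get(q, 0.0) for q, prob in probs.items())

for n in range(4):
    print(n, string_prob("a" + "b" * n))   # should equal 0.2 * 0.8**n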

Definition of arc-emission HMM
  • An HMM is a tuple (S, Σ, π, A, B):
    • A set of states S={s1, s2, …, sN}.
    • A set of output symbols Σ={w1, …, wM}.
    • Initial state probabilities π={πi}.
    • Transition prob: A={aij}.
    • Emission prob: B={bijk}.
Constraints in an arc-emission HMM

  • ∑_i πi = 1
  • For each state si: ∑_j aij = 1
  • For each state pair (si, sj): ∑_k bijk = 1

For any integer n and any HMM: ∑ over all output sequences O1,n of P(O1,n) = 1

An example: HMM structure

[Figure: an arc-emission HMM with states s1, s2, …, sN; the arcs between states are labeled with output symbols w1, w2, w3, w4, w5, …]

Same kinds of parameters, but the emission probabilities depend on both states: P(wk | si, sj)

⇒ # of parameters: O(N²M + N²).

A path in an arc-emission HMM

[Figure: states X1, X2, …, Xn, Xn+1 in sequence, with output symbols o1, o2, …, on emitted on the arcs between them]

  • State sequence: X1,n+1
  • Output sequence: O1,n
PFA vs. Arc-emission HMM

A PFA is a tuple (Q, Σ, δ, I, F, P):

  • Q: a finite set of N states
  • Σ: a finite set of input symbols
  • δ ⊆ Q × Σ × Q: the transition relation between states
  • I: Q → R+ (initial-state probabilities)
  • F: Q → R+ (final-state probabilities)
  • P: δ → R+ (transition probabilities)

An HMM is a tuple (S, Σ, π, A, B):

  • A set of states S={s1, s2, …, sN}.
  • A set of output symbols Σ={w1, …, wM}.
  • Initial state probabilities π={πi}.
  • Transition prob: A={aij}.
  • Emission prob: B={bijk}.
Definition of state-emission HMM
  • An HMM is a tuple (S, Σ, π, A, B):
    • A set of states S={s1, s2, …, sN}.
    • A set of output symbols Σ={w1, …, wM}.
    • Initial state probabilities π={πi}.
    • Transition prob: A={aij}.
    • Emission prob: B={bjk}.
  • We use si and wk to refer to what is in an HMM structure.
  • We use Xi and Oi to refer to what is in a particular HMM path and its output.
Constraints in a state-emission HMM

  • ∑_i πi = 1
  • For each state si: ∑_j aij = 1
  • For each state sj: ∑_k bjk = 1

For any integer n and any HMM: ∑ over all output sequences O1,n of P(O1,n) = 1

An example: the HMM structure

[Figure: a state-emission HMM with states s1, s2, …, sN, each emitting output symbols such as w1, w2, w3, w5]

  • Two kinds of parameters:
    • Transition probability: P(sj | si)
    • Emission probability: P(wk | si)
  • ⇒ # of parameters: O(NM + N²)
Output symbols are generated by the from-states

[Figure: states X1, X2, …, Xn in sequence; each state Xi emits the output symbol oi]

  • State sequence: X1,n
  • Output sequence: O1,n
Output symbols are generated by the to-states

[Figure: states X1, X2, …, Xn+1 in sequence; each state Xi+1 emits the output symbol oi]

  • State sequence: X1,n+1
  • Output sequence: O1,n
A path in a state-emission HMM

Output symbols are produced by the from-states:

[Figure: states X1, X2, …, Xn, each emitting o1, o2, …, on]

Output symbols are produced by the to-states:

[Figure: states X1, X2, …, Xn+1, where X2, …, Xn+1 emit o1, o2, …, on]

Arc-emission vs. state-emission

[Figure: the arc-emission path (o1, …, on emitted on the arcs between X1, …, Xn+1) shown next to the state-emission path (o1, …, on emitted by the states)]
Properties of HMM
  • Markov assumption (Limited horizon): P(Xt+1 = sj | X1, …, Xt) = P(Xt+1 = sj | Xt)
  • Stationary distribution (Time invariance): the probabilities do not change over time: P(Xt+1 = sj | Xt = si) is the same for every t
  • The states are hidden because we know the structure of the machine (i.e., S and Σ), but we don’t know which state sequences generate a particular output.
Are the two types of HMMs equivalent?
  • For each state-emission HMM1, there is an arc-emission HMM2, such that for any sequence O, P(O|HMM1)=P(O|HMM2).
  • The reverse is also true.
  • How to prove that?
Applications of HMM
  • N-gram POS tagging
    • Bigram tagger: oi is a word, and si is a POS tag.
  • Other tagging problems:
    • Word segmentation
    • Chunking
    • NE tagging
    • Punctuation prediction
  • Other applications: ASR, ….
Three fundamental questions for HMMs
  • Training an HMM: given a set of observation sequences, learn its distribution, i.e., learn the transition and emission probabilities
  • HMM as a parser: Finding the best state sequence for a given observation
  • HMM as an LM: compute the probability of a given observation
Training an HMM: estimating the probabilities
  • Supervised learning:
    • The state sequences in the training data are known
    • ML estimation
  • Unsupervised learning:
    • The state sequences in the training data are unknown
    • forward-backward algorithm
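
In the supervised case, maximum-likelihood estimation amounts to counting and normalizing over the tagged training data. Below is a minimal Python sketch (an illustration only, not the course's reference implementation; the input format and function name are assumptions) for a state-emission HMM:

from collections import defaultdict

def mle_train(tagged_sents):
    """Estimate (pi, A, B) by relative frequency from tagged sentences.

    tagged_sents: list of sentences, each a list of (word, tag) pairs.
    """
    init_c = defaultdict(float)     # count of tag at sentence start
    trans_c = defaultdict(float)    # count of (prev_tag, tag)
    emit_c = defaultdict(float)     # count of (tag, word)
    tag_c = defaultdict(float)      # count of tag occurrences
    for sent in tagged_sents:
        prev = None
        for word, tag in sent:
            if prev is None:
                init_c[tag] += 1
            else:
                trans_c[(prev, tag)] += 1
            emit_c[(tag, word)] += 1
            tag_c[tag] += 1
            prev = tag
    num_sents = sum(init_c.values())
    pi = {t: c / num_sents for t, c in init_c.items()}
    from_c = defaultdict(float)     # how often each tag appears as a "from" tag
    for (t1, _), c in trans_c.items():
        from_c[t1] += c
    A = {(t1, t2): c / from_c[t1] for (t1, t2), c in trans_c.items()}
    B = {(t, w): c / tag_c[t] for (t, w), c in emit_c.items()}
    return pi, A, B

In the unsupervised case the state sequences are not observed, so the counts themselves have to be estimated; that is what the forward-backward (EM) algorithm does.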
HMM as a parser: Finding the best state sequence

[Figure: states X1, X2, …, XT, XT+1 with output symbols o1, o2, …, oT]

  • Given the observation O1,T = o1…oT, find the state sequence X1,T+1 = X1 … XT+1 that maximizes P(X1,T+1 | O1,T).

⇒ Viterbi algorithm

“time flies like an arrow”

\emission
N time 0.1
V time 0.1
N flies 0.1
V flies 0.2
V like 0.2
P like 0.1
DT an 0.3
N arrow 0.1

\init
BOS 1.0

\transition
BOS N 0.5
BOS DT 0.4
BOS V 0.1
DT N 1.0
N N 0.2
N V 0.7
N P 0.1
V DT 0.4
V N 0.4
V P 0.1
V V 0.1
P DT 0.6
P N 0.4
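
As a worked example (not on the original slides), the probability of one complete path under this model, assuming BOS itself emits nothing (the ε-emission case noted with the Viterbi algorithm below) and each word is emitted by the to-state of its transition:

P(X = BOS N N V DT N, O = "time flies like an arrow")
  = π(BOS) · a(BOS,N) b(N,time) · a(N,N) b(N,flies) · a(N,V) b(V,like) · a(V,DT) b(DT,an) · a(DT,N) b(N,arrow)
  = 1.0 · (0.5)(0.1) · (0.2)(0.1) · (0.7)(0.2) · (0.4)(0.3) · (1.0)(0.1)
  ≈ 1.68 × 10⁻⁶

The Viterbi algorithm finds the highest-probability path of this kind without enumerating all of them.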

Finding all the paths: to build the trellis

[Figure: trellis for "time flies like an arrow", with BOS as the start state and candidate states {N, V, P, DT} at each word position]

Finding all the paths (cont)

[Figure: the trellis continued, with candidate states {N, V, P, DT} at every word position]

Viterbi algorithm

The probability of the best path that produces O1,t-1 while ending up in state sj:

  δj(t) = max over X1,t-1 of P(X1,t-1, O1,t-1, Xt = sj)

Initialization: δj(1) = πj

Induction: δj(t+1) = max_i δi(t) · aij · bjk, where ot = wk

⇒ Modify it to allow ε-emission

Viterbi algorithm: calculating δj(t)

# N is the number of states in the HMM structure
# observ is the observation O, and leng is the length of observ.

Initialize viterbi[0..leng][0..N-1] to 0

for each state j
    viterbi[0][j] = π[j]
    back-pointer[0][j] = -1        # dummy

for (t = 0; t < leng; t++)
    for (j = 0; j < N; j++)
        k = observ[t]              # the symbol at time t
        viterbi[t+1][j] = max_i viterbi[t][i] · aij · bjk
        back-pointer[t+1][j] = arg max_i viterbi[t][i] · aij · bjk

Viterbi algorithm: retrieving the best path

# find the best path
best_final_state = arg max_j viterbi[leng][j]

# start with the last state in the sequence
j = best_final_state
push(arr, j)

for (t = leng; t > 0; t--)
    i = back-pointer[t][j]
    push(arr, i)
    j = i

return reverse(arr)
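
The following runnable Python sketch implements the same two steps (filling the trellis, then following the back-pointers). It is only an illustration of the pseudocode above, not the required Hw8 interface; the function name and the dictionary-based parameters are assumptions.

def viterbi_decode(observ, states, pi, A, B):
    """Most probable state sequence X_1 .. X_{T+1} for the observation o_1 .. o_T.

    pi[s]: initial probability of state s
    A[(s_i, s_j)]: transition probability a_ij
    B[(s_j, o)]: emission probability b_jk (state-emission HMM)
    """
    leng = len(observ)
    # viterbi[t][s]: probability of the best path ending in state s after t symbols
    viterbi = [dict.fromkeys(states, 0.0) for _ in range(leng + 1)]
    backptr = [dict.fromkeys(states, None) for _ in range(leng + 1)]
    for s in states:
        viterbi[0][s] = pi.get(s, 0.0)
    for t in range(leng):
        o = observ[t]
        for j in states:
            best_prob, best_i = -1.0, None
            for i in states:
                p = viterbi[t][i] * A.get((i, j), 0.0) * B.get((j, o), 0.0)
                if p > best_prob:
                    best_prob, best_i = p, i
            viterbi[t + 1][j] = best_prob
            backptr[t + 1][j] = best_i
    # retrieve the best path by following back-pointers from the best final state
    j = max(states, key=lambda s: viterbi[leng][s])
    path = [j]
    for t in range(leng, 0, -1):
        j = backptr[t][j]
        path.append(j)
    return list(reversed(path))

With the toy "time flies like an arrow" model above, states would be ["BOS", "N", "V", "P", "DT"] and the returned path starts at BOS followed by one tag per word.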

Hw7 and Hw8
  • Hw7: write an HMM “class”:
    • Read HMM input file
    • Output HMM
  • Hw8: implement the algorithms for two HMM tasks:
    • HMM as parser: Viterbi algorithm
    • HMM as LM: the prob of an observation
Implementation issue: storing HMM

Approach #1:

  • πi: pi {state_str}
  • aij: a {from_state_str} {to_state_str}
  • bjk: b {state_str} {symbol}

Approach #2:

  • state2idx{state_str} = state_idx
  • symbol2idx{symbol_str} = symbol_idx
  • πi: pi [state_idx] = prob
  • aij: a [from_state_idx] [to_state_idx] = prob
  • bjk: b [state_idx] [symbol_idx] = prob
  • idx2state[state_idx] = state_str
  • idx2symbol[symbol_idx] = symbol_str
Storing HMM: sparse matrix

Full 2-D arrays:
  • aij: a[i][j] = prob
  • bjk: b[j][k] = prob

Sparse storage (keep only the non-zero entries of each row):
  • aij: a[i] = “j1 p1 j2 p2 …”   (indexed by from-state)
  • aij: a[j] = “i1 p1 i2 p2 …”   (indexed by to-state)
  • bjk: b[j] = “k1 p1 k2 p2 …”   (indexed by state)
  • bjk: b[k] = “j1 p1 j2 p2 …”   (indexed by symbol)
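
A small Python sketch of the sparse idea (an illustration only; dicts keyed by index are used here instead of the packed strings shown above):

from collections import defaultdict

# Only non-zero entries are stored.
# a[i] maps a from-state index i to {to-state index j: prob}
# b[j] maps a state index j to {symbol index k: prob}
a = defaultdict(dict)
b = defaultdict(dict)

def transition_prob(i, j):
    return a[i].get(j, 0.0)      # a missing entry means probability 0

def emission_prob(j, k):
    return b[j].get(k, 0.0)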
Other implementation issues
  • Indices start from 0 in programming, but often start from 1 in algorithm descriptions.
  • In practice, the sum of log probabilities is used instead of the product of probabilities (to avoid underflow).
  • Check the constraints and print out a warning if they are not met.
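
For example, the Viterbi (or forward) induction step can be carried out in log space; this small sketch (an assumed helper, not part of the slides) replaces the product with a sum of logs:

import math

NEG_INF = float("-inf")

def log_score(prev_logprob, trans_prob, emit_prob):
    """log(prev * a_ij * b_jk) computed as a sum of logs, guarding against zero probabilities."""
    if prev_logprob == NEG_INF or trans_prob == 0.0 or emit_prob == 0.0:
        return NEG_INF
    return prev_logprob + math.log(trans_prob) + math.log(emit_prob)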
HMM as an LM: computing P(o1, …, oT)

1st try:

- enumerate all possible paths

- add the probabilities of all paths

Forward probabilities
  • Forward probability: the probability of producing O1,t-1 while ending up in state si:

    αi(t) = P(O1,t-1, Xt = si)

Calculating forward probability

Initialization: αi(1) = πi

Induction: αj(t+1) = ∑_i αi(t) · aij · bjk, where ot = wk

⇒ P(O1,T) = ∑_i αi(T+1)
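
A minimal Python sketch of this computation (illustrative only; it assumes the same dictionary-based parameters as the Viterbi sketch above):

def forward_prob(observ, states, pi, A, B):
    """P(O_{1,T}) via forward probabilities alpha_i(t) = P(O_{1,t-1}, X_t = s_i)."""
    alpha = {s: pi.get(s, 0.0) for s in states}            # alpha_i(1) = pi_i
    for o in observ:
        alpha = {j: sum(alpha[i] * A.get((i, j), 0.0) * B.get((j, o), 0.0)
                        for i in states)
                 for j in states}
    return sum(alpha.values())                             # P(O_{1,T}) = sum_i alpha_i(T+1)

Unlike the first try above, this runs in O(N²T) time instead of enumerating all N^T state sequences.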

Summary
  • Definition: hidden states, output symbols
  • Properties: Markov assumption
  • Applications: POS-tagging, etc.
  • Three basic questions in HMM
    • Find the probability of an observation: forward probability
    • Find the best sequence: Viterbi algorithm
    • Estimate probability: MLE
  • Bigram POS tagger: decoding with Viterbi algorithm