
HMM

- Definition and properties of HMM
- Two types of HMM
- Three basic questions in HMM

Hidden Markov Models

- There are N states s1, …, sN in an HMM, and the states are connected.
- The output symbols are produced by the states or by the edges of the HMM.
- An observation O = (o1, …, oT) is a sequence of output symbols.
- Given an observation, we want to recover the hidden state sequence.
- An example: POS tagging
- States are POS tags
- Output symbols are words
- Given an observation (i.e., a sentence), we want to recover the tag sequence.

Two types of HMMs

- State-emission HMM (Moore machine):
- The output symbol is produced by states:
- By the from-state
- By the to-state
- Arc-emission HMM (Mealy machine):
- The output symbol is produced by the edges; i.e., by the (from-state, to-state) pairs.

Formal definition of PFA

A PFA is a tuple (Q, Σ, δ, I, F, P):

- Q: a finite set of N states
- Σ: a finite set of input symbols
- I: Q → R+ (initial-state probabilities)
- F: Q → R+ (final-state probabilities)
- δ ⊆ Q × Σ × Q: the transition relation between states
- P: δ → R+ (transition probabilities)

Constraints on the functions:

$$\sum_{q \in Q} I(q) = 1 \qquad\qquad \forall q \in Q:\; F(q) + \sum_{a \in \Sigma,\, q' \in Q} P(q, a, q') = 1$$

Probability of a string x = x1 … xn (summing over all paths that accept x):

$$P(x) = \sum_{q_0, \ldots, q_n} I(q_0)\,\Big(\prod_{t=1}^{n} P(q_{t-1}, x_t, q_t)\Big)\, F(q_n)$$

An example of a PFA:

[Figure: a two-state PFA; arc q0 → q1 labeled a:1.0, self-loop on q1 labeled b:0.8]

- I(q0) = 1.0, I(q1) = 0.0
- F(q0) = 0, F(q1) = 0.2

$$P(ab^n) = I(q_0) \cdot P(q_0, ab^n, q_1) \cdot F(q_1) = 1.0 \times 1.0 \times 0.8^n \times 0.2$$
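To make the computation concrete, here is a minimal Python sketch of string probability in this example PFA; the dictionary-based encoding and the names are my own illustration, not from the slides.

```python
# The example PFA above, encoded with plain dicts (illustrative layout).
I = {"q0": 1.0, "q1": 0.0}                 # initial-state probabilities
F = {"q0": 0.0, "q1": 0.2}                 # final-state probabilities
P = {("q0", "a", "q1"): 1.0,               # transition probabilities
     ("q1", "b", "q1"): 0.8}

def string_prob(x):
    """P(x) = sum over paths of I(q0) * (product of arc probs) * F(q_final)."""
    alpha = dict(I)                        # prob of reaching each state so far
    for sym in x:
        new_alpha = {q: 0.0 for q in I}
        for (q_from, a, q_to), p in P.items():
            if a == sym:
                new_alpha[q_to] += alpha[q_from] * p
        alpha = new_alpha
    return sum(alpha[q] * F[q] for q in alpha)

print(string_prob("abb"))                  # 1.0 * 1.0 * 0.8**2 * 0.2 = 0.128
```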

Definition of arc-emission HMM

An HMM is a tuple (S, Σ, π, A, B):

- A set of states S = {s1, s2, …, sN}.
- A set of output symbols Σ = {w1, …, wM}.
- Initial state probabilities π = {πi}.
- Transition probabilities A = {aij}, where aij = P(Xt+1 = sj | Xt = si).
- Emission probabilities B = {bijk}, where bijk = P(Ot = wk | Xt = si, Xt+1 = sj).

Constraints in an arc-emission HMM:

$$\sum_{i=1}^{N} \pi_i = 1 \qquad \forall i:\; \sum_{j=1}^{N} a_{ij} = 1 \qquad \forall i,j:\; \sum_{k=1}^{M} b_{ijk} = 1$$

As a consequence, for any integer n and any HMM, the probabilities of all output sequences of length n sum to one:

$$\sum_{O_{1,n}} P(O_{1,n}) = 1$$

An example: HMM structure

[Figure: states s1, s2, …, sN connected by arcs; each arc emits an output symbol such as w1, w2, w3, w4, w5]

Same kinds of parameters, but the emission probabilities depend on both states: P(wk | si, sj).

Number of parameters: O(N²M + N²).

PFA vs. Arc-emission HMM

A PFA is a tuple (Q, Σ, δ, I, F, P):

- Q: a finite set of N states
- Σ: a finite set of input symbols
- I: Q → R+ (initial-state probabilities)
- F: Q → R+ (final-state probabilities)
- δ ⊆ Q × Σ × Q: the transition relation between states
- P: δ → R+ (transition probabilities)

An HMM is a tuple (S, Σ, π, A, B):

- A set of states S = {s1, s2, …, sN}.
- A set of output symbols Σ = {w1, …, wM}.
- Initial state probabilities π = {πi}.
- Transition probabilities A = {aij}.
- Emission probabilities B = {bijk}.

The two are very similar; the most visible difference is that a PFA has final-state probabilities F, while an HMM (as defined here) does not.

Definition of state-emission HMM

An HMM is a tuple (S, Σ, π, A, B):

- A set of states S = {s1, s2, …, sN}.
- A set of output symbols Σ = {w1, …, wM}.
- Initial state probabilities π = {πi}.
- Transition probabilities A = {aij}, where aij = P(Xt+1 = sj | Xt = si).
- Emission probabilities B = {bjk}, where bjk = P(wk | sj); the emission depends on a single state.
- We use si and wk to refer to what is in the HMM structure.
- We use Xi and Oi to refer to what is in a particular HMM path and its output.

Constraints in a state-emission HMM:

$$\sum_{i=1}^{N} \pi_i = 1 \qquad \forall i:\; \sum_{j=1}^{N} a_{ij} = 1 \qquad \forall j:\; \sum_{k=1}^{M} b_{jk} = 1$$

As before, for any integer n and any HMM: $\sum_{O_{1,n}} P(O_{1,n}) = 1$.

An example: the HMM structure

[Figure: states s1, s2, …, sN connected by arcs; each state emits output symbols such as w1, w2, w3, w5]

- Two kinds of parameters:
- Transition probability: P(sj | si)
- Emission probability: P(wk | si)
- Number of parameters: O(NM + N²)

A path in a state-emission HMM.

Output symbols are produced by the from-states:

[Figure: path X1 → X2 → … → Xn, where state Xt emits ot]

- State sequence: X1,n
- Output sequence: O1,n

Output symbols are produced by the to-states:

[Figure: path X1 → X2 → … → Xn+1, where state Xt+1 emits ot]

- State sequence: X1,n+1
- Output sequence: O1,n

Properties of HMM

- Markov assumption (limited horizon): $P(X_{t+1} = s_k \mid X_1, \ldots, X_t) = P(X_{t+1} = s_k \mid X_t)$
- Stationary distribution (time invariance): the probabilities do not change over time; $P(X_{t+1} = s_k \mid X_t = s_j)$ is the same for every t.
- The states are hidden because we know the structure of the machine (i.e., S and Σ), but we don’t know which state sequence generated a particular output.

Are the two types of HMMs equivalent?

- For each state-emission HMM1 there is an arc-emission HMM2 such that for any sequence O, P(O|HMM1) = P(O|HMM2).
- The reverse is also true.
- How can we prove that? (A sketch of one direction is given below.)
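One direction is straightforward; the construction below is my own sketch, not from the slides. Given a state-emission HMM with emission probabilities $b_{jk}$, build an arc-emission HMM with the same states, π, and A, whose arc emissions simply ignore the from-state:

$$b'_{ijk} = b_{jk} \quad \text{for all } i$$

Every path then assigns the same probability to every output sequence in both machines, so P(O) is unchanged. The reverse direction is the standard Mealy-to-Moore conversion, which requires enlarging the state space (e.g., one new state per (si, sj) pair).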

Applications of HMM

- N-gram POS tagging
- Bigram tagger: oi is a word, and si is a POS tag.
- Other tagging problems:
- Word segmentation
- Chunking
- NE tagging
- Punctuation prediction
- …
- Other applications: ASR, ….

Three fundamental questions for HMMs

- Training an HMM: given a set of observation sequences, learn its distribution, i.e., learn the transition and emission probabilities
- HMM as a parser: Finding the best state sequence for a given observation
- HMM as an LM: compute the probability of a given observation

Training an HMM: estimating the probabilities

- Supervised learning:
- The state sequences in the training data are known
- Maximum likelihood (ML) estimation (see the formulas below)
- Unsupervised learning:
- The state sequences in the training data are unknown
- The forward-backward (Baum-Welch) algorithm
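In the supervised case, the ML estimates are simply relative frequencies over the tagged training data; these are the standard formulas, written here for a state-emission HMM:

$$\hat{\pi}_i = \frac{C(X_1 = s_i)}{\sum_{i'} C(X_1 = s_{i'})} \qquad \hat{a}_{ij} = \frac{C(s_i \to s_j)}{\sum_{j'} C(s_i \to s_{j'})} \qquad \hat{b}_{jk} = \frac{C(s_j, w_k)}{\sum_{k'} C(s_j, w_{k'})}$$

where C(·) counts occurrences in the training data: state-to-state transitions for the â estimates, and (state, emitted word) pairs for the b̂ estimates.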

HMM as a parser: finding the best state sequence.

[Figure: path X1 → X2 → … → XT → XT+1 emitting o1, o2, …, oT]

Given the observation O1,T = o1…oT, find the state sequence X1,T+1 = X1…XT+1 that maximizes P(X1,T+1 | O1,T):

$$\hat{X}_{1,T+1} = \arg\max_{X_{1,T+1}} P(X_{1,T+1} \mid O_{1,T})$$

Viterbi algorithm

An example: tagging the sentence "time flies like an arrow" with the following HMM (shown in an input-file format):

```
\emission
N  time  0.1
V  time  0.1
N  flies 0.1
V  flies 0.2
V  like  0.2
P  like  0.1
DT an    0.3
N  arrow 0.1

\init
BOS 1.0

\transition
BOS N  0.5
BOS DT 0.4
BOS V  0.1
DT  N  1.0
N   N  0.2
N   V  0.7
N   P  0.1
V   DT 0.4
V   N  0.4
V   P  0.1
V   V  0.1
P   DT 0.6
P   N  0.4
```

Viterbi algorithm

The probability of the best path that produces O1,t-1 while ending up in state sj:

$$\delta_j(t) = \max_{X_{1,t-1}} P(X_{1,t-1}, O_{1,t-1}, X_t = s_j)$$

Initialization: $\delta_j(1) = \pi_j$

Induction: $\delta_j(t+1) = \max_{1 \le i \le N} \delta_i(t)\, a_{ij}\, b_{jk}$, where wk = ot; record the maximizing i as a back-pointer.

The algorithm can be modified to allow ε-emission.

Viterbi algorithm: calculating δj(t)

```
# N is the number of states in the HMM structure
# observ is the observation O, and leng is the length of observ

initialize viterbi[0..leng][0..N-1] to 0

for each state j:
    viterbi[0][j] = π[j]
    back-pointer[0][j] = -1              # dummy

for (t = 0; t < leng; t++):
    for (j = 0; j < N; j++):
        k = observ[t]                    # the symbol at time t
        viterbi[t+1][j] = max_i viterbi[t][i] * a[i][j] * b[j][k]
        back-pointer[t+1][j] = argmax_i viterbi[t][i] * a[i][j] * b[j][k]
```

Viterbi algorithm: retrieving the best path

```
# find the best final state
best_final_state = argmax_j viterbi[leng][j]

# start with the last state in the sequence and follow back-pointers
j = best_final_state
push(arr, j)

for (t = leng; t > 0; t--):
    i = back-pointer[t][j]
    push(arr, i)
    j = i

return reverse(arr)
```
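Putting the two slides together, here is a minimal runnable Python sketch; the list-of-lists layout and the variable names are my own illustration, not the homework's required interface.

```python
# Viterbi decoding, following the pseudocode above.
# pi[j], A[i][j], B[j][k] are assumed to be plain nested lists of probabilities,
# with states and symbols already mapped to integer indices (illustrative layout).

def viterbi(observ, pi, A, B):
    """Return the best state sequence (as state indices) for the observation."""
    N, leng = len(pi), len(observ)
    # vit[t][j]: prob of the best path producing observ[0..t-1] and ending in j
    vit = [[0.0] * N for _ in range(leng + 1)]
    bp = [[-1] * N for _ in range(leng + 1)]          # back-pointers
    for j in range(N):
        vit[0][j] = pi[j]
    for t in range(leng):
        k = observ[t]                                  # the symbol at time t
        for j in range(N):
            best_i = max(range(N),
                         key=lambda i: vit[t][i] * A[i][j] * B[j][k])
            vit[t + 1][j] = vit[t][best_i] * A[best_i][j] * B[j][k]
            bp[t + 1][j] = best_i
    # retrieve the best path from the best final state
    j = max(range(N), key=lambda s: vit[leng][s])
    path = [j]
    for t in range(leng, 0, -1):
        j = bp[t][j]
        path.append(j)
    return list(reversed(path))
```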

Hw7 and Hw8

- Hw7: write an HMM “class”:
- Read HMM input file
- Output HMM
- Hw8: implement the algorithms for two HMM tasks:
- HMM as parser: Viterbi algorithm
- HMM as LM: the prob of an observation

Implementation issue: storing an HMM

Approach #1 (keyed by strings):

- πi: pi{state_str}
- aij: a{from_state_str}{to_state_str}
- bjk: b{state_str}{symbol}

Approach #2 (keyed by integer indices):

- state2idx{state_str} = state_idx
- symbol2idx{symbol_str} = symbol_idx
- πi: pi[state_idx] = prob
- aij: a[from_state_idx][to_state_idx] = prob
- bjk: b[state_idx][symbol_idx] = prob
- idx2state[state_idx] = state_str
- idx2symbol[symbol_idx] = symbol_str
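A small sketch of Approach #2 in Python; the concrete states, symbols, and probabilities are illustrative only.

```python
# Approach #2: map strings to integer indices once, then store probabilities
# in index-based tables (all names and numbers here are illustrative).
states  = ["BOS", "N", "V", "P", "DT"]
symbols = ["time", "flies", "like", "an", "arrow"]

state2idx  = {s: i for i, s in enumerate(states)}    # state_str  -> state_idx
symbol2idx = {w: k for k, w in enumerate(symbols)}   # symbol_str -> symbol_idx
idx2state, idx2symbol = states, symbols              # the reverse maps are lists

N, M = len(states), len(symbols)
pi = [0.0] * N                        # pi[state_idx] = prob
a  = [[0.0] * N for _ in range(N)]    # a[from_state_idx][to_state_idx] = prob
b  = [[0.0] * M for _ in range(N)]    # b[state_idx][symbol_idx] = prob

pi[state2idx["BOS"]] = 1.0
a[state2idx["BOS"]][state2idx["N"]] = 0.5
b[state2idx["N"]][symbol2idx["time"]] = 0.1
```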

Storing an HMM: sparse matrices

- Dense storage: aij: a[i][j] = prob; bjk: b[j][k] = prob
- Sparse storage keeps only the non-zero entries, e.g. as packed strings:
- aij by row: a[i] = “j1 p1 j2 p2 …”
- aij by column: a[j] = “i1 p1 i2 p2 …”
- bjk by row: b[j] = “k1 p1 k2 p2 …”
- bjk by column: b[k] = “j1 p1 j2 p2 …”
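In Python, a natural equivalent of these packed strings is a dict of dicts holding only the non-zero entries; this sketch is my own illustration.

```python
# Sparse row storage: a_sparse[i] maps each reachable to-state j to a[i][j];
# absent keys mean probability 0 (the values here are illustrative).
a_sparse = {0: {1: 0.5, 4: 0.4, 2: 0.1}}      # e.g., the row for state 0

def trans_prob(a_sparse, i, j):
    """Look up a[i][j], defaulting to 0.0 for entries not stored."""
    return a_sparse.get(i, {}).get(j, 0.0)

print(trans_prob(a_sparse, 0, 1))              # 0.5
print(trans_prob(a_sparse, 3, 1))              # 0.0
```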

Other implementation issues

- Indices start from 0 in programming, but often start from 1 in algorithm descriptions.
- In practice, the sum of log probabilities is used in place of the product of probabilities, to avoid numeric underflow (see the snippet below).
- Check the constraints, and print out a warning if they are not met.
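A tiny illustration of the log-space trick (standard practice; the numbers are made up):

```python
import math

# Per-step probabilities along a path (made-up values).
probs = [1e-5, 2e-7, 3e-6]

# Multiplying many small probabilities underflows toward 0.0 for long sequences;
# summing their logs stays well within floating-point range.
logprob = sum(math.log10(p) for p in probs)
print(logprob)            # log10 of the path probability, about -17.22
```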

HMM as an LM: computing P(o1, …, oT)

1st try:

- Enumerate all possible state sequences
- Add up the probabilities of all the paths:

$$P(O_{1,T}) = \sum_{X_{1,T+1}} P(O_{1,T}, X_{1,T+1})$$

- This is too expensive: the number of state sequences grows as N^(T+1).

Forward probabilities

- Forward probability: the probability of producing O1,t-1 while ending up in state si:

$$\alpha_i(t) = P(O_{1,t-1}, X_t = s_i)$$
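The forward recurrence mirrors Viterbi with max replaced by sum, which brings the cost down to O(N²T); below is a minimal sketch under the same assumed layout as the Viterbi code above.

```python
# Forward algorithm: compute P(O) without enumerating all state sequences.
def forward(observ, pi, A, B):
    N, leng = len(pi), len(observ)
    alpha = list(pi)                    # alpha_i(1) = pi_i
    for t in range(leng):
        k = observ[t]                   # the symbol at time t
        alpha = [sum(alpha[i] * A[i][j] * B[j][k] for i in range(N))
                 for j in range(N)]     # alpha_j(t+1) = sum_i alpha_i(t) a_ij b_jk
    return sum(alpha)                   # P(O) = sum over all final states
```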

Summary

- Definition: hidden states, output symbols
- Properties: Markov assumption
- Applications: POS-tagging, etc.
- Three basic questions in HMM
- Find the probability of an observation: forward probability
- Find the best sequence: Viterbi algorithm
- Estimate probability: MLE
- Bigram POS tagger: decoding with Viterbi algorithm
