
HMM Algorithms



Presentation Transcript


  1. HMM Algorithms Linguistics 570, Lecture #5

  2. HW #1

  3. Where we left off • HMMs • A set of N states Q = q_1, q_2, …, q_N • A sequence of T observations O = o_1, o_2, …, o_T • Transition probability matrix A = [a_{ij}], where a_{ij} = P(q_{t+1} = j | q_t = i) • Emission probabilities B = [b_i(o_t)], where b_i(o_t) = P(o_t | q_t = i) • An initial probability distribution π, where π_i = P(q_1 = i) • A set of final/accepting states Q_F ⊆ Q
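To make these components concrete, here is a minimal sketch of the parameters as plain Python data; the state names, vocabulary, and probability values are illustrative assumptions, not values from the lecture:

# Hypothetical two-state HMM; all names and numbers are made up for illustration.
states = ["N", "V"]
pi = {"N": 0.8, "V": 0.2}                 # initial distribution (sums to 1)
A = {"N": {"N": 0.3, "V": 0.7},           # A[i][j] = P(next state j | state i)
     "V": {"N": 0.6, "V": 0.4}}
B = {"N": {"time": 0.1, "flies": 0.05},   # B[i][o] = P(emit o | state i)
     "V": {"time": 0.01, "flies": 0.1}}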

  4. Answer 3 questions [If we already know the parameters:] 1. How likely is this data given our model? 2. Which states most likely generated that data? [If we don’t know the parameters:] 3. Which parameters make this data most likely?

  5. Answer 3 questions [If we already know the parameters:] 1. How likely is this data given our model? 2. Which states most likely generated that data? [If we don’t know the parameters:] 3. Which parameters make this data most likely? Dynamic Programming is crucial to all of these. DP = find the optimal answer (max / arg max)

  6. An aside: Why “dynamic programming”? Here “programming” ≠ “computer programming”. Think television programming: • put programs in a schedule • maximize viewership / ad revenue. This kind of programming is optimization/solving, often max / arg max: • Linear programming: optimize a linear function • Quadratic programming: optimize a quadratic function • Dynamic programming: optimize by divide and conquer

  7. Quick Math Review Take this function f(x) • What is max_x f(x)? • What is argmax_x f(x)? • How do I turn a max into a min? (Negate it: max_x f(x) = -min_x -f(x))
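A quick numeric check of the max-to-min trick (a sketch; the values are an arbitrary example, not from the slides):

# max of f equals the negated min of -f
xs = [1.0, 3.5, 2.0]
assert max(xs) == -min(-x for x in xs)   # both sides are 3.5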

  8. DP: Fibonacci numbers F(0) = F(1) = 1; F(n) = F(n-1) + F(n-2)

  9. Fibonacci in Python

#!/usr/bin/python
def fib(n):
    if n == 0:
        return 1
    if n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

print(fib(40))

  10. Runtime

cquirk@patas:~$ time ./fib.py
165580141

real    1m20.136s
user    1m20.076s
sys     0m0.038s

  11. Dynamic programming

  12. Dynamic programming solution

#!/usr/bin/python
def fib(n):
    x = []
    x.append(1)
    x.append(1)
    for i in range(2, n + 1):
        x.append(x[i-2] + x[i-1])
    return x[n]

print(fib(40))

cquirk@patas:~$ time ./fib_dp.py
165580141

real    0m0.020s
user    0m0.013s
sys     0m0.007s

  13. Memoized solution

#!/usr/bin/python
x = [1, 1]
def fib(n):
    if n < len(x):
        return x[n]
    f = fib(n-1) + fib(n-2)
    x.append(f)
    return f

print(fib(40))

cquirk@patas:~$ time ./fib_memo.py
165580141

real    0m0.020s
user    0m0.013s
sys     0m0.007s
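In modern Python the same memoization is usually written with functools.lru_cache; this is a sketch of the idiomatic equivalent, not part of the original slides:

#!/usr/bin/python3
from functools import lru_cache

@lru_cache(maxsize=None)          # caches every fib(n) after its first call
def fib(n):
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)

print(fib(40))   # 165580141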

  14. DP: String Edit Distance Classic problem: String Edit Distance (Levenshtein Distance) • Given two strings s and t • What is the minimal sequence of insertions, deletions, and substitutions that turns s into t? • Copy a character for free • Inserting a character drops my score by 1 • Deleting a character drops my score by 1

  15. DP: String Edit Distance

  16. Recursive solution

int dist(string s, string t) {
    if (len(s) == 0 || len(t) == 0)
        return len(s) + len(t)
    edits = infinity
    if (last(s) == last(t))
        edits = dist(prefix(s), prefix(t))          // COPY
    edits = min(edits, 1 + dist(prefix(s), t))      // DEL
    edits = min(edits, 1 + dist(s, prefix(t)))      // INS
    return edits
}

  17. Recursive solution (same code as slide 16): if one of the strings is empty, we have to use all insertions or all deletions (the base case on the first line).

  18. Recursive solution (same code): if the last character of each string is the same, copy (the COPY line).

  19. Recursive solution (same code): try deleting the last character of the source (the DEL line).

  20. Recursive solution (same code): try inserting the last character of the target (the INS line).

  21. Recursive solution (same code): whichever way resulted in the minimal number of edits – that’s our distance.

  22. Recursive solution (same code): this algorithm works, but it wastes a huge amount of computation!

  23. Recursive solution Say we want to compute dist(chi, kit).

  24. Recursive solution We need to compute distances of substrings: dist(chi, kit) recurses into dist(chi, ki), dist(ch, kit), and dist(ch, ki).

  25. Recursive solution So first, let’s take the middle arrow (recursive call): dist(ch, ki).

  26. Recursive solution It does a bunch of computation to figure out its distance. [call tree under dist(ch, ki): dist(c, ki), dist(ch, ), dist(c, k), dist(, ki), dist(c, ), dist(, k), dist(, )]

  27. Recursive solution Now take another arrow: dist(chi, ki), which recurses into dist(chi, k) and dist(ch, ki).

  28. Recursive solution It *also* asks for dist(ch, ki), but we just computed that!

  29. Recursive solution It *also* asks for dist(ch, ki), but we just computed that! Let’s save a bunch of work and just compute these things once! This is Dynamic Programming: fill out a data structure that holds the solutions to all the subproblems. This structure is called the CHART.

  30.–32. DP: String Edit Distance [chart-filling figures: the chart is filled in step by step]
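Here is a minimal chart-based sketch of that computation in Python; the bottom-up layout and the function name are assumptions for illustration, but the COPY/DEL/INS cases mirror the pseudocode above:

# chart[i][j] = minimum number of edits turning s[:i] into t[:j]
def dist(s, t):
    m, n = len(s), len(t)
    chart = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        chart[i][0] = i                              # delete all of s[:i]
    for j in range(n + 1):
        chart[0][j] = j                              # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            edits = float("inf")
            if s[i - 1] == t[j - 1]:
                edits = chart[i - 1][j - 1]          # COPY
            edits = min(edits, 1 + chart[i - 1][j])  # DEL
            edits = min(edits, 1 + chart[i][j - 1])  # INS
            chart[i][j] = edits
    return chart[m][n]

print(dist("chi", "kit"))   # 4 (with copy/del/ins only, no substitution)

Each cell is computed once from its three neighbors, so the runtime is O(m·n) instead of the exponential blowup of the plain recursion.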

  33. Back to our HMM • Let’s answer that first question: • Given an observation sequence O = o_1 o_2 … o_T • Model over obs & state seq: P(O, S) • What is P(O)?

  34. Back to our HMM • Let’s answer that first question: • Given an observation sequence O = o_1 o_2 … o_T • Model over obs & state seq: P(O, S) • What is P(O)?

  35. Back to our HMM • Let’s answer that first question: • Given an observation sequence O = o_1 o_2 … o_T • Model over obs & state seq: P(O, S) • What is P(O)? • Remember from last week: P(O) = Σ_S P(O, S) • We could enumerate all state sequences, but how many are there? (N tags, T words → N^T sequences)

  36. Back to our HMM • Let’s answer that first question: • Given an observation sequence O = o_1 o_2 … o_T • Model over obs & state seq: P(O, S) • What is P(O)? • Remember from last week: P(O) = Σ_S P(O, S) • We could enumerate all state sequences, but how many are there? (N tags, T words → N^T sequences) • With 40 POS tags, a 10-word sentence has 40^10 = 10,485,760,000,000,000 possible state sequences!
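A one-line sanity check of that count:

# one of N = 40 tags at each of T = 10 positions
print(40 ** 10)   # 10485760000000000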

  37. Forward Algorithm • Dynamic programming solution: • Tabulates intermediate results as it computes the probability of the sequence • Folds summation over paths into a forward trellis

  38. Forward Algorithm • Dynamic programming solution: • Tabulates intermediate results as it computes the probability of the sequence • Folds summation over paths into a forward trellis • Cell α_t(j) = probability of being in state j after seeing the first t observations • Computed by summing over all paths to this cell

  39. Forward Algorithm • Dynamic programming solution: • Tabulates intermediate results as it computes the probability of the sequence • Folds summation over paths into a forward trellis • Cell α_t(j) = probability of being in state j after seeing the first t observations • Computed by summing over all paths to this cell: α_t(j) = Σ_i α_{t-1}(i) · a_{ij} · b_j(o_t) • Assume q_0 and q_F are distinguished, non-emitting start and final states
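A minimal sketch of the forward algorithm in Python, using the α recurrence above. The dictionary layout of pi, A, and B is an assumption for illustration, and for simplicity this version marginalizes the last state rather than using the non-emitting final state of the slides:

# Forward algorithm: computes P(O) in O(N^2 * T) instead of O(N^T).
# pi[i] = P(start in i); A[i][j] = P(i -> j); B[i][o] = P(emit o in state i).
def forward(obs, states, pi, A, B):
    # alpha[t][j] = P(o_1 .. o_t, state at time t = j)
    alpha = [{j: pi[j] * B[j].get(obs[0], 0.0) for j in states}]
    for t in range(1, len(obs)):
        alpha.append({j: sum(alpha[t - 1][i] * A[i][j] for i in states)
                         * B[j].get(obs[t], 0.0)
                      for j in states})
    return sum(alpha[-1][j] for j in states)   # P(O)

With the distinguished final state of slide 39, the last line would instead sum α_T(i) · a_{i,qF} over states i.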

  40. time flies like an arrow

  41. time flies like an arrow

  42. 0.05 time flies like an arrow

  43. 0.05 0.01 time flies like an arrow

  44. 0.05 0.01 0 0 time flies like an arrow

  45. 0.05 0.01 0 0 time flies like an arrow

  46. 0.05 0.01 = 0 0 time flies like an arrow

  47. 0.05 0.01 = 0.05·0.2·0.1 + 0.01·0.4·0.1 = 0.001 + 0.0004 = 0.0014 0 0 time flies like an arrow

  48. 0.05 0.0014 0.01 = 0.05·0.2·0.1 + 0.01·0.4·0.1 = 0.001 + 0.0004 = 0.0014 0 0 time flies like an arrow

  49. 0.05 0.0014 0.01 = 0.05·0.7·0.1 + 0.01·0.1·0.1 = 0.0035 + 0.0001 = 0.0036 0 0 time flies like an arrow

  50. 0.05 0.0014 0.01 0.0036 = 0.05·0.7·0.1 + 0.01·0.1·0.1 = 0.0035 + 0.0001 = 0.0036 0 0 time flies like an arrow
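These two trellis cells can be checked directly; in this sketch the transition and emission numbers are read off the slide arithmetic, and which tags the two states correspond to is not specified in the transcript:

# Reproducing the two alpha_2 cells from slides 47-50.
a1 = [0.05, 0.01]                                       # alpha_1 values from the trellis
alpha2_cell1 = a1[0] * 0.2 * 0.1 + a1[1] * 0.4 * 0.1    # = 0.0014
alpha2_cell2 = a1[0] * 0.7 * 0.1 + a1[1] * 0.1 * 0.1    # = 0.0036
print(alpha2_cell1, alpha2_cell2)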
