
HMM Algorithms



Presentation Transcript


  1. HMM Algorithms Linguistics 570, Lecture #5

  2. HW #1

  3. Where we left off • HMMs • A set of N states Q = q_1, q_2, …, q_N • A sequence of T observations O = o_1, o_2, …, o_T • Transition probability matrix A = [a_{ij}], where a_{ij} = P(q_{t+1} = j | q_t = i) • Emission probabilities B = [b_i(o_t)], where b_i(o_t) = P(o_t | q_t = i) • An initial probability distribution π, where π_i = P(q_1 = i) • A set of final/accepting states Q_F ⊆ Q
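To make these components concrete, here is a minimal sketch of the parameters as plain Python data; the state names, vocabulary, and probability values are illustrative assumptions, not values from the lecture:

# Hypothetical two-state HMM; all names and numbers are made up for illustration.
states = ["N", "V"]
pi = {"N": 0.8, "V": 0.2}                 # initial distribution (sums to 1)
A = {"N": {"N": 0.3, "V": 0.7},           # A[i][j] = P(next state j | state i)
     "V": {"N": 0.6, "V": 0.4}}
B = {"N": {"time": 0.1, "flies": 0.05},   # B[i][o] = P(emit o | state i)
     "V": {"time": 0.01, "flies": 0.1}}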

  4. Answer 3 questions [If we already know the parameters:] 1. How likely is this data given our model? 2. Which states most likely generated that data? [If we don’t know the parameters:] 3. Which parameters make this data most likely?

  5. Answer 3 questions [If we already know the parameters:] 1. How likely is this data given our model? 2. Which states most likely generated that data? [If we don’t know the parameters:] 3. Which parameters make this data most likely? Dynamic Programming is crucial to all of these. DP = find the optimal answer (max / arg max)

  6. An aside: Why “dynamic programming”? Here “programming” ≠ “computer programming”. Think television programming: • put programs in a schedule • maximize viewership / ad revenue. This kind of programming is optimization/solving, often max / arg max: • Linear programming: optimize a linear function • Quadratic programming: optimize a quadratic function • Dynamic programming: optimize by divide and conquer

  7. Quick Math Review Take this function f(x) • What is max_x f(x)? • What is argmax_x f(x)? • How do I turn a max into a min? (Negate it: max_x f(x) = -min_x -f(x))
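A quick numeric check of the max-to-min trick (a sketch; the values are an arbitrary example, not from the slides):

# max of f equals the negated min of -f
xs = [1.0, 3.5, 2.0]
assert max(xs) == -min(-x for x in xs)   # both sides are 3.5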

  8. DP: Fibonacci numbers F(0) = F(1) = 1; F(n) = F(n-1) + F(n-2)

  9. Fibonacci in Python

#!/usr/bin/python
def fib(n):
    if n == 0:
        return 1
    if n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

print(fib(40))

  10. Runtime

cquirk@patas:~$ time ./fib.py
165580141

real    1m20.136s
user    1m20.076s
sys     0m0.038s

  11. Dynamic programming

  12. Dynamic programming solution

#!/usr/bin/python
def fib(n):
    x = []
    x.append(1)
    x.append(1)
    for i in range(2, n + 1):
        x.append(x[i-2] + x[i-1])
    return x[n]

print(fib(40))

cquirk@patas:~$ time ./fib_dp.py
165580141

real    0m0.020s
user    0m0.013s
sys     0m0.007s

  13. Memoized solution

#!/usr/bin/python
x = [1, 1]
def fib(n):
    if n < len(x):
        return x[n]
    f = fib(n-1) + fib(n-2)
    x.append(f)
    return f

print(fib(40))

cquirk@patas:~$ time ./fib_memo.py
165580141

real    0m0.020s
user    0m0.013s
sys     0m0.007s
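In modern Python the same memoization is usually written with functools.lru_cache; this is a sketch of the idiomatic equivalent, not part of the original slides:

#!/usr/bin/python3
from functools import lru_cache

@lru_cache(maxsize=None)          # caches every fib(n) after its first call
def fib(n):
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)

print(fib(40))   # 165580141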

  14. DP: String Edit Distance Classic problem: String Edit Distance (Levenshtein Distance) • Given two strings s and t • What is the minimal sequence of insertions, deletions, and substitutions that turns s into t? • Copy a character for free • Inserting a character drops my score by 1 • Deleting a character drops my score by 1

  15. DP: String Edit Distance

  16. Recursive solution

int dist(string s, string t) {
    if (len(s) == 0 || len(t) == 0)
        return len(s) + len(t)
    edits = infinity
    if (last(s) == last(t))
        edits = dist(prefix(s), prefix(t))          // COPY
    edits = min(edits, 1 + dist(prefix(s), t))      // DEL
    edits = min(edits, 1 + dist(s, prefix(t)))      // INS
    return edits
}

  17. Recursive solution (same code as slide 16): if one of the strings is empty, we have to use all insertions or all deletions (the base case on the first line).

  18. Recursive solution (same code): if the last character of each string is the same, copy (the COPY line).

  19. Recursive solution (same code): try deleting the last character of the source (the DEL line).

  20. Recursive solution (same code): try inserting the last character of the target (the INS line).

  21. Recursive solution (same code): whichever way resulted in the minimal number of edits – that’s our distance.

  22. Recursive solution (same code): this algorithm works, but it wastes a huge amount of computation!

  23. Recursive solution Say we want to compute dist(chi, kit).

  24. Recursive solution We need to compute distances of substrings: dist(chi, kit) recurses into dist(chi, ki), dist(ch, kit), and dist(ch, ki).

  25. Recursive solution So first, let’s take the middle arrow (recursive call): dist(ch, ki).

  26. Recursive solution It does a bunch of computation to figure out its distance. [call tree under dist(ch, ki): dist(c, ki), dist(ch, ), dist(c, k), dist(, ki), dist(c, ), dist(, k), dist(, )]

  27. Recursive solution Now take another arrow: dist(chi, ki), which recurses into dist(chi, k) and dist(ch, ki).

  28. Recursive solution It *also* asks for dist(ch, ki), but we just computed that!

  29. Recursive solution It *also* asks for dist(ch, ki), but we just computed that! Let’s save a bunch of work and just compute these things once! This is Dynamic Programming: fill out a data structure that holds the solutions to all the subproblems. This structure is called the CHART.

  30.–32. DP: String Edit Distance [chart-filling figures: the chart is filled in step by step]
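Here is a minimal chart-based sketch of that computation in Python; the bottom-up layout and the function name are assumptions for illustration, but the COPY/DEL/INS cases mirror the pseudocode above:

# chart[i][j] = minimum number of edits turning s[:i] into t[:j]
def dist(s, t):
    m, n = len(s), len(t)
    chart = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        chart[i][0] = i                              # delete all of s[:i]
    for j in range(n + 1):
        chart[0][j] = j                              # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            edits = float("inf")
            if s[i - 1] == t[j - 1]:
                edits = chart[i - 1][j - 1]          # COPY
            edits = min(edits, 1 + chart[i - 1][j])  # DEL
            edits = min(edits, 1 + chart[i][j - 1])  # INS
            chart[i][j] = edits
    return chart[m][n]

print(dist("chi", "kit"))   # 4 (with copy/del/ins only, no substitution)

Each cell is computed once from its three neighbors, so the runtime is O(m·n) instead of the exponential blowup of the plain recursion.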

  33. Back to our HMM • Let’s answer that first question: • Given an observation sequence O = o_1 o_2 … o_T • Model over obs & state seq: P(O, S) • What is P(O)?

  34. Back to our HMM • Let’s answer that first question: • Given an observation sequence O = o_1 o_2 … o_T • Model over obs & state seq: P(O, S) • What is P(O)?

  35. Back to our HMM • Let’s answer that first question: • Given an observation sequence O = o_1 o_2 … o_T • Model over obs & state seq: P(O, S) • What is P(O)? • Remember from last week: P(O) = Σ_S P(O, S) • We could enumerate all state sequences, but how many are there? (N tags, T words → N^T sequences)

  36. Back to our HMM • Let’s answer that first question: • Given an observation sequence O = o_1 o_2 … o_T • Model over obs & state seq: P(O, S) • What is P(O)? • Remember from last week: P(O) = Σ_S P(O, S) • We could enumerate all state sequences, but how many are there? (N tags, T words → N^T sequences) • With 40 POS tags, a 10-word sentence has 40^10 = 10,485,760,000,000,000 possible state sequences!
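A one-line sanity check of that count:

# one of N = 40 tags at each of T = 10 positions
print(40 ** 10)   # 10485760000000000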

  37. Forward Algorithm • Dynamic programming solution: • Tabulates intermediate results as it computes the probability of the sequence • Folds summation over paths into a forward trellis

  38. Forward Algorithm • Dynamic programming solution: • Tabulates intermediate results as it computes the probability of the sequence • Folds summation over paths into a forward trellis • Cell α_t(j) = probability of being in state j after seeing the first t observations • Computed by summing over all paths to this cell

  39. Forward Algorithm • Dynamic programming solution: • Tabulates intermediate results as it computes the probability of the sequence • Folds summation over paths into a forward trellis • Cell α_t(j) = probability of being in state j after seeing the first t observations • Computed by summing over all paths to this cell: α_t(j) = Σ_i α_{t-1}(i) · a_{ij} · b_j(o_t) • Assume q_0 and q_F are distinguished, non-emitting start and final states
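A minimal sketch of the forward algorithm in Python, using the α recurrence above. The dictionary layout of pi, A, and B is an assumption for illustration, and for simplicity this version marginalizes the last state rather than using the non-emitting final state of the slides:

# Forward algorithm: computes P(O) in O(N^2 * T) instead of O(N^T).
# pi[i] = P(start in i); A[i][j] = P(i -> j); B[i][o] = P(emit o in state i).
def forward(obs, states, pi, A, B):
    # alpha[t][j] = P(o_1 .. o_t, state at time t = j)
    alpha = [{j: pi[j] * B[j].get(obs[0], 0.0) for j in states}]
    for t in range(1, len(obs)):
        alpha.append({j: sum(alpha[t - 1][i] * A[i][j] for i in states)
                         * B[j].get(obs[t], 0.0)
                      for j in states})
    return sum(alpha[-1][j] for j in states)   # P(O)

With the distinguished final state of slide 39, the last line would instead sum α_T(i) · a_{i,qF} over states i.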

  40. time flies like an arrow

  41. time flies like an arrow

  42. 0.05 time flies like an arrow

  43. 0.05 0.01 time flies like an arrow

  44. 0.05 0.01 0 0 time flies like an arrow

  45. 0.05 0.01 0 0 time flies like an arrow

  46. 0.05 0.01 = 0 0 time flies like an arrow

  47. 0.05 0.01 = 0.05·0.2·0.1 + 0.01·0.4·0.1 = 0.001 + 0.0004 = 0.0014 0 0 time flies like an arrow

  48. 0.05 0.0014 0.01 = 0.05·0.2·0.1 + 0.01·0.4·0.1 = 0.001 + 0.0004 = 0.0014 0 0 time flies like an arrow

  49. 0.05 0.0014 0.01 = 0.05·0.7·0.1 + 0.01·0.1·0.1 = 0.0035 + 0.0001 = 0.0036 0 0 time flies like an arrow

  50. 0.05 0.0014 0.01 0.0036 = 0.05·0.7·0.1 + 0.01·0.1·0.1 = 0.0035 + 0.0001 = 0.0036 0 0 time flies like an arrow
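These two trellis cells can be checked directly; in this sketch the transition and emission numbers are read off the slide arithmetic, and which tags the two states correspond to is not specified in the transcript:

# Reproducing the two alpha_2 cells from slides 47-50.
a1 = [0.05, 0.01]                                       # alpha_1 values from the trellis
alpha2_cell1 = a1[0] * 0.2 * 0.1 + a1[1] * 0.4 * 0.1    # = 0.0014
alpha2_cell2 = a1[0] * 0.7 * 0.1 + a1[1] * 0.1 * 0.1    # = 0.0036
print(alpha2_cell1, alpha2_cell2)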
