
Finite-State and the Noisy Channel



Presentation Transcript


  1. Finite-State and the Noisy Channel 600.465 - Intro to NLP - J. Eisner

  2. Word Segmentation
  theprophetsaidtothecity
  • What does this say?
  • And what other words are substrings?
  • Could segment with parsing (how?), but slow.
  • Given L = a “lexicon” FSA that matches all English words.
  • How to apply it to this problem?
  • What if the lexicon is weighted? From unigrams to bigrams?
  • Smooth L to include unseen words?
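
A minimal sketch of the weighted-lexicon idea in Python, not the FSA construction itself: treat L as a map from words to unigram log-probabilities and segment by dynamic programming over all ways to cover the string. The function name, the lexicon entries, and the probabilities below are invented for illustration.

import math

# Toy weighted "lexicon": unigram log-probabilities for a few words.
# (Invented numbers; a real L would be an FSA built from a full dictionary,
# possibly smoothed to allow unseen words.)
LEXICON = {
    "the": -1.0, "prophet": -6.0, "said": -4.0, "to": -1.5, "city": -5.0,
    "prop": -8.0, "het": -10.0, "he": -3.0,
}

def segment(text):
    """Viterbi-style DP: best[i] = best log-prob of any segmentation of text[:i]."""
    n = len(text)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - 12), i):          # consider words up to 12 chars
            w = text[j:i]
            if w in LEXICON and best[j] + LEXICON[w] > best[i]:
                best[i] = best[j] + LEXICON[w]
                back[i] = j
    # Recover the best segmentation by following back-pointers.
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return list(reversed(words)), best[n]

print(segment("theprophetsaidtothecity"))
# -> (['the', 'prophet', 'said', 'to', 'the', 'city'], -18.5)

With an unweighted lexicon every covering segmentation would tie; the weights are what prefer the+prophet over the+prop+het here.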

  3. Spelling correction
  • Spelling correction also needs a lexicon L
  • But there is distortion …
  • Let T be a transducer that models common typos and other spelling errors
    • ance → ence (deliverance, ...)
    • e → ε (deliverance, ...)
    • ε → e // Cons _ Cons (athlete, ...)
    • rr → r (embarrass, occurrence, …)
    • ge → dge (privilege, …)
    • etc.
  • Now what can you do with L .o. T ?
  • Should T and L have probabilities?
  • Want T to include “all possible” errors …
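
The same idea as a tiny noisy-channel corrector rather than an explicit L .o. T composition: each rule below encodes one error pattern from the slide as a "correct → typo" rewrite, and a candidate correction x is scored by log p(x) + log p(y | x). The rule inventory, the function name corrections, and all probabilities are invented for illustration.

# Toy lexicon with unigram log-probabilities (invented numbers).
LEXICON = {"deliverance": -9.0, "athlete": -8.0, "embarrass": -8.5,
           "occurrence": -9.0, "privilege": -8.0}

# Channel rules "correct substring -> typo substring" with log p(error) (invented),
# each a stand-in for one arc family of the transducer T.
RULES = [
    ("ance", "ence", -2.0),   # deliverance -> deliverence
    ("rr",   "r",    -2.5),   # embarrass, occurrence -> embarass, occurence
    ("ge",   "dge",  -3.0),   # privilege -> priviledge
    ("hl",   "hel",  -3.0),   # athlete -> athelete (an e inserted between consonants)
]

def corrections(y):
    """Rank candidate corrections x of misspelling y by log p(x) + log p(y | x)."""
    scored = {}
    for good, bad, logp_err in RULES:
        if bad in y:
            x = y.replace(bad, good, 1)            # undo one application of the rule
            if x in LEXICON:
                score = LEXICON[x] + logp_err
                scored[x] = max(scored.get(x, float("-inf")), score)
    if y in LEXICON:                               # y itself may be correctly spelled
        scored[y] = LEXICON[y] - 0.1               # log p(no error), invented
    return sorted(scored.items(), key=lambda kv: -kv[1])

print(corrections("deliverence"))   # -> [('deliverance', -11.0)]
print(corrections("occurence"))     # -> [('occurrence', -11.5)]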

  4. Noisy Channel Model
  real language X
  → noisy channel X → Y
  → yucky language Y
  want to recover X from Y

  5. Noisy Channel Model
  real language X: correct spelling
  → noisy channel X → Y: typos
  → yucky language Y: misspelling
  want to recover X from Y

  6. Noisy Channel Model
  real language X: (lexicon space)*
  → noisy channel X → Y: delete spaces
  → yucky language Y: text w/o spaces
  want to recover X from Y

  7. Noisy Channel Model
  real language X: (lexicon space)*   [language model]
  → noisy channel X → Y: pronunciation   [acoustic model]
  → yucky language Y: speech
  want to recover X from Y

  8. Noisy Channel Model
  real language X: tree   [probabilistic CFG]
  → noisy channel X → Y: delete everything but terminals
  → yucky language Y: text
  want to recover X from Y

  9. Noisy Channel Model
  real language X: p(X)
  → noisy channel X → Y: p(Y | X)
  → yucky language Y
  p(X) * p(Y | X) = p(X,Y)
  want to recover x∈X from y∈Y
  choose x that maximizes p(x | y) or equivalently p(x,y)
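
Slide 9's decision rule in a couple of lines of Python: since p(x | y) ∝ p(x, y) = p(x) * p(y | x), pick the x that maximizes the product. The function name decode is an invention; the numbers plugged in are the ones from slides 10-12.

def decode(y, p_x, p_y_given_x):
    """Noisy-channel decoding: argmax over x of p(x) * p(y | x), proportional to p(x | y)."""
    return max(p_x, key=lambda x: p_x[x] * p_y_given_x[(x, y)])

p_x = {"a": 0.7, "b": 0.3}                                  # p(X), the language model
p_y_given_x = {("a", "C"): 0.1, ("a", "D"): 0.9,            # p(Y | X), the channel
               ("b", "C"): 0.8, ("b", "D"): 0.2}

print(decode("C", p_x, p_y_given_x))   # -> 'b'  (0.3*0.8 = 0.24 beats 0.7*0.1 = 0.07)
print(decode("D", p_x, p_y_given_x))   # -> 'a'  (0.7*0.9 = 0.63 beats 0.3*0.2 = 0.06)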

  10. Noisy Channel Model
  p(X):      a:a/0.7  b:b/0.3
  .o.
  p(Y | X):  a:C/0.1  b:C/0.8  a:D/0.9  b:D/0.2
  =
  p(X,Y):    a:C/0.07  b:C/0.24  a:D/0.63  b:D/0.06
  Note p(x,y) sums to 1. Suppose y=“C”; what is best “x”?

  11. Noisy Channel Model
  p(X):      a:a/0.7  b:b/0.3
  .o.
  p(Y | X):  a:C/0.1  b:C/0.8  a:D/0.9  b:D/0.2
  =
  p(X,Y):    a:C/0.07  b:C/0.24  a:D/0.63  b:D/0.06
  Suppose y=“C”; what is best “x”?

  12. Noisy Channel Model
  p(X):      a:a/0.7  b:b/0.3
  .o.
  p(Y | X):  a:C/0.1  b:C/0.8  a:D/0.9  b:D/0.2
  .o.
  (Y = y)?   C:C/1   (restrict just to paths compatible with output “C”)
  =
  p(X, y):   a:C/0.07  b:C/0.24, then take the best path
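
The machines on slides 10-12 each have a single state, so each is just a bag of weighted arcs, and composition reduces to matching one machine's output symbol against the next machine's input symbol and multiplying weights. A plain-Python sketch of that (the function name compose and the arc-list encoding are assumptions, not an FST library):

# Single-state weighted machines from the slides, as (input, output, weight) arcs.
p_X   = [("a", "a", 0.7), ("b", "b", 0.3)]                  # p(X)
p_YgX = [("a", "C", 0.1), ("a", "D", 0.9),                   # p(Y | X)
         ("b", "C", 0.8), ("b", "D", 0.2)]

def compose(T1, T2):
    """Match T1's output labels against T2's input labels; multiply weights."""
    return [(i1, o2, w1 * w2)
            for (i1, o1, w1) in T1
            for (i2, o2, w2) in T2
            if o1 == i2]

joint = compose(p_X, p_YgX)             # p(X,Y): arcs a:C/0.07, a:D/0.63, b:C/0.24, b:D/0.06
print(sum(w for _, _, w in joint))      # -> 1.0 up to float rounding: p(x,y) sums to 1

# Restrict to paths compatible with the observed output "C" (compose with C:C/1),
# then take the best path.
restricted = compose(joint, [("C", "C", 1.0)])   # p(X, y): arcs a:C/0.07, b:C/0.24
print(max(restricted, key=lambda arc: arc[2]))   # -> ('b', 'C', ~0.24): the best x is 'b'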

  13. Morpheme Segmentation
  • Let Lexicon be a machine that matches all Turkish words
  • Same problem as word segmentation
  • Just at a lower level: morpheme segmentation
  • Turkish word: uygarlaştıramadıklarımızdanmışsınızcasına = uygar+laş+tır+ma+dık+ları+mız+dan+mış+sınız+ca+sı+na = “(behaving) as if you are among those whom we could not cause to become civilized”
  • Some constraints on morpheme sequence: bigram probs
  • Generative model – concatenate then fix up joints
    • stop + -ing = stopping, fly + -s = flies, vowel harmony
    • Use a cascade of transducers to handle all the fixups
  • But this is just morphology!
  • Can use probabilities here too (but people often don’t)
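
A sketch of “concatenate then fix up joints” as an ordered cascade of rewrite rules, with a few toy English rules standing in for the transducer cascade; the rule set, its ordering, and the function name realize are illustrative assumptions, not a real morphological grammar (Turkish vowel harmony would be handled the same way, by further rules in the cascade).

import re

# A cascade of "fix up the joint" rewrites applied in order to the raw
# concatenation of morphemes (toy rules; over-general on purpose).
FIXUPS = [
    (r"y\+s\b", "ies"),                        # fly+s    -> flies
    (r"([bdgmnprt])\+ing\b", r"\1\1ing"),      # stop+ing -> stopping (consonant doubling)
    (r"e\+ing\b", "ing"),                      # make+ing -> making (e-deletion)
    (r"\+", ""),                               # finally, erase remaining morpheme boundaries
]

def realize(*morphemes):
    """Concatenate morphemes with '+' joints, then run the fix-up cascade."""
    form = "+".join(morphemes)
    for pattern, repl in FIXUPS:
        form = re.sub(pattern, repl, form)
    return form

print(realize("stop", "ing"))   # -> 'stopping'
print(realize("fly", "s"))      # -> 'flies'
print(realize("make", "ing"))   # -> 'making'
print(realize("walk", "ed"))    # -> 'walked'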

  14. Edit Distance Transducer
  • O(k) deletion arcs:      a:ε  b:ε
  • O(k²) substitution arcs: a:b  b:a
  • O(k) insertion arcs:     ε:a  ε:b
  • O(k) no-change arcs:     a:a  b:b
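
The arc inventory of that one-state transducer, enumerated explicitly for an alphabet of size k (here just {a, b}); the function name is an invention, and "ε" stands for the empty string on one side of an arc.

EPS = "ε"   # epsilon: the empty string on one side of an arc

def edit_transducer_arcs(alphabet):
    """Arcs of the one-state edit-distance transducer over `alphabet` (size k)."""
    arcs  = [(a, EPS) for a in alphabet]                            # k      deletion arcs
    arcs += [(a, b) for a in alphabet for b in alphabet if a != b]  # k(k-1) substitution arcs
    arcs += [(EPS, a) for a in alphabet]                            # k      insertion arcs
    arcs += [(a, a) for a in alphabet]                              # k      no-change arcs
    return arcs

print(edit_transducer_arcs("ab"))
# -> 8 arcs for k = 2: 2 deletions, 2 substitutions, 2 insertions, 2 identities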

  15. Stochastic Edit Distance Transducer
  • O(k) deletion arcs:      a:ε  b:ε
  • O(k²) substitution arcs: a:b  b:a
  • O(k) insertion arcs:     ε:a  ε:b
  • O(k) identity arcs:      a:a  b:b
  • Likely edits = high-probability arcs

  16. Stochastic Edit Distance Transducer
  • Best path (by Dijkstra’s algorithm)
  • clara .o. (stochastic edit distance transducer) .o. caca
  • [figure: the composed machine, a lattice whose arcs are the possible edits (copies, substitutions, insertions, deletions) aligning “clara” with “caca”]
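
The best path through clara .o. T .o. caca can be found with Dijkstra's algorithm on the composed machine; since that composition is an acyclic lattice indexed by (position in "clara", position in "caca"), a simple dynamic program gives the same answer. A sketch with invented edit log-probabilities (0.8 / 0.05 / 0.05) and an invented function name:

import math

# Toy arc log-probabilities for the stochastic edit-distance transducer
# (invented: identity edits likely, substitutions and insert/delete unlikely).
LOGP_IDENT, LOGP_SUBST, LOGP_INDEL = math.log(0.8), math.log(0.05), math.log(0.05)

def best_edit_logprob(x, y):
    """Best path through x .o. (edit transducer) .o. y, by DP over the
    (len(x)+1) x (len(y)+1) lattice; same answer Dijkstra would give."""
    m, n = len(x), len(y)
    best = [[-math.inf] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            if i < m:                     # delete x[i]
                best[i+1][j] = max(best[i+1][j], best[i][j] + LOGP_INDEL)
            if j < n:                     # insert y[j]
                best[i][j+1] = max(best[i][j+1], best[i][j] + LOGP_INDEL)
            if i < m and j < n:           # copy or substitute
                step = LOGP_IDENT if x[i] == y[j] else LOGP_SUBST
                best[i+1][j+1] = max(best[i+1][j+1], best[i][j] + step)
    return best[m][n]

print(best_edit_logprob("clara", "caca"))   # log-prob of the most likely edit sequence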

  17. Speech Recognition by FST Composition (Pereira & Riley 1996)
  p(word seq)               trigram language model
  .o.
  p(phone seq | word seq)   pronunciation model
  .o.
  p(acoustics | phone seq)  acoustic model
  .o.
  observed acoustics
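
Once everything is enumerable, the cascade on slide 17 reduces to one product, p(word seq) * p(phone seq | word seq) * p(observed acoustics | phone seq), maximized over word sequences. A toy sketch in which the word strings, phone strings, the acoustic scores, and the "A1" stand-in for the observed acoustics are all invented:

# Toy versions of the three models in the cascade (all values invented).
p_words  = {"the cat": 0.6, "the cad": 0.4}                    # trigram LM stand-in
p_phones = {("the cat", "dh ax k ae t"): 1.0,                  # pronunciation model
            ("the cad", "dh ax k ae d"): 1.0}
p_acoust = {("dh ax k ae t", "A1"): 0.3,                       # acoustic score of the
            ("dh ax k ae d", "A1"): 0.1}                       # observed acoustics "A1"

def recognize(observed):
    """argmax over word seqs of p(words) * p(phones | words) * p(observed | phones)."""
    best_score, best_words = 0.0, None
    for (words, phones), p_pron in p_phones.items():
        score = p_words[words] * p_pron * p_acoust.get((phones, observed), 0.0)
        if score > best_score:
            best_score, best_words = score, words
    return best_words, best_score

print(recognize("A1"))   # -> ('the cat', ~0.18): 0.6*1.0*0.3 beats 0.4*1.0*0.1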

  18. Speech Recognition by FST Composition (Pereira & Riley 1996)
  p(word seq)               trigram language model
  .o.
  p(phone seq | word seq)   pronunciation model, e.g., an arc CAT : k æ t
  .o.
  p(acoustics | phone seq)  acoustic model for each phone (æ, ...) in its phone context
  .o.
  observed acoustics
