
Finite-State Methods




Presentation Transcript


  1. Finite-State Methods 600.465 - Intro to NLP - J. Eisner

  2. Finite state acceptors (FSAs)
     • Regexps
     • Union, Kleene *, concat, intersect, complement, reversal
     • Determinization, minimization
     • Pumping, Myhill-Nerode
     [Figure: a small acceptor with arcs labeled c, a, e]

  3. Example: Unweighted acceptor (slide courtesy of L. Karttunen, modified)
     • Wordlist: clear, clever, ear, ever, fat, father
     • compile: Wordlist → Network
     • /usr/dict/words (25K words, 206K chars) → FSM with 17728 states, 37100 arcs, in 0.6 sec
     [Figure: network with arcs c l e a r, e v e, f a t h]
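The "compile" step on this slide can be illustrated with a minimal sketch. It assumes the simplest possible construction, a trie with no state merging, so it produces more states than the minimized network the slide reports. The names `compile_wordlist` and `accepts` are made up for this example.

```python
def compile_wordlist(words):
    """Build a trie-shaped acceptor (assumption: no minimization is done).

    transitions[state][char] gives the next state; state 0 is the start state.
    """
    transitions = [{}]
    finals = set()
    for word in words:
        state = 0
        for ch in word:
            if ch not in transitions[state]:
                transitions.append({})          # allocate a fresh state
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        finals.add(state)                       # the word ends here
    return transitions, finals


def accepts(transitions, finals, word):
    """Follow the arcs for each character; accept only if we end in a final state."""
    state = 0
    for ch in word:
        state = transitions[state].get(ch)
        if state is None:
            return False
    return state in finals


WORDS = ["clear", "clever", "ear", "ever", "fat", "father"]
transitions, finals = compile_wordlist(WORDS)
num_states = len(transitions)
num_arcs = sum(len(arcs) for arcs in transitions)
```

A real compiler would go on to merge equivalent states (for example the shared "er" endings), which is how a 25K-word list can fit in fewer states than it has words.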

  4. Example: Weighted acceptor (slide courtesy of L. Karttunen, modified)
     • Wordlist: clear 0, clever 1, ear 2, ever 3, fat 4, father 5
     • compile: Wordlist → Network
     • Network arcs: c/0 l/0 e/0 a/0 r/0; e/2 v/1 e/0; f/4 a/0 t/0 h/1
     • Computes a perfect hash!

  5. Example: Weighted acceptor (slide courtesy of L. Karttunen, modified)
     • Wordlist: clear 0, clever 1, ear 2, ever 3, fat 4, father 5
     • Weighted network arcs: c/0 l/0 e/0 a/0 r/0; e/2 v/1 e/0; f/4 a/0 t/0 h/1
     • Successor states partition the path set
     • Use offsets of successor states as arc weights
     • Q: Would this work for an arbitrary numbering of the words?
     • Compute number of paths from each state (Q: how?)
     [Figure: the unweighted network with each state annotated by its number of paths (values shown: 6, 2, 1)]
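The path-counting idea in these bullets can be sketched as follows. This is an illustration under stated assumptions, not the construction used for the slide's network: it works on an unminimized trie, computes offsets on the fly instead of storing them as arc weights, and assumes the words are numbered in sorted order. The function names are hypothetical.

```python
def build_trie(words):
    """Each node is a dict: {"final": bool, "arcs": {char: child}, "count": int}."""
    root = {"final": False, "arcs": {}, "count": 0}
    for word in words:
        node = root
        for ch in word:
            node = node["arcs"].setdefault(
                ch, {"final": False, "arcs": {}, "count": 0})
        node["final"] = True
    return root


def count_paths(node):
    """Number of accepting paths from this node: itself if final, plus its successors."""
    total = 1 if node["final"] else 0
    for child in node["arcs"].values():
        total += count_paths(child)
    node["count"] = total
    return total


def perfect_hash(root, word):
    """Sum the offsets along the word's path; None if the word is not accepted."""
    node, index = root, 0
    for ch in word:
        if ch not in node["arcs"]:
            return None
        if node["final"]:
            index += 1                      # the shorter word ending here comes first
        for label in sorted(node["arcs"]):
            if label == ch:
                break
            index += node["arcs"][label]["count"]   # skip all paths via earlier arcs
        node = node["arcs"][ch]
    return index if node["final"] else None


WORDS = ["clear", "clever", "ear", "ever", "fat", "father"]
root = build_trie(WORDS)
total_paths = count_paths(root)
hashes = [perfect_hash(root, w) for w in WORDS]
```

On the slide's wordlist this yields the numbering 0 through 5, and the offsets it adds (2 before "e" at the start state, 4 before "f", 1 before "v", 1 before "h") agree with the arc weights listed above. Because offsets are derived from arc order, the scheme as sketched only reproduces an alphabetical numbering.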

  6. Example: Unweighted transducer
     • the problem of morphology (“word shape”) - an area of linguistics
     [Figure: parse tree fragment with VP [head=vouloir,...] above V[head=vouloir, tense=Present, num=SG, person=P3] above the word veut]

  7. Example: Unweighted transducer (slide courtesy of L. Karttunen, modified)
     • Finite-state transducer maps canonical form plus inflection codes (vouloir +Pres +Sing +P3) to the inflected form (veut)
     • The relevant path: v o u l o i r +Pres +Sing +P3 on the upper side, v e u t on the lower side
     [Figure: parse tree fragment with VP [head=vouloir,...] above V[head=vouloir, tense=Present, num=SG, person=P3] above the word veut]

  8. Example: Unweighted transducer (slide courtesy of L. Karttunen)
     • Finite-state transducer maps canonical form plus inflection codes (vouloir +Pres +Sing +P3) to the inflected form (veut)
     • The relevant path: v o u l o i r +Pres +Sing +P3 on the upper side, v e u t on the lower side
     • Bidirectional: generation or analysis
     • Compact and fast
     • Xerox sells for about 20 languages including English, German, Dutch, French, Italian, Spanish, Portuguese, Finnish, Russian, Turkish, Japanese, ...
     • Research systems for many other languages, including Arabic, Malay

  9. Regular Relation (of strings)
     • Relation: like a function, but multiple outputs ok
     • Regular: finite-state
     • Transducer: automaton w/ outputs
     • b → {b}; a → {}; aaaaa → {ac, aca, acab, acabc}
     • Invertible?
     • Closed under composition?
     [Figure: transducer with arcs a:ε, b:b, a:a, a:c, ?:c, b:b, ?:b, ?:a, b:ε]

  10. Regular Relation (of strings)
     • Can weight the arcs: [symbol lost] vs. [symbol lost]
     • b → {b}; a → {}; aaaaa → {ac, aca, acab, acabc}
     • How to find best outputs?
        • For aaaaa?
        • For all inputs at once?
     [Figure: transducer with arcs a:ε, b:b, a:a, a:c, ?:c, b:b, ?:b, ?:a, b:ε]

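  11. Function from strings to ...
     • Unweighted acceptor (FSA): {false, true}. Example arcs: c, a, e
     • Unweighted transducer (FST): strings. Example arcs: c:z, a:x, e:y
     • Weighted acceptor (FSA): numbers. Example arcs: c/.7, a/.5, e/.5, with weight .3 also shown
     • Weighted transducer (FST): (string, num) pairs. Example arcs: c:z/.7, a:x/.5, e:y/.5, with weight .3 also shown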

  12. Sample functions
     • Unweighted acceptor (strings → {false, true}): Grammatical?
     • Unweighted transducer (strings → strings): Markup, Correction, Translation
     • Weighted acceptor (strings → numbers): How grammatical? Better, how likely?
     • Weighted transducer (strings → (string, num) pairs): Good markups, Good corrections, Good translations

  13. Functions
     [Figure: the strings ab?d, abcd, abcd in a chain, linked by arrows labeled f and g]

  14. Functions
     • Function composition: f ∘ g [first f, then g – intuitive notation, but opposite of the traditional math notation]
     [Figure: ab?d mapped to abcd]
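The "first f, then g" reading of composition can be made concrete with a short sketch. The two string functions here are hypothetical stand-ins, since the transcript does not say what f and g actually do beyond f turning ab?d into abcd.

```python
def compose(f, g):
    """Return the function 'first f, then g', the order the slide writes as f ∘ g.

    In traditional math notation the same function would be written g ∘ f.
    """
    def composed(x):
        return g(f(x))
    return composed


def fill_gap(s):
    """Hypothetical stand-in for f: replace the unknown character with 'c'."""
    return s.replace("?", "c")


def shout(s):
    """Hypothetical stand-in for g: upper-case the string."""
    return s.upper()


pipeline = compose(fill_gap, shout)
result = pipeline("ab?d")
```

Writing the stages left to right matches how a processing pipeline is usually described, which is presumably why the slide calls the notation intuitive.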

  15. From Functions to Relations
     [Figure: f relates ab?d to abcd, abed, abjd, ...; g relates these to abgd, abed, abd, ...; the arcs carry the weights 3, 4, 2, 2, 8, 6]

  16. From Functions to Relations
     • Relation composition: f ∘ g
     [Figure: ab?d related to abgd (weight 4), abed (weight 2), abd (weight 8), ...]

  17. From Functions to Relations
     • Pick best output
     • Often in NLP, all of the functions or relations involved can be described as finite-state machines, and manipulated using standard algorithms.
     [Figure: ab?d mapped to abed]
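Composing two weighted relations and then picking the best output can be sketched with plain dictionaries. Several things here are assumptions: weights are treated as costs that add along a path, the cheapest path wins, and the individual weights in `F` and `G` are invented for illustration (the per-arc weights in the figure could not be recovered from the transcript). A real system would do this on finite-state machines instead of explicit tables.

```python
def compose_relations(f, g):
    """Compose weighted relations, first f then g.

    Each relation maps input -> {output: cost}. Costs add along a path, and
    when several paths give the same output only the cheapest is kept.
    """
    result = {}
    for x, mids in f.items():
        for y, cost_f in mids.items():
            for z, cost_g in g.get(y, {}).items():
                total = cost_f + cost_g
                outputs = result.setdefault(x, {})
                if z not in outputs or total < outputs[z]:
                    outputs[z] = total
    return result


def best_output(relation, x):
    """Return the lowest-cost output for x, or None if x has no outputs."""
    outputs = relation.get(x)
    if not outputs:
        return None
    return min(outputs, key=outputs.get)


# Illustrative weights only (assumed, not read off the slide's figure).
F = {"ab?d": {"abcd": 3, "abed": 2, "abjd": 6}}
G = {
    "abcd": {"abgd": 1},
    "abed": {"abed": 0, "abd": 6},
    "abjd": {"abgd": 2},
}

FG = compose_relations(F, G)
best = best_output(FG, "ab?d")
```

With these made-up weights the composed relation sends ab?d to abgd at cost 4, abed at cost 2, and abd at cost 8, so the best output is abed.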

  18. Inverting Relations
     [Figure: f relates ab?d to abcd, abed, abjd, ...; g relates these to abgd, abed, abd, ...; the arcs carry the weights 3, 4, 2, 2, 8, 6]

  19. Inverting Relations
     [Figure: the same diagram over ab?d, abcd, abed, abjd, abgd, abd, ..., with the relations now labeled g⁻¹ and f⁻¹ and the weights 3, 4, 2, 2, 8, 6]

  20. Inverting Relations
     • (f ∘ g)⁻¹ = g⁻¹ ∘ f⁻¹
     [Figure: ab?d related to abgd (weight 4), abed (weight 2), abd (weight 8), ...]
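The inversion identity can be checked on a small example. This sketch assumes unweighted relations stored as sets of (input, output) pairs, uses the slide's "first, then" composition order, and reuses the slide's strings with pairings chosen only for illustration.

```python
def compose(f, g):
    """First f, then g: pair x with z whenever some y links them."""
    return {(x, z) for (x, y) in f for (y2, z) in g if y == y2}


def invert(relation):
    """Swap the two sides of every pair."""
    return {(y, x) for (x, y) in relation}


# Pairings are illustrative assumptions, not taken from the figure.
F = {("ab?d", "abcd"), ("ab?d", "abed"), ("ab?d", "abjd")}
G = {("abcd", "abgd"), ("abed", "abed"), ("abed", "abd"), ("abjd", "abgd")}

lhs = invert(compose(F, G))             # (f ∘ g)⁻¹
rhs = compose(invert(G), invert(F))     # g⁻¹ ∘ f⁻¹: first g⁻¹, then f⁻¹
```

Both sides come out as the same set of pairs, each mapping one of abgd, abed, abd back to ab?d. For a transducer, inversion amounts to swapping the input and output label on every arc.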

  21. Building a lexical transducer (slide courtesy of L. Karttunen, modified)
     • Regular Expression Lexicon (big | clear | clever | ear | fat | ...) → Compiler → Lexicon FSA
     • Regular Expressions for Rules → Compiler → Composed Rule FSTs
     • Lexicon FSA and Composed Rule FSTs → composition → Lexical Transducer (a single FST)
     • One path: b i g +Adj +Comp on the upper side, b i g g e r on the lower side
     [Figure: lexicon network with arcs c l e a r, e v e, f a t h]

  22. Building a lexical transducer (slide courtesy of L. Karttunen, modified)
     • Regular Expression Lexicon: big | clear | clever | ear | fat | ...
     • Actually, the lexicon must contain elements like big +Adj +Comp
     • So write it as a more complicated expression:
          (big | clear | clever | fat | ...) +Adj (ε | +Comp | +Sup)     ← adjectives
        | (ear | father | ...) +Noun (+Sing | +Pl)                       ← nouns
        | ...
     • Q: Why do we need a lexicon at all?
     [Figure: Lexicon FSA with arcs c l e a r, e v e, f a t h]
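To see which strings the more complicated lexicon expression describes, one can simply enumerate them. The sketch below covers only the stems named on the slide (the "..." alternatives are left out), treats the empty alternative as the empty string, and joins symbols with spaces the way the slide writes them. A compiler would build an automaton instead of a list.

```python
from itertools import product

ADJ_STEMS = ["big", "clear", "clever", "fat"]   # stems named on the slide
NOUN_STEMS = ["ear", "father"]


def expand(*slots):
    """Concatenate one choice per slot, skipping empty choices."""
    entries = []
    for choice in product(*slots):
        entries.append(" ".join(part for part in choice if part))
    return entries


def lexicon_entries():
    """All strings matched by the adjective and noun alternatives."""
    adjectives = expand(ADJ_STEMS, ["+Adj"], ["", "+Comp", "+Sup"])
    nouns = expand(NOUN_STEMS, ["+Noun"], ["+Sing", "+Pl"])
    return adjectives + nouns


ENTRIES = lexicon_entries()
```

Each adjective stem contributes three entries (plain, comparative, superlative) and each noun stem two, so the six stems here give 16 entries in total, including big +Adj +Comp.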

  23. Weighted French Transducer (slide courtesy of L. Karttunen, modified)
     • Weighted version of transducer: Assigns a weight to each string pair
     • “upper language”: être+IndP+SG+P1, suivre+IndP+SG+P1, suivre+IndP+SG+P2, suivre+Imp+SG+P2, payer+IndP+SG+P1
     • “lower language”: suis, paie, paye
     [Figure: arcs between upper and lower strings carrying the weights 4, 19, 20, 12, 50, 3]
