
CSA3050: Natural Language Algorithms



  1. CSA3050: Natural Language Algorithms Finite State Devices

  2. Sources • Blackburn & Striegnitz Ch. 2 CSA3180 NLP

  3. Part I: Parsers and Transducers

  4. Parsers vs. Recognizers • Recognizers tell us whether a given input is accepted by some finite state automaton. • Often we would also like an explanation of why it was accepted. • Parsers give us that kind of explanation. • What form does it take?

  5. Finite State Parser • The output of a finite state parser is a sequence of nodes and arcs. If we gave the input [h,a,h,a,!] to a parser for our first laughing automaton, it should give us [1,h,2,a,3,h,2,a,3,!,4]. • The standard technique in Prolog for turning a recognizer into a parser is to add one or more extra arguments to keep track of the structure that was found.

  6. Base Case
     Recognizer:
       recognize1(Node,[]) :-
           final(Node).
     Parser:
       parse1(Node,[],[Node]) :-
           final(Node).

  7. Recursive Case
     Recognizer:
       recognize1(Node1,String) :-
           arc(Node1,Node2,Label),
           traverse1(Label,String,NewString),
           recognize1(Node2,NewString).
     Parser:
       parse1(Node1,String,[Node1,Label|Path]) :-
           arc(Node1,Node2,Label),
           traverse1(Label,String,NewString),
           parse1(Node2,NewString,Path).
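The base and recursive cases above can be assembled into one loadable file. What follows is a minimal sketch, assuming SWI-Prolog. The arc facts for the first laughing automaton and the three-argument traverse1 are not listed in these slides; they are reconstructed here from the example path [1,h,2,a,3,h,2,a,3,!,4], and testparse1 follows the driver shown on the "Parsing a Noun Phrase" slide.

```prolog
% Assumed facts for the first laughing automaton (not given in the slides;
% reconstructed from the example path on the Finite State Parser slide).
initial(1).
final(4).
arc(1,2,h).
arc(2,3,a).
arc(3,4,!).
arc(3,2,h).

% Assumed helper: consume one input symbol that equals the arc label.
traverse1(Label, [Label|Symbols], Symbols).

% Recognizer: succeeds if String leads from Node to a final state.
recognize1(Node, []) :-
    final(Node).
recognize1(Node1, String) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    recognize1(Node2, NewString).

% Parser: same control flow, plus a third argument recording the path.
parse1(Node, [], [Node]) :-
    final(Node).
parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).

% Driver: start parsing at the initial state.
testparse1(Symbols, Parse) :-
    initial(Node),
    parse1(Node, Symbols, Parse).
```

With these facts, the query ?- testparse1([h,a,h,a,!], P). should bind P to [1,h,2,a,3,h,2,a,3,!,4]. The only difference between the two predicates is the extra argument, which is built up as the recursion proceeds.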

  8. Words as Labels • So far we have only considered transitions with single-character labels. • More complex labels are possible, e.g. words comprising several characters. • We can construct an FSA recognizing English noun phrases that can be built from the words: the, a, wizard, witch, broomstick, hermione, harry, ron, with, fast.

  9. FSA for Noun Phrases

  10. FSA for NPs in Prolog
      initial(1).
      final(3).
      arc(1,2,a).
      arc(1,2,the).
      arc(2,2,brave).
      arc(2,2,fast).
      arc(2,3,witch).
      arc(2,3,wizard).
      arc(2,3,broomstick).
      arc(2,3,rat).
      arc(1,3,harry).
      arc(1,3,ron).
      arc(1,3,hermione).
      arc(3,1,with).

  11. Parsing a Noun Phrase
      testparse1(Symbols,Parse) :-
          initial(Node),
          parse1(Node,Symbols,Parse).
      ?- testparse1([the,fast,wizard],Z).
      Z = [1,the,2,fast,2,wizard,3]

  12. Rewriting Categories • It is also possible to obtain a more abstract parse in which words are replaced by their categories, e.g. ?- testparse2([the,fast,wizard],Z). Z = [1,det,2,adj,2,cn,3] • What changes are required to obtain this behaviour?

  13. Changes to the FSA
      % FSA
      initial(1).
      final(3).
      arc(1,2,det).
      arc(2,2,adj).
      arc(2,3,cn).
      arc(1,3,pn).
      arc(3,1,prep).
      % Lexicon
      lex(a,det).
      lex(the,det).
      lex(fast,adj).
      lex(brave,adj).
      lex(witch,cn).
      lex(wizard,cn).
      lex(broomstick,cn).
      lex(rat,cn).
      lex(harry,pn).
      lex(hermione,pn).
      lex(ron,pn).
      lex(with,prep).

  14. Changes to the Parser
      Parse1:
        parse1(Node1,String,[Node1,Label|Path]) :-
            arc(Node1,Node2,Label),
            traverse1(Label,String,NewString),
            parse1(Node2,NewString,Path).
      Parse2:
        parse2(Node1,String,[Node1,Label|Path]) :-
            arc(Node1,Node2,Label),
            traverse2(Label,String,NewString),
            parse2(Node2,NewString,Path).
        traverse2(Cat,[Word|S],S) :-
            lex(Word,Cat).
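To see the category-based parser run, the FSA, the lexicon and parse2 can be put in one file. This is a sketch under stated assumptions: SWI-Prolog is assumed, the base case of parse2 and the driver testparse2 are not shown on the slides and are written here by analogy with parse1 and testparse1.

```prolog
% FSA over categories (facts as on the "Changes to the FSA" slide).
initial(1).
final(3).
arc(1,2,det).
arc(2,2,adj).
arc(2,3,cn).
arc(1,3,pn).
arc(3,1,prep).

% Lexicon (facts as on the same slide).
lex(a,det).
lex(the,det).
lex(fast,adj).
lex(brave,adj).
lex(witch,cn).
lex(wizard,cn).
lex(broomstick,cn).
lex(rat,cn).
lex(harry,pn).
lex(hermione,pn).
lex(ron,pn).
lex(with,prep).

% An arc labelled Cat can be traversed by any word of category Cat.
traverse2(Cat, [Word|S], S) :-
    lex(Word, Cat).

% Base case: assumed, by analogy with parse1.
parse2(Node, [], [Node]) :-
    final(Node).
parse2(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse2(Label, String, NewString),
    parse2(Node2, NewString, Path).

% Driver: assumed, by analogy with testparse1.
testparse2(Symbols, Parse) :-
    initial(Node),
    parse2(Node, Symbols, Parse).
```

With these facts, ?- testparse2([the,fast,wizard],Z). should give Z = [1,det,2,adj,2,cn,3]. The last category appears as cn because that is the arc label used in this FSA.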

  15. Handling Jumps
      traverse3('#',String,String).
      traverse3(Cat,[Word|Words],Words) :-
          lex(Word,Cat).
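A small example may help show what the '#' clause buys us. The sketch below assumes SWI-Prolog and uses a hypothetical cut-down FSA in which a jump arc from state 1 to state 2 makes the determiner optional; parse3 and testparse3 are illustrative names that do not appear in the slides.

```prolog
% Hypothetical FSA: the '#' arc lets the parser skip the determiner.
initial(1).
final(3).
arc(1,2,det).
arc(1,2,'#').
arc(2,2,adj).
arc(2,3,cn).

% Tiny lexicon (subset of the slides' entries).
lex(the,det).
lex(fast,adj).
lex(wizard,cn).

% A jump consumes no input; any other label consumes one word.
traverse3('#', String, String).
traverse3(Cat, [Word|Words], Words) :-
    lex(Word, Cat).

% parse3 is parse2 with traverse3 swapped in (assumed, not in the slides).
parse3(Node, [], [Node]) :-
    final(Node).
parse3(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse3(Label, String, NewString),
    parse3(Node2, NewString, Path).

testparse3(Symbols, Parse) :-
    initial(Node),
    parse3(Node, Symbols, Parse).
```

Here ?- testparse3([fast,wizard],Z). should give a path that starts with the jump, Z = [1,#,2,adj,2,cn,3]. Note that an automaton containing a cycle made only of '#' arcs would send this simple parser into an infinite loop, since nothing is consumed along such a cycle.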

  16. Finite State Transducers • A finite state transducer is essentially a finite state automaton that works on two (or more) tapes. • The most common way to think about transducers is as a kind of "translating machine" which works by reading from one tape and writing onto the other.

  17. A Translator from a to b • Diagram: a single state 1 with a loop labelled a:b. • Notation: the initial state is marked by an arrowhead, a final state by a double circle. • a:b means read a from the first tape and write b to the second tape.

  18. Prolog Representation
      :- op(250,xfx,:).
      initial(1).
      final(1).
      arc(1,1,a:b).

  19. Modes of Operation • generation mode: It writes a string of a's on one tape and a string of b's on the other tape. Both strings have the same length. • recognition mode: It accepts when the word on the first tape consists of exactly as many a's as the word on the second tape consists of b's. • translation mode (left to right): It reads a's from the first tape and writes a b onto the second tape for every a that it reads. • translation mode (right to left): It reads b's from the second tape and writes an a onto the first tape for every b that it reads.

  20. Part II: Computational Morphology

  21. Morphology • Morphemes: the smallest units in a word that bear some meaning, such as rabbit and s. • Morphology studies how morphemes combine to form words that are legal in some language. • Two kinds of morphology: inflectional and derivational.

  22. Inflectional/Derivational Morphology • Inflectional (+s plural, +ed past): category preserving; productive, i.e. always applies (esp. to new words, e.g. fax); systematic, i.e. same semantic effect. • Derivational (+ment): category changing (escape+ment); not completely productive (*detractment); not completely systematic (apartment).

  23. Example: English Noun Inflections

  24. Morphological Parsing • Input to the morphological parser: a word, e.g. cats. • Output: an analysis, e.g. cat N PL. • The output is a string of morphemes: the lexeme plus other meaningful morphemes. • Reversibility?

  25. Morphological Parsing • The goal of morphological parsing is to find out what morphemes a given word is built from. • cats → cat N PL • mice → mouse N PL • foxes → fox N PL

  26. Morphological Analysis with FSTs • The basic idea is to write FSTs that map the surface form of a word to a description of the morphemes that constitute that word, or vice versa. • Example: wizard+s to wizard+PL or kiss+ed to kiss+PAST.

  27. Plural Nouns in English • Regular forms • add an s as in wizard+s. • add -es as in witch+es. • Handled with morpho-phonological rules that insert an e whenever the morpheme preceding the s ends in s, x, ch or another fricative. • Irregular forms • mouse/mice • automaton/automata • Handled on a case-by-case basis. • We require a transducer that translates wizard+s into wizard+PL, witch+es into witch+PL, mice into mouse+PL and automata into automaton+PL.

  28. 2 Steps • Step 1: split the word up into its possible components, using + to indicate possible morpheme boundaries: cats → cat + s; foxes → fox + s; mice → mouse + s. • Step 2: look up the categories of the stems and the meaning of the affixes, using a lexicon of stems and affixes: cat + s → cat N PL; fox + s → fox N PL; mouse + s → mouse N PL.

  29. Step 1 • The transducer may or may not insert a '+' (morpheme boundary) if the word ends in 's'. • If the word ends in ses, xes, or zes, it may delete the 'e' when inserting the morpheme boundary, e.g. foxes → fox + s.
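The two bullet points of Step 1 can be tried out without building the transducer itself. The following is only a sketch, assuming SWI-Prolog: it is a plain nondeterministic predicate over character lists, not the finite state transducer of the slides, and split/2 and split_atom/2 are hypothetical names introduced for illustration.

```prolog
:- use_module(library(lists)).   % append/3, member/2

% split(+SurfaceChars, -IntermediateChars)
% Option 1: insert no boundary at all.
split(Word, Word).
% Option 2: the word ends in s, so a boundary may go before that s.
split(Word, Inter) :-
    append(Stem, [s], Word),
    append(Stem, ['+',s], Inter).
% Option 3: the word ends in ses, xes or zes, so the e may be deleted.
split(Word, Inter) :-
    append(Stem, [C,e,s], Word),
    member(C, [s,x,z]),
    append(Stem, [C,'+',s], Inter).

% Convenience wrapper working on atoms instead of character lists.
split_atom(Surface, Intermediate) :-
    atom_chars(Surface, Chars),
    split(Chars, IChars),
    atom_chars(Intermediate, IChars).
```

For example, ?- split_atom(foxes, I). should enumerate foxes, then 'foxe+s', then 'fox+s' on backtracking. The predicate deliberately overgenerates, matching the "may or may not" wording above; filtering out candidates such as foxe+s is left to the lexicon lookup in Step 2.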

  30. Transducer for Step 1 (Surface → Intermediate)

  31. Transducer for Step 1 (Surface → Intermediate)

  32. Prolog Representation • The transducer specifications we have seen translate easily into Prolog format, except for the <other> transitions.
      arc(1,3,z:z).
      arc(1,3,s:s).
      arc(1,3,x:x).
      arc(1,2,'#':'+').   % quoted, since #:+ would be read as a single atom
      arc(3,1,<other>).
      arc(1,1,<other>).

  33. One Way to Handle <other> Arcs
      arc(1,3,z:z).
      arc(1,3,s:s).
      arc(1,3,x:x).
      arc(1,2,'#':'+').
      arc(3,1,a:a).
      arc(3,1,b:b).
      arc(3,1,c:c).
      % ... and so on, one arc per remaining letter ...
      arc(3,1,y:y).

  34. Transducer for Step 2 (Intermediate → Morphemes) • Possible inputs to the transducer are: • Regular noun stem: cat • Regular noun stem + s: cat+s • Singular irregular noun stem: mouse • Plural irregular noun stem: mice
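The four input cases listed above can be mirrored by four Prolog clauses. This is a sketch only, assuming SWI-Prolog: it is a direct lexicon lookup, not the Step 2 transducer from the slides, the mini-lexicon and the predicate names are hypothetical, and the SG tag for singular forms is an assumption (the slides only show PL).

```prolog
% Hypothetical mini-lexicon.
regular_noun(cat).
regular_noun(fox).
irregular_sg(mouse).
irregular_pl(mice, mouse).

% step2(+Intermediate, -Analysis)
% Intermediate is a list of tokens such as [cat,'+',s] or [mice].
step2([Stem], [Stem,'N','SG']) :-           % regular noun stem
    regular_noun(Stem).
step2([Stem,'+',s], [Stem,'N','PL']) :-     % regular noun stem + s
    regular_noun(Stem).
step2([Stem], [Stem,'N','SG']) :-           % singular irregular noun stem
    irregular_sg(Stem).
step2([Form], [Stem,'N','PL']) :-           % plural irregular noun stem
    irregular_pl(Form, Stem).
```

With this lexicon, ?- step2([cat,'+',s],A). should give A = [cat,'N','PL'] and ?- step2([mice],A). should give A = [mouse,'N','PL'], while step2([mouse,'+',s],A) fails because mouse is not listed as a regular noun.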

  35. 2. Intermediate MorphemesTransducer CSA3180 NLP

  36. Handling Stems • cat/cat • mice/mouse

  37. Completed Stage 2

  38. Joining Stages 1 and 2 • If the two transducers run in a cascade (i.e. we let the second transducer run on the output of the first one), we can do a morphological parse of (some) English noun phrases. • We can also change the direction of translation (in translation mode). • The same transducer can therefore also be used for generating a surface form from an underlying form.
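Running two transducers in a cascade can be sketched in a few lines. The code below assumes SWI-Prolog and uses two hypothetical toy transducers, t1 (a to b) and t2 (b to c), rather than the Stage 1 and Stage 2 machines; giving each fact a name argument, and the predicates run/3 and cascade/4, are illustrative choices and not taken from the slides.

```prolog
:- op(250, xfx, :).

% Two hypothetical named transducers.
initial(t1, 1).
initial(t2, 1).
final(t1, 1).
final(t2, 1).
arc(t1, 1, 1, a:b).
arc(t2, 1, 1, b:c).

% Read L1 from the first tape and write L2 to the second.
traverse(L1:L2, [L1|Rest1], Rest1, [L2|Rest2], Rest2).

transduce(Name, Node, [], []) :-
    final(Name, Node).
transduce(Name, Node1, Tape1, Tape2) :-
    arc(Name, Node1, Node2, Label),
    traverse(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce(Name, Node2, NewTape1, NewTape2).

run(Name, Tape1, Tape2) :-
    initial(Name, Node),
    transduce(Name, Node, Tape1, Tape2).

% Cascade: the output tape of First is the input tape of Second.
cascade(First, Second, Input, Output) :-
    run(First, Input, Middle),
    run(Second, Middle, Output).
```

Here ?- cascade(t1, t2, [a,a], Out). should give Out = [c,c]. Calling this naive cascade with only the output bound makes the first transducer enumerate candidate inputs, so it can find an answer but will not terminate if asked for more; a practical version would need to reorder the two calls for that direction.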

  39. Combining Rules • Consider the word "berries". • Two rules are involved: berry + s, and y → ie under certain circumstances. • Combinations of such rules can be handled in two ways: in a cascade (i.e. sequentially) or in parallel. • Algorithms exist for combining transducers in series or in parallel. • Such algorithms involve computations over regular relations.

  40. 3 Related Frameworks • Finite state automata • Regular languages • Regular expressions

  41. Concatenation over FS Automata [figure omitted]

  42. Regular Relations • Augmented regular expressions • Finite state transducers • Regular relations

  43. Putting It All Together • Execution of the FSTi takes place in parallel.

  44. Kaplan and Kay: The Xerox View • The FSTi are aligned but separate. • The FSTi are intersected together.

  45. Summary • Morphological processing can be handled by finite state machinery. • Finite State Transducers are formally very similar to Finite State Automata. • They are formally equivalent to regular relations, i.e. sets of pairings of sentences of regular languages.

  46. Exercises • Change the representation of automata so that they can be given names. • Make the corresponding changes to the transducer. • Write a predicate which allows two named automata to be composed, i.e. the output of one becomes the input of the other.

  47. Simple Transducer in Prolog
      transduce1(Node,[],[]) :-
          final(Node).
      transduce1(Node1,Tape1,Tape2) :-
          arc(Node1,Node2,Label),
          traverse1(Label,Tape1,NewTape1,Tape2,NewTape2),
          transduce1(Node2,NewTape1,NewTape2).

  48. Traverse for FST
      traverse1(L1:L2,[L1|RestTape1],RestTape1,[L2|RestTape2],RestTape2).
      testtrans1(Tape1,Tape2) :-
          initial(Node),
          transduce1(Node,Tape1,Tape2).
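Combining the transducer code above with the a:b machine from the "Prolog Representation" slide gives a file in which the modes of operation can be tried out. This is a minimal sketch, assuming SWI-Prolog; nothing is added beyond putting the slides' clauses together.

```prolog
:- op(250, xfx, :).

% The translator from a to b.
initial(1).
final(1).
arc(1,1,a:b).

% Stop when both tapes are exhausted in a final state.
transduce1(Node, [], []) :-
    final(Node).
% Otherwise follow an arc, advancing both tapes by one symbol.
transduce1(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse1(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce1(Node2, NewTape1, NewTape2).

% Read L1 from the first tape and L2 from the second.
traverse1(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).

testtrans1(Tape1, Tape2) :-
    initial(Node),
    transduce1(Node, Tape1, Tape2).
```

Which mode is used depends only on which arguments are bound in the query. ?- testtrans1([a,a,a],T). translates left to right and should give T = [b,b,b]; ?- testtrans1(T,[b,b]). translates right to left and should give T = [a,a]; ?- testtrans1([a,a],[b,b]). is recognition; and ?- testtrans1(X,Y). is generation, which enumerates pairs of increasing length on backtracking and never runs out of answers.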

  49. Transducers and Jumps • Transducers can make jumps, going from one state to another without doing anything on either one or both of the tapes. • So transitions of the form a:# or #:a or #:# are possible.

  50. Handling Jumps: 4 Cases • Jump on both tapes. • Jump on the first but not on the second tape. • Jump on the second but not on the first tape. • Jump on neither tape (this is what traverse1 does).
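The four cases can be written as four clauses of a traverse predicate. The following is a sketch, assuming SWI-Prolog. The names jump_traverse/5, jump_transduce/3 and test_jump/2 are hypothetical, the \== guards that keep the cases from overlapping are an added assumption, and the example machine (each a becomes two b's) is invented for illustration.

```prolog
:- op(250, xfx, :).

% Hypothetical transducer: every a on tape 1 becomes b b on tape 2.
initial(1).
final(1).
arc(1,2,a:b).
arc(2,1,'#':b).

% Case 1: jump on both tapes.
jump_traverse('#':'#', Tape1, Tape1, Tape2, Tape2).
% Case 2: jump on the first tape, write L2 on the second.
jump_traverse('#':L2, Tape1, Tape1, [L2|Rest2], Rest2) :-
    L2 \== '#'.
% Case 3: read L1 on the first tape, jump on the second.
jump_traverse(L1:'#', [L1|Rest1], Rest1, Tape2, Tape2) :-
    L1 \== '#'.
% Case 4: jump on neither tape (the behaviour of traverse1).
jump_traverse(L1:L2, [L1|Rest1], Rest1, [L2|Rest2], Rest2) :-
    L1 \== '#',
    L2 \== '#'.

jump_transduce(Node, [], []) :-
    final(Node).
jump_transduce(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    jump_traverse(Label, Tape1, NewTape1, Tape2, NewTape2),
    jump_transduce(Node2, NewTape1, NewTape2).

test_jump(Tape1, Tape2) :-
    initial(Node),
    jump_transduce(Node, Tape1, Tape2).
```

With this machine, ?- test_jump([a,a],T). should give T = [b,b,b,b], and ?- test_jump(X,[b,b]). should give X = [a]. The guards rely on arc labels being ground, which holds here because arcs are stored as facts.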
