CSA3050: NLP Algorithms

CSA3050: NLP Algorithms. Finite State Transducers for Morphological Parsing. Resumé. FSAs recognise exactly the regular languages. FSTs define exactly the regular relations (over pairs of regular languages). FSTs are like FSAs but with complex labels.

kaipo

Presentation Transcript


  1. CSA3050: NLP Algorithms. Finite State Transducers for Morphological Parsing

  2. Resumé
     • FSAs recognise exactly the regular languages.
     • FSTs define exactly the regular relations (over pairs of regular languages).
     • FSTs are like FSAs but with complex labels.
     • We can use FSTs to transduce between surface and lexical levels.

  3. Dotted Pair Notation
     1) FSA recogniser for "fox": f o x
     2) FSTs for fox/fox and goose/geese: f:f o:o x:x and g:g o:e o:e s:s e:e

  4. Dotted Pair Notation (2)
     • By convention, x:y pairs lexical symbol x with surface symbol y.
     • In FSTs, "default pairs" of the form x:x are common. They are usually abbreviated to "x".
     • Example: g o:e o:e s e
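
The dotted pair convention can be made concrete with a short Python sketch. It is illustrative only: the helper names `parse_pairs`, `lexical_side` and `surface_side` are hypothetical and do not come from the course material. A single path through an FST is treated as a sequence of lexical:surface pairs, and a bare symbol is expanded to its default pair.

```python
def parse_pairs(notation):
    """Expand dotted pair notation into (lexical, surface) tuples.

    A bare symbol "x" is read as the default pair x:x.
    """
    pairs = []
    for token in notation.split():
        if ":" in token:
            lexical, surface = token.split(":")
        else:
            lexical = surface = token
        pairs.append((lexical, surface))
    return pairs


def lexical_side(pairs):
    """Read off the lexical (upper) string of a path."""
    return "".join(lexical for lexical, _ in pairs)


def surface_side(pairs):
    """Read off the surface (lower) string of a path."""
    return "".join(surface for _, surface in pairs)


goose = parse_pairs("g o:e o:e s e")
print(lexical_side(goose))  # goose
print(surface_side(goose))  # geese
```

Only the two o:e pairs differ between the tapes; every other symbol is a default pair, which is why the abbreviated notation stays readable.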

  5. FSA for Number Inflection
     How can we augment this to produce an analysis?

  6. 3 Steps
     • Create a transducer Tnum for noun number inflection. This will add number and category information given word classes as input.
     • Create a transducer Tstems mapping words to word classes.
     • Hook the two together.

  7. Tnum example
     "lexical": reg-noun-stem +N +PL
     "intermediate": reg-noun-stem ^ s #

  8. 1. Tnum: Noun Number Inflection
     • multi-character symbols
     • morpheme boundary ^
     • word boundary #

  9. Tstems example
     "intermediate": reg-noun-stem #
     Tstems
     "surface": d:d o:o g:g # or f:f o:o x:x #

  10. Tstems example
      "intermediate": irreg-pl-noun-form #
      Tstems
      "surface": m o:i u:ε s e # or s h e e p #

  11. 2. Tstems Lexicon

  12. Hooking Together
      • There are two ways to hook the two transducers together:
        • Cascading: feeding the output of one transducer into the input of the other and running them in series.
        • Composition: composing the two transducers mathematically to create a third, equivalent transducer.

  13. Hooking Together: cascading
      lexical: reg-noun-stem +N +PL
      Tnum
      intermediate: reg-noun-stem ^ s #
      Tstems
      surface: dog or fox, followed by ^ s #
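
Cascading can be sketched in Python by treating each transducer as an ordinary function and feeding the output of the first into the second. This is a simplification under stated assumptions: the function names `t_num`, `t_stems` and `cascade` are hypothetical, the stem list holds only the two stems shown on the slides, and the `+SG` case is assumed rather than taken from the slides.

```python
REG_NOUN_STEMS = ["dog", "fox"]  # the two regular stems shown on the slides


def t_num(lexical):
    """Lexical -> intermediate: spell out the number feature as morphemes."""
    stem_class, *features = lexical
    if features == ["+N", "+PL"]:
        return [stem_class, "^", "s", "#"]
    if features == ["+N", "+SG"]:  # assumed case, not shown on the slides
        return [stem_class, "#"]
    raise ValueError(f"unsupported features: {features}")


def t_stems(intermediate):
    """Intermediate -> surface: replace the class symbol by each stem in the class."""
    stem_class, *rest = intermediate
    if stem_class != "reg-noun-stem":
        raise ValueError(f"unknown stem class: {stem_class}")
    return ["".join([stem, *rest]) for stem in REG_NOUN_STEMS]


def cascade(lexical):
    """Run Tnum, then run Tstems on its output."""
    return t_stems(t_num(lexical))


print(cascade(["reg-noun-stem", "+N", "+PL"]))  # ['dog^s#', 'fox^s#']
```

The intermediate tape exists only between the two calls. Composition removes it by building a single machine ahead of time.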

  14. Composition of Relations
      • Let R and S be binary relations.
      • The composition of R and S, written R ∘ S, is defined by:
        (a,c) ∈ R ∘ S if and only if there is some b such that (a,b) ∈ R and (b,c) ∈ S.
      • Transducers can also be composed.
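
For finite relations the definition can be checked directly. The sketch below is an illustration, not the transducer construction: it composes two relations stored as Python sets of pairs, whereas composing real FSTs builds a new machine over pairs of states. The name `compose` and the sample relations are hypothetical, with strings modelled on the Tnum and Tstems examples.

```python
def compose(r, s):
    """Return R o S: all (a, c) such that some b has (a, b) in R and (b, c) in S."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


# R plays the role of Tnum, S the role of Tstems (finite samples only).
R = {("reg-noun-stem +N +PL", "reg-noun-stem^s#")}
S = {("reg-noun-stem^s#", "dog^s#"), ("reg-noun-stem^s#", "fox^s#")}

for pair in sorted(compose(R, S)):
    print(pair)
# ('reg-noun-stem +N +PL', 'dog^s#')
# ('reg-noun-stem +N +PL', 'fox^s#')
```

The shared middle element b never appears in the result, which mirrors how composition eliminates the intermediate level.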

  15. Tnum ∘ Tstems

  16. English Spelling Rules
      • consonant doubling: beg/begging
      • y replacement: try/tries
      • k insertion: panic/panicked
      • e deletion: make/making
      • e insertion: watch/watches
      • Each rule can be stated in more detail ...

  17. e Insertion Rule
      • Insert an e on the surface tape just when the lexical tape has a morpheme ending in x, s, z, or ch and the next (and final) morpheme is -s.
      • Stated formally: ε → e / [x|s|z|ch] ^ __ s #
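
The effect of the rule can be tried out with a small Python sketch. Note the swapped-in technique: it uses a regular expression substitution over the intermediate string rather than an explicit transducer, and the function name `e_insertion` is hypothetical. Boundary symbols are erased afterwards because they never appear on the surface.

```python
import re


def e_insertion(intermediate):
    """Map an intermediate form such as 'fox^s#' to its surface form."""
    # Insert e after the morpheme boundary when the preceding morpheme ends
    # in x, s, z or ch and the remainder of the word is exactly -s.
    with_e = re.sub(r"(x|s|z|ch)\^(?=s#)", r"\1^e", intermediate)
    # Erase the morpheme (^) and word (#) boundary symbols.
    return with_e.replace("^", "").replace("#", "")


for form in ["fox^s#", "watch^s#", "dog^s#", "fox#"]:
    print(form, "->", e_insertion(form))
# fox^s# -> foxes
# watch^s# -> watches
# dog^s# -> dogs
# fox# -> fox
```

The lookahead `(?=s#)` encodes the right-hand context "s #" without consuming it, so forms that do not match the context pass through unchanged apart from boundary removal.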

  18. e insertion over 3 levels
      The rule corresponds to the mapping between surface and intermediate levels.

  19. e insertion as an FST

  20. Incorporating Spelling Rules
      • Spelling rules, each corresponding to an FST, can be run in parallel provided that they are "aligned".
      • The set of spelling rules is positioned between the surface level and the intermediate level.
      • Parallel execution of FSTs can be carried out:
        • by simulation: in this case the FSTs must first be aligned.
        • by first constructing a single FST corresponding to their intersection.

  21. Putting it all together
      Execution of the FSTi takes place in parallel.

  22. Kaplan and Kay / The Xerox View
      • FSTi are aligned but separate
      • FSTi intersected together

  23. Operations over FSTs
      • We can perform operations over FSTs which yield other FSTs:
        • Inversion
        • Union
        • Composition
      • The inversion of T, or T⁻¹, simply computes the inverse mapping to T.

  24. Inversion
      T: lexical c a t ^ PL, surface c a t s
      T⁻¹: surface c a t s, lexical c a t ^ PL

  25. Inversion
      • To invert a transducer:
        • we switch the order of the complex symbols, i.e. every i:o becomes o:i
        • or we leave the transducer alone, and slightly change the parsing algorithm.
      • Practical consequences:
        • The transducer is reversible.
        • We can use exactly the same transducer to perform either analysis or generation.
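
Inversion by label swapping can be sketched in Python as follows. The arc format `(source, input, output, target)`, the function names `invert` and `transduce`, and the arc list `T_ARCS` are assumptions made for illustration; the empty string stands for ε. The example machine covers only the single path c a t ^ PL / c a t s from the inversion slide, and the simple search assumes there are no cycles of ε-input arcs.

```python
def invert(arcs):
    """Swap the two sides of every arc label: i:o becomes o:i."""
    return [(src, out, inp, dst) for (src, inp, out, dst) in arcs]


def transduce(arcs, start, finals, symbols):
    """Return every output sequence the arcs allow for the input symbols."""
    results = []
    stack = [(start, 0, [])]  # (state, input position, output so far)
    while stack:
        state, pos, out = stack.pop()
        if state in finals and pos == len(symbols):
            results.append(out)
        for src, inp, o, dst in arcs:
            if src != state:
                continue
            emitted = out + [o] if o else out  # an epsilon output emits nothing
            if inp == "":  # an epsilon input consumes nothing
                stack.append((dst, pos, emitted))
            elif pos < len(symbols) and symbols[pos] == inp:
                stack.append((dst, pos + 1, emitted))
    return results


# T maps lexical "c a t ^ PL" to surface "c a t s".
T_ARCS = [(0, "c", "c", 1), (1, "a", "a", 2), (2, "t", "t", 3),
          (3, "^", "", 4), (4, "PL", "s", 5)]
FINALS = {5}

print(transduce(T_ARCS, 0, FINALS, ["c", "a", "t", "^", "PL"]))
# [['c', 'a', 't', 's']]
print(transduce(invert(T_ARCS), 0, FINALS, list("cats")))
# [['c', 'a', 't', '^', 'PL']]
```

The same arc list serves both directions: generation runs it as written, and analysis runs the inverted copy.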

  26. Closure Properties of FSTs
      Relations computed by FSTs are
      • closed under
        • inversion
        • union
        • composition
      • not closed (in general) under
        • intersection (however, intersection is possible provided that we restrict the class of transducers)
        • complementation
        • subtraction
