
CSA3050: NLP Algorithms




  1. CSA3050: NLP Algorithms Finite State Transducers for Morphological Parsing Advanced Topics in NLP

  2. Acknowledgement • This lecture is largely based on material from Jurafsky & Martin, chapter 3.

  3. Resumé • FSAs are equivalent to regular languages: they recognise exactly that class. • FSTs are equivalent to regular relations (relations between pairs of regular languages). • FSTs are like FSAs, but each arc carries a complex label: a pair of symbols, one per level. • We can use FSTs to transduce between the surface and lexical levels.
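To make the idea of "complex labels" concrete, here is a minimal sketch of a deterministic transducer whose arcs carry lexical:surface pairs. The arc table, the state numbering and the `transduce` helper are illustrative assumptions, not part of the lecture material.

```python
EPS = ""  # empty output, standing in for epsilon

# Hypothetical toy transducer for the single stem "cat".
# Each arc maps (state, lexical symbol) to (surface symbol, next state).
ARCS = {
    (0, "c"): ("c", 1),
    (1, "a"): ("a", 2),
    (2, "t"): ("t", 3),
    (3, "+N"): (EPS, 4),
    (4, "+Sg"): (EPS, 5),
    (4, "+Pl"): ("s", 5),
}
FINAL_STATES = {5}


def transduce(lexical_symbols):
    """Map a sequence of lexical symbols to a surface string, or None."""
    state, output = 0, []
    for symbol in lexical_symbols:
        if (state, symbol) not in ARCS:
            return None  # no arc: the input is not in the relation
        out_symbol, state = ARCS[(state, symbol)]
        output.append(out_symbol)
    return "".join(output) if state in FINAL_STATES else None
```

Reading only the left half of each pair gives an ordinary FSA over lexical symbols; reading both halves gives the relation between the two levels.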

  4. Morphological Parsing • Given the input cats, we'd like to output cat +N +Pl, telling us that cats is the plural form of the noun cat. • Given the Spanish input bebo, we'd like to output beber +V +PInd +1P +Sg, telling us that bebo is the present indicative first person singular form of the Spanish verb beber, 'to drink'.

  5. Two-Level Paradigm (from Jurafsky & Martin)

  6. English Plural

  7. Morphological Analyser To build a morphological analyser we need: • lexicon: the list of stems and affixes, together with basic information about them • morphotactics: the model of morpheme ordering (e.g. the English plural morpheme follows a noun rather than a verb) • orthographic rules: spelling rules that model the changes that occur in a word, usually when two morphemes combine (e.g. fly+s = flies)

  8. Lexicon & Morphotactics • Typically the list of word parts (lexicon) and the model of ordering can be combined into an FSA which will recognise all the valid word forms. • For this to be possible the word parts must first be classified into sublexicons. • The FSA defines the morphotactics (ordering constraints).

  9. Sublexicons to classify the list of word parts

  10. FSA Expresses Morphotactics (ordering model)
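The combination of sublexicons and an ordering FSA can be sketched as follows. The sublexicon names and their contents are assumptions modelled on the usual textbook noun example (the slide's own figure is not reproduced in this transcript), and `accepts` is a hypothetical helper.

```python
# Assumed sublexicons: word parts grouped by class.
SUBLEXICONS = {
    "reg-noun": {"fox", "cat", "dog"},
    "irreg-sg-noun": {"goose", "sheep", "mouse"},
    "irreg-pl-noun": {"geese", "sheep", "mice"},
    "plural": {"-s"},
}

# Morphotactics: arcs are labelled with sublexicon names, not words.
ARCS = {
    (0, "reg-noun"): 1,
    (0, "irreg-sg-noun"): 2,
    (0, "irreg-pl-noun"): 2,
    (1, "plural"): 2,
}
FINAL_STATES = {1, 2}


def accepts(morphemes):
    """Return True if the morpheme sequence obeys the ordering model."""
    states = {0}
    for morpheme in morphemes:
        next_states = set()
        for (source, word_class), target in ARCS.items():
            if source in states and morpheme in SUBLEXICONS[word_class]:
                next_states.add(target)
        states = next_states
    return bool(states & FINAL_STATES)
```

Note that this recogniser works on morpheme sequences, so it accepts fox followed by -s without knowing that the surface spelling is foxes; that is the job of the spelling rules.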

  11. Towards the Analyser • We can use lexc or xfst to build such an FSA (see lex1.lexc). • To augment this so that it produces an analysis, we must create a transducer Tnum which maps between the lexical level and an "intermediate" level that is needed to handle the spelling rules of English.

  12. Three Levels of Analysis

  13. 1. Tnum: Noun Number Inflection • multi-character symbols • morpheme boundary ^ • word boundary #

  14. Towards the Analyser • We do this by first allowing the lexicon itself to have two levels. Since surface geese maps to lexical goose, the new lexical entry will be "g:g o:e o:e s:s e:e" (see lex2.lexc). • We must also add the appropriate morphological labels (see lex3.lexc).
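As a small illustration of the two-level entry, the sketch below splits the string of symbol pairs and reads off each level. The helper names are hypothetical, and the lexical:surface order of each pair is inferred from the goose/geese example above.

```python
def parse_entry(entry):
    """Split 'g:g o:e ...' into a list of (lexical, surface) symbol pairs."""
    return [tuple(pair.split(":")) for pair in entry.split()]


def lexical_side(pairs):
    """Concatenate the left-hand (lexical) symbol of every pair."""
    return "".join(lexical for lexical, _ in pairs)


def surface_side(pairs):
    """Concatenate the right-hand (surface) symbol of every pair."""
    return "".join(surface for _, surface in pairs)


pairs = parse_entry("g:g o:e o:e s:s e:e")
```

Here `lexical_side(pairs)` yields goose and `surface_side(pairs)` yields geese, so a single entry encodes both levels at once.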

  15. Intermediate Form to Surface • The reason we need an intermediate form is that funny things happen at morpheme boundaries, e.g. cat^s → cats, fox^s → foxes, fly^s → flies. • The rules which describe these changes are called orthographic rules or "spelling rules".

  16. More English Spelling Rules • consonant doubling: beg / begging • y replacement: try / tries • k insertion: panic / panicked • e deletion: make / making • e insertion: watch / watches • Each rule can be stated in more detail ...

  17. Spelling Rules • Chomsky & Halle (1968) introduced a rewrite-rule notation for phonological rules, which is also used for spelling rules. • A very similar notation is embodied in the "conditional replacement" rules of xfst: E -> F || L _ R, which means: replace E with F when it appears between left context L and right context R.
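The meaning of a conditional replacement rule can be approximated with Python's `re` module, as in the sketch below. It is only an analogy: `replace_in_context` is a hypothetical helper, and it treats E, L and R as literal strings, whereas in xfst they are regular expressions compiled into a transducer.

```python
import re


def replace_in_context(text, target, replacement, left="", right=""):
    """Replace `target` with `replacement` only between `left` and `right`.

    The contexts are checked with lookbehind/lookahead, so they are
    tested against the original input and are not consumed.
    """
    pattern = re.escape(target)
    if left:
        pattern = "(?<=" + re.escape(left) + ")" + pattern
    if right:
        pattern = pattern + "(?=" + re.escape(right) + ")"
    # A function replacement avoids backslash interpretation in `replacement`.
    return re.sub(pattern, lambda match: replacement, text)
```

For example, `replace_in_context("cad xad", "a", "b", left="c", right="d")` rewrites only the first a, because the second one lacks the required left context.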

  18. A Particular Spelling Rule • This rule does e-insertion: ^ -> e || x _ s#
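The effect of this particular rule on intermediate forms can be sketched with a regular expression. The function names are hypothetical, and the final step that deletes any remaining ^ and # symbols is an assumption about how the surface string is obtained, not something the rule itself states.

```python
import re

# The morpheme boundary ^ becomes e when preceded by x and followed by s#.
E_INSERTION = re.compile(r"(?<=x)\^(?=s#)")


def e_insertion(intermediate):
    """Apply the e-insertion rule to an intermediate-level string."""
    return E_INSERTION.sub("e", intermediate)


def to_surface(intermediate):
    """Apply the rule, then erase leftover boundary symbols (assumed step)."""
    return e_insertion(intermediate).replace("^", "").replace("#", "")
```

With these definitions fox^s# becomes foxes, while cat^s# is untouched by the rule and simply loses its boundary symbols to give cats.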

  19. e insertion over 3 levels • The rule corresponds to the mapping between the surface and intermediate levels.

  20. e insertion as an FST

  21. Incorporating Spelling Rules • Spelling rules, each corresponding to an FST, can be run in parallel provided that they are "aligned". • The set of spelling rules is positioned between the surface level and the intermediate level. • Parallel execution of FSTs can be carried out: (a) by simulation, in which case the FSTs must first be aligned; or (b) by first constructing a single FST corresponding to their intersection.

  22. Putting it all together • Execution of the FSTi takes place in parallel.
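The overall architecture can be sketched in the generation direction: a stand-in for Tnum maps the lexical level to the intermediate level, and the spelling rules are then all matched against that same intermediate string. Everything here is a toy assumption: `tnum`, the y-replacement pattern and the boundary clean-up are hypothetical, and regular expressions merely simulate what real FSTs would do.

```python
import re


def tnum(stem, tags):
    """Toy stand-in for Tnum: lexical level -> intermediate level."""
    if tags == ("+N", "+Sg"):
        return stem + "#"
    if tags == ("+N", "+Pl"):
        return stem + "^s#"
    raise ValueError(f"unsupported tags: {tags}")


# Each rule is (pattern, replacement); all rules see the same input string.
SPELLING_RULES = [
    (re.compile(r"(?<=x)\^(?=s#)"), "e"),  # e-insertion, as on slide 18
    (re.compile(r"y(?=\^s#)"), "ie"),      # y replacement (assumed form)
]


def apply_in_parallel(intermediate):
    """Collect every rule's edits on the original string, then apply them."""
    edits = []
    for pattern, replacement in SPELLING_RULES:
        for match in pattern.finditer(intermediate):
            edits.append((match.start(), match.end(), replacement))
    result = intermediate
    # Apply right to left so earlier offsets stay valid.
    # Overlapping edits are not handled; these two rules never overlap.
    for start, end, replacement in sorted(edits, reverse=True):
        result = result[:start] + replacement + result[end:]
    return result


def generate(stem, tags):
    """Lexical -> intermediate -> surface, erasing leftover boundaries."""
    surface = apply_in_parallel(tnum(stem, tags))
    return surface.replace("^", "").replace("#", "")
```

Because every rule is matched against the unmodified intermediate string, no rule sees another rule's output, which is the sense in which the rules run in parallel rather than in a cascade.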

  23. Kaplan and Kay: The Xerox View • FSTi are aligned but separate • FSTi intersected together
