CSA4050: Advanced Topics in NLP


  1. CSA4050: Advanced Topics in NLP. Computational Morphology II: Introduction to Two-Level Morphology

  2. The Problem • So far we have assumed that words are formed by strict concatenation of component morphemes, as in the following example: en + large + ment + s • This assumption is convenient because it imposes a 1:1 correspondence between segmentation of the string and lookup of lexical items (which may be of different types, e.g. roots, affixes, particles, etc.) • The problem is that this assumption is unrealistic. CSA405 Lecture 2lev
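
Under the strict-concatenation assumption, analysing a word is just segmenting the surface string into lexicon entries. The sketch below illustrates this with a hypothetical toy lexicon (`LEXICON`) and a hypothetical `segment` helper: it succeeds on *enlargements*, but fails on *carried*, because the spelling change hides the lexical item *carry*.

```python
from typing import Optional

# Hypothetical toy lexicon of roots and affixes
LEXICON = {"en", "large", "ment", "s", "carry", "ed"}


def segment(surface: str) -> Optional[list[str]]:
    """Split `surface` into lexicon entries by pure concatenation, or return None."""
    if surface == "":
        return []
    for i in range(1, len(surface) + 1):
        prefix = surface[:i]
        if prefix in LEXICON:
            rest = segment(surface[i:])  # backtrack if the remainder fails
            if rest is not None:
                return [prefix] + rest
    return None


print(segment("enlargements"))  # ['en', 'large', 'ment', 's']
print(segment("carried"))       # None: 'carri' is not in the lexicon
```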

  3. English Spelling Rules • Final consonant doubling: begin + ing = beginning • s to es: church + s = churches • y to i: carry + ed = carried • Final e deletion: rake + ing = raking • n to m: in + practical = impractical
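
As a rough illustration, each of the spelling rules above can be approximated as a regular-expression rewrite on a lexical string in which `+` marks the morpheme boundary. The `RULES` list and `to_surface` function are a hypothetical, deliberately naive sketch (generation direction only), not a faithful model of English orthography.

```python
import re

# Hypothetical, naive rewrite rules over a lexical string where "+" marks a
# morpheme boundary. Order matters: doubling runs before e-deletion, otherwise
# hope+ing would first become hop+ing and then wrongly "hopping".
RULES = [
    (r"^in\+(?=[pbm])", "im+"),                           # n to m:   in+practical
    (r"([^aeiou])y\+(?=e)", r"\1i+"),                     # y to i:   carry+ed
    (r"(s|x|z|ch|sh)\+s$", r"\1+es"),                     # s to es:  church+s
    (r"([aeiou])([bdgmnprt])\+(?=[aeiou])", r"\1\2\2+"),  # doubling: begin+ing
    (r"e\+(?=[aeiou])", "+"),                             # final e deletion: rake+ing
]


def to_surface(lexical: str) -> str:
    """Apply each rule in turn, then erase the morpheme boundaries."""
    for pattern, replacement in RULES:
        lexical = re.sub(pattern, replacement, lexical)
    return lexical.replace("+", "")


for word in ["begin+ing", "church+s", "carry+ed", "rake+ing", "in+practical"]:
    print(word, "->", to_surface(word))
```

The doubling rule over-applies (visit + ing would come out as *visitting*), since real English doubling depends on stress, and the rules only run from lexical to surface. Both limitations are part of what motivates a more principled treatment of the lexical/surface mapping.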

  4. Semitic Languages • Example (Maltese): dħalt, dħalt, daħal, daħlet, dħalna, dħaltu, daħlu • Deletion of vowel • Changes or insertion of vowel • Non-concatenative morphology
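
One common way to model non-concatenative morphology is root-and-pattern interdigitation: a consonantal root is slotted into a template that supplies the vowels and affixes. The sketch below treats the forms above as the root d-ħ-l combined with hypothetical templates (`C` marks a root consonant); the templates and the `interdigitate` helper are illustrative assumptions, not an analysis taken from the lecture.

```python
def interdigitate(root: str, template: str) -> str:
    """Fill each 'C' slot in `template` with the next consonant of `root`."""
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in template)


# Hypothetical templates for the forms listed above, assuming root d-ħ-l
TEMPLATES = ["CCaCt", "CCaCt", "CaCaC", "CaCCet", "CCaCna", "CCaCtu", "CaCCu"]
forms = [interdigitate("dħl", t) for t in TEMPLATES]
print(forms)  # ['dħalt', 'dħalt', 'daħal', 'daħlet', 'dħalna', 'dħaltu', 'daħlu']
```

Note that no simple left-to-right concatenation of "root" and "affix" strings yields these forms, since the vowels appear between the root consonants.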

  5. Handling Spelling Rules • Such phenomena usually occur at morpheme boundaries and prevent direct lookup of the surface string in the lexicon. • The solution is to suppose that two strings are involved: • The surface string: that which appears on the page • The lexical string: that which is used to index items in the lexicon. • What kind of mapping exists between the two strings?

  6. Lexical Transformations • [Figure: a mapping between the SURFACE STRING and the LEXICAL STRING]
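
One way to make the lexical/surface mapping concrete, as two-level morphology does, is to align the two strings symbol by symbol, writing a null symbol wherever one side has nothing. The `PAIRS` alignment and the `project` helper below are a hypothetical illustration that uses `0` as the null symbol.

```python
# Hypothetical alignment of church+s with churches as (lexical, surface) pairs;
# "0" stands for the empty string on either side.
PAIRS = [("c", "c"), ("h", "h"), ("u", "u"), ("r", "r"), ("c", "c"),
         ("h", "h"), ("+", "0"), ("0", "e"), ("s", "s")]


def project(pairs: list[tuple[str, str]], side: int) -> str:
    """Read off one level (0 = lexical, 1 = surface), dropping null symbols."""
    return "".join(p[side] for p in pairs if p[side] != "0")


print(project(PAIRS, 0))  # church+s
print(project(PAIRS, 1))  # churches
```

Because the aligned sequence has the same length on both levels, each pair can be treated as a single symbol, which is what lets finite-state machinery operate on the mapping.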

  7. Phonological Rules • Morphological rules are a reflection of phonological changes. • Assumption: the lexical/surface transformation is rule-governed. • Phonological rule systems were extensively studied within generative linguistics, following Chomsky, during the 1970s.

  8. Typical Phonological Rule • A typical rule has the following shape: Phon1 -> Phon2 / Lcontext __ Rcontext • Meaning: phoneme Phon1 is transformed into phoneme Phon2 if it occurs between left context Lcontext and right context Rcontext • Example: [B] -> [P] / __ # • B is pronounced like P when it is word-final (cf. Maltese kelb 'dog', pronounced as if spelt kelp)
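
A rule of this shape can be approximated with regular-expression lookaround, turning the left and right contexts into a lookbehind and a lookahead. The `apply_rule` function below is a hypothetical sketch; it pads the word with `#` so the word-boundary context from the example rule can be matched, and uses lowercase letters in place of the phoneme labels.

```python
import re


def apply_rule(word: str, target: str, replacement: str,
               left: str = "", right: str = "") -> str:
    """Apply target -> replacement / left __ right in one left-to-right pass.

    `left` and `right` are regex fragments ('#' = word boundary). Python's
    lookbehind must be fixed-width, so `left` must match a fixed length.
    """
    pattern = ((f"(?<={left})" if left else "")
               + re.escape(target)
               + (f"(?={right})" if right else ""))
    padded = "#" + word + "#"
    return re.sub(pattern, replacement, padded)[1:-1]


print(apply_rule("kelb", "b", "p", right="#"))   # kelp
print(apply_rule("kelba", "b", "p", right="#"))  # kelba (b is not word-final)
```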

  9. Properties of Phonological Rules within the Generative Tradition • Rules are rewrite rules • Rules apply sequentially • Rules are ordered • Rules may act upon their own output (cyclic rules) • Effects of rules are not always reversible • Collections of rules have Turing power
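
Because rules apply sequentially, their order can change the result: one rule may create (feed) the context that another rule needs. The sketch below uses two made-up rules over abstract symbols and a hypothetical `cascade` helper to show the same input yielding different outputs under the two orderings.

```python
import re

# Two hypothetical rules on abstract symbols:
#   R1: a -> b / __ c
#   R2: c -> d / b __
R1 = (r"a(?=c)", "b")
R2 = (r"(?<=b)c", "d")


def cascade(word: str, rules: list[tuple[str, str]]) -> str:
    """Apply rewrite rules sequentially; each rule sees the previous output."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word


print(cascade("ac", [R1, R2]))  # bd: R1 creates the b that R2 needs
print(cascade("ac", [R2, R1]))  # bc: R2 applies too early and does nothing
```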

  10. C. Douglas Johnson (1972) • A theory of phonology with the right properties could be implemented using only finite-state machinery. • Each rule is associated with a finite-state transducer (FST). • All rules operate simultaneously, thus eliminating the delicate problems of ordering associated with sequential cascades of rules. • The collection of FS rules operating in parallel is mathematically equivalent to a single FST representing the intersection of the component FSTs. • Johnson's work was mainly theoretical: he was not concerned with computational issues, in particular with how to compute the intersection of multiple FSTs.
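
If each rule is treated as an automaton over an alphabet of lexical:surface symbol pairs (as in two-level morphology, where nulls keep both strings the same length), then running the rules in parallel amounts to intersecting those automata. The sketch below shows the standard product construction on two hypothetical DFAs over plain symbols `a` and `b`; with pair symbols as the alphabet, the same construction would apply unchanged.

```python
from collections import deque

# A DFA is (start, finals, delta) with delta[(state, symbol)] -> state.
# A missing transition means rejection.


def intersect(d1, d2, alphabet):
    """Product construction: the result accepts exactly what both DFAs accept."""
    start = (d1[0], d2[0])
    finals, delta = set(), {}
    seen, queue = {start}, deque([start])
    while queue:
        p, q = queue.popleft()
        if p in d1[1] and q in d2[1]:
            finals.add((p, q))
        for sym in alphabet:
            if (p, sym) in d1[2] and (q, sym) in d2[2]:
                nxt = (d1[2][(p, sym)], d2[2][(q, sym)])
                delta[((p, q), sym)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return start, finals, delta


def accepts(dfa, word):
    """Run a DFA over `word` and report whether it ends in a final state."""
    state = dfa[0]
    for sym in word:
        if (state, sym) not in dfa[2]:
            return False
        state = dfa[2][(state, sym)]
    return state in dfa[1]


# Two hypothetical constraints over the alphabet {a, b}
even_a = (0, {0}, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1})
ends_b = (0, {1}, {(s, c): (1 if c == "b" else 0) for s in (0, 1) for c in "ab"})
both = intersect(even_a, ends_b, "ab")
print([w for w in ["b", "ab", "aab", "aa"] if accepts(both, w)])  # ['b', 'aab']
```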

  11. Finite State Machinery • FS Automaton: for recognition and generation of regular languages. All operations over regular languages have corresponding operations over the corresponding FSAs. • FS Transducer: like an FSA, but with output as well as input; for recognition and generation of regular relations. Some operations over regular languages have no corresponding operation over FSTs (e.g. regular relations are not closed under intersection or complementation).
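
To make the contrast concrete: an FSA only answers yes or no, whereas an FST also writes output. The sketch below is a hypothetical hand-built nondeterministic transducer for the n to m rule (in + practical = impractical). It assumes the lexical nasal is written as an archiphoneme `N` (so ordinary n is left alone) and that `+` is realised as nothing; the simulation tracks every (state, output) configuration at once.

```python
FINAL = {0, 2}


def step(state, ch):
    """Yield (next_state, output) pairs for one lexical symbol.

    State 0: default, copying. State 1: emitted m for N, so a p must follow.
    State 2: emitted n for N, so a p must not follow. '+' (morpheme
    boundary) is realised as nothing on the surface.
    """
    if state == 0:
        if ch == "N":
            yield 1, "m"   # guess: a p follows
            yield 2, "n"   # guess: no p follows
        elif ch == "+":
            yield 0, ""
        else:
            yield 0, ch
    elif state == 1:
        if ch == "+":
            yield 1, ""
        elif ch == "p":
            yield 0, "p"
    elif state == 2:
        if ch == "+":
            yield 2, ""
        elif ch != "p":
            yield from step(0, ch)


def transduce(lexical):
    """Run the nondeterministic FST, returning every surface output."""
    configs = {(0, "")}
    for ch in lexical:
        configs = {(nxt, out + o) for state, out in configs
                   for nxt, o in step(state, ch)}
    return {out for state, out in configs if state in FINAL}


print(transduce("iN+practical"))  # {'impractical'}
print(transduce("iN+active"))     # {'inactive'}
```

Wrong guesses simply die out (state 1 without a following p, or state 2 followed by p), so only the correct surface form survives.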

  12. Kimmo Koskenniemi (1983) • Worked on the morphology of Finnish and came up with a system of finite-state transducers. • Came up with a computational framework for executing collections of finite-state transducers in parallel.

  13. Koskenniemi’s Model • [Figure: FST1, FST2, FST3, …, FSTn operating in parallel between the SURFACE STRING and the LEXICAL STRING; an interpreter executes them round-robin, keeping the FSTs in lock-step before moving the head]
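
The lock-step interpreter can be sketched as follows: on each lexical:surface pair, every rule machine advances by one step before the head moves on, and the pairing is accepted only if no machine rejects and all end in a final state. The two rules below (`boundary_rule`, `epenthesis_rule`) are hypothetical and heavily simplified (real e-insertion would also check for a preceding sibilant), with `0` as the null symbol.

```python
# Each hypothetical rule is (step, start, finals); step(state, pair) returns
# the next state, or None to reject. Pairs are (lexical, surface).


def boundary_rule(state, pair):
    """A morpheme boundary + is always realised as null on the surface."""
    lex, surf = pair
    return None if lex == "+" and surf != "0" else 0


def epenthesis_rule(state, pair):
    """Surface e may be inserted (0:e) only right after a +:0 boundary."""
    if pair == ("0", "e") and state != 1:
        return None
    return 1 if pair == ("+", "0") else 0


RULES = [(boundary_rule, 0, {0}), (epenthesis_rule, 0, {0, 1})]


def accepts(pairs):
    """Advance every rule in lock-step on each pair; all must accept."""
    states = [start for _, start, _ in RULES]
    for pair in pairs:
        for i, (rule, _, _) in enumerate(RULES):
            states[i] = rule(states[i], pair)
            if states[i] is None:
                return False
    return all(s in finals for s, (_, _, finals) in zip(states, RULES))


print(accepts(list(zip("church+0s", "church0es"))))  # True
print(accepts(list(zip("church0+s", "churche0s"))))  # False: e before boundary
```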

  14. Martin Kay and Ron Kaplan (1981) • Kay and Kaplan (both at Xerox PARC) were very interested in the computational issues underlying morphological processing. • In particular, they studied the problems of: • how to combine FSTs in parallel (computing the intersection of regular relations) • how to combine FSTs in series (computing the composition of FSTs). • Restrictions on rules have pleasant consequences.
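
The two combination operations can be illustrated on finite relations, where they reduce to plain set operations; FSTs compute the same operations for (possibly infinite) regular relations. The relations and helper names below are hypothetical toy data, with `N` as an assumed archiphoneme for the nasal.

```python
# Hypothetical finite relations as sets of (input, output) pairs. Real FSTs can
# encode infinite relations, but the definitions are the same.
Relation = set[tuple[str, str]]


def compose(r1: Relation, r2: Relation) -> Relation:
    """Series combination: x -> z whenever r1 maps x -> y and r2 maps y -> z."""
    return {(x, z) for x, y1 in r1 for y2, z in r2 if y1 == y2}


def intersect(r1: Relation, r2: Relation) -> Relation:
    """Parallel combination: pairs licensed by both relations."""
    return r1 & r2


nasal = {("iN+practical", "im+practical"), ("iN+active", "in+active")}
boundary = {("im+practical", "impractical"), ("in+active", "inactive")}
print(sorted(compose(nasal, boundary)))
# [('iN+active', 'inactive'), ('iN+practical', 'impractical')]
print(intersect({("rake+ing", "raking"), ("rake+ing", "rakeing")},
                {("rake+ing", "raking")}))
# {('rake+ing', 'raking')}
```

For finite sets both operations always succeed; for infinite regular relations, composition is always computable by FSTs, but intersection is only guaranteed for restricted classes (such as equal-length relations), which is why restrictions on rules matter.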

  15. Restrictions on Rules • With the restriction that a rule shall not apply to its own output, Kaplan and Kay showed that the result of combining the corresponding relations under the operations of intersection, composition and union remains within a closed subclass of those computable by FSTs. • They then spent many years designing and implementing a calculus, based on regular expressions, for describing and combining FSTs.

  16. Summary • [Diagram nodes: Chomsky; Generative Tradition; Generative Phonology; Johnson; Parallel Rules; Multilevel Cascades of Rules; Koskenniemi; Parallel Rules; KIMMO; PC-Kimmo; Xerox Tools (xfst/twolc/lexc); Kaplan/Kay Calculus]
