
CSA4050 Advanced Topics in NLP

CSA4050 Advanced Topics in NLP. Non-Concatenative Morphology: Reduplication, Interdigitation. Reference: Ken Beesley and Lauri Karttunen, Finite State Non-Concatenative Morphotactics, Proceedings of SIGPHON-2000. Koskenniemi 1983.


Presentation Transcript


  1. CSA4050 Advanced Topics in NLP: Non-Concatenative Morphology (Reduplication, Interdigitation). Computational Morphology VI

  2. Reference: Ken Beesley and Lauri Karttunen, Finite State Non-Concatenative Morphotactics, Proceedings of SIGPHON-2000

  3. Koskenniemi 1983: "Only restricted infixation and reduplication can be handled adequately with the present system. Some extensions or revisions will be necessary for an adequate description of languages possessing extensive infixation or reduplication"

  4. Non-Concatenative Languages • Most languages build words by stringing together morphemes like beads on a string. • The word-building processes of prefixation and suffixation can be straightforwardly modeled in finite state terms by concatenation. • But some languages also exhibit non-concatenative morphotactics.

  5. Non-Concatenative Phenomena 1. Reduplication • In Malay: bagi (bag) → bagi-bagi (bags) • Although this may appear concatenative, it does not involve concatenating a predictable morpheme like "s". Instead the entire stem is copied, no matter what its length. • In general the language class {ww | w ∈ L} is context-sensitive, but if L is finite, we can construct a finite-state network that encodes it.

  6. General Solution for Reduplication • Therefore, assuming the number of words subject to reduplication is finite, it is possible to construct a lexical transducer for languages like Malay. • To handle reduplication, a new operator ^n is introduced: • A^n denotes n concatenations of A.
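
The ^n operator and the finite-lexicon argument can be illustrated with a minimal Python sketch. This is not the Xerox implementation: the helper names `power` and `reduplicate_lexicon` are hypothetical, and the second stem `buku` is added here only as an illustrative assumption. Because the lexicon is finite, the set of reduplicated forms is finite too and can simply be listed.

```python
def power(a: str, n: int) -> str:
    """Return n concatenations of the string a (the A^n of the slide)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return a * n


def reduplicate_lexicon(stems):
    """Map every stem w of a finite lexicon L to its reduplicated form ww."""
    return {w: power(w, 2) for w in stems}


# Illustrative lexicon; "buku" is an assumption, not taken from the slides.
lexicon = ["bagi", "buku"]
plurals = reduplicate_lexicon(lexicon)
print(plurals)
```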

  7. Remarks from Beesley on Context Sensitivity • Finite-state grammars cannot handle unlimited nesting or non-nested terminal dependencies. • Context-free grammars can handle unlimited nesting, such as matched parentheses in arithmetic expressions, but cannot handle non-nested dependencies between terminals. • Context-sensitive grammars can also handle non-nested dependencies between terminals, as in dogdog, where terminal elements 1 and 4 have to be the same, 2 and 5 have to be the same, and 3 and 6 have to be the same. These dependencies cross, so they are not nested.

  8. Non-Concatenation 2. Interdigitation • In Arabic and Maltese, prefixes and suffixes attach to stems in the usual concatenative way, but stems themselves are formed by a process known as interdigitation. • An example occurs with the Arabic stem "katab" (wrote). • This stem is composed of three elements: • the all-consonant root ktb • an abstract consonant-vowel template CVCVC • a vocalisation aa (in this case signifying perfect tense and active voice)

  9. Interdigitation • The same root ktb can combine with the same template CVCVC and a different vocalism ui (signifying perfect aspect and passive voice) to produce "kutib" (was written). • The same root ktb can combine with a different template CVVCVC and the vocalism ui to produce "kuutib", another form of the verb.
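
The three examples above (katab, kutib, kuutib) can be reproduced with a small Python sketch. The function name `interdigitate` is hypothetical, and the rule used when the template has more V slots than the vocalism has vowels (repeat the first vowel) is an assumption chosen to match "kuutib"; it is not a general account of Arabic morphology.

```python
def interdigitate(root: str, template: str, vocalism: str) -> str:
    """Fill the C slots of template from root and the V slots from vocalism."""
    if set(template) - {"C", "V"}:
        raise ValueError("template may contain only C and V")
    n_c, n_v = template.count("C"), template.count("V")
    if len(root) != n_c:
        raise ValueError("root must supply one consonant per C slot")
    if not 1 <= len(vocalism) <= n_v:
        raise ValueError("vocalism must have between 1 and n_v vowels")
    # Assumption: spread the first vowel over any surplus V slots.
    vowels = vocalism[0] * (n_v - len(vocalism)) + vocalism
    consonants, vowel_iter = iter(root), iter(vowels)
    return "".join(
        next(consonants) if slot == "C" else next(vowel_iter)
        for slot in template
    )


print(interdigitate("ktb", "CVCVC", "aa"))
print(interdigitate("ktb", "CVCVC", "ui"))
print(interdigitate("ktb", "CVVCVC", "ui"))
```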

  10. Intermediate Result: Template + Root = d V V r V s

  11. Final Result: Intermediate Result + Vocalism = d u u r i s

  12. Merge • In this case the filler language contains an infinite set of strings (i, ui, uui, …), but only one path can be constructed because all strings end in i. Hence the earlier vowels must be "u". • This need not always be the case (e.g. if the filler language were u*i*).
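
The uniqueness claim can be checked by brute force. The sketch below is not the Xerox merge algorithm: it assumes the filler language can be written as a Python regular expression over the vowels u and i, and the name `merge_vowels` is hypothetical. It enumerates every vowel string of the right length, keeps those in the filler language, and fills the V slots of the intermediate result.

```python
import re
from itertools import product


def merge_vowels(intermediate: str, filler_regex: str, alphabet: str = "ui"):
    """Return every way of filling the V slots with a string of the filler language."""
    n_slots = intermediate.count("V")
    results = []
    for candidate in product(alphabet, repeat=n_slots):
        vowels = "".join(candidate)
        if re.fullmatch(filler_regex, vowels):
            fill = iter(vowels)
            results.append(
                "".join(next(fill) if ch == "V" else ch for ch in intermediate)
            )
    return results


print(merge_vowels("dVVrVs", "u*i"))    # filler language u* i
print(merge_vowels("dVVrVs", "u*i*"))   # filler language u* i*
```

With the filler u* i only one string of length three qualifies, so a single path results; with u*i* several fillers of the same length qualify and the merge is no longer unique.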

  13. Merge Operators • To bring the merge operation into the Xerox calculus, two new operators, .<m. and .m>., have been introduced. • These differ only in the order of arguments. • [T .<m. F] and [F .m>. T] represent the same merge operation with F and T as filler and template respectively.

  14. The Composite Transducer • With these operators, the network above can be compiled using the following expression: [d r s] .m>. [C V V C V C] .<m. [u* i]

  15. Merge [figure: the root d r s and the vocalism (u, i) are merged into the template C V V C V C]

  16. Compile-Replace • Regular expressions are compiled into networks as usual, but in addition, the compiler is then applied to its own output. • Central idea: transduce to a language that has the format of regular expressions. • The compile-replace algorithm then replaces the regular expression with the result of its own compilation.

  17. Compile-Replace: Simple Example • Network: 0:^[ a * 0:^] • This network maps the string a* to ^[ a* ^] (i.e. the same RE but with special delimiters). • Applying CR to the lower side of the network eliminates the markers, compiles the RE a*, and maps the upper side to the language resulting from the compilation.

  18. The result of compiling ^[ a* ^] [figure: network with arc labels *:a, 0:a, a, *:0, a:0, *:0] • To answer the question "what does this network do?", figure out what it does in the upward and downward directions.

  19. The result of compiling ^[ a* ^] [figure: network with arc labels *:a, 0:a, a, *:0, a:0, *:0] • When applied in the upward direction, this transducer maps any string of the infinite a* language into the regular expression from which it was compiled. • When applied in the downward direction, it maps the string a* to all the strings in the language a*: {0, a, aa, ...}, where 0 is the empty string.

  20. Compile-Replace: 1 [figure: path 0:^[ a:a *:* 0:^]] • Copy the input path to the output path until ^[ is encountered on the indicated (in our case lower) side of the network. • Extract the path until the closing delimiter ^].

  21. Compile-Replace: 2 • Symbols along the indicated side are concatenated into a string and eliminated from the path, leaving just the symbols on the opposite side. The remaining net is the path a * [figure]. • The extracted string is compiled into a second network using the standard network compiler [figure: network with arc a].

  22. Compile-Replace: 3 • The two networks are combined using the cross-product operator. • The result [figure: network with arc labels *:a, 0:a, a, *:0, a:0, *:0] is spliced between the origin and destination states of the regular expression path.
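
The three steps can be walked through on the a* example with a toy Python sketch. It is not the Xerox algorithm: a path is represented as a list of (upper, lower) symbol pairs, "compilation" is approximated by enumerating the strings of the language up to a length bound, and the extracted expression is assumed to also be a valid Python regular expression (true for a*, not for xfst syntax in general). The names `compile_replace_path` and `EPS` are hypothetical.

```python
import re
from itertools import product

EPS = "0"  # the slides write the empty symbol as 0


def compile_replace_path(path, alphabet="a", max_len=3):
    """Apply the three compile-replace steps to one path, on its lower side."""
    lowers = [lower for _, lower in path]
    # Step 1: locate the delimited regular-expression path.
    start, end = lowers.index("^["), lowers.index("^]")
    inner = path[start + 1:end]
    # Step 2: concatenate the lower symbols into a string and keep
    # only the upper symbols as the remaining path.
    regex = "".join(lower for _, lower in inner if lower != EPS)
    upper = "".join(up for up, _ in inner if up != EPS)
    # "Compile" the extracted string: enumerate its language up to max_len.
    language = []
    for n in range(max_len + 1):
        for candidate in product(alphabet, repeat=n):
            s = "".join(candidate)
            if re.fullmatch(regex, s):
                language.append(s)
    # Step 3: cross product of the remaining upper string with the language.
    return [(upper, s) for s in language]


path = [("0", "^["), ("a", "a"), ("*", "*"), ("0", "^]")]
print(compile_replace_path(path))
```

Each resulting pair relates the upper string a* to one string of the language it denotes, which mirrors the upward and downward behaviour described for the compiled network.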

  23. Reduplication Revisited • Applying compile-replace to this transducer: Lexical: b a g i +Noun +Plural; Surface: ^[ [b a g i] ^ 2 ^] • yields this one: Lexical: b a g i +Noun +Plural; Surface: b a g i b a g i
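
The effect on the surface side can be imitated with a short Python sketch. It handles only the one expression shape used here, `[ symbols ] ^ n` between the ^[ and ^] delimiters, and is an assumption-laden toy rather than the xfst compiler; the names `evaluate` and `compile_replace_surface` are hypothetical.

```python
import re

# A delimited span: ^[ ... ^]
DELIMITED = re.compile(r"\^\[(.*?)\^\]")
# The only expression shape supported by this toy: [ s1 s2 ... ] ^ n
POWER = re.compile(r"\s*\[([^\]]*)\]\s*\^\s*(\d+)\s*")


def evaluate(expression: str) -> str:
    """Evaluate '[b a g i] ^ 2' to 'b a g i b a g i'."""
    match = POWER.fullmatch(expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    symbols = match.group(1).split()
    return " ".join(symbols * int(match.group(2)))


def compile_replace_surface(surface: str) -> str:
    """Replace every delimited expression by the string it denotes."""
    return DELIMITED.sub(lambda m: evaluate(m.group(1)), surface)


print(compile_replace_surface("^[ [b a g i] ^ 2 ^]"))
```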

  24. Interdigitation Revisited • Applying compile-replace to this transducer: Up: k i t e b +Verb +Past +3Sg; Down: [k t b] .m>. [C V C V C] .<m. [i e] • yields this one: Up: k i t e b +Verb +Past +3Sg; Down: k i t e b

  25. Remember: Two Central Problems • Morphotactics: constraints on combinations of morphemes governing the formation of valid words, e.g. unbelievable vs. believeunable. • Phonological/Orthographical Alternation (spelling rules): how morphemes are realised in particular environments, e.g. fly + s = flies.

  26. Xerox Perspective • Morphotactics: handle with lexc. • Phonological/Orthographical Alternation (spelling rules): handle with xfst. [figure: lexc compiles the Morphotactics into a Lexicon FST; xfst compiles the Alternations into a Rules FST; the two are composed with .o. to give the Lexical Transducer]
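
The division of labour can be imitated in Python with a dictionary standing in for the lexicon and a rewrite function standing in for the alternation rules; composing the two plays the role of .o. here. Everything in this sketch is an illustrative assumption: the tag spelling `+Noun+Plural`, the `^` morpheme boundary, the entry for `cat`, and the single y-to-ie rule are not taken from lexc or xfst.

```python
import re

# Stand-in for the lexicon FST: lexical string -> intermediate string.
# The entries and the ^ boundary symbol are hypothetical.
LEXICON = {
    "fly+Noun+Plural": "fly^s",
    "cat+Noun+Plural": "cat^s",
}


def apply_rules(intermediate: str) -> str:
    """Stand-in for the rules FST: y becomes ie before ^s after a consonant."""
    rewritten = re.sub(r"(?<=[^aeiou])y\^s$", "ies", intermediate)
    return rewritten.replace("^", "")  # drop any remaining boundary


def generate(lexical: str) -> str:
    """Composition of the two stand-ins: lexicon first, then rules."""
    return apply_rules(LEXICON[lexical])


print(generate("fly+Noun+Plural"))
print(generate("cat+Noun+Plural"))
```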
