
Composition is Our Friend




  1. Composition is Our Friend. Wednesday PM. Kenneth R. Beesley, Xerox Research Centre Europe

  2. View composition vertically
      p a t + i n + a d + i m + a b        Underlying form
      e -> i || _ .#.                      Rule 1
      p a t + i n + a d + i m + a b        Intermediate form
      d -> j, t -> c || _ ("+") i          Rule 2
      p a c + i n + a j + i m + a b        Intermediate form
      b -> p, d -> t, g -> k || _ .#.      Rule n
      p a c + i n + a j + i m + a p        Final form
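The cascade on this slide can be mimicked in a few lines of Python. This is a minimal sketch, not xfst: each rule is a hypothetical function built on `re.sub`, the spaces between symbols are dropped, and the boundary `.#.` is assumed to mean the end of the string. The shortcut works here only because every rule target is a single symbol.

```python
import re

# Hypothetical stand-ins for the slide's xfst rules (not xfst itself).
# Strings are written without spaces; ".#." is modeled as end-of-string.

def rule1(s: str) -> str:
    """e -> i || _ .#.  (word-final e becomes i)"""
    return re.sub(r"e$", "i", s)

def rule2(s: str) -> str:
    """d -> j, t -> c || _ ("+") i  (both mappings applied in one pass)"""
    return re.sub(r"[dt](?=\+?i)", lambda m: {"d": "j", "t": "c"}[m.group()], s)

def rule_n(s: str) -> str:
    """b -> p, d -> t, g -> k || _ .#.  (word-final devoicing)"""
    return re.sub(r"[bdg]$", lambda m: {"b": "p", "d": "t", "g": "k"}[m.group()], s)

form = "pat+in+ad+im+ab"          # underlying form
history = [form]
for rule in (rule1, rule2, rule_n):
    form = rule(form)             # each output feeds the next rule
    history.append(form)

print(history)
# ['pat+in+ad+im+ab', 'pat+in+ad+im+ab', 'pac+in+aj+im+ab', 'pac+in+aj+im+ap']
```

Rule 1 leaves this string unchanged because it has no final e, but its output is still passed on to Rule 2, just as in the vertical picture.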

  3. View composition vertically
      p a t + i n + a d + i m + a b
          e -> i || _ .#.
          .o.
          d -> j, t -> c || _ ("+") i        A Single FST
          .o.
          b -> p, d -> t, g -> k || _ .#.
      p a c + i n + a j + i m + a p
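The same cascade can be folded into one function that is applied once. This Python sketch is only an analogy: the rules are hypothetical `re.sub` stand-ins, and plain function composition works because each of them is assumed to be deterministic, whereas xfst's `.o.` composes arbitrary relations and compiles the result into a single network.

```python
import re
from functools import reduce
from typing import Callable

Rule = Callable[[str], str]

def rule1(s: str) -> str:
    """e -> i || _ .#."""
    return re.sub(r"e$", "i", s)

def rule2(s: str) -> str:
    """d -> j, t -> c || _ ("+") i"""
    return re.sub(r"[dt](?=\+?i)", lambda m: {"d": "j", "t": "c"}[m.group()], s)

def rule_n(s: str) -> str:
    """b -> p, d -> t, g -> k || _ .#."""
    return re.sub(r"[bdg]$", lambda m: {"b": "p", "d": "t", "g": "k"}[m.group()], s)

def compose(*rules: Rule) -> Rule:
    """Chain rules in the order written, as with .o. (first rule applies first)."""
    def identity(s: str) -> str:
        return s
    return reduce(lambda f, g: (lambda s: g(f(s))), rules, identity)

single_fst = compose(rule1, rule2, rule_n)   # one function, applied once
print(single_fst("pat+in+ad+im+ab"))         # pac+in+aj+im+ap
```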

  4. Composition is Our Friend The composition operation is often the key to building, modifying, filtering and testing finite-state systems.

  5. You Can Compose Transducers
      • Regular languages (and the networks that encode them) can be unioned, concatenated, intersected, subtracted and complemented.
      • Regular relations (and the transducers that encode them) can be unioned and concatenated.
      • But you cannot, in general, intersect, complement, or subtract transducers (relations). This is a mathematical restriction: regular relations are not closed under these operations.
      • But you can compose transducers.
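A minimal sketch of what union, concatenation and composition mean, assuming a relation is modeled as a finite Python set of (upper, lower) string pairs. Finite relations are trivially closed under every set operation, intersection included, so this toy cannot reproduce the non-closure result; it only illustrates the definitions. The toy pairs are hypothetical.

```python
from typing import Set, Tuple

# A finite relation: a set of (upper, lower) string pairs.
Relation = Set[Tuple[str, str]]

def union(r1: Relation, r2: Relation) -> Relation:
    return r1 | r2

def concat(r1: Relation, r2: Relation) -> Relation:
    # Upper sides are concatenated with upper sides, lower with lower.
    return {(u1 + u2, l1 + l2) for (u1, l1) in r1 for (u2, l2) in r2}

def compose(r1: Relation, r2: Relation) -> Relation:
    # (x, z) is in r1 .o. r2 if some middle string y links (x, y) and (y, z).
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

stem = {("cat", "cat")}                      # hypothetical toy pairs
plural = {("+Pl", "s")}
shout = {("cats", "CATS"), ("cat", "CAT")}

print(sorted(union(stem, plural)))           # [('+Pl', 's'), ('cat', 'cat')]
print(concat(stem, plural))                  # {('cat+Pl', 'cats')}
print(compose(concat(stem, plural), shout))  # {('cat+Pl', 'CATS')}
```

Note that composition matches the lower side of the first relation against the upper side of the second, which is why the order of the operands matters.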

  6. An Example for the Mathematicians
      • Regular relations are not closed under intersection (&), subtraction (-) or complementation (~).
      • This means that when you intersect, subtract or complement regular relations, the result may no longer be regular. That is, the result may no longer be finite-state, and so cannot be encoded as a finite-state network.
      • The following example is based on intersection.

  7. Intersection of Two Finite-State Relations
      FST A: [ a:b ]* [ 0:c ]*
      • On the upper side, some number n of a's
      • On the lower side, n b's, followed by any number of c's
      FST B: [ 0:b ]* [ a:c ]*
      • On the upper side, some number n of a's
      • On the lower side, any number of b's, followed by n c's

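  8. Attempted Intersection of Two Finite-State Relations (FSTs)
      The pairs shared by FST A and FST B are exactly those with n a's on the upper side and n b's followed by n c's on the lower side:
          (ε, ε), (a, bc), (aa, bbcc), (aaa, bbbccc), (aaaa, bbbbcccc), ...
      The lower-side language of the resulting relation is therefore $b^n c^n$, and the $b^n c^n$ language is known to be context-free in power (i.e. beyond finite-state power).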
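The intersection can be checked empirically with a bounded enumeration in Python. This is a sketch under an explicit assumption: the real relations are infinite, so the code lists only pairs with n and m up to an arbitrary bound N. It illustrates, but cannot prove, that the shared pairs are exactly ($a^n$, $b^n c^n$).

```python
N = 6  # arbitrary bound; the real relations are infinite

# FST A: [ a:b ]* [ 0:c ]*  relates a^n to b^n c^m
A = {("a" * n, "b" * n + "c" * m) for n in range(N + 1) for m in range(N + 1)}

# FST B: [ 0:b ]* [ a:c ]*  relates a^n to b^m c^n
B = {("a" * n, "b" * m + "c" * n) for n in range(N + 1) for m in range(N + 1)}

shared = A & B  # pairs that belong to both relations
lower_side = sorted((lower for _, lower in shared), key=len)
print(lower_side[:4])  # ['', 'bc', 'bbcc', 'bbbccc']
```

Recognizing $b^n c^n$ for unbounded n requires counting, which no finite-state network can do; that is why the intersection is not a regular relation.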

  9. Back Down to Earth
      • Just be aware that transducers cannot, in general, be intersected, subtracted, or complemented.
      • But transducers can be unioned, concatenated, and composed.
      • Composition is often the key operation for modifying, filtering, and combining transducers.

  10. Phonological/Orthographical Rules
      “Application” of rules via composition is already familiar to us:
          Lexicon FST (lexc)
          .o.
          Rule 1
          .o.
          Rule 2
          .o.
          Rule n

  11. Orthographical Modification via Composition
      Standard German spelling uses ü, ö, ä and ß. An alternative orthography, used where these letters are not available, replaces them with “ue”, “oe”, “ae” and “ss” respectively.
          StandardGermanFST, with ü, ö, ä and ß on the lower side (e.g. läßt)
          .o.
          [ ü -> u e , ö -> o e , ä -> a e , ß -> s s ]
          = ModifiedGermanFST, with ue, oe, ae and ss on the lower side (e.g. laesst)
      How would we modify StandardGermanFST to analyze both über and ueber, and all of läßt, laesst, laeßt and lässt?
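A minimal Python sketch of composing the rewrite on the bottom, assuming a hypothetical toy StandardGermanFST with just two (upper, lower) pairs; the analysis strings on the upper side are invented for illustration, and the letters are assumed to be single precomposed characters. Because the slide's rule is obligatory, each lower-side word gets exactly one rewritten form, so a per-character lookup is enough.

```python
# Obligatory rewrites from the slide: [ ü -> u e , ö -> o e , ä -> a e , ß -> s s ]
TABLE = {"ü": "ue", "ö": "oe", "ä": "ae", "ß": "ss"}

def transliterate(word: str) -> str:
    """Rewrite every occurrence; other characters pass through unchanged."""
    return "".join(TABLE.get(ch, ch) for ch in word)

# Hypothetical toy StandardGermanFST; the upper-side strings are invented.
standard_german = {("lassen+3Sg", "läßt"), ("über+Prep", "über")}

# StandardGermanFST .o. Rule: only the lower side is rewritten.
modified_german = {(upper, transliterate(lower)) for upper, lower in standard_german}
print(sorted(modified_german))
# [('lassen+3Sg', 'laesst'), ('über+Prep', 'ueber')]
```

The upper side, including the ü in über+Prep, is untouched, so ModifiedGermanFST still returns the standard analyses.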

  12. Composition: top and bottom
      If you compose a rule on the bottom of an FST, it modifies only the lower-side language of the FST:
          CoreFST .o. Rule
      If you compose a rule on the top of an FST, it modifies only the upper-side language of the FST:
          Rule .o. CoreFST
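The asymmetry can be sketched in Python with a hypothetical one-pair core and a hypothetical "rule" that simply upper-cases its input. For the top case the function is read upward, taking the core's upper string and returning the new upper string, as an `A <- B` rule does; this is an assumption of the toy, not a general treatment of composition.

```python
from typing import Callable, Set, Tuple

Relation = Set[Tuple[str, str]]
Rewrite = Callable[[str], str]

def compose_below(core: Relation, rule: Rewrite) -> Relation:
    """CoreFST .o. Rule: the rule rewrites the core's lower side."""
    return {(upper, rule(lower)) for upper, lower in core}

def compose_above(rule: Rewrite, core: Relation) -> Relation:
    """Rule .o. CoreFST: the rule, read upward, rewrites the core's upper side."""
    return {(rule(upper), lower) for upper, lower in core}

core = {("dog+Pl", "dogs")}   # hypothetical toy CoreFST
shout = str.upper             # hypothetical rewrite "rule"

print(compose_below(core, shout))  # {('dog+Pl', 'DOGS')}
print(compose_above(shout, core))  # {('DOG+PL', 'dogs')}
```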

  13. Change a Tagname on the Upper Side via Composition
      An example of composition on the upper side:
          casa[Subst][Fem][Pl]
          "[Subst]" <- "[Noun]"
          .o.
          casa[Noun][Fem][Pl]      Baseform+Tags language
          Core Lexicon
          casas                    surface-word language
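A small Python sketch of the tag renaming, assuming a hypothetical two-entry Core Lexicon and treating the multicharacter tag as a plain substring. The function plays the role of the upward rule composed on top: it sees the core's upper side and produces the new upper side, while the surface words are unaffected.

```python
# Hypothetical toy Core Lexicon: (Baseform+Tags, surface word) pairs.
core_lexicon = {("casa[Noun][Fem][Pl]", "casas"), ("casa[Noun][Fem][Sg]", "casa")}

def rename_tag(upper: str) -> str:
    """Plays the role of "[Subst]" <- "[Noun]" composed on top of the core."""
    return upper.replace("[Noun]", "[Subst]")

renamed = {(rename_tag(upper), lower) for upper, lower in core_lexicon}

# Looking up the surface word "casas" now yields the renamed analysis.
print([upper for upper, lower in renamed if lower == "casas"])
# ['casa[Subst][Fem][Pl]']
```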

  14. Simple Filtering to Facilitate Testing
      Take a “lexical transducer” and remove everything but the adjectives. When a simple language is used in composition, it is automatically treated like an identity relation.
          $"[Adj]"
          .o.
          Core Lexicon (upper side: Baseform+Tags language; lower side: surface-word language)
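The identity-relation trick can be sketched in Python, assuming a hypothetical four-entry lexical transducer and treating the multicharacter tag `[Adj]` as a substring. Since `$"[Adj]"` is an infinite language, the sketch builds only the part of it that can meet the lexicon's upper side, which is all that matters for the composition.

```python
from typing import Set, Tuple

Relation = Set[Tuple[str, str]]

def identity(language: Set[str]) -> Relation:
    """A simple language used in composition acts as its identity relation."""
    return {(s, s) for s in language}

def compose(r1: Relation, r2: Relation) -> Relation:
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

# Hypothetical toy lexical transducer: (Baseform+Tags, surface word) pairs.
lexicon = {("friendly[Adj]", "friendly"), ("friend[Noun][Pl]", "friends"),
           ("red[Adj]", "red"), ("run[Verb][3Sg]", "runs")}

# Finite stand-in for $"[Adj]": upper-side strings that contain [Adj].
contains_adj = {u for u, _ in lexicon if "[Adj]" in u}

adjectives_only = compose(identity(contains_adj), lexicon)
print(sorted(adjectives_only))
# [('friendly[Adj]', 'friendly'), ('red[Adj]', 'red')]
```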

  15. Simple Filtering II
      Take a lexical transducer and remove the adjectives (leave the rest).
          ~$"[Adj]"
          .o.
          Core Lexicon (upper side: Baseform+Tags language; lower side: surface-word language)
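Filtering with a language and with its complement should split the lexicon into two disjoint parts. A small Python check, assuming a hypothetical toy lexicon and treating `[Adj]` as a substring. Complementing the language `$"[Adj]"` is safe because regular languages, unlike regular relations, are closed under complementation.

```python
# Hypothetical toy lexical transducer: (Baseform+Tags, surface word) pairs.
lexicon = {("friendly[Adj]", "friendly"), ("friend[Noun][Pl]", "friends"),
           ("red[Adj]", "red"), ("run[Verb][3Sg]", "runs")}

# $"[Adj]" .o. lexicon   versus   ~$"[Adj]" .o. lexicon
adjectives = {(u, l) for u, l in lexicon if "[Adj]" in u}
non_adjectives = {(u, l) for u, l in lexicon if "[Adj]" not in u}

# The two filtered lexicons are disjoint and together give back the original.
assert adjectives | non_adjectives == lexicon
assert not adjectives & non_adjectives
print(sorted(non_adjectives))
# [('friend[Noun][Pl]', 'friends'), ('run[Verb][3Sg]', 'runs')]
```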

  16. Simple Filtering III
      Take an English lexical transducer and restrict it to contain only adjectives that end in -ly.
          $"[Adj]"
          .o.
          Core Lexicon (upper side: Baseform+Tags language; lower side: surface-word language)
          .o.
          ?* l y
      Resulting surface words: friendly, lovely, cowardly, dastardly, …
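Here one filter sits on top and another on the bottom. A Python sketch, assuming a hypothetical five-entry English lexicon: the top filter tests the upper (Baseform+Tags) side and the bottom filter tests the lower (surface) side, and a pair survives only if it passes both.

```python
# Hypothetical toy English lexical transducer: (Baseform+Tags, surface word).
lexicon = {("friendly[Adj]", "friendly"), ("lovely[Adj]", "lovely"),
           ("red[Adj]", "red"), ("quickly[Adv]", "quickly"),
           ("fly[Noun][Sg]", "fly")}

def top_ok(upper: str) -> bool:
    return "[Adj]" in upper          # plays the role of $"[Adj]" on top

def bottom_ok(lower: str) -> bool:
    return lower.endswith("ly")      # plays the role of ?* l y on the bottom

ly_adjectives = {(u, l) for u, l in lexicon if top_ok(u) and bottom_ok(l)}
print(sorted(l for _, l in ly_adjectives))  # ['friendly', 'lovely']
```

Each filter removes something the other lets through: red fails only the bottom filter, while quickly and fly fail only the top one.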

  17. Mindtuning for Finite-State Development
      • Try to imagine all the possible uses/users of your system.
      • Try to create a core system that may, by itself, serve nobody, but which, via filtering, may serve in multiple systems.
      • If it seems that you have to decide between choice A and choice B, try to create a single core system, with one set of source files, that supports both A and B, for example:
          • Language dialects
          • Spelling dialects
          • Spelling relaxations

  18. Language Dialects: equivalent ways to start
      First way:
          Multichar_Symbols ^A ^B +Sg +Pl
          LEXICON Root
          Nouns ;
          LEXICON Nouns
          jail^A:jail N ;
          gaol^B:gaol N ;
          dog N ;
          LEXICON N
          +Sg:0 # ;
          +Pl:s # ;
      Second way (regular-expression entries):
          LEXICON Root
          Nouns ;
          LEXICON Nouns
          < j a i l %^A:0 > N ;
          < g a o l %^B:0 > N ;
          dog N ;
          LEXICON N
          < %+Sg:0 > # ;
          < %+Pl:s > # ;
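To see which string pairs this lexc source defines, here is a hypothetical Python rendering of its three LEXICONs with a recursive enumerator. It is a sketch, not lexc: each entry is assumed to be an (upper, lower, continuation) triple, `None` stands in for `#`, and multicharacter symbols such as ^A and +Sg are kept as plain substrings.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical rendering of the lexc source: (upper, lower, continuation).
LEXICONS: Dict[str, List[Tuple[str, str, Optional[str]]]] = {
    "Root":  [("", "", "Nouns")],
    "Nouns": [("jail^A", "jail", "N"), ("gaol^B", "gaol", "N"), ("dog", "dog", "N")],
    "N":     [("+Sg", "", None), ("+Pl", "s", None)],   # None plays the role of #
}

def paths(name: str = "Root") -> Set[Tuple[str, str]]:
    """Enumerate every (upper, lower) pair reachable from LEXICON `name`."""
    result: Set[Tuple[str, str]] = set()
    for upper, lower, cont in LEXICONS[name]:
        if cont is None:
            result.add((upper, lower))
        else:
            result |= {(upper + u, lower + l) for u, l in paths(cont)}
    return result

for pair in sorted(paths()):
    print(pair)
```

The result has six pairs, from ('dog+Pl', 'dogs') to ('jail^A+Sg', 'jail'); the dialect tag survives only on the upper side.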

  19. One Core, Several Final Products
      To leave both American and British words in the lexicon, just remove the dialect tags, mapping them to the empty string.
          0 <- %^A
          .o.
          0 <- %^B
          .o.
          CommonCoreFST
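A Python sketch of this step, assuming a hypothetical CommonCoreFST made of the six pairs that the lexc source on slide 18 defines, with the dialect tags treated as plain substrings. The two `0 <- ...` rules composed on top become tag deletion on the upper side; no pair is lost.

```python
# Hypothetical CommonCoreFST: the six pairs defined by the lexc source on
# slide 18, with the dialect tags ^A / ^B treated as plain substrings.
core = {("jail^A+Sg", "jail"), ("jail^A+Pl", "jails"),
        ("gaol^B+Sg", "gaol"), ("gaol^B+Pl", "gaols"),
        ("dog+Sg", "dog"), ("dog+Pl", "dogs")}

def strip_dialect_tags(upper: str) -> str:
    """0 <- %^A .o. 0 <- %^B : both tags become the empty string."""
    return upper.replace("^A", "").replace("^B", "")

both_dialects = {(strip_dialect_tags(u), l) for u, l in core}
print(sorted(u for u, _ in both_dialects))
# ['dog+Pl', 'dog+Sg', 'gaol+Pl', 'gaol+Sg', 'jail+Pl', 'jail+Sg']
```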

  20. One Core, Several Products
      To leave just British (and common) words in the lexicon, filter out the exclusively American words. Two equivalent ways:
          0 <- %^B .o. ~$[%^A] .o. CommonCoreFST
          0 <- %^B .o. ~[?*] <- %^A .o. CommonCoreFST
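Why are the two formulas equivalent? `~[?*]` is the empty language, so `~[?*] <- %^A` demands that every ^A be replaced by a string from an empty set, which is impossible; paths containing ^A therefore have no image and vanish, exactly as with the filter `~$[%^A]`. A Python sketch under the same assumptions as before (a hypothetical six-pair core, tags as substrings) checks that both routes give the same result.

```python
from typing import Set

# Hypothetical CommonCoreFST: the six pairs defined by the lexc source on
# slide 18, with the dialect tags ^A / ^B treated as plain substrings.
core = {("jail^A+Sg", "jail"), ("jail^A+Pl", "jails"),
        ("gaol^B+Sg", "gaol"), ("gaol^B+Pl", "gaols"),
        ("dog+Sg", "dog"), ("dog+Pl", "dogs")}

def delete_tag(upper: str, tag: str) -> str:
    return upper.replace(tag, "")

# Way 1: 0 <- %^B .o. ~$[%^A] .o. CommonCoreFST
# The filter ~$[%^A] keeps only paths whose upper side has no ^A.
way1 = {(delete_tag(u, "^B"), l) for u, l in core if "^A" not in u}

# Way 2: 0 <- %^B .o. ~[?*] <- %^A .o. CommonCoreFST
# ~[?*] is the empty language, so a string containing ^A has no image at all.
def to_empty_language(upper: str, tag: str) -> Set[str]:
    return set() if tag in upper else {upper}

way2 = {(delete_tag(v, "^B"), l)
        for u, l in core for v in to_empty_language(u, "^A")}

assert way1 == way2
print(sorted(way1))
# [('dog+Pl', 'dogs'), ('dog+Sg', 'dog'), ('gaol+Pl', 'gaols'), ('gaol+Sg', 'gaol')]
```

Swapping the roles of "^A" and "^B" gives the American-only product of the next slide.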

  21. One Core, Several Products
      To leave just American (and common) words in the lexicon, filter out the exclusively British words. Two equivalent ways:
          0 <- %^A .o. ~$[%^B] .o. CommonCoreFST
          0 <- %^A .o. ~[?*] <- %^B .o. CommonCoreFST

  22. Vulgar/Slang/Substandard
      Use similar feature symbols on the lexical side, e.g. ^V for vulgar words, ^S for slang and ^D for substandard forms. Then filter them out as necessary, via composition, for each version of the final product.

  23. Spelling Distinctions
      If one dialect makes a spelling distinction and another ignores it, build your core system to show the distinction.
          lingüístico Adj ;
      This is the Spanish spelling used in Latin America. Then for Spain, where the ü is not used, modify the core trivially via composition on both sides:
          u <- ü
          .o.
          CommonCoreFST
          .o.
          ü -> u
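A Python sketch of composing on both sides at once, assuming a hypothetical one-pair CommonCoreFST whose "+Adj" tag is invented for illustration. The rule on top rewrites the upper side, the rule on the bottom rewrites the lower side, and the core's own source stays untouched.

```python
# Hypothetical toy CommonCoreFST; the "+Adj" tag is invented for illustration.
core = {("lingüístico+Adj", "lingüístico")}

def drop_diaeresis(s: str) -> str:
    return s.replace("ü", "u")

# u <- ü .o. CommonCoreFST .o. ü -> u : rewrite both sides of every pair.
spain = {(drop_diaeresis(u), drop_diaeresis(l)) for u, l in core}
print(spain)  # {('linguístico+Adj', 'linguístico')}
```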

  24. Spelling Relaxations, Accentuation
      Build your core system to reflect formally correct spelling. Then relax that spelling in some versions of your system via composition, e.g. to allow accents to be “dropped”.
          StandardSpanishFST
          .o.
          [ é (->) e , í (->) i , á (->) a , ó (->) o , ú (->) u , ü (->) u ]
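With an optional rule on the bottom, every relaxed spelling keeps the analysis of the correct one. A Python sketch, assuming a hypothetical two-pair StandardSpanishFST with invented analysis strings, single precomposed accented characters, and optional replacement modeled as an independent keep-or-drop choice at each accented letter.

```python
from itertools import product
from typing import List, Set

# Optional rewrites from the slide: [ é (->) e , í (->) i , á (->) a , ... ]
OPTIONAL = {"é": "e", "í": "i", "á": "a", "ó": "o", "ú": "u", "ü": "u"}

def relaxed_forms(word: str) -> Set[str]:
    """Every output of the optional rule: each accent may or may not be dropped."""
    choices = [(ch, OPTIONAL[ch]) if ch in OPTIONAL else (ch,) for ch in word]
    return {"".join(combo) for combo in product(*choices)}

# Hypothetical toy StandardSpanishFST; the analysis strings are invented.
standard = {("canción+Noun+Sg", "canción"), ("lingüístico+Adj", "lingüístico")}

# StandardSpanishFST .o. Rule: one pair per relaxed spelling, same analysis.
relaxed = {(u, v) for u, l in standard for v in relaxed_forms(l)}

def analyze(surface: str) -> List[str]:
    """Look a surface word up on the lower side and return its analyses."""
    return sorted(u for u, l in relaxed if l == surface)

print(analyze("cancion"))                   # ['canción+Noun+Sg']
print(len(relaxed_forms("lingüístico")))    # 4
```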

  25. Relaxed German, accept ü or ue
      Standard German spelling uses ü, ö, ä and ß. You might want to accept them AND also ue, oe, ae and ss.
          StandardGermanFST
          .o.
          [ ü (->) u e , ö (->) o e , ä (->) a e , ß (->) s s ]
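This optional version is one answer to the question on slide 11. A short Python sketch, assuming single precomposed characters and modeling optional replacement as an independent keep-or-replace choice at each occurrence, lists the lower-side spellings the composition would accept for läßt and über.

```python
from itertools import product
from typing import Set

# Optional rewrites from the slide: [ ü (->) u e , ö (->) o e , ä (->) a e , ß (->) s s ]
OPTIONAL = {"ü": "ue", "ö": "oe", "ä": "ae", "ß": "ss"}

def relaxed_forms(word: str) -> Set[str]:
    """Each special letter may be kept or replaced, independently of the others."""
    choices = [(ch, OPTIONAL[ch]) if ch in OPTIONAL else (ch,) for ch in word]
    return {"".join(combo) for combo in product(*choices)}

print(sorted(relaxed_forms("läßt")))  # ['laesst', 'laeßt', 'lässt', 'läßt']
print(sorted(relaxed_forms("über")))  # ['ueber', 'über']
```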

  26. Summary: About Choices
      • When it appears that you have to make a choice (dialect, orthography, register, etc.) between A and B, always try to make a common “core” system that is the basis for:
          • Choice A alone
          • Choice B alone
          • Choices A and B together
      • Composition is often the key to modifying a common core system for a variety of uses.
      • The failure to abstract and generalize is a sign of a finite-state beginner.
