1 / 54

Finite State Morphology

Finite State Morphology. Alexander Fraser & Liane Guillou {fraser,liane}@cis.uni-muenchen.de CIS , Ludwig-Maximilians-Universität München Computational Morphology and Electronic Dictionaries SoSe 2016 2016-05-09. Outline. Today we will cover finite state morphology more formally

blairm
Download Presentation

Finite State Morphology

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Finite State Morphology Alexander Fraser & Liane Guillou {fraser,liane}@cis.uni-muenchen.de CIS, Ludwig-Maximilians-Universität München Computational Morphology and Electronic Dictionaries SoSe 2016 2016-05-09

  2. Outline • Today we will cover finite state morphology more formally • We'll review concepts from the first lecture and from the exercises • And define operations in finite state more formally • We will then show how to convert regular expressions to finite state automata

  3. Exercise • Please check the website tomorrow for the exercise location (may also be cancelled)

  4. Credits • Credits: • Slides mostly adapted from: • Finite State Morphology • Helmut Schmid • U. Tübingen - Summer Semester 2015 • Thanks also to Kemal Oflazer and Lauri Kartunnen

  5. Review: Computational Morphology • examineswordformationsprocesses • providesanalysesofwordforms such asTarifverhandlungen: Tarif<NN>verhandeln<V>ung<SUFF><+NN><Fem><Nom><Pl> • splitswordformsintorootsandaffixes • providesinformation on • part-of-speechsuch as NN, V • canonicalformssuch as „verhandeln“ • morphosyntacticpropertiessuch asFem, Nom, Pl

  6. Terminology • word formwordasitappears in a runningtext: weitergehst • lemmacitationaslisted in a dictionary: weitergehen • stempartof a wordtowhichderivationalofinflectionalaffixesareattached: weitergeh • rootstemwhichcannotbefurtheranalysed: geh • morphemesmallestmorphologicalunits (stems, affixes): weiter, geh, en

  7. Word Formation Processes • Inflection • Derivation • Compounding

  8. Inflection • modifies a word in order to express different grammaticalcategories such astense, mood, voice, aspect, person, number, gender, case • verbal inflection: conjugationwalks, walked, walking • nominal inflection: declensioncomputers • usuallyrealisedby • prefixation • suffixation • circumfixationge+hab+t • infixationauf+zu+machen (not a perfectexample) • reduplication: orang+orang (plural of „man“ in Indonesian)

  9. Derivation • createsnewwords • Examples: un+translat+abil+itypiti+less-ness • changesthepart-of-speechand/ormeaningoftheword • addsprefixes, suffixes, circumfixes • conversion: changesthepart-of-speechwithoutmodifyingthewordbook (N) → book (V) leid(en) (V) → Leid (N) • templaticmorphology in Arabicktb + CVCCVC + (a,a) -> kattab (write)

  10. Compounding • createsnewwordsbycombiningseveralstems • example: Donau-dampf-schiff-fahrts-gesellschaft • veryproductive in German • affixoidcompoundingprocessthatturnsinto a derivationprocess Gas+werk, Stück+werk, Laub+werk schul+frei, schulter+frei, schulden+frei → no absolute boundarybetweencompoundingandderivation

  11. ClassificationofLanguages • isolating: Chinese, Vietnameselittleornoderivationandinflection • analytic: Chinese, Englishlittleornoinflection • synthetic • agglutinative: Finnish, Turkish, Hungarian, Swahilimorphemesareconcatenatedwithlittlemodificationeachaffixusuallyencodes a singlefeature • fusional (inflecting): Sanskrit, Latin, Russian, Germaninflectionalaffixesoftenencode a featurebundle: les+e (1 sgpres)

  12. Productivity • productiveprocessnewwordformscaneasilybecreateduse+less, hope+less, point+less, beard+less • unproductiveprocess: morphologicalprocesswhichisnolongeractivestreng+th, warm+th, dep+th

  13. Morphotactics Whichmorphemescanbearranged in which order? translat+abil+ity *translat+ity+abil translat+able *translat+able+ity(Allomorphs able-abil)

  14. Orthographic/Phonological Rules Howis a morphemerealised in a certaincontext? city+s → cities bake+ing → baking (e-elision) crash+s → crashes (e-epenthesis) beg+ing → begging (gemination) ad+simil+ate → assimilate (assimilation) ip+lEr → iplerkız+lEr → kızlar (vowelharmony)

  15. MorphologicalAmbiguity leaveshangedhung leaf+N+plleave+N+pl leave+V+3+sg hang+V+past

  16. Ingredientsof a Morph. Analyser • List ofrootswithpart-of-speech • List ofderivationalaffixes • morphotacticrules • orthographic (phonological) rules

  17. ComputationalMorphology analysesand/orgenerateswordforms • analysisAbteilungen → Abteilung<NN><Fem><Nom><Pl>Abteilung<NN><Fem><Acc><Pl> …ab<VPART>teilen<V>ung<NNSuff><Fem><Acc><Pl> …Abtei<NN> Lunge<NN><Fem><Nom><Pl> …Abt<NN> Ei<NN> Lunge<NN><Fem><Nom><Pl> …Abt<NN> eilen<V> ung<NNSuff><Fem><Nom><Pl> … • generation sichern<+V><1><Sg><Pres><Ind> → sichere, sichre

  18. Implementation • using a mappingtableworksreasonably well forlanguages such as English, Chinese • algorithmicmoresuitableforlanguageswithcomplexmorphology such asTurkishor Czech • finite statetransducerssimple, well understood, efficient, bidirectional (analysis & generation)

  19. Short History 1968 Chomsky & Halle proposeorderedcontext-sensitive rewriterulesx → y / w _ z (replace x by y in thecontext w … z) 1972 C. Douglas Johnson discoversthatordered rewrite rules can be implemented with a cascade of FSTs if the rules are never applied to their own output 1961 Schützenbergerprovedthat 2 sequentialtransducers (wheretheoutputofthefirstformstheinputofthesecond) canbereplacedby a singletransducer. 1980 Kaplan & Kay rediscover the findings of Johnson and Schützenberger 1983KimmoKoskenniemiinvents 2-level-morphology 1987 Karttunen& Koskenniemiimplement the first FST compiler based on Kaplan’s implementation of the finite-state calculus

  20. Finite State Automaton directedgraphwithlabelledtransitions, a startstateand a setof final states w a l k i n g recogniseswalk, walks, walked, walking, talk, talks, talked, talking s e d t

  21. Finite State Automaton FSAs areisomorphictoregularexpressionsandregulargrammars. All ofthemdefine a regularlanguage. regularexpression: (w|t)alk(s|ed|ing)? regulargrammar: S → w A B → s B → S → t A B → e d A → a l k B B → i n g bothequivalenttotheautomaton on thepreviousslide

  22. Finite State Automaton FSAs areisomorphictoregularexpressionsandregulargrammars. All ofthemdefine a regularlanguage. regularexpression: (w|t)alk(s|ed|ing)? regulargrammar: S → w A B → s S → t A B → e d A → a l k B B → i n g B → Bothareequivalenttotheautomaton on thepreviousslide context-free regular type 0 context- sensitive

  23. Operations on FSAs • Concatenation A B • Optionality A? = (|A) • Kleene‘sstar A* = (|A|AA|AAA|…) • DisjunctionA | B • Conjunction A & B • Complement !A • Subtraction A – B = A & !B • Reversal

  24. From Regular Expressionsto FSAs singlesymbol a • Create a newstartstateand a new end state • Add a transitionfromthestarttothe end statelabelled „a“ a 2 1

  25. From Regular Expressionsto FSAs Concatenation A B • addepsilontransitionfrom final stateof A tostartstateof B • make final stateof B thenew final state ε 2 4 4 1 1 3 3 2

  26. From Regular Expressionsto FSAs Optionality A? • add an epsilontransitionfromstartto end state ε 2 2 1 1

  27. From Regular Expressionsto FSAs Kleene‘ star A* • add an epsilontransitionfrom end tostartstate • makestartstatethenew end state ε 2 1 1 2

  28. From Regular Expressionsto FSAs Disjunction A B • newstartstatewithepsilontransitionstotheoldstartstates • new final statewithepsilontransitionsfromtheold final states ε ε 2 4 6 ε ε 1 1 3 3 2 4 5

  29. From Regular Expressionsto FSAs Reversal • reverse all transitions • swapstartand end state 2 1 1 2

  30. From Regular Expressionsto FSAs Conjunction A & B • I'm skipping the details of conjunction (see the Appendix for the algorithm) • Basically, we can automatically create a new FSA that essentially runs both acceptors in parallel • Our new FSA only accepts if both FSAs are in the accept state • Clearly the FSA A&B then only accepts strings that are in the regular languages accepted by both FSAs (FSA A and FSA B)

  31. Properties of FSAs • epsilon-freenotransitionislabelledwiththeemptystringepsilon • deterministicepsilon-freeandnotwotransitionsoriginating in the same statehavethe same label • minimalnootherautomatonhas a smallernumberofstates

  32. Properties of FSAs II • We can algorithmically construct a new FSA from the old FSA such that it is: • epsilon-free • deterministic • minimal • See the Appendix for the algorithms

  33. Conclusion: Finite State Acceptors • Any regular expression can be mapped to a finite state acceptor • However, "regexes" in Perl are misnamed! • "Regexes" contain more powerful constructs than mathematical regular expressions • For instance /(.+)\1/ • However, these constructs are not used much • See EN Wikipedia page on regular expressions, subsection "Regular expressions in programming languages" for details • We will now move on to finite state transducers

  34. Finite State Transducers • FSTs are FSAs whosetransitionsarelabelledwithsymbolpairs • Theymapstringsto (setsof) otherstrings • maps walk, walks, walked, walkingto walk • and talk, talks, talked, talkingto talk (in generationmode) • can also map walk to walk, walks, walked, walking in analysismode s: w:w a:a l:l k:k i: n: g: e: d: t:t

  35. FSTs andRegular Expressions a:b Single symbolmappinga:b Operations on FSTs • Concatenation, Kleene's star, disjunction, conjunction, complement (from FSAs) • composition A || BThe outputoftransducer A istheinputoftransducer B. • projection • upper language replaces transition label a:b by b:b • lower language replaces transition label a:b by a:a The resultcorrespondsto an automaton

  36. WeightedTransducers • A weighted FST assigns a numericalweighttoeachtransition • The total weightof a string-to-stringmappingisthesumoftheweights on thecorrespondingpathfromstartto end state. • Weighted FSTs allowdisambiguationbetween different analysesbychoosingtheonewiththesmallest (orlargest) weight

  37. Working with FSTs • FSTs canbespecifiedbymeansofregularexpressions (like FSAs). The translationisperformedby a compiler. • Usingthe same algorithmsasfor FSA • FSTs canbemadeepsilon-free in the sense thatnotransitionislabelledwith ε:ε (a pair ofemptystringsymbols) • FSTs canbemadedeterministic in the sense thatnotwotransitionsoriginating in the same statehavethe same label pair • FSTs canbeminimised in the sense thatnoother FST whichproducesthe same regularrelationwiththe same input-outputalignmentissmaller.(Theremightbe a smallertransducerproducingthe same relationwith a different alignment.) • FSTs canbeused in bothdirections (generationandanalysis)

  38. FST Toolkits Some FST toolkits • Xerox finite-statetoolsxfstandlexcwell-suitedforbuildingmorphologicalanalysers • foma (Mans Hulden)open-source alternative toxfst/lexc • AT&T toolsweightedtransducersfortasks such asspeechrecognitionlittlesupportforbuildingmorphologicalanalysers • openFST (Google, NYU)open-source alternative tothe AT&T tools • SFSTopen-source alternative toxfst/lexc but using a moregeneraland flexible programminglanguage

  39. SFST • programminglanguagefordeveloping finite-statetransducers • compilerwhichtranslatesprogramstotransducers • toolsfor • applyingtransducers • printingtransducers • comparingtransducers

  40. SFST Example Session > echo "Hello\ World\!" > test.fst storing a smalltestprogram > fst-compiler test.fst test.acallingthecompiler test.fst: 2 > fst-mortest.ainteractivetransducerusage readingtransducer... transducerisloaded finished. analyze> Hello World! input Hello World! recognised analyze> Hello World anotherinput noresultforHello World not recognised analyze> q terminateprogram

  41. SFST Programming Language Colonoperatora:b emptystringsymbol<> Example: m:m o:i u:<> s:c e:e identitymapping a (an abbreviationfor a:a) Example: m o:i u:<> s:c e {abc}:{AB} isexpandedtoa:A b:B c:<> Example: {mouse}:{mice}

  42. Disjunction John | Mary | James acceptsthesethreestringsandmapsthemontothemselves mouse | {mouse}:{mice} analyses mouse and mice as mouse note that analysis here maps lower language (mice) to upper language (mouse), i.e., implements lemmatization Generation goes in the opposite direction

  43. Multi-Character Symbols stringsenclosed in <…> aretreatedas a singleunit. {mouse<N><pl>}:{mice} analyzes mice as mouse<N><pl>

  44. Multi-Character Symbols A morecomplexexample: schreib {<V><pres>}:{} (\{<1><sg>}:{e} |\{<2><sg>}:{st} |\{<3><sg>}:{t} |\{<1><pl>}:{en} |\{<2><pl>}:{t} |\{<3><pl>}:{en}) The backslashes (\) indicatethattheexpressioncontinues in thenextline Whatistheanalysisofschreibstandschreiben?

  45. Conclusion: Finite State Morphology • Talked about finite state morphology in a more formal way • Showed how to convert regular expressions to finite state automata • Talked about finite state transducers for computational morphology • Morphological analysis and generation

  46. Thank you for your attention

  47. Appendix • Details of Conjunction of FSAs • Algorithms for Determinisation, Composition and Minimisation of FSAs

More Related