This resource explores the fundamentals of English morphology and its application within Natural Language Processing (NLP). It covers how word parts combine to form whole words, the processes of inflectional and derivational morphology, and the role of finite state transducers (FSTs) in analyzing morphological structures. Key concepts include morphemes, affixation, and morphotactics, supported by examples and formal definitions of FSTs and their operations. This overview serves as an essential guide for understanding morphological parsing in computational linguistics.
CSA3050: NL Algorithms Introduction to English Morphology Finite State Transducers CSA3050: NLP Algorithms
Acknowledgement • For further details see Jurafsky & Martin, Ch. 3.
Morphology • Morphology is the study of how word parts combine to form whole words. • Several different dimensions: • Orthographic - rules for combining strings of characters. • Syntactic - effect on syntactic category. • Semantic - effect on meaning.
Examples of Morphological Processes • Affixation • prefix • suffix • circumfix: German ge + stem + t, e.g. sagen, gesagt • infix: un-bloody-likely • Vowel change: swim/swam • Consonant change: send/sent
Inflectional/Derivational Morphology • Inflectional: +s plural, +ed past • category preserving • productive: always applies (esp. new words, e.g. fax) • systematic: same semantic effect • Derivational: +ment • category changing: escape + ment • not completely productive: *detractment • not completely systematic: catchment
English Inflectional Morphology • Applies to nouns, verbs and adjectives only • Number of inflections relatively small • Nouns: plural, possessive • Verbs: verb forms • Adjectives: comparison
Noun Inflections
Regular Verb Inflections
Irregular Verb Inflections
Morphological Parsing • Input word: cats → Morphological Parser → Output analysis: cat + PL • Output is a string of morphemes • Reversibility?
Morphological Parsing: Examples
Morphemes • The morpheme is a theoretical construct ... • but has a practical use • Choice of morpheme vocabulary: theoretical and practical motivation • Distinction between an underlying morpheme and its realisation • A string of morphemes could be turned into another representation later
Morphological Parsing Requires • Lexicon: list of stems and affixes + related information (e.g. syntactic category) • Morphotactics: a model of ordering constraints over morphemes (e.g. the fact that +s comes after the stem, not before) • Correspondences between input and output strings • Spelling rules: city + s → cities
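The three requirements above can be sketched together in a few lines of Python. This is only a toy illustration, not the method from the slides: the stem list, the function names `pluralize` and `parse_noun`, and the two simplified spelling rules are all assumptions made for the example.

```python
import re

# Hypothetical toy lexicon: a handful of noun stems (assumption for illustration).
NOUN_STEMS = {"cat", "city", "dog", "fox"}

def pluralize(stem: str) -> str:
    """Generate the surface form of stem + s using two simplified spelling rules."""
    if re.search(r"[^aeiou]y$", stem):       # consonant + y: city + s -> cities
        return stem[:-1] + "ies"
    if re.search(r"(s|x|z|ch|sh)$", stem):   # sibilant ending: fox + s -> foxes
        return stem + "es"
    return stem + "s"                         # default: cat + s -> cats

def parse_noun(word: str) -> list[str]:
    """Return every analysis of `word` that the toy lexicon and rules license."""
    analyses = []
    for stem in sorted(NOUN_STEMS):
        if word == stem:
            analyses.append(f"{stem} +N +SG")
        # Morphotactics: the plural suffix may only follow a noun stem.
        if word == pluralize(stem):
            analyses.append(f"{stem} +N +PL")
    return analyses
```

For instance, `parse_noun("cities")` yields the single analysis `city +N +PL`, while a string such as `scat` that matches no stem yields an empty list. Real systems compile the same three knowledge sources into finite state machinery rather than looping over the lexicon.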
Lexicon • The lexicon is generally divided into sublexicons • Stem lexicon (noun stems, verb stems, etc.) • Suffix lexicon • Prefix lexicon • Each can be represented as an FSA
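As a rough sketch of how a sublexicon can be stored as an FSA, the Python below builds a letter-by-letter automaton (a trie) from a word list. The function names `build_fsa` and `accepts` and the sample stems are hypothetical, chosen only for this illustration.

```python
def build_fsa(words):
    """Build a deterministic FSA (a trie) that accepts exactly the given words."""
    transitions = {}   # (state, letter) -> next state
    finals = set()     # accepting states
    next_state = 1     # state 0 is the initial state
    for word in words:
        state = 0
        for letter in word:
            key = (state, letter)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        finals.add(state)
    return transitions, finals

def accepts(fsa, word):
    """Follow one arc per letter; accept if the walk ends in a final state."""
    transitions, finals = fsa
    state = 0
    for letter in word:
        state = transitions.get((state, letter))
        if state is None:      # no arc for this letter: reject
            return False
    return state in finals

# Hypothetical verb-stem sublexicon.
VERB_STEMS = build_fsa(["walk", "talk", "take"])
```

Here `accepts(VERB_STEMS, "walk")` holds, while the prefix `wal` is rejected because its walk ends in a non-final state. Shared prefixes such as `ta` in `talk` and `take` share states, which is one reason the FSA encoding is compact.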
FSA for Sublexicon Fragment • [Figure: FSA for a sublexicon fragment; arc labels recovered from the figure: o, t, h, e, s, a, e, i, t, s]
FSA for Morphotactics for Noun Inflection
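Since the figure for this slide is not reproduced here, the following Python sketch shows one plausible morphotactics FSA for noun inflection. The state names, the morpheme-class labels and the arc layout are assumptions for illustration and may differ from the original diagram.

```python
# Assumed morphotactics for English noun inflection (hypothetical layout).
# Arcs are labelled with morpheme classes rather than individual letters.
TRANSITIONS = {
    ("q0", "reg-noun"): "q1",        # regular stem, may still take the plural
    ("q0", "irreg-sg-noun"): "q2",   # e.g. goose: no further suffix allowed
    ("q0", "irreg-pl-noun"): "q2",   # e.g. geese: already plural
    ("q1", "plural-s"): "q2",        # the suffix must follow the stem
}
FINAL_STATES = {"q1", "q2"}

def is_valid_sequence(morpheme_classes):
    """Check whether a sequence of morpheme classes obeys the ordering constraints."""
    state = "q0"
    for cls in morpheme_classes:
        state = TRANSITIONS.get((state, cls))
        if state is None:
            return False
    return state in FINAL_STATES
```

Under these assumptions a regular stem followed by the plural suffix is accepted, whereas the suffix before the stem, or a plural suffix after an irregular plural, is rejected.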
Morphotactics for Verb Inflection
Input/Output Correspondences • Problem: how to specify the correspondence between input word and output analysis. • Given: both input and output are strings. • Two-level morphology (Koskenniemi 1983) proposes • Surface tape (words) • Lexical tape (concatenation of morphemes)
2 Level Model • The automaton used to perform the mapping between these levels is the finite state transducer (FST).
Basic FS Transducer • Each transition of a transducer is labelled with a pair of symbols. • Input symbols are matched against the lower-side symbols on transitions. • If analysis succeeds, return the string of upper-side symbols. • [Figure: an arc labelled with an output symbol (upper side) over an input symbol (lower side)]
Morphological Analysis • Upper tape: C A T +N +PL • Lower tape: C A T S (padded with ε) • Relation: { ("CATS", "CAT+N+PL"), ("CAT", "CAT+N+SG") }
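A minimal Python sketch of a transducer that realises the two pairs in this relation is given below. The arc layout (in particular, where the ε transitions sit) and the names `ARCS`, `FINAL` and `analyze` are assumptions for illustration, not a reconstruction of the slide's figure.

```python
EPS = ""  # epsilon: the arc consumes no input symbol

# Each arc: (source state, lower/input symbol, upper/output symbol, target state)
ARCS = [
    (0, "C", "C", 1),
    (1, "A", "A", 2),
    (2, "T", "T", 3),
    (3, EPS, "+N", 4),    # emit the category tag without reading input
    (4, "S", "+PL", 5),   # surface S corresponds to the plural tag
    (4, EPS, "+SG", 5),   # no surface suffix corresponds to the singular tag
]
FINAL = {5}

def analyze(surface, state=0, pos=0, out=""):
    """Return every upper-side string that the transducer pairs with `surface`."""
    results = []
    if pos == len(surface) and state in FINAL:
        results.append(out)
    for src, lower, upper, dst in ARCS:
        if src != state:
            continue
        if lower == EPS:
            # Epsilon on the input side: move on without consuming a symbol.
            results += analyze(surface, dst, pos, out + upper)
        elif pos < len(surface) and surface[pos] == lower:
            results += analyze(surface, dst, pos + 1, out + upper)
    return results
```

With these arcs, `analyze("CATS")` returns `["CAT+N+PL"]` and `analyze("CAT")` returns `["CAT+N+SG"]`. The search is a simple depth-first walk, which terminates here because the sketch contains no ε cycles.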
FST Formal Definition • States, initial state, final states: same as FSA • Alphabets I and O are the input and output alphabets, not necessarily disjoint. • FST alphabet Σ ⊆ I × O • Transition function δ(q, i:o) defines the state q' that ensues when the machine is in state q and encounters the complex symbol i:o.
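The definition can be mirrored almost directly as a data structure. The Python sketch below assumes that Σ is a subset of I × O (the usual textbook reading) and that δ is a partial, deterministic function stored as a dictionary; the class name `FST`, its field names and the sample machine are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class FST:
    states: frozenset
    initial: object
    finals: frozenset
    input_alphabet: frozenset    # I
    output_alphabet: frozenset   # O
    delta: dict                  # (q, (i, o)) -> q'

    def sigma(self):
        """The complex symbols i:o that actually label some transition."""
        return {pair for (_, pair) in self.delta}

    def is_well_formed(self):
        """Check that Sigma is a subset of I x O and all states are declared."""
        all_pairs = set(product(self.input_alphabet, self.output_alphabet))
        return (
            self.initial in self.states
            and self.finals <= self.states
            and self.sigma() <= all_pairs
            and all(q in self.states and q2 in self.states
                    for (q, _), q2 in self.delta.items())
        )

    def step(self, q, i, o):
        """delta(q, i:o): the next state, or None if no such transition exists."""
        return self.delta.get((q, (i, o)))

# Hypothetical machine that copies c, a, t from the input to the output.
COPY_CAT = FST(
    states=frozenset({0, 1, 2, 3}),
    initial=0,
    finals=frozenset({3}),
    input_alphabet=frozenset("cat"),
    output_alphabet=frozenset("cat"),
    delta={(0, ("c", "c")): 1, (1, ("a", "a")): 2, (2, ("t", "t")): 3},
)
```

In this example `COPY_CAT.step(0, "c", "c")` gives state 1, and a pair with no matching transition, such as `a:a` in state 0, gives `None`.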
FST Alphabet Example • Pairs i:o from I × O that make up Σ (rows: input symbols I; columns: output symbols O)
I \ O   c     a     t     ε
a       a:c   a:a   a:t   a:ε
c       c:c   c:a   c:t   c:ε
'       ':c   ':a   ':t   ':ε
t       t:c   t:a   t:t   t:ε
Summary • Morphological processing can be handled by finite state machinery. • Finite State Transducers are formally very similar to Finite State Automata. • They are formally equivalent to regular relations, i.e. sets of pairings of strings of regular languages.
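Because a transducer denotes a relation (a set of string pairs), swapping the two symbols on every arc gives the inverse relation, which bears on the earlier reversibility question. The Python sketch below is illustrative only: the arc list and the names `invert` and `transduce` are assumptions, and symbols are handled as token sequences so that tags such as +PL count as single symbols.

```python
EPS = None  # epsilon: no symbol on this side of the arc

# Each arc: (source state, lexical symbol, surface symbol, target state)
ARCS = [
    (0, "c", "c", 1), (1, "a", "a", 2), (2, "t", "t", 3),
    (3, "+N", EPS, 4), (4, "+PL", "s", 5), (4, "+SG", EPS, 5),
]
FINALS = {5}

def invert(arcs):
    """Swap the two sides of every arc, yielding the inverse relation."""
    return [(src, b, a, dst) for src, a, b, dst in arcs]

def transduce(arcs, finals, symbols, state=0, pos=0, out=()):
    """Map a sequence of first-side symbols to all paired second-side sequences."""
    results = []
    if pos == len(symbols) and state in finals:
        results.append(out)
    for src, a, b, dst in arcs:
        if src != state:
            continue
        emitted = out if b is EPS else out + (b,)
        if a is EPS:
            results += transduce(arcs, finals, symbols, dst, pos, emitted)
        elif pos < len(symbols) and symbols[pos] == a:
            results += transduce(arcs, finals, symbols, dst, pos + 1, emitted)
    return results

# Generation runs lexical -> surface; analysis uses the inverted arcs.
generated = transduce(ARCS, FINALS, ["c", "a", "t", "+N", "+PL"])
analysed = transduce(invert(ARCS), FINALS, list("cats"))
```

Under these assumptions `generated` holds the surface symbols c, a, t, s and `analysed` holds c, a, t, +N, +PL, so one machine serves both as generator and as parser.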