Morphology from a computational point of view. March 2001.
Today.
Minimal Edit Distance, and Viterbi more generally; Letter to Sound
What is morphology?
Finite-state automata
Finite-state phonological rules.
1. What is morphology?
Study of the internal structure of words:
Any high-level linguistic analysis: syntactic parser
speech recognition, text-to-speech (TTS)
information retrieval (IR)
An empirical fact:
AP newswire: mid-Feb – Dec 30 1988
Nearly 300,000 words.
“New” words that appeared on Dec 31 1988:
compounds: prenatal-care, publicly-funded, channel-switching, owner-president, logic-loving, part-Vulcan, signal-emitting, landsite, government-aligned, armhole...
dumbbells, groveled, fuzzier, oxidized
ex-presidency, puppetry, boulderlike, over-emphasized, hydrosulfite, outclassing, non-passengers, racialist, counterprograms, antiprejudice, re-unification, traumatological, refinancings, instrumenting, ex-critters, mega-lizard
This is often called the OOV (“out of vocabulary”) problem.
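The newswire experiment above can be mimicked on any pair of texts. A minimal sketch, assuming whitespace tokenization and lowercasing (both simplifications); `oov_words` is a hypothetical helper name, and the two sample strings are made up for illustration, not taken from the AP data:

```python
def oov_words(training_text: str, new_text: str) -> list[str]:
    """Return the words of new_text that never occur in training_text."""
    known = set(training_text.lower().split())
    seen = set()
    result = []
    for word in new_text.lower().split():
        # Report each unseen word once, in order of first appearance.
        if word not in known and word not in seen:
            seen.add(word)
            result.append(word)
    return result


train = "the owner and the president signed"
new = "the owner-president signed the refinancings"
print(oov_words(train, new))  # ['owner-president', 'refinancings']
```

Both reported words are built from familiar pieces, which is the point of the slide: a system that knows word-formation can still say something about them.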
If we work out the principles of word-formation, we will simultaneously:
Problem: take text, in standard spelling, and produce a sequence of phonemes which can be synthesized by the “backend”.
Severe problems: Proper names (persons, places), OOV words
boathouse B OW1 T H AU2 S
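One way morphology helps here: if the word is missing from the pronunciation lexicon, try to split it into two known words and join their pronunciations. A minimal sketch under stated assumptions: the two `LEXICON` entries are illustrative (written with the same phone symbols as the slide), and demoting the second member's primary stress to secondary is an assumption chosen to reproduce the transcription above, not a general rule.

```python
# Toy lexicon; the entries are assumptions made for this illustration.
LEXICON = {
    "boat": ["B", "OW1", "T"],
    "house": ["H", "AU1", "S"],
}


def demote_stress(phones):
    """Turn primary stress (1) into secondary stress (2)."""
    return [p[:-1] + "2" if p.endswith("1") else p for p in phones]


def pronounce(word):
    """Look the word up; if absent, try every split into two known words."""
    if word in LEXICON:
        return LEXICON[word]
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in LEXICON and right in LEXICON:
            return LEXICON[left] + demote_stress(LEXICON[right])
    return None  # still out of vocabulary


print(" ".join(pronounce("boathouse")))  # B OW1 T H AU2 S
```

Splitting at the morpheme boundary keeps the "t" and "h" apart, which a letter-by-letter rule reading "th" as one sound would get wrong.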
Take a sound file (e.g., *.wav) and produce a list of words in standard orthography.
Bill Clinton is a recent ex-president.
If someone says it, we need to figure out what the word was.
door, dog, jump, -ing, -s, to
More controversial morphemes
sing/sang: s-ng + i/a
cut/cut: cut + PAST
Analytic (isolating) languages:
Synthetic (inflecting) languages:
talo 'the-house'                kaup-pa 'the-shop'
talo-ni 'my house'              kaup-pa-ni 'my shop'
talo-ssa 'in the-house'         kaup-a-ssa 'in the-shop'
talo-ssa-ni 'in my house'       kaup-a-ssa-ni 'in my shop'
talo-i-ssa 'in the-houses'      kaup-o-i-ssa 'in the-shops'
talo-i-ssa-ni 'in my houses'    kaup-o-i-ssa-ni 'in my shops'
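The talo column is pure concatenation: stem, then plural -i, then case, then possessor. A minimal sketch of that slot order; `inflect` is a hypothetical helper, and it deliberately covers only the combinations shown in the table (plural is allowed only together with a case suffix). It does not handle the kaup-pa column, whose stem changes shape (kaup-pa, kaup-a, kaup-o).

```python
def inflect(stem, plural=False, case="", possessor=""):
    """Concatenate morphemes in the order stem-plural-case-possessor."""
    if plural and not case:
        # The table above only shows plural -i followed by a case suffix.
        raise ValueError("this sketch covers plural only with a case suffix")
    parts = [stem]
    if plural:
        parts.append("i")
    if case:
        parts.append(case)
    if possessor:
        parts.append(possessor)
    return "-".join(parts)


print(inflect("talo", plural=True, case="ssa", possessor="ni"))  # talo-i-ssa-ni
```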
Courtesy of Bucknell Univ. web page
Case                      Singular   Plural
Nominative (Subject)      hort-us    hort-i
Genitive (of)             hort-i     hort-orum
Dative (for/to)           hort-o     hort-is
Accusative (Direct Obj)   hort-um    hort-os
Vocative (Call)           hort-e     hort-i
Ablative (from/with)      hort-o     hort-is
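Unlike the Finnish forms, each Latin ending packs case and number into one unanalyzable piece, so the natural data structure is a lookup keyed on both. A minimal sketch for the hortus pattern only; vowel length is ignored, and `decline` is a hypothetical helper name.

```python
# One ending per (case, number) pair; neither feature has its own morpheme.
ENDINGS = {
    ("nominative", "singular"): "us", ("nominative", "plural"): "i",
    ("genitive", "singular"): "i",    ("genitive", "plural"): "orum",
    ("dative", "singular"): "o",      ("dative", "plural"): "is",
    ("accusative", "singular"): "um", ("accusative", "plural"): "os",
    ("vocative", "singular"): "e",    ("vocative", "plural"): "i",
    ("ablative", "singular"): "o",    ("ablative", "plural"): "is",
}


def decline(stem, case, number):
    """Attach the fused case/number ending to the stem."""
    return stem + ENDINGS[(case, number)]


print(decline("hort", "genitive", "plural"))    # hortorum
print(decline("hort", "accusative", "plural"))  # hortos
```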
Derivational morphology: creates one lexeme from another
compute > computer > computerize > computerization
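The chain can be reproduced by repeated suffixation plus one spelling adjustment. A minimal sketch, assuming a single rule (drop a stem-final "e" before a vowel-initial suffix) that happens to be enough for this example; it is not a full account of English spelling, and `attach` is a hypothetical helper.

```python
VOWELS = "aeiou"


def attach(base, suffix):
    """Add a suffix, dropping a final 'e' before a vowel-initial suffix."""
    if base.endswith("e") and suffix[0] in VOWELS:
        base = base[:-1]
    return base + suffix


word = "compute"
chain = [word]
for suffix in ["er", "ize", "ation"]:
    word = attach(word, suffix)  # each output lexeme is the next input
    chain.append(word)

print(" > ".join(chain))  # compute > computer > computerize > computerization
```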
Inflectional morphology: creates the form of a lexeme that’s right for a sentence:
the nominative singular form of a noun; or the past 3rd person singular form of a verb.
In many languages (unlike English), constellations of word-forms forming a lexeme demand the recognition of a basic stem which does not stand freely as a word:
Italian ragazzo, ragazza (boy, girl)
ragazzi, ragazze (boys, girls)
Compounds are composed of 2 (or more) words or stems
Compounds: hot dog, White House, bookstore, cherry-covered
English has a lot of derivational morphology and relatively little inflectional morphology
English verb’s inflectional forms:
bare stem, -s, -ed, -ing
Not uncommon for a verb to have 30 to 50+ forms:
marking tense, person and number of the subject
Derivational morphology usually consists of adding a prefix or suffix to a base (= stem).
The base has a lexical category (it is a noun, verb, adjective), and the suffix typically assigns a different category to the whole word.
-ness: suffix that takes an adjective and makes a noun.
un interest ing
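An affix can be modeled as a pair of categories: the one it requires of its base and the one it gives to the result. A minimal sketch with a toy affix table; the categories assigned to -ing and un- here are assumptions made for this example (English has other uses of both), and the function names are hypothetical.

```python
# affix -> (category required of the base, category of the result)
SUFFIXES = {"ness": ("A", "N"), "ing": ("V", "A")}
PREFIXES = {"un": ("A", "A")}


def add_suffix(word, category, suffix):
    needs, gives = SUFFIXES[suffix]
    if category != needs:
        raise ValueError(f"-{suffix} needs category {needs}, got {category}")
    return word + suffix, gives


def add_prefix(prefix, word, category):
    needs, gives = PREFIXES[prefix]
    if category != needs:
        raise ValueError(f"{prefix}- needs category {needs}, got {category}")
    return prefix + word, gives


# interest (V) + -ing gives an adjective; only then can un- attach.
word, cat = add_suffix("interest", "V", "ing")
word, cat = add_prefix("un", word, cat)
print(word, cat)  # uninteresting A
```

Under this toy table the order of attachment is forced: un- cannot combine with the verb interest directly, so the structure is un + [interest + ing].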
English (and some other languages) permit the collapsing together of common words. In some extremely rare cases, only the collapsed form exists (English possessive ’s).
He will arrive tonight > he’ll arrive…
The [King of England]’s children
Nouns: -NULL, -s, -’s
Verbs: -NULL, -s, -ed, -ing
(so-called weak verbs)
Strong verbs: 3 major groups
a. Internal verb change (sing/sang, drive/drove/driven, dive/dove)
b. -t suffix, typically with vowel-shortening: dream/dreamt, sleep/slept
c. -aught/-ought replacement: catch/caught, teach/taught, seek/sought
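Computationally, the weak/strong split is a default rule plus an exception list. A minimal sketch: the dictionary holds the strong verbs named above, everything else gets -ed, and spelling adjustments for the regular case are ignored. `past_tense` is a hypothetical helper.

```python
# Strong verbs must be listed; their past forms are not predictable by rule.
IRREGULAR_PAST = {
    "sing": "sang", "drive": "drove", "dive": "dove",   # internal change
    "dream": "dreamt", "sleep": "slept",                # -t suffix
    "catch": "caught", "teach": "taught", "seek": "sought",
}


def past_tense(stem):
    """Exceptions first, then the default weak-verb rule."""
    if stem in IRREGULAR_PAST:
        return IRREGULAR_PAST[stem]
    return stem + "ed"


print(past_tense("walk"), past_tense("sleep"), past_tense("teach"))
# walked slept taught
```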
This morphology creates new words, by adding prefixes or suffixes.
It is helpful to divide them into two groups, depending on whether they leave the pronunciation of the base unchanged or not.
There are, as always, some fuzzy cases.
ize, ization, al, ity, al, ic, al, ity, ion, y (nominalizing), al, ate, ous, ive, ation
Can attach to non-word stems (fratern-al, paternal; parent-al)
Typically change stress and vowel quality of stem
Never precede Level 1 suffixes
Never change stress pattern or vowel quality
Almost always attach to words that already exist
hood, ness, ly, s, ing, ish, ful, ly, ize, less, y (adj.)
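The ordering claim (a Level 2 suffix never precedes a Level 1 suffix) is easy to state as a check on a suffix sequence. A minimal sketch: the `LEVEL` table uses only suffixes from the two lists above that appear in one list unambiguously (ize, ly and y are left out on purpose), and `respects_level_ordering` is a hypothetical helper.

```python
LEVEL = {
    "al": 1, "ity": 1, "ic": 1, "ion": 1, "ous": 1, "ive": 1,
    "hood": 2, "ness": 2, "ish": 2, "ful": 2, "less": 2,
}


def respects_level_ordering(suffixes):
    """True if no Level 2 suffix comes before a Level 1 suffix."""
    levels = [LEVEL[s] for s in suffixes]
    return all(a <= b for a, b in zip(levels, levels[1:]))


print(respects_level_ordering(["al", "ity"]))     # True  (nation-al-ity)
print(respects_level_ordering(["less", "ness"]))  # True  (hope-less-ness)
print(respects_level_ordering(["ness", "ity"]))   # False
```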
look interest add claim mark extend demand remain want succeed record offer represent cover return end explain follow help belong attempt talk fear happen assault account point award appeal train contract result request staff view fail kick visit confront attack comment sponsor
paper retain improvement missile song truth doctor indictment window conductor dick misunderstanding struggle stake tank belief cafeteria material mind operator bassi lot movement chain notion marriage dancer scholarship reservoir sweet right battalion hold mr shot cardinal athletic revenue duel confrontation solo talent guest shoe russian commitment average monk election street roger rifle worker area plane pinch-hitter dozen browning conclusion teacher narcotic appearance alternative dealer producer mile stock shrine sometime bag successor career mistake ankle weapon model front spotlight rhode pace debate payment requirement fairway consultation chip dollar employer thank mustang rocket-bomb hat string precinct robert employee action detective pressure measure spirit forbid hitter breast yankee partner floor member
increase tie hole associate reserve price fire receive challenge rate purchase propose feature celebrate decide suite single change sculpture combine privilege pledge issue frame indicate believe damage include use aide graduate surprise intervene practice trouble serve oppose promise charge note schedule continue raise decline cause operate emphasize relieve hope share judge birdie produce exchange
NULL.ed.er.ing.s report turn walk park pick flow
NULL.d.ment enforce announce engage arrange replace improve encourage
NULL.n.s rose low take law drive rise undertake
NULL.al intern profession logic fat tradition extern margin jurisdiction historic education promotion constitution addition sensation roy ration origin classic convention
NULL.man sand news police states gross sun fresh sports boss sales 3- patrol bonds
ed.er.ing slugg manag crush publish robb
NULL.ity.s major senior moral hospital
NULL.ry hung mason ave summit scene surge rival forest
NULL.a.s indian kind american
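Each line above pairs a set of suffixes (a signature such as NULL.ed.er.ing.s) with the stems that take exactly that set. A minimal sketch of the grouping step only: it assumes the stem/suffix splits are already given, since how they were found is not stated here, and `signatures` is a hypothetical helper.

```python
from collections import defaultdict


def signatures(analyses):
    """Group stems by the exact set of suffixes they occur with.

    analyses: iterable of (stem, suffix) pairs; the empty suffix is NULL.
    """
    suffixes_of = defaultdict(set)
    for stem, suffix in analyses:
        suffixes_of[stem].add(suffix or "NULL")
    stems_of = defaultdict(list)
    for stem, sufs in suffixes_of.items():
        stems_of[".".join(sorted(sufs))].append(stem)
    return {sig: sorted(stems) for sig, stems in stems_of.items()}


analyses = (
    [("walk", s) for s in ["", "ed", "er", "ing", "s"]]
    + [("turn", s) for s in ["", "ed", "er", "ing", "s"]]
    + [("enforce", s) for s in ["", "d", "ment"]]
)
for sig, stems in signatures(analyses).items():
    print(sig, " ".join(stems))
```

Spurious entries such as hung with -ry in the lists above show that whatever proposes the splits is purely distributional and will sometimes be wrong.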
Just a little difference:
The best things in life
-er -est -ly
Stop states: 2,3
-er -est -ly
Figure 3.5 p. 69
Yet a third way: rows in an array (to-column can consist of pointers)
Stop states: 2,4,5
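The array representation can be sketched as a list of (from-state, symbol, to-state) rows plus a set of stop states. The automaton below is hypothetical: it is a small adjective machine invented for this illustration, with its own state numbers and stop states, not the one in the figure. Symbols are morpheme labels rather than letters.

```python
# Each row: (from_state, symbol, to_state).
ROWS = [
    (0, "un", 1),
    (0, "adj-root", 2),
    (1, "adj-root", 2),
    (2, "er", 3),
    (2, "est", 3),
    (2, "ly", 3),
]
STOP_STATES = {2, 3}  # stop states of this toy machine only


def accepts(symbols):
    """Follow one row per input symbol; accept if we end in a stop state."""
    state = 0
    for sym in symbols:
        for frm, label, to in ROWS:
            if frm == state and label == sym:
                state = to
                break
        else:
            return False  # no row for this state and symbol
    return state in STOP_STATES


print(accepts(["un", "adj-root", "ly"]))  # True
print(accepts(["un"]))                    # False
```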
The symbols of the FST are complex: they’re really pairs of symbols, one for each of two “tapes” or levels.
Recognizer: decides if a given pair of representations fits together “OK”
Generator: generates pairs of representations that fit together
Translator: takes a representation on one level and produces the appropriate representation on the other level
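The three roles can be read off one and the same arc table. A minimal sketch with a hypothetical four-arc transducer for cat: each arc pairs an upper (lexical) symbol with a lower (surface) string. It assumes the machine is deterministic on the upper tape and acyclic, which keeps all three functions short; real transducers need neither property.

```python
# (state, upper symbol) -> (lower string, next state); "" is the empty output.
ARCS = {
    (0, "c"): ("c", 1),
    (1, "a"): ("a", 2),
    (2, "t"): ("t", 3),
    (3, "+SG"): ("", 4),
    (3, "+PL"): ("s", 4),
}
FINAL = {4}


def translate(upper):
    """Translator: upper symbols in, lower string out (None if rejected)."""
    state, lower = 0, []
    for sym in upper:
        if (state, sym) not in ARCS:
            return None
        out, state = ARCS[(state, sym)]
        lower.append(out)
    return "".join(lower) if state in FINAL else None


def recognize(upper, lower):
    """Recognizer: does this pair fit together? Valid because the machine
    is deterministic on the upper tape."""
    return translate(upper) == lower


def generate(state=0, upper=(), lower=""):
    """Generator: enumerate every pair the machine accepts."""
    if state in FINAL:
        yield list(upper), lower
    for (src, sym), (out, dst) in ARCS.items():
        if src == state:
            yield from generate(dst, upper + (sym,), lower + out)


print(translate(["c", "a", "t", "+PL"]))  # cats
print(list(generate()))
```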
fox^s#…we get to q1 with ‘x’
fox^s#…we get to q2 with ‘^’
fox^s#…we can get to q3
fox^s#…we also get to q5 with ‘s’
but we don’t want to!
?friend^ship, ?fox^s^s (= foxes’s)
arizona: we leave q0 but return
m i s s ^ s
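The effect of the rule traced above can be imitated with a regular-expression substitution. This is a stand-in, not the transducer with states q0 to q5: it inserts "e" between a stem ending in x, s or z and the suffix s at a word end, then erases the boundary symbols. Stems in ch and sh, and stacked suffixes such as fox^s^s, are not handled; `surface` is a hypothetical helper.

```python
import re


def surface(intermediate):
    """Map an intermediate form like 'fox^s#' to its surface spelling."""
    # Insert 'e' after the morpheme boundary when x, s or z precedes it
    # and the suffix 's' plus word boundary '#' follows.
    with_e = re.sub(r"(?<=[xsz])\^(?=s#)", "^e", intermediate)
    # Boundary symbols do not appear on the surface.
    return with_e.replace("^", "").replace("#", "")


for form in ["fox^s#", "miss^s#", "cat^s#", "arizona#"]:
    print(form, "->", surface(form))
# fox^s# -> foxes
# miss^s# -> misses
# cat^s# -> cats
# arizona# -> arizona
```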