morphology l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Morphology PowerPoint Presentation
Download Presentation
Morphology

Loading in 2 Seconds...

play fullscreen
1 / 108

Morphology - PowerPoint PPT Presentation


  • 708 Views
  • Uploaded on

Morphology. Sudeshna Sarkar IIT Kharagpur. Morphology Morphology – analysis and synthesis - computational aspects. Morphology. Morphology is the field of linguistics that studies the internal structure of words

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Morphology' - elina


Download Now An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
morphology

Morphology

Sudeshna Sarkar

IIT Kharagpur

slide2
Morphology
  • Morphology – analysis and synthesis - computational aspects
morphology3
Morphology
  • Morphology is the field of linguistics that studies the internal structure of words
  • How words are built up from smaller meaningful units called morphemes (morph = shape, logos = word)
  • We can usefully divide morphemes into two classes
    • Stems: The core meaning bearing units
    • Affixes: Bits and pieces that adhere to stems to change their meanings and grammatical functions
      • Prefix: un-, anti-, etc (a- ati- pra- etc)
      • Suffix: -ity, -ation, etc ( -taa, -ke, -ka etc)
      • Infix: are inserted inside the stem
        • Tagalog: um + hingi humingi
      • Circumfixes – precede and follow the stem
  • Turkish can have words with a lot of suffixes (agglutinative language) Many indian languages also have agglutinative suffixes
examples english
Examples (English)
  • “unladylike”
    • 3 morphemes, 4 syllables

un- ‘not’

lady ‘(well behaved) female adult human’

-like ‘having the characteristics of’

    • Can’t break any of these down further without distorting the meaning of the units
  • “technique”
    • 1 morpheme, 2 syllables
  • “dogs”
    • 2 morphemes, 1 syllable

-s, a plural marker on nouns

Modified from Dorr and Habash (after Jurafsky and Martin)

examples bengali
Examples (Bengali)
  • “chhelederTaakei”
    • 5 morphemes

chhele ‘boy’

-der ‘plural genitive’

-Taa ‘classifier’

-ke ‘dative’

-i ‘emphasizer’

Can’t break any of these down further without distorting the meaning of the units

  • “atipraakrritake”

ati-

praakrrita

-ke

Modified from Dorr and Habash (after Jurafsky and Martin)

morpheme definitions
Morpheme Definitions
  • Root
    • The portion of the word that:
      • is common to a set of derived or inflected forms, if any, when all affixes are removed
      • is not further analyzable into meaningful elements
      • carries the principle portion of meaning of the words
  • Stem
    • The root or roots of a word, together with any derivational affixes, to which inflectional affixes are added.
  • Affix
    • A bound morpheme that is joined before, after, or within a root or stem.
  • Clitic
    • a morpheme that functions syntactically like a word, but does not appear as an independent phonological word
      • Spanish: un beso, las aguas, English: Hal’s (genetive marker)

Modified from Dorr and Habash (after Jurafsky and Martin)

inflectional derivational morphology
Inflectional & Derivational Morphology
  • We can also divide morphology up into two broad classes
    • Inflectional
    • Derivational
inflectional morphology
Inflectional Morphology
  • Inflection:
    • Variation in the form of a word, typically by means of an affix, that expresses a grammatical contrast.
      • Doesn’t change the word class
      • Usually produces a predictable, nonidiosyncratic change of meaning.
      • Serves a grammatical/semantic purpose different from the original

After a combination with an inflectional morpheme,

the meaning and class of the actual stem usually do not change.

    • eat / eats pencil / pencils
    • helaa / khele / khelchhila bai / baiTAke / baiyera
inflectional morphology9
Inflectional Morphology
  • Adds:
    • tense, number, person, mood, aspect
  • Word class doesn’t change
  • Word serves new grammatical role
  • Examples
    • come is inflected for person and number:

The pizza guy comes at noon.

    • las and rojas are inflected for agreement with manzanas in grammatical gender by -a and in number by –s

las manzanas rojas (‘the red apples’)

Modified from Dorr and Habash (after Jurafsky and Martin)

derivational morphology
Derivational Morphology
  • Derivation:
    • The formation of a new word or inflectable stem from another word or stem.
  • After a combination with an derivational morpheme, the meaning and the class of the actual stem usually change.
    • compute / computer do / undo friend / friendly
    • Uygar / uygarlaş kapı /kapıcı
    • udaara (J) / udaarataa (N)
    • bhadra / abhadra
    • baayu / baayabiiya
  • Irregular changes may happen with derivational affixes.
derivational morphology11
Derivational Morphology
  • Nominalization (formation of nouns from other parts of speech, primarily verbs in English):
    • computerization
    • appointee
    • killer
    • fuzziness
  • Formation of adjectives (primarily from nouns)
    • computational
    • clueless
    • Embraceable

Modified from Dorr and Habash (after Jurafsky and Martin)

concatinative morphology
Concatinative Morphology
  • Morpheme+Morpheme+Morpheme+…
  • Stems: also called lemma, base form, root, lexeme
    • hope+ing  hoping hop  hopping
  • Affixes
    • Prefixes: Antidisestablishmentarianism
    • Suffixes: Antidisestablishmentarianism
    • Infixes: hingi (borrow) – humingi (borrower) in Tagalog
    • Circumfixes: sagen (say) – gesagt (said) in German
  • Agglutinative Languages
    • uygarlaştıramadıklarımızdanmışsınızcasına
    • uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına
    • Behaving as if you are among those whom we could not cause to become civilized

Modified from Dorr and Habash (after Jurafsky and Martin)

templatic morphology
Templatic Morphology
  • Roots and Patterns
    • Example: Hebrew verbs
    • Root:
      • Consists of 3 consonants CCC
      • Carries basic meaning
    • Template:
      • Gives the ordering of consonants and vowels
      • Specifies semantic information about the verb
        • Active, passive, middle voice
    • Example:
      • lmd (to learn or study)
        • CaCaC -> lamad (he studied)
        • CiCeC -> limed (he taught)
        • CuCaC -> lumad (he was taught)

Modified from Dorr and Habash (after Jurafsky and Martin)

syntax and morphology
Syntax and Morphology
  • Phrase-level agreement
    • Subject-Verb
      • John studies hard (STUDY+3SG)
    • Noun-Adjective
      • Las vacas hermosas
  • Sub-word phrasal structures
    • שבספרינו
    • ש+ב+ספר+ים+נו
    • That+in+book+PL+Poss:1PL
    • Which are in our books

Modified from Dorr and Habash (after Jurafsky and Martin)

surface and lexical forms
Surface and Lexical Forms
  • The surface level of a word represents the actual spelling

of that word.

    • geliyorum eats cats kitabım
  • The lexical level of a word represents a simple concatenation

of morphemes making up that word.

    • gel +PROG +1SG
    • eat +AOR
    • cat +PLU
    • kitap +P1SG
  • Morphological processors try to find correspondences between lexical and surface forms of words.
    • Morphological recognition/ analysis – surface to lexical
    • Morphological generation/ synthesis – lexical to surface
morphology morphemes order
Morphology: Morphemes & Order
  • Handles what is an isolated form in written text
  • Grouping of phonemes into morphemes
    • sequence deliverables ~deliver, able and s(3 units)
  • Morpheme Combination
    • certain combinations/sequencing possible, other not:
      • deliver+able+s, but not able+derive+s; noun+s, but not noun+ing
      • typically fixed (in any given language)
morphological parsing
Morphological Parsing
  • Morphological parsing is to find the lexical form of a word

from its surface form.

    • cats -- cat +N +PLU
    • cat -- cat +N +SG
    • goose -- goose +N +SG or goose +V
    • geese -- goose +N +PLU
    • gooses -- goose +V +3SG
    • catch -- catch +V
    • caught -- catch +V +PAST or catch +V +PP
    • AsachhilAma AsA+PROG+PAST+1st I/We was/were coming
  • There can be more than one lexical level representation

for a given word. (ambiguity)

flies flyVERB+PROG

flyNOUN+PLU

mAtAla

kare

slide18
The history of morphological analysis dates back to the ancient Indian linguist Pāṇini, who formulated the 3,959 rules of Sanskrit morphology in the text Aṣṭādhyāyī by using a Constituency Grammar.
formal definition of the problem
Formal definition of the problem
  • Surface form: The word (ws) as it occurs in the text. [sings]

ws L  Σ+

  • Lexical form: The root word(s) (r1, r2, …) and other grammatical features (F). [sing,v,+sg,+3rd ]

wl {Σ+,}+F+

wl  Δ+

analysis synthesis
Analysis & Synthesis
  • Morphological Analysis: Maps a string from surface form to corresponding lexical form.

fMA:Σ+  Δ+

  • Morphological Synthesis: Maps a string from lexical form to surface form.

fMA:Δ+ Σ+

relationship between ma ms
Relationship between MA & MS

fMSfMA(ws) = ws

fMAfMS(wl) = wl

fMS = fMA, fMA = fMS

But is that really the case?

-1

-1

slide22
Fly + s  flys  flies (y i rule)
  • Duckling

Go-getter  get + er

Doer  do + er

Beer  ?

What knowledge do we need?

How do we represent it?

How do we compute with it?

knowledge needed
Knowledge needed
  • Knowledge of stems or roots
    • Duck is a possible root, not duckl

We need a dictionary (lexicon)

  • Only some endings go on some words
    • Do + er ok
    • Be + er – not ok
  • In addition, spelling change rules that adjust the surface form
    • Get + er – double the t getter
    • Fox + s – insert e – foxes
    • Fly + s – insert e – flys – y to i – flies
    • Chase + ed – drop e - chased
put all this in a big dictionary lexicon
Put all this in a big dictionary (lexicon)
  • Turkish – approx 600  106 forms
  • Finnish – 107
  • Hindi, Bengali, Telugu, Tamil?
  • Besides, always novel forms can be constructed
    • Anti-missile
      • Anti-anti-missile
        • Anti-anti-anti-missile
          • ……..
  • Compounding of words – Sanskrit, German
morphology from morphemes to lemmas categories
Morphology: From Morphemes to Lemmas & Categories
  • Lemma: lexical unit, “pointer” to lexicon
    • typically is represented as the “base form”, or “dictionary headword”
      • possibly indexed when ambiguous/polysemous:
        • state1 (verb), state2 (state-of-the-art), state3 (government)
    • from one or more morphemes (“root”, “stem”, “root+derivation”, ...)
  • Categories: non-lexical
    • small number of possible values (< 100, often < 5-10)
morphology level the mapping
Morphology Level: The Mapping
  • Formally: A+  2(L,C1,C2,...,Cn)
    • A is the alphabet of phonemes (A+ denotes any non-empty sequence of phonemes)
    • L is the set of possible lemmas, uniquely identified
    • Ci are morphological categories, such as:
      • grammatical number, gender, case
      • person, tense, negation, degree of comparison, voice, aspect, ...
      • tone, politeness, ...
      • part of speech (not quite morphological category, but...)
    • A, L and Ci are obviously language-dependent
morphological analysis cont
Morphological Analysis (cont.)
  • Relatively simple for English.
  • But for many Indian languages, it may be more difficult.

Examples

Inflectional and Derivational Morphology.

  • Common tools: Finite-state transducers
  • A transducer maps a set/string of symbols to another set/string of symbols
a simpler problem
A simpler problem
  • Linear concatenation of morphemes with possible spelling changes at the boundary and a few irregular cases.
  • Quite practical assumptions
    • English, Hindi, Bengali, Telugu, Tamil, French, Turkish …
    • Exceptions: Semitic languages, Sanskrit
computational morphology
Computational Morphology
  • Approaches
    • Lexicon only
    • Rules only
    • Lexicon and Rules
      • Finite-state Automata
      • Finite-state Transducers

Modified from Dorr and Habash (after Jurafsky and Martin)

computational morphology30
Computational Morphology
  • Systems
    • WordNet’s morphy
    • PCKimmo
      • Named after Kimmo Koskenniemi, much work done by Lauri Karttunen, Ron Kaplan, and Martin Kay
      • Accurate but complex
      • http://www.sil.org/pckimmo/
    • Two-level morphology
      • Commercial version available from InXight Corp.
  • Background
    • Chapter 3 of Jurafsky and Martin
    • A short history of Two-Level Morphology
      • http://www.ling.helsinki.fi/~koskenni/esslli-2001-karttunen/

Modified from Dorr and Habash (after Jurafsky and Martin)

finite state machines
Finite State Machines
  • FSAs are equivalent to regular languages
  • FSTs are equivalent to regular relations (over pairs of regular languages)
  • FSTs are like FSAs but with complex labels.
  • We can use FSTs to transduce between surface and lexical levels.
can fsas help
Can FSAs help?

Reg-noun

Plural (-s)

Q0

Q1

Q2

Irreg-pl-noun

Irreg-sg-noun

what s this for
What’s this for?

un

Adj-root

Q0

Q1

Q2

Q3

-er -est -ly

ε

un?ADJ-ROOT{er | est | ly}?

morphotactics
Morphotactics
  • The last two examples basically model some parts of the English morphotactics
  • But where is the information about regular and irregular roots?

LEXICON

  • Can we include the lexicon in the FSA?
the english pluralization fsa

Reg-noun

Plural (-s)

Q0

Q1

Q2

Irreg-pl-noun

Irreg-sg-noun

The English Pluralization FSA
after adding a mini lexicon
After adding a mini-lexicon

a

s

g

u

b

s

Q1

Q2

Q0

d

o

g

m

a

n

n

e

elegance power
Elegance & Power
  • FSAs are elegant because
    • NFA  DFA
    • Closed under Union, Intersection, Concatenation, Complementation
    • Traversal is always linear on input size
    • Well-known algorithms for minimization, determinization, compilation etc.
  • They are powerful because they can capture
    • Linear morphology
    • Irregularities
slide38
But…

FSAs are language recognizer/generator.

We need transducers to build

Morphological Analyzer (fMA) &

Morphological Synthesizers (fMS)

finite state transducers
Finite State Transducers

Surface form

Finite State Machine

Lexical form

formal definition
Formal Definition
  • A 6-tuple {Σ,Δ,Q,δ,q0,F}
    • Σ is the (finite) set of input symbols
    • Δ is the (finite) set of output symbols
    • Q is the set (FINITE) of states
    • δ is the transition function Q Σ to Q  Δ
    • q0 Q is the start state
    • F  Q is the set of accepting states
an example fst
An example FST

a:a

s:ε

g:g

b:b

u:u

s:s

Q1

Q2

Q0

d:d

o:o

g:g

a:a

m:m

n:n

n:n

e:a

the lexicon fst
The Lexicon FST

a:a

s:+Pl

g:g

#:+Sg

b:b

u:u

s:s

Q1

Q2

Q0

d:d

o:o

g:g

#:+Sg

a:a

n:n

m:m

Q3

e:a

#:+Pl

n:n

Q4

ways to look at fsts
Ways to look at FSTs
  • Recognizer of a pair of strings
  • Generator of a pair of strings
  • Translator from one regular language to another
  • Computer of a relation – regular relation
questions about fsts
Questions about FSTs
  • Suppose T1 and T2 are two FSTs. Which of the following are FSTs (i.e. regular relations)?
    • Union: T1 T2
    • Composition: T1 T2
    • Intersection: T1 T2
  • If R is a regular relation, is R-1 regular?
questions about fsts45
Questions about FSTs
  • Suppose T1 and T2 are two FSTs. Which of the following are FSTs (i.e. regular relations)?
    • Union: T1 T2 YES
    • Composition: T1 T2 YES
    • Intersection: T1 T2 NOT in general
  • If R is a regular relation, is R-1 regular?  YES
invertibility
Invertibility
  • Given T = {Σ,Δ,Q,δ,q0,F}
  • Construct T-1 = {Δ,Σ,Q,δ-1,q0,F}

such that if δ(x,q)  (y,q’)

then δ-1(y,q)  (x,q’)

where, x Σand y  Δ

compositionality
Compositionality
  • T1 = {Σ, X, Q1,δ1,q1,F1} & T2 = {X, Δ, Q2,δ2,q2,F2}
  • DefineT3 = {Σ, Δ, Q3,δ3,q3,F3}

such that Q3 = Q1 Q2

q3 = (q1, q2)

δ3 ((q,s), i) = ((q’,s’),o) if

c s.t δ1 (q, i) = (q’,c) andδ2 (s, c) = (s’,o)

modelling orthographic rules
Modelling Orthographic Rules
  • Spelling changes in morpheme boundaries
    • bus+s  buses, watch+s  watches
    • fly+s  flies
    • make+ing  making
  • Rules
    • E-insertion takes place if the stem ends in s, z, ch, sh etc.
    • y maps to ie when pluralization marker s is added
rewrite rules
Rewrite Rules
  • Chomsky and Halle (1968)
  • General form:

ab / λ__ ρ

  • E-insertion:

ε e / {x,s,z,ch,sh…}^ __ s#

  • Kay and Kaplan (1994) showed that FSTs can be compiled from general rewrite rules
two level morphology koskenniemi 1983
Two-level Morphology (Koskenniemi, 1983)

lexical

LEXICON FST

intermediate

FST1

FSTn

orthographic rules

surface

a single fst for ma and ms

b

b

u

u

s

s

e

e

s

s

A Single FST for MA and MS

b

u

s

+N

+Pl

b

u

s

+N

+Pl

LEXICON FST

Morphology FST

b

u

s

^

s

#

FST1

FSTn

orthographic rules

can we do without the lexicon
Can we do without the lexicon
  • Not really!
  • But for some applications we might need to know the stem only
  • Surface form  Stem [Stemming]
  • Porter Stemming algorithm (1980) is a very popular technique that does not use lexicon.
other issues
Other Issues
  • How to formulate the rewrite rules?
  • How to ensure coverage?
  • What to do for unknown roots?
  • Is it possible to learn morphology of a language in supervised/unsupervised manner?
  • What about non-linear morphology?
references
References
  • Chapter 3, pp 57-89

Speech and Language Processing by D. Jurafsky & J. H. Martin, Pearson Education Asia, 2002 (2000)

Slides based on the chapter

  • Chapter 2, pp70

Natural Language Understanding by J. Allen, Pearson Education, 2003 (1995)

Slide by Monojit Choudhury

morphological anlayser
Morphological Anlayser

To build a morphological analyser we need:

  • lexicon: the list of stems and affixes, together with basic information about them
  • morphotactics: the model of morpheme ordering (eg English plural morpheme follows the noun rather than a verb)
  • orthographic rules: these spelling rules are used to model the changes that occur in a word, usually when two morphemes combine (e.g., fly+s = flies)
lexicon morphotactics
Lexicon & Morphotactics
  • Typically list of word parts (lexicon) and the models of ordering can be combined together into an FSA which will recognise the all the valid word forms.
  • For this to be possible the word parts must first be classified into sublexicons.
  • The FSA defines the morphotactics (ordering constraints).
towards the analyser
Towards the Analyser
  • We can use lexc or xfst to build such an FSA (see lex1.lexc)
  • To augment this to produce an analysis we must create a transducer Tnum which maps between the lexical level and an "intermediate" level that is needed to handle the spelling rules of English.
1 t num noun number inflection
1. Tnum: Noun Number Inflection
  • multi-character symbols
  • morpheme boundary ^
  • word boundary #
intermediate form to surface
Intermediate Form to Surface
  • The reason we need to have an intermediate form is that funny things happen at morpheme boundaries, e.g.

cat^s  cats

fox^s  foxes

fly^s  flies

  • The rules which describe these changes are called orthographic rules or "spelling rules".
more english spelling rules
More English Spelling Rules
  • consonant doubling: beg / begging
  • y replacement: try/tries
  • k insertion: panic/panicked
  • e deletion: make/making
  • e insertion: watch/watches
  • Each rule can be stated in more detail ...
spelling rules
Spelling Rules
  • Chomsky & Halle (1968) invented a special notation for spelling rules.
  • A very similar notation is embodied in the "conditional replacement" rules of xfst.

E -> F || L _ R

which means replace E with F when it appears between left context L and right context R

a particular spelling rule
A Particular Spelling Rule

This rule does e-insertion

^ -> e || x _ s#

e insertion over 3 levels
e insertion over 3 levels

The rule corresponds to the mapping between

surface and intermediate levels

incorporating spelling rules
Incorporating Spelling Rules
  • Spelling rules, each corresponding to an FST, can be run in parallel provided that they are "aligned".
  • The set of spelling rules is positioned between the surface level and the intermediate level.
  • Parallel execution of FSTs can be carried out:
    • by simulation: in this case FSTs must first be aligned.
    • by first constructing a a single FST corresponding to their intersection.
parsing generation vs recognition
Parsing/Generation vs. Recognition
  • Recognition is usually not quite what we need.
    • Usually if we find some string in the language we need to find the structure in it (parsing)
    • Or we have some structure and we want to produce a surface form (production/generation)
  • Example
    • From “cats” to “cat +N +PL”and back
morphological parsing73
Morphological Parsing
  • Given the input cats, we’d like to outputcat +N +Pl, telling us that cat is a plural noun.
  • Given the Spanish input bebo, we’d like to outputbeber +V +PInd +1P +Sg telling us that bebo is the present indicative first person singular form of the Spanish verb beber, ‘to drink’.
morphological anlayser74
Morphological Anlayser

To build a morphological analyser we need:

  • lexicon: the list of stems and affixes, together with basic information about them
  • morphotactics: the model of morpheme ordering (eg English plural morpheme follows the noun rather than a verb)
  • orthographic rules: these spelling rules are used to model the changes that occur in a word, usually when two morphemes combine (e.g., fly+s = flies)
lexicon morphotactics75
Lexicon & Morphotactics
  • Typically list of word parts (lexicon) and the models of ordering can be combined together into an FSA which will recognise the all the valid word forms.
  • For this to be possible the word parts must first be classified into sublexicons.
  • The FSA defines the morphotactics (ordering constraints).
towards the analyser78
Towards the Analyser
  • We can use lexc or xfst to build such an FSA (see lex1.lexc)
  • To augment this to produce an analysis we must create a transducer Tnum which maps between the lexical level and an "intermediate" level that is needed to handle the spelling rules of English.
1 t num noun number inflection80
1. Tnum: Noun Number Inflection
  • multi-character symbols
  • morpheme boundary ^
  • word boundary #
intermediate form to surface81
Intermediate Form to Surface
  • The reason we need to have an intermediate form is that funny things happen at morpheme boundaries, e.g.

cat^s  cats

fox^s  foxes

fly^s  flies

  • The rules which describe these changes are called orthographic rules or "spelling rules".
more english spelling rules82
More English Spelling Rules
  • consonant doubling: beg / begging
  • y replacement: try/tries
  • k insertion: panic/panicked
  • e deletion: make/making
  • e insertion: watch/watches
  • Each rule can be stated in more detail ...
spelling rules83
Spelling Rules
  • Chomsky & Halle (1968) invented a special notation for spelling rules.
  • A very similar notation is embodied in the "conditional replacement" rules of xfst.

E -> F || L _ R

which means replace E with F when it appears between left context L and right context R

a particular spelling rule84
A Particular Spelling Rule

This rule does e-insertion

^ -> e || x _ s#

e insertion over 3 levels85
e insertion over 3 levels

The rule corresponds to the mapping between

surface and intermediate levels

incorporating spelling rules87
Incorporating Spelling Rules
  • Spelling rules, each corresponding to an FST, can be run in parallel provided that they are "aligned".
  • The set of spelling rules is positioned between the surface level and the intermediate level.
  • Parallel execution of FSTs can be carried out:
    • by simulation: in this case FSTs must first be aligned.
    • by first constructing a a single FST corresponding to their intersection.
putting it all together
Putting it all together

execution of FSTi

takes place in parallel

kaplan and kay the xerox view
Kaplan and Kay: The Xerox View

FSTi are aligned

but separate

FSTi intersected

together

finite state transducers90
Finite State Transducers
  • The simple story
    • Add another tape
    • Add extra symbols to the transitions
    • On one tape we read “cats”, on the other we write “cat +N +PL”, or the other way around.
transitions

+N:ε

+PL:s

c:c

a:a

t:t

Transitions
  • c:c means read a c on one tape and write a c on the other
  • +N:ε means read a +N symbol on one tape and write nothing on the other
  • +PL:s means read +PL and write an s
typical uses
Typical Uses
  • Typically, we’ll read from one tape using the first symbol on the machine transitions (just as in a simple FSA).
  • And we’ll write to the second tape using the other symbols on the transitions.
ambiguity
Ambiguity
  • Recall that in non-deterministic recognition multiple paths through a machine may lead to an accept state.
    • Didn’t matter which path was actually traversed
  • In FSTs the path to an accept state does matter since differ paths represent different parses and different outputs will result
ambiguity96
Ambiguity
  • What’s the right parse for
    • Unionizable
    • Union-ize-able
    • Un-ion-ize-able
  • Each represents a valid path through the derivational morphology machine.
ambiguity97
Ambiguity
  • There are a number of ways to deal with this problem
    • Simply take the first output found
    • Find all the possible outputs (all paths) and return them all (without choosing)
    • Bias the search so that only one or a few likely paths are explored
the gory details
The Gory Details
  • Of course, its not as easy as
    • “cat +N +PL” <-> “cats”
  • As we saw earlier there are geese, mice and oxen
  • But there are also a whole host of spelling/pronunciation changes that go along with inflectional changes
    • Cats vs Dogs
    • Fox and Foxes
multi tape machines
Multi-Tape Machines
  • To deal with this we can simply add more tapes and use the output of one tape machine as the input to the next
  • So to handle irregular spelling changes we’ll add intermediate tapes with intermediate symbols
generativity
Generativity
  • Nothing really privileged about the directions.
  • We can write from one and read from the other or vice-versa.
  • One way is generation, the other way is analysis
multi level tape machines
Multi-Level Tape Machines
  • We use one machine to transduce between the lexical and the intermediate level, and another to handle the spelling changes to the surface tape
intermediate to surface
Intermediate to Surface
  • The add an “e” rule as in fox^s# <-> foxes#
slide105
Note
  • A key feature of this machine is that it doesn’t do anything to inputs to which it doesn’t apply.
  • Meaning that they are written out unchanged to the output tape.
  • Turns out the multiple tapes aren’t really needed; they can be compiled away.
overall scheme
Overall Scheme
  • We now have one FST that has explicit information about the lexicon (actual words, their spelling, facts about word classes and regularity).
    • Lexical level to intermediate forms
  • We have a larger set of machines that capture orthographic/spelling rules.
    • Intermediate forms to surface forms