Analogy in morphology: Only a beginning

John Goldsmith

The University of Chicago

CNRS/MoDyCo

Analogy in grammar: Form and acquisition

Max Planck Institute for Evolutionary Anthropology

Leipzig September 2006

Outline of talk
• Word segmentation problem
• Minimum Description Length (MDL) framework
• Learning morphological structure: analogy takes us only so far (signatures, finite state automata)

Word segmentation

Input: inprincipioerailverbo

→ Language-independent device →

Output: in principio era il verbo

Work by Michael Brent and by Carl de Marcken in the mid-1990s at MIT.

A lexicon $\mathcal{L}$ is a pair of objects $(L, p_L)$: a set $L \subset A^*$, and a probability distribution $p_L$ that is defined on $A^*$ for which $L$ is the support of $p_L$. We call $L$ the words.

• We insist that $A \subset L$: all individual letters are words.
• We define a language as a subset of $L^*$; its members are sentences.
• Each sentence can be uniquely associated with an utterance (an element in $A^*$) by a mapping $F$:

$F: L^* \to A^*$

If $F(S) = U$, then we say that $S$ is a parse of $U$.

We pull back the measure from the space of letters to the space of words.

Given an utterance $U$, the probability of a string of letters is the probability assigned to its best parse:

$pr(U) = \max_{S \,:\, F(S) = U} pr(S)$

Our data is a finite string (“corpus”), generated by a finite alphabet;

We find the best parse for the string;

The probability of the parse is the product of the probabilities of its words;

The words are assigned a maximum likelihood probability of the simplest sort.
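The best-parse step above can be sketched as a small dynamic program. This is only an illustration under stated assumptions, not the code used for the results in this talk: the toy lexicon and its probabilities are invented, and `best_parse` is a hypothetical name.

```python
import math

def best_parse(utterance, word_probs):
    """Return (log2 probability, list of words) for the most probable parse."""
    n = len(utterance)
    max_len = max(len(w) for w in word_probs)
    # best[i] = (log2 prob of the best parse of utterance[:i], start of its last word)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = utterance[start:end]
            if word in word_probs and best[start][0] > -math.inf:
                # Probability of a parse = product of word probabilities (sum of logs).
                score = best[start][0] + math.log2(word_probs[word])
                if score > best[end][0]:
                    best[end] = (score, start)
    # Trace back from the end of the utterance to recover the words.
    words, i = [], n
    while i > 0:
        start = best[i][1]
        if start is None:
            raise ValueError("utterance cannot be parsed with this lexicon")
        words.append(utterance[start:i])
        i = start
    return best[n][0], words[::-1]

# Toy lexicon (assumed values): five words plus every individual letter,
# so that the alphabet is a subset of the lexicon and probabilities sum to 1.
text = "inprincipioerailverbo"
letters = sorted(set(text))
lexicon = {w: 0.15 for w in ["in", "principio", "era", "il", "verbo"]}
lexicon.update({ch: 0.25 / len(letters) for ch in letters})

log_prob, words = best_parse(text, lexicon)
print(" ".join(words))
```

Because every letter is itself a word, a parse always exists; the search only decides which larger chunks are worth their probability.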

Results
• The Fulton County Grand Ju ry said Friday an investi gation of At l anta 's recent prim arye lectionproduc ed no e videnc e that any ir regul ar it i e s took place .
• Thejury further s aid in term - end present ment sthatthe City Ex ecutiveCommit t e e ,which had over - all charg eofthee lection , d e serv e s the pra is e and than ksofthe City of At l antafortheman ner in whichthee lection was conduc ted.

Chunks are too big

Chunks are too small

From Encarta: trained on the first 150 articles

La funzione originari a dell'abbigliament o fu for s e quell a di pro t egger e il corpo dalle av vers i tà del c li ma . Ne i paesi cal di uomini e donn e indoss ano gonn ell in i mor bi di e drappeggi ati e per i z om i . In generale gli abit ant i delle zon ecal d e non port ano ma i più di due stra t i di vestit i. Al contr ari o, nei luog h i dove il c li ma è più rigid o sono diffus i abiti ader enti e a più stra ti . C omun e alle due tradizion i è tuttavi a l' abitudin e di ricor re re a mantell i per ri par arsi dagli e le ment i.

3 major categories of failures of MDL word-discovery
• Many failures of word-discovery are correct discoveries of morphemes (word-pieces): investi-gation, pro-t-egger-e.
• Many (though fewer) failures of word-discovery are discoveries of pairs of words that frequently appear together (for example, ofthe).
• Many failures are chunks too short to be likely words.

As we add more linguistic sophistication to the class of models considered, MDL makes increasingly better predictions.

Part 2: Minimum Description Length (MDL) Analysis

Jorma Rissanen (1989) Stochastic Complexity in Statistical Inquiry.

Synthetic a priori
• The mind’s construction of the world is its best understanding of what the senses provide it with.
• The real world is the one which is most probable, given our observations.

Bayesian, maximum a posteriori reasoning

Bayes’ Rule

D = Data

H = Hypothesis

Definition

Define $pr(A \mid B) = pr(A \,\&\, B) / pr(B)$

Applying the definition in both directions:

$pr(H \mid D) = \dfrac{pr(H \,\&\, D)}{pr(D)}, \qquad pr(D \mid H) = \dfrac{pr(D \,\&\, H)}{pr(H)}$

Bayes’ Rule

$pr(H \mid D) = \dfrac{pr(D \mid H)\, pr(H)}{pr(D)}$

• Since $pr(D)$ does not depend on $H$, we must find the hypothesis for which the following is a maximum:

$pr(D \mid H)\, pr(H)$

How do we calculate the probability of our hypothesis about what reality is? (rationalism)

How do we calculate the probability of our observations, given our understanding of reality? (empiricism)

Assign a (“prior”) probability to all hypotheses, based on their coherence.

Measure the coherence.

Call it an evaluation metric.

Insist that your grammars be probabilistic: they assign a probability to their generated output.

Kraft’s inequality: If grammars have the “prefix property” (guaranteed local punctuation), then we can assign $pr(G) = 2^{-\mathrm{length}(G)}$.
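A quick way to see why the prefix property licenses this assignment: for any prefix-free set of binary strings, the values 2^(-length) sum to at most 1, so they can serve as probabilities. The sketch below checks this on a made-up code; the codewords and function names are illustrative assumptions, standing in for encoded grammars.

```python
from fractions import Fraction

def is_prefix_free(codewords):
    """True if no codeword is a proper prefix of another."""
    ordered = sorted(codewords)
    # In sorted order, a prefix is always immediately followed by a string it prefixes.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))

def kraft_sum(codewords):
    """Sum of 2^(-length) over the codewords, computed exactly."""
    return sum(Fraction(1, 2 ** len(c)) for c in codewords)

codes = ["0", "10", "110", "111"]           # hypothetical encodings of four grammars
priors = {c: Fraction(1, 2 ** len(c)) for c in codes}

print(is_prefix_free(codes), kraft_sum(codes))
```

Shorter encodings receive larger prior probability, which is the sense in which a more compact grammar is preferred before any data is seen.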

Usage of MDL

If the description length of data $D$, given model $M$, is equal to the negative log probability assigned to $D$ by $M$ plus the compressed length of $M$,

$DL(D, M) = -\log_2 pr(D \mid M) + \mathrm{length}(M)$,

then the process of word-learning is unambiguously one of increasing the probability of the data, and using the length of $M$ as a stopping criterion.
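As a rough illustration of the trade-off, the sketch below scores two parses of the same tiny corpus. The way the lexicon is charged (a fixed number of bits per letter plus an end-of-word marker) is an assumption made for this example, not the compression scheme used in the talk.

```python
import math
from collections import Counter

BITS_PER_SYMBOL = math.log2(27)  # assumption: 26 letters plus an end-of-word marker

def description_length(parse):
    """Bits for the lexicon plus bits for the corpus, given one parse of it."""
    counts = Counter(parse)
    total = len(parse)
    # Model cost: spell out each distinct word letter by letter, plus a terminator.
    model_bits = sum((len(word) + 1) * BITS_PER_SYMBOL for word in counts)
    # Data cost: negative log2 probability under maximum likelihood word probabilities.
    data_bits = -sum(n * math.log2(n / total) for n in counts.values())
    return model_bits + data_bits

corpus = "thedogthedog"
as_letters = list(corpus)                  # lexicon of single letters
as_words = ["the", "dog", "the", "dog"]    # lexicon of two words

print(description_length(as_letters), description_length(as_words))
```

On this corpus the two-word lexicon gives the shorter total: its entries cost more to spell out individually, but there are fewer of them and the data becomes much more probable.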

Naïve MDL

Corpus:

jump, jumps, jumping

laugh, laughed, laughing

sing, sang, singing

the, dog, dogs

total: 61 letters

Analysis:

Stems: jump laugh sing sang dog (20 letters)

Suffixes: s ing ed (6 letters)

Unanalyzed: the (3 letters)

total: 29 letters
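The letter counts in this example can be checked mechanically. In the sketch below, the table saying which stem takes which suffix is an addition inferred from the corpus; the naïve count charges nothing for it, which is one reason the count is only a first approximation.

```python
corpus = ["jump", "jumps", "jumping",
          "laugh", "laughed", "laughing",
          "sing", "sang", "singing",
          "the", "dog", "dogs"]

# Assumed pointer table: each stem with the suffixes it combines with ("" = no suffix).
signatures = {
    "jump": ["", "s", "ing"],
    "laugh": ["", "ed", "ing"],
    "sing": ["", "ing"],
    "sang": [""],
    "dog": ["", "s"],
}
unanalyzed = ["the"]

def letters(items):
    """Total number of letters in a list of strings."""
    return sum(len(item) for item in items)

stems = list(signatures)
suffixes = sorted({s for sufs in signatures.values() for s in sufs if s})
analysis_total = letters(stems) + letters(suffixes) + letters(unanalyzed)

# The analysis should regenerate exactly the words of the corpus.
regenerated = {stem + suf for stem, sufs in signatures.items() for suf in sufs}
regenerated |= set(unanalyzed)

print(letters(corpus), analysis_total, regenerated == set(corpus))
```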

3. Morphology

1st approximation: a morphology is:
• a list of stems,
• a list of affixes (prefixes, suffixes), and
• a list of pointers indicating which combinations are permissible.

Unlike the word segmentation problem, now we have no obvious search heuristics.

These are very important (for that reason), and I will not talk about them.

Model/heuristic

Size of model

M[orphology] = { Stems T, Affixes F, Signatures S }

[Equation for the size of the model; its labeled terms: stems, affixes, sig’s, extensivity]

What is a signature, and what is its length?

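What is the length (= information content) of a signature?

A signature is an ordered pair of two sets of pointers: (i) a set of pointers to stems; and (ii) a set of pointers to affixes.

The length of a pointer $p$ is $-\log \mathrm{freq}(p)$.

So the total length of the signatures is a sum over signatures of the lengths of their stem pointers and affix pointers:

$\sum_{\sigma \in S} \left( \sum_{t \in \mathrm{stems}(\sigma)} -\log \mathrm{freq}(t) + \sum_{f \in \mathrm{affixes}(\sigma)} -\log \mathrm{freq}(f) \right)$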
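The pointer-length calculation can be sketched as follows. Two things here are assumptions rather than details from the slides: freq(p) is taken to be a relative frequency (count divided by total count of stems or of affixes), and the counts themselves are invented.

```python
import math

def pointer_length(count, total):
    """Length in bits of a pointer to an item with the given relative frequency."""
    return -math.log2(count / total)

def signatures_length(signatures, stem_counts, affix_counts):
    """Total bits needed for all stem pointers and affix pointers in all signatures."""
    total_stems = sum(stem_counts.values())
    total_affixes = sum(affix_counts.values())
    bits = 0.0
    for stems, affixes in signatures:          # sum over signatures
        bits += sum(pointer_length(stem_counts[t], total_stems) for t in stems)
        bits += sum(pointer_length(affix_counts[f], total_affixes) for f in affixes)
    return bits

# Invented counts for a toy corpus.
stem_counts = {"jump": 3, "laugh": 3, "dog": 2}
affix_counts = {"NULL": 3, "s": 2, "ing": 2, "ed": 1}
signatures = [
    (["jump"], ["NULL", "s", "ing"]),
    (["laugh"], ["NULL", "ed", "ing"]),
    (["dog"], ["NULL", "s"]),
]

print(signatures_length(signatures, stem_counts, affix_counts))
```

Frequent stems and affixes get short pointers, so a signature built from common pieces is cheap to state.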

Generation 1 Linguistica

http://linguistica.uchicago.edu

Initial pass:
• assumes that words are composed of 1 or 2 morphemes;
• finds all cases where signatures exist with at least 2 stems and 2 affixes.

Then it refines this initial approximation in a large number of ways, always trying to decrease the description length of the initial corpus.
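The initial pass can be illustrated with a deliberately naive sketch: try every stem/suffix cut of every word, group stems by their set of suffixes, and keep the groups with at least 2 stems and 2 affixes. This exhaustive search is an assumption made for illustration; it is not the set of heuristics Linguistica actually uses, and the word list is invented.

```python
from collections import defaultdict

def find_signatures(words, min_stems=2, min_affixes=2):
    """Map each qualifying signature (a set of suffixes) to the stems that take it."""
    stem_to_suffixes = defaultdict(set)
    for word in set(words):
        # Each word is 1 or 2 morphemes: try every split into stem + suffix.
        for cut in range(1, len(word) + 1):
            stem, suffix = word[:cut], word[cut:]
            stem_to_suffixes[stem].add(suffix or "NULL")
    signature_to_stems = defaultdict(set)
    for stem, suffixes in stem_to_suffixes.items():
        signature_to_stems[frozenset(suffixes)].add(stem)
    return {sig: stems for sig, stems in signature_to_stems.items()
            if len(sig) >= min_affixes and len(stems) >= min_stems}

words = ["jump", "jumps", "jumping", "walk", "walks", "walking", "the", "dog", "dogs"]
found = find_signatures(words)
for signature, stems in found.items():
    print(sorted(signature), sorted(stems))
```

Requiring two stems per signature is what filters out accidental cuts such as j + ump, since no other stem shares that exact suffix set.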

French roots

4. Detect allomorphy

Signature: <e>ion . NULL

composite concentrate corporate détente

discriminate evacuate inflate opposite

participate probate prosecute tense

What is this?

composite and composition

composite → composit → composit + ion

It infers that ion deletes a stem-final ‘e’ before attaching.
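The <e>ion notation can be read as "delete a stem-final e, then attach ion". A minimal sketch of that reading follows; the function name `attach` and the parsing of the angle-bracket notation are assumptions made for this example.

```python
def attach(stem, suffix):
    """Attach a suffix written as 'NULL', 'ion', or '<e>ion' (delete final 'e' first)."""
    if suffix == "NULL":
        return stem
    if suffix.startswith("<"):
        # Split '<e>ion' into the material to delete ('e') and the suffix proper ('ion').
        deleted, _, rest = suffix[1:].partition(">")
        if not stem.endswith(deleted):
            raise ValueError(f"stem {stem!r} does not end in {deleted!r}")
        return stem[:len(stem) - len(deleted)] + rest
    return stem + suffix

signature = ["<e>ion", "NULL"]
for stem in ["composite", "concentrate", "prosecute", "tense"]:
    print([attach(stem, suffix) for suffix in signature])
```

Treating the deletion as part of the suffix lets composite and composition share one stem instead of two.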

4. Morphology

Swahili verb

Subject marker

Tense marker

Object marker

Root

Voice (active/passive)

Final vowel

Generalize the signature…

Sequential FSA: each state has a unique successor.
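A sequential FSA of this kind can be sketched as an ordered list of slots, where each slot leads only to the next one. The morphemes filled in below are a small illustrative sample chosen for this example, not an inventory taken from the slides, and the toy automaton will also produce combinations that a fuller grammar would rule out.

```python
from itertools import product

# One state per slot; each state has exactly one successor (the next slot).
SLOTS = [
    ("subject marker", ["ni", "a"]),
    ("tense marker", ["na", "li"]),
    ("object marker", ["ni", "ku"]),
    ("root", ["pend", "on"]),
    ("voice", ["", "w"]),          # "" = active, "w" = passive
    ("final vowel", ["a"]),
]

def generate(slots):
    """Yield every word obtained by choosing one morpheme per slot, in order."""
    for choice in product(*(morphemes for _, morphemes in slots)):
        yield "".join(choice)

def accepts(word, slots):
    """True if the word can be read as one morpheme from each slot, in order."""
    positions = {0}  # character offsets reachable after the slots processed so far
    for _, morphemes in slots:
        positions = {i + len(m) for i in positions for m in morphemes
                     if word.startswith(m, i)}
    return len(word) in positions

print(accepts("ninakupenda", SLOTS), len(set(generate(SLOTS))))
```

A two-slot version of the same structure (stems, then suffixes) is just a signature, which is the sense in which the FSA generalizes it.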

Alignments

Conclusion

Learning of morphology, based on form, requires an explicit structural model.

Information-theoretic quantities appear to play a major role.

Analogy can play a very useful role in the establishment of learning heuristics.