
Analogy in morphology: Only a beginning

John Goldsmith

The University of Chicago

CNRS/MoDyCo

Analogy in grammar: Form and acquisition

Max Planck Institute for Evolutionary Biology

Leipzig, September 2006

Outline of talk
  • Word segmentation problem
  • Minimum Description Length (MDL) framework
  • Learning morphological structure: analogy takes us only so far
[Slide: a signature, represented as a finite-state automaton]

Input: inprincipioerailverbo → [language-independent device] → Output: in principio era il verbo (Italian: “In the beginning was the Word”)

Word segmentation

Work by Michael Brent and by Carl de Marcken in the mid-1990s at MIT.

A lexicon is a pair of objects (L, p_L): a set L ⊆ A*, and a probability distribution p_L defined on A* for which L is the support of p_L. We call the elements of L the words.

  • We insist that A ⊆ L: all individual letters are words.
  • We define a language as a subset of L*; its members are sentences.
  • Each sentence can be uniquely associated with an utterance (an element of A*) by a mapping F:
F : L* → A*

If F(S) = U, then we say that S is a parse of U.

We pull back the measure from the space of letters to the space of words.

Different lexicons lead to different probabilities of the data

Given an utterance U, the probability of a string of letters is the probability assigned to its best parse.

Class of models originally studied in the word segmentation problem
  • Our data is a finite string (the “corpus”) over a finite alphabet;
  • we find the best parse for the string;
  • the probability of the parse is the product of the probabilities of its words;
  • the words are assigned maximum-likelihood probabilities of the simplest sort.
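A minimal sketch of this model class, assuming a toy token list whose counts supply the maximum-likelihood probabilities; the function names and the Viterbi-style recursion are illustrative, not the Brent/de Marcken implementations:

```python
import math
from collections import Counter

def unigram_probs(tokens):
    """Maximum-likelihood probability of each word: count / total count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def best_parse(utterance, probs):
    """Viterbi-style dynamic program: the parse whose words' probabilities
    have the greatest product (equivalently, the least total -log2 prob)."""
    n = len(utterance)
    cost = [math.inf] * (n + 1)   # best -log2 prob of any parse of utterance[:i]
    cost[0] = 0.0
    back = [0] * (n + 1)          # where the last word of the best parse begins
    for end in range(1, n + 1):
        for start in range(end):
            word = utterance[start:end]
            if word in probs:
                c = cost[start] - math.log2(probs[word])
                if c < cost[end]:
                    cost[end], back[end] = c, start
    parse, i = [], n
    while i > 0:                  # walk the backpointers to recover the words
        parse.append(utterance[back[i]:i])
        i = back[i]
    return list(reversed(parse)), cost[n]

# Every single letter is itself a word (A ⊆ L), so a parse always exists.
tokens = list("inprincipioerailverbo") + ["in", "principio", "era", "il", "verbo"]
print(best_parse("inprincipioerailverbo", unigram_probs(tokens)))
```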

Results
  • The Fulton County Grand Ju ry said Friday an investi gation of At l anta 's recent prim arye lectionproduc ed no e videnc e that any ir regul ar it i e s took place .
  • Thejury further s aid in term - end present ment sthatthe City Ex ecutiveCommit t e e ,which had over - all charg eofthee lection , d e serv e s the pra is e and than ksofthe City of At l antafortheman ner in whichthee lection was conduc ted.

Some chunks are too big; others are too small.

From Encarta: trained on the first 150 articles

La funzione originari a dell'abbigliament o fu for s e quell a di pro t egger e il corpo dalle av vers i tà del c li ma . Ne i paesi cal di uomini e donn e indoss ano gonn ell in i mor bi di e drappeggi ati e per i z om i . In generale gli abit ant i delle zon ecal d e non port ano ma i più di due stra t i di vestit i. Al contr ari o, nei luog h i dove il c li ma è più rigid o sono diffus i abiti ader enti e a più stra ti . C omun e alle due tradizion i è tuttavi a l' abitudin e di ricor re re a mantell i per ri par arsi dagli e le ment i.

3 major categories of failures of MDL word-discovery
  • Many failures of word-discovery are correct discoveries of morphemes (word-pieces): investi-gation, pro-t-egger-e.
  • Many (though fewer) failures of word-discovery are discoveries of pairs of words that frequently appear together (for example, ofthe).
  • Many failures are too short to be likely words.
As we add more linguistic sophistication to the class of models considered, MDL makes increasingly better predictions.
Part 2: Minimum Description Length (MDL) Analysis

Jorma Rissanen (1989), Stochastic Complexity in Statistical Inquiry.

Synthetic a priori
  • The mind’s construction of the world is its best understanding of what the senses provide it with.
  • The real world is the one which is most probable, given our observations.

Bayesian, maximum a posteriori reasoning.

Bayes’ Rule

D = Data

H = Hypothesis

Definition: pr(A|B) = pr(A&B)/pr(B)

Applying the definition in both directions, pr(H|D) · pr(D) = pr(H&D) = pr(D|H) · pr(H), and therefore

  pr(H|D) = pr(D|H) · pr(H) / pr(D)
If reality is the most probable hypothesis, given the evidence...
  • we must find the hypothesis H for which pr(D|H) · pr(H) is a maximum (the denominator pr(D) is the same for every hypothesis).

How do we calculate the probability pr(H) of our hypothesis about what reality is? That is the question of rationalism.

How do we calculate the probability pr(D|H) of our observations, given our understanding of reality? That is the question of empiricism.

How do we calculate the probability of our hypothesis about what reality is?
  • Assign a (“prior”) probability to all hypotheses, based on their coherence.
  • Measure the coherence; call it an evaluation metric.
  • Kraft’s inequality: if grammars have the “prefix property” (guaranteed local punctuation), then we can assign pr(G) = 2^(−length(G)).

How do we calculate the probability of our observations, given our understanding of reality?
  • Insist that your grammars be probabilistic: they assign a probability to their generated output.
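From the Kraft prior and Bayes’ rule, the MDL objective follows directly; a minimal derivation (standard, not from the slides):

```latex
\begin{aligned}
\hat{G} &= \arg\max_{G}\ \Pr(G \mid D)
         = \arg\max_{G}\ \Pr(D \mid G)\,\Pr(G)\\
        &= \arg\max_{G}\ \Pr(D \mid G)\,2^{-\mathrm{length}(G)}
         = \arg\min_{G}\ \bigl[-\log_{2}\Pr(D \mid G) + \mathrm{length}(G)\bigr].
\end{aligned}
```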

Usage of MDL

If the description length of data D, given model M, is equal to

  the inverse log probability assigned to D by M (that is, −log₂ pr(D|M))
  + the compressed length of M,

then the process of word-learning is unambiguously one of increasing the probability of the data, using the length of M as a stopping criterion.
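In code the quantity is a one-liner; a sketch assuming a model object that can report pr(D|M) and its own compressed length (both method names are illustrative):

```python
import math

def description_length(corpus, model):
    """MDL score in bits: -log2 pr(corpus | model) plus the model's own
    compressed length. Smaller is better."""
    data_bits = -math.log2(model.probability(corpus))  # inverse log probability of D given M
    model_bits = model.compressed_length()             # bits needed to encode M itself
    return data_bits + model_bits
```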

3. Morphology

Naïve MDL

Corpus:

jump, jumps, jumping

laugh, laughed, laughing

sing, sang, singing

the, dog, dogs

total: 62 letters

Analysis:

Stems: jump, laugh, sing, sang, dog (20 letters)

Suffixes: s, ing, ed (6 letters)

Unanalyzed: the (3 letters)

total: 29 letters
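The letter counts can be checked mechanically; a small sketch (a plain count of the twelve corpus words gives 61 letters rather than the slide’s 62, so the slide presumably counts one extra symbol):

```python
corpus = ["jump", "jumps", "jumping", "laugh", "laughed", "laughing",
          "sing", "sang", "singing", "the", "dog", "dogs"]
stems = ["jump", "laugh", "sing", "sang", "dog"]
suffixes = ["s", "ing", "ed"]
unanalyzed = ["the"]

def letters(words):
    """Total number of letters across a word list."""
    return sum(len(w) for w in words)

print(letters(corpus))                                         # 61
print(letters(stems), letters(suffixes), letters(unanalyzed))  # 20 6 3 -> total 29
```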

Model/heuristic

1st approximation: a morphology is
  • a list of stems,
  • a list of affixes (prefixes, suffixes), and
  • a list of pointers indicating which combinations are permissible.

Unlike the word segmentation problem, we now have no obvious search heuristics. These are very important (for that reason), and I will not talk about them here.
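A minimal sketch of that first approximation as a data structure; the class and field names are illustrative, not Linguistica’s actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Morphology:
    stems: list[str]                 # list of stems
    affixes: list[str]               # list of affixes (here, suffixes; "" = NULL)
    pointers: set[tuple[int, int]]   # (stem index, affix index) pairs that may combine

    def words(self):
        """Generate the permissible stem+affix combinations."""
        return {self.stems[i] + self.affixes[j] for i, j in self.pointers}

m = Morphology(stems=["jump", "laugh"], affixes=["", "s", "ing"],
               pointers={(0, 0), (0, 1), (0, 2), (1, 0), (1, 2)})
print(sorted(m.words()))  # ['jump', 'jumping', 'jumps', 'laugh', 'laughing']
```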

Size of model

M[orphology] = { Stems T, Affixes F, Signatures S }

The description length is extensive: length(M) is the sum of the lengths of the stems, the affixes, and the signatures.

What is a signature, and what is its length?

What is the length (= information content) of a signature?

A signature is an ordered pair of two sets of pointers: (i) a set of pointers to stems; and (ii) a set of pointers to affixes.

The length of a pointer p is −log freq(p), so the total length of the signatures is a sum over the signatures, and within each signature a sum over its stem pointers and its affix pointers:

  length(S) = Σ_(signatures σ) Σ_(pointers p in σ) −log freq(p)
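A sketch of that computation, assuming each signature is stored as a (stems, affixes) pair and freq maps each item to its relative frequency; all names here are illustrative:

```python
import math

def pointer_length(item, freq):
    """Length in bits of a pointer to `item`: -log2 of its frequency."""
    return -math.log2(freq[item])

def signatures_length(signatures, freq):
    """Total length of the signature component: for each signature,
    sum the lengths of its stem pointers and its affix pointers."""
    total = 0.0
    for stems, affixes in signatures:
        total += sum(pointer_length(t, freq) for t in stems)
        total += sum(pointer_length(f, freq) for f in affixes)
    return total

freq = {"jump": 0.2, "laugh": 0.2, "NULL": 0.3, "s": 0.1, "ing": 0.2}
sigs = [(["jump", "laugh"], ["NULL", "s", "ing"])]
print(signatures_length(sigs, freq))  # bits spent on this signature's pointers
```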

Generation 1 Linguistica

http://linguistica.uchicago.edu

Initial pass:
  • assumes that words are composed of 1 or 2 morphemes;
  • finds all cases where signatures exist with at least 2 stems and 2 affixes (sketched below).
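A sketch of that initial pass under the stated assumptions (a word is a bare stem, marked NULL, or stem + suffix); trying every split point is a simplification of Linguistica’s actual heuristics:

```python
from collections import defaultdict

def initial_signatures(words, min_stems=2, min_affixes=2):
    """Group each candidate stem with the set of suffixes it takes, then
    keep only the signatures shared by enough stems and affixes."""
    suffixes_of = defaultdict(set)
    for w in words:
        for cut in range(1, len(w) + 1):      # every stem/suffix split, incl. whole word
            stem, suffix = w[:cut], w[cut:] or "NULL"
            suffixes_of[stem].add(suffix)
    # A signature is a suffix set; collect the stems that share each one.
    stems_of = defaultdict(set)
    for stem, sufs in suffixes_of.items():
        stems_of[frozenset(sufs)].add(stem)
    return {sig: stems for sig, stems in stems_of.items()
            if len(stems) >= min_stems and len(sig) >= min_affixes}

words = ["jump", "jumps", "jumping", "laugh", "laughs", "laughing"]
for sig, stems in initial_signatures(words).items():
    print(sorted(sig), sorted(stems))   # ['NULL', 'ing', 's'] ['jump', 'laugh']
```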

Generation 1

Then it refines this initial approximation in a large number of ways, always trying to decrease the description length of the initial corpus.

French roots

[Slide: French roots discovered by Linguistica]

4. Detect allomorphy

Signature: <e>ion . NULL

composite concentrate corporate détente
discriminate evacuate inflate opposite
participate probate prosecute tense

What is this? Consider composite and composition:

  composite → composit → composit + ion

It infers that ion deletes a stem-final ‘e’ before attaching.
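A sketch of how this pattern can be detected, assuming we simply look for stems attested both with a final ‘e’ and with ‘ion’; the function name and the toy word list are illustrative:

```python
def e_deletion_candidates(words):
    """Find stems t such that both t+'e' and t+'ion' are words:
    evidence that 'ion' deletes a stem-final 'e' before attaching."""
    vocab = set(words)
    return sorted(w[:-1] for w in vocab
                  if w.endswith("e") and w[:-1] + "ion" in vocab)

words = ["composite", "composition", "concentrate", "concentration", "dog"]
print(e_deletion_candidates(words))  # ['composit', 'concentrat']
```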

Swahili verb

Subject marker + Tense marker + Object marker + Root + Voice (active/passive) + Final vowel

Generalize the signature…

Sequential FSA: each state has a unique successor.
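A sketch of such a sequential FSA for the Swahili verb, where each state has exactly one successor and offers a choice of morphemes; the inventories below are illustrative samples, not a serious fragment of Swahili grammar:

```python
# Each state offers a choice of morphemes and has exactly one successor:
# the next state in the list (a sequential FSA).
STATES = [
    ("subject marker", ["ni", "u", "a"]),
    ("tense marker",   ["na", "li", "ta"]),
    ("object marker",  ["", "m", "wa"]),   # "" = no object marker
    ("root",           ["som", "pend"]),
    ("voice",          ["", "w"]),         # active vs. passive
    ("final vowel",    ["a"]),
]

def generate(states):
    """Enumerate every complete path through the sequential FSA."""
    forms = [""]
    for _name, choices in states:
        forms = [form + morpheme for form in forms for morpheme in choices]
    return forms

print("ninasoma" in generate(STATES))  # True: ni-na-som-a, 'I am reading'
```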

Alignments

[Slide: alignment figures]

Conclusion
  • Learning of morphology, based on form, requires an explicit structural model.
  • Information-theoretic quantities appear to play a major role.
  • Analogy can play a very useful role in the establishment of learning heuristics.