Richard Sproat University of Illinois at Urbana-Champaign rws@uiuc.edu 3rd Workshop on "Quantitative Investigations in Theoretical Linguistics" (QITL-3) Helsinki, 2-4 June 2008. (Preliminary) Experiments in Morphological Evolution. Overview. The explananda


Richard Sproat

University of Illinois at Urbana-Champaign

rws@uiuc.edu

3rd Workshop on "Quantitative Investigations in Theoretical Linguistics" (QITL-3)

Helsinki, 2-4 June 2008

(Preliminary) Experiments in Morphological Evolution
Overview
  • The explananda
  • Previous work on evolutionary modeling
  • Computational models and preliminary experiments
Phenomena
  • How do paradigms arise?
    • Why do words fall into different inflectional “equivalence classes”?
  • Why do stem alternations arise?
  • Why is there syncretism?
    • Why are there “rules of referral”?
Stem alternations in Sanskrit

[Table: Sanskrit stem alternations, with stem grades labelled zero and guna]

Examples from: Stump, Gregory (2001) Inflectional Morphology: A Theory of Paradigm Structure. Cambridge University Press.

Stem alternations in Sanskrit

[Table: Sanskrit stem alternations (continued), adding the vrddhi grade; annotations: “morphomic” (Aronoff, M. 1994. Morphology by Itself. MIT Press.) and “lexeme-class particular”]

Evolutionary Modeling (A tiny sample)
  • Hare, M. and Elman, J. L. (1995) “Learning and morphological change.” Cognition, 56(1):61-98.
  • Kirby, S. (1999) Function, Selection, and Innateness: The Emergence of Language Universals. Oxford University Press.
  • Nettle, D. (1999) “Using Social Impact Theory to simulate language change.” Lingua, 108(2-3):95-117.
  • de Boer, B. (2001) The Origins of Vowel Systems. Oxford University Press.
  • Niyogi, P. (2006) The Computational Nature of Language Learning and Evolution. Cambridge, MA: MIT Press.
Rules of referral
  • Stump, Gregory (1993) “On rules of referral.” Language, 69(3):449-479.
    • (After Zwicky, Arnold (1985) “How to describe inflection.” Berkeley Linguistics Society, 11:372-386.)
Are rules of referral interesting?
  • Are they useful for the learner?
    • Wouldn’t the learner have heard instances of every paradigm?
  • Are they historically interesting?
    • Does morphological theory need mechanisms to explain why they occur?
Another example: Böğüstani nominal declension

sSg Du Pl

sSg Du Pl

sSg Du Pl

Nom

Acc

Gen

Dat

Loc

Inst

Abl

Illat

  • Böğüstani
  • A language of Uzbekistan

ISO 639-3: bgs

Population 15,500 (1998 Durieux).

Comments Capsicum chinense and Coffea arabica farmers

Monte Carlo simulation (generating Böğüstani)
  • Select a re-use bias B
  • For each language:
    • Generate a set of vowels, consonants and affix templates
      • a, i, u, e
      • n f r w B s x j D
      • V, C, CV, VC
    • Decide on p paradigms (minimum 3), r rows (minimum 2), c columns (minimum 2)
Monte Carlo simulation
  • For each paradigm in the language:
    • Iterate over (r, c):
      • Let α be the previous affix stored for r: with probability B, retain α in a candidate list L
      • Let β be the previous affix stored for c: with probability B, retain β in L
      • If L is non-empty, set the affix for (r, c) to a random choice from L
      • Otherwise generate a new affix for (r, c)
      • Store (r, c)’s affix for r and for c
  • Note that P(new affix) = (1 − B)² whenever both α and β exist
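The generation procedure above can be sketched in Python as follows. This is a minimal illustration, not the code behind the slides: the inventory is the sample one shown earlier, while the upper limits on the numbers of paradigms, rows and columns (`max_p`, `max_r`, `max_c`) and the decision to share the row/column memory across all paradigms of a language are assumptions. With that sharing, every cell after the first paradigm has both an α and a β, so it receives a fresh affix with probability (1 − B)².

```python
import random

# Sample inventory from the slides; the upper bounds on p, r, c below are hypothetical.
VOWELS = ["a", "i", "u", "e"]
CONSONANTS = ["n", "f", "r", "w", "B", "s", "x", "j", "D"]
TEMPLATES = ["V", "C", "CV", "VC"]


def new_affix(rng):
    """Spell out a randomly chosen CV template, e.g. 'CV' -> 'fa'."""
    template = rng.choice(TEMPLATES)
    return "".join(rng.choice(CONSONANTS if slot == "C" else VOWELS) for slot in template)


def generate_language(bias, seed=0, max_p=6, max_r=5, max_c=4):
    """Return a list of paradigms, each a dict mapping (row, col) to an affix."""
    rng = random.Random(seed)
    p = rng.randint(3, max_p)   # paradigms (minimum 3)
    r = rng.randint(2, max_r)   # rows (minimum 2)
    c = rng.randint(2, max_c)   # columns (minimum 2)
    # Assumption: the row/column memory is shared by all paradigms of the language.
    row_store, col_store = {}, {}
    paradigms = []
    for _ in range(p):
        cells = {}
        for i in range(r):
            for j in range(c):
                candidates = []  # the list L
                if i in row_store and rng.random() < bias:
                    candidates.append(row_store[i])  # retain alpha
                if j in col_store and rng.random() < bias:
                    candidates.append(col_store[j])  # retain beta
                affix = rng.choice(candidates) if candidates else new_affix(rng)
                cells[(i, j)] = affix
                row_store[i] = affix
                col_store[j] = affix
        paradigms.append(cells)
    return paradigms


for cells in generate_language(bias=0.04, seed=1):
    print(sorted(set(cells.values())))
```

Setting `bias` to 1.0 collapses the whole language onto its first affix; small values such as 0.04 leave mostly distinct exponents with occasional reuse.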
Sample language: bias = 0.04

Consonants x n p w j B t r s S m

Vowels a i u e

Templates V, C, CV, VC

Sample language: bias = 0.04

Consonants n f r w B s x j D

Vowels a i u e

Templates V, C, CV, VC

Sample language: bias = 0.04

Consonants r p j d G D

Vowels a i u e o y O

Templates V, C, CV, VC, CVC, VCV, CVCV, VCVC

Sample language: bias = 0.04

Consonants D k S n b s l t w j B g G d

Vowels a i u e

Templates V, C, CV, VC

Interim conclusion
  • Syncretism, including rules of referral, may arise as a chance byproduct of a tendency to reuse inflectional exponents, and hence to reduce the number of exponents needed in the system.
  • Side question: is the amount of ambiguity among inflectional exponents statistically different from that among lexemes? (cf. Beard’s Lexeme-Morpheme Base Morphology)
    • Probably not, since inflectional exponents tend to be shorter, so the chance of collisions is much higher
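The length argument in the last point can be checked with a birthday-problem calculation. The sketch below assumes exponents are drawn uniformly from the strings a set of CV templates can spell; it reuses the sample inventory of 4 vowels and 9 consonants with the affix templates V, C, CV, VC, and compares it with a hypothetical CVCVC stem template.

```python
from math import prod


def collision_probability(n, space):
    """Birthday problem: P(at least two of n uniformly drawn strings are identical)."""
    if n > space:
        return 1.0
    return 1.0 - prod(1 - k / space for k in range(n))


def template_space(templates, n_vowels, n_consonants):
    """Count the distinct strings a set of CV templates can spell."""
    return sum(n_consonants ** t.count("C") * n_vowels ** t.count("V") for t in templates)


affix_space = template_space(["V", "C", "CV", "VC"], 4, 9)  # 4 + 9 + 36 + 36 = 85
stem_space = template_space(["CVCVC"], 4, 9)                # 9**3 * 4**2 = 11664
for n in (5, 10, 20):
    print(n, round(collision_probability(n, affix_space), 3),
          round(collision_probability(n, stem_space), 4))
```

Because the affix space is two orders of magnitude smaller, accidental identity among a handful of exponents is far more likely than among stems of the same number.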
Paradigm Reduction in Multi-agent Models with Scale-Free Networks
  • Agents connected in scale-free network
  • Only connected agents communicate
  • Agents more likely to update forms from interlocutors they “trust”
  • Each individual agent has pressure to simplify its morphology by collapsing exponents:
    • The exponent collapse is chosen so as to minimize the increase in paradigm entropy
    • Paradigms may be simplified – removing distinctions and thus reducing paradigm entropy
    • As the number of exponents decreases, so does the pressure to reduce
    • Agents analogize paradigms to other words
Scale-free networks
  • Connection degrees follow the Yule-Simon distribution:

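$$f(k;\rho) = \rho\,B(k, \rho + 1), \qquad k \geq 1,$$

where B is the beta function; for sufficiently large k:

$$f(k;\rho) \approx \frac{\rho\,\Gamma(\rho + 1)}{k^{\rho + 1}}$$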

i.e. reduces to Zipf’s law (cf. Baayen, Harald (2000) Word Frequency Distributions. Springer.)
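A quick numerical check of the tail behaviour, assuming the standard parameterization f(k; ρ) = ρB(k, ρ + 1) of the Yule-Simon distribution. For ρ = 1 the probability mass is exactly 1/(k(k + 1)), so the ratio to the power-law approximation is k/(k + 1), which tends to 1.

```python
from math import exp, lgamma


def yule_simon_pmf(k, rho):
    """f(k; rho) = rho * B(k, rho + 1) for k >= 1, computed through log-gamma."""
    log_beta = lgamma(k) + lgamma(rho + 1) - lgamma(k + rho + 1)
    return rho * exp(log_beta)


def power_law_tail(k, rho):
    """Large-k approximation rho * Gamma(rho + 1) / k**(rho + 1)."""
    return rho * exp(lgamma(rho + 1)) / k ** (rho + 1)


rho = 1.0
for k in (1, 10, 100, 1000):
    # the ratio approaches 1 as k grows, i.e. the tail is a power law
    print(k, yule_simon_pmf(k, rho) / power_law_tail(k, rho))
```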

Relevance of scale-free networks
  • Social networks are scale-free
  • Nodes with many connections seem to be relevant for language change.
    • cf: James Milroy and Lesley Milroy (1985) “Linguistic change, social network and speaker innovation.” Journal of Linguistics, 21:339–384.
Scale-free networks in the model
  • Agents communicate individual forms to other agents
  • When two agents differ on a form, one agent will update its form with a probability p proportional to how well connected the other agent is:
    • p = MaxP × ConnectionDegree(agent) / MaxConnectionDegree
    • (Similar to PageRank)
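The slides do not say how the scale-free network was generated. One common way to obtain one is preferential attachment (Barabási-Albert growth), used here purely as an assumption, together with the update probability p = MaxP × ConnectionDegree / MaxConnectionDegree from the slide. The function names are hypothetical.

```python
import random


def preferential_attachment(n_agents, m, seed=0):
    """Grow a network in which each new agent links to m existing agents chosen
    with probability proportional to their current degree (Barabasi-Albert style)."""
    rng = random.Random(seed)
    # start from a fully connected core of m + 1 agents
    edges = {(j, i) for i in range(m + 1) for j in range(i)}
    # each agent appears once per incident edge, so uniform sampling is degree-proportional
    endpoints = [a for edge in edges for a in edge]
    for new in range(m + 1, n_agents):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        for old in chosen:
            edges.add((old, new))
            endpoints.extend((old, new))
    return edges


def degrees(edges):
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def update_probability(other, deg, max_p):
    """p = MaxP * ConnectionDegree(other) / MaxConnectionDegree."""
    return max_p * deg[other] / max(deg.values())


edges = preferential_attachment(100, 3, seed=1)
deg = degrees(edges)
hub = max(deg, key=deg.get)
leaf = min(deg, key=deg.get)
print(len(edges), update_probability(hub, deg, 0.5), update_probability(leaf, deg, 0.5))
```

An agent that disagrees with `other` on a form would adopt that form when `rng.random() < update_probability(other, deg, max_p)`. Swapping in a random network with a similar number of edges gives the comparison condition.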
Paradigm entropy
  • For exponents φ and morphological functions μ, define the Paradigm Entropy as:

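$$H(\mu \mid \varphi) = -\sum_{\varphi} P(\varphi) \sum_{\mu} P(\mu \mid \varphi) \log P(\mu \mid \varphi)$$

(NB: this is really just the conditional entropy of μ given φ.)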

  • If each exponent is unambiguous, the paradigm entropy is 0
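A small sketch of the paradigm entropy, together with one hypothetical way of picking the exponent collapse that minimizes the increase in paradigm entropy (an exhaustive search over pairs of exponents). It assumes base-2 logarithms and treats the CASE weights from the example-run slide as cell probabilities; neither choice comes from the slides.

```python
from collections import defaultdict
from itertools import combinations
from math import log2


def paradigm_entropy(cells):
    """H(mu | phi) in bits; cells maps each function mu to (exponent phi, P(mu))."""
    p_phi = defaultdict(float)
    for exponent, p in cells.values():
        p_phi[exponent] += p
    h = 0.0
    for exponent, p in cells.values():
        h -= p * log2(p / p_phi[exponent])  # P(phi, mu) * log P(mu | phi)
    return h


def best_collapse(cells):
    """Try merging every pair of exponents; return (entropy increase, kept, dropped)
    for the merge that raises paradigm entropy the least."""
    base = paradigm_entropy(cells)
    exponents = sorted({exponent for exponent, _ in cells.values()})
    best = None
    for keep, drop in combinations(exponents, 2):
        merged = {mu: (keep if e == drop else e, p) for mu, (e, p) in cells.items()}
        increase = paradigm_entropy(merged) - base
        if best is None or increase < best[0]:
            best = (increase, keep, drop)
    return best


CASE = {"nom": ("ko", 0.4), "acc": ("on", 0.3), "gen": ("i", 0.2), "dat": ("ke", 0.1)}
print(paradigm_entropy(CASE))  # 0.0: every exponent is unambiguous
print(best_collapse(CASE))
```

With these weights the cheapest collapse merges the two least frequent cases (gen 'i' and dat 'ke'), which matches the intuition that rare distinctions are the first to go.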
Simulation
  • 100 agents in scale-free or random network
    • Roughly 250 connections in either case
  • 20 bases
  • 5 “cases”, 2 “numbers”: each slot associated with a probability
  • Maximum probability (MaxP) of updating one’s form for a given slot to match another agent’s form is 0.2 or 0.5
  • Probability of analogizing within one’s own vocabulary is 0.01, 0.02 or 0.05
    • Also a mode where analogy is forced every 50 iterations
    • Analogize to words within the same “analogy group” (4 such groups in the current simulation)
    • Winner-takes-all strategy
  • (Numbers in the titles of the ensuing plots are given as UpdateProb/AnalogyProb (e.g. 0.2/0.01))
  • Run for 1000 iterations
Features of simulation
  • At the nth iteration, compute:
    • The paradigm distribution over agents for each word
      • Paradigm purity is the proportion of agents using the “winning” (most common) paradigm
    • The number of distinct winning paradigms
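The per-iteration statistics can be computed as below. The sketch assumes each agent's paradigm for a word has already been reduced to a label; the snapshot data are made up for illustration.

```python
from collections import Counter


def purity_report(paradigm_of):
    """paradigm_of maps each word to the list of paradigm labels used by the agents.
    Returns {word: (winning paradigm, purity)} and the number of distinct winners."""
    report = {}
    for word, labels in paradigm_of.items():
        winner, count = Counter(labels).most_common(1)[0]
        report[word] = (winner, count / len(labels))
    distinct_winners = len({winner for winner, _ in report.values()})
    return report, distinct_winners


# Hypothetical snapshot: four agents, three words, paradigms labelled A, B, C.
snapshot = {"w1": ["A", "A", "B", "A"], "w2": ["A", "C", "C", "C"], "w3": ["B", "B", "B", "B"]}
report, n_winners = purity_report(snapshot)
print(report, n_winners)
```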
Sample final state

[Figure: sample final state; values shown: 0.24, 0.21, 0.095, 0.095, 0.06, 0.12, 0.095, 0.048, 0.024, 0.012]

Interim conclusions
  • Scale-free networks don’t seem to matter: convergence behavior appears no different from that in a random network
    • Is that a big surprise?
  • Analogy matters
  • Paradigm entropy (conditional entropy) might be a model for paradigm simplification
Synopsis
  • System is seeded with a grammar and a small number of agents
    • Initial grammars all show an agglutinative pattern
    • Each agent randomly selects a set of phonetic rules to apply to forms
    • Agents are assigned to one of a small number of social groups
  • 2 parents “beget” child agents.
    • Children are exposed to a predetermined number of training forms combined from both parents
      • Forms are presented proportional to their underlying “frequency”
    • Children must learn to generalize to unseen slots for words
    • Learning algorithm similar to:
      • David Yarowsky and Richard Wicentowski (2000) “Minimally supervised morphological analysis by multimodal alignment.” Proceedings of ACL-2000, Hong Kong, pages 207-216.
      • Features include the last n characters of the input form, plus its semantic class
    • Learners select the optimal surface form to derive other forms from (optimal = requiring the simplest resulting ruleset – a Minimum Description Length criterion)
  • Forms are periodically pooled among all agents and the n best forms are kept for each word and each slot
  • Population grows, but is kept in check by “natural disasters” and a quasi-Malthusian model of resource limitations
    • Agents age and die according to reasonably realistic mortality statistics
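As a rough illustration of the base-selection step, the sketch below scores each candidate base slot by the number of distinct suffix-rewrite rules needed to derive every other form from it. Counting rules is only a crude stand-in for a real description length, and this is neither the Yarowsky and Wicentowski learner nor the simulation's own code; the two lexemes are taken from the final example in these slides.

```python
from os.path import commonprefix


def rewrite_rule(base, target):
    """(strip, add) pair turning base into target around their longest common prefix."""
    k = len(commonprefix([base, target]))
    return base[k:], target[k:]


def ruleset_size(lexicon, base_slot):
    """Number of distinct (slot, strip, add) rules needed to derive every form from base_slot."""
    rules = set()
    for forms in lexicon.values():
        base = forms[base_slot]
        for slot, form in forms.items():
            if slot != base_slot:
                rules.add((slot, *rewrite_rule(base, form)))
    return len(rules)


def best_base(lexicon):
    """Pick the slot whose forms yield the smallest ruleset (ties go to the first slot)."""
    slots = list(next(iter(lexicon.values())))
    return min(slots, key=lambda s: ruleset_size(lexicon, s))


# Three slots for two lexemes from the final example.
lexicon = {
    "Agsaf": {"sg+nom": "Aksaf", "pl+nom": "Aksafm", "du+dat": "AkstuG"},
    "rIndar": {"sg+nom": "rÎdar", "pl+nom": "rÎdarm", "du+dat": "rÎttuG"},
}
for slot in lexicon["Agsaf"]:
    print(slot, ruleset_size(lexicon, slot))
print(best_base(lexicon))
```

A du+dat base loses stem-final material (AkstuG, rÎttuG), so every lexeme needs its own rules back to the other slots, whereas the sg+nom form keeps the whole stem.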
Phonological rules
  • c_assimilation
  • c_lenition
  • degemination
  • final_cdel
  • n_assimilation
  • r_syllabification
  • umlaut
  • v_nasalization
  • voicing_assimilation
  • vowel_apocope
  • vowel_coalescence
  • vowel_syncope

```
K = [ptkbdgmnNfvTDszSZxGCJlrhX]
L = [wy]
V = [aeiouAEIOU&@0âêîôûÂÊÎÔÛãõÕ]

## Regressive voicing assimilation
b -> p / - _ #?[ptkfTsSxC]
d -> t / - _ #?[ptkfTsSxC]
g -> k / - _ #?[ptkfTsSxC]
D -> T / - _ #?[ptkfTsSxC]
z -> s / - _ #?[ptkfTsSxC]
Z -> S / - _ #?[ptkfTsSxC]
G -> x / - _ #?[ptkfTsSxC]
J -> C / - _ #?[ptkfTsSxC]

## Intervocalic lenition
[td] -> D / [aeiou&âêîôûã]#? _ #?[aeiou&âêîôûã]
[pb] -> v / [aeiou&âêîôûã]#? _ #?[aeiou&âêîôûã]
[gk] -> G / [aeiou&âêîôûã]#? _ #?[aeiou&âêîôûã]
```
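The two rule families above can be approximated with Python regular expressions. Assumptions: the left context `- _` in the voicing rules is read as “no left-context restriction”, voicing assimilation applies before lenition, and `#` marks morpheme boundaries that are erased at the end. In the simulation each agent applies only a random subset of the rules, so not every eligible form changes (compare lIdrab pl+dat lIdrabmeke for agent 4517).

```python
import re

VOWELS = "aeiou&âêîôûã"   # vowel class used in the lenition rules
VOICELESS = "ptkfTsSxC"   # right context of the voicing rules
DEVOICE = {"b": "p", "d": "t", "g": "k", "D": "T", "z": "s", "Z": "S", "G": "x", "J": "C"}
LENITION = [("[td]", "D"), ("[pb]", "v"), ("[gk]", "G")]


def voicing_assimilation(form):
    """X -> voiceless(X) / _ #?[ptkfTsSxC], all eight rules in one simultaneous pass."""
    pattern = "[" + "".join(DEVOICE) + "](?=#?[" + VOICELESS + "])"
    return re.sub(pattern, lambda m: DEVOICE[m.group(0)], form)


def lenition(form):
    """Apply the three intervocalic rules in order; '#' is an optional morpheme boundary."""
    left = f"(?:(?<=[{VOWELS}])|(?<=[{VOWELS}]#))"  # V#? _  (two fixed-width lookbehinds)
    right = f"(?=#?[{VOWELS}])"                      # _ #?V
    for target, replacement in LENITION:
        form = re.sub(left + target + right, replacement, form)
    return form


def surface(*morphs, rules=(voicing_assimilation, lenition)):
    """Join morphs with '#', apply the rules, then erase the boundaries."""
    form = "#".join(morphs)
    for rule in rules:
        form = rule(form)
    return form.replace("#", "")


print(surface("lArpux", "a", "ke"))  # lArpuxaGe
print(surface("lIdrab", "a", "ke"))  # lIdravaGe
print(surface("Agsaf"))              # Aksaf
```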

Example run
  • Initial paradigm:
    • Abog pl+acc Abogmeon
    • Abog pl+dat Abogmeke
    • Abog pl+gen Abogmei
    • Abog pl+nom Abogmeko
    • Abog sg+acc Abogaon
    • Abog sg+dat Abogake
    • Abog sg+gen Abogai
    • Abog sg+nom Abogako
  • NUMBER 'a' sg 0.7 'me' pl 0.3
  • CASE 'ko' nom 0.4 'on' acc 0.3 'i' gen 0.2 'ke' dat 0.1
  • PHONRULE_WEIGHTING=0.60
  • NUM_TEACHING_FORMS=1500
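The seed paradigm and a child's training data can be sketched as follows, under the assumption (not stated on the slides) that the NUMBER and CASE weights are independent probabilities, so that a slot's frequency is their product. Only the affixes, weights and the NUM_TEACHING_FORMS value come from the slide; the function names are hypothetical.

```python
import random

# Exponents and weights from the example-run slide.
NUMBER = {"sg": ("a", 0.7), "pl": ("me", 0.3)}
CASE = {"nom": ("ko", 0.4), "acc": ("on", 0.3), "gen": ("i", 0.2), "dat": ("ke", 0.1)}


def initial_paradigm(stem):
    """Agglutinative seed form stem + number + case for every slot, with an assumed
    slot frequency of P(number) * P(case)."""
    paradigm = {}
    for number, (num_affix, num_p) in NUMBER.items():
        for case, (case_affix, case_p) in CASE.items():
            paradigm[f"{number}+{case}"] = (stem + num_affix + case_affix, num_p * case_p)
    return paradigm


def teaching_forms(paradigm, n, seed=0):
    """Draw n (slot, form) training items, each slot proportionally to its frequency."""
    rng = random.Random(seed)
    slots = list(paradigm)
    weights = [paradigm[slot][1] for slot in slots]
    return [(slot, paradigm[slot][0]) for slot in rng.choices(slots, weights=weights, k=n)]


abog = initial_paradigm("Abog")
print(abog["pl+acc"][0], abog["sg+nom"][0])  # Abogmeon Abogako
sample = teaching_forms(abog, 1500)          # NUM_TEACHING_FORMS=1500
```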
Behavior of agent 4517 at 300 “years”

| Slot | Abog (initial) | Abog | lArpux | lIdrab |
|---|---|---|---|---|
| pl+acc | Abogmeon | Abogmeô | lArpuxmeô | lIdravmeô |
| pl+dat | Abogmeke | Abogmeke | lArpuxmeGe | lIdrabmeke |
| pl+gen | Abogmei | Abogmei | lArpuxmei | lIdravmei |
| pl+nom | Abogmeko | Abogmeko | lArpuxmeGo | lIdrabmeGo |
| sg+acc | Abogaon | Abogaô | lArpuxaô | lIdravaô |
| sg+dat | Abogake | Abogake | lArpuxaGe | lIdravaGe |
| sg+gen | Abogai | Abogai | lArpuxai | lIdravai |
| sg+nom | Abogako | Abogako | lArpuxaGo | lIdravaGo |

59 paradigms covering 454 lexemes
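The slides do not say how paradigms are counted. One hypothetical criterion is to strip the longest prefix shared by all of a lexeme's forms and group lexemes by the remaining endings. The sample combines the initial Abog paradigm, the Adgar paradigm of the second run, and lArpux as produced by agent 4517.

```python
from os.path import commonprefix

SLOTS = ["pl+acc", "pl+dat", "pl+gen", "pl+nom", "sg+acc", "sg+dat", "sg+gen", "sg+nom"]


def signature(forms):
    """Paradigm 'shape': each slot's ending once the lexeme's shared prefix is removed."""
    stem = commonprefix([forms[slot] for slot in SLOTS])
    return tuple(forms[slot][len(stem):] for slot in SLOTS)


def count_paradigms(lexicon):
    """Group lexemes by signature; return (number of classes, signature -> lexemes)."""
    classes = {}
    for lexeme, forms in lexicon.items():
        classes.setdefault(signature(forms), []).append(lexeme)
    return len(classes), classes


def paradigm(*forms):
    return dict(zip(SLOTS, forms))


lexicon = {
    "Abog": paradigm("Abogmeon", "Abogmeke", "Abogmei", "Abogmeko",
                     "Abogaon", "Abogake", "Abogai", "Abogako"),
    "Adgar": paradigm("Adgarmeon", "Adgarmeke", "Adgarmei", "Adgarmeko",
                      "Adgaraon", "Adgarake", "Adgarai", "Adgarako"),
    "lArpux": paradigm("lArpuxmeô", "lArpuxmeGe", "lArpuxmei", "lArpuxmeGo",
                       "lArpuxaô", "lArpuxaGe", "lArpuxai", "lArpuxaGo"),
}
n_classes, classes = count_paradigms(lexicon)
print(n_classes)  # 2: Abog and Adgar share the initial agglutinative pattern
```

This criterion would put lexemes with stem alternations, such as lIdrab (lIdrab-/lIdrav-), into classes of their own, which is one way the count of distinct paradigms can grow.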

Another run
  • Initial paradigm:
    • Adgar pl+acc Adgarmeon
    • Adgar pl+dat Adgarmeke
    • Adgar pl+gen Adgarmei
    • Adgar pl+nom Adgarmeko
    • Adgar sg+acc Adgaraon
    • Adgar sg+dat Adgarake
    • Adgar sg+gen Adgarai
    • Adgar sg+nom Adgarako
  • PHONRULE_WEIGHTING=0.80
  • NUM_TEACHING_FORMS=1500
Behavior of agent 5061 at 300 “years”

| Slot | Abog | Albir | rIsxuf | Utber |
|---|---|---|---|---|
| pl+acc | Abogmeon | Elbirmen | rIsxufamen | Ubbermen |
| pl+dat | Abogmeke | ElbirmeGe | rIsxufamke | UbbermeGe |
| pl+gen | Abogmei | Elbirm | rIsxufme | Ubberme |
| pl+nom | Abogmeko | ElbirmeGo | rIsxufmeGo | UbberameGo |
| sg+acc | Abogaon | Elbiran | rIsxufan | Ubberan |
| sg+dat | Abogake | Elbira | rIsxufaGe | UbberaGe |
| sg+gen | Abogai | Elbi | rIsxufa | Ubbera |
| sg+nom | Abogako | Elbira | rIsxufaGo | UbberaGo |

109 paradigms covering 397 lexemes

One more example
  • Initial paradigm … as before
  • PHONRULE_WEIGHTING=0.80
  • NUM_TEACHING_FORMS=1000
Behavior of agent 4195 at 300 “years”

| Slot | Abog | Odeg | fApbof | dugfIp | unfEr | exgUp |
|---|---|---|---|---|---|---|
| pl+acc | Abogmeon | Odm | fAbofdm | dikfIdm | ûfEdm | exgUdm |
| pl+dat | Abogmeke | Ô | fAbofm | dikfÎ | ûfÊ | exgÛ |
| pl+gen | Abogmei | Odm | fAbofdm | dikfIdm | ûfEtm | exgUgm |
| pl+nom | Abogmeko | Oxm | fAbofxm | dikfIxm | ûfExm | exgUxm |
| sg+acc | Abogaon | O | fAbof | dikfI | ûfE | exgU |
| sg+dat | Abogake | O | fAbof | dikfI | ûfE | exgU |
| sg+gen | Abogai | O | fAbof | dikfI | ûfE | exgU |
| sg+nom | Abogako | O | fAbof | dikfI | ûfE | exgU |

66 paradigms covering 250 lexemes

Final example…
  • NUMBER 'a' sg 0.6 'tu' du 0.1 'me' pl 0.3
  • CASE 'ko' nom 0.4 'on' acc 0.3 'i' gen 0.2 'ke' dat 0.1
  • PHONRULE_WEIGHTING=0.80
  • NUM_TEACHING_FORMS=1000
Final example (some agent or other)

| Slot | Abbus | Agsaf | mampEl | odEs | rIndar |
|---|---|---|---|---|---|
| du+acc | Abbustuon | Aksaf | mãpEl | odEs | rÎdar |
| du+dat | Abbustuke | AkstuG | mãptuG | ottuG | rÎttuG |
| du+gen | Abbustui | Aksaf | mãpEl | odEs | rÎdar |
| du+nom | Abbustuko | Aksaf | mãpEl | oktuG | rÎktuG |
| pl+acc | Abbusmeon | Aksafm | mãpElm | odEsm | rÎdarm |
| pl+dat | Abbusmeke | Aksafm | mãpElrm | odEsrm | rÎdarm |
| pl+gen | Abbusmei | Aksafm | mãpElm | odEsm | rÎdarm |
| pl+nom | Abbusmeko | Aksafm | mãpElm | odEskm | rÎdarm |
| sg+acc | Abbusaon | Aksaf | mãpEl | odEs | rÎdar |
| sg+dat | Abbusake | Aksaf | mãpEl | odEs | rÎdar |
| sg+gen | Abbusai | Aksaf | mãpEl | odEs | rÎdar |
| sg+nom | Abbusako | Aksaf | mãpEl | odEs | rÎdar |

171 paradigms covering 228 lexemes

Questions
  • Are there too many paradigms?
  • Is there too much irregularity?
How many paradigms can there be?
  • Russian: “nouns belong to one of three declension patterns”. (Wade, Terence (1992) Comprehensive Russian Grammar. Blackwell, Oxford)
    • Wade discusses many subclasses
  • From Zaliznjak, A. (1987) Grammaticheskij slovar’ russkogo jazyka. Russkij jazyk, Moscow:
    • at least 500 classes spread over 55,000 nouns
Future work
  • More realistic learning
  • Incorporate paradigm reduction and analogy mechanisms from Experiment 2
  • Add other sources of variation, such as borrowing of other forms
  • Develop evaluation metrics:
    • Can we go beyond “look Ma, it learns”?
Acknowledgments
  • Center for Advanced Studies for release time Fall 2007
  • “The National Science Foundation through TeraGrid resources provided by the National Center for Supercomputing Applications”
  • Google Research grant (for infrastructure originally associated with another project…)
  • For helpful discussion/suggestions:
    • Chen Li
    • Shalom Lappin
    • Juliette Blevins
    • Les Gasser & the LEADS group
    • Audience at UIUC Linguistics Seminar