Learning Phonological Alternations in Korean Nouns: A Three-Stage Model
This study by Young Ah Do (MIT) explores phonological alternation in Korean noun paradigms, focusing on how children learn to inflect obstruent-final and non-obstruent-final nouns. It outlines a three-stage learning process observed in children aged 4 to 8, highlighting differences in their production and understanding of noun forms. The research investigates how Output-Output Faithfulness constraints affect learning and examines how the realization of a morpheme varies with phonological context. The model aims to predict learners' behavior in acquiring complex inflection patterns without assuming an intrinsic learning bias.
Learning Alternation without Bias • Young Ah Do, MIT (youngah@mit.edu) • WCCFL 29, April 22-24, 2011
Learning the Pattern of Alternation • Alternation: the realization of the same morpheme in various phonological contexts • e.g., English past tense: like[t], buzze[d], hunt[ɪd] • Learning alternation • Korean noun inflection Young Ah Do, MIT
Korean Noun Paradigm • Stem-final obstruents • UR /k’oth/ (Ko 1989) • [k’ot] ‘flower’ • [k’oth-ɨl] ~ [k’os-ɨl] ~ [k’och-ɨl] ~ [k’oc-ɨl] ‘flower-acc’ (Hayes 1998, Albright 2005, Choe 2004) • Stem-final nonobstruents • UR /pal/ • [pal] ‘foot’ • [paɾ-ɨl] ‘foot-acc’ (intersonorant flapping)
Learning Challenge: Obstruent-final Nouns • Variants are phonotactically unpredictable • Order of usage frequency: s >> ch, th >> c, t (Jun 2007) • Different preferences for each suffix: ch-ɨl >> th-ɨl vs. th-e >> ch-e (Jun 2007) • Type frequency is highest for lateral-final nouns (Kenstowicz & Sohn 2007)
Learning Stages of Alternations • Children (4;2-7;8) are slower to produce adult inflections of obstruent-final nouns than of nonobstruent-final nouns (Do, ms). • Nonobstruent-final nouns: the same inflected forms across all age groups • Obstruent-final nouns: three-stage learning
Production of Obstruent-final Nouns • Three-stage learning • Adults: variation [k’os-ɨl] ~ [k’och-ɨl] ~ [k’oth-ɨl] ‘flower-acc’ • 4;2-5;6: attempt the isolation form [k’ot] • 6;2-7;8: attempt the most frequent adult form [k’os-ɨl]
Goal • Predict three-stage learning by training a learner with • a constraint-based grammar • the statistical distribution of alternations • without assuming intrinsic bias
Overview • Alternations in Korean noun paradigms and three-stage learning • Analysis • The initial stage is due to incorrectly promoted Output-Output Faithfulness (OO-F) constraints (McCarthy 1999). • The intermediate stage results from demoting the OO-F constraints.
Overview • MaxEnt Grammar Tool (Hayes 2009) • Simulation • Constraints: OO-F, Markedness, and IO-F constraints. • Type frequency of adult variants in corpus • Learning alternations without bias
Korean Noun Paradigm • Three-way laryngeal contrast among obstruents: lenis, aspirated, and tense (Jun 2009, p.3)
Alternations in Nouns • Neutralization of obstruents to their homorganic lenis stop counterparts in coda position • /suph/ → [sup] ‘forest’ • /path/ → [pat] ‘field’ • /puəkh/ → [puək] ‘kitchen’
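The coda neutralization above can be sketched as a lookup from a stem-final obstruent to its homorganic lenis stop. This is a minimal illustration, assuming the romanization used on these slides ("h" for aspiration, "’" for tenseness); the function name `neutralize_coda` and the segment table are hypothetical and cover only the obstruents relevant here.

```python
# Hypothetical table: stem-final obstruent -> homorganic lenis stop in coda.
# Romanization follows the slides: "h" marks aspiration, "’" marks tenseness.
CODA_MAP = {
    "ph": "p", "p": "p",                        # labials
    "th": "t", "t": "t", "s": "t", "s’": "t",   # coronals
    "c": "t", "ch": "t",
    "kh": "k", "k’": "k", "k": "k",             # velars
}

def neutralize_coda(stem: str) -> str:
    """Return the isolation form of a stem by neutralizing its final obstruent."""
    # Try two-character finals (e.g. "ph", "k’") before one-character ones.
    for size in (2, 1):
        final = stem[-size:]
        if len(stem) > size and final in CODA_MAP:
            return stem[:-size] + CODA_MAP[final]
    return stem  # non-obstruent finals such as /pal/ are unchanged

for ur in ["suph", "path", "puəkh", "pal"]:
    print(f"/{ur}/ -> [{neutralize_coda(ur)}]")
```

Checking digraphs first keeps "th" from being misread as a final "h".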
Variation in Nouns • Prevocalic allomorphs of the noun stems show variation in final obstruents.
Observations about Variation • Variants: [s, th, ch, c, t], excluding tense consonants • [s] >> [ch], [th] >> [c], [t] • [ch]-ɨX >> [th]-ɨX • [th]-eX >> [ch]-eX • Relative frequency of variants matches corpus frequency (Jun 2009).
Experimental Results • Adults • Variation matches corpus frequency • Young (4;2-5;6) • Attempt unmarked forms: target [pas-ɨl] ‘field-acc’ is produced as [pat] or [pat an-ɨl] ‘field inside-acc’
Alternation at Initial Stage
Alternation at Intermediate Stage • Intermediate (6;2-7;8) • Attempt the most frequent variant among adult forms. • Nom: [s-i] • Acc: [s-ɨl] • Loc: [s-e], [th-e] • Source: [th-esə]
Alternation at Intermediate Stage
Learning Path • The pattern of production • Young: unmarked form • Intermediate: the most frequent variant • Adults: variants with different preferences
The 1st Stage: Unmarked Form • The unmarked form is the most frequent form in child-directed speech • Unmarked 75%, Nom 20%, Acc 5% (I. Lee 1999) • But this is not a frequency effect. • Children know both the unmarked and the correct inflected forms. • Children innovate [d-i], [d-ɨl], and [d-e] by incorrectly choosing the unmarked form as the stem rather than the most frequent variant.
Faithfulness to a Base • Claim: children avoid the adult variants and select base-faithful forms • in order to satisfy highly ranked OO-F constraints (McCarthy 1999). • Assumption: the unmarked form is the base of the Korean noun paradigm (Albright 2008). • BD-Ident (Base-Derivative Identity) constraints, a kind of OO-F constraint, start high in children’s grammar.
BD-Ident and Child Outputs • BD-Ident[obs,cor]/_[+high, -back, +syl] >> *[obs,cor][+high, -back, +syl]
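The effect of ranking BD-Ident above the palatalization markedness constraint can be illustrated with a toy strict-ranking evaluation. This is a sketch under simplifying assumptions: the labels `BD-Id(pal)` and `*Pal` (read here as "no unpalatalized coronal stop before [i]"), the two candidates for ‘field-nom’ built on the base [pat], and their violation counts are hypothetical stand-ins for the feature-based constraints on the slide. The real adult grammar yields variation among several candidates; this two-candidate toy only shows what reversing the ranking does.

```python
# Toy OT evaluation: the optimal candidate is the one whose violation
# profile, read from the highest-ranked constraint down, is smallest.
# Hypothetical violations for 'field-nom' built on the base [pat]:
#   [pad-i] keeps the base coronal stop before [i]  -> violates *Pal
#   [pac-i] palatalizes it, unlike the base [pat]   -> violates BD-Id(pal)
CANDIDATES = {
    "pad-i": {"BD-Id(pal)": 0, "*Pal": 1},
    "pac-i": {"BD-Id(pal)": 1, "*Pal": 0},
}

def ot_winner(candidates, ranking):
    """Return the optimal candidate under a strict ranking (highest first)."""
    return min(candidates, key=lambda c: [candidates[c][k] for k in ranking])

child_ranking = ["BD-Id(pal)", "*Pal"]  # BD-Ident >> Markedness
reranked = ["*Pal", "BD-Id(pal)"]       # Markedness >> BD-Ident
print(ot_winner(CANDIDATES, child_ranking))  # pad-i
print(ot_winner(CANDIDATES, reranked))       # pac-i
```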
Toward the Adult Stage • Claim • Child outputs are due to wrongly promoted BD-Ident constraints. • Adults’ grammar • BD-Ident constraints are demoted to their low target ranking (i.e., Markedness >> BD-Ident). • Intermediate stage • Attempts at the most frequent variant
Question • Assuming a set of BD-Ident, Markedness, and Ident-IO constraints, and an input corpus with alternations of varying frequencies, do we predict a three-stage learning process? • Answer: Yes! • (If learning alternations means ranking constraints in their target positions, how can a learner re-rank constraints correctly?)
The Goal of the Learning Simulation • To see whether children can learn alternations purely from the distribution of alternations. • Specifically: • The more often a given alternation occurs in the data, the more the relevant BD-Ident constraints are demoted.
The Learner • Given • A grammar consisting of a set of constraints with initial weights • Initial weights: BD-Ident >> M >> IO-F (McCarthy 1999, ? for M >> IO) • Constraints based on feature specification: BD-Ident (base as unmarked form: Albright 2008), Markedness, IO-F (input as underlying form: Ko 1989)
Learning as Weighting Constraints • Given • Outputs with frequencies • Frequency of variants according to corpus counts (Jun 2009) • Learning • The learner encounters surface forms that violate constraints. • Constraint violations are assessed by a Perl script.
Weighting Constraints • Learning • Weight constraints according to the frequency of violations in the data. • Weighting constraints • Using MaxEnt (Maximum Entropy: Goldwater & Johnson 2003) • MaxEnt Grammar Tool (Hayes 2009) • After learning • A set of trained (MaxEnt) weights for a grammar • The predicted probabilities assigned to each candidate
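The weighting step can be sketched as maximum-likelihood training of a MaxEnt grammar. This is a minimal illustration, not the MaxEnt Grammar Tool itself: it assumes plain gradient ascent with weights clipped at zero and no prior (the tool's own optimizer and Gaussian-prior settings may differ), and the constraint layout, violation counts, and observed frequencies are hypothetical.

```python
import math

def maxent_probs(weights, viol_table):
    """P(candidate) = exp(-H) / Z, with H = sum of weight * violations."""
    scores = [math.exp(-sum(w * v for w, v in zip(weights, viol)))
              for viol in viol_table]
    z = sum(scores)
    return [s / z for s in scores]

def train(viol_table, counts, n_constraints, rate=0.01, steps=5000):
    """Fit non-negative weights by gradient ascent on the log-likelihood."""
    total = sum(counts)
    # Observed violation totals for each constraint (fixed by the data).
    observed = [sum(n * viol[i] for n, viol in zip(counts, viol_table))
                for i in range(n_constraints)]
    weights = [1.0] * n_constraints
    for _ in range(steps):
        probs = maxent_probs(weights, viol_table)
        for i in range(n_constraints):
            expected = total * sum(p * viol[i] for p, viol in zip(probs, viol_table))
            # d(log-likelihood)/d(w_i) = expected - observed violations;
            # dividing by total keeps the step size independent of corpus size.
            grad = (expected - observed[i]) / total
            weights[i] = max(0.0, weights[i] + rate * grad)
    return weights

# Hypothetical data: two variants, each violating a different constraint.
viol_table = [[1, 0],   # variant A violates constraint 1
              [0, 1]]   # variant B violates constraint 2
counts = [75, 25]
w = train(viol_table, counts, n_constraints=2)
print([round(p, 3) for p in maxent_probs(w, viol_table)])  # approximately [0.75, 0.25]
```

With enough steps the predicted probabilities match the observed 3:1 ratio, which is the sense in which the trained weights "follow the frequency of violations in the data".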
Weighting Constraints using MaxEnt
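For reference, the standard MaxEnt formulation (Goldwater & Johnson 2003) gives each candidate a harmony score from its weighted constraint violations and converts scores into probabilities; training then chooses non-negative weights that maximize the log-likelihood of the observed outputs, typically minus a Gaussian prior term. The notation ($w_i$, $C_i$, $H$, $\mu_i$, $\sigma_i$) is ours, not the slide's.

```latex
H(y) = \sum_{i} w_i \, C_i(y), \qquad
P(y \mid x) = \frac{\exp\bigl(-H(y)\bigr)}{\sum_{y' \in \mathrm{Gen}(x)} \exp\bigl(-H(y')\bigr)}

\hat{w} = \arg\max_{w \ge 0} \Bigl[ \sum_{k} \log P(y_k \mid x_k)
          - \sum_{i} \frac{(w_i - \mu_i)^2}{2\sigma_i^2} \Bigr]
```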
Learning Data • Korean nouns • Obstruent-final (labial, coronal & velar) • Lateral-final • Nasal-final • Inflection • Nominative : -i • Accusative : -ɨl • Locative/goal : -e • Locative/source : -esə
Learning Phonotactics • Phonotactics • Intersonorant flapping: tal → taɾ-i ‘moon-nom’ • Intersonorant voicing: pap → pab-i ‘rice-nom’, mak → mag-i ‘scene-nom’ • Palatalization: tikɨt → tikɨc-i ‘alphabet [t]-nom’, kɨth → kɨch-i ‘end-nom’, mas → maʃ-i ‘taste-nom’
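The three phonotactic processes above can be approximated with an ordered rule sketch: palatalization before [i], otherwise voicing of a lenis stop or flapping of [l] before a vowel-initial suffix. This is a hypothetical, deterministic illustration that reproduces the six examples on the slide; it is not the constraint-based learner used in the study, and the function names and rule tables are assumptions.

```python
VOWELS = set("aeiouɨə")
PALATALIZE = {"th": "ch", "t": "c", "s": "ʃ"}  # before a suffix-initial [i]
VOICE = {"p": "b", "t": "d", "k": "g"}          # lenis stops before a vowel-initial suffix
FLAP = {"l": "ɾ"}                               # lateral before a vowel-initial suffix

def split_final(stem):
    """Split a stem into (body, final segment), treating 'ph', 'th', 'kh', 'ch' as one segment."""
    if len(stem) > 2 and stem[-1] == "h" and stem[-2] in "ptkc":
        return stem[:-2], stem[-2:]
    return stem[:-1], stem[-1]

def inflect(stem, suffix):
    """Attach a suffix, applying the hypothetical rules in order."""
    body, final = split_final(stem)
    if suffix[0] in VOWELS:
        if suffix[0] == "i" and final in PALATALIZE:
            final = PALATALIZE[final]   # palatalization takes precedence
        elif final in VOICE:
            final = VOICE[final]        # intersonorant voicing (stems here end in V + C)
        elif final in FLAP:
            final = FLAP[final]         # intersonorant flapping
    return f"{body}{final}-{suffix}"

for stem in ["tal", "pap", "mak", "tikɨt", "kɨth", "mas"]:
    print(stem, "->", inflect(stem, "i"))
```

Ordering palatalization before voicing is what yields tikɨc-i rather than tikɨd-i.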
Learning Alternations • Alternations of coronal obstruent-finals • Violation of BD-Ident: [t] → [s, ch, th, c, t] • Violation of IO-F: /s, ch, th, c, t/ → [s, ch, th, c, t] • Alternations of non-coronal (labial and velar) obstruent-finals • Violation of BD-Ident: [p, k] → [ph, kh, k’] • Violation of IO-F: /ph, kh, k’/ → [p, k]
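In the study a Perl script assessed which correspondences violate BD-Ident and IO-F. A simplified sketch of that bookkeeping follows, assuming segment identity in place of the feature-based constraints actually used; the function `violations` and its parameter names are hypothetical. The example uses ‘field’, with base (isolation form) [pat] and UR /path/.

```python
def violations(candidate_final, base_final, ur_final):
    """Count BD-Ident and IO-F violations for a stem-final segment.

    Simplification: identity is checked on whole segments, not features.
    """
    return {
        "BD-Ident": int(candidate_final != base_final),  # vs. the isolation form
        "IO-F": int(candidate_final != ur_final),         # vs. the underlying form
    }

# 'field': base [pat], UR /path/
for variant in ["s", "ch", "th", "c", "t"]:
    print(f"[pa{variant}-e]", violations(variant, base_final="t", ur_final="th"))
```

Only [path-e] satisfies IO-F and only [pat-e] satisfies BD-Ident, so every variant violates at least one of the two faithfulness constraints.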
Three Age Groups • Simulation of different age groups • The older the group, the more input it receives. • An example from accusatives • Ratio adapted from corpus counts: s 52%, ch 21%, th 18%, c 2%, t 0% • Input counts (s, ch, th, c, t), from youngest to oldest: 5.2, 2.1, 1.8, 0.2, 0 … 52, 21, 18, 2, 0 … 5200, 2100, 1800, 200, 0
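The age groups differ only in the amount of input, which can be sketched by scaling one set of corpus proportions. The scale factors 0.1, 1, and 100 are inferred from the counts on the slide (52% becomes 5.2, 52, and 5200); the group labels and the function name `scaled_counts` are hypothetical.

```python
# Accusative variant proportions (percent) from the slide.
RATIO = {"s": 52, "ch": 21, "th": 18, "c": 2, "t": 0}

# Hypothetical group labels; the factors reproduce the slide's counts.
SCALES = {"young": 0.1, "intermediate": 1, "adult": 100}

def scaled_counts(ratio, factor):
    """Multiply each variant's percentage by a factor to get training counts."""
    return {variant: round(pct * factor, 2) for variant, pct in ratio.items()}

for group, factor in SCALES.items():
    print(group, scaled_counts(RATIO, factor))
```

If these counts are fed to a MaxEnt learner, note that without a prior, multiplying every count by the same factor leaves the maximum-likelihood weights unchanged. Differences between groups of this kind therefore depend on the prior term (such as the Gaussian prior used by the MaxEnt Grammar Tool).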
The 1st Stage • BD-Ident >> M >> IO-F • Interaction of BD-Ident and Markedness constraints • IO-F constraints are ranked too low to influence the outcome. • URs (inputs) are assumed to be discovered later (ref ?).
The 1st Stage • Mastery of some phonotactics • Intersonorant voicing: *[+son][-v][+son] >> *[+son, +v] >> BD-Id[v] • Intersonorant flapping: *[+son][+lat][+son] >> *[ɾ] >> BD-Id[lat] • Partial mastery of palatalization: *[+son][+ant, -dis][+son] >> *[ʃ] >> BD-Id[+ant, -dis]; *[th][+high, -back, +syl] >> BD-Id[th]/_[+high, -back, +syl]
Faithful Form to a Base • No mastery of palatalization ([ti] → [ci]): BD-Id[obs,cor]/_[+high, -back, +syl] >> *[obs,cor][+high, -back, +syl] • The most probable outputs predicted by the grammar • Nom: d-i (illegal) • Acc: d-ɨl (unattested) • Loc: d-e (unattested) • Source: d-esə (unattested)
Success in Predicting Child Forms • The predicted forms are what young children attempt in the early stage. • [pat]: unmarked form • [pad-i]: built on the base form • Finding • Young children’s outputs are predicted by feeding the learner a small amount of Korean alternation data.
The 2nd Stage • Feeding more data …
The 2nd Stage • IO-F constraints are still inactive. • Mastery of palatalization: *[obs, cor][+high, -back, +syl] >> BD-Id[obs, cor]/_[+high, -back, +syl] • Mastery of some alternations
The 2nd Stage • In general, the most probable outputs are the most frequent adult variants. • Nom: corpus s-i, simulation s-i • Acc: corpus s-ɨl, simulation s-ɨl • Loc: corpus s-e, th-e, simulation th-e • Source: corpus th-esə, simulation s-esə
The Final Stage • Feeding more data …
The Final Stage • M >> BD-Ident • Some M constraints remain higher than IO-F constraints and some are demoted. • Variation predicted!
Predictability of Variants • /path-e/: observed count, predicted probability • [path-e]: 7435, 0.5615 • [pas-e]: 2165, 0.3885 • [pach-e]: 120, 0.0567 • [pad-e]: 0, 1.02E-9 • [pac-e]: 20, 3.01E-4
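To put the observed counts and the predicted probabilities on the same scale, the counts can be normalized to proportions. This small sketch uses only the numbers on the slide (predicted values rounded); the variable names are hypothetical.

```python
observed = {"path-e": 7435, "pas-e": 2165, "pach-e": 120, "pad-e": 0, "pac-e": 20}
predicted = {"path-e": 0.5615, "pas-e": 0.3885, "pach-e": 0.0567,
             "pad-e": 1.02e-9, "pac-e": 3.01e-4}

total = sum(observed.values())  # 9740 tokens of /path-e/
for form, count in observed.items():
    share = count / total
    print(f"[{form}] observed {share:.4f}  predicted {predicted[form]:.4f}")
```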
Predicted Preference • Pattern matches the corpus • Overall: [s] >> [ch], [th] >> [c], [t] • Before -e, -esə: [th-e] >> [ch-e], [th-esə] >> [ch-esə] • Before -ɨl: [ch-ɨl] >> [th-ɨl]
Conclusions • Child forms are due to highly ranked OO-F constraints. • Children can demote constraints simply by exposing their rankings to probabilistic data from an adult speech corpus. • That is, without intrinsic bias, the statistics of Korean give rise to the attested learning stages.
Simpler Modeling? • Is there a simpler alternative without any reference to OT or OO-F constraints? • Children use the frequencies of alternants to determine which phonological rules to acquire first (suggested by an anonymous WCCFL reviewer).
Simpler Modeling? • Pre-[-ɨl] constraints (Jun 2007) • s/_ɨl >> ch/_ɨl >> th/_ɨl >> c/_ɨl, t/_ɨl • p/_ɨl >> ph/_ɨl • k/_ɨl >> kh/_ɨl • Acquired earlier → later
Evidence against this proposal?
Reference
Thanks