Explaining L2 perceptual development: Machine learning vs. computational Stochastic OT vs. human learners Paola Escudero, Jelle Kastelein & Klara Weiand University of Amsterdam
Introduction • Comparison of models for L2 sound perception development • Part of the human data presented in the talk yesterday • Classical machine learning: Naive Bayesian, Nearest Neighbor • Stochastic OT: Linguistic theory
Listeners • 23 European Spanish learners of Dutch • 22 native Dutch adults • The learners had different proficiency levels according to the EU measure of language proficiency
Analysis • We measured the listeners’ perceptual space, i.e. the distances in the F1/F2 plane between the values they categorized as the 12 Dutch vowels • We first computed the mean and variation of the perception of each vowel ➝ ellipses • Then we calculated the distances between the mean perception of the Dutch central vowel /ø/ and the mean perceptions of the other 11 vowels • Here we present the variation and distances for the corner vowels /a/, /i/ and /u/ and the central vowel /ø/; statistics were performed on the 11 distances between vowels
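The distance measure above can be sketched as follows: compute each vowel's mean perceived F1/F2, then the Euclidean distance from the mean of /ø/ to every other vowel mean. The F1/F2 values below are illustrative placeholders, not the study's data.

```python
import math
from statistics import mean

# Hypothetical F1/F2 responses (Hz) per perceived vowel; placeholder values.
responses = {
    "ø": [(450, 1600), (470, 1550)],
    "a": [(750, 1300), (730, 1350)],
    "i": [(280, 2300), (300, 2250)],
    "u": [(300, 800), (320, 850)],
}

def category_mean(points):
    """Mean F1 and F2 of all tokens categorized as one vowel."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))

# Euclidean distance from the mean perception of /ø/ to each other vowel mean.
center = category_mean(responses["ø"])
distances = {v: math.dist(category_mean(pts), center)
             for v, pts in responses.items() if v != "ø"}
```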
Explaining L2 perception • Three different learning algorithms • Different levels of abstraction from the training input • Procedure: • Model a native listener of Spanish • Beginning learner of Dutch: map the responses of the "native Spanish" model onto the Dutch vowel space • Advanced learner: train the native-listener model on native Dutch data
Nearest Neighbor • "Lazy learner" • Training: save the training examples as points in Euclidean space • Classification: assign the class most frequent among a point's nearest neighbors • No abstraction from the data
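A minimal sketch of the lazy-learner idea, here as 1-nearest-neighbor in the F1/F2 plane; the training points are toy values, not the study's data:

```python
import math

# Toy labeled (F1, F2) training points; not the study's data.
train = [((300, 2250), "i"), ((750, 1300), "a"), ((320, 800), "u")]

def nn_classify(point, examples):
    """1-NN: assign the class of the closest stored example (Euclidean distance).
    No abstraction: every training token is kept verbatim."""
    return min(examples, key=lambda ex: math.dist(point, ex[0]))[1]
```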
Naive Bayes • Statistical model • Assumption: the class of a data point can be inferred from its attributes (e.g., classifying fruits from their features) • Training: observe how often each class occurs and which attribute values correspond to which class • Classification: choose the vowel class with the highest probability given the attributes • The training data is abstracted into a stochastic model
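One plausible reading of this model, sketched below, assumes a Gaussian likelihood per attribute (the slide does not specify the distribution); training data are toy values:

```python
import math
from collections import defaultdict
from statistics import mean, pvariance

def train_gaussian_nb(examples):
    """Per class: prior frequency plus a Gaussian (mean, variance) per attribute."""
    by_class = defaultdict(list)
    for attrs, label in examples:
        by_class[label].append(attrs)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        model[label] = (len(rows) / len(examples),
                        [(mean(c), pvariance(c)) for c in cols])
    return model

def log_gauss(x, mu, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def nb_classify(model, attrs):
    """Choose the class maximizing log prior + summed attribute log likelihoods."""
    return max(model, key=lambda c: math.log(model[c][0]) +
               sum(log_gauss(x, mu, var)
                   for x, (mu, var) in zip(attrs, model[c][1])))

# Toy (F1, F2) training tokens; not the study's data.
examples = [((280, 2300), "i"), ((300, 2250), "i"),
            ((750, 1300), "a"), ((730, 1350), "a")]
model = train_gaussian_nb(examples)
```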
Stochastic OT • Computational linguistic framework • Training: constraint rankings are gradually adjusted according to the training data • Classification: select the candidate class with the least serious constraint violations • More abstract than the previous two: no explicit probabilities, but constraint rankings that reflect them
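A minimal sketch in the spirit of Boersma's Stochastic OT with an error-driven (Gradual Learning Algorithm-style) update; the constraint names, violation profiles, and numeric settings are made up for illustration and are not the grammar used in the study:

```python
import random

# Each constraint has a ranking value; at evaluation time Gaussian noise is
# added and constraints are ordered by the noisy values.
rankings = {"*HIGH-F1": 100.0, "*LOW-F1": 100.0}
violations = {                      # violations[candidate][constraint]
    "i": {"*HIGH-F1": 0, "*LOW-F1": 1},
    "a": {"*HIGH-F1": 1, "*LOW-F1": 0},
}

def evaluate(rankings, violations, noise=2.0, rng=random):
    """Winner = candidate with the least serious violations under the noisy
    ranking (lexicographic comparison implements strict domination)."""
    order = sorted(rankings, key=lambda c: rankings[c] + rng.gauss(0, noise),
                   reverse=True)
    return min(violations, key=lambda cand:
               tuple(violations[cand][c] for c in order))

def gla_update(rankings, violations, winner, target, plasticity=0.1):
    """On an error, promote constraints violated by the learner's wrong winner
    and demote those violated by the target form."""
    if winner != target:
        for c in rankings:
            rankings[c] += plasticity * (violations[winner][c] -
                                         violations[target][c])
```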
Human vs. simulated data • Human: solid red line • OT: solid black line • Naive Bayes: dashed line • Nearest Neighbor: dotted line
Results • Naive Bayes differs significantly from the human data (Wilcoxon matched-pairs signed-ranks test) • No significant difference between the humans and either Nearest Neighbor or stochastic OT
Results • No significant difference between humans and either classifier
Results • Nearest Neighbor differs significantly from the humans • No significant difference between the humans and either Naive Bayes or stochastic OT
Conclusion • The most abstract model, stochastic OT, gives the best results: it resembles the human data in all simulations • The distance measure helps to quantify the differences between vowels
Acknowledgements: Netherlands Organization for Scientific Research Research assistants: Jeannette Elsenburg, Annemarieke Samason, Titia Benders, Marieke Gerrits email: escudero@uva.nl kweiand@science.uva.nl