Explaining L2 perceptual development: Machine learning vs. computational Stochastic OT vs. human learners Paola Escudero, Jelle Kastelein & Klara Weiand University of Amsterdam
Introduction • Comparison of models for L2 sound perception development • Part of the human data presented in the talk yesterday • Classical machine learning: Naive Bayesian, Nearest Neighbor • Stochastic OT: Linguistic theory
Listeners • 23 European Spanish learners of Dutch • 22 native Dutch adults • The learners had different proficiency levels according to the EU measure of language proficiency
Analysis • We measured the listeners’ perceptual space, i.e. the distances in the F1/F2 plane between the values they categorized as the 12 Dutch vowels • We first computed the mean and variation of the perception of each vowel ➝ ellipses • Then we calculated the distances between the mean perception of the Dutch central vowel /ø/ and the mean perceptions of the other 11 vowels • Here we present the variation and distances for the corner vowels /a/, /i/ and /u/ and the central vowel /ø/; statistics were performed on the 11 distances between vowels
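The distance measure above can be sketched as follows: compute each vowel's mean perceived F1/F2, then the Euclidean distance from the mean of /ø/ to every other vowel mean. The F1/F2 values below are illustrative placeholders, not the study's data.

```python
import math
from statistics import mean

# Hypothetical F1/F2 responses (Hz) per perceived vowel; placeholder values.
responses = {
    "ø": [(450, 1600), (470, 1550)],
    "a": [(750, 1300), (730, 1350)],
    "i": [(280, 2300), (300, 2250)],
    "u": [(300, 800), (320, 850)],
}

def category_mean(points):
    """Mean F1 and F2 of all tokens categorized as one vowel."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))

# Euclidean distance from the mean perception of /ø/ to each other vowel mean.
center = category_mean(responses["ø"])
distances = {v: math.dist(category_mean(pts), center)
             for v, pts in responses.items() if v != "ø"}
```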
Explaining L2 perception • Three different learning algorithms • Different levels of abstraction from the training input • Procedure: • Model a native listener of Spanish • Beginning learner of Dutch: map the responses of the "native Spanish" model onto the Dutch vowel space • Advanced learner: train the native-listener model on native Dutch data
Nearest Neighbor • "Lazy learner" • Training: save the training examples as points in Euclidean space • Classification: assign the class most frequent among a point's nearest neighbors • No abstraction from the data
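A minimal sketch of the lazy-learner idea, here as 1-nearest-neighbor in the F1/F2 plane; the training points are toy values, not the study's data:

```python
import math

# Toy labeled (F1, F2) training points; not the study's data.
train = [((300, 2250), "i"), ((750, 1300), "a"), ((320, 800), "u")]

def nn_classify(point, examples):
    """1-NN: assign the class of the closest stored example (Euclidean distance).
    No abstraction: every training token is kept verbatim."""
    return min(examples, key=lambda ex: math.dist(point, ex[0]))[1]
```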
Naive Bayes • Statistical model • Assumption: the class of a data point can be inferred from its attributes (e.g., classifying fruits from their features) • Training: observe how often each class occurs and which attribute values correspond to which class • Classification: choose the vowel class with the highest probability given the attributes • The training data is abstracted into a stochastic model
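One plausible reading of this model, sketched below, assumes a Gaussian likelihood per attribute (the slide does not specify the distribution); training data are toy values:

```python
import math
from collections import defaultdict
from statistics import mean, pvariance

def train_gaussian_nb(examples):
    """Per class: prior frequency plus a Gaussian (mean, variance) per attribute."""
    by_class = defaultdict(list)
    for attrs, label in examples:
        by_class[label].append(attrs)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        model[label] = (len(rows) / len(examples),
                        [(mean(c), pvariance(c)) for c in cols])
    return model

def log_gauss(x, mu, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def nb_classify(model, attrs):
    """Choose the class maximizing log prior + summed attribute log likelihoods."""
    return max(model, key=lambda c: math.log(model[c][0]) +
               sum(log_gauss(x, mu, var)
                   for x, (mu, var) in zip(attrs, model[c][1])))

# Toy (F1, F2) training tokens; not the study's data.
examples = [((280, 2300), "i"), ((300, 2250), "i"),
            ((750, 1300), "a"), ((730, 1350), "a")]
model = train_gaussian_nb(examples)
```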
Stochastic OT • Computational linguistic framework • Training: constraint rankings are gradually adjusted according to the training data • Classification: select the candidate class with the least serious constraint violations • More abstract than the previous two: no explicit probabilities, but constraint rankings that reflect them
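A minimal sketch in the spirit of Boersma's Stochastic OT with an error-driven (Gradual Learning Algorithm-style) update; the constraint names, violation profiles, and numeric settings are made up for illustration and are not the grammar used in the study:

```python
import random

# Each constraint has a ranking value; at evaluation time Gaussian noise is
# added and constraints are ordered by the noisy values.
rankings = {"*HIGH-F1": 100.0, "*LOW-F1": 100.0}
violations = {                      # violations[candidate][constraint]
    "i": {"*HIGH-F1": 0, "*LOW-F1": 1},
    "a": {"*HIGH-F1": 1, "*LOW-F1": 0},
}

def evaluate(rankings, violations, noise=2.0, rng=random):
    """Winner = candidate with the least serious violations under the noisy
    ranking (lexicographic comparison implements strict domination)."""
    order = sorted(rankings, key=lambda c: rankings[c] + rng.gauss(0, noise),
                   reverse=True)
    return min(violations, key=lambda cand:
               tuple(violations[cand][c] for c in order))

def gla_update(rankings, violations, winner, target, plasticity=0.1):
    """On an error, promote constraints violated by the learner's wrong winner
    and demote those violated by the target form."""
    if winner != target:
        for c in rankings:
            rankings[c] += plasticity * (violations[winner][c] -
                                         violations[target][c])
```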
Human vs. simulated data • Human: solid red line • OT: solid black line • Naive Bayes: dashed line • Nearest Neighbor: dotted line
Results • Naive Bayes differs significantly from the human data (Wilcoxon matched-pairs signed-ranks test) • No significant difference between the humans and either Nearest Neighbor or stochastic OT
Results • No significant difference between humans and either classifier
Results • Nearest Neighbor differs significantly from the humans • No significant difference between the humans and either Naive Bayes or stochastic OT
Conclusion • The most abstract model, stochastic OT, gives the best results: it resembles the human data in all simulations • The distance measure helps to quantify the differences between vowels
Acknowledgements: Netherlands Organization for Scientific Research Research assistants: Jeannette Elsenburg, Annemarieke Samason, Titia Benders, Marieke Gerrits email: escudero@uva.nl kweiand@science.uva.nl