
LING 696B: Gradient phonotactics and well-formedness



  1. LING 696B: Gradient phonotactics and well-formedness

  2. Vote on remaining topics
  • Topics that have been fixed:
    • Morpho-phonological learning (Emily) + (LouAnn’s lecture) + Bayesian learning
    • Rule induction (Mans) + decision tree
    • Learning and self-organization (Andy’s lecture)

  3. Voting on remaining topics
  • Select 2-3 from the following (need a ranking):
    • OT and Stochastic OT
    • Alternatives to OT: random fields / maximum entropy
    • Minimal Description Length word chopping
    • Feature-based lexical access

  4. Well-formedness of words (following Mike’s talk)
  • A word “sounds like English” if:
    • It is a close neighbor of some words that sound really English, e.g. “pand” is a neighbor of sand, band, pad, pan, …
    • It agrees with what English grammar says an English word should look like, e.g. gradient phonotactics says blick > bnick

  5. Well-formedness of words (following Mike’s talk)
  • A word “sounds like English” if:
    • It is a close neighbor of some words that sound really English, e.g. “pand” is a neighbor of sand, band, pad, pan, …
    • It agrees with what English grammar says an English word should look like, e.g. gradient phonotactics says blick > bnick
  • Today: relate these two ideas to the non-parametric and parametric perspectives

  6. Many ways of calculating the probability of a sequence
  • Unigrams, bigrams, trigrams, syllable parts, transition probabilities …
  • No bound on the number of creative ways

  7. Many ways of calculating the probability of a sequence
  • Unigrams, bigrams, trigrams, syllable parts, transition probabilities …
  • No bound on the number of creative ways
  • What does it mean to talk about the “probability” of a phonological word?
  • Objective/frequentist vs. subjective/Bayesian: philosophical (but important)

  8. Many ways of calculating the probability of a sequence
  • Unigrams, bigrams, trigrams, syllable parts, transition probabilities …
  • No bound on the number of creative ways
  • What does it mean to talk about the “probability” of a phonological word?
  • Objective/frequentist vs. subjective/Bayesian: philosophical (but important)
  • Thinking “parametrically” may clarify things
    • “Likelihood” = “probability” calculated from a model

  9. Parametric approach to phonotactics
  • Example: “bag of sounds” assumption / exchangeable distributions
  • p(blik) = p(lbik) = p(kbli)

  10. Parametric approach to phonotactics
  • Unigram models: N − 1 parameters θ
    • What is θ? How do we get the estimate θ̂? How do we assign a probability to “blick”? (see the sketch below)
  (Diagram: segments B, L, I, K as independent observations)
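A minimal sketch of the “bag of sounds” unigram model, assuming a hypothetical five-word toy lexicon: the estimate θ̂ is just the relative frequency of each segment, and a word’s probability is the product of its segments’ probabilities, which makes the model exchangeable.

```python
from collections import Counter

# Hypothetical toy lexicon; a real estimate would use a full dictionary.
lexicon = ["blik", "sand", "band", "pad", "pan"]

# theta-hat: maximum-likelihood segment probabilities (relative frequencies).
counts = Counter(seg for word in lexicon for seg in word)
total = sum(counts.values())
theta = {seg: c / total for seg, c in counts.items()}

def unigram_prob(word):
    """p(word) under the bag-of-sounds model: a product of segment
    probabilities, so any reordering of the segments gets the same score."""
    p = 1.0
    for seg in word:
        p *= theta.get(seg, 0.0)  # unseen segment -> probability 0
    return p

print(unigram_prob("blik") == unigram_prob("lbik"))  # True: exchangeable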

  11. Parametric approach to phonotactics
  • Unigram model with overlapping observations: N² − 1 parameters
    • What is θ? How do we get θ̂? How do we assign a probability to “blick”?
  • Note: the input is #B BL LI IK K#
  (Diagram: overlapping observations over B, L, I, K)

  12. Parametric approach to phonotactics
  • Unigram with annotated observations (Coleman and Pierrehumbert)
  • Input: segments annotated with a syllable parse, e.g. BL as “osif” (onset of a strong initial/final syllable) and IK as “rsif” (rhyme of a strong initial/final syllable)

  13. Parametric approach to phonotactics
  • Bigram model: N(N − 1) parameters {p(w_n | w_{n−1})} (how many for a trigram?) (see the sketch below)
  • Input: segment sequence (Diagram: B → L → I → K)
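A minimal sketch of the bigram model over the same hypothetical toy lexicon: words are padded with a boundary symbol # so that the input matches the #B BL LI IK K# scheme from the previous slide, and each parameter is a conditional probability p(w_n | w_{n−1}).

```python
from collections import Counter

lexicon = ["blik", "sand", "band", "pad", "pan"]  # hypothetical training words

bigrams, contexts = Counter(), Counter()
for word in lexicon:
    padded = "#" + word + "#"  # boundary-padded, as in #B BL LI IK K#
    for a, b in zip(padded, padded[1:]):
        bigrams[(a, b)] += 1
        contexts[a] += 1

def bigram_prob(word):
    """p(word) = product over n of p(w_n | w_{n-1}), boundaries included."""
    p = 1.0
    for a, b in zip("#" + word + "#", word + "#"):
        p *= bigrams[(a, b)] / contexts[a] if contexts[a] else 0.0
    return p

# Gradient phonotactics: the attested onset "bl" beats the unattested "bn".
print(bigram_prob("blik") > bigram_prob("bnik"))  # True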

  14. Ways that theory might help calculate probability
  • Probability calculation must be based on an explicit model
    • Need a story about what sequences are
  • How can phonology help with calculating sequence probability?
    • More delicate representations
    • More complex models

  15. Ways that theory might help calculate probability
  • Probability calculation must be based on an explicit model
    • Need a story about what sequences are
  • How can phonology help with calculating sequence probability?
    • More delicate representations
    • More complex models
  • But: phonology is not quite about what sequences are …

  16. More delicate representations
  • Would CV phonology help?
  • Auto-segmental tiers, features, gestures?
  • The chains are no longer independent: more sophisticated models are needed
  • Limit: a generative model of speech production (very hard)
  (Diagram: B L I K I T on autosegmental tiers)

  17. More complex models
  • Mixture of unigrams (sketched below)
  • Used in document classification
  (Diagram: a lexical-strata variable selecting a unigram model over B, L, I, K)
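A minimal sketch of a mixture of unigrams with two hypothetical lexical strata: a word’s probability marginalizes over the stratum, p(word) = Σ_h p(h) Π_seg p(seg | h). All probabilities here are invented for illustration.

```python
# Hypothetical strata weights p(h) and per-stratum unigram parameters.
strata = {
    "native":   {"weight": 0.7, "theta": {"b": 0.2, "l": 0.2, "i": 0.3, "k": 0.3}},
    "borrowed": {"weight": 0.3, "theta": {"b": 0.1, "l": 0.1, "i": 0.4, "k": 0.4}},
}

def mixture_prob(word):
    """p(word) = sum_h p(h) * prod_seg p(seg | h)."""
    total = 0.0
    for s in strata.values():
        p = s["weight"]
        for seg in word:
            p *= s["theta"].get(seg, 0.0)
        total += p
    return total

print(mixture_prob("blik"))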

  18. More complex models
  • More structure in the Markov chain
  • Can also model the length distribution with so-called semi-Markov models (sketched below)
  (Diagram: states “onset”, “rhyme V”, “rhyme VC” generating the chunks BL, IK)
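A minimal sketch in the semi-Markov spirit (not the full model): hypothetical syllable-part states (“onset”, “rhyme V”, “rhyme VC”) emit multi-segment chunks, so chunk length is controlled per state rather than per segment. All transition and emission probabilities are invented.

```python
# Hypothetical transitions between syllable-part states ("#" = boundary).
trans = {"#": {"onset": 1.0},
         "onset": {"rhyme V": 0.4, "rhyme VC": 0.6},
         "rhyme V": {"#": 1.0}, "rhyme VC": {"#": 1.0}}
# Hypothetical chunk emissions: each state emits a variable-length chunk.
emit = {"onset": {"bl": 0.1, "b": 0.3},
        "rhyme V": {"i": 0.5},
        "rhyme VC": {"ik": 0.2}}

def parse_prob(chunks):
    """p of a (state, chunk) sequence: product of transitions and emissions."""
    p, prev = 1.0, "#"
    for state, chunk in chunks:
        p *= trans[prev].get(state, 0.0) * emit[state].get(chunk, 0.0)
        prev = state
    return p * trans[prev].get("#", 0.0)

print(parse_prob([("onset", "bl"), ("rhyme VC", "ik")]))  # "blick" as BL + IK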

  19. More complex models
  • Probabilistic context-free grammar (see the sketch below):
    • Syllable --> C + VC (0.6)
    • Syllable --> C + V (0.35)
    • Syllable --> C + C (0.05)
    • C --> _ (0.01)
    • C --> b (0.05)
    • …
  • See 439/539
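A minimal sketch using the slide’s rule probabilities: under a PCFG, a derivation’s probability is the product of the probabilities of the rules it applies. Only the rules shown on the slide are included; the remaining segment rules are omitted, as on the slide.

```python
# The slide's toy PCFG, as (left-hand side, right-hand side) -> probability.
rules = {
    ("Syllable", ("C", "VC")): 0.60,
    ("Syllable", ("C", "V")):  0.35,
    ("Syllable", ("C", "C")):  0.05,
    ("C", ("b",)): 0.05,
    # ... remaining segment rules omitted, as on the slide
}

def derivation_prob(derivation):
    """Multiply the probabilities of the rules used in a derivation."""
    p = 1.0
    for rule in derivation:
        p *= rules[rule]
    return p

# One derivation: Syllable -> C + V, then C -> b.
print(derivation_prob([("Syllable", ("C", "V")), ("C", ("b",))]))  # 0.35 * 0.05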

  20. What’s the benefit of doing more sophisticated things?
  • Recall: maximum likelihood needs more data to produce a better estimate
  • Data sparsity problem: the training data are often insufficient for estimating all the parameters, e.g. zero counts
  • Lexicon size: we don’t have infinitely many words to estimate phonotactics from
  • Smoothing: properly done, it has a Bayesian interpretation (though often it is not); see the sketch below
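A minimal sketch of add-λ smoothing for the unigram case: every segment gets λ pseudo-observations, so zero counts no longer force zero probabilities. With λ > 0 this equals the posterior mean under a symmetric Dirichlet(λ) prior, which is the Bayesian interpretation in question. The lexicon and alphabet are hypothetical.

```python
from collections import Counter

def smoothed_unigram(lexicon, alphabet, lam=1.0):
    """Add-lambda (Lidstone) estimates; lam=1 is Laplace smoothing."""
    counts = Counter(seg for word in lexicon for seg in word)
    total = sum(counts.values())
    denom = total + lam * len(alphabet)
    return {seg: (counts[seg] + lam) / denom for seg in alphabet}

theta = smoothed_unigram(["blik", "sand", "band"], alphabet="abdiklnsz")
print(theta["z"] > 0)  # True: unseen segment, but nonzero probability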

  21. Probability and well-formedness
  • Generative modeling: characterize a distribution over strings
  • Why should we care about this distribution?
    • Hope: it may have something to do with grammaticality judgements
    • But: judgements are also affected by what other words “sound like”
    • The puzzle of mrupect/mrupation
  • It may be easier to model a function with input = string, output = judgements

  22. Bailey and Hahn
  • Tried all kinds of ways of calculating phonotactics and neighborhood density, to see which combination “works the best”
  • Typical reasoning: “metrics X and Y as factors explain 15% of the variance”

  23. Bailey and Hahn
  • Tried all kinds of ways of calculating phonotactics and neighborhood density, to see which combination “works the best”
  • Typical reasoning: “metrics X and Y as factors explain 15% of the variance”
  • Methodology: ANOVA
    • Model (1-way): data = overall mean + effect + error
  • What can ANOVA do for us?
  • How do we check whether ANOVA makes sense?
  • What is the “explained variance”? (see the sketch below)
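A minimal sketch of the one-way ANOVA decomposition on hypothetical ratings grouped by a factor (say, phonotactic-probability bins): “explained variance” is the between-group sum of squares over the total sum of squares (η²).

```python
# Hypothetical wordlikeness ratings in three phonotactic-probability groups.
groups = [
    [6.1, 5.8, 6.3],   # high phonotactic probability
    [4.9, 5.2, 4.7],   # mid
    [3.0, 3.4, 2.8],   # low
]

# One-way model: data = overall mean + group effect + error.
grand = [x for g in groups for x in g]
grand_mean = sum(grand) / len(grand)
ss_total = sum((x - grand_mean) ** 2 for x in grand)
ss_between = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2 for g in groups)

print("explained variance (eta^2):", ss_between / ss_total)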

  24. Non-parametric approach to similarity neighborhood
  • A hint from B&H: the neighborhood model
    • d_ij is a weighted edit distance
    • A, B, C, D are estimated by polynomial regression
  • Recall radial basis functions: F(x) = Σ_i a_i K(x, x_i), with K(x, x_i) = e^(−d(x, x_i))
  • The quadratic weighting is ad hoc; one should just do general nonlinear regression with RBFs (see the sketch below)
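A minimal sketch of general nonlinear regression with an RBF string kernel: fit the weights a_i by solving the kernel system on hypothetical training words and wordlikeness ratings, then score a new string by F(x) = Σ_i a_i K(x, x_i). The small ridge term only keeps the solve numerically stable.

```python
import numpy as np

def edit(s, t):
    """Standard Levenshtein distance via dynamic programming."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(t) + 1)]
         for i in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (s[i - 1] != t[j - 1]))
    return d[len(s)][len(t)]

def K(x, y):
    """Edit-distance RBF kernel: K(x, y) = exp(-edit(x, y))."""
    return np.exp(-edit(x, y))

train = ["sand", "band", "pad", "pan"]       # hypothetical neighbors
ratings = np.array([6.0, 6.2, 5.5, 5.8])     # hypothetical ratings

gram = np.array([[K(x, y) for y in train] for x in train])
a = np.linalg.solve(gram + 1e-6 * np.eye(len(train)), ratings)

def F(x):
    return sum(ai * K(x, xi) for ai, xi in zip(a, train))

print(F("pand"))  # a dense-neighborhood nonword scores high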

  25. Non-parametric approach to similarity neighborhood
  • Recall: RBF as a “soft” neighborhood model
  • Now think of strings also as data points, with neighborhoods defined by some string distance (e.g. edit distance)
  • Same kind of regression with RBF

  26. Non-parametric approach to similarity neighborhood
  • Key technical point: choosing the right kernel
    • Edit-distance kernel: K(x, x_i) = e^(−edit(x, x_i))
    • Sub-string kernel: measures the length of the common sub-sequence (mrupation; see the sketch below)
  • Key experimental data: controlled stimuli, split into training and test sets (equal phonotactic probability)
  • No need to transform the rating scale
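A minimal sketch of the sub-string idea: score a pair of strings by the length of their longest common sub-sequence, which (unlike plain edit distance) directly credits shared material such as the “-ation” of mrupation. The comparison word is a hypothetical example.

```python
def lcs_len(s, t):
    """Length of the longest common subsequence, by dynamic programming."""
    d = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                d[i][j] = d[i - 1][j - 1] + 1
            else:
                d[i][j] = max(d[i - 1][j], d[i][j - 1])
    return d[len(s)][len(t)]

print(lcs_len("mrupation", "corruption"))  # 7: shared subsequence "ruption"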

  27. Non-parametric approach to similarity neighborhood
  • A whole range of questions opens up from the non-parametric perspective:
    • Would a yes/no task lead to “anchor” words, like support vectors?
    • Would new words interact with each other, as in transductive inference?
    • What type of metric is most appropriate for inferring well-formedness from neighborhoods?

  28. Integration
  • Hard to integrate with a probabilistic (parametric) model
    • Neighborhood density has a strongly non-parametric character: it grows with the data
  • Possible to integrate phonotactic probability into a non-parametric model: kernel algebra (see the sketch below)
    • aK1(x,y) + bK2(x,y) and K1(x,y)·K2(x,y) are also kernels
    • Probability kernel: K(x1, x2) = Σ_h p(x1|h) p(x2|h) p(h), where p comes from a parametric model
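A minimal sketch of the combination: a probability kernel K(x1, x2) = Σ_h p(x1|h) p(x2|h) p(h) built from the hypothetical two-stratum mixture sketched earlier, blended with a stand-in string kernel via the closure property that aK1 + bK2 (a, b ≥ 0) is again a kernel. All numbers are illustrative.

```python
import math

strata = {"native": 0.7, "borrowed": 0.3}  # p(h), hypothetical
theta = {"native":   {"b": 0.2, "l": 0.2, "i": 0.3, "k": 0.3},
         "borrowed": {"b": 0.1, "l": 0.1, "i": 0.4, "k": 0.4}}

def p_given(word, h):
    """p(word | stratum h) under the stratum's unigram model."""
    p = 1.0
    for seg in word:
        p *= theta[h].get(seg, 0.0)
    return p

def K_prob(x1, x2):
    """Probability kernel: K(x1, x2) = sum_h p(x1|h) p(x2|h) p(h)."""
    return sum(p_given(x1, h) * p_given(x2, h) * w for h, w in strata.items())

def K_string(x, y):
    """Stand-in string kernel on lengths; exp(-|len diff|) is a valid kernel."""
    return math.exp(-abs(len(x) - len(y)))

def K_combined(x1, x2, a=0.5, b=0.5):
    """a*K1 + b*K2 (a, b >= 0) is again a kernel."""
    return a * K_prob(x1, x2) + b * K_string(x1, x2)

print(K_combined("blik", "bik"))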
