LING 696B: Gradient phonotactics and well-formedness

Vote on remaining topics
  • Topics that have been fixed:
    • Morpho-phonological learning (Emily) + (LouAnn’s lecture) + Bayesian learning
    • Rule induction (Mans) + decision tree
    • Learning and self-organization (Andy’s lecture)
Voting on remaining topics
  • Select 2-3 from the following (need a ranking):
    • OT and Stochastic OT
    • Alternatives to OT: random fields/maximum entropy
    • Minimal Description Length word chopping
    • Feature-based lexical access
Well-formedness of words (following Mike’s talk)
  • A word “sounds like English” if:
    • It is a close neighbor of some words that sound really English, e.g. “pand” is a neighbor of sand, band, pad, pan, …
    • It agrees with what English grammar says an English word should look like, e.g. gradient phonotactics says blick > bnick
  • Today: relate these two ideas to the non-parametric and parametric perspectives
Many ways of calculating probability of a sequence
  • Unigrams, bigrams, trigrams, syllable parts, transition probabilities …
    • No bound on the number of creative ways
  • What does it mean to say the “probability” of a phonological word?
    • Objective/frequentist vs. subjective/Bayesian: philosophical (but important)
  • Thinking “parametrically” may clarify things
    • “likelihood” = “probability” calculated from a model
Parametric approach to phonotactics
  • Example: “bag of sounds” assumption / exchangeable distributions
    • p(blik) = p(lbik) = p(kbli)
  • Unigram models: N-1 parameters

What is ?

How to get (hat)?

How to assign prob

to “blick”?

B L I K
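
To make this concrete, here is a minimal sketch of the unigram model in Python; the toy lexicon and its pseudo-phonemic spellings are invented for illustration, not taken from the course materials.

    from collections import Counter

    lexicon = ["sand", "band", "pad", "pan", "lik", "blak"]  # toy, pseudo-phonemic

    # theta-hat: maximum-likelihood estimates of the segment probabilities
    counts = Counter(seg for word in lexicon for seg in word)
    total = sum(counts.values())
    theta_hat = {seg: c / total for seg, c in counts.items()}

    def unigram_prob(word):
        """p(word) under the bag-of-sounds model: a product of segment probabilities."""
        p = 1.0
        for seg in word:
            p *= theta_hat.get(seg, 0.0)  # an unseen segment gives probability 0
        return p

    # exchangeability: the model cannot tell "blik" from "lbik"
    assert unigram_prob("blik") == unigram_prob("lbik")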

Parametric approach to phonotactics
  • Unigram model with overlapping observations: N² − 1 parameters

What is ?

How to get (hat)?

How to assign prob

to “blick”?

B L I K

Note: input is #B BL LI IK K#
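
The only change from the unigram sketch above is a preprocessing step: each word is rewritten as overlapping pairs, and those pairs become the “unigram” units, which is why the parameter count grows to N² − 1. A small helper (hypothetical, matching the slide’s example):

    def to_pairs(word):
        """Rewrite a word as overlapping segment pairs, with '#' as the word boundary."""
        padded = "#" + word + "#"
        return [padded[i:i + 2] for i in range(len(padded) - 1)]

    print(to_pairs("blik"))  # ['#b', 'bl', 'li', 'ik', 'k#']

Estimation then proceeds exactly as before, with pair counts in place of single-segment counts.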

Parametric approach to phonotactics
  • Unigram with annotated observations (Coleman and Pierrehumbert)

Input: segments annotated with a syllable parse, e.g. “blick” = BL + IK

[Figure: BL labeled “osif” (onset of a strong initial/final syllable); IK labeled “rsif” (rhyme of a strong initial/final syllable)]

Parametric approach to phonotactics
  • Bigram model: N(N−1) parameters {p(wₙ | wₙ₋₁)} (how many for a trigram?)

[Figure: the input segment sequence B L I K, modeled as a first-order Markov chain]
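
A minimal bigram scorer, again over an invented toy lexicon: with '#' boundary symbols, the chain-rule product runs over the conditional probabilities p(wₙ | wₙ₋₁).

    from collections import Counter, defaultdict

    lexicon = ["sand", "band", "blak", "pan", "lik", "brik"]  # toy, pseudo-phonemic

    bigram_counts = defaultdict(Counter)
    for word in lexicon:
        padded = "#" + word + "#"
        for prev, cur in zip(padded, padded[1:]):
            bigram_counts[prev][cur] += 1

    def bigram_prob(word):
        """Chain-rule probability of a word as a first-order Markov chain."""
        p = 1.0
        padded = "#" + word + "#"
        for prev, cur in zip(padded, padded[1:]):
            total = sum(bigram_counts[prev].values())
            p *= bigram_counts[prev][cur] / total if total else 0.0
        return p

    # gradient phonotactics: "blik" outscores "bnik" because b-l is attested and b-n is not
    print(bigram_prob("blik") > bigram_prob("bnik"))  # True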

Ways that theory might help calculate probability
  • Probability calculation must be based on an explicit model
    • Need a story about what sequences are
  • How can phonology help with calculating sequence probability?
    • More delicate representations
    • More complex models
  • But: phonology is not quite about what sequences are …
More delicate representations
  • Would CV phonology help?
  • Auto-segmental tiers, features, gestures?
    • The chains are no longer independent: more sophisticated models are needed
    • Limit: generative model of speech production (very hard)

[Figure: autosegmental tiers over the segment sequence B L I K I T]

More complex models
  • Mixture of unigrams
    • Used in document classification

[Figure: a lexical-stratum variable selects a unigram model, which generates B L I K]
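
A sketch of the mixture: a latent stratum h is drawn first, then the word is generated by that stratum’s unigram model, so p(word) = Σₕ p(h) Πₛ p(s|h). The two strata and all numbers below are made up.

    # two hypothetical lexical strata, each with its own unigram parameters
    strata = {
        "native":   (0.7, {"b": 0.2, "l": 0.2, "i": 0.3, "k": 0.3}),
        "borrowed": (0.3, {"b": 0.1, "l": 0.1, "i": 0.4, "k": 0.4}),
    }

    def mixture_prob(word):
        """p(word) = sum over strata of p(stratum) * prod of p(segment | stratum)."""
        total = 0.0
        for p_h, theta in strata.values():
            p = p_h
            for seg in word:
                p *= theta.get(seg, 0.0)
            total += p
        return total

    print(mixture_prob("blik"))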

More complex models
  • More structure in the Markov chain
    • Can also model the length distribution, using so-called semi-Markov models (see the sketch below)

[Figure: a chain over syllable-part states “onset”, “rhyme V”, “rhyme VC”, generating the parse BL + IK]
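
A sketch of the structured chain on the slide: states are syllable parts, and each state emits a chunk of segments (putting an explicit distribution on chunk lengths is what a semi-Markov model would add). All probabilities are invented.

    # transition probabilities between syllable-part states; '#' is start/end
    transitions = {
        "#":        {"onset": 1.0},
        "onset":    {"rhyme V": 0.4, "rhyme VC": 0.6},
        "rhyme V":  {"onset": 0.3, "#": 0.7},
        "rhyme VC": {"onset": 0.2, "#": 0.8},
    }
    # emission probabilities of segment chunks from each state (toy values)
    emissions = {
        ("onset", "bl"): 0.05,
        ("rhyme VC", "ik"): 0.10,
    }

    def parse_prob(parse):
        """p(parse) for a sequence of (state, chunk) pairs, e.g. BL + IK."""
        p, prev = 1.0, "#"
        for state, chunk in parse:
            p *= transitions[prev].get(state, 0.0) * emissions.get((state, chunk), 0.0)
            prev = state
        return p * transitions[prev].get("#", 0.0)

    print(parse_prob([("onset", "bl"), ("rhyme VC", "ik")]))  # 1.0*0.05*0.6*0.10*0.8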

More complex models
  • Probabilistic context free grammar
    • Syllable --> C + VC (0.6)
    • Syllable --> C + V (0.35)
    • Syllable --> C + C (0.05)
    • C --> _ (0.01)
    • C --> b (0.05)
  • See 439/539
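
Scoring a derivation under a PCFG is just multiplying the probabilities of the rules it uses; a toy sketch with a subset of the rules above (the derivation shown is illustrative):

    # rule probabilities from the slide; a derivation is a list of rules
    rules = {
        ("Syllable", ("C", "VC")): 0.60,
        ("Syllable", ("C", "V")):  0.35,
        ("Syllable", ("C", "C")):  0.05,
        ("C", ("b",)):             0.05,
    }

    def derivation_prob(derivation):
        """Probability of a derivation = product of its rule probabilities."""
        p = 1.0
        for rule in derivation:
            p *= rules[rule]
        return p

    print(derivation_prob([("Syllable", ("C", "VC")), ("C", ("b",))]))  # 0.6 * 0.05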
What’s the benefit for doing more sophisticated things?
  • Recall: maximum likelihood needs more data to produce better estimates
  • Data sparsity problem: the training data are often insufficient for estimating all the parameters, e.g. zero counts
    • Lexicon size: we don’t have infinitely many words from which to estimate phonotactics
    • Smoothing: properly done, it has a Bayesian interpretation (though often it is not done properly)
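
As one example, add-one (Laplace) smoothing repairs zero counts, and it is equivalent to the Bayesian posterior-mean estimate under a uniform Dirichlet prior; the inventory and training data below are illustrative.

    from collections import Counter

    inventory = list("abdiklnprs")      # hypothetical segment inventory (N = 10)
    counts = Counter("sandbandpadpan")  # toy training segments; "k" never occurs

    def smoothed_prob(seg, alpha=1.0):
        """Add-alpha estimate: (count + alpha) / (total + alpha * inventory size)."""
        total = sum(counts.values()) + alpha * len(inventory)
        return (counts[seg] + alpha) / total

    print(smoothed_prob("k"))  # nonzero despite the zero count: 1/24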
Probability and well-formedness
  • Generative modeling: characterize a distribution over strings
  • Why should we care about this distribution?
    • Hope: this may have something to do with grammaticality judgements
  • But: judgements are also affected by what other words “sound like”.
    • Puzzle of mrupect/mrupation
    • It may be easier to model a function with input = string, output = judgements
Bailey and Hahn
  • Tried all kinds of ways of calculating phonotactics and neighborhood density, to see which combination “works the best”
    • Typical reasoning: “metrics X and Y as factors explain 15% of the variance”
  • Methodology: ANOVA
    • Model (1-way): data = overall mean + effect + error
    • What can ANOVA do for us?
    • How do we check if ANOVA makes sense?
    • What is the “explained variance”?
Non-parametric approach to similarity neighborhood
  • A hint from B&H: the neighborhood model
    • dᵢⱼ is a weighted edit distance
    • A, B, C, D are estimated by polynomial regression
  • Recall: radial basis functions F(x) = Σᵢ aᵢ K(x, xᵢ), with K(x, xᵢ) = exp(−d(x, xᵢ))
    • The quadratic weighting is ad hoc; better to do general nonlinear regression with RBFs (see the sketch below)
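
A sketch of the RBF neighborhood score F(x) = Σᵢ aᵢ exp(−d(x, xᵢ)) with plain (unweighted) edit distance; in a real model the weights aᵢ come from regression on rating data, so the uniform weights here are placeholders.

    import math

    def edit_distance(s, t):
        """Levenshtein distance, computed with a single rolling row."""
        dp = list(range(len(t) + 1))
        for i, cs in enumerate(s, 1):
            prev, dp[0] = dp[0], i
            for j, ct in enumerate(t, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                         dp[j - 1] + 1,      # insertion
                                         prev + (cs != ct))  # substitution
        return dp[-1]

    lexicon = ["sand", "band", "pad", "pan"]  # toy neighbors of "pand"
    weights = [1.0] * len(lexicon)            # placeholder regression weights

    def rbf_score(x):
        return sum(a * math.exp(-edit_distance(x, xi))
                   for a, xi in zip(weights, lexicon))

    print(rbf_score("pand"))  # high: four neighbors at edit distance 1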
Non-parametric approach to similarity neighborhood
  • Recall: RBF as a “soft” neighborhood model
  • Now think of strings also as data points, with neighborhood defined by some string distance (e.g. edit)
    • Same kind of regression with RBF
Non-parametric approach to similarity neighborhood
  • Key technical point: choosing the right kernel
    • Edit-distance kernel: K(x, xᵢ) = exp(−edit(x, xᵢ))
    • Sub-string kernel: measures the length of the common sub-sequence (cf. mrupation)
  • Key experimental data: controlled stimuli, split into training and test sets (equal phonotactic prob)
    • No need to transform rating scale
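
As a toy stand-in for a sub-string kernel, one can score a pair of strings by the length of their longest common subsequence; real string kernels sum over all common subsequences with a decay factor, but even this crude version shows why “mrupation” picks up support from words like “eruption”.

    from functools import lru_cache

    def lcs_kernel(x, y):
        """K(x, y) = length of the longest common subsequence of x and y."""
        @lru_cache(maxsize=None)
        def lcs(i, j):
            if i == len(x) or j == len(y):
                return 0
            if x[i] == y[j]:
                return 1 + lcs(i + 1, j + 1)
            return max(lcs(i + 1, j), lcs(i, j + 1))
        return lcs(0, 0)

    print(lcs_kernel("mrupation", "eruption"))  # 7: they share "ruption"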
Non-parametric approach to similarity neighborhood
  • A range of questions opens up from the non-parametric perspective:
    • Would a yes/no task lead to word “anchors”, like support vectors?
    • Would new words interact with each other, as in transductive inference?
    • What type of metric is most appropriate for inferring well-formedness from neighborhoods?
Integration
  • Hard to integrate with a probabilistic (parametric) model
    • Neighborhood density has a strongly non-parametric character: it grows with the data
  • Possible to integrate phonotactic prob in a non-parametric model: kernel algebra
    • aK₁(x,y) + bK₂(x,y) and K₁(x,y)·K₂(x,y) are also kernels (for a, b ≥ 0)
    • P-kernel: K(x₁, x₂) = Σₕ p(x₁|h) p(x₂|h) p(h), where p comes from a parametric model
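
A sketch of the kernel algebra: sums (with nonnegative weights) and products of kernels are again kernels, and the P-kernel builds a kernel directly out of a parametric model. The stratum models below are invented toys standing in for p(·|h).

    import math

    # toy parametric component: two strata, each a (prior, model) pair,
    # where model(w) plays the role of p(w | h)
    strata = {
        "native":   (0.7, lambda w: 0.1 ** len(w)),
        "borrowed": (0.3, lambda w: 0.2 ** len(w)),
    }

    def k_edit(x, y):
        """Stand-in string kernel (here just a length-difference decay)."""
        return math.exp(-abs(len(x) - len(y)))

    def k_prob(x, y):
        """P-kernel: sum over h of p(x|h) * p(y|h) * p(h)."""
        return sum(p_h * model(x) * model(y) for p_h, model in strata.values())

    def k_combined(x, y, a=0.5, b=0.5):
        """a*K1 + b*K2 with a, b >= 0 is again a valid kernel."""
        return a * k_edit(x, y) + b * k_prob(x, y)

    print(k_combined("blik", "bnik"))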