Probabilistic models in phonology
John Goldsmith, LSA Institute 2003

Aims for today:
Explain the basics of a probabilistic model: the notion of a distribution, etc.
List some possible uses for such a model.
Explore English data, and perhaps French, Japanese, and Dutch.
W = dog; W[1] = ‘d’; W[2] = ‘o’; W[3] = ‘g’
Prob(W) = pr(d) × pr(o) × pr(g)
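A minimal sketch of this unigram calculation; the letter probabilities here are invented for illustration (a real model would estimate them from corpus frequencies):

```python
import math

# Hypothetical unigram probabilities (invented values for illustration).
pr = {'d': 0.04, 'o': 0.08, 'g': 0.02}

def word_prob(word, pr):
    """Unigram model: the probability of a word is the product
    of the probabilities of its letters."""
    p = 1.0
    for letter in word:
        p *= pr[letter]
    return p

p = word_prob('dog', pr)   # pr(d) * pr(o) * pr(g)
plog = -math.log2(p)       # the word's plog (positive log probability), in bits
```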
More standard notation:
Pr ( W[i] = L1 | W[i-1] = L2 )
This says: the probability that the i-th letter is L1, given that the (i-1)-th letter is L2.
Pr ( W[i] = L1 | W[i-1] = L2 ) = Pr ( W[i-1] = L2, W[i] = L1 ) / Pr ( W[i-1] = L2 )
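In practice these conditional probabilities are estimated from counts. A sketch over a tiny toy corpus (the word list is invented; any real corpus would do):

```python
from collections import Counter

# Toy corpus (invented) from which to estimate bigram probabilities.
corpus = ['dog', 'dot', 'log']

bigrams = Counter()   # Count(L2 L1): how often L1 follows L2
firsts  = Counter()   # Count(L2 _): how often L2 is followed by anything

for w in corpus:
    for a, b in zip(w, w[1:]):
        bigrams[(a, b)] += 1
        firsts[a] += 1

def cond_prob(l1, l2):
    """Estimate Pr(W[i] = l1 | W[i-1] = l2) as Count(l2 l1) / Count(l2 _)."""
    return bigrams[(l2, l1)] / firsts[l2]

cond_prob('o', 'd')   # 'o' follows 'd' in every 'd_' bigram here, so 1.0
```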
This quantity, the (pointwise) mutual information MI = log [ pr(XY) / ( pr(X) pr(Y) ) ], is a measure of the stickiness between X and Y in the data: they can attract (MI > 0) or repel (MI < 0).
Important question: in a given set of data, how much does the quality of the analysis (that is, the total plog you have calculated) depend on a given MI? The answer: the number of times that bigram occurred, times the MI. Let’s call that the weighted mutual information.
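The MI and its weighted version can be sketched like this (the probabilities in the example call are invented for illustration):

```python
import math

def mutual_information(p_xy, p_x, p_y):
    """Pointwise MI in bits: how far the observed bigram probability
    departs from what independence of the two letters would predict."""
    return math.log2(p_xy / (p_x * p_y))

def weighted_mi(count_xy, mi_xy):
    """Contribution of one bigram to the whole analysis: its count times its MI."""
    return count_xy * mi_xy

# Invented example: if the bigram occurs with probability 0.02 but each
# letter alone has probability 0.1, MI = log2(0.02 / 0.01) = 1 bit (attraction).
mi = mutual_information(0.02, 0.1, 0.1)
```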
The relationship between a word and the model can be expressed with the parameter vector
(plog(a), plog(b), plog(c), … , plog(z)),
which lives in a 26-dimensional space. The same goes for the mutual information (MI) parameters, which live in a bigger space.
For example, with counts (2, 3, 1) and parameters (0.22, 0.1, -0.5): 2 × 0.22 + 3 × 0.1 + 1 × (-0.5) = 0.24
It’s also directly connected to the angle between the vectors: the inner product of two vectors is the product of their lengths times the cosine of the angle between them.
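The arithmetic above, viewed as an inner product of a count vector with a parameter vector, and its connection to the angle between them:

```python
import math

# The worked example as two vectors: bigram counts and their parameter values.
counts = [2, 3, 1]
params = [0.22, 0.1, -0.5]

# Inner product: 2*0.22 + 3*0.1 + 1*(-0.5) = 0.24
inner = sum(c * p for c, p in zip(counts, params))

# Inner product = |counts| * |params| * cos(angle between them)
norm_c = math.sqrt(sum(c * c for c in counts))
norm_p = math.sqrt(sum(p * p for p in params))
cos_angle = inner / (norm_c * norm_p)
```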
(Constraint conjunction effects will emerge otherwise.)
Candidate with lowest score wins: its plog is minimal.
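A sketch of this selection step, with candidate probabilities invented for illustration:

```python
import math

# Hypothetical candidates with their probabilities under the model (invented values).
candidates = {'dog': 6.4e-5, 'dgo': 1.0e-7, 'odg': 5.0e-8}

# plog = -log2(probability); lower plog means higher probability.
plogs = {w: -math.log2(p) for w, p in candidates.items()}

# The winning candidate is the one whose plog is minimal.
winner = min(plogs, key=plogs.get)
```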
Consider representations on two levels, and suppose we know the correct underlying representation (UR). We then choose the representation with the highest probability among those that include that UR (i.e., the highest probability given the UR).
The simplest formulation: the probability of a representation is the product of the probability of its surface form and the probability of the UR/SR correspondences:
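This formulation can be sketched as follows; both probability tables are invented for illustration (a real model would estimate them from data), with the correspondence table imagining final devoicing of /g/ to [k]:

```python
# Pr(surface form), invented values for illustration.
pr_sr = {'[dok]': 2e-4, '[dog]': 5e-5}

# Pr(UR/SR correspondence), invented values: e.g. final devoicing /g/ -> [k].
pr_corr = {('/dog/', '[dok]'): 0.7, ('/dog/', '[dog]'): 0.3}

def joint_prob(ur, sr):
    """Simplest formulation: Pr(surface form) * Pr(UR/SR correspondences)."""
    return pr_sr[sr] * pr_corr[(ur, sr)]

# Given the UR /dog/, choose the surface representation with highest probability:
best_sr = max(['[dok]', '[dog]'], key=lambda sr: joint_prob('/dog/', sr))
```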
Second declension nouns vs. hypocoristics
Second declension nouns vs. female names