N-gram model limitations. Q: What do we do about N-grams that never appeared in our training corpus? A: We distribute some probability mass from seen N-grams to previously unseen N-grams. This leads to another question: how do we do this?

Unsmoothed bigrams
Bigram counts (figure 6.4 from text)
p( eat | to ) = c( to eat ) / c( to ) = 860 / 3256 = .26
p( to | eat ) = c( eat to ) / c(eat) = 2 / 938 = .0021
Bigram probabilities (figure 6.5 from text): p( w_n | w_{n-1} )
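These probabilities are just relative frequencies. Here is a minimal Python sketch of the computation; the toy corpus and the name bigram_mle are illustrative, not the restaurant corpus behind the figures:

```python
from collections import Counter

def bigram_mle(tokens):
    """Unsmoothed (maximum-likelihood) bigram model:
    p(w_n | w_{n-1}) = c(w_{n-1} w_n) / c(w_{n-1})."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    def p(w_next, w_prev):
        # Any bigram unseen in training gets probability 0,
        # which is exactly the problem smoothing is meant to fix.
        return bigrams[(w_prev, w_next)] / unigrams[w_prev]
    return p

# Toy usage (not the textbook corpus):
tokens = "i want to eat lunch i want to eat dinner".split()
p = bigram_mle(tokens)
print(p("eat", "to"))     # c(to eat) / c(to) = 2 / 2 = 1.0
print(p("eat", "lunch"))  # unseen bigram: 0.0
```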
Add-one smoothed bigram counts (figure 6.6 from text)
Add-one smoothing adds 1 to every bigram count, so each conditional probability becomes ( c + 1 ) / ( c( wx ) + V ), where V = 1616 is the vocabulary size (number of word types):
p( eat | to ) = ( c( to eat ) + 1 ) / ( c( to ) + V ) = 861 / 4872 = .18 (was .26)
p( to | eat ) = ( c( eat to ) + 1 ) / ( c( eat ) + V ) = 3 / 2554 = .0012 (was .0021)
p( eat | lunch ) = ( c( lunch eat ) + 1 ) / ( c( lunch ) + V ) = 1 / 2075 = .00048 (was 0)
p( eat | want ) = ( c( want eat ) + 1 ) / ( c( want ) + V ) = 1 / 2931 = .00034 (was 0)
Add-one smoothed bigram probabilities (figure 6.7 from text)
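A minimal add-one (Laplace) sketch of the same model; the name bigram_add_one is mine, and V = 1616 matches the vocabulary size implied by the figures' denominators (4872 − 3256):

```python
from collections import Counter

def bigram_add_one(tokens, vocab):
    """Add-one (Laplace) smoothing:
    p*(w_n | w_{n-1}) = (c(w_{n-1} w_n) + 1) / (c(w_{n-1}) + V).
    Every one of the V possible following types gets one extra count."""
    V = len(vocab)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    def p(w_next, w_prev):
        return (bigrams[(w_prev, w_next)] + 1) / (unigrams[w_prev] + V)
    return p

# With the textbook counts (V = 1616):
#   p(eat | to)    = (860 + 1) / (3256 + 1616) = 861 / 4872 = .18
#   p(eat | lunch) = (0 + 1)   / (459 + 1616)  = 1 / 2075   = .00048
```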
Witten-Bell discounting: the total probability mass moved to the unseen bigrams starting with wx is

    Σ over i with c( wx wi ) = 0 of: p*( wi | wx ) = T(wx) / ( N(wx) + T(wx) )

where T(wx) is the number of distinct word types observed after wx, and N(wx) is the total number of bigram tokens starting with wx.
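This mass implies per-bigram estimates, stated here for completeness (writing Z(wx) = V − T(wx) for the number of word types never seen after wx; the unseen mass above is split evenly across those Z(wx) types):

    p*( wi | wx ) = c( wx wi ) / ( N(wx) + T(wx) )              if c( wx wi ) > 0
    p*( wi | wx ) = T(wx) / ( Z(wx) ( N(wx) + T(wx) ) )         if c( wx wi ) = 0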
Witten-Bell smoothed (discounted) bigram counts (figure 6.9 from text)
Notice that bigram counts which were 0 unsmoothed become nonzero but less than 1 after Witten-Bell discounting; contrast this with add-one smoothing, which gives every unseen bigram a full extra count of 1.
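A sketch of conditional Witten-Bell smoothing under the same toy setup as above (corpus, vocabulary, and the name witten_bell are illustrative, not the textbook's data); it shows an unseen cell getting a reconstituted count below 1:

```python
from collections import Counter, defaultdict

def witten_bell(tokens, vocab):
    """Conditional Witten-Bell smoothing for bigrams.
    T(wx) = distinct types seen after wx; N(wx) = bigram tokens
    starting with wx. Unseen bigrams share the total mass
    T(wx)/(N(wx)+T(wx)), split evenly over Z(wx) = V - T(wx) types."""
    V = len(vocab)
    bigrams = Counter(zip(tokens, tokens[1:]))
    followers = defaultdict(set)  # types seen after each wx
    N = Counter()                 # bigram tokens starting with wx
    for (w_prev, w_next), c in bigrams.items():
        followers[w_prev].add(w_next)
        N[w_prev] += c
    def p(w_next, w_prev):
        T = len(followers[w_prev])
        Z = V - T
        c = bigrams[(w_prev, w_next)]
        if c > 0:
            return c / (N[w_prev] + T)
        return T / (Z * (N[w_prev] + T))
    return p

# Toy usage: after "eat" we saw 2 tokens over 2 types (lunch, dinner).
tokens = "i want to eat lunch i want to eat dinner".split()
vocab = set(tokens) | {"breakfast"}   # 7 types total, so Z(eat) = 5
p = witten_bell(tokens, vocab)
print(p("breakfast", "eat"))          # unseen: 2 / (5 * (2 + 2)) = 0.1
print(p("breakfast", "eat") * 2)      # reconstituted count 0.2 < 1
```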