
IBM Clustering: after Brown et al

  1. IBM Clustering: after Brown et al

  2. Word-based n-gram models seem to be willfully obtuse: they use the information that words contain, but overlook the information that in certain contexts we’re very likely to get a noun (or adjective, etc.), even if we don’t know which one.

  3. The following model tries to take that into account. We model the probability of word S[i] being word wn as the probability of wn given that it belongs to Category k, times the probability that Category k follows the preceding category.

  4. Joint probability of a word/category sequence • We could calculate the joint probability of the paired sequences: the category sequence C1 C2 C3 C4 aligned with the word sequence W1 W2 W3 W4.

  5. Prob(Wi, Ci) = Prob(Wi | Ci) * Prob(Ci | Ci-1); but if we want this to help us compute a probability distribution over the next word in a sentence, we have to sum over all relevant category sequences…
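A minimal Python sketch of how slide 5's formula yields a next-word distribution by summing over category sequences. The probability tables and the category names (DET, NOUN, VERB) are invented toy values for illustration, not figures from Brown et al.

# Toy class-based bigram model: invented probabilities, purely illustrative.
p_word_given_cat = {                       # P(word | category)
    "DET":  {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "cat": 0.4, "heat": 0.2},
    "VERB": {"runs": 0.6, "sleeps": 0.4},
}
p_cat_given_prev = {                       # P(Ci | Ci-1)
    "DET":  {"NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"VERB": 0.8, "NOUN": 0.2},
}
p_cat_given_word = {                       # P(category | previous word)
    "the": {"DET": 1.0},
    "dog": {"NOUN": 1.0},
}

def next_word_distribution(prev_word):
    """P(Wi | Wi-1) = sum over Ci-1, Ci of
       P(Ci-1 | Wi-1) * P(Ci | Ci-1) * P(Wi | Ci)."""
    dist = {}
    for c_prev, p_cprev in p_cat_given_word[prev_word].items():
        for c_next, p_cnext in p_cat_given_prev.get(c_prev, {}).items():
            for word, p_w in p_word_given_cat[c_next].items():
                dist[word] = dist.get(word, 0.0) + p_cprev * p_cnext * p_w
    return dist

print(next_word_distribution("the"))
# {'dog': 0.36, 'cat': 0.36, 'heat': 0.18, 'runs': 0.06, 'sleeps': 0.04}

Here every word belongs to exactly one category, so the sum over the previous word's categories is trivial; with ambiguous category membership, the same loop structure sums over the alternatives, as slide 5 notes.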

  6. In addition, we could look at category trigrams; or we could use this (category-based) method as a back-off strategy…

  7. Category trigrams: Prob(Wi, Ci) = Prob(Wi | Ci) * Prob(Ci | Ci-2 Ci-1)
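A rough sketch of the back-off suggestion from slide 6: use the word-bigram estimate when the bigram has been seen, otherwise fall back to a category-based estimate. The counts and the class_based_estimate stub below are made up for illustration, and real back-off schemes (e.g. Katz back-off) also discount and renormalize, which is omitted here.

bigram_counts = {("the", "dog"): 12}   # toy word-bigram counts
unigram_counts = {"the": 50}

def class_based_estimate(prev_word, word):
    # Stand-in for P(word | C(word)) * P(C(word) | C(prev_word));
    # in a full system this would come from a class model like the one above.
    toy = {("the", "heat"): 0.18}
    return toy.get((prev_word, word), 0.0)

def backoff_prob(prev_word, word):
    """P(word | prev_word): word-bigram estimate if the bigram was seen,
    otherwise back off to the category-based estimate."""
    if (prev_word, word) in bigram_counts:
        return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]
    return class_based_estimate(prev_word, word)

print(backoff_prob("the", "dog"))   # 0.24, from word-bigram counts
print(backoff_prob("the", "heat"))  # 0.18, from the category-based back-off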

  8. How to find categories? • Brown et al. 1992 suggest essentially this: set up 1,000 different “lexical categories”, each with one member: the 1,000 most frequent words. (Why 1,000?) Consider all 1000*999/2 ways of merging a pair of these categories, and pick the merge that minimizes the loss in average mutual information incurred in passing from a system with 1,000 categories to one with 999…. Repeat until you’re done.
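A brute-force sketch of that greedy merging loop, on a toy corpus with every word type as its own starting category and no efficiency tricks. It is only meant to show the shape of the procedure; the actual Brown et al. implementation starts from the 1,000 most frequent words and uses careful bookkeeping to stay tractable.

import math
from collections import Counter
from itertools import combinations

corpus = "the dog runs the cat runs the dog sleeps a cat sleeps".split()

def avg_mutual_info(cluster_of):
    """Average mutual information between the categories of adjacent words."""
    pair_counts = Counter((cluster_of[a], cluster_of[b])
                          for a, b in zip(corpus, corpus[1:]))
    total = sum(pair_counts.values())
    left, right = Counter(), Counter()
    for (c1, c2), n in pair_counts.items():
        left[c1] += n
        right[c2] += n
    return sum((n / total) * math.log((n / total) /
               ((left[c1] / total) * (right[c2] / total)))
               for (c1, c2), n in pair_counts.items())

# Start with one category per word type, then repeatedly merge the pair of
# categories whose merge loses the least average mutual information.
clusters = {w: w for w in set(corpus)}
while len(set(clusters.values())) > 3:      # stop at 3 categories for the toy
    cats = sorted(set(clusters.values()))
    candidates = []
    for a, b in combinations(cats, 2):
        merged = {w: (a if c == b else c) for w, c in clusters.items()}
        candidates.append((avg_mutual_info(merged), merged))
    _, clusters = max(candidates, key=lambda t: t[0])

print(clusters)   # final word -> category assignment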

  9. Pointwise mutual information for finding collocations
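Slide 9 only names the technique; as a reminder (not taken from the slides), pointwise mutual information scores a pair by PMI(x, y) = log [ P(x, y) / (P(x) P(y)) ]. A toy sketch for ranking candidate collocations from bigram counts might look like this:

import math
from collections import Counter

tokens = "strong tea strong coffee powerful computer strong tea please".split()
bigrams = Counter(zip(tokens, tokens[1:]))
unigrams = Counter(tokens)
n_bi, n_uni = sum(bigrams.values()), sum(unigrams.values())

def pmi(x, y):
    """PMI(x, y) = log( P(x, y) / (P(x) * P(y)) ), estimated from raw counts."""
    p_xy = bigrams[(x, y)] / n_bi
    return math.log(p_xy / ((unigrams[x] / n_uni) * (unigrams[y] / n_uni)))

# Rank observed bigrams; high PMI suggests a collocation, though raw PMI
# notoriously overrates rare pairs, so real systems also filter by frequency.
for (x, y), _ in bigrams.most_common():
    print(x, y, round(pmi(x, y), 2))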

  10. Mexican hat neighborhood

  11. Examples of inferred categories
  • Friday Monday Thursday Tuesday Saturday Sunday weekends Sundays Saturdays
  • People guys folks fellows CEOs chaps doubters commies unfortunates blokes
  • Down backwards ashore sideways southward northward overboard aloft
  • That that heat
  • Head body hands eyes voice arm seat eye hair mouth
  • Water coal gas liquid acid sand carbon steam shale iron
