
Lexical Acquisition



Presentation Transcript


  1. Lexical Acquisition Extending our information about words, particularly quantitative information

  2. Why lexical acquisition? • “one cannot learn a new language by reading a bilingual dictionary” -- Mercer • Parsing ‘postmen’ requires context • quantitative information is difficult to collect by hand • e.g., priors on word senses • productivity of language • Lexicons need to be updated for new words and usages

  3. Machine-readable Lexicons contain... • Lexical vs syntactic information • Word senses • Classifications, subclassifications • Collocations • Arguments, preferences • Synonyms, antonyms • Quantitative information

  4. Gray area between lexical and syntactic • The rules of grammar are syntactic. • S ::= NP V NP • S ::= NP [V NP PP] • But which one to use, when? • The children ate the cake with their hands. • The children ate the cake with blue icing.

  5. Outline of chapter • verb subcategorization • Which arguments (e.g. infinitive, DO) does a particular verb admit? • attachment ambiguity • What does the modifier refer to? • selectional preferences • Does a verb tend to restrict its object to a certain class? • semantic similarity between words • This new word is most like which words?

  6. Verb subcategorization frames • Assign to each verb the subcategorization frames (SFs) legal for it. (see diagram) • Crucial for parsing. • She told the man where Peter grew up. • (NP NP S) • She found the place where Peter grew up. • (NP NP)

  7. Brent’s method (1993) • Learn subcategorizations given a corpus, a lexical analyzer, and cues. • A cue is a pair <L,SF>: • L is a star-free regular expression over lexemes • (OBJ | SUBJ-OBJ | CAP) (PUNC | CC) • SF is a subcategorization frame • NP NP • Strategy: find verb SFs for which the cues provide strong evidence.

  8. Brent’s method (cont’d) • Compute the error rate of the cue: E = Pr(false positive) • For each verb v and cue c = <L,SF>: • Test the hypothesis H0 that verb v does not admit SF. • If v occurs n times and the cue matches m of those occurrences, pE = Σ(r=m..n) C(n,r) E^r (1-E)^(n-r) • If pE < a threshold, reject H0 and accept the frame for v.
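
A minimal sketch of the test above (not Brent’s original code), assuming hypothetical counts: the verb occurs n = 200 times, the cue matches m = 11 of those occurrences, and the cue’s error rate is E = 0.02.

```python
from math import comb

def p_error(n, m, eps):
    """Probability of seeing m or more cue matches among n occurrences of the
    verb if H0 holds, i.e. if every match is a false positive with rate eps."""
    return sum(comb(n, r) * eps**r * (1 - eps)**(n - r) for r in range(m, n + 1))

# Hypothetical counts for one verb/cue pair.
pE = p_error(n=200, m=11, eps=0.02)
threshold = 0.02  # significance level, chosen arbitrarily here
if pE < threshold:
    print(f"pE = {pE:.4g}: reject H0, accept the frame for this verb")
else:
    print(f"pE = {pE:.4g}: keep H0, not enough evidence for the frame")
```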

  9. Subcategorization Frames: Ideas • Hypothesis testing gives high precision, low recall. • Unreliable cues are still necessary and helpful (under an independence assumption) • Find SFs for verb classes, rather than verbs, using an error-prone tagger. • As long as the error estimates are incorporated into pE, it works well. • Manning (1993) did this, and improved recall.

  10. Attachment Ambiguity: PPs • NP V NP PP -- Does the PP modify V or NP? • Assumption: there is only one meaningful parse for each sentence: • The children ate the cake with a spoon. • Bush sent 100,000 soldiers into Kuwait. • Brazil honored their deal with the IMF. • Straw man: compare co-occurrence counts between the pairs <send, into> and <soldiers, into>.

  11. Bias defeats simple counting • Prob(into | send) > Prob(into | soldiers). • Sometimes there will be a strong association between the PP and both V and NP. • Ford ended its venture with Fiat. • In this case, there is a bias toward “low attachment” -- attaching the PP to the nearer referent, the NP.

  12. Hindle and Rooth (1993) • Elegant (?) method of quantifying the low-attachment bias • Express P(first PP after the object attaches to the object) and P(first PP after the object attaches to the verb) as functions of P(NA) = P(some PP following the object attaches to the object) and P(VA) = P(some PP following the object attaches to the verb) • Estimate P(NA) and P(VA) by counting

  13. Estimating P(NA) and P(VA) • <v,n,p> are a particular verb, noun, and preposition • P(VAp | v) = • (# times p attaches to v)/(# occs of v) • P(NAp | n) = • (# times p attaches to n)/(# occs of n) • The two are treated as independent!

  14. Attachment of first PP • P(Attach(p,n) | v,n) = P(NAp | n) • Whenever some PP attaches to the noun, the first such PP attaches to the noun. • P(Attach(p,v) | v,n) = P((not NAp) | n) P(VAp | v) • The first PP attaches to the verb only when no PP attaches to the noun AND some PP attaches to the verb; otherwise the brackets would cross: I (put the [book on the table) on WW2]
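
A rough sketch of the first-PP attachment decision on slides 13–14, with made-up counts for <send, soldiers, into>; the count values and the attach_first_pp() helper are illustrative, not Hindle and Rooth’s full estimation procedure.

```python
def attach_first_pp(p_na, p_va):
    """Decide where the first PP after the object attaches, treating noun
    attachment (NA) and verb attachment (VA) as independent:
    P(attach to noun) = P(NAp | n); P(attach to verb) = (1 - P(NAp | n)) * P(VAp | v)."""
    p_noun = p_na
    p_verb = (1.0 - p_na) * p_va
    return ("noun" if p_noun >= p_verb else "verb"), p_noun, p_verb

# Made-up counts for <v, n, p> = <send, soldiers, into>.
p_va = 86 / 300   # (# times 'into' attaches to 'send') / (# occurrences of 'send')
p_na = 2 / 250    # (# times 'into' attaches to 'soldiers') / (# occurrences of 'soldiers')

print(attach_first_pp(p_na, p_va))  # -> ('verb', 0.008, ~0.284)
```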

  15. Selectional Preferences • Verbs prefer classes of subjects, objects: • Objects of ‘eat’ tend to be food items • Subjects of ‘think’ tend to be people • Subjects of ‘bark’ tend to be dogs • Used to • disambiguate word sense • infer class of new words • rank multiple parses

  16. Disambiguate the class (Resnik) • She interrupted the chair. • A(nc) = P(nc|v) log(P(nc|v)/P(nc)) -- one term of the relative entropy (Kullback-Leibler divergence) D(P(nc | v) || P(nc)) • A(furniture) = P(furniture | interrupted) * log(P(furniture | interrupted) / P(furniture))

  17. Estimating P(nc | v) • P(nc | v) = P(nc,v) / P(v) • P(v) is estimated as the proportion of occurrences of v among all verb occurrences • P(nc,v) is estimated as • 1/N Σ(n in nc) C(v,n)/|classes(n)| • Now just take the class with the highest A(nc) as the most likely word sense.
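
A toy sketch of this estimation and of picking the class with the highest A(nc) for slide 16’s example; the noun-to-class mapping and the verb–object counts are invented, and the extra verb is there only so that P(nc | v) differs from the prior P(nc).

```python
from math import log

# Invented data: classes each noun belongs to, and verb-object pair counts.
classes_of = {"chair": {"furniture", "people"}, "speaker": {"people"}}
counts = {("interrupt", "chair"): 5, ("interrupt", "speaker"): 20, ("sit_on", "chair"): 30}
N = sum(counts.values())  # total number of verb-object pairs

def p_nc_v(nc, v):
    """P(nc, v): spread each C(v, n) evenly over the classes of n."""
    return sum(c / len(classes_of[n])
               for (verb, n), c in counts.items()
               if verb == v and nc in classes_of[n]) / N

def p_v(v):
    return sum(c for (verb, _), c in counts.items() if verb == v) / N

def p_nc(nc):
    return sum(p_nc_v(nc, verb) for verb in {vb for vb, _ in counts})

def A(nc, v):
    """Per-class term of D(P(nc | v) || P(nc))."""
    p_cond = p_nc_v(nc, v) / p_v(v)
    return p_cond * log(p_cond / p_nc(nc))

# 'She interrupted the chair.' -- which class of 'chair' does 'interrupt' prefer?
best = max(classes_of["chair"], key=lambda nc: A(nc, "interrupt"))
print(best)  # -> 'people' (the chairperson reading), given these toy counts
```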

  18. Semantic similarity • Uses • classifying a new word • expanding queries in IR • Are two words similar... • When they are used together? • IMF and Brazil • When they are on the same topic? • astronaut and spacewalking • When they function interchangeably? • Soviet and American • When they are synonymous? • astronaut and cosmonaut

  19. Cosine is no panacea • For length-normalized vectors, cosine corresponds to Euclidean distance between points • Should document-space vectors be treated as points? • Alternative: treat them as probability distributions (after normalizing) • Then there is no reason to prefer cosine. Why not try an information-theoretic approach?

  20. Alternative distance measures to cosine • Cosine of square roots (Goldszmidt) • L1 norm -- Manhattan distance • Sum of the absolute values of the differences of components • KL divergence • D(p || q) • Mutual information (why not?) • D(p(x,y) || p(x)p(y)) -- KL divergence between the joint and the product of the marginals • Information radius -- information lost by describing both p and q by their midpoint m = (p+q)/2 • IRAD(p,q) = D(p||m) + D(q||m)
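
A small sketch of these measures applied to two toy distributions (vectors normalized to sum to 1); the cosine-of-square-roots variant is omitted, and the KL helper assumes q is non-zero wherever p is.

```python
from math import log, sqrt

def cosine(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q)))

def l1(p, q):
    """Manhattan distance: sum of absolute component differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def kl(p, q):
    """D(p || q); terms with p_i = 0 contribute nothing, q must be > 0 elsewhere."""
    return sum(a * log(a / b) for a, b in zip(p, q) if a > 0)

def irad(p, q):
    """Information radius: D(p || m) + D(q || m), with m the midpoint of p and q."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return kl(p, m) + kl(q, m)

# Toy word co-occurrence distributions.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(cosine(p, q), l1(p, q), kl(p, q), irad(p, q))
```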
