
Effectiveness of Indirect Dependency for Automatic Synonym Acquisition




  1. Effectiveness of Indirect Dependency for Automatic Synonym Acquisition Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama Graduate School of Information Science, Nagoya University

  2. Outline • Introduction • Comparison of contextual information • Sentence co-occurrence, proximity, dependency • Indirect dependency • Formalization • Context extraction • Synonym acquisition method • Evaluation • Experiment • Conclusion

  3. Outline • Introduction • Comparison of contextual information • Sentence co-occurrence, proximity, dependency • Indirect dependency • Formalization • Context extraction • Synonym acquisition method • Evaluation • Experiment • Conclusion

  4. Introduction • Automatic synonym acquisition • Basic technique for automatic thesaurus/ontology construction and various NLP tasks • Based on the Distributional Hypothesis [Harris 1985] • “Semantically similar words share similar contexts” • Extracts and uses various contexts of words

  5. Contextual Information • Common approach for synonym acquisition: Corpus → Extraction → Selection → Similarity Calculation • Extracted co-occurrences (w, c): (breakfast, dobj:have:*:_), (lunch, iobj:for:go:*), (tea, dobj:have:*:_), ... • Similarity calculation (similarity measures and language models) has been extensively studied, while contextual information (sent, prox, dep: subj, obj, mod, etc.) has received little attention • Effective contextual information needs to be investigated

  6. Contextual Information • First goal: investigation of effective contexts for automatic synonym acquisition • Sentence co-occurrence (sent) • Proximity (prox) • Dependency (dep)

  7. Outline • Introduction • Comparison of contextual information • Sentence co-occurrence, proximity, dependency • Indirect dependency • Formalization • Context extraction • Synonym acquisition method • Evaluation • Experiment • Conclusion

  8. Category (1) – Sentence co-occurrence (sent) • Sentences in which words appear • Assume that words that commonly appear in similar sentences are semantically similar

  9. Category (2) – Proximity (prox) • Words that appear in the vicinity of the target word • Consider a window centered at the target, and extract the words located within it • Example: “Shipments have been relatively level since January, the Commerce Department noted.” With a window of 3 tokens on both sides of the target January, the contexts are L1:since, L2:level, L3:relatively, R1:, R2:the, R3:Commerce
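The window extraction described above can be sketched in a few lines of Python (a minimal illustration only; the function name and token handling are my own, not from the paper):

```python
def proximity_contexts(tokens, index, window=3):
    """Collect prox contexts: tokens within `window` positions of the target,
    labelled by relative position (L1..Lk to the left, R1..Rk to the right)."""
    contexts = []
    for offset in range(1, window + 1):
        if index - offset >= 0:
            contexts.append(("L%d" % offset, tokens[index - offset]))
        if index + offset < len(tokens):
            contexts.append(("R%d" % offset, tokens[index + offset]))
    return contexts

tokens = ("Shipments have been relatively level since January , "
          "the Commerce Department noted .").split()
print(proximity_contexts(tokens, tokens.index("January")))
```

Running this on the slide's example sentence reproduces the six labelled contexts of January shown above.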

  10. Category (3) – Dependency (dep) • Dependency relations between words • Example: “Shipments have been relatively level since January, the Commerce Department noted.” (relations: ncsubj, aux, xcomp, ncmod, ccomp, dobj, det) • Extracted (word, dependency) contexts: (dobj * January), (ncmod _ note *), (dobj since *), (det Department *), (ncmod _ Department *), (det * the), (ncmod _ * Commerce)

  11. Performance evaluation experiment • Which category of context (sent, prox, dep) is effective for synonym acquisition? • Proximity (prox) performed better than the widely-used Dependency (dep)

  12. Difference of prox and dep • Why is prox better than dep? • Dependency structure of “Shipments have been relatively level since January, the Commerce Department noted.” (relations: ncsubj, aux, xcomp, ncmod, ccomp, dobj, det)

  13. Difference of prox and dep • Why is prox better than dep? • Dependency contained in prox (distance ≦ 3): prox-based contexts subsume most of the dep-based contexts • Surrounding words with no dependency with the target may cause the performance difference

  14. Difference of prox and dep • Why is prox better than dep? • Target: Commerce in “Shipments have been relatively level since January, the Commerce Department noted.” • dep context: Department • prox contexts: January, <comma>, the, Department, noted, <period> • What kind of syntactic relationship do these contexts have?

  15. Difference of prox and dep • Why is prox better than dep? • Target: Commerce • dep context: Department • prox contexts: January, <comma>, the, Department, noted, <period> • → Consider indirect dependency of two continuous dependency relations

  16. Difference of prox and dep • Why is prox better than dep? • Target: Commerce • direct dep: Department • indirect dep: the, noted • prox: January, <comma>, the, Department, noted, <period> • Does indirect dependency cause the performance increase?

  17. Objectives • Propose indirect dependency as a way to enhance the contextual information • Indirect dependency = multiple steps of dependency • Direct dependency = single dependency step • Investigate the effectiveness of indirect dependency for automatic synonym acquisition

  18. Outline • Introduction • Comparison of contextual information • Sentence co-occurrence, proximity, dependency • Indirect dependency • Formalization • Context extraction • Synonym acquisition method • Evaluation • Experiment • Conclusion

  19. Indirect dependency • Direct dependency → a binary relation D • Example: “Shipments have been relatively level since January, the Commerce Department noted.” (relations: ncsubj, aux, xcomp, ncmod, ccomp, dobj, det) • Corresponding inverse relations (ncsubj-of, aux-of, xcomp-of, ncmod-of, ccomp-of, dobj-of, det-of) are also included

  20. Indirect dependency • Direct dependency → a binary relation D • Indirect dependency → composition D² • Example: “Shipments have been relatively level since January, the Commerce Department noted.”

  21. Indirect dependency • Direct dependency → a binary relation D • Indirect dependency → composition D² • Label composition: ncmod∘ncsubj • Example: “Shipments have been relatively level since January, the Commerce Department noted.”

  22. Indirect dependency • Direct dependency → a binary relation D • Indirect dependency → composition D² • Composed labels: ncmod∘ncsubj, ncsubj∘xcomp-of • Example: “Shipments have been relatively level since January, the Commerce Department noted.”

  23. Indirect dependency • Direct dependency → a binary relation D • Indirect dependency → composition D² • Example: “The driver on the truck was looking for me.” (relations: ncsubj, det, ncmod, dobj, aux, iobj) • Composed labels: dobj∘ncmod, iobj∘dobj
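The composition D² can be sketched as follows (a hedged illustration: the exclusion of paths that return to the starting word, and the exact label convention, are my assumptions rather than details stated on the slides):

```python
def compose_dependencies(edges):
    """Build the indirect dependency relation D∘D from direct dependencies.
    `edges` holds (label, head, dependent) triples. Inverse relations
    (label + "-of") are added first so that composition can traverse an
    edge in either direction, as on slide 19."""
    direct = []
    for label, head, dep in edges:
        direct.append((label, head, dep))
        direct.append((label + "-of", dep, head))
    indirect = set()
    for l1, x, mid in direct:
        for l2, mid2, z in direct:
            if mid == mid2 and x != z:  # exclude paths returning to the start
                indirect.add((l1 + "\u2218" + l2, x, z))  # composed label l1∘l2
    return indirect
```

For example, composing ncsubj(note, Department) with ncmod(Department, Commerce) links Commerce to noted in two steps, mirroring the indirect-dep contexts of slide 16.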

  24. Outline • Introduction • Comparison of contextual information • Sentence co-occurrence, proximity, dependency • Indirect dependency • Formalization • Context extraction • Synonym acquisition method • Evaluation • Experiment • Conclusion

  25. Context extraction for indirect dependency • “Shipments have been relatively level since January, the Commerce Department noted.” analyzed by RASP2 into n-ary GRs: (ncsubj be Shipment _), (aux be have), (xcomp _ be level), (ncmod _ be relatively), (ccomp _ level note), (ncmod _ note since), (ncsubj note Department _) • Word–context pairs C1: Shipment – (ncsubj be * _), be – (ncsubj * Shipment _), have – (aux be *), be – (aux * have), be – (xcomp _ * level), level – (xcomp _ be *), ...
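Generating the C1 pairs from the GRs amounts to blanking out each word slot in turn (a minimal sketch; the function name and tuple encoding of GRs are my own):

```python
def c1_contexts(grs):
    """For each n-ary grammatical relation, emit (word, context) pairs by
    replacing each filled word slot in turn with the placeholder '*'.
    Slots holding '_' are empty and produce no pair."""
    pairs = []
    for gr in grs:
        label, args = gr[0], gr[1:]
        for i, arg in enumerate(args):
            if arg == "_":
                continue
            slots = ["*" if j == i else a for j, a in enumerate(args)]
            pairs.append((arg, "(%s %s)" % (label, " ".join(slots))))
    return pairs
```

Applied to (ncsubj be Shipment _), this yields exactly the first two C1 pairs shown on the slide.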

  26. Context extraction for indirect dependency • Substituting the context of be, (xcomp _ * level), for the word be in the C1 context (ncsubj be * _) yields the C2 context (ncsubj (xcomp _ * level) * _) for the composed relation ncsubj∘xcomp-of

  27. Context extraction for indirect dependency • Similarly, substituting (ncsubj * Shipment _) for be in (xcomp _ be *) yields the C2 context (xcomp _ (ncsubj * Shipment) *) for the composed relation xcomp∘ncsubj-of • Cn (n ≧ 3) is similarly generated

  28. Outline • Introduction • Comparison of contextual information • Sentence co-occurrence, proximity, dependency • Indirect dependency • Formalization • Context extraction • Synonym acquisition method • Evaluation • Experiment • Conclusion

  29. Synonym acquisition • Used the commonly-used combination of vector space model, tf.idf, and cosine similarity • Vector construction: tf.idf-weighted context vectors • Similarity calculation: cosine similarity • Language models and similarity measures are not within the scope of this study
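The tf.idf / cosine pipeline can be sketched as below. This is an illustration under assumptions: the slide's original formulas were lost in transcription, so the exact tf.idf variant (here tf × log(N/df), with N the number of distinct words and df the number of words a context occurs with) is a common choice, not necessarily the paper's:

```python
import math
from collections import Counter, defaultdict

def tfidf_vectors(pairs):
    """Build tf.idf-weighted context vectors from (word, context) tokens."""
    tf = Counter(pairs)                      # raw co-occurrence counts
    words = {w for w, c in tf}
    df = defaultdict(set)                    # context -> words it occurs with
    for w, c in tf:
        df[c].add(w)
    n = len(words)
    vectors = defaultdict(dict)
    for (w, c), count in tf.items():
        vectors[w][c] = count * math.log(n / len(df[c]))
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[c] * v[c] for c in u if c in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0
```

Words sharing the same weighted contexts (e.g. breakfast and tea with dobj:have:*) get cosine 1, while words with disjoint contexts get cosine 0.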

  30. Outline • Introduction • Comparison of contextual information • Sentence co-occurrence, proximity, dependency • Indirect dependency • Formalization • Context extraction • Synonym acquisition method • Evaluation • Experiment • Conclusion

  31. Evaluation measures • Measure (1) – Average precision (AP) • Averaged precision values over 11 recall points • Based on the “reference set” created from three existing thesauri: WordNet, Roget’s, and COBUILD thesaurus • Measure (2) – Correlation coefficient (CC) • Correlation between “reference similarity” and cosine similarity • Reference similarity … calculated based on the depth of word nodes in the WordNet tree structure
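Both measures are standard and can be sketched as follows (assumptions: 11-point AP is taken as interpolated precision at recall 0.0, 0.1, …, 1.0, and CC as the Pearson coefficient; the slides do not spell out either formula):

```python
import math

def eleven_point_ap(ranked, relevant):
    """Interpolated precision averaged over the 11 recall points 0.0..1.0.
    `ranked`: candidates ordered by decreasing similarity;
    `relevant`: the reference synonym set."""
    hits = 0
    prec_at, recall_at = [], []
    for i, w in enumerate(ranked, 1):
        if w in relevant:
            hits += 1
        prec_at.append(hits / i)
        recall_at.append(hits / len(relevant))
    total = 0.0
    for point in (i / 10 for i in range(11)):
        ps = [p for p, r in zip(prec_at, recall_at) if r >= point]
        total += max(ps) if ps else 0.0
    return total / 11

def pearson(xs, ys):
    """Pearson correlation between reference and cosine similarities."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A ranking that places all reference synonyms first scores AP = 1, and perfectly proportional similarity scores give CC = 1.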

  32. Outline • Introduction • Comparison of contextual information • Sentence co-occurrence, proximity, dependency • Indirect dependency • Formalization • Context extraction • Synonym acquisition method • Evaluation • Experiment • Conclusion

  33. Experiment – Conditions • Corpora: (1) Brown Corpus (BROWN) (approx. 60,000 sentences), (2) Wall Street Journal (WSJ) (approx. 68,000 sentences), (3) WordBank (WB) (approx. 190,000 sentences) • Limited to noun synonyms • Frequency cutoff … removed words and contexts appearing less than θf times • θf = 5 (BROWN, WSJ), θf = 15 (WB)
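The frequency cutoff can be sketched as a single filtering pass (an assumption: the slides do not say whether the cutoff is applied once or iterated until stable):

```python
from collections import Counter

def frequency_cutoff(pairs, theta):
    """Drop (word, context) pairs whose word or context appears
    fewer than `theta` times overall (single-pass approximation)."""
    word_freq = Counter(w for w, c in pairs)
    ctx_freq = Counter(c for w, c in pairs)
    return [(w, c) for w, c in pairs
            if word_freq[w] >= theta and ctx_freq[c] >= theta]
```

With θf = 5, a word seen only twice is removed along with all its pairs, shrinking the co-occurrence matrix before vector construction.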

  34. Experiment – Result • Compared the performance of: prox, dep1, dep2, dep12, dep123 • Drastic improvement by indirect dependency • Significantly better than prox • dep3 affects little

  35. Experiment – Result • Compared the performance of: prox, dep1, dep2, dep12, dep123 • Consistent results with all the corpora • Effectiveness of indirect dependency for synonym acquisition

  36. Experiment – Comparison of data size • Non-zero elements of the word–context co-occurrence matrix ≒ computational complexity • dep12 achieves higher performance with lower cost … context with good quality

  37. Conclusion • Effectiveness of indirect dependency for automatic synonym acquisition • Indirect dependency = composition of direct dependency • Performance improvement over direct dependency • Achieves higher performance with lower cost than prox • Future work • Confirmation using other parsers and similarity measures • Other kinds of contexts and their performance
