Evaluating the Inferential Utility of Lexical-Semantic Resources

Presentation Transcript


  1. Shachar Mirkin Joint work with: Ido Dagan, Eyal Shnarch EACL-09 Evaluating the Inferential Utility of Lexical-Semantic Resources

  2. You are here Quick Orientation

  3. Quick Orientation – Lexical Inference Who won the football match between Israel and Greece on Wednesday?

  4. Quick Orientation – Lexical Inference Who won the football match between Israel and Greece on Wednesday? ATHENS, April 1 (Reuters) – Greece beats Israel 2-1 in their World Cup Group Two qualifier.

  5. Motivation • Common knowledge: Lexical relations are useful for semantic inference • Common practice: Exploit lexical-semantic resources • WordNet - synonymy, hyponymy • Distributional similarity • Yet, no clear picture: • Which semantic relations are needed? • How and when should they be utilized? • What’s available in current resources and what’s missing? • Our goal: clarify the picture • through comparative evaluation

  6. A Textual Entailment Perspective • Generic framework for semantic inference • Recognizing that one text (h) is entailed by another (t) • Addresses variability in text • Applied semantic inference reducible to entailment • Useful for generic evaluation of lexical inference

  7. Lexical-semantic relationships t: Dear EACL 2009 Participant, We are sorry to inform you that an Air Traffic and Public Transportation strike has been announced for Thursday 2 April, 2009. h: Athens’ Metro services disrupted in April 2009.

  8. Lexical-semantic relationships t: Dear EACL 2009 Participant, We are sorry to inform you that an Air Traffic and Public Transportation strike has been announced for Thursday 2 April, 2009. h: Athens’ Metro services disrupted in April 2009. • Terminology: • Lexical Entailment • Entailment Rules: LHS → RHS • e.g. strike → disrupt (verb entailment); the example also involves hypernymy and located-in relations • Rule Application • Rules should be found in knowledge resources • but are often not available
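To make the notion of rule application concrete, here is a minimal Python sketch, assuming a simple bag-of-words match over lemmatized words; the rule base and the coverage check are illustrative, not the system used in the paper.

```python
# Toy rule base of entailment rules LHS -> RHS (LHS entails RHS); illustrative only.
RULES = {
    ("strike", "disrupt"),  # verb entailment, as in the slide example
}

def lexically_covered(text: str, hypothesis: str) -> bool:
    """True if every hypothesis word occurs in the text directly,
    or is entailed by some text word through a rule LHS -> RHS."""
    text_words = set(text.lower().split())
    for h_word in hypothesis.lower().split():
        direct = h_word in text_words
        via_rule = any(lhs in text_words for lhs, rhs in RULES if rhs == h_word)
        if not (direct or via_rule):
            return False
    return True

t = "a public transportation strike has been announced for thursday"
h = "transportation disrupt"  # lemmatized hypothesis, for illustration
print(lexically_covered(t, h))  # True: "disrupt" is covered via strike -> disrupt
```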

  9. Lexical-semantic relationships t: Dear EACL 2009 Participant, We are sorry to inform you that an Air Traffic and Public Transportation strike has been announced for Thursday 2 April, 2009. h: Athens’ Metro services disrupted in April 2009.

  10. Lexical-semantic relationships t: Dear EACL 2009 Participant, We are sorry to inform you that an Air Traffic and Public Transportation strike has been announced for Thursday 2 April, 2009. h: Athens’ Metro disruptions • The same inference applies when h is a lexical phrase (e.g. an IR query)

  11. Evaluating Lexical-semantic Resources

  12. Resources for Lexical-Semantic Relationships • Plenty of resources are out there • None is dedicated to lexical entailment inference • We evaluated 7 popular resources of varying nature: • Construction method • Relation types • We extracted relations which: • Are commonly used in applications • Correspond to lexical entailment

  13. Evaluated Resources • Based on human knowledge: WordNet, XWN, Wiki • Statistical extension of WordNet: Snow • Corpus-based: Lin-Dep, Lin-Prox, CBC

  14. Evaluation Rationale • Evaluation goal • Assess the practical utility of resources • A resource’s utility • Depends on the validity of its rule applications • Rather than on its percentage of correct rules • Many correct & incorrect rules may hardly ever be applied • Simulate rule applications and judge their validity • Instance-based evaluation (rather than rule-based)

  15. Evaluation Scheme Input: • Entailment rules from each resource • A sample of test hypotheses • 25 noun-noun queries from TREC 1-8 • railway accidents; outpatient surgery; police deaths • Texts from which the hypotheses may be inferred • TREC corpora Evaluation flow: • Apply rules to find possibly entailing texts • Judge rule applications • Utilize human annotation to avoid dependence on a specific system

  16. Evaluation Methodology • Test hypothesis: h = water pollution • Rules from the resource, applied to words of h: r1 = lake → water (+), r2 = soil → water (–) • Generate intermediate hypotheses: h’1 = lake pollution, h’2 = soil pollution • Retrieve matching texts t1, t2, t3, … from the corpus and sample texts for judgment • For each sampled text: does t entail h’? does t entail h? • Both yes → valid rule application, e.g. “Chemicals dumped into the lake are the main cause for its pollution” • t entails h’ but not h → invalid rule application, e.g. “Soil pollution happens when contaminants adhere to the soil” • t does not entail h’ → t is discarded, e.g. “High levels of air pollution were measured around the lake”
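A rough sketch of this flow in Python, assuming a toy bag-of-words retrieval step and leaving the entailment judgments to human annotators (represented only as a comment); the rules, corpus, and retrieval function are illustrative.

```python
def generate_intermediate_hypotheses(h, rules):
    """Apply each rule LHS -> RHS whose RHS occurs in h, substituting LHS for it."""
    words = h.split()
    for lhs, rhs in rules:
        if rhs in words:
            yield " ".join(lhs if w == rhs else w for w in words), (lhs, rhs)

def retrieve(corpus, query):
    """Toy retrieval: return texts containing all query words."""
    q = set(query.split())
    return [t for t in corpus if q <= set(t.lower().split())]

h = "water pollution"
rules = [("lake", "water"), ("soil", "water")]  # rules under evaluation
corpus = [
    "chemicals dumped into the lake are the main cause for its pollution",
    "high levels of air pollution were measured around the lake",
    "soil pollution happens when contaminants adhere to the soil",
]

for h_prime, rule in generate_intermediate_hypotheses(h, rules):
    for t in retrieve(corpus, h_prime):
        # Annotators judge: does t entail h_prime? does t entail h?
        #   both yes          -> valid rule application
        #   h_prime only      -> invalid rule application
        #   not even h_prime  -> t is discarded
        print(rule, "|", h_prime, "|", t)
```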

  17. Results - Metrics • Precision: • Percentage of valid rule applications for the resource • Total number of texts entailing the hypothesis is unknown • Absolute recall cannot be measured • Recall-share: • % of entailing sentences retrieved by the resource rules, relative to all entailing texts retrieved by both the original hypothesis and the rules • Macro-average figures
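The two metrics could be computed along the following lines, assuming each judged rule application records its hypothesis, the retrieved text, and a validity flag, and that the entailing texts retrieved by the original hypothesis alone are available per hypothesis; this bookkeeping is an assumption, not the paper’s exact implementation.

```python
from collections import defaultdict
from statistics import mean

def macro_metrics(applications, baseline_entailing):
    """
    applications: list of dicts {"hyp": h, "text": t, "valid": bool},
        one per judged rule application of the evaluated resource.
    baseline_entailing: dict mapping h to the set of entailing texts
        retrieved by the original hypothesis words alone (no rules).
    Returns (macro-averaged precision, macro-averaged recall-share).
    """
    by_hyp = defaultdict(list)
    for a in applications:
        by_hyp[a["hyp"]].append(a)

    precisions, recall_shares = [], []
    for h, apps in by_hyp.items():
        valid_texts = {a["text"] for a in apps if a["valid"]}
        precisions.append(sum(a["valid"] for a in apps) / len(apps))
        all_entailing = valid_texts | baseline_entailing.get(h, set())
        if all_entailing:
            recall_shares.append(len(valid_texts) / len(all_entailing))
    return mean(precisions), mean(recall_shares)
```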

  18. Results • Precision: • Precision is generally quite low • Relatively high precision for resources based on human knowledge • Vs. corpus-based methods • Snow – still high precision • Recall: • Some resources obtain very little recall • WordNet’s recall is limited • Many more relations are found within (inaccurate) distributional-similarity resources

  19. Results Analysis: Current Scope and Gaps

  20. Missing Relations • Coverage of most resources is limited • Lin’s coverage substantially larger than WordNet’s • But not usable due to low precision • Missing instances of existing WordNet relations • Proper names • Open class words • Missing non-standard relation types → next slide

  21. Non-Standard Entailment Relations • Such relations had significant impact on recall • Don’t comply with any WordNet relation • Mostly in Lin’s resources (1/3 of their recall) • Sub-type examples: • Topical entailment - IBM (company) → computers • Consequential - childbirth → motherhood • Entailment of arguments by their predicate – breastfeeding → baby • Often non-substitutable

  22. Required Auxiliary Info (1) • Additional information needed for proper rule application: • Should be attached to rules in resources; • and considered by inference systems • Rules’ priors • The likelihood of a rule being correctly applied in an arbitrary context • Some information is available (WordNet’s sense order, Lin’s ranks) • Empirically tested - not sufficient on its own (too much recall is lost) • Using the top-50 rules, Lin-Prox loses 50% of its relative recall • Using the first sense only, WordNet loses 60%
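As a concrete illustration, a prior could be used to keep only the top-k rules per hypothesis word, roughly as sketched below; the data layout and the simple rank cutoff are assumptions for illustration.

```python
from collections import defaultdict

def top_k_rules(rules, k=50):
    """rules: iterable of (lhs, rhs, prior), where the prior approximates how
    likely the rule is to be valid in an arbitrary context (e.g. Lin's
    similarity score, or WordNet sense order mapped to a number).
    Keeps only the k highest-prior rules for each RHS word."""
    by_rhs = defaultdict(list)
    for lhs, rhs, prior in rules:
        by_rhs[rhs].append((prior, lhs))
    kept = []
    for rhs, scored in by_rhs.items():
        for prior, lhs in sorted(scored, reverse=True)[:k]:
            kept.append((lhs, rhs, prior))
    return kept

rules = [("lake", "water", 0.8), ("puddle", "water", 0.3), ("sea", "water", 0.7)]
print(top_k_rules(rules, k=2))  # keeps the two highest-prior rules for "water"
```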

  23. Required Auxiliary Info (2) Lexical context • Known issue: rules should be applied only in appropriate contexts • Main reason for the relatively low precision of WordNet • Addressed by WSD or context-matching models Logical context • Some frequently-ignored relations in WordNet are significant: • efficacy → ineffectiveness (antonymy) • arms → guns (hypernymy) • government → official (holonymy) • 1/7 of Lin-Dep recall • Require certain logical conditions to occur • Include info about suitable lexical & logical contexts of rules • Combine priors with context-model scores (Szpektor et al. 2008) • Needed: a typology of relations by inference types
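One way this could look in practice, as a hedged sketch: gate each rule on its required logical context (here, a crude negation check) and multiply its prior by a lexical context-match score. The field names, the negation cue list, and the simple product are assumptions, loosely inspired by the combination idea in Szpektor et al. (2008).

```python
NEGATION_CUES = {"no", "not", "never", "without"}

def rule_score(rule, text_tokens, context_match):
    """rule: dict with 'lhs', 'rhs', 'prior', and optionally 'requires'
    ('negation', ...). context_match: score in [0, 1] from a lexical-context
    model. Returns 0 if the required logical context is absent,
    otherwise prior * context_match."""
    if rule.get("requires") == "negation" and not (NEGATION_CUES & set(text_tokens)):
        return 0.0
    return rule["prior"] * context_match

r = {"lhs": "efficacy", "rhs": "ineffectiveness", "prior": 0.9, "requires": "negation"}
print(rule_score(r, "no evidence of efficacy".split(), context_match=0.8))   # prior * context_match
print(rule_score(r, "the drug showed efficacy".split(), context_match=0.8))  # 0.0, no negation cue
```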

  24. Conclusions • Current resources are far from sufficient • Lexical relations should be evaluated relative to applied inference • Rather than by correlation with human associations or with WordNet • Need dedicated resources for lexical inference rules • Acquire additional missing rule instances • Specify and add missing relation types • Add auxiliary information needed for rule application

  25. Conclusions – Community Perspective • Observation: feedback about resource utility for inference in applications is missing • Resources and applications are typically developed separately • Need tighter feedback between them • Community effort required: • Publicly available resources for lexical inference • Publicly available inference applications • Application-based evaluation datasets • Standardized formats/protocols for their integration

  26. Shachar Mirkin mirkins@cs.biu.ac.il Thank you!
