  1. Coreference Resolution Using Semantic Relatedness Information from Automatically Discovered Patterns Xiaofeng Yang, Jian Su ACL 2007

  2. Introduction
  • Coreference resolution is the process of determining whether two expressions in natural language refer to the same entity in the world.
  • Semantic relatedness
    • Noun phrases used to refer to the same entity should have a certain semantic relation
  • How do we determine semantic relatedness?
    • WordNet (Ng and Cardie, 2002, etc.)
    • Search for patterns (Hearst, 1998, etc.)
      • Pattern selection done in an ad hoc manner
      • Concerns: accuracy and breadth
  • Objectives of this paper:
    • Automatically acquire and evaluate patterns
    • Mine patterns for semantic relatedness information

  3. Some Examples
  • Multiple cultivars of fruits such as apples are sometimes grafted on a single tree
  • EU institutions and other bodies ...
  • Disasters such as the earthquake and tsunami in Japan

  4. Baseline Coreference System
  • Instance i{NPi, NPj} where
    • NPi = antecedent candidate
    • NPj = anaphor
  • Training
    • For each NPj
      • Create a single positive training instance for its closest antecedent
      • Add negative training instances for every intervening NP between NPj and its antecedent
  • Testing
    • Process the input document from first NP to last
    • For each encountered NPj
      • Create a test instance for each antecedent candidate

  5. Example: candidate sequence NPi-2, NPi-1, NPi, NPi+1, NPi+2, NPi+3, NPj
  • Training instances
    • NPi, NPj (+)
    • NPi+1, NPj (-)
    • NPi+2, NPj (-)
    • NPi+3, NPj (-)
  • Testing instances
    • NPi-2, NPj
    • NPi-1, NPj
    • NPi, NPj
    • NPi+1, NPj
    • NPi+2, NPj
    • NPi+3, NPj
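
A minimal sketch of the instance-creation scheme on slides 4-5, assuming the NPs of a document are given in textual order and the closest true antecedent of each anaphor is known at training time; the function names and the closest_antecedent mapping are illustrative, not from the paper:

    def create_training_instances(nps, closest_antecedent):
        """For each anaphoric NPj: one positive instance with its closest
        antecedent, and one negative instance for every NP between that
        antecedent and NPj."""
        instances = []
        for j in range(len(nps)):
            i = closest_antecedent.get(j)      # index of closest antecedent, or None
            if i is None:
                continue                       # NPj is non-anaphoric
            instances.append((nps[i], nps[j], +1))
            for k in range(i + 1, j):          # intervening NPs become negatives
                instances.append((nps[k], nps[j], -1))
        return instances

    def create_test_instances(nps, j):
        """At test time, pair NPj with every preceding NP as an antecedent candidate."""
        return [(nps[k], nps[j]) for k in range(j)]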

  6. Incorporate Non-Anaphors in Training
  • Apply the learned classifier to all the non-anaphors in the training documents
  • For each non-anaphor that is classified as (+)
    • Pair the non-anaphor and its false antecedent to create a negative example
  • Add these negative examples to the original training set
  • The classifier is then capable of
    • Antecedent identification
    • Non-anaphor identification
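
A hedged sketch of this second pass, reusing create_test_instances from the sketch above; classifier.predict, features(), and the document attributes are placeholders rather than the paper's actual interfaces:

    def augment_with_nonanaphors(classifier, train_docs, instances):
        """Run the first-pass classifier over every non-anaphoric NP; each
        candidate it wrongly accepts as an antecedent yields an extra
        negative training example."""
        extra = []
        for doc in train_docs:
            for j in doc.non_anaphor_indices:          # NPs with no true antecedent
                for cand, npj in create_test_instances(doc.nps, j):
                    # features() builds the pair's feature vector (placeholder)
                    if classifier.predict(features(cand, npj)) == +1:
                        extra.append((cand, npj, -1))  # false antecedent -> negative
        return instances + extra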

  7. Acquiring Patterns
  • Derive patterns that indicate a specific semantic relation
  • Use NP pairs in the training instances as seeds
  • Except when:
    • NPi or NPj is a pronoun
    • NPi and NPj have the same head word
  • i{NPi, NPj} => seed pair (Ei : Ej)
    • i{"Bill Clinton", "the former president"} => ("Bill Clinton" : "president")
  • S+ and S-: sets of seed pairs derived from the positive and the negative training instances, respectively
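
A small sketch of how a training NP pair might be turned into a seed; the attributes is_pronoun, is_proper_name, text and the head_word helper are assumptions, and keeping proper names whole while reducing other NPs to their head word is inferred only from the "Bill Clinton" / "the former president" example above:

    def to_seed_pair(np_i, np_j, head_word):
        """Convert an NP pair from a training instance into a seed (Ei : Ej),
        skipping pronouns and pairs that already share a head word."""
        if np_i.is_pronoun or np_j.is_pronoun:
            return None
        if head_word(np_i) == head_word(np_j):
            return None
        ei = np_i.text if np_i.is_proper_name else head_word(np_i)
        ej = np_j.text if np_j.is_proper_name else head_word(np_j)
        return (ei, ej)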

  8. Acquiring Patterns
  • Note: a seed pair can belong to both S+ and S- at the same time
  • For each of the seed NP pairs (Ei : Ej)
    • Search a large corpus for strings that match the regular expression "Ei * * * Ej" or "Ej * * * Ei"
  • For each retrieved string
    • Extract a surface pattern by replacing expression Ei with a mark <#t1#> and Ej with <#t2#>
    • If the string is followed by a symbol, the symbol is also included in the pattern

  9. (Bill Clinton : president)
  (S1) "Bill Clinton is elected President of the United States."
  (S2) "The US President, Mr. Bill Clinton, today advised India to move towards nuclear nonproliferation and begin a dialogue with Pakistan to ..."
  • Patterns are
    • P1: <#t1#> is elected <#t2#>
    • P2: <#t2#> , Mr <#t1#>
  • |(Ei, p, Ej)| = number of strings matched by pattern p instantiated with (Ei : Ej)
  • Reference patterns = all the patterns derived from the positive seed pairs
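
A rough sketch of the pattern-extraction step for one sentence, reading "Ei * * * Ej" as "Ei and Ej at most three tokens apart" and assuming whitespace-separated text; the regex details are an assumption, not the paper's exact matching procedure:

    import re

    def extract_patterns(sentence, ei, ej):
        """Wherever Ei and Ej occur at most three tokens apart (either order),
        replace them with <#t1#> / <#t2#> and keep a trailing punctuation
        symbol if one immediately follows."""
        patterns = []
        for first, second, m1, m2 in ((ei, ej, "<#t1#>", "<#t2#>"),
                                      (ej, ei, "<#t2#>", "<#t1#>")):
            regex = re.escape(first) + r"((?:\s+\S+){0,3}?)\s+" + re.escape(second) + r"([,.;:]?)"
            for match in re.finditer(regex, sentence):
                middle, symbol = match.groups()
                patterns.append(m1 + middle + " " + m2 + symbol)
        return patterns

    print(extract_patterns("Bill Clinton is elected President of the United States.",
                           "Bill Clinton", "President"))
    # ['<#t1#> is elected <#t2#>']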

  10. Scoring Patterns
  • Frequency
    • Frequency(p) = |{s | s ∈ S+, p ∈ PList(s)}|
  • Reliability
    • Pointwise mutual information (PMI)
      • pmi(x, y) = log [ P(x, y) / (P(x) P(y)) ]
    • PMI between pattern p and a (+) seed pair:
      • pmi(p, (Ei : Ej)) = log [ (|(Ei, p, Ej)| / |(∗, ∗, ∗)|) / ((|(Ei, ∗, Ej)| / |(∗, ∗, ∗)|) ∗ (|(∗, p, ∗)| / |(∗, ∗, ∗)|)) ]
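
The two scores, written out directly from the definitions above; the count arguments are the corpus match counts |(Ei, p, Ej)|, |(Ei, ∗, Ej)|, |(∗, p, ∗)| and |(∗, ∗, ∗)|, and the data structures are illustrative:

    from math import log

    def pmi(count_ei_p_ej, count_ei_any_ej, count_any_p_any, total):
        """pmi(p, (Ei:Ej)) = log( P(Ei, p, Ej) / (P(Ei, *, Ej) * P(*, p, *)) ),
        with each probability estimated as a match count over |(*, *, *)|."""
        return log((count_ei_p_ej / total) /
                   ((count_ei_any_ej / total) * (count_any_p_any / total)))

    def frequency(pattern, positive_seeds, plist):
        """Frequency(p): number of positive seed pairs whose pattern list contains p."""
        return sum(1 for s in positive_seeds if pattern in plist[s])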

  11. Reliability
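
The reliability formula on this slide did not survive the transcript. A plausible reconstruction, assuming the paper uses the Espresso-style reliability of Pantel and Pennacchiotti (2006), which averages a pattern's normalized PMI over the positive seed pairs:

    r(p) = \frac{1}{|S^{+}|} \sum_{(E_i : E_j) \in S^{+}} \frac{\mathrm{pmi}(p, (E_i : E_j))}{\mathit{max\_pmi}}

where max_pmi is the maximum PMI value over all patterns and all positive seed pairs.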

  12. Pattern Features
  • Directly use the reference patterns as a set of features
  • Select the most effective patterns
    • Rank the patterns according to their scores and then choose the top patterns
    • If a pattern also occurs frequently with (-) seed pairs, it may lead to many false positive pairs during resolution
      • Filter the patterns based on their accuracy
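
A sketch of one way the ranking and accuracy filtering could be combined; the score function (frequency or reliability), the per-pattern seed-pair counts, and the threshold values are all illustrative assumptions, not the paper's settings:

    def select_pattern_features(patterns, score, pos_count, neg_count,
                                top_n=100, min_accuracy=0.8):
        """Drop patterns that fire too often with negative seed pairs
        (accuracy filter), then rank the survivors by score and keep the
        top N as binary features."""
        def accuracy(p):
            total = pos_count[p] + neg_count[p]
            return pos_count[p] / total if total else 0.0

        kept = [p for p in patterns if accuracy(p) >= min_accuracy]
        return sorted(kept, key=score, reverse=True)[:top_n]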

  13. Semantic Relatedness Feature
  • A single feature reflecting how reliably an NP pair is semantically related
  • Only the reference patterns among PList(Ei : Ej) are involved in computing the feature
  • SRel(i{NPi, NPj}) = 1000 ∗ ∑ p ∈ PList(Ei:Ej) pmi(p, (Ei : Ej)) ∗ r(p)
    • pmi(p, (Ei : Ej)) is the PMI between p and the seed pair
    • r(p) is the reliability score of p
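
The SRel feature written as code, assuming precomputed PMI and reliability tables keyed by pattern; the dictionary-based interfaces are placeholders:

    def srel(ei, ej, plist, reference_patterns, pmi_score, reliability):
        """SRel(i{NPi, NPj}) = 1000 * sum over reference patterns p in
        PList(Ei:Ej) of pmi(p, (Ei:Ej)) * r(p)."""
        return 1000 * sum(pmi_score[(p, (ei, ej))] * reliability[p]
                          for p in plist[(ei, ej)]
                          if p in reference_patterns)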

  14. Experimental Setup
  • ACE-2 V1.0 corpus (NIST, 2003)
    • newswire (NWire), newspaper (NPaper), and broadcast news (BNews)
  • For pattern extraction and feature computing, Wikipedia (220 million words) was used
  • Raw text preprocessed by an NLP pipeline
    • Sentence boundary detection, POS tagging, text chunking, and named-entity recognition
  • Two different classifiers were learned, respectively, for resolving pronouns and non-pronouns
  • Pattern-based semantic information was only applied to non-pronoun resolution

  15. Top Patterns

  16. Results

  17. Pattern Features
  • Selected only by frequency
    • Top patterns capture the appositive structure: "X, an/a/the Y"
    • Leads to the lowest precision
  • Filtered by accuracy
    • Top patterns with both high frequency and high accuracy are those for the copula structure: "X is/was/are Y"
    • Yields the highest precision with the lowest recall
    • Low-accuracy features prone to false positives are eliminated
  • Ranked by PMI reliability
    • Top patterns cover both appositive and copula structures
    • Highest recall with a medium level of precision

  18. Observations
  • Pattern features only work well for NP pairs containing proper names
  • Error analysis shows that a non-anaphor is often wrongly resolved to a false antecedent once the two NPs happen to satisfy a pattern feature, which largely hurts precision
  • Pattern-based semantic information seems more effective in the NWire domain than in the other two
