Relational Entity Linking with Cross Document Coreference

Relational Entity Linking with Cross Document Coreference. Xiao Cheng, Bingling Chen, Rajhans Samdani, Kai-Wei Chang, Zhiye Fei and Dan Roth University of Illinois at Urbana-Champaign (UI_CCG). Talk Outline. Introduction Architecture Entity Linking Approach Preprocessing


Presentation Transcript


  1. Relational Entity Linking with Cross Document Coreference Xiao Cheng, Bingling Chen, Rajhans Samdani, Kai-Wei Chang, Zhiye Fei and Dan Roth University of Illinois at Urbana-Champaign (UI_CCG)

  2. Talk Outline • Introduction • Architecture • Entity Linking Approach • Preprocessing • Wikification • Formulation • Relational Analysis • Cross Document Coreference • Reconciliation • Evaluation

  3. Entity Linking Specification <query id="EL13_ENG_0015"> <docid>bolt-eng-DF-170-181137-9030298</docid> <name>Lightning Bolts</name> <beg>15959</beg> <end>15973</end> </query> Query Output
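
A minimal sketch of reading the query shown on this slide, using only the Python standard library. The `Query` class and `parse_query` function are hypothetical names introduced for illustration; only the XML element names come from the slide. In this sample the offsets appear to be inclusive, since 15973 - 15959 + 1 equals the length of "Lightning Bolts".

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Query:
    """One entity linking query (field names are illustrative)."""
    query_id: str
    doc_id: str
    name: str
    begin: int  # character offset of the first character of the mention
    end: int    # character offset of the last character (inclusive in the sample)


def parse_query(xml_text: str) -> Query:
    """Parse a single <query> element into a Query record."""
    node = ET.fromstring(xml_text)
    return Query(
        query_id=node.attrib["id"],
        doc_id=node.findtext("docid"),
        name=node.findtext("name"),
        begin=int(node.findtext("beg")),
        end=int(node.findtext("end")),
    )


# The query from the slide, copied verbatim.
SAMPLE = (
    '<query id="EL13_ENG_0015"> '
    "<docid>bolt-eng-DF-170-181137-9030298</docid> "
    "<name>Lightning Bolts</name> <beg>15959</beg> <end>15973</end> </query>"
)

query = parse_query(SAMPLE)
```

The system's output format is not given on the slide, so it is not modelled here.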

  4. Entity Linking using Wikification and Cross-Doc Coref Cross Document Coreference

  5. Wikification Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.

  6. Wikification Challenges • Ambiguity • Concepts outside of KB (NIL) • Blumenthal? • Variability • Scale • Millions of labels Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State. [Figure labels: The New York Times / The Times / Times; CT / The Nutmeg State / Connecticut]

  7. Key Innovation • Improved Wikification for Structured EL • Relational Inference for Linking (Cheng and Roth, EMNLP’13) • No retraining • Non-trivial cross-document clustering • Best Latent Left-Linking approach (Samdani et al. ’12)

  8. Talk Outline • Introduction • Architecture • Entity Linking Approach • Preprocessing • Wikification • Formulation • Relational Analysis • Cross Document Coreference • Evaluation

  9. Entity Linking Architecture [Architecture diagram; component labels: TAC Query, Preprocessing (Purposeful Coreference, Query Normalization, Document Transformation), Linking Problem, Wikification, Linking, Cross-Doc Coreference, Clusters, Supervise, Reconcile]

  10. Talk Outline • Introduction • Architecture • Entity Linking Approach • Preprocessing • Wikification • Formulation • Relational Analysis • Cross Document Coreference • Evaluation

  11. Preprocessing • Query normalization • Handling spelling mistakes and slang (e.g. Obomber, Obamadinejad, Osama Obama, Nobama, Obambi, Obamination, ObaMao, Owe Bama, 0bama, O-balm-a, O-bomb-a): one of the reasons we did not achieve expected performance • In-document coreference: some coreferent mentions are easier to link than the query mention

  12. Preprocessing • Document transformation • Document can be as long as 100k characters for a single query • Need to truncate documents but minimize the loss of critical contexts [Diagram labels: Original, Opening, Coreferent Context, Query Context]
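
One way the truncation on this slide could work is sketched below: keep the document opening plus a character window around the query mention and around each coreferent mention. This is an assumption about the approach, not the authors' implementation. The function name, the half-open `(begin, end)` span convention, the default sizes and the `" ... "` separator are all hypothetical.

```python
def truncate_document(text, query_span, coref_spans,
                      opening_chars=500, window=200):
    """Keep the opening and a context window around each mention span.

    Spans are half-open (begin, end) character offsets into `text`.
    """
    # Regions to keep: the opening, then one window per mention.
    keep = [(0, min(opening_chars, len(text)))]
    for begin, end in [query_span, *coref_spans]:
        keep.append((max(0, begin - window), min(len(text), end + window)))

    # Merge overlapping or touching regions so no text is duplicated.
    keep.sort()
    merged = []
    for begin, end in keep:
        if merged and begin <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))

    return " ... ".join(text[begin:end] for begin, end in merged)
```

For a 100k-character document with a handful of mentions, the output size is bounded by `opening_chars` plus roughly `2 * window` per mention, regardless of the original length.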

  13. Talk Outline • Introduction • Architecture • Entity Linking Approach • Preprocessing • Wikification • Formulation • Relational Analysis • Cross Document Coreference • Reconciliation • Evaluation

  14. Wikification Bottleneck Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State. • State-of-the-art Wikification systems (Ratinov et al. 2011) can produce the links above with local and global statistical features • Performance reaches a bottleneck at around 70% to 85% F1 on non-wiki datasets • What is missing?

  15. Motivating Example Mubarak, the wife of deposed Egyptian President Hosni Mubarak, … • What are we missing with Bag of Words (BOW) models? • Who is Mubarak? • Constraining interaction between concepts • (Mubarak, wife, Hosni Mubarak)

  16. Relational Inference for Wikification Mubarak, the wife of deposed Egyptian President Hosni Mubarak, … • (Mubarak, wife, Hosni Mubarak) • Our contribution • Identify key textual relations for Wikification • A global inference framework to incorporate relational knowledge • Significant improvement over state-of-the-art Wikification systems

  17. Traditional Wikification 1 - Mention Segmentation ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party… • Sub noun phrase chunks • NER • Capitalized phrases

  18. Traditional Wikification 1 - Mention Segmentation ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party… Obtains nested mentions

  19. Traditional Wikification 2 - Candidate Generation ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party… • Approach • Collect known mappings from Wikipedia page titles, hyperlinks… • Limit to top-K candidates based on frequency of links (Ratinov et al. 2011)
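
The candidate generation step above can be sketched as an anchor-text index with a top-K cut by link frequency. This is a simplified illustration, not the Ratinov et al. system: the function names are hypothetical, and the link counts in the usage note are invented toy data.

```python
from collections import Counter, defaultdict


def build_anchor_index(links):
    """Count how often each surface form links to each page title.

    `links` is an iterable of (surface_form, page_title) pairs, e.g.
    harvested from page titles and hyperlink anchor texts.
    """
    index = defaultdict(Counter)
    for surface, title in links:
        index[surface.lower()][title] += 1
    return index


def generate_candidates(index, mention, k=3):
    """Return the top-k titles for a mention with their link-frequency prior."""
    counts = index.get(mention.lower(), Counter())
    total = sum(counts.values())
    # An unseen surface form yields an empty list (no candidates).
    return [(title, n / total) for title, n in counts.most_common(k)]
```

With toy counts such as 5 links from "Socialist Party" to "Socialist Party (France)", 2 to "Socialist Party of Serbia" and 1 to "Socialist Party of America", `k=1` keeps only the French party. That is the failure mode named on the NIL slide: the correct candidate can be cut by the top-K prior before ranking begins.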

  20. Traditional Wikification 3 - Candidate Ranking ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party… Local and global statistical features

  21. Traditional Wikification 4 – Determine NILs ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party… • This answer is wrong • We did not generate the correct candidate based on top-K prior • Is the top candidate really what the text referred to? • Binary classifier

  22. Talk Outline • Introduction • Architecture • Entity Linking Approach • Preprocessing • Wikification • Formulation • Relational Analysis • Cross Document Coreference • Reconciliation • Evaluation

  23. Formulation (0) Mubarak, the wife of deposed Egyptian President Hosni Mubarak, … • (Mubarak, wife, Hosni Mubarak) • Intuition • Promote pairs of candidate concepts coherent with textual relations

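  24. Formulation (1) Formulate as an Integer Linear Program (ILP): $\max \sum_i \sum_k s_i^k e_i^k + \sum_{i,j} \sum_{k,l} w_{ij}^{(k,l)} r_{ij}^{(k,l)}$ • $s_i^k$: weight to output $e_i^k$ • $e_i^k$: whether to output the $k$-th candidate of the $i$-th mention • $w_{ij}^{(k,l)}$: weight of a relation • $r_{ij}^{(k,l)}$: whether a relation exists between $e_i^k$ and $e_j^l$ • If no relation exists, collapse to the unstructured decision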

  25. Formulation (2) ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party… [Figure labels: $r_{34}^{(1,2)}$, $r_{34}^{(4,3)}$] • $e_i^k$: whether a concept is chosen • $s_i^k$: score of a concept • $r_{ij}^{(k,l)}$: whether a relation is present • $w_{ij}^{(k,l)}$: score of a relation
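
The objective described on the two Formulation slides can be illustrated on a tiny instance by brute force instead of an ILP solver. The sketch assumes two constraints that the transcript does not spell out: each mention outputs exactly one candidate, and a relation variable may be active only when both of its candidates are chosen. The function name and all scores below are hypothetical toy values.

```python
from itertools import product


def relational_inference(scores, relations):
    """Exhaustively maximize concept scores plus relation scores.

    scores[i][k]            -> score of candidate k for mention i
    relations[(i, k, j, l)] -> score of a relation between candidate k of
                               mention i and candidate l of mention j
    Returns (assignment, value); assignment[i] is the chosen candidate
    index for mention i.
    """
    best_value, best_assignment = float("-inf"), None
    # Enumerate every way of choosing exactly one candidate per mention.
    for assignment in product(*(range(len(s)) for s in scores)):
        value = sum(scores[i][k] for i, k in enumerate(assignment))
        # A relation contributes only if both endpoints are selected;
        # activating it is optimal only when its weight is positive.
        for (i, k, j, l), weight in relations.items():
            if assignment[i] == k and assignment[j] == l and weight > 0:
                value += weight
        if value > best_value:
            best_value, best_assignment = value, assignment
    return best_assignment, best_value


# Mention 0: "Slobodan Milošević"; mention 1: "Socialist Party".
candidates = [
    ["Slobodan Milošević"],
    ["Socialist Party (France)", "Socialist Party of Serbia"],
]
scores = [[0.9], [0.6, 0.3]]          # toy ranker scores
relations = {(0, 0, 1, 1): 0.5}       # toy relation retrieved from a KB
```

With an empty `relations` dict the result is the per-mention argmax, which is the "collapse to the unstructured decision" case. With the toy relation, the second mention flips from the French party to the Serbian one. Exhaustive search is exponential in the number of mentions, so this is only a way to see the objective at work.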

  26. Talk Outline • Introduction • Architecture • Entity Linking Approach • Preprocessing • Wikification • Formulation • Relational Analysis • Cross Document Coreference • Reconciliation • Evaluation

  27. Overall Approach

  28. Relation Identification • ACE style in-document coreference (Chang et al. ’13) • Extract named entity-only coreference relations with high precision • Syntactico-Semantic relations (Chan & Roth ’10) • Easy to extract with high precision • Aim for high recall, as false positives will be filtered • Sparse, but covers ~80% of relation instances in ACE2004

  29. Relation Identification ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

  30. Overall Approach

  31. Relation Retrieval ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party… • What concepts can “Socialist Party” refer to? • More robust candidate generation • Identified relations are verified against a knowledge base (DBPedia)

  32. Relation Retrieval ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party… q1=(Socialist Party of France,?, *Milošević*) q2=(Slobodan Milošević,?,*Socialist Party*) • Query Pruning • Only 2 queries per pair necessary due to strong baseline.

  33. Relation Retrieval

  34. Relation Retrieval ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

  35. Overall Approach

  36. Relational Inference - coreference ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

  37. Determine unknown concepts (NILs) Dorothy Byrne, a state coordinator for the Florida Green Party,… nominal mention • How to capture the fact: • “Dorothy Byrne” does not refer to any concept in Wikipedia • Identify coreferent nominal mention relations • Generate better features for NIL classifier

  38. Determine unknown concepts (NILs) Dorothy Byrne, a state coordinator for the Florida Green Party,… nominal mention • Create NIL candidate for structured inference • e.g. corrects other coreferent “Dorothy” later in the document

  39. Talk Outline • Introduction • Architecture • Entity Linking Approach • Preprocessing • Wikification • Formulation • Relational Analysis • Cross Document Coreference • Reconciliation • Evaluation

  40. Cross Document Coreference • NILs can be viewed as KB entries with partial information • A uniform model for entity representation • Shared features with Entity Linking system • Can be supervised using existing EL systems • Cross document coreference cluster example: • Naomi Campbell to give evidence at Charles Taylor trial: spokeswoman. • Supermodel Campbell says 'nothing to gain' from Taylor trial testimony.

  41. Cross Document Coreference Approach • Run document-level coreference • Aggregate all features in a document-level coreferent cluster • Use both mention-level features and document-level features • String similarity features (NESim, Do et al. ‘09) • Context TF-IDF similarity features • Document-level cluster features • Training: using both TAC data and Wikifier generated data
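
A rough sketch of the clustering step: each mention links to its best-scoring earlier mention, or starts a new cluster when no score clears a threshold. This shows only a greedy left-linking inference loop. It omits the latent-variable training of Samdani et al., and the token-overlap `jaccard` score is a stand-in assumption for the NESim, TF-IDF and cluster-level features listed on the slide. Function names and the threshold are hypothetical.

```python
def jaccard(a, b):
    """Token-overlap similarity, a stand-in for the real pairwise scorer."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def best_left_link(mentions, score=jaccard, threshold=0.3):
    """Assign a cluster id to each mention by linking to the best item on its left."""
    cluster_of = []
    for i, mention in enumerate(mentions):
        best_j, best_score = None, threshold
        # Only earlier mentions (to the left) are eligible antecedents.
        for j in range(i):
            s = score(mention, mentions[j])
            if s > best_score:
                best_j, best_score = j, s
        if best_j is None:
            # No antecedent is good enough: open a new cluster.
            cluster_of.append(max(cluster_of, default=-1) + 1)
        else:
            cluster_of.append(cluster_of[best_j])
    return cluster_of
```

For example, on the mention strings "Naomi Campbell", "Supermodel Campbell", "Charles Taylor", "Taylor" this groups the two Campbell mentions together and the two Taylor mentions together. In the system described here, each item would be a document-level coreference cluster with aggregated features instead of a bare string.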

  42. Talk Outline • Introduction • Architecture • Entity Linking Approach • Preprocessing • Wikification • Formulation • Relational Analysis • Cross Document Coreference • Reconciliation • Evaluation

  43. Query mapping reconciliation Example: “[Seattle] has won…” links to Seattle (0.7); “[Seattle] Seahawks ended the game…” links to Seattle Seahawks (0.8); “… cheered for [Seattle]…” links to Seattle (0.2) • Max • {0.8, 0.7, 0.2} = Seattle Seahawks • Sum • {0.8, 0.7+0.2} = Seattle • No Threshold • NIL classifier always outputs “non-NIL” • Same as Max otherwise
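
The Max and Sum strategies on this slide can be written out as a short sketch. The `reconcile` function and its input format are hypothetical. The "No Threshold" variant is omitted because it depends on the NIL classifier, which is not specified here.

```python
from collections import defaultdict


def reconcile(links, strategy="max"):
    """Pick one title for the query from the links of its coreferent mentions.

    `links` is a list of (title, confidence) pairs, one per mention.
    """
    if strategy == "max":
        # Trust the single most confident mention-level link.
        return max(links, key=lambda link: link[1])[0]
    if strategy == "sum":
        # Pool confidence per title, then take the best total.
        totals = defaultdict(float)
        for title, confidence in links:
            totals[title] += confidence
        return max(totals, key=totals.get)
    raise ValueError(f"unknown strategy: {strategy!r}")


# The example from the slide.
links = [("Seattle", 0.7), ("Seattle Seahawks", 0.8), ("Seattle", 0.2)]
```

On the slide's example, Max returns "Seattle Seahawks" (0.8 is the largest single score), while Sum returns "Seattle" (0.7 + 0.2 exceeds 0.8).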

  44. Talk Outline • Introduction • Architecture • Entity Linking Approach • Preprocessing • Wikification • Formulation • Relational Analysis • Cross Document Coreference • Reconciliation • Evaluation

  45. Evaluation – TAC KBP 2011 Entity Linking *Median of top 14 systems • Run Relational Inference (RI) Wikifier “as-is”: • No retraining using TAC data

  46. Evaluation – TAC 2012 Entity Linking Error Analysis

  47. Official 2013 Performance
