This paper presents AQA, a multilingual annotation scheme for anaphora resolution aimed at improving question answering (QA) systems in three languages: Italian, Spanish, and English. The authors describe the principles behind this corpus-based approach, including how anaphora is used in each language. The corpus consists of 200 topic-related questions per language, and the scheme is evaluated through the agreement between two annotators working blind. The paper also discusses the advantages and difficulties of multilingual QA.
CBA 2008 Corpus-Based Approaches to Coreference Resolution in Romance Languages
AQA: a multilingual Anaphora annotation scheme for Question Answering
E. Boldrini, P. Martínez Barco, B. Navarro Colorado, M. Puchol Blasco, C. Vargas Sierra
[eboldrini/patricio/borja/marcel/]@dlsi.ua.es chelo.vargas@ua.es

Outline
• Introduction
• Corpus
• Principles
• Previous work
• Problematic cases
• Evaluation
• Conclusion

Introduction: interaction
• AQA: a multilingual annotation scheme for anaphora resolution that can be used in machine learning to improve QA systems
• To understand and annotate the way anaphora is used in each language
• To detect the antecedent of each anaphora and find the correct answer
• INTERACTION between the user and the system

Introduction: languages
• Languages: Italian, Spanish, English
• Advantage: the system can take part in cross-lingual competitions, where the question is asked in one language and the answer is returned in another
• Disadvantage: the languages have different characteristics
<t>
  <q id="q065"> ¿Qué medio de transporte se utilizó en la Expedición Kon-tiki? </q>
  <q id="q066"> ¿Cuántas personas <link rel="dir" status="ok" type="pron" ref="" ant="a" refq="q065">la</link> tripulaban? </q>
</t>
<t>
  <q id="q265"> Quale mezzo di trasporto venne usato nella spedizione Kon-Tiki? </q>
  <q id="q266"> Quanti membri d'equipaggio aveva <link rel="dir" status="ok" type="elips" ref="" ant="a" refq="q265">0</link>? </q>
</t>
<t>
  <q id="q465"> What transport was used in the Kon-Tiki Expedition? </q>
  <q id="q466"> How many people crewed <link rel="dir" status="no" type="pron" ref="" ant="q" refq="q465">it</link>? </q>
</t>

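The three parallel questions can be compared programmatically to see how the same anaphor is annotated in each language. The following is a minimal sketch, not part of the original slides: it assumes Python's standard `xml.etree.ElementTree`, uses straight quotes in the English sample, and the names `PARALLEL` and `link_summary` are illustrative.

```python
import xml.etree.ElementTree as ET

# Follow-up questions copied from the Kon-Tiki example, one per language.
PARALLEL = {
    "es": """<q id="q066">¿Cuántas personas <link rel="dir" status="ok" type="pron" ref="" ant="a" refq="q065">la</link> tripulaban?</q>""",
    "it": """<q id="q266">Quanti membri d'equipaggio aveva <link rel="dir" status="ok" type="elips" ref="" ant="a" refq="q265">0</link>?</q>""",
    "en": """<q id="q466">How many people crewed <link rel="dir" status="no" type="pron" ref="" ant="q" refq="q465">it</link>?</q>""",
}

def link_summary(xml_text):
    """Return (anaphor text, type, ant, status) for the first <link> in a question."""
    link = ET.fromstring(xml_text).find("link")
    return (link.text, link.get("type"), link.get("ant"), link.get("status"))

summary = {lang: link_summary(xml_text) for lang, xml_text in PARALLEL.items()}
for lang, row in summary.items():
    print(lang, row)
```

In this sample the Spanish and English questions use a pronoun (`type="pron"`), while the Italian question leaves the subject out and is annotated as an ellipsis (`type="elips"`) with the placeholder `0`.
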
Corpus
• Corpus for CLEF 2008 in English, Italian and Spanish
• 200 questions per language
• Topic-related questions
• Categories of questions: factoid, definition, and list

Principles: annotated elements
• Each group has a topic
<t>
  <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
  <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
  <subt>
    <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
  </subt>
</t>

Principles: annotated elements
• If there is a subtopic, we mark it (in the topic above, <subt> wraps q431)

Principles: annotated elements
• Each question (question/answer pair) has a number (the id attribute of each <q> element: q429, q430 and q431 in the topic above)

Principles: annotated elements
• Each anaphora carries the same number as its antecedent (in the topic above, ref="n16" on "this battle" matches <de id="n16">, and ref="n17" on "she" matches <de id="n17">)

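Because an anaphor and its antecedent share a number, resolving a link amounts to a lookup from `ref` to the `<de>` with the same `id`. A minimal sketch of that lookup, assuming Python's standard `xml.etree.ElementTree`; the function name `resolve_links` is illustrative and not part of the scheme.

```python
import xml.etree.ElementTree as ET

# Topic q429-q431, copied from the slides.
SAMPLE = """<t>
<q id="q429">Between what days was <de id="n16">the battle of Brunete</de>?</q>
<q id="q430">Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published?</q>
<subt>
<q id="q431">Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident?</q>
</subt>
</t>"""

def resolve_links(xml_text):
    """Pair each anaphor with the text of the <de> whose id equals its ref."""
    root = ET.fromstring(xml_text)
    # Index every discourse entity by id; itertext() also covers nested markup.
    entities = {de.get("id"): "".join(de.itertext()) for de in root.iter("de")}
    pairs = []
    for link in root.iter("link"):
        anaphor = "".join(link.itertext())
        # An empty ref (antecedent not marked as a <de>) yields None.
        pairs.append((anaphor, entities.get(link.get("ref"))))
    return pairs

pairs = resolve_links(SAMPLE)
print(pairs)
```

Links with an empty `ref`, which the samples use when the antecedent is in an answer, come back with `None` and have to be resolved against the answer text instead.
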
Principles: annotated elements
• We indicate whether the antecedent is in the question or in the answer (in the topic above, both links have ant="q": the antecedent is in a question)

Principles: annotated elements
• We indicate whether the antecedent is in the question or in the answer (here ant="a": the antecedent is in the answer to q482)
<t>
  <q id="q482"> Which city is the headquarters of the China's Eastern Fleet? </q>
  <q id="q483"> How far from China's capital city is <link rel="dir" status="ok" ant="a" refq="q482" type="pron" ref="">it</link>? </q>
  <q id="q484"> What was <link rel="indir" status="ok" ant="a" refq="q482" type="dd" ref="">its population</link> in 2002? </q>
</t>

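The `ant` and `refq` attributes together tell a system where to look for the antecedent. A minimal sketch of reading them, assuming Python's standard `xml.etree.ElementTree`; the mapping `WHERE` and the function `antecedent_sources` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Topic q482-q484, copied from the slides.
SAMPLE = """<t>
<q id="q482">Which city is the headquarters of the China's Eastern Fleet?</q>
<q id="q483">How far from China's capital city is <link rel="dir" status="ok" ant="a" refq="q482" type="pron" ref="">it</link>?</q>
<q id="q484">What was <link rel="indir" status="ok" ant="a" refq="q482" type="dd" ref="">its population</link> in 2002?</q>
</t>"""

# Readable names for the two values of the ant attribute seen in the samples.
WHERE = {"q": "question", "a": "answer"}

def antecedent_sources(xml_text):
    """List (question id, anaphor, antecedent location, referenced question id)."""
    root = ET.fromstring(xml_text)
    rows = []
    for q in root.iter("q"):
        for link in q.iter("link"):
            rows.append((q.get("id"), link.text, WHERE[link.get("ant")], link.get("refq")))
    return rows

sources = antecedent_sources(SAMPLE)
for row in sources:
    print(row)
```

Both anaphors in this topic point to the answer of q482, so a QA system would need that answer before it could interpret q483 or q484.
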
Principles: annotated elements
• We indicate the number of the question or answer that contains the antecedent (the refq attribute: refq="q429" and refq="q430" in the topic above)

Principles: annotated elements
• We select the type of anaphora (the type attribute: type="dd" on "this battle" and type="pron" on "she" in the topic above)

Principles: annotated elements
• We select the type of anaphora (here type="pron" and type="adv")
<t>
  <q id="q453"> In which country is <de id="n28">the Colditz Castle</de>? </q>
  <q id="q454"> Exactly in which state is <link rel="dir" status="ok" type="pron" ref="n28" ant="q" refq="q453">it</link>? </q>
  <q id="q455"> Who was the first who escaped from <link rel="dir" status="ok" type="adv" ref="n28" ant="q" refq="q453">there</link> ? </q>
</t>

Principles: annotated elements
• We select the type of anaphora (here type="elips", with 0 marking the elided element, and type="pron")
<t>
  <q id="q412"> Who published the Evangelium Vitae <de id="n6">encyclical</de>? </q>
  <q id="q413"> How many <link rel="dir" status="ok" ant="q" refq="q412" type="elips" ref="n6">0</link> did <link rel="dir" status="ok" ant="a" refq="q412" type="pron" ref="">he</link> publish? </q>
</t>

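One way a QA system might use these annotations is to rewrite a follow-up question so that it no longer depends on its context. The sketch below is an illustration under stated assumptions, not the authors' method: `expand_question` is a hypothetical helper, it only handles links that are direct children of `<q>`, and the `answers` dictionary stands in for the output of a QA system (the value for q412 is supplied by hand and does not come from the slides).

```python
import xml.etree.ElementTree as ET

# Topic q412-q413, copied from the slides.
SAMPLE = """<t>
<q id="q412">Who published the Evangelium Vitae <de id="n6">encyclical</de>?</q>
<q id="q413">How many <link rel="dir" status="ok" ant="q" refq="q412" type="elips" ref="n6">0</link> did <link rel="dir" status="ok" ant="a" refq="q412" type="pron" ref="">he</link> publish?</q>
</t>"""

def expand_question(q, entities, answers):
    """Replace each anaphor in <q> with its antecedent text."""
    parts = [q.text or ""]
    for child in q:
        if child.tag == "link" and child.get("ant") == "q":
            # Antecedent is a marked <de> in an earlier question.
            parts.append(entities[child.get("ref")])
        elif child.tag == "link":
            # Antecedent is the answer to the referenced question.
            parts.append(answers[child.get("refq")])
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())

root = ET.fromstring(SAMPLE)
entities = {de.get("id"): "".join(de.itertext()) for de in root.iter("de")}
answers = {"q412": "John Paul II"}  # assumed QA output, supplied by hand
q413 = root.find(".//q[@id='q413']")
expanded = expand_question(q413, entities, answers)
print(expanded)
```

The substitution is purely textual, so the result keeps the singular "encyclical"; a real system would still need to adjust number agreement.
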
Principles: annotated elements
• We select the type of relation (the rel attribute: rel="dir" on both links in the topic above)

Principles: annotated elements
• We select the type of relation (here rel="indir")
<t>
  <q id="q416"> Which islands are in <de id="n9">the Pelagie Islands</de>? </q>
  <q id="q417"> Which is <link rel="indir" status="ok" type="dd" ref="n9" ant="q" refq="q416">the biggest one</link>? </q>
</t>

Principles: annotated elements
• We record whether or not the annotator has doubts (the status attribute: status="ok" on both links in the topic above)

Previous work
• UCREL (Fligelstone, 1992; Garside et al., 1997): first scheme for anaphora resolution
• MUC: inclusion of the coreference task in MUC-6 and MUC-7
• Last decade of the 20th century: anaphora resolution project for French (Popescu-Belis and Robba, 1997)
• Martínez-Barco and Palomar (2001): an annotation scheme for dialogues applied to an anaphora resolution algorithm
• MATE/GNOME (Poesio, 2004): meta-model

Previous work: what we added
• MATE/GNOME (Poesio, 2004): meta-model
• A link element in the text that carries the information about the anaphora
• Identification of the question/answer pair
• Topic/subtopic
• Antecedent in the question or in the answer
• Status of the annotation
• Applied to three languages
• Applied to collections of questions

Problematic cases
• World knowledge
• An antecedent contains another one
• Collective nouns
• Two antecedents, but separated
• Doubtful position of the antecedent
• An anaphora inside a discourse entity

Problematic cases
• World knowledge
<t>
  <q id="q404"> Which was <de id="n2">the "gordo" in the 1995 Christmas</de>? </q>
  <q id="q405"> Which was <link rel="indir" status="no" type="dd" ref="n2" ant="q" refq="q404">the prize</link>? </q>
</t>
• An antecedent contains another one
• Collective nouns
• Two antecedents, but separated
• Doubtful position of the antecedent
• An anaphora inside a discourse entity

Problematic cases
• World knowledge
• An antecedent contains another one
<t>
  <q id="q427"> Who were <de id="n14">the founders of <de id="n15">Magnum Photos</de></de>? </q>
  <q id="q428"> In what year did <link rel="dir" status="ok" ant="q" refq="q427" type="pron" ref="n14">they</link> found <link rel="dir" status="ok" type="pron" ref="n15" ant="q" refq="q427">it</link>? </q>
</t>
• Collective nouns
• Two antecedents, but separated
• Doubtful position of the antecedent
• An anaphora inside a discourse entity

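Nested discourse entities can still be told apart when the annotation is read back, because each `<de>` keeps its own `id`. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the variable names `spans`, `parents` and `nested` are illustrative.

```python
import xml.etree.ElementTree as ET

# Question q427 from the slides: <de id="n15"> sits inside <de id="n14">.
SAMPLE = '<q id="q427">Who were <de id="n14">the founders of <de id="n15">Magnum Photos</de></de>?</q>'

root = ET.fromstring(SAMPLE)

# Full text of each entity; itertext() includes the text of nested entities.
spans = {de.get("id"): "".join(de.itertext()) for de in root.iter("de")}

# ElementTree has no parent pointers, so build a child-to-parent map.
parents = {child: parent for parent in root.iter() for child in parent}

# Map each nested entity to the id of the entity that directly contains it.
nested = {
    de.get("id"): parents[de].get("id")
    for de in root.iter("de")
    if parents[de].tag == "de"
}

print(spans)
print(nested)
```

Here `n14` spans "the founders of Magnum Photos" and contains `n15`, which spans only "Magnum Photos", so "they" and "it" in q428 resolve to different stretches of the same phrase.
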
Problematic cases
• World knowledge
• An antecedent contains another one
<t>
  <q id="q432"> What is <de id="n18">the starring cast</de> of the film Beetlejuice? </q>
  <q id="q433"> Who of <link rel="dir" status="ok" type="pron" ref="n18" ant="q" refq="q432">them</link> is the main character? </q>
</t>
• Collective nouns
• Two antecedents, but separated
• Doubtful position of the antecedent
• An anaphora inside a discourse entity

Problematic cases
<t>
  <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
  <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
  <subt>
    <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
  </subt>
</t>
• World knowledge
• An antecedent contains another one
• Collective nouns
• Two antecedents, but separated
• Doubtful position of the antecedent
• An anaphora inside a discourse entity

Problematic cases
• World knowledge
• An antecedent contains another one
• Collective nouns
• Two antecedents, but separated
<t>
  <q id="q465"> What transport was used in the Kon-Tiki Expedition? </q>
  <q id="q466"> How many people crewed <link rel="dir" status="no" type="pron" ref="" ant="q" refq="q465">it</link>? </q>
</t>
• Doubtful position of the antecedent
• An anaphora inside a discourse entity

Problematic cases
• World knowledge
• An antecedent contains another one
• Collective nouns
• Two antecedents, but separated
• Doubtful position of the antecedent
<t>
  <q id="q434"> What is <de id="n19">a censer</de> ? </q>
  <q id="q435"> What name is given to <de id="n20"><link rel="dir" status="no" type="pron" ref="n19" ant="q" refq="q434">the one</link> of the Cathedral of Santiago de Compostela </de>? </q>
  <q id="q436"> How much does <link rel="dir" status="ok" type="pron" ref="n20" ant="q" refq="q434">it</link> weight? </q>
</t>
• An anaphora inside a discourse entity

Evaluation
• Annotation
  • 2 annotators
  • Blind annotation
• Evaluation
  • Each language independently
  • Global results

Evaluation: subdivision
• Topic boundary
• Anaphora detection
• Anaphora attributes
• Antecedent recognition

Evaluation: topic boundary
• Class N: new topic
• Class S: same topic

Evaluation: anaphora detection

Evaluation: anaphora attributes (antecedent)

Evaluation: anaphora attributes (type)

Evaluation: anaphora attributes (relation)
• Dir: direct relation
• Indir: bridging relation

Evaluation: antecedent recognition

Evaluation: global results
• Total agreement results
  • Spanish: 60/70 = 0.857
  • Italian: 60/69 = 0.869
  • English: 59/67 = 0.880
  • Average: 0.868

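The agreement figures can be recomputed from the counts on the slide. The sketch below uses only the three fractions given there; the slides do not say how the average was taken, so it shows both the mean of the three ratios and the pooled ratio over all cases as possibilities.

```python
from fractions import Fraction

# (cases with total agreement, cases annotated) per language, from the slide.
counts = {"Spanish": (60, 70), "Italian": (60, 69), "English": (59, 67)}

# Exact per-language ratios.
ratios = {lang: Fraction(agreed, total) for lang, (agreed, total) in counts.items()}

# Mean of the three ratios (each language weighted equally).
macro_average = sum(ratios.values()) / len(ratios)

# Pooled ratio (each annotated case weighted equally).
pooled = Fraction(
    sum(agreed for agreed, _ in counts.values()),
    sum(total for _, total in counts.values()),
)

for lang, ratio in ratios.items():
    print(f"{lang}: {float(ratio):.4f}")
print(f"mean of ratios: {float(macro_average):.4f}")
print(f"pooled: {float(pooled):.4f}")
```

To four decimals the ratios are 0.8571, 0.8696 and 0.8806, which suggests the slide's per-language figures were truncated rather than rounded to three decimals.
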
Conclusion
• Multilingual annotation scheme for anaphora resolution
• For the improvement of QA systems: the system can detect the antecedent of each anaphora and extract the correct answer
• For a true interaction between the system and the user
• Simple but complete
• Positive results of the evaluation

Future work
• Integration of other languages
• Application of the annotation scheme to other corpora

Evaluation: measure used
• Kappa
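The slide names only "Kappa". Assuming this means Cohen's kappa for two annotators, it can be sketched as follows for the topic-boundary task (class N or S). The label sequences `ann1` and `ann2` are invented for illustration and are not results from the study; `cohen_kappa` is an illustrative name.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Invented topic-boundary labels for ten questions (N = new topic, S = same topic).
ann1 = list("NSSNSSNSSS")
ann2 = list("NSSNSSSSSS")

kappa = cohen_kappa(ann1, ann2)
print(f"kappa = {kappa:.3f}")
```

With these invented labels the annotators agree on 9 of 10 items (observed 0.9) against a chance agreement of 0.62, which gives a kappa of about 0.737.
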