CLEF 2012, Rome QA4MRE, Question Answering for Machine Reading Evaluation


Presentation Transcript


  1. CLEF 2012, Rome. QA4MRE, Question Answering for Machine Reading Evaluation. Anselmo Peñas (UNED, Spain), Eduard Hovy (USC-ISI, USA), Pamela Forner (CELCT, Italy), Álvaro Rodrigo (UNED, Spain), Richard Sutcliffe (U. Limerick, Ireland), Roser Morante (U. Antwerp, Belgium), Walter Daelemans (U. Antwerp, Belgium), Caroline Sporleder (U. Saarland, Germany), Corina Forascu (UAIC, Romania), Yassine Benajiba (Philips, USA), Petya Osenova (Bulgarian Academy of Sciences)

  2. Question Answering Track at CLEF

  3. Over the years, we learnt that the pipeline architecture is one of the main limitations for improving QA technology: per-stage accuracies multiply (e.g. 0.8 x 0.8 x 1.0 = 0.64). So we bet on a reformulation of the classical pipeline: Question → Question Analysis → Passage Retrieval → Answer Extraction → Answer Ranking → Answer.
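
A trivial illustration of why the pipeline is limiting: per-stage accuracies compound, so even a perfect final stage cannot recover errors made upstream. (The slide does not say which stage each factor belongs to; the mapping below is only illustrative.)

    # Errors compound across a pipelined QA system: end-to-end accuracy is
    # (at best) the product of the per-stage accuracies.
    stages = {"question analysis": 0.8, "passage retrieval": 0.8, "answer extraction": 1.0}
    overall = 1.0
    for name, accuracy in stages.items():
        overall *= accuracy
    print(f"upper bound on end-to-end accuracy: {overall:.2f}")  # 0.64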

  4. Hypothesis generation + validation. The question opens a searching space of candidate answers; hypothesis generation functions propose candidates, and answer validation functions score them until one is selected as the answer.
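
A minimal sketch of this generate-then-validate scheme, assuming candidates have already been generated; every name here is a hypothetical illustration, not the authors' implementation:

    # Sketch: select the candidate answer best supported by validation functions.
    def select_answer(question, candidates, validators, threshold=0.5):
        """Score each candidate with every validator (each returning a value
        in [0, 1]) and return the best one, abstaining below the threshold."""
        best, best_score = None, 0.0
        for candidate in candidates:
            score = 1.0
            for validate in validators:
                score *= validate(question, candidate)
            if score > best_score:
                best, best_score = candidate, score
        return best if best_score >= threshold else None  # None = leave unanswered

Abstaining when no candidate is convincing matters here, because the c@1 measure used for evaluation (slide 16) rewards leaving a question unanswered over answering it wrongly.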

  5. We focus on validation … Is the candidate answer correct?
     QA4MRE setting: Multiple-Choice Reading Comprehension Tests. Measure progress in two reading abilities:
     • Answer questions about a single text
     • Capture knowledge from text collections

  6. … and knowledge. Why capture knowledge from text collections?
     • We need knowledge to understand language
     • The ability to make inferences about texts is correlated with the amount of knowledge considered
     • Texts always omit information we need to recover
     • To build the complete story behind the document
     • And be sure about the answer

  7. Text as source of knowledge. Text Collection (background collection):
     • Set of documents that contextualize the one under reading (20,000-100,000 docs.)
     • We can imagine this done on the fly by the machine (retrieval)
     • Big and diverse enough to acquire knowledge
     • Define a scalable strategy: topic by topic
     • Reference collection per topic

  8. Background Collections. They must serve to acquire:
     • General facts (with categorization and relevant relations)
     • Abstractions (such as …)
     This is sensitive to occurrence in texts, and thus also to the way we create the collection.
     Key: retrieve all relevant documents and only them (classical IR). There is an interdependence with the topic definition: the topic is defined by the set of queries that produce the collection.

  9. Example: Biomedical. Alzheimer's Disease Literature Corpus. Search PubMed about Alzheimer with the query:

     (((((("Alzheimer Disease"[Mesh] OR "Alzheimer's disease antigen"[Supplementary Concept] OR "APP protein, human"[Supplementary Concept] OR "PSEN2 protein, human"[Supplementary Concept] OR "PSEN1 protein, human"[Supplementary Concept]) OR "Amyloid beta-Peptides"[Mesh]) OR "donepezil"[Supplementary Concept]) OR ("gamma-secretase activating protein, human"[Supplementary Concept] OR "gamma-secretase activating protein, mouse"[Supplementary Concept])) OR "amyloid beta-protein (1-42)"[Supplementary Concept]) OR "Presenilins"[Mesh]) OR "Neurofibrillary Tangles"[Mesh] OR "Alzheimer's disease"[All Fields] OR "Alzheimer's Disease"[All Fields] OR "Alzheimer s disease"[All Fields] OR "Alzheimers disease"[All Fields] OR "Alzheimer's dementia"[All Fields] OR "Alzheimer dementia"[All Fields] OR "Alzheimer-type dementia"[All Fields] NOT "non-Alzheimer"[All Fields] NOT ("non-AD"[All Fields] AND "dementia"[All Fields]) AND (hasabstract[text] AND English[lang])

     Result: 66,222 abstracts
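
Such a query can also be submitted programmatically. A minimal sketch using Biopython's Entrez module (the query is abbreviated to a representative clause, and the email address is a placeholder; NCBI requires a real contact address):

    # Sketch: retrieving PubMed abstracts for a topic query with Biopython.
    from Bio import Entrez

    Entrez.email = "you@example.org"  # placeholder; replace with a real address

    # Abbreviated form of the slide's query; the full version simply chains
    # more OR'd MeSH terms and [All Fields] variants.
    query = ('"Alzheimer Disease"[Mesh] OR "Alzheimer\'s disease"[All Fields] '
             'AND (hasabstract[text] AND English[lang])')

    # esearch returns the IDs of matching records.
    handle = Entrez.esearch(db="pubmed", term=query, retmax=20)
    record = Entrez.read(handle)
    handle.close()
    print(record["Count"], "matching records")

    # efetch downloads the abstracts themselves.
    if record["IdList"]:
        handle = Entrez.efetch(db="pubmed", id=record["IdList"],
                               rettype="abstract", retmode="text")
        print(handle.read()[:300])
        handle.close()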

  10. Questions (Main Task)
     Distribution of question types:
     • 27 PURPOSE
     • 30 METHOD
     • 36 CAUSAL
     • 36 FACTOID
     • 31 WHICH-IS-TRUE
     Distribution of answer types:
     • 75 REQUIRE NO EXTRA KNOWLEDGE
     • 46 REQUIRE BACKGROUND KNOWLEDGE
     • 21 REQUIRE INFERENCE
     • 20 REQUIRE GATHERING INFORMATION FROM DIFFERENT SENTENCES

  11. Questions (Biomedical Task)
     Question types:
     • Experimental evidence/qualifier
     • Protein-protein interaction
     • Gene synonymy relation
     • Organism source relation
     • Regulatory relation: increase (higher expression), decrease (reduction), inhibition
     Answer types:
     • Simple: the answer is found almost verbatim in the paper
     • Medium: the answer is rephrased
     • Complex: requires combining pieces of evidence and inference
     All questions involve a predefined set of entity types.

  12. Main Task: 16 test documents, 160 questions, 800 candidate answers.
     4 topics: AIDS, Music and Society, Climate Change, and Alzheimer (new; popular sources: blogs, web, news, …)
     4 reading tests per topic: one document + 10 questions, 5 choices per question
     6 languages: English, German, Spanish, Italian, Romanian, and Arabic (new)

  13. Biomedical Task
     • Same setting, but scientific language
     • Focus on one disease: Alzheimer
     • Alzheimer's Disease Literature Corpus (ADLC): 66,222 abstracts from PubMed, 9,500 full articles
     • Most of them processed with: the dependency parser GDep (Sagae and Tsujii 2007), a UMLS-based NE tagger (CLiPS), and the ABNER NE tagger (Settles 2005)

  14. Task on Modality and Negation. Given an event in the text, decide whether it is:
     • Asserted (NONE: no negation and no speculation)
     • Negated (NEG: negation and no speculation)
     • Speculated and negated (NEGMOD: negation and speculation)
     • Speculated and not negated (MOD: speculation and no negation)
     Decision tree: Is the event presented as certain? If yes: did it happen? (yes → NONE, no → NEG). If no: is it negated? (no → MOD, yes → NEGMOD).
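
The four labels are just the cross product of two boolean features. A minimal sketch of the mapping (the function and flag names are illustrative, not part of any official task tooling):

    # Sketch: map negation/speculation flags to the four modality labels.
    def modality_label(negated: bool, speculated: bool) -> str:
        """Return the modality-and-negation label for an event."""
        if not speculated:                        # event presented as certain
            return "NEG" if negated else "NONE"
        return "NEGMOD" if negated else "MOD"     # event speculated

    assert modality_label(False, False) == "NONE"
    assert modality_label(True,  False) == "NEG"
    assert modality_label(True,  True)  == "NEGMOD"
    assert modality_label(False, True)  == "MOD"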

  15. Participation: ~100% increase.

  16. Evaluation and results
     • QA-perspective evaluation: c@1 over all questions (random baseline: 0.2)
     • Reading-perspective evaluation: aggregating results test by test (a test is passed if c@1 > 0.5)
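
c@1 rewards a system for leaving a question unanswered rather than answering it wrongly. A minimal sketch, assuming the published formula c@1 = (nR + nU * nR / n) / n (Peñas and Rodrigo, 2011), where nR is the number of correctly answered questions, nU the number left unanswered, and n the total:

    # Sketch: the c@1 evaluation measure.
    def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
        """Unanswered questions earn partial credit in proportion to the
        accuracy the system achieves over the whole question set."""
        return (n_correct + n_unanswered * n_correct / n_total) / n_total

    print(c_at_1(32, 0, 160))    # 0.20: 32 of 160 correct, no abstentions
    print(c_at_1(32, 120, 160))  # 0.35: same 32 correct, abstains on 120

With 5 choices per question, answering everything at random yields about 0.2, which is why 0.2 is the random baseline on this slide.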

  17. More details during the workshop
     Monday 17th Sep.: 17:00-18:00 Poster Session
     Tuesday 18th Sep.: 10:40-12:40 Invited Talk + Overviews; 14:10-16:10 Reports from participants (Main + Bio); 16:40-17:15 Reports from participants (Mod&Neg); 17:15-18:10 Breakout session
     Thanks!
