CLEF 2012, Rome QA4MRE, Question Answering for Machine Reading Evaluation


Presentation Transcript


  1. CLEF 2012, Rome. QA4MRE, Question Answering for Machine Reading Evaluation. Anselmo Peñas (UNED, Spain), Eduard Hovy (USC-ISI, USA), Pamela Forner (CELCT, Italy), Álvaro Rodrigo (UNED, Spain), Richard Sutcliffe (U. Limerick, Ireland), Roser Morante (U. Antwerp, Belgium), Walter Daelemans (U. Antwerp, Belgium), Caroline Sporleder (U. Saarland, Germany), Corina Forascu (UAIC, Romania), Yassine Benajiba (Philips, USA), Petya Osenova (Bulgarian Academy of Sciences)

  2. Question Answering Track at CLEF

  3. Over the years, we learnt that the pipeline architecture is one of the main limitations for improving QA technology: per-stage accuracies multiply (e.g. 0.8 x 0.8 x 1.0 = 0.64). So we bet on a reformulation of the classical pipeline: Question → Question Analysis → Passage Retrieval → Answer Extraction → Answer Ranking → Answer.
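
A trivial illustration of why the pipeline is limiting: per-stage accuracies compound, so even a perfect final stage cannot recover errors made upstream. (The slide does not say which stage each factor belongs to; the mapping below is only illustrative.)

    # Errors compound across a pipelined QA system: end-to-end accuracy is
    # (at best) the product of the per-stage accuracies.
    stages = {"question analysis": 0.8, "passage retrieval": 0.8, "answer extraction": 1.0}
    overall = 1.0
    for name, accuracy in stages.items():
        overall *= accuracy
    print(f"upper bound on end-to-end accuracy: {overall:.2f}")  # 0.64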

  4. Hypothesis generation + validation. The question opens a searching space of candidate answers; hypothesis generation functions propose candidates, and answer validation functions score them until one is selected as the answer.
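
A minimal sketch of this generate-then-validate scheme, assuming candidates have already been generated; every name here is a hypothetical illustration, not the authors' implementation:

    # Sketch: select the candidate answer best supported by validation functions.
    def select_answer(question, candidates, validators, threshold=0.5):
        """Score each candidate with every validator (each returning a value
        in [0, 1]) and return the best one, abstaining below the threshold."""
        best, best_score = None, 0.0
        for candidate in candidates:
            score = 1.0
            for validate in validators:
                score *= validate(question, candidate)
            if score > best_score:
                best, best_score = candidate, score
        return best if best_score >= threshold else None  # None = leave unanswered

Abstaining when no candidate is convincing matters here, because the c@1 measure used for evaluation (slide 16) rewards leaving a question unanswered over answering it wrongly.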

  5. We focus on validation … Is the candidate answer correct?
     QA4MRE setting: Multiple-Choice Reading Comprehension Tests. Measure progress in two reading abilities:
     • Answer questions about a single text
     • Capture knowledge from text collections

  6. … and knowledge. Why capture knowledge from text collections?
     • We need knowledge to understand language
     • The ability to make inferences about texts is correlated with the amount of knowledge considered
     • Texts always omit information we need to recover
     • To build the complete story behind the document
     • And be sure about the answer

  7. Text as source of knowledge. Text Collection (background collection):
     • Set of documents that contextualize the one under reading (20,000-100,000 docs.)
     • We can imagine this done on the fly by the machine (retrieval)
     • Big and diverse enough to acquire knowledge
     • Define a scalable strategy: topic by topic
     • Reference collection per topic

  8. Background Collections. They must serve to acquire:
     • General facts (with categorization and relevant relations)
     • Abstractions (such as …)
     This is sensitive to occurrence in texts, and thus also to the way we create the collection.
     Key: retrieve all relevant documents and only them (classical IR). There is an interdependence with the topic definition: the topic is defined by the set of queries that produce the collection.

  9. Example: Biomedical. Alzheimer's Disease Literature Corpus. Search PubMed about Alzheimer with the query:

     (((((("Alzheimer Disease"[Mesh] OR "Alzheimer's disease antigen"[Supplementary Concept] OR "APP protein, human"[Supplementary Concept] OR "PSEN2 protein, human"[Supplementary Concept] OR "PSEN1 protein, human"[Supplementary Concept]) OR "Amyloid beta-Peptides"[Mesh]) OR "donepezil"[Supplementary Concept]) OR ("gamma-secretase activating protein, human"[Supplementary Concept] OR "gamma-secretase activating protein, mouse"[Supplementary Concept])) OR "amyloid beta-protein (1-42)"[Supplementary Concept]) OR "Presenilins"[Mesh]) OR "Neurofibrillary Tangles"[Mesh] OR "Alzheimer's disease"[All Fields] OR "Alzheimer's Disease"[All Fields] OR "Alzheimer s disease"[All Fields] OR "Alzheimers disease"[All Fields] OR "Alzheimer's dementia"[All Fields] OR "Alzheimer dementia"[All Fields] OR "Alzheimer-type dementia"[All Fields] NOT "non-Alzheimer"[All Fields] NOT ("non-AD"[All Fields] AND "dementia"[All Fields]) AND (hasabstract[text] AND English[lang])

     Result: 66,222 abstracts
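
Such a query can also be submitted programmatically. A minimal sketch using Biopython's Entrez module (the query is abbreviated to a representative clause, and the email address is a placeholder; NCBI requires a real contact address):

    # Sketch: retrieving PubMed abstracts for a topic query with Biopython.
    from Bio import Entrez

    Entrez.email = "you@example.org"  # placeholder; replace with a real address

    # Abbreviated form of the slide's query; the full version simply chains
    # more OR'd MeSH terms and [All Fields] variants.
    query = ('"Alzheimer Disease"[Mesh] OR "Alzheimer\'s disease"[All Fields] '
             'AND (hasabstract[text] AND English[lang])')

    # esearch returns the IDs of matching records.
    handle = Entrez.esearch(db="pubmed", term=query, retmax=20)
    record = Entrez.read(handle)
    handle.close()
    print(record["Count"], "matching records")

    # efetch downloads the abstracts themselves.
    if record["IdList"]:
        handle = Entrez.efetch(db="pubmed", id=record["IdList"],
                               rettype="abstract", retmode="text")
        print(handle.read()[:300])
        handle.close()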

  10. Questions (Main Task)
     Distribution of question types:
     • 27 PURPOSE
     • 30 METHOD
     • 36 CAUSAL
     • 36 FACTOID
     • 31 WHICH-IS-TRUE
     Distribution of answer types:
     • 75 REQUIRE NO EXTRA KNOWLEDGE
     • 46 REQUIRE BACKGROUND KNOWLEDGE
     • 21 REQUIRE INFERENCE
     • 20 REQUIRE GATHERING INFORMATION FROM DIFFERENT SENTENCES

  11. Questions (Biomedical Task)
     Question types:
     • Experimental evidence/qualifier
     • Protein-protein interaction
     • Gene synonymy relation
     • Organism source relation
     • Regulatory relation: increase (higher expression), decrease (reduction), inhibition
     Answer types:
     • Simple: the answer is found almost verbatim in the paper
     • Medium: the answer is rephrased
     • Complex: requires combining pieces of evidence and inference
     All questions involve a predefined set of entity types.

  12. Main Task: 16 test documents, 160 questions, 800 candidate answers.
     4 topics: AIDS, Music and Society, Climate Change, and Alzheimer (new; popular sources: blogs, web, news, …)
     4 reading tests per topic: one document + 10 questions, 5 choices per question
     6 languages: English, German, Spanish, Italian, Romanian, and Arabic (new)

  13. Biomedical Task
     • Same setting, but scientific language
     • Focus on one disease: Alzheimer
     • Alzheimer's Disease Literature Corpus (ADLC): 66,222 abstracts from PubMed, 9,500 full articles
     • Most of them processed with: the dependency parser GDep (Sagae and Tsujii 2007), a UMLS-based NE tagger (CLiPS), and the ABNER NE tagger (Settles 2005)

  14. Task on Modality and Negation. Given an event in the text, decide whether it is:
     • Asserted (NONE: no negation and no speculation)
     • Negated (NEG: negation and no speculation)
     • Speculated and negated (NEGMOD: negation and speculation)
     • Speculated and not negated (MOD: speculation and no negation)
     Decision tree: Is the event presented as certain? If yes: did it happen? (yes → NONE, no → NEG). If no: is it negated? (no → MOD, yes → NEGMOD).
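
The four labels are just the cross product of two boolean features. A minimal sketch of the mapping (the function and flag names are illustrative, not part of any official task tooling):

    # Sketch: map negation/speculation flags to the four modality labels.
    def modality_label(negated: bool, speculated: bool) -> str:
        """Return the modality-and-negation label for an event."""
        if not speculated:                        # event presented as certain
            return "NEG" if negated else "NONE"
        return "NEGMOD" if negated else "MOD"     # event speculated

    assert modality_label(False, False) == "NONE"
    assert modality_label(True,  False) == "NEG"
    assert modality_label(True,  True)  == "NEGMOD"
    assert modality_label(False, True)  == "MOD"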

  15. Participation: ~100% increase.

  16. Evaluation and results
     • QA-perspective evaluation: c@1 over all questions (random baseline: 0.2)
     • Reading-perspective evaluation: aggregating results test by test (a test is passed if c@1 > 0.5)
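
c@1 rewards a system for leaving a question unanswered rather than answering it wrongly. A minimal sketch, assuming the published formula c@1 = (nR + nU * nR / n) / n (Peñas and Rodrigo, 2011), where nR is the number of correctly answered questions, nU the number left unanswered, and n the total:

    # Sketch: the c@1 evaluation measure.
    def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
        """Unanswered questions earn partial credit in proportion to the
        accuracy the system achieves over the whole question set."""
        return (n_correct + n_unanswered * n_correct / n_total) / n_total

    print(c_at_1(32, 0, 160))    # 0.20: 32 of 160 correct, no abstentions
    print(c_at_1(32, 120, 160))  # 0.35: same 32 correct, abstains on 120

With 5 choices per question, answering everything at random yields about 0.2, which is why 0.2 is the random baseline on this slide.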

  17. More details during the workshop
     Monday 17th Sep.: 17:00-18:00 Poster Session
     Tuesday 18th Sep.: 10:40-12:40 Invited Talk + Overviews; 14:10-16:10 Reports from participants (Main + Bio); 16:40-17:15 Reports from participants (Mod&Neg); 17:15-18:10 Breakout session
     Thanks!
