
Overview of the Multilingual Question Answering Track



  1. Overview of the Multilingual Question Answering Track Danilo Giampiccolo QA@CLEF 2006 Workshop

  2. Outline • Tasks • Test set preparation • Participants • Evaluation • Results • Final considerations • Future perspectives QA@CLEF 2006 Workshop

  3. QA 2006: Organizing Committee • ITC-irst (Bernardo Magnini): main coordinator • CELCT (D. Giampiccolo, P. Forner): general coordination, Italian • DFKI (B. Sacaleanu): German • ELDA/ELRA (C. Ayache): French • Linguateca (P. Rocha): Portuguese • UNED (A. Peñas): Spanish • U. Amsterdam (Valentin Jijkoun): Dutch • U. Limerick (R. Sutcliffe): English • Bulgarian Academy of Sciences (P. Osenova): Bulgarian • Source languages only: • University of Indonesia, Depok (M. Adriani): Indonesian • IASI, Romania (D. Cristea): Romanian • Wrocław University of Technology (J. Pietraszko): Polish QA@CLEF 2006 Workshop

  4. QA@CLEF-06: Tasks • Main task: • Monolingual: the language of the question (source language) and the language of the news collection (target language) are the same • Cross-lingual: the questions were formulated in a language different from that of the news collection • One pilot task: • WiQA, coordinated by Maarten de Rijke • Two exercises: • Answer Validation Exercise (AVE), coordinated by Anselmo Peñas • Real Time, a “time-constrained” QA exercise coordinated by the University of Alicante (Fernando Llopis) QA@CLEF 2006 Workshop

  5. Data set: question format • 200 questions of three kinds • FACTOID (loc, mea, org, oth, per, tim; ca. 150): What party did Hitler belong to? • DEFINITION (ca. 40): Who is Josef Paul Kleihues? • reduced in number (-25%) • two new categories added: Object (What is a router?) and Other (What is a tsunami?) • LIST (ca. 10): Name works by Tolstoy • Temporally restricted (ca. 40): by date, by period, by event • NIL (ca. 20): questions that have no known answer in the target document collection • Input format: question type (F, D, L) not indicated (new in 2006) QA@CLEF 2006 Workshop
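As a rough illustration of the test-set composition above, the sketch below models a typed question record in Python; the class and field names are assumptions for illustration, not the official CLEF test-set format.

    from dataclasses import dataclass
    from typing import Optional

    # Illustrative question record; field names are assumptions, not the CLEF schema.
    @dataclass
    class Question:
        qid: str
        text: str
        qtype: str                  # "FACTOID", "DEFINITION", or "LIST" (not shown to systems in 2006)
        answer_type: Optional[str] = None   # e.g. "loc", "mea", "org", "oth", "per", "tim" for factoids
        temporally_restricted: Optional[str] = None  # "date", "period", or "event", if any
        nil: bool = False           # True if no known answer exists in the target collection

    sample = [
        Question("0001", "What party did Hitler belong to?", "FACTOID", "org"),
        Question("0002", "Who is Josef Paul Kleihues?", "DEFINITION"),
        Question("0003", "Name works by Tolstoy.", "LIST"),
    ]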

  6. Data set: run format (new in 2006) • Multiple answers: from one to ten exact answers per question (exact = neither more nor less than the information required) • each answer has to be supported by a docid and by one to ten text snippets justifying the answer (substrings of the specified document giving the actual context) QA@CLEF 2006 Workshop
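A minimal sketch of how one run entry could be checked against the constraints above (one to ten answers per question, each with a docid and one to ten snippets that must be substrings of the cited document). The dict layout and the collection lookup are hypothetical, not the official submission format or assessment tool.

    # Hypothetical checker for one question's answers in a run.
    def check_run_entry(entry, collection):
        """Return a list of problems found for one question."""
        problems = []
        answers = entry.get("answers", [])
        if not 1 <= len(answers) <= 10:
            problems.append("between 1 and 10 answers are required per question")
        for ans in answers:
            snippets = ans.get("snippets", [])
            if not 1 <= len(snippets) <= 10:
                problems.append("each answer needs 1 to 10 supporting snippets")
            doc = collection.get(ans.get("docid"), "")
            for snip in snippets:
                if snip not in doc:  # snippets must be substrings of the cited document
                    problems.append("snippet is not a substring of document %s" % ans.get("docid"))
        return problems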

  7. Activated Tasks (at least one registered participant) • 11 Source languages (10 in 2005) • 8 Target languages (9 in 2005) • No Finnish task / New languages: Polish and Romanian QA@CLEF 2006 Workshop

  8. Activated Tasks • questions were not translated into all the languages • Gold Standard (new in 2006): questions available in multiple languages only for tasks where there was at least one registered participant • More interest in cross-linguality QA@CLEF 2006 Workshop

  9. Participants QA@CLEF 2006 Workshop

  10. List of participants [table], including industrial companies QA@CLEF 2006 Workshop

  11. Submitted runs QA@CLEF 2006 Workshop

  12. Number of answers and snippets per question [charts]: number of runs by number of answers returned (1 answer; between 2 and 5 answers; more than 5 answers) and number of snippets per answer (1 snippet; 2 snippets; 3 snippets; 4 or more snippets) QA@CLEF 2006 Workshop

  13. Evaluation • As in previous campaigns: runs manually judged by native speakers • each answer judged Right, Wrong, ineXact, or Unsupported • up to two runs for each participating group • Evaluation measures: Accuracy (for F, D), the main evaluation score, calculated for the FIRST ANSWER only • excessive workload: some groups could manually assess only the first answers per question (1 answer: Spanish and English; 3 answers: French; 5 answers: Dutch; all answers: Italian, German, Portuguese) • P@N for List questions • Additional evaluation measures (new in 2006): K1 measure, Confidence Weighted Score (CWS), Mean Reciprocal Rank (MRR) QA@CLEF 2006 Workshop
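The definitions behind these scores are standard; below is a minimal sketch of first-answer accuracy, MRR, and CWS, assuming each question's answers carry a single-letter judgement (R, W, X, U). It is an illustration, not the official CLEF evaluation script, and the K1 measure is omitted.

    # Illustrative scoring over assessor judgements.
    # 'judgements' maps each question id to its ordered per-answer judgements:
    # "R" (Right), "W" (Wrong), "X" (ineXact), "U" (Unsupported).

    def accuracy_first_answer(judgements):
        """Fraction of questions whose FIRST returned answer was judged Right."""
        return sum(j[0] == "R" for j in judgements.values() if j) / len(judgements)

    def mean_reciprocal_rank(judgements):
        """Average over questions of 1/rank of the first Right answer (0 if none)."""
        total = 0.0
        for answers in judgements.values():
            for rank, j in enumerate(answers, start=1):
                if j == "R":
                    total += 1.0 / rank
                    break
        return total / len(judgements)

    def confidence_weighted_score(first_answer_judgements):
        """CWS over first answers, assumed already sorted by decreasing system confidence."""
        correct_so_far, total = 0, 0.0
        for i, j in enumerate(first_answer_judgements, start=1):
            if j == "R":
                correct_so_far += 1
            total += correct_so_far / i
        return total / len(first_answer_judgements)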

  14. Question Overlapping among Languages 2005-2006 QA@CLEF 2006 Workshop

  15. Results: Best and Average scores [table; the 49.47 result is still under validation] QA@CLEF 2006 Workshop

  16. Best results in 2004-2005-2006 [table; the 22.63 result is still under validation] QA@CLEF 2006 Workshop

  17. Participants in 2004-2005-2006: compared best results QA@CLEF 2006 Workshop

  18. List questions • Best: 0.8333 (Priberam, monolingual PT) • Average: 0.138 • Problems: • wrong classification of List questions in the Gold Standard (“Mention a Chinese writer” is not a List question!) • definition of List questions: “closed” List questions ask for a finite number of answers (Q: What are the names of the two lovers from Verona separated by family issues in one of Shakespeare’s plays? A: Romeo and Juliet) • “open” List questions require a list of items as answer (Q: Name books by Jules Verne. A: Around the World in 80 Days. A: Twenty Thousand Leagues Under the Sea. A: Journey to the Centre of the Earth) QA@CLEF 2006 Workshop
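P@N for List questions is the precision over the first N answers returned. The sketch below is a simplified, string-matching version for illustration only; in the actual campaign the judgements were made manually, and the gold set here (the Jules Verne example from the slide) is just sample data.

    # Simplified P@N for a List question: fraction of the first n returned answers
    # that match a known correct item; each distinct correct item counts once.
    def precision_at_n(returned, gold, n):
        gold_keys = {g.strip().lower() for g in gold}
        seen = set()
        correct = 0
        for ans in returned[:n]:
            key = ans.strip().lower()
            if key in gold_keys and key not in seen:
                correct += 1
                seen.add(key)
        return correct / n

    gold = {"Around the World in 80 Days",
            "Twenty Thousand Leagues Under the Sea",
            "Journey to the Centre of the Earth"}
    print(precision_at_n(["Around the World in 80 Days", "The Time Machine"], gold, 2))  # 0.5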

  19. Final considerations • Increasing interest in multilingual QA • More participants (30, +25%) • Two new source languages (Romanian and Polish) • More activated tasks (24, up from 23 in 2005) • More submitted runs (77, +13%) • More cross-lingual tasks (35, +31.5%) • Gold Standard: questions not translated into all languages • no possibility of activating tasks at the last minute • useful as a reusable resource: available in the near future QA@CLEF 2006 Workshop

  20. Final considerations: 2006 main task innovations • Multiple answers: good response, but limited capacity to assess large numbers of answers; feedback welcome from participants • Supporting snippets: faster evaluation; feedback welcome from participants • “F/D/L” labels not given in the input format: positive, as apparently there was no real impact on List questions QA@CLEF 2006 Workshop

  21. Future perspectives: main task • For discussion: • Romanian as target • very hard questions (implying reasoning and answers drawn from multiple documents) • allowing collaboration among different systems • partially automated evaluation (of right answers) QA@CLEF 2006 Workshop
