Overview of the Multilingual Question Answering Track

Danilo Giampiccolo

QA@CLEF 2006 Workshop

Outline
  • Tasks
  • Test set preparation
  • Participants
  • Evaluation
  • Results
  • Final considerations
  • Future perspectives

QA 2006: Organizing Committee
  • ITC-irst (Bernardo Magnini): main coordinator
  • CELCT (D. Giampiccolo, P. Forner): general coordination, Italian
  • DFKI (B. Sacaleanu): German
  • ELDA/ELRA (C. Ayache): French
  • Linguateca (P. Rocha): Portuguese
  • UNED (A. Peñas): Spanish
  • U. Amsterdam (Valentin Jijkoun): Dutch
  • U. Limerick (R. Sutcliffe): English
  • Bulgarian Academy of Sciences (P. Osenova): Bulgarian
  • Only Source Languages:
    • Depok University of Indonesia (M. Adriani): Indonesian
    • IASI, Romania (D. Cristea): Romanian
    • Wrocław University of Technology (J. Pietraszko): Polish

QA@CLEF-06: Tasks
  • Main task:
    • Monolingual: the language of the question (Source language) and the language of the news collection (Target language) are the same
    • Cross-lingual: the questions were formulated in a language different from that of the news collection
  • One pilot task:
    • WiQA: coordinated by Maarten de Rijke
  • Two exercises:
    • Answer Validation Exercise (AVE): coordinated by Anselmo Peñas
    • Real Time: a “time-constrained” QA exercise coordinated by the University of Alicante (Fernando Llopis)

Data set: Question format

200 Questions of three kinds

    • FACTOID (loc, mea, org, oth, per, tim; ca. 150):
      What party did Hitler belong to?
    • DEFINITION (ca. 40): Who is Josef Paul Kleihues?
      • reduced in number (-25%)
      • two new categories added:
        • Object: What is a router?
        • Other: What is a tsunami?
    • LIST (ca. 10): Name works by Tolstoy
    • Temporally restricted (ca. 40): by date, by period, by event
    • NIL (ca. 20): questions that do not have any known answer in the target document collection
  • Input format: question type (F, D, L) not indicated

Data set: Run format
  • Multiple answers: from one to ten exact answers per question (see the sketch below)
    • exact = neither more nor less than the information required
    • each answer has to be supported by
        • a docid
        • one to ten text snippets justifying the answer (substrings of the specified document giving the actual context)
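A minimal sketch of how such a run entry could be modelled, here in Python; the class and field names (Answer, RunEntry, text, docid, snippets, confidence) are illustrative assumptions, not the official CLEF submission syntax.

from dataclasses import dataclass, field

@dataclass
class Answer:
    """One exact answer, with the support required by the 2006 run format."""
    text: str                  # the exact answer string
    docid: str                 # identifier of the supporting document
    snippets: list[str] = field(default_factory=list)  # 1-10 justifying substrings of that document
    confidence: float = 0.0    # self-reported confidence (assumed placement; used by CWS/K1 below)

@dataclass
class RunEntry:
    """One question's entry in a submitted run: 1-10 exact answers, best first."""
    question_id: str
    answers: list[Answer]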

Activated Tasks (at least one registered participant)
  • 11 Source languages (10 in 2005)
  • 8 Target languages (9 in 2005)
  • No Finnish task / New languages: Polish and Romanian

Activated Tasks
  • Questions were not translated into all the languages
  • Gold Standard: questions in multiple languages only for tasks where there was at least one registered participant


More interest in cross-linguality

Participants

List of participants

[Table of participating groups; legend: Industrial Companies]

Submitted runs


Number of answers and snippets per question

[Chart: number of RUNS with respect to number of answers — 1 answer, between 2 and 5 answers, more than 5 answers]

[Chart: number of SNIPPETS for each answer — 1, 2, 3, >4 snippets]

Evaluation
  • As in previous campaigns
    • runs manually judged by native speakers
    • each answer: Right, Wrong, ineXact, Unsupported
    • up to two runs for each participating group
  • Evaluation measures
    • Accuracy (for F, D): main evaluation score, calculated for the FIRST ANSWER only
      • owing to the excessive workload, the number of manually assessed answers per question varied by language:
        • 1 answer: Spanish and English
        • 3 answers: French
        • 5 answers: Dutch
        • all answers: Italian, German, Portuguese
    • P@N for List questions

Additional evaluation measures (sketched below)

      • K1 measure
      • Confidence Weighted Score (CWS)
      • Mean Reciprocal Rank (MRR)
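A minimal sketch, in Python, of how these measures could be computed, assuming per-question judgements in rank order (True = Right) and, for the confidence-based scores, one (judgement, confidence) pair per question. The function names are assumptions, and the K1 formulation follows Herrera et al.'s definition as we understand it; none of this is the official CLEF scoring code.

def accuracy(judgements: list[list[bool]]) -> float:
    """Fraction of questions whose FIRST answer was judged Right."""
    return sum(q[0] for q in judgements if q) / len(judgements)

def mrr(judgements: list[list[bool]]) -> float:
    """Mean Reciprocal Rank: average of 1/rank of the first Right answer (0 if none)."""
    total = 0.0
    for q in judgements:
        for rank, right in enumerate(q, start=1):
            if right:
                total += 1.0 / rank
                break
    return total / len(judgements)

def cws(first_answers: list[tuple[bool, float]]) -> float:
    """Confidence Weighted Score: rank the first answers by decreasing
    self-reported confidence, then average the running precision at each rank."""
    ranked = sorted(first_answers, key=lambda a: a[1], reverse=True)
    correct, total = 0, 0.0
    for i, (right, _conf) in enumerate(ranked, start=1):
        correct += right
        total += correct / i
    return total / len(ranked)

def k1(first_answers: list[tuple[bool, float]]) -> float:
    """K1 measure (assumed formulation): +confidence for a Right first answer,
    -confidence for a Wrong one, averaged over questions; range [-1, 1]."""
    return sum(conf if right else -conf for right, conf in first_answers) / len(first_answers)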


Results: Best and Average scores

[Table: best and average accuracy scores per task; best score shown: 49.47*]

* This result is still under validation.

Best results in 2004-2005-2006

[Table: best results in 2004, 2005 and 2006; best score shown: 22.63*]

* This result is still under validation.

List questions
  • Best: 0.8333 (Priberam, Monolingual PT)
  • Average: 0.138 (P@N scores; see the sketch at the end of this slide)

Problems

  • Wrong classification of List Questions in the Gold Standard
    • “Mention a Chinese writer” is not a List question!
  • Definition of List Questions
    • “closed” List questions asking for a finite number of answers

Q: What are the names of the two lovers from Verona separated by family issues in one of Shakespeare’s plays?

A: Romeo and Juliet.

    • “open” List questions requiring a list of items as answer

Q: Name books by Jules Verne.

A: Around the World in 80 Days.

A: Twenty Thousand Leagues Under the Sea.

A: Journey to the Centre of the Earth.
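P@N here is simply precision over a system's first N answers. A minimal sketch in Python, reusing the Jules Verne example above; the helper name and the gold set are illustrative assumptions, not the official assessment procedure.

def p_at_n(returned: list[str], gold: set[str], n: int) -> float:
    """Precision at N: fraction of the first n returned answers found in the gold set."""
    return sum(answer in gold for answer in returned[:n]) / n

# "Name books by Jules Verne", evaluated at N = 3:
gold = {"Around the World in 80 Days",
        "Twenty Thousand Leagues Under the Sea",
        "Journey to the Centre of the Earth"}
run = ["Around the World in 80 Days", "Robinson Crusoe",
       "Journey to the Centre of the Earth"]
print(p_at_n(run, gold, 3))  # 2 of 3 correct -> approx. 0.67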

Final considerations
  • Increasing interest in multilingual QA
    • More participants (30, + 25%)
    • Two new languages as source (Romanian and Polish)
    • More activated tasks (24 vs. 23 in 2005)
    • More submitted runs (77, +13%)
    • More cross-lingual tasks (35, +31.5%)
  • Gold Standard: questions not translated in all languages
    • No possibility of activating tasks at the last minute
    • Useful as a reusable resource: available in the near future

Final considerations: 2006 main task innovations
  • Multiple answers:
    • good response
    • limited capacity of assessing large numbers of answers.
    • feedback welcome from participants
  • Supporting snippets:
    • faster evaluation
    • feedback from participants
  • “F/D/L” labels not given in the input format:
    • positive, as apparently there was no real impact on List questions

Future perspectives: main task
  • For discussion:
    • Romanian as target
    • Very hard questions (implying reasoning and multiple document answers)
    • Allow collaboration among different systems
    • Partially automated evaluation (right answers)
