
Overview of the Multilingual Question Answering Track

Danilo Giampiccolo

QA@CLEF 2006 Workshop

Outline
  • Tasks
  • Test set preparation
  • Participants
  • Evaluation
  • Results
  • Final considerations
  • Future perspectives

QA 2006: Organizing Committee
  • ITC-irst (Bernardo Magnini): main coordinator
  • CELCT (D. Giampiccolo, P. Forner): general coordination, Italian
  • DFKI (B. Sacaleanu): German
  • ELDA/ELRA (C. Ayache): French
  • Linguateca (P. Rocha): Portuguese
  • UNED (A. Peñas): Spanish
  • U. Amsterdam (Valentin Jijkoun): Dutch
  • U. Limerick (R. Sutcliffe): English
  • Bulgarian Academy of Sciences (P. Osenova): Bulgarian
  • Only Source Languages:
    • Depok University of Indonesia (M. Adriani): Indonesian
    • IASI, Romania (D. Cristea): Romanian
    • Wrocław University of Technology (J. Pietraszko): Polish

QA@CLEF 06: Tasks
  • Main task:
    • Monolingual: the language of the question (Source language) and the language of the news collection (Target language) are the same
    • Cross-lingual: the questions were formulated in a language different from that of the news collection
  • One pilot task:
    • WiQA: coordinated by Maarten de Rijke
  • Two exercises:
    • Answer Validation Exercise (AVE): coordinated by Anselmo Peñas
    • Real Time: a “time-constrained” QA exercise coordinated by Fernando Llopis (University of Alicante)

Data set: Question format

200 Questions of three kinds

    • FACTOID (loc, mea, org, oth, per, tim; ca. 150):
    • What party did Hitler belong to?
    • DEFINITION (ca. 40): Who is Josef Paul Kleihues?
      • reduced in number (-25%)
      • two new categories added:
        • Object: What is a router?
        • Other: What is a tsunami?
    • LIST (ca. 10): Name works by Tolstoy
    • Temporally restricted (ca. 40): by date, by period, by event
    • NIL (ca. 20): questions that do not have any known answer in the target document collection
  • input format: question type (F, D, L) not indicated

Data set: run format
  • Multiple answers: from one to ten exact answers per question
    • exact = neither more nor less than the information required
    • each answer has to be supported by
        • docid
        • one to ten text snippets justifying the answer (substrings of the specified document giving the actual context)
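The run-format constraints above can be sketched as a small validation routine. This is only an illustration under stated assumptions: the slide does not show the actual submission file layout, so the `Answer` class, its field names, the `validate_answers` helper, and the idea of holding the collection in a `documents` dictionary are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Limits taken from the slide: one to ten answers per question,
# one to ten supporting snippets per answer.
MAX_ANSWERS = 10
MAX_SNIPPETS = 10


@dataclass
class Answer:
    """One exact answer with its support (field names are hypothetical)."""
    text: str            # the exact answer string
    docid: str           # identifier of the supporting document
    snippets: List[str]  # substrings of that document justifying the answer


def validate_answers(answers: List[Answer], documents: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems: List[str] = []
    if not 1 <= len(answers) <= MAX_ANSWERS:
        problems.append(f"expected 1 to {MAX_ANSWERS} answers, got {len(answers)}")
    for i, answer in enumerate(answers, start=1):
        if answer.docid not in documents:
            problems.append(f"answer {i}: unknown docid {answer.docid!r}")
            continue
        if not 1 <= len(answer.snippets) <= MAX_SNIPPETS:
            problems.append(
                f"answer {i}: expected 1 to {MAX_SNIPPETS} snippets, "
                f"got {len(answer.snippets)}"
            )
        for snippet in answer.snippets:
            # A snippet must be a literal substring of the cited document.
            if snippet not in documents[answer.docid]:
                problems.append(f"answer {i}: snippet is not a substring of {answer.docid}")
    return problems
```

Whether an answer is "exact" cannot be checked mechanically in this sketch; in the campaign that judgement was left to the human assessors.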

Activated Tasks (at least one registered participant)
  • 11 Source languages (10 in 2005)
  • 8 Target languages (9 in 2005)
  • No Finnish task / New languages: Polish and Romanian

Activated Tasks
  • questions were not translated into all the languages
  • Gold Standard: questions in multiple languages only for tasks where there was at least one registered participant


More interest in cross-linguality

List of participants

Industrial Companies


Number of answers and snippets per question

[Charts not reproduced. Panel 1: number of RUNS with respect to number of answers (1 answer; between 2 and 5 answers; more than 5 answers). Panel 2: number of SNIPPETS for each answer (1 snippet; 2 snippets; 3 snippets; > 4 snippets).]

Evaluation
  • As in previous campaigns
    • runs manually judged by native speakers
    • each answer: Right, Wrong, ineXact, Unsupported
    • up to two runs for each participating group
  • Evaluation measures
    • Accuracy (for F,D); main evaluation score, calculated for the FIRST ANSWER only
      • excessive workload: some groups could manually assess only one answer (the first one) per question
        • 1 answer: Spanish and English
        • 3 answers: French
        • 5 answers: Dutch
        • all answers: Italian, German, Portuguese
    • P@N for List questions

    • Additional evaluation measures
      • K1 measure
      • Confidence Weighted Score (CWS)
      • Mean Reciprocal Rank (MRR)
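
As a rough illustration of three of the measures named above, the sketch below computes accuracy on the first answer, Mean Reciprocal Rank, and the Confidence Weighted Score using their standard textbook definitions. It is an assumption that the campaign applied exactly these formulas; the function names and the single-letter judgement codes (R, W, X, U) are hypothetical, and K1 is left out because the slide does not define it.

```python
from typing import List, Sequence

# Hypothetical one-letter codes for the four judgements on the slide:
# Right, Wrong, ineXact, Unsupported. Only "R" counts as correct here.
RIGHT = "R"


def accuracy(judgements: Sequence[Sequence[str]]) -> float:
    """Share of questions whose FIRST answer was judged Right."""
    if not judgements:
        return 0.0
    hits = sum(1 for answers in judgements if answers and answers[0] == RIGHT)
    return hits / len(judgements)


def mean_reciprocal_rank(judgements: Sequence[Sequence[str]]) -> float:
    """Average of 1/rank of the first Right answer (0 if there is none)."""
    if not judgements:
        return 0.0
    total = 0.0
    for answers in judgements:
        for rank, label in enumerate(answers, start=1):
            if label == RIGHT:
                total += 1.0 / rank
                break
    return total / len(judgements)


def confidence_weighted_score(first_labels: Sequence[str],
                              confidences: Sequence[float]) -> float:
    """Sort questions by decreasing confidence, then average the running
    precision (correct so far / questions seen so far) over all positions."""
    if len(first_labels) != len(confidences):
        raise ValueError("one confidence value is needed per question")
    if not first_labels:
        return 0.0
    order: List[int] = sorted(range(len(confidences)),
                              key=lambda i: confidences[i], reverse=True)
    correct_so_far = 0
    total = 0.0
    for position, index in enumerate(order, start=1):
        if first_labels[index] == RIGHT:
            correct_so_far += 1
        total += correct_so_far / position
    return total / len(order)
```

Each inner list holds the judgements for one question in the order the system ranked its answers, so accuracy only ever looks at position 0 while MRR rewards a Right answer at any rank.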

Results: Best and Average scores

[Chart of best and average scores not reproduced. Surviving labels: * and 49,47.]

* This result is still under validation.

Best results in 2004-2005-2006

[Chart of best results for 2004, 2005 and 2006 not reproduced. Surviving labels: * and 22,63.]

* This result is still under validation.

List questions
  • Best: 0.8333 (Priberam, Monolingual PT)
  • Average: 0.138

Problems

  • Wrong classification of List Questions in the Gold Standard
    • “Mention a Chinese writer” is not a List question!
  • Definition of List Questions
    • “closed” List questions asking for a finite number of answers

Q: What are the names of the two lovers from Verona separated by family issues in one of Shakespeare’s plays?

A: Romeo and Juliet.

    • “open” List questions requiring a list of items as answer

Q: Name books by Jules Verne.

A: Around the World in 80 Days.

A: Twenty Thousand Leagues Under the Sea.

A: Journey to the Centre of the Earth.

Final considerations
  • Increasing interest in multilingual QA
    • More participants (30, + 25%)
    • Two new languages as source (Romanian and Polish)
    • More activated tasks (24, up from 23 in 2005)
    • More submitted runs (77, +13%)
    • More cross-lingual tasks (35, +31.5%)
  • Gold Standard: questions not translated into all languages
    • No possibility of activating tasks at the last minute
    • Useful as a reusable resource: available in the near future.

Final considerations: 2006 main task innovations
  • Multiple answers:
    • good response
    • limited capacity of assessing large numbers of answers.
    • feedback welcome from participants
  • Supporting snippets:
    • faster evaluation
    • feedback from participants
  • “F/D/L” labels not given in the input format:
    • positive, as apparently there was no real impact on
  • List questions

Future perspective: main task
  • For discussion:
    • Romanian as target
    • Very hard questions (implying reasoning and multiple document answers)
    • Allow collaboration among different systems
    • Partially automated evaluation (right answers)
