
Overview of the Multilingual Question Answering Track

Danilo Giampiccolo

QA@CLEF 2006 Workshop

Outline

  • Tasks

  • Test set preparation

  • Participants

  • Evaluation

  • Results

  • Final considerations

  • Future perspectives

QA 2006: Organizing Committee

  • ITC-irst (Bernardo Magnini): main coordinator

  • CELCT (D. Giampiccolo, P. Forner): general coordination, Italian

  • DFKI (B. Sacaleanu): German

  • ELDA/ELRA (C. Ayache): French

  • Linguateca (P. Rocha): Portuguese

  • UNED (A. Peñas): Spanish

  • U. Amsterdam (Valentin Jijkoun): Dutch

  • U. Limerick (R. Sutcliffe): English

  • Bulgarian Academy of Sciences (P. Osenova): Bulgarian

  • Source languages only:

    • University of Indonesia, Depok (M. Adriani): Indonesian

    • IASI, Romania (D. Cristea): Romanian

    • Wrocław University of Technology (J. Pietraszko): Polish

QA@CLEF 06: Tasks

  • Main task:

    • Monolingual: the language of the question (Source language) and the language of the news collection (Target language) are the same

    • Cross-lingual: the questions were formulated in a language different from that of the news collection

  • One pilot task:

    • WiQA: coordinated by Maarten de Rijke

  • Two exercises:

    • Answer Validation Exercise (AVE): coordinated by Anselmo Peñas

    • Real Time: a “time-constrained” QA exercise run by the University of Alicante (coordinated by Fernando Llopis)

Data set: Question format

200 questions of three kinds

  • FACTOID (loc, mea, org, oth, per, tim; ca. 150): What party did Hitler belong to?

  • DEFINITION (ca. 40): Who is Josef Paul Kleihues?

    • reduced in number (-25%)

    • two new categories added:

      • Object: What is a router?

      • Other: What is a tsunami?

  • LIST (ca. 10): Name works by Tolstoy

  • Temporally restricted (ca. 40): by date, by period, by event

  • NIL (ca. 20): questions that do not have any known answer in the target document collection

  • input format: question type (F, D, L) not indicated

  • NEW!




    Data set: run format

    • Multiple answers: from one to ten exact answers per question

      • exact = neither more nor less than the information required

      • each answer has to be supported by

        • docid

        • one to ten text snippets justifying the answer (substrings of the specified document giving the actual context)
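
As an illustration only, the run-format constraints above could be checked with a short Python sketch. The names `Answer`, `validate_question_answers`, and the in-memory `documents` dictionary are hypothetical and are not part of the track's actual submission format; NIL answers are not handled here.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_ANSWERS = 10   # "from one to ten exact answers per question"
MAX_SNIPPETS = 10  # "one to ten text snippets justifying the answer"


@dataclass
class Answer:
    """One exact answer with its supporting evidence (hypothetical structure)."""
    text: str            # the exact answer string
    docid: str           # identifier of the supporting document
    snippets: List[str]  # substrings of that document giving the context


def validate_question_answers(answers: List[Answer],
                              documents: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means all constraints hold."""
    problems = []
    if not 1 <= len(answers) <= MAX_ANSWERS:
        problems.append(f"expected 1 to {MAX_ANSWERS} answers, got {len(answers)}")
    for i, answer in enumerate(answers, start=1):
        if not answer.docid:
            problems.append(f"answer {i}: missing docid")
        if not 1 <= len(answer.snippets) <= MAX_SNIPPETS:
            problems.append(f"answer {i}: expected 1 to {MAX_SNIPPETS} snippets")
        # Each snippet must be a substring of the specified document.
        doc_text = documents.get(answer.docid, "")
        for snippet in answer.snippets:
            if snippet not in doc_text:
                problems.append(f"answer {i}: snippet not found in {answer.docid!r}")
    return problems
```

A response passes when the returned list is empty; each violated constraint adds one message.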



    Activated Tasks (at least one registered participant)

    • 11 Source languages (10 in 2005)

    • 8 Target languages (9 in 2005)

    • No Finnish task / New languages: Polish and Romanian

    Activated Tasks

    • questions were not translated into all languages

    • Gold Standard: questions in multiple languages only for tasks where there was at least one registered participant


    More interest in cross-linguality

    List of participants

    Industrial Companies

    Submitted runs

    Number of answers and snippets per question

    [Chart: number of runs by number of answers per question (1 answer; between 2 and 5 answers; more than 5 answers)]

    [Chart: number of snippets for each answer (1 snippet; 2 snippets; 3 snippets; > 4 snippets)]

    Evaluation

    • As in previous campaigns

      • runs manually judged by native speakers

      • each answer: Right, Wrong, ineXact, Unsupported

      • up to two runs for each participating group

    • Evaluation measures

      • Accuracy (for F,D); main evaluation score, calculated for the FIRST ANSWER only

        • excessive workload: some groups could manually assess only one answer (the first one) per question

          • 1 answer: Spanish and English

          • 3 answers: French

          • 5 answers: Dutch

          • all answers: Italian, German, Portuguese

      • P@N for List questions

      • Additional evaluation measures

        • K1 measure

        • Confidence Weighted Score (CWS)

        • Mean Reciprocal Rank (MRR)
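
The slides name these measures without defining them. As a rough sketch, assuming the usual TREC-style definitions (which may differ in detail from the track's own guidelines), accuracy, MRR, and CWS could be computed from the assessors' judgments as follows. The function names and the input layout are hypothetical; the judgment codes follow the Right, Wrong, ineXact, Unsupported scheme above, written as "R", "W", "X", "U". K1 is left out because its definition is not given here.

```python
from typing import Sequence, Tuple


def accuracy(first_judgments: Sequence[str]) -> float:
    """Fraction of questions whose FIRST answer was judged Right ('R')."""
    if not first_judgments:
        return 0.0
    return sum(j == "R" for j in first_judgments) / len(first_judgments)


def mean_reciprocal_rank(ranked_judgments: Sequence[Sequence[str]]) -> float:
    """Average over questions of 1/rank of the first Right answer (0 if none)."""
    if not ranked_judgments:
        return 0.0
    total = 0.0
    for judgments in ranked_judgments:
        for rank, judgment in enumerate(judgments, start=1):
            if judgment == "R":
                total += 1.0 / rank
                break
    return total / len(ranked_judgments)


def confidence_weighted_score(scored: Sequence[Tuple[float, str]]) -> float:
    """CWS over (confidence, judgment) pairs, one pair per question.

    Questions are sorted by decreasing confidence; the score averages,
    over positions i, the fraction of Right answers among the first i.
    """
    if not scored:
        return 0.0
    ordered = sorted(scored, key=lambda pair: pair[0], reverse=True)
    correct = 0
    total = 0.0
    for i, (_, judgment) in enumerate(ordered, start=1):
        if judgment == "R":
            correct += 1
        total += correct / i
    return total / len(ordered)
```

Only "R" counts as correct in this sketch; ineXact and Unsupported answers are treated like Wrong ones, which is an assumption rather than something the slides state.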


    Question Overlapping among Languages 2005-2006

    Results: Best and Average scores



    * This result is still under validation.

    Best results in 2004-2005-2006



    * This result is still under validation.

    Participants in 2004-2005-2006: compared best results

    List questions

    • Best: 0.8333 (Priberam, Monolingual PT)

    • Average: 0.138


    • Wrong classification of List Questions in the Gold Standard

      • “Mention a Chinese writer” is not a List question!

    • Definition of List Questions

      • “closed” List questions asking for a finite number of answers

        Q: What are the names of the two lovers from Verona separated by family issues in one of Shakespeare’s plays?

        A: Romeo and Juliet.

      • “open” List questions requiring a list of items as answer

        Q: Name books by Jules Verne.

        A: Around the World in 80 Days.

        A: Twenty Thousand Leagues Under The Sea.

        A: Journey to the Centre of the Earth.

    Final considerations

    • Increasing interest in multilingual QA

      • More participants (30, +25%)

      • Two new languages as source (Romanian and Polish)

      • More activated tasks (24, up from 23 in 2005)

      • More submitted runs (77, +13%)

      • More cross-lingual tasks (35, +31.5%)

    • Gold Standard: questions not translated into all languages

      • No possibility of activating tasks at the last minute

      • Useful as a reusable resource: available in the near future.

    Final considerations: 2006 main task innovations

    • Multiple answers:

      • good response

      • limited capacity to assess large numbers of answers

      • feedback welcome from participants

    • Supporting snippets:

      • faster evaluation

      • feedback from participants

    • “F/D/L” labels not given in the input format:

      • positive, as apparently there was no real impact on

    • List questions

    Future perspective: main task

    • For discussion:

      • Romanian as target

      • Very hard questions (implying reasoning and multiple document answers)

      • Allow collaboration among different systems

      • Partial automated evaluation (right answers)
