Spanish Question Answering Evaluation

Spanish Question Answering Evaluation

CICLing 2004, Seoul

Anselmo Peñas, Felisa Verdejo and Jesús Herrera

UNED NLP Group

Distance Learning University of Spain



Question Answering task

  • Give an answer to a question

    • Approach: Find (search) an answer in a document collection

    • A document must support the answer

    • Where is Seoul?

      • South Korea (correct)

      • Korea (responsive?)

      • Asia (non-responsive)

      • Population of South Korea (inexact)

      • Oranges of China (incorrect)
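A minimal sketch of how such judgments can be represented, using the four assessment categories that appear later in this talk (correct, inexact, unsupported, incorrect); the code is purely illustrative:

    from enum import Enum

    class Judgment(Enum):
        # Assessment categories used in the CLEF QA evaluation (see the methodology slides).
        CORRECT = "correct"          # right, exact, and supported by the cited document
        UNSUPPORTED = "unsupported"  # right, but the cited document does not support it
        INEXACT = "inexact"          # contains too much or too little of the answer string
        INCORRECT = "incorrect"      # wrong

    # Illustrative judgments for "Where is Seoul?" (from the example above).
    judgments = {
        "South Korea": Judgment.CORRECT,
        "Population of South Korea": Judgment.INEXACT,
        "Oranges of China": Judgment.INCORRECT,
    }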


QA system architecture

[Architecture diagram: a Question goes through Question analysis (key terms, expected answer type / structure); Passage retrieval runs over the pre-processed / indexed Documents; Answer extraction and Answer validation / scoring then produce the final Answer. Each stage is an opportunity for natural language techniques.]
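As a rough illustration of this architecture (not of any system from the talk), a toy end-to-end pipeline over an in-memory collection; the module boundaries mirror the diagram, everything else is a deliberate simplification:

    import re
    from collections import Counter

    def analyze_question(question):
        # Question analysis: extract key terms; a real system would also predict
        # the expected answer type/structure and feed it to answer extraction.
        return [w.lower() for w in re.findall(r"\w+", question) if len(w) > 2]

    def retrieve_passages(documents, terms, k=3):
        # Passage retrieval: rank the (pre-indexed) documents by simple term overlap.
        scored = [(sum(doc.lower().count(t) for t in terms), doc_id, doc)
                  for doc_id, doc in documents.items()]
        return [(doc_id, doc) for score, doc_id, doc in sorted(scored, reverse=True) if score > 0][:k]

    def extract_answer(passages, terms):
        # Answer extraction + validation: pick the capitalized phrase that is not
        # already a question term and occurs most often in the retrieved passages.
        candidates = Counter()
        for doc_id, doc in passages:
            for phrase in re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", doc):
                if phrase.lower() not in terms:
                    candidates[(phrase, doc_id)] += 1
        if not candidates:
            return "NIL", None                      # no supported answer found
        (answer, doc_id), _ = candidates.most_common(1)[0]
        return answer, doc_id                       # answer + supporting document id

    # Toy collection and question, for illustration only.
    docs = {"efe-001": "Seoul is the capital of South Korea.",
            "efe-002": "Oranges are grown in China."}
    terms = analyze_question("Where is Seoul?")
    print(extract_answer(retrieve_passages(docs, terms), terms))   # -> ('South Korea', 'efe-001')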



Overview

  • Evaluation forums: objectives

  • QA evaluation methodology

  • The challenge of multilingualism

  • QA at CLEF 2003

  • QA at CLEF 2004

  • Conclusion



Evaluation Forums: Objectives

  • Stimulate research

  • Establish shared working lines

  • Generate resources for evaluation and for training

  • Compare different approaches and obtain some evidence

  • Serve as a meeting point for collaboration and exchange

    (CLEF, TREC, NTCIR)



QA Evaluation Methodology

  • Test suite production:

    • Document collection (hundreds of thousands of documents)

    • Questions (hundreds)

  • Systems answering

    • (Answer + Document id)

    • Limited time

  • Judgment of answers

    • Human assessors

    • Correct, Inexact, Unsupported, Incorrect

  • Measurement of system behavior

    • % of questions correctly answered

    • % of NIL questions correctly detected

    • Precision, Recall, F-measure, MRR (Mean Reciprocal Rank), Confidence-Weighted Score, ... (see the sketch after this list)

  • Results comparison
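The measures above can be made concrete with a small sketch (assuming each judged answer is a record with a question id, a rank and a correctness flag; this record format is hypothetical, not CLEF's actual result format):

    def accuracy(judgments):
        # Fraction of questions whose first-ranked answer was judged correct.
        firsts = [j for j in judgments if j["rank"] == 1]
        return sum(j["correct"] for j in firsts) / len(firsts)

    def mean_reciprocal_rank(judgments):
        # MRR: average over questions of 1/rank of the first correct answer (0 if none).
        best = {}
        for j in judgments:
            if j["correct"]:
                r = best.get(j["question_id"])
                best[j["question_id"]] = j["rank"] if r is None else min(r, j["rank"])
        questions = {j["question_id"] for j in judgments}
        return sum(1.0 / best[q] if q in best else 0.0 for q in questions) / len(questions)

    def confidence_weighted_score(ranked_correctness):
        # Answers are ordered by the system's own confidence, most confident first;
        # CWS is the average over positions i of (#correct within the first i) / i.
        total, correct_so_far = 0.0, 0
        for i, correct in enumerate(ranked_correctness, start=1):
            correct_so_far += correct
            total += correct_so_far / i
        return total / len(ranked_correctness)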



    QA Evaluation Methodology

    Considerations on task definition (I)

    • Quantitative evaluation constrains the type of questions

      • Questions must be assessable in terms of correctness, completeness and exactness

      • e.g. “What are the causes of the Iraq war?” is hard to assess quantitatively

  • Human resources available

    • Test suite generation

    • Assessment (# of questions, # of answers per question)

  • Collection

    • Restricted vs. unrestricted domains

    • News vs. patents

    • Multilingual QA: Comparable collections available



    QA Evaluation Methodology

    Considerations on task definition (II)

    • Research direction

      • “Do it better” versus “How to get better results?”

      • Systems are tuned according to the evaluation task.

      • e.g. evaluation measure, external resources (web)

  • Roadmap versus state of the art

    • What should systems do in the future? (Burger, 2000-2002)

    • When is it realistic to incorporate new features in the evaluation?

      Type of questions, temporal restrictions, confidence in the answer, encyclopedic knowledge and inference, different sources and languages, consistency between different answers, ...



    The challenge of multilingualism

    May I continue this talk in Spanish?

    Then multilingualism still remains a challenge...



    The challenge of multilingualism

    • Feasible with current QA state of the art

    • Challenge for systems but ...

    • ... challenge from the evaluation point of view

      • What is the possible roadmap to achieve fully multilingual systems?

        • QA at CLEF (Cross-Language Evaluation Forum)

        • Monolingual → Bilingual → Multilingual systems

      • What tasks can be proposed according to the current state of the art?

        • Monolingual other than English? Bilingual considering English?

        • Any bilingual? Fully multilingual?

      • Which new resources are needed for the evaluation?

        • Comparable corpus? Unrestricted domain?

        • Parallel corpus? Domain specific? Size?

        • Human resources: answers in any language make assessment by native speakers difficult


    The challenge of multilingualism (cont.)

    • How to ensure that fully multilingual systems receive better evaluation?

      • Some answers in just one language? How?

        • Hard pre-assessment?

        • Different languages for different domains?

        • Different languages for different dates or localities?

        • Parallel collections, extracting a controlled subset of documents that differs for each language?

      • How to balance type and difficulty of questions in all languages?

    Ouch!



    The challenge of multilingualism

    • Fortunately (or unfortunately), with the current state of the art it is not realistic to plan such an evaluation...

      Very few systems are able to deal with several target languages

    • ...yet

    • While we try to answer the questions...

    • Planning a separate evaluation for each target language seems more realistic

    • Option followed by QA at CLEF in the short term



    Overview

    • Evaluation forums: objectives

    • QA evaluation methodology

    • The challenge of multilingualism

    • QA at CLEF 2003

    • QA at CLEF 2004

    • Conclusion



    QA at CLEF groups

    • ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, Trento, Italy

    • UNED, Universidad Nacional de Educación a Distancia, Madrid, Spain

    • ILLC, Language and Inference Technology Group, U. of Amsterdam

    • DFKI, Deutsches Forschungszentrum für Künstliche Intelligenz, Saarbrücken, Germany

    • ELDA/ELRA, Evaluations and Language Resources Distribution Agency, Paris, France

    • Linguateca, Oslo (Norway), Braga, Lisbon & Porto (Portugal)

    • BulTreeBank Project, CLPP, Bulgarian Academy of Sciences, Sofia, Bulgaria

    • University of Limerick, Ireland

    • ISTI-CNR, Istituto di Scienza e Tecnologie dell'Informazione “A. Faedo”, Pisa, Italy

    • NIST, National Institute of Standards and Technology, Gaithersburg, USA



    QA at CLEF 2003

    • Task

      • 200 factoid questions, up to 3 answers per question

      • Exact answer, or an answer within a 50-byte string

  • Document collection

    • [Spanish] >200,000 news articles (EFE, 1994)

  • Questions

    • DISEQuA corpus (available on the web) (Magnini et al., 2003):

      • Coordinated work between ITC-IRST (Italian), UNED (Spanish) and U. Amsterdam (Dutch)

      • 450 questions and answers translated into English, Spanish, Italian and Dutch

    • 200 questions from the DISEQuA corpus (20 NIL, i.e. with no answer in the collection)

  • Assessment

    • Incorrect, Unsupported, Non-exact, Correct


    Multilingual pool of questions

    [Diagram: each group contributes 100 questions with a known answer in its target language (Spanish, Italian, Dutch, German and French, 100 questions each). These are translated into English to form an English pool of 500 questions, which is then translated into the rest of the languages, yielding a multilingual pool of 500 questions in 6 languages (500 x 6).]

    • Final questions are selected from the pool
      • For each target language
      • After pre-assessment

    Coordination between several groups
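A minimal sketch of this pooling procedure; the question strings and the translate() stub are placeholders for what is, in the real exercise, manual work coordinated between the groups:

    def translate(text, target_language):
        # Placeholder for the (human) translation step of the real exercise.
        return f"[{target_language}] {text}"

    source_languages = ["Spanish", "Italian", "Dutch", "German", "French"]

    # Each group contributes 100 questions with a known answer in its own language.
    native_questions = {lang: [f"question {i} ({lang})" for i in range(1, 101)]
                        for lang in source_languages}

    # Step 1: translate everything into English -> English pool of 500 questions.
    english_pool = [(lang, question, translate(question, "English"))
                    for lang, questions in native_questions.items() for question in questions]

    # Step 2: translate the English pool into the remaining languages -> multilingual pool (500 x 6).
    multilingual_pool = []
    for lang, original, english in english_pool:
        row = {lang: original, "English": english}
        for target in source_languages:
            if target != lang:
                row[target] = translate(english, target)
        multilingual_pool.append(row)

    # Final test sets are then selected per target language, after pre-assessment.
    assert len(multilingual_pool) == 500 and all(len(row) == 6 for row in multilingual_pool)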



    QA at CLEF 2003



    QA at CLEF 2004: tasks



    QA at CLEF 2004

    • 200 questions

      • Factual: person, object, measure, organization ...

      • Definition: person, organization

      • How-to

    • 1 answer per question (without manual intervention)

    • Up to two runs

    • Exact answers

    • Assessment: correct, inexact, unsupported, incorrect

    • Evaluation:

      • Fraction of correct answers

      • Measures based on the systems' self-scoring (see the sketch below)
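A minimal sketch of these two evaluation measures for a one-answer-per-question run; the record fields (correct, confidence) are hypothetical, and the confidence-weighted part follows the same idea as the earlier CWS sketch:

    def evaluate_run(run):
        # run: one record per question, e.g. {"correct": True, "confidence": 0.8}.
        fraction_correct = sum(a["correct"] for a in run) / len(run)
        # Self-scoring measure: order answers by the system's own confidence,
        # most confident first, and reward correct answers ranked early.
        ordered = sorted(run, key=lambda a: a["confidence"], reverse=True)
        cws, correct_so_far = 0.0, 0
        for i, a in enumerate(ordered, start=1):
            correct_so_far += a["correct"]
            cws += correct_so_far / i
        return fraction_correct, cws / len(run)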



    QA at CLEF 2004

    • Schedule



    Conclusion

    Information and resources

    • Cross-Language Evaluation Forum

      • http://clef-qa.itc.it/2004

      • DISEQuA Corpus: Dutch, Italian, Spanish, English

  • Spanish QA at CLEF

    • http://nlp.uned.es/QA

      ([email protected])

