
Overview of QAST 2007 - Question Answering on Speech Transcriptions -


Presentation Transcript


  1. Overview of QAST 2007 - Question Answering on Speech Transcriptions - J. Turmo, P. Comas (1), C. Ayache, D. Mostefa (2), L. Lamel and S. Rosset (3) (1) UPC, Spain (2) ELDA, France (3) LIMSI, France QAST website: http://www.lsi.upc.edu/~qast/

  2. Outline • Task • Participants • Results • Conclusion and future work

  3. Task: QAST 2007 Organization
The task was jointly organized by:
• UPC, Spain (J. Turmo, P. Comas), coordinator
• ELDA, France (C. Ayache, D. Mostefa)
• LIMSI-CNRS, France (S. Rosset, L. Lamel)

  4. Task: Evaluation Protocol
4 tasks were proposed:
• T1: QA in manual transcriptions of lectures
• T2: QA in automatic transcriptions of lectures
• T3: QA in manual transcriptions of meetings
• T4: QA in automatic transcriptions of meetings
2 data collections:
• The CHIL corpus: around 25 hours (1 hour per lecture); domain of lectures: speech and language processing
• The AMI corpus: around 100 hours (168 meetings); domain of meetings: design of a television remote control
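The four tasks form a 2x2 matrix over data source (lectures vs. meetings) and transcription type (manual vs. automatic). A minimal Python sketch of that matrix, for orientation only (the data structure is an illustrative assumption, not any official QAST format):

    # Illustrative sketch of the four QAST 2007 tasks; this layout is
    # an assumption for exposition, not an official QAST artifact.
    TASKS = {
        "T1": ("CHIL lectures", "manual"),
        "T2": ("CHIL lectures", "automatic"),
        "T3": ("AMI meetings", "manual"),
        "T4": ("AMI meetings", "automatic"),
    }

    for task_id, (corpus, transcription) in sorted(TASKS.items()):
        print(f"{task_id}: QA in {transcription} transcriptions of {corpus}")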

  5. Task: Questions and answer types
For each task, 2 sets of questions were provided:
• Development set (1 February 2007): lectures: 10 lectures, 50 questions; meetings: 50 meetings, 50 questions
• Evaluation set (18 June 2007): lectures: 15 lectures, 100 questions; meetings: 118 meetings, 100 questions

  6. Task: Questions and answer types
Factual questions only, e.g. "Who is a guru in speech recognition?"
• Expected answers = named entities
• List of NE types: person, location, organization, language, system/method, measure, time, color, shape, material
• No definition questions
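To make the answer-type constraint concrete, here is a small Python sketch; the helper function and the "measure" example question are hypothetical illustrations, not part of the QAST question sets:

    # The ten named-entity answer types listed on the slide; the helper
    # and the second example are invented for illustration.
    NE_TYPES = {"person", "location", "organization", "language",
                "system/method", "measure", "time", "color",
                "shape", "material"}

    def is_valid_answer_type(expected_type):
        """A factual question must expect one of the ten NE types;
        definition questions were not part of QAST 2007."""
        return expected_type in NE_TYPES

    assert is_valid_answer_type("person")   # "Who is a guru in speech recognition?"
    assert is_valid_answer_type("measure")  # e.g. an invented "How long ...?" question
    assert not is_valid_answer_type("definition")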

  7. Task: Human judgment
Assessors used QASTLE, an evaluation tool developed in Perl by ELDA, to judge the submitted answers. Four possible judgments:
• Correct
• Incorrect
• Inexact (too short or too long)
• Unsupported (correct answer but wrong supporting document)

  8. Task: Scoring
Two metrics were used:
• Mean Reciprocal Rank (MRR): measures how highly the first correct answer is ranked
• Accuracy: the fraction of questions whose correct answer is ranked first in the list of 5 possible answers
Participants could submit up to 2 submissions per task and 5 answers per question.
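To make the two metrics concrete, here is a minimal Python sketch (not the official QASTLE scorer; the input format, one list of per-answer judgments per question using the four labels from the previous slide, is an assumption):

    # Minimal sketch of the two QAST metrics, not the official scorer.
    # Input: one list of judgments per question, covering the up-to-5
    # ranked answers, e.g. ["Incorrect", "Correct"]; only "Correct" counts.

    def mrr(judgments_per_question):
        """Mean Reciprocal Rank: mean of 1/rank of the first correct
        answer per question (0 when no ranked answer is correct)."""
        total = 0.0
        for judgments in judgments_per_question:
            for rank, judgment in enumerate(judgments, start=1):
                if judgment == "Correct":
                    total += 1.0 / rank
                    break
        return total / len(judgments_per_question)

    def accuracy(judgments_per_question):
        """Fraction of questions whose top-ranked answer is correct."""
        top_correct = sum(1 for j in judgments_per_question
                          if j and j[0] == "Correct")
        return top_correct / len(judgments_per_question)

    # Two questions: correct at rank 2 and at rank 1.
    # MRR = (1/2 + 1/1) / 2 = 0.75; accuracy = 1/2 = 0.5.
    runs = [["Incorrect", "Correct"], ["Correct"]]
    print(mrr(runs), accuracy(runs))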

  9. Participants
Five teams submitted results for one or more QAST tasks:
• CLT, Center for Language Technology, Australia
• DFKI, Germany
• LIMSI-CNRS, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, France
• Tokyo Institute of Technology, Japan
• UPC, Universitat Politècnica de Catalunya, Spain
In total, 28 submission files were evaluated.

  10. Results
Due to some problems (typos, wrong answer types, and missing word-level time information for some AMI meetings), some questions were removed from the test sets for scoring. Final counts:
• T1 and T2: 98 questions
• T3: 96 questions
• T4: 93 questions

  11. Results for T1: QA on CHIL manual transcriptions

  12. Results for T2: QA on CHIL automatic transcriptions

  13. Results for T3: QA on AMI manual transcriptions

  14. Results for T4: QA on AMI automatic transcriptions

  15. Conclusion and future work
• 5 participants from 5 different countries (France, Germany, Spain, Australia and Japan) => 28 runs
• Very encouraging results: QA technology can be useful for dealing with spontaneous speech transcripts
• Accuracy drops sharply on automatically transcribed speech

  16. Conclusion and future work
Future work aims to include:
• Languages other than English
• Oral questions
• Other question types: definition, list, etc.
• Other data-collection domains: European Parliament, broadcast news, etc.

  17. For more information
The QAST website: http://www.lsi.upc.edu/~qast/
