
DQR test suites for spoken dialogue system evaluation : A paradigm for a qualitative evaluation

Jean-Yves Antoine (VALORIA, U. Bretagne Sud, Vannes, France), Jérôme Zeiliger (INRS-Telecom, Quebec, Canada), Jean Caelen (CLIPS, Institut IMAG, Grenoble, France).


Presentation Transcript


  1. DQR test suites for spoken dialogue system evaluation: A paradigm for a qualitative evaluation
     Jean-Yves Antoine, VALORIA, U. Bretagne Sud, Vannes, France
     Jérôme Zeiliger, INRS-Telecom, Quebec, Canada
     Jean Caelen, CLIPS, Institut IMAG, Grenoble, France

  2. Quantitative evaluation
     • Overall performance of the system
       - Accuracy rates: system outputs compared with predefined references
     • Advantages
       - Objective evaluation
       - Overall improvements over time
     • Drawbacks
       - Lack of predictive power
       - Lack of genericity

  3. Predictability: some questions
     • Overall accuracy rate of the system
       - How does it depend on the performance of its components?
     • Overall accuracy rate of a specific component
       - How does it depend on the test data?
       - How does it depend on the application?
     • What can it tell us about future improvements?

  4. Predictability: a solution
     • Quantitative evaluation: assessment of the overall improvements of the technology
     • Qualitative evaluation: appropriateness to a specific task / application
     • Evaluation of the system’s behaviour on EVERY specific phenomenon: PREDICTABILITY

  5. DQR methodology
     • Qualitative evaluation in NLP: TSNLP, FRACAS, AUPELF-UREF
     • DQR test suites
       - Declaration D: the utterance the system should understand; D concerns a specific phenomenon
         Peter is attending a meeting. He is to chair it.
       - Question Q: assesses the understanding of D
         Is Peter to chair a meeting?
       - Reply R: [Yes] / [No]
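The D/Q/R triple on slide 5 can be pictured as a small record. The sketch below is only an illustration: the class and field names are hypothetical, and the "anaphora" label is an assumption about which phenomenon the Peter example targets (the slide does not name it).

```python
from dataclasses import dataclass
from enum import Enum


class Reply(Enum):
    """The two possible replies R of a DQR test."""
    YES = "Yes"
    NO = "No"


@dataclass(frozen=True)
class DQRTest:
    """One DQR test item (hypothetical field names)."""
    phenomenon: str   # phenomenon targeted by D (label chosen for illustration)
    declaration: str  # D: the utterance the system should understand
    question: str     # Q: assesses the understanding of D
    reply: Reply      # R: expected answer


# The example from the slide; "anaphora" is an assumed label.
peter_test = DQRTest(
    phenomenon="anaphora",
    declaration="Peter is attending a meeting. He is to chair it.",
    question="Is Peter to chair a meeting?",
    reply=Reply.YES,
)
```

Keeping the phenomenon on each item is what later allows accuracy to be reported per phenomenon rather than as one overall figure.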

  6. DQR evaluation and speech: extensions of the DQR methodology
     • Specificity of spoken language interaction
     • Specificity of speech technologies
     • Structural analysis: spontaneous, unexpected structures
     • Dialogue strategy
     • Practical adaptation of the DQR test suites

  7. Multi-level evaluation
     • Speech understanding
       - Literal understanding (structural analysis)
       - Implicit understanding (anaphora, ellipses)
       - Inference
         - common-sense reasoning (logical inferences)
         - pragmatic reasoning
         - multiple-turn inferences
     • Dialogue
       - Speech act interpretation (intention in action)
       - Speaker’s intention recognition (preliminary intention)
       - Relevance
         - reply of the system
         - dialogue strategy

  8. Practical achievement
     • Simplicity of the question Q
       (D) I need to go to Granada tomorrow morning
       (Q) Go to Granada
       (R) [Yes]
     • Simplicity of the evaluation
       - Computation of the answer: mere unification, Rsystem = UNIF(D, Q)
       - Accuracy rate: specific to each phenomenon
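A minimal sketch of Rsystem = UNIF(D, Q) and of a per-phenomenon accuracy rate follows. The slides do not specify the system's internal representation, so the flat attribute-value frames and their slot names are assumptions, and the one-way slot match used here is a simplified stand-in for unification, not the authors' procedure.

```python
from collections import defaultdict


def unif(d_frame, q_frame):
    """Return "Yes" if every slot of Q appears in D with the same value.

    Simplified stand-in for unification over flat frames (assumption).
    """
    matches = all(d_frame.get(slot) == value for slot, value in q_frame.items())
    return "Yes" if matches else "No"


def accuracy_by_phenomenon(results):
    """Compute one accuracy rate per phenomenon.

    results: iterable of (phenomenon, system_reply, expected_reply) tuples.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for phenomenon, got, expected in results:
        totals[phenomenon] += 1
        correct[phenomenon] += int(got == expected)
    return {p: correct[p] / totals[p] for p in totals}


# Hypothetical frames for the Granada example from the slide.
d = {"act": "go", "destination": "Granada", "date": "tomorrow", "period": "morning"}
q = {"act": "go", "destination": "Granada"}
r_system = unif(d, q)
```

Because both frames come from the system under test, no shared reference annotation is needed; only the expected [Yes] / [No] reply is fixed in advance.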

  9. Genericity
     • Unification of the intrinsic representations of the system
       - No predefined references
       - No common representations
     • Complete independence

  10. Predictability: literal understanding
      • Key information retrieval
        (D) I need to go to Granada tomorrow morning
        (Q) Go to Granada
        (R) [Yes]
      • Sharper understanding
        (D) Turn right after the building with the red shutters
        (Q) Red shutters
        (R) [Yes]
        (Q) Building with shutters
        (R) [Yes]

  11. Predictability: negative tests
      • Positive tests: tracking the errors
      • Negative tests: explaining the errors
      • Example: literal understanding
        (D) Turn right after the building with the red shutters
        (Q) Red building
        (R) [No]
        (D) Move the circle and the triangle on the right
        (Q) Move the right triangle
        (R) [No]
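To illustrate why negative tests help explain errors, the sketch below runs the shutters questions against a deliberately naive, hypothetical "system" that answers Yes whenever every word of Q occurs in D. This toy keyword spotter is an assumption made for illustration and is not a system described in the slides.

```python
def keyword_spotter(declaration, question):
    """Toy system under test: Yes when every word of Q occurs somewhere in D."""
    d_words = set(declaration.lower().split())
    found_all = all(word in d_words for word in question.lower().split())
    return "Yes" if found_all else "No"


D = "Turn right after the building with the red shutters"

# (question, expected reply): two positive tests and one negative test.
tests = [
    ("Red shutters", "Yes"),
    ("Building with shutters", "Yes"),
    ("Red building", "No"),
]

# Run every test and keep the questions the toy system gets wrong.
outcomes = [(q, expected, keyword_spotter(D, q)) for q, expected in tests]
failed = [q for q, expected, got in outcomes if got != expected]
```

The toy system passes both positive tests and fails only on "Red building", which suggests it finds the words but ignores which noun "red" modifies. Positive tests alone would not have exposed this.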

  12. Predictability: spoken constructions
      • Repetitions, self-corrections
        (D) I want to leave tomorrow evening … no sorry … morning
        (Q) Tomorrow morning
        (R) [Yes]
      • Word-order alterations
        (D) On the right of the circle, draw a red triangle
        (Q) Draw a circle
        (R) [No]

  13. Conclusion
      • A predictive and generic paradigm of evaluation
      • Already in use in NLP (Fracas, 1996)
      • Adaptable to spoken language understanding
        - AUPELF-UREF French-speaking evaluation
      • Adaptable to spoken dialogue?
        - Lack of interactive abilities of the present systems
