
Research Challenges for Spoken Language Dialog Systems

Julie Baca, Ph.D., Center for Advanced Vehicular Systems, Mississippi State University. Computer Science Graduate Seminar, November 27, 2002.


Presentation Transcript


  1. Research Challenges for Spoken Language Dialog Systems Julie Baca, Ph.D. Center for Advanced Vehicular Systems Mississippi State University Computer Science Graduate Seminar November 27, 2002

  2. Overview • Define dialog systems • Describe research issues • Present current work • Give conclusions and discuss future work

  3. What is a Dialog System? • Current commercial voice products require adherence to “command and control” language, e.g., • User: “Plan Route” • Such interfaces are not robust to variations from the fixed words and phrases.

  4. What is a Dialog System? • Dialog systems seek to provide a natural conversational interaction between the user and the computer system, e.g., • User: “Is there a way I can get to Canal Street from here?”

  5. Domains for Dialog Systems • Travel reservation • Weather forecasting • In-vehicle driver assistance • On-line learning environments

  6. Dialog Systems: Information Flow • Must model two-way flow of information • User-to-system • System-to-user

  7. [Figure: Dialog System architecture. Components shown: Speech Recognition, NLP, Dialog Manager, Application Database, Response Generation, TTS]

  8. Research Issues • Many fundamental problems must be solved for these systems to mature. Three general areas include: • Automatic Speech Recognition (ASR) • Natural Language Processing (NLP) • Human-Computer Interaction (HCI)

  9. NLP Issue for Dialog Systems: Semantics • Must assess meaning, not just syntactic correctness. • Therefore, must handle ungrammatical inputs, e.g., • “The … nearest … station is … is there a gas station nearby?”

  10. NLP Issue: Semantic Representation 1 • For NLP, use semantic grammars • Semantic frame with slots and fillers:
      <destination> -> <prep> <place>
      <prep> -> “nearest”
      <place> -> “gas station”

  11. NLP Issue: Semantic Representation 2 • Must also represent: • “How do I get from Canal Street to Royal Street?”
      <directions> -> <start> <destination>
      <destination> -> <prep> <place>
      <place> -> <street_name> | <business>
      <street_name> -> “Canal St” | “Royal St”
      <prep> -> <to_prep> | <near_prep>
      <near_prep> -> “nearest” | “closest”
      …
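
The slot-and-filler rules on slides 10 and 11 can be sketched as a small phrase-spotting parser. This is a minimal illustration under stated assumptions, not the system described in the talk: the `parse_frame` function, the `LEXICON` table, the "from" and "to" prepositions, and the spelled-out street names are hypothetical additions, since the slides leave `<start>` and `<to_prep>` undefined. Words outside the lexicon are skipped, which is one simple way to tolerate the ungrammatical input shown on slide 9.

```python
import re

# Terminals from the slide grammar. Entries marked "assumed" are
# hypothetical additions so that a complete request can be parsed.
LEXICON = {
    "from_prep": ["from"],                 # assumed, used to fill <start>
    "to_prep": ["to"],                     # assumed
    "near_prep": ["nearest", "closest"],
    "street_name": ["canal st", "royal st",
                    "canal street", "royal street"],  # long forms assumed
    "business": ["gas station"],
}

# Map each phrase to its category and build one phrase-spotting pattern.
# Longer phrases come first so "canal street" is preferred over "canal st".
CATEGORY = {p: cat for cat, phrases in LEXICON.items() for p in phrases}
PHRASE_RE = re.compile(
    r"\b("
    + "|".join(re.escape(p) for p in sorted(CATEGORY, key=len, reverse=True))
    + r")\b"
)


def parse_frame(utterance):
    """Fill <start> and <destination> slots, ignoring unknown words."""
    frame = {}
    pending_prep = None  # (category, phrase) of the last preposition seen
    for match in PHRASE_RE.finditer(utterance.lower()):
        phrase = match.group(1)
        category = CATEGORY[phrase]
        if category.endswith("_prep"):
            pending_prep = (category, phrase)
            continue
        # A place after "from" fills <start>; any other place is a destination.
        is_start = pending_prep is not None and pending_prep[0] == "from_prep"
        slot = "start" if is_start else "destination"
        frame[slot] = {
            "place": phrase,
            "prep": pending_prep[1] if pending_prep else None,
        }
        pending_prep = None
    return frame


if __name__ == "__main__":
    print(parse_frame("How do I get from Canal Street to Royal Street?"))
    print(parse_frame("The ... nearest ... station is ... is there a gas station nearby?"))
```

Because the parser only reacts to phrases it knows, the disfluent request still yields a destination frame for the gas station, while a hand-crafted grammar of this kind needs a new lexicon entry for every place it should recognize.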

  12. NLP Issue: Semantic Representation 3 • Two Approaches: • Hand-craft the grammar for the application, using robust parsing to understand meaning [1,2]. • Problem: time, expense • Use statistical approach, generating initial rules and using annotated tree-banked data to discover the full rule set [3,4]. • Problem: annotated training data

  13. ASR/NLP Issue: Reducing Errors • Most systems use a loose coupling of ASR and NLP. • Try earlier integration of semantics with recognizer. • Incorporate dialog “state” into underlying statistical model. • Problems: • Increases search space • Training Data
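
One way to picture the role of dialog state is a simplified, late-stage stand-in: rescoring the recognizer's n-best list with a state-dependent prior, rather than changing the recognizer's search itself. The sketch below assumes each hypothesis already carries an intent label and a log-probability from the recognizer; the `STATE_PRIOR` table, its values, the state names, and the `rescore` function are all hypothetical.

```python
import math

# Hypothetical P(intent | dialog state). The values are illustrative only;
# in practice they would have to be estimated from training data.
STATE_PRIOR = {
    "awaiting_destination": {"give_place": 0.8, "confirm": 0.1, "other": 0.1},
    "awaiting_confirmation": {"give_place": 0.1, "confirm": 0.8, "other": 0.1},
}


def rescore(nbest, state, weight=1.0):
    """Return the hypothesis with the best combined score.

    nbest  -- list of (text, intent, asr_log_prob) tuples
    state  -- current dialog state, a key of STATE_PRIOR
    weight -- how strongly the dialog-state prior counts
    """
    prior = STATE_PRIOR[state]

    def score(hypothesis):
        _text, intent, asr_log_prob = hypothesis
        p_intent = prior.get(intent, prior["other"])
        return asr_log_prob + weight * math.log(p_intent)

    return max(nbest, key=score)


if __name__ == "__main__":
    nbest = [("yes", "confirm", -2.0), ("gas", "give_place", -1.8)]
    print(rescore(nbest, "awaiting_confirmation"))
    print(rescore(nbest, "awaiting_destination"))
```

The same two hypotheses are ranked differently depending on the state, and the prior table makes the training-data problem concrete: every state needs its own estimates.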

  14. NLP Issue: Resolving Meaning Using Context • Must maintain knowledge of the conversational context. • After request for nearest gas station, user says, “What is it close to?” • Resolving “it” - anaphora • Another follow-up by the user, “How about … restaurant?” • Resolving “…” with “nearest” - ellipsis

  15. Resolving Meaning: Discourse Analysis • To resolve such requests, system must track context of the conversation. • This is typically handled by a discourse analysis component in the Dialog Manager.

  16. Dialog Manager: Discourse Analysis • Anaphora resolution approach: Use focus mechanism, assuming conversation has focus [5]. • For our example, “gas station” is current focus. • But how about: • “I’m at Food Max. How do I get to a gas station close to it and a video store close to it?” • Problem: Resolving the two “its”.
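
The anaphora and ellipsis examples from slide 14 can be sketched with a single-focus context tracker. This is a minimal illustration, assuming the NLP component hands over a frame (a dictionary of slots) in which "it" marks a pronoun and `None` marks an elided slot; the `DiscourseContext` class and the slot names are hypothetical.

```python
class DiscourseContext:
    """Tracks one focus and the latest slot values across dialog turns."""

    def __init__(self):
        self.focus = None   # place currently in focus, e.g. "gas station"
        self.slots = {}     # most recent non-empty value seen for each slot

    def resolve(self, frame):
        """Return a copy of `frame` with "it" and elided slots filled in."""
        resolved = {}
        for slot, value in frame.items():
            if value == "it" and self.focus is not None:
                value = self.focus              # anaphora: use current focus
            elif value is None:
                value = self.slots.get(slot)    # ellipsis: inherit old value
            resolved[slot] = value
        # The place mentioned most recently becomes the new focus.
        if resolved.get("place") is not None:
            self.focus = resolved["place"]
        self.slots.update({k: v for k, v in resolved.items() if v is not None})
        return resolved


if __name__ == "__main__":
    ctx = DiscourseContext()
    ctx.resolve({"intent": "find", "prep": "nearest", "place": "gas station"})
    print(ctx.resolve({"intent": "what_is_near", "place": "it"}))
    print(ctx.resolve({"intent": "find", "prep": None, "place": "restaurant"}))
```

A single focus is enough for these follow-ups, but it does not settle the Food Max request on slide 16, where the focus may shift between the first “it” and the second within one utterance.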

  17. [Figure: Dialog System architecture. Components shown: Speech Recognition, NLP, Dialog Manager, Discourse Analysis, Application Database, Response Generation, TTS]

  18. Dialog Manager: Clarification • Often cannot satisfy request in one iteration. • The previous example may require clarification from the user, • “Do you want to go to the gas station first?”

  19. HCI Issue: System vs. User Initiative • What level of control do you provide the user in the conversation? • Initiative ranges from computer to human: • Computer initiative, C: “Please say departure city.” • Human initiative, U: “Tell me how to get to the Hilton.”

  20. Mixed Initiative • Total system initiative provides low usability. • Total user initiative introduces higher error rate. • Thus, mixed initiative approach, balancing usability and error rate, is taken most often. • Allowing user to adapt the level explicitly has also shown merit [6].

  21. ASR/HCI Issue: Error Handling • How to handle possible errors? • Assign confidence score to result of recognizer. • For results with lower confidence score, request clarification or revert to system-oriented initiative. • Can incorporate dialog state in computing confidence score [7].
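
The confidence-based strategy above can be sketched as a three-way decision. The thresholds, the prompt wording, and the `next_action` function are assumptions made for illustration; the slides do not give specific values.

```python
def next_action(hypothesis, confidence, accept_at=0.8, clarify_at=0.5):
    """Choose a dialog move from the recognizer's confidence score.

    accept_at and clarify_at are hypothetical thresholds in [0, 1].
    """
    if confidence >= accept_at:
        # High confidence: act on the request as recognized.
        return ("accept", hypothesis)
    if confidence >= clarify_at:
        # Medium confidence: ask the user to confirm.
        return ("clarify", f"Did you say: {hypothesis}?")
    # Low confidence: fall back to a system-initiative prompt.
    return ("system_prompt", "Please say your destination.")


if __name__ == "__main__":
    for score in (0.9, 0.6, 0.2):
        print(score, next_action("nearest gas station", score))
```

Raising `accept_at` trades more clarification turns for fewer misunderstood requests, which is the usability versus error-rate balance described for mixed initiative.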

  22. HCI Issue: Response Generation • How to present response to user in a way that minimizes cognitive load? • Varies depending on whether output is speech-only or speech/visual. • Speech-only output must respect user short-term memory limitations, e.g., lists must be short, timed appropriately, and allow repetition. • Speech/visual output must be complementary, e.g., importance of redundancy and timing.
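
For speech-only output, the short-list and repetition requirements can be sketched as a simple pager over the results. The chunk size of three, the prompt wording, and the `speak_in_chunks` function are assumptions for illustration, not values from the talk.

```python
def speak_in_chunks(items, chunk_size=3):
    """Split a result list into short spoken prompts.

    Each prompt reads at most `chunk_size` items and tells the user how
    to hear them again or move on. chunk_size=3 is a hypothetical default.
    """
    prompts = []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        has_more = start + chunk_size < len(items)
        if has_more:
            tail = " Say 'more' to continue or 'repeat' to hear these again."
        else:
            tail = " Say 'repeat' to hear these again."
        prompts.append(", ".join(chunk) + "." + tail)
    return prompts


if __name__ == "__main__":
    stations = ["Shell", "Texaco", "Chevron", "Exxon"]
    for prompt in speak_in_chunks(stations):
        print(prompt)
```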

  23. HCI Issue: Evaluating Dialog Systems • How to compare and evaluate dialog systems? • PARADISE (PARAdigm for DIalogue System Evaluation) provides a standard framework [8].

  24. PARADISE: Evaluating Dialog Systems • Task success • Was the necessary information exchanged? • Efficiency/Cost • Number of dialog turns, task completion time • Qualitative • ASR rejections, timeouts, helps • Usability • User satisfaction with ASR, task ease, interaction pace, system response
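
PARADISE [8] combines these measures into one performance value per dialog: a weighted, normalized task-success score (kappa) minus a weighted sum of normalized costs, where normalization is a z-score over the set of dialogs. The sketch below assumes that formulation; the weights passed in are hypothetical (in PARADISE they are fitted by regression against user satisfaction), and the use of the population standard deviation is a choice made here.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Normalize values to zero mean and unit (population) deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def paradise_performance(kappas, costs, alpha, weights):
    """Performance per dialog: alpha * N(kappa) - sum_i w_i * N(c_i).

    kappas  -- task-success value for each dialog
    costs   -- dict: cost name -> list of values, one per dialog
    alpha   -- weight on task success (hypothetical here)
    weights -- dict: cost name -> weight (hypothetical here)
    """
    norm_kappa = z_scores(kappas)
    norm_costs = {name: z_scores(vals) for name, vals in costs.items()}
    return [
        alpha * norm_kappa[i]
        - sum(weights[name] * norm_costs[name][i] for name in costs)
        for i in range(len(kappas))
    ]


if __name__ == "__main__":
    scores = paradise_performance(
        kappas=[1.0, 0.0],
        costs={"turns": [10, 20]},
        alpha=1.0,
        weights={"turns": 1.0},
    )
    print(scores)
```

Normalizing before weighting lets measures on different scales, such as a success score and a turn count, be combined in one sum.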

  25. Current Work • Sponsored by CAVS • Examining: • In-vehicle Environment • Manufacturing Environment • Multidisciplinary Team: • CS, ECE, IE • Baca, Picone, Duffy • ECE graduate students • Hualin Gao, Zheng Feng

  26. Current Work: In-vehicle Dialog System • Specific ASR Issues for In-vehicle Environment: • Real-time performance • Noise cancellation

  27. Current Work: In-vehicle Dialog System • Other Significant Issues: • Reducing error rate • Graceful error handling and mixed initiative strategy • Response generation to reduce user cognitive load • Evaluation

  28. Current Work: In-vehicle Dialog System • Approach • Develop prototype in-vehicle system • Initial focus on ASR and NLP issues • Integrate real-time recognizer [9] • Employ noise-cancellation techniques [10] • Use semantic grammar for NLP • Examine tighter integration of ASR and NLP • Incorporate dialog state in underlying statistical models for ASR

  29. Current Work: In-vehicle Dialog System • Second phase, focus on: • Response generation • Mixed initiative strategies • Evaluation

  30. Current Work: Workforce Training Dialog System • Significant issues in manufacturing environment: • Recognition issues: • Real-time performance • Noisy environments • Understanding issues: • Multimodal interface for reducing error rate, e.g., voice and pen [11]. • HCI/Human Factors Issues: • Response generation to integrate speech and visual output

  31. Research Significance • Advance the development of dialog systems technology through addressing fundamental issues as they arise in the automotive domains. • Potential areas: ASR, NLP, HCI

  32. References
      [1] S.J. Young and C.E. Proctor, “The design and implementation of dialogue control in voice operated database inquiry systems,” Computer Speech and Language, vol. 3, no. 4, pp. 329-353, 1992.
      [2] W. Ward, “Understanding spontaneous speech,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Toronto, Canada, 1991, pp. 365-368.
      [3] R. Pieraccini and E. Levin, “Stochastic representation of semantic structure for speech understanding,” Speech Communication, vol. 11, no. 2, pp. 283-288, 1992.
      [4] Y. Wang and A. Acero, “Evaluation of spoken grammar learning in the ATIS domain,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Orlando, Florida, 2002.
      [5] C. Sidner, “Focusing in the comprehension of definite anaphora,” in Computational Models of Discourse, M. Brady and R. Berwick, eds., Cambridge, MA: The MIT Press, 1983, pp. 267-330.
      [6] D. Litman and S. Pan, “Empirically evaluating an adaptable spoken language dialog system,” in Proceedings of the International Conference on User Modeling (UM ’99), Banff, Canada, 1999.
      [7] S. Pradhan and W. Ward, “Estimating semantic confidence for spoken dialogue systems,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-2002), Orlando, Florida, USA, May 2002.

  33. References
      [8] M. Walker, et al., “PARADISE: A Framework for Evaluating Spoken Dialogue Agents,” in Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL-97), pp. 271-289, 1997.
      [9] F. Zheng, J. Hamaker, F. Goodman, B. George, N. Parihar, and J. Picone, “The ISIP 2001 NRL Evaluation for Recognition of Speech in Noisy Environments,” presented at the Speech In Noisy Environments (SPINE) Workshop, Orlando, Florida, USA, November 2001.
      [10] F. Zheng and J. Picone, “Robust Low Perplexity Voice Interfaces,” MITRE Corporation, December 31, 2001.
      [11] S. Oviatt, “Taming Speech Recognition Errors within a Multimodal Interface,” Communications of the ACM, vol. 43, no. 9, pp. 45-51, Sept. 2000 (special issue on “Conversational Interfaces”).
