Trying to Understand Misunderstanding: How Robust Can Spoken Natural Language Dialogue Systems Be?
Ronnie W. Smith, East Carolina University

Presentation Transcript


  1. Trying to Understand Misunderstanding: How Robust Can Spoken Natural Language Dialogue Systems Be? Ronnie W. Smith East Carolina University

  2. Sponsors • National Science Foundation • Duke University • East Carolina University • DARPA • BBN

  3. Collaborators 1987-1994: Dr. Alan Biermann, Dr. Ruth Day, Dr. Robert Rodman, Richard Hipp, Barry Koster, Dania Egedi, Robin Gambill, Curry Guinn 1994-2000: Dr. Steve Gordon, Robert Hoggard, Shannon Pollard, Chris Shaffer, Greg Keim, Jeremy Mauget, Devender Singh, Jason Carrow, Brian Manning

  4. Quick Outline • Initial motivation and results • Careful studies • Not so careful studies • What I’d really like to do!

  5. Initial Research Focus Develop a model of dialog processing that: • Uses problem-solving to achieve target goal • Exploits contextual knowledge from the discourse and user • Allows mixed-initiative interaction (computer or user controlled) • Allows voice interaction • Operates in real time to permit experimentation

  6. Experimental Evaluation Issues: • Is the computational model practical? • Would users’ dialog behavior change as they gained experience and were given control of the dialog? Hypotheses: as users gained experience and were given the initiative, they would • Complete the task faster • Speak fewer utterances • Speak longer utterances • Spend more time thinking before producing a response Side effect: analyzable data

  7. Data Collection: The Circuit Fix-It Shop • Constructed at Duke University • Assisted users in repairing an electronic circuit • 125-word vocabulary, speaker-dependent connected speech • Initiative level varied across dialogs but fixed within each interaction • 141 dialogs, 2840 user utterances • 118 (84%) of the dialogs were successfully completed • 50% utterance recognition rate • 81.5% correct interpretation rate

  8. Sample Interaction – Part 1
     C: This is the Circuit Fix-It Shop. How may I help you?
     U: This RS111 circuit is not working.
     C: I am familiar with that circuit. The LED is supposed to be displaying alternately flashing one and seven.
     U: The LED is flashing seven.
     C: Did you mean to say that the LED is flashing?
     U: The LED is displaying a flashing seven.
     C: Is the one on the LED displaying?
     U: No.
     C: Is anything else on the LED displaying other than the NOT flashing seven?
     U: The seven is flashing.
     C: Is anything else on the LED displaying other than the flashing seven?
     U: No.
     C: What is the switch at when the LED is displaying only a flashing seven?
     U: It is up.
     C: What is the knob at when the LED is displaying only a flashing seven?
     U: The knob is at ten.

  9. Sample Interaction – Part 2
     C: Is there a wire between connector one eight and connector four one?
     U: Where is connector four one?
     C: Below the white knob with the orange circle is a pink box, called transistors. Now, in the upper right of this pink box, called transistors, is connector four one.
     U: The wire is not there.
     C: Did you mean to say that there IS a wire between connectors one eight and four one?
     U: There is no wire.
     C: Add a wire between connector one eight and connector four one.

  10. Initial Miscommunication Handling Techniques Error-correcting Parsing (Hipp) • Convert input to the “nearest” grammatical utterance • “Nearest” is determined by a cost matrix for insertions, deletions, and substitutions of words • Costs are not all the same (e.g., “a” vs. “not”) Tell the user what went wrong • Only tell the user what the computer’s interpretation was • Only when misrecognition caused a contradictory interpretation (the case for only 48% of misinterpretations)
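To make the cost-matrix idea concrete, here is a minimal sketch in Python of weighted edit distance over word sequences, where meaning-critical words like “not” cost more to insert, delete, or substitute than filler words. The word costs, the candidate list, and the function names are illustrative assumptions, not the actual Hipp parser, which searches the space defined by a grammar rather than a fixed list of utterances.

# Illustrative sketch of error-correcting parsing via weighted edit
# distance; costs and candidate "grammatical" utterances are invented
# for illustration, not taken from the actual Circuit Fix-It Shop parser.

def word_cost(w):
    # Meaning-critical words are expensive to insert/delete/substitute.
    return 5.0 if w in {"not", "no"} else 1.0

def edit_cost(recognized, candidate):
    """Weighted edit distance between two word sequences."""
    n, m = len(recognized), len(candidate)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + word_cost(recognized[i - 1])   # deletion
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + word_cost(candidate[j - 1])    # insertion
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if recognized[i - 1] == candidate[j - 1] else \
                max(word_cost(recognized[i - 1]), word_cost(candidate[j - 1]))
            d[i][j] = min(d[i - 1][j] + word_cost(recognized[i - 1]),  # delete
                          d[i][j - 1] + word_cost(candidate[j - 1]),   # insert
                          d[i - 1][j - 1] + sub)                       # substitute
    return d[n][m]

def nearest_grammatical(recognized, grammar):
    """Return the grammatical utterance closest to the recognized words."""
    return min(grammar, key=lambda cand: edit_cost(recognized, cand.split()))

grammar = ["i want to fix this circuit",
           "there is no wire on connector one zero four"]
print(nearest_grammatical("power a six a circuit".split(), grammar))

Note that the resulting minimum cost is itself useful: slide 22 reuses exactly this kind of quantity as the “parse cost” measure of how unsure the system should be about its interpretation.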

  11. What to Do Next? Get a better speech recognizer! Well--- • Better is not the same as perfect! • Better => stretch its limits anyway • There will probably always be ungrammatical spoken inputs. • There will always be mismatched speaker/hearer background knowledge.

  12. What to Do Next? Investigate strategies for the prevention, detection, and repair of miscommunication in natural language dialog • Detailed analysis of existing dialogs • Development and evaluation of strategies for handling miscommunication

  13. Effects of Variable Initiative on Linguistic Behavior in Human-Computer Spoken Natural Language Dialog • Smith and Gordon (Computational Linguistics, March 1997) • Based on Circuit Fix-It Shop Data • Based on classifying utterances according to task phase • Introduction: establish task purpose • Assessment: establish current system behavior • Diagnosis: establish cause for errant behavior • Repair: establish completion of correction • Test: establish correctness of behavior

  14. Result 1: Relative Number of Utterances Conclusion: Experienced users tend not to discuss details they can handle themselves.

  15. Result 2: Frequency of User Subdialog Transitions Conclusion: Computer initiates most subdialogs except when experienced users are completing the task.

  16. Result 3: Predictability of Subdialog Transitions Idealized Transition Model [state diagram over the subdialog phases I, A, D, R, T, F; not preserved in this transcript]

  17. Result 3: Predictability of Subdialog Transitions Empirical Transition Model [state diagram over phases I, A, D, R, T, F with transition percentages for computer-controlled vs. user-controlled dialogs; not preserved in this transcript] • Percentage of “normal” dialogs • Computer-controlled: 64% • User-controlled: 33%
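For readers curious how such a model is derived, here is a minimal sketch of the underlying tabulation, assuming each subdialog has already been labeled with a task phase (I, A, D, R, T, F). The two label sequences below are invented, since the diagram’s actual percentages did not survive the transcript.

# Illustrative sketch: estimate subdialog transition percentages from
# dialogs whose subdialogs have been labeled with task phases.
from collections import Counter, defaultdict

def transition_percentages(dialogs):
    counts = defaultdict(Counter)
    for phases in dialogs:
        for cur, nxt in zip(phases, phases[1:]):
            counts[cur][nxt] += 1
    return {cur: {nxt: round(100 * n / sum(c.values()))
                  for nxt, n in c.items()}
            for cur, c in counts.items()}

dialogs = [list("IAADRRTF"), list("IADRATTF")]  # invented labelings
print(transition_percentages(dialogs))
# e.g. {'I': {'A': 100}, 'A': {'A': 25, 'D': 50, 'T': 25}, ...}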

  18. Study Conclusions Computer controlled dialogs--- • Have an orderly pattern of computer-initiated subdialogs • Have terse user responses • Are not amenable to user-correction during miscommunication User controlled dialogs--- • Are less orderly • Contain more user-initiated subdialogs • Indicate user willingness to exploit growing expertise

  19. Analysis of Strategies for Selective Utterance Verification • Smith (ANLP, 1997; IJHCS, 1998) • Motivation---miscommunication due to speech recognition errors
     Spoken: I want to fix this circuit
     Recognized: power a six a circuit
     Spoken: there is no wire on connector one zero four
     Recognized: stays no wire I connector one zero four

  20. Verification Subdialogs
     Computer: This is the circuit fix-it shop. How may I help you?
     Spoken: I want to fix a circuit.
     Recognized: power a six a circuit.
     Computer: Did you mean to say there is a power circuit?
     WHEN TO USE THIS??

  21. Goal: Selective Verification • Initiate a verification subdialog only when it is believed to be needed. • Criteria for need: sufficiently unsure you’ve fully understood AND the need to fully understand is sufficiently great. • Terminology • Under-verification---system generates an incorrect meaning that is not verified • Over-verification---a correct meaning is verified • Ideal: minimize under-verifications while also keeping over-verifications low

  22. Measurements of Uncertainty • Parse Cost---sum of costs incurred by the error-correcting parser in transforming the input to a grammatical utterance • Expectation Cost---how expected the response was given the dialog context (less expected responses incur a higher cost)

  23. Measuring Utterance Importance • Unexplored • Domain-dependent? • Fixed threshold (depends on the risk posed by miscommunication); a sketch combining these measures follows below
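Tying slides 21–23 together, here is a minimal sketch of the selective-verification decision, assuming an invented linear combination of the two cost measures and invented weights and threshold; the slides do not give the system’s actual formulas.

# Illustrative sketch of selective utterance verification.  The cost
# combination, weights, and threshold below are assumptions for the
# sake of the example, not the system's actual formulas.

def should_verify(parse_cost, expectation_cost, importance,
                  threshold=1.0, w_parse=0.6, w_expect=0.4):
    """Decide whether to initiate a verification subdialog.

    parse_cost:       edit cost incurred by the error-correcting parser
    expectation_cost: how poorly the response matched dialog expectations
    importance:       penalty if this utterance is misunderstood
    """
    uncertainty = w_parse * parse_cost + w_expect * expectation_cost
    # Verify only when we are sufficiently unsure AND being wrong
    # would be sufficiently costly (slide 21's two criteria).
    return uncertainty * importance > threshold

# A clean, expected, low-stakes utterance: no verification subdialog.
print(should_verify(parse_cost=0.1, expectation_cost=0.2, importance=0.5))
# A heavily corrected, unexpected, high-stakes one: verify.
print(should_verify(parse_cost=3.0, expectation_cost=1.5, importance=2.0))

Raising the threshold reduces over-verifications at the price of more under-verifications, which is why slide 24 reports how the result behaves across threshold values.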

  24. Basic Results [results figure not preserved in this transcript] Generalization: This result holds under most threshold values.

  25. Study Conclusions • Key parameters for selecting strategy and threshold value: expectation strength, word recognition rate, and penalty for under-verifications • Results depend on the level of system initiative • Results assume all verification subdialogs succeed • Limits to verification subdialogs => other strategies are needed.

  26. General Problem • “Knowing” when you don’t know. OR • “Suspecting” when you don’t know. BUT • If you don’t know, how do you talk about it? THUS • System must provide some information on its understanding of context

  27. Domain-Level Techniques • Detection of dead ends (suspicious error path checks out OK)
     C: The LED is supposed to be displaying alternately flashing one and seven.
     U: LED displaying seven (misrecognized and understood as “LED displaying flashing seven”)
     C: Is anything else on the LED on?
     U: No.
     C: Is there a wire between connectors 18 and 41?
     U: Yes.
     C: Are you sure that the LED is displaying only a flashing seven?

  28. Domain-Level Techniques • Detection of impossibilities (description not possible given “known” conditions)
     C: What is the LED displaying?
     U: The LED is displaying a one on the left hand side.
     C: Is anything else on the LED on?
     U: No. (NOTE: misstatement---seven also displaying)
     C: Is the one on the LED flashing?
     U: Yes.
     C: Are you sure that the one on the LED is flashing?
     U: Yes.
     C: Are you sure that the seven on the LED is off?
     U: The LED is displaying alternately flashing one and seven.

  29. Domain-Level Techniques In general, these are detections of logical inconsistencies, but are there general dialog strategies for discussing them?
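The detection half can be viewed as constraint checking over the user’s accumulated assertions. Below is a hypothetical Python sketch: the fact names and the single constraint are invented for the circuit domain and are not the actual system’s knowledge representation.

# Hypothetical sketch of impossibility detection: accumulate the user's
# asserted facts and flag any combination that violates a domain constraint.

def find_impossibility(facts, constraints):
    """Return the message of the first violated constraint, or None."""
    for required, message in constraints:
        if required <= facts:          # all of these asserted together?
            return message
    return None

# Invented domain constraint: in this circuit, a one never flashes alone.
constraints = [
    ({"one displayed", "one flashing", "nothing else displayed"},
     "Impossible: a one never flashes alone; re-verify the display."),
]

facts = {"one displayed", "nothing else displayed"}
print(find_impossibility(facts, constraints))   # None (consistent so far)
facts.add("one flashing")                       # user: "Yes, it's flashing"
print(find_impossibility(facts, constraints))   # constraint fires

When a constraint fires, the dialogs in slides 27–28 show one possible discussion strategy: re-verify the suspect assertions one at a time (“Are you sure that ...?”).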

  30. Grounding---establishing a common context of understanding • Beyond context-free acknowledgments
     U: The LED is displaying a flashing seven. (misunderstood)
     C: Is the one on the LED displaying?
     U: No.
     C: Is anything else on the LED displaying other than the NOT flashing seven?
     U: The seven is flashing.
     C: Is anything else on the LED displaying other than the flashing seven?
     U: No.
  • How much grounding is enough?
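As a toy illustration of why embedding the system’s belief in its next utterance pays off, here is a hypothetical sketch; the belief store, slot name, and phrasing are invented, not the actual system’s generator.

# Hypothetical sketch of explicit semantic grounding: rather than a bare
# acknowledgment, the system folds its current belief into the next
# question, so a misunderstanding surfaces as something the user can repair.

beliefs = {"led": "not flashing seven"}   # system's (mis)understanding

def grounding_question(slot):
    # Embedding the belief invites correction, as in slide 30's dialog.
    return (f"Is anything else on the LED displaying "
            f"other than the {beliefs[slot]}?")

def process_answer(slot, answer):
    # If the user restates the slot differently, treat it as a repair.
    if answer not in ("yes", "no"):
        beliefs[slot] = answer            # user-initiated correction
    return beliefs[slot]

print(grounding_question("led"))          # exposes the misunderstanding
process_answer("led", "flashing seven")   # "The seven is flashing."
print(grounding_question("led"))          # now grounds the corrected belief

Because the misheard “not flashing seven” shows up verbatim in the system’s question, the user can repair it simply by restating the fact, which is exactly what happens in the slide 30 dialog.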

  31. The Initiative Factor • Explicit semantic grounding leverages human intelligence • But only if we allow it!
     C: Whether there is a wire between connectors 111 and 120 needs to be observed.
     U: There is a wire between connectors 111 and 120. (misrecognized)
     C: There is not supposed to be a wire between connectors 111 and 1. (the grounding response exposes the misrecognition)
     U: (repeats and is understood)

  32. The Initiative Factor • Too much computer control?
     C: What is the voltage between connectors 121 and 120? (computer suspects a problem in the power circuit)
     U: I believe there’s a problem with the LED circuit.
     C: What is the voltage between connectors 121 and 120?
     Suppose instead the response is---
     U: Voltage is eight. (misrecognized as “control knob at six”)
     Then the computer’s response is okay.

  33. Final Thoughts/Summary • Design systems to leverage human intelligence • The ability to follow step-by-step instructions? • Varying levels of system initiative • VERY challenging when user expertise evolves • Menus vs. keyboard shortcuts? • Explicit semantic grounding • Verification subdialogs, etc. • What’s the right amount? • We still need carefully designed studies with real systems!
