
sorry, I didn’t catch that! – an investigation of non-understandings and recovery strategies

sorry, I didn’t catch that! – an investigation of non-understandings and recovery strategies. Dan Bohus www.cs.cmu.edu/~dbohus Alexander I. Rudnicky www.cs.cmu.edu/~air Computer Science Department Carnegie Mellon University Pittsburgh, PA, 15213.


Presentation Transcript


  1. sorry, I didn’t catch that! – an investigation of non-understandings and recovery strategies Dan Bohus www.cs.cmu.edu/~dbohus Alexander I. Rudnicky www.cs.cmu.edu/~air Computer Science Department Carnegie Mellon University Pittsburgh, PA, 15213

  2. systems often do not understand correctly • non-understandings and misunderstandings • MIS-understanding: system extracts incorrect information from the user’s turn • S: What city are you leaving from? U: Birmingham [BERLIN PM] • NON-understanding: system cannot extract any meaningful information from the user’s turn • S: What city are you leaving from? U: Urbana Champaign [OKAY IN THAT SAME PAY]

  3. systems often do not understand correctly • NON-understanding: system cannot extract any meaningful information from the user’s turn • S: What city are you leaving from? U: Urbana Champaign [OKAY IN THAT SAME PAY] • detection: typically trivial, although diagnosis is not • strategies: large space of strategies; tradeoffs between them not well understood • policy (knowing how to engage the strategies): simple heuristics: “incremental prompting”

  4. questions under investigation • what are the main causes of non-understandings? • how large is their impact on performance? • how do various recovery strategies compare to each other? • what are the relationships between strategies and user behaviors? • data • can we improve global dialog performance by using a smarter policy? • if yes, can we learn a better policy from data?

  5. data collection • Roomline • phone-based, mixed-initiative system • conference room reservations • experimental design • control group: uninformed recovery policy • wizard group: recovery policy implemented by wizard • 46 participants, first-time users • tasks & experimental procedure • up to 10 scenario-driven interactions

  6. non-understanding recovery strategies • S: For when do you need the conference room?
     1. ASK REPEAT: Could you please repeat that?
     2. ASK REPHRASE: Could you please try to rephrase that?
     3. NOTIFY (NTFY): Sorry, I didn’t catch that ...
     4. YIELD TURN (YLD): …
     5. REPROMPT (RP): For when do you need the conference room?
     6. DETAILED REPROMPT (DRP): Right now I need to know the date and time for when you need the reservation …
     7. MOVE-ON: Sorry, I didn’t catch that. For which day you need the room?
     8. YOU CAN SAY (YCS): Sorry, I didn’t catch that. For when do you need the conference room? You can say something like tomorrow at 10 am …
     9. TERSE YOU CAN SAY (TYCS): Sorry, I didn’t catch that. You can say something like tomorrow at 10 am …
     10. FULL HELP (HELP): Sorry, I didn’t catch that. I am currently trying to make a conference room reservation for you. Right now I need to know the date and time for when you need the reservation. You can say something like tomorrow at 10 am …

  7. corpus statistics • 449 sessions • 8278 user turns • utterances transcribed and checked • manual annotations • misunderstandings • correct concept values at each turn • sources of understanding errors • user response-types to recovery strategies

  8. questions under investigation • data • what are the main causes of non-understandings? • how large is their impact on performance? • how do various recovery strategies compare to each other? • what are the relationships between strategies and user behaviors?

  9. causes of non-understandings • [figure: communication stack between user and system; labels: Goal, Interpretation, Semantics, Parsing, Text, Recognition, Audio channel, End-pointing; levels: conversation level, intention level, signal level, channel level]

  10. causes of non-understandings • out-of-application (conversation level): 16% • out-of-grammar (intention level): 16% • ASR error (signal level): 62% • endpointer error (channel level)

  11. questions under investigation • data • what are the main causes of non-understandings? • how large is their impact on performance? • how do various recovery strategies compare to each other? • what are the relationships between strategies and user behaviors? • [progress sidebar: causes of non-understandings, impact on performance, strategy comparison, user behaviors]

  12. modeling impact on performance • logistic regression • $P(\text{Task Success}) = \dfrac{1}{1 + e^{-(\alpha + \beta \cdot F_{NON})}}$
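As a minimal sketch of the logistic model on slide 12, the function below evaluates P(Task Success) for a given value of FNON. The function name and the coefficient values are hypothetical and chosen only for illustration; they are not the fitted values from the study.

```python
import math


def task_success_probability(alpha: float, beta: float, f_non: float) -> float:
    """Logistic model: P(success) = 1 / (1 + exp(-(alpha + beta * f_non)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * f_non)))


# Hypothetical coefficients for illustration only (NOT taken from the study).
ALPHA, BETA = 2.0, -8.0

for f_non in (0.0, 0.1, 0.25, 0.5):
    p = task_success_probability(ALPHA, BETA, f_non)
    print(f"F_NON = {f_non:.2f} -> P(task success) = {p:.3f}")
```

With a negative β, the predicted probability of task success falls as FNON grows, which is the kind of relationship the regression is meant to quantify.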

  13. questions under investigation • data • what are the main causes of non-understandings? • how large is their impact on performance? • how do various recovery strategies compare to each other? • what are the relationships between strategies and user behaviors? • [progress sidebar: causes of non-understandings, impact on performance, strategy comparison, user behaviors]

  14. strategy performance – recovery rate • overall logistic ANOVA: significant differences in mean recovery rates • all pairs comparison (corrected using FDR) • [figure: recovery rate by strategy; strategies shown: Help, Yield, Notify, MoveOn, RePrompt, AskRepeat, YouCanSay, AskRephrase, TerseYouCanSay, DetailedReprompt]
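The slide says only that the all-pairs comparison was "corrected using FDR" and does not name the procedure. The sketch below assumes the common Benjamini-Hochberg step-up procedure. The function name, the level q, and the p-values are hypothetical and are not results from the study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected
    while controlling the false discovery rate at level q (assumed BH procedure)."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * q.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank

    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Made-up p-values standing in for pairwise strategy comparisons.
toy_p_values = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(toy_p_values, q=0.05))
```

Because the procedure is step-up, a p-value that misses its own threshold is still rejected whenever a larger p-value meets its threshold.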

  15. questions under investigation • data • what are the main causes of non-understandings? • how large is their impact on performance? • how do various recovery strategies compare to each other? • what are the relationships between strategies and user behaviors? • [progress sidebar: causes of non-understandings, impact on performance, strategy comparison, user behaviors]

  16. user response types • tagging scheme by Shin • also used by Choularton, Raux • 5 categories • repeat • rephrase • contradict • change • other

  17. response types after non-understanding • [figure: frequency of each response type (axis 0% to 50%); response types: contradict, change, other, rephrase, repeat; corpora compared: Communicator (Shin et al.), Pizza (Choularton & Dale), Roomline (this study)]

  18. user response types by strategy • [figure: share of response types (axis 0% to 100%; Repeat, Rephrase, Change, Other) for each strategy: Help, Yield, Notify, MoveOn, RePrompt, AskRepeat, YouCanSay, AskRephrase, TerseYouCanSay, DetailedReprompt]

  19. summary • sources of non-understandings: ASR, but also “language” errors → more shaping strategies … • impact on performance: regression model allows better quantitative assessment • strategy comparison: help, “move-on” → further investigate “move-on” • user responses: margin for improving control over user responses • can we improve global dialog performance by using a smarter policy? yes • can we learn a better policy from data? preliminary results promising …

  20. thank you! questions …

  21. [Figure 3. Misunderstandings and non-understandings before and after rejections; panels: before rejection mechanism, after rejection mechanism; labels: false rejections, correct rejections]

  22. strategy performance assessment • recovery rate • recovery utility • weighted sum of correctly and incorrectly acquired concepts • weights are determined in a data-driven fashion • recovery efficiency • also takes time to recovery into account
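Slide 22 describes recovery utility as a weighted sum of correctly and incorrectly acquired concepts. The sketch below shows one way to compute it and average it per strategy. The weights, record layout, strategy labels, and toy records are all hypothetical; the slide says the real weights were determined from data and does not give them.

```python
from collections import defaultdict

# Hypothetical weights (assumption): the slide says the actual weights were
# determined in a data-driven fashion and does not list them.
W_CORRECT = 1.0
W_INCORRECT = -2.0


def recovery_utility(n_correct, n_incorrect,
                     w_correct=W_CORRECT, w_incorrect=W_INCORRECT):
    """Weighted sum of correctly and incorrectly acquired concepts."""
    return w_correct * n_correct + w_incorrect * n_incorrect


def mean_utility_by_strategy(records):
    """records: iterable of (strategy, n_correct, n_incorrect) tuples,
    one per recovery attempt. Returns the mean utility per strategy."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for strategy, n_correct, n_incorrect in records:
        totals[strategy] += recovery_utility(n_correct, n_incorrect)
        counts[strategy] += 1
    return {s: totals[s] / counts[s] for s in totals}


# Made-up records for illustration only.
toy_records = [
    ("HELP", 2, 0),
    ("HELP", 1, 1),
    ("ASK_REPEAT", 0, 1),
    ("ASK_REPEAT", 1, 0),
]
print(mean_utility_by_strategy(toy_records))
```

Weighting incorrect concepts more heavily than correct ones is only one possible choice. It reflects the idea that acquiring a wrong value can cost more than acquiring nothing.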

  23. experimental design: scenarios • 10 scenarios, fixed order • presented graphically (explained during briefing)

  24. strategy pair-wise comparison • recovery performance ranked list, based on pair-wise t-tests: • CER evaluation shows similar results

  25. recovery for various response-types

  26. impact of recovery rate on performance • recovery = next turn is correctly understood • $P(\text{Task Success}) = \dfrac{1}{1 + e^{-(\alpha + \beta \cdot \text{RecoveryRate})}}$
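Using the definition on slide 26 (a recovery means the turn after a non-understanding is correctly understood), a recovery rate could be computed from a sequence of per-turn labels roughly as follows. The label names "ok", "non" and "mis" and the toy session are assumptions made for illustration, not the study's annotation scheme.

```python
def recovery_rate(turn_labels):
    """Fraction of non-understandings whose following turn is correctly understood.

    turn_labels: list of per-turn labels in dialog order, using the
    hypothetical tags "ok" (correctly understood), "non" (non-understanding)
    and "mis" (misunderstanding).
    Returns None when no non-understanding has a following turn.
    """
    recovered = 0
    total = 0
    # Pair each turn with the turn that follows it.
    for current, following in zip(turn_labels, turn_labels[1:]):
        if current == "non":
            total += 1
            if following == "ok":
                recovered += 1
    return recovered / total if total else None


# Made-up session: three non-understandings, two followed by an understood turn.
toy_session = ["ok", "non", "ok", "non", "non", "ok"]
print(recovery_rate(toy_session))
```

A non-understanding on the last turn of a session has no following turn, so this sketch leaves it out of the denominator.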
