
“k hypotheses + other” belief updating in spoken dialog systems

Dialogs on Dialogs Talk, March 2006. Dan Bohus, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213. www.cs.cmu.edu/~dbohus, dbohus@cs.cmu.edu


Presentation Transcript


  1. “k hypotheses + other” belief updating in spoken dialog systems. Dialogs on Dialogs Talk, March 2006. Dan Bohus, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213. www.cs.cmu.edu/~dbohus, dbohus@cs.cmu.edu

  2. problem: spoken language interfaces lack robustness when faced with understanding errors
     • errors stem mostly from speech recognition
     • typical word error rates: 20-30%
     • significant negative impact on interactions

  3. guarding against understanding errors
     • use confidence scores
       • machine learning approaches for detecting misunderstandings [Walker, Litman, San-Segundo, Wright, and others]
     • engage in confirmation actions
       • explicit confirmation: “did you say you wanted to fly to Seoul?”
         • yes → trust hypothesis
         • no → delete hypothesis
         • “other” → non-understanding
       • implicit confirmation: “traveling to Seoul … what day did you need to travel?”
         • rely on new values overwriting old values

  4. today’s talk … construct accurate beliefs by integrating information over multiple turns in a conversation
     S: Where would you like to go?
     U: Huntsville [SEOUL / 0.65]
        destination = {seoul/0.65}
     S: traveling to Seoul. What day did you need to travel?
     U: no no I’m traveling to Birmingham [THE TRAVELING TO BERLIN P_M / 0.60]
        destination = {?}

  5. belief updating: problem statement
     • given
       • an initial belief B_initial(C) over concept C
       • a system action SA
       • a user response R
     • construct an updated belief
       • B_updated(C) ← f(B_initial(C), SA, R)
     destination = {seoul/0.65}
     S: traveling to Seoul. What day did you need to travel?
     [THE TRAVELING TO BERLIN P_M / 0.60]
     destination = {?}

  6. outline • proposed approach • data • experiments and results • effect on dialog performance • conclusion

  7. belief updating: problem statement
     • given
       • an initial belief B_initial(C) over concept C
       • a system action SA(C)
       • a user response R
     • construct an updated belief
       • B_updated(C) ← f(B_initial(C), SA(C), R)
     destination = {seoul/0.65}
     S: traveling to Seoul. What day did you need to travel?
     [THE TRAVELING TO BERLIN P_M / 0.60]
     destination = {?}

  8. belief representation: B_updated(C) ← f(B_initial(C), SA(C), R)
     • most accurate representation
       • probability distribution over the set of possible values
     • however
       • system will “hear” only a small number of conflicting values for a concept within a dialog session
       • in our data
         • max = 3 (conflicting values heard)
         • more than 1 value heard in only 6.9% of cases

  9. belief representation: B_updated(C) ← f(B_initial(C), SA(C), R)
     • compressed belief representation
       • k hypotheses + other
       • at each turn, the system retains the top m initial hypotheses and adds n new hypotheses from the input (m+n=k)
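
The compression step on slide 9 can be sketched roughly as follows. This is only an illustration under assumptions: the function names, the dictionary layout, and the choice to fold all dropped probability mass into "other" are hypothetical and are not taken from the talk. The updated confidence values themselves would come from the regression model, not from this selection step.

```python
from typing import Dict, List

OTHER = "other"  # hypothetical key for the catch-all bucket


def top_hypotheses(scores: Dict[str, float], limit: int) -> Dict[str, float]:
    """Return the `limit` highest-scoring hypotheses, best first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:limit])


def compress_initial_belief(belief: Dict[str, float], m: int) -> Dict[str, float]:
    """Keep the m most confident hypotheses; the remaining mass goes to 'other'."""
    named = {h: c for h, c in belief.items() if h != OTHER}
    kept = top_hypotheses(named, m)
    kept[OTHER] = max(0.0, 1.0 - sum(kept.values()))
    return kept


def candidate_slots(belief: Dict[str, float], heard: Dict[str, float],
                    m: int, n: int) -> List[str]:
    """Names of the k = m + n hypotheses tracked this turn:
    the top-m old hypotheses plus the top-n newly heard ones."""
    named = {h: c for h, c in belief.items() if h != OTHER}
    old = list(top_hypotheses(named, m))
    new = [h for h in top_hypotheses(heard, len(heard)) if h not in old][:n]
    return old + new


# Example from the talk: "seoul" is believed, then "berlin" is (mis)heard.
slots = candidate_slots({"seoul": 0.65, OTHER: 0.35}, {"berlin": 0.60}, m=1, n=1)
```

With m = 1 and n = 1 (k = 2), the tracked hypotheses in the running example are the previously believed value and the single newly heard one.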

  10. belief representation: B_updated(C) ← f(B_initial(C), SA(C), R)
      • B(C) modeled as a multinomial variable
        • {h1, h2, …, hk, other}
        • B(C) = <c_h1, c_h2, …, c_hk, c_other>
        • where c_h1 + c_h2 + … + c_hk + c_other = 1
      • belief updating can be cast as a multinomial regression problem:
        B_updated(C) ← B_initial(C) + SA(C) + R

  11. system action: B_updated(C) ← f(B_initial(C), SA(C), R)

  12. user response: B_updated(C) ← f(B_initial(C), SA(C), R)

  13. approach: B_updated(C) ← f(B_initial(C), SA(C), R)
      • problem
        • <uc_h1, …, uc_hk, uc_oth> ← f(<ic_h1, …, ic_hk, ic_oth>, SA(C), R)
      • approach: multinomial generalized linear model
        • regression model, multinomial dependent variable
        • sample efficient
      • stepwise approach
        • feature selection
        • BIC to control over-fitting
      • one model for each system action
        • <uc_h1, …, uc_hk, uc_oth> ← f_SA(C)(<ic_h1, …, ic_hk, ic_oth>, R)
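
As a rough illustration of what a multinomial generalized linear model computes at run time, the sketch below uses a multinomial logit (softmax over one linear predictor per outcome) for k = 2. Everything specific here is an assumption made for illustration: the feature list, the weight values, and the names `update_belief` and `MODELS` are hypothetical, and the weights are not fitted values from the talk.

```python
import math
from typing import Dict, List


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn linear predictors into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - peak) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def update_belief(features: List[float],
                  weights: Dict[str, List[float]]) -> Dict[str, float]:
    """Multinomial logit: one weight vector per outcome, normalized by softmax."""
    scores = {
        outcome: sum(w * x for w, x in zip(ws, features))
        for outcome, ws in weights.items()
    }
    return softmax(scores)


# One model per system action, as on the slide. Weights are made up.
# Assumed feature order: [bias, initial confidence of h1, confidence of new h2]
MODELS = {
    "implicit_confirm": {
        "h1": [0.0, 2.0, -1.0],
        "h2": [0.0, -1.0, 2.0],
        "other": [0.0, 0.0, 0.0],  # reference category
    },
}

updated = update_belief([1.0, 0.65, 0.60], MODELS["implicit_confirm"])
```

The output is a distribution over {h1, h2, other}, which matches the constraint that the updated confidences sum to 1. In the talk the features and weights are chosen by stepwise selection with BIC; here they are fixed by hand.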

  14. outline • proposed approach • data • experiments and results • effect on dialog performance • conclusion

  15. data
      • collected with RoomLine
        • a phone-based mixed-initiative spoken dialog system
        • conference room reservation
      • explicit and implicit confirmations
      • simple heuristic rules for belief updating
        • explicit confirm: yes / no
        • implicit confirm: new values overwrite old ones
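
The heuristic update rules listed on this slide (together with the yes / no / "other" handling from slide 3) could look roughly like the sketch below. The function name, argument names, and string labels are hypothetical, and leaving the belief unchanged on a non-understanding is an assumption, since the slides do not say what happens to the value in that case.

```python
from typing import Optional


def heuristic_update(current: Optional[str], action: str,
                     response: str, new_value: Optional[str] = None) -> Optional[str]:
    """Return the value believed after one turn, using simple heuristics.

    current   -- value believed before the turn (None if nothing is believed)
    action    -- "explicit_confirm" or "implicit_confirm"
    response  -- "yes", "no", or "other"
    new_value -- value recognized in the user's response, if any
    """
    if action == "explicit_confirm":
        if response == "yes":
            return current   # trust the hypothesis
        if response == "no":
            return None      # delete the hypothesis
        return current       # non-understanding: assumed to leave belief as-is
    if action == "implicit_confirm":
        # new values overwrite old ones
        return new_value if new_value is not None else current
    raise ValueError(f"unknown system action: {action}")


# Running example: the user corrects to "Birmingham" but "berlin" is recognized.
after = heuristic_update("seoul", "implicit_confirm", "other", new_value="berlin")
```

In the running example the rule replaces one wrong value with another, which is the kind of case the data-driven update is meant to handle better.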

  16. corpus
      • user study
        • 46 participants (naïve users)
        • 10 scenario-based interactions each
        • compensated per task success
      • corpus
        • 449 sessions, 8848 user turns
        • orthographically transcribed
        • manually annotated
          • misunderstandings
          • corrections
          • correct concept values

  17. outline • proposed approach • data • experiments and results • effect on dialog performance • conclusion

  18. baselines
      • initial baseline
        • accuracy of system beliefs before the update
      • heuristic baseline
        • accuracy of heuristic update rule used by the system
      • oracle baseline
        • accuracy if we knew exactly when the user corrects

  19. k=2 hypotheses + other: informative features
      • priors and confusability
      • initial confidence score
      • concept identity
      • barge-in
      • expectation match
      • repeated grammar slots

  20. outline • proposed approach • data • experiments and results • effect on dialog performance • conclusion

  21. a question remains … does this really matter? what is the effect on global dialog performance?

  22. let’s run an experiment
      guinea pigs from Speech Lab for exp: $0
      getting change from guys in the lab: $2/$3/$5
      real subjects for the experiment: $25
      picture with advisor of the VERY last exp at CMU: priceless!!!!
      [courtesy of Mohit Kumar]

  23. a new user study …
      • implemented models in RavenClaw, performed a new user study
      • 40 participants, first-time users
      • 10 scenario-driven interactions each
      • non-native speakers of North-American English
        • improvements more likely at higher WER
          • supported by empirical evidence
      • between-subjects; 2 gender-balanced groups
        • control: RoomLine using heuristic update rules
        • treatment: RoomLine using runtime models

  24. effect on task success
      • task success: control 73.6%, treatment 81.3%
      • even though average user WER: control 21.9%, treatment 24.2%

  25. effect on task success … a closer look
      • Task Success ← 2.09 - 0.05∙WER + 0.69∙Condition (p=0.001)
      • [plot: probability of task success vs. word error rate; marked values: 78%, 78%, 64%; 30% WER, 16% WER]
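
The fitted formula on slide 25 can be evaluated directly. The sketch below assumes a logistic link, WER expressed in percentage points, and Condition coded as 1 for treatment and 0 for control; none of these are stated explicitly on the slide, and the function name is hypothetical.

```python
import math


def task_success_probability(wer_percent: float, treatment: bool) -> float:
    """Evaluate the slide's fit, assuming a logistic link and Condition in {0, 1}."""
    condition = 1.0 if treatment else 0.0
    logit = 2.09 - 0.05 * wer_percent + 0.69 * condition
    return 1.0 / (1.0 + math.exp(-logit))


for wer in (16, 30):
    control = task_success_probability(wer, treatment=False)
    treated = task_success_probability(wer, treatment=True)
    print(f"WER {wer}%: control {control:.0%}, treatment {treated:.0%}")
```

Under those assumptions the formula gives roughly 64% for control and 78% for treatment at 30% WER, and roughly 78% for control at 16% WER, which is consistent with the three percentages shown on the slide.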

  26. improvements at different WER [plot: absolute improvement in task success vs. word error rate]

  27. effect on task duration (for successful tasks)
      • ANOVA on task duration for successful tasks
        • Duration ← -0.21 + 0.013∙WER - 0.106∙Condition
      • significant improvement, equivalent to 7.9% absolute reduction in WER
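
One way to read the WER equivalence off the duration model is to ask what drop in WER would shorten duration by as much as the Condition term does. Assuming Condition is coded 0/1 and WER is in percentage points (neither is stated on the slide), the rounded coefficients give:

```latex
0.013 \cdot \Delta\mathrm{WER} = 0.106
\quad\Longrightarrow\quad
\Delta\mathrm{WER} = \frac{0.106}{0.013} \approx 8.2
```

This is close to the 7.9% reported on the slide; the small gap may simply reflect rounding of the displayed coefficients.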

  28. outline • proposed approach • data • experiments and results • effect on dialog performance • conclusion

  29. summary
      • data-driven approach for constructing accurate system beliefs
        • integrate information across multiple turns
        • bridge together detection of misunderstandings and corrections
      • significantly outperforms current heuristics
      • significantly improves effectiveness and efficiency

  30. other advantages
      • sample efficient
        • performs a local one-turn optimization
        • good local performance leads to good global performance
      • scalable
        • works independently on concepts
        • 29 concepts, varying cardinalities
      • portable
        • decoupled from dialog task specification
        • doesn’t make strong assumptions about dialog management technology

  31. thank you! questions …

  32. user study
      • 10 scenarios, fixed order
      • presented graphically (explained during briefing)
      • participants compensated per task success
