
a principled approach for rejection threshold optimization

Dan Bohus (www.cs.cmu.edu/~dbohus), Alexander I. Rudnicky (www.cs.cmu.edu/~air). Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, 15217.


Presentation Transcript


  1. a principled approach for rejection threshold optimization. Dan Bohus (www.cs.cmu.edu/~dbohus), Alexander I. Rudnicky (www.cs.cmu.edu/~air), Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, 15217

  2. understanding errors and rejection
     • systems often misunderstand
     • use confidence scores
     • common design pattern
       • compare input confidence against a threshold
       • reject utterance if confidence is too low
     • may lead to false rejections
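The design pattern on this slide can be sketched in a few lines. This is only an illustration: the `Turn` type, the `correct` label, and the choice of a strict `<` comparison are assumptions, not details taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    confidence: float  # utterance-level confidence score in [0, 1]
    correct: bool      # hypothetical label: was the utterance understood correctly?


def outcome(turn: Turn, threshold: float) -> str:
    """Classify one turn under a fixed rejection threshold."""
    if turn.confidence < threshold:
        # The system rejects the utterance (for example, asks the user to repeat).
        return "false rejection" if turn.correct else "correct rejection"
    # The system accepts the utterance and acts on it.
    return "correct acceptance" if turn.correct else "misunderstanding"


if __name__ == "__main__":
    # Hypothetical turns, for illustration only.
    for t in [Turn(0.9, True), Turn(0.3, True), Turn(0.2, False), Turn(0.7, False)]:
        print(t, "->", outcome(t, threshold=0.5))
```

A correctly understood utterance with low confidence ends up as a false rejection, which is the cost the last bullet refers to.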

  3. rejection tradeoff
     • misunderstandings vs. false rejections
     [Figure: false rejections and misunderstandings (0% to 75%) plotted against the rejection threshold (0 to 1)]
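A curve like the one on this slide could be computed by sweeping the threshold over labeled turns. The data below is invented, and normalizing both rates by the total number of turns is an assumption; the slides do not say how the percentages were computed.

```python
# Hypothetical labeled turns: (confidence, understood_correctly).
TURNS = [(0.95, True), (0.85, True), (0.70, False), (0.60, True),
         (0.45, True), (0.40, False), (0.20, False), (0.10, True)]


def error_rates(turns, threshold):
    """Return (false rejection rate, misunderstanding rate) over all turns."""
    n = len(turns)
    # Correctly understood turns that fall below the threshold are lost.
    false_rejections = sum(1 for conf, ok in turns if conf < threshold and ok)
    # Incorrectly understood turns at or above the threshold get through.
    misunderstandings = sum(1 for conf, ok in turns if conf >= threshold and not ok)
    return false_rejections / n, misunderstandings / n


if __name__ == "__main__":
    for th in (0.0, 0.25, 0.5, 0.75, 1.0):
        fr, mu = error_rates(TURNS, th)
        print(f"threshold={th:.2f}  false rejections={fr:.1%}  misunderstandings={mu:.1%}")
```

Raising the threshold moves errors from the misunderstanding column to the false rejection column, which is the tradeoff the slide plots.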

  4. rejection tradeoff
     • misunderstandings vs. false rejections
     • correctly vs. incorrectly transferred concepts
     [Figure: correctly and incorrectly transferred concepts per turn plotted against the rejection threshold (0 to 1)]

  5. question: given this trade-off, how can we optimize the rejection threshold in a principled fashion?

  6. outline • current solutions • proposed approach • data • results • conclusion

  7. current solutions
     • follow ASR manual [Nuance documentation]
     • acknowledge the tradeoff + postulate costs
       • “misunderstandings are X times more costly than false rejections” [Raymond et al, 2004; Kawahara et al, 2000; Cuayahuitl et al, 2002]
     • costs are likely to differ
       • across domains / systems
       • across dialog states within a system

  8. proposed approach
     • derive costs in a principled fashion
       1. identify a set of variables involved in the tradeoff: correctly and incorrectly transferred concepts per turn (CTC, ITC)
       2. choose a dialog performance metric: task completion (binary, kappa) – TC
       3. build a regression model: $\mathrm{logit}(TC) \leftarrow C_0 + C_{CTC} \cdot CTC + C_{ITC} \cdot ITC$
       4. optimize threshold to maximize performance: $th^* = \arg\max_{th} \, (C_{CTC} \cdot CTC + C_{ITC} \cdot ITC)$
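Step 3 can be illustrated with a small logistic regression fitted by gradient ascent. This is a minimal sketch under stated assumptions: the session data is invented, each session is assumed to contribute one (CTC, ITC, TC) observation, and the fitting procedure is a generic one rather than whatever tool the authors used.

```python
import math

# Hypothetical per-session data: (CTC per turn, ITC per turn, task completed?).
SESSIONS = [
    (1.2, 0.1, 1), (1.0, 0.2, 1), (0.9, 0.1, 1), (1.1, 0.3, 1),
    (0.4, 0.6, 0), (0.5, 0.5, 0), (0.3, 0.4, 0), (0.6, 0.7, 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(coeffs, sessions):
    """Log-likelihood of the binary task-completion labels under the model."""
    c0, c_ctc, c_itc = coeffs
    total = 0.0
    for ctc, itc, tc in sessions:
        p = sigmoid(c0 + c_ctc * ctc + c_itc * itc)
        total += math.log(p) if tc else math.log(1.0 - p)
    return total


def fit_logistic(sessions, steps=2000, lr=0.5):
    """Fit logit(TC) = C0 + C_CTC*CTC + C_ITC*ITC by plain gradient ascent."""
    c0 = c_ctc = c_itc = 0.0
    n = len(sessions)
    for _ in range(steps):
        g0 = g_ctc = g_itc = 0.0
        for ctc, itc, tc in sessions:
            # Residual between the observed label and the predicted probability.
            err = tc - sigmoid(c0 + c_ctc * ctc + c_itc * itc)
            g0 += err
            g_ctc += err * ctc
            g_itc += err * itc
        # Step along the mean gradient of the log-likelihood.
        c0 += lr * g0 / n
        c_ctc += lr * g_ctc / n
        c_itc += lr * g_itc / n
    return c0, c_ctc, c_itc


if __name__ == "__main__":
    c0, c_ctc, c_itc = fit_logistic(SESSIONS)
    print(f"C0={c0:.2f}  C_CTC={c_ctc:.2f}  C_ITC={c_itc:.2f}")
```

The fitted `C_CTC` and `C_ITC` then play the role of the costs in step 4: the threshold is chosen to maximize their weighted sum of CTC and ITC.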

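  9. state-specific costs
     • costs are different in different dialog states
     • CTC and ITC on a per-state basis
       $$\mathrm{logit}(TC) \leftarrow C_0 + C_{CTC,\,\text{state}_1} \cdot CTC_{\text{state}_1} + C_{ITC,\,\text{state}_1} \cdot ITC_{\text{state}_1} + C_{CTC,\,\text{state}_2} \cdot CTC_{\text{state}_2} + C_{ITC,\,\text{state}_2} \cdot ITC_{\text{state}_2} + C_{CTC,\,\text{state}_3} \cdot CTC_{\text{state}_3} + C_{ITC,\,\text{state}_3} \cdot ITC_{\text{state}_3} + \dots$$
     • optimize separate threshold for each state
       $$th^*_{\text{state}_x} = \arg\max_{th} \, (C_{CTC,\,\text{state}_x} \cdot CTC_{\text{state}_x} + C_{ITC,\,\text{state}_x} \cdot ITC_{\text{state}_x})$$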
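The per-state regressors could be assembled from turn-level annotations roughly as follows. The state names, the tuple layout, and the choice to normalize by the number of turns spent in each state are all assumptions made for illustration; the slides do not specify the normalization.

```python
from collections import defaultdict

# Hypothetical state names; the slide only calls them state1, state2, state3.
STATES = ("state1", "state2", "state3")


def state_features(turns):
    """Build per-state CTC/ITC features for one session.

    turns: list of (state, correct_concepts, incorrect_concepts), where the
    concept counts are those transferred by accepted turns (rejected turns
    would contribute zeros).
    """
    counts = defaultdict(lambda: [0, 0, 0])  # state -> [turns, correct, incorrect]
    for state, n_correct, n_incorrect in turns:
        entry = counts[state]
        entry[0] += 1
        entry[1] += n_correct
        entry[2] += n_incorrect

    features = {}
    for state in STATES:
        n_turns, n_correct, n_incorrect = counts[state]
        # Assumption: average over the turns observed in this state.
        features[f"CTC/{state}"] = n_correct / n_turns if n_turns else 0.0
        features[f"ITC/{state}"] = n_incorrect / n_turns if n_turns else 0.0
    return features


if __name__ == "__main__":
    session = [("state1", 2, 0), ("state1", 0, 1), ("state2", 1, 0)]
    print(state_features(session))
```

Each session then yields one feature vector with two entries per state, matching the terms of the state-specific regression above.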

  10. outline • current solutions • proposed approach • data • results • conclusion

  11. data
      • collected using RoomLine
        • phone-based, mixed-initiative spoken dialog system
        • conference room reservations
        • sphinx-2
        • utterance-level confidence annotator [0-1]
      • 46 participants (first-time users)
        • 10 scenario-driven interactions
      • corpus
        • 449 dialog sessions
        • 8278 user turns
        • manually labeled decoded concept “correctness”

  12. roomline states
      • 71 “dialog states” total
      • clustered into 3 classes
        • open-request: “How may I help you?”
        • request(bool): “Would you like a reservation for this room?”, “Would you like a room with a projector?”
        • request(non-bool): “For what time would you like to reserve the room?”

  13. results: task success model
      • model predicting binary task success
      • cost coefficients

      | Variable                | Coeff   | p      | se     |
      |-------------------------|---------|--------|--------|
      | Const                   | -2.3442 | 0.0416 | 1.1504 |
      | CTC / open-request      | 0.5518  | 0.0619 | 0.2955 |
      | ITC / open-request      | -0.4067 | 0.3801 | 0.4634 |
      | CTC / request(bool)     | 3.3127  | 0.0010 | 1.0076 |
      | ITC / request(bool)     | -0.5959 | 0.6491 | 1.3098 |
      | CTC / request(non-bool) | 2.5514  | 0.0017 | 0.8137 |
      | ITC / request(non-bool) | -3.441  | 0.0018 | 1.1046 |
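To see how these coefficients combine, the sketch below plugs them into the logistic model and converts the logit into a predicted probability of task success. The coefficients are copied from the table; the `session` feature values are hypothetical.

```python
import math

# Coefficients as reported in the cost coefficients table.
COEFFS = {
    "Const": -2.3442,
    "CTC / open-request": 0.5518,
    "ITC / open-request": -0.4067,
    "CTC / request(bool)": 3.3127,
    "ITC / request(bool)": -0.5959,
    "CTC / request(non-bool)": 2.5514,
    "ITC / request(non-bool)": -3.441,
}


def predicted_success(features):
    """Probability of task success for one session's per-state CTC/ITC values."""
    logit = COEFFS["Const"] + sum(COEFFS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical session: concepts per turn in each state class.
session = {
    "CTC / open-request": 1.0, "ITC / open-request": 0.2,
    "CTC / request(bool)": 0.8, "ITC / request(bool)": 0.1,
    "CTC / request(non-bool)": 0.9, "ITC / request(non-bool)": 0.1,
}

if __name__ == "__main__":
    print(f"predicted task success: {predicted_success(session):.3f}")
```

Because the ITC / request(non-bool) coefficient is large and negative, raising that one feature lowers the prediction much more than the same increase in the other ITC features.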

  14. results: threshold optimization
      • cost coefficients: same table as on slide 13
      • open-request: utility = 0.55 × CTC – 0.40 × ITC
      [Figure: correctly and incorrectly transferred concepts per turn plotted against the rejection threshold (0 to 1), open-request state]

  15. results: threshold optimization
      • open-request: utility = 0.55 × CTC – 0.40 × ITC
      • request(bool): utility = 3.31 × CTC – 0.60 × ITC
      • request(non-bool): utility = 2.55 × CTC – 3.44 × ITC
      [Figure: three panels, one per state, each plotting correctly and incorrectly transferred concepts per turn against the rejection threshold (0 to 1)]
      • utility profiles are different across the three states
      • task duration models lead to similar results
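The per-state optimization can be sketched as a grid search over candidate thresholds. The utility weights are the rounded values shown on the slides; the turn data, the grid resolution, and the use of the same turns for every state are assumptions chosen only to show how different weights move the optimum.

```python
# Utility weights (C_CTC, C_ITC) as shown on the slides.
UTILITY = {
    "open-request": (0.55, -0.40),
    "request(bool)": (3.31, -0.60),
    "request(non-bool)": (2.55, -3.44),
}

# Hypothetical turns: (confidence, correct concepts, incorrect concepts).
TURNS = [(0.9, 1, 0), (0.7, 1, 0), (0.5, 0, 1), (0.3, 1, 0), (0.1, 0, 1)]


def utility_at(turns, weights, threshold):
    """Utility of accepting only turns with confidence >= threshold."""
    c_ctc, c_itc = weights
    n = len(turns)
    ctc = sum(ok for conf, ok, bad in turns if conf >= threshold) / n
    itc = sum(bad for conf, ok, bad in turns if conf >= threshold) / n
    return c_ctc * ctc + c_itc * itc


def best_threshold(turns, weights, steps=20):
    """Grid search over [0, 1]; ties resolve to the lowest threshold."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda th: utility_at(turns, weights, th))


if __name__ == "__main__":
    for state, weights in UTILITY.items():
        print(state, best_threshold(TURNS, weights))
```

With identical data, the state whose incorrect concepts are most costly relative to correct ones (request(non-bool)) ends up with the highest threshold.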

  16. conclusion
      • principled method for optimizing rejection threshold
      • determine costs for various types of understanding errors
      • data-driven approach
      • can derive state-specific costs
      • bridge mismatches between off-the-shelf confidence annotators and domain

  17. thank you

  18. fit for task success model

  19. expected changes in task success: remains to be seen …

  20. task duration model

  21. Model 2: resulting fit and coefficients (R^2 = 0.56)
