
Cost of Misunderstandings

Modeling the Cost of Misunderstanding Errors in the CMU Communicator Dialog System. Presented by: Dan Bohus (dbohus@cs.cmu.edu). Work by: Dan Bohus, Alex Rudnicky. Carnegie Mellon University, 2001.


Presentation Transcript


  1. Cost of Misunderstandings • Modeling the Cost of Misunderstanding Errors in the CMU Communicator Dialog System • Presented by: Dan Bohus (dbohus@cs.cmu.edu) • Work by: Dan Bohus, Alex Rudnicky • Carnegie Mellon University, 2001

  2. Outline • Quick overview of previous utterance-level confidence annotation work. • Modeling the cost of misunderstandings in spoken dialog systems. • Experiments & results. • Further analysis. • Summary, further work, conclusion Modeling the cost of misunderstanding …

  3. Utterance-Level Confidence Annotation Overview • Confidence annotation = data-driven classification • Corpus: 2 months, 131 dialogs, 4550 utterances. • Features: 12 features from decoder, parsing, dialog management levels. • Classifiers: Decision Tree, ANN, BayesNet, AdaBoost, NaiveBayes, SVM + Logistic Regression model (later on).

  4. Confidence annotator performance • Baseline error rate: 32% • Garble baseline: 25% • Classifier performance: 16% • Differences between classifiers are not statistically significant, except for Naïve Bayes • On a soft metric, the logistic regression model clearly outperformed the others • But is this the right way to evaluate performance?

  5. Judging Performance • Classification Error Rate (FP+FN). • Implicitly assumes that FP and FN errors have the same cost • But the cost of a misunderstanding in dialog systems is presumably different for FPs and FNs. • Build an error function which takes these costs into account, and optimize for that. • Cost also depends on • domain/system ~ not a problem • dialog state
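
The point about unequal costs can be illustrated with a minimal sketch. The counts and the two cost values below are hypothetical, chosen only to show that two operating points with the same classification error rate can differ once FPs and FNs are weighted differently.

```python
def error_rate(fp, fn, total):
    """Classification error rate: every error counts the same."""
    return (fp + fn) / total


def weighted_cost(fp, fn, cost_fp, cost_fn):
    """Error function that weights FPs and FNs by separate costs."""
    return cost_fp * fp + cost_fn * fn


# Hypothetical counts for two operating points over 100 utterances each.
POINT_A = {"fp": 10, "fn": 6}
POINT_B = {"fp": 6, "fn": 10}

# Hypothetical costs (assumption): a false acceptance hurts twice as much
# as a false rejection.
COST_FP, COST_FN = 2.0, 1.0

if __name__ == "__main__":
    for name, point in (("A", POINT_A), ("B", POINT_B)):
        rate = error_rate(point["fp"], point["fn"], 100)
        cost = weighted_cost(point["fp"], point["fn"], COST_FP, COST_FN)
        print(name, rate, cost)
```

Both points have the same error rate, yet under these assumed costs point B is cheaper, so the error rate alone cannot separate them.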

  6. Problem Formulation • (1) Develop a cost model which allows us to quantitatively assess the costs of FP and FN errors. • (2) Use the costs to pick the optimal tradeoff point on the classifier ROC.

  7. The Cost Model • Model the impact of the FPs and FNs on the system performance • Identify a suitable performance metric P • Build a statistical regression model at the dialog session level: • P = f(FPs, FNs) • P = k + CostFP*FP + CostFN*FN (linear regression) • Then we can plot f, and implicitly optimize for P
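
A session-level fit of P = k + CostFP*FP + CostFN*FN might be sketched as follows. This is an ordinary least-squares fit written in plain Python, not the authors' code; the session values are synthetic, generated from known coefficients so that the fit can be checked, and the function names are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_cost_model(sessions):
    """Fit P = k + cost_fp * FP + cost_fn * FN by ordinary least squares.

    sessions: list of (fp, fn, p) tuples, one per dialog session.
    Returns (k, cost_fp, cost_fn).
    """
    rows = [[1.0, fp, fn] for fp, fn, _ in sessions]
    ys = [p for _, _, p in sessions]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return tuple(solve_linear_system(xtx, xty))


# Synthetic sessions (assumption): P generated as 1.0 - 2.0*FP - 1.0*FN.
RATES = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
SESSIONS = [(fp, fn, 1.0 - 2.0 * fp - 1.0 * fn) for fp, fn in RATES]

if __name__ == "__main__":
    k, cost_fp, cost_fn = fit_cost_model(SESSIONS)
    print(k, cost_fp, cost_fn)
```

With real data the fit would not be exact, and the fitted coefficients would be read as the per-error impact on the chosen metric P.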

  8. Measuring Performance • User Satisfaction (i.e. 5-point scale) • Hard to get • Very subjective ~ hard to make it consistent across users • Concept transfer efficiency: • CTC: correctly transferred concepts per turn • ITC: incorrectly transferred concepts per turn • Completion

  9. Detour: The Dataset • 134 dialogs (2561 utterances), collected using 4 scenarios • Satisfaction scores only for 35 dialogs • Corpus manually labeled at the concept and level • 4 labels: OK / RBAD / PBAD / OOD • Aggregate utterance labels generated • Confidence annotator decisions logged • Computed counts of FPs, FNs, CTCs, ITCs for each session

  10. Example • U: I want to fly from Pittsburgh to Boston • S: I want to fly from Pittsburgh to Austin • C: [I_want/OK] [Depart_Loc/OK] [Arrive_Loc/RBAD] • Only 2 relevantly expressed concepts • If Accept: CTC = 1, ITC = 1 • If Reject: CTC = 0, ITC = 0
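
The counting in this example can be reproduced with a small sketch. The tuple layout and the `relevant` flag are assumptions made for illustration; the slides do not say how relevance was recorded, nor how PBAD and OOD concepts are counted, so only OK and RBAD are handled here.

```python
def transfer_counts(concepts, accepted):
    """Return (CTC, ITC) for one utterance.

    concepts: list of (name, label, relevant) tuples; `relevant` marks the
              relevantly expressed concepts (hypothetical representation).
    accepted: the confidence annotator's decision for the utterance.
    """
    if not accepted:
        # A rejected utterance transfers no concepts at all.
        return 0, 0
    ctc = sum(1 for _, label, relevant in concepts if relevant and label == "OK")
    itc = sum(1 for _, label, relevant in concepts if relevant and label == "RBAD")
    return ctc, itc


# The utterance from the example: Boston was misrecognized as Austin.
UTTERANCE = [
    ("I_want", "OK", False),      # not a relevantly expressed concept
    ("Depart_Loc", "OK", True),
    ("Arrive_Loc", "RBAD", True),
]

if __name__ == "__main__":
    print(transfer_counts(UTTERANCE, accepted=True))
    print(transfer_counts(UTTERANCE, accepted=False))
```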

  11. Targeting Efficiency: Model 1 • 3 Successively refined models • CTC = FP + FN + TN + k • CTC - correctly transferred concepts / turn • TN – true negatives

  12. Targeting Efficiency: Model 2 • CTC - ITC = (REC +) FP + FN + TN + k • ITC - incorrectly transferred concepts / turn • REC – relevantly expressed concepts

  13. Targeting Efficiency: Model 3 • CTC-ITC = REC+FPC+FPNC+FN+TN+k • 2 types of FPs: • With concepts - FPC • Without concepts - FPNC

  14. Model 3 - Results • CTC-ITC = REC+FPC+FPNC+FN+TN+k

  15. Other models • Completion (binary) • Logistic regression model • Estimated model does not indicate a good fit • User satisfaction (5-point scale) • Based on only 35 dialogs • R^2 = 0.61 (similar to literature – Walker et al.) • Explanation: subjectivity of metric + limited dataset

  16. Problem Formulation • (1) Develop a cost model which allows us to quantitatively assess the costs of FP and FN errors. • (2) Use the costs to pick the optimal tradeoff point on the classifier ROC.

  17. Tuning the Confidence Annotator • Using Model 3 • CTC-ITC = REC+FPNC+FPC+FN+TN+k • Drop k & REC, plug in the values • Cost = 0.48FPNC+2.12FPC+1.33FN+0.56TN • Minimize Cost instead of Classification Error Rate (FP+FN), and we’ll implicitly maximize concept transfer efficiency.
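
Picking the operating point by minimizing this Cost, rather than FP+FN, might look like the sketch below. The four coefficients come from the slide; the utterance data, the accept-if-score-at-or-above-threshold rule, and the use of per-utterance rates are assumptions for illustration only.

```python
# Coefficients from the cost function on the slide.
COSTS = {"FPNC": 0.48, "FPC": 2.12, "FN": 1.33, "TN": 0.56}


def cost_at(threshold, utterances):
    """Cost of running the annotator at a given confidence threshold.

    utterances: list of (score, correct, has_concepts) tuples.
    An utterance is accepted when score >= threshold (assumed rule).
    """
    counts = {"FPNC": 0, "FPC": 0, "FN": 0, "TN": 0}
    for score, correct, has_concepts in utterances:
        accepted = score >= threshold
        if accepted and not correct:
            # False acceptance, split by whether concepts were transferred.
            counts["FPC" if has_concepts else "FPNC"] += 1
        elif not accepted and correct:
            counts["FN"] += 1  # false rejection
        elif not accepted and not correct:
            counts["TN"] += 1  # correct rejection
    n = len(utterances)
    return sum(COSTS[key] * counts[key] / n for key in COSTS)


def best_threshold(utterances, candidates):
    """Return the candidate threshold with the lowest cost."""
    return min(candidates, key=lambda t: cost_at(t, utterances))


# Hypothetical annotated utterances: (confidence score, correct, has concepts).
UTTERANCES = [
    (0.9, True, True), (0.8, True, True), (0.7, False, True), (0.6, True, False),
    (0.4, False, False), (0.3, True, True), (0.2, False, True), (0.1, False, False),
]
CANDIDATES = [0.0, 0.25, 0.5, 0.75]

if __name__ == "__main__":
    for t in CANDIDATES:
        print(t, round(cost_at(t, UTTERANCES), 4))
    print("best:", best_threshold(UTTERANCES, CANDIDATES))
```

Sweeping the candidate thresholds in this way traces the cost along the classifier's operating characteristic.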

  18. Operating Characteristic

  19. Further Analysis • Is CTC-ITC really modeling dialog performance? • Mean = 0.71, Std.Dev = 0.28 • Mean for completed dialogs = 0.82 • Mean for uncompleted dialogs = 0.57 • Difference between means significant at very high level of confidence • P-value = 7.23*10^-9 (in t-test) • So, looks like CTC-ITC is okay, right?
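
The comparison of means can be sketched as a two-sample t statistic. The slides do not say which variant of the t-test was used, so Welch's unequal-variance form is an assumption here, and the per-dialog values are hypothetical. Turning the statistic into a p-value requires the t distribution, which is left out of this sketch.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a) / na  # squared standard error, group a
    vb = statistics.variance(sample_b) / nb  # squared standard error, group b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical per-dialog CTC-ITC values (not the data from the study).
COMPLETED = [0.90, 0.80, 0.85, 0.75]
UNCOMPLETED = [0.60, 0.50, 0.55, 0.70]

if __name__ == "__main__":
    t, df = welch_t(COMPLETED, UNCOMPLETED)
    print(t, df)
```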

  20. Further Analysis (cont’d) • Can we reliably extrapolate to other areas of the operating characteristic?

  21. Further Analysis (cont’d) • Can we reliably extrapolate to other areas of the operating characteristic? • Yes, look at the distribution of the FP and FN ratios across dialogs.

  22. Further Analysis (cont’d) • Impact of the baseline error rate? • Compared models constructed from high and low error rates: • For the low error rate, the curve becomes monotonically increasing • This clearly indicates that “trust everything / have no confidence” is the way to go in this setting

  23. Our explanation so far… • Ability to easily overwrite incorrectly captured information in the CMU Communicator • Relatively low error rates • Likelihood of repeated misrecognition is low

  24. Conclusion • Data-driven approach to quantitatively assess the costs of various types of misunderstandings. • Models based on efficiency fit data well; obtained costs confirm intuition. • For CMU Communicator, model predicts that total cost stays the same across a large range of the operating characteristic of the classifier.

  25. Further Experiments • But, of course, we can verify predictions experimentally • Collect new data with the system running with a very low threshold. • 55 dialogs collected so far. • Thanks to those who have participated in these experiments. • “Help if you have the time” to the others … www.cs.cmu.edu/~dbohus/scenarios.htm • Re-estimate models, verify predictions

  26. Confusion Matrix • FP = False acceptance • FN = False detection/rejection • Fallout = FP/(FP+TN) = FP/NBAD • CDR = 1-Fallout = 1-(FP/NBAD)
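
The two ratios defined on this slide can be written out directly. The function names and the example counts are illustrative only.

```python
def fallout(fp, tn):
    """Fallout = FP / (FP + TN), i.e. FP / NBAD."""
    return fp / (fp + tn)


def cdr(fp, tn):
    """CDR = 1 - Fallout, as defined on the slide."""
    return 1 - fallout(fp, tn)


if __name__ == "__main__":
    # Hypothetical counts: 5 false acceptances out of 20 bad utterances.
    print(fallout(5, 15), cdr(5, 15))
```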
