Cost of Misunderstandings

Modeling the Cost of Misunderstanding Errors in the CMU Communicator Dialog System
Presented by: Dan Bohus (dbohus@cs.cmu.edu)
Work by: Dan Bohus, Alex Rudnicky
Carnegie Mellon University, 2001

Outline
• Quick overview of previous utterance-level confidence annotation work
• Modeling the cost of misunderstandings in spoken dialog systems
• Experiments & results
• Further analysis
• Summary, further work, conclusion
Modeling the cost of misunderstanding …

Utterance-Level Confidence Annotation Overview
• Confidence annotation = data-driven classification
• Corpus: 2 months, 131 dialogs, 4550 utterances
• Features: 12 features from the decoder, parsing, and dialog management levels
• Classifiers: Decision Tree, ANN, BayesNet, AdaBoost, NaiveBayes, SVM, plus a Logistic Regression model (added later)

Confidence annotator performance
• Baseline error rate: 32 %
• Garble baseline: 25 %
• Classifier performance: 16 % error rate
• Differences between classifiers are not statistically significant, except for Naïve Bayes
• On a soft metric, the logistic regression model clearly outperformed the others
• But is this the right way to evaluate performance?

Judging Performance
• Classification Error Rate (FP+FN)
  • Implicitly assumes that FP and FN errors have the same cost
• But the cost of a misunderstanding in dialog systems is presumably different for FPs and FNs
• Build an error function that takes these costs into account, and optimize for that
• Cost also depends on:
  • domain/system ~ not a problem
  • dialog state
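A minimal sketch of why the choice of error function matters. The two operating points and the cost values below are hypothetical, chosen only for illustration; they are not measurements from the Communicator data.

```python
def error_rate(fp, fn, total):
    """Plain classification error: every FP and FN counts the same."""
    return (fp + fn) / total

def weighted_cost(fp, fn, total, cost_fp, cost_fn):
    """Error function that charges FPs and FNs different (assumed) costs."""
    return (cost_fp * fp + cost_fn * fn) / total

# Two hypothetical operating points of the same classifier on 100 utterances.
point_a = {"fp": 10, "fn": 6}
point_b = {"fp": 4, "fn": 14}

# Under equal costs, point A looks better (lower error rate).
err_a = error_rate(point_a["fp"], point_a["fn"], 100)
err_b = error_rate(point_b["fp"], point_b["fn"], 100)

# If a false acceptance is assumed to cost 3x a false rejection,
# the preference flips and point B has the lower cost.
cost_a = weighted_cost(point_a["fp"], point_a["fn"], 100, cost_fp=3.0, cost_fn=1.0)
cost_b = weighted_cost(point_b["fp"], point_b["fn"], 100, cost_fp=3.0, cost_fn=1.0)

print(err_a, err_b, cost_a, cost_b)
```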

Problem Formulation
• (1) Develop a cost model that allows us to quantitatively assess the costs of FP and FN errors
• (2) Use the costs to pick the optimal tradeoff point on the classifier ROC

The Cost Model
• Model the impact of FPs and FNs on system performance
• Identify a suitable performance metric P
• Build a statistical regression model at the dialog session level:
  • P = f(FPs, FNs)
  • P = k + CostFP*FP + CostFN*FN (linear regression)
• Then we can plot f, and implicitly optimize for P
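The linear form above can be fitted by ordinary least squares over per-session counts. The following is a sketch under stated assumptions: the session data is synthetic and noise-free, generated from made-up coefficients, and the function names are hypothetical. It only illustrates how k, CostFP and CostFN would be estimated, not the values obtained in this work.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_cost_model(sessions):
    """Least-squares fit of P = k + CostFP*FP + CostFN*FN.

    sessions: list of (fp, fn, p) tuples, one per dialog session.
    Returns (k, cost_fp, cost_fn).
    """
    rows = [(1.0, fp, fn) for fp, fn, _ in sessions]
    ys = [p for _, _, p in sessions]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return tuple(solve(xtx, xty))

# Synthetic sessions generated from assumed coefficients (illustration only).
TRUE_K, TRUE_FP, TRUE_FN = 0.9, -0.05, -0.02
counts = [(0, 0), (1, 0), (0, 1), (2, 1), (1, 3), (4, 2)]
sessions = [(fp, fn, TRUE_K + TRUE_FP * fp + TRUE_FN * fn) for fp, fn in counts]

k, cost_fp, cost_fn = fit_cost_model(sessions)
print(k, cost_fp, cost_fn)
```

With real data the fit would not be exact, and the quality of fit would need to be checked before the coefficients are read as costs.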

Measuring Performance
• User Satisfaction (i.e. 5-point scale)
  • Hard to get
  • Very subjective ~ hard to make it consistent across users
• Concept transfer efficiency:
  • CTC: correctly transferred concepts per turn
  • ITC: incorrectly transferred concepts per turn
• Completion

Detour: The Dataset
• 134 dialogs (2561 utterances), collected using 4 scenarios
• Satisfaction scores available for only 35 dialogs
• Corpus manually labeled at the concept level
  • 4 labels: OK / RBAD / PBAD / OOD
  • Aggregate utterance labels generated
• Confidence annotator decisions logged
• Computed counts of FPs, FNs, CTCs, ITCs for each session

Example
• U: I want to fly from Pittsburgh to Boston
• S: I want to fly from Pittsburgh to Austin
• C: [I_want/OK] [Depart_Loc/OK] [Arrive_Loc/RBAD]
• Only 2 relevantly expressed concepts
• If Accept: CTC = 1, ITC = 1
• If Reject: CTC = 0, ITC = 0
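The accept/reject bookkeeping in this example can be sketched as follows. The helper name, the explicit set of relevant concepts, and the rule that any non-OK label on a relevant concept counts as an incorrect transfer are assumptions made for illustration.

```python
# Hypothetical helper: counts correctly / incorrectly transferred concepts
# for a single utterance, given the confidence annotator's decision.

def transfer_counts(concepts, accept, relevant):
    """Return (CTC, ITC) for one utterance.

    concepts: list of (concept_name, label) pairs, label in OK/RBAD/PBAD/OOD
    accept:   True if the confidence annotator accepts the utterance
    relevant: set of concept names that count as relevantly expressed (assumed)
    """
    if not accept:
        # A rejected utterance transfers nothing, correct or incorrect.
        return 0, 0
    ctc = sum(1 for name, label in concepts if name in relevant and label == "OK")
    # Assumption: any non-OK label on a relevant concept is an incorrect transfer.
    itc = sum(1 for name, label in concepts if name in relevant and label != "OK")
    return ctc, itc

# The utterance from the slide: Arrive_Loc was misrecognized (Boston -> Austin).
utterance = [("I_want", "OK"), ("Depart_Loc", "OK"), ("Arrive_Loc", "RBAD")]
relevant = {"Depart_Loc", "Arrive_Loc"}  # I_want is not a relevant concept

accepted = transfer_counts(utterance, True, relevant)
rejected = transfer_counts(utterance, False, relevant)
print(accepted, rejected)
```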

Targeting Efficiency: Model 1
• 3 successively refined models
• CTC = FP + FN + TN + k
• CTC: correctly transferred concepts per turn
• TN: true negatives

Targeting Efficiency: Model 2
• CTC - ITC = (REC +) FP + FN + TN + k
• ITC: incorrectly transferred concepts per turn
• REC: relevantly expressed concepts

Targeting Efficiency: Model 3
• CTC - ITC = REC + FPC + FPNC + FN + TN + k
• 2 types of FPs:
  • With concepts: FPC
  • Without concepts: FPNC

Model 3 - Results
• CTC - ITC = REC + FPC + FPNC + FN + TN + k

Other models
• Completion (binary)
  • Logistic regression model
  • Estimated model does not indicate a good fit
• User satisfaction (5-point scale)
  • Based on only 35 dialogs
  • R² = 0.61 (similar to the literature: Walker et al.)
  • Explanation: subjectivity of the metric + limited dataset

Problem Formulation
• (1) Develop a cost model that allows us to quantitatively assess the costs of FP and FN errors
• (2) Use the costs to pick the optimal tradeoff point on the classifier ROC

Tuning the Confidence Annotator
• Using Model 3
  • CTC - ITC = REC + FPNC + FPC + FN + TN + k
• Drop k & REC, plug in the values
  • Cost = 0.48 FPNC + 2.12 FPC + 1.33 FN + 0.56 TN
• Minimize Cost instead of Classification Error Rate (FP+FN), and we’ll implicitly maximize concept transfer efficiency
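A sketch of how the cost function above could be used to pick a threshold. The four coefficients are the ones on the slide; everything else is assumed for illustration: the toy utterances, the candidate thresholds, the convention that an utterance is accepted when its confidence score is at or above the threshold, and the use of raw counts in the cost.

```python
# Cost coefficients from the slide.
COSTS = {"FPNC": 0.48, "FPC": 2.12, "FN": 1.33, "TN": 0.56}

def total_cost(utterances, threshold):
    """Cost of running the annotator at a given threshold.

    utterances: list of (score, is_ok, has_concepts) tuples, where score is
    the annotator's confidence and is_ok is the reference label.
    """
    counts = {"FPNC": 0, "FPC": 0, "FN": 0, "TN": 0}
    for score, is_ok, has_concepts in utterances:
        accepted = score >= threshold  # assumed decision rule
        if accepted and not is_ok:
            # False acceptance, split by whether concepts were captured.
            counts["FPC" if has_concepts else "FPNC"] += 1
        elif not accepted and is_ok:
            counts["FN"] += 1  # false rejection
        elif not accepted and not is_ok:
            counts["TN"] += 1  # correct rejection
    return sum(COSTS[name] * n for name, n in counts.items())

def best_threshold(utterances, candidates):
    """Candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(utterances, t))

# Toy data (hypothetical): (confidence score, is_ok, has_concepts)
toy = [
    (0.9, True, True), (0.8, True, True), (0.6, False, True),
    (0.4, True, False), (0.3, False, False), (0.1, False, True),
]
candidates = [0.0, 0.2, 0.5, 0.7, 1.0]
chosen = best_threshold(toy, candidates)
print(chosen, total_cost(toy, chosen))
```

Because the cost is linear in the counts, dividing by the number of utterances would not change which threshold is selected.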

Operating Characteristic

Further Analysis
• Is CTC-ITC really modeling dialog performance?
  • Mean = 0.71, Std.Dev = 0.28
  • Mean for completed dialogs = 0.82
  • Mean for uncompleted dialogs = 0.57
  • Difference between means is significant at a very high level of confidence
  • P-value = 7.23 × 10^-9 (t-test)
• So, it looks like CTC-ITC is okay, right?

Further Analysis (cont’d)
• Can we reliably extrapolate to other areas of the operating characteristic?
  • Yes, look at the distribution of the FP and FN ratios across dialogs

Further Analysis (cont’d)
• Impact of the baseline error rate?
• Compared models built on data with high and low error rates:
  • For the low error rate, the curve becomes monotonically increasing
  • This clearly indicates that “trust everything / have no confidence” is the way to go in this setting

Our explanation so far…
• Ability to easily overwrite incorrectly captured information in the CMU Communicator
• Relatively low error rates
• Likelihood of repeated misrecognition is low

Conclusion
• Data-driven approach to quantitatively assess the costs of various types of misunderstandings
• Models based on efficiency fit the data well; the obtained costs confirm intuition
• For the CMU Communicator, the model predicts that total cost stays the same across a large range of the classifier's operating characteristic

Further Experiments
• But, of course, we can verify predictions experimentally
• Collect new data with the system running with a very low threshold
  • 55 dialogs collected so far
  • Thanks to those who have participated in these experiments
  • “Help if you have the time” to the others … www.cs.cmu.edu/~dbohus/scenarios.htm
• Re-estimate models, verify predictions

Confusion Matrix
• FP = False acceptance
• FN = False detection/rejection
• Fallout = FP/(FP+TN) = FP/NBAD
• CDR = 1 - Fallout = 1 - (FP/NBAD)
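The two ratios above follow directly from the confusion-matrix counts. A minimal sketch, with hypothetical function names and made-up counts:

```python
def fallout(fp, tn):
    """Fallout = FP / (FP + TN), i.e. FP / NBAD."""
    return fp / (fp + tn)

def cdr(fp, tn):
    """CDR = 1 - Fallout."""
    return 1 - fallout(fp, tn)

# Hypothetical counts: 20 misunderstood utterances, 5 of them falsely accepted.
example_fallout = fallout(fp=5, tn=15)
example_cdr = cdr(fp=5, tn=15)
print(example_fallout, example_cdr)
```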
