

Presentation Transcript


  1. Incorporating In-domain Confidence and Discourse Coherence Measures in Utterance Verification (Detection of Speech Recognition Errors Using In-domain Confidence and Discourse Coherence). Ian R. Lane, Tatsuya Kawahara. Spoken Language Communications Research Laboratories, ATR; School of Informatics, Kyoto University

  2. Introduction • Current ASR technologies are not robust against: • Acoustic mismatch: noise, channel, and speaker variance • Linguistic mismatch: disfluencies, out-of-vocabulary (OOV) and out-of-domain (OOD) input • Assess the confidence of the recognition hypothesis and detect recognition errors → effective user feedback • Select a recovery strategy based on the type of error and the specific application

  3. Previous Work on Confidence Measures • Feature-based • [Kemp] word duration, AM/LM back-off • Explicit model-based • [Rahim] likelihood-ratio test against a cohort model • Posterior probability • [Komatani, Soong, Wessel] estimate the posterior probability given all competing hypotheses in a word graph → These approaches are limited to "low-level" information available during ASR decoding

  4. Proposed Approach • Exploit knowledge sources outside the ASR framework for estimating recognition confidence, e.g. knowledge about the application domain and discourse flow → Incorporate CMs based on "high-level" knowledge sources: • In-domain confidence • degree of match between the utterance and the application domain • Discourse coherence • consistency between consecutive utterances in the dialogue

  5. Utterance Verification Framework • CMin-domain(Xi): in-domain confidence • CMdiscourse(Xi|Xi-1): discourse coherence • CM(Xi): joint confidence score, combining the above with the generalized posterior probability CMgpp(Xi) • [Diagram: each input utterance (Xi, and likewise the preceding Xi-1) passes through the ASR front-end, topic classification, and in-domain verification, yielding CMin-domain(Xi) and CMgpp(Xi); the topic distance dist(Xi, Xi-1) between consecutive utterances gives CMdiscourse(Xi|Xi-1); these scores are combined into CM(Xi), which drives out-of-domain detection]

  6. In-domain Confidence • Measure of topic consistency with the application domain • Previously applied to out-of-domain utterance detection • Examples of errors detected via in-domain confidence • Mismatch of domain • REF: How can I print this WORD file double-sided • ASR: How can I open this word on the pool-side • hypothesis not topically consistent → in-domain confidence low • Erroneous recognition hypothesis • REF: I want to go to Kyoto, can I go by bus • ASR: I want to go to Kyoto, can I take a bath • hypothesis not topically consistent → in-domain confidence low (REF: correct transcription; ASR: speech recognition hypothesis)

  7. In-domain Confidence • Pipeline: input utterance Xi (recognition hypothesis) → transformation to vector space (feature vector) → classification of multiple topics with SVMs (1~m) → topic confidence scores (C(t1|Xi), ..., C(tm|Xi)) → in-domain verification Vin-domain(Xi) → in-domain confidence CMin-domain(Xi)

  8. In-domain Confidence (example) • Input utterance Xi (recognition hypothesis), e.g. "could I have a non-smoking seat" • Transformation to vector space: over the vocabulary (a, an, ..., room, ..., seat, ..., I+have, ...) the feature vector is (1, 0, ..., 0, ..., 1, ..., 1, ...) • Classification of multiple topics, SVM (1~m): topic confidence scores, e.g. accom. 0.05, airplane 0.36, airport 0.94, ... • In-domain verification Vin-domain(Xi) → CMin-domain(Xi) = 90%
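
A minimal Python sketch of the pipeline on slides 7 and 8, assuming a toy vocabulary and stand-in topic classifiers; VOCAB, to_feature_vector, and topic_confidence_scores are hypothetical names, and the real system uses SVMs trained per topic:

    # Sketch of the in-domain confidence pipeline (illustrative only).
    VOCAB = ["a", "an", "room", "seat"]  # stand-in for the real vocabulary

    def to_feature_vector(hypothesis):
        """Map a recognition hypothesis to a binary bag-of-words vector."""
        words = set(hypothesis.lower().split())
        return [1 if w in words else 0 for w in VOCAB]

    def topic_confidence_scores(vector, classifiers):
        """One confidence C(t_j|X) per topic; each stand-in classifier
        takes a feature vector and returns a score in [0, 1]."""
        return [clf(vector) for clf in classifiers]

    # e.g. to_feature_vector("could I have a non-smoking seat") -> [1, 0, 0, 1]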

  9. In-domain Verification Model • A linear discriminant verification model is applied: Vin-domain(Xi) = Σj λj · C(tj|Xi) • λ1, ..., λm are trained on in-domain data using "deleted interpolation of topics" and GPD [Lane '04] • C(tj|Xi): topic classification confidence score of topic tj for input utterance Xi • λj: discriminant weight for topic tj
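
Given those topic scores, the verification model on this slide is a weighted sum; a sketch, with the weights argument standing in for the GPD-trained λ1, ..., λm:

    def v_in_domain(topic_scores, weights):
        """Linear discriminant verification: V(X) = sum_j lambda_j * C(t_j|X)."""
        return sum(lam * c for lam, c in zip(weights, topic_scores))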

  10. Discourse Coherence • Topic consistency with the preceding utterance • Example of an error detected via discourse coherence • Erroneous recognition hypothesis • Speaker A: previous utterance [Xi-1] • REF: What type of shirt are you looking for? • ASR: What type of shirt are you looking for? • Speaker B: current utterance [Xi] • REF: I'm looking for a white T-shirt. • ASR: I'm looking for a white teacher. • topic not consistent across utterances → discourse coherence low (REF: correct transcription; ASR: speech recognition hypothesis)

  11. Discourse Coherence • Euclidean distance between the current (Xi) and previous (Xi-1) utterances in the topic confidence space: dist(Xi, Xi-1) = √( Σj ( C(tj|Xi) - C(tj|Xi-1) )² ) • CMdiscourse(Xi|Xi-1) is large when Xi and Xi-1 are topically related, and low when they differ
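
The distance transcribes directly into Python; the slide does not give the exact mapping from distance to CMdiscourse, so the sign flip in cm_discourse below is only an assumed placeholder for "large when related, low when they differ":

    import math

    def topic_distance(scores_cur, scores_prev):
        """Euclidean distance in topic confidence space."""
        return math.sqrt(sum((a - b) ** 2
                             for a, b in zip(scores_cur, scores_prev)))

    def cm_discourse(scores_cur, scores_prev):
        # Assumed mapping: any monotonically decreasing function of the
        # distance ranks topically related utterance pairs higher.
        return -topic_distance(scores_cur, scores_prev)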

  12. Joint Confidence Score: Generalized Posterior Probability • Confusability of the recognition hypothesis against competing hypotheses [Lo & Soong] • At the utterance level, CMgpp(X) is computed from GWPP(xj), the generalized word posterior probability of xj, the j-th word in the recognition hypothesis of X
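
Assuming the per-word GWPP values have already been extracted from the word graph, one simple utterance-level aggregation is their mean; this is a sketch only, as the slide does not show the exact combination:

    def cm_gpp(word_gwpps):
        """Utterance-level confidence from the generalized word posterior
        probabilities GWPP(x_j) of the words in the hypothesis."""
        return sum(word_gwpps) / len(word_gwpps)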

  13. Joint Confidence Score • CM(Xi) = λgpp · CMgpp(Xi) + λin-domain · CMin-domain(Xi) + λdiscourse · CMdiscourse(Xi|Xi-1) • For utterance verification, CM(Xi) is compared to a threshold (θ) • The model weights (λgpp, λin-domain, λdiscourse) and the threshold (θ) are trained on a development set
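
A sketch of the joint score and the accept/reject decision; the default lambdas and theta are made-up placeholders for the weights and threshold actually trained on the development set:

    def joint_cm(gpp, in_domain, discourse, lambdas=(0.5, 0.3, 0.2)):
        """CM(X): weighted combination of the three confidence measures.
        lambdas is a placeholder for the trained (lambda_gpp,
        lambda_in-domain, lambda_discourse)."""
        l_gpp, l_id, l_dc = lambdas
        return l_gpp * gpp + l_id * in_domain + l_dc * discourse

    def accept(cm, theta=0.5):
        """Utterance verification: accept the hypothesis if CM(X) >= theta."""
        return cm >= theta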

  14. Experimental Setup • Training set: ATR BTEC (basic travel expression corpus) • ~400k sentences (Japanese/English pairs) • 14 topic classes (accommodation, shopping, transit, ...) • Used to train the topic classification and in-domain verification models • Evaluation data: ATR MAD (machine-aided dialogue) • Natural dialogue between English and Japanese speakers via the ATR speech-to-speech translation system • Dialogue data collected based on a set of pre-defined scenarios • Development set: 270 dialogues; test set: 90 dialogues • On the development set, train the CM sigmoid transforms, the CM weights (λgpp, λin-domain, λdiscourse), and the verification threshold (θ)

  15. Speech Recognition Performance • ASR performed with ATRASR; a 2-gram LM was applied during decoding, and the lattice was rescored with a 3-gram LM

  16. Evaluation Measure • Utterance-based verification • No definite "keyword" set exists in speech-to-speech (S-2-S) translation • If a recognition error occurs (one or more word errors) → prompt the user to rephrase the entire utterance • CER (confidence error rate) = (FA + FR) / total number of utterances • FA: false acceptance of an incorrectly recognized utterance • FR: false rejection of a correctly recognized utterance
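
CER then follows directly from the two error counts; a sketch assuming per-utterance boolean flags for the verifier's decision and for whether the hypothesis was actually correct:

    def confidence_error_rate(accepted, correct):
        """CER = (FA + FR) / total number of utterances."""
        fa = sum(a and not c for a, c in zip(accepted, correct))  # false acceptances
        fr = sum(c and not a for a, c in zip(accepted, correct))  # false rejections
        return (fa + fr) / len(accepted)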

  17. GPP-based Verification Performance • Accept All: assume all utterances are correctly recognized • GPP: generalized posterior probability • [Chart: CER for "Accept All" vs. "GPP", Japanese and English] • Large reduction in verification errors compared with the "Accept All" case • CER of 17.3% (Japanese) and 15.3% (English)

  18. Incorporation of IC and DC Measures (Japanese) • GPP: generalized posterior probability; IC: in-domain confidence; DC: discourse coherence • [Chart: CER for GPP, GPP+IC, GPP+DC, GPP+IC+DC] • CER reduced by 5.7% and 4.6% for the "GPP+IC" and "GPP+DC" cases • CER 17.3% → 15.9% (8.0% relative) for the "GPP+IC+DC" case

  19. Incorporation of IC and DC Measures (English) • GPP: generalized posterior probability; IC: in-domain confidence; DC: discourse coherence • [Chart: CER for GPP, GPP+IC, GPP+DC, GPP+IC+DC] • Similar performance on the English side • CER 15.3% → 14.4% for the "GPP+IC+DC" case

  20. Conclusions • Proposed a novel utterance verification scheme incorporating "high-level" knowledge: • In-domain confidence: degree of match between the utterance and the application domain • Discourse coherence: consistency between consecutive utterances • Both proposed measures are effective • Relative reductions in CER of 8.0% (Japanese) and 6.1% (English)

  21. Future Work • "High-level" content-based verification • Ignore ASR errors that do not affect translation quality → further improvement in performance • Topic switching • Determine when users switch tasks (currently a single task per dialogue session is assumed)
