
Confidence Measures for Automatic Speech Recognition


Presentation Transcript


  1. Confidence Measures for Automatic Speech Recognition National Taiwan Normal University Spoken Language Processing Lab Advisors : Hsin-Min Wang, Berlin Chen Presented by Tzan-Hwei Chen

  2. Outline • Introduction • The categories of estimation methods for confidence measures (CM) • Feature based • Posterior probability based • Explicit model based • Incorporation of high-level information for CM* • The application of CM to improve speech recognition • Summary

  3. Introduction (1/9) • It is extremely important to be able to make an appropriate and reliable judgement based on the error-prone ASR result. • Researchers have proposed computing a score (preferably between 0 and 1), called a confidence measure (CM), to indicate the reliability of any recognition decision made by an ASR system.

  4. Introduction (2/9) • Some applications of CM [Figure: an ASR pipeline with verification: the speech signal goes through feature extraction and decoding (using the acoustic model, language model and lexicon) to produce a recognized word sequence, e.g., 臺北 到 魚籃, whose candidates 1. 臺北到魚籃 and 2. 臺北到宜蘭 are then judged by a confidence-measure-based verification stage]

  5. Introduction (3/9) • Early research on CM can be traced back to rejection in word-spotting systems. • Other early CM-related work lies in the automatic detection of new words in LVCSR. • In the past few years, CM has been applied to more and more research areas, e.g., • To improve speech recognition • Look-ahead algorithms in LVCSR • To guide the system in performing unsupervised learning • …

  6. Introduction (4/9) • The general procedure of CM for verification [Figure: recognized units go through confidence estimation; each unit's confidence is judged against a predefined threshold: above the threshold leads to acceptance, below it to rejection]

  7. Introduction (5/9) • Four situations when judging a hypothesis (using ref 宜蘭 against hyp 宜蘭 or 魚籃): a correct hypothesis that is accepted is a correct acceptance; a correct hypothesis that is rejected is a false rejection; an incorrect hypothesis that is rejected is a correct rejection; an incorrect hypothesis that is accepted is a false acceptance.

  8. Introduction (6/9) • The evaluation metric : • Confidence error rate : CER = (N_FA + N_FR) / N_hyp, the number of false acceptances (FA) plus false rejections (FR), divided by the total number of recognized words [Example: hyp 三民 候選人 通過 審查 了 against ref 有 三名 候選人 通過 審查, judged FA, CA, FR, CA, FA at one threshold]

  9. Introduction (7/9) • The evaluation metric : • Confidence error rate (cont) : with a lower threshold the same example is judged FA, CA, CA, CA, FA: the false rejection becomes a correct acceptance, while the two false acceptances remain [hyp 三民 候選人 通過 審查 了 against ref 有 三名 候選人 通過 審查]
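As a concrete check (a minimal Python sketch, not from the original slides), the CER can be computed from per-word correctness and accept/reject decisions; on the five-word example above, the judgments FA, CA, FR, CA, FA give CER = 3/5:

```python
# Confidence error rate: CER = (false acceptances + false rejections)
# divided by the total number of recognized words.
def confidence_error_rate(is_correct, is_accepted):
    """is_correct[i]: hypothesized word i matches the reference;
    is_accepted[i]: its confidence exceeded the threshold."""
    fa = sum(1 for c, a in zip(is_correct, is_accepted) if not c and a)
    fr = sum(1 for c, a in zip(is_correct, is_accepted) if c and not a)
    return (fa + fr) / len(is_correct)

# Slide 8 example, judgments FA, CA, FR, CA, FA -> CER = 3/5
print(confidence_error_rate([False, True, True, True, False],
                            [True, True, False, True, True]))  # 0.6
```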

  10. Introduction (8/9) • The evaluation metric (cont) : • Receiver operating characteristic (ROC) curve : a plot of the detection rate against the false acceptance rate, traced by sweeping the decision threshold.
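A hedged sketch (not from the slides) of how such a curve is traced: sweep the threshold over the observed confidence scores and record, at each setting, the fraction of correct words accepted (detection rate) and the fraction of misrecognized words accepted (false acceptance rate).

```python
# Trace ROC points by sweeping the confidence threshold.
# Assumes both correctly and incorrectly recognized words are present.
def roc_points(scores, is_correct):
    n_correct = sum(is_correct)
    n_wrong = len(scores) - n_correct
    points = []
    for t in sorted(set(scores)):
        det = sum(s >= t for s, c in zip(scores, is_correct) if c) / n_correct
        fa = sum(s >= t for s, c in zip(scores, is_correct) if not c) / n_wrong
        points.append((fa, det))
    return points
```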

  11. Introduction (9/9) • All methods proposed for computing CMs can be roughly classified into three major categories [7]: • Feature based • Posterior probability based • Explicit model based (utterance verification, UV) • Incorporation of high-level information for CM*

  12. Feature-based confidence measure

  13. Feature-based confidence measure (1/8) • The features can be collected during the decoding procedure and may include acoustic, language and syntactic information • Any feature can be called a predictor if its p.d.f. over correctly recognized words is clearly distinct from its p.d.f. over misrecognized words [Figure: the two class-conditional p.d.f.s of a predictor feature, one for misrecognized and one for correctly recognized words]

  14. Feature-based confidence measure (2/8) • Some common predictor features • Normalized likelihood score related : acoustic score per frame • N-best related : count in the N-best list, N-best homogeneity score (see the sketch below) • Duration related : word duration divided by its number of phones
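One plausible instantiation of the N-best homogeneity score, sketched here as an assumption since exact definitions vary across papers: the fraction of the total N-best probability mass carried by hypotheses that contain the word.

```python
# Hypothetical N-best homogeneity score: the share of N-best mass carried
# by hypotheses containing the word. Details vary across papers.
def nbest_homogeneity(word, nbest):
    """nbest: list of (hypothesis_words, score), with scores already
    converted to the probability domain."""
    total = sum(score for _, score in nbest)
    containing = sum(score for words, score in nbest if word in words)
    return containing / total
```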

  15. Feature-based confidence measure (3/8) • Some common predictor features (cont) • Hypothesis density : the number of word hypotheses in the word graph that overlap a given time frame; denser regions signal less reliable words [Figure: a word graph with overlapping arcs such as 三名, 候選人, 通過, 審查, 沒有 and 靜音 (silence)]

  16. Feature-based confidence measure (4/8) • Some common predictor features (cont) • Acoustic stability : roughly, how often a word appears at the same position when the utterance is re-decoded under perturbed model weightings [Figure: several hypothesized word sequences, e.g., 今天 天氣 很好, 今天 天氣 不佳 and 今天 天氣; the words 今天 and 天氣 persist across all of them and are therefore stable]

  17. Feature-based confidence measure (6/8) • We can combine the above features with any one of the following classifiers • Linear discriminant function • Generalized linear model • Neural networks • Decision tree • Support vector machine • Boosting • Naïve Bayes classifier

  18. Feature-based confidence measure (7/8) • Naïve Bayes Classifier [3]
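The equation on this slide did not survive extraction. A standard naïve Bayes formulation over predictor features f_1, ..., f_n is assumed here ([3] may differ in its details):

```latex
% Posterior that the word is correct, under the naive (conditional
% independence) assumption on the predictor features f_1, ..., f_n:
P(\mathrm{correct} \mid f_1,\dots,f_n)
  = \frac{P(\mathrm{correct}) \prod_{i=1}^{n} P(f_i \mid \mathrm{correct})}
         {\sum_{c \in \{\mathrm{correct},\,\mathrm{incorrect}\}}
            P(c) \prod_{i=1}^{n} P(f_i \mid c)}
```

The class-conditional densities P(f_i | c) can be estimated from a development set with per-word correctness labels; the resulting posterior itself serves as the CM.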

  19. Feature-based confidence measure (8/8) • Experiments [3] • Corpus : an Italian speech corpus of phone calls to the front desk of a hotel

  20. Posterior probability based confidence measure

  21. Posterior probability based confidence measure (1/11) • Posterior probability of a word sequence : impossible to estimate in a precise manner, so approximation methods must be adopted (the reconstructed formula follows)
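The formula lost from this slide is presumably the standard Bayes decomposition:

```latex
P(W \mid X) = \frac{P(X \mid W)\, P(W)}{P(X)}
            = \frac{P(X \mid W)\, P(W)}{\sum_{W'} P(X \mid W')\, P(W')}
```

The denominator sums over all possible word sequences W', which is what makes exact estimation impossible; N-best lists and word graphs restrict the sum to a tractable set.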

  22. Posterior probability based confidence measure (2/11) • Word graph based approximation : the sum over all word sequences is restricted to the paths through a word graph [Figure: a word graph with arcs such as 三名, 候選人, 有, 沒有, 通過, 建國 and 靜音 (silence)]

  23. Posterior probability based confidence measure (3/11) • Posterior probability of a word arc : • Some issues are addressed and the word posterior probability is generalized • Reduced search space • Relaxed time registration • Optimal acoustic and language model weights
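A minimal statement of the word-graph forward/backward computation, in the spirit of [6] (the notation is assumed, since the slide's equation was lost):

```latex
% Posterior of a word arc [w; s, t] (word w spanning frames s..t):
P([w; s, t] \mid X) \approx
  \frac{\alpha([w; s, t]) \; \beta([w; s, t])}{P(X)}
```

Here alpha accumulates the acoustic and language-model scores of all word-graph paths from the start node up to and including the arc, beta accumulates the scores of all paths from the arc's end node to the final node, and P(X) is approximated by the sum over all complete paths through the graph.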

  24. Posterior probability based confidence measure (4/11) • Posterior probability of a word arc [6] [Figure, repeated with incremental highlighting across slides 24-27: the word graph from above, presumably stepping through the forward/backward computation for a word arc]

  28. Posterior probability based confidence measure (8/11) • The drawback of the above methods : all need an additional pass. • In [8], the “local word confidence measure” is proposed [Figure: the word 今天 repeated in several competing hypotheses]

  29. Posterior probability based confidence measure (8/11) • Local word confidence measure (cont) [Figure: a local computation window annotated “bigram applied” on either side and “forward/backward” in between]

  30. Posterior probability based confidence measure (9/11) • Impact of word graph density on the quality of posterior probability [9] [Table: only the baseline entries, 27.3 and 15.4, survived extraction]

  31. Posterior probability based confidence measure (10/11) • Experiments [6]

  32. Explicit model based confidence measure (1/10) • The CM problem is formulated as a statistical hypothesis testing problem. • Under the framework of binary hypothesis testing, there are two complementary hypotheses • We test the null hypothesis H0 (the recognized word is correct) against the alternative H1 (it is not), as sketched below
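A minimal sketch of the likelihood ratio test, with notation assumed: under H0 the acoustics X were generated by the hypothesized word's model, under H1 by an alternative model.

```latex
\mathrm{LRT}(X) = \frac{P(X \mid H_0)}{P(X \mid H_1)}
               = \frac{P(X \mid \lambda_W)}{P(X \mid \bar{\lambda}_W)}
\;\; \begin{cases} \geq \tau & \text{accept } H_0 \\ < \tau & \text{reject } H_0 \end{cases}
```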

  33. Explicit model based confidence measure (3/10) • The above LRT score can be transformed to a CM based on a monotonic 1-1 mapping function. • The major difficulty with LRT is how to model the alternative hypothesis. • In practice, the same HMM structure is adopted to model the alternative hypothesis. • A discriminative training procedure plays a crucial role in improving modeling performance.

  34. Explicit model based confidence measure (3/10) • Two-pass procedure : 天氣 很好 今天

  35. Explicit model based confidence measure (4/10) • One-pass procedure 天氣 很好 今天

  36. Explicit model based confidence measure (5/10) • How to calculate the confidence of a recognized word?

  37. Explicit model based confidence measure (6/10) • How to calculate the confidence of a recognized word (cont)?
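The equations on slides 36 and 37 did not survive extraction; a typical utterance-verification formulation (an assumption, not necessarily what [10] uses) averages duration-normalized subword log-likelihood ratios into a word score:

```latex
\mathrm{CM}(w) = \frac{1}{N_w} \sum_{n=1}^{N_w}
  \frac{1}{t_n} \log \frac{P(X_n \mid \lambda_n)}{P(X_n \mid \bar{\lambda}_n)}
```

where the word w is segmented into N_w subword units, X_n is the acoustics of the n-th unit, t_n its duration in frames, and lambda_n, bar-lambda_n its target and alternative (anti) models.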

  38. Explicit model based confidence measure (7/10) • Discriminative training [10] • The goal of the training procedure is to increase the average value of the verification score for correct hypotheses and to decrease it for false acceptances.
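One common way to make this concrete, assumed here in the spirit of minimum-verification-error training ([10]'s exact loss may differ): pass the verification score s(X; Lambda) through a sigmoid and descend the gradient of the smoothed error count.

```latex
\ell(s) = \frac{1}{1 + e^{-\gamma (s - \theta)}}, \qquad
\Lambda \leftarrow \Lambda - \epsilon \, \nabla_{\Lambda}
  \Big[ \sum_{X \in \mathrm{correct}} \big(1 - \ell(s(X;\Lambda))\big)
      + \sum_{X \in \mathrm{incorrect}} \ell(s(X;\Lambda)) \Big]
```

Minimizing this pushes scores of correct hypotheses up and scores of would-be false acceptances down, which is exactly the stated goal.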

  39. Explicit model based confidence measure (8/10) • Discriminative training (cont)

  40. Explicit model based confidence measure (9/10) Why does discriminative training work?

  41. Explicit model based confidence measure (10/10) • Experiments [10] • The task is referred to as the “movie locator”.

  42. Incorporation of high-level information for CM

  43. Incorporation of high-level information for CM (1/4) • LSA [Figure: singular value decomposition of the word-by-document matrix A, with left singular matrix U] • The key property of LSA is that words whose vectors are close to each other are semantically similar. • These similarities can be used to provide an estimate of the likelihood of the words co-occurring within the same utterance.

  44. Incorporation of high-level information for CM (2/4) • LSA (cont) • The entries of the matrix and the confidence of a recognized word are computed as sketched below
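The slide's equations were lost; what follows is a hypothetical sketch of one plausible instantiation (the matrix weighting and similarity details are assumptions): map each word to a row of U*S from a truncated SVD of the word-by-document matrix, and score a recognized word by its average cosine similarity to the other words in the same utterance.

```python
# Hypothetical LSA-based word confidence (matrix weighting and similarity
# details are assumptions, not taken from the slides).
import numpy as np

def lsa_word_vectors(word_doc_matrix, k):
    # Truncated SVD: keep the k largest singular values/vectors.
    U, S, _ = np.linalg.svd(word_doc_matrix, full_matrices=False)
    return U[:, :k] * S[:k]          # one k-dimensional vector per word

def lsa_confidence(word_idx, utterance_idxs, vecs):
    # Average cosine similarity of the word with the other recognized words.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    others = [i for i in utterance_idxs if i != word_idx]
    return sum(cos(vecs[word_idx], vecs[i]) for i in others) / len(others)
```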

  45. Incorporation of high-level information for CM (3/4) • Inter-word mutual information :
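The lost equation is presumably the standard pointwise mutual information between co-occurring words:

```latex
\mathrm{MI}(w_i, w_j) = \log \frac{P(w_i, w_j)}{P(w_i)\, P(w_j)}
```

where P(w_i, w_j) is the probability that the two words co-occur (e.g., within the same utterance or document). A recognized word with low average MI against its neighbors fits its semantic context poorly, suggesting low confidence.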

  46. Incorporation of high-level information for CM (4/4) • Experiments [14]

  47. The application of CM to improve speech recognition

  48. The application of CM to improve speech recognition (1/10) • Statistical decision theory aims at minimizing the expected cost of making an error [Figure: the word graph from the earlier example, with arcs such as 三名, 候選人, 沒有, 通過 and 靜音 (silence)]
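The underlying formula, presumably what the slide displayed, is the Bayes decision rule:

```latex
W^{*} = \arg\min_{W} \sum_{W'} L(W, W')\, P(W' \mid X)
```

With a 0/1 sentence-level loss L this reduces to ordinary MAP decoding; with a word-level loss it favors, at each position, the word with the highest posterior probability, which is where word confidence measures enter the decoding objective.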

  49. The application of CM to improve speech recognition (2/10) • Method 1 [16]:

  50. The application of CM to improve speech recognition (3/10) • Method 2 [18] :
