
Confidence Measures for Automatic Speech Recognition


Presentation Transcript


  1. Confidence Measures for Automatic Speech Recognition National Taiwan Normal University Spoken Language Processing Lab Advisors : Hsin-Min Wang, Berlin Chen Presented by Tzan-Hwei Chen

  2. Outline • Introduction • The categories of estimation methods for confidence measures (CM) • Feature based • Posterior probability based • Explicit model based • Incorporation of high-level information for CM* • The application of CM to improve speech recognition • Summary

  3. Introduction (1/9) • It is extremely important to be able to make an appropriate and reliable judgement based on the error-prone ASR result. • Researchers have proposed computing a score (preferably between 0 and 1), called a confidence measure (CM), to indicate the reliability of any recognition decision made by an ASR system.

  4. Introduction (2/9) • Some applications of CM [Figure: an ASR pipeline with verification: the speech signal goes through feature extraction and decoding (using the acoustic model, language model and lexicon) to produce a recognized word sequence, e.g., 臺北 到 魚籃, whose candidates 1. 臺北到魚籃 and 2. 臺北到宜蘭 are then judged by a confidence-measure-based verification stage]

  5. Introduction (3/9) • Early research on CM can be traced back to rejection in word-spotting systems. • Other early CM-related work lies in the automatic detection of new words in LVCSR. • In the past few years, CM has been applied to more and more research areas, e.g., • To improve speech recognition • Look-ahead algorithms in LVCSR • To guide the system in performing unsupervised learning • …

  6. Introduction (4/9) • The general procedure of CM for verification [Figure: recognized units go through confidence estimation; each unit's confidence is judged against a predefined threshold: above the threshold leads to acceptance, below it to rejection]

  7. Introduction (5/9) • Four situations when judging a hypothesis (using ref 宜蘭 against hyp 宜蘭 or 魚籃): a correct hypothesis that is accepted is a correct acceptance; a correct hypothesis that is rejected is a false rejection; an incorrect hypothesis that is rejected is a correct rejection; an incorrect hypothesis that is accepted is a false acceptance.

  8. Introduction (6/9) • The evaluation metric : • Confidence error rate : CER = (N_FA + N_FR) / N_hyp, the number of false acceptances (FA) plus false rejections (FR), divided by the total number of recognized words [Example: hyp 三民 候選人 通過 審查 了 against ref 有 三名 候選人 通過 審查, judged FA, CA, FR, CA, FA at one threshold]

  9. Introduction (7/9) • The evaluation metric : • Confidence error rate (cont) : with a lower threshold the same example is judged FA, CA, CA, CA, FA: the false rejection becomes a correct acceptance, while the two false acceptances remain [hyp 三民 候選人 通過 審查 了 against ref 有 三名 候選人 通過 審查]
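As a concrete check (a minimal Python sketch, not from the original slides), the CER can be computed from per-word correctness and accept/reject decisions; on the five-word example above, the judgments FA, CA, FR, CA, FA give CER = 3/5:

```python
# Confidence error rate: CER = (false acceptances + false rejections)
# divided by the total number of recognized words.
def confidence_error_rate(is_correct, is_accepted):
    """is_correct[i]: hypothesized word i matches the reference;
    is_accepted[i]: its confidence exceeded the threshold."""
    fa = sum(1 for c, a in zip(is_correct, is_accepted) if not c and a)
    fr = sum(1 for c, a in zip(is_correct, is_accepted) if c and not a)
    return (fa + fr) / len(is_correct)

# Slide 8 example, judgments FA, CA, FR, CA, FA -> CER = 3/5
print(confidence_error_rate([False, True, True, True, False],
                            [True, True, False, True, True]))  # 0.6
```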

  10. Introduction (8/9) • The evaluation metric (cont) : • Receiver operating characteristic (ROC) curve : a plot of the detection rate against the false acceptance rate, traced by sweeping the decision threshold.
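A hedged sketch (not from the slides) of how such a curve is traced: sweep the threshold over the observed confidence scores and record, at each setting, the fraction of correct words accepted (detection rate) and the fraction of misrecognized words accepted (false acceptance rate).

```python
# Trace ROC points by sweeping the confidence threshold.
# Assumes both correctly and incorrectly recognized words are present.
def roc_points(scores, is_correct):
    n_correct = sum(is_correct)
    n_wrong = len(scores) - n_correct
    points = []
    for t in sorted(set(scores)):
        det = sum(s >= t for s, c in zip(scores, is_correct) if c) / n_correct
        fa = sum(s >= t for s, c in zip(scores, is_correct) if not c) / n_wrong
        points.append((fa, det))
    return points
```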

  11. Introduction (9/9) • All methods proposed for computing CMs can be roughly classified into three major categories [7]: • Feature based • Posterior probability based • Explicit model based (utterance verification, UV) • Incorporation of high-level information for CM*

  12. Feature-based confidence measure

  13. Feature-based confidence measure (1/8) • The features can be collected during the decoding procedure and may include acoustic, language and syntactic information • Any feature can be called a predictor if its p.d.f. over correctly recognized words is clearly distinct from its p.d.f. over misrecognized words [Figure: the two class-conditional p.d.f.s of a predictor feature, one for misrecognized and one for correctly recognized words]

  14. Feature-based confidence measure (2/8) • Some common predictor features • Normalized likelihood score related : acoustic score per frame • N-best related : count in the N-best list, N-best homogeneity score (see the sketch below) • Duration related : word duration divided by its number of phones
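One plausible instantiation of the N-best homogeneity score, sketched here as an assumption since exact definitions vary across papers: the fraction of the total N-best probability mass carried by hypotheses that contain the word.

```python
# Hypothetical N-best homogeneity score: the share of N-best mass carried
# by hypotheses containing the word. Details vary across papers.
def nbest_homogeneity(word, nbest):
    """nbest: list of (hypothesis_words, score), with scores already
    converted to the probability domain."""
    total = sum(score for _, score in nbest)
    containing = sum(score for words, score in nbest if word in words)
    return containing / total
```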

  15. Feature-based confidence measure (3/8) • Some common predictor features (cont) • Hypothesis density : the number of word hypotheses in the word graph that overlap a given time frame; denser regions signal less reliable words [Figure: a word graph with overlapping arcs such as 三名, 候選人, 通過, 審查, 沒有 and 靜音 (silence)]

  16. Feature-based confidence measure (4/8) • Some common predictor features (cont) • Acoustic stability : roughly, how often a word appears at the same position when the utterance is re-decoded under perturbed model weightings [Figure: several hypothesized word sequences, e.g., 今天 天氣 很好, 今天 天氣 不佳 and 今天 天氣; the words 今天 and 天氣 persist across all of them and are therefore stable]

  17. Feature-based confidence measure (6/8) • We can combine the above features with any one of the following classifiers • Linear discriminant function • Generalized linear model • Neural networks • Decision tree • Support vector machine • Boosting • Naïve Bayes classifier

  18. Feature-based confidence measure (7/8) • Naïve Bayes Classifier [3]
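The equation on this slide did not survive extraction. A standard naïve Bayes formulation over predictor features f_1, ..., f_n is assumed here ([3] may differ in its details):

```latex
% Posterior that the word is correct, under the naive (conditional
% independence) assumption on the predictor features f_1, ..., f_n:
P(\mathrm{correct} \mid f_1,\dots,f_n)
  = \frac{P(\mathrm{correct}) \prod_{i=1}^{n} P(f_i \mid \mathrm{correct})}
         {\sum_{c \in \{\mathrm{correct},\,\mathrm{incorrect}\}}
            P(c) \prod_{i=1}^{n} P(f_i \mid c)}
```

The class-conditional densities P(f_i | c) can be estimated from a development set with per-word correctness labels; the resulting posterior itself serves as the CM.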

  19. Feature-based confidence measure (8/8) • Experiments [3] • Corpus : an Italian speech corpus of phone calls to the front desk of a hotel

  20. Posterior probability based confidence measure

  21. Posterior probability based confidence measure (1/11) • Posterior probability of a word sequence : impossible to estimate in a precise manner, so approximation methods must be adopted (the reconstructed formula follows)
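The formula lost from this slide is presumably the standard Bayes decomposition:

```latex
P(W \mid X) = \frac{P(X \mid W)\, P(W)}{P(X)}
            = \frac{P(X \mid W)\, P(W)}{\sum_{W'} P(X \mid W')\, P(W')}
```

The denominator sums over all possible word sequences W', which is what makes exact estimation impossible; N-best lists and word graphs restrict the sum to a tractable set.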

  22. Posterior probability based confidence measure (2/11) • Word graph based approximation : the sum over all word sequences is restricted to the paths through a word graph [Figure: a word graph with arcs such as 三名, 候選人, 有, 沒有, 通過, 建國 and 靜音 (silence)]

  23. Posterior probability based confidence measure (3/11) • Posterior probability of a word arc : • Some issues are addressed and the word posterior probability is generalized • Reduced search space • Relaxed time registration • Optimal acoustic and language model weights
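A minimal statement of the word-graph forward/backward computation, in the spirit of [6] (the notation is assumed, since the slide's equation was lost):

```latex
% Posterior of a word arc [w; s, t] (word w spanning frames s..t):
P([w; s, t] \mid X) \approx
  \frac{\alpha([w; s, t]) \; \beta([w; s, t])}{P(X)}
```

Here alpha accumulates the acoustic and language-model scores of all word-graph paths from the start node up to and including the arc, beta accumulates the scores of all paths from the arc's end node to the final node, and P(X) is approximated by the sum over all complete paths through the graph.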

  24. Posterior probability based confidence measure (4/11) • Posterior probability of a word arc [6] [Figure, repeated with incremental highlighting across slides 24-27: the word graph from above, presumably stepping through the forward/backward computation for a word arc]

  28. Posterior probability based confidence measure (8/11) • The drawback of the above methods : all need an additional pass. • In [8], the “local word confidence measure” is proposed [Figure: the word 今天 repeated in several competing hypotheses]

  29. Posterior probability based confidence measure (8/11) • Local word confidence measure (cont) [Figure: a local computation window annotated “bigram applied” on either side and “forward/backward” in between]

  30. Posterior probability based confidence measure (9/11) • Impact of word graph density on the quality of posterior probability [9] [Table: only the baseline entries, 27.3 and 15.4, survived extraction]

  31. Posterior probability based confidence measure (10/11) • Experiments [6]

  32. Explicit model based confidence measure (1/10) • The CM problem is formulated as a statistical hypothesis testing problem. • Under the framework of binary hypothesis testing, there are two complementary hypotheses • We test the null hypothesis H0 (the recognized word is correct) against the alternative H1 (it is not), as sketched below
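A minimal sketch of the likelihood ratio test, with notation assumed: under H0 the acoustics X were generated by the hypothesized word's model, under H1 by an alternative model.

```latex
\mathrm{LRT}(X) = \frac{P(X \mid H_0)}{P(X \mid H_1)}
               = \frac{P(X \mid \lambda_W)}{P(X \mid \bar{\lambda}_W)}
\;\; \begin{cases} \geq \tau & \text{accept } H_0 \\ < \tau & \text{reject } H_0 \end{cases}
```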

  33. Explicit model based confidence measure (3/10) • The above LRT score can be transformed to a CM based on a monotonic 1-1 mapping function. • The major difficulty with LRT is how to model the alternative hypothesis. • In practice, the same HMM structure is adopted to model the alternative hypothesis. • A discriminative training procedure plays a crucial role in improving modeling performance.

  34. Explicit model based confidence measure (3/10) • Two-pass procedure : 天氣 很好 今天

  35. Explicit model based confidence measure (4/10) • One-pass procedure 天氣 很好 今天

  36. Explicit model based confidence measure (5/10) • How to calculate the confidence of a recognized word?

  37. Explicit model based confidence measure (6/10) • How to calculate the confidence of a recognized word (cont)?
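The equations on slides 36 and 37 did not survive extraction; a typical utterance-verification formulation (an assumption, not necessarily what [10] uses) averages duration-normalized subword log-likelihood ratios into a word score:

```latex
\mathrm{CM}(w) = \frac{1}{N_w} \sum_{n=1}^{N_w}
  \frac{1}{t_n} \log \frac{P(X_n \mid \lambda_n)}{P(X_n \mid \bar{\lambda}_n)}
```

where the word w is segmented into N_w subword units, X_n is the acoustics of the n-th unit, t_n its duration in frames, and lambda_n, bar-lambda_n its target and alternative (anti) models.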

  38. Explicit model based confidence measure (7/10) • Discriminative training [10] • The goal of the training procedure is to increase the average value of the verification score for correct hypotheses and to decrease it for false acceptances.
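One common way to make this concrete, assumed here in the spirit of minimum-verification-error training ([10]'s exact loss may differ): pass the verification score s(X; Lambda) through a sigmoid and descend the gradient of the smoothed error count.

```latex
\ell(s) = \frac{1}{1 + e^{-\gamma (s - \theta)}}, \qquad
\Lambda \leftarrow \Lambda - \epsilon \, \nabla_{\Lambda}
  \Big[ \sum_{X \in \mathrm{correct}} \big(1 - \ell(s(X;\Lambda))\big)
      + \sum_{X \in \mathrm{incorrect}} \ell(s(X;\Lambda)) \Big]
```

Minimizing this pushes scores of correct hypotheses up and scores of would-be false acceptances down, which is exactly the stated goal.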

  39. Explicit model based confidence measure (8/10) • Discriminative training (cont)

  40. Explicit model based confidence measure (9/10) Why does discriminative training work?

  41. Explicit model based confidence measure (10/10) • Experiments [10] • The task is referred to as the “movie locator”.

  42. Incorporation of high-level information for CM

  43. Incorporation of high-level information for CM (1/4) • LSA [Figure: singular value decomposition of the word-by-document matrix A, with left singular matrix U] • The key property of LSA is that words whose vectors are close to each other are semantically similar. • These similarities can be used to provide an estimate of the likelihood of the words co-occurring within the same utterance.

  44. Incorporation of high-level information for CM (2/4) • LSA (cont) • The entries of the matrix and the confidence of a recognized word are computed as sketched below
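The slide's equations were lost; what follows is a hypothetical sketch of one plausible instantiation (the matrix weighting and similarity details are assumptions): map each word to a row of U*S from a truncated SVD of the word-by-document matrix, and score a recognized word by its average cosine similarity to the other words in the same utterance.

```python
# Hypothetical LSA-based word confidence (matrix weighting and similarity
# details are assumptions, not taken from the slides).
import numpy as np

def lsa_word_vectors(word_doc_matrix, k):
    # Truncated SVD: keep the k largest singular values/vectors.
    U, S, _ = np.linalg.svd(word_doc_matrix, full_matrices=False)
    return U[:, :k] * S[:k]          # one k-dimensional vector per word

def lsa_confidence(word_idx, utterance_idxs, vecs):
    # Average cosine similarity of the word with the other recognized words.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    others = [i for i in utterance_idxs if i != word_idx]
    return sum(cos(vecs[word_idx], vecs[i]) for i in others) / len(others)
```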

  45. Incorporation of high-level information for CM (3/4) • Inter-word mutual information :
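The lost equation is presumably the standard pointwise mutual information between co-occurring words:

```latex
\mathrm{MI}(w_i, w_j) = \log \frac{P(w_i, w_j)}{P(w_i)\, P(w_j)}
```

where P(w_i, w_j) is the probability that the two words co-occur (e.g., within the same utterance or document). A recognized word with low average MI against its neighbors fits its semantic context poorly, suggesting low confidence.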

  46. Incorporation of high-level information for CM (4/4) • Experiments [14]

  47. The application of CM to improve speech recognition

  48. The application of CM to improve speech recognition (1/10) • Statistical decision theory aims at minimizing the expected cost of making an error [Figure: the word graph from the earlier example, with arcs such as 三名, 候選人, 沒有, 通過 and 靜音 (silence)]
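The underlying formula, presumably what the slide displayed, is the Bayes decision rule:

```latex
W^{*} = \arg\min_{W} \sum_{W'} L(W, W')\, P(W' \mid X)
```

With a 0/1 sentence-level loss L this reduces to ordinary MAP decoding; with a word-level loss it favors, at each position, the word with the highest posterior probability, which is where word confidence measures enter the decoding objective.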

  49. The application of CM to improve speech recognition (2/10) • Method 1 [16]:

  50. The application of CM to improve speech recognition (3/10) • Method 2 [18] :
