
Stochastic Vector Mapping

This study explores the use of Stochastic Vector Mapping (SVM) to compensate for irrelevant variability in speech recognition. The effectiveness of the SVM framework relies on the correctness of its underlying assumptions and on the ability to estimate correction vectors, either from stereo data or through ML-based joint design. Various SVM functions and CDHMMs are considered, along with their estimation procedures.


Presentation Transcript


  1. Stochastic Vector Mapping (ShihHsiang, 2006)

  2. References • [ICSLP 2004] A Study of Minimum Classification Error Training for Segmental Switching Linear Gaussian Hidden Markov Models • [ICASSP 2005] An Environment Compensated Maximum Likelihood Training Approach Based on Stochastic Vector Mapping • [ICSLP 2006] A Maximum Likelihood Training Approach to Irrelevant Variability Compensation Based on Piecewise Linear Transformations • [SAP 2006] An Environment-Compensated Minimum Classification Error Training Approach Based on Stochastic Vector Mapping • [CSL 1998] Maximum Likelihood Linear Transformations for HMM-based Speech Recognition

  3. Introduction • Stochastic Vector Mapping (S.V.M.) • frame-dependent bias removal to compensate for "environmental" variabilities in both the training and recognition stages • "corrupted" speech features are mapped onto "clean" speech features by a simple transformation (following the notation of SPLICE) • The success of such a framework relies on at least the correctness of the following assumptions: • the mismatch between clean and noisy data in the feature domain can be compensated by the assumed stochastic vector mapping • the residual distortion after feature compensation can be modeled and absorbed by the collectively trained HMMs

  4. Estimating Correction Vectors Using Stereo Data: SPLICE • Given a set of training data $Y = \{y_t\}$, assume that the D-dimensional feature vector $y$ under an environment class $e$ follows the distribution of a mixture of Gaussian probability density functions (PDF): $p(y|e) = \sum_{k=1}^{K} p(k|e)\,\mathcal{N}(y;\, \mu_k^{(e)}, \Sigma_k^{(e)})$ • The mapping function can then be defined as $\hat{x} = F(y) = y + \sum_{k=1}^{K} P(k|y,e)\, r_k^{(e)}$, where $y_1, \dots, y_T$ is a sequence of feature vectors of noisy speech and the $r_k^{(e)}$ are correction vectors
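As a concrete illustration, here is a minimal NumPy sketch of the posterior-weighted mapping above. The helper names (gmm_posteriors, splice_map) and the diagonal-covariance assumption are mine, not from the slides:

```python
import numpy as np

def gmm_posteriors(y, weights, means, covs):
    """P(k | y, e) under the environment GMM (diagonal covariances assumed)."""
    diff = y - means                                   # (K, D)
    log_like = -0.5 * (np.sum(diff**2 / covs, axis=1)
                       + np.sum(np.log(2 * np.pi * covs), axis=1))
    log_post = np.log(weights) + log_like
    log_post -= np.max(log_post)                       # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

def splice_map(y, weights, means, covs, corrections):
    """x_hat = y + sum_k P(k|y,e) * r_k  (posterior-weighted bias removal)."""
    post = gmm_posteriors(y, weights, means, covs)     # (K,)
    return y + post @ corrections                      # corrections: (K, D)
```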

  5. Estimating Correction Vectors Using Stereo Data: SPLICE (cont.) • If stereo data for both clean and noisy channels are available and used to estimate the correction vectors, this becomes the SPLICE approach. For example, suppose $\{(x_t, y_t)\}_{t \in I_e}$ denotes the set of corresponding pairs of clean and noisy speech feature vectors recorded under a particular environment class $e$, where $I_e$ denotes the set of training utterances belonging to environment class $e$. The correction vectors can then be trained under an ML criterion as $r_k^{(e)} = \frac{\sum_{t \in I_e} P(k|y_t,e)\,(x_t - y_t)}{\sum_{t \in I_e} P(k|y_t,e)}$ • However, in many ASR applications, stereo data are too expensive to collect
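A sketch of this stereo-data estimator, reusing gmm_posteriors from the block above; stereo_pairs is a hypothetical list of (clean, noisy) frame pairs for one environment class:

```python
import numpy as np

def estimate_corrections(stereo_pairs, weights, means, covs, K, D):
    """ML estimate of the correction vectors from stereo data:
    r_k = sum_t P(k|y_t,e)(x_t - y_t) / sum_t P(k|y_t,e)."""
    num = np.zeros((K, D))
    den = np.zeros(K)
    for x, y in stereo_pairs:                          # clean x, noisy y
        post = gmm_posteriors(y, weights, means, covs)
        num += post[:, None] * (x - y)
        den += post
    return num / den[:, None]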

  6. JOINT DESIGN OF STOCHASTIC VECTOR MAPPING FUNCTION AND CDHMMs WITHOUT USING STEREO DATA • ML-Based Joint Design Approach • The ML training approach is to maximize, by jointly adjusting the SVM function parameters $\Theta$ and the CDHMM parameters $\Lambda$, the likelihood function of the compensated training data • The SVM function can take one of the following forms (reconstructed here from the cited papers, since the original equation images are lost): (1) $F(y_t) = y_t + \sum_{k=1}^{K} P(k|y_t,e)\, b_k^{(e)}$ (posterior-weighted biases); (2) $F(y_t) = y_t + b_{k^*}^{(e)}$ with $k^* = \arg\max_k P(k|y_t,e)$ (hard assignment); (3)-(4) piecewise linear transformations of the Constrained MLLR (CMLLR) type, $F(y_t) = A_{k^*}^{(e)} y_t + b_{k^*}^{(e)}$; (5) a simplified form with a single shared matrix per environment, $F(y_t) = A^{(e)} y_t + b_{k^*}^{(e)}$
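A sketch of forms (2) and (5) as reconstructed above, again reusing gmm_posteriors; treat the exact functional forms as my reading of the cited papers rather than a verbatim transcription:

```python
import numpy as np

def svm2_map(y, weights, means, covs, corrections):
    """Form (2): hard assignment to the best Gaussian, x_hat = y + b_{k*}."""
    k_star = np.argmax(gmm_posteriors(y, weights, means, covs))
    return y + corrections[k_star]

def svm5_map(y, A, weights, means, covs, corrections):
    """Form (5), as reconstructed here: one shared matrix A per environment
    plus a component-dependent bias, x_hat = A y + b_{k*}."""
    k_star = np.argmax(gmm_posteriors(y, weights, means, covs))
    return A @ y + corrections[k_star]
```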

  7. Constrained MLLR (CMLLR) • Unconstrained model-space transformation: means and variances are transformed separately, $\hat{\mu} = A\mu + b$ and $\hat{\Sigma} = L H L^{\top}$, where $L$ is the Cholesky factor of $\Sigma$ • Constrained model-space transformation (MLLT?): the transformation applied to the variance must correspond to the transform applied to the means, i.e. $\hat{\mu} = A\mu + b$ and $\hat{\Sigma} = A \Sigma A^{\top}$, which is equivalent to a transformation applied in feature space
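The model-space/feature-space equivalence of the constrained transform can be checked numerically. This small sketch (my own demo, not from the slides) verifies that transforming the feature with (A, b) plus a Jacobian term equals evaluating the untransformed feature under the transformed model:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Claim: log N(A y + b; mu, Sigma) + log|det A|
#      = log N(y; A^{-1}(mu - b), A^{-1} Sigma A^{-T})
rng = np.random.default_rng(0)
D = 3
A = rng.normal(size=(D, D)) + 3 * np.eye(D)    # well-conditioned transform
b = rng.normal(size=D)
mu, y = rng.normal(size=D), rng.normal(size=D)
Sigma = np.eye(D)

lhs = (multivariate_normal.logpdf(A @ y + b, mu, Sigma)
       + np.log(abs(np.linalg.det(A))))
A_inv = np.linalg.inv(A)
rhs = multivariate_normal.logpdf(y, A_inv @ (mu - b), A_inv @ Sigma @ A_inv.T)
print(np.isclose(lhs, rhs))                    # True: the two views coincide
```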

  8. JOINT DESIGN OF STOCHASTIC VECTOR MAPPING FUNCTION AND CDHMMs WITHOUT USING STEREO DATA (cont.)

  9. JOINT DESIGN OF STOCHASTIC VECTOR MAPPING FUNCTION AND CDHMMs WITHOUT USING STEREO DATA (cont.) • The detailed procedure is as follows. Step 1: Initialization. First, a set of CDHMMs with diagonal covariance matrices is trained from multi-condition training data as initial values; the initial values of the bias vectors are set to zero. Step 2: Estimating SVM Function Parameters. Second, given the HMM parameters, for each environment class e, N_b EM iterations are performed to estimate the environment-dependent mapping function parameters so as to increase the likelihood function

  10. JOINT DESIGN OF STOCHASTIC VECTOR MAPPING FUNCTION AND CDHMMs WITHOUT USING STEREO DATA (cont.) • Consider a particular environment class e. If the SVM function in Eq. (1) is used for feature compensation, the auxiliary Q-function for the bias vectors becomes $Q_e(\{b_k^{(e)}\}) = \sum_t \sum_{s,m} \gamma_t(s,m)\, \log \mathcal{N}(F(y_t);\, \mu_{sm}, \Sigma_{sm})$, where $\gamma_t(s,m)$ is the occupation probability of Gaussian component m in state s at time t. The update is obtained by setting the derivative of $Q_e$ w.r.t. $b_k^{(e)}$ to zero. (Solve SVM function 1)
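A sketch of evaluating this Q-function given precomputed occupancies; occupancies[t] is a hypothetical flattened array of gamma_t(s, m) over all (state, mixture) pairs, and map_fn is any of the mapping functions above:

```python
import numpy as np

def q_function(utt_feats, occupancies, hmm_means, hmm_vars, map_fn):
    """Q_e = sum_t sum_{s,m} gamma_t(s,m) log N(F(y_t); mu_sm, Sigma_sm),
    with diagonal covariances and (s,m) flattened into one axis."""
    q = 0.0
    for t, y in enumerate(utt_feats):
        x = map_fn(y)                                  # compensated feature
        diff = x - hmm_means                           # (S*M, D)
        ll = -0.5 * (np.sum(diff**2 / hmm_vars, axis=1)
                     + np.sum(np.log(2 * np.pi * hmm_vars), axis=1))
        q += occupancies[t] @ ll                       # gamma-weighted sum
    return q
```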

  11. JOINT DESIGN OF STOCHASTIC VECTOR MAPPING FUNCTION AND CDHMMs WITHOUT USING STEREO DATA (cont.) • Since the above equation holds for all k, it is equivalent to solving for the root of the bias vectors in a linear system $G^{(e)} b^{(e)} = z^{(e)}$, where $G^{(e)}$ is a K x K matrix whose (k, k')-th element is $\sum_t P(k|y_t,e)\, P(k'|y_t,e) \sum_{s,m} \gamma_t(s,m)\, \sigma_{sm,d}^{-2}$ (one system per feature dimension d for diagonal covariances, as reconstructed here) and $z^{(e)}$ is a K-dimensional vector • The estimation requires an inverse operation on the K x K matrix. (Solve SVM function 1)
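A sketch of assembling and solving this system per feature dimension, under the diagonal-covariance reconstruction above; posts[t] holds P(k|y_t, e) and occ[t] the flattened occupancies, both hypothetical precomputed inputs:

```python
import numpy as np

def solve_svm1_biases(feats, posts, occ, hmm_means, hmm_vars, K, D):
    """Solve the per-dimension K x K system G b = z arising from
    dQ/db_k = 0 for the posterior-weighted mapping (form 1)."""
    b = np.zeros((K, D))
    for d in range(D):
        G = np.zeros((K, K))
        z = np.zeros(K)
        for t, y in enumerate(feats):
            w = occ[t] / hmm_vars[:, d]                # gamma / sigma^2, (S*M,)
            G += np.outer(posts[t], posts[t]) * w.sum()
            z += posts[t] * (w @ (hmm_means[:, d] - y[d]))
        b[:, d] = np.linalg.solve(G, z)                # one K x K solve per dim
    return b
```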

  12. JOINT DESIGN OF STOCHASTIC VECTOR MAPPING FUNCTION AND CDHMMs WITHOUT USING STEREO DATA (cont.) • If the SVM function in Eq. (2) is used for feature compensation, the EM updating formula for the biases can be derived similarly, with a much simpler result: $b_k^{(e)} = \Big(\sum_{t:\,k^*(t)=k} \sum_{s,m} \gamma_t(s,m)\, \Sigma_{sm}^{-1}\Big)^{-1} \sum_{t:\,k^*(t)=k} \sum_{s,m} \gamma_t(s,m)\, \Sigma_{sm}^{-1} (\mu_{sm} - y_t)$ (Solve SVM function 2)
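Because frames are hard-assigned to one component under form (2), each bias reduces to a precision-weighted mean residual with no matrix system to solve. A sketch, with k_star a hypothetical per-frame array of winning component indices:

```python
import numpy as np

def update_svm2_biases(feats, k_star, occ, hmm_means, hmm_vars, K, D):
    """Closed-form bias update for the hard-assignment mapping (form 2)."""
    num = np.zeros((K, D))
    den = np.zeros((K, D))
    for t, y in enumerate(feats):
        w = occ[t][:, None] / hmm_vars                 # gamma / sigma^2, (S*M, D)
        num[k_star[t]] += (w * (hmm_means - y)).sum(axis=0)
        den[k_star[t]] += w.sum(axis=0)
    return num / den
```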

  13. JOINT DESIGN OF STOCHASTIC VECTOR MAPPING FUNCTION AND CDHMMs WITHOUT USING STEREO DATA (cont.) • If the SVM function in Eq. (5) is used for feature compensation, the auxiliary Q-function involves both $A^{(e)}$ and the bias vectors. For simplicity, a two-stage iterative procedure is used to increase the Q-function: first, update $A^{(e)}$ while keeping the biases fixed; second, update the biases by using the feature vectors transformed by $A^{(e)}$ only. Step 2.1: Estimating $A^{(e)}$. By setting the derivative of the Q-function w.r.t. the r-th row of $A^{(e)}$ to zero, the updating formula $a_r = (\alpha\, p_r + k_r)\, G_r^{-1}$ can be derived, where $p_r$ is the cofactor row vector of $A^{(e)}$. (Solve SVM function 5)

  14. JOINT DESIGN OF STOCHASTIC VECTOR MAPPING FUNCTION AND CDHMMs WITHOUT USING STEREO DATA (cont.) • The value of $\alpha$ is selected as the root of the resulting quadratic equation that maximizes the Q-function. (Solve SVM function 5)
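The row update and the choice of alpha can be sketched as follows. This mirrors the well-known CMLLR row optimization of the [CSL 1998] reference, with my own variable names; G and k are the accumulated row statistics, cof is the cofactor row vector, and beta is the total occupancy count:

```python
import numpy as np

def update_row(G, k, cof, beta):
    """One row of the transform: stationary points of Q satisfy a quadratic
    in alpha; evaluate the row objective at both roots and keep the better."""
    G_inv = np.linalg.inv(G)
    m2 = cof @ G_inv @ cof
    m1 = cof @ G_inv @ k
    disc = np.sqrt(m1**2 + 4 * m2 * beta)              # real for beta, m2 > 0
    best_w, best_q = None, -np.inf
    for alpha in ((-m1 + disc) / (2 * m2), (-m1 - disc) / (2 * m2)):
        w = (alpha * cof + k) @ G_inv                  # candidate row a_r
        q = beta * np.log(abs(cof @ w)) - 0.5 * w @ G @ w + w @ k
        if q > best_q:
            best_w, best_q = w, q
    return best_w
```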

  15. JOINT DESIGN OF STOCHASTIC VECTOR MAPPING FUNCTION AND CDHMMs WITHOUT USING STEREO DATA (cont.) Step 2.2: Estimating the bias vectors. After estimating $A^{(e)}$, we transform each feature vector $y_t$ to $y'_t = A^{(e)} y_t$ by using the updated $A^{(e)}$. The bias vectors can then be estimated from the compensated feature vectors as in the bias-only case. (Solve SVM function 5)
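In code this step is just the earlier closed-form bias update applied to the A-transformed features; a short usage sketch reusing the hypothetical update_svm2_biases from above:

```python
# Step 2.2: transform the frames with the updated A, then re-estimate the
# biases with the same closed-form update as in the bias-only case.
feats_A = feats @ A.T                      # y'_t = A y_t for every frame
b = update_svm2_biases(feats_A, k_star, occ, hmm_means, hmm_vars, K, D)
```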

  16. JOINT DESIGN OF STOCHASTIC VECTOR MAPPING FUNCTION AND CDHMMs WITHOUT USING STEREO DATA (cont.) Step 3: Estimating CDHMM Parameters. Third, we transform each training utterance using the relevant mapping function. Using the environment-compensated utterances, N_h EM iterations are performed to re-estimate the CDHMM parameters, with an increase of the likelihood function. Step 4: Repeat Step 2 and Step 3 N_e times
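Putting Steps 1-4 together, here is a skeleton of the joint training loop. The four callables are implementation hooks for the routines the slides describe (all names here are placeholders, not from the papers), so the structure runs with any concrete implementations plugged in:

```python
def joint_train(train_utts, envs, train_hmm, em_update_mapping,
                apply_mapping, reestimate_hmm, n_e=3, n_b=2, n_h=2):
    """train_utts: list of (env_label, features) pairs."""
    hmm = train_hmm([f for _, f in train_utts])   # Step 1: multi-condition init
    maps = {e: None for e in envs}                # None = zero-bias initial map
    for _ in range(n_e):                          # Step 4: repeat N_e times
        for e in envs:                            # Step 2: N_b EM updates / env
            utts_e = [f for env, f in train_utts if env == e]
            for _ in range(n_b):
                maps[e] = em_update_mapping(maps[e], utts_e, hmm)
        compensated = [apply_mapping(maps[env], f) for env, f in train_utts]
        for _ in range(n_h):                      # Step 3: N_h HMM EM iterations
            hmm = reestimate_hmm(hmm, compensated)
    return hmm, maps
```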

  17. How to Train an Environment-Specific GMM
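The figure on this slide is not recoverable, so the following is only one plausible recipe (an assumption on my part): cluster utterance-level statistics into environment classes, then fit one diagonal GMM per class on its pooled frames, matching the 8-class / 32-component setup mentioned in the experiments:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_env_gmms(utterances, n_envs=8, n_components=32, seed=0):
    """utterances: list of (T_i, D) frame arrays. Returns per-utterance
    environment labels and one GMM per environment class."""
    utt_means = np.stack([u.mean(axis=0) for u in utterances])
    env_labels = GaussianMixture(n_envs, random_state=seed).fit_predict(utt_means)
    gmms = {}
    for e in range(n_envs):
        frames = np.concatenate([u for u, l in zip(utterances, env_labels) if l == e])
        gmms[e] = GaussianMixture(n_components, covariance_type="diag",
                                  random_state=seed).fit(frames)
    return env_labels, gmms
```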

  18. Experiment Setup • The Aurora 3 database is used to verify the algorithms • It contains utterances of connected digits in four European languages, namely Finnish, Spanish, German, and Danish • All utterances were recorded using both close-talking (CT) and hands-free (HF) microphones in cars under several driving conditions • In the SVM experiments, all training data are clustered into 8 environment classes, each modeled by a GMM consisting of 32 Gaussian components

  19. A Comparison of Recognition Performance • No big difference is made by the different forms SVM1 and SVM2 in terms of final recognition performance for ML training • Updating the SVM function parameters in Step 2 of the ML training procedure is harmful • Multiple iterations for updating the HMM parameters in Step 3 are helpful • Table: word error rate (in %) of ML/MCE-trained models based on stochastic vector mapping with different mapping functions (SVM1 vs. SVM2) and iteration numbers, under the WM condition of the Finnish language

  20. A Comparison of Recognition Performance (cont.) • Although these results demonstrate the usefulness of the SVM-based approaches for several robust ASR applications where diversified yet representative training data are available • the performance improvement of SVM-based approaches is less significant when there is a severe mismatch between training and testing conditions • In order to improve the performance further, one possibility is to perform unsupervised online adaptation of the SVM function parameters

  21. MCE-Based Joint Design Approach • Objective function: the smoothed loss accumulated over the training set • Loss function: a sigmoid smoothing of the 0-1 loss, $\ell(d) = 1 / (1 + e^{-\gamma d + \theta})$ • Misclassification measure: $d(X) = -g_c(X) + G(X)$, where the discriminant function for recognition decision-making, $g_c$, is the log-likelihood of the current enhanced feature vector sequence under the current HMM parameters for word string $Z_c$, and the anti-discriminant function $G$ is a soft-max of the log-likelihoods of the current enhanced feature vector sequence against the competitive word strings
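A minimal sketch of this loss, assuming string-level log-likelihood scores are already available; eta, gamma, and theta are the usual MCE smoothing constants (values here are placeholders):

```python
import numpy as np

def mce_loss(g_correct, g_competitors, eta=2.0, gamma=1.0, theta=0.0):
    """Sigmoid-smoothed MCE loss for one utterance."""
    g = np.asarray(g_competitors)
    anti = (1.0 / eta) * np.log(np.mean(np.exp(eta * g)))  # soft-max of rivals
    d = -g_correct + anti                                  # misclassification
    return 1.0 / (1.0 + np.exp(-gamma * d + theta))        # smoothed 0-1 loss
```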

  22. MCE-Based Joint Design Approach (cont.) • Let $\Lambda$ denote generically the parameters to be estimated • In order to find the gradient of the loss, the following chain of partial derivatives is used: $\frac{\partial \ell}{\partial \Lambda} = \frac{\partial \ell}{\partial d} \cdot \frac{\partial d}{\partial \hat{x}_t} \cdot \frac{\partial \hat{x}_t}{\partial \Lambda}$ • Updating the HMM parameters is the same as in conventional MCE training

  23. MCE-Based Joint Design Approach (cont.) • Updating the correction vectors: for each $b_k^{(e)}$, it follows from mapping form (1) that $\frac{\partial \hat{x}_t}{\partial b_k^{(e)}} = P(k|y_t, e)\, I$; therefore the GPD update is $b_k^{(e)} \leftarrow b_k^{(e)} - \epsilon\, \frac{\partial \ell}{\partial b_k^{(e)}}$
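A sketch of that GPD step for the biases; dloss_dx is a hypothetical per-frame gradient of the MCE loss with respect to the compensated feature (obtained from the chain rule terms on the previous slide), and posts holds P(k|y_t, e):

```python
import numpy as np

def gpd_update_biases(b, posts, dloss_dx, lr=1e-3):
    """One GPD step for the correction vectors of mapping form (1):
    dl/db_k = sum_t P(k|y_t,e) * dl/dx_hat_t, since dx_hat_t/db_k = P(k|y_t,e) I."""
    grad = np.zeros_like(b)                       # b: (K, D)
    for p_t, g_t in zip(posts, dloss_dx):         # p_t: (K,), g_t: (D,)
        grad += np.outer(p_t, g_t)
    return b - lr * grad
```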

  24. EXPERIMENTS AND RESULTS • (Figure annotations: with stereo data / without stereo data; 8 GMMs, each having 32 Gaussian components; eight best competitive word strings; 17 GMMs, each having 256 Gaussian components; ML-trained parameters are treated as the initial values of the first iteration) • SPLICE works much better in the condition where stereo training data exist than in unseen conditions • By simply training the CDHMMs by MCE, the word error rate can be further reduced

  25. Our experiment
