Segmental GPD training of HMM based speech recognizer

  1. Segmental GPD training of HMM based speech recognizer Authors: W. Chou, B.H. Juang and C.H. Lee Presenter: Yi-Ning Huang

  2. Outline • Introduction • The system configuration • Segmental GPD training of HMMs • Parameter transformations • Experimental evaluation • Summary and discussion

  3. Introduction • GPD: "generalized probabilistic descent" • In this paper, we propose a segmental training method, segmental GPD training, for speech recognizers based on hidden Markov models (HMMs) and Viterbi decoding.

  4. The main features of our approach can be summarized as follows • The algorithm is based on the principle of minimum recognition error rate in which segmentation and discriminative training are jointly optimized. • The algorithm can be initialized from a given HMM, regardless of whether it has been trained according to other criteria or directly generated from a training set with (non-optimal) uniform segmentation.

  5. The algorithm handles both error and correct recognition cases in a theoretically consistent way, and adaptively adjusts the models to achieve an optimal configuration with the maximum possible separation between confusable classes. • The algorithm can be used either off-line or on-line, with the ability to learn new features from any new training sources. • The algorithm is consistent with the HMM framework and does not require major modification of the current system. Moreover, it is theoretically justified to converge to an (at least local) minimum of the recognition error rate.

  6. The system configuration • The observation probability density function of observing vector x in the j-th state of the i-th word HMM is a Gaussian mixture:

b_j^{(i)}(x) = \sum_{k=1}^{K} c_{jk}^{(i)} \, \mathcal{N}\!\left(x;\, \mu_{jk}^{(i)},\, \Sigma_{jk}^{(i)}\right)

where c_{jk}^{(i)} are the mixture weights.
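
A minimal Python sketch of this mixture density, assuming diagonal covariances (consistent with the per-dimension variances on slide 14); the function names and array shapes here are illustrative:

```python
import numpy as np

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def observation_log_prob(x, means, variances, weights):
    """log b_j(x): Gaussian-mixture observation log-density for one state.

    x         : (D,) observation vector
    means     : (K, D) mixture means
    variances : (K, D) diagonal variances
    weights   : (K,) mixture weights c_jk, summing to one
    """
    log_terms = np.array([np.log(w) + log_gaussian_diag(x, m, v)
                          for w, m, v in zip(weights, means, variances)])
    # log-sum-exp guards against underflow of small component likelihoods
    top = log_terms.max()
    return top + np.log(np.sum(np.exp(log_terms - top)))
```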

  7. The log-likelihood score of the input utterance X along its optimal path in the i-th model λ_i:

g_i(X; \Lambda) = \sum_{t=1}^{T(X)} \left[ \log a^{(i)}_{\bar{\theta}_{t-1}\bar{\theta}_t} + \log b^{(i)}_{\bar{\theta}_t}(x_t) \right]

• \bar{\Theta} = (\bar{\theta}_1, \ldots, \bar{\theta}_{T(X)}): the state sequence along the optimal path • x_t: the observation vector at time t • T(X): the number of frames in the input utterance X • a^{(i)}_{\bar{\theta}_{t-1}\bar{\theta}_t}: the state transition probability from state \bar{\theta}_{t-1} to state \bar{\theta}_t
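
A sketch of this score accumulation, assuming the optimal state sequence has already been produced by Viterbi decoding; state_log_prob could be the mixture density sketched on slide 6, and the initial-state log-probability vector is an assumption about how t = 1 is handled:

```python
import numpy as np

def path_log_likelihood(X, state_seq, log_init, log_trans, state_log_prob):
    """g_i(X): log-likelihood of utterance X along a fixed optimal state path.

    X              : (T, D) observation vectors, T = T(X) frames
    state_seq      : length-T optimal state sequence from Viterbi decoding
    log_init       : (M,) log initial-state probabilities of model i
    log_trans      : (M, M) log state-transition probabilities of model i
    state_log_prob : callable (state, x) -> log b_state(x)
    """
    score = log_init[state_seq[0]] + state_log_prob(state_seq[0], X[0])
    for t in range(1, len(X)):
        score += log_trans[state_seq[t - 1], state_seq[t]]  # transition term
        score += state_log_prob(state_seq[t], X[t])         # emission term
    return score
```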

  8. Define the classification error count function for the i-th class:

e_i(X; \Lambda) = \begin{cases} 1, & X \in C_i \text{ is misrecognized} \\ 0, & \text{otherwise} \end{cases}

The goal of training is to reduce the expected error rate

L(\Lambda) = E\left[ e(X; \Lambda) \right],

and the training result is often measured by the empirical error rate over the N training tokens:

L_N(\Lambda) = \frac{1}{N} \sum_{n=1}^{N} e(X_n; \Lambda)
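
Since e_i is a 0-1 count, the empirical error rate is simply the fraction of misrecognized training tokens. A minimal sketch over precomputed class scores (the array shapes are assumptions):

```python
import numpy as np

def empirical_error_rate(scores, labels):
    """L_N: fraction of training tokens whose true class does not score best.

    scores : (N, W) array, scores[n, i] = g_i(X_n) for each class i
    labels : (N,) true class indices
    """
    predictions = np.argmax(scores, axis=1)
    return float(np.mean(predictions != labels))
```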

  9. Segmental GPD training of HMMs • In segmental GPD training, the loss function is constructed through the following steps: • Define the misclassification measure for each class i:

d_i(X; \Lambda) = -g_i(X; \Lambda) + \log \left[ \frac{1}{W-1} \sum_{j \neq i} \exp\left( \eta \, g_j(X; \Lambda) \right) \right]^{1/\eta}

where η is a positive number and W is the total number of classes.
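
A sketch of this measure in Python; the log-mean-exp reformulation used for numerical stability is an implementation choice, not part of the original formula:

```python
import numpy as np

def misclassification_measure(scores, i, eta):
    """d_i(X): negative when the true class i wins, positive otherwise.

    scores : (W,) log-likelihood scores g_j(X) for all W classes
    i      : index of the true class of X
    eta    : positive constant; as eta grows, the competitor term
             approaches the single best competing score
    """
    competitors = np.delete(scores, i)
    # (1/eta) * log( mean( exp(eta * g_j) ) ), computed stably
    top = competitors.max()
    anti = top + np.log(np.mean(np.exp(eta * (competitors - top)))) / eta
    return -scores[i] + anti
```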

  10. • Define the smoothed loss function for each class:

\ell_i(X; \Lambda) = \frac{1}{1 + \exp\left( -\gamma \, d_i(X; \Lambda) \right)}

• Define the loss function for the entire training population:

\ell(X; \Lambda) = \sum_{i=1}^{W} \ell_i(X; \Lambda) \, \mathbb{1}(X \in C_i)
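
A sketch of the per-class smoothing, assuming the sigmoid form shown above; gamma controls how closely the smooth loss approximates the hard 0-1 error count:

```python
import numpy as np

def smoothed_loss(d, gamma=1.0):
    """Sigmoid loss 1 / (1 + exp(-gamma * d)): a smooth 0-1 error count.

    d     : misclassification measure d_i(X)
    gamma : slope; larger gamma makes the sigmoid approach a hard step
    """
    return 1.0 / (1.0 + np.exp(-gamma * d))
```

The loss is near 0 for confidently correct tokens and near 1 for errors, so minimizing it directly targets the recognition error rate while remaining differentiable.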

  11. The generalized probabilistic descent (GPD) algorithm adjusts the model parameters Λ recursively according to

\Lambda_{n+1} = \Lambda_n - \epsilon_n U_n \nabla \ell(X_n; \Lambda_n)

where \epsilon_n is a sequence of small positive step sizes and U_n is a positive-definite matrix.
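
A sketch of a single update step; the step-size schedule for eps_n and the choice of U_n are left to the caller, since the rule only requires eps_n to be small and positive and U_n to be positive definite:

```python
import numpy as np

def gpd_step(params, grad, step_size, U=None):
    """One GPD update: Lambda_{n+1} = Lambda_n - eps_n * U_n * grad.

    params    : flat vector of (transformed) HMM parameters, Lambda_n
    grad      : gradient of the token loss l(X_n; Lambda) at params
    step_size : eps_n, a small positive step size shrinking over iterations
    U         : optional positive-definite scaling matrix U_n (identity if None)
    """
    if U is None:
        return params - step_size * grad
    return params - step_size * (U @ grad)
```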

  12. Parameter transformations • In segmental GPD training, the HMM parameters are adaptively adjusted according to the update rule on slide 11. • A diagram of this training procedure is illustrated in Figure 1.

  13. Figure 1: Diagram of segmental GPD training

  14. The following transformations are used in our approach • Logarithm of the variance:

\tilde{\sigma}^{(i)}_{jkd} = \log \sigma^{(i)\,2}_{jkd}

where \sigma^{(i)\,2}_{jkd} is the variance of the i-th word, j-th state, k-th mixture and d-th dimension. • Transformed logarithm of the mixture weights:

c^{(i)}_{jk} = \frac{\exp\left( \tilde{c}^{(i)}_{jk} \right)}{\sum_{l=1}^{L} \exp\left( \tilde{c}^{(i)}_{jl} \right)}

where L is the total number of mixture weights in the j-th state of the i-th word model.

  15. • Transformed logarithm of the transition probability:

a^{(i)}_{jk} = \frac{\exp\left( \tilde{a}^{(i)}_{jk} \right)}{\sum_{l=1}^{M} \exp\left( \tilde{a}^{(i)}_{jl} \right)}

where M is the total number of states in the i-th word model.
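
The point of these transformations is that the gradient updates of slide 11 can run in an unconstrained space, while the recovered parameters automatically satisfy positivity and sum-to-one constraints. A minimal sketch of the inverse mapping (names and shapes illustrative):

```python
import numpy as np

def transformed_to_original(log_var, c_tilde, a_tilde):
    """Map unconstrained (transformed) parameters back to valid HMM parameters.

    log_var : log variances; exponentiating keeps every variance positive
    c_tilde : (L,) transformed mixture weights for one state
    a_tilde : (M, M) transformed transition scores for one word model
    """
    var = np.exp(log_var)                               # variances > 0
    weights = np.exp(c_tilde) / np.exp(c_tilde).sum()   # weights sum to 1
    trans = np.exp(a_tilde) / np.exp(a_tilde).sum(axis=1, keepdims=True)
    return var, weights, trans
```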

  16. Experimental evaluation • First experiment • The English E-set (b, c, d, e, g, p, t, v, z) • 50 male and 50 female speakers (all native speakers of American English) • Recorded through local dial-up telephone lines • 10-state, 5-mixture models: baseline accuracy of 76% on the testing set and 89% on the training set; after 10 iterations of segmental GPD training, 88.3% on the testing set and 99.6% on the training set • 15-state, 3-mixture models: baseline accuracy of 73.3% on the testing set and 86.3% on the training set; after training, 88.7% on the testing set and 100% on the training set

  17. Figure 2: Recognition curve of segmental GPD training (88.7% on testing data set)

  18. Second experiment • TI database of connected digit utterances • Each string has a random length of 1 to 7 digits • Recorded from speakers in various regions of the U.S. • 8565 strings for training, 8578 strings for testing • 10-state, 64-mixture models

  19. Table 1: Recognition results on the TI database

  20. Summary and discussion • We demonstrated the effectiveness of the proposed training algorithm in isolated word and connected digit recognition applications.
