Discriminative Training in Speech Processing

Presentation Transcript


  1. Discriminative Training in Speech Processing Filipp Korkmazsky LORIA

  2. Content • Bayes Decision Theory and Discriminative Training • Minimum Classification Error (MCE) Training • Generalized Probabilistic Descent (GPD) Algorithm • MCE Training versus Maximum Mutual Information (MMI) Training • Discriminative Training for Speech Recognition • Discriminative Training for Speaker Verification

  3. Discriminative Training for Feature Extraction • Discriminative Training of Language Models • Discriminative Training for Speech/Music Classification • Conclusions

  4. Bayes Decision Theory and Discriminative Training • Main assumption of Bayes decision theory: the joint probability functions $P(X, C_i)$ are known, where $X$ is an observation and $C_i$, $i = 1, \ldots, M$, are the class labels. • Decision cost function: $\varepsilon_{ij}$ – the cost of deciding class $C_i$ when the true class is $C_j$ (1)

  5. Conditional risk of deciding class $C_i$ for the observation $X$: $R(C_i \mid X) = \sum_{j=1}^{M} \varepsilon_{ij} \, P(C_j \mid X)$ (2), with the posteriors given by Bayes rule $P(C_j \mid X) = p(X \mid C_j) P(C_j) / p(X)$ (3). For the zero-one cost $\varepsilon_{ij} = 1 - \delta_{ij}$ (4) the conditional risk becomes $R(C_i \mid X) = 1 - P(C_i \mid X)$ (5). MAP decision: $C(X) = \arg\max_i P(C_i \mid X) = \arg\max_i p(X \mid C_i) P(C_i)$ (6)
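A minimal sketch (not from the slides) of the MAP decision rule (6), using assumed Gaussian class-conditional densities in place of the unknown true distributions:

```python
import numpy as np
from scipy.stats import multivariate_normal

def map_decide(x, class_models, priors):
    """Pick the class maximizing p(x | C_i) * P(C_i), i.e. the MAP rule (6)."""
    scores = [m.pdf(x) * p for m, p in zip(class_models, priors)]
    return int(np.argmax(scores))

# Two illustrative classes with made-up Gaussian parameters (unit covariance).
models = [multivariate_normal(mean=[0.0, 0.0]), multivariate_normal(mean=[2.0, 1.0])]
priors = [0.5, 0.5]
print(map_decide(np.array([1.8, 0.9]), models, priors))  # prints 1 (second class)
```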

  6. Why is the MAP decision not optimal for real speech data? • The probability distribution of speech data is usually unknown, and a postulated HMM approximation of this distribution does not yield a MAP-optimal solution. • Even if an HMM were the correct distribution for speech, the lack of training data often does not allow accurate modeling of the probability distributions of competing speech classes near their boundaries.

  7. [Figure: real versus postulated distributions for Class I and Class II, showing the mismatch between the true class boundaries and the boundaries of the postulated models]

  8. Discriminant Functions • Discriminant functions $g_i(X; \Lambda)$, $i = 1, \ldots, M$: the classifier decides $C(X) = C_i$ if $g_i(X; \Lambda) = \max_j g_j(X; \Lambda)$. • Classification error for an $X$ from class $C_i$: $e_i(X) = 1$ if $g_i(X; \Lambda) < \max_{j \ne i} g_j(X; \Lambda)$, and $e_i(X) = 0$ otherwise.

  9. Minimum Classification Error (MCE) Training • Misclassification measure for an $X$ from class $C_i$: $d_i(X; \Lambda) = -g_i(X; \Lambda) + \log \Big[ \frac{1}{M-1} \sum_{j \ne i} \exp\big(\eta \, g_j(X; \Lambda)\big) \Big]^{1/\eta}$ • Smoothed loss $\ell_i(X; \Lambda) = \big(1 + e^{-\gamma d_i(X; \Lambda)}\big)^{-1}$ – a classification error for $X$, approximated by a sigmoid.
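An illustrative sketch of the misclassification measure and sigmoid loss above, for one training token with per-class log-likelihood scores; the function and variable names are assumed, not taken from the slides:

```python
import numpy as np

def mce_loss(g, true_class, eta=1.0, gamma=1.0):
    """Sigmoid MCE loss for one token.

    g          : per-class discriminant (log-likelihood) scores
    true_class : index of the correct class
    eta        : smoothing of the competing-class average
    gamma      : slope of the sigmoid
    """
    g = np.asarray(g, dtype=float)
    rivals = np.delete(g, true_class)
    # Smoothed "best competitor" score: a log-sum-exp average of the rivals.
    g_comp = (1.0 / eta) * np.log(np.mean(np.exp(eta * rivals)))
    d = -g[true_class] + g_comp                # misclassification measure d_i(X)
    return 1.0 / (1.0 + np.exp(-gamma * d))    # sigmoid loss in (0, 1)

print(mce_loss([-12.0, -15.5, -14.2], true_class=0))  # well classified -> loss near 0
```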

  10. [Figure: the sigmoid loss plotted against the misclassification measure $d$, rising from 0 through 0.5 at $d = 0$ toward 1]

  11. Generalized Probabilistic Descent (GPD) Algorithm • Parameter update: $\Lambda_{t+1} = \Lambda_t - \epsilon_t U_t \nabla \ell(X_t; \Lambda_t)$, where $U_t$ is a positive definite matrix, $\Lambda_t$ is the set of HMMs at step $t$ of the GPD algorithm, and $X_t$ is the speech sample (sentence, word, phone, frame) presented at step $t$. • Example: Gaussian mean correction by the GPD algorithm: $\mu_{ijkd}^{(t+1)} = \mu_{ijkd}^{(t)} - \epsilon_t \, \partial \ell(X_t; \Lambda_t) / \partial \mu_{ijkd}$, where $\mu_{ijkd}$ is the mean for HMM $i$, state $j$, Gaussian mixture component $k$, dimension $d$.
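A minimal sketch of one GPD step on Gaussian means, assuming a toy single-Gaussian-per-class model, the identity matrix for $U_t$, and the strongest competitor in place of the smoothed rival term; the names and model are illustrative only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gpd_mean_update(means, x, true_class, eps=0.1, gamma=1.0):
    """One GPD step on class means for a toy model with one unit-variance
    Gaussian per class, so g_i = -0.5 * ||x - mu_i||^2. U_t is the identity,
    and the strongest competitor stands in for the smoothed rival term."""
    g = -0.5 * np.sum((x - means) ** 2, axis=1)            # discriminant scores
    rivals = [j for j in range(len(means)) if j != true_class]
    best = rivals[int(np.argmax(g[rivals]))]               # strongest competitor
    d = -g[true_class] + g[best]                           # misclassification measure
    s = sigmoid(gamma * d)
    slope = gamma * s * (1.0 - s)                          # d(loss)/d(d)
    new = means.copy()
    new[true_class] -= eps * slope * -(x - means[true_class])  # pull true mean toward x
    new[best] -= eps * slope * (x - means[best])               # push rival mean away from x
    return new

means = np.array([[0.0, 0.0], [1.0, 1.0]])
print(gpd_mean_update(means, x=np.array([0.6, 0.6]), true_class=0))
```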

  12. MCE Training versus Maximum Mutual Information Training

  13. Maximization of mutual information corresponds to minimization of a special type of classification error. Unlike the general MCE procedure, maximization of mutual information does not give larger corrections to the parameters near the class boundaries. Minimization of classification error therefore provides better class separation at the class boundaries, thanks to the shape of the sigmoid loss function.

  14. Discriminative Training for Speech Recognition • 1. Discriminative training based on comparison of the likelihood scores estimated for single speech units (phones, words). • Examples: • E-set vocabulary recognition (W. Chou, 1992), speaker-independent recognition with 100 speakers: ML training – 76% phone recognition accuracy; MCE/GPD training – 88% phone recognition accuracy. • Broadcast news phone string recognition (Korkmazsky, 2003): ML training – 61.93% phone recognition accuracy; MCE/GPD training – 65.11% phone recognition accuracy.

  15. 2. Discriminative training based on comparison of the likelihood scores estimated for strings of speech units (sentences): $W_0$ denotes the true word string and $W_n$, $n = 1, \ldots, N$, one of the $N$ alternative word strings. • Examples: • Recognition of connected digit strings of unknown length (Wu Chou, 1993): ML training – 1.4% string error rate; MCE/GPD training – 0.95% string error rate. • Digit string recognition on noisy wireless data (Korkmazsky, 1997): ML training – 2.6% word error rate; MCE/GPD training – 1.4% word error rate; generalized HMM MCE/GPD training – 1.0% word error rate.
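A brief sketch (illustrative, with assumed names) of the string-level misclassification measure and loss computed from the log-likelihood of the true word string $W_0$ against an N-best list of competing strings:

```python
import numpy as np
from scipy.special import logsumexp

def string_mce_loss(loglik_true, logliks_nbest, eta=1.0, gamma=0.5):
    """String-level MCE loss: the true string's log-likelihood is compared
    against a smoothed (log-sum-exp) average of the N competitor scores."""
    scores = eta * np.asarray(logliks_nbest, dtype=float)
    g_comp = (logsumexp(scores) - np.log(len(logliks_nbest))) / eta
    d = -loglik_true + g_comp                   # string misclassification measure
    return 1.0 / (1.0 + np.exp(-gamma * d))     # sigmoid loss in (0, 1)

# The correct string scores better than every N-best competitor -> loss near 0.
print(string_mce_loss(-210.0, [-225.0, -219.0, -230.0]))
```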

  16. Discriminative Training for Speaker Verification • $\Lambda_{true}$ and $\Lambda_{imp}$ – the true-talker and impostor HMMs. • If $g(X; \Lambda_{true}) - g(X; \Lambda_{imp}) > \tau$ then $X$ represents the true talker; otherwise $X$ represents an impostor; $\tau$ is a verification threshold.
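An illustrative sketch (function names assumed) of the thresholded score-difference decision described above:

```python
def verify_speaker(score_true_model, score_impostor_model, threshold):
    """Accept the claimed identity when the true-talker HMM outscores the
    impostor HMM by more than the verification threshold."""
    return (score_true_model - score_impostor_model) > threshold

# Assumed log-likelihood scores from the claimed-speaker and impostor HMMs.
print(verify_speaker(-540.2, -551.8, threshold=5.0))   # True  -> accept as true talker
print(verify_speaker(-548.9, -551.8, threshold=5.0))   # False -> reject as impostor
```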

  17. $E[A]$ – the expectation of $A$. • Example: speaker verification on a database consisting of 43 speakers, each having 5 training sentences (Korkmazsky, 1996): ML training – 4.40% equal error rate; MCE/GPD training – 2.50% equal error rate.

  18. Discriminative Training for Feature Extraction • [Diagram: recognition pipeline with Feature Extractor, Acoustic Model, and Language Model blocks, with discriminative training applied across them]

  19. Examples: • Discriminative filter bank design (Biem, Katagiri, 1996): central filter bank frequencies were adjusted by MCE/GPD training. First, 128 FFT spectral coefficients were converted to 16 Mel spectrum coefficients using a conventional frequency scale. The models for 5 Japanese vowels were represented by frequency templates. Recognition accuracy in this experiment was 80.91%; after MCE/GPD adjustment of the central band frequencies it increased to 82.45%. • Discriminative training of the lifter coefficients (Biem, Juang, 1997): lifter coefficients weight the quefrency values after the cosine transform. The lifter weights were trained by adjusting neural network coefficients with the MCE criterion. The error rate for 5 Japanese vowels was reduced from 14.5% to 11.3%.
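As a rough illustration (assumed shapes and names, not the authors' code), a trainable lifter, i.e. a per-quefrency weight vector, applied to cepstra obtained from a log Mel spectrum via the cosine transform:

```python
import numpy as np
from scipy.fftpack import dct

def liftered_cepstrum(log_mel_spectrum, lifter_weights, n_ceps=13):
    """Cosine-transform a log Mel spectrum into cepstra, then weight each
    quefrency bin with a (discriminatively trainable) lifter coefficient."""
    ceps = dct(log_mel_spectrum, type=2, norm='ortho')[:n_ceps]
    return lifter_weights[:n_ceps] * ceps

log_mel = np.log(np.random.rand(16) + 1.0)   # stand-in for 16 Mel channel energies
lifter = np.ones(13)                         # initial (untrained) lifter weights
print(liftered_cepstrum(log_mel, lifter))
```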

  20. Discriminative Training of Language Models (Zhen Chen, Kai-Fu Lee (1999); Jeff Kuo, Hui Jiang (2002)) • $W_0$ – the correct word sequence.

  21. Discriminative correction of the bigram probabilities for all word pairs $(w_i, w_j)$: the correction for a pair depends on $N(w_i, w_j \mid W)$ – the number of times the word pair appears in the word sequence $W$.
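A hedged sketch of such a correction; the update rule below is illustrative, not the exact formula from the cited papers, and simply nudges bigram probabilities by the difference in word-pair counts between the correct and the recognized word sequence:

```python
from collections import Counter

def bigrams(words):
    return Counter(zip(words[:-1], words[1:]))

def correct_bigram_probs(probs, correct_seq, recognized_seq, step=0.01):
    """Raise bigrams over-represented in the correct word sequence and lower
    those over-represented in the recognizer's competing hypothesis."""
    diff = bigrams(correct_seq)
    diff.subtract(bigrams(recognized_seq))    # count difference per word pair
    new = dict(probs)
    for pair, count in diff.items():
        if count:
            new[pair] = min(1.0, max(1e-6, new.get(pair, 1e-3) + step * count))
    return new

probs = {('flights', 'to'): 0.20, ('flights', 'two'): 0.05}
print(correct_bigram_probs(probs, ['flights', 'to', 'boston'],
                           ['flights', 'two', 'boston']))
```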

  22. DARPA Communicator Project (air travel reservation system) • Baseline language model: 900 unigrams and 41K bigrams • Baseline LM perplexity = 34; after DT, perplexity = 35 • Word error rate, training sentences: 19.7% (baseline LM) vs. 17.5% (after DT) • Word error rate, test sentences: 19.7% (baseline LM) vs. 19.0% (after DT) • Sentence error rate, training sentences: 30.9% (baseline LM) vs. 26.4% (after DT) • Sentence error rate, test sentences: 30.9% (baseline LM) vs. 29.0% (after DT)

  23. Discriminative Training for Speech/Music Classification (Korkmazsky, 2003) • Speech class: speech; speech with music in the background; speech with song in the background • Nonspeech class: music; song; noise (aspiration, cough, laugh)

  24. • Block classification error – computed from the frame classification errors over the block. • Frame classification error – the MCE loss for a single frame. • $T$ – the total number of frames in the block; $\Lambda$ – a set of 6 GMMs, one per class above. • Frame labeling accuracy for ML-trained GMMs – 90.5% • Frame labeling accuracy for MCE-trained GMMs – 92.7%
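A minimal sketch (structure assumed) of block-level labeling from per-frame GMM log-likelihoods: each frame is labeled by the best-scoring GMM and the block takes the majority frame label:

```python
import numpy as np

def classify_block(frame_loglik, class_names):
    """frame_loglik: (n_frames, n_classes) log-likelihoods from the class GMMs.
    Each frame takes the best-scoring class; the block takes the majority label."""
    frame_labels = np.argmax(frame_loglik, axis=1)
    counts = np.bincount(frame_labels, minlength=len(class_names))
    return class_names[int(np.argmax(counts))], frame_labels

classes = ['speech', 'speech+music', 'speech+song', 'music', 'song', 'noise']
scores = np.random.randn(50, 6)     # stand-in for GMM log-likelihoods on 50 frames
block_label, per_frame = classify_block(scores, classes)
print(block_label)
```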

  25. Conclusions • Maximum likelihood training often does not provide optimal speech classification because the real distribution of speech data is unknown. • Discriminative training usually improves speech classification over ML training. • Discriminative training can achieve recognition performance comparable to ML training while using a smaller number of model parameters. • Many newer classification methods (such as SVMs or boosting) are discriminative.
