
Speaker Adaptation for Vowel Classification


Presentation Transcript


  1. Speaker Adaptation for Vowel Classification Xiao Li Electrical Engineering Dept.

  2. Outline • Introduction • Background on statistical classifiers • Proposed adaptation strategies • Experiments and results • Conclusion

  3. Application [Figure: the vowels /ae/, /aa/, /iy/, /uh/ mapped to motion directions] • “Vocal Joystick” (VJ) • Human-computer interaction for people with motor impairments • Acoustic parameters – energy, pitch, vowel quality, discrete sounds • Vowel classification • Vowels /ae/ (bat); /aa/ (bought); /uh/ (boot); /iy/ (beat) • Control motion direction

  4. Features • Formants • Peaks in the spectrum • Low dimension (F1, F2, F3, F4 + dynamics) • Hard to estimate • Mel-frequency cepstral coefficients (MFCC) • Cosine transform of the log spectrum • High dimension (26 including deltas) • Easy to compute • Our choice – MFCCs
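The slides do not name a feature-extraction toolchain; the following is a minimal sketch of the 26-dimensional MFCC-plus-deltas features using librosa (the library choice and frame parameters are assumptions, not details from the slides):

```python
import numpy as np
import librosa

def extract_features(wav_path):
    """13 MFCCs + 13 deltas per frame -> (frames, 26) feature matrix."""
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # (13, frames)
    delta = librosa.feature.delta(mfcc)                  # frame-to-frame dynamics
    return np.vstack([mfcc, delta]).T                    # (frames, 26)
```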

  5. User-Independent vs. User-Dependent • User-independent models • NOT optimized for a specific speaker • Easy to get a large training set • User-dependent models • Optimized for a specific speaker • Difficult to get a large training set

  6. Adaptation • What is adaptation? • Adapting user-independent models to a specific user, using a small set of user-dependent data • Adaptation methodology for vowel classification • Train speaker-independent vowel models • Ask a speaker to articulate a few seconds of vowels for each class • Adapt the classifier using this small amount of speaker-dependent data

  7. Outline • Introduction • Background on statistical classifiers • Proposed adaptation strategies • Experiments and results • Conclusion

  8. Gaussian Mixture Models (GMM) • Generative models • Training objective – maximum likelihood via EM: for training samples O1:T, maximize the log-likelihood log p(O1:T | θ) • Classification • Compute the likelihood score for each class, and choose the one with the highest likelihood • Limitations • Each class model is trained using only the data in that class • The form of the discriminant functions is constrained
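As a concrete illustration, a minimal per-class GMM classifier with scikit-learn; the component count and diagonal covariances are assumptions, not details from the slides:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmms(feats_by_class, n_components=8):
    """One GMM per vowel class; feats_by_class[c] is an (N_c, 26) array."""
    return {c: GaussianMixture(n_components=n_components,
                               covariance_type='diag').fit(X)
            for c, X in feats_by_class.items()}

def classify(gmms, x):
    """x: (T, 26) frames of one utterance; sum per-frame log-likelihoods
    and pick the class with the highest total score."""
    scores = {c: g.score_samples(x).sum() for c, g in gmms.items()}
    return max(scores, key=scores.get)
```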

  9. Neural Networks (NN) • Three-layer perceptron • # input nodes – feature dimension × window size • # hidden nodes – empirically chosen • # output nodes – # of classes • Training objective • Minimum relative entropy between the outputs and the targets yk • Classification • Compare the output values • Advantages • Discriminative training • Nonlinearity • Features taken from multiple frames
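A minimal PyTorch sketch of such a three-layer perceptron trained with a cross-entropy (relative-entropy) objective; the window size, hidden width, sigmoid nonlinearity, and learning rate are assumptions, since the slides do not specify them:

```python
import torch
import torch.nn as nn

WINDOW, DIM, HIDDEN, CLASSES = 7, 26, 50, 4   # hypothetical sizes

# Three-layer perceptron: stacked MFCC frames in, class scores out.
net = nn.Sequential(
    nn.Linear(WINDOW * DIM, HIDDEN),
    nn.Sigmoid(),                    # nonlinear hidden layer
    nn.Linear(HIDDEN, CLASSES),      # linear output layer (adapted later)
)
loss_fn = nn.CrossEntropyLoss()      # relative entropy to 1-hot targets, up to a constant
opt = torch.optim.SGD(net.parameters(), lr=0.01)

def train_step(x, y):                # x: (B, WINDOW*DIM), y: (B,) class ids
    opt.zero_grad()
    loss = loss_fn(net(x), y)
    loss.backward()
    opt.step()
    return loss.item()
```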

  10. NN-SVM Hybrid Classifier • Idea – replace the hidden-to-output layer of the NN with linear-kernel SVMs • Training objective • Maximum margin • Theoretically guaranteed bound on the test error • Classification • Compare the output values of the binary classifiers • Advantages • Compared to a pure NN: optimal solution in the last layer • Compared to a pure SVM: efficiently handles features from multiple frames; no need to choose a kernel
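A sketch of the hybrid's training step under the same caveats: the NN's input-to-hidden mapping is frozen and its activations feed linear-kernel SVMs (scikit-learn is an assumed choice, and hidden_fn is a hypothetical hook, e.g. the first two layers of the MLP sketched above):

```python
import numpy as np
from sklearn.svm import SVC

def train_hybrid(hidden_fn, X, y):
    """hidden_fn: the NN's fixed nonlinear input-to-hidden mapping;
    X: (N, window*dim) inputs; y: class labels."""
    H = hidden_fn(X)                          # nonlinear features from the NN
    svm = SVC(kernel='linear', decision_function_shape='ovr')
    svm.fit(H, y)                             # max-margin last layer
    return svm

# Classification compares the binary classifiers' output values:
#   svm.decision_function(hidden_fn(X_test)).argmax(axis=1)
```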

  11. Outline • Introduction • Background on statistical classifiers • Proposed adaptation strategies • Experiments and results • Conclusion

  12. MLLR for GMM Adaptation • Maximum Likelihood Linear Regression • Apply a linear transformation to the Gaussian means • The same transformation is shared by all Gaussians in a class • The covariance matrices can be adapted in a similar fashion, though this is less effective

  13. MLLR Formulas • Objective – maximize the likelihood of the adaptation samples O1:T • Set the first-order derivative to zero • The transform W is obtained by solving a linear equation
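The slide's equations did not survive the transcript; what follows is a standard reconstruction of the MLLR mean-transform estimation (after Leggetter and Woodland), assuming component m has mean \mu_m, covariance \Sigma_m, extended mean \xi_m, and occupation probability \gamma_m(t):

```latex
\hat{\mu}_m = W\,\xi_m, \qquad \xi_m = [\,1,\ \mu_m^\top\,]^\top

\mathcal{F}(W) = \sum_{t=1}^{T} \sum_m \gamma_m(t)\,
  \log \mathcal{N}\!\bigl(O_t;\ W\xi_m,\ \Sigma_m\bigr)

\frac{\partial \mathcal{F}}{\partial W} = 0
\;\Longrightarrow\;
\sum_{t,m} \gamma_m(t)\, \Sigma_m^{-1} O_t\, \xi_m^\top
  = \sum_{t,m} \gamma_m(t)\, \Sigma_m^{-1} W\, \xi_m \xi_m^\top
```

The right-hand relation is linear in W, which is why the slide can say the transform is obtained by solving a linear equation.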

  14. NN Adaptation • Idea – fix the nonlinear mapping (input-to-hidden layer) and adapt only the last layer (a linear classifier) • Adaptation objective – minimum relative entropy • Start from the original weights • Update by gradient descent (sketched below)
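The gradient-descent formulas were likewise dropped from the transcript; assuming a softmax output with cross-entropy loss (the standard setup, though the slides do not state it), the last-layer updates for hidden activations h_j, outputs z_k, and targets y_k are:

```latex
z_k = \operatorname{softmax}_k\Bigl(\sum_j w_{jk}\, h_j\Bigr), \qquad
E = -\sum_k y_k \log z_k

\frac{\partial E}{\partial w_{jk}} = (z_k - y_k)\, h_j, \qquad
w_{jk} \leftarrow w_{jk} - \eta\,(z_k - y_k)\, h_j
```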

  15. NN-SVM Classifier Adaptation • Idea – again fix the nonlinear mapping and adapt the last layer • Adaptation objective – maximum margin • Adaptation procedure (see the sketch below) • Keep the support vectors of the training data • Combine these support vectors with the adaptation data • Retrain the linear-kernel SVMs for the last layer
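A minimal sketch of that adaptation procedure with scikit-learn, operating on hidden-layer activations H as in the earlier hybrid sketch; the function name and data layout are hypothetical:

```python
import numpy as np
from sklearn.svm import SVC

def adapt_hybrid(svm_old, H_train, y_train, H_adapt, y_adapt):
    """Retrain the last-layer SVMs on old support vectors + adaptation data.

    H_*: hidden-layer activations from the NN's fixed nonlinear mapping.
    """
    sv = svm_old.support_                     # indices of the training support vectors
    H = np.vstack([H_train[sv], H_adapt])     # support vectors + adaptation data
    y = np.concatenate([y_train[sv], y_adapt])
    svm_new = SVC(kernel='linear', decision_function_shape='ovr')
    svm_new.fit(H, y)                         # retrain the linear-kernel SVMs
    return svm_new
```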

  16. Outline • Introduction • Background on statistical classifiers • Proposed adaptation strategies • Experiments and results • Conclusion

  17. Database • Pure vowel recordings with different energy and pitch • Duration – long, short • Energy – loud, normal, quiet • Pitch – rising, level, falling • Statistics • Train set – 10 speakers • Test set – 5 speakers • 4, 8, or 9 vowel classes • 18 utterances (2000 samples) for each vowel and each speaker

  18. Adaptation and Evaluation Set • 6-fold cross-validation for each speaker • The 18 utterances are divided into 6 subsets • We adapt on each subset and evaluate on the rest (sketched below) • This gives 6 accuracy scores per vowel, from which we compute the mean and standard deviation • Average over the 5 test speakers
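A sketch of this protocol with hypothetical adapt_fn/eval_fn hooks; nothing here beyond the split-and-average logic comes from the slides:

```python
import numpy as np

def crossvalidate(utts, labels, adapt_fn, eval_fn, n_folds=6):
    """Adapt on one subset of a speaker's utterances, evaluate on the
    remaining subsets, and repeat for every fold."""
    idx = np.array_split(np.arange(len(utts)), n_folds)  # 18 utts -> 6 subsets of 3
    accs = []
    for fold in idx:
        rest = np.setdiff1d(np.arange(len(utts)), fold)
        model = adapt_fn([utts[i] for i in fold], [labels[i] for i in fold])
        accs.append(eval_fn(model, [utts[i] for i in rest],
                            [labels[i] for i in rest]))
    return np.mean(accs), np.std(accs)
```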

  19. Speaker-Independent Classifiers • The individual scores vary considerably across speakers • With an NN input window of 1 frame, performance is similar to the GMM

  20. Adapted Classifiers

  21. Conclusion • For speaker-independent models, the NN classifier (with multiple-frame input) works well • For speaker-adapted models, the NN classifier is effective, and the NN-SVM hybrid so far gives the best performance
