Lecture 24: Model Adaptation and Semi-Supervised Learning
    Presentation Transcript
    1. Lecture 24: Model Adaptation and Semi-Supervised Learning Machine Learning Iain Murray’s MLSS lecture on videolectures.net: http://videolectures.net/mlss09uk_murray_mcmc/

    2. Today • Adaptation of Gaussian Mixture Models • Maximum A Posteriori (MAP) • Maximum Likelihood Linear Regression (MLLR) • Application: Speaker Recognition • UBM-MAP + SVM • Other Semi-Supervised Approaches • Self-Training • Co-Training

    3. The Problem • I have a little bit of labeled data, and a lot of unlabeled data. • I can model the training data fairly well. • But we always fit training data better than testing data. • Can we use the wealth of unlabeled data to do better?

    4. Let’s use a GMM • Use GMMs to model the labeled data. • In the simplest form, one mixture component per class.

    5. Labeled training of GMM • MLE estimators of parameters • Or these can be used to seed EM.
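
A minimal sketch of these closed-form MLE estimates, assuming NumPy arrays X of feature vectors and y of integer class labels (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def mle_gmm_params(X, y):
    """Closed-form MLE estimates for a GMM with one component per class."""
    classes = np.unique(y)
    weights, means, covs = [], [], []
    for c in classes:
        Xc = X[y == c]
        weights.append(len(Xc) / len(y))        # pi_c = N_c / N
        means.append(Xc.mean(axis=0))           # mu_c = sample mean of class c
        covs.append(np.cov(Xc, rowvar=False))   # Sigma_c = sample covariance
    return np.array(weights), np.array(means), np.array(covs)
```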

    6. Adapting the mixtures to new data • Essentially, let EM start with the MLE parameters as seeds. • Expand the data available to EM and proceed until convergence.
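
One concrete way to do this, sketched with scikit-learn (a library choice the lecture does not prescribe) and hypothetical arrays X_labeled, y, X_unlabeled:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Assumed inputs: X_labeled (n, d), y (n,) class labels, X_unlabeled (m, d).
classes = np.unique(y)
means_init = np.array([X_labeled[y == c].mean(axis=0) for c in classes])
weights_init = np.array([(y == c).mean() for c in classes])

X_all = np.vstack([X_labeled, X_unlabeled])      # expand the data available to EM
gmm = GaussianMixture(n_components=len(classes),
                      weights_init=weights_init,
                      means_init=means_init,      # seed EM with the labeled MLE estimates
                      covariance_type="full")
gmm.fit(X_all)                                    # run EM to convergence on all the data
```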

    8. Problem with EM adaptation • The initial labeled seeds could contribute very little to the final model

    10. MAP Adaptation • Constrain the contribution of unlabeled data. • Let the alpha terms dictate how much weight to give to the new, unlabeled data compared to the existing estimates.

    11. MAP adaptation • The movement of the parameters is constrained.
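
One common instantiation of the slide’s alpha weights is the relevance-factor rule α_k = n_k / (n_k + r), where n_k is the soft count of new data assigned to component k. A sketch of the mean-only MAP update, assuming a fitted scikit-learn GaussianMixture (the function name and relevance value are illustrative):

```python
import numpy as np

def map_adapt_means(gmm, X_new, relevance=16.0):
    """MAP-adapt the means of a fitted GaussianMixture toward new data.

    alpha_k = n_k / (n_k + relevance): components that explain little of the
    new data keep their old means; heavily used components move toward it.
    """
    gamma = gmm.predict_proba(X_new)                    # responsibilities (n, K)
    n_k = gamma.sum(axis=0)                             # soft counts per component
    E_k = (gamma.T @ X_new) / np.maximum(n_k, 1e-10)[:, None]  # new-data means
    alpha = (n_k / (n_k + relevance))[:, None]          # adaptation weights
    return alpha * E_k + (1.0 - alpha) * gmm.means_     # constrained movement
```

The relevance factor r acts like a prior count: the larger it is, the closer the adapted means stay to the original estimates.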

    12. MLLR adaptation • Another idea… • “Maximum Likelihood Linear Regression”. • Apply an affine transformation to the means. • Don’t change the covariance matrices

    13. MLLR adaptation • Another view on adaptation. • Apply an affine transformation to the means. • Don’t change the covariance matrices

    14. MLLR adaptation • The new means are the MLE of the means with the new data.
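
A sketch of this estimate for a single shared transform W = [b | A], under the simplifying assumption of identity covariances (the full MLLR solution also weights each component by its inverse covariance); a fitted scikit-learn GaussianMixture is assumed:

```python
import numpy as np

def mllr_transform(gmm, X_new):
    """Estimate one shared affine transform of the means, mu_k -> A @ mu_k + b,
    by maximum likelihood on the new data (identity-covariance simplification)."""
    gamma = gmm.predict_proba(X_new)                    # responsibilities (n, K)
    n_k = gamma.sum(axis=0)
    E_k = (gamma.T @ X_new) / np.maximum(n_k, 1e-10)[:, None]  # new-data means
    xi = np.hstack([np.ones((len(gmm.means_), 1)), gmm.means_])  # extended means [1, mu_k]
    G = (xi * n_k[:, None]).T @ xi                      # sum_k n_k xi_k xi_k^T
    Z = (E_k * n_k[:, None]).T @ xi                     # sum_k n_k E_k xi_k^T
    W = Z @ np.linalg.pinv(G)                           # (d, d+1) transform [b | A]
    b, A = W[:, 0], W[:, 1:]
    return A, b                                         # adapted means: gmm.means_ @ A.T + b
```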

    16. Why MLLR? • We can tie the transformation matrices of mixture components. • For example: • You know that the red and green classes are similar • Assumption: Their transformations should be similar

    18. Application of Model Adaptation • Speaker Recognition. • Task: Given speech from a known set of speakers, identify the speaker. • Assume there is training data from each speaker. • Approach: • Model a generic speaker. • Identify a speaker by how they differ from the generic speaker. • Measure this difference via the adaptation parameters.

    19. Speech Representation • Extract a feature representation of speech. • One frame every 10 ms. • MFCC features, 16 dimensions.
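
A sketch of this feature extraction with librosa (the library, sample rate, and file name are assumptions; the slide only specifies 10 ms frames and 16 MFCC dimensions):

```python
import librosa

# Load audio and extract one 16-dimensional MFCC vector every 10 ms.
y, sr = librosa.load("utterance.wav", sr=16000)   # hypothetical file
hop = int(0.010 * sr)                             # 10 ms hop in samples
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=16, hop_length=hop)
# mfcc.shape == (16, n_frames)
```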

    20. Similarity of sounds [Figure: phones /s/, /b/, /u/, /o/ plotted in MFCC1 vs. MFCC2 space]

    21. Universal Background Model • If we had labeled phone information that would be great. • But it’s expensive, and time consuming. • So just fit a GMM to the MFCC representation of all of the speech you have. • Generally all but one example, but we’ll come back to this.
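
Fitting the UBM then amounts to pooling MFCC frames from all available speech and fitting one large GMM; a sketch with scikit-learn (the component count and diagonal covariances are assumptions, not from the slides):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# all_mfccs: list of (n_frames_i, 16) MFCC arrays, one per utterance, no labels needed.
X_pooled = np.vstack(all_mfccs)
ubm = GaussianMixture(n_components=512, covariance_type="diag", max_iter=200)
ubm.fit(X_pooled)   # the Universal Background Model
```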

    22. MFCC Scatter [Figure: scatter of MFCC frames for /s/, /b/, /u/, /o/ in MFCC1 vs. MFCC2 space]

    23. UBM fitting [Figure: GMM components fit to the pooled MFCC scatter of /s/, /b/, /u/, /o/ in MFCC1 vs. MFCC2 space]

    24. MAP adaptation • When we have a segment of speech to evaluate: • Generate MFCC features. • Use MAP adaptation on the UBM Gaussian mixture model.

    25. MAP Adaptation [Figure: UBM components MAP-adapted toward the /s/, /b/, /u/, /o/ MFCC frames of the evaluated segment, MFCC1 vs. MFCC2 space]

    27. UBM-MAP • Claim: • The differences between speakers can be represented by the movement of the mixture components of the UBM. • How do we train this model?

    28. UBM-MAP training • Supervector: a vector of the adapted means of the Gaussian mixture components. [Diagram: training data → UBM training; held-out speaker N → MAP adaptation → supervector] • Train a supervised model with these labeled vectors.

    29. UBM-MAP training [Diagram: training data → UBM training; held-out speaker N → MAP adaptation → supervector → multiclass SVM training] • Repeat for all training data.
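
A sketch of the supervector construction and SVM training, reusing the map_adapt_means sketch from the MAP-adaptation slide and the fitted ubm above; the speaker_segments structure and kernel choice are hypothetical:

```python
import numpy as np
from sklearn.svm import SVC

def supervector(ubm, X_segment):
    """Concatenate the MAP-adapted component means into one fixed-length vector."""
    return map_adapt_means(ubm, X_segment).ravel()   # (n_components * n_dims,)

# speaker_segments: list of (mfcc_frames, speaker_id) pairs for the training speakers.
X_sv = np.array([supervector(ubm, frames) for frames, _ in speaker_segments])
y_sv = np.array([speaker for _, speaker in speaker_segments])

svm = SVC(kernel="linear")   # multiclass handled internally (one-vs-one)
svm.fit(X_sv, y_sv)

# Evaluation (slide 30): adapt the UBM to a test segment and classify its supervector.
# prediction = svm.predict(supervector(ubm, test_frames).reshape(1, -1))
```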

    30. UBM-MAP Evaluation [Diagram: test data → MAP adaptation of the UBM → supervector → multiclass SVM → prediction]

    31. Alternate View • Do we need all this? • What if we just train an SVM on labeled MFCC data? [Diagram: labeled training data → multiclass SVM training; test data → multiclass SVM → prediction]

    32. Results • UBM-MAP (with some variants) is the state of the art in Speaker Recognition. • Current state-of-the-art performance is about 97% accuracy (~2.5% EER) with a few minutes of speech. • Direct MFCC modeling performs about half as well (~5% EER).

    33. Model Adaptation • Adaptation allows GMMs to be seeded with labeled data. • Incorporation of unlabeled data gives a more robust model. • The adaptation process can be used to differentiate members of the population (UBM-MAP).

    34. Self-Training • Train a supervised model on the training data, T. • Generate predictions [t, x] for the unlabeled data, U. • Add these to the training data. • Retrain the supervised model. • Alternatives: • Only use the most confident predictions on U. • Weight the new predictions by their confidence.
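
A sketch of the confidence-thresholded variant, assuming a scikit-learn-style classifier with fit/predict_proba (all names and the threshold are illustrative):

```python
import numpy as np

def self_train(clf, X_train, y_train, X_unlabeled, threshold=0.9, max_rounds=10):
    """Self-training: repeatedly add the classifier's most confident predictions
    on the unlabeled pool to the training set and retrain."""
    X_l, y_l, X_u = X_train.copy(), y_train.copy(), X_unlabeled.copy()
    for _ in range(max_rounds):
        clf.fit(X_l, y_l)
        if len(X_u) == 0:
            break
        proba = clf.predict_proba(X_u)
        confident = proba.max(axis=1) >= threshold      # keep only confident predictions
        if not confident.any():
            break
        pseudo_labels = clf.classes_[proba[confident].argmax(axis=1)]
        X_l = np.vstack([X_l, X_u[confident]])
        y_l = np.concatenate([y_l, pseudo_labels])
        X_u = X_u[~confident]                           # remove newly labeled points
    return clf
```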

    35. Self-Training • Advantages • Simple to use • No classifier dependence • Disadvantages • Bad decisions get reinforced • Uncertain convergence properties

    36. Co-Training • Train two supervised classifiers C1 and C2 using different (uncorrelated) feature representations of T • Generate predictions for U using C1 and C2. • Add the most confident predictions made using C1 to the C2 training set, and vice versa. • Repeat
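
A sketch of this loop, again assuming scikit-learn-style classifiers; X1/X2 and U1/U2 are two feature views of the same labeled and unlabeled examples, and the confidence threshold is illustrative:

```python
import numpy as np

def co_train(clf1, clf2, X1, X2, y, U1, U2, threshold=0.9, max_rounds=10):
    """Co-training: each classifier labels unlabeled examples for the other,
    using its own feature view, and both are retrained each round."""
    X1_tr, y1_tr = X1.copy(), y.copy()
    X2_tr, y2_tr = X2.copy(), y.copy()
    for _ in range(max_rounds):
        clf1.fit(X1_tr, y1_tr)
        clf2.fit(X2_tr, y2_tr)
        if len(U1) == 0:
            break
        p1, p2 = clf1.predict_proba(U1), clf2.predict_proba(U2)
        conf1 = p1.max(axis=1) >= threshold     # C1's confident picks go to C2
        conf2 = p2.max(axis=1) >= threshold     # C2's confident picks go to C1
        if not (conf1.any() or conf2.any()):
            break
        X2_tr = np.vstack([X2_tr, U2[conf1]])
        y2_tr = np.concatenate([y2_tr, clf1.classes_[p1[conf1].argmax(axis=1)]])
        X1_tr = np.vstack([X1_tr, U1[conf2]])
        y1_tr = np.concatenate([y1_tr, clf2.classes_[p2[conf2].argmax(axis=1)]])
        keep = ~(conf1 | conf2)                 # drop examples labeled this round
        U1, U2 = U1[keep], U2[keep]
    return clf1, clf2
```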

    37. Co-Training • Advantages • Simple wrapper method • Less sensitive to mistakes than self-training • Disadvantages • Natural feature splits might not exist • Models using both feature views are probably better

    38. Next Time • Ensemble Techniques