Adaptation Techniques in Automatic Speech Recognition

Tor André Myrvoll

Telektronikk 99(2), Issue on Spoken Language Technology in Telecommunications, 2003.



Goal and Objective

  • Make ASR robust to speaker and environmental variability.

  • Model adaptation: automatically adapt an HMM using limited but representative new data to improve performance.

  • Train ASRs for applications with insufficient data.



What Do We Have/Adapt?

  • An HMM-based ASR trained in the usual manner.

  • The state output probabilities are parameterized by Gaussian mixture models (GMMs); a minimal sketch of such an output density follows below.

  • Adapting the state transition probabilities and mixture weights gives no improvement.

  • The covariances are difficult to estimate robustly.

  • The mixture means can be adapted “optimally” and have proven useful.
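
An illustrative aside, not part of the original slides: a minimal NumPy sketch of a diagonal-covariance GMM state output density. The function name and the diagonal-covariance simplification are assumptions made for the sketch.

```python
import numpy as np

def gmm_log_output_prob(frame, weights, means, variances):
    """Log output probability of one HMM state whose emission density is a
    diagonal-covariance Gaussian mixture: log sum_m w_m N(x; mu_m, var_m).

    frame     : (D,)   one feature vector
    weights   : (M,)   mixture weights, summing to 1
    means     : (M, D) mixture means (the parameters most often adapted)
    variances : (M, D) diagonal covariances
    """
    diff = frame - means
    log_gauss = -0.5 * (np.log(2 * np.pi * variances) + diff**2 / variances).sum(axis=1)
    log_weighted = np.log(weights) + log_gauss
    # Log-sum-exp over the mixture components for numerical stability.
    m = log_weighted.max()
    return m + np.log(np.exp(log_weighted - m).sum())

# Toy usage with a two-component mixture in two dimensions.
print(gmm_log_output_prob(np.zeros(2), np.array([0.5, 0.5]),
                          np.zeros((2, 2)), np.ones((2, 2))))
```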



Adaptation Principles

  • Main assumption: the original model is “good enough”; model adaptation cannot amount to full re-training!



Offline vs. Online

  • Adapt offline if possible, so that performance is not compromised by computational constraints.

  • Decode the adaptation speech data using the current model.

  • Use the decoded data to estimate the “speaker-dependent” model statistics (see the accumulation sketch below).
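
A minimal sketch, not from the original slides, of turning decoded adaptation data into per-state statistics. For simplicity it assumes a hard 1-best state alignment; a real system would use soft occupancies from the forward-backward algorithm.

```python
import numpy as np

def accumulate_state_statistics(frames, state_alignment, n_states):
    """Per-state occupancy counts and first-order sums from decoded data.

    frames          : (T, D) adaptation feature vectors
    state_alignment : (T,)   decoded state index for each frame (1-best)
    """
    counts = np.zeros(n_states)
    sums = np.zeros((n_states, frames.shape[1]))
    for x, s in zip(frames, state_alignment):
        counts[s] += 1.0      # how much adaptation data the state has seen
        sums[s] += x          # occupancy-weighted sum of the observations
    return counts, sums       # inputs to e.g. the MAP mean update later on
```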



Online Adaptation Using Prior Evolution

  • The current posterior becomes the prior for the next adaptation step (a minimal sketch follows below).
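
A toy sketch of the prior-evolution idea for a single Gaussian mean, assuming the prior is summarized by a mean and a pseudo-count; the function name and this simplification are mine, not the paper's exact formulation.

```python
import numpy as np

def evolve_prior(mu_prior, tau_prior, frames, gammas):
    """One incremental adaptation step: the posterior (mean, pseudo-count)
    after this batch of data becomes the prior for the next batch."""
    gamma_sum = gammas.sum()
    weighted_sum = (gammas[:, None] * frames).sum(axis=0)
    mu_post = (tau_prior * mu_prior + weighted_sum) / (tau_prior + gamma_sum)
    tau_post = tau_prior + gamma_sum
    return mu_post, tau_post

# Process utterances one at a time, carrying the posterior forward.
rng = np.random.default_rng(1)
mu, tau = np.zeros(2), 5.0
for _ in range(3):
    x = rng.normal(loc=0.5, size=(20, 2))
    mu, tau = evolve_prior(mu, tau, x, np.ones(len(x)))
print(mu, tau)
```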



MAP Adaptation

  • HMMs have no sufficient statistics, so conjugate prior-posterior pairs cannot be used; the posterior is instead found via EM.

  • The prior is found empirically (it is multi-modal; the first model is estimated using ML training). A sketch of the resulting mean update follows below.
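
For illustration, a minimal NumPy sketch of the widely used MAP update of a single Gaussian mean, mu_MAP = (tau * mu_0 + sum_t gamma_t x_t) / (tau + sum_t gamma_t); the prior weight tau and the function name are assumptions made for the sketch.

```python
import numpy as np

def map_adapt_mean(mu_prior, tau, frames, gammas):
    """MAP re-estimation of a single Gaussian mean vector.

    mu_prior : (D,) prior (speaker-independent) mean
    tau      : scalar prior weight ("pseudo-count" of the prior)
    frames   : (T, D) adaptation feature vectors
    gammas   : (T,) occupancies of this Gaussian (from an EM E-step)
    """
    gamma_sum = gammas.sum()                               # amount of new data
    weighted_sum = (gammas[:, None] * frames).sum(axis=0)
    # Interpolate between the prior mean and the ML estimate from new data.
    return (tau * mu_prior + weighted_sum) / (tau + gamma_sum)

# With little data the estimate stays near the prior; with more data it
# moves towards the sample mean of the adaptation data.
rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, size=(50, 3))
print(map_adapt_mean(np.zeros(3), tau=10.0, frames=data, gammas=np.ones(50)))
```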



EMAP

  • Not every phoneme in every context occurs in the adaptation data, so correlations between parameters need to be stored and exploited.

  • EMAP only considers correlations between the mean vectors, under a jointly Gaussian assumption.

  • For large model sizes, means are shared across models (a toy sketch of the correlated update follows below).
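
A toy linear-Gaussian sketch of the idea behind EMAP, not the exact formulation in the paper: under a jointly Gaussian prior over the stacked means, noisy ML estimates of the means that were observed also shift the correlated, unseen means.

```python
import numpy as np

def emap_style_update(mu0, Sigma0, observed_idx, y, obs_noise_cov):
    """Posterior mean of a stack of Gaussian mean parameters under a jointly
    Gaussian prior, given noisy ML estimates of a subset of them.

    mu0          : (P,)   prior (SI) stacked means
    Sigma0       : (P, P) prior covariance encoding cross-correlations
    observed_idx : indices of the means actually seen in the adaptation data
    y            : (k,)   ML estimates of those means
    obs_noise_cov: (k, k) covariance of the ML estimates
    """
    k, P = len(observed_idx), len(mu0)
    H = np.zeros((k, P))
    H[np.arange(k), observed_idx] = 1.0                 # selection matrix
    S = H @ Sigma0 @ H.T + obs_noise_cov
    K = Sigma0 @ H.T @ np.linalg.inv(S)                 # regression gain
    # Correlated means move even if they were never observed.
    return mu0 + K @ (y - H @ mu0)
```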



Transformation Based Model Adaptation

  • Estimate a transform T parameterized by a set of transformation parameters.

  • The transform parameters can be estimated using either an ML or a MAP criterion.



Bias, Affine and Nonlinear Transformations

  • Bias: ML estimation of an additive bias vector.

  • Affine transformation of the mean vectors.

  • Nonlinear transformation (the transform may, for instance, be a neural network). The sketch below illustrates the three families.
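
A small sketch of the three transform families applied to a mean vector; the function names are illustrative only.

```python
import numpy as np

def bias_transform(mu, b):
    """Bias only: mu' = mu + b."""
    return mu + b

def affine_transform(mu, A, b):
    """Affine: mu' = A @ mu + b."""
    return A @ mu + b

def nonlinear_transform(mu, f):
    """Nonlinear: mu' = f(mu); f could, for instance, be a neural network."""
    return f(mu)

mu = np.array([1.0, -0.5])
print(bias_transform(mu, np.array([0.1, 0.1])))
print(affine_transform(mu, 1.1 * np.eye(2), np.array([0.1, 0.1])))
print(nonlinear_transform(mu, np.tanh))
```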



MLLR

  • Apply separate affine transformations to different parts of the model (HEAdapt in HTK); a simplified estimation sketch follows below.
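
A simplified sketch of MLLR-style estimation of one global transform W applied to extended means xi = [1, mu]. It assumes identity covariances so that the solution collapses to a single linear system; the actual MLLR implementation (e.g. HEAdapt) solves a per-row system weighted by the Gaussian variances and ties transforms via regression classes.

```python
import numpy as np

def estimate_global_mllr(means, gamma_sums, weighted_obs_sums):
    """Estimate a global transform W (D x (D+1)) such that mu' = W [1; mu],
    assuming identity covariances for simplicity.

    means             : (M, D) speaker-independent mixture means
    gamma_sums        : (M,)   total occupancy of each Gaussian, sum_t gamma_m(t)
    weighted_obs_sums : (M, D) sum_t gamma_m(t) * o_t for each Gaussian
    """
    M = means.shape[0]
    xi = np.hstack([np.ones((M, 1)), means])      # extended means [1, mu]
    G = (gamma_sums[:, None] * xi).T @ xi         # (D+1, D+1) accumulator
    Z = weighted_obs_sums.T @ xi                  # (D, D+1)   accumulator
    return Z @ np.linalg.inv(G)                   # W = Z G^{-1}

def apply_mllr(W, means):
    """Adapted means mu' = W [1; mu] for all Gaussians at once."""
    xi = np.hstack([np.ones((len(means), 1)), means])
    return xi @ W.T
```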



SMAP

  • Model the mismatch between the SI model and the test environment through the normalized variable y = Σ^(-1/2)(x − μ), where μ and Σ are the SI Gaussian parameters and x is the observation.

  • No mismatch: y is distributed as N(0, I).

  • Mismatch: y is distributed as N(ν, Ψ).

  • ν and Ψ are estimated by the usual ML methods on the adaptation data (a diagonal-covariance sketch follows below).
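
A single-node, diagonal-covariance sketch of estimating the mismatch parameters; the full structural MAP method arranges the Gaussians in a tree and propagates MAP (not plain ML) estimates down the hierarchy, so this only conveys the flavour of the computation.

```python
import numpy as np

def smap_node_update(means, variances, frames, gammas):
    """Estimate the mismatch (nu, Psi) shared by a group of Gaussians and
    map the SI parameters into the test environment.

    means, variances : (M, D) SI Gaussian parameters sharing this node
    frames           : (T, D) adaptation feature vectors
    gammas           : (M, T) occupancy of each Gaussian for each frame
    """
    sigma = np.sqrt(variances)                                  # (M, D)
    # Normalized mismatch variable y = (x - mu) / sigma for every pair.
    y = (frames[None, :, :] - means[:, None, :]) / sigma[:, None, :]
    w = gammas[:, :, None]
    total = w.sum()
    nu = (w * y).sum(axis=(0, 1)) / total                       # mismatch mean
    psi = (w * (y - nu) ** 2).sum(axis=(0, 1)) / total          # mismatch var
    # Transform the SI parameters back into the test environment.
    return means + sigma * nu, variances * psi
```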



Adaptive Training

  • Gender-dependent model selection.

  • Vocal tract length normalization, VTLN (in HTK via WARPFREQ).



Speaker Adaptive Training

  • Assumption: there exists a compact (canonical) model which is related to each speaker-dependent model via an affine transformation T (as in MLLR). The canonical model and the transformations are found jointly using EM.



Cluster Adaptive Training

  • Group the speakers in the training set into clusters, then find the cluster closest to the test speaker (see the sketch below).

  • Use Canonical Models
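
A toy sketch of selecting the closest cluster by its likelihood on the adaptation data. Each cluster is reduced to a single diagonal Gaussian for brevity; the real canonical models are complete HMM sets, and cluster adaptive training can also interpolate cluster means rather than pick just one.

```python
import numpy as np

def diag_gauss_loglik(frames, mean, var):
    """Total log-likelihood of frames under one diagonal-covariance Gaussian."""
    diff = frames - mean
    return (-0.5 * (np.log(2 * np.pi * var) + diff**2 / var).sum(axis=1)).sum()

def closest_cluster(cluster_means, cluster_vars, adaptation_frames):
    """Index of the cluster whose model scores the adaptation data best."""
    scores = [diag_gauss_loglik(adaptation_frames, m, v)
              for m, v in zip(cluster_means, cluster_vars)]
    return int(np.argmax(scores))
```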



Eigenvoices

  • Similar to Cluster Adaptive Training.

  • Concatenate the means from R speaker-dependent models into one supervector per speaker. Perform PCA on the resulting vectors and store K << R eigenvoice vectors.

  • Form a vector of means from the SI model too.

  • Given a new speaker, the mean supervector is modelled as a linear combination of the SI vector and the eigenvoice vectors (a small sketch follows below).
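
A minimal NumPy sketch of building eigenvoices by PCA over speaker-dependent mean supervectors and adapting to a new speaker. The least-squares weight estimate is a simplification made here; the original approach estimates the weights by maximum likelihood on the adaptation data.

```python
import numpy as np

def build_eigenvoices(si_supervector, sd_supervectors, K):
    """PCA on R speaker-dependent mean supervectors, centred on the SI
    model's supervector; keep K << R eigenvoice vectors.

    si_supervector  : (P,)   concatenated means of the SI model
    sd_supervectors : (R, P) one row per speaker-dependent model
    """
    centered = sd_supervectors - si_supervector
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:K]                                   # (K, P) eigenvoices

def adapt_new_speaker(si_supervector, eigenvoices, rough_target_sv):
    """New-speaker supervector = SI supervector + weighted eigenvoices
    (weights by least squares here, by maximum likelihood in the paper)."""
    w, *_ = np.linalg.lstsq(eigenvoices.T, rough_target_sv - si_supervector,
                            rcond=None)
    return si_supervector + eigenvoices.T @ w
```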



Summary

  • Two major approaches: MAP (and EMAP) and MLLR.

  • MAP needs more adaptation data than MLLR because it uses a simple prior; with enough data, MAP converges to the speaker-dependent (SD) model.

  • Adaptive training is gaining popularity.

  • For mobile applications, complexity and memory are major concerns.

