HIWIRE MEETING Nancy, July 6-7, 2006

Presentation Transcript


  1. HIWIRE MEETING
     Nancy, July 6-7, 2006
     José C. Segura, Ángel de la Torre

  2. Schedule
     • Non-linear feature normalization for mobile platform
       • Integration scheme
       • Results and discussion
     • Rapid speaker adaptation
       • Combination of adaptation at signal level and acoustic model level
       • Results and discussion
     • Assessment of two non-linear techniques for feature normalization
       • Non-linear parametric equalization
       • Model based feature compensation (VTS)
     • New improvements in robust VAD
       • Model based VAD

  5. Non-linear Parametric Equalization
     • Feature normalization
     • Motivation of PEQ:
       • Limitation of linear methods:
         • Cepstral Mean Normalization
         • Cepstral Mean and Variance Normalization
       • Limitation of non-linear methods (HEQ, OSEQ):
         • Speech/non-speech ratio
         • Estimation problems
     • Parametric Equalization PEQ:
       • Two Gaussian Model (speech / non-speech)
       • Training of clean Gaussians; estimation of noisy Gaussians
       • Non-linear transformation: combination of two linear transformations (one for speech, one for non-speech)
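
The PEQ idea on this slide can be sketched for a single feature dimension. This is only an illustration under stated assumptions, not the authors' implementation: it assumes each class (speech, non-speech) has a linear map that sends its noisy Gaussian onto its clean Gaussian, and that the two maps are mixed with the class posteriors given the noisy observation. The function names, the dictionary layout and the default prior are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def peq_transform(y, noisy, clean, prior_speech=0.5):
    """Equalize one noisy feature value y (hypothetical helper).

    noisy, clean: dicts mapping "speech" / "nonspeech" to (mean, variance).
    prior_speech: assumed prior probability of the speech class.
    No underflow protection; observations far from both Gaussians would
    need a log-domain version.
    """
    priors = {"speech": prior_speech, "nonspeech": 1.0 - prior_speech}
    # Prior-weighted class likelihoods under the noisy two-Gaussian model.
    weighted = {c: priors[c] * gaussian_pdf(y, *noisy[c]) for c in priors}
    total = sum(weighted.values())
    x_hat = 0.0
    for c in priors:
        posterior = weighted[c] / total
        (mu_y, var_y), (mu_x, var_x) = noisy[c], clean[c]
        # Linear map that sends the noisy class Gaussian onto the clean one.
        linear = mu_x + math.sqrt(var_x / var_y) * (y - mu_y)
        # Posterior-weighted mix of the two linear maps: non-linear overall.
        x_hat += posterior * linear
    return x_hat
```

Because the posteriors depend on y, the combined mapping is non-linear even though each class map is linear; when the noisy and clean Gaussians coincide, the mapping reduces to the identity.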

  6. Non-linear Parametric Equalization
     • Aurora-2 results:
     • Aurora-4 results:

  7. Non-linear Parametric Equalization
     • Additional problem of non-linear transformations:
       • Once the transformation is estimated, it is an “instantaneous transformation”
       • Temporal correlations are not exploited
     • Temporal Smoothing (TES):
       • Each equalized cepstrum is time-filtered with an ARMA filter that restores autocorrelation of clean data
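
As a rough illustration of the time-filtering step, the sketch below applies a generic ARMA filter along the time trajectory of one equalized cepstral coefficient. The slide does not give the filter order or coefficients, nor how they are derived from the clean-data autocorrelation, so the coefficients used here are hypothetical placeholders chosen only to show the mechanics.

```python
def arma_filter(x, b, a):
    """Apply y[t] = sum_i b[i] * x[t-i] - sum_j a[j] * y[t-1-j].

    b: moving-average (feed-forward) coefficients.
    a: autoregressive (feedback) coefficients, excluding the leading 1.
    Samples before t = 0 are treated as zero.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, bi in enumerate(b):
            if t - i >= 0:
                acc += bi * x[t - i]
        for j, aj in enumerate(a):
            if t - 1 - j >= 0:
                acc -= aj * y[t - 1 - j]
        y.append(acc)
    return y

# Hypothetical first-order coefficients, for illustration only.
trajectory = [1.0, 0.0, 0.0, 0.0]   # one cepstral coefficient over 4 frames
smoothed = arma_filter(trajectory, b=[0.5, 0.5], a=[-0.5])
```

In a full front end the same filter would be run independently over each cepstral coefficient after equalization.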

  8. Non-linear Parametric Equalization (TES)
     • Aurora-2 results:
     • Aurora-4 results:

  9. Model Based Feature Compensation (VTS)
     • VTS feature normalization:
       • Performed in the log-FBE domain (prior to the DCT)
       • Based on a Gaussian mixture model trained with clean speech
       • Allows feature compensation and uncertainty estimation
     • Summary of VTS (vector Taylor series approach):
       • Given the noisy conditions, VTS provides a noisy Gaussian from each clean Gaussian
       • The noisy Gaussian mixture model allows the computation of the probabilities P(k|y)
       • An estimate of the clean speech x is then possible
       • An estimate of the uncertainty is also possible

  10. Model Based Feature Compensation (VTS)
      • Step 1: Estimation of a noisy Gaussian from a clean Gaussian:
        [equation not preserved in the transcript]
        where the functions g0, f0 and h0 are evaluated at the mean of the clean Gaussian and at the mean of the noise:
        [equation not preserved in the transcript]
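
The slide's own expressions for g0, f0 and h0 are not available in this transcript, so the following sketch uses the standard first-order vector Taylor series expansion for additive noise in the log filter-bank energy domain, where the mismatch model is y = log(exp(x) + exp(n)). It handles one channel with scalar means and variances; the function name and argument layout are assumptions, and the authors' formulation may include further terms.

```python
import math

def vts_noisy_gaussian(mu_x, var_x, mu_n, var_n):
    """First-order VTS approximation for one log-FBE channel.

    Mismatch model (additive noise, log filter-bank energies):
        y = log(exp(x) + exp(n))
    Returns (mean, variance) of the approximated noisy Gaussian.
    """
    # Mean: mismatch function at the expansion point (mu_x, mu_n),
    # written as mu_x + log(1 + exp(mu_n - mu_x)).
    mu_y = mu_x + math.log1p(math.exp(mu_n - mu_x))
    # Partial derivative dy/dx at the expansion point; dy/dn is its complement.
    dy_dx = 1.0 / (1.0 + math.exp(mu_n - mu_x))
    dy_dn = 1.0 - dy_dx
    # Variance propagated through the linearized model,
    # assuming speech and noise are independent.
    var_y = dy_dx ** 2 * var_x + dy_dn ** 2 * var_n
    return mu_y, var_y
```

At high SNR the noisy Gaussian stays close to the clean one, and at low SNR it approaches the noise Gaussian, which is the behaviour expected from the mismatch model.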

  11. Model Based Feature Compensation (VTS)
      • Step 2: Estimation of P(k|y):
        P(k|y) = P(k) N_k(y) / Σ_j P(j) N_j(y)
        where N_k(y) is the k-th Gaussian evaluated at the noisy speech y, and P(k) is the a-priori probability of the Gaussian.
      • Step 3: Estimation of clean speech:
        [equation not preserved in the transcript]
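
Steps 2 and 3 can be sketched for a single channel as follows. The posterior is plain Bayes' rule over the noisy mixture, computed in the log domain for numerical stability. For the clean-speech estimate, the sketch assumes the common VTS choice of subtracting the posterior-weighted mean shift of each Gaussian, x_hat = y - Σ_k P(k|y) (mu_y,k - mu_x,k); the exact estimator on the slide is not available, and all names here are hypothetical.

```python
import math

def log_gaussian(y, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)

def posteriors(y, noisy_gmm):
    """P(k|y) by Bayes' rule.

    noisy_gmm: list of (prior, mean, variance), one tuple per Gaussian.
    """
    log_joint = [math.log(p) + log_gaussian(y, m, v) for p, m, v in noisy_gmm]
    peak = max(log_joint)                       # subtract the max to avoid underflow
    unnorm = [math.exp(lj - peak) for lj in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def estimate_clean(y, noisy_gmm, clean_means):
    """Assumed estimator: remove the posterior-weighted mean shift from y.

    clean_means[k] is the mean of the k-th clean Gaussian.
    """
    post = posteriors(y, noisy_gmm)
    shift = sum(p * (comp[1] - mu_x)
                for p, comp, mu_x in zip(post, noisy_gmm, clean_means))
    return y - shift
```

For example, `estimate_clean(0.0, [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)], [-2.0, 0.5])` weights both Gaussians equally and removes the average of their two mean shifts.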

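  12. Model Based Feature Compensation (VTS)
      • Step 4: Estimation of uncertainty:
        assuming small values of the variance of the noise:
        [equation not preserved in the transcript]
        and from the estimation of the clean speech:
        [equation not preserved in the transcript]
        the uncertainty of the clean speech can be estimated as:
        [equation not preserved in the transcript]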
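
The slide's uncertainty formula is not available in this transcript. One plausible reading, shown here purely as an assumption, is that with a small noise variance each Gaussian k yields a nearly deterministic clean estimate x_k = y - (mu_y,k - mu_x,k), so the uncertainty is the posterior-weighted spread of those per-Gaussian estimates around the combined estimate. The function and argument names are hypothetical.

```python
def clean_estimate_with_uncertainty(y, post, shifts):
    """Return (x_hat, variance) for one channel under the stated assumption.

    post[k]   : P(k|y), posterior of the k-th Gaussian.
    shifts[k] : mu_y_k - mu_x_k, mean shift of the k-th Gaussian.
    """
    # Per-Gaussian clean estimates (deterministic given k, by assumption).
    per_component = [y - g for g in shifts]
    # Combined estimate: posterior-weighted mean.
    x_hat = sum(p * xk for p, xk in zip(post, per_component))
    # Uncertainty: posterior-weighted squared deviation from x_hat.
    variance = sum(p * (xk - x_hat) ** 2 for p, xk in zip(post, per_component))
    return x_hat, variance
```

When one Gaussian dominates the posterior the variance goes to zero, and it grows when several Gaussians with different mean shifts are equally likely.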

  13. Model Based Feature Compensation (VTS)
      • Aurora-2 results:
      • Some considerations about VTS:
        • Computational load
        • Better than HEQ, PEQ, etc., but only valid for additive noise or channel distortion
        • Estimation of noise is critical
        • There are some approximations in the formulation
        • Uncertainty: small improvement (insertions, substitutions, deletions)
        • Alternative: model-based compensation based on numerical integration of pdfs

  15. Model-based VAD
      • Fundamentals of model-based VAD:
        • Gaussian mixture model in log-FBE domain
        • Gaussian mixture model trained with clean speech
        • VTS provides a noisy version of the GMM
        • From the noisy GMM, P(k|y) can be estimated for each observation y and each Gaussian k
        • A-priori probability of the k-th Gaussian being speech, P(V|k), can be estimated from the training data
        • Then, the probability P(V|y) of the noisy observation y being speech is given by:
          P(V|y) = Σ_k P(V|k) P(k|y)
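
A minimal sketch of the VAD computation follows, assuming the speech probability is obtained by marginalizing over the Gaussians, P(V|y) = Σ_k P(V|k) P(k|y). The soft-count estimator for P(V|k), the requirement of speech/non-speech labels on the training frames, and the 0.5 decision threshold are all illustrative assumptions, not details taken from the slides.

```python
def estimate_speech_given_k(frame_posteriors, labels):
    """Estimate P(V|k) from labelled training frames (assumed soft counts).

    frame_posteriors[t][k] : P(k | x_t) for clean training frame t.
    labels[t]              : 1 if frame t is speech, 0 otherwise.
    """
    n_components = len(frame_posteriors[0])
    speech_mass = [0.0] * n_components
    total_mass = [0.0] * n_components
    for post, is_speech_frame in zip(frame_posteriors, labels):
        for k, p in enumerate(post):
            total_mass[k] += p
            if is_speech_frame:
                speech_mass[k] += p
    # Gaussians that never receive any mass default to non-speech.
    return [s / t if t > 0.0 else 0.0 for s, t in zip(speech_mass, total_mass)]

def speech_probability(post, speech_given_k):
    """P(V|y) = sum_k P(V|k) * P(k|y)."""
    return sum(pv * pk for pv, pk in zip(speech_given_k, post))

def is_speech(post, speech_given_k, threshold=0.5):
    """Frame-level decision; the threshold value is a hypothetical choice."""
    return speech_probability(post, speech_given_k) > threshold
```

Here `post` would come from the noisy GMM produced by VTS, so the decision adapts to the noise condition without relying on frame energy.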

  16. Model-based VAD
      • Some considerations about model-based VAD:
        • VAD decision relies on a Gaussian mixture model trained with clean speech (based on speech events observed in the training database)
        • Not based on energy
        • Based on observations in the log-FBE domain
        • VTS adapts the Gaussian mixture to noisy conditions: the performance of the VAD is expected to be stable for a wide range of SNRs
        • Computational load

  17. Model-based VAD
      • Model-based VAD for different SNRs:

  18. Model-based VAD
      Comparison with other VADs: HR1 and HR0 evaluated for AURORA-2

  19. Model-based VAD
      Comparison with other VADs: HR1 and HR0 evaluated for AURORA-2

  20. Model-based VAD
      • Aurora-2 recognition results (WAcc):
        Baseline: 60.5% (no VAD, no WF, no FD)
