HIWIRE MEETING Nancy, July 6-7, 2006

José C. Segura, Ángel de la Torre

Schedule
• Non-linear feature normalization for mobile platform
  • Integration scheme
  • Results and discussion
• Rapid speaker adaptation
  • Combination of adaptation at signal level and acoustic model level
  • Results and discussion
• Assessment of two non-linear techniques for feature normalization
  • Non-linear parametric equalization
  • Model based feature compensation (VTS)
• New improvements in robust VAD
  • Model based VAD

Non-linear Parametric Equalization
• Feature normalization
• Motivation of PEQ:
  • Limitation of linear methods:
    • Cepstral Mean Normalization
    • Cepstral Mean and Variance Normalization
  • Limitation of non-linear methods (HEQ, OSEQ):
    • Speech/non-speech ratio
    • Estimation problems
• Parametric Equalization (PEQ):
  • Two-Gaussian model (speech / non-speech)
  • Training of clean Gaussians; estimation of noisy Gaussians
  • Non-linear transformation: combination of two linear transformations (one for speech, one for non-speech)
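The two-Gaussian idea can be sketched for a single cepstral coefficient as follows. This is a minimal illustration, not the authors' implementation: the posterior weighting of the two linear transformations, the function names `gauss` and `peq`, the dictionary layout, and the prior `p_speech` are all assumptions made for the example.

```python
import math

def gauss(y, mean, std):
    """Univariate Gaussian density."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def peq(y, clean, noisy, p_speech=0.5):
    """Equalize one cepstral coefficient y (hypothetical helper).

    clean, noisy: dicts with keys "speech" and "nonspeech", each mapping
    to a (mean, std) pair. The clean pair is assumed to come from training
    data and the noisy pair from the utterance being processed.
    """
    # Posterior probability of each class given the noisy observation.
    w_s = p_speech * gauss(y, *noisy["speech"])
    w_n = (1.0 - p_speech) * gauss(y, *noisy["nonspeech"])
    total = w_s + w_n
    post = {"speech": w_s / total, "nonspeech": w_n / total}

    # Each class contributes a linear (mean and variance) mapping from the
    # noisy Gaussian to the clean one; the posteriors blend the two maps,
    # which makes the overall transformation non-linear in y.
    x_hat = 0.0
    for c in ("speech", "nonspeech"):
        mu_x, sd_x = clean[c]
        mu_y, sd_y = noisy[c]
        x_hat += post[c] * (mu_x + (y - mu_y) * sd_x / sd_y)
    return x_hat
```

When the noisy Gaussians coincide with the clean ones, both linear maps reduce to the identity and the observation is returned unchanged.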

Non-linear Parametric Equalization • Aurora-2 results: • Aurora-4 results:

Non-linear Parametric Equalization
• Additional problem of non-linear transformations:
  • Once the transformation is estimated, it is an “instantaneous transformation”
  • Temporal correlations are not exploited
• Temporal Smoothing (TES):
  • Each equalized cepstrum is time-filtered with an ARMA filter that restores the autocorrelation of clean data
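The time-filtering step can be illustrated with a generic ARMA difference equation applied independently to each cepstral coefficient track. The slides do not give the filter order or coefficients, so `b` and `a` below are placeholders, and the helper names `arma_filter` and `smooth_cepstra` are hypothetical.

```python
def arma_filter(x, b, a):
    """Apply y[t] = (sum_i b[i]*x[t-i] - sum_{j>=1} a[j]*y[t-j]) / a[0]."""
    y = []
    for t in range(len(x)):
        # Moving-average part over current and past inputs.
        acc = sum(b[i] * x[t - i] for i in range(len(b)) if t - i >= 0)
        # Autoregressive part over past outputs.
        acc -= sum(a[j] * y[t - j] for j in range(1, len(a)) if t - j >= 0)
        y.append(acc / a[0])
    return y

def smooth_cepstra(frames, b, a):
    """Filter each coefficient track of a list of cepstral frames in time."""
    n_coef = len(frames[0])
    tracks = [arma_filter([f[c] for f in frames], b, a) for c in range(n_coef)]
    # Re-assemble the filtered tracks into frames.
    return [[tracks[c][t] for c in range(n_coef)] for t in range(len(frames))]
```

In practice the coefficients would have to be designed so that the filtered features match the autocorrelation measured on clean training data; that design step is not shown here.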

Non-linear Parametric Equalization with TES • Aurora-2 results: • Aurora-4 results:

Model Based Feature Compensation (VTS)
• VTS feature normalization:
  • Performed in the log-FBE domain (before the DCT)
  • Based on a Gaussian mixture model trained with clean speech
  • Allows feature compensation and uncertainty estimation
• Summary of VTS (vector Taylor series approach):
  • Given the noisy conditions, VTS provides a noisy Gaussian from each clean Gaussian
  • The noisy Gaussian mixture model allows the computation of the probabilities P(k|y)
  • An estimation of the clean speech x is then possible
  • An estimation of the uncertainty is also possible

Model Based Feature Compensation (VTS) • Step 1: Estimation of a noisy Gaussian from a clean Gaussian: where the functions g0, f0 and h0 are evaluated at the mean of the clean Gaussian and at the mean of the noise:

Model Based Feature Compensation (VTS)
• Step 2: Estimation of P(k|y): $P(k|y) = \frac{P(k)\,N_k(y)}{\sum_{k'} P(k')\,N_{k'}(y)}$, where $N_k(y)$ is the k-th noisy Gaussian evaluated at the noisy speech y, and P(k) is the a-priori probability of the Gaussian.
• Step 3: Estimation of clean speech:
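Steps 1 to 3 can be sketched for a single log-FBE channel as below. This is a minimal sketch under stated assumptions, not the formulation on the slides: it assumes the usual additive-noise mismatch y = x + log(1 + exp(n - x)), a first-order Taylor expansion around the means, scalar (diagonal) Gaussians, and a clean-speech estimate that subtracts the posterior-weighted mean shift. The names `noisy_gaussian`, `posteriors`, and `compensate` are hypothetical, and the returned `g0` is only assumed to play the role of the slides' g0.

```python
import math

def noisy_gaussian(mu_x, var_x, mu_n, var_n):
    """Step 1 (assumed form): first-order VTS for one clean Gaussian."""
    e = math.exp(mu_n - mu_x)
    g0 = math.log1p(e)            # mean shift caused by the noise
    a = 1.0 / (1.0 + e)           # derivative of y with respect to x
    mu_y = mu_x + g0
    var_y = a * a * var_x + (1.0 - a) ** 2 * var_n
    return mu_y, var_y, g0

def log_gauss(y, mu, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mu) ** 2 / var)

def posteriors(y, priors, noisy):
    """Step 2: P(k|y) by Bayes' rule over the noisy Gaussians."""
    logs = [math.log(p) + log_gauss(y, mu, var)
            for p, (mu, var, _) in zip(priors, noisy)]
    m = max(logs)                 # subtract the max for numerical stability
    w = [math.exp(v - m) for v in logs]
    s = sum(w)
    return [v / s for v in w]

def compensate(y, priors, clean, mu_n, var_n):
    """Step 3 (assumed form): x_hat = y - sum_k P(k|y) * g0_k."""
    noisy = [noisy_gaussian(mu, var, mu_n, var_n) for mu, var in clean]
    post = posteriors(y, priors, noisy)
    x_hat = y - sum(p * g0 for p, (_, _, g0) in zip(post, noisy))
    return x_hat, post
```

Here `clean` is a list of (mean, variance) pairs for the clean GMM, and `mu_n`, `var_n` describe the noise in the same channel. When the noise mean is far below the speech means, the shift vanishes and the observation passes through almost unchanged.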

Model Based Feature Compensation (VTS) • Step 4: Estimation of uncertainty: assuming small values of the variance of the noise: and from the estimation of the clean speech: the uncertainty of the clean speech can be estimated as:

Model Based Feature Compensation (VTS)
• Aurora-2 results:
• Some considerations about VTS:
  • Computational load
  • Better than HEQ, PEQ, etc., but only valid for additive noise or channel distortion
  • Estimation of noise is critical
  • There are some approximations in the formulation
  • Uncertainty: small improvement (insertions, substitutions, deletions)
  • Alternative: model-based compensation based on numerical integration of pdfs

Model-based VAD
• Fundamentals of model-based VAD:
  • Gaussian mixture model in log-FBE domain
  • Gaussian mixture model trained with clean speech
  • VTS provides a noisy version of the GMM
  • From the noisy GMM, P(k|y) can be estimated for each observation y and each Gaussian k
  • The a-priori probability P(V|k) of the k-th Gaussian being speech can be estimated from the training data
  • Then, the probability P(V|y) of the noisy observation y being speech is given by: $P(V|y) = \sum_k P(V|k)\,P(k|y)$
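The speech probability described above can be sketched as a posterior-weighted sum over the Gaussians of the noisy GMM. This is an illustrative sketch only: it assumes a scalar observation, a noisy GMM that has already been adapted (for example by VTS), and the combination P(V|y) = sum over k of P(V|k) P(k|y). The function names are hypothetical.

```python
import math

def gmm_posteriors(y, priors, means, variances):
    """P(k|y) for a univariate GMM, computed in the log domain."""
    logs = [math.log(p) - 0.5 * (math.log(2.0 * math.pi * v) + (y - m) ** 2 / v)
            for p, m, v in zip(priors, means, variances)]
    top = max(logs)               # subtract the max for numerical stability
    w = [math.exp(v - top) for v in logs]
    s = sum(w)
    return [v / s for v in w]

def speech_probability(y, priors, means, variances, p_v_given_k):
    """P(V|y): each Gaussian votes with its own probability of being speech."""
    post = gmm_posteriors(y, priors, means, variances)
    return sum(pv * pk for pv, pk in zip(p_v_given_k, post))
```

A frame-level decision could then compare the returned probability with a threshold; the threshold value is a tuning choice that the slides do not specify.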

Model-based VAD
• Some considerations about model-based VAD:
  • VAD decision relies on a Gaussian mixture model trained with clean speech (based on speech events observed in the training database)
  • Not based on energy
  • Based on observations in the log-FBE domain
  • VTS adapts the Gaussian mixture to noisy conditions: the performance of the VAD is expected to be stable for a wide range of SNRs
  • Computational load

Model-based VAD • Model-based VAD for different SNRs:

Model-based VAD • Comparison with other VADs: HR1 and HR0 evaluated for Aurora-2
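As an illustration of the two figures of merit, the sketch below computes them from frame-level labels, assuming the common definitions: HR1 is the fraction of reference speech frames detected as speech, and HR0 is the fraction of reference non-speech frames detected as non-speech. The function name `hit_rates` and the label format are assumptions made for the example.

```python
def hit_rates(reference, decision):
    """Return (HR0, HR1) from two equal-length sequences of 0/1 frame labels."""
    n1 = sum(1 for r in reference if r)        # reference speech frames
    n0 = len(reference) - n1                   # reference non-speech frames
    hit1 = sum(1 for r, d in zip(reference, decision) if r and d)
    hit0 = sum(1 for r, d in zip(reference, decision) if not r and not d)
    # A rate is undefined when the reference contains no frames of that class.
    hr1 = hit1 / n1 if n1 else float("nan")
    hr0 = hit0 / n0 if n0 else float("nan")
    return hr0, hr1
```

Reporting both rates together matters because a detector can raise one trivially at the expense of the other, for example by labelling every frame as speech.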

Model-based VAD • Aurora-2 recognition results (WAcc): Baseline: 60.5 % (no VAD, no WF, no FD)
