HIWIRE MEETING Nancy, July 6-7, 2006


José C. Segura, Ángel de la Torre

Schedule
  • Non-linear feature normalization for mobile platform
    • Integration scheme
    • Results and discussion
  • Rapid speaker adaptation
    • Combination of adaptation at signal level and acoustic model level
    • Results and discussion
  • Assessment of two non-linear techniques for feature normalization
    • Non-linear parametric equalization
    • Model based feature compensation (VTS)
  • New improvements in robust VAD
    • Model based VAD
Non-linear Parametric Equalization
  • Feature normalization
  • Motivation of PEQ:
    • Limitation of linear methods:
      • Cepstral Mean Normalization
      • Cepstral Mean and Variance Normalization
    • Limitation of non-linear methods (histogram equalization, HEQ; order-statistics equalization, OSEQ):
      • Dependence on the speech/non-speech ratio of the utterance
      • Estimation problems
  • Parametric Equalization PEQ:
    • Two Gaussian Model (speech / non-speech)
    • Training of clean Gaussians; estimation of noisy Gaussians
    • Non-linear transformation: combination of two linear transformations (one for speech, one for non-speech)
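The two-Gaussian transformation described above can be sketched for a single feature component as follows. This is a minimal illustration, not the authors' implementation: weighting the two linear transformations by the class posterior given the noisy feature, the `prior_speech` parameter, and all function and key names are assumptions made for the example.

```python
import math


def gaussian_pdf(y, mean, std):
    """Univariate Gaussian density."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def peq_transform(y, noisy, clean, prior_speech=0.5):
    """Map a noisy feature y towards the clean domain.

    noisy / clean: dicts with keys "speech" and "nonspeech", each a
    (mean, std) pair. The clean pair is trained on clean data; the noisy
    pair is estimated from the utterance being equalized.
    """
    # Posterior probability of the speech class given y (assumed weighting).
    ps = prior_speech * gaussian_pdf(y, *noisy["speech"])
    pn = (1.0 - prior_speech) * gaussian_pdf(y, *noisy["nonspeech"])
    post_speech = ps / (ps + pn)

    out = 0.0
    for cls, weight in (("speech", post_speech), ("nonspeech", 1.0 - post_speech)):
        mu_y, sd_y = noisy[cls]
        mu_x, sd_x = clean[cls]
        # One linear (mean and variance) transformation per class.
        out += weight * (mu_x + (sd_x / sd_y) * (y - mu_y))
    return out
```

Because the posterior weight changes with y, the blend of the two linear mappings is non-linear overall, while each class on its own is handled like a mean and variance normalization.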
Non-linear Parametric Equalization
  • Aurora-2 results:
  • Aurora-4 results:
Non-linear Parametric Equalization
  • Additional problem of non-linear transformations:
    • Once the transformation is estimated, it is an “instantaneous transformation”
    • Temporal correlations are not exploited
  • Temporal Smoothing (TES):
    • Each equalized cepstrum is time-filtered with an ARMA filter that restores autocorrelation of clean data
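Temporal smoothing of this kind can be sketched as an ARMA difference equation applied to the time trajectory of each cepstral coefficient. The slides do not give the filter design, so the coefficient vectors `b` and `a` below are hypothetical inputs; in practice they would be chosen so that the filtered features recover the autocorrelation of clean data.

```python
def arma_filter(x, b, a):
    """Apply y[t] = (sum_i b[i]*x[t-i] - sum_{j>=1} a[j]*y[t-j]) / a[0]."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, bi in enumerate(b):        # moving-average part
            if t - i >= 0:
                acc += bi * x[t - i]
        for j in range(1, len(a)):        # autoregressive part
            if t - j >= 0:
                acc -= a[j] * y[t - j]
        y.append(acc / a[0])
    return y


def smooth_cepstra(frames, b, a):
    """Filter each coefficient trajectory of a list of per-frame vectors."""
    n_coeff = len(frames[0])
    tracks = [arma_filter([f[c] for f in frames], b, a) for c in range(n_coeff)]
    return [[tracks[c][t] for c in range(n_coeff)] for t in range(len(frames))]
```

For example, `smooth_cepstra(frames, b=[0.5, 0.5], a=[1.0])` would apply a two-tap moving average along time, independently for every coefficient.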
Non-linear Parametric Equalization
  • Aurora-2 results (with TES):
  • Aurora-4 results (with TES):


Model Based Feature Compensation (VTS)
  • VTS feature normalization:
    • Performed in the log-FBE domain (prior to the DCT)
    • Based on a Gaussian mixture model trained with clean speech
    • Allows feature compensation and uncertainty estimation
  • Summary of VTS (vector Taylor series approach):
    • Given the noisy conditions, VTS provides a noisy Gaussian from each clean Gaussian
    • The noisy Gaussian mixture model allows the computation of the probabilities P(k|y)
    • An estimation of the clean speech x is then possible
    • An estimation of the uncertainty is also possible
Model Based Feature Compensation (VTS)
  • Step 1: Estimation of a noisy Gaussian from a clean Gaussian:

where the functions g0, f0 and h0 are evaluated at the mean of the clean Gaussian and at the mean of the noise:
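As an illustration of Step 1, the sketch below uses one common VTS formulation for additive noise in the natural-log filter-bank domain, y = x + g(n - x) with g(d) = log(1 + e^d). This is an assumption: the exact expressions used in the presentation may differ. Here `f` and `h` are taken to be the first and second derivatives of `g`, and `g0`, `f0`, `h0` are their values at d0 = mu_n - mu_x. Each log-FBE channel is treated independently (diagonal covariances), and channel distortion is not modelled.

```python
import math


def g(d):
    """Mismatch function log(1 + e^d), written in a numerically stable form."""
    return max(d, 0.0) + math.log1p(math.exp(-abs(d)))


def f(d):
    """First derivative of g (logistic function), stable for large |d|."""
    if d >= 0.0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def h(d):
    """Second derivative of g."""
    fd = f(d)
    return fd * (1.0 - fd)


def noisy_gaussian(mu_x, var_x, mu_n, var_n, second_order=True):
    """Approximate the noisy Gaussian produced by one clean Gaussian."""
    d0 = mu_n - mu_x
    g0, f0, h0 = g(d0), f(d0), h(d0)
    mu_y = mu_x + g0
    if second_order:
        # Second-order Taylor correction of the mean (assumed form).
        mu_y += 0.5 * h0 * (var_x + var_n)
    # First-order propagation of the variances: dy/dx = 1 - f0, dy/dn = f0.
    var_y = (1.0 - f0) ** 2 * var_x + f0 ** 2 * var_n
    return mu_y, var_y
```

When the speech mean is far above the noise mean, g0 and f0 go to zero and the noisy Gaussian coincides with the clean one; when the noise dominates, the noisy Gaussian approaches the noise Gaussian.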

Model Based Feature Compensation (VTS)
  • Step 2: Estimation of P(k|y) by Bayes' rule:

$P(k|y) = \frac{P(k)\,N_k(y)}{\sum_j P(j)\,N_j(y)}$

where $N_k(y)$ is the k-th noisy Gaussian evaluated at the noisy speech y, and P(k) is the a priori probability of the Gaussian.

  • Step 3: Estimation of clean speech:
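Steps 2 and 3 can be sketched for a single log-FBE channel as follows. The posterior is plain Bayes' rule over the noisy mixture. The clean-speech estimate shown, y minus the posterior-weighted correction of each Gaussian, is one common choice and is an assumption here, as are all function and parameter names.

```python
import math


def log_gauss(y, mean, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def posteriors(y, priors, noisy_means, noisy_vars):
    """P(k|y) for every Gaussian k of the noisy mixture (log-sum-exp form)."""
    logs = [math.log(p) + log_gauss(y, m, v)
            for p, m, v in zip(priors, noisy_means, noisy_vars)]
    top = max(logs)                       # subtract the maximum to avoid underflow
    weights = [math.exp(value - top) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]


def estimate_clean(y, post, corrections):
    """Clean-speech estimate: y minus the expected correction.

    corrections[k] is the shift between the clean and noisy mean of
    Gaussian k (for instance the g0 term of that Gaussian); this form of
    the estimator is an assumption of the sketch.
    """
    return y - sum(p * c for p, c in zip(post, corrections))
```

A typical call would first obtain `post = posteriors(y, priors, noisy_means, noisy_vars)` and then pass it to `estimate_clean(y, post, corrections)`.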
Model Based Feature Compensation (VTS)
  • Step 4: Estimation of uncertainty:

assuming small values of the variance of the noise:

and from the estimation of the clean speech:

the uncertainty of the clean speech can be estimated as:

Model Based Feature Compensation (VTS)
  • Aurora-2 results:
  • Some considerations about VTS:
    • Computational load
    • Better than HEQ, PEQ, etc., but only valid for additive noise or channel distortion
    • Estimation of noise is critical
    • There are some approximations in the formulation
    • Uncertainty: small improvement (insertions, substitutions, deletions)
  • Alternative: model-based compensation based on numerical integration of pdfs
Model-based VAD
  • Fundamentals of model-based VAD:
    • Gaussian mixture model in log-FBE domain
    • Gaussian mixture model trained with clean speech
    • VTS provides a noisy version of the GMM
    • From the noisy GMM, P(k|y) can be estimated for each observation y and each Gaussian k
    • The a priori probability P(V|k) of the k-th Gaussian being speech can be estimated from the training data
  • Then, the probability P(V|y) of the noisy observation y being speech is given by: $P(V|y) = \sum_k P(V|k)\,P(k|y)$
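The combination described above can be sketched as a sum over Gaussians of P(V|k) weighted by P(k|y). In this minimal example, estimating P(V|k) by hard-assigning each labelled training frame to its most likely Gaussian, the fallback value for unused Gaussians, and the decision threshold are all assumptions.

```python
def train_p_speech_given_k(assignments, labels, n_gauss, default=0.5):
    """Estimate P(V|k) from labelled training frames.

    assignments[t]: index of the most likely Gaussian for frame t
    labels[t]:      True if frame t is speech
    Gaussians never selected fall back to `default` (hypothetical choice).
    """
    speech = [0] * n_gauss
    total = [0] * n_gauss
    for k, is_speech in zip(assignments, labels):
        total[k] += 1
        if is_speech:
            speech[k] += 1
    return [s / t if t else default for s, t in zip(speech, total)]


def speech_probability(post, p_speech_given_k):
    """P(V|y) = sum over k of P(V|k) * P(k|y)."""
    return sum(pk * pv for pk, pv in zip(post, p_speech_given_k))


def vad_decision(post, p_speech_given_k, threshold=0.5):
    """Label the frame as speech when P(V|y) exceeds the threshold."""
    return speech_probability(post, p_speech_given_k) > threshold
```

Here `post` is the list of P(k|y) values obtained from the noisy mixture for the current frame.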
Model-based VAD
  • Some considerations about model-based VAD:
    • VAD decision relies on a Gaussian mixture model trained with clean speech (based on speech events observed in the training database)
      • Not based on energy
      • Based on observations in the log-FBE domain
    • VTS adapts the Gaussian mixture to noisy conditions: the performance of the VAD is expected to be stable for a wide range of SNRs
    • Computational load
Model-based VAD
  • Model-based VAD for different SNRs:
Model-based VAD

Comparison with other VADs: HR1 and HR0 evaluated for AURORA-2
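For reference, the two figures of merit can be computed from frame-level labels as sketched below. The sketch assumes the usual reading of these terms: HR1 is the speech hit rate (fraction of reference speech frames detected as speech) and HR0 is the non-speech hit rate (fraction of reference non-speech frames detected as non-speech). The function name and the input format are hypothetical.

```python
def hit_rates(reference, decisions):
    """Return (HR0, HR1) in percent from boolean frame labels.

    reference[t]: True if frame t is speech in the reference labelling
    decisions[t]: True if the VAD marked frame t as speech
    """
    speech_total = sum(1 for r in reference if r)
    nonspeech_total = len(reference) - speech_total
    speech_hits = sum(1 for r, d in zip(reference, decisions) if r and d)
    nonspeech_hits = sum(1 for r, d in zip(reference, decisions) if not r and not d)
    hr1 = 100.0 * speech_hits / speech_total if speech_total else float("nan")
    hr0 = 100.0 * nonspeech_hits / nonspeech_total if nonspeech_total else float("nan")
    return hr0, hr1
```

A VAD that labels every frame as speech reaches HR1 = 100 and HR0 = 0, which is why both rates are reported together.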

Model-based VAD
  • Aurora-2 recognition results (WAcc):

Baseline: 60.5 % (no VAD, no WF, no FD)
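Word accuracy is conventionally computed from the number of reference words and the substitution, deletion and insertion counts of the recognizer output. The sketch below assumes that standard definition; the slides do not spell out the formula.

```python
def word_accuracy(n_ref, substitutions, deletions, insertions):
    """Word accuracy in percent: 100 * (N - S - D - I) / N."""
    return 100.0 * (n_ref - substitutions - deletions - insertions) / n_ref
```

Since insertions are penalized, WAcc can become negative when a recognizer inserts many spurious words, for instance when long noise-only segments are decoded as speech.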
