Enhanced Speech Models for Robust Speech Recognition

Juan Arturo Nolazco-Flores

Dpto. de Ciencias Computacionales (Department of Computer Sciences)

ITESM, campus Monterrey


Talk Overview

  • Introduction

  • Enhanced-Speech Models

  • Comments and Conclusions



Introduction

  • Problem:

    • Automatic speech recognition (ASR) performance degrades severely when speech is corrupted by noise (additive noise, convolutional noise, etc.).

  • Fact:

    • To build practical speech recognisers, ASR must tackle this problem.

  • Knowledge:

    • ASR can be improved either by:

      • Enhancing speech before recognition, or

      • Training models in the same environment in which the ASR is going to be used.

  • Challenge:

    • Find a simple and efficient technique to solve this problem.


Recognition Using CD-HMM

[Figure: the input data is passed to a recogniser that needs one model per unit of recognition (M1, M2, …, MQ); the recogniser computes the probability of each model, and the model with the highest probability gives the recognised word.]
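The recogniser in the figure can be sketched as follows: each model scores the observation sequence with the forward algorithm, and the model with the highest log-likelihood wins. This is a minimal sketch assuming one diagonal-covariance Gaussian per state and toy two-state models; the function names (`log_forward`, `recognise`), the model names and all parameter values are hypothetical and not taken from the presentation.

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian, evaluated for each row of x."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def log_forward(obs, log_pi, log_A, means, variances):
    """log P(obs | model) computed with the forward algorithm in the log domain.

    obs: (T, D) observation vectors; log_pi: (S,) initial log-probabilities;
    log_A: (S, S) transition log-probabilities; means, variances: (S, D),
    one diagonal Gaussian per state (a simplifying assumption).
    """
    n_states = len(log_pi)
    # emission log-probabilities, shape (T, S)
    log_b = np.stack(
        [log_gauss_diag(obs, means[s], variances[s]) for s in range(n_states)], axis=1
    )
    alpha = log_pi + log_b[0]
    for t in range(1, obs.shape[0]):
        # alpha_t(j) = logsumexp_i(alpha_{t-1}(i) + log A[i, j]) + log b_j(o_t)
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)

def recognise(obs, models):
    """Return the name of the highest-scoring model and all model scores."""
    scores = {name: log_forward(obs, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores

# Toy example: two hypothetical 2-state word models with 2-D features.
log_pi = np.log([0.5, 0.5])
log_A = np.log([[0.9, 0.1], [0.1, 0.9]])
models = {
    "yes": (log_pi, log_A, np.array([[0.0, 0.0], [1.0, 1.0]]), np.ones((2, 2))),
    "no": (log_pi, log_A, np.array([[5.0, 5.0], [6.0, 6.0]]), np.ones((2, 2))),
}
rng = np.random.default_rng(0)
obs = rng.normal(loc=5.5, scale=1.0, size=(20, 2))
best, scores = recognise(obs, models)
print(best)  # prints: no
```

Working in the log domain with `np.logaddexp` avoids the underflow that multiplying many small probabilities would cause on long utterances.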



Enhancing Speech

  • Features:

    • Models are trained with clean speech.

    • Corrupted speech is enhanced.

  • There are a number of well-studied techniques:

    • Subtracting a noise estimate obtained during non-speech activity.

    • Adaptive noise cancelling (ANC).

  • Successful for low to medium SNR (> 5 dB).


  • Problems:

    • Enhancers are not perfect; therefore

      • the speech is distorted, and

      • residual noise remains.
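The first technique listed above, subtracting a noise estimate taken from non-speech activity, can be sketched as power spectral subtraction. This is a minimal sketch, assuming the first frames of the signal contain no speech; the frame length, hop, over-subtraction factor `alpha`, floor `beta` and function names are illustrative choices, not values from the presentation.

```python
import numpy as np

def frame_power_spectra(x, frame_len=256, hop=128):
    """Power spectra of Hann-windowed frames, shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def spectral_subtraction(power, n_noise_frames=10, alpha=1.0, beta=0.01):
    """Power spectral subtraction with a noise estimate from leading frames.

    The first n_noise_frames frames are assumed to be non-speech, and their
    average power spectrum is used as the noise estimate. alpha is an
    over-subtraction factor and beta a spectral floor (illustrative values).
    """
    noise_estimate = power[:n_noise_frames].mean(axis=0)
    enhanced = power - alpha * noise_estimate
    # Negative differences are floored, so some residual noise and some
    # speech distortion remain: the enhancer is not perfect.
    return np.maximum(enhanced, beta * power)

# Toy signal: 0.5 s of noise only, then a 1 kHz tone in noise (fs = 8 kHz).
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(1)
clean = np.where(t > 0.5, np.sin(2 * np.pi * 1000 * t), 0.0)
noisy = clean + 0.1 * rng.standard_normal(fs)
P = frame_power_spectra(noisy)
E = spectral_subtraction(P)
```

The floor `beta` is what leaves the residual noise mentioned in the slide; setting it to zero removes more noise but produces isolated spectral peaks ("musical noise").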


Training Models in the Same Environment

  • ASR systems that use this technique can deal with low to high SNR (> 0 dB).

  • For example, for an isolated digit recognition task in which the digits are corrupted by helicopter (Lynx) noise, the following performance can be obtained:

  • For TIMIT:

  • Problem:

    • There are many possible environments, so training for each one is not practical.



Changing to Linear Domain Using PMC

Combine the clean speech model and the noise model to obtain a noisy speech model.

  • Introduction

  • Scheme

  • Diagram


Introduction

  • PMC artificially simulates training the system in the adverse environment in which it is going to work.

  • The clean speech CHMM and the noise CHMM (estimated from the noise before the word is uttered) are combined in the linear domain to obtain models adapted to the adverse environment.

  • The combination is based on the assumption that the pdfs of the state distributions are completely defined by their means and variances.


Scheme

  • For simplicity, it is convenient to combine these models in a linear domain.

  • Problem:

    • High-performance speech recognition is obtained in a non-linear domain (e.g. the mel-cepstral domain or auditory-based coefficients).

  • Solution:

    • Transform the coefficients to a linear domain.


Diagram

[Figure: the clean speech HMM and the noise HMM are each mapped to the linear domain with C^-1() followed by exp(); the two are added (+), and the result is mapped back with log() followed by C() to give the PMC HMM.]

Simulates training in noise.
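The PMC chain in the diagram can be sketched numerically with the log-normal approximation, one common way of implementing PMC: each cepstral Gaussian is mapped to the log-spectral domain with C^-1, then to the linear domain with the log-normal moment formulas, the speech and noise parameters are added, and the result is mapped back with log and C. This is a minimal sketch, assuming square (untruncated) orthonormal DCT cepstra so that C^-1 = C^T, full covariance matrices, a single Gaussian per state, and a hypothetical `gain` term; it is not the presentation's own code.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C: cepstrum = C @ log_spectrum and C^-1 = C.T."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * j + 1) / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def cep_to_lin(mu_c, cov_c, C):
    """Cepstral Gaussian -> linear-domain mean and covariance (C^-1, then exp)."""
    mu_l = C.T @ mu_c                                # C^-1(): cepstral -> log-spectral
    cov_l = C.T @ cov_c @ C
    mu = np.exp(mu_l + 0.5 * np.diag(cov_l))          # log-normal mean
    cov = np.outer(mu, mu) * (np.exp(cov_l) - 1.0)    # log-normal covariance
    return mu, cov

def lin_to_cep(mu, cov, C):
    """Linear-domain mean and covariance -> cepstral Gaussian (log, then C)."""
    cov_l = np.log(cov / np.outer(mu, mu) + 1.0)
    mu_l = np.log(mu) - 0.5 * np.diag(cov_l)
    return C @ mu_l, C @ cov_l @ C.T                 # C(): log-spectral -> cepstral

def pmc_combine(mu_s, cov_s, mu_n, cov_n, C, gain=1.0):
    """Combine a clean-speech and a noise Gaussian into a noisy-speech Gaussian."""
    m_s, S_s = cep_to_lin(mu_s, cov_s, C)
    m_n, S_n = cep_to_lin(mu_n, cov_n, C)
    # speech and noise are assumed independent and additive in the linear domain
    return lin_to_cep(gain * m_s + m_n, gain ** 2 * S_s + S_n, C)

# Toy 4-channel example with hypothetical log-spectral parameters.
C = dct_matrix(4)
mu_s = C @ np.array([2.0, 1.5, 1.0, 0.5])
cov_s = C @ np.diag([0.2, 0.2, 0.1, 0.1]) @ C.T
mu_n = C @ np.full(4, 0.5)
cov_n = C @ np.diag(np.full(4, 0.05)) @ C.T
mu_hat, cov_hat = pmc_combine(mu_s, cov_s, mu_n, cov_n, C)
```

In a full system the same combination would be applied to every state (and mixture component) of the clean speech CHMM, paired with the noise model.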


Enhanced Speech Models

  • Introduction

  • Hypothesis Proof

  • Enhanced-Speech Models Combination

    • Changing to linear domain using PMC

    • Diagram

    • Results


Introduction

  • When we train in the same environment, we obtain the following upper-bound values:

  • Since PMC and CDMC (Cepstrum-Domain Model Combination) try to simulate training and recognition in the same environment, these are the best results that can be expected from such techniques.


Introduction

  • How can we improve recognition performance in adverse environments?


  • Fact:

    • The enhancer returns “cleaner” but distorted speech.

  • Therefore the question is:

    • Is it possible to improve recognition performance if the models were trained with this enhanced speech?


Hypothesis

  • Enhanced-speech models improve ASR performance in noisy environments.


In order to prove this hypothesis:

  • A signal enhancement scheme has to be selected.

  • Models have to be trained with the enhanced speech.

  • Observation vectors input to the recogniser have to be processed with the selected enhancement scheme.


Hypothesis Proof

  • Introduction

  • Spectral Subtraction Definition

  • Experiments and Results

  • Conclusions


Introduction

  • Spectral Subtraction (SS) was selected because it is a simple and successful scheme.


Spectral Subtraction Definition

  • Before the filterbank

  • After the filterbank
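The two placements of spectral subtraction can be contrasted in a short sketch. Because a filterbank is a non-negative linear map, subtracting before or after it gives the same result as long as no flooring occurs; the two differ once negative differences are floored. The slide's own definitions are not recoverable from this transcript, so the linearly spaced triangular filterbank, the floor `beta` and all function names here are hypothetical illustrations.

```python
import numpy as np

def triangular_filterbank(n_filters, n_bins):
    """Hypothetical linearly spaced triangular filterbank, shape (n_filters, n_bins)."""
    centers = np.linspace(0, n_bins - 1, n_filters + 2)
    bins = np.arange(n_bins)
    fb = np.zeros((n_filters, n_bins))
    for m in range(1, n_filters + 1):
        left, center, right = centers[m - 1], centers[m], centers[m + 1]
        rising = (bins - left) / (center - left)
        falling = (right - bins) / (right - center)
        fb[m - 1] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def subtract(power, noise, beta=0.01):
    """Subtract a noise power estimate, flooring at beta times the input."""
    return np.maximum(power - noise, beta * power)

def ss_before_filterbank(power, noise_power, fb):
    # subtract in the FFT power-spectrum domain, then apply the filterbank
    return subtract(power, noise_power) @ fb.T

def ss_after_filterbank(power, noise_power, fb):
    # apply the filterbank first, then subtract the filtered noise estimate
    return subtract(power @ fb.T, noise_power @ fb.T)
```

Here `power` has shape (frames, bins) and `noise_power` has shape (bins,); both functions return filterbank energies of shape (frames, filters).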


Experiments and Results

  • CHMMs were trained with speech enhanced by SS.

  • Recognition was performed on speech enhanced by SS under the same conditions.


Example 1

  • Task: isolated digit recognition

  • Vocabulary size: 10

  • Training: using enhanced speech

  • Noise: helicopter (Lynx)

  • Database: Noisex92

  • Real noise is artificially added to clean speech, so that no Lombard effect can bias recognition performance.


  • bPSS

[Figure: results comparing Std. HMM, training models in noise (PMC), and Enhanced-Speech Models.]


Example 2

  • Task: continuous digit recognition

  • Vocabulary size: 30 words

  • Training: using enhanced speech

  • Noise: white

  • White noise is artificially added to clean speech, so that no Lombard effect can bias recognition performance.
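Adding noise artificially, as in these experiments, requires scaling the noise to a chosen SNR. A minimal sketch, assuming the SNR is defined globally over the whole utterance as 10·log10(P_speech / P_noise); the function name, the 440 Hz test tone and the 5 dB target are hypothetical.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise such that the global SNR equals snr_db.

    SNR is taken as 10*log10(mean(clean**2) / mean(scaled_noise**2)),
    computed over the whole signal (an assumption; segmental SNR differs).
    """
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(2)
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
white = rng.standard_normal(fs)  # white Gaussian noise
noisy = add_noise_at_snr(clean, white, snr_db=5.0)
measured = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

Because the noise is added digitally to speech recorded in quiet, the speaker never hears it, which is why no Lombard effect can bias the results.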


Results:

[Figure: results comparing Std. HMM, noisy speech models (PMC), and Enhanced-Speech Models.]


Example 3:

  • Task: continuous speech recognition

  • Vocabulary size: 6233 words

  • Training: using enhanced speech

  • Noise: white

  • Database: TIMIT

  • Real noise is artificially added to clean speech, so that no Lombard effect can bias recognition performance.


Results:

[Figure: results comparing Std. HMM, noisy speech models (PMC), and Enhanced-Speech Models.]


Conclusions

  • The hypothesis was shown to be true.

  • Challenge:

    • Try these experiments using other databases.

    • How can we combine

      • the enhancement scheme,

      • the noise model,

      • and the clean models

    • such that we do not need to train for all enhancement conditions?


Conclusions

  • Are all enhancement schemes suited for combination?


Conclusions

  • Now we know that ASR can be improved by:

    • Enhancing speech before recognition.

    • Training CHMMs in the same environment in which the ASR is going to be used.

    • Training CHMMs with the same enhancement technique that is used to obtain “cleaner” speech at recognition.

  • Advantage:

    • Training with a better enhancement technique potentially means better recognition performance.


ES-SS Model Combination

  • Introduction

  • ES-Spectral Subtraction Scheme


Introduction

  • How can we combine CHMMs without having to train for each enhancement and noise condition?

  • Observation: for CHMMs, the states' pdfs are completely defined by their means and variances.


ES-Spectral Subtraction Scheme

Assuming Y and YD can be modelled as parametric distributions with means E[Y] and E[YD] and variances V[Y] and V[YD], it can be shown that these parameters are distorted as follows:

[Figure: pdf of Y]


Proof:

where

Re-arranging

Hence:


A(a,P(Y))

Assuming that Y is lognormal:

Making
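The slide's own equations for A(a, P(Y)) are not recoverable from this transcript. As an illustration of the kind of result this derivation aims at, the sketch below computes the mean and variance of spectrally subtracted energy in closed form, assuming (hypothetically) the half-wave rectified rule YD = max(Y - a, 0) with a fixed subtraction a > 0 and ln Y ~ N(m, s²). It relies on the standard lognormal partial moments E[Y^k 1{Y > a}] = exp(km + k²s²/2) Φ((m + ks² - ln a)/s); it is not the author's derivation, and the function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def partial_moment(k, m, s, a):
    """E[Y**k * 1{Y > a}] for ln Y ~ N(m, s**2) and a > 0."""
    return math.exp(k * m + 0.5 * k * k * s * s) * norm_cdf((m + k * s * s - math.log(a)) / s)

def subtracted_lognormal_moments(m, s, a):
    """Mean and variance of YD = max(Y - a, 0) with Y lognormal (assumed SS rule)."""
    p = partial_moment(0, m, s, a)   # P(Y > a)
    e1 = partial_moment(1, m, s, a)  # E[Y 1{Y > a}]
    e2 = partial_moment(2, m, s, a)  # E[Y^2 1{Y > a}]
    mean = e1 - a * p
    second = e2 - 2.0 * a * e1 + a * a * p  # E[(Y - a)^2 1{Y > a}]
    return mean, second - mean * mean
```

Mapping E[YD] and V[YD] back through the lognormal relations would give the distorted model parameters, in the same spirit as the PMC linear-domain step.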


ES-PMC Diagram

[Figure: the clean speech HMM and the noise HMM are each mapped from the cepstral to the log domain (C->log) and then through exp(); they are combined as in PMC (+), the adaptation calculations are added (+), and the result is mapped back through log() and C() to give the ES-PMC HMM.]

Speech is pre-processed using SS.


Results

[Figure: recognition results for no compensation scheme, spectral subtraction, PMC, and spectral subtraction combined with parallel model combination.]


Results

[Figure: recognition results for no compensation scheme, spectral subtraction, PMC, and spectral subtraction combined with parallel model combination.]


Results

[Figure: recognition results for no compensation scheme, spectral subtraction, PMC, and spectral subtraction combined with parallel model combination.]


Results

[Figure: recognition results for no compensation scheme, spectral subtraction, PMC, and spectral subtraction combined with parallel model combination.]


Comments and Conclusions

  • Since training and recognition with the same speech enhancement scheme had not been tried before, a new area of research has been opened.

    • How can we combine CHMMs such that we do not need to train for all enhancement conditions?

    • Are all enhancement techniques suited for CHMM combination?

  • We showed how to combine enhanced-speech, noise, and clean CHMMs for the SS scheme.

  • It was shown that the equations for ES-PMC-SS were straightforward.


