AMSP : Advanced Methods for Speech Processing - PowerPoint PPT Presentation

AMSP : Advanced Methods for Speech Processing

An expression of Interest to set up a Network of Excellence in FP6

Prepared by members of COST-277

and colleagues

Submitted by Marcos FAUNDEZ-ZANUY

Presented here by Gérard ([email protected]), GET-ENST/CNRS-LTCI


  • Rationale of the proposition

  • Objectives

  • Approaches

  • Modeling

  • Recognition by synthesis

  • Robustness to environmental conditions

  • Evaluation paradigm

  • Excellence

  • Integration and structuring effect

Rationale for the NoE-AMSP

  • The areas of Automatic Speech Processing (recognition, synthesis, coding, language identification, speaker verification) should be better integrated

  • Better models of Speech Production and Perception

  • Investigate Nonlinear Speech Processing

  • Understanding, Semantic interpretation

Integrated platform for Automatic Speech Processing

Features of Speech Models

  • Reflect auditory properties of human perception

  • Explain articulatory movements

  • Surpass the limitations of the source-filter model

  • Capture the dynamics of speech

  • Allow natural-quality speech reconstruction

  • Be discriminative for segmental information

  • Robust to noise and channel distortions

  • Adaptable to new speakers and new environments

Time-Frequency Distributions

  • Short Time Fourier Transform

  • Non-linear frequency scale (PLP, WLP), mel-cepstrum

  • Wavelets, FAMlets

  • Bilinear distributions (Wigner-Ville, Choi-Williams,...)

  • Instantaneous frequency, Teager operator

  • Time-dependent representations (parametric and non-parametric)

  • Vector quantisation

  • Matrix quantisation, non-linear prediction
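As a concrete illustration of one of the nonlinear representations listed above, here is a minimal sketch of the discrete Teager energy operator; the sinusoid, its amplitude and its frequency are arbitrary illustration values, not from the presentation:

```python
import math

def teager_energy(x):
    """Discrete Teager-Kaiser energy operator:
    psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# For a pure sinusoid A*cos(omega*n) the operator returns the constant
# A^2 * sin(omega)^2, so it tracks amplitude and frequency jointly.
A, omega = 2.0, 0.3
x = [A * math.cos(omega * n) for n in range(100)]
psi = teager_energy(x)
print(psi[0])  # ~= A^2 * sin(omega)^2
```

Unlike the squared amplitude alone, this "energy" rises with frequency as well, which is one reason it is listed here among the nonlinear alternatives to purely spectral features.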

Time-dependent Spectral Models

  • Temporal Decomposition (B. Atal, 1983)

  • Vectorial Autoregressive models with detection of model ruptures (A. DeLima, Y. Grenier)

  • Segmental parameterisation using a time-dependent polynomial expansion (Y. Grenier)
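To give a flavour of the idea behind temporal decomposition: frames of spectral parameters are approximated as interpolations between a small number of target vectors, weighted by overlapping event functions. The two-target setup and ramp-shaped event functions below are illustrative simplifications, not Atal's actual algorithm:

```python
def reconstruct(targets, events):
    """Approximate each frame as a weighted sum of target vectors:
    y[n] = sum_k phi_k[n] * a_k, where phi_k are event functions."""
    n_frames = len(events[0])
    dim = len(targets[0])
    out = []
    for n in range(n_frames):
        frame = [sum(events[k][n] * targets[k][d] for k in range(len(targets)))
                 for d in range(dim)]
        out.append(frame)
    return out

# Two 2-dimensional targets and complementary ramp-shaped event functions
targets = [[1.0, 0.0], [0.0, 1.0]]
phi1 = [1.0, 0.75, 0.5, 0.25, 0.0]
phi2 = [1.0 - w for w in phi1]
frames = reconstruct(targets, [phi1, phi2])
print(frames[2])  # midway between the two targets: [0.5, 0.5]
```

The real method also has to discover the targets and event functions from the data; only the reconstruction step is sketched here.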

Modeling of segmental units

  • Hidden Markov Model

  • Markov Fields

  • Bayesian Networks, Graphical Models


  • Production models

  • Synthesis (concatenative or rule based)

    with voice transformation

    AND / OR

  • Non linear predictor
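To make the first item above concrete, here is a minimal Viterbi decoder for a discrete-observation HMM; the two-state model and all its probabilities are invented toy values, not taken from the presentation:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for obs."""
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            best_prev = max(states, key=lambda p: V[t - 1][p][0] * trans_p[p][s])
            V[t][s] = (V[t - 1][best_prev][0] * trans_p[best_prev][s] * emit_p[s][obs[t]],
                       best_prev)
    # Backtrack from the most probable final state
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

# Toy model: two phone-like states emitting quantised acoustic symbols
states = ["sil", "voiced"]
start_p = {"sil": 0.8, "voiced": 0.2}
trans_p = {"sil": {"sil": 0.7, "voiced": 0.3},
           "voiced": {"sil": 0.2, "voiced": 0.8}}
emit_p = {"sil": {"low": 0.9, "high": 0.1},
          "voiced": {"low": 0.2, "high": 0.8}}
print(viterbi(["low", "low", "high", "high"], states, start_p, trans_p, emit_p))
# ['sil', 'sil', 'voiced', 'voiced']
```

Markov random fields, Bayesian networks and graphical models generalise this chain-structured dependency to richer graph structures, which is what the later slides discuss.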

Expected achievements in Speech Coding and Synthesis

  • Modeling the non-linearities in Speech Production and Perception will lead to more accurate and/or compact parametric representations.

  • Integrate segmental recognition and synthesis techniques in the coding loop to achieve bit rates as low as a few hundred bits per second with natural quality

  • Develop voice transformation techniques in order to:

    • Adapt segmental coders to new speakers,

    • Modify the characteristics of synthetic voices

Expected achievements in Speech Synthesis

  • Self-excited nonlinear feedback oscillators will make it possible to better match synthetic and human voices.

  • Current concatenative techniques should be supplemented (or replaced) by (nonlinear) model based generative techniques to improve quality, naturalness, flexibility, training and adaptation.

  • Model-based voice mimicry controlled by textual, phonetic and/or parametric input should not only improve synthesis but also coding, recognition and speaker characterisation.

Automatic Speech Recognition

  • Limitations of the HMM and hybrid HMM-ANN approaches

  • Keyword spotting (detection with SVM), noise robustness, adaptation

  • Large Vocabulary Speech Recognition (SIROCCO)

  • Markov Random Fields, Bayesian Networks and Graphical Models

Markov Random Fields, Bayesian Networks and Graphical Models

  • Speech modelling with state-constrained Markov Random Fields over frequency bands (Guillaume Gravier and Marc Sigelle)

  • Comparative framework to study MRFs, Bayesian Networks and Graphical Models


Recognition by Synthesis

  • If we could drive a synthesizer with meaningful units (phone sequences, words, ...) to produce a speech signal that mimics the one to be recognized, we would come close to transcription.

  • Analysis by Synthesis (which is in fact modeling) is a powerful tool in recognition and coding.

  • A trivial implementation is indexing a labelled speech memory
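A sketch of such a labelled speech memory: store (feature sequence, label) pairs, and transcribe new input by dynamic time warping to the nearest entry. The 1-D features and the two-entry memory are toy stand-ins for real acoustic vectors:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    INF = float("inf")
    D = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[len(a)][len(b)]

def recognise(query, memory):
    """Return the label of the memory entry closest to the query."""
    return min(memory, key=lambda entry: dtw_distance(query, entry[0]))[1]

memory = [([0.1, 0.9, 0.8, 0.1], "yes"),
          ([0.1, 0.2, 0.3, 0.9], "no")]
print(recognise([0.1, 0.85, 0.8, 0.75, 0.1], memory))  # "yes"
```

The warping absorbs the differing durations of query and template, which is the property that made template indexing a workable baseline for recognition.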

ALISP

Automatic discovery of segmental units for speech coding, synthesis, recognition, language identification and speaker verification.
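The core of such data-driven unit discovery can be sketched as clustering acoustic frames and labelling each frame with its nearest centroid. ALISP itself combines temporal decomposition, vector quantisation and HMMs; the toy 1-D k-means below illustrates only the quantisation step, on invented data:

```python
def kmeans_1d(data, k, iters=20):
    """Cluster scalar 'frames' into k centroids (toy vector quantisation)."""
    centroids = sorted(data)[:: max(1, len(data) // k)][:k]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for x in data:
            buckets[min(range(k), key=lambda c: abs(x - centroids[c]))].append(x)
        centroids = [sum(b) / len(b) if b else centroids[i]
                     for i, b in enumerate(buckets)]
    return centroids

def label(x, centroids):
    """Symbolic 'unit' label: index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda c: abs(x - centroids[c]))

frames = [0.1, 0.12, 0.09, 0.95, 1.0, 0.9, 0.11, 0.98]
cents = kmeans_1d(frames, 2)
units = [label(x, cents) for x in frames]
print(units)
```

The resulting symbol stream is what downstream coding, synthesis or verification models would then operate on, with no hand-labelled phonetic transcription required.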

The robustness issue

  • Mismatch between training and testing conditions

  • High Order Statistics are less sensitive to environment and transmission noise than autocorrelation

  • CMS, RASTA filtering

  • Independent Component Analysis

  • From Speaker Independent to Speaker Dependent recognition (Personalisation)
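The simplest of the compensation techniques listed above, cepstral mean subtraction (CMS), removes a constant channel offset by subtracting the per-coefficient mean over the utterance. A minimal sketch on invented cepstral frames:

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-dimension mean over the utterance. A time-invariant
    (convolutive) channel adds a constant to each cepstral coefficient,
    so subtracting the mean removes it."""
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]

clean = [[1.0, -0.5], [2.0, 0.5], [0.0, 0.0]]
channel = [0.25, -0.5]  # constant offset added by the channel
noisy = [[c + o for c, o in zip(f, channel)] for f in clean]
# After CMS, clean and channel-distorted frames give identical features
print(cepstral_mean_subtraction(noisy) == cepstral_mean_subtraction(clean))  # True
```

This is exactly the training/testing mismatch scenario of the first bullet: the same filtering makes features recorded over different channels comparable.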

Expected achievements in Automatic Speech Recognition

  • Dynamic nonlinear models should make it possible to merge feature extraction and classification under a common paradigm

  • Such models should be more robust to noise, channel distortions and missing data (transmission errors and packet losses)

  • Indexing a speech memory may help in the verification of hypotheses (a technique shared with Very Low Bit Rate Coders)

  • Statistical language models should be supplemented with adapted semantic information (conceptual graphs)

Voice technology in Majordome

  • Server-side background tasks:

    • Continuous speech recognition applied to voice messages upon reception

    • Detection of sender’s name and subject

  • User interaction:

    • Speaker identification and verification

    • Speech recognition (receiving user commands through voice interaction)

    • Text-to-speech synthesis (reading text summaries, E-mails or faxes)

Collaboration with COST-278

  • COST-278 (Vocal Dialogue) is a continuation of COST-249

  • High interest in Robust Speech Recognition, Word spotting, Speech to actions, Speaker adaptation,...

  • Some members contribute to the Eureka-MAJORDOME project

  • Could be the seed for a Network of Excellence in FP6

Evaluation paradigm


  • NIST


      Could we organize evaluation campaigns in Europe?

      The 6th Framework Programme (FP6) of the EU is trying to promote Networks of Excellence.

      How should excellence be evaluated?

      Should financial support be correlated with evaluation results?