Survey ICASSP 2007 Discriminative Training

Reporter: Shih-Hung Liu 2007/04/30


References

  • Large-Margin Minimum Classification Error Training for Large-Scale Speech Recognition Tasks

    • Dong Yu, Li Deng, Xiaodong He, Alex Acero, Microsoft

  • Approximate Test Risk Minimization Through Soft Margin Estimation

    • Jinyu Li, Sabato Marco Siniscalchi, Chin-Hui Lee, Georgia Institute of Technology

  • Unsupervised Training for Mandarin Broadcast News and Conversation Transcription

    • L. Wang, M.J.F. Gales, P.C. Woodland, Cambridge

  • A New Minimum Divergence Approach to Discriminative Training

    • J. Du, P. Liu, H. Jiang, F.K. Soong, R.H. Wang, Microsoft Research Asia


LM-MCE

  • The basic idea of LM-MCE is to include a margin term in the optimization criterion alongside the smoothed empirical error rate, so that correctly classified samples are pushed well away from the decision boundary.

  • To incorporate the margin successfully, the authors propose increasing the discriminative margin gradually over the training iterations (sketched below).
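
A minimal sketch of how the margin enters the criterion, reconstructed in standard MCE notation (the exact symbols on the slides are not preserved): the sigmoid loss on the misclassification measure d_i is shifted by a margin m(t) that grows over the training iterations,

    \ell_i\big(d_i(X_i;\Lambda)\big) = \frac{1}{1 + \exp\!\big(-\gamma\,(d_i(X_i;\Lambda) + m(t))\big)},
    \qquad m(t) \le m(t+1),

where \gamma controls the smoothness of the sigmoid and \Lambda denotes the HMM parameters.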


LM-MCE

  • The smoothed empirical error rate is derived using a Parzen window estimate (see the sketch below).
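
A hedged reconstruction of the Parzen-window view (a standard derivation; notation assumed): the density of the misclassification measure d is estimated from the N training tokens with a kernel k of bandwidth H, and the error rate is the probability mass on the misclassified side,

    \hat{p}(d) = \frac{1}{NH} \sum_{i=1}^{N} k\!\left(\frac{d - d_i}{H}\right),
    \qquad
    \hat{R} = \int_{0}^{\infty} \hat{p}(d)\,\mathrm{d}d .

Choosing k to be the derivative of the sigmoid makes \hat{R} coincide exactly with the smoothed MCE empirical error rate.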


LM-MCE

  • Define a symmetric kernel function
  • Margin-free Bayes risk
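
Sketching these two steps under the same assumed notation: with a symmetric kernel, k(u) = k(-u), the Parzen estimate above reduces to the familiar average of sigmoid losses, which is margin-free; integrating from -m instead of 0 yields the margin-sensitive risk used by LM-MCE,

    \hat{R}_{\text{margin-free}} = \frac{1}{N}\sum_{i=1}^{N} \sigma\!\left(\frac{d_i}{H}\right),
    \qquad
    \hat{R}_{\text{margin}}(m) = \frac{1}{N}\sum_{i=1}^{N} \sigma\!\left(\frac{d_i + m}{H}\right),

which is consistent with the margin-shifted loss given earlier.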




Experiments


Soft Margin Estimation

  • The test risk bound is expressed as the sum of an empirical risk and a function of the VC dimension

  • Approximate test risk minimization

  • Define the loss function (see the sketch after this list)
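
These pieces fit together as follows (a sketch in the standard SME formulation; the exact slide notation is assumed). The VC-theory bound motivates trading empirical risk against a margin-dependent generalization term; SME approximates test risk minimization by maximizing a soft margin \rho while penalizing tokens whose separation measure d(x_i) falls inside it:

    R(f) \;\le\; R_{\text{emp}}(f) + \Phi\!\left(\tfrac{h}{N}\right)
    \quad (h:\ \text{VC dimension},\ N:\ \text{number of training tokens})

    \min_{\Lambda}\ L_{\text{SME}}(\rho,\Lambda)
    = \frac{\lambda}{\rho} + \frac{1}{N}\sum_{i=1}^{N} \ell(x_i),
    \qquad
    \ell(x_i) = \big(\rho - d(x_i)\big)\,\mathbb{1}\big(d(x_i) \le \rho\big),

where d(x_i) is a frame-normalized log-likelihood ratio between the correct and competing models, and \lambda balances margin size against empirical risk.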




Soft Margin Estimation on LVCSR


Experiments


Unsupervised Training

  • Segmentation:

    • First, advert removal is run: the arithmetic harmonic sphericity distance is used to detect repeated blocks of audio data, such as jingles or commercials (see the formula after this list).

    • Acoustic segmentation is performed. The data is then split into wide-band and narrow-band speech.

    • Sections of music are discarded.

    • Finally, gender detection and speaker clustering are run.
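
For reference, the arithmetic harmonic sphericity distance between two segments modeled as Gaussians with covariances \Sigma_X and \Sigma_Y in D dimensions is the log ratio of the arithmetic to the harmonic mean of the eigenvalues of \Sigma_X\Sigma_Y^{-1} (a standard definition, reconstructed here rather than copied from the slides):

    d_{\text{AHS}}(X,Y) = \log\frac{\operatorname{tr}\!\big(\Sigma_X \Sigma_Y^{-1}\big)\,\operatorname{tr}\!\big(\Sigma_Y \Sigma_X^{-1}\big)}{D^2},

which is zero for identical covariances and grows with dissimilarity, making it suitable for spotting repeated audio such as jingles.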


Unsupervised Training

  • Transcription generation:

    • Initial transcriptions are generated using good acoustic models (MPE-trained in this work)

    • P1: gender-independent models are used to generate initial transcriptions using a trigram language model and relatively tight beamwidths.

    • P2: the 1-best hypothesis from the P1 stage is used to estimate adaptation transforms; here least-squares linear regression and diagonal variance transforms are estimated. Using the adapted models, lattices are generated with a trigram language model and then rescored with a 4-gram language model (see the sketch below).
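
A minimal Python sketch of this two-pass flow; every helper here (decode, estimate_transforms, decode_lattice, rescore, ...) is a hypothetical stand-in, not the actual toolkit API used in the paper:

    # Hypothetical sketch of the P1/P2 unsupervised transcription passes.
    # All helper functions are illustrative stand-ins, not a real API.
    def transcribe(segments, gi_models, trigram_lm, fourgram_lm):
        transcripts = []
        for seg in segments:
            # P1: gender-independent models, trigram LM, tight beamwidths.
            p1_hyp = decode(seg, gi_models, trigram_lm, beam="tight")

            # P2: estimate adaptation transforms on the P1 1-best hypothesis
            # (least-squares linear regression + diagonal variance transforms).
            xforms = estimate_transforms(seg, p1_hyp,
                                         methods=("lslr", "diag_var"))
            adapted = apply_transforms(gi_models, xforms)

            # Generate lattices with the trigram LM, rescore with the 4-gram.
            lattice = decode_lattice(seg, adapted, trigram_lm)
            transcripts.append(best_path(rescore(lattice, fourgram_lm)))
        return transcripts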


Experiments


A New Minimum Divergence Approach

  • MD possesses the following advantages:

    • 1. It offers higher resolution than any label-comparison-based error definition.

    • 2. It is a general solution that can handle any kind of model or phone set.

    • As a result, MD outperforms other DT criteria on several tasks.

  • Notably, the accuracy term in MD is itself a function of the model parameters, so it can also be taken into account during optimization (see the sketch below).
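
A sketch of the MD objective, reconstructed from the paper's description (notation assumed): the expected accuracy over competing hypotheses W is maximized, with accuracy defined as a negative acoustic divergence from the reference W_r rather than by label comparison,

    F_{\text{MD}}(\Lambda) = \sum_{r} \sum_{W} P(W \mid O_r;\Lambda)\, A(W, W_r),
    \qquad
    A(W, W_r) = -\,D\big(W_r \,\|\, W\big),

where D is a KL-type divergence between the HMM sequences of the reference and the hypothesis. Since A depends on \Lambda through the divergence, both factors are functions of the model parameters, which motivates the joint optimization discussed next.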


A New Minimum Divergence Approach

  • MD criterion

  • Joint optimization

  • It satisfies the conditions of a weak-sense auxiliary function (stated below)
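
For completeness, the weak-sense auxiliary function condition (standard from the MPE literature) only requires the auxiliary function G to match the first derivative of the objective F at the current parameter point:

    \left.\frac{\partial G(\Lambda,\Lambda')}{\partial \Lambda}\right|_{\Lambda=\Lambda'}
    = \left.\frac{\partial F(\Lambda)}{\partial \Lambda}\right|_{\Lambda=\Lambda'} .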




A New Minimum Divergence Approach

  • With the state-frame-independence assumption (see below)
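
Under this assumption the sequence-level divergence decomposes frame by frame into state-level KL divergences; for single-Gaussian state distributions each term has the usual closed form (the decomposition is a reconstruction of the paper's step):

    D(W_r \,\|\, W) \approx \sum_{t} D\big(p(\cdot \mid s_{r,t}) \,\|\, p(\cdot \mid s_t)\big),

    D\big(\mathcal{N}(\mu_1,\Sigma_1)\,\|\,\mathcal{N}(\mu_2,\Sigma_2)\big)
    = \tfrac{1}{2}\Big[\log\tfrac{|\Sigma_2|}{|\Sigma_1|}
    + \operatorname{tr}\big(\Sigma_2^{-1}\Sigma_1\big)
    + (\mu_2-\mu_1)^{\top}\Sigma_2^{-1}(\mu_2-\mu_1) - D\Big],

with D the feature dimension; in practice GMM state distributions require an approximation.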


A New Minimum Divergence Approach

  • Statistics for EBW
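
The parameter update then follows the standard extended Baum-Welch form; in MD the numerator and denominator occupancies are weighted by the divergence-based accuracy rather than by a label-based one (a sketch; D_{jm} is the per-Gaussian smoothing constant):

    \hat{\mu}_{jm} = \frac{\theta_{jm}^{\text{num}}(O) - \theta_{jm}^{\text{den}}(O) + D_{jm}\,\mu_{jm}}
    {\gamma_{jm}^{\text{num}} - \gamma_{jm}^{\text{den}} + D_{jm}},

where \gamma and \theta(O) are the accumulated zeroth- and first-order statistics for Gaussian m of state j.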




Experiments

