Less is More?

Yi Wu

Advisor: Alex Rudnicky



There is no data like more data!

Goal: Use less to Perform more

  • Identifying an informative subset from a large corpus for Acoustic Model (AM) training.

  • Expectations for the selected set:

    • Good performance

    • Fast selection



  • Improvements to the system become smaller and smaller as we keep adding data.

  • Training an acoustic model is time-consuming.

  • We need guidance on which data is most needed.

Approach Overview

  • Applied to well-transcribed data

  • Selection is based on the transcriptions

  • Choose a subset that has a “uniform” distribution over speech units (words, phonemes, characters)

How to sample data wisely? A simple example

$k$ Gaussian distributions with known priors $\omega_i$ and unknown density functions $f_i(x; \mu_i, \sigma_i)$

How to sample wisely? A simplified example

  • We are given access to at most N examples.

  • We have the right to choose how many examples to take from each class.

  • We train the model using the maximum likelihood estimator (MLE).

  • When a new sample is generated, we use our model to determine its class.


    How should we sample to achieve minimum error?
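The toy setup above can be simulated directly. The sketch below is a minimal illustration assuming one-dimensional Gaussians and Python's standard library; the function names, class parameters, and the even 10/10/10 split of the budget are hypothetical choices, not values from the talk.

```python
import math
import random


def sample_classes(means, stds, counts, seed=0):
    """Draw counts[i] samples from the 1-D Gaussian N(means[i], stds[i]**2)."""
    rng = random.Random(seed)
    return [[rng.gauss(m, s) for _ in range(n)]
            for m, s, n in zip(means, stds, counts)]


def mle_gaussian(xs):
    """Maximum likelihood estimates (mean, std) for a 1-D Gaussian.

    The MLE variance divides by n, not n - 1.
    """
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, math.sqrt(var)


# Hypothetical setup: k = 3 classes, budget N = 30 split evenly.
means, stds = [0.0, 2.0, 5.0], [1.0, 1.0, 1.5]
samples = sample_classes(means, stds, [10, 10, 10])
estimates = [mle_gaussian(xs) for xs in samples]
```

Changing the `counts` argument is how a different allocation of the $N$ examples across classes would be tried.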

The optimal Bayes Classifier

The optimal Bayes classifier assigns $x$ to the class $\arg\max_i \omega_i f_i(x)$. If we know the exact form of $f_i(x)$, this classification is optimal.
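A minimal sketch of this decision rule for the one-dimensional Gaussian case. The rule is the standard $\arg\max_i \omega_i f_i(x)$ form; the function names are hypothetical and this is not code from the talk.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma**2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bayes_classify(x, priors, params):
    """Return the index i that maximizes priors[i] * f_i(x).

    params[i] is a (mu_i, sigma_i) pair for class i.
    """
    scores = [w * gaussian_pdf(x, mu, s) for w, (mu, s) in zip(priors, params)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Passing MLE estimates instead of the true $(\mu_i, \sigma_i)$ turns the same function into a plug-in approximation of the optimal classifier.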

To approximate the optimal classifier

  • We plug our MLE estimates of $\mu_i$ and $\sigma_i$ into $f_i(x)$

  • The true error is bounded by the optimal Bayes error plus the error bound for our worst-estimated class
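The gap between a plug-in classifier and the optimal Bayes classifier can be checked empirically. The Monte Carlo sketch below is a hypothetical illustration (two classes, equal priors, parameters chosen for the example); a deliberately mis-estimated mean stands in for an MLE fit from too few samples.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def classify(x, priors, params):
    # Pick the class maximizing prior * density under the given parameters.
    scores = [w * gaussian_pdf(x, mu, s) for w, (mu, s) in zip(priors, params)]
    return max(range(len(scores)), key=scores.__getitem__)


def error_rate(priors, true_params, model_params, trials=20000, seed=1):
    """Monte Carlo estimate of the misclassification rate of a model.

    Samples come from the true mixture; labels are predicted with
    model_params (true parameters give the Bayes error, estimates give
    the plug-in error).
    """
    rng = random.Random(seed)
    classes = range(len(priors))
    errors = 0
    for _ in range(trials):
        c = rng.choices(classes, weights=priors)[0]
        mu, sigma = true_params[c]
        if classify(rng.gauss(mu, sigma), priors, model_params) != c:
            errors += 1
    return errors / trials


priors = [0.5, 0.5]
true_params = [(0.0, 1.0), (2.0, 1.0)]
bayes = error_rate(priors, true_params, true_params)
plug_in = error_rate(priors, true_params, [(0.0, 1.0), (4.0, 1.0)])
```

For these parameters the theoretical errors are about 0.16 (decision boundary at 1) and about 0.26 (boundary shifted to 2), so the simulated values should land near those.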

Sample Uniformly

  • We want to sample each class equally.

    • The selected data will have good coverage of each class.

    • This gives a robust estimate for each class.
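One way to see why equal sampling helps, under the simplifying assumption that all classes share the same $\sigma$: the standard error of the MLE mean for class $i$ is $\sigma/\sqrt{n_i}$, so the worst-estimated class is the one with the fewest samples, and splitting the budget $N$ evenly maximizes that minimum count. The helper names below are hypothetical.

```python
import math


def uniform_allocation(n_total, k):
    """Split a budget of n_total examples as evenly as possible over k classes."""
    base, extra = divmod(n_total, k)
    return [base + 1 if i < extra else base for i in range(k)]


def worst_std_error(counts, sigma=1.0):
    """Largest standard error sigma / sqrt(n_i) of the per-class MLE mean."""
    return max(sigma / math.sqrt(n) for n in counts)
```

For $N = 10$ and two classes, an even 5/5 split gives a worst-case standard error of $1/\sqrt{5} \approx 0.45$, while a 9/1 split gives $1.0$.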

The Real ASR system

Data Selection for ASR System

  • The prior is estimated independently by the language model.

  • To make the acoustic model accurate, we want to sample $W$ uniformly.

  • The unit can be a phoneme, a character, or a word; we want the distribution of that unit to be uniform.
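Before the distribution can be made uniform, it has to be measured on the candidate transcriptions. A minimal sketch, assuming whitespace-separated words and treating every non-space character as a character unit (natural for Chinese text); phoneme units would additionally need a pronunciation lexicon, which is not shown. The function name is hypothetical.

```python
from collections import Counter


def unit_distribution(transcripts, unit="word"):
    """Relative frequency of speech units in a list of transcripts.

    unit="word" splits on whitespace; unit="char" counts every
    non-space character.
    """
    counts = Counter()
    for text in transcripts:
        if unit == "word":
            counts.update(text.split())
        elif unit == "char":
            counts.update(ch for ch in text if not ch.isspace())
        else:
            raise ValueError(f"unsupported unit: {unit}")
    total = sum(counts.values())
    return {u: c / total for u, c in counts.items()}
```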

Entropy: Measure for “uniformness”

  • Use the entropy of the word (phoneme) distribution as the evaluation measure

    • Suppose the words (phonemes) have sample distribution $p_1, p_2, \dots, p_n$

    • Choose the subset with maximum entropy $H(p) = -\sum_{i=1}^{n} p_i \log p_i$

  • Maximizing entropy is equivalent to minimizing the KL divergence from the uniform distribution $u$, since $D_{\mathrm{KL}}(p \,\|\, u) = \log n - H(p)$
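The entropy and its relation to the KL divergence from the uniform distribution can be checked numerically. A minimal sketch using natural logarithms and the convention $0 \log 0 = 0$; the helper names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy H(p) = -sum p_i log p_i (natural log); 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def kl_from_uniform(probs):
    """KL divergence D(p || u), where u is uniform over len(probs) outcomes."""
    n = len(probs)
    return sum(p * math.log(p * n) for p in probs if p > 0)
```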

Computational Issue

  • It is computationally intractable to find the set of transcriptions that maximizes the entropy

  • Forward Greedy Search
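A minimal sketch of forward greedy search over word entropy: at each step, add the utterance whose transcription yields the highest entropy for the selected set. The budget here counts utterances for simplicity; a real selection would presumably budget by hours of audio. Names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy_of_counts(counts):
    """Entropy (natural log) of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values() if c > 0)


def greedy_select(utterances, budget):
    """Forward greedy selection maximizing the entropy of word counts.

    utterances: list of transcripts (str); budget: max utterances to pick.
    Returns the indices of the selected utterances in selection order.
    """
    selected, counts = [], Counter()
    remaining = list(range(len(utterances)))
    while remaining and len(selected) < budget:
        # Entropy of the selected set if utterance i were added next.
        best = max(remaining,
                   key=lambda i: entropy_of_counts(counts + Counter(utterances[i].split())))
        selected.append(best)
        counts.update(utterances[best].split())
        remaining.remove(best)
    return selected


utts = ["a a a a", "b c", "a b", "d e"]
picked = greedy_select(utts, 2)
```

On this toy input the first pick is `"b c"` (ties are broken by list order) and the second is `"d e"`, because two unseen words raise the entropy more than repeating `a` or `b`.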



  • There are multiple entropies we want to maximize (e.g., for words, phonemes, and characters).

  • Combination methods

    • Weighted Sum

    • Add sequentially
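A minimal sketch of the weighted-sum option: the same forward greedy search, but scoring each candidate by $\sum_u \lambda_u H_u$ over several unit types. It covers only the weighted sum; the unit extractors, weights, and names are hypothetical.

```python
import math
from collections import Counter


def entropy_of_counts(counts):
    """Entropy (natural log) of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values() if c > 0)


# Hypothetical unit extractors; a phoneme extractor would need a lexicon.
EXTRACTORS = {
    "word": lambda text: text.split(),
    "char": lambda text: [ch for ch in text if not ch.isspace()],
}


def greedy_weighted(utterances, budget, weights):
    """Forward greedy selection maximizing sum_u weights[u] * H_u.

    weights maps a unit name in EXTRACTORS to its weight in the sum.
    """
    counts = {u: Counter() for u in weights}
    selected, remaining = [], list(range(len(utterances)))

    def score(i):
        # Weighted sum of per-unit entropies if utterance i were added.
        return sum(w * entropy_of_counts(counts[u] + Counter(EXTRACTORS[u](utterances[i])))
                   for u, w in weights.items())

    while remaining and len(selected) < budget:
        best = max(remaining, key=score)
        for u in weights:
            counts[u].update(EXTRACTORS[u](utterances[best]))
        selected.append(best)
        remaining.remove(best)
    return selected


utts = ["a a a a", "b c", "a b", "d e"]
word_only = greedy_weighted(utts, 2, {"word": 1.0, "char": 0.0})
mixed = greedy_weighted(utts, 2, {"word": 0.5, "char": 0.5})
```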

Experiment Setup

  • System: Sphinx III

  • Features: 39-dimensional MFCC

  • Training corpus: Chinese BN 97 (30 hr) + GaleY1 (810 hr)

  • Test set: RT04 (60 min)

Experiment 1 (use word distribution)

Table 1

More Results

Experiment 2 (add sequentially with phoneme and character, 150 hr)

Table 2

Experiments 1 and 2

Experiment 3 (with VTLN)

Table 3



  • Choose data uniformly with respect to speech units

  • Maximize entropy using a greedy algorithm

  • Add data sequentially

Future Work

  • Combine Multiple Sources

  • Select Untranscribed Data
