Arthur Chan, Ravishankar Mosur, Alexander Rudnicky

On Improvement of CI-based GMM Selection in Sphinx 3

Computer Science Department, Carnegie Mellon University

• CMU Sphinx is an open source speech recognition system.
• Recent development (Sphinx 3.6) has focused on building a real-time continuous HMM system and on speaker adaptation.
• In this work, we describe improvements to the GMM computation that reduce computation in the Viterbi search by 10%-30% on different tasks.
• The algorithms are freely available at www.cmusphinx.org.

Context-Independent Senone-Based GMM Selection (CIGMMS)

Summary: Technique for Gaussian computation speed-up (Lee 2001, Chan 2004)

Idea: Use the context-independent (CI) senone score as an approximate score for context-dependent (CD) senones.

Procedure:
1. Compute all CI scores and form a beam (the CI beam) from the highest score.
2. For each CD senone:
   a. If its base CI score is within the beam -> compute the detailed CD score.
   b. Else -> back off to the CI score.

Issues of the Basic CIGMMS

Issue 1: Unpredictable per-frame performance: because selection is a beam search, the number of CD scores computed varies a great deal from frame to frame.
Issue 2: Poor pruning characteristics: a large number of CD scores fall back to the same CI score, so pruning is less effective.

Three Enhancements

1. Bound the number of CD GMMs to be computed.
2. When the best Gaussian index (BGI) of the previous frame is available and the CD senone is out of the beam -> compute the CD GMM score from the previous BGI.
   Motivation: The score of the current best Gaussian is a good approximation of the GMM score, and the previous BGI is a good approximation of the current BGI.
3. Use a tightened CI beam every N frames.
   Motivation: This is similar to dropping senone computation every N frames and using the previous frame's scores (Chan 2004), which significantly reduced computation but hurt accuracy. Narrowing the CI beam every N frames preserves the very best scoring senones in the current frame and improves accuracy. Using a tightening factor provides more flexible control.

Experimental Results

Assumption in Enhancement 2: BGIs in adjacent frames are usually the same. But how often? (It depends on the GMM size.)

Table: Percentages of adjacent BGIs that are the same.

Conclusions: Adjacent BGIs are quite consistent, even in noisy tasks, but they are less consistent for the top-scoring senones (not shown in the table; this leads to Enhancement 3).

Table: Word error rates and execution times.

Summary: Cumulative speedup of up to 37% with only a slight increase in WER.
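The basic CIGMMS procedure and the three enhancements described in this poster can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Sphinx 3 implementation: the function and parameter names (`cigmms_frame`, `max_cd`, `tighten_every`, `tighten`) are hypothetical, a GMM is represented as a list of `(weight, mean, variance)` tuples with diagonal covariance, the beam is a positive width in the log domain, the tightened beam is applied on frames whose index is a multiple of N, and a senone scored from a previous BGI carries that BGI forward.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_score(x, gmm):
    """Return (full log score of the GMM, best Gaussian index)."""
    comp = [math.log(w) + log_gauss(x, m, v) for w, m, v in gmm]
    bgi = max(range(len(comp)), key=comp.__getitem__)
    top = comp[bgi]
    return top + math.log(sum(math.exp(c - top) for c in comp)), bgi

def bgi_score(x, gmm, bgi):
    """Approximate the GMM score with a single, previously best Gaussian."""
    w, m, v = gmm[bgi]
    return math.log(w) + log_gauss(x, m, v)

def cigmms_frame(x, ci_gmms, cd_gmms, cd_base, ci_beam, max_cd=None,
                 prev_bgi=None, frame=0, tighten_every=0, tighten=1.0):
    """Score one frame; returns (cd_scores, bgis, number of full CD computations)."""
    # Step 1: compute all CI scores and form the CI beam from the best one.
    ci_scores = [gmm_score(x, g)[0] for g in ci_gmms]
    beam = ci_beam
    # Enhancement 3 (assumed schedule): narrow the beam on every N-th frame.
    if tighten_every and frame % tighten_every == 0:
        beam = ci_beam * tighten
    threshold = max(ci_scores) - beam
    # Step 2a: CD senones whose base CI score lies within the beam.
    cand = [s for s in range(len(cd_gmms)) if ci_scores[cd_base[s]] >= threshold]
    # Enhancement 1: bound the number of CD GMMs computed in full,
    # keeping those with the best base CI scores.
    if max_cd is not None and len(cand) > max_cd:
        cand = sorted(cand, key=lambda s: ci_scores[cd_base[s]], reverse=True)[:max_cd]
    selected = set(cand)
    scores, bgis = [], {}
    for s, g in enumerate(cd_gmms):
        if s in selected:
            score, bgis[s] = gmm_score(x, g)
        elif prev_bgi is not None and s in prev_bgi:
            # Enhancement 2: out of beam, but a previous BGI is available.
            score, bgis[s] = bgi_score(x, g, prev_bgi[s]), prev_bgi[s]
        else:
            # Step 2b: back off to the base CI score.
            score = ci_scores[cd_base[s]]
        scores.append(score)
    return scores, bgis, len(selected)

# Toy data (made up for illustration): 2 CI senones, 4 CD senones, 1-D features.
ci_gmms = [[(1.0, [0.0], [1.0])], [(1.0, [5.0], [1.0])]]
cd_gmms = [[(0.5, [0.0], [1.0]), (0.5, [1.0], [1.0])],
           [(0.5, [-1.0], [1.0]), (0.5, [0.5], [1.0])],
           [(0.5, [4.0], [1.0]), (0.5, [6.0], [1.0])],
           [(0.5, [5.0], [1.0]), (0.5, [7.0], [1.0])]]
cd_base = [0, 0, 1, 1]  # index of the base CI senone for each CD senone
scores, bgis, n_full = cigmms_frame([0.0], ci_gmms, cd_gmms, cd_base, ci_beam=5.0)
```

In this sketch, passing the returned `bgis` as `prev_bgi` on the next frame enables Enhancement 2, and `max_cd` caps the per-frame cost, which addresses the unpredictable per-frame performance noted as Issue 1.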
