On Improvement of CI-based GMM Selection in Sphinx 3. Arthur Chan, Ravishankar Mosur, Alexander Rudnicky. Computer Science Department, Carnegie Mellon University. CMU Sphinx is an open source speech recognition system.
Enhancements to CI-Based GMM Selection (CIGMMS)
1. Bound the number of CD GMMs to be computed.
2. When the best Gaussian index (BGI) of the previous frame is available and the CD senone is out of the beam -> approximate the CD GMM score using only the Gaussian indexed by the previous BGI.
Motivation: The score of the current best Gaussian is a good approximation of the full GMM score, and the previous BGI is a good approximation of the current BGI.
3. Use a tightened CI beam every N frames.
Motivation: This is similar to dropping senone computation every N frames and reusing the previous frame's scores (Chan 2004), which significantly reduced computation but hurt accuracy.
Narrowing the CI beam every N frames, rather than dropping the computation, still scores the very best senones in the current frame, which improves accuracy. A tightening factor provides more flexible control.
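Enhancements 1 and 3 can be illustrated with a minimal Python sketch. It is not the Sphinx 3 implementation: the function names, the parameters `max_cd` and `tighten_factor`, and the rule of ranking CD senones by their base CI score are all hypothetical. In particular, the slides do not say which frames in each group of N receive the tightened beam; the sketch assumes it is every frame whose index is a multiple of N.

```python
def ci_beam_for_frame(t, base_beam, n, tighten_factor):
    """Enhancement 3 (sketch): beam width to use at frame t.

    Assumption: frames with t % n == 0 use the tightened beam,
    base_beam * tighten_factor with 0 < tighten_factor <= 1.
    Beams are widths in the log domain (non-negative).
    """
    if n > 0 and t % n == 0:
        return base_beam * tighten_factor
    return base_beam


def select_cd_senones(ci_scores, cd_to_ci, ci_beam, max_cd):
    """Enhancement 1 (sketch): cap the number of CD GMMs computed per frame.

    ci_scores : log scores of all CI senones for this frame
    cd_to_ci  : cd_to_ci[j] is the base CI senone of CD senone j
    max_cd    : hypothetical per-frame upper bound on detailed CD scores

    Assumption: when more CD senones pass the CI beam than the bound
    allows, those with the best base CI score are kept.
    """
    threshold = max(ci_scores) - ci_beam
    in_beam = [j for j, base in enumerate(cd_to_ci)
               if ci_scores[base] >= threshold]
    # Stable sort: ties keep their original senone order.
    in_beam.sort(key=lambda j: ci_scores[cd_to_ci[j]], reverse=True)
    return in_beam[:max_cd]
```

With a bound in place, the per-frame cost has a fixed ceiling no matter how many senones pass the CI beam, and the tightening factor scales how aggressively the selected frames are pruned.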
Summary: Technique for Gaussian Computation Speed-Up (Lee 2001, Chan 2004)
Idea: CI senone score as approximate score
1. Compute all CI scores, form a beam (CI beam) from the highest score
2. For each CD senone
a. If its base CI score is within the beam -> Compute the detailed CD score
b. Else -> Back off to the CI score
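The two steps above can be sketched in Python as follows. This is a minimal illustration, not the Sphinx 3 code: the function name, the `cd_to_ci` mapping, and the `cd_score_fn` callback are hypothetical, and scores are assumed to be log-domain values where larger is better.

```python
def cigmms_scores(ci_scores, cd_to_ci, cd_score_fn, ci_beam):
    """Score all CD senones for one frame with basic CI-based GMM selection.

    ci_scores   : log scores of all CI senones (computed up front)
    cd_to_ci    : cd_to_ci[j] is the base CI senone of CD senone j
    cd_score_fn : hypothetical callback returning the detailed log
                  score of CD senone j
    ci_beam     : beam width in the log domain (non-negative)

    Returns (scores, computed), where computed lists the CD senones
    whose detailed score was actually evaluated.
    """
    # Step 1: form the CI beam from the highest CI score.
    threshold = max(ci_scores) - ci_beam

    scores, computed = [], []
    # Step 2: visit every CD senone.
    for j, base in enumerate(cd_to_ci):
        if ci_scores[base] >= threshold:
            scores.append(cd_score_fn(j))    # 2a: detailed CD score
            computed.append(j)
        else:
            scores.append(ci_scores[base])   # 2b: back off to the CI score
    return scores, computed
```

The length of `computed` is the per-frame cost. It depends entirely on how many base CI scores fall inside the beam.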
Issues of the Basic CIGMMS
Issue 1: Unpredictable Per-frame Performance: beam search -> number of CD scores computed varies a great deal
Issue 2: Poor Pruning Characteristics: Large number of CD scores fall back to the same CI scores -> pruning is less effective
Assumptions in Enhancement 2:
BGIs in adjacent frames are usually the same.
But how often? (Depends on GMM size)
Table: Percentages of adjacent BGIs that are the same.
Conclusions: Adjacent BGIs are quite consistent (even in noisy tasks). However, they are less consistent for the top-scoring senones (not shown in the table; this leads to Enhancement 3).
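The assumption behind Enhancement 2 can be checked with a short Python sketch. It assumes diagonal-covariance GMMs stored as plain lists; all function names are hypothetical and none of this is taken from the Sphinx 3 source. It shows how a BGI is found, how a score is approximated from the previous frame's BGI, and how the percentage of matching adjacent BGIs could be measured.

```python
import math


def gaussian_log_scores(x, weights, means, variances):
    """Per-component log(w_k * N(x; mean_k, diag(var_k)))."""
    out = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        out.append(ll)
    return out


def best_gaussian_index(x, weights, means, variances):
    """BGI: index of the highest-scoring mixture component for frame x."""
    comp = gaussian_log_scores(x, weights, means, variances)
    return max(range(len(comp)), key=comp.__getitem__)


def score_from_previous_bgi(x, weights, means, variances, prev_bgi):
    """Approximate the GMM score by evaluating only component prev_bgi."""
    k = prev_bgi
    return gaussian_log_scores(
        x, weights[k:k + 1], means[k:k + 1], variances[k:k + 1])[0]


def adjacent_bgi_agreement(frames, weights, means, variances):
    """Percentage of adjacent frame pairs that share the same BGI."""
    bgis = [best_gaussian_index(x, weights, means, variances) for x in frames]
    if len(bgis) < 2:
        return 0.0
    same = sum(a == b for a, b in zip(bgis, bgis[1:]))
    return 100.0 * same / (len(bgis) - 1)
```

Evaluating one component instead of all of them costs roughly 1/M of a full GMM score for an M-component mixture. The approximation is most useful when the agreement percentage is high.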
Table: Word error rates and execution times.
Summary: Cumulative speedup of up to 37% with only slight increase in WER.