
Sphinx on Handhelds

David Huggins-Daines (dhuggins@cs.cmu.edu). Handheld and embedded devices are pretty speedy these days, so LVCSR on them is not unreasonable, but an open-source one does not exist yet. CALO’s new focus on mobility and S2S translation projects could use it.

Presentation Transcript


  1. Sphinx on Handhelds
     David Huggins-Daines, dhuggins@cs.cmu.edu

  2. Sphinx on Handhelds?
     • Handheld/embedded devices are pretty speedy these days
     • LVCSR on them is not unreasonable
     • An open-source one does not exist yet
     • CALO’s new focus on mobility
     • S2S translation projects could use it
     • Sublime, smartphone applications, etc.

  3. Handheld challenges
     • CPU speed
       • Typically 200-400MHz ARM/XScale
       • Faster than the workstations Sphinx started out on
       • No hardware floating-point instructions
       • ARM has very fast and sophisticated integer ISA
     • Memory and storage capacity/speed
       • DRAM is very limited (32 or 64MB)
       • Storage is very slow (typically CF cards)
     • Inefficient and clumsy operating systems
       • WinCE has no stdio, broken malloc, 32MB limit
       • PalmOS is much, much worse!

  4. Plan for Sphinx on Handhelds
     • Start out with Sphinx2
       • It’s fast
       • People use it already
     • Convert “hot spots” to integer math
     • Precompute model files
       • Avoid parsing (no stdio, remember)
       • Allow memory-mapped I/O (subvert the 32MB limit on WinCE)
     • Disable non-useful features in libraries
       • e.g. flat lexicon search, CDHMM
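The "precompute model files" and "memory-mapped I/O" items can be illustrated with a small sketch. It assumes a POSIX system (such as the Linux-based Zaurus); a WinCE port would need the platform's own file-mapping calls instead. The file layout, the magic number, and every name below (`model_header_t`, `model_write`, `model_map_open`, and so on) are hypothetical and are not taken from the Sphinx sources.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define MODEL_MAGIC 0x53504858u        /* arbitrary tag for this sketch */

/* Hypothetical layout: fixed header, then raw int32 values in host byte order. */
typedef struct { uint32_t magic; uint32_t n_values; } model_header_t;

typedef struct {
    void *base;                        /* start of the mapping */
    size_t size;                       /* length of the mapping in bytes */
    const int32_t *values;             /* points into the mapping, no copy */
    uint32_t n_values;
} model_map_t;

/* Offline step (workstation, stdio available): dump precomputed integers. */
static int model_write(const char *path, const int32_t *values, uint32_t n)
{
    model_header_t h = { MODEL_MAGIC, n };
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    int ok = fwrite(&h, sizeof h, 1, fp) == 1
          && fwrite(values, sizeof *values, n, fp) == n;
    return (fclose(fp) == 0 && ok) ? 0 : -1;
}

/* Device step: map the file read-only and use the values in place, unparsed. */
static int model_map_open(const char *path, model_map_t *m)
{
    struct stat st;
    assert(path != NULL && m != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || (size_t)st.st_size < sizeof(model_header_t)) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;
    void *base = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                         /* the mapping outlives the descriptor */
    if (base == MAP_FAILED)
        return -1;
    const model_header_t *h = base;
    if (h->magic != MODEL_MAGIC
        || (size - sizeof *h) / sizeof(int32_t) < h->n_values) {
        munmap(base, size);
        return -1;
    }
    m->base = base;
    m->size = size;
    m->values = (const int32_t *)(h + 1);
    m->n_values = h->n_values;
    return 0;
}

static void model_map_close(model_map_t *m)
{
    munmap(m->base, m->size);
}
```

Because the pages are mapped read-only from the file, the operating system can load them on demand and discard them under memory pressure, which is presumably why mapping helps where heap space is tight.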

  5. Current Status
     • Sphinx2 on Sharp Zaurus
       • Linux, 40MB system RAM, 206MHz ARM
       • Performance on RM1: 1.7x realtime
       • No degradation in accuracy
     • Integer front-end and GMM code complete
       • Front end also has a “faster” mode
       • 10% faster, 10% degradation in accuracy
     • Memory consumption is too high
       • WSJ5k can just barely run
       • Sphinx2 consumes about 16MB of heap space
       • Requires quantized mixture weights (-8bsen)
       • Sphinx3.x is much smaller … and slower

  6. Implementation details
     • FFT is done with 16:16 fixed point
       • Bits 31:16 are whole part and sign
       • Bits 15:0 are fractional part
       • I.e. all numbers scaled by 65536
       • Lossless multiplication done using 4 integer shift-multiply-accumulates (ARM is really good at this)
     • Mel-spectrum calculated in log scale
       • Using base 1.0001 in order to exploit existing add-table implementation
     • “Faster” mode uses 28:4 fixed point instead
       • Overflows saturated to INT_MAX
       • Zeroes floored to log(2^-4) - very important!
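As a worked illustration of the 16:16 format, the sketch below splits each operand into a signed high half and an unsigned low half and sums four partial products, which is one plausible reading of the "4 integer shift-multiply-accumulates" on the slide. The names (`fix16_t`, `fix16_mul`, and so on) are hypothetical and the code is not taken from the Sphinx sources.

```c
#include <assert.h>
#include <stdint.h>

/* All names here are hypothetical; none come from the Sphinx sources. */
typedef int32_t fix16_t;   /* bits 31:16: whole part and sign; bits 15:0: fraction */

#define FIX16_ONE 65536    /* 1.0 scaled by 2^16 */

/* Convert a double to 16:16, truncating toward zero. */
static inline fix16_t fix16_from_double(double x)
{
    assert(x >= -32768.0 && x < 32768.0);   /* representable range */
    return (fix16_t)(x * FIX16_ONE);
}

static inline double fix16_to_double(fix16_t x)
{
    return (double)x / FIX16_ONE;
}

/*
 * Multiply two 16:16 values without a 64-bit intermediate.
 * With a = ah*2^16 + al and b = bh*2^16 + bl (ah, bh signed; al, bl in 0..65535):
 *   (a*b) >> 16 = ah*bh*2^16 + ah*bl + al*bh + ((al*bl) >> 16)
 * Assumes >> on a negative int32_t is an arithmetic shift and that converting
 * an out-of-range uint32_t to int32_t wraps; both hold on common compilers but
 * are implementation-defined in C. A product outside the 16:16 range wraps,
 * since this sketch does not saturate.
 */
static inline fix16_t fix16_mul(fix16_t a, fix16_t b)
{
    int32_t  ah = a >> 16;
    uint32_t al = (uint32_t)a & 0xffffu;
    int32_t  bh = b >> 16;
    uint32_t bl = (uint32_t)b & 0xffffu;

    /* Accumulate in unsigned arithmetic so intermediate wraparound is defined. */
    uint32_t r = ((uint32_t)(ah * bh) << 16)
               + (uint32_t)(ah * (int32_t)bl)
               + (uint32_t)((int32_t)al * bh)
               + ((al * bl) >> 16);
    return (fix16_t)r;
}
```

For example, `fix16_mul(fix16_from_double(1.5), fix16_from_double(2.0))` yields 196608, which is 3.0 scaled by 65536. For in-range products the result matches `((int64_t)a * b) >> 16`, but uses only 32-bit multiplies.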

  7. Implementation details
     • Abstract types for intermediate values
       • mfcc_t, powspec_t, mean_t, var_t
       • #define FIXED_POINT to make them ints
     • Arithmetic macros (fixpoint.h)
       • fixed32 type analogous to float32
       • addition and subtraction work as expected
       • MFCCMUL(), MFCC2FLOAT(), FLOAT2MFCC() macros become no-ops in floating-point build
       • GMMADD(), GMMSUB() do saturating addition and subtraction
       • ARM has special instructions for this too! Wow!
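The build-time switch described on this slide might look roughly like the sketch below. The macro and type names (`FIXED_POINT`, `mfcc_t`, `MFCCMUL`, `MFCC2FLOAT`, `FLOAT2MFCC`, `GMMADD`, `GMMSUB`) come from the slide, but their bodies here are assumptions, as are the 16:16 scaling, the floating-point fallbacks for `GMMADD`/`GMMSUB`, and the helper names `float_to_mfcc`, `sat_add32`, `sat_sub32` and `scale_coeff`. The real fixpoint.h may differ.

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_POINT 1   /* remove this line for the floating-point build */

#ifdef FIXED_POINT
typedef int32_t mfcc_t;                 /* assumed to be 16:16 in this sketch */

static inline mfcc_t float_to_mfcc(double x)
{
    assert(x >= -32768.0 && x < 32768.0);   /* representable range */
    return (mfcc_t)(x * 65536.0);
}

/* Saturating add: clamp to the int32 range instead of wrapping. */
static inline int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* Saturating subtract, same clamping. */
static inline int32_t sat_sub32(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a - b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

#define FLOAT2MFCC(x) float_to_mfcc(x)
#define MFCC2FLOAT(x) ((double)(x) / 65536.0)
/* 64-bit intermediate for brevity; a 32-bit-only variant is also possible. */
#define MFCCMUL(a, b) ((mfcc_t)(((int64_t)(a) * (b)) >> 16))
#define GMMADD(a, b)  sat_add32((a), (b))
#define GMMSUB(a, b)  sat_sub32((a), (b))
#else
typedef float mfcc_t;
#define FLOAT2MFCC(x) ((mfcc_t)(x))     /* conversions collapse to nothing */
#define MFCC2FLOAT(x) (x)
#define MFCCMUL(a, b) ((a) * (b))
#define GMMADD(a, b)  ((a) + (b))       /* assumed plain arithmetic here */
#define GMMSUB(a, b)  ((a) - (b))
#endif

/* Calling code is written once and compiles unchanged in either build. */
static inline mfcc_t scale_coeff(mfcc_t c, mfcc_t gain)
{
    return MFCCMUL(c, gain);
}
```

The point of the abstraction is that code such as `scale_coeff` never mentions the representation, so the same front-end source can be built for a desktop with floating point or for an ARM handheld without it.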

  8. Future Work
     • Rationalize the file formats
     • General WinCE porting (Mohit)
     • Front-end optimization
       • Implement fixed-point FHT
     • Investigate Sphinx 3.x for embedded
       • SubVQ and GS can make it fast and cut memory consumption even more
       • Much nicer architecture
       • But not widely used, API not as stable
