3rd Progress Meeting For Sphinx 3.6 Development Arthur Chan, David Huggins-Daines, Yitao Sun Carnegie Mellon University Jan 25, 2006
This meeting • 3rd Progress report on 3.6 development (40 pages) • Agenda • What happened in Fall 2005? (4 slides) • Progress of Sphinx Development in Fall 2005 (17 slides) • Summary of Progress in 2005 (10 slides) • Discussion: Should we create one release candidate? (1 slide)
What happened in Fall 2005? • Major Events in Sphinx Development • We participated in GALE in Oct 2005 • Conformance of the recognizers (Sphinx 3 and Sphinx 4) became an issue • Lack of advanced acoustic modeling techniques became very glaring • Sphinx 3 and 4 have gone through bug fixes. • CALO effort is now split into two • Off-line recognizer: requires major improvement in LM and AM. • AM issue is shared with GALE • On-line recognizer (CALO jargon: Smartnote) • Now has new LM and AM • Requires significant development work
Time distribution (Estimated) • Arthur • 50% on GALE, 20% on CALO, 30% on Sphinx • Dave • 65% CALO, 30% on PocketSphinx, 5% on Sphinx • Yitao • 90% CALO, 10% on Sphinx
The Two Funded Projects • Upside: • They point to issues that need to be solved • Need significant reprioritization of tasks • Balance of effort on the 2 projects is now achieved • Downside: • Code development of Sphinx becomes a slower process • Also, we haven’t released s3 for a while • => Should we release the code now? • Tired students and staff can be found everywhere
Overview • Work on second-stage • Merging of bestpath search in the 2-nd stage of tree search • IBM lattice generation • word confidence estimation • Behavior changes and bug fixes • Treatment of acoustic scores • Assertion in vithist.c • Attempts in search algorithm improvements • Mode 3 – Flat lexicon decoding • Mode 4 – Tree lexicon decoding • Sphinx on Mandarin and coded language. • New tools: conf, dp
Work Schedule • Sep 1 to Oct 1: • Implementation of triphones in flat lexicon decoder • Oct 1 to Nov 1: • Implementation of triphones in tree lexicon decoder (incomplete) • Nov 1 to Dec 8: • IBM lattice generation • Confidence score generation • Fixed issues in scores • Dec 8 to Jan 3: Concept of “vacation” was tried • Jan 3 to now: • Fixed bugs, prepared the release.
Second-stage Processing • Best-path search can now be specified in decode • Implementation requires write back. (urgh.) • Recognizer can now generate lattices in IBM format • Words are attached to the links • Sphinx format attaches words to the nodes. • Scores are normalized with best senone scores • Rong’s confidence-based routine is now in Sphinx • conf • Goodies: uses the Sphinx logs3 routine -> significantly reduces alpha-beta score mismatch.
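The difference between the two lattice conventions on this slide (word on the node versus word on the link) can be illustrated with a small conversion. This is a minimal sketch only: the `Node` and `Link` types, the `nodes_to_links` helper, and the synthetic `END` state are hypothetical names for illustration, and the sketch does not model the actual Sphinx or IBM lattice file formats. It assumes a node-labeled lattice in which each node holds a word and its start frame, and a word ends one frame before its successor starts.

```python
from collections import namedtuple

# Hypothetical in-memory types; not the real Sphinx or IBM file formats.
Node = namedtuple("Node", ["word", "start_frame"])
Link = namedtuple("Link", ["src", "dst", "word", "start_frame", "end_frame"])

END = "END"  # synthetic terminal state (an assumption of this sketch)


def nodes_to_links(nodes, edges, last_frame):
    """Move each word from its node onto every link leaving that node.

    nodes: dict mapping node id -> Node
    edges: list of (src_id, dst_id) pairs
    last_frame: final frame of the utterance, used for nodes with no successor
    """
    links = []
    has_successor = set()
    for src, dst in edges:
        has_successor.add(src)
        # The word on the source node spans up to the frame before dst starts.
        links.append(Link(src, dst, nodes[src].word,
                          nodes[src].start_frame,
                          nodes[dst].start_frame - 1))
    # Nodes without successors get a link into the synthetic END state.
    for node_id, node in nodes.items():
        if node_id not in has_successor:
            links.append(Link(node_id, END, node.word,
                              node.start_frame, last_frame))
    return links


nodes = {
    0: Node("<s>", 0),
    1: Node("hello", 5),
    2: Node("world", 30),
    3: Node("</s>", 60),
}
edges = [(0, 1), (1, 2), (2, 3)]
links = nodes_to_links(nodes, edges, last_frame=70)
for link in links:
    print(link)
```

A node with several successors produces one link per successor, each carrying the same word, which is why link-labeled lattices tend to have more labeled elements than node-labeled ones.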
Second-stage Processing (cont.) • Further work • Best-path generation doesn’t conform to 3.5 • -> Bugs caused by 3.6 development • Also, the best path is not always in the lattice • -> Legacy bug • Confidence-based method • Lattice-based: can only be used off-line currently • 10% of the data still has alpha-beta mismatch • Consensus network generation needs special focus
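The alpha-beta mismatch mentioned on these slides refers to the forward and backward passes over a lattice disagreeing about the total score. Below is a minimal sketch of that check, assuming a tiny hand-made lattice with topologically numbered nodes and floating-point natural-log scores. It does not use the Sphinx logs3 routines, and `log_add`, `forward_backward`, and the example words and scores are hypothetical.

```python
import math


def log_add(a, b):
    """Return log(exp(a) + exp(b)) while staying in the log domain."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def forward_backward(num_nodes, links):
    """Compute link posteriors and the alpha-beta mismatch.

    links: list of (src, dst, word, log_score). Node ids are assumed to be
    topologically ordered, with node 0 as start and num_nodes - 1 as end.
    """
    alpha = [-math.inf] * num_nodes
    beta = [-math.inf] * num_nodes
    alpha[0] = 0.0
    beta[-1] = 0.0
    # Forward pass: visit links in order of increasing source node.
    for src, dst, _, score in sorted(links, key=lambda l: l[0]):
        alpha[dst] = log_add(alpha[dst], alpha[src] + score)
    # Backward pass: visit links in order of decreasing destination node.
    for src, dst, _, score in sorted(links, key=lambda l: l[1], reverse=True):
        beta[src] = log_add(beta[src], score + beta[dst])
    total = alpha[-1]
    posteriors = [
        (word, math.exp(alpha[src] + score + beta[dst] - total))
        for src, dst, word, score in links
    ]
    mismatch = abs(alpha[-1] - beta[0])
    return posteriors, mismatch


links = [
    (0, 1, "hello", math.log(0.6)),
    (0, 1, "hollow", math.log(0.2)),
    (1, 2, "world", math.log(0.5)),
]
posteriors, mismatch = forward_backward(3, links)
print(posteriors, mismatch)
```

With exact arithmetic the forward total at the end node and the backward total at the start node are equal, so a nonzero mismatch points to numerical or bookkeeping differences between the two passes.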
Scores we see (Change 1) • Tree search now truly generates un-normalized scores. • Previously they were normalized by the ending frame only • Caused by a bug introduced in mid-2005 • All 1-st stage searches use the same score logging functions • Includes align, allphone, decode_anytopo, decode • matchseg_write, match_write are the current versions • log_* is still used but will soon be totally replaced
Scores we see (Change 2) • Multi-stream GMM computation (ms_gauden) • By default, it no longer quantizes log pdf to 8 bits • Single-stream GMM computation • Vectors with zero means and variances are removed (-remove_zero_var_gau) • Scores and performance will change • Testing resources have changed. • (Evandro grins at this point)
Scores we see (Change 3) • Sphinx now supports generation of different hypseg formats (-hypseg_fmt) • SPHINX 2-format • SPHINX 3-format • ctm format • These always require more processing, but it is better than nothing.
Scores – a summary • Unnormalized (true) acoustic and language scores are generated by (-hypsegscore_unscale) • 1-st stage search and • Best path search right after the 1-st stage • Normalized acoustic scores are generated by • Lattice generation • If developers want to have true scores in the lattice • Developers can get the best scores from the decoder (-bestsenscrdir) and do their own processing
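The "do their own processing" step above can be sketched as follows. This is a minimal sketch under stated assumptions: it assumes normalization means subtracting each frame's best senone score, so the true score is recovered by adding those per-frame best scores back over the segment. The function name `unnormalize_segment` and the numbers are hypothetical, and reading the files written under -bestsenscrdir is not modeled; the best scores are assumed to be already loaded into a list indexed by frame.

```python
def unnormalize_segment(norm_score, best_senone_scores, start_frame, end_frame):
    """Add back the per-frame best senone scores over [start_frame, end_frame].

    norm_score: normalized acoustic score of one lattice segment
    best_senone_scores: list of best senone scores, one per frame
                        (assumed already loaded; file format not modeled here)
    """
    return norm_score + sum(best_senone_scores[start_frame:end_frame + 1])


# Hypothetical per-frame best senone scores for a 4-frame utterance.
best_scores = [-100, -120, -90, -110]

# A segment covering frames 1..2 with a normalized score of -35.
true_score = unnormalize_segment(-35, best_scores, start_frame=1, end_frame=2)
print(true_score)
```

Because the correction depends only on the frame range, segments that share the same start and end frames receive the same offset, so their relative ranking is unchanged.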
Other important bug fixes • Bug in vithist.c • Caused an assertion failure that stopped the recognizer • Now fixed; an error message is returned to the search abstraction routine.
Attempts in search algorithm improvements (Mode 3) • Flat-lexicon decoder • Search implementation is completed • decode can now use flat-lexicon decoding • -op_mode 3 • Decoder revamping is completed • Mode 2 (FST) • Mode 3 (Flat-lexicon) • Mode 4 (Ravi’s Tree-Lexicon) • Mode 5 (Arthur’s Tree-Lexicon) • decode_anytopo is still there for backward compatibility • decode_anytopo = decode in mode 3
No Further Re-factoring • Avoid re-factoring before next check-in • Align and allphone have different input/output file formats • It doesn’t make sense to stuff them into a single executable. • Using an XML configuration and control file would be an option • But it takes too much time to implement
Algorithmic Work - Flat Lexicon Decoder • Full triphone completed in flat-lexicon decoding • 2.5% relative improvement in accuracy • But requires 100xRT (urgh) • Useful for debugging • Also considered full trigram implementation • Would result in another 5-10 times slowdown • Conclusion • Flat lexicon search has come to its limit
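For readers less familiar with the two units on this slide, here is a minimal sketch of how a real-time factor (xRT) and a relative improvement are computed. All numbers are hypothetical and chosen only to reproduce figures of the same size as those on the slide; expressing the improvement as a relative word error rate reduction is an assumption, and the function names are illustrative.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Decoding time divided by audio duration; 1.0 means real time."""
    return processing_seconds / audio_seconds


def relative_improvement(baseline_wer, new_wer):
    """Fraction of the baseline error rate that was removed."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical: a 10-second utterance that takes 1000 seconds to decode.
xrt = real_time_factor(1000.0, 10.0)

# Hypothetical: WER drops from 20.0% to 19.5%.
rel = relative_improvement(20.0, 19.5)

print(f"{xrt:.0f}xRT, {rel:.1%} relative improvement")
```

At a word error rate near 20%, a 2.5% relative improvement corresponds to only half a point absolute, which puts the 100xRT cost in perspective.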
Algorithmic Work - Tree Lexicon Decoder • Current full triphone implementation • Has flaws in score propagation • Tree copies • No time to do it at all, Q4’s workload nearly killed AC • Benchmarking results • GALE results: • Full Lexicon = Tree Lexicon • CALO/Communicator results: • Tree Lexicon is 5% relative worse. • Conclusion • Half a year on search is expected to give us another 5%
Conclusion on Search • Need to seriously consider • Is working on search a good idea? • In both CALO and GALE, gains come from • SAT and cross adaptation • Second-stage processing • Confusion network • Confidence annotation • First-stage SD -> Second-stage SA • VTLN • also only gives 5% relative • but it only takes 5 days to implement
Sphinx on Different Text Encodings • There is already non-CMU work on • Spanish • French • Big question mark • Could it work on other encodings?
Sphinx on Mandarin (cont.) • Thanks to Ravi • Bugs we fixed to get it through • 1236322: libutil\str2words special character bug • 1236166: special character wasn't supported • This should give us a fairly good foundation to start on most languages
Summary of Sphinx in Fall 2005 • We have done something • A strong focus on search research doesn’t seem to get us far. • Fires to fight on the modeling side • Sounds like the time to check in and move on
Progress of Sphinx 3.X (From X=5 to X=6) • New Features (4 slides) • Items that are significant • Gentle, mild and simple re-factoring and its consequence (4 slides) • Documentation (1 slide) • Regression testing (1 slide) • Pruned Features ?
New Features (Search) • Speed • Further enhancement of CIGMMS • BBI tree implementation (by Dave, in SphinxTrain) • Search • FST search • Full triphone implementation in decode_anytopo • Separation of search abstraction/implementation in 3.X
New Features (Adaptation) • Adaptation • Multiple classes for MLLR (by Dave) • MAP adaptation (by Dave, in SphinxTrain)
New Features (Others) • New executables • lm_convert • lm3g2dmp++ • dp • If Evandro asks, “Why do we need dp in sphinx 3?” • Say this, “I don’t know, we found the executable at ./s3/src/misc/dp.c” • conf • Off-line word-level confidence annotation program • Mismatch dict-LM • Unmatched entries can be automatically generated (-lts_mismatch)
Gentle, mild and simple re-factoring (GMM computation) • GMM computation is now shared among • decode, decode_anytopo, align, allphone • So e.g. • decode_anytopo could use fast GMM computation • decode could use SCHMM
Gentle, mild and simple re-factoring (Search) • Its consequence in search programming: • FST, Flat, Tree search now share the same interface (decode) • Just like Sphinx 2 and 4 • Writing a new search no longer means replacing an existing search • 2-nd stage now works for decode • Alright, not for FST search
Gentle, mild and simple re-factoring (Others) • Score output is now rationalized • Several bugs causing seg faults are eliminated • vithist.c bugs • Class-based LM is now working correctly • Command-line arguments across applications are now synchronized and re-factored
Documentation/Tutorial • Hieroglyph • Now writing 2nd draft • Doxygen documentation • (by Evandro) Tutorial now works • archive_s3 • Sphinx 2 • Sphinx 3 • Sphinx 4
Regression Testing • Our weakest link • Now daily • Standard regression test is done • Performance check on Communicator/TIDIGITS/TI46 • doxygen documentation will be made and tested • make check now has 50 tests (3.5: 11) • fairly robust to careless mistakes
Expected Trimmed Features • Search • Mode 0: alignment • (?) Mode 1: allphone • Mode 5: word tree copies • If full triphone in Ravi’s tree search can’t be completed quickly, trim it as well • (?) Yitao’s PCFG rescoring
Conclusion of Sphinx 3.X (From X=5 to X=6) • We have done something • Development last year • has enriched the code • has nicified a lot of things internal to the code • There are hiccups in our development • Not perfect • Well, compare this with NASDAQ.
Discussion: What should we do now? • Option 1, keep on working without release • Option 2, merge the crazy branch with the trunk without release • Option 3, merge the crazy branch with the trunk and create release-candidate Sphinx 3.6 RCI