3rd Progress Meeting For Sphinx 3.6 Development

Arthur Chan,

David Huggins-Daines,

Yitao Sun

Carnegie Mellon University

Jan 25, 2006

This meeting
  • 3rd Progress report on 3.6 development (40 pages)
  • Agenda
    • What happened in Fall 2005? (4 slides)
    • Progress of Sphinx Development in Fall 2005 (17 slides)
    • Summary of Progress in 2005 (10 slides)
    • Discussion: Should we create a release candidate? (1 slide)
What happened in Fall 2005?
  • Major Events in Sphinx Development
    • We began participating in GALE in Oct 2005
      • Conformance of the recognizers (Sphinx 3 and Sphinx 4) became an issue
      • The lack of advanced acoustic modeling techniques became glaring
      • Sphinx 3 and 4 went through bug fixes.
    • The CALO effort is now split in two
      • Off-line recognizer: requires major improvements in LM and AM.
        • The AM issue is shared with GALE
      • On-line recognizer (CALO jargon: Smartnote)
        • Now has new LM and AM
        • Requires significant development work
Time distribution (Estimated)
  • Arthur
    • 50% on GALE, 20% on CALO, 30% on Sphinx
  • Dave
    • 65% on CALO, 30% on PocketSphinx, 5% on Sphinx
  • Yitao
    • 90% on CALO, 10% on Sphinx
The Two Funded Projects
  • Upside:
    • They point to issues that need to be solved
      • Needed significant reprioritization of tasks
    • A balance of effort between the two projects has now been achieved
  • Downside:
    • Sphinx code development has become slower
      • Also, we haven’t released s3 in a while
      • => Should we release the code now?
    • Tired students and staff can be found everywhere
Overview
  • Work on second-stage
    • Merging best-path search into the 2nd stage of tree search
    • IBM lattice generation
    • Word confidence estimation
  • Behavior changes and bug fixes
    • Treatment of acoustic scores
    • Assertion in vithist.c
  • Attempts in search algorithm improvements
    • Mode 3 – Flat lexicon decoding
    • Mode 4 – Tree lexicon decoding
  • Sphinx on Mandarin and other text encodings
  • New tools: conf, dp
Work Schedule
  • Sep 1 to Oct 1:
    • Implementation of triphones in flat lexicon decoder
  • Oct 1 to Nov 1:
    • Implementation of triphones in the tree lexicon decoder (incomplete)
  • Nov 1 to Dec 8:
    • IBM lattice generation
    • Confidence score generation
    • Fixed issues in scores
  • Dec 8 to Jan 3: Concept of “vacation” was tried
  • Jan 3 to now:
    • Fixing bugs, preparing release.
Second-stage Processing
  • Best-path search can now be specified in decode
    • Implementation requires write-back. (urgh.)
  • Recognizer can now generate lattices in IBM format
    • Words are attached to the links
    • The Sphinx format attaches words to the nodes.
    • Scores are normalized by the best senone scores
  • Rong’s confidence estimation routine is now in Sphinx
    • conf
    • Goodies: it uses Sphinx’s logs3 routines -> significantly reduces alpha-beta score mismatch.
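The alpha-beta (forward-backward) pass behind lattice-based word confidence can be sketched as below. This is a minimal illustration, not Rong's routine or the conf tool: it assumes an IBM-style lattice with words on links, natural-log link scores, and a node list already in topological order. The forward total alpha(end) and the backward total beta(start) should be equal; when they drift apart, that is the alpha-beta mismatch noted above, and using one shared log-add routine for both passes is what keeps them consistent here.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# (from_node, to_node, word, log_score): words sit on links, as in the IBM format.
# Assumption: natural-log scores and nodes supplied in topological order.
Link = Tuple[int, int, str, float]


def log_add(a: float, b: float) -> float:
    """Return log(exp(a) + exp(b)) without underflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def link_posteriors(links: List[Link], order: List[int], start: int, end: int):
    """Forward-backward over a DAG; returns (posteriors, log alpha(end), log beta(start))."""
    alpha: Dict[int, float] = defaultdict(lambda: -math.inf)
    beta: Dict[int, float] = defaultdict(lambda: -math.inf)
    alpha[start] = 0.0
    beta[end] = 0.0
    out_links, in_links = defaultdict(list), defaultdict(list)
    for u, v, _, s in links:
        out_links[u].append((v, s))
        in_links[v].append((u, s))
    for n in order:  # forward pass: alpha[n] is final once n is reached
        for v, s in out_links[n]:
            alpha[v] = log_add(alpha[v], alpha[n] + s)
    for n in reversed(order):  # backward pass: beta[n] is final once n is reached
        for u, s in in_links[n]:
            beta[u] = log_add(beta[u], s + beta[n])
    total = alpha[end]
    posteriors = [math.exp(alpha[u] + s + beta[v] - total) for u, v, _, s in links]
    return posteriors, alpha[end], beta[start]
```

With two competing links from node 0 to node 1 scored log 0.6 and log 0.4, followed by one link to the end node, the posteriors come out as 0.6, 0.4 and 1.0. A link posterior is only a simple stand-in for word confidence; merging posteriors of the same word across overlapping links is left out.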
Second-stage Processing (cont.)
  • Further work
    • Best-path generation doesn’t conform to 3.5
      • -> Bugs introduced during 3.6 development
    • Also, the best path is not always in the lattice
      • -> Legacy bug
    • Confidence-based method
      • Lattice-based: can currently only be used off-line
      • 10% of the data still have alpha-beta mismatch
    • Consensus network generation needs special focus
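A regression check for the legacy bug above (the best path not always being in the lattice) could test whether a hypothesis word sequence is a start-to-end path through the lattice. This is a minimal sketch with a hypothetical function name; it assumes words on links and ignores timing, scores and filler words.

```python
from typing import Iterable, List, Set, Tuple

# (from_node, to_node, word): a word-on-link lattice with scores dropped (assumption)
Link = Tuple[int, int, str]


def hyp_in_lattice(links: List[Link], start: int, end: int, words: Iterable[str]) -> bool:
    """True if some start-to-end path in the lattice spells exactly `words`."""
    frontier: Set[int] = {start}
    for word in words:
        # advance every active node along links labelled with this word
        frontier = {v for u, v, w in links if u in frontier and w == word}
        if not frontier:
            return False
    return end in frontier
```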
Scores we see (Change 1)
  • Tree search now truly generates un-normalized scores.
    • Previously, scores were normalized by the ending frame only
    • Caused by a bug introduced in mid-2005
  • All 1st-stage searches use the same score-logging functions
    • Including align, allphone, decode_anytopo, decode
    • matchseg_write, match_write are the current versions
    • log_* is still used but will soon be replaced entirely
Scores we see (Change 2)
  • Multi-stream GMM computation (ms_gauden)
    • By default, it no longer quantizes the log pdf to 8 bits
  • Single-stream GMM computation
    • Gaussians with zero means and variances are removed (-remove_zero_var_gau)
  • Scores and performance will change
    • Testing resources have changed.
    • (Evandro grins at this point)
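Both GMM-side changes can be illustrated with a small sketch. It assumes diagonal Gaussians stored as plain lists and non-positive integer log scores (as in the logs3 domain). The function names are hypothetical, and the 10-bit shift used for quantization is an assumption chosen for illustration, not necessarily what ms_gauden did; the point is that 8-bit quantization throws away the low-order bits of every score, which is why scores change once it is off. Gaussians with all-zero means and variances are presumably untrained and would divide by zero in the likelihood.

```python
from typing import List, Tuple

Vector = List[float]


def drop_zero_gaussians(means: List[Vector], variances: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Drop Gaussians whose mean and variance vectors are entirely zero."""
    kept = [
        (m, v)
        for m, v in zip(means, variances)
        if not (all(x == 0.0 for x in m) and all(x == 0.0 for x in v))
    ]
    return [m for m, _ in kept], [v for _, v in kept]


def quantize_8bit(logscore: int, shift: int = 10) -> int:
    """Map a non-positive integer log score to 0..255 (assumed scheme and shift)."""
    return max(0, min(255, (-logscore) >> shift))


def dequantize_8bit(q: int, shift: int = 10) -> int:
    """Inverse mapping; up to 2**shift - 1 of the original score is lost."""
    return -(q << shift)
```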
Scores we see (Change 3)
  • Sphinx can now generate hypseg output in different formats (-hypseg_fmt)
    • Sphinx 2 format
    • Sphinx 3 format
    • CTM format
      • Always requires more processing, but it is better than nothing.
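As an illustration of the CTM option, the sketch below turns word segments into NIST CTM lines (`<file> <channel> <start> <duration> <word>`). It assumes segments given as (word, start frame, end frame) with inclusive end frames, 100 frames per second (10 ms frame shift), and a hypothetical filler set that is dropped; none of these details come from the Sphinx code.

```python
from typing import Iterable, Tuple

Segment = Tuple[str, int, int]  # (word, start_frame, end_frame), end inclusive

FILLERS = {"<s>", "</s>", "<sil>"}  # assumed filler set, not Sphinx's list


def to_ctm(utt_id: str, segments: Iterable[Segment], frame_rate: int = 100, channel: str = "1") -> str:
    """Format word segments as NIST CTM lines, one word per line."""
    lines = []
    for word, sf, ef in segments:
        if word in FILLERS:
            continue
        start = sf / frame_rate
        duration = (ef - sf + 1) / frame_rate
        lines.append(f"{utt_id} {channel} {start:.2f} {duration:.2f} {word}")
    return "\n".join(lines)
```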
Scores – a summary
  • Unnormalized (true) acoustic and language scores (-hypsegscore_unscale) are generated by
    • 1st-stage search and
    • Best-path search right after the 1st stage
  • Normalized acoustic scores are generated by
    • Lattice generation
  • If developers want true scores in the lattice
    • They can get the best senone scores from the decoder (-bestsenscrdir) and do their own processing
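One way to do that processing, sketched under an assumption: if a link's normalized acoustic score is the sum over its frames of (senone score minus that frame's best senone score), then adding back the best scores for the link's frame span recovers the true score. The list-of-integers input and the inclusive frame span are assumptions; the file layout written by -bestsenscrdir is not described here, and the function names are hypothetical.

```python
from itertools import accumulate
from typing import Callable, List


def unnormalize(norm_score: int, best_scores: List[int], sf: int, ef: int) -> int:
    """Add back the best senone score of every frame in sf..ef (inclusive)."""
    return norm_score + sum(best_scores[sf:ef + 1])


def make_unnormalizer(best_scores: List[int]) -> Callable[[int, int, int], int]:
    """Same correction in O(1) per lattice link, using prefix sums over the utterance."""
    prefix = [0, *accumulate(best_scores)]

    def unnorm(norm_score: int, sf: int, ef: int) -> int:
        return norm_score + prefix[ef + 1] - prefix[sf]

    return unnorm
```

The prefix-sum version matters only for large lattices, where many links share overlapping frame spans.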
Other important bug fixes
  • Bug in vithist.c
    • Caused an assertion failure that stopped the recognizer
    • Now fixed: an error message is returned to the search abstraction routine instead.
Attempts in search algorithm improvements (Mode 3)
  • Flat-lexicon decoder
    • Search implementation is completed
    • decode can now use flat-lexicon decoding
      • -op_mode 3
  • Decoder revamping is completed
    • Mode 2 (FST)
    • Mode 3 (Flat-lexicon)
    • Mode 4 (Ravi’s Tree-Lexicon)
    • Mode 5 (Arthur’s Tree-Lexicon)
  • decode_anytopo is kept for backward compatibility
    • decode_anytopo = decode in mode 3
No Further Re-factoring
  • Avoid re-factoring before next check-in
  • Align and allphone have different input/output file formats
    • It doesn’t make sense to stuff them into a single executable.
    • Using XML configuration and control files would be an option
      • But it would take too much time to implement
Algorithmic Work - Flat Lexicon Decoder
  • Full triphone support completed in flat-lexicon decoding
    • 2.5% relative improvement in accuracy
    • But requires 100xRT (urgh)
    • Useful for debugging
  • Also considered a full trigram implementation
    • Would result in another 5-10x slowdown
  • Conclusion
    • Flat lexicon search has reached its limit
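One reason full cross-word triphones are expensive is that a word's boundary phones depend on its neighbours, so each word end must be expanded once per possible following phone (and each word start once per preceding phone). A minimal sketch of the context expansion for one word, assuming (base, left, right) tuples and a "SIL" context at utterance edges; both are conventions chosen for this illustration, not Sphinx's internal representation.

```python
from typing import List, Tuple

Triphone = Tuple[str, str, str]  # (base, left context, right context)


def word_triphones(phones: List[str], left: str = "SIL", right: str = "SIL") -> List[Triphone]:
    """Expand a pronunciation into triphones, taking edge contexts from the neighbouring words."""
    padded = [left, *phones, right]
    return [(padded[i], padded[i - 1], padded[i + 1]) for i in range(1, len(padded) - 1)]
```

Calling word_triphones for the same word with several right contexts shows the fan-out at the word end: only the last triphone changes, but the search has to keep one copy of it per following phone.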
Algorithmic Work - Tree Lexicon Decoder
  • Current full triphone implementation
    • Has flaws in score propagation
  • Tree copies
    • No time to do it at all; Q4’s workload nearly killed AC
  • Benchmarking results
    • GALE results:
      • Flat lexicon = Tree lexicon
    • CALO/Communicator results:
      • Tree lexicon is 5% relative worse.
  • Conclusion
    • Half a year on search is expected to give us another 5%
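For contrast with the flat lexicon: a tree lexicon shares common pronunciation prefixes, so fewer phone nodes are evaluated per frame, but the word identity is only known at the leaves. That is one reason trigram scores and cross-word triphones are harder to apply in tree search, and why tree copies come up. A minimal sketch of building the prefix tree and counting its phone nodes, assuming a toy dictionary and a nested-dict layout (not Sphinx's data structures):

```python
from typing import Dict, List


def build_lextree(lexicon: Dict[str, List[str]]) -> dict:
    """Build a phone prefix tree; each node maps phone -> child, words stored under '#words'."""
    root: dict = {}
    for word, phones in lexicon.items():
        node = root
        for ph in phones:
            node = node.setdefault(ph, {})
        node.setdefault("#words", []).append(word)  # word identity only known here
    return root


def count_phone_nodes(node: dict) -> int:
    """Number of phone nodes below `node` (the '#words' entries are not phones)."""
    return sum(1 + count_phone_nodes(child) for key, child in node.items() if key != "#words")
```

For CAT (K AE T), CAB (K AE B) and DOG (D AO G), a flat lexicon has 9 phone nodes while the tree has 7, because K and AE are shared.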
Conclusion on Search
  • Need to seriously consider
    • Is working on search a good idea?
  • In both CALO and GALE, gains come from
    • SAT and cross adaptation
    • Second-stage processing
      • Confusion network
      • Confidence annotation
      • First-stage SD -> Second-stage SA
    • VTLN
      • Also gives only 5% relative
      • But it takes only 5 days to implement
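Part of VTLN's appeal is that the core operation is small: each speaker gets a frequency-warping factor (typically picked from a small grid by maximizing likelihood), and filterbank frequencies are warped before feature extraction. A minimal sketch of one common piecewise-linear warping function; the cutoff frequency and the direction of the warp are conventions that vary between implementations, and this is not a description of any Sphinx code.

```python
def vtln_warp(f: float, alpha: float, f_cut: float, nyquist: float) -> float:
    """Piecewise-linear warp: scale by alpha up to f_cut, then a straight line to (nyquist, nyquist)."""
    if f <= f_cut:
        return alpha * f
    # linear segment that keeps the warp continuous at f_cut and maps nyquist to itself
    slope = (nyquist - alpha * f_cut) / (nyquist - f_cut)
    return alpha * f_cut + slope * (f - f_cut)
```

Mapping the Nyquist frequency to itself keeps the warped filterbank inside the available bandwidth for any warping factor.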
Sphinx on Different Text Encodings
  • There is already non-CMU work for
    • Spanish
    • French
  • Big question mark
    • Could it work with other encodings?
Sphinx on Mandarin (cont.)
  • Thanks to Ravi
  • Bugs we fixed to get it working
    • 1236322: libutil\str2words special character bug
    • 1236166: special characters weren't supported
  • This should give us a fairly good foundation for most languages
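A common way special-character bugs arise in C is classifying raw bytes with ctype helpers such as `isspace()` on a plain (possibly signed) `char`, which is undefined for bytes at or above 0x80 like those used by GB2312 or UTF-8. The slides do not describe the actual fixes for the two tracker items; the Python analogue below (named after str2words but not the libutil code) only illustrates an encoding-agnostic tokenizer that treats ASCII whitespace as the only separator and passes every other byte through.

```python
from typing import List

ASCII_WS = b" \t\n\r\f\v"  # the only bytes treated as separators


def str2words(line: bytes) -> List[bytes]:
    """Split on ASCII whitespace only; bytes >= 0x80 are always part of a word."""
    words: List[bytes] = []
    current = bytearray()
    for byte in line:  # iterating over bytes yields ints 0..255
        if byte in ASCII_WS:
            if current:
                words.append(bytes(current))
                current.clear()
        else:
            current.append(byte)
    if current:
        words.append(bytes(current))
    return words
```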
Summary of Sphinx in Fall 2005
  • We have done something
  • A strong focus on search research doesn’t seem to get us far.
  • Fires to fight on the modeling side
  • Sounds like it is time to check in and move on
Progress of Sphinx 3.X (From X=5 to X=6)
  • New Features (4 slides)
    • Only significant items
  • Gentle, mild and simple re-factoring and its consequences (4 slides)
  • Documentation (1 slide)
  • Regression testing (1 slide)
  • Pruned Features?
New Features (Search)
  • Speed
    • Further enhancement of CIGMMS
    • BBI tree implementation (by Dave, in SphinxTrain)
  • Search
    • FST search
    • Full triphone implementation in decode_anytopo
    • Separation of search abstraction/implementation in 3.X
New Features (Adaptation)
  • Adaptation
    • Multiple classes for MLLR (by Dave)
    • MAP adaptation (by Dave, in SphinxTrain)
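Both adaptation methods can be summarized for Gaussian means. With MLLR, each regression class c has an affine transform and a mean is adapted as mu' = A_c mu + b_c; multiple classes simply mean that different Gaussians use different transforms. MAP interpolates the prior mean with the adaptation data, weighted by occupation counts: mu_map = (tau mu_0 + sum_t gamma_t x_t) / (tau + sum_t gamma_t). A minimal pure-Python sketch with hypothetical function names, assuming the transforms, class assignments and occupation counts gamma_t are already estimated (estimating them is the hard part and is not shown):

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mllr_adapt_means(means: Sequence[Vector], class_of: Sequence[int],
                     A: Sequence[Matrix], b: Sequence[Vector]) -> List[Vector]:
    """Apply the affine MLLR transform of each Gaussian's regression class to its mean."""
    adapted = []
    for mu, c in zip(means, class_of):
        Ac, bc = A[c], b[c]
        adapted.append([sum(a * m for a, m in zip(row, mu)) + bi for row, bi in zip(Ac, bc)])
    return adapted


def map_adapt_mean(prior: Vector, tau: float, gammas: Sequence[float], frames: Sequence[Vector]) -> Vector:
    """MAP mean update: (tau * prior + sum_t gamma_t x_t) / (tau + sum_t gamma_t)."""
    occ = sum(gammas)
    return [
        (tau * p + sum(g * x[d] for g, x in zip(gammas, frames))) / (tau + occ)
        for d, p in enumerate(prior)
    ]
```

With little data the MAP estimate stays near the prior; as the occupation count grows it moves toward the data mean, while MLLR can move every Gaussian in a class even if that Gaussian saw no data.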
New Features (Others)
  • New executables
    • lm_convert
      • lm3g2dmp++
    • dp
      • If Evandro asks, “Why do we need dp in sphinx 3?”
      • Say this: “I don’t know, we found the executable at ./s3/src/misc/dp.c”
    • conf
      • Off-line word-level confidence annotation program
  • Dictionary-LM mismatch
    • Unmatched entries can be generated automatically (-lts_mismatch)
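What -lts_mismatch automates can be sketched as: find LM words with no dictionary entry and generate an entry for each, presumably with letter-to-sound rules as the flag name suggests. Both function names are hypothetical, and the letter-to-sound function is a placeholder (one pseudo-phone per letter) standing in for a real LTS model.

```python
from typing import Callable, Dict, Iterable, List


def naive_lts(word: str) -> List[str]:
    """Hypothetical placeholder for a real letter-to-sound model."""
    return [ch.upper() for ch in word if ch.isalpha()]


def fill_missing_prons(lm_words: Iterable[str], dictionary: Dict[str, List[str]],
                       lts: Callable[[str], List[str]] = naive_lts) -> Dict[str, List[str]]:
    """Return generated pronunciations for LM words absent from the dictionary."""
    return {w: lts(w) for w in lm_words if w not in dictionary}
```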
Gentle, mild and simple re-factoring (GMM computation)
  • GMM computation is now shared among
    • decode, decode_anytopo, align, allphone
  • So e.g.
    • decode_anytopo could use fast GMM computation
    • decode could use SCHMM
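At its core, the shared computation is a diagonal-covariance GMM log-likelihood that every tool can call once it lives in one place. A minimal sketch in natural-log floating point; Sphinx 3 works in integer logs3 and adds fast-GMM shortcuts, which this omits, and the function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_loglik(x: Vector, mean: Vector, var: Vector) -> float:
    """log N(x; mean, diag(var))."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))


def gmm_loglik(x: Vector, weights: Sequence[float], means: Sequence[Vector],
               variances: Sequence[Vector]) -> float:
    """log sum_k w_k N(x; mu_k, Sigma_k), computed with the log-sum-exp trick."""
    terms = [math.log(w) + gaussian_loglik(x, m, v)
             for w, m, v in zip(weights, means, variances) if w > 0]
    best = max(terms)
    return best + math.log(sum(math.exp(t - best) for t in terms))
```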
Gentle, mild and simple re-factoring (Search)
  • Its consequences for search programming:
    • FST, flat and tree searches now share the same interface (decode)
      • Just like Sphinx 2 and 4
    • Writing a new search no longer means replacing an existing one
    • The 2nd stage now works for decode
      • Well, except for FST search
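The shared-interface idea can be sketched as an abstract search class plus a table keyed by -op_mode, so that adding a search means adding an entry rather than replacing an existing search. The class and method names below are hypothetical and do not mirror Sphinx 3's actual C structures; only the mode numbers come from the slides.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Type


class Search(ABC):
    """Common interface every search implementation offers to decode (hypothetical)."""

    name = "abstract"

    @abstractmethod
    def frame(self, senone_scores: Sequence[int]) -> None: ...

    @abstractmethod
    def hypothesis(self) -> List[str]: ...


class _StubSearch(Search):
    """Placeholder body shared by the stubs below; a real search would run Viterbi here."""

    def __init__(self) -> None:
        self.frames = 0

    def frame(self, senone_scores: Sequence[int]) -> None:
        self.frames += 1

    def hypothesis(self) -> List[str]:
        return []


class FstSearch(_StubSearch):
    name = "fst"


class FlatLexSearch(_StubSearch):
    name = "flat-lexicon"


class TreeLexSearch(_StubSearch):
    name = "tree-lexicon"


class WordTreeCopySearch(_StubSearch):
    name = "word-tree-copies"


SEARCHES: Dict[int, Type[Search]] = {2: FstSearch, 3: FlatLexSearch, 4: TreeLexSearch, 5: WordTreeCopySearch}


def make_search(op_mode: int) -> Search:
    """Pick a search implementation by mode number; decode only sees the Search interface."""
    try:
        return SEARCHES[op_mode]()
    except KeyError:
        raise ValueError(f"unsupported -op_mode {op_mode}") from None
```

In this layout, a compatibility wrapper like decode_anytopo would amount to calling make_search(3).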
Gentle, mild and simple re-factoring (Others)
  • Score output is now rationalized
  • Several bugs causing seg faults have been eliminated
    • vithist.c bugs
    • Class-based LM is now working correctly
  • Command-line options across applications are now synchronized and re-factored
Documentation/Tutorial
  • Hieroglyph
    • Now writing 2nd draft
  • Doxygen documentation
  • (by Evandro) Tutorial now works
    • archive_s3
    • Sphinx 2
    • Sphinx 3
    • Sphinx 4
Regression Testing
  • Our weakest link
  • Now run daily
    • Standard regression test is done
      • Performance check on Communicator/TIDIGITS/TI46
      • Doxygen documentation will be built and tested
  • make check now has 50 tests (3.5: 11)
    • Fairly good at catching careless mistakes
Expected Trimmed Features
  • Search
    • Mode 0: alignment
    • (?) Mode 1: allphone
    • Mode 5: word tree copies
  • If full triphone support in Ravi’s tree search can’t be fixed quickly, trim it as well
  • (?) Yitao’s PCFG rescoring
Conclusion of Sphinx 3.X (From X=5 to X=6)
  • We have done something
  • Development last year
    • has enriched the code
    • has niceified a lot of things internal to the code
  • There are hiccups in our development
    • Not perfect
    • Well, compare this with NASDAQ.
Discussion: What should we do now?
  • Option 1, keep on working without release
  • Option 2, merge the crazy branch with the trunk without release
  • Option 3, merge the crazy branch with the trunk and create release candidate Sphinx 3.6 RC1