3rd Progress Meeting For Sphinx 3.6 Development

Arthur Chan,

David Huggins-Daines,

Yitao Sun

Carnegie Mellon University

Jan 25, 2006


This meeting

  • 3rd Progress report on 3.6 development (40 pages)

  • Agenda

    • What happened in Fall 2005? (4 slides)

    • Progress of Sphinx Development in Fall 2005 (17 slides)

    • Summary of Progress in 2005 (10 slides)

    • Discussion: Should we create one release candidate? (1 slide)


What happened in FALL 2005?


What happened in Fall 2005?

  • Major Events in Sphinx Development

    • We participated in GALE in Oct 2005

      • Conformance of the recognizers (Sphinx 3 and Sphinx 4) became an issue

      • The lack of advanced acoustic modeling techniques became very glaring

      • Sphinx 3 and 4 have gone through bug fixes.

    • The CALO effort is now split in two

      • Off-line recognizer: requires major improvement in LM and AM.

        • The AM issue is shared with GALE

      • On-line recognizer (CALO jargon: Smartnote)

        • Now has new LM and AM

        • Requires significant development work


Time distribution (Estimated)

  • Arthur

    • 50% on GALE, 20% on CALO, 30% on Sphinx

  • Dave

    • 65% CALO, 30% on PocketSphinx, 5% on Sphinx

  • Yitao

    • 90% CALO, 10% on Sphinx


The Two Funded Projects

  • Upside:

    • They point to issues that need to be solved

      • Need significant reprioritization of tasks

    • Balance of effort on the 2 projects is now achieved

  • Downside:

    • Code development of Sphinx becomes a slower process

      • Also, we haven’t released s3 for a while

      • => Should we release the code now?

    • Tired students and staff can be found everywhere


Progress of Sphinx 3.6 in FALL 2005


Overview

  • Work on second-stage

    • Merging of best-path search into the 2nd stage of tree search

    • IBM lattice generation

    • word confidence estimation

  • Behavior changes and bug fixes

    • Treatment of acoustic scores

    • Assertion in vithist.c

  • Attempts in search algorithm improvements

    • Mode 3 – Flat lexicon decoding

    • Mode 4 – Tree lexicon decoding

  • Sphinx on Mandarin and other text encodings.

  • New tools: conf, dp


Work Schedule

  • Sep 1 to Oct 1:

    • Implementation of triphones in flat lexicon decoder

  • Oct 1 to Nov 1:

    • Implementation of triphones on tree lexicon decoder (incomplete)

  • Nov 1 to Dec 8:

    • IBM lattice generation

    • Confidence score generation

    • Fixed issues in scores

  • Dec 8 to Jan 3: Concept of “vacation” was tried

  • Jan 3 to now:

    • Fixed bugs, preparing the release.


Second-stage Processing

  • Best-path search can now be specified in decode

    • Implementation requires write back. (urgh.)

  • The recognizer can now generate lattices in IBM format

    • IBM format attaches the word to the link

    • Sphinx format attaches the word to the node.

    • Scores are normalized with best senone scores

  • Rong’s confidence-based routine is now in Sphinx

    • conf

    • Goodies: it uses the Sphinx logs3 routine -> significantly reduces alpha-beta score mismatch.
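
The difference between the two lattice layouts can be sketched in a few lines of Python. This is only an illustration, not the Sphinx 3 code: the data layout and the name `node_to_link_lattice` are hypothetical, and it assumes that each link takes the word of the node it leaves and that the lattice ends in a terminating node (here `</s>`) whose word does not need to be emitted.

```python
from typing import Dict, List, Tuple

# Node-labelled lattice (hypothetical layout): node id -> word,
# plus a list of (from_node, to_node, score) links.
NodeWords = Dict[int, str]
Link = Tuple[int, int, float]
# Link-labelled lattice: (from_node, to_node, word, score).
WordLink = Tuple[int, int, str, float]


def node_to_link_lattice(words: NodeWords, links: List[Link]) -> List[WordLink]:
    """Move each node's word onto the links that leave that node.

    Assumption: the final node has no outgoing link, so its word
    (an end-of-sentence marker here) is dropped.
    """
    return [(src, dst, words[src], score) for src, dst, score in links]


# Toy lattice with two competing words between <s> and "world".
words = {0: "<s>", 1: "hello", 2: "hollow", 3: "world", 4: "</s>"}
links = [
    (0, 1, -10.0),
    (0, 2, -12.0),
    (1, 3, -20.0),
    (2, 3, -25.0),
    (3, 4, -5.0),
]
word_links = node_to_link_lattice(words, links)
for link in word_links:
    print(link)
```

The number of links is unchanged by the conversion; only the place where the word label lives moves.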


Second-stage Processing (cont.)

  • Further work

    • Best-path generation doesn’t conform to the past 3.5 behavior

      • -> Bugs caused by 3.6 development

    • Also, the best path is not always in the lattice

      • -> Legacy bug

    • Confidence-based method

      • Lattice-based: can only be used off-line currently

      • 10% of the data still has alpha-beta mismatch

    • Consensus network generation needs special focus
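
For readers unfamiliar with the term, an alpha-beta (forward-backward) pass over a lattice can be sketched as below. This is a generic illustration rather than the `conf` implementation: it uses natural-log floats instead of the Sphinx logs3 routine, the function names are hypothetical, and it assumes node ids are in topological order with node 0 as the start and the last node as the end. On this reading, a "mismatch" is a difference between the total forward score at the end node and the total backward score at the start node.

```python
import math


def log_add(a: float, b: float) -> float:
    """Return log(exp(a) + exp(b)) without leaving the log domain."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def forward_backward(n_nodes, edges):
    """edges: (src, dst, log_score) tuples with src < dst."""
    alpha = [-math.inf] * n_nodes
    beta = [-math.inf] * n_nodes
    alpha[0] = 0.0
    beta[n_nodes - 1] = 0.0
    # Forward pass: a node's alpha is final before its outgoing links are used.
    for src, dst, score in sorted(edges, key=lambda e: e[0]):
        alpha[dst] = log_add(alpha[dst], alpha[src] + score)
    # Backward pass: a node's beta is final before its incoming links are used.
    for src, dst, score in sorted(edges, key=lambda e: e[1], reverse=True):
        beta[src] = log_add(beta[src], score + beta[dst])
    return alpha, beta


def link_posteriors(n_nodes, edges):
    """Posterior of each link: alpha(src) * score * beta(dst) / total."""
    alpha, beta = forward_backward(n_nodes, edges)
    total = alpha[n_nodes - 1]
    return [math.exp(alpha[s] + w + beta[d] - total) for s, d, w in edges]


edges = [
    (0, 1, math.log(0.6)),
    (0, 2, math.log(0.4)),
    (1, 3, math.log(0.5)),
    (2, 3, math.log(1.0)),
]
alpha, beta = forward_backward(4, edges)
mismatch = abs(alpha[3] - beta[0])  # should be ~0 when the two passes agree
posteriors = link_posteriors(4, edges)
print(mismatch, posteriors)
```

With exact arithmetic the two totals are equal; in practice the size of the gap depends on how the log-domain additions are rounded in each direction.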


Scores we see (Change 1)

  • Tree search now truly generates un-normalized scores.

    • Previously the score was normalized by the ending frame only

    • Caused by a bug introduced in mid-2005

  • All 1st-stage searches use the same score logging functions

    • Includes align, allphone, decode_anytopo, decode

    • matchseg_write, match_write are the current versions

    • log_* is still used but will soon be totally replaced


Scores we see (Change 2)

  • Multi-stream GMM computation (ms_gauden)

    • By default, it no longer quantizes the log pdf to 8 bits

  • Single-stream GMM computation

    • Vectors with zero means and variances are removed (-remove_zero_var_gau)

  • Scores and performance will change

    • Testing resource has changed.

    • (Evandro grins at this point)


Scores we see (Change 3)

  • Sphinx now supports generation of different hypseg formats (-hypseg_fmt)

    • SPHINX 2 format

    • SPHINX 3 format

    • ctm format

      • Always requires more processing, but it is better than nothing.
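
As an illustration of the kind of extra processing a ctm file involves, the sketch below turns frame-based word segments into NIST-style ctm lines (`file channel start duration word`). It is not the Sphinx 3 writer: the function name, the segment tuple layout, the inclusive end frame and the rate of 100 frames per second are all assumptions made for the example.

```python
def segments_to_ctm(utt_id, segments, frames_per_sec=100, channel="1"):
    """Convert (word, start_frame, end_frame) segments to ctm lines.

    Assumptions: end_frame is inclusive and frames are evenly spaced
    at frames_per_sec.
    """
    lines = []
    for word, start_frame, end_frame in segments:
        start = start_frame / frames_per_sec
        duration = (end_frame - start_frame + 1) / frames_per_sec
        lines.append(f"{utt_id} {channel} {start:.2f} {duration:.2f} {word}")
    return lines


ctm_lines = segments_to_ctm("utt1", [("hello", 0, 49), ("world", 50, 120)])
for line in ctm_lines:
    print(line)
```

Mapping the utterance id to the file and channel fields used by a scoring tool is left to the caller, which is one reason the format needs processing beyond what the decoder knows.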


Scores – a summary

  • Unnormalized (true) acoustic and language scores (-hypsegscore_unscale) are generated by

    • 1st-stage search and

    • Best-path search right after the 1st stage

  • Normalized acoustic scores are generated by

    • Lattice generation

  • If developers want true scores in the lattice

    • They can get the best senone scores from the decoder (-bestsenscrdir) and do their own processing
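
The "own processing" mentioned above might look like the following sketch. It rests on an assumption that is not spelled out in the slides: that a normalized segment score equals the true log score minus the sum of the per-frame best senone scores over the segment. The function name is hypothetical, and the per-frame scores are taken as an already loaded list, since the file format written by -bestsenscrdir is not described here.

```python
def unnormalized_ascr(norm_ascr, best_senscr, start_frame, end_frame):
    """Add the per-frame best senone scores back onto a normalized score.

    Assumption: norm_ascr = true_ascr - sum(best_senscr[start..end]),
    with end_frame inclusive and all scores in the same log domain.
    """
    return norm_ascr + sum(best_senscr[start_frame:end_frame + 1])


# Toy per-frame best senone scores (illustrative values only).
best_senscr = [-100, -120, -90, -110]
true_score = unnormalized_ascr(-30, best_senscr, 1, 2)
print(true_score)
```

If the decoder normalizes differently (for example, only at particular frames), the sum has to be restricted to the frames where normalization was actually applied.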


Other important bug fixes

  • Bug in vithist.c

    • Caused an assertion failure and stopped the recognizer

    • Now fixed; it returns an error message to the search abstraction routine.


Attempts in search algorithm improvements (Mode 3)

  • Flat-lexicon decoder

    • Search implementation is completed

    • decode can now use flat-lexicon decoding

      • -op_mode 3

  • Decoder revamping is completed

    • Mode 2 (FST)

    • Mode 3 (Flat-lexicon)

    • Mode 4 (Ravi’s Tree-Lexicon)

    • Mode 5 (Arthur’s Tree-Lexicon)

  • decode_anytopo is still there for backward compatibility

    • decode_anytopo = decode in mode 3


No Further Re-factoring

  • Avoid re-factoring before next check-in

  • Align and allphone have different input/output file formats

    • It doesn’t make sense to stuff them into a single executable.

    • Using XML configuration and control files would be an option

      • But it takes too much time to implement


Algorithmic Work - Flat Lexicon Decoder

  • Full triphone completed in flat-lexicon decoding

    • 2.5% relative improvement in accuracy

    • But requires 100xRT (urgh)

    • Useful for debugging

  • Also considered full trigram implementation

    • Will result in another 5-10 times slowdown

  • Conclusion

    • Flat lexicon search has reached its limit
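
The cost figures above can be put together with a little arithmetic. The sketch below takes the real-time factor (xRT) to be processing time divided by audio duration, which is the usual definition, and simply multiplies the quoted 100xRT by the quoted 5-10 times slowdown; the function name is hypothetical.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """xRT: how many seconds of compute each second of audio needs."""
    return processing_seconds / audio_seconds


base_xrt = 100.0  # full triphone flat-lexicon decoding, from the slide
# Projected cost if a full trigram adds another 5x to 10x slowdown.
projected_xrt = [base_xrt * slowdown for slowdown in (5, 10)]
print(projected_xrt)

# Example: one hour of audio decoded at 100xRT takes 100 hours of compute.
print(real_time_factor(100 * 3600, 3600))
```

At 500 to 1000 times real time, an hour of test audio would take roughly three to six weeks of compute on a single machine.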


Algorithmic Work - Tree Lexicon Decoder

  • Current full triphone implementation

    • Has flaws in score propagation

  • Tree copies

    • No time to do it at all; Q4’s workload nearly killed AC

  • Benchmarking results

    • GALE results:

      • Full Lexicon = Tree Lexicon

    • CALO/Communicator results:

      • Tree Lexicon is 5% relative worse.

  • Conclusion

    • Half a year on search is expected to give us another 5%


Conclusion on Search

  • Need to seriously consider

    • Is working on search a good idea?

  • In both CALO and GALE, gains come from

    • SAT and cross adaptation

    • Second-stage processing

      • Confusion network

      • Confidence annotation

      • First-stage SD -> Second-stage SA

    • VTLN

      • also only gives 5% rel

      • but it only takes 5 days to implement


Sphinx on Different Text Encodings

  • There is already non-CMU work for

    • Spanish

    • French

  • Big question mark

    • Could it work on other encodings?


Sphinx on Mandarin (gb2312)


Sphinx on Mandarin (cont.)

  • Thanks to Ravi

  • Bugs we fixed to get it through

    • 1236322: libutil\str2words special character bug

    • 1236166: special character wasn't supported

  • This should give us a fairly good foundation to start on most languages


Summary of Sphinx in Fall 2005

  • We have done something

  • A strong focus on search research doesn’t seem to get us far.

  • Fire to fight on the modeling side

  • Sounds like the time to check in and move on


Progress of Sphinx 3.X (From X=5 to X=6)


Progress of Sphinx 3.X (From X=5 to X=6)

  • New Features (4 slides)

    • Items that are significant

  • Gentle, mild and simple re-factoring and its consequences (4 slides)

  • Documentation (1 slide)

  • Regression testing (1 slide)

  • Pruned Features?


New Features (Search)

  • Speed

    • Further enhancement of CIGMMS

    • BBI tree implementation (by Dave, in SphinxTrain)

  • Search

    • FST search

    • Full triphone implementation in decode_anytopo

    • Separation of search abstraction/implementation in 3.X


New Features (Adaptation)

  • Adaptation

    • Multiple classes for MLLR (by Dave)

    • MAP adaptation (by Dave, in SphinxTrain)


New Features (Others)

  • New executables

    • lm_convert

      • lm3g2dmp++

    • dp

      • If Evandro asks, “Why do we need dp in sphinx 3?”

      • Say this, “I don’t know, we found the executable at ./s3/src/misc/dp.c”

    • conf

      • Off-line word-level confidence annotation program

  • Mismatch dict-LM

    • Unmatched entries can be generated automatically (-lts_mismatch)


Gentle, mild and simple re-factoring (GMM computation)

  • GMM computation is now shared among

    • decode, decode_anytopo, align, allphone

  • So e.g.

    • decode_anytopo could use fast GMM computation

    • decode could use SCHMM


Gentle, mild and simple re-factoring (Search)

  • Its consequence in search programming:

    • FST, Flat, Tree search now share the same interface (decode)

      • Just like Sphinx 2 and 4

    • Writing a new search won’t be replacing a search

    • 2-nd stage now works for decode

      • Alright, not for FST search
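
The idea of several searches sharing one interface can be sketched as follows. Nothing here is taken from the Sphinx 3 source: the class names, method names and the registry are hypothetical, and only the mode numbers (2 for FST, 3 for flat lexicon, 4 for tree lexicon) come from the slides.

```python
from abc import ABC, abstractmethod


class Search(ABC):
    """Hypothetical search abstraction shared by all decoding modes."""

    @abstractmethod
    def begin(self): ...

    @abstractmethod
    def step(self, frame): ...

    @abstractmethod
    def finish(self): ...


class CountingSearch(Search):
    """Stand-in implementation that only counts the frames it sees."""

    def __init__(self, name):
        self.name = name
        self.n_frames = 0

    def begin(self):
        self.n_frames = 0

    def step(self, frame):
        self.n_frames += 1

    def finish(self):
        return f"{self.name}:{self.n_frames} frames"


# Registry keyed by operation mode; adding a search adds an entry
# instead of replacing an existing one.
SEARCHES = {
    2: lambda: CountingSearch("fst"),
    3: lambda: CountingSearch("flat"),
    4: lambda: CountingSearch("tree"),
}


def decode(op_mode, frames):
    """Drive whichever search the mode selects through the same calls."""
    search = SEARCHES[op_mode]()
    search.begin()
    for frame in frames:
        search.step(frame)
    return search.finish()


print(decode(3, [0.1, 0.2]))
```

The driver never needs to know which search it is running, which is what lets second-stage processing be attached once rather than per decoder.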


Gentle, mild and simple re-factoring (Others)

  • Scores output now rationalized

  • Several bugs causing seg faults are eliminated

    • Vithist.c bugs

    • Class-based LM is now working correctly

  • Command-line arguments across applications are now synchronized and re-factored


Documentation/Tutorial

  • Hieroglyph

    • Now writing 2nd draft

  • Doxygen documentation

  • (by Evandro) Tutorial now works

    • archive_s3

    • Sphinx 2

    • Sphinx 3

    • Sphinx 4


Regression Testing

  • Our weakest link

  • Now daily

    • Standard regression test is done

      • Performance check on Communicator/TIDIGITS/TI46

      • doxygen documentation will be made and tested

  • make check now has 50 tests (3.5: 11)

    • fairly robust to careless mistakes


Expected Trimmed Features

  • Search

    • Mode 0: alignment

    • (?) Mode 1: allphone

    • Mode 5: word tree copies

  • If full triphone in Ravi’s tree search can’t be finished quickly, trim it as well

  • (?) Yitao’s PCFG rescoring


Conclusion of Sphinx 3.X (From X=5 to X=6)

  • We have done something

  • Development last year

    • has enriched the code

    • has niceified a lot of things internal to the code

  • There are hiccups in our development

    • Not perfect

    • Well, compare this with NASDAQ.


Discussion: What should we do now?

  • Option 1, keep on working without release

  • Option 2, merge the crazy branch with the trunk without release

  • Option 3, merge the crazy branch with the trunk and create release-candidate Sphinx 3.6 RCI


End

