Progress Presentation of Sphinx 3.6 (2005 Q2)

Arthur Chan

Carnegie Mellon University

Jun 7, 2005

This talk
  • Purpose of this talk
    • A progress report on various aspects of the development
    • A briefing on s3.generic
      • The codebase has existed only on my hard disk since Mar 28, 2005
      • It includes a number of gentle changes, but it is still significantly different from the current s3.5
      • Development is regarded as incomplete
    • Gives developers a shared understanding of the code and its potential effects on future development
Outline of this talk (26 pages)
  • Review of changes to Sphinx 3.5 from January to April 1
    • Mainly on GMM computation (2 pages)
  • s3.generic (22 pages)
    • High-priority items
      • New search architecture (7 pages)
      • Development of the new search using word-conditioned tree copies (7 pages)
      • Manipulation of LMs (1 page)
    • Other items
      • Gentle refactoring and minor changes (5 pages)
      • Progress on documentation (2 pages)
  • Discussion (2 pages)
    • On the future plan for Sphinx 3 and SphinxTrain (1 page)
Review of GMM Computation
  • Completed in Q1 2005, in conjunction with the development of the ICSI speed-up setup
  • Includes
    • Absolute discounting of CIGMMs
    • Usage of best Gaussian index (BGI)
    • Usage of adaptive CIGMMs (ACIGMMs)
  • Details
    • www-2.cs.cmu.edu/~archan/presentation/SphinxLunch20050310.ppt (Sphinx Lunch Presentation)
    • “On Improvements of CI-based GMM Selection” Eurospeech 2005
  • Already exists in the repository
    • tag SPHINX3_5_1_RCI_IRII
Last impression on GMM Computation
  • Internal comments on GMM computation were mixed
  • Speed gain is starting to reach a limit (30% relative instead of 80% relative)
  • Speed gain is also no longer the focus; accuracy is becoming the more important concern
  • Some other signs:
    • AlexR’s facial expressions:
      • (When talking about GMM computation)
      • (When talking about future development of search)
    • Jack:
      • “zzzzzzzzz” (Literally fell asleep, not his default behavior)
Progress of GMM Computation
  • Still being worked on quietly
  • Details will be disclosed later
Development of new search
  • Why a new search in Sphinx 3?
  • Search in S3.X (X<6) (Ravi’s method)
    • An unconventional way to handle the word segmentation problem that arises when using a tree lexicon
    • Gave a nice memory/speed/accuracy trade-off when it was first written
  • Downside
    • Not an exact bigram search
    • Techniques from the literature cannot be applied easily
      • With the conventional approach, we would be able to apply 5-10 existing or new techniques
Design of the new search architecture
  • Motivation
    • The risk of replacing the old search is high
    • The old search is an interesting one. It would be a waste to simply replace it.
  • Refactoring was done first so that Ravi’s method and the new search can coexist
  • Implemented with so-called “C classes”
    • A struct with both internal variables and methods
    • A function pointer implementation
    • Uses concepts similar to the implementation in feat.c
    • Similar to how C++ handles classes internally
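A minimal sketch of the "C class" idea, assuming nothing about the actual srch.c layout: every name below (`toy_srch_t`, `toy_srch_run`, and so on) is hypothetical and chosen only for illustration. The struct holds both state and function-pointer "methods", and the control flow calls the methods without knowing which implementation is plugged in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "C class": data members plus function-pointer "methods".
 * All names here are illustrative, not taken from the Sphinx 3 sources. */
typedef struct toy_srch_s {
    int n_frames_seen;                      /* internal variable */
    int (*utt_begin)(struct toy_srch_s *s); /* "methods" follow */
    int (*one_frame)(struct toy_srch_s *s);
    int (*utt_end)(struct toy_srch_s *s);
} toy_srch_t;

/* One concrete implementation: it only counts the frames it is given. */
static int count_begin(toy_srch_t *s) { s->n_frames_seen = 0; return 0; }
static int count_frame(toy_srch_t *s) { s->n_frames_seen++; return 0; }
static int count_end(toy_srch_t *s) { return s->n_frames_seen; }

/* "Constructor": selecting an implementation is just setting pointers. */
void toy_srch_init_counter(toy_srch_t *s)
{
    assert(s != NULL);
    s->n_frames_seen = 0;
    s->utt_begin = count_begin;
    s->one_frame = count_frame;
    s->utt_end = count_end;
}

/* The mechanism: a fixed control flow that never names an implementation. */
int toy_srch_run(toy_srch_t *s, int n_frames)
{
    int t;
    assert(s != NULL && n_frames >= 0);
    s->utt_begin(s);
    for (t = 0; t < n_frames; t++)
        s->one_frame(s);
    return s->utt_end(s);
}
```

Calling `toy_srch_init_counter(&s)` and then `toy_srch_run(&s, 5)` drives the counting implementation through five frames. A second implementation would only need its own three functions and its own initializer.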
Separation of Mechanism and Implementation
  • Search Mechanism Module (srch.c)
    • Provides Atomic Search Operations (ASOs) in the form of function pointers
    • Only one mechanism is implemented
    • ASOs can be configured just by setting the values of the function pointers
    • A single interface for applications
  • Search Implementation Modules (srch_????.c)
    • There can be multiple of them
    • Responsible for details such as handling of the graph and knowledge sources
    • Possibilities:
      • A. Decoding with different implementations
      • B. Operations that involve the concept of search, including alignment, phoneme recognition, or keyword spotting

Advantages
  • A cheap form of polymorphism
  • When the flow of the search needs to change
    • E.g. batch mode or live mode
    • Only the search mechanism module needs to be reimplemented
  • When details of the search need to change
    • One can choose to rewrite the whole search or just part of an implementation
    • No need for complete replacement
What does the search mechanism module actually do? A flow chart

[Flow chart, simplified version. Two blocks, Senone Computation and Search, are linked by the labels "scores" and "information for pruning GMM"; the approximate GMM step is marked "1st approximation". Steps listed in the order the debug mode prints them.]
  • Senone Computation
    • Compute approximate GMM scores (CI senones)
    • Select active CD senones
    • Compute detailed GMM scores (CD senones)
  • Search
    • Compute detailed HMM scores (CD)
    • Propagate graph (phone level)
    • Rescore at word ends using high-level KS (e.g. LM)
    • Propagate graph (word level)
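The per-frame flow above can be sketched as a table of function pointers that the mechanism calls in a fixed order. This is only an illustration under stated assumptions: the type `toy_aso_fn`, the table `toy_debug_asos`, and the log format are all hypothetical, and the operations here merely record their names, in the spirit of a debug implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char text[512];
    size_t used;
} toy_log_t;

/* Hypothetical signature for one atomic search operation (ASO). */
typedef void (*toy_aso_fn)(toy_log_t *log, int frame);

/* Append one "step @frame" line; drop the line if it does not fit. */
static void toy_note(toy_log_t *log, const char *step, int frame)
{
    size_t room = sizeof(log->text) - log->used;
    int n = snprintf(log->text + log->used, room, "%s @%d\n", step, frame);
    if (n > 0 && (size_t)n < room)
        log->used += (size_t)n;
    else
        log->text[log->used] = '\0';
}

/* Debug-style operations: each one only records that it was called. */
static void dbg_approx_gmm(toy_log_t *l, int t) { toy_note(l, "approx GMM score (CI senone)", t); }
static void dbg_select(toy_log_t *l, int t)     { toy_note(l, "select active CD senone", t); }
static void dbg_detail_gmm(toy_log_t *l, int t) { toy_note(l, "detail GMM score (CD senone)", t); }
static void dbg_hmm(toy_log_t *l, int t)        { toy_note(l, "detail HMM score (CD)", t); }
static void dbg_prop_phone(toy_log_t *l, int t) { toy_note(l, "propagate graph (phone level)", t); }
static void dbg_rescore(toy_log_t *l, int t)    { toy_note(l, "rescore at word end (e.g. LM)", t); }
static void dbg_prop_word(toy_log_t *l, int t)  { toy_note(l, "propagate graph (word level)", t); }

#define TOY_N_ASOS 7
const toy_aso_fn toy_debug_asos[TOY_N_ASOS] = {
    dbg_approx_gmm, dbg_select, dbg_detail_gmm, dbg_hmm,
    dbg_prop_phone, dbg_rescore, dbg_prop_word
};

/* Mechanism side: run every operation once, in table order, for one frame. */
void toy_run_frame(const toy_aso_fn *asos, size_t n, toy_log_t *log, int frame)
{
    size_t i;
    assert(asos != NULL && log != NULL);
    for (i = 0; i < n; i++)
        asos[i](log, frame);
}
```

Swapping in a real decoder would mean supplying a different table with the same seven slots; `toy_run_frame` itself would not change.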

Different Search Implementations
  • Three modes are currently implemented
    • Mode 4
      • Ravi’s search for 3.X (X<6) (Completion: 100%)
    • Mode 5
      • Word-conditioned tree copy search (Completion: 10%)
    • Mode 1369
      • Debug mode of the search mechanism module
      • No decoding is done; only text is output to indicate the flow of the search
  • Reserved modes (not implemented yet)
    • Mode 0 - Forced alignment
    • Mode 1 - Phoneme recognition
    • Mode 2 - Graph search with FSM
    • Mode 3 - Flat lexicon search
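The mode list can be pictured as a small lookup table. The sketch below is an assumption about how such a table might look, not the decoder's actual code: `toy_mode_t`, `toy_find_mode`, and the `implemented` flag are hypothetical, and only the mode numbers and descriptions come from the slide.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int mode;
    const char *purpose;
    int implemented; /* 1 = implemented (possibly partially), 0 = reserved */
} toy_mode_t;

/* Mode numbers and purposes as listed on the slide. */
static const toy_mode_t toy_modes[] = {
    {0,    "forced alignment",                  0},
    {1,    "phoneme recognition",               0},
    {2,    "graph search with FSM",             0},
    {3,    "flat lexicon search",               0},
    {4,    "Ravi's search",                     1},
    {5,    "word-conditioned tree copy search", 1},
    {1369, "debug mode",                        1}
};

/* Return the table entry for a mode number, or NULL if it is unknown. */
const toy_mode_t *toy_find_mode(int mode)
{
    size_t i;
    for (i = 0; i < sizeof(toy_modes) / sizeof(toy_modes[0]); i++)
        if (toy_modes[i].mode == mode)
            return &toy_modes[i];
    return NULL;
}

/* Purpose string for a mode the caller already knows is in the table. */
const char *toy_mode_purpose(int mode)
{
    const toy_mode_t *m = toy_find_mode(mode);
    assert(m != NULL);
    return m->purpose;
}
```

An application could reject a reserved mode early by checking `toy_find_mode(mode)->implemented` before setting up the search.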
Architecture Diagram

[Diagram, reconstructed from its labels]
  • Applications
    • Batch-mode decoder: decode
    • Live-mode decoder: livepretend, livedecode
  • Search Mechanism
    • Implementation of Ravi’s search (Mode 4)
    • Implementation of 3.6 search (Mode 5)
    • Implementation of search debugging (Mode 1369)
  • Resources: GMM, LM, Trees, Fast GMM struct, Dict, Beam struct

Search anatomy in debug mode
  • SEARCH DEBUG: MODE UTT BEGIN
  • SEARCH DEBUG: APPROXIMATE COMPUTATION AT TIME 0
  • SEARCH DEBUG: SELECT ACTIVE GMM
  • SEARCH DEBUG: DETAIL COMPUTATION AT TIME 0
  • SEARCH DEBUG: COMPUTE HEURISTIC
  • SEARCH DEBUG: HMM COMPUTE LV 2
  • SEARCH DEBUG: HMM PROPAGATE GRAPH (PHONEME) LV 2
  • SEARCH DEBUG: RESCORING AT LV2
  • SEARCH DEBUG: HMM PROPAGATE GRAPH (WORD) LV 2
  • SEARCH DEBUG: SHIFT ONE CACHE FRAME
  • SEARCH DEBUG: APPROXIMATE COMPUTATION AT TIME 1
  • SEARCH DEBUG: FRAME WINDUP
  • SEARCH DEBUG: SELECT ACTIVE GMM
  • SEARCH DEBUG: DETAIL COMPUTATION AT TIME 1
  • SEARCH DEBUG: COMPUTE HEURISTIC
  • SEARCH DEBUG: HMM COMPUTE LV 2
  • SEARCH DEBUG: HMM PROPAGATE GRAPH (PHONEME) LV 2
  • SEARCH DEBUG: RESCORING AT LV2
  • SEARCH DEBUG: HMM PROPAGATE GRAPH (WORD) LV 2
  • SEARCH DEBUG: SHIFT ONE CACHE FRAME
  • SEARCH DEBUG: APPROXIMATE COMPUTATION AT TIME 2
Discussion
  • Why not use a graph as the parent data structure?
    • Say, derive a tree or a bi-tree from a graph?
    • This sounds like a way to unify different methods.
Discussion (cont.)
  • My answer
    • Because of legacy,
      • Most recognizers use many specialized methods to optimize the speed of search
      • A generic graph search may not be able to represent these methods sufficiently
      • That is why many graph-based approaches turn out to be slower than their tree equivalents
    • It could require a lot of effort
      • To make a generic graph search as fast as the legacy system
Flat Lexicon and Tree Lexicon: Unigram Search

[Figure: a flat lexicon (Word 1 and Word 2, with P(w1) and P(w2)) beside a tree lexicon (phones ph1, ph2, ph3, with P(w1) and P(w2)).]
  • A tree lexicon with a single tree copy produces the same result as a flat lexicon
  • The only difference:
    • In a flat lexicon, uw can be applied at both the word beginning and the word end
    • In a tree lexicon, uw can be applied only at the word end
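To make the flat-versus-tree comparison concrete, here is a small sketch that builds a phone prefix tree and reports how many new nodes each pronunciation adds. It assumes, as the figure's labels suggest, that both words start with ph1 and then diverge (ph2 for one, ph3 for the other). The types and functions (`toy_node_t`, `toy_tree_add`, `toy_tree_free`) are hypothetical and unrelated to the real lextree code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_CHILDREN 8

typedef struct toy_node_s {
    const char *phone;
    struct toy_node_s *child[TOY_MAX_CHILDREN];
    int n_child;
} toy_node_t;

/* Find the child carrying this phone, creating it if it does not exist. */
static toy_node_t *toy_child(toy_node_t *parent, const char *phone)
{
    int i;
    toy_node_t *n;
    for (i = 0; i < parent->n_child; i++)
        if (strcmp(parent->child[i]->phone, phone) == 0)
            return parent->child[i];
    assert(parent->n_child < TOY_MAX_CHILDREN);
    n = calloc(1, sizeof(*n));
    assert(n != NULL);
    n->phone = phone;
    parent->child[parent->n_child++] = n;
    return n;
}

/* Insert one pronunciation (a NULL-terminated phone list) under root.
 * Returns how many new nodes had to be created; shared prefixes cost 0. */
int toy_tree_add(toy_node_t *root, const char *const *phones)
{
    int created = 0;
    toy_node_t *cur = root;
    for (; *phones != NULL; phones++) {
        int before = cur->n_child;
        toy_node_t *next = toy_child(cur, *phones);
        created += cur->n_child - before;
        cur = next;
    }
    return created;
}

/* Free every descendant of node (the node itself is owned by the caller). */
void toy_tree_free(toy_node_t *node)
{
    int i;
    for (i = 0; i < node->n_child; i++) {
        toy_tree_free(node->child[i]);
        free(node->child[i]);
    }
    node->n_child = 0;
}
```

With the two toy words "ph1 ph2" and "ph1 ph3", a flat lexicon needs four phone nodes, while the tree needs three because ph1 is shared. The saving grows with the number of words that share prefixes.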

Flat Lexicon and Tree Lexicon: Bigram Search

[Figure: a flat lexicon in which Word 1 and Word 2 are connected by the bigram transitions P(w1|w1), P(w2|w1), P(w1|w2), and P(w2|w2), beside a single tree lexicon (phones ph1, ph2, ph3).]
  • The two searches are not equivalent, because the tree search does not consider the possibilities of P(w2|w1) or P(w2|w2)
  • If the max is taken at the word end, a word segmentation error will occur (another term: delayed bigram)

Flat Lexicon and Tree Lexicon: Bigram Search (cont.)

[Figure: the flat lexicon with bigram transitions between Word 1 and Word 2, beside three copies of the tree lexicon (phones ph1, ph2, ph3); labels include P(w1), P(w2), P(w1|w1), P(w2|w1), P(w1|w2), and P(w2|w2).]
  • Need to maintain copies of the tree, representing the states in which word 1 and word 2 were entered

Flat Lexicon and Tree Lexicon: Bigram Search (cont.)
  • Intriguing economics of the tree lexicon
    • Going from a flat lexicon to a tree lexicon gives
      • A 3-4 times reduction of the state space
    • Expanding tree copies requires N times the state space, where N is the number of words (e.g. N = 100 to 65k)
  • So why did it become the textbook answer?
    • When the search space is dynamically expanded with pruning, it is significantly smaller (from the literature, usually only 10-50 times)
    • Multiple techniques can reduce this number further
      • Usage of back-off nodes
      • Usage of tail-sharing
      • Usage of sub-tree dominance
      • No need to expand the whole tree
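A rough worked version of these numbers, as a sketch only. The symbols S_flat and S_tree (state counts of the flat and tree lexicons) are introduced here for illustration, and "10-50 times" is read as 10 to 50 times the single-tree state space, which is an interpretation rather than something the slide states.

```latex
\begin{align*}
S_{\text{tree}} &\approx \tfrac{1}{4} S_{\text{flat}} \text{ to } \tfrac{1}{3} S_{\text{flat}} \\
\text{full expansion } (N = 65\text{k},\ 4\times \text{ reduction}):\quad
N \cdot S_{\text{tree}} &\approx \tfrac{65000}{4}\, S_{\text{flat}} = 16250\, S_{\text{flat}} \\
\text{with pruning } (10 \text{ to } 50 \text{ copies}):\quad
10 \cdot \tfrac{1}{4} S_{\text{flat}} &= 2.5\, S_{\text{flat}}
\quad\text{up to}\quad
50 \cdot \tfrac{1}{3} S_{\text{flat}} \approx 16.7\, S_{\text{flat}}
\end{align*}
```

Under these assumptions, the pruned tree-copy search costs a few times to a couple of dozen times the flat-lexicon state space, instead of thousands of times.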
Important Note: How did Ravi solve it then?
  • This is Ravi’s black magic ...
  • Magic 1: Instead of using word tree copies
    • Transitions into lextrees are staggered across time:
      • Multiple trees are allocated
      • At alternating times, alternating lextrees are entered
      • Later the “-epl” (entries per lextree) parameter was introduced, so that one lextree is entered for a block of frames before switching to the next
      • More word segmentations (start times) survive
  • Magic 2: Full LM rescoring at the leaf node
    • The backtrack pointer table can provide the complete history
    • The full LM is used to rescore the history
  • Magic 3: Composite triphones
    • Details omitted
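The staggering in Magic 1 can be sketched as a one-line rule. This is an assumed simplification that follows the slide's "block of frames" wording: the function name `toy_entry_lextree` and its parameters are hypothetical, and the real decoder may count entries rather than frames.

```c
#include <assert.h>

/* Which lextree accepts word entries at this frame.
 * Assumed rule: stay on one tree for `epl` consecutive frames,
 * then move to the next tree, wrapping around after the last one. */
int toy_entry_lextree(int frame, int n_lextree, int epl)
{
    assert(frame >= 0 && n_lextree > 0 && epl > 0);
    return (frame / epl) % n_lextree;
}
```

With three trees and `epl` set to 1, frames 0, 1, 2, 3 enter trees 0, 1, 2, 0. With `epl` set to 2, each tree is entered for two consecutive frames before the switch.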
Current Status of the Development of Mode 5 in 3.6
  • It is still incomplete.
  • A check-in is nevertheless necessary to keep the branches from diverging too far
  • Prototype 1: DP is completed.
    • But it uses a lot of memory (50x tree copies)
    • Tested only in a very simple case
    • No tree deletion
    • No control when the number of trees exceeds the maximum (it just reallocates)
  • Still keeps the full LM rescoring feature of Ravi’s search. (It will be useful someday.)
  • Expect to have ~10 prototypes before actual shipping.
Relationship between Mode 4 and 5
  • They share the GMM computation code
    • So the speed-up techniques in 3.X (X = 4 to 6) can be applied to mode 5 as well
  • Mode 4 and Mode 5 still use the same lexical tree data structure
    • Major difference
      • Entry into new trees is handled differently
      • Mode 4 chooses the tree to enter by looking at the time index
      • Mode 5 chooses the tree to enter based on the word copy
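For contrast with time-based entry, here is a hedged sketch of word-conditioned entry: the tree copy is looked up by the predecessor word instead of the frame index. Everything here (`toy_copy_table_t`, `toy_copy_for_word`, the fixed cap of four copies) is hypothetical. In particular, this sketch refuses new copies once the table is full, which is a deliberate simplification and not what the prototype does.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_COPIES 4
#define TOY_NO_COPY (-1)

typedef struct {
    int pred_word[TOY_MAX_COPIES]; /* predecessor word id owning each copy */
    int n_copies;
} toy_copy_table_t;

/* Return the index of the tree copy conditioned on predecessor `pred`,
 * allocating a new copy on first use. Returns TOY_NO_COPY when full. */
int toy_copy_for_word(toy_copy_table_t *t, int pred)
{
    int i;
    assert(t != NULL);
    for (i = 0; i < t->n_copies; i++)
        if (t->pred_word[i] == pred)
            return i;
    if (t->n_copies == TOY_MAX_COPIES)
        return TOY_NO_COPY;
    t->pred_word[t->n_copies] = pred;
    return t->n_copies++;
}
```

Two hypotheses that end in the same word at different times share one copy here, whereas time-based entry would pick the tree from the frame index alone.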
Discussion
  • There is a lot of potential in the search work:
    • Could we combine the search philosophies of mode 4 and mode 5?
    • How could we reduce the memory used in mode 5?
    • Tree copies for bigram and beyond?
  • Expect a lot of fun in the next 3 months.
LM Manipulation
  • CALO and LISTEN show that
    • Dynamic addition and deletion of LMs is very important
  • New features are implemented (not thoroughly tested):
    • Refactoring of the LM code so that an array of LMs (lmset_t) is always assumed to exist
    • Reading LMs in text format
    • In mode 4, deletion and addition of LMs
  • Expected problem in the future
    • Changes in a high-level knowledge source such as the LM will also change the search graph
    • This makes handling quite tricky
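A minimal sketch of what dynamic addition and deletion on an LM array might look like. The slide names `lmset_t` but does not show its fields or functions, so the struct and the three functions below (`toy_lmset_t`, `toy_lmset_find`, `toy_lmset_add`, `toy_lmset_delete`) are assumptions, with LMs reduced to bare names. The hard part mentioned above, updating the search graph after a change, is not modeled.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_LM 8

typedef struct {
    const char *name[TOY_MAX_LM]; /* stand-in for real LM objects */
    int n_lm;
} toy_lmset_t;

/* Index of the named LM, or -1 if it is not in the set. */
int toy_lmset_find(const toy_lmset_t *set, const char *name)
{
    int i;
    for (i = 0; i < set->n_lm; i++)
        if (strcmp(set->name[i], name) == 0)
            return i;
    return -1;
}

/* Add an LM; fails (-1) if the set is full or the name already exists. */
int toy_lmset_add(toy_lmset_t *set, const char *name)
{
    assert(set != NULL && name != NULL);
    if (set->n_lm == TOY_MAX_LM || toy_lmset_find(set, name) >= 0)
        return -1;
    set->name[set->n_lm++] = name;
    return 0;
}

/* Delete an LM by name, closing the gap; fails (-1) if it is absent. */
int toy_lmset_delete(toy_lmset_t *set, const char *name)
{
    int i = toy_lmset_find(set, name);
    if (i < 0)
        return -1;
    for (; i < set->n_lm - 1; i++)
        set->name[i] = set->name[i + 1];
    set->n_lm--;
    return 0;
}
```

In a real decoder, each add or delete would also have to invalidate or rebuild whatever part of the search graph depends on the affected LM.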
Other refactoring that affects us
  • Done because of
    • Push from projects
    • Push from the implementation of mode 5
  • Important ones
    • 1. kb and kbcore
    • 2. Physical file structure of libs3decoder
    • 3. Refactoring across dag/astar/decode_anytopo
    • 4. Synchronization of command-line arguments
kb and kbcore
  • Changes motivated by the new search
    • kb and kbcore take care of mode initialization
    • srch will point resources to the kb
    • Initialization of graph structures is now the responsibility of the search implementation modules
  • Implemented and tested
    • Consistent style of module reporting
    • Added arguments for reporting in every module
Physical file structure of libs3decoder
  • libs3decoder was becoming overcrowded
  • Now divided into eight libraries: (Tested)
    • libs3decoder/libam (gmm, hmm, optimized computation)
    • libs3decoder/libcep_feat (feature, d-coeff, agc, cmn)
    • libs3decoder/libcommon (util, misc)
    • libs3decoder/libdict (dict, dict2pid, wid)
    • libs3decoder/liblm (lm, lmclass)
    • libs3decoder/libsearch (srch, srch_impl*)
    • libs3decoder/libep (endptr, classify)
    • libs3decoder/libAPI (ld_decode_API, utt)
  • Not very orthogonal yet
    • E.g. libam and liblm depend on each other
libs3decoder Before/After
  • Before (3.5), everything in one library:
    • agc, approx_cont_mgau, ascr, bio, cb2mllr_io, classify, cmn, cmn_prior, cont_mgau, corpus, dict2pid, dict, endptr, fast_algo_struct, feat, fe, fe_interface, fe_sigproc, fillpen, flat_fwd, gs, hmm, interp, kb, kbcore, lextree, live_decode_API, live_decode_args, lm, lmclass, logs3, mdef, misc, mllr, ms_gauden, ms_mllr, ms_senone, subvq, tmat, utt, vector, vithist, wid
  • After, by library:
    • am: adaptor, approx_cont_mgau, gs, hmm, interp, mdef, mllr, ms_gauden, ms_mllr, ms_senone, cb2mllr_io (not there yet)
    • search: ascr, dag (new), flat_fwd, gmm_wrap (new), kb, kbcore, lextree, vithist, srch (new), srch_debug (new), srch_time_switch_tree (Mode 4), srch_word_switch_tree (Mode 5)
    • cep_feat: agc, cmn, cmn_prior, feat, fe, fe_interface, fe_sigproc
    • lm: lm, lmclass, fillpen
    • ep: classify, endptr
    • dict: dict, dict2pid, wid
    • common: bio, corpus, logs3, misc, stat (new), vector
    • API: utt, live_decode_api, live_decode_args

Refactoring across dag/astar/decode_anytopo
  • The three have a lot in common
    • So some fat needs to be cut
    • A standalone library, dag.c, is created
  • E.g.
    • dag_link and dag_update_link are shared
    • dag_search and dag_load are still not easy to share
    • dag and the 2nd-stage search of decode_anytopo may still not be equivalent
    • More testing is needed
Synchronization of command line arguments
  • Clean-up has been done for
    • decode
    • align
    • allphone
    • dag
    • astar
    • decode_anytopo
  • Use
    • -wip for insertion penalty
    • -lw not -langw
    • -mean not -meanfn
  • This should be stable in 3.6
Doxygen-style documentation
  • Fixed a lot of bugs in the doxygen documentation during development
    • Close to completion
    • Instead of

int fun(int a, /** a is a variable */
        int b); /** b is a variable */

It should be

int fun(int a, /**< a is a variable */
        int b  /**< b is a variable */
        );

Status of Hieroglyphs Draft 1
  • It looks like a book now.
    • Less crappy
    • The crappy parts are consistent
  • Another 3 chapters are completed
    • On software installation (Chapter 4)
    • On the front end of Sphinx (Chapter 6)
    • FAQs of using Sphinx (Appendix B)
  • The number of chapters has increased by 2 (from 12 to 14; finished chapters from 6 to 9)
    • Still 5 chapters to go!
Status of Hieroglyphs Draft 1
  • Other chapters
    • Chapter I: License and use of Sphinx, SphinxTrain and CMU LM Toolkit (1st draft, 4th Rev)
    • Chapter II: Introduction to Sphinx, SphinxTrain and CMU LM Toolkit (1st draft, 2nd Rev)
    • Chapter IX: Search Structure and Speed-up of Sphinx’s recognizers (1st draft, 2nd Rev)
    • Chapter X: Speaker adaptation using Sphinx (1st draft, 3rd Rev)
    • Chapter XI: Development using Sphinx (1st draft, 2nd Rev)
    • Appendix A.2: Full SphinxTrain Command Line Information (1st draft, 2nd Rev)
  • Writing quality:
    • Still low
    • Starting to have logic and look like English
  • The 1st draft will be completed in the summer (hopefully)
Final note on ST and S3
  • Our plan for SphinxTrain and Sphinx 3
    • Separation into libraries and applications is our main goal
    • Before that, merging ST into S3 will be a good step
    • The refactoring of libs3decoder will be a good step toward merging
    • Do it slowly:
      • Arthur Chan is not allowed to check in more than 4 executables a month to Sphinx 3
      • This should allow us to balance short-term and long-term goals
Sphinx development in general
  • Motivated by CALO
  • 4 important aspects
    • Adaptation
    • Search
    • Intelligent system combination and hypothesis rescoring
    • Discriminative training
Conclusion
  • In the first half of 2005
    • Interesting research
      • GMM Computation
      • Search
      • Speaker Adaptation
    • Improvement in infrastructure
      • Starting to make innovation appropriate
      • With ST/S3 in the next year, it will look even better