2nd Progress Meeting For Sphinx 3.6 Development

Arthur Chan,

David Huggins-Daines,

Yitao Sun

Carnegie Mellon University

Jun 7, 2005

This meeting (2nd Progress Meeting of 3.6)
  • Purpose of this meeting
    • A working progress report on various aspects of the development
    • A briefing on embedded Sphinx 2 (by David)
    • A briefing on Sphinx 3’s “crazy branch” (by Arthur)
      • Developed as a branch in CVS
      • Includes several interesting features
      • Includes a number of mild changes
    • Discussion before the next check-in

Outline of this talk
  • Review of 1st Progress Meeting
  • Progress on the embedded version of Sphinx 2 (by Dave, 7-10 pages)
  • Progress on Sphinx 3’s crazy branch (15-20 pages)
    • Architecture Diagram of Sphinx 3.6
    • Changes in search abstraction (7 pages)
    • Progress on search implementation (8 pages)
      • GMM Computation
      • FSG mode, Word Switching Tree Search mode
    • Mild re-factoring (no longer “gentle”) (3 pages)
      • LM
      • S3.0 family of tools
  • Hieroglyphs (1 page)

Review of 1st Progress Meeting
  • Last time:
    • Two separate layers were defined
      • Low-level implementation of search
      • Possible abstractions of search
      • The layering had just been introduced; its advantages were not yet apparent.
    • Implementation of Mode 5 was still under development (only 10% complete)
    • libs3decoder had just been modularized into 8 sub-modules

Motivation for Architecting Sphinx 3.X
  • Need for new search algorithms
    • Developing a new search algorithm is risky.
    • We don’t want to throw away the old one.
    • Simply replacing the old one could cause backward-compatibility problems.
  • The code has grown to a stage where
    • some changes are very hard to make.
  • Multiple programmers are active at the same time
    • CVS conflicts become frequent if behavior is selected through “if-else” structures

Architecture of Sphinx 3.X (X<6)
  • Batch sequential architecture (Shaw 96)
  • Each executable customizes its own version of each sub-routine

[Figure: batch sequential architecture of Sphinx 3.X (X<6). Five executables (decode, livepretend, decode_anytopo, align, allphone) are built on separate variants of Initialization 1-4 (kb and kbcore), GMM Computation 1-4 (1: approx_cont_mgau; 2-4: gauden & senone methods 1-3), Search 1-4, Process Controller 1-4 and Command Line 1-4.]


Pros/Cons of Batch Sequential Architecture
  • Pros:
    • Great flexibility for individual programmers
    • No shared assumptions; data structures are usually optimized for each application.
      • Align and allphone have such optimizations.
    • Individual applications are crafted with high quality
  • Cons:
    • Tremendous difficulty in maintenance
      • Most changes must be carried out 5-6 times, once per executable.
    • Widespread code duplication
      • Code with the same functionality was duplicated multiple times
    • Scared away many programmers in the past
      • Beginners tend to prefer a general architecture

Big Picture of Software Architecture in Sphinx 3.6
  • Layered and object-oriented
    • Implemented in C
  • Major high-level routines
    • Initializer (kb.c or kbcore.c)
      • A kind of clipboard shared by the other controllers
    • Process controller (corpus.c)
      • Governs the protocol for processing a sentence
    • Search abstraction routine (srch.c)
      • Governs how search is done
      • Implemented as pipelines and filters with shared memory
      • Each filter can be overridden, as methods can be in OO languages
    • Command-line processor (cmd_ln_macro.c and cmd_ln.c), implemented as macros
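
The division of labour above can be sketched in C. The initializer builds shared state once, and a process controller applies the same per-sentence protocol to every utterance. This is a minimal sketch under assumptions: `kb_sketch_t`, `utt_callbacks_t`, `process_ctl` and the demo callbacks are hypothetical names and do not reproduce the actual kb.c or corpus.c interfaces.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical knowledge base built once by the initializer and shared
 * ("clipboard") by the other controllers. */
typedef struct {
    int n_utt_done;      /* utterances processed so far */
    char last_uttid[64]; /* id of the most recent utterance */
} kb_sketch_t;

/* Hypothetical per-utterance protocol: begin, process, end. */
typedef struct {
    void (*begin_utt)(kb_sketch_t *kb, const char *uttid);
    int  (*process_utt)(kb_sketch_t *kb, const char *uttid);
    void (*end_utt)(kb_sketch_t *kb, const char *uttid);
} utt_callbacks_t;

/* Process controller: applies the same protocol to every utterance id in
 * a control list. Returns how many utterances were processed successfully. */
int process_ctl(kb_sketch_t *kb, const utt_callbacks_t *cb,
                const char *const *uttids, int n)
{
    int n_ok = 0;
    for (int i = 0; i < n; i++) {
        cb->begin_utt(kb, uttids[i]);
        if (cb->process_utt(kb, uttids[i]) == 0)
            n_ok++;
        cb->end_utt(kb, uttids[i]);
    }
    return n_ok;
}

/* Example callbacks that an application might plug in. */
static void demo_begin(kb_sketch_t *kb, const char *uttid)
{
    strncpy(kb->last_uttid, uttid, sizeof(kb->last_uttid) - 1);
    kb->last_uttid[sizeof(kb->last_uttid) - 1] = '\0';
}

static int demo_process(kb_sketch_t *kb, const char *uttid)
{
    (void)kb;
    printf("decoding %s\n", uttid);
    return 0; /* 0 means success */
}

static void demo_end(kb_sketch_t *kb, const char *uttid)
{
    (void)uttid;
    kb->n_utt_done++;
}

const utt_callbacks_t demo_callbacks = { demo_begin, demo_process, demo_end };
```

Each tool would supply only its own callbacks. The controller loop stays identical across tools, so the per-sentence protocol is not duplicated in every executable.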

Software Architecture Diagram of Sphinx 3.6

[Figure: layered software architecture of Sphinx 3.6. Applications: user-defined applications, livedecode API, livepretend, decode, decode (anytopo), align, allphone, astar, dag. Controllers/Abstractions: Initializer, Process Controller, Search Controller, Command Line Processor. Implementations: Fast Single Stream GMM Computation, Multi Stream GMM Computation, Mode 0: Align, Mode 1: Allphone, Mode 2: FSG, Mode 3: Anytopo, Mode 4: Magic Wheel, Mode 5: WSFT. Libraries: Dictionary, Search, LM, AM, Feature, Utility, Miscellaneous.]


Search Abstraction
  • Search abstraction is implemented as objects
  • Search operations are implemented as filters with shared memory
  • Each filter performs one distinct search operation
  • Ideally, each filter or set of filters can be replaced.

[Figure: Search For One Frame. Filters shown: Select Active CD Senone; Compute Detail GMM Score (CD senone); Compute Detail HMM Score (CD); Propagate Graph (Phone-Level); Rescoring at Word End using High-Level KS (e.g. LM); Propagate Graph (Word-Level); Compute Approx. GMM Score (CI senone).]
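
The "filters with shared memory" idea can be illustrated with a small C sketch. The sketch assumes a hypothetical `frame_state_t` that every filter reads and writes, plus stand-in filters (`compute_approx_gmm`, `select_active`, `propagate_graph`) that use dummy integer scores. The filter order shown is an assumption, and the real atomic search operations in srch.c are not reproduced.

```c
#include <stddef.h>

#define N_SEN 4

/* Hypothetical shared memory: every filter reads and writes this struct
 * instead of passing intermediate results through its arguments. */
typedef struct {
    int frame;           /* current frame index */
    int ci_score[N_SEN]; /* approximate scores (dummy values here) */
    int active[N_SEN];   /* 1 if the senone is selected for detailed scoring */
    int n_active;        /* number of selected senones */
    int trace;           /* bit mask recording which filters ran */
} frame_state_t;

typedef void (*search_filter_t)(frame_state_t *st);

/* Stand-in filter: a real one would compute approximate GMM scores. */
static void compute_approx_gmm(frame_state_t *st)
{
    for (int s = 0; s < N_SEN; s++)
        st->ci_score[s] = -10 * s - st->frame;
    st->trace |= 1;
}

/* Keep senones whose approximate score is within a beam (15) of the best. */
static void select_active(frame_state_t *st)
{
    int best = st->ci_score[0];
    for (int s = 1; s < N_SEN; s++)
        if (st->ci_score[s] > best)
            best = st->ci_score[s];
    st->n_active = 0;
    for (int s = 0; s < N_SEN; s++) {
        st->active[s] = (st->ci_score[s] >= best - 15);
        st->n_active += st->active[s];
    }
    st->trace |= 2;
}

/* Stand-in for graph propagation. */
static void propagate_graph(frame_state_t *st)
{
    st->trace |= 4;
}

/* Search one frame: run each filter in order on the shared state. */
void search_one_frame(search_filter_t *pipeline, size_t n, frame_state_t *st)
{
    for (size_t i = 0; i < n; i++)
        if (pipeline[i] != NULL) /* a NULL slot skips that step */
            pipeline[i](st);
    st->frame++;
}
```

To replace a filter, swap one array entry, for example `pipe[2] = my_propagate;` where `my_propagate` is hypothetical. The other filters are untouched because they communicate only through the shared struct.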


Different Ways to Write a Search Implementation
  • 1. Use the default implementation
    • Just specify all the atomic search operations (ASOs) provided
  • 2. Override “search_one_frame”
    • Only the GMM computation and “search_one_frame” need to be specified
  • 3. Override the whole mechanism
    • For people who strongly dislike the default
    • Override “search” itself
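
The three options can be modeled as a fallback chain of function pointers. This is a minimal sketch, assuming a hypothetical `srch_t` object and entry point `srch_utt`; it is not the real srch.c API. A NULL `search` selects the default frame loop, and a NULL `search_one_frame` selects the default per-frame operations. `compute_gmm` stays shared in both cases.

```c
#include <stddef.h>

typedef struct srch_s srch_t;

/* Hypothetical search object. A NULL function pointer means "use the default". */
struct srch_s {
    int n_frames;      /* frames in the utterance */
    int frames_done;   /* frames searched so far */
    int gmm_calls;     /* GMM computations performed */
    int custom_frames; /* frames handled by an overriding routine */
    int (*compute_gmm)(srch_t *s);      /* GMM computation, shared across modes */
    int (*search_one_frame)(srch_t *s); /* option 2: override one frame */
    int (*search)(srch_t *s);           /* option 3: override the whole search */
};

/* Option 1 default: GMM computation, then the default ASOs
 * (HMM scoring, graph propagation, rescoring), elided here. */
static int default_search_one_frame(srch_t *s)
{
    if (s->compute_gmm(s) != 0)
        return -1;
    s->frames_done++;
    return 0;
}

/* Default utterance loop: uses the overriding per-frame routine if given. */
static int default_search(srch_t *s)
{
    int (*one_frame)(srch_t *) =
        s->search_one_frame ? s->search_one_frame : default_search_one_frame;
    for (int f = 0; f < s->n_frames; f++)
        if (one_frame(s) != 0)
            return -1;
    return 0;
}

/* Entry point: option 3 if "search" is overridden, else the default loop. */
int srch_utt(srch_t *s)
{
    return s->search ? s->search(s) : default_search(s);
}

/* A GMM routine that every mode can share. */
int shared_gmm(srch_t *s)
{
    s->gmm_calls++;
    return 0;
}

/* Option 2, as described for Mode 2 (FSG): a custom per-frame search
 * that still calls the shared GMM computation. */
int fsg_like_one_frame(srch_t *s)
{
    if (s->compute_gmm(s) != 0)
        return -1;
    s->custom_frames++;
    s->frames_done++;
    return 0;
}
```

A mode that only sets `compute_gmm` gets option 1. Setting `search_one_frame` gives option 2. Setting `search` bypasses both defaults.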

Concrete Examples
  • Mode 4 (Magic Wheel) and Mode 5 (WST) use the default implementation
  • Mode 2 (FSG)
    • Overrides the “search_one_frame” implementation
    • But shares the GMM implementation
  • Modes 0 (align), 1 (allphone) and 3 (flat lexicon decoding) will likely do the same.

Future work
  • Re-factoring of align, allphone and decode_anytopo is not yet complete.
  • The search abstraction needs to consider
    • More flexible mechanisms
      • Searching backward (for backward search)
      • Approximate search in the first stage (for phoneme and word look-ahead)
      • (Optional) Parallel and distributed decoding
  • Command-line options and internal modules may still be mismatched
    • Might learn from the mechanisms of Sphinx 2 and Sphinx 4
  • Controlling how an utterance is processed can require 5 different files
    • A better control format?
  • The design does not yet fully anticipate the fixed-point front end and GMM computation of Sphinx 2

GMM Computation
  • Decode can now use SCHMM
    • Specified by .semi.
    • Implemented and tested by Dave
  • GMM computation in align, allphone, decode and livepretend is now common code
  • Sphinx 2’s fixed-point GMM computation is not yet incorporated
    • It looks very delicious.
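
In a semi-continuous HMM (SCHMM), all senones share one codebook of Gaussians and differ only in their mixture weights. The codebook can therefore be scored once per frame, and each senone reuses those scores. This is a minimal sketch, assuming a hypothetical `codebook_t` with precomputed log normalization constants. It uses the max approximation instead of an exact log-sum, and it is not the Sphinx 3 `.semi.` implementation.

```c
#define DIM 2
#define N_CW 3 /* codebook size, shared by all senones */

/* Hypothetical shared codebook of diagonal Gaussians. log_norm[k] holds the
 * precomputed constant -0.5 * sum_d log(2*pi*var[k][d]), so no log() call
 * is needed per frame. */
typedef struct {
    double mean[N_CW][DIM];
    double var[N_CW][DIM];
    double log_norm[N_CW];
} codebook_t;

/* Step 1, once per frame: log-likelihood of every codeword. */
void score_codebook(const codebook_t *cb, const double *x, double *cw)
{
    for (int k = 0; k < N_CW; k++) {
        double dist = 0.0;
        for (int d = 0; d < DIM; d++) {
            double diff = x[d] - cb->mean[k][d];
            dist += diff * diff / cb->var[k][d];
        }
        cw[k] = cb->log_norm[k] - 0.5 * dist;
    }
}

/* Step 2, per senone: combine the shared codeword scores with the senone's
 * own log mixture weights. Takes the maximum term as an approximation;
 * an exact score would log-add all terms. */
double senone_score(const double *log_w, const double *cw)
{
    double best = log_w[0] + cw[0];
    for (int k = 1; k < N_CW; k++)
        if (log_w[k] + cw[k] > best)
            best = log_w[k] + cw[k];
    return best;
}
```

The saving comes from step 1 being independent of the number of senones. Restricting step 2 to the top few codewords per frame would make it cheaper still.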

Finite State Machine Search (Mode 2): Implementation
  • Largely complete (70%)
  • Recipe:
    • Search function pointer implementation
      • Adapted from the Sphinx 2 FSG_* family of routines
    • GMM computation
      • Uses Sphinx 3 GMM computation
      • Already allows CI-based GMM selection (CIGMMS)

Finite State Machine Search (Mode 2): Problems for the Users
  • Not yet seriously tested
    • Finding test cases is hard
  • No way to write grammars yet
    • Yitao’s goal in Q3 and Q4 2005
      • Either directly incorporate the CFG’s score into the search
      • Or implement an approximate converter from CFG to FSM (HTK’s method)

Finite State Machine Search (Mode 2): Other Problems
  • Problems inherited from Sphinx 2 (copied from Ravi’s slide)
    • No lextree implementation (What?)
    • Static allocation of all HMMs; not allocated “on demand” (Oh, no!)
    • FSG transitions represented by an NxN matrix (You can’t be serious!!)
  • Other wish list items
    • No histogram pruning (Houston, we’ve got a problem.)
    • No state-based implementation (Wilson! I am sorry!!)
      • We need it to unify BW, alignment, allphone and FSG search.
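
Histogram pruning, listed above as missing, caps the number of active HMMs per frame without sorting them. This is a minimal sketch, assuming integer log-domain scores (higher is better), a hypothetical function `histogram_prune_threshold`, and 256 equal-width bins spanning the beam. It is a generic version of the technique, not Sphinx 2 or Sphinx 3 code.

```c
#define N_BINS 256

/* Return a score threshold so that at most max_active entries (whole bins
 * only, and always at least the best bin) score at or above it. Entries
 * more than `beam` below the best score are ignored, as in beam pruning.
 * Requires n >= 1 and beam >= 0. */
int histogram_prune_threshold(const int *score, int n, int beam, int max_active)
{
    int best = score[0];
    for (int i = 1; i < n; i++)
        if (score[i] > best)
            best = score[i];

    /* Bin 0 holds scores closest to the best; higher bins are worse. */
    int count[N_BINS] = {0};
    for (int i = 0; i < n; i++) {
        long d = (long)best - score[i]; /* distance from best, >= 0 */
        if (d > beam)
            continue;                   /* already outside the beam */
        count[d * N_BINS / (beam + 1)]++;
    }

    /* k = number of leading bins to keep (at least 1). */
    int k = 0, kept = 0;
    while (k < N_BINS && (k == 0 || kept + count[k] <= max_active)) {
        kept += count[k];
        k++;
    }
    if (k == N_BINS)
        return best - beam; /* everything inside the beam fits */

    /* Largest distance d that still falls in bins 0..k-1,
     * i.e. the largest d with d * N_BINS < k * (beam + 1). */
    long d_max = ((long)k * (beam + 1) + N_BINS - 1) / N_BINS - 1;
    return (int)(best - d_max);
}
```

Entries scoring at or above the returned threshold stay active. Whole bins are kept or dropped, so the cap holds except when the best bin alone exceeds `max_active`. That granularity is the usual price of avoiding a sort.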

Time Switching Tree Search (Mode 4)
  • Name change:
    • It was “lucky wheel”
    • Now it is “magic wheel”
  • In the last check-in, test-full gave exactly the same results as before on 6 corpora
    • We could sleep.
  • Future work:
    • Change the word-end triphone implementation
      • from composite triphones to full triphones

Word Switching Tree Search (Mode 5)
  • Now runs on the Communicator task
    • With the same performance as Mode 4
  • Major reasons why it does not yet approach decode_anytopo’s results
    • Bigram probabilities are not yet factored
      • Not an easy task; still considering how.
    • The triphone implementation is not yet exact
  • 30% complete
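
Factoring bigram probabilities into a lexical tree usually means giving each node the best LM score of any word reachable below it. This lets the LM prune before the word identity is known. This is a minimal sketch of that computation, assuming a hypothetical `tree_node_t` and integer log-probabilities in `bigram_row` (one row per predecessor word). It shows the standard technique, not the Mode 5 code.

```c
#include <limits.h>

#define MAX_CHILD 4

/* Hypothetical lexical-tree node. `word` is the id of a word ending at this
 * node, or -1; `factor` is filled in by factor_bigram. */
typedef struct tree_node_s {
    int word;
    int n_child;
    struct tree_node_s *child[MAX_CHILD];
    int factor;
} tree_node_t;

/* For every node, store the best bigram log-probability of any word
 * reachable from it, given bigram_row[w] = log P(w | predecessor).
 * Returns the node's factor (INT_MIN if no word is reachable). */
int factor_bigram(tree_node_t *node, const int *bigram_row)
{
    int best = INT_MIN;
    if (node->word >= 0)
        best = bigram_row[node->word];
    for (int i = 0; i < node->n_child; i++) {
        int f = factor_bigram(node->child[i], bigram_row);
        if (f > best)
            best = f;
    }
    node->factor = best;
    return best;
}
```

During search, a hypothesis moving from a parent to a child would add `child->factor - parent->factor`. The increments telescope, so a word-final leaf has accumulated exactly its bigram score. The factors depend on the predecessor word, so they must be recomputed or cached per history, which is one reason the task is not easy.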

Future work on Mode 5
  • N-gram Look-ahead
  • Full trigram tree implementation
  • Phoneme and Word Look-ahead
  • Share the full triphone implementation with Mode 4 in the future.

Big Picture of All Search Implementations
  • A finite state machine data structure could unify
    • align
    • allphone
    • Baum-Welch
    • FSG search
  • Time will show whether it also applies to tree search.
  • Search implementation has more short-term demand.
    • Mode 5 will be our new flagship
    • By October, 3 of the 4 goals for Mode 5 should be completed.
  • Code should be shared between different searches as much as possible

Summary of Re-factorings
  • No longer gentle
  • But it is mild
  • Several useful things to know
    • Revamped language model routines
    • S3.0 family of tools
    • Overall status of merging

LM routine
  • Current capability
    • Reads both text-based and DMP-based LMs
    • Allows switching between LMs
    • Allows conversion between the text and DMP LM formats
    • Provides a single interface to all applications
  • Tool of the month: lm_convert
    • lm3g2dmp++
    • Will be the application for future language model format conversion
      • Other formats? CMULMTK’s format?
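
Reading a text-based LM starts with its `\data\` header, which lists the number of n-grams of each order. This is a minimal sketch, assuming the standard ARPA text layout (`ngram N=count` lines after `\data\`) and a hypothetical function `arpa_read_counts`. It is not the lm_convert code, and it makes no attempt to parse the binary DMP format.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ORDER 3

/* Parse the "\data\" header of an ARPA text LM held in memory. Fills
 * counts[0..MAX_ORDER-1] with the number of 1-, 2- and 3-grams and returns
 * the highest order found, or -1 if there is no "\data\" section (for
 * example, because the input is a binary file instead). */
int arpa_read_counts(const char *text, long counts[MAX_ORDER])
{
    for (int i = 0; i < MAX_ORDER; i++)
        counts[i] = 0;

    const char *p = strstr(text, "\\data\\");
    if (p == NULL)
        return -1;
    p = strchr(p, '\n'); /* end of the "\data\" line */

    int order = 0;
    while (p != NULL && *p != '\0') {
        p++; /* step past '\n' to the start of the next line */
        int n;
        long c;
        if (sscanf(p, "ngram %d=%ld", &n, &c) == 2) {
            if (n >= 1 && n <= MAX_ORDER) {
                counts[n - 1] = c;
                if (n > order)
                    order = n;
            }
        } else if (*p != '\n' && *p != '\0') {
            break; /* first non-blank, non-"ngram" line ends the header */
        }
        p = strchr(p, '\n');
    }
    return order;
}
```

A converter could treat a `-1` return as a cue to try a binary reader instead. The counts let it size its tables before reading the n-gram sections.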

S3.0 family of tools
  • The architecture drives many changes in the code
    • Align, allphone and decode_anytopo now use
      • kbcore
      • The same multi-stream GMM computation routine
      • A simplified search structure
      • The ctl_process mechanism
    • The next step is to use the srch.c interface.
    • All tools now share
      • Sets of common command-line macros
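
Sharing option definitions through macros can be sketched as follows. The descriptor type `arg_desc_t`, the macro names and `find_arg` are hypothetical. The option names and descriptions are illustrative only, and the real Sphinx 3 tables in cmd_ln_macro.c are not reproduced.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical option descriptor. */
typedef struct {
    const char *name;
    const char *deflt;
    const char *doc;
} arg_desc_t;

/* One macro per option group, expanded into each tool's table so the
 * shared options are declared in exactly one place. */
#define COMMON_AM_ARGS \
    { "-mdef", NULL, "Model definition file" }, \
    { "-mean", NULL, "Gaussian means file" }, \
    { "-var",  NULL, "Gaussian variances file" }

#define COMMON_DICT_ARGS \
    { "-dict",  NULL, "Main pronunciation dictionary" }, \
    { "-fdict", NULL, "Filler dictionary" }

/* Two tools built from the same shared groups plus their own options. */
static const arg_desc_t align_args[] = {
    COMMON_AM_ARGS,
    COMMON_DICT_ARGS,
    { "-insent", NULL, "Transcription file to align against" },
    { NULL, NULL, NULL } /* terminator */
};

static const arg_desc_t allphone_args[] = {
    COMMON_AM_ARGS,
    { NULL, NULL, NULL }
};

/* Look up an option by name in a NULL-terminated table. */
const arg_desc_t *find_arg(const arg_desc_t *tbl, const char *name)
{
    for (; tbl->name != NULL; tbl++)
        if (strcmp(tbl->name, name) == 0)
            return tbl;
    return NULL;
}
```

With this layout, adding or re-documenting an acoustic-model option means editing one macro rather than every tool's table.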

Code Merging
  • Sphinx 3.0, Sphinx 3.X and share are now unified.
  • Alex: “It’s time to fix the training algorithms!”
  • Ravi: “It’s time to add full n-gram and full n-phones to the recognizer!!”
  • Dave: “It’s time to work on pronunciation modeling!”
  • Yitao: “It’s time to implement a CFG-based search!!”
  • Evandro: “It’s time to do more regression test!”
  • Alan: “Don’t merge Sphinx with festival!!”
  • Next step:
    • It’s time to clean up SphinxTrain.
    • We will keep the pace below 4 tool check-ins per month.

Hieroglyphs
  • Half of Chapter 3 and half of Chapter 5 are finished
    • Chapter 3: “Introduction to Speech Recognition”
      • Missing: description of DTW, HMM and LM
    • Chapter 5: “Roadmap of building speech recognition system”
      • Missing
        • How to evaluate the system?
        • How to train a system? (Evandro’s tutorial will be perfect)
  • Still ~4 chapters (out of 12) of material to go before the 1st draft is written

Conclusion
  • We have done something.
  • Embedded Sphinx 2
    • Its completion will benefit both Sphinx 2 and Sphinx 3
  • Sphinx 3.6
    • Its completion will benefit
      • Long-term development
      • Short-term needs in funded projects
  • Tentative deadline: beginning of October