
CALO Decoder Progress Report for March

Arthur (Decoder and ICSI Training), Jahanzeb (Decoder), Ziad (ICSI Training), Moss (ICSI Training). Carnegie Mellon University, Apr 13, 2004.


Presentation Transcript


  1. CALO Decoder Progress Report for March
     Arthur (Decoder and ICSI Training), Jahanzeb (Decoder), Ziad (ICSI Training), Moss (ICSI Training)
     Carnegie Mellon University, Apr 13, 2004

  2. This Presentation
     • Progress report for March
     • In February
       • Batch-mode recognizer completed
       • Live-mode recognizer did not work
     • In March
       • More decoder work: speed, accuracy, interface
       • ICSI transcription conversion task: resources, conversion scripts
       • Miscellaneous efforts to improve the decoder: contact with other groups, web page(s), manual

  3. Decoder work (Speed)
     • By Arthur and Jahanzeb
     • Sphinx 3.4 now runs reasonably on the Communicator task
       • 1G: 1.1xRT, 2G: 0.48xRT
     • Phoneme look-ahead research completed
       • 15-20% gain when CIGMMS is applied
       • Will be incorporated as a decoder feature
     • Outlook for April
       • Machine optimization (still there!)
       • WSJ evaluation
       • Publish the results as a technical report
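The slide does not describe how phoneme look-ahead works. One common idea, sketched here under our own assumptions, is to score the next few frames with cheap context-independent (CI) phone GMMs and let the search enter only those phones whose look-ahead score falls within a beam of the best one. Reading CIGMMS as "CI-based GMM selection" is also an assumption. Summing per-frame CI log-likelihoods without any alignment is a crude stand-in for a real look-ahead over the phone HMMs; the function names and beam value are hypothetical, and this is not the Sphinx 3.4 implementation.

```python
from typing import Dict, Iterable, List, Set, Tuple


def active_phones(ci_scores: List[Dict[str, float]], beam: float) -> Set[str]:
    """Return the CI phones worth expanding.

    ci_scores[t][phone] is the log-likelihood of look-ahead frame t under the
    CI GMM for `phone` (hypothetical input layout).
    """
    totals: Dict[str, float] = {}
    for frame in ci_scores:
        for phone, score in frame.items():
            totals[phone] = totals.get(phone, 0.0) + score
    best = max(totals.values())
    # Keep every phone whose summed look-ahead score is within `beam` of the best.
    return {phone for phone, total in totals.items() if total >= best - beam}


def prune_successors(
    successors: Iterable[Tuple[str, str]], active: Set[str]
) -> List[Tuple[str, str]]:
    """Keep only (node_id, first_phone) successors whose first phone survived look-ahead."""
    return [(node, phone) for node, phone in successors if phone in active]
```

The benefit comes from skipping the full triphone GMM evaluation for successors whose base phone is already unlikely given the upcoming frames.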

  4. Decoder work (Accuracy)
     • First comparison between S2 and S3.4
       • S3.0 ~ S2 > S3.3 > S3.4
     • Not the fairest comparison
       • The S3 model was trained on female speakers only
       • The S3 model is less tuned
     • Outlook for April
       • Learn how to do training and make a fairer comparison
       • Change the search structure
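The slide ranks the decoders by accuracy without naming the metric. Assuming it is word error rate (WER), the usual measure for this kind of comparison, the sketch below computes it as the word-level edit distance between the reference and the hypothesis, divided by the reference length. The example sentence in the test is made up.

```python
from typing import List


def word_error_rate(ref: List[str], hyp: List[str]) -> float:
    """(substitutions + deletions + insertions) / len(ref), via Levenshtein DP."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

A fair comparison would score every decoder on the same test set with the same reference transcripts, so only the decoder and model differ.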

  5. Decoder work (Interface)
     • Live-mode decoder now works
     • The live-mode recognizer interface is still poorer than S2's
       • No config file yet
       • Many users complained (well, actually 2-3 of them)
     • Outlook for April
       • Focus on building a better API and command-line interface
       • Jahanzeb will cover this while Arthur works on training
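Since the live-mode recognizer has no config file yet, one simple option is a text file with one "-flag value" pair per line, merged with the command line so that command-line flags win. The sketch below assumes that format; the file layout, the function names and the flags in the test are illustrative, not an existing Sphinx feature.

```python
import shlex
from typing import Dict, List


def load_config(text: str) -> Dict[str, str]:
    """Parse '-flag value' lines; '#' starts a comment, blank lines are skipped."""
    flags: Dict[str, str] = {}
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)  # handles quotes and comments
        if not tokens:
            continue
        if len(tokens) != 2 or not tokens[0].startswith("-"):
            raise ValueError(f"expected '-flag value', got: {line!r}")
        flags[tokens[0]] = tokens[1]
    return flags


def to_argv(config: Dict[str, str], overrides: Dict[str, str]) -> List[str]:
    """Merge config defaults with command-line overrides into an argv-style list."""
    merged = {**config, **overrides}  # overrides replace config values
    argv: List[str] = []
    for flag, value in merged.items():
        argv += [flag, value]
    return argv
```

Treating the file as defaults and the command line as overrides lets users keep one shared config while still tweaking single runs.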

  6. ICSI Training
     • Transcription conversion task
       • By Moss, Ziad and Arthur
     • Completion of resources
       • <VocalSound> mapping (100%)
       • <NonVocalSound> mapping (100%)
       • OOV (~20%)
       • Conversion script (90%)
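Tracking the OOV item amounts to listing transcript words that are missing from the pronunciation dictionary. The sketch below assumes whitespace-tokenized transcripts, an upper-case dictionary and filler tokens written as ++NAME++. Sorting by frequency, so that the most common OOVs get pronunciations first, is our suggestion rather than something the slide states.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def find_oovs(transcripts: Iterable[str], dictionary: Set[str]) -> List[Tuple[str, int]]:
    """Return (word, count) pairs for out-of-vocabulary words, most frequent first."""
    counts: Counter = Counter()
    for line in transcripts:
        for word in line.split():
            word = word.upper()  # assumption: dictionary entries are upper case
            if word.startswith("++") and word.endswith("++"):
                continue  # filler token, handled by the filler dictionary
            if word not in dictionary:
                counts[word] += 1
    return counts.most_common()
```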

  7. ICSI Transcription: What does it look like?
     <Segment StartTime="41.311" EndTime="43.773" Participant="me013" DigitTask="true">
       three six two four three zero seven <Comment Description="Digits"/>
     </Segment>
     <Segment StartTime="0.931" EndTime="3.611" Participant="me034">
       <VocalSound Description="whistling"/>
     </Segment>
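Segments like these can be read with Python's standard xml.etree.ElementTree module. The sketch assumes the <Segment> elements sit under a single root element (named <Transcript> in the test purely for illustration) and keeps each segment's element so that tag-level rules can be applied afterwards. The Segment class and read_segments function are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float           # StartTime attribute, in seconds
    end: float             # EndTime attribute, in seconds
    participant: str       # speaker/channel id, e.g. "me013"
    element: ET.Element    # raw element, kept for later tag conversion


def read_segments(xml_text: str) -> List[Segment]:
    """Collect every <Segment> element in the document, in document order."""
    root = ET.fromstring(xml_text)
    return [
        Segment(float(s.get("StartTime")), float(s.get("EndTime")), s.get("Participant"), s)
        for s in root.iter("Segment")
    ]
```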

  8. XML tags conversion
     • The transcription is more detailed than necessary
     • Current treatment:
       • <Comment>: ignore the whole sentence (too many occurrences, too many varieties)
       • <Emphasis>: ignore
       • <Pronounce>: replace by ++GARBAGE++
       • <Foreign>: ignore the whole sentence (too few occurrences to be worth handling)
       • <Uncertain>: replace by ++GARBAGE++
       • <VocalSound> and <NonVocalSound>: use the mapping
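The treatment table maps naturally onto a small per-segment function. The sketch below assumes that the tags appear as direct children of <Segment>, that ignoring <Emphasis> means dropping the tag but keeping its words, that any other unknown tag is dropped together with its content, and that an unmapped sound falls back to ++GARBAGE++. The sound_map entries in the test are hypothetical, not the project's actual mapping.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

DROP_SEGMENT = ("Comment", "Foreign")          # ignore the whole sentence
GARBAGE_TAGS = {"Pronounce", "Uncertain"}      # replace by ++GARBAGE++
SOUND_TAGS = {"VocalSound", "NonVocalSound"}   # use the sound mapping


def convert_segment(seg: ET.Element, sound_map: Dict[str, str]) -> Optional[str]:
    """Return the converted word string, or None if the segment is dropped."""
    if any(seg.find(".//" + tag) is not None for tag in DROP_SEGMENT):
        return None
    words: List[str] = []

    def add_text(text: Optional[str]) -> None:
        if text:
            words.extend(text.split())

    add_text(seg.text)  # words before the first child tag
    for child in seg:
        if child.tag in GARBAGE_TAGS:
            words.append("++GARBAGE++")
        elif child.tag in SOUND_TAGS:
            # Assumption: unmapped sounds fall back to ++GARBAGE++.
            words.append(sound_map.get(child.get("Description", ""), "++GARBAGE++"))
        elif child.tag == "Emphasis":
            add_text("".join(child.itertext()))  # drop the tag, keep its words
        # Any other tag is dropped together with its content (assumption).
        add_text(child.tail)  # words that follow the child tag
    return " ".join(words)
```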

  9. Plain-text Normalization
     • After XML conversion: “I – I am no- , I mean C-zero”
     • ‘-’ can mean:
       • “-”: interruption/interjection mark
       • “-XXX” or “XXX-”: broken word
       • “XXX-XXX”: hyphenated word
     • AM transcription
       • Remove all punctuation and leave broken words alone
     • LM transcription
       • Interruption marks and broken words will be removed
       • (Optional) Leave the interruption marks in
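The hyphen rules can be expressed as a token classifier plus two output modes. This sketch treats a standalone en dash like a standalone "-" (the example sentence appears to use one), strips a hypothetical set of punctuation characters, and exposes the optional LM behaviour as a keep_interruptions flag; all names are illustrative.

```python
import re
from typing import List

PUNCTUATION = re.compile(r"[,.?!;:\"]")    # assumed punctuation set
INTERRUPTION_MARKS = {"-", "\u2013"}       # standalone hyphen or en dash


def classify(token: str) -> str:
    if token in INTERRUPTION_MARKS:
        return "interruption"
    if token.startswith("-") or token.endswith("-"):
        return "broken"        # e.g. "no-"
    if "-" in token:
        return "hyphenated"    # e.g. "C-zero"
    return "word"


def normalize(text: str, mode: str, keep_interruptions: bool = False) -> str:
    """mode is 'am' or 'lm'; keep_interruptions only affects LM output."""
    out: List[str] = []
    for raw in text.split():
        token = PUNCTUATION.sub("", raw)
        if not token:
            continue  # token was pure punctuation
        kind = classify(token)
        if kind == "interruption":
            if mode == "lm" and keep_interruptions:
                out.append(token)
            continue  # otherwise dropped in both modes
        if kind == "broken" and mode == "lm":
            continue  # LM drops broken words; AM leaves them alone
        out.append(token)
    return " ".join(out)
```

Hyphenated words such as C-zero pass through unchanged in both modes.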

  10. XML conversion script
     • Functionalities
       • Optional conversion
       • Resource (dict/mapping/rules) read-in
       • XML parser
       • Generates both the transcription and the control file for close-talking microphones
       • Generates both LM and AM transcriptions
     • TODO:
       • Incorporate Ziad’s script
       • Correct timing information
       • Generate far-field channels
       • Fix small bugs
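Generating the close-talking control file and transcription could look like the sketch below. It assumes Sphinx-style control lines of the form "file start-frame end-frame utterance-id" at 100 frames per second, SphinxTrain-style transcript lines "<s> words </s> (utterance-id)", one audio file per participant channel stored as meeting/participant, and a made-up utterance-id scheme. The meeting name in the test is hypothetical. Segments with no words left after conversion are skipped.

```python
from typing import List, Tuple

FRAMES_PER_SECOND = 100  # assumption: 10 ms frame shift


def make_training_lists(
    meeting: str, segments: List[Tuple[float, float, str, str]]
) -> Tuple[List[str], List[str]]:
    """segments holds (start_sec, end_sec, participant, converted_words) tuples."""
    ctl: List[str] = []
    trans: List[str] = []
    for start, end, participant, words in segments:
        if not words:
            continue  # nothing left after XML conversion and normalization
        start_frame = int(round(start * FRAMES_PER_SECOND))
        end_frame = int(round(end * FRAMES_PER_SECOND))
        # Hypothetical utterance id: meeting, participant, zero-padded start frame.
        uttid = f"{meeting}_{participant}_{start_frame:07d}"
        ctl.append(f"{meeting}/{participant} {start_frame} {end_frame} {uttid}")
        trans.append(f"<s> {words} </s> ({uttid})")
    return ctl, trans
```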

  11. Outlook of ICSI training task for April
     • Complete OOV transcription (Arthur, Moss and Ziad)
     • Fix bugs in the conversion script (Arthur)
     • Learn AM training (Ziad and Arthur)
     • LM training (Moss)
     • Fix potential problems in SphinxTrain

  12. Miscellaneous (Contact with other groups)
     • Want to find a better interface for Sphinx
     • Tried contacting other groups to see what they are doing
     • XVoice-sphinx
       • A “command-and-control” application that tried to use Sphinx
       • Actually, it does dictation
       • Not very happy with Sphinx after trying Sphinx’s default AM and LM for command-and-control
     • OSSRI
       • No clear goal yet
       • Starting to gather funding
       • Don’t really like Sphinx because “Sphinx is poorer than ViaVoice in C&C”

  13. We need to help them more...
     • We need better...
       • Release (to replace S3.3)
         • After the WSJ evaluation, S3.4 will be officially released to replace the current S3.3
       • Sphinx web page (also the CMU web page)
         • Sphinx’s web page needs a more unified theme
         • A task force will be formed after ICSLP 2004
       • Manual
         • Need to provide basic education to developers and “hard-core” hackers
         • Wrote the first outline of the manual
         • The first draft will appear within a quarter

  14. Summary
     • Still need to build a good model for ICSI first (Arthur/Ziad/Moss)
       • Training is also critical to understanding why S2 > S3.3
     • Improve everything about the decoder
       • Arthur/Jahanzeb effort split: 50/50
     • Others: always on my “priority queue”; they will pop up at the right time
