
Speech-to-Speech MT Design and Engineering




  1. Speech-to-Speech MT Design and Engineering Alon Lavie and Lori Levin MT Class, April 16, 2001

  2. Outline • Design and Engineering of the JANUS speech-to-speech MT system • The Travel & Medical Domain Interlingua (IF) • Portability to new domains: ML approaches • Evaluation and User Studies • Open Problems, Current and Future Research

  3. Overview • Fundamentals of our approach • System overview • Engineering a multi-domain system • Evaluations and user studies • Alternative translation approaches • Current and future research

  4. JANUS Speech Translation • Translation via an interlingua representation • Main translation engine is rule-based • Semantic grammars • Modular grammar design • System engineered for multiple domains • Recent focus on domain portability • using machine learning for rapid extension to a new domain

  5. The C-STAR Travel Planning Domain General Scenario: • Dialogue between one traveler and one or more travel agents • Focus on making travel arrangements for a personal leisure trip (not business) • Free spontaneous speech

  6. The C-STAR Travel Planning Domain Natural breakdown into several sub-domains: • Hotel Information and Reservation • Transportation Information and Reservation • Information about Sights and Events • General Travel Information • Cross Domain

  7. Semantic Grammars • Describe structure of semantic concepts instead of syntactic constituency of phrases • Well suited for task-oriented dialogue containing many fixed expressions • Appropriate for spoken language - often disfluent and syntactically ill-formed • Faster to develop reasonable coverage for limited domains

  8. Semantic Grammars
  Hotel Reservation Example:
  Input: we have two hotels available
  Parse Tree:
  [give-information+availability+hotel]
    ( we have
      [hotel-type] ( [quantity=] ( two ) [hotel] ( hotels ) )
      available )
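The idea of a semantic grammar, mapping phrases directly to domain concepts rather than syntactic constituents, can be sketched as follows. This is a minimal illustration, not the JANUS grammar formalism: the rule table, concept names, and trigger word are invented for the example above.

```python
# Toy semantic-grammar concept spotter (illustrative only; not JANUS/SOUP).
import re

SEMANTIC_RULES = [
    # (concept, pattern) -- each pattern matches a domain concept directly
    ("quantity", r"\b(one|two|three|four|five)\b"),
    ("hotel", r"\bhotels?\b"),
]

def parse_concepts(utterance):
    """Return concepts found in the input, wrapped in a top-level
    domain action when the availability cue word is present."""
    found = []
    for concept, pattern in SEMANTIC_RULES:
        for m in re.finditer(pattern, utterance.lower()):
            found.append((concept, m.group(0), m.start()))
    found.sort(key=lambda t: t[2])           # left-to-right surface order
    pairs = [(c, s) for c, s, _ in found]
    if pairs and "available" in utterance.lower():
        # assumed trigger: "available" signals the availability domain action
        return [("give-information+availability+hotel", pairs)]
    return pairs

print(parse_concepts("we have two hotels available"))
```

Note how nothing here depends on syntactic well-formedness: disfluent input like "uh we have two, two hotels available" would still yield the same concepts, which is the point made on the slide about spoken language.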

  9. The JANUS-III Translation System

  10. The JANUS-III Translation System

  11. The SOUP Parser • Specifically designed to parse spoken language using domain-specific semantic grammars • Robust - can skip over disfluencies in input • Stochastic - probabilistic CFG encoded as a collection of RTNs with arc probabilities • Top-Down - parses from top-level concepts of the grammar down to matching of terminals • Chart-based - dynamic matrix of parse DAGs indexed by start and end positions and head category

  12. The SOUP Parser • Supports parsing with large multiple domain grammars • Produces a lattice of parse analyses headed by top-level concepts • Disambiguation heuristics rank the analyses in the parse lattice and select a single best path through the lattice • Graphical grammar editor

  13. SOUP Disambiguation Heuristics • Maximize coverage (of input) • Minimize number of parse trees (fragmentation) • Minimize number of parse tree nodes • Minimize the number of wild-card matches • Maximize the probability of parse trees • Find sequence of domain tags with maximal probability given the input words: P(T|W), where T= t1,t2,…,tn is a sequence of domain tags
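The heuristics on this slide amount to a lexicographic ranking over candidate analyses. The sketch below shows one way to express that ranking; the `Analysis` fields and the strict priority order are assumptions for illustration, not the actual SOUP implementation (which may weight or combine the criteria differently).

```python
# Sketch: lexicographic ranking in the spirit of the SOUP heuristics.
from dataclasses import dataclass

@dataclass
class Analysis:
    words_covered: int    # input words covered by the parse trees
    num_trees: int        # fragmentation: number of parse trees
    num_nodes: int        # total parse-tree node count
    wildcards: int        # number of wild-card matches used
    log_prob: float       # summed log probability of the trees

def rank_key(a: Analysis):
    # Maximize coverage and probability (negated), minimize the rest;
    # tuple comparison applies the criteria in priority order.
    return (-a.words_covered, a.num_trees, a.num_nodes,
            a.wildcards, -a.log_prob)

def best_analysis(candidates):
    return min(candidates, key=rank_key)

cands = [Analysis(5, 2, 12, 1, -4.0), Analysis(5, 1, 10, 0, -4.5)]
print(best_analysis(cands))   # the single-tree, wildcard-free analysis wins
```

With equal coverage, the less fragmented analysis is preferred even though its probability is slightly lower, matching the ordering of the heuristics above.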

  14. JANUS Generation Modules Two alternative generation modules: • Top-Down context-free based generator - fast, used for English and Japanese • GenKit - unification-based generator augmented with Morphe morphology module - used for German

  15. Modular Grammar Design • Grammar development separated into modules corresponding to sub-domains (Hotel, Transportation, Sights, General Travel, Cross Domain) • Shared core grammar for lower-level concepts that are common to the various sub-domains (e.g. times, prices) • Grammars can be developed independently (using shared core grammar) • Shared and Cross-Domain grammars significantly reduce effort in expanding to new domains • Separate grammar modules facilitate associating parses with domain tags - useful for multi-domain integration within the parser

  16. Translation with Multiple Domain Grammars • Parser is loaded with all domain grammars • Domain tag attached to grammar rules of each domain • Previously developed grammars for other domains can also be incorporated • Parser creates a parse lattice consisting of multiple analyses of the input into sequences of top-level domain concepts • Parser disambiguation heuristics rank the analyses in the parse lattice and select a single best sequence of concepts

  17. Translation with Multiple Domain Grammars

  18. A SOUP Parse Lattice

  19. Domain Portability: Travel to Medical • Knowledge-Based Methods: re-usability of knowledge sources for translation and speech recognition • Corpus-Based Methods: reduce the amount of new training data for translation and speech recognition

  20. Background • New domain: Medical • Doctor-patient diagnostic conversations • Global importance in emergencies and in machine translation for remote health care • Synergy with Lincoln Lab • Joint evaluation • Joint interlingua • Test case for portability

  21. Portability • Advantage: Interlingua • Problem: Writing semantic grammars • Domain dependent • Requires time, effort, and expertise • Approach: • Grammar modularity • Domain action learning • Automatic/Interactive semantic grammar induction

  22. Hybrid Stat/Rule-based Analysis • Developing large-coverage semantic analysis grammars is time consuming, which makes it difficult to port the analysis system to new domains • “Low-level” argument grammars are more domain-independent: they contain many concepts that are used across domains (time, location, prices, etc.) • “High-level” domain-actions are domain-specific and must be redeveloped for each new domain: give-info+onset+symptom • Tagging data sets with interlingua representations is less time consuming, and is needed anyway for system development

  23. Hybrid Rule/Stat Approach • Combines grammar-based and statistical approaches to analysis: • Develop semantic grammars for phrase-level arguments that are more portable to new domains • Use statistical machine learning techniques for classifying into domain-actions • Porting to a new domain requires: • developing argument parse rules for new domain • tagging training set with domain-actions for new domain • training the classifiers for domain-actions on the tagged data

  24. The Hybrid Analysis Process • Parse an utterance for arguments • Segment the utterance into sentences • Extract features from the utterance and the single best parse output • Use a learned classifier to identify the speech act • Use a learned classifier to identify the concept sequence • Combine into a full parse
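The six-step hybrid analysis process above can be sketched end to end. Everything below is a stand-in: the regex "argument grammar", the feature names, and the rule-based "classifiers" substitute for the SOUP argument parser and the trained TiMBL models, and exist only to make the data flow concrete.

```python
# Illustrative sketch of the hybrid analysis pipeline (toy stand-ins).
import re

def argument_parse(utterance):
    """Step 1: phrase-level argument parsing (toy rules, not SOUP)."""
    args = []
    if re.search(r"\bdouble room\b", utterance):
        args.append("room-type=double")
    if re.search(r"\byen\b", utterance):
        args.append("super_price")
    return args

def extract_features(utterance, args, speaker):
    """Step 3: features from the utterance and the best argument parse."""
    return {"words": set(utterance.lower().split()),
            "args": set(args), "speaker": speaker}

def classify_speech_act(feats):
    """Step 4: stand-in for the learned speech-act classifier."""
    return "give-information" if "have" in feats["words"] else "request-information"

def classify_concepts(feats, speech_act):
    """Step 5: stand-in for the learned concept-sequence classifier;
    note it may condition on the predicted speech act."""
    if "room-type=double" in feats["args"]:
        return "availability+room"
    return "availability"

def analyze(utterance, speaker="agent"):
    args = argument_parse(utterance)
    feats = extract_features(utterance, args, speaker)
    sa = classify_speech_act(feats)
    concepts = classify_concepts(feats, sa)
    return f"{sa}+{concepts}", args      # step 6: combined full parse

print(analyze("we have a double room available for you"))
# ('give-information+availability+room', ['room-type=double'])
```

Porting to a new domain then means rewriting `argument_parse` rules and retraining the two classifiers on newly tagged data, while the pipeline itself is unchanged.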

  25. Argument Parsing • The SOUP parser produces a forest of parse trees that cover as much of the input as possible • The parse forest can be a mixture of trees allowed by any of the grammars • Only the best parse is used for further processing

  26. Argument Parse Example
  We have a double room available for you at twenty-three thousand five hundred yen
  [=availability=]::PSD ( we have [super_room-type=] ( [room-type=] ( a [room:double] ( double room ) ) ) available )
  [arg-party:for-whom=]::ARG ( for [you] ( you ) )
  [arg:time=]::ARG ( [point=] ( at [hour-minute=] ( [big:hour=] ( [big:23] ( twenty-three ) ) ) ) )
  [arg:super_price=]::ARG ( [price=] ( [one-price:main-quantity=] ( [n-1000=] ( thousand ) [price:n-100=] ( five hundred ) ) [currency=] ( [yen] ( yen ) ) ) )

  27. Automatic Classification of Domain Actions • Train classifiers for speech acts and concepts • Training data: Utterances labeled with speech act, concepts, and best argument parse • Input features • n most common words • Arguments and pseudo-arguments in best parse • Speaker • Predicted speech act (for concept classifier)

  28. Full Parse Example
  We have a double room available for you at twenty-three thousand five hundred yen
  give-information+availability+room (
    [=availability=]::PSD ( we have [super_room-type=] ( [room-type=] ( a [room:double] ( double room ) ) ) available )
    [arg-party:for-whom=]::ARG ( for [you] ( you ) )
    [arg:time=]::ARG ( [point=] ( at [hour-minute=] ( [big:hour=] ( [big:23] ( twenty-three ) ) ) ) )
    [arg:super_price=]::ARG ( [price=] ( [one-price:main-quantity=] ( [n-1000=] ( thousand ) [price:n-100=] ( five hundred ) ) [currency=] ( [yen] ( yen ) ) ) ) )

  29. Classification Results Using Memory-based (TiMBL) Classifiers

  30. Status and Open Research • Preliminary analysis engine implemented, currently used for the travel domain in NESPOLE! • Areas for further research and development: • Explore a variety of classifiers • Explore features for domain-action classification • Classification compositionality – how to classify the components of the domain-action separately and combine them? • Taking advantage of additional knowledge sources: the interlingua specification, dialogue context • Better address segmentation of utterances into DAs

  31. Automatic Induction of Semantic Grammars • Seed grammar for a new domain has very limited coverage • Corpus of development data tagged with interlingua representations available • Expand the seed grammar by learning new rules for covering the same domain-actions • First step: how well can we do with no human intervention?

  32. Outline of Semantic Grammar Induction
  • Pipeline components: IF Parser, Tree Matching, Linearization, Hypotheses Generation, Rules Management, Rules Induction
  • Seed Grammar example rule: s[gi+onset+sym] ( [manner=] [sym-loc=] *+became [adj:sym-name=] )
  • Resources: Seed Grammar, Learned Grammar, Knowledge

  33. Human vs Machine Experiment • Seed grammar • Extended by a human • Extended by automatic semantic grammar induction

  34. Seed Grammar
  • Medical (around 200 rules): e.g., "I have a burning sensation in my foot."
  • Cross Domain (around 600 rules and growing): e.g., "Hello. My name is Sam."
  • Shared: around 100 rules and 6000 lexical items

  35. A Parse Tree
  [request-information+existence+body-state]::MED
    ( WH-PHRASES::XDM ( [q:duration=]::XDM ( [dur:question]::XDM ( how long ) ) )
      HAVE-GET-FEEL::MED ( GET ( have ) )
      you
      HAVE-GET-FEEL::MED ( HAS ( had ) )
      [super_body-state-spec=]::MED
        ( [body-state-spec=]::MED
          ( ID-WHOSE::MED ( [identifiability=] ( [id:non-distant] ( this ) ) )
            BODY-STATE::MED ( [pain]::MED ( pain ) ) ) ) )

  36. Manual Grammar Development • About five additional days of development after the seed grammar was finalized • Focusing on medical rules only • Domain-independent rules remain untouched

  37. Development and evaluation sets • Development set: 133 sentences • from one dialog • Evaluation set: 83 sentences • from two dialogs • unseen speakers • Only SDUs that could be manually tagged with a full IF according to the current specification were included.

  38. Grading Procedure: Recall and Precision of IF Components
  Example IF, annotated by component type:
    c:give-information+          speech act
    existence+body-state         concepts
    (body-state-spec=(pain,      top-level argument
     identifiability=no),        sub-argument
     body-location=              top-level argument
     (inside=head))              sub-argument
  • Recall: ignored if number of items is 0
  • Precision: ignored if 0 out of 0
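The grading convention on this slide, skipping recall when the reference has no items and precision when the hypothesis has none, can be made precise with a short sketch. The set-based matching is an assumption; the actual grading compared structured IF components.

```python
# Sketch of component-level recall/precision with the "ignored if 0"
# convention: None marks a score that is excluded from averaging.
def component_scores(reference, hypothesis):
    ref, hyp = set(reference), set(hypothesis)
    matched = len(ref & hyp)
    recall = matched / len(ref) if ref else None      # ignored if 0 items
    precision = matched / len(hyp) if hyp else None   # ignored if 0 out of 0
    return recall, precision

# e.g. reference top-level arguments vs. a hypothesis that found one of two
ref_args = {"body-state-spec", "body-location"}
hyp_args = {"body-state-spec"}
print(component_scores(ref_args, hyp_args))   # (0.5, 1.0)
```

Returning `None` rather than 0 matters: a system that correctly produces no arguments for an argument-free SDU should not be penalized, which is exactly what the ignore rule achieves.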

  39. Human vs. Machine: Evaluation Results

                        Seed    Extended   Learned
  Speech Act
    Recall              43.3    48.2       49.3
    Precision           71.0    75.0       45.8
  Concept List
    Recall               2.2    10.1       32.5
    Precision           12.5    42.2       25.1
  Top-Level Args
    Recall               0.0     7.2       29.6
    Precision            0.0    42.2       34.4
  Top-Level Values
    Recall               0.0     8.3       29.8
    Precision            0.0    50.0       39.2
  Sub-Level Args
    Recall               0.0    28.3       14.1
    Precision            0.0    48.2       12.6
  Sub-Level Values
    Recall               1.2    28.3       14.1
    Precision            6.2    48.2       12.9

  40. User Studies • We conducted three sets of user tests • Travel agent played by an experienced system user • Traveler played by a novice given five minutes of instruction • Traveler is given a general scenario - e.g., plan a trip to Heidelberg • Communication only via the ST system, a multi-modal interface and a muted video connection • Collected data used for system evaluation, error analysis and subsequent grammar development

  41. System Evaluation Methodology • End-to-end evaluations conducted at the SDU (sentence) level • Multiple bilingual graders compare the input with translated output and assign a grade of: Perfect, OK or Bad • OK = meaning of SDU comes across • Perfect = OK + fluent output • Bad = translation incomplete or incorrect

  42. August-99 Evaluation • Data from latest user study - traveler planning a trip to Japan • 132 utterances containing one or more SDUs, from six different users • SR word error rate 14.7% • 40.2% of utterances contain recognition error(s)

  43. Evaluation Results

  44. Evaluation - Progress Over Time

  45. Current and Future Work • Expanding the interlingua: covering descriptive as well as task-oriented sentences • Developing the new portable approaches • Developing the server-based architecture for supporting multiple applications: • NESPOLE!: speech-MT for advanced e-commerce • C-STAR: speech-to-speech MT over mobile phones • LingWear: MT and language assistance on wearable devices

  46. Students Working on the Project • Chad Langley: Hybrid Rule/Stat Analysis, Speech MT architecture • Ben Han: Automatic Grammar Induction • Alicia Tribble: Interlingua and grammar development for Medical Domain • Joy Zhang, Erik Peterson: Chinese EBMT for LingWear

  47. The JANUS Speech-MT Team • Project Leaders: Lori Levin, Alon Lavie, Tanja Schultz, Alex Waibel • Grammar and Component Developers: Donna Gates, Dorcas Wallace, Kay Peterson, Alicia Tribble, Chad Langley, Ben Han, Celine Morel, Susie Burger, Vicky MacLaren, Kornel Laskowski, Erik Peterson
