
Coupling between ASR and MT in Speech-to-Speech Translation



  1. Coupling between ASR and MT in Speech-to-Speech Translation Arthur Chan Prepared for Advanced Machine Translation Seminar

  2. This Seminar (~35 pages) • Introduction (6 slides) • Ringger’s categorization of coupling between ASR and NLU (7 slides) • Interfaces in Loose Coupling • 1-best and N-best (5 slides) • Lattices/Confusion Network/Confidence Estimation (9 slides) • Results from the literature (4 slides) • Tight Coupling • Ney’s theory and 2 methods of implementation (4 slides) • (Sorry, no FST approaches will be discussed) • Bonus material at the back

  3. History of this presentation • V1: • Draft finished on Mar 1st • Tanja’s comments: • Direct modeling could be skipped. • We could focus on explaining why ASR generates its current outputs. • Issues in MT searching could be ignored.

  4. History of this presentation (cont.) • V2 – V4: • Followed Tanja’s comments and finished on Mar 19th. • Reviewers’ comments: • Too long (70 pages) • Ney’s search formulation is too difficult to follow • V5 – V6 • Significantly trimmed down the presentation • Moved a lot of material to the backup section. • V7 • Incorporated some comments from Alon, Stephan and the class.

  5. 4 papers on coupling in speech-to-speech translation H. Ney, “Speech translation: Coupling of recognition and translation,” in Proc. ICASSP, 1999. S. Saleem, S. C. Jou, S. Vogel, and T. Schultz, “Using word lattice information for a tighter coupling in speech translation systems,” in Proc. ICSLP, 2004. V. H. Quan et al., “Integrated N-best re-ranking for spoken language translation,” in Proc. Eurospeech, 2005. N. Bertoldi and M. Federico, “A new decoder for spoken language translation based on confusion networks,” in IEEE ASRU Workshop, 2005.

  6. A Conceptual Model of Speech-to-Speech Translation • [diagram] waveforms → Speech Recognizer → decoding result(s) → Machine Translator → translation → Speech Synthesizer → waveforms

  7. Motivation of Tight Coupling between ASR and MT • The 1-best of ASR could be wrong • MT could benefit from the wide range of supplementary information provided by ASR • N-best list • Lattice • Sentence-/word-based confidence scores • E.g. word posterior probability • Confusion network • Or consensus decoding (Mangu 1999) • MT quality may depend on the WER of ASR (?)

  8. Scope of this talk • [diagram] waveforms → Speech Recognizer → 1-best? N-best? Lattice? Confusion network? → Machine Translator → translation → Speech Synthesizer → waveforms • Loose coupling / tight coupling applies at the ASR/MT interface

  9. Topics Covered Today • The concept of Coupling • “Tightness” of coupling between ASR and Technology X. (Ringger 95) • Two questions: • What could ASR provide in loose coupling? • Discussion of interfaces between ASR and MT in loose coupling • What is the status of tight coupling? • Ney’s Formulation

  10. Topics not covered • Direct Modeling • Uses features from both ASR and MT • Sometimes referred to as “ASR and MT unification” • FST approaches • [V7: I have only read two papers and couldn’t do them justice.] • Implications of the MT search algorithms on the coupling • Generation of speech from text.

  11. The Concept of Coupling

  12. Classification of Coupling of ASR and Natural Language Understanding (NLU) • Proposed in Ringger 95, Harper 94 • 3 Dimensions of ASR/NLU • Complexity of the search algorithm • Simple N-gram? • Incrementality of the coupling • On-line? Left-to-right? • Tightness of the coupling • Tight? Loose? Semi-tight?

  13. Tightness of Coupling • [diagram] spectrum from tight through semi-tight to loose coupling

  14. Notes: • Semi-tight coupling could appear as • A feedback loop between ASR and Technology X for the whole utterance of speech • Or a feedback loop between ASR and Technology X for every frame. • The Ringger framework • A good way to understand how speech-based systems are developed

  15. Example 1: LM • Someone asserts that ASR has to be used with 13-grams. • In tight coupling, • A search is devised to find the word sequence with the best acoustic score + 13-gram likelihood • In loose coupling • A simple search is used to generate some outputs (N-best list, lattice, etc.), • The 13-gram is then used to rescore the outputs. • In semi-tight coupling • 1, A simple search is used to generate results • 2, The 13-gram is applied at word ends only (but the exact history is not stored)
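The loose-coupling recipe in the example above (a simple first pass generates hypotheses, a stronger LM rescores them) can be sketched as follows. This is a minimal illustration with hypothetical scores and a hypothetical `lm_logprob` callable standing in for the slide's "13-gram", not a real decoder:

```python
# Loose-coupling sketch: a first-pass search has already produced an
# N-best list with acoustic scores; a higher-order LM rescores each
# hypothesis, and the best combined score wins.

def rescore_nbest(nbest, lm_logprob, lm_weight=1.0):
    """nbest: list of (word_tuple, acoustic_log_score).
    lm_logprob: callable returning a log probability for a word tuple."""
    rescored = [(ac + lm_weight * lm_logprob(words), words)
                for words, ac in nbest]
    return max(rescored)[1]
```

A real system would also tune `lm_weight` on held-out data.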

  16. Example 2: Higher-order AM • Segmental models drop the assumption that observation probabilities are conditionally independent. • Someone asserts that a segmental model is better than just an HMM. • Tight coupling: direct search for the best word sequence using the segmental model. • Loose coupling: use the segmental model to rescore. • Semi-tight coupling: hybrid HMM-segmental model algorithm?

  17. Summary of Coupling between ASR and NLU

  18. Implication on ASR/MT coupling • Generalizes many systems • Loose coupling • Any system which uses 1-best, N-best, lattice, or other inputs for 1-way module communication • (Bertoldi 2005) • CMU system (Saleem 2004) • Tight coupling • (Ney 1999) • Semi-tight coupling • (Quan 2005)

  19. Interfaces in Loose Coupling: 1-best and N-best

  20. Perspectives • ASR outputs • 1-best results • N-best results • Lattice • Consensus network • Confidence scores • How does ASR generate these outputs? • Why are they generated? • What if there are multiple ASRs? • (and what if their results are combined?) • Note: we are talking about the state lattice now, not the word lattice.

  21. Origin of the 1-best • Decoding in HMM-based ASR = searching for the best path in a huge lattice of HMM states. • 1-best ASR result • The best path one can find by backtracking. • State lattice in ASR (next slide)
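The search described above can be made concrete with a minimal Viterbi sketch over a toy two-state HMM; all state names and probabilities below are hypothetical, chosen only to illustrate the backtracking that yields the 1-best path:

```python
# Minimal Viterbi decoding sketch over a toy HMM state lattice.
# The 1-best result is recovered by backtracking the stored pointers.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best_score, best_state_path) for an observation sequence."""
    # V[t][s] = best score of any path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]  # backtracking pointer table
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            score, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = score
            back[t][s] = prev
    # Backtrack from the best final state to recover the 1-best path
    score, last = max((V[-1][s], s) for s in states)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return score, path[::-1]
```

Real LVCSR decoders work in the log domain and prune aggressively; the memory cost of `back` is exactly the backtracking-pointer-table issue discussed on the next slide.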

  22. Note on 1-best in ASR • Most of the time, the 1-best is a word sequence, not a state sequence • Why? • In LVCSR, storing the backtracking pointer table for the state sequence takes a lot of memory (even nowadays) • [Compare this with the number of frames of scores one needs to store] • Usually a backtrack pointer stores • The previous word before the current word • Clever structures dynamically allocate the back-tracking pointer table.

  23. What is an N-best list? • Trace back not only the 1st-best path, but also the 2nd best, 3rd best, etc. • Pathways: • Directly from the search backtrack pointer table • Exact N-best algorithm (Chow 90) • Word-pair N-best algorithm (Chow 91) • A* search using the Viterbi score as heuristic (Chow 92) • Generate a lattice first, then generate the N-best from the lattice
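The last pathway above (lattice first, then N-best) can be sketched with a toy word lattice; the lattice contents and scores are hypothetical, and a real decoder would use A* or an exact N-best algorithm rather than full enumeration:

```python
# Sketch: extract an N-best list from a small word lattice.
# The lattice is a DAG: node -> list of (word, next_node, log_prob).
import heapq

def nbest_from_lattice(lattice, start, end, n):
    """Return up to n (log_score, word_sequence) pairs, best first."""
    results = []
    stack = [(0.0, start, [])]
    while stack:                       # depth-first path enumeration
        score, node, words = stack.pop()
        if node == end:
            results.append((score, words))
            continue
        for word, nxt, logp in lattice.get(node, []):
            stack.append((score + logp, nxt, words + [word]))
    return heapq.nlargest(n, results)  # keep the n highest-scoring paths
```

Full enumeration is exponential in general; it only serves here to show how a compact lattice expands into a ranked hypothesis list.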

  24. Interfaces in Loose Coupling: Lattice, Consensus Network and Confidence Estimation

  25. What is a Lattice? • A word-based lattice • A compact representation of the state lattice • Only word nodes (or links) are involved • Difference between N-best and lattice • A lattice can be a compact representation of an N-best list.

  26. How is a lattice generated? • From the decoding backtracking pointer table • Only record the links between word nodes. • From an N-best list • Becomes a compact representation of the N-best • [sometimes spurious links will be introduced] • Some complicated issues • Triphone contexts • Cause many complications • When the lattice is too large • You want to prune it.

  27. Conclusions on lattices • Lattice generation itself can be a complicated issue • Sometimes, what the post-processing stage (e.g. MT) receives is pre-filtered, pre-processed results.

  28. Confusion Network and Consensus Hypothesis • Confusion Network: • Or “Sausage Network”. • Or “Consensus Network”

  29. Special Properties • More “local” than lattice • One can apply simple criteria to find the best results • E.g. “consensus decoding” is to apply word-posterior probability on confusion network. • More tractable • In terms of size
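The "simple criteria" point above can be illustrated with a minimal consensus-decoding sketch; the confusion network contents below are hypothetical, and real similarity/alignment functions (which build the bins) are far more involved:

```python
# Consensus-decoding sketch over a toy confusion network.
# A confusion network is a sequence of "bins" (sausage segments); each bin
# maps candidate words (or "-" for an epsilon/deletion) to a posterior.
# Consensus decoding picks the highest-posterior entry in each bin.

def consensus_decode(confusion_network):
    hyp = []
    for bin_posteriors in confusion_network:
        word = max(bin_posteriors, key=bin_posteriors.get)
        if word != "-":            # drop epsilon (word-deletion) entries
            hyp.append(word)
    return hyp
```

The locality is what makes this tractable: each bin is decided independently, with no global path search.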

  30. Note on Consensus Network: • Note: • Time information might not be preserved in a confusion network • The similarity function directly affects the final output of the consensus network. • Other ways to generate a confusion network • From the N-best list • Using ROVER • A mixture of voting and combining word confidences

  31. Confidence Measure • Anything other than the likelihood that can tell whether the answer is useful • E.g. • Word posterior probability • P(W|A) • Usually computed using lattices • Language model backoff mode • Other posterior probabilities (frame, sentence)
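One way to make the word posterior concrete is to approximate it from an N-best list, as sketched below with hypothetical scores; note the slide says real systems usually compute posteriors on lattices (via forward-backward), so this is only a simplified stand-in:

```python
# Sketch: approximate word posterior probabilities from an N-best list.
# The posterior of a word at a position is the normalized total
# probability mass of all hypotheses containing that word there.
import math
from collections import defaultdict

def word_posteriors(nbest):
    """nbest: list of (log_score, word_sequence).
    Returns {(position, word): posterior}."""
    total = sum(math.exp(s) for s, _ in nbest)
    post = defaultdict(float)
    for log_score, words in nbest:
        p = math.exp(log_score) / total   # hypothesis posterior
        for pos, w in enumerate(words):
            post[(pos, w)] += p           # accumulate per word slot
    return dict(post)
```

A word shared by every hypothesis gets posterior 1.0; disputed words get less, which is what makes the posterior usable as a confidence score.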

  32. Interfaces in Loose Coupling: Results from the Literature

  33. General Note • Coupling in SST is still pretty new • Papers were chosen according to which ASR outputs they use • Other techniques such as direct modeling might be mixed into the papers.

  34. N-best list (Quan 2005) • Using N-best list for reranking • Interpolation weights of AM and TM are then optimized. • Summary: • Reranking gives improvements.
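The reranking setup in Quan 2005 can be sketched as a weighted interpolation of model scores; the hypothesis names, feature values, and helper below are hypothetical, a minimal illustration of the idea rather than the paper's actual system:

```python
# Sketch of N-best reranking with interpolated model scores.
# Each hypothesis carries acoustic-model and translation-model log scores;
# reranking picks the hypothesis maximizing their weighted sum. The
# weights would be optimized on held-out data.

def rerank(nbest, w_am, w_tm):
    """nbest: list of (hypothesis, am_log_score, tm_log_score)."""
    return max(nbest, key=lambda h: w_am * h[1] + w_tm * h[2])[0]
```

Shifting weight between the models changes which hypothesis wins, which is exactly why the interpolation weights need tuning.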

  35. Lattices: CMU results (Saleem 2004) • Summary of results • Lattice word error rate improves as lattice density increases • Lattice density and the weight on acoustic scores turn out to be important parameters to tune • Too large or too small can hurt.

  36. Consensus Network • Bertoldi 2005 is probably the only work on a confusion-network-based method • Summary of results: • When direct modeling is applied • the consensus network doesn’t beat the N-best method. • The authors argue for the speed and simplicity of the algorithm

  37. Confidence: Does it help? • According to Zhang 2006, yes. • Confidence measure (CM) filtering is used to filter out unnecessary results in the N-best list • Note: the approaches used are quite different.

  38. Conclusion on Loose Coupling • ASR can give a rich set of outputs. • It is still unknown what type of output should be used in the pipeline. • There is still a lack of comprehensive experimental studies on which method is best. • Usage of confusion networks and confidence estimation seems under-explored.

  39. Comments about Consensus Networks • From Stephan: • Reasons for not using consensus networks *now* • 1, the consensus network might occasionally give spurious links in each sausage segment. • 2, lattices from the ASR teams can change from time to time; MT teams need time to consume them. • From Alon, Ralf and Stephan: • There is no big reason not to use consensus networks, because essentially they are just another type of network.

  40. Tight Coupling : Theory and Practice

  41. Theory (Ney 1999) • Decision rule for translating acoustic input x into target sentence e, with source sentence f as hidden variable: • ê = argmax_e Pr(e|x) = argmax_e Pr(e) Pr(x|e) (Bayes’ rule) • = argmax_e Pr(e) Σ_f Pr(f|e) Pr(x|f,e) (introduce f as hidden variable) • = argmax_e Pr(e) Σ_f Pr(f|e) Pr(x|f) (assume x doesn’t depend on the target language given f) • ≈ argmax_e Pr(e) max_f Pr(f|e) Pr(x|f) (sum to max)

  42. Layman’s point of view • Three factors • Pr(e): target language model • Pr(f|e): translation model • Pr(x|f): acoustic model • Note: the assumption is made that only the best-matching f for each e is used.
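The three-factor score with the sum-to-max approximation can be sketched over toy probability tables; the table contents and the helper name below are hypothetical, and a real tight-coupling decoder searches over sequences rather than whole-sentence entries:

```python
# Sketch of Ney's tight-coupling criterion with the max approximation:
# score(e) = Pr(e) * max_f [ Pr(f|e) * Pr(x|f) ]
# lm: Pr(e); tm: Pr(f|e) as {e: {f: p}}; am: Pr(x|f) as {f: p}
# (the acoustic input x is fixed, so am is a table indexed by f only).

def best_target(targets, lm, tm, am):
    def score(e):
        # best-matching source sentence f for this target e
        return lm[e] * max(tm[e][f] * am[f] for f in tm[e])
    return max(targets, key=score)
```

The max over f is what replaces the sum in the exact criterion; it makes the search tractable but is itself an approximation.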

  43. Comparison with SR • In SR: • Pr(f) : Source language model • In Tight coupling • Pr(f|e), Pr(e) : Translation model and Target language model

  44. Algorithmic Point of View • Brute-force method: instead of incorporating the LM into the standard Viterbi algorithm • Incorporate P(e) and P(f|e) • => Very complicated • The backup slides of the presentation have details about Ney’s implementations.

  45. Experimental Results in Matusov, Kanthak and Ney 2005 • Summary of the results • Translation quality is only improved by tight coupling when the lattice density is not high. • As in Saleem 2004, incorporation of acoustic scores helps.

  46. Conclusion: Possible Issues of Tight Coupling • Possibilities: • In ASR, the source n-gram LM is very close to the best configuration. • The complexity of the algorithm is too high; approximation is still necessary to make it work. • When the tight-coupling criterion is used, it is possible that the LM and the TM need to be jointly estimated. • The current approaches still haven’t really implemented tight coupling • There might be bugs in the programs.

  47. Conclusion • Two major issues in the coupling of SST are discussed • In loose coupling: • Consensus networks and confidence scoring are still not fully utilized • In tight coupling: • The approach seems hampered by the very high complexity of search algorithm construction

  48. Discussion • Ian: It could be quite difficult to characterize the relationship between WER and BLEU. • Alan asks: Why not jointly optimize the translation model and the acoustic model? • Arthur: direct modeling could be useful • Stephan: (rephrased) will it really help?
