Robust Translation of Spontaneous Speech: A Multi-Engine Approach

Seventeenth International Joint Conference on Artificial Intelligence, IJCAI-01 Seattle Wednesday, 8 August 2001. Robust Translation of Spontaneous Speech: A Multi-Engine Approach. Wolfgang Wahlster. German Research Center for Artificial Intelligence DFKI GmbH www.dfki.de/~wahlster.

    Presentation Transcript
    1. Seventeenth International Joint Conference on Artificial Intelligence, IJCAI-01 Seattle Wednesday, 8 August 2001 Robust Translation of Spontaneous Speech: A Multi-Engine Approach Wolfgang Wahlster German Research Center for Artificial Intelligence DFKI GmbH www.dfki.de/~wahlster

    2. Mobile Speech-to-Speech Translation of Spontaneous Dialogs: As the name Verbmobil suggests, the system supports verbal communication with foreign dialog partners in mobile situations, both (1) in face-to-face conversations and (2) via telecommunication.

    3. Mobile Speech-to-Speech Translation of Spontaneous Dialogs Verbmobil Speech Translation Server Conference Call: The Verbmobil Speech Translation Server connects GSM cell phone users

    4. Robust Realtime Translation with Verbmobil: At a German airport, an American businessman calls the secretary of a German business partner.

    5. Outline
        - Verbmobil's Multi-Blackboard and Multi-Engine Architecture
        - Exploiting Underspecification in a Multi-Stratal Semantic Representation Language
        - Combining Deep and Shallow Processing Strategies for Robust Dialog Translation
        - Evaluation and Technology Transfer
        - Lessons Learned and Conclusions

    6. Telephone-based Dialog Translation: a German dialog partner and an American dialog partner, with the Verbmobil server cluster in between.
        - ISDN conference call (3 participants): German speaker, Verbmobil, American speaker
        - Speech-based set-up of the conference call
        - Hardware: Bianca/Brick XS BinTec ISDN-LAN router, LINUX server, Sun Server 450, Sun ULTRA 60/80

    7. Verbmobil: The First Speech-Only Dialog Translation System American Speaker: “Verbmobil” (Voice Dialing) Mobile GSM Phone Mobile DECT Phone

    8. Verbmobil: The First Speech-Only Dialog Translation System American Speaker: “Verbmobil” (Voice Dialing) Connect to the Verbmobil Speech-to-Speech Translation Server +49 631 3111911 Mobile GSM Phone Mobile DECT Phone

    9. Verbmobil: The First Speech-Only Dialog Translation System American Speaker: “Verbmobil” (Voice Dialing) Connect to the Verbmobil Speech-to-Speech Translation Server +49 631 3111911 Mobile GSM Phone Mobile DECT Phone Verbmobil: “Welcome to the Verbmobil Translation System. Please speak the telephone number of your partner.”

    10. Verbmobil: The First Speech-Only Dialog Translation System American Speaker: “Verbmobil” (Voice Dialing) Connect to the Verbmobil Speech-to-Speech Translation Server +49 631 3111911 Mobile GSM Phone Mobile DECT Phone Verbmobil: “Welcome to the Verbmobil Translation System. Please speak the telephone number of your partner.” American Speaker: “0177555”

    11. Verbmobil: The First Speech-Only Dialog Translation System (Mobile GSM Phone, Mobile DECT Phone)
        American Speaker: “Verbmobil” (Voice Dialing)
        Connect to the Verbmobil Speech-to-Speech Translation Server +49 631 3111911
        Verbmobil: “Welcome to the Verbmobil Translation System. Please speak the telephone number of your partner.”
        American Speaker: “0177555”
        Foreign Participant is placed into the Conference Call
        Verbmobil (to German participant): “Verbmobil hat eine neue Verbindung aufgebaut. Bitte sprechen Sie jetzt.” (Verbmobil has established a new connection. Please speak now.)
        Verbmobil (to American participant): “Welcome to the Verbmobil server. Please start your input after the beep.”

    12. Verbmobil is a Multilingual System: it supports bidirectional translation between English (American) and German, Japanese and German, and Chinese (Mandarin) and German.

    13. Verbmobil Partners (Phase 2): TU Braunschweig, DaimlerChrysler, Rheinische Friedrich-Wilhelms-Universität Bonn, Ludwig-Maximilians-Universität München, Universität Bielefeld, Universität des Saarlandes, Technische Universität München, Universität Hamburg, Friedrich-Alexander-Universität Erlangen-Nürnberg, Ruhr-Universität Bochum, Eberhard Karls Universität Tübingen, Universität Stuttgart, Universität Karlsruhe (W. Wahlster, DFKI)

    14. Three Levels of Language Processing (input: speech over the telephone)
        - Speech Recognition (What has the caller said?): uses acoustic and language models; yields word lists with 100 alternatives
        - Speech Analysis (What has the caller meant?): uses grammar and lexical meaning to reduce uncertainty; 10 alternatives remain
        - Speech Understanding (What does the caller want?): uses the discourse context and knowledge about the domain of discourse; yields an unambiguous understanding in the dialog context

    15. Challenges for Language Engineering: complexity increases along four dimensions, and Verbmobil's input conditions lie at the most demanding end of each.
        - Input conditions: close-speaking microphone/headset, push-to-talk → telephone, pause-based segmentation → open microphone, GSM quality
        - Naturalness: isolated words → read continuous speech → spontaneous speech
        - Adaptability: speaker dependent → speaker independent → speaker adaptive
        - Dialog capabilities: monolog dictation → information-seeking dialog → multiparty negotiation

    16. Verbmobil II: Three Domains of Discourse
        - Scenario 1: Appointment Scheduling
        - Scenario 2: Travel Planning & Hotel Reservation
        - Scenario 3: PC-Maintenance Hotline

    17. Verbmobil II: Three Domains of Discourse
        - Scenario 1: Appointment Scheduling (When?)
        - Scenario 2: Travel Planning & Hotel Reservation (When? Where? How?)
        - Scenario 3: PC-Maintenance Hotline (What? When? Where? How?)

    18. Verbmobil II: Three Domains of Discourse
        - Scenario 1: Appointment Scheduling (When?): focus on temporal expressions
        - Scenario 2: Travel Planning & Hotel Reservation (When? Where? How?): focus on temporal and spatial expressions
        - Scenario 3: PC-Maintenance Hotline (What? When? Where? How?): integration of special sublanguage lexica

    19. Verbmobil II: Three Domains of Discourse
        - Scenario 1: Appointment Scheduling (When?): focus on temporal expressions; vocabulary size: 6,000
        - Scenario 2: Travel Planning & Hotel Reservation (When? Where? How?): focus on temporal and spatial expressions; vocabulary size: 10,000
        - Scenario 3: PC-Maintenance Hotline (What? When? Where? How?): integration of special sublanguage lexica; vocabulary size: 30,000

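    20. Context-Sensitive Speech-to-Speech Translation (Verbmobil Server): “Wann fährt der nächste Zug nach Hamburg ab?” → “When does the next train to Hamburg depart?” “Wo befindet sich das nächste Hotel?” → “Where is the nearest hotel?” The same German word “nächste” is rendered as “next” for the train and as “nearest” for the hotel.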

    21-31. The Control Panel of Verbmobil

    32. Verbmobil's Massive Data Collection Effort: 3,200 dialogs (182 hours) with 1,658 speakers; 79,562 turns distributed on 56 CDs, 21.5 GB. The so-called Partitur (German word for musical score) orchestrates fifteen strata of annotations: transliteration variant 1, transliteration variant 2, lexical orthography, canonical pronunciation, manual phonological segmentation, automatic phonological segmentation, word segmentation, prosodic segmentation, dialog acts, noises, superimposed speech, syntactic category, word category, syntactic function, and prosodic boundaries.
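The core idea behind the Partitur is that many annotation strata share one time axis. That idea can be sketched as a small Python data structure. This is a hypothetical illustration only: the class names, the use of seconds as the time unit, and the toy labels are assumptions, not the actual Partitur file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Segment:
    start: float  # seconds (assumed unit)
    end: float
    label: str

@dataclass
class Partitur:
    # tier (stratum) name -> segments on that tier, all on one shared time axis
    tiers: Dict[str, List[Segment]] = field(default_factory=dict)

    def add(self, tier: str, start: float, end: float, label: str) -> None:
        self.tiers.setdefault(tier, []).append(Segment(start, end, label))

    def overlapping(self, tier: str, start: float, end: float) -> List[str]:
        """Labels on `tier` whose interval overlaps [start, end)."""
        return [s.label for s in self.tiers.get(tier, [])
                if s.start < end and start < s.end]

# Toy annotations invented for illustration (not taken from the corpus).
p = Partitur()
p.add("word", 0.00, 0.30, "yes")
p.add("word", 0.35, 0.60, "no")
p.add("word", 0.60, 1.05, "problem")
p.add("dialog_act", 0.00, 1.05, "accept")
p.add("prosodic_boundary", 0.30, 0.35, "S1")

# Align strata by time: which words fall inside the dialog-act segment?
print(p.overlapping("word", 0.00, 1.05))  # ['yes', 'no', 'problem']
```

Because every stratum refers only to time, a new annotation layer can be added without touching the existing ones.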

    33. Extracting Statistical Properties from Large Corpora
        - Corpora: transcribed speech data, segmented speech with prosodic labels, treebanks & predicate-argument structures, annotated dialogs with dialog acts, aligned bilingual corpora
        - Machine learning for the integration of statistical properties into symbolic models for speech recognition, parsing, dialog processing, and translation
        - Models: Hidden Markov Models, Neural Nets (Multilayered Perceptrons), Probabilistic Grammars, Probabilistic Automata, Probabilistic Transfer Rules
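The slide lists probabilistic automata among its models. One concrete instance is a dialog-act transition model estimated by counting over dialogs annotated with dialog acts. The sketch below is a plain maximum-likelihood bigram estimate. The act labels are invented, and no smoothing is applied, which a real system would need.

```python
from collections import Counter, defaultdict
from typing import Dict, List

def estimate_transitions(dialogs: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Maximum-likelihood estimate of P(next act | current act)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for acts in dialogs:
        seq = ["<start>"] + acts  # pseudo-state marking the dialog opening
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {prev: {nxt: c / sum(cnt.values()) for nxt, c in cnt.items()}
            for prev, cnt in counts.items()}

# Toy dialog-act sequences (hypothetical labels, for illustration only).
dialogs = [
    ["greet", "suggest", "accept", "bye"],
    ["greet", "suggest", "reject", "suggest", "accept", "bye"],
]
probs = estimate_transitions(dialogs)
print(probs["suggest"])  # accept ≈ 0.67, reject ≈ 0.33
```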

    34. Multilinguality [Chart: word accuracy (%) of the German, English, and Japanese speech recognizers across the evaluation rounds VM1, '97, '98, '99.1, '99.2, '99.3, and 2000; y-axis from 50 to 100%]

    35. Multilinguality: Language Identification (LID). An independent LID module examines the incoming speech and hands it to the German, English, or Japanese recognizer, which outputs the word sequence w1 … wn.
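The LID module decides which recognizer handles the input. One possible reading of that routing step is sketched below. It assumes the LID module returns one score per language. The score format, the language codes, and the stub recognizers are all hypothetical; the slide does not say whether the recognizers instead run in parallel.

```python
from typing import Callable, Dict, List

# Hypothetical recognizer type: audio in, word sequence w1 ... wn out.
Recognizer = Callable[[bytes], List[str]]

def route_by_language(audio: bytes,
                      lid_scores: Dict[str, float],
                      recognizers: Dict[str, Recognizer]) -> List[str]:
    """Run only the recognizer of the language with the highest LID score."""
    language = max(lid_scores, key=lid_scores.get)
    return recognizers[language](audio)

# Stubs standing in for the German, English and Japanese recognizers.
recognizers: Dict[str, Recognizer] = {
    "de": lambda audio: ["guten", "Tag"],
    "en": lambda audio: ["good", "morning"],
    "ja": lambda audio: ["konnichiwa"],
}
audio = bytes(160)  # placeholder input (silence)
words = route_by_language(audio, {"de": 0.2, "en": 0.7, "ja": 0.1}, recognizers)
print(words)  # ['good', 'morning']
```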

    36. From a Multi-Agent Architecture to a Multi-Blackboard Architecture
        - Verbmobil I, multi-agent architecture (modules M1-M6): each module must know which module produces what data; direct communication between modules; heavy data traffic for moving copies around
        - Verbmobil II, multi-blackboard architecture (modules M1-M6, blackboards BB 1-BB 3): all modules can register for each blackboard dynamically; no direct communication between modules; no copies of representation structures (word lattice, VIT chart)
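The contrast on this slide is between modules that address each other and modules that register for blackboards. The blackboard side is essentially a publish/subscribe pattern. The sketch below runs in a single process with in-memory Python objects. This is an assumption for illustration, not Verbmobil's actual distributed communication layer, and the class and board names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

class Blackboards:
    """Several named blackboards. Modules register callbacks at run time
    and never call each other directly."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)
        self._entries: Dict[str, List[Any]] = defaultdict(list)

    def register(self, board: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[board].append(callback)

    def post(self, board: str, item: Any) -> None:
        # The item is stored once; subscribers receive a reference, not a copy.
        self._entries[board].append(item)
        for callback in self._subscribers[board]:
            callback(item)

    def read(self, board: str) -> List[Any]:
        return list(self._entries[board])

bb = Blackboards()
# A stand-in "parser" module reads word lattices and posts parsing results.
bb.register("word_lattice", lambda lattice: bb.post("parsing_results", {"parsed": lattice}))
bb.post("word_lattice", ["when", "would", "you", "like"])
print(bb.read("parsing_results"))
```

Adding a second parser only requires another `register` call. The module that posts word lattices does not need to know who consumes them.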

    37. Multi-Blackboard/Multi-Engine Architecture: module groups (Module 1.1, 1.2, …; 2.1, 2.2, …; up to 6.1, 6.2, …) read from and write to five blackboards:
        - Blackboard 1: Preprocessed Speech Signal
        - Blackboard 2: Word Lattice
        - Blackboard 3: Syntactic Representation: Parsing Results
        - Blackboard 4: Semantic Representation: Lambda DRS
        - Blackboard 5: Dialog Acts

    38. A Multi-Blackboard Architecture for the Combination of Results from Deep and Shallow Processing Modules: Audio Data; Command Recognizer; Channel/Speaker Adaptation; Spontaneous Speech Recognizer; Prosodic Analysis

    39. A Multi-Blackboard Architecture for the Combination of Results from Deep and Shallow Processing Modules: Audio Data; Command Recognizer; Channel/Speaker Adaptation; Spontaneous Speech Recognizer; Prosodic Analysis; Word Hypotheses Graph with Prosodic Labels; Statistical Parser; Chunk Parser; HPSG Parser; Dialog Act Recognition

    40. A Multi-Blackboard Architecture for the Combination of Results from Deep and Shallow Processing Modules: Audio Data; Command Recognizer; Channel/Speaker Adaptation; Spontaneous Speech Recognizer; Prosodic Analysis; Word Hypotheses Graph with Prosodic Labels; Statistical Parser; Chunk Parser; HPSG Parser; Dialog Act Recognition; Semantic Construction; VITs (Underspecified Discourse Representations); Semantic Transfer; Robust Dialog Semantics; Generation
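The slides do not spell out how the system chooses between results from deep and shallow engines. The sketch below shows one simple policy, purely as an assumption: prefer a confident deep (HPSG-based) result, and otherwise fall back to the best shallow one. The score scale, the threshold, and the engine names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EngineResult:
    engine: str       # hypothetical engine names, e.g. "hpsg", "chunk", "statistical"
    deep: bool        # True for deep analysis, False for shallow
    score: float      # confidence in [0, 1]; the scale is an assumption
    translation: str

def select(results: List[EngineResult], min_deep_score: float = 0.5) -> Optional[EngineResult]:
    """Prefer a sufficiently confident deep result; otherwise fall back to
    the best shallow result, or to the best result of any kind."""
    deep = [r for r in results if r.deep and r.score >= min_deep_score]
    shallow = [r for r in results if not r.deep]
    pool = deep or shallow or results
    return max(pool, key=lambda r: r.score, default=None)

# Spontaneous input on which the deep parser only finds a weak analysis.
results = [
    EngineResult("hpsg", True, 0.3, "Monday is good"),
    EngineResult("chunk", False, 0.6, "Monday fine"),
    EngineResult("statistical", False, 0.7, "Monday is fine"),
]
print(select(results).engine)  # statistical
```

The fallback keeps the system producing some translation for spontaneous input on which the deep analysis fails.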

    41. VIT (Verbmobil Interface Terms) as a Multi-Stratal Representation Language
        - used as a common representation scheme for information exchange between all components and processing threads
        - design inspired by underspecified discourse representation structures (UDRS, Reyle/Kamp 1993)
        - compact representation of lexical and structural ambiguities and of scope underspecification of quantifiers, negations and adverbs
        - variable-free sets of non-recursive terms: [beginning(35, i37), arg3(35, i37, i38), come(27, i35), arg1(27, i35, i36), decl(37, h43), pron(26, i36), at(36, i35, i37), mofy(34, i38, aug), def(28, i37, h42, h41), udef(31, i38, h45, h44)]
        - streams of literals as flat multi-stratal representations that are very efficient for incremental processing
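VIT conditions are variable-free and non-recursive. As a result, one regular expression is enough to read them, and indexing them by any constant takes a single linear pass. The Python sketch below uses the condition list from this slide. Treating the first argument of each term as its label is an assumption made for this illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# A flat term: name, optional spaces, then an argument list without nesting.
TERM = re.compile(r"(\w+)\s*\(([^()]*)\)")

def parse_flat_terms(text: str) -> List[Tuple[str, List[str]]]:
    """Parse flat, non-recursive terms such as 'come(27, i35)'."""
    return [(name, [a.strip() for a in args.split(",")])
            for name, args in TERM.findall(text)]

conditions = ("[beginning(35, i37), arg3(35, i37, i38), come(27, i35), "
              "arg1(27, i35, i36), decl(37, h43), pron(26, i36), "
              "at(36, i35, i37), mofy(34, i38, aug), "
              "def(28, i37, h42, h41), udef(31, i38, h45, h44)]")
terms = parse_flat_terms(conditions)

# One pass builds an index from each non-label argument to the predicates
# that mention it, e.g. every condition involving instance i37.
by_constant: Dict[str, List[str]] = defaultdict(list)
for name, args in terms:
    for arg in args[1:]:  # args[0] is assumed to be the label
        by_constant[arg].append(name)
print(by_constant["i37"])  # ['beginning', 'arg3', 'at', 'def']
```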

    42. VIT for ‘He is coming at the beginning of August’
        vit(vitID(sid(104, a, en, 10, 80, 1, en, y, semantics),              % Segment Identifier
                  [word(he, 1, [26]), word(is, 2, []), word(coming, 3, [27]),
                   word(at, 4, [36]), word(the, 5, [28]), word(beginning, 6, [35]),
                   word(of, 7, [35]), word('August', 8, [34])]),               % WHG String
            index(38, 25, i35),                                                % Index
            [beginning(35, i37), arg3(35, i37, i38), come(27, i35),
             arg1(27, i35, i36), decl(37, h43), pron(26, i36),
             at(36, i35, i37), mofy(34, i38, aug),
             def(28, i37, h42, h41), udef(31, i38, h45, h44)],                 % Conditions
            [in_g(26, 25), in_g(37, 38), in_g(27, 25), in_g(28, 30),
             in_g(31, 33), in_g(34, 32), in_g(35, 29), in_g(36, 25),
             leq(25, h41), leq(25, h43), leq(29, h42), leq(29, h44),
             leq(30, h43), leq(32, h45), leq(33, h43)],                        % Scope and Grouping Constraints
            [s_sort(i35, situation), s_sort(i37, time), s_sort(i38, time)],    % Sortal Specifications for Instance Variables
            [dialog_act(25, inform), dir(36, no), prontype(i36, third, std)],  % Discourse and Pragmatics
            [cas(i36, nom), gend(i36, masc), num(i36, sg), num(i37, sg),
             num(i38, sg), pcase(l135, i38, of)],                              % Syntax
            [ta_aspect(i35, progr), ta_mood(i35, ind),
             ta_perf(i35, nonperf), ta_tense(i35, pres)],                      % Tense and Aspect
            [pros_accent(35)])                                                 % Prosody

    43-47. Information between Layers Is Linked Together Using Constant Symbols: instances are constants interpreted as skolemized variables.
        [word(he, 1, [26]), word(is, 2, []), word(coming, 3, [27]), word(at, 4, [36]),
         word(the, 5, [28]), word(beginning, 6, [35]), word(of, 7, [35]), word('August', 8, [34])]  % WHG String
        [beginning(35, i37), arg3(35, i37, i38), come(27, i35), arg1(27, i35, i36),
         decl(37, h43), pron(26, i36), at(36, i35, i37), mofy(34, i38, aug),
         def(28, i37, h42, h41), udef(31, i38, h45, h44)]                       % Conditions
        [s_sort(i35, situation), s_sort(i37, time), s_sort(i38, time)]         % Sorts
        [cas(i36, nom), gend(i36, masc), num(i36, sg), num(i37, sg)]            % Syntax
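Shared constants let a consumer move from one stratum to another without any pointer structure. The word "he" carries label 26, and label 26 identifies `pron(26, i36)`. The instance `i36` then selects the syntactic features `cas`, `gend`, and `num`. The hypothetical Python sketch below follows those links over a subset of the terms above. It assumes that instance constants are exactly the arguments starting with `i`, a convention read off this example rather than a documented VIT rule.

```python
from typing import List, Tuple

# Subset of the VIT above: word -> labels, (predicate, label, args), features.
words = {"he": [26], "coming": [27], "beginning": [35], "August": [34]}
conditions = [("come", 27, ["i35"]), ("arg1", 27, ["i35", "i36"]),
              ("pron", 26, ["i36"]), ("beginning", 35, ["i37"]),
              ("mofy", 34, ["i38", "aug"])]
syntax = [("cas", "i36", "nom"), ("gend", "i36", "masc"),
          ("num", "i36", "sg"), ("num", "i37", "sg")]

def link(word: str) -> Tuple[list, List[Tuple[str, str, str]]]:
    """Follow labels from the word layer to the conditions, then follow
    the instances bound there to the syntax layer."""
    labels = set(words[word])
    sem = [c for c in conditions if c[1] in labels]
    instances = {a for _, _, args in sem for a in args if a.startswith("i")}
    syn = [f for f in syntax if f[1] in instances]
    return sem, syn

sem, syn = link("he")
print(sem)  # [('pron', 26, ['i36'])]
print(syn)  # [('cas', 'i36', 'nom'), ('gend', 'i36', 'masc'), ('num', 'i36', 'sg')]
```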

    48. The Use of Underspecified Representations
        - Two readings in the source language: “Wir telephonierten mit Freunden aus Schweden.” (“aus Schweden” can modify the friends, i.e. friends from Sweden, or the calling, i.e. we called from Sweden)
        - Underspecified semantic representation: a compact representation of scope ambiguities in a logical language without using disjunctions
        - Ambiguity-preserving translation, with the same two readings in the target language: “We called friends from Sweden.”
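Underspecification pays off when the target language can carry the ambiguity over, because then the system never has to resolve it. The sketch below shows only that final decision. It assumes, hypothetically, that the English realisation of each reading is already available. The slide's actual point is that the underspecified representation avoids enumerating readings in the first place.

```python
from typing import Dict, Optional

def translate_preserving(realisations: Dict[str, str]) -> Optional[str]:
    """Given the target-language realisation of every source reading,
    return one translation if all readings yield the same string,
    otherwise None (disambiguation would be required)."""
    outputs = set(realisations.values())
    return outputs.pop() if len(outputs) == 1 else None

# Both readings of the German example are realised identically in English.
german_to_english = {
    "PP modifies 'Freunden' (friends from Sweden)": "We called friends from Sweden.",
    "PP modifies the verb (called from Sweden)": "We called friends from Sweden.",
}
print(translate_preserving(german_to_english))  # We called friends from Sweden.
```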

    49. Verbmobil is the First Dialog Translation System that Uses Prosodic Information Systematically at All Processing Stages
        - Input: speech signal and word hypotheses graph, processed by the multilingual prosody module
        - Prosodic features: duration, pitch, energy, pause, combined into a prosodic feature vector
        - Output: boundary information, sentence mood, accented words
        - Uses: dialog act segmentation and recognition, search space restriction, lexical choice, speaker adaptation, constraints for transfer, speech synthesis
        - Processing stages served: parsing, dialog understanding, translation, generation
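The prosody module condenses duration, pitch, energy, and pause into a feature vector, from which it derives boundary information. The sketch below is a deliberately simplified version of that last step, using a weighted score and a threshold. The weights, the threshold, and the units are invented for illustration and do not describe Verbmobil's trained classifiers.

```python
from dataclasses import dataclass

@dataclass
class ProsodicFeatures:
    duration: float  # normalised duration of the word before the gap (assumed)
    pitch: float     # pitch change across the gap in semitones (assumed)
    energy: float    # normalised energy (assumed)
    pause: float     # pause length in seconds

def boundary_score(f: ProsodicFeatures) -> float:
    # Hand-set linear weights, for illustration only; a real module would
    # learn this mapping from prosodically labelled data.
    return 2.0 * f.pause + 0.5 * f.duration + 0.1 * abs(f.pitch) - 0.2 * f.energy

def is_boundary(f: ProsodicFeatures, threshold: float = 0.8) -> bool:
    return boundary_score(f) >= threshold

# A lengthened word, a pitch fall and a pause suggest a boundary ...
print(is_boundary(ProsodicFeatures(duration=1.2, pitch=-4.0, energy=0.5, pause=0.25)))  # True
# ... whereas fluent continuation does not.
print(is_boundary(ProsodicFeatures(duration=0.9, pitch=0.5, energy=1.0, pause=0.0)))    # False
```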

    50. Using Syntactic-Prosodic Boundaries to Speed Up the Parsing Process
        Input with boundary labels: yes S1 no problem S4 Mister Mueller S4 when would you like to go to Hannover S4
        - without boundaries: 1256 chart edges, runtime 1.31 secs
        - with boundaries: 632 chart edges, runtime 0.62 secs
        - speed-up: 53%
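The 53% figure is the reduction in runtime (equivalently, a speed-up factor of about 2.1). The arithmetic below checks that reading. It then adds a rough intuition for why boundaries help: a chart over n words has n(n+1)/2 possible spans, and parsing each prosodic segment separately removes every span that crosses a boundary. The span count is an assumed proxy for illustration, not how Verbmobil's parser counts chart edges.

```python
# Figures from the slide.
edges_without, runtime_without = 1256, 1.31
edges_with, runtime_with = 632, 0.62

runtime_reduction = 1 - runtime_with / runtime_without
print(f"runtime reduction: {runtime_reduction:.0%}")               # 53%
print(f"speed-up factor:   {runtime_without / runtime_with:.2f}")  # 2.11
print(f"chart edges kept:  {edges_with / edges_without:.0%}")      # 50%

# Proxy: count possible spans with and without segmenting at the boundaries.
segments = [["yes"], ["no", "problem"], ["Mister", "Mueller"],
            "when would you like to go to Hannover".split()]
n = sum(len(s) for s in segments)
spans_whole = n * (n + 1) // 2
spans_segmented = sum(len(s) * (len(s) + 1) // 2 for s in segments)
print(spans_whole, spans_segmented)  # 91 43
```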