SpeechBuilder: Facilitating Spoken Dialogue System Creation

1. SpeechBuilder: Facilitating Spoken Dialogue System Creation
Eugene Weinstein
Project Oxygen Core Team
MIT Laboratory for Computer Science
ecoder@mit.edu

2. Bridging the Experience Gap
[Diagram: GALAXY architecture with Speech Recognition, Language Processing, Context Resolution, Dialogue Management, Language Generation, Speech Synthesis, Audio, and Database Server components connected through a central Hub, configured by SpeechBuilder]
• Developing robust, mixed-initiative spoken dialogue systems is difficult
• Complex systems can be created by human-language technology experts
• Novice developers must overcome a considerable technical challenge
• SpeechBuilder aims to help novices rapidly create speech-based systems
• Uses intuitive methods for specifying domain-specific constraints
• Automatically configures HLT components using MIT GALAXY architecture
• Leverages future technical advances
• Encourages research on portability

3. Baseline Configuration
[Diagram: Developer Application communicating with the SpeechBuilder Server over HTTP (CGI parameters); Speech Recognition, Language Processing, Language Generation, Speech Synthesis, and Audio Server connected through the Hub]
• Communication with Galaxy via simple HTTP protocol (a minimal sketch follows below)
• Gives developer total control over application functionality
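The sketch below illustrates what a developer application in this HTTP/CGI configuration might look like. The parameter names ("action" plus key/value attributes) and the plain-text reply are assumptions for illustration; the slides do not specify the actual wire format.

# Minimal sketch of a developer application for the HTTP/CGI-style
# baseline configuration. The "action" parameter and the attribute
# names are hypothetical, not SpeechBuilder's documented format.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class SpeechBuilderHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        params = {k: v[0] for k, v in query.items()}
        action = params.pop("action", None)
        if action == "lookup_flight":
            reply = "Looking up flights from %s to %s." % (
                params.get("source"), params.get("destination"))
        else:
            reply = "Sorry, I did not understand."
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(reply.encode())

if __name__ == "__main__":
    HTTPServer(("", 8080), SpeechBuilderHandler).serve_forever()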

4. Modified Baseline Configuration (this class)
[Diagram: Developer Application connected to a Frame Relay Server over a TCP socket, receiving the Semantic Frame; Speech Recognition, Language Processing, Language Generation, Speech Synthesis, and Audio Server connected through the Hub via CGI parameters]
• Still gives developer total control over application functionality
• Frame Relay server exposes Galaxy meaning representation to app (see the connection sketch below)
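Here is a minimal connection sketch for this configuration, written against the Python API that slide 16 summarizes (galaxy.server.Server and galaxy.frame.Frame). The host, port, and server ID are placeholders, and the assumption that processMessage() returns a Frame is inferred from the method lists rather than stated in the deck.

# Minimal frame-relay client using the slide-16 API. Host, port, and ID
# are placeholders; processMessage() is assumed to return a
# galaxy.frame.Frame when a new semantic frame arrives.
import galaxy.server

server = galaxy.server.Server("localhost", 15000, "my_app")
server.connect()
for _ in range(5):                       # handle a few turns, then exit
    frame = server.processMessage(True)  # block until a frame arrives
    print("action:", frame.getAction())
    print("utterance:", frame.getText())
server.disconnect()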

5. Database Access Configuration
[Diagram: Speech Recognition, Language Processing, Discourse Resolution, Dialogue Management, Language Generation, Speech Synthesis, Audio Server, I/O Server, Database Server, and INFO connected through the Hub]
• No programming required; specify table(s) and constraints
• For a speech-based interface to structured data

6. Creating a Speech-Based Application
• Step 1: Off-line creation and compilation (SpeechBuilder compiles the uploaded domain specification into ASR, NLU, NLG, discourse, and dialogue configurations)
• Step 2: On-line deployment (the Hub connects the Audio, ASR, NLU, Discourse, Dialogue, NLG/Response, and INFO/Query servers)

7. Human Language Technologies
• Hub: Galaxy programmable hub controls interactions between all components
• Speech Recognition: generic acoustic models; unknown word model; class or hierarchical n-gram
• Language Processing: N-best interface with ASR; grammar from attributes & actions; backs off to concept spotting
• Context Resolution: new component performs concept inheritance & masking; processes 'E-form'
• Language Generation: generates 'E-form', SQL, & responses; default entries made
• Dialogue Management: generic server handles interaction; accesses back-end database
• Speech Synthesis: commercial product
• Audio Server: telephone or lightweight audio server
• Database Server

8. Extracting Database Information
Example query: "What is the phone number for Victor Zue?"
• Some columns are used to access entries (e.g., Name)
  • Column entries must be incorporated into ASR & NLU
• Some columns are only used in responses (e.g., Phone)
  • Column names must be incorporated into ASR & NLU
(A sketch of the implied back-end lookup follows below.)
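To make the access/response distinction concrete, the sketch below shows the kind of back-end lookup a query like "What is the phone number for Victor Zue?" implies. The table and column names (people, name, phone) and the dummy phone number are illustrative assumptions, not SpeechBuilder's actual schema.

# Illustrative back-end lookup. Table/column names and the dummy phone
# number are assumptions for this sketch, not SpeechBuilder's schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, phone TEXT)")
conn.execute("INSERT INTO people VALUES ('Victor Zue', '555-0100')")

# "name" is an access column: its *values* ("Victor Zue") must appear in
# the recognizer vocabulary and parsing grammar so users can speak them.
# "phone" is response-only: users only speak the column *name* ("phone
# number"); its values are just read back in the system's answer.
row = conn.execute(
    "SELECT phone FROM people WHERE name = ?", ("Victor Zue",)
).fetchone()
print("The phone number for Victor Zue is %s." % row[0])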

9. Knowledge Representation
• Concepts and actions form basis for understanding
• Concepts become key/value entries in meaning representation
  • city: Boston, New York… day: Monday, Tuesday…
• Actions provide sentence-level patterns of specific queries
  • "I want to fly from Boston to Taipei…" → action=lookup_flight
• Action text can be bracketed to define hierarchical concepts
  • "I want to fly source=(from Boston) destination=(to Taipei)" → source=Boston, destination=Taipei
• Database columns define basic concepts
  • Column names can be grouped into concepts: property: phone, email… weather: snow, rain…
• Concepts and actions are used to configure speech recognition, natural language understanding, and discourse (a sketch of the resulting meaning representation follows below)
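As a concrete illustration of these bullets, the toy sketch below builds the kind of key/value meaning representation ('E-form') described above for the flight example. The flat dictionary layout and the regex-based bracketing are simplifications for illustration; actual Galaxy semantic frames are richer.

# Toy illustration of bracketed action text producing hierarchical
# concepts in a key/value meaning representation. The dict layout is a
# simplification; real Galaxy frames are richer structures.
import re

utterance = "I want to fly from Boston to Taipei"

# The bracketed action text "fly source=(from Boston)
# destination=(to Taipei)" corresponds to the named groups below
# (single-word city names only, for brevity).
pattern = re.compile(r"from (?P<source>\w+) to (?P<destination>\w+)")

eform = {"action": "lookup_flight"}
match = pattern.search(utterance)
if match:
    eform.update(match.groupdict())

print(eform)
# {'action': 'lookup_flight', 'source': 'Boston', 'destination': 'Taipei'}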

10. Language Modeling and Understanding
Example: "Will it snow?" → weather:snow
[Diagram: fine-grained weather words such as snowfall, snowstorm, flurries, blizzard, snowy, accumulation, hail, rainfall, rainy, showers, sprinkles, thunderstorm, and breezy, with snow-related words collapsing to the coarser meaning weather:snow]
• By default, concepts are used for language modeling, parsing grammar, and meaning representation
• Concept usage can be fine-tuned to improve performance:
  • For language modeling and parsing grammar only (i.e., no meaning)
  • For keyword spotting only (i.e., no role in language modeling)
  • For fine-grained language modeling with coarser meaning representation (a sketch of this option follows below)
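The sketch below illustrates the last fine-tuning option: each surface word is kept distinct for the recognizer's language model, but many words collapse to one coarse concept in the meaning representation. The word-to-concept table is a made-up example built from the words shown on the slide.

# Fine-grained words for language modeling, coarse values for meaning.
# The mapping below is illustrative, based on the slide's word list.
COARSE_VALUE = {
    "snow": "snow", "snowfall": "snow", "snowstorm": "snow",
    "flurries": "snow", "blizzard": "snow", "snowy": "snow",
    "rain": "rain", "rainfall": "rain", "rainy": "rain",
    "showers": "rain", "sprinkles": "rain", "thunderstorm": "rain",
}

def meaning(utterance):
    """Map any fine-grained weather word to a coarse weather concept."""
    for word in utterance.lower().rstrip("?!.").split():
        if word in COARSE_VALUE:
            return {"weather": COARSE_VALUE[word]}
    return {}

print(meaning("Will it snow?"))          # {'weather': 'snow'}
print(meaning("Are flurries expected"))  # {'weather': 'snow'}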

11. Current Status
• SpeechBuilder has been operational for over two years
• Used by over 50 developers from MIT and elsewhere
• Used in undergraduate classes at MIT and Georgetown University
• ASR capabilities benchmarked against existing MIT systems
  • Achieves the same ASR performance as the MIT Jupiter weather information system (6.8% word error rate on clean telephone data)
• Several prototype systems have been developed
  • Information about faculty, staff, and students at the LCS and AI Labs (phone, email, room, voice messages, transfer, etc.)
  • Application to control the various physical items in a typical office (lights, curtains, TV, VCR, projector, etc.)
  • Others include TV schedules, real-time weather forecasts, hotel and restaurant information, etc.
• SpeechBuilder used for initial design of many more complex domains

12. Ongoing and Future Work
• Increase sophistication of the discourse and dialogue manager to handle more complex dialogues
  • Enable finer specification of discourse capabilities
  • Add generic capabilities for times, dates, etc.
• Incorporate confidence scoring and implement unsupervised training of acoustic and language models
• Create functionality to allow developers to create domain-specific concatenative speech synthesis
• Create alternative methods of domain specification to streamline development
  • Advanced developers don't necessarily use the web interface
  • Allow for more efficient automatic generation of SpeechBuilder domains

13. Acknowledgements
Issam Bazzi, Scott Cyphers, Ed Filisko, Jim Glass, TJ Hazen, Lee Hetherington, Joe Polifroni, Stephanie Seneff, Michelle Spina, Eugene Weinstein, Jon Yi, Misha Zitser

14. SpeechBuilder Hands-on Activity
Eugene Weinstein
Project Oxygen Core Team
MIT Laboratory for Computer Science
ecoder@mit.edu

15. Modified Baseline Configuration (this class)
[Diagram: same as slide 4; Developer Application connected to a Frame Relay Server over a TCP socket, receiving the Semantic Frame from the Galaxy components via the Hub]
• Still gives developer total control over application functionality
• Frame Relay server exposes Galaxy meaning representation to app

16. SpeechBuilder API: Galaxy Frame Relay
• Galaxy meaning representation provided through frame relay
• Applications connect via TCP sockets
• API provided in Perl, Python, and Java
• This class: Python API (a complete turn-loop sketch follows below)
Python classes:
• galaxy.server.Server methods: Constructor(machine, port, ID), connect(), processMessage(blocking), disconnect()
• galaxy.frame.Frame methods: getAction(), getAttribute(attr_name), getText(), toString()
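Putting the two classes together, here is a sketch of a complete application turn loop. Only the class and method names above come from the slide; the host/port/ID values, the attribute names ("source", "destination"), and the assumption that processMessage() returns a galaxy.frame.Frame are illustrative.

# Application turn loop built from the classes listed above. Host, port,
# ID, and attribute names are placeholders; processMessage() is assumed
# to return a galaxy.frame.Frame (inferred, not stated in the deck).
import galaxy.server

server = galaxy.server.Server("localhost", 15000, "flight_app")
server.connect()
for _ in range(10):                      # handle ten user turns, then exit
    frame = server.processMessage(True)  # block until the next frame
    print("heard:", frame.getText())     # raw recognized utterance
    if frame.getAction() == "lookup_flight":
        src = frame.getAttribute("source")       # e.g., "Boston"
        dst = frame.getAttribute("destination")  # e.g., "Taipei"
        print("Looking up flights from %s to %s" % (src, dst))
    else:
        print("unhandled frame:", frame.toString())
server.disconnect()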
