KTH speech platform

marinel

Presentation Transcript


  1. KTH speech platform • Generic framework • for building demonstrators • for research • built mostly on in-house components • Two major components • Atlas – speech-technology platform • SesaME - generic dialogue manager

  2. KTH multimodal dialogue systems: Gulan, Waxholm, AdApt, August, Olga

  3. The Waxholm system [diagram labels: IN; SPEECH; ASR; “WIZARD OF OZ”; NLP; LEXICON; DIALOGUE MANAGEMENT; DATABASES; GRAPHICS; TTS & MULTIMODAL AGENT; OUT]

  4. Common features • built on in-house components • under continuous development • limited reuse of software resources • during development: • expert knowledge is required • highly labor intensive

  5. Atlas

  6. Flat model [diagram labels: application, dialog engine; SQL; ASR; ASV; TTS; audio coder; audio device; animated agent; SQL datab.; ASR; ASV; TTS; desktop audio; desktop audio; animated agent; ASR; TTS; ASR; TTS]

  7. Single-layer model [diagram labels: application, dialog engine; component APIs; SQL; ASR; ASV; TTS; audio coder; audio device; animated agent; SQL datab.; ASR; ASV; TTS; desktop audio; desktop audio; animated agent; ASR; TTS; ASR; TTS]

  8. Multi-layer model (1) [diagram labels: application, dialog engine; speech-tech API; component APIs; SQL; ASR; ASV; TTS; audio coder; audio device; animated agent; SQL datab.; ASR; ASV; TTS; desktop audio; desktop audio; animated agent; ASR; TTS; ASR; TTS]

  9. Multi-layer model (2) [diagram labels: application, dialog engine; speech-tech API; dialog components; high-level primitives; services; component interaction; component APIs; SQL; ASR; ASV; TTS; audio coder; audio device; animated agent; SQL datab.; ASR; ASV; TTS; desktop audio; desktop audio; animated agent; ASR; TTS; ASR; TTS]

  10. Components [diagram labels: component APIs; ASR; pseudo ASR; pseudo ASR; pseudo ASR; bridge; stub; ?; (J)SAPI; Broker, CORBA; Communicator; stub; ASR; ASR; ASR]

  11. Middleware levels (1) • Component interaction • resource handling (create, monitor, allocate, ..) • media streams (connect, disconnect, split) • representing information (text-hypotheses, syntactic and semantic info, speaker info, ...)

  12. Middleware levels (2) • Services • resource access • play • load and send media data • make media device(s) render it • log the action • say • TTS • send media data to media device(s) • make media device(s) render it • log the action

  13. Middleware levels (3) • Services • listen • engage media processors (ASR, ASV, parser, …) • make media device record data • detect utterance • send data in right format to processor(s), file(s), and other objects • make processors work • wait for processors to finish • fuse results and deliver the “answer” • log actions and results

  14. Middleware levels (4) • High-level primitives • ask • ‘say’ prompt • ‘listen’ to answer • give caller full access to processors and their results • log actions and results • askSimple • same as ask, but returns fused results only
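
A minimal sketch of how the high-level primitives on this slide could be layered on the 'say' and 'listen' services. This is not the ATLAS API: `SpeechActs` here only borrows the package name shown on a later slide, and `ListenResult`, `ask_simple`, and the stubbed services are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ListenResult:
    """Hypothetical result of 'listen': per-processor output plus the fused answer."""
    per_processor: Dict[str, str]
    fused: str


@dataclass
class SpeechActs:
    """Hypothetical high-level primitives built on two injected services."""
    say: Callable[[str], None]          # stands in for the 'say' service
    listen: Callable[[], ListenResult]  # stands in for the 'listen' service
    log: List[str] = field(default_factory=list)

    def ask(self, prompt: str) -> ListenResult:
        # 'say' the prompt, 'listen' to the answer, log, and give the
        # caller full access to every processor's result.
        self.say(prompt)
        result = self.listen()
        self.log.append(f"ask prompt={prompt!r} fused={result.fused!r}")
        return result

    def ask_simple(self, prompt: str) -> str:
        # Same as ask, but returns the fused result only.
        return self.ask(prompt).fused


# Usage with stubbed services (no real audio, ASR, or ASV involved).
spoken: List[str] = []
acts = SpeechActs(
    say=spoken.append,
    listen=lambda: ListenResult(
        per_processor={"ASR": "two hundred", "ASV": "speaker-7"},
        fused="two hundred",
    ),
)
answer = acts.ask_simple("How much do you want to transfer?")
```

Passing the services in as callables keeps the primitives independent of any particular recogniser or synthesiser, which matches the layering idea of the slide.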

  15. Middleware levels (5) • Dialog components • user interaction for a special purpose • has domain knowledge • error handling/recovery • no answer • invalid amount, account, etc. • re-ask, formulation variation • can provide help • database lookup • cf. Nuance “SpeechObjects”, Philips “Speech Blocks”, ...

  16. Middleware levels (6) • Dialog components (cont.) • login procedure • one or more operations (steps) • each step produces or validates speaker hypotheses • procedure returns a speaker hypothesis with status • includes database lookup, etc. • enrollment procedure • special case of login procedure • enrollment operation is iterative when asking for data

  17. Middleware levels (7) • Dialog components (cont.) • “complex question”: • in CTT-bank • money amount • account name • yes/no

  18. ATLAS application, dialog engine (atlas.app) component APIs [atlas.rc.api] component interaction [atlas.rc / media / rc.audio / uinfo] services [atlas.app.SpeechIO / rc.api.AppResources] high-level prim. [atlas.app.SpeechActs] dialog comp. [atlas.login,..] speech-tech API [atlas.internal.rc] [atlas.broker.rc] [atlas.communicator.rc]

  19. ATLAS core packages: atlas.basic, atlas.uinfo, atlas.media, atlas.rc, atlas.terminal, atlas.rc.audio, atlas.rc.api, atlas.app

  20. System model [diagram labels: Application; Terminal 1; Session; Terminal 2; Terminal N; Resources]

  21. Project packages: atlas.*, broker.*, atlas.internal.*, atlas.broker.*, cttbank.*, per.*

  22. ATLAS Common platform [diagram labels: CTT-bank, PER; Generic dialogue management?; speech-tech API; component APIs; SQL; ASR; ASV; TTS; audio coder; audio device; animated agent; SQL datab.; ASR; ASV; TTS; desktop audio; desktop audio; animated agent; ASR; TTS; ASR; TTS]

  23. SesaME

  24. SesaME – the playground • focus on simple task oriented dialogues • accessing information (personal, public) • controlling appliances & services • hypothesis - task oriented dialogues can be described in a formalised way

  25. ATLAS Common platform [diagram labels: Application / Service platform; dialogue descriptions; Common platform; Generic dialogue manager - SesaME; speech-tech API; SQL; ASR; ASV; TTS; audio coder; audio device; animated agent; SQL datab.; ASR; ASV; TTS; desktop audio; desktop audio; animated agent; ASR; TTS; ASR; TTS]

  26. SesaME - goals • platform for research & demonstrators • dialogue management • task oriented • generic, dynamic • asynchronous • support for • multi-domain approach • adaptations & personalisation • user modeling • situation awareness

  27. SesaME • features: • dynamic plug & play dialogues • modular, agent based architecture • information state approach • event based dialogue management • domain descriptions are based on extended VoiceXML descriptions

  28. Major components • Interaction manager – IM • controls the information flow • interaction management with • system components • user • Dialog engine - DE • dialogue interpretation • Application interface - AI • application specific component • communication with the application/service

  29. On start • AI collects all available Dialogue Descriptions (DD) • Dialogue Descriptions are represented in an extended VoiceXML formalism • seminar.vxml, meeting.vxml, curs.vxml, visitor.vxml • IM builds a register of the available DDs • the Dialogue Description Collection (DDC) • a vector is built on topics and associated keywords • ”seminarium” (seminar), ”möte” (meeting), ”besök” (visit)... • IM controls the activation of the DDs

  30. New utterance • ”Jag vill gå på Mats Blombergs seminarium.” (“I want to go to Mats Blomberg's seminar.”) • Prediction of the most plausible DD • through topic prediction: ”seminarium” • other mechanisms are planned (context, user models) • DE activates the chosen DD • seminar.vxml • internal data structures are created • DE performs the dialogue interpretation
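
Keyword-based topic prediction as described on the last two slides might look roughly like the following sketch. The mapping from file names to keywords is an assumption (the slides list the files and the keywords but not which belongs to which), and `DDC` and `predict_dd` are hypothetical names, not SesaME identifiers.

```python
import re
from typing import Dict, List, Optional

# Assumed register: dialogue description file -> topic keywords.
# The pairing of files and keywords is a guess made for this example.
DDC: Dict[str, List[str]] = {
    "seminar.vxml": ["seminarium"],
    "meeting.vxml": ["möte"],
    "visitor.vxml": ["besök"],
}


def predict_dd(utterance: str, ddc: Dict[str, List[str]]) -> Optional[str]:
    """Return the dialogue description whose keywords occur most often.

    Returns None when no keyword matches at all.
    """
    words = re.findall(r"\w+", utterance.lower())
    best: Optional[str] = None
    best_score = 0
    for dd, keywords in ddc.items():
        score = sum(words.count(keyword) for keyword in keywords)
        if score > best_score:
            best, best_score = dd, score
    return best


chosen = predict_dd("Jag vill gå på Mats Blombergs seminarium.", DDC)
```

Exact word matching is the simplest possible scoring; inflected forms would need stemming or a richer keyword list, and the context and user-model mechanisms mentioned on the slide are not modelled here.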

  31. Interaction Manager • controls and synchronises the components • priority structures • topic prediction – predicts which DD to use • supervises the DE • may suggest plausible parameters based on the context & user models • supervises the interaction with the user • error detection, management • deadline management etc.

  32. Interaction Manager – How? • event based • autonomous modules (software agents) • carry out one atomic task each • are triggered by a set of preconditions • high level of parallelism • concurrency • cooperation • centralised information management - blackboard • all information is available for all modules • information is not destroyed • information handling through a subscribe – notify – fetch mechanism
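
The blackboard idea on this slide (central storage, nothing destroyed, modules that register interest, get notified, and then fetch) can be sketched as below. This is a single-threaded illustration under assumed names (`Blackboard`, `TopicAgent`, `post`, `history`); the concurrency and precondition sets of the real agents are left out.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class Blackboard:
    """Hypothetical append-only blackboard with subscribe/notify/fetch."""

    def __init__(self) -> None:
        self._entries: DefaultDict[str, List[Any]] = defaultdict(list)
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, key: str, callback: Callable[[str], None]) -> None:
        self._subscribers[key].append(callback)

    def post(self, key: str, value: Any) -> None:
        # Append only: earlier values are kept, never overwritten.
        self._entries[key].append(value)
        # Notify carries just the key; subscribers fetch the data themselves.
        for callback in list(self._subscribers[key]):
            callback(key)

    def fetch(self, key: str) -> Any:
        values = self._entries.get(key)
        return values[-1] if values else None

    def history(self, key: str) -> List[Any]:
        return list(self._entries.get(key, []))


class TopicAgent:
    """Agent with one atomic task: derive a topic from a new utterance."""

    def __init__(self, board: Blackboard) -> None:
        self.board = board
        board.subscribe("utterance", self.on_notify)

    def on_notify(self, key: str) -> None:
        text = self.board.fetch(key)
        if "seminarium" in text.lower():
            self.board.post("topic", "seminar")


board = Blackboard()
agent = TopicAgent(board)
board.post("utterance", "Jag vill gå på Mats Blombergs seminarium.")
board.post("utterance", "Hej")
```

After these two posts, the latest topic is still available through `fetch("topic")`, and both utterances remain in `history("utterance")`.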

  33. Plug & play dialogues [diagram labels: Application Interface; Interaction Manager; Dialogue description collection; Keyword handler; A-Agent; Blackboard; A-Agent; Dialogue Engine; VoiceXML; notify; VoiceXML activator (JAXB translator); A-Agent; Dialogue bridge; Dialog interpreter; ATLAS Speech Technology API]

  34. Dialogue Engine • Internal parallel slot structures • system prompt • acceptable answers • reprompts etc. • Parallel system slots • used for predictions, • available for UM, CM • Parallel application specific slots • related information • available for DKM

  35. Interpretation • go to next empty slot • ask the prompt • interpret the answer • fill the slot • … or re-prompt • if all slots filled - successful transaction • AI sends the required parameters and commands to the application • a following DD, if any, is activated • unsuccessful transaction • the DD with all parameters is saved • specific DD for error management is activated • error management
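
The slot-filling loop outlined on this slide could be sketched as follows. `Slot`, `interpret`, and the `max_reprompts` limit are assumptions for illustration; the slide does not say how many re-prompts are allowed or how answers are validated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Slot:
    """Hypothetical slot: a prompt plus a test for acceptable answers."""
    name: str
    prompt: str
    accept: Callable[[str], bool]
    value: Optional[str] = None


def interpret(
    slots: List[Slot],
    ask: Callable[[str], str],
    max_reprompts: int = 2,
) -> Optional[Dict[str, Optional[str]]]:
    """Fill each empty slot in order; return None on an unsuccessful transaction."""
    for slot in slots:
        if slot.value is not None:
            continue  # already filled, go to the next empty slot
        for _attempt in range(1 + max_reprompts):
            answer = ask(slot.prompt)
            if slot.accept(answer):
                slot.value = answer
                break
        else:
            # Re-prompts exhausted: the caller would hand over to error management.
            return None
    return {slot.name: slot.value for slot in slots}


# Usage with canned answers: the first answer is rejected and re-prompted.
answers = iter(["maybe", "yes", "200"])
slots = [
    Slot("confirm", "Do you want to transfer money?", lambda a: a in {"yes", "no"}),
    Slot("amount", "How much?", str.isdigit),
]
result = interpret(slots, lambda prompt: next(answers))
```

On success the returned dictionary corresponds to the parameters the AI would pass to the application; on failure the partially filled `slots` list is left intact, so it can be saved as the slide describes.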

  36. What is left to be done? • NLP analysis to be integrated in Atlas and SesaME • NLP generation in SesaME • more elaborate dialogue management formalism in SesaME • support for adaptation and personalisation • enabling conversational dialogues

  37. The End
