
Adapting and Learning Dialogue Models

Presentation Transcript


  1. Adapting and Learning Dialogue Models Discourse & Dialogue CMSC 35900-1 November 19, 2006

  2. Roadmap • The Problem: Portability • Task domain: Call-routing • Porting: • Speech recognition • Call-routing • Dialogue management • Conclusions • Learning DM strategies • HMMs and POMDPs

  3. SLS Portability • Spoken language system design • Record or simulate user interactions • Collect vocabulary, sentence style, sequence • Transcribe/label • Expert creates vocabulary, language model, dialogue model • Problem: Costly, time-consuming, requires an expert

  4. Call-routing • Goal: Given an utterance, identify its type • Dispatch to the right operator • Classification task: • Manual rules or data-driven methods • Feature-based classification (Boosting) • Pre-defined types, e.g.: • Hello? -> Hello; I have a question -> request(info) • I would like to know my balance. -> request(balance)

  5. Dialogue Management • Flow Controller • Pluggable dialogue strategy modules • ATN: call-flow, easy to augment, manage context • Inputs: context, semantic rep. of utterance • ASR • Language models • Trigrams, in probabilistic framework
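
The trigram language models mentioned above can be sketched as maximum-likelihood n-gram counts with simple add-one smoothing; the toy corpus and the smoothing choice are illustrative assumptions, not the actual setup of the system in the slides.

```python
# Minimal sketch of a trigram language model with add-one (Laplace) smoothing.
from collections import defaultdict

def train_trigram_lm(sentences):
    """Count trigrams and their bigram contexts from tokenized sentences."""
    tri, bi, vocab = defaultdict(int), defaultdict(int), set()
    for sent in sentences:
        toks = ["<s>", "<s>"] + sent + ["</s>"]
        vocab.update(toks)
        for i in range(len(toks) - 2):
            tri[tuple(toks[i:i + 3])] += 1
            bi[tuple(toks[i:i + 2])] += 1
    return tri, bi, vocab

def trigram_prob(tri, bi, vocab, w1, w2, w3):
    """P(w3 | w1, w2) with add-one smoothing over the vocabulary."""
    return (tri[(w1, w2, w3)] + 1) / (bi[(w1, w2)] + len(vocab))

# Toy in-domain corpus (invented for illustration)
corpus = [["i", "want", "my", "balance"], ["i", "want", "help"]]
tri, bi, vocab = train_trigram_lm(corpus)
p = trigram_prob(tri, bi, vocab, "i", "want", "my")  # -> 2/9 on this corpus
```

In practice such models are trained on large transcribed corpora; the adaptation question on the next slide is what to do when no in-domain transcriptions exist.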

  6. Adaptation: ASR • ASR: Language models • Usually trained from in-domain transcriptions • Here: out-of-domain transcriptions • Switchboard, spoken dialogue (telecom, insurance) • In-domain web pages • New domain: pharmaceuticals • Style differences: SLS favors pronouns; medical web data covers OOV best • Best accuracy: spoken dialogue+web • SWBD too big/slow

  7. Adaptation: Call-routing • Manual tagging: Slow, expensive • Here: Existing out-of-domain labeled data • Meta call-types: Library • Generic: all apps • Re-usable: in-domain types that already exist • Specific: only this app • Grouping done by experts • Bootstrap: Start with generic, reusable types

  8. Call-type Classification • Boostexter: word n-gram features; 1,100 iterations • Features drawn from ASR output • Telecom-based call-type library • Two classifications: reject-yn; classification • In-domain: true transcripts: 78%; ASR output: 62% • Generic: tested on generic: 95%; 91% • Bootstrap: generic+reuse+rules: 79%, 68%
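
BoosTexter itself is not reproduced here, but the idea it implements — boosting weak learners over word n-gram presence features — can be sketched with AdaBoost over one-feature decision stumps. The utterances, labels, and round count below are illustrative assumptions, not the paper's data.

```python
# Hedged sketch: AdaBoost over word n-gram presence stumps for binary
# call-type classification (request(balance) = +1 vs. other = -1).
import math

def extract_ngrams(tokens, n_max=2):
    """Word uni- and bigram presence features."""
    feats = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.add(" ".join(tokens[i:i + n]))
    return feats

def train_adaboost(data, rounds=5):
    """data: list of (feature_set, label in {-1, +1})."""
    all_feats = set().union(*(f for f, _ in data))
    w = [1.0 / len(data)] * len(data)
    model = []  # list of (feature, alpha)
    for _ in range(rounds):
        # Pick the stump (predict +1 iff feature present) with lowest weighted error
        best = None
        for f in all_feats:
            err = sum(wi for wi, (feats, y) in zip(w, data)
                      if (1 if f in feats else -1) != y)
            if best is None or err < best[1]:
                best = (f, err)
        f, err = best
        err = max(err, 1e-10)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((f, alpha))
        # Re-weight: mistakes gain weight, correct examples lose it
        w = [wi * math.exp(-alpha * y * (1 if f in feats else -1))
             for wi, (feats, y) in zip(w, data)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model

def classify(model, feats):
    score = sum(alpha * (1 if f in feats else -1) for f, alpha in model)
    return 1 if score >= 0 else -1

# Toy labeled utterances (invented)
train = [
    (extract_ngrams("i would like to know my balance".split()), 1),
    (extract_ngrams("what is my balance".split()), 1),
    (extract_ngrams("i have a question".split()), -1),
    (extract_ngrams("hello".split()), -1),
]
model = train_adaboost(train)
pred = classify(model, extract_ngrams("check my balance please".split()))
```

The real system runs many more rounds (1,100) over far more features; reject-yn can be handled as a second binary classifier of the same form.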

  9. Dialogue Model • Build dialogue strategy templates • Based on call-type classification • Generic: • E.g., yes, no, hello, repeat, help • Trigger a generic, context-dependent reply • Tag as vague/concrete: • Vague: “I have a question” -> clarification • Concrete: clear routing, attributes – sub-dialogues
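
The vague/concrete/generic distinction above amounts to a strategy-template lookup keyed by call type. The call-type labels, prompts, and routes in the sketch below are hypothetical illustrations, not the deployed templates.

```python
# Sketch: dialogue-strategy templates keyed by classified call type.
# All call types, prompts, and routes here are invented examples.
CALL_TYPES = {
    "hello":            {"kind": "generic"},
    "request(info)":    {"kind": "vague"},
    "request(balance)": {"kind": "concrete", "route": "billing"},
}

def next_action(call_type):
    """Map a classified call type to a dialogue move."""
    info = CALL_TYPES.get(call_type)
    if info is None or info["kind"] == "vague":
        # Vague types like "I have a question" trigger clarification
        return ("clarify", "What can I help you with, specifically?")
    if info["kind"] == "generic":
        # Generic types get a context-dependent canned reply
        return ("generic_reply", "Hello, how may I help you?")
    # Concrete types route directly; sub-dialogues could fill attributes here
    return ("route", info["route"])
```

For example, `next_action("request(balance)")` returns a routing move, while `next_action("request(info)")` asks a clarification question.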

  10. Dialogue Model Porting • Evaluation: • Compare to original transcribed dialogues • Task 1: DM category: 32 clusters of calls • Bootstrap 16 categories – 70% of instances • Using call-type classifiers: get class, confidence, concreteness • If confident/concrete/correct -> correct • If incorrect, error • Also classify vague/generic • 67-70% accuracy for DM, routing task

  11. Conclusions • Portability: • Bootstrapping of ASR, Call-type, DM • Generally effective • Call-type success high • Others: potential

  12. Learning DM Strategies • Prior approaches: • Hand-coded: state-, frame- or agent-based • Adaptation bootstraps from existing structure • Alternative: • Capture prior interaction patterns • Learn dialogue structure and management

  13. Training HMM DM • Construct training corpus • E.g. Record human-human interactions • Identify and label states • Train HMM dialogue management • Use tagged sequences to learn • Correspondences between utterances and states • State transition probabilities • Effective, but still requires initial tagging
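
With fully tagged sequences, learning the state-transition probabilities reduces to maximum-likelihood counting. The dialogue-act states below are invented for illustration; a real corpus would use the tag set chosen during labeling.

```python
# Sketch: ML state-transition estimates from hand-tagged dialogue sequences.
from collections import Counter, defaultdict

def estimate_transitions(state_sequences):
    """Each sequence is a list of dialogue-act states from tagged logs."""
    counts = defaultdict(Counter)
    for seq in state_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    # Normalize counts into P(next_state | state)
    return {s: {t: c / sum(nxts.values()) for t, c in nxts.items()}
            for s, nxts in counts.items()}

# Toy tagged corpus (invented states)
dialogues = [
    ["greet", "ask_task", "confirm", "close"],
    ["greet", "ask_task", "clarify", "ask_task", "confirm", "close"],
]
P = estimate_transitions(dialogues)
# P["ask_task"]["confirm"] -> 2/3 (ask_task is followed by confirm twice, clarify once)
```

The emission side — which utterances correspond to which state — would be estimated the same way from the utterance labels, completing the HMM.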

  14. Reinforcement Learning • Model dialogues with (partially observable) Markov decision processes • Users form the stochastic environment • Actions are system utterances • State is the dialogue so far • Goal: maximize some utility measure • Task completion/user satisfaction • Learn a policy – a mapping from states to actions • That optimizes the utility measure
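
As a toy illustration of the fully observable MDP case, the sketch below learns a policy by tabular Q-learning against a hypothetical stochastic user in a two-slot form-filling task. All states, rewards, and probabilities are assumptions made for the sketch, not values from the cited systems (which are far larger and partially observable).

```python
# Hedged sketch: tabular Q-learning of a dialogue policy in a toy MDP.
# State = number of filled slots (0..2); actions are system moves.
import random

ACTIONS = ["ask_slot", "confirm"]

def step(state, action, rng):
    """Simulated stochastic user environment (all numbers are assumptions)."""
    if action == "ask_slot":
        if state < 2 and rng.random() < 0.8:   # user answers usefully 80% of the time
            return state + 1, -0.05, False     # small per-turn cost
        return state, -0.05, False
    # confirm: task succeeds only when both slots are filled
    if state == 2:
        return state, 1.0, True
    return state, -0.5, False                  # premature confirmation annoys the user

def q_learn(episodes=3000, alpha=0.2, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(3) for a in ACTIONS}
    for _ in range(episodes):
        s, done, turns = 0, False, 0
        while not done and turns < 20:
            # epsilon-greedy action selection
            a = (rng.choice(ACTIONS) if rng.random() < eps
                 else max(ACTIONS, key=lambda x: Q[(s, x)]))
            s2, r, done = step(s, a, rng)
            target = r + gamma * max(Q[(s2, x)] for x in ACTIONS) * (not done)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, turns = s2, turns + 1
    return Q

Q = q_learn()
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(3)}
# Expected learned policy: ask for slots until both are filled, then confirm
```

The state-space explosion mentioned on the next slide is visible even here: each extra slot or slot value multiplies the table, which is why Young et al. collapse alternative slot fillers.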

  15. Applications • Toot – train information • Litman, Kearns, et al • Learned different initiative/confirmation strategies • Air travel bookings (Young et al 2006) • Problem: huge number of possible states • More airports, dramatically more possible utterances • Approach: Collapse all alternative slot fillers • Represent with a single default

  16. Turn-taking Discourse and Dialogue CS 35900-1 November 16, 2004

  17. Agenda • Motivation • Silence in Human-Computer Dialogue • Turn-taking in human-human dialogue • Turn-change signals • Back-channel acknowledgments • Maintaining contact • Exploiting these cues to improve human-computer communication • Automatic identification of disfluencies, jump-in points, and jump-ins

  18. Turn-taking in HCI • Human turn end: • Detected by 250ms silence • System turn end: • Signaled by end of speech • Indicated by any human sound • Barge-in • Continued attention: • No signal
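
The 250 ms silence rule for detecting the end of a human turn can be sketched over frame-level voice-activity labels. The 10 ms frame size is an assumption for the sketch; real systems get voice activity from the ASR front end.

```python
# Sketch: silence-based endpointing — declare the user's turn over once
# 250 ms of continuous silence follows speech (10 ms frames assumed).
def turn_end_frame(vad, frame_ms=10, silence_ms=250):
    """vad: per-frame voice activity (1 = speech, 0 = silence).
    Returns the frame index where the turn is declared over, or None."""
    needed = silence_ms // frame_ms
    run, seen_speech = 0, False
    for i, v in enumerate(vad):
        if v:
            seen_speech, run = True, 0   # speech resets the silence counter
        else:
            run += 1
            if seen_speech and run >= needed:
                return i
    return None

frames = [1] * 30 + [0] * 40   # 300 ms of speech, then 400 ms of silence
end = turn_end_frame(frames)   # -> 54: the 25th silent frame after speech
```

Its weakness is exactly what the surrounding slides discuss: a mid-utterance pause longer than the threshold is misread as a turn end, and none of the human turn-yielding cues are consulted.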

  19. Yielding & Taking the Floor • Turn change signal • Offer floor to auditor/hearer • Cues: pitch fall, lengthening, “but uh”, end gesture, amplitude drop+’uh’, end clause • Likelihood of change increases with more cues • Negated by any gesticulation • Speaker-state signal: • Shift in head direction AND/OR Start of gesture

  20. Retaining the Floor • Within-turn signal • Still speaker: Look at hearer as clause ends • Continuation signal • Still speaker: Look away after within-turn/back-channel • Back-channel: • ‘mm-hm’/okay/etc.; nods; sentence completion; clarification request; restatement • NOT a turn: signals attention, agreement, confusion

  21. Improving Human-Computer Turn-taking • Identifying cues to turn change and turn start • Meeting conversations: • Recorded, natural research meetings • Multi-party • Overlapping speech • Units = “Spurts” between 500ms silence

  22. Tasks • Sentence/disfluency/non-boundary ID • End of sentence, break off, continue • Jump-in points • Times when others “jump in” • Jump-in words • Interruption vs start from silence • Off- and on- line • Language model and/or prosodic cues

  23. Text + Prosody • Text sequence: • Modeled as n-gram language model • Hidden event prediction – e.g. boundary as hidden state • Implement as HMM • Prosody: • Duration, Pitch, Pause, Energy • Decision trees: classify + probability • Integrate LM + DT
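
One simple way to realize the "Integrate LM + DT" step is log-linear interpolation of the two posteriors for the hidden boundary event. The interpolation weight and the probabilities below are illustrative assumptions, not the values used in the work described.

```python
# Sketch: combining the language-model posterior P(boundary | words) with the
# prosodic decision-tree posterior P(boundary | prosody) in the log domain.
import math

def combine(p_lm, p_dt, lam=0.5):
    """Log-linear interpolation of two boundary posteriors.
    lam weights the LM; both inputs must be strictly between 0 and 1."""
    log_yes = lam * math.log(p_lm) + (1 - lam) * math.log(p_dt)
    log_no = lam * math.log(1 - p_lm) + (1 - lam) * math.log(1 - p_dt)
    zy, zn = math.exp(log_yes), math.exp(log_no)
    return zy / (zy + zn)   # renormalize over {boundary, no boundary}

p = combine(0.7, 0.9)   # both knowledge sources agree -> confident boundary
```

When the two sources disagree, the combined score falls between them, which is the behavior that lets prosody rescue boundaries the language model misses (and vice versa).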

  24. Interpreting Breaks • For each inter-word position: • Is it a disfluency, sentence end, or continuation? • Key features: • Pause duration, vowel duration • 62% accuracy wrt 50% chance baseline • ~90% overall • Best combines LM & DT

  25. Jump-in Points • Possible turn changes • Points WITHIN a spurt where a new speaker starts • Key features: • Pause duration, low energy, pitch fall • No lexical/punctuation features used • Forward features useless • Look like sentence boundaries but aren’t • Accuracy: 65% wrt 50% baseline • Performance depends only on preceding prosodic features

  26. Jump-in Features • Do people speak differently when jumping in? • Do jump-ins differ from regular turn starts? • Examine only first words of turns • No LM • Key features: • Raised pitch, raised amplitude • Accuracy: 77% wrt 50% baseline • Prosody only

  27. Summary • Prosodic features signal conversational moves • Pause and vowel duration distinguish sentence end, disfluency, and fluent continuation • Jump-ins occur at locations that sound like sentence ends • Speakers raise their voices when jumping in
