
Turn-Yielding Cues in Task-Oriented Dialogue
Agustín Gravano (1,2), Julia Hirschberg (1)
(1) Columbia University, New York, USA; (2) Universidad de Buenos Aires, Argentina


Presentation Transcript


  1. Turn-Yielding Cues in Task-Oriented Dialogue • Agustín Gravano (1,2), Julia Hirschberg (1) • (1) Columbia University, New York, USA • (2) Universidad de Buenos Aires, Argentina

  2. Introduction Interactive Voice Response (IVR) Systems • Quickly spreading. • “Uncomfortable”, “awkward”. • ASR and TTS account for most IVR problems. • Other problems have also been revealed: • Coordination of system-user exchanges. • Long pauses after user turns; interruptions. • Modeling turn-taking behavior should lead to improved system-user coordination. Agustín Gravano SIGdial 2009

  3. Introduction Goal • Learn when the speaker is likely to end her/his conversational turn. • Find turn-yielding cues. • Cues displayed by the speaker when approaching a potential turn boundary. • This should improve the coordination of IVRs: • Speech understanding: Detect the end of the user’s turn. • Speech generation: Display cues signalling the end of the system’s turn.

  4. Talk Outline • Previous work • Material • Method • Results • Conclusions

  5. Previous Work on Turn-Taking • Duncan 1972, 1973, 1974, inter alia. • Hypothesized 6 turn-yielding cues in face-to-face dialogue. • Conjectured a linear relation between the number of displayed cues and the likelihood of a turn-taking attempt. • Later studies formalized and verified some of Duncan’s hypotheses [For&Tho96; Wen&Sie03; Cut&Pea86; Wic&Cas01]. • Implementations of turn-boundary detection. • Simulations [Fer&al.02,03; Edl&al.05; Sch06; Att&al.08; Bau08]. • Actual systems: Let’s Go! [Rau&Esk08]. • Exploiting turn-yielding cues improves performance.

  6. Material Columbia Games Corpus • 12 task-oriented spontaneous dialogues. • Standard American English. • 13 subjects: 6 female, 7 male. • Series of collaborative computer games. • No eye contact. No speech restrictions. • 9 hours of dialogue. • Manual orthographic transcription, alignment. • Manual prosodic annotations (ToBI).

  7. Material Columbia Games Corpus [Figure: Player 1: Describer; Player 2: Follower]

  8. Turn-Yielding Cues • Cues displayed by the speaker when approaching a potential turn boundary.

  9. Turn-Yielding Cues Method [Figure: Speaker A produces IPU1 and IPU2; Speaker B produces IPU3. Labels: Hold, Smooth switch.] • IPU (Inter-Pausal Unit): Maximal sequence of words from the same speaker surrounded by silence ≥ 50 ms. • Hold: after a pause, speaker A continues with another IPU (no turn change). • Smooth switch: Speaker A finishes her utterance; speaker B takes the turn with no overlapping speech. • Trained annotators distinguished Smooth switches from Interruptions and Backchannels using a scheme based on Ferguson 1977 and Beattie 1982.
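The IPU definition above can be sketched in Python. This is a minimal sketch, assuming time-aligned words stored in hypothetical `Word`/`IPU` classes and silence measured separately on each speaker's channel; it is an illustration, not the tooling used for the corpus.

```python
from dataclasses import dataclass, field

MIN_PAUSE = 0.050  # seconds; silence >= 50 ms separates two IPUs


@dataclass
class Word:
    # Hypothetical time-aligned word; times in seconds.
    speaker: str
    text: str
    start: float
    end: float


@dataclass
class IPU:
    speaker: str
    words: list = field(default_factory=list)

    @property
    def start(self) -> float:
        return self.words[0].start

    @property
    def end(self) -> float:
        return self.words[-1].end


def segment_ipus(words):
    """Group words into IPUs: maximal same-speaker runs whose internal pauses are < MIN_PAUSE."""
    ipus = []
    for speaker in sorted({w.speaker for w in words}):
        own = sorted((w for w in words if w.speaker == speaker), key=lambda w: w.start)
        current = IPU(speaker)
        for w in own:
            # A pause of at least MIN_PAUSE on this speaker's channel closes the current IPU.
            if current.words and w.start - current.end >= MIN_PAUSE:
                ipus.append(current)
                current = IPU(speaker)
            current.words.append(w)
        if current.words:
            ipus.append(current)
    # Interleave both speakers' IPUs in time order.
    return sorted(ipus, key=lambda u: u.start)


words = [
    Word("A", "go", 0.00, 0.30),
    Word("A", "left", 0.32, 0.60),   # 20 ms pause: same IPU
    Word("A", "okay", 0.90, 1.20),   # 300 ms pause: new IPU
    Word("B", "right", 1.50, 1.80),
]
for ipu in segment_ipus(words):
    print(ipu.speaker, [w.text for w in ipu.words])
```

Sorting each speaker's words separately means overlapping speech from the other speaker never splits an IPU, matching the "same speaker" wording of the definition.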

  10. Turn-Yielding Cues Method [Figure: Hold / Smooth switch diagram, as in slide 9] • To find turn-yielding cues, we compare: • IPUs preceding Holds, • IPUs preceding Smooth switches. • ~200 features: acoustic, prosodic, lexical, syntactic.
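A minimal sketch of the comparison, assuming each IPU is a dict of feature values plus a hypothetical `next_transition` key holding the annotated label of the transition that follows it ('H' for Hold, 'S' for Smooth switch). The feature name `intensity_db` and all numbers are invented for illustration; only class means are computed, since the slides do not specify the statistical tests used.

```python
from statistics import mean


def class_means(ipus, feature):
    """Mean of `feature` over IPUs preceding Holds ('H') and Smooth switches ('S').

    IPUs followed by any other transition type (e.g. interruptions or
    backchannels) are skipped. statistics.mean raises if a class is empty.
    """
    groups = {"H": [], "S": []}
    for ipu in ipus:
        label = ipu["next_transition"]
        if label in groups:
            groups[label].append(ipu[feature])
    return {label: mean(values) for label, values in groups.items()}


# Toy data, invented for illustration only (not from the corpus).
ipus = [
    {"next_transition": "H", "intensity_db": 62.0},
    {"next_transition": "H", "intensity_db": 64.0},
    {"next_transition": "S", "intensity_db": 58.0},
    {"next_transition": "S", "intensity_db": 60.0},
    {"next_transition": "BC", "intensity_db": 55.0},  # backchannel: ignored
]
means = class_means(ipus, "intensity_db")
print(means)  # {'H': 63.0, 'S': 59.0}
```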

  11. Turn-Yielding Cues Individual Cues • Compared with IPUs preceding Holds, IPUs preceding Smooth switches show: • Final intonation: • Falling (L-L%) or high-rising (H-H%). • Faster speaking rate. • Reduction of final lengthening. • Lower intensity level. • Lower pitch level. • Higher jitter, shimmer, NHR (noise-to-harmonics ratio). • Related to perception of voice quality. • Longer IPU duration (in seconds and in number of words).

  12. Turn-Yielding Cues Individual Cues • Textual completion (independent of intonation). (1) Manually annotated a portion of the data: labelers read up to the end of a target IPU (no right context) and judged whether it could constitute a ‘complete’ utterance. 400 tokens; κ = 0.81. (2) Trained an SVM classifier on 19 lexical and syntactic features. Accuracy: 80%. Majority-class baseline: 55%. Human agreement: 91%. (3) Labeled all IPUs in the corpus with the SVM model. • Before smooth switches: complete 82%, incomplete 18%. • Before holds: complete 47%, incomplete 53%. (χ² test, p ≈ 0)
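The agreement and baseline figures above can be computed as follows. This sketch assumes two labelers and Cohen's κ; the slide does not say which κ statistic was used. `cohens_kappa` and `majority_baseline` are hypothetical helper names, and the labels are toy data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with their own label rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)  # undefined if p_e == 1 (both use one label only)


def majority_baseline(labels):
    """Accuracy of always predicting the most frequent label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


# Toy labels ('C' = complete, 'I' = incomplete), invented for illustration.
a = ["C", "C", "I", "I"]
b = ["C", "I", "I", "I"]
print(cohens_kappa(a, b))    # 0.5
print(majority_baseline(b))  # 0.75
```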

  13. Turn-Yielding Cues Individual Cues • Final intonation: L-L% or H-H%. • Faster speaking rate. • Lower intensity level. • Lower pitch level. • Higher jitter, shimmer, NHR. • Longer IPU duration. • Textual completion.

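  14. Turn-Yielding Cues Defining Presence of a Cue • 2-3 representative features for each cue: • Define presence/absence based on whether the value is closer to the mean before Smooth switches (S) or to the mean before Holds (H): the cue is present if the value is closer to the S mean, absent if it is closer to the H mean.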
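The presence rule can be written as a nearest-mean test. A sketch, assuming the per-class means are already known; the slide does not state how the 2-3 features of a cue are combined, so the majority vote in the hypothetical `cue_present_multi` is an assumption, as is treating ties as absent.

```python
def cue_present(value, mean_before_s, mean_before_h):
    """True if `value` is closer to the mean before Smooth switches than to the mean before Holds.

    Ties count as absent (an assumption).
    """
    return abs(value - mean_before_s) < abs(value - mean_before_h)


def cue_present_multi(observations):
    """Combine a cue's features by majority vote (assumed, not from the slides).

    `observations` is a list of (value, mean_before_s, mean_before_h) tuples, one per feature.
    """
    votes = [cue_present(v, s, h) for v, s, h in observations]
    return sum(votes) > len(votes) / 2


# Hypothetical intensity feature: mean 59 dB before S, 63 dB before H.
print(cue_present(59.5, 59.0, 63.0))  # True
print(cue_present(62.0, 59.0, 63.0))  # False
```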

  15. Top Frequencies of Complex Cues • Legend: digit = cue present; dot = cue absent. • Turn-yielding cues: 1: Final intonation; 2: Speaking rate; 3: Intensity level; 4: Pitch level; 5: IPU duration; 6: Voice quality; 7: Completion.
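A sketch of the digit/dot encoding and of counting the most frequent patterns, assuming one boolean flag per cue in the order listed above; `cue_pattern` and `top_patterns` are hypothetical names.

```python
from collections import Counter

CUES = [
    "Final intonation", "Speaking rate", "Intensity level", "Pitch level",
    "IPU duration", "Voice quality", "Completion",
]


def cue_pattern(present):
    """Encode one IPU's cues as a string: digit = cue present, dot = cue absent."""
    if len(present) != len(CUES):
        raise ValueError(f"expected {len(CUES)} flags, got {len(present)}")
    return "".join(str(i + 1) if flag else "." for i, flag in enumerate(present))


def top_patterns(ipu_flags, k=10):
    """Most frequent cue patterns over a collection of IPUs, as (pattern, count) pairs."""
    return Counter(cue_pattern(flags) for flags in ipu_flags).most_common(k)


print(cue_pattern([True, True, False, True, False, False, True]))  # 12.4..7
```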

  16. Turn-Yielding Cues Combined Cues [Figure: percentage of turn-taking attempts (y-axis) vs. number of cues conjointly displayed (x-axis); r² = 0.969]
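The r² value summarizes how well a straight line fits the points in the plot. Below is a plain-Python sketch of an ordinary least-squares fit and its coefficient of determination; the slide's data points are not recoverable from this transcript, so the usage example uses made-up values.

```python
def linear_fit_r2(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b, r2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # r2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


# Made-up points (x = number of cues, y = % of turn-taking attempts), illustration only.
a, b, r2 = linear_fit_r2([0, 1, 2], [10, 20, 25])
print(a, b, r2)
```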

  17. Turn-Yielding Cues IVR Systems • After each IPU from the user: if estimated likelihood (of a turn-taking attempt, given the cues displayed) > threshold then take the turn. • To signal the end of a system’s turn: Include as many cues as possible in the system’s final IPU.
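A sketch of the decision rule, assuming (hypothetically) that the likelihood of a turn-taking attempt is estimated as a linear function of the number of cues displayed, in line with the combined-cues result. `intercept`, `slope`, and `threshold` are placeholder parameters that would have to be fitted or tuned; they are not values from the study.

```python
def estimated_likelihood(num_cues, intercept, slope):
    """Hypothetical linear estimate of P(turn-taking attempt), clipped to [0, 1]."""
    return min(1.0, max(0.0, intercept + slope * num_cues))


def should_take_turn(cues_present, intercept, slope, threshold):
    """After a user IPU: take the turn if the estimated likelihood exceeds the threshold.

    `cues_present` holds one boolean per turn-yielding cue detected in that IPU.
    """
    return estimated_likelihood(sum(cues_present), intercept, slope) > threshold


# Placeholder parameters, not values from the study.
INTERCEPT, SLOPE, THRESHOLD = 0.0, 0.1, 0.5
user_ipu_cues = [True, True, True, True, True, True, False]  # 6 of 7 cues present
print(should_take_turn(user_ipu_cues, INTERCEPT, SLOPE, THRESHOLD))  # True
```

Raising the threshold makes the system wait for more cues before taking the turn (fewer interruptions, longer pauses); lowering it trades in the opposite direction.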

  18. Summary • Study of turn-yielding cues. • Objective, automatically computable. • Combined cues. • Improve turn-taking decisions of IVR systems. • Results drawn from task-oriented dialogues. • Not necessarily generalizable. • Suitable for most IVR domains. • Interspeech 2009: Study of backchannel-inviting cues.

  19. Special thanks to… • Julia Hirschberg • Thesis Committee Members • Maxine Eskenazi, Kathy McKeown, Becky Passonneau, Amanda Stent. • Speech Lab at Columbia University • Stefan Benus, Fadi Biadsy, Sasha Caskey, Bob Coyne, Frank Enos, Martin Jansche, Jackson Liscombe, Sameer Maskey, Andrew Rosenberg. • Collaborators • Gregory Ward and Elisa Sneed German (Northwestern U); Ani Nenkova (UPenn); Héctor Chávez, David Elson, Michel Galley, Enrique Henestroza, Hanae Koiso, Shira Mitchell, Michael Mulley, Kristen Parton, Ilia Vovsha, Lauren Wilcox.
