
Intonational Variation in Spoken Dialogue Systems

Intonational Variation in Spoken Dialogue Systems: Generation and Understanding. Julia Hirschberg, Charles University, March 2001.

Presentation Transcript


  1. Intonational Variation in Spoken Dialogue Systems Generation and Understanding Julia Hirschberg Charles University March 2001

  2. Talking to a Machine… and Getting an Answer • Today’s spoken dialogue systems make it possible to accomplish real tasks, over the phone, without talking to a person • Real-time speech technology enables real-time interaction • Speech recognition and understanding is ‘good enough’ for limited, goal-directed interactions • Careful dialogue design can be tailored to capabilities of component technologies • Limited domain • Judicious use of system initiative vs. mixed initiative

  3. Some Representative Spoken Dialogue Systems [Figure residue: systems plotted against a time axis (1980+, 1990+, 1993+, 1995+, 1997+, 1999+), with the labels Deployed, System Initiative, Mixed Initiative and User. Systems shown: Brokerage (Schwab-Nuance); E-Mail Access (myTalk); Air Travel (UA Info-SpeechWorks); Directory Assistant (BNR); Communicator (DARPA Travel); MIT Galaxy/Jupiter; Communications (Wildfire, Portico); Customer Care (HMIHY – AT&T); Banking (ANSER); ATIS (DARPA Travel); Multimodal Maps (Trains, Quickset); Train Schedule (ARISE).]

  4. But we have a long way to go…

  5. Course Overview • Spoken Dialogue Systems today • Evaluating their weaknesses • Role of intonational variation • Importance of corpora and conventions for annotating them • Intonational ‘meanings’ • Prosody in Speech Generation • Prosody in Speech Recognition/ Understanding

  6. Course Overview • Spoken Dialogue Systems today • Evaluating their strengths and weaknesses • Role of intonational variation • Importance of corpora and conventions for annotating them • Intonational ‘meanings’ • Prosody in Speech Generation • Prosody in Speech Recognition/ Understanding

  7. Evaluating Dialogue Systems • PARADISE framework (Walker et al ’00) • “Performance” of a dialogue system is affected both by what gets accomplished by the user and the dialogue agent and how it gets accomplished • [Diagram: Maximize Task Success; Minimize Costs (Efficiency Measures, Qualitative Measures)]

  8. Task Success • Task goals seen as Attribute-Value Matrix (AVM) • ELVIS e-mail retrieval task (Walker et al ‘97) • “Find the time and place of your meeting with Kim.”

     Attribute            Value
     Selection Criterion  Kim or Meeting
     Time                 10:30 a.m.
     Place                2D516

     • Task success defined by match between AVM values at end of dialogue with “true” values for AVM
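
The AVM comparison on slide 8 can be sketched as follows. This is a minimal illustration that assumes task success is scored as the fraction of attributes whose end-of-dialogue value equals the reference value. The function name `avm_match`, the dictionary encoding and the "observed" values are assumptions made for the example; a chance-corrected measure such as Kappa is not implemented here.

```python
def avm_match(observed, reference):
    """Return the fraction of reference attributes whose observed value matches."""
    if not reference:
        raise ValueError("reference AVM must not be empty")
    hits = sum(1 for attr, value in reference.items() if observed.get(attr) == value)
    return hits / len(reference)


# Reference ("true") values taken from the example task on the slide.
reference = {
    "Selection Criterion": "Kim or Meeting",
    "Time": "10:30 a.m.",
    "Place": "2D516",
}

# Hypothetical end-of-dialogue values: the user ended up with the wrong room.
observed = {
    "Selection Criterion": "Kim or Meeting",
    "Time": "10:30 a.m.",
    "Place": "2D519",
}

score = avm_match(observed, reference)
print(f"AVM match: {score:.2f}")
```

Two of the three attributes match in this invented dialogue, so the score falls between 0 (nothing matched) and 1 (complete match).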

  9. Metrics • Efficiency of the Interaction: User Turns, System Turns, Elapsed Time • Quality of the Interaction: ASR rejections, Time Out Prompts, Help Requests, Barge-Ins, Mean Recognition Score (concept accuracy), Cancellation Requests • User Satisfaction • Task Success: perceived completion, information extracted

  10. Experimental Procedures • Subjects given specified tasks • Spoken dialogues recorded • Cost factors, states, dialog acts automatically logged; ASR accuracy, barge-in hand-labeled • Users specify task solution via web page • Users complete User Satisfaction surveys • Use multiple linear regression to model User Satisfaction as a function of Task Success and Costs; test for significant predictive factors
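
The regression step on slide 10 can be sketched in plain Python. This is only an illustration under stated assumptions: the data are synthetic and noise-free, the weights used to generate them are arbitrary, and the helper names `solve_linear_system` and `fit_least_squares` are invented for the example. The significance testing mentioned on the slide is not covered.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic dialogues invented for illustration: (COMP, MRS, ET) per dialogue.
rows = [(1, 0.9, 2.0), (0, 0.4, 5.0), (1, 0.7, 3.5),
        (0, 0.8, 4.0), (1, 0.5, 6.0), (0, 0.6, 1.5)]
true_weights = (0.2, 0.5, -0.1)  # arbitrary, not taken from any real system
targets = [sum(w * v for w, v in zip(true_weights, row)) for row in rows]

intercept, w_comp, w_mrs, w_et = fit_least_squares(rows, targets)
print(round(w_comp, 3), round(w_mrs, 3), round(w_et, 3))
```

Because the synthetic satisfaction scores are generated without noise, the fit recovers the generating weights, which makes the sketch easy to check. Real survey data would need a statistics package that also reports significance.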

  11. User Satisfaction: Sum of Many Measures
     • Was Annie easy to understand in this conversation? (TTS Performance)
     • In this conversation, did Annie understand what you said? (ASR Performance)
     • In this conversation, was it easy to find the message you wanted? (Task Ease)
     • Was the pace of interaction with Annie appropriate in this conversation? (Interaction Pace)
     • In this conversation, did you know what you could say at each point of the dialog? (User Expertise)
     • How often was Annie sluggish and slow to reply to you in this conversation? (System Response)
     • Did Annie work the way you expected her to in this conversation? (Expected Behavior)
     • From your current experience with using Annie to get your email, do you think you'd use Annie regularly to access your mail when you are away from your desk? (Future Use)

  12. Performance Functions from Three Systems
     • ELVIS User Sat. = .21 * COMP + .47 * MRS - .15 * ET
     • TOOT User Sat. = .35 * COMP + .45 * MRS - .14 * ET
     • ANNIE User Sat. = .33 * COMP + .25 * MRS + .33 * Help
     • COMP: User perception of task completion (task success)
     • MRS: Mean recognition score (recognition accuracy; cost)
     • ET: Elapsed time (cost)
     • Help: Help requests (cost)
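
As a sketch of how the performance functions on slide 12 might be applied, the snippet below evaluates them for one dialogue. The coefficients and their signs are copied from the slide. The input values are hypothetical and are assumed to be already normalized (the PARADISE papers describe z-score normalization); the function name `predicted_satisfaction` is an assumption for the example.

```python
# Coefficients copied from the slide, with signs as given there.
PERFORMANCE_FUNCTIONS = {
    "ELVIS": {"COMP": 0.21, "MRS": 0.47, "ET": -0.15},
    "TOOT": {"COMP": 0.35, "MRS": 0.45, "ET": -0.14},
    "ANNIE": {"COMP": 0.33, "MRS": 0.25, "Help": 0.33},
}


def predicted_satisfaction(system, measures):
    """Weighted sum of the measures used by the named system's function."""
    weights = PERFORMANCE_FUNCTIONS[system]
    missing = sorted(set(weights) - set(measures))
    if missing:
        raise KeyError(f"missing measures for {system}: {missing}")
    return sum(weight * measures[name] for name, weight in weights.items())


# Hypothetical normalized measures for a single dialogue.
dialogue = {"COMP": 1.0, "MRS": 0.5, "ET": -0.2}

elvis = predicted_satisfaction("ELVIS", dialogue)
toot = predicted_satisfaction("TOOT", dialogue)
print(f"ELVIS: {elvis:.3f}  TOOT: {toot:.3f}")
```

Keeping the coefficients in a table rather than in code makes it straightforward to compare systems on the same dialogue or to swap in re-estimated weights.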

  13. Performance Model • Perceived task completion and mean recognition score are consistently significant predictors of User Satisfaction • Performance model useful for system development • Making predictions about system modifications • Distinguishing ‘good’ dialogues from ‘bad’ dialogues • But can we also tell on-line when a dialogue is ‘going wrong’?

  14. Course Overview • Spoken Dialogue Systems today • Evaluating their weaknesses • Role of intonational variation • Importance of corpora and conventions for annotating them • Intonational ‘meanings’ • Prosody in Speech Generation • Prosody in Speech Recognition/ Understanding

  15. How to Predict Problems ‘On-Line’? • Evidence of system misconceptions reflected in user responses (Krahmer et al ‘99, ‘00) • Responses to incorrect verifications • contain more words (or are empty) • show marked word order (especially after implicit verifications) • contain more disconfirmations, more repeated/corrected info • ‘No’ after incorrect verifications vs. after other yes-no questions (ynq’s) • has higher boundary tone • wider pitch range • longer duration • longer pauses before and after • more additional words after it

  16. User information state reflected in response (Shimojima et al ’99, ‘01) • Echoic responses repeat prior information – as acknowledgment or request for confirmation S1: Then go to Keage station. S2: Keage. • Experiment: • Identify ‘degree of integration’ and prosodic features (boundary tone, pitch range, tempo, initial pause) • Perception studies to elicit ‘integration’ effect • Results: fast tempo, little pause and low pitch signal high integration

  17. Can Prosodic Information Help Identify Dialogue System Problems ‘On Line’?

  18. Motivation • Prosody conveys information about: • The state of the interaction: • Is the user having trouble being understood? • Is the user having trouble understanding the system? • What the speaker is trying to convey • Is this a statement or a question? • The structure of the dialogue • Is the user or the system trying to start a new topic? • The emotions of the speaker • Is the speaker getting angry, frustrated?

  19. Past Research Issues and Applications • How prosodic variation influences ‘meaning’ • Focus or contrast • Given/new • How prosodic variation is related to other linguistic components • Syntax • Semantics • How to model prosodic variation effectively • Applications: Text-to-Speech

  20. Current Trends • New description schemes (e.g. ToBI) • Corpus-based research and machine learning • Emphasis on evaluation of algorithms and systems (NLE ‘00 special issue) • Investigation of spontaneous speech phenomena and variation in speaking style • Applications to CTS, ASR and SDS

  21. Course Overview • Spoken Dialogue Systems today • Evaluating their weaknesses • Role of intonational variation • Importance of corpora and conventions for annotating them • Intonational ‘meanings’ • Prosody in Speech Generation • Prosody in Speech Recognition/ Understanding

  22. Corpora • Public and semi-public databases • ATIS, SwitchBoard, Call Home (NIST/DARPA/LDC) • TRAINS/TRIPS (U. Rochester) • FM Radio (BU) • Private collections • Acquired for speech or dialogue research (e.g. August, Gustafson & Bell ’00) • Meeting, call center, focus group collections • Accidentally collected • The Web • Mud/Moo dialogues

  23. To(nes and)B(reak)I(ndices) • Developed by prosody researchers in four meetings over 1991-94 • Goals: • devise common labeling scheme for Standard American English that is robust and reliable • promote collection of large, prosodically labeled, shareable corpora • ToBI standards also proposed for Japanese, German, Italian, Spanish, British and Australian English, …

  24. Minimal ToBI transcription: • recording of speech • f0 contour • ToBI tiers: • orthographic tier: words • break-index tier: degrees of junction (Price et al ‘89) • tonal tier: pitch accents, phrase accents, boundary tones (Pierrehumbert ‘80) • miscellaneous tier: disfluencies, non-speech sounds, etc.
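
The parts of a minimal transcription listed on slide 24 could be held in a small record like the one below. This is a simplified sketch: real ToBI tiers are time-aligned against the recording, whereas here each label is attached to a word index. The class name, field names, file names and example labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToBITranscription:
    """Container for the parts of a minimal ToBI transcription."""

    recording: str        # path to the speech recording (hypothetical)
    f0_contour: str       # path to the f0 track (hypothetical)
    words: list           # orthographic tier, one entry per word
    break_indices: list   # break-index tier, one value per word
    tones: dict = field(default_factory=dict)  # tonal tier: word index -> labels
    misc: dict = field(default_factory=dict)   # miscellaneous tier: word index -> note

    def __post_init__(self):
        # The orthographic and break-index tiers must line up word by word.
        if len(self.words) != len(self.break_indices):
            raise ValueError("each word needs exactly one break index")
        for tier in (self.tones, self.misc):
            for index in tier:
                if not 0 <= index < len(self.words):
                    raise ValueError(f"tier entry {index} does not point at a word")


# Invented labels for a short utterance; not taken from a real corpus.
example = ToBITranscription(
    recording="utt001.wav",
    f0_contour="utt001.f0",
    words=["Take", "the", "bus"],
    break_indices=[1, 1, 4],
    tones={2: ["H*", "L-", "L%"]},
)
print(example.words, example.break_indices, example.tones)
```

Validating the tiers at construction time catches the most common bookkeeping error, a break-index tier that has drifted out of step with the words.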

  25. Sample ToBI Labeling

  26. Online training material, available at: • http://www.ling.ohio-state.edu/phonetics/ToBI/ • Evaluation • Good inter-labeler reliability for expert and naive labelers: 88% agreement on presence/absence of tonal category, 81% agreement on category label, 91% agreement on break indices to within 1 level (Silverman et al. ‘92, Pitrelli et al ‘94)
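
Agreement figures of the kind quoted on slide 26 can be computed along the following lines. This sketch assumes agreement is counted over every pair of labelers on every word; the exact procedure used in the cited studies may differ. The function name `pairwise_agreement` and the label data are invented for the example.

```python
from itertools import combinations


def pairwise_agreement(labelings, agree=lambda a, b: a == b):
    """Fraction of (labeler pair, word) comparisons on which two labelers agree."""
    if len({len(seq) for seq in labelings}) != 1:
        raise ValueError("all labelers must label the same words")
    total = 0
    hits = 0
    for first, second in combinations(labelings, 2):
        for a, b in zip(first, second):
            total += 1
            hits += bool(agree(a, b))
    if total == 0:
        raise ValueError("need at least two labelers and one word")
    return hits / total


# Invented break indices from three hypothetical labelers for five words.
breaks = [
    [1, 1, 3, 1, 4],
    [1, 2, 3, 1, 4],
    [1, 1, 4, 1, 4],
]

exact = pairwise_agreement(breaks)
# Looser criterion, mirroring "to within 1 level" on the slide.
within_one = pairwise_agreement(breaks, agree=lambda a, b: abs(a - b) <= 1)
print(f"exact: {exact:.2f}  within one level: {within_one:.2f}")
```

Passing the agreement criterion as a function lets the same routine cover exact category matches and the looser break-index criterion.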

  27. Course Overview • Spoken Dialogue Systems today • Evaluating their weaknesses • Role of intonational variation • Importance of corpora and conventions for annotating them • Intonational ‘meanings’ • Prosody in Speech Generation • Prosody in Speech Recognition/ Understanding

  28. Pitch Accent/Prominence in ToBI • Which items are made intonationally prominent and how? • Accent type: • H* simple high (declarative) • L* simple low (ynq) • L*+H scooped, late rise (uncertainty/ incredulity) • L+H* early rise to stress (contrastive focus) • H+!H* fall onto stress (implied familiarity)

  29. Downstepped accents: • !H*, L+!H*, L*+!H • Degree of prominence: • within a phrase: HiF0 • across phrases

  30. Functions of Pitch Accent • Given/new information • S: Do you need a return ticket? • U: No, thanks, I don’t need a return. • Contrast (narrow focus) • U: No, thanks, I don’t need a RETURN…. (I need a time schedule, receipt,…) • Disambiguation of discourse markers • S: Now let me get you the train information. • U: Okay (thanks) vs. Okay….(but I really want…)

  31. Prosodic Phrasing in ToBI • ‘Levels’ of phrasing: • intermediate phrase: one or more pitch accents plus a phrase accent (H- or L- ) • intonational phrase: 1 or more intermediate phrases + boundary tone (H% or L% ) • ToBI break-index tier • 0 no word boundary • 1 word boundary • 2 strong juncture with no tonal markings • 3 intermediate phrase boundary • 4 intonational phrase boundary
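
The break-index levels on slide 31 imply a simple grouping procedure, sketched below. It assumes each word is paired with the break index that follows it, that an index of 3 or 4 closes an intermediate phrase, and that an index of 4 also closes the intonational phrase. The function name `split_phrases` and the break indices in the example are assumptions; the sentence is borrowed from slide 32.

```python
def split_phrases(words, breaks):
    """Group words into intonational phrases, each a list of intermediate phrases."""
    if len(words) != len(breaks):
        raise ValueError("each word needs exactly one break index")
    intonational = []
    intermediates = []
    current = []
    for word, brk in zip(words, breaks):
        current.append(word)
        if brk >= 3:  # 3 or 4 closes the current intermediate phrase
            intermediates.append(current)
            current = []
        if brk == 4:  # 4 also closes the current intonational phrase
            intonational.append(intermediates)
            intermediates = []
    # Keep trailing words even if the final boundary label is missing.
    if current:
        intermediates.append(current)
    if intermediates:
        intonational.append(intermediates)
    return intonational


words = "You should buy the ticket with the discount coupon".split()
# Invented labeling: an intermediate phrase boundary after "ticket".
breaks = [1, 1, 1, 1, 3, 1, 1, 1, 4]

phrases = split_phrases(words, breaks)
for intonational_phrase in phrases:
    print([" ".join(ip) for ip in intonational_phrase])
```

With this labeling the sentence forms one intonational phrase made of two intermediate phrases, the kind of grouping that slide 32 relates to PP attachment.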

  32. Functions of Phrasing • Disambiguates syntactic constructions, e.g. PP attachment: • S: You should buy the ticket with the discount coupon. • Disambiguates scope ambiguities, e.g. Negation: • S: You aren’t booked through Rome because of the fare. • Or modifier scope: • S: This fare is restricted to retired politicians and civil servants.

  33. Contours: Accent + Phrasing • What do intonational contours ‘mean’ (Ladd ‘80, Bolinger ‘89)? • Speech acts (statements, questions, requests) S: That’ll be credit card? (L* H- H%) • Propositional attitude (uncertainty, incredulity) S: You’d like an evening flight. (L*+H L- H%) • Speaker affect (anger, happiness, love) U: I said four SEVEN one! (L+H* L- L%) • “Personality” S: Welcome to the Sunshine Travel System.
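
The contour strings on slide 33 combine pitch accents, a phrase accent and a boundary tone, and can be taken apart with a few string checks. This sketch assumes the labels are separated by spaces, as on this slide; fused forms such as "L-H%" are not handled. The function name `parse_contour` and the returned dictionary keys are assumptions for the example.

```python
def parse_contour(contour):
    """Split a space-separated ToBI tonal string into its three label types."""
    parsed = {"pitch_accents": [], "phrase_accents": [], "boundary_tones": []}
    for token in contour.split():
        if token.endswith("%"):    # boundary tone, e.g. H% or L%
            parsed["boundary_tones"].append(token)
        elif token.endswith("-"):  # phrase accent, e.g. H- or L-
            parsed["phrase_accents"].append(token)
        elif "*" in token:         # pitch accent, e.g. H*, L*+H, L+H*
            parsed["pitch_accents"].append(token)
        else:
            raise ValueError(f"unrecognized tone label: {token!r}")
    return parsed


# Contours copied from the slide's examples.
question = parse_contour("L* H- H%")
uncertain = parse_contour("L*+H L- H%")
emphatic = parse_contour("L+H* L- L%")
print(question)
print(uncertain)
print(emphatic)
```

Checking the final character first matters because bitonal accents such as L*+H end in a plain tone letter rather than in the star.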

  34. Pitch Range and Timing • Level of speaker engagement • S: Welcome to InfoTravel. How may I help you? • Contour interpretation • S: You can take the L*+H bus from Malpensa to Rome L-H%. • U: Take the bus. vs. Take the bus! • Discourse/topic structure

  35. Can systems make use of this information? Can they produce it? Can they recognize it?
