
Dialogue Systems

Dialogue Systems. Julia Hirschberg CS 4705. Today. Dialogue Systems and Human Conversation Turns and Turn-taking Speech Acts and Dialogue Acts Grounding and Intentional Structure Pragmatics Presupposition Conventional Implicature Conversational Implicature. Dialogue System Applications.


Presentation Transcript


  1. Dialogue Systems Julia Hirschberg CS 4705

  2. Today • Dialogue Systems and Human Conversation • Turns and Turn-taking • Speech Acts and Dialogue Acts • Grounding and Intentional Structure • Pragmatics • Presupposition • Conventional Implicature • Conversational Implicature

  3. Dialogue System Applications • Information providing • 800-BING-411, Google Mobile App, Amtrak’s Julie, • Customer Care • T-Mobile’s Call Center, AT&T Call Routing • Training • Language tutoring: e.g. Carnegie Speech, KTH Ville • Other research platforms: e.g. ItSpoke at UPitt • Fun and games…. • Goal: Emulate Human-Human Behavior?

  4. Today • Dialogue Systems and Human Conversation • Turns and Turn-taking • Speech Acts and Dialogue Acts • Grounding and Intentional Structure • Pragmatics • Presupposition • Conventional Implicature • Conversational Implicature

  5. Turn-taking Behavior • Dialogue characterized by turn-taking • How do speakers know what to say and when to say it? • Conversational partners expect certain patterns of behavior in normal conversation Pat: You got an A? That’s great! Chris: Yeah, I’m really smart you know. Chris: Well, I was just lucky I happened to read the chapter on dialogue systems right before the test. Otherwise I never would have squeaked through. • Deviation is significant: dispreferred utterances

  6. Children learn turn taking within first 2 years (Stern ’74) • General individual differences • Shy people pause longer and speak less and less often (Pilkonis ’77) • Schizophrenics, neurotics, depressed people less skilled in turn-taking

  7. Cultural Differences in Turn-Taking • Chinese telephone conversations • Openings (Zhu ’04) • Mandarin vs. British • Identification differences • British self-report • Chinese callees ask the caller • Closings (Sun ’05) • 39 female-female Mandarin telephone conversations • Closings initiated through matter-of-fact statement of intention to end conversation • Verbalized thanking occurs except in mother/daughter closings – not the standard English model • Finnish business calls (Halmari ’93) vs. American • Americans get right to the point • Finns chat

  8. Conversational Analysis (Sacks et al ’74) • Can we characterize expectations of ‘what to say’ more generally? • ‘Rules’ of turn-taking • If, during this turn the current speaker has selected A as the next speaker, then A must speak next • If the current speaker does not select the next speaker, any other speaker may take the next turn • If no one else takes the next turn, the current speaker may take the next turn • Rules Apply at Transition Relevance Places (TRPs) where something allows speaker changes to occur
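
The three turn-taking rules above can be read as a small decision procedure applied at each TRP. The following is a minimal sketch under that reading; the function name, its parameters, and the participant names are illustrative assumptions, not part of Sacks et al.'s account.

```python
from typing import List, Optional


def next_speaker_candidates(current: str,
                            participants: List[str],
                            selected: Optional[str] = None,
                            self_selectors: Optional[List[str]] = None) -> List[str]:
    """Return who may take the next turn at a transition relevance place.

    `selected` is the speaker chosen by the current speaker (if any);
    `self_selectors` lists participants who try to take the turn themselves.
    Both are hypothetical inputs for this sketch.
    """
    # Rule 1: the current speaker selected someone, so that person must speak next.
    if selected is not None:
        return [selected]
    # Rule 2: nobody was selected, so any other speaker may take the next turn.
    others = [p for p in (self_selectors or [])
              if p != current and p in participants]
    if others:
        return others
    # Rule 3: no one else took the turn, so the current speaker may continue.
    return [current]
```

For example, with Pat speaking to Chris and Lee, a direct question to Chris yields `["Chris"]`, while silence from everyone else lets Pat keep the floor.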

  9. Where Can Speaker Shifts Occur • Adjacency pairs • Question/answer • Greeting/greeting • Compliment/downplayer • Dispreferred responses • Silence • ‘No’ to a simple request without explanation • Changing the topic abruptly without transition • Important for Spoken Dialogue Systems

  10. Diarization: Automatic Speaker Identification/Segmentation • Segment audio corpora (Broadcast News, meetings, telephone conversations) into speaker segments • Speaker segmentation • Speaker identification • Speech and music • Speaker segmentation (Diarization) • Initial segmentation • Segment clustering based on acoustic features • State-of-the-art: 8.47% error

  11. Speaker identification • Linguistic information to identify speaker types and speaker names (LIMSI ’04) • Templates (“<name> has this report from <location>”) • Results: 10.9% error on test set • But only 10% of segments contain relevant patterns • Estimate 25% error on broadcast news if segmentation and clustering is done to id all of each speaker’s segments
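
The template idea on this slide can be illustrated with a regular expression. This is only a sketch of template matching in general, not the LIMSI system: the pattern, the function name, and the example sentence are assumptions, and the capitalized-word heuristic for names and locations is deliberately crude.

```python
import re
from typing import Dict, Optional

# One or more capitalized words, e.g. "Jane Doe" or "New York" (a rough heuristic).
_PROPER = r"[A-Z][\w'-]*(?: [A-Z][\w'-]*)*"

# Regex version of the template "<name> has this report from <location>".
REPORT_TEMPLATE = re.compile(
    rf"(?P<name>{_PROPER}) has this report from (?P<location>{_PROPER})"
)


def extract_speaker(transcript: str) -> Optional[Dict[str, str]]:
    """Return the name and location if the template matches, else None."""
    match = REPORT_TEMPLATE.search(transcript)
    if match is None:
        return None
    return {"name": match.group("name"), "location": match.group("location")}
```

Applied to the made-up sentence "Jane Doe has this report from New York.", the function returns the name "Jane Doe" and the location "New York". Segments that contain no such pattern return `None`, which is consistent with the slide's point that only a fraction of segments carry usable templates.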

  12. Turn-taking Behaviors Important for SDS • System understanding: • Is the user backchanneling or is she taking the turn (does ‘ok’ mean ‘I agree’ or ‘I’m listening’)? • Is this a good place for a system backchannel? • System generation: • How to signal to the user that the system’s turn is over? • How to signal to the user that a backchannel might be appropriate?

  13. Types of Behavior • Smooth Switch: S1 is speaking and S2 speaks and takes and holds the floor • Hold: S1 is speaking, pauses, and continues to speak • Backchannel: S1 is speaking and S2 speaks -- to indicate continued attention -- not to take the floor (e.g. mhmm, ok, yeah) • How do people coordinate these behaviors with their interlocutor? • Acoustic-prosodic and lexical cues….

  14. Smooth Switch, Backchannel, and Hold Differences

  15. Today • Dialogue Systems and Human Conversation • Turns and Turn-taking • Speech Acts and Dialogue Acts • Grounding and Intentional Structure • Pragmatics • Presupposition • Conventional Implicature • Conversational Implicature

  16. Speech Act Theory (Austin, Searle) • Locutionary acts: the act of uttering (semantic meaning) • Illocutionary acts: the act S intends to convey by the utterance (e.g. request, promise, statement) • Perlocutionary acts: the effect S intends the utterance to produce on H (e.g. regret, fear, hope) • Indirect Speech Acts (a type of illocutionary act): • It’s cold in here. • Can you tell me the time?

  17. NLP Speech Acts • Often identified with illocutionary force • Can be indicated by performative verbs • E.g. promise, order, ask, beseech, deny, apologize, curse • NB: Perlocutionary force cannot (I convince you to vote for me for president) • Searle’s ’75 taxonomy (assertives, directives, commissives, expressives, declarations) now vastly expanded

  18. Dialogue Acts in SDS • Roughly correspond to Illocutionary acts • Motivation: Improving Spoken Dialogue Systems • Many coding schemes (e.g. DAMSL) • Many-to-many mapping between DAs and words • Agreement DA can be realized by Okay, Um, Right, Yeah, … • But each of these can express multiple DAs, e.g. S: You should take the 10pm flight. U: Okay …that sounds perfect. …but I’d prefer an earlier flight. …(I’m listening)

  19. DA recognition important for • Turn recognition (which grammar to use when) • Turn disambiguation, e.g. S: What city do you want to go to? U1: Boston. (reply) U2: Boston? (request for information) S: Do you want to go to Boston? U1: Boston. (confirmation) U2: Boston? (question)
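
The disambiguation examples above depend on two things: what the system just asked, and whether the user's reply is produced as a question. A minimal sketch of that lookup follows. The system-act labels are hypothetical, and a trailing question mark stands in for rising intonation, which a real system would have to detect from prosody.

```python
# (previous system act, user reply is question-like) -> dialogue act of the reply.
# The four outcomes mirror the examples on the slide; the key names are assumptions.
DA_TABLE = {
    ("wh_question", False): "reply",
    ("wh_question", True): "request_for_information",
    ("yes_no_question", False): "confirmation",
    ("yes_no_question", True): "question",
}


def classify_user_da(system_act: str, user_utterance: str) -> str:
    """Pick a dialogue act for the user's reply given the preceding system act."""
    # Assumption: "?" in the transcript marks rising (question) intonation.
    question_like = user_utterance.strip().endswith("?")
    return DA_TABLE[(system_act, question_like)]
```

With this table, "Boston." after "What city do you want to go to?" is a reply, while the same word after "Do you want to go to Boston?" is a confirmation.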

  20. Automatic DA Detection • Rosset & Lamel ’04: Can we detect DAs automatically w/ minimal reliance on lexical content? • Lexicons are domain-dependent • ASR output is errorful • Corpora (3912 utts total) • Agent/client dialogues in a French bank call center, in a French web-based stock exchange customer service center, in an English bank call center

  21. DA tags (44) • Conventional (openings, closings) • Information level (items related to the semantic content of the task) • Forward Looking Function: • statement (e.g. assert, commit, explanation) • Influence on Hearer (e.g. confirmation, offer, request) • Backward Looking Function: • Agreement (e.g. accept, reject) • Understanding (e.g. backchannel, correction) • Communicative Status (e.g. self-talk, change-mind) • NB: Each utt could receive a tag for each class, so utts represented as vectors • But…only 197 combinations observed

  22. Method: Memory-based learning (TIMBL) • Uses all examples for classification • Useful for sparse data • Features • Speaker identity • First 2 words of each turn • # utts in turn • Previously proposed DA tags for utts in turn • Results • With true utt boundaries: • ~83% accuracy on test data from same domain • ~75% accuracy on test data from different domain

  23. On automatically identified utt units: 3.3% ins, 6.6% del, 13.5% sub • Which DAs are easiest/hardest to detect?
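
The memory-based method of slide 22 stores every training example and classifies a new turn by its most similar stored examples. The sketch below shows that idea with a plain feature-overlap count. It does not use TiMBL or reproduce its metrics; the toy examples, the feature encoding, and the single previous-DA feature (a simplification of "previously proposed DA tags") are assumptions.

```python
from collections import Counter
from typing import List, Tuple

# (speaker, first word, second word, number of utterances in turn, previous DA)
Features = Tuple[str, str, str, int, str]


def extract_features(speaker: str, turn_utts: List[str], prev_da: str) -> Features:
    """Build the feature tuple for one turn from the features listed on the slide."""
    words = " ".join(turn_utts).lower().split()
    first = words[0] if len(words) > 0 else "<none>"
    second = words[1] if len(words) > 1 else "<none>"
    return (speaker, first, second, len(turn_utts), prev_da)


def overlap(a: Features, b: Features) -> int:
    """Count the feature positions on which two examples agree."""
    return sum(x == y for x, y in zip(a, b))


def classify(memory: List[Tuple[Features, str]], query: Features, k: int = 1) -> str:
    """Return the majority label among the k stored examples most similar to query."""
    ranked = sorted(memory, key=lambda ex: overlap(ex[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy memory; the utterances and labels are invented for illustration.
MEMORY = [
    (extract_features("agent", ["hello how can i help you"], "<start>"), "opening"),
    (extract_features("client", ["yes that is right"], "request"), "accept"),
    (extract_features("agent", ["thank you goodbye"], "accept"), "closing"),
]
```

A client turn "yes please" following a request shares four of five features with the stored "accept" example, so `classify(MEMORY, extract_features("client", ["yes please"], "request"))` returns `"accept"`. Because every example is kept, rare tag combinations are not smoothed away, which fits the slide's remark about sparse data.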

  24. Conclusions • Strong ‘grammar’ of DAs in Spoken Dialogue systems • A few initial words perform as well as more

  25. Today • Dialogue Systems and Human Conversation • Turns and Turn-taking • Speech Acts and Dialogue Acts • Grounding and Intentional Structure • Pragmatics • Presupposition • Conventional Implicature • Conversational Implicature

  26. Grounding (Stalnaker ’78, Clark & Schaefer ’89) • Common Ground: the set of propositions mutually believed by S and H • Principle of Closure: agents performing an action require evidence that they have succeeded – and S needs to know when s/he has succeeded in communicating • Presentation of utterance by S • Acceptance of utterance by H • How does grounding take place in conversation?

  27. Grounding Strategies from Weak to Strong I need to get your homework by Monday. • Continued attention … • Next contribution I should be finished Sunday night. • Acknowledgment Mhmm… • Demonstration You need this soon. • Display You need to get my homework Monday.

  28. Discourse Structure and Intention Welcome to word processing. That’s using a computer to type letters and reports. Make a typo? No problem. Just back up, type over the mistake, and it’s gone. And, it eliminates retyping. And, it eliminates retyping.

  29. Structures of Discourse Structure (Grosz & Sidner ’86) • Leading alternative to Rhetorical Structure Theory • Provides for multiple levels of analysis: S’s purpose as well as content of utterances and S and H’s attentional state • Identifies only a few, general relations that hold among intentions • Three components: • Linguistic structure • Intentional structure • Attentional structure

  30. Linguistic Structure • What is actually said/written • How is this represented? • Assume discourse is segmented into Discourse Segments (DS) -- how? • what is basic unit of analysis? • segmentation agreement • automatic segmentation • Embedding relations: topic structure • Cue phrases

  31. Intentional Structure • Discourse purpose (DP): basic purpose of the discourse • Discourse segment purposes (DSPs): how this segment contributes to the overall DP • Segment relations: • Satisfaction-precedence: DSP1 must be satisfied before DSP2 • Dominance: DSP1 dominates DSP2 if fulfilling DSP2 constitutes part of fulfilling DSP1
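
The two segment relations can be represented as links between discourse segment purposes. The sketch below is one possible encoding, not Grosz & Sidner's formalism: the class and function names are hypothetical, dominance is assumed to be transitive through parent links, and the trip-booking purposes are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class DSP:
    """A discourse segment purpose with its two kinds of relations."""
    name: str
    # The DSP that directly dominates this one (None for the overall DP).
    parent: Optional["DSP"] = None
    # DSPs that must be satisfied before this one (satisfaction-precedence).
    must_follow: List["DSP"] = field(default_factory=list)


def dominates(a: DSP, b: DSP) -> bool:
    """True if fulfilling b is (directly or indirectly) part of fulfilling a."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def can_start(dsp: DSP, satisfied: Set[str]) -> bool:
    """True if every DSP that must precede dsp has already been satisfied."""
    return all(p.name in satisfied for p in dsp.must_follow)


# Hypothetical example: booking a trip requires choosing a flight, then paying.
book_trip = DSP("book_trip")
choose_flight = DSP("choose_flight", parent=book_trip)
pay = DSP("pay", parent=book_trip, must_follow=[choose_flight])
```

Here `book_trip` dominates both sub-purposes, and `pay` cannot start until `choose_flight` is in the set of satisfied purposes.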

  32. Attentional State • Focus stack: • Stack of focus spaces, each containing objects, properties and relations salient during each DS, plus the DSP (content plus purpose) • State changes modeled by transition rules controlling the addition/deletion of focus spaces • Information at lower levels may or may not be available at higher levels • Focus spaces are pushed onto the stack when • new DS or embedded DS (e.g. DS that are dominated by other DS) are begun • popped when they are completed
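
The focus stack described above maps directly onto a stack data structure. The sketch below is a minimal illustration: the class names and the `depth` parameter are assumptions, and how far down the stack a hearer can actually see is left open by the slide, so `depth` simply makes that choice explicit.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class FocusSpace:
    """Entities salient during one discourse segment, plus that segment's purpose."""
    dsp: str
    entities: Set[str] = field(default_factory=set)


class FocusStack:
    def __init__(self) -> None:
        self._spaces: List[FocusSpace] = []

    def push(self, dsp: str, entities: Iterable[str] = ()) -> None:
        """Open a new (possibly embedded) discourse segment."""
        self._spaces.append(FocusSpace(dsp, set(entities)))

    def pop(self) -> FocusSpace:
        """Close the current discourse segment and return its focus space."""
        return self._spaces.pop()

    def salient(self, depth: int = 1) -> Set[str]:
        """Entities in the top `depth` focus spaces (depth=1 is the current segment)."""
        found: Set[str] = set()
        top = self._spaces[-depth:] if depth > 0 else []
        for space in top:
            found |= space.entities
        return found
```

For instance, pushing a segment about word processing with the entities "computer" and "letters", then an embedded segment about fixing a typo, leaves only "typo" salient at depth 1; popping the embedded segment restores the outer entities.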

  33. Limits of G&S ’86 • Assumes that discourses are task-oriented • Assumes there is a single, hierarchical structure shared by S and H • How do we identify entities that are salient (on the focus stack)? Grammatical function? • Do people really build such structures when they converse? Use them in interpreting what others say?

  34. How are these structures recognized from a discourse? • Linguistic markers: • tense and aspect • cue phrases • intonational variation • Inference of S intentions • Inference from task structure • Intonational Information

  35. Today • Dialogue Systems and Human Conversation • Turns and Turn-taking • Speech Acts and Dialogue Acts • Grounding and Intentional Structure • Pragmatics • Presupposition • Conventional Implicature • Conversational Implicature

  36. Implicit Information • Question interpretation in SDS S: Are you traveling to La Guardia? U: I’m going to New York. U: When does the 5 o’clock train leave from Newark? S : <U believes there is a 5 o’clock train from Newark.> S: I heard you say New York City? U: New York City?

  37. Cooperative responses in SDS • Correcting misconceptions U: When does the 5 o’clock train leave from Newark? S (thinks): <U believes there is a 5 o’clock train from Newark> S: There is no 5 o’clock train from Newark; there is a 5:20 tho. • Providing more information than is asked for U: Do I have the $500 minimum in that account? S1: Yes. S2: You have $739.
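
Both kinds of cooperative response can be sketched as simple checks made before answering. This is an illustrative sketch only: the schedule, the function names, and the policy of offering the nearest departure are assumptions; the balance figures come from the slide.

```python
from typing import Dict, List

# Hypothetical timetable: origin -> departure times as "H:MM" strings.
SCHEDULE: Dict[str, List[str]] = {"Newark": ["4:10", "5:20", "6:45"]}


def to_minutes(time: str) -> int:
    """Convert "H:MM" to minutes past the hour 0:00."""
    hours, minutes = time.split(":")
    return int(hours) * 60 + int(minutes)


def respond_to_train_query(requested: str, origin: str,
                           schedule: Dict[str, List[str]]) -> str:
    """Check the presupposition that a train exists before answering."""
    times = schedule.get(origin, [])
    if requested in times:
        return f"Yes, there is a {requested} train from {origin}."
    if not times:
        return f"There are no trains from {origin}."
    # The presupposition is false: correct it and offer the closest alternative.
    nearest = min(times, key=lambda t: abs(to_minutes(t) - to_minutes(requested)))
    return f"There is no {requested} train from {origin}; there is a {nearest}."


def minimum_balance_answer(balance: int, minimum: int) -> str:
    """Answer the yes/no question and volunteer the actual balance."""
    verdict = "Yes" if balance >= minimum else "No"
    return f"{verdict}. You have ${balance}."
```

With the hypothetical timetable, asking about a 5:00 train from Newark produces "There is no 5:00 train from Newark; there is a 5:20.", and the balance question from the slide produces "Yes. You have $739."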

  38. Discourse Pragmatics • Context-dependent meaning, invited inference, intended meaning – vs. “propositional content” • Indirect Speech Acts • Presupposition • Implicature • Conversational • Conventional

  39. Presupposition • What is ‘taken for granted’, given some linguistic expression X The King of France is bald. (Is there a King of France?) All of Herman’s children are bright. (Does Herman have children?) • Linguistic Test: Negative, interrogative, and embedded X preserve the same assumption The King of France is not bald. Is the King of France bald? I thought that the King of France was bald.

  40. Presuppositions can be suspended but they cannot be felicitously denied All of Herman’s children are bright, if he indeed has children. *All of Herman’s children are bright, though he has no children.

  41. Presupposition and SDS • Presuppositional information adds facts/beliefs to the dialogue history • Information to store and check for accuracy • My wife will also be a driver (S has a spouse) • My number is 212-555-1212 (S has a telephone account) • I’ll take the red-eye (S believes there is a red-eye) • I’m upset about being charged for a call to Ethiopia (S was charged for a call to Ethiopia) • I’m a bachelor. (S is an unmarried male person)

  42. Conversational Implicature • H. Paul Grice: Conversation is not formal logic • and is not ‘^’, or is not ‘v’, some is not ‘∃’ • George got married and had a baby. • Was it a boy or a girl? • Some people sent baby gifts. • Principles of Cooperative Conversation: Make your conversational contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged

  43. Maxims of Cooperative Conversation • Maxim of Quantity: • 1. Make your contribution as informative as is required (for the current purposes of the exchange) • 2. Do not make your contribution more than is required. • Maxim of Quality: • Try to make your contribution one that is true. • 1. Do not say what you believe to be false. • 2. Do not say that for which you lack adequate evidence. • Maxim of Relation: Be relevant

  44. Maxim of Manner: Be perspicuous • 1. Avoid obscurity of expression. • 2. Avoid ambiguity. • 3. Be brief (avoid unnecessary prolixity). • 4. Be orderly. • Maxims may be • Observed John got into Columbia and won a scholarship. • Violated quietly I never said that. • Flouted He has excellent handwriting….

  45. Speakers may not be able to observe all maxims simultaneously • Implicature interpretation requires both S and H to understand the CP and Maxims • That which S licenses and H infers via the CP and the Maxims A. I got an A on that exam. B. And I’m Queen Marie of Rumania. A. Where did you go? B. Out.

  46. A: Where does Arnold live? B: Somewhere in southern California.

  47. Other Implicatures • Generalized Conversational, e.g. indefinites A car ran over John’s foot. (not John’s car) John broke a foot yesterday. (John’s foot) John broke a nose yesterday. (not his own) • Conventional George is short but brave. George is short; therefore he is brave.

  48. Summary • Dialogue Systems and Human Conversation • Turns and Turn-taking • Speech Acts and Dialogue Acts • Grounding and Intentional Structure • Pragmatics • Presupposition • Conventional Implicature • Conversational Implicature

  49. Spoken Language Processing • These are only a few of the challenges of Spoken Language Processing (CS 4706) • How does it go beyond CS 4705? • Speech analysis tools and techniques • Deception, charisma, emotional speech, medical states • Speech technologies • Text-to-Speech • Automatic Speech Recognition • Speaker ID • Language and dialect ID

  50. Project • Build a Spoken Dialogue System of your own • Choose the domain and task • Build a speech recognizer, a text-to-speech synthesis system, and a dialogue manager (from libraries) • Demo your system and maybe win a prize
