
Spoken Language Generation: Information Presentation



  1. Spoken Language Generation:Information Presentation Johanna D. Moore University of Edinburgh Human Communication Research Centre School of Informatics

  2. Introduction • Much of the research in SDS has focused on the information gathering phase of the dialogue • Communicator evaluation indicates that task duration is negatively correlated with user satisfaction (r = -0.31, p < .001) (Walker, Passonneau, Boland, ACL 2001) • What contributes most to dialogue duration?

  3. Communicator 2001 Dataset

  4. Typical information presentation
  SYS: I found 9 round trips from Hartford to Orlando on the 31st and returning to Hartford on November 6th. For option 1, I have an outbound flight with U S Air departing at 7 AM and arriving at 9 49 AM non-stop. A return flight is with U S Air departing at 10 30 AM and arriving at 12 28 AM with 1 stopover. Ticket price is 165 dollars. Please say "next option", or, "flight details", or, "I'll take it".
  USER: NEXT OPTION
  SYS: For option 2, I have an outbound flight with U S Air departing at 3 30 PM and arriving at 6 19 PM nonstop. A return flight is with U S Air departing at 8 25 PM and arriving at 12 28 AM with 1 stopover. Ticket price is 165 dollars. Say "next option", or, "flight details", or, "I'll take it".
  USER: NEXT OPTION
  SYS: . . .

  5. Current spoken dialogue systems • Presents each option and its selected attributes individually • No comparisons • Relies on users to remember trade-offs between multiple options • Places a large memory burden on users • Potentially large payoff for improvements in information presentation

  6. Improving Information Presentation • Apply techniques from text generation • Allows customization to user(s) and discourse history • Improve quality of synthesis by using NL generator to provide info about both meaning and linguistic structure of utterance • Overview of talk: • Case study: FLIGHTS system • Statistical approaches to generation See also Computer Speech and Language (2002) 16. Special Issue on Spoken Language Generation

  7. What NLG Can Do For You…
  User: I want to travel from Edinburgh to Brussels, arriving by 5 pm.
  System (for a starving student): There’s a direct flight on BMI with a good price. It arrives at four ten p.m. and costs one hundred and twelve pounds. The cheapest flight is on Ryanair. It arrives at two p.m. and it costs just fifty pounds, but you’d need to connect in Dublin.
  System (for a business traveller): You can fly business class on British Airways, arriving at four twenty p.m., but you’d need to connect in London Heathrow. There is a direct flight on BMI, arriving at four ten p.m., but there’s no availability in business class.

  8. NLG pipeline
  [Architecture diagram] Communicative goals drive a Text Planner (content selection, discourse planning), which draws on knowledge sources: discourse strategies, the dialogue history, a domain model, and a user model. The resulting text plan goes to a Sentence Planner (aggregation, referring expression generation, lexical choice), which draws on linguistic knowledge sources: aggregation rules, a referring expression generation algorithm, a lexicon, and a grammar. The sentence plan(s) then go to a Sentence Realizer, which produces English.

  9. FLIGHTS architecture
  [Architecture diagram] User input goes to ASR (HTK), which produces a text string for the Natural Language Understander (word spotting); its semantic interpretation goes to the Dialogue Manager (DIPPER). The dialogue manager passes communicative goals to the Response Generator, whose stages are Content Selection, Text Planning (O-Plan), Sentence Planner (XSLT), and Realizer (OpenCCG), consulting the User Model and the Flight DB. The resulting text string with APML markup goes to TTS (Festival), which produces the system response.

  10. Customization happens everywhere • Content selection: what flights and attributes to present to user • Discourse planning: ordering of content, discourse relations • Referring Expression Generation: e.g., The cheapest flight, the five-fifteen, a KLM flight • Aggregation: grouping propositions into clauses and sentences, e.g., There’s a KLM flight arriving Brussels at ten to five, but business class is not available and you’d need to connect in Amsterdam • Discourse cues: e.g., Although, because, but • Scalar Adjectives: e.g., good price, just fifty pounds

  11. Content Selection • Need a domain (or genre) specific method for determining what to say • In FLIGHTS: • Rank options based on predicted utility for the user • Select all options whose value is over a threshold • Select attributes that contribute most to value of selected options (Moore, Foster, Lemon & White, FLAIRS 2004, Carenini & Moore, AI Journal, 2006)
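To make the ranking concrete, here is a minimal sketch of additive multi-attribute utility selection as the slide describes it; the attribute names, weights, threshold, and data structures are illustrative assumptions, not the actual FLIGHTS implementation.

```python
# A minimal sketch of utility-based content selection: rank options by a
# weighted sum of attribute scores under a user model, keep options above a
# threshold, and keep the attributes contributing most to each kept option.

def utility(option, user_model):
    """Additive multi-attribute utility over scores normalised to [0, 1]."""
    return sum(user_model[a] * v for a, v in option["scores"].items())

def select_content(options, user_model, threshold=0.6, n_attrs=2):
    selected = []
    for opt in options:
        u = utility(opt, user_model)
        if u >= threshold:
            # Attributes ranked by their contribution to this option's utility.
            by_contribution = sorted(
                opt["scores"], key=lambda a: user_model[a] * opt["scores"][a],
                reverse=True)
            selected.append((opt["name"], round(u, 2), by_contribution[:n_attrs]))
    return sorted(selected, key=lambda t: t[1], reverse=True)

# Hypothetical "starving student" model: price matters most.
student = {"price": 0.6, "direct": 0.3, "arrival_time": 0.1}
flights = [
    {"name": "Ryanair", "scores": {"price": 1.0, "direct": 0.0, "arrival_time": 0.8}},
    {"name": "BMI",     "scores": {"price": 0.4, "direct": 1.0, "arrival_time": 0.9}},
]
print(select_content(flights, student))
# [('Ryanair', 0.68, ['price', 'arrival_time']), ('BMI', 0.63, ['direct', 'price'])]
```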

  12. Discourse Planning • Using discourse strategies for producing user-adapted recommendations, comparisons • Produces text plans consisting of basic dialogue acts and rhetorical relations • Orders presentation of options • Groups attributes into positive and negative lists for contrasts • Selects attributes to identify flights • a direct flight, the cheapest flight, the KLM flight • Marks items as theme/rheme for information structure
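One of these steps, grouping attributes into positive and negative lists for a contrast, can be sketched as follows; the 0.5 cut-off, the importance ordering, and all names are assumptions for illustration, not the system's rules.

```python
# An illustrative sketch of grouping an option's attributes into positive and
# negative lists for a contrastive presentation ("..., but ...").

def group_for_contrast(option_scores, user_model, cutoff=0.5):
    positives = [a for a, s in option_scores.items() if s >= cutoff]
    negatives = [a for a, s in option_scores.items() if s < cutoff]
    # Mention the attributes the user cares about most first.
    importance = lambda a: -user_model.get(a, 0.0)
    return sorted(positives, key=importance), sorted(negatives, key=importance)

pos, neg = group_for_contrast({"price": 1.0, "direct": 0.0},
                              {"price": 0.6, "direct": 0.3})
print(pos, neg)  # ['price'] ['direct'] -> "It costs just fifty pounds,
                 #                          but you'd need to connect in Dublin."
```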

  13. Information Structure • Theme/Rheme • Theme: part of utterance that connects it to prior discourse • Rheme: part of utterance that advances the discussion by contributing novel information • Theme and rheme phrases marked by distinctive combinations of pitch accents and boundary tones • Focus/Background • Focus: words whose interpretations contribute to distinguishing the theme or rheme from other contextually available alternatives; marked by pitch accents • Background: the unmarked parts of themes and rhemes (Steedman 1991-2002)

  14. Examples
  Ex 1: I know when the Ryanair flight LEAVES, but when does it ARRIVE?
        (The Ryanair flight ARRIVES_focus)_theme (at FIVE_focus)_rheme
         L+H* LH%                                 H* LL%
  Ex 2: I know the KLM flight arrives at FOUR, but which flight arrives at FIVE?
        (The RYANAIR_focus flight)_rheme (arrives at FIVE_focus)_theme
         H* LL%                           L+H* LH%

  15. Assigning theme/rheme in FLIGHTS • First option: all rheme • Subsequent items: identifying, contrastive information is theme • Implements the notion of an implicit Question Under Discussion, e.g., after presenting a flight that’s not direct, there’s an implicit question: Are there any direct flights? You can fly business class on British Airways, arriving at four twenty p.m., but you’d need to connect in Manchester. [There’s a DIRECT flight]_theme on BMI, arriving at four ten p.m., but there’s no availability in business class.
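A toy sketch of this rule, using an invented flat attribute-map representation (the real system operates on text plans, not dictionaries):

```python
# First option: everything is rheme. Later options: the identifying,
# contrastive attribute (the one answering the implicit question) is theme.

def assign_information_structure(options, identifying_attr):
    structures = []
    for i, option in enumerate(options):
        structures.append({
            attr: "theme" if i > 0 and attr == identifying_attr else "rheme"
            for attr in option})
    return structures

flights = [{"airline": "British Airways", "direct": False},
           {"airline": "BMI", "direct": True}]
print(assign_information_structure(flights, identifying_attr="direct"))
# [{'airline': 'rheme', 'direct': 'rheme'}, {'airline': 'rheme', 'direct': 'theme'}]
```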

  16. Controlling Intonation with OpenCCG • OpenCCG realizer adapts previous work on chart realization to CCG, enabling CCG’s unique accounts of coordination and intonation to be employed in NLG systems • Uses information structure to determine types and locations of pitch accents and boundary tones • Scores candidate realizations with an n-gram language model • Treats agenda as priority queue ordered by n-gram scores • Yields best-first anytime algorithm: returns best-scoring realization at “any time”, for interactive applications (White and Baldridge, EWNLG9 2003; White, INLG 2004; White, RLaC 2006)
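The best-first anytime loop itself is compact. The sketch below is a stand-in for OpenCCG's chart realizer, not the real thing: a toy bigram table and expansion function replace grammar-driven edge combination, but the agenda-as-priority-queue idea is the same.

```python
# Partial realisations sit on an agenda (priority queue) ordered by n-gram
# score; the best complete one seen so far can be returned at "any time".

import heapq

def best_first_realise(start, expand, score, is_complete, max_steps=1000):
    agenda = [(-score(start), start)]          # max-heap via negated scores
    best = None
    for _ in range(max_steps):
        if not agenda:
            break
        neg, item = heapq.heappop(agenda)
        if is_complete(item):
            if best is None or -neg > best[0]:
                best = (-neg, item)            # the "anytime" answer so far
            continue                           # complete edges are not expanded
        for nxt in expand(item):
            heapq.heappush(agenda, (-score(nxt), nxt))
    return best

# Toy usage: order a bag of words under an invented bigram table.
bigram = {("the", "cheapest"): 0.9, ("cheapest", "flight"): 0.9,
          ("the", "flight"): 0.4, ("flight", "cheapest"): 0.1}
words = {"the", "cheapest", "flight"}
score = lambda seq: sum(bigram.get(p, 0.01) for p in zip(seq, seq[1:]))
expand = lambda seq: [seq + (w,) for w in words - set(seq)]
print(best_first_realise(("the",), expand, score,
                         is_complete=lambda s: len(s) == 3))
# (1.8, ('the', 'cheapest', 'flight'))
```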

  17. Examples
  The cheapest(L+H*) flight(LH) is on Ryanair(H* LH). It arrives at two p.m.(H* LH) and it costs just fifty(H*) pounds(H* LH), but you’d need to connect(H*) in Dublin(H* LL).
  [audio: unit selection | limited domain with APML markup]
  Even though the first(L+H*) flight is not on BMI(L+H* LH), it is the cheapest(H*) one available(LH).
  [audio: unit selection | limited domain with APML markup]

  18. Examples
  Q: I'd like a cheap flight from Frankfurt to Geneva, please. And I'd prefer to fly direct.
  A: There's a direct flight on Lufthansa with a good price, arriving in Geneva at ten thirty nine am and it costs two hundred and fifty five pounds. The cheapest flight is on Air France arriving at one twenty five pm and it costs only one hundred and five pounds, but it requires a connection in Paris Charles de Gaulle.
  [audio: limited domain | limited domain with APML markup]

  19. Is Tailoring Effective? Evaluation in MATCH Project: • Restaurant recommendation system built using the same user modeling techniques • Subjects heard dialogues where recommendations and comparisons were based on their own user model or a random other model • Subjects judged tailored responses significantly higher • Information quality: System’s response is easy to understand and provides exactly the information I am interested in when choosing a restaurant. • Ranking confidence: Recommended restaurant is somewhere I would like to go. (Walker, Whittaker, Stent, Maloor, Moore, Johnston, Vasireddy, Cognitive Science 28, 2004)

  20. Does Intonation Matter? • Affects meaning • “She only ATE the banana” vs. • “She only ate the BANANA” • Human judgements of output in the travel domain show that, overall, German speech produced with GToBI markup was judged better than default intonation (Kruijff-Korbayova, EACL03) • Naturalness • (Ease of comprehension)

  21. Evaluation • Compared three synthesizers • Unit Selection Multisyn • Limited Domain • Limited Domain APML • Hypotheses: • LD_APML >> USM • LD_APML > LD (Neide Franca Rocha, MSc, 2004)

  22. Results: US vs. LD_APML

  23. Results: LD vs. LD_APML

  24. Using N-gram LM in Generation

  25. UM Approach to Info Pres
  + UM provides the information users need to make choices with high confidence
  + Enables concise presentation of options and their tradeoffs
  + Users prefer recommendations tailored to their model
  - Doesn’t scale to a large number of options
  - Does not provide users with an overview of the options
  - Users may perceive that they’ve missed out on options

  26. Summarize-and-Refine Approach • Clusters options in database • Based on attributes that yield smallest number of clusters • Summarize these clusters • enumerate, majority, count, singleton • User provides additional constraints (Polifroni, Chung & Seneff, Eurospeech 2003, Chung ACL 2004)
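A minimal sketch of that loop: pick the attribute that partitions the current options into the fewest clusters, then pick a summary style. The thresholds and summary templates below are illustrative assumptions, not Polifroni et al.'s.

```python
# Summarize-and-refine in miniature: attribute choice by fewest clusters,
# summary via singleton / enumerate / majority / count.

from collections import Counter

def best_attribute(options, attributes):
    """The attribute whose values partition options into the fewest clusters."""
    return min(attributes, key=lambda a: len({o[a] for o in options}))

def summarize(options, attribute, enum_max=4):
    counts = Counter(o[attribute] for o in options)
    if len(counts) == 1:                        # singleton: one shared value
        return f"All of them are {next(iter(counts))}."
    if len(counts) <= enum_max:                 # enumerate the clusters
        return "They are " + ", ".join(map(str, counts)) + "."
    value, n = counts.most_common(1)[0]
    if n > len(options) / 2:                    # majority value
        return f"Most of them are {value}."
    return f"There are {len(counts)} choices for {attribute}."   # count

restaurants = ([{"city": "Boston", "cuisine": "seafood"}] * 3
               + [{"city": "Cambridge", "cuisine": "italian"}] * 2)
attr = best_attribute(restaurants, ["city", "cuisine"])
print(attr, "->", summarize(restaurants, attr))  # city -> They are Boston, Cambridge.
```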

  27. Example
  User: What restaurants do you have?
  Sys: I found 983 restaurants. Most of them are located in Boston and Cambridge. There are 32 choices for cuisine. I also have information about price range.
  User: Okay, tell me about the ones in Boston.
  Sys: I found 401 restaurants in Boston. There are 29 choices for cuisine.
  User: Do you have any that serve seafood?
  Sys: I have found 19 seafood restaurants. They are predominantly in the Back Bay, the North End, South Boston and the South End.
  User: In the South End.
  Sys: I have found 3 restaurants. Grill Fish on …

  28. SR Approach
  + Useful when users don’t know the structure of the database
  + Domain independent: no fixed set of attributes; makes dynamic decisions based on analysis of the data
  - Suboptimal choice of attribute(s) for summarization
  - Exploration of tradeoffs is difficult
  - May include irrelevant entities

  29. The Combined UM+SR Approach • select relevant options • structure them based on ranking of attributes in user model • automatically determine tradeoffs • tailor summaries • improve overview of options space by briefly summarizing irrelevant options (Demberg and Moore, EACL 2006)
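Compressed into a sketch with assumed interfaces (not the implementation from Demberg and Moore 2006): user-model utility selects and orders the relevant options, and the pruned remainder is summarized in one line so the user keeps an overview of the whole option space.

```python
# The combined strategy in miniature: select relevant options via the user
# model, order them, and briefly summarise the irrelevant rest.

def combined_um_sr(options, user_model, threshold=0.5):
    def utility(o):
        return sum(user_model.get(a, 0.0) * v for a, v in o["scores"].items())
    ranked = sorted(options, key=utility, reverse=True)
    relevant = [o["name"] for o in ranked if utility(o) >= threshold]
    top = max(user_model, key=user_model.get)    # user's top-ranked attribute
    n_rest = len(options) - len(relevant)
    overview = f"All other flights score worse on {top}." if n_rest else ""
    return relevant, overview

student = {"price": 0.6, "direct": 0.3}
flights = [{"name": "Ryanair", "scores": {"price": 1.0, "direct": 0.0}},
           {"name": "BMI",     "scores": {"price": 0.4, "direct": 1.0}},
           {"name": "BA",      "scores": {"price": 0.1, "direct": 0.0}}]
print(combined_um_sr(flights, student))
# (['Ryanair', 'BMI'], 'All other flights score worse on price.')
```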

  30. Example Dialogue Turn
  User profile “student” (attributes ranked by importance): 1 price; 2 number of legs; departure time, arrival time, travel time; 6 airline; fare class; layover airport
  User: I'd like to book a flight from Edinburgh to Brussels for tomorrow.
  Sys: Ryan Air offers the cheapest flights to Brussels. They cost just 49 pounds but you need to connect in Dublin. There are two flights to choose from. There's an early flight leaving Edinburgh at 8:05 am and arriving at 12:45 pm. To leave later, you can take the 1 pm flight arriving Brussels at 5:30 pm. If you want to fly direct, there's a flight on BMI that leaves Edinburgh at 12 pm. It arrives at 1:35 pm and costs 112 pounds. All other flights are more expensive.

  31. Evaluation Within-participants laboratory experiment • 38 subjects • Participants read 6 dialogue pairs (UM+SR vs. SR) Four questions after each dialogue: • Did the system give the information in a way that was easy to understand? • Did the system give X a good overview of the available options? • Do you think there were better options the system did not tell X about? • How quickly did the system allow X to find the optimal flight? Forced-choice question after each pair: • Which system would you recommend to a friend?

  32. Results - Forced Choice Q.

  33. Results - Likert Scale Questions
  [Bar chart: mean Likert scale values for UM+SR vs. SR on Q2 Understandability, Q3 Overview, Q4 Confidence, Q5 Quick access (1-3 scale); y-axis 1.00 to 7.00]
  Significance levels using two-tailed paired t-test: Q2: p = 0.97; Q3: p < 0.0001; Q4: p < 0.0001; Q5: p < 0.001

  34. Exp 2: Overhearer mode
  [Bar chart: mean Likert scale values for UM+SR vs. SR on Q2 Understandability, Q3 Overview, Q4 Confidence, Q5 Quick access; bar values 5.82, 5.68, 5.67, 5.34, 5.24, 4.71, 2.42, 2.31]
  Significance levels using two-tailed paired t-test: Q2: p = 0.24; Q3: p < 0.01; Q4: p < 0.002; Q5: p < 0.10

  35. Summary Integration of UM and clustering allows the system to • navigate through a large set of options • structure options according to users' valuations • present only relevant options • automatically present tradeoffs between options Results in • increased overall user satisfaction • better overview of the options • increased user confidence in the system

  36. Learning Content Selection Rules • Content selection rules for biographical summaries (Duboue & McKeown, EMNLP 2003) • Uses a corpus of textual biographies and a corresponding frame-based knowledge representation • Anchor-based alignment of extracted facts with sentences in the text corpus • Learns whether a semantic unit should be included in the biography • Recall 94%, F-score 51% • Induces rules from the included material

  37. Learning Content Selection Rules • Collective classification for content selection (Barzilay and Lapata, HLT/EMNLP 2005) • Again, a binary classification task • All candidates are considered simultaneously • Improves coherence because semantically related items are often selected together • Evaluation: aligned newswire summaries of NFL games with a database of events • Recall 76.5%, F-score 60.15% • Chosen events are included in the summary (as in extractive summarization)
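As a rough illustration of the framing shared by both papers, content selection reduces to labelling each fact include/exclude. The features and data below are invented, and this plain per-fact classifier ignores the collective link structure that Barzilay and Lapata exploit.

```python
# Each database fact becomes a feature vector, labelled 1 if aligned human
# texts included it; a classifier then decides inclusion for new facts.

from sklearn.linear_model import LogisticRegression

# Toy features: [fact_type_id, normalised_magnitude, is_record_breaking]
X = [[0, 0.9, 1], [0, 0.2, 0], [1, 0.8, 1], [1, 0.1, 0], [0, 0.7, 0], [1, 0.9, 1]]
y = [1, 0, 1, 0, 1, 1]   # 1 = fact appeared in the aligned human summary

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0, 0.85, 1], [1, 0.05, 0]]))   # e.g. [1 0]: include the first
```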

  38. Learning for Sentence Planning & Realization • SPaRKy (Stent, Prasad & Walker, ACL 2004) • Input: a content plan (a set of dialogue acts and the rhetorical relations among them) • Learns sentence plans from a set of human-ranked training examples • Oh & Rudnicky, CS&L, 2002 • Produces surface realizations for sentence plans based on n-gram statistics • Achieves performance comparable to hand-crafted versions
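A simplified overgenerate-and-rank stand-in for the statistical realization idea (Oh & Rudnicky actually generate from class-based n-gram models directly): fill hand-written templates, then pick the candidate an n-gram language model scores highest. The templates and bigram log-probabilities are invented.

```python
# Rank template-generated candidates with a toy bigram language model.

bigram_logprob = {("flights", "from"): -0.5, ("from", "Hartford"): -1.0,
                  ("flights", "leaving"): -1.5, ("leaving", "Hartford"): -0.7,
                  ("depart", "Hartford"): -2.5}

def lm_score(words, unseen=-5.0):
    return sum(bigram_logprob.get(p, unseen) for p in zip(words, words[1:]))

def realize(slots):
    templates = ["flights from {city}", "flights leaving {city}",
                 "flights that depart {city}"]
    candidates = [t.format(**slots).split() for t in templates]
    return " ".join(max(candidates, key=lm_score))

print(realize({"city": "Hartford"}))   # -> flights from Hartford
```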

  39. Credits: The FLIGHTS System Fancy Linguistically Inspired Generation of Highly Tailored Speech. Rob Clark, Steve Conway, Mary Ellen Foster, Kallirroi Georgila, Oliver Lemon, Michael White. Thanks to: UK Engineering and Physical Sciences Research Council
