
Prosody in Generation




Presentation Transcript


  1. Prosody in Generation

  2. Natural Language Generation (NLG)
  • A typical NLG system does:
    • Text planning: transforms a communicative goal into a sequence or structure of elementary goals
    • Sentence planning: chooses linguistic resources to achieve those goals
    • Realization: produces the surface output
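The three stages above can be sketched as a toy program; the function names, the travel-domain goals, and the tiny resource table are all hypothetical illustrations, not any real system's API.

```python
# A minimal sketch of the three NLG stages: text planning, sentence
# planning, and realization. All goals and resources are invented.

def text_planner(goal):
    """Text planning: break a communicative goal into elementary goals."""
    if goal == "confirm_trip":
        return ["confirm_destination", "ask_date"]
    return [goal]

def sentence_planner(elementary_goal):
    """Sentence planning: choose linguistic resources for one goal."""
    resources = {
        "confirm_destination": ("declarative",
                                ["you", "want", "to", "go", "to", "Boston"]),
        "ask_date": ("interrogative",
                     ["when", "do", "you", "want", "to", "travel"]),
    }
    return resources.get(elementary_goal, ("declarative", [elementary_goal]))

def realizer(plan):
    """Realization: produce a surface string from a sentence plan."""
    mood, words = plan
    text = " ".join(words)
    text = text[0].upper() + text[1:]
    return text + ("?" if mood == "interrogative" else ".")

def generate(goal):
    return [realizer(sentence_planner(g)) for g in text_planner(goal)]

print(generate("confirm_trip"))
# ['You want to go to Boston.', 'When do you want to travel?']
```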

  3. Research Directions in NLG
  • Past focus
    • Hand-crafted rules inspired by small corpora
    • Very little evaluation
    • Monologue text generation
  • New directions
    • Large-scale corpus-based learning of system components
    • Evaluation important, but how to do it still unclear
    • Spoken monologue and dialogue

  4. How to produce speech instead of text?

  5. Overview
  • Spoken NLG in Dialogue Systems
  • Text-to-Speech (TTS) vs. Concept-to-Speech (CTS)
  • Current Approaches to CTS
    • Hand-built systems
    • Corpus-based systems
  • NLG Evaluation
  • Open Questions

  6. Importance of NLG in Dialogue Systems
  • Conveying information intonationally, for conciseness and naturalness
    • System turns in dialogue systems can be shorter
      S: Did you say you want to go to Boston?
      S: (You want to go to) Boston H-H%
  • Not providing mis-information through misleading prosody
      …S: (You want to go to) Boston L-L%

  7. Silverman et al ‘93
  • Mimicking human prosody improves transcription accuracy in a reverse telephone directory task
  • Sanderman & Collier ‘97
    • Subjects were quicker to respond to ‘appropriately phrased’ ambiguous responses to questions in a monitoring task
      Q: How did I reserve a room? vs. Which facility did the hotel have?
      A: I reserved a room L-H% in the hotel with the fax.
      A: I reserved a room in the hotel L-H% with the fax.

  8. Overview
  • Spoken NLG in Dialogue Systems
  • Text-to-Speech (TTS) vs. Concept-to-Speech (CTS)
  • Current Approaches to CTS
    • Hand-built systems
    • Corpus-based systems
  • NLG Evaluation
  • Open Questions

  9. Prosodic Generation for TTS
  • Default prosodic assignment from simple text analysis
  • Hand-built rule-based systems: hard to modify and adapt to new domains
  • Corpus-based approaches (Sproat et al ’92)
    • Train prosodic variation on large labeled corpora using machine learning techniques
    • Accent and phrasing decisions
    • Associate prosodic labels with simple features of transcripts

  10. • # of words in phrase
  • distance from beginning or end of phrase
  • orthography: punctuation, paragraphing
  • part of speech, constituent information
  • Apply learned rules to new text
  • Incremental improvements continue:
    • Adding higher-accuracy parsing (Koehn et al ‘00)
      • Collins ‘99 parser
    • More sophisticated learning algorithms (Schapire & Singer ‘00)
    • Better representations: tree-based?
  • Rules always impoverished
  • How to define a Gold Standard?
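The rule-application step can be illustrated with a toy predictor over exactly these features (part of speech, position in phrase, punctuation); the rules below are hand-written stand-ins for what a learner would induce, and the POS tag set is an assumption.

```python
# Toy corpus-style prosodic assignment: predict pitch accents and
# phrase boundaries from simple text features. The rules are invented
# stand-ins for machine-learned ones.

FUNCTION_POS = {"DET", "PREP", "CONJ", "PRON", "AUX"}  # assumed tag set

def predict_prosody(tokens):
    """tokens: list of (word, pos) pairs for one sentence.
    Returns (word, accent-or-None, boundary-or-None) triples."""
    out = []
    n = len(tokens)
    for i, (word, pos) in enumerate(tokens):
        accent = pos not in FUNCTION_POS        # content words get accented
        boundary = word.endswith((",", "."))    # punctuation marks a boundary
        if i == n - 1:
            boundary = True                     # end of phrase
        out.append((word,
                    "H*" if accent else None,
                    "L-L%" if boundary else None))
    return out

sent = [("the", "DET"), ("poachers", "NOUN"), ("control", "VERB"),
        ("the", "DET"), ("trade.", "NOUN")]
for word, acc, bnd in predict_prosody(sent):
    print(word, acc or "-", bnd or "-")
```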

  11. Spoken NLG
  • Decisions in Text-to-Speech (TTS) depend on syntax, information status, topic structure, … information that is explicitly available to NLG
  • Concept-to-Speech (CTS) systems should be able to specify “better” prosody: the system knows what it wants to say and can specify how
  • But … generating prosody for CTS isn’t so easy

  12. Overview
  • Spoken NLG in Dialogue Systems
  • Text-to-Speech (TTS) vs. Concept-to-Speech (CTS)
  • Current Approaches to CTS
    • Hand-built systems
    • Corpus-based systems
  • NLG Evaluation
  • Open Questions

  13. Relying upon Prior Research
  • MIMIC CTS (Nakatani & Chu-Carroll ‘00)
    • Uses the domain attribute/value distinction to drive phrasing and accent: critical information is focused
      Movie: October Sky
      Theatre: Hoboken Theatre
      Town: Hoboken
    • Attribute names and values always accented
    • Values set off by phrase boundaries
    • Information status conveyed by varying accent type (Pierrehumbert & Hirschberg ‘90)
      • Old (given): L*
      • Inferrable (by MIMIC, e.g. theatre name from town): L*+H

  14. • Key (to formulating a valid query): L+H*
  • New: H*
  • Marking Dialogue Acts
    • NotifyFailure:
      U: Where is “The Corrupter” playing in Cranford?
      S: “The Corrupter” [L+H*] is not [L+H*] playing in Cranford [L*+H].
  • Other rules for logical connectives, clarification and confirmation subdialogues
  • Contrastive accent for semantic parallelism (Rooth ‘92, Pulman ‘97), used in GoalGetter and OVIS (Theune ‘99)
      The cat eats fish. The dog eats meat.
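The MIMIC-style rules can be sketched as a small rule-based annotator. The status-to-accent table (given → L*, inferrable → L*+H, key → L+H*, new → H*) follows the slides; the function names, the attribute accent choice, and the bracketed output format are hypothetical.

```python
# A sketch of MIMIC-style accent assignment (after Nakatani &
# Chu-Carroll '00): attributes and values are always accented, the
# accent TYPE is chosen by information status, and values are set off
# by a phrase boundary. Output format is invented for illustration.

ACCENT_BY_STATUS = {
    "given": "L*",         # old information
    "inferrable": "L*+H",  # e.g. theatre name inferrable from town
    "key": "L+H*",         # key to formulating a valid query
    "new": "H*",
}

def annotate(attr, value, status):
    accent = ACCENT_BY_STATUS[status]
    # attribute name accented; value accented and followed by a boundary
    return f"{attr} [H*] {value} [{accent}] [L-]"

print(annotate("Movie", "October Sky", "new"))
print(annotate("Theatre", "Hoboken Theatre", "inferrable"))
```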

  15. But … many counterexamples
  • The association of prosody with many syntactic, semantic, and pragmatic concepts is still an open question
  • Prosody generation from (past) observed regularities and assumptions:
    • Information can be ‘chunked’ usefully by phrasing for easier user understanding
      • But in many different ways
    • Information status can be conveyed by accent:
      • Contrastive information is accented?
        S: You want to go to L+H* Nijmegen, L+H* not Eindhoven.

  16. • Given information is deaccented? Speaker/hearer givenness
      U: I want to go to Nijmegen.
      S: You want to go to H* Nijmegen?
  • Intonational contours can convey speech acts, speaker beliefs:
    • Continuation rise can maintain the floor?
      S: I am going to get you the train information [L-H%].
    • Backchanneling can be produced appropriately?
      S: Okay. Okay? Okaaay… Mhmm…

  17. • Wh- and yes-no questions can be signaled appropriately?
      S: Where do you want to go.
      S: What is your passport number?
  • Discourse/topic structure can be signaled by varying pitch range, pausal duration, rate?

  18. Overview
  • Spoken NLG in Dialogue Systems
  • Text-to-Speech (TTS) vs. Concept-to-Speech (CTS)
  • Current Approaches to CTS
    • Hand-built systems
    • Corpus-based systems
  • NLG Evaluation
  • Open Questions

  19. MAGIC
  • Multimedia (MM) system for presenting cardiac patient data
  • Developed at Columbia by McKeown and colleagues, in conjunction with Columbia Presbyterian Medical Center, to automate post-operative status reporting for bypass patients
  • Uses mostly traditional, hand-developed NLG components
    • Generate text, then annotate it prosodically
  • Corpus-trained prosodic assignment component
    • Corpus: written and oral patient reports
      • 50 min multi-speaker spontaneous speech + 11 min single-speaker read speech
      • 1.24M-word text corpus of discharge summaries

  20. • Transcribed, ToBI-labeled
  • Generator features labeled/extracted:
    • syntactic function
    • p.o.s.
    • semantic category
    • semantic ‘informativeness’ (rarity in corpus)
    • semantic constituent boundary location and length
    • salience
    • given/new
    • focus
    • theme/rheme
    • ‘importance’
    • ‘unexpectedness’

  21. • Very hard to label features
  • Results: new features to specify TTS prosody
    • Of the CTS-specific features, only semantic informativeness (likelihood of occurring in a corpus) useful so far (Pan & McKeown ‘99)
    • Looking at context and word collocation helps predict accent placement (Pan & Hirschberg ‘00)
      RED CELL (less predictable) vs. BLOOD cell (more predictable)
      • The most predictable words are accented less frequently (40-46%) and the least predictable more (73-80%)
      • A unigram+bigram model predicts accent status with 77% (+/- .51) accuracy
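The predictability-based accent decision can be sketched with a toy interpolated unigram+bigram model: a word that is hard to predict from its context ("RED cell") gets accented, a highly predictable one ("BLOOD cell") does not. The corpus, interpolation weights, and threshold below are invented for illustration.

```python
# Toy unigram+bigram predictability model for accent status, in the
# spirit of Pan & Hirschberg '00. All counts and thresholds are toys.

from collections import Counter

corpus = "blood cell blood cell blood cell red cell red flag red flag".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def predictability(prev, word):
    """Equal-weight interpolation of unigram and bigram probability."""
    uni = unigrams[word] / len(corpus)
    bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return 0.5 * uni + 0.5 * bi

def accent(prev, word, threshold=0.5):
    """Unpredictable words get accented."""
    return predictability(prev, word) < threshold

print(accent("red", "cell"), accent("blood", "cell"))  # True False
```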

  22. Stochastic, Corpus-based NLG
  • Generate from a corpus rather than a hand-built system
  • For an MT task, Langkilde & Knight ‘98 over-generate from a traditional hand-built grammar
    • Output composed into a lattice
    • A linear (bigram) language model chooses the best path
  • But …
    • No guarantee of grammaticality
    • How to evaluate/improve results?
    • How to incorporate prosody into this kind of generation model?
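The over-generate-and-rank idea can be sketched as follows: a lattice encodes all candidate realizations, and a bigram language model scores paths through it. The lattice, the vocabulary, and all log-probabilities below are invented toy values, not from any trained model.

```python
# Toy version of lattice + bigram-LM selection in the style of
# Langkilde & Knight '98. Lattice and probabilities are invented.

import math

# lattice: node -> list of (word, next_node); node 0 is start, None is end
lattice = {
    0: [("the", 1), ("a", 1)],
    1: [("poachers", 2)],
    2: [("control", 3), ("controls", 3)],
    3: [("the", 4)],
    4: [("trade", None)],
}

# toy bigram log-probabilities; "<s>" is the sentence-start symbol
logp = {
    ("<s>", "the"): -0.5, ("<s>", "a"): -1.5,
    ("the", "poachers"): -0.7, ("a", "poachers"): -2.0,
    ("poachers", "control"): -0.6, ("poachers", "controls"): -2.5,
    ("control", "the"): -0.4, ("controls", "the"): -0.4,
    ("the", "trade"): -0.9,
}

def best_path(node=0, prev="<s>"):
    """Return (log-prob, words) of the best path from `node` to the end."""
    if node is None:
        return 0.0, []
    best = (-math.inf, [])
    for word, nxt in lattice[node]:
        score, rest = best_path(nxt, word)
        score += logp.get((prev, word), -10.0)  # unseen bigram penalty
        if score > best[0]:
            best = (score, [word] + rest)
    return best

score, words = best_path()
print(" ".join(words))  # the poachers control the trade
```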

  23. FERGUS (Bangalore & Rambow ‘00)
  • Corpus-based learning to refine syntactic, lexical and prosodic choice
  • Domain is the DARPA Communicator task (air travel information)
  • Uses a stochastic tree model + a linear LM + the XTAG (hand-crafted) grammar
  • Trained on WSJ dependency trees tagged with p.o.s. and morphological information, syntactic SuperTags (grammatical function, subcat frame, argument realization), WordNet sense tags, and prosodic labels (accent and boundary)

  24. Input
  • Dependency tree of lexemes
  • Any feature can be specified, e.g. syntactic or prosodic
    (tree: control → { poachers <L+H*>, now, trade → { the, underground } })

  25. Tree Chooser
  • Selects syntactic/prosodic properties for input nodes based on matches with features of mothers and daughters in the corpus
    (tree: control → { poachers <L+H*>, now, trade → { the, underground } })

  26. Unraveler
  • Produces a lattice of all syntactically possible linearizations of the tree, using the XTAG grammar
    (word lattice over linearizations, e.g. “underground poachers trade now control the …”, “now poachers underground trade …”)

  27. Linear Precedence Chooser
  • Finds the most likely lattice traversal, using a trigram language model
    Now [H*] poachers [L+H*] [L-] control the underground trade [H*] [L-L%].
  • Many ways to implement each step
    • How to choose which works ‘best’?
    • How to evaluate output?
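The Linear Precedence Chooser step can be sketched as trigram ranking: score every linearization the Unraveler produced and keep the most likely one. The candidate strings echo the slides; the tiny training text, the smoothing constant, and all names are toy stand-ins for the WSJ-trained model.

```python
# Toy trigram ranking of candidate linearizations, in the spirit of
# FERGUS's Linear Precedence Chooser. Training data is a toy stand-in.

from collections import Counter

train = "<s> <s> now poachers control the underground trade </s>".split()
tri = Counter(zip(train, train[1:], train[2:]))
bi = Counter(zip(train, train[1:]))

def trigram_score(words, alpha=0.1):
    """Add-alpha-smoothed trigram probability of a candidate string."""
    seq = ["<s>", "<s>"] + words + ["</s>"]
    vocab = len(set(train))
    p = 1.0
    for a, b, c in zip(seq, seq[1:], seq[2:]):
        p *= (tri[(a, b, c)] + alpha) / (bi[(a, b)] + alpha * vocab)
    return p

candidates = [
    "now poachers control the underground trade".split(),
    "poachers now control the underground trade".split(),
    "the underground trade now poachers control".split(),
]
best = max(candidates, key=trigram_score)
print(" ".join(best))  # now poachers control the underground trade
```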

  28. Overview
  • Spoken NLG in Dialogue Systems
  • Text-to-Speech (TTS) vs. Concept-to-Speech (CTS)
  • Current Approaches to CTS
    • Hand-built systems
    • Corpus-based systems
  • NLG Evaluation
  • Open Questions

  29. Evaluating NLG
  • How to judge success/progress in NLG is an open question
  • Qualitative measures: preference
  • Quantitative measures:
    • Task performance measures: speed, accuracy
    • Automatic comparison to a reference corpus (e.g. string edit distance and variants, tree-similarity-based metrics)
  • Not always a single “best” solution
  • Critical for stochastic systems to combine qualitative judgments with quantitative measures (Walker et al ’97)
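String edit distance against a reference, one of the quantitative measures mentioned, can be sketched as word-level Levenshtein distance; the token lists below are illustrative.

```python
# Word-level Levenshtein distance of the kind used to compare generated
# output against a reference corpus; each operation costs 1.

def edit_distance(hyp, ref):
    """Minimum insertions/deletions/substitutions turning hyp into ref."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n]

ref = "now poachers control the underground trade".split()
hyp = "poachers now control the underground trade".split()
print(edit_distance(hyp, ref))  # 2
```

Note that, as the slides observe, such string metrics treat a harmless word-order swap the same as a meaning-changing substitution, which is one motivation for the tree-based metrics on the next slide.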

  30. Qualitative Validation of Quantitative Metrics
  • Subjects judged understandability and quality of candidates proposed by 4 evaluation metrics, each chosen to minimize distance from a Gold Standard (Bangalore, Rambow & Whittaker ‘00)
  • Tree-based metrics correlate significantly with understandability and quality judgments; string metrics do not
  • New objective metrics learned:
    • Understandability accuracy = (1.31 * simple tree accuracy - .10 * substitutions - .44) / .87
    • Quality accuracy = (1.02 * simple tree accuracy - .08 * substitutions - .35) / .67
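The two learned metrics can be written directly as functions of simple tree accuracy and the number of substitutions; the constants are taken from the slide (reading the garbled constant in the first formula as a subtraction of .44). The sanity check plugs in perfect tree accuracy with no substitutions, which yields 1.0 for both metrics under that reading.

```python
# Direct transcription of the learned objective metrics from
# Bangalore, Rambow & Whittaker '00, as quoted on the slide.

def understandability_accuracy(simple_tree_accuracy, substitutions):
    return (1.31 * simple_tree_accuracy - 0.10 * substitutions - 0.44) / 0.87

def quality_accuracy(simple_tree_accuracy, substitutions):
    return (1.02 * simple_tree_accuracy - 0.08 * substitutions - 0.35) / 0.67

# Sanity check: a perfect tree with no substitutions scores 1.0 on both.
print(round(understandability_accuracy(1.0, 0), 6))  # 1.0
print(round(quality_accuracy(1.0, 0), 6))  # 1.0
```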

  31. Overview
  • Spoken NLG in Dialogue Systems
  • Text-to-Speech (TTS) vs. Concept-to-Speech (CTS)
  • Current Approaches to CTS
    • Hand-built systems
    • Corpus-based systems
  • NLG Evaluation
  • Open Questions

  32. More Open Questions for Spoken NLG
  • How much to model the human original?
    • Planning for appropriate intonational variation is important even in recorded prompts
    • Timing and backchanneling
  • What kind of output is most comprehensible?
  • What kind of output elicits the most easily understood user response? (Gustafson et al ’97, Clark & Brennan ‘99)
  • Implementing variations in dialogue strategy
    • Implicit confirmation
    • Mixed initiative
