
Lecture 22

Presentation Transcript


  1. Lecture 22 Intonation and Discourse CS 4705

  2. What does prosody convey? • In general, information about: • What the speaker is trying to convey • Is this a statement or a question? • The speaker state • Is the speaker getting angry, frustrated? • In dialogue, information about: • The structure of the dialogue • Is the user or the system trying to start a new topic? • Is the speaker talking about given or new information? • The state of the interaction: • Is the user having trouble being understood? • Is the user having trouble understanding the system?

  3. Current Trends • New description schemes (e.g. ToBI) • Corpus-based research and machine learning • Emphasis on evaluation of algorithms and systems (NLE ‘00 special issue) • Investigation of spontaneous speech phenomena and variation in speaking style • Applications to concept-to-speech (CTS), automatic speech recognition (ASR), and spoken dialogue systems (SDS)

  4. Corpora • Public and semi-public databases • ATIS, SwitchBoard, Call Home, Meetings (NIST/DARPA/LDC) • TRAINS/TRIPS (U. Rochester), FM Radio (BU), BDC (Harvard, AT&T) • Private collections • Acquired for speech or dialogue research (August, KTH; Voicemail, AT&T, IBM) • Meetings, call centers, operator services, focus group collections • The Web • Newscasts, radio

  5. To(nes and) B(reak) I(ndices) • Developed by prosody researchers in four meetings over 1991-94 • Goals: • devise common labeling scheme for Standard American English that is robust and reliable • promote collection of large, prosodically labeled, shareable corpora • ToBI standards also proposed for Japanese, German, Italian, Spanish, British and Australian English, ...

  6. Minimal ToBI transcription: • recording of speech • f0 contour • ToBI tiers: • orthographic tier: words • break-index tier: degrees of junction (Price et al ‘89) • tonal tier: pitch accents, phrase accents, boundary tones (Pierrehumbert ‘80) • miscellaneous tier: disfluencies, non-speech sounds, etc.
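The tiers listed above can be pictured as one small record per utterance. The following is only a sketch: real ToBI transcriptions keep each tier as a separately time-aligned label file, so folding the tiers into one entry per word is a simplifying assumption, and the class names, field names, file name, and example labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Word:
    """One orthographic-tier entry with the labels aligned to it (hypothetical layout)."""
    text: str
    end_time: float               # seconds from the start of the recording
    break_index: int              # break-index tier: juncture after this word, 0-4
    tones: Optional[str] = None   # tonal tier, e.g. "H*" or "H* L-L%"
    misc: Optional[str] = None    # miscellaneous tier: disfluency, non-speech sound


@dataclass
class ToBITranscription:
    audio_path: str                                    # recording of speech
    f0_hz: List[float] = field(default_factory=list)   # f0 contour samples
    words: List[Word] = field(default_factory=list)

    def accented_words(self) -> List[str]:
        """Return the words whose tonal label contains a pitch accent ('*')."""
        return [w.text for w in self.words if w.tones and "*" in w.tones]


# Invented labels, for illustration only.
utt = ToBITranscription(
    audio_path="example.wav",
    words=[
        Word("take", 0.30, 1, "H*"),
        Word("the", 0.42, 1),
        Word("bus", 0.85, 4, "H* L-L%"),
    ],
)
print(utt.accented_words())
```

Storing the break index on the word that precedes the juncture keeps the words and the break-index tier the same length, which makes later per-word processing simpler.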

  7. Sample ToBI Labeling

  8. Online training material, available at: • http://www.ling.ohio-state.edu/phonetics/ToBI/ • Evaluation • Good inter-labeler reliability for expert and naive labelers: 88% agreement on presence/absence of tonal category, 81% agreement on category label, 91% agreement on break indices to within 1 level (Silverman et al. ‘92, Pitrelli et al. ‘94)
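One common way to arrive at agreement figures like these is pairwise agreement: compare every pair of labelers on every word and report the fraction of comparisons that match. The sketch below assumes that definition; whether it matches the exact procedure of the cited studies is an assumption, and the function name and the three labelers' break indices are invented.

```python
from itertools import combinations
from typing import Callable, Sequence


def pairwise_agreement(
    labels: Sequence[Sequence],
    same: Callable = lambda a, b: a == b,
) -> float:
    """labels[i][j] is labeler i's label for word j.

    Returns the fraction of (labeler pair, word) comparisons for which
    `same` holds.
    """
    if len(labels) < 2:
        raise ValueError("need at least two labelers")
    agree = total = 0
    for a, b in combinations(labels, 2):
        for x, y in zip(a, b):
            agree += bool(same(x, y))
            total += 1
    return agree / total


# Invented break indices from three labelers for a five-word utterance.
breaks = [
    [1, 1, 4, 1, 3],
    [1, 2, 4, 1, 4],
    [1, 1, 4, 0, 3],
]
exact = pairwise_agreement(breaks)
within_one = pairwise_agreement(breaks, same=lambda a, b: abs(a - b) <= 1)
print(exact, within_one)
```

Passing a looser `same` function is what turns exact agreement into agreement "to within 1 level"; a test such as `(a is None) == (b is None)` would similarly measure agreement on presence or absence of a tone.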

  9. Pitch Accent/Prominence in ToBI • Which items are made intonationally prominent and how? • Accent type: • H* simple high (declarative) • L* simple low (yes-no question) • L*+H scooped, late rise (uncertainty/incredulity) • L+H* early rise to stress (contrastive focus) • H+!H* fall onto stress (implied familiarity)

  10. Downstepped accents: • !H*, L+!H*, L*+!H • Degree of prominence: • within a phrase: HiF0 • across phrases
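The accent inventory and its downstepped variants can be held in two small lookup tables. This is a minimal sketch: the table and function names are hypothetical, and the descriptions are copied from the accent-type list above.

```python
# Accent types and shape descriptions as listed on the slides.
ACCENT_TYPES = {
    "H*": "simple high",
    "L*": "simple low",
    "L*+H": "scooped, late rise",
    "L+H*": "early rise to stress",
    "H+!H*": "fall onto stress",
}

# Each downstepped accent maps to the accent it is a variant of.
DOWNSTEPPED = {"!H*": "H*", "L+!H*": "L+H*", "L*+!H": "L*+H"}


def describe_accent(label: str):
    """Return (base accent, is_downstepped, description) for a tonal label."""
    base = DOWNSTEPPED.get(label, label)
    if base not in ACCENT_TYPES:
        raise ValueError(f"unknown pitch accent: {label!r}")
    return base, label in DOWNSTEPPED, ACCENT_TYPES[base]


print(describe_accent("L+!H*"))
print(describe_accent("H+!H*"))
```

Note that H+!H* contains "!" but belongs to the basic inventory, so checking for the "!" character alone would misclassify it; the explicit table avoids that.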

  11. Functions of Pitch Accent • Given/new information • S: Do you need a return ticket? • U: No, thanks, I don’t need a return. • Contrast (narrow focus) • U: No, thanks, I don’t need a RETURN…. (I need a time schedule, receipt,…) • Disambiguation of discourse markers • S: Now let me get you the train information. • U: Okay (thanks) vs. Okay….(but I really want…)

  12. Predicting Accent: Is it accented or not? • Applications: TTS and CTS • Corpora: read and spontaneous speech • Features: POS window of 3, sentence position, position within NP, number of syllables, position in complex nominal, inferred given/new status, inferred focus, mutual information • Results: 75-85% correct, depending on genre
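A few of the simpler features above can be sketched as a per-word feature extractor whose output would feed a classifier. This is only an illustration: it covers the POS window of 3, sentence position, and syllable count, and leaves out the features that need a parser or discourse model. The function name, the Penn Treebank-style tags, and the padding symbols are assumptions.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str, int]  # (word, POS tag, number of syllables)


def accent_features(tagged: List[Token], i: int) -> Dict[str, object]:
    """Features for deciding whether word i carries a pitch accent."""
    _word, pos, n_syllables = tagged[i]
    prev_pos = tagged[i - 1][1] if i > 0 else "<s>"
    next_pos = tagged[i + 1][1] if i + 1 < len(tagged) else "</s>"
    return {
        "pos_prev": prev_pos,       # POS window of 3: previous word,
        "pos": pos,                 # current word,
        "pos_next": next_pos,       # and next word
        "sent_position": i / max(len(tagged) - 1, 1),  # 0.0 first, 1.0 last
        "n_syllables": n_syllables,
    }


# Hand-tagged example; the tags and syllable counts are supplied by hand.
sentence = [("I", "PRP", 1), ("need", "VBP", 1), ("a", "DT", 1), ("return", "NN", 2)]
feats = accent_features(sentence, 3)
print(feats)
```

Syllable counts are passed in rather than computed, since estimating them from spelling would need a pronunciation lexicon or a heuristic that the slides do not specify.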

  13. Prosodic Phrasing in ToBI • ‘Levels’ of phrasing: • intermediate phrase: one or more pitch accents plus a phrase accent (H- or L- ) • intonational phrase: 1 or more intermediate phrases + boundary tone (H% or L% ) • ToBI break-index tier • 0 no word boundary • 1 word boundary • 2 strong juncture with no tonal markings • 3 intermediate phrase boundary • 4 intonational phrase boundary
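Because break indices 3 and 4 mark intermediate and intonational phrase boundaries, a labeled break-index tier is enough to recover both levels of phrasing. The sketch below assumes one break index per word, describing the juncture after that word; the function name and the break indices in the example are invented.

```python
from typing import List

INTERMEDIATE = 3   # intermediate phrase boundary
INTONATIONAL = 4   # intonational phrase boundary


def split_phrases(words: List[str], breaks: List[int], level: int) -> List[List[str]]:
    """Group words into phrases, closing a phrase after any word whose
    break index is at least `level`."""
    phrases, current = [], []
    for word, brk in zip(words, breaks):
        current.append(word)
        if brk >= level:
            phrases.append(current)
            current = []
    if current:  # trailing words with no closing boundary
        phrases.append(current)
    return phrases


words = "You should buy the ticket with the discount coupon".split()
breaks = [1, 1, 1, 1, 3, 1, 1, 1, 4]  # hypothetical labeling
ip = split_phrases(words, breaks, INTERMEDIATE)
IP = split_phrases(words, breaks, INTONATIONAL)
print(ip)
print(IP)
```

With this invented labeling the utterance forms one intonational phrase containing two intermediate phrases, split after "ticket".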

  14. Functions of Phrasing • Disambiguates syntactic constructions, e.g. PP attachment, restrictive/non-restrictive relative clause: • S: You should buy the ticket with the discount coupon. • S: The itinerary which I faxed includes deluxe accommodations. • Disambiguates scope ambiguities, e.g. negation: • S: You aren’t booked through Rome because of the fare. • Or modifier scope: • S: This fare is restricted to retired politicians and civil servants.

  15. Predicting Phrase Boundaries • Applications: TTS, CTS, ASR • Corpora: AP news, Penn Treebank, ATIS • Features: sentence position, sentence length, POS window of 4, location of previous predicted boundary, mutual information, constituent information, dependency structure • Results: 96% correct
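Phrase boundary prediction is naturally framed as a decision at each juncture between two words. The sketch below extracts a few of the listed features for one juncture. Reading "POS window of 4" as two tags on each side of the juncture is an assumption, as are the function name, the padding symbol, and the hand-assigned Penn Treebank-style tags; the features that need a parser are omitted.

```python
from typing import Dict, List


def juncture_features(tags: List[str], j: int, last_boundary: int) -> Dict[str, object]:
    """Features for the juncture between word j and word j + 1.

    `last_boundary` is the index of the word after which the previous
    boundary was predicted, or -1 if none has been predicted yet.
    """
    n = len(tags)

    def tag(k: int) -> str:
        return tags[k] if 0 <= k < n else "<pad>"

    return {
        # Two tags to the left of the juncture and two to the right.
        "pos_window": (tag(j - 1), tag(j), tag(j + 1), tag(j + 2)),
        "sent_length": n,
        "sent_position": (j + 1) / n,
        "words_since_boundary": j - last_boundary,
    }


# "You should buy the ticket with the discount coupon", tagged by hand.
tags = ["PRP", "MD", "VB", "DT", "NN", "IN", "DT", "NN", "NN"]
feats_j = juncture_features(tags, 4, last_boundary=-1)  # juncture after "ticket"
print(feats_j)
```

Feeding the location of the previous predicted boundary back in as a feature means predictions have to be made left to right, one juncture at a time.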

  16. Contours: Accent + Phrasing • What do intonational contours ‘mean’ (Ladd ‘80, Bolinger ‘89)? • Speech acts (statements, questions, requests) S: That’ll be credit card? (L* H- H%) • Propositional attitude (uncertainty, incredulity) S: You’d like an evening flight.(L*+H L- H%) • Speaker affect (anger, happiness, love) U: I said four SEVEN one! (L+H* L- L%) • “Personality” S: Welcome to the Sunshine Travel System.
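Each contour in the examples above is written as one or more pitch accents followed by a phrase accent and a boundary tone. A minimal parser for that notation might look as follows; it assumes the tones are separated by spaces, so a fused spelling such as "L-H%" would need to be split first, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def parse_contour(contour: str) -> Tuple[List[str], Optional[str], Optional[str]]:
    """Split a contour such as 'L* H- H%' into
    (pitch accents, phrase accent, boundary tone)."""
    accents: List[str] = []
    phrase_accent: Optional[str] = None
    boundary_tone: Optional[str] = None
    for tone in contour.split():
        if tone.endswith("%"):
            boundary_tone = tone      # H% or L%
        elif tone.endswith("-"):
            phrase_accent = tone      # H- or L-
        elif "*" in tone:
            accents.append(tone)      # H*, L*, L*+H, L+H*, ...
        else:
            raise ValueError(f"unrecognized tone: {tone!r}")
    return accents, phrase_accent, boundary_tone


question = parse_contour("L* H- H%")
emphatic = parse_contour("L+H* L- L%")
print(question)
print(emphatic)
```

The suffix checks come before the accent check so that the order of tests, not the "+" inside bitonal accents, decides how each tone is classified.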

  17. Pitch Range and Timing • Level of speaker engagement • S: Welcome to InfoTravel. How may I help you? • Contour interpretation • S: You can take the L*+H bus from Malpensa to Rome L-H%. • U: Take the bus. vs. Take the bus! • Discourse/topic structure • Topic beginnings have a higher pitch range and a faster speaking rate, and are preceded by longer pauses • Topic endings show the opposite pattern

  18. Prosody and Speaker Emotion • What makes an utterance sound angry? Sad? • How much comes from the lexical information? • How much from the acoustic/prosodic? • Does all anger, e.g., sound the same? • Cahn ‘88 (examples)

  19. Applications • Text-to-Speech and Concept-to-Speech generation: improve naturalness • Speech Recognition: identify suprasegmental meaning • Spoken Dialogue Systems: understand when people are confused, angry • Audio Browsing: format corpora for browsing and search

  20. Challenges • We don’t really know what most contours ‘mean’ • Our accent prediction needs to be sensitive to better models of given/new status, focus, and grammatical function • Our phrasing prediction needs better information about, e.g., attachment • We don’t know much about emotional speech or ‘personality’, both of which are critical to applications
