
Processing with Prosody & Predicting Prosody

Processing with Prosody & Predicting Prosody. Taal- en spraaktechnologie Fall 2005 Jennifer Spenader. Today and tomorrow. Today: 1. Why we need to be able to recognize prosody 2. Elements that correlate with prosody in synthetic speech Tomorrow


Presentation Transcript


  1. Processing with Prosody & Predicting Prosody Taal- en spraaktechnologie Fall 2005 Jennifer Spenader

  2. Today and tomorrow Today: 1. Why we need to be able to recognize prosody 2. Elements that correlate with prosody in synthetic speech Tomorrow • How do categories like new-given relate to choice of lexical and syntactic form? • How do we determine the interpretation of underspecified forms?

  3. Structure of Today’s Lecture • What makes speech sound good? • What role does prosody play in language understanding? • Categories that are relevant to generation of prosody • Defining, identifying, operationalizing, implementing, testing • How is the information used to generate natural synthetic speech?

  4. What makes good synthetic speech good? • Idealized synthetic speech • Good synthetic speech (AT&T’s Crystal) • BAD synthetic speech

  5. Characteristics of good synthetic speech • Intelligibility • It should support the listener’s decoding of the speaker’s message • Naturalness • It should follow the rules of discourse and information structure • Pleasant to listen to? Friendly sounding?

  6. How do we evaluate synthetic speech? • Present listeners with samples and ask them • Their opinion (give rating, e.g. 1 to 5) • To compare two samples • To compare two samples to a third reference • To ‘type what you hear’

  7. Problems with evaluation • Are all listeners informative subjects? • consistency (do the scores make sense when taken together) • reliability (do people’s scores have same range? same mean?) • native language, experience, etc.? • What are we judging anyway? naturalness, understandability, likeableness, coverage, intelligibility – how are these the same or different? Slide slightly modified from Tina Bennett (2004)
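
The reliability question on this slide (do listeners' scores have the same range and the same mean?) can be illustrated with a small sketch. The ratings below are invented for illustration, and per-listener z-normalization is only one common way to make scores comparable; the lecture does not prescribe it.

```python
from statistics import mean, pstdev

# Hypothetical ratings on a 1-5 scale: listener -> {sample: score}
ratings = {
    "listener_a": {"s1": 5, "s2": 3, "s3": 1},  # uses the full range
    "listener_b": {"s1": 4, "s2": 3, "s3": 2},  # same ordering, narrower range
    "listener_c": {"s1": 3, "s2": 3, "s3": 3},  # same score for everything
}

def mean_opinion(ratings, sample):
    """Average raw score for one sample across all listeners."""
    return mean(scores[sample] for scores in ratings.values())

def z_normalize(scores):
    """Rescale one listener's scores to zero mean and unit spread.

    Returns None when the listener gave every sample the same score,
    because such a listener carries no information about differences.
    """
    values = list(scores.values())
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return None
    return {sample: (score - mu) / sd for sample, score in scores.items()}

for name, scores in ratings.items():
    print(name, z_normalize(scores))
```

After normalization, listener_a and listener_b turn out to agree exactly, even though their raw scores differ in range, while listener_c is flagged as uninformative.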

  8. Prosody in synthetic speech • Using the expected accentuation patterns makes synthetic speech more predictable • If applications are used in the real world, e.g. in noisy environments, then we need high intelligibility • (Does it make the message more redundant?) • Meaning is sometimes affected by prosody • Important for analysis, for machine translation, etc. • Ex. (Swedish) • Jag behöver en biljett. (“I need a ticket.”) • Jag behöver EN biljett. (“I need ONE ticket.”)

  9. What role does prosody play in language? • Lexicon • Some languages make meaningful lexical distinctions with prosody, e.g. Chinese, even Japanese • ame “candy” vs. ame “rain” • Syntactic structure • Identifies constituents or phrases? • Discourse structure • Identifies referents, distinguishes given from new • Identifies contrasts, emphasizes key points? • Marks topic changes • Aids in identifying rhetorical relations

  10. Prosodic prominence aids processing • Word initial phonemes are recognized faster in words with pitch accent • (Shields et al. 1974; Cutler & Foss 1977) • Phoneme identification tasks • Mispronunciations are recognized faster if the word has pitch accent • (Cole et al. 1978, Cole & Jakimik, 1980) • Words with pitch accent have clearer acoustics

  11. Why not just give everything prosodic prominence? • Information theory and coding • “Speaker economy”: • An efficient code has a low average length per message compared to an inefficient code • Giving everything prosodic prominence might be helpful to the hearer but makes things harder for the speaker • Language is already redundant, and speakers exploit this
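
The coding claim on this slide can be made concrete with a small worked example. The four message probabilities and the two codes are invented for illustration. Treating prosodic prominence as a "long codeword" is only an analogy: spending effort on every word resembles a fixed-length code, while reserving effort for unpredictable words resembles a variable-length one.

```python
import math

def average_length(probabilities, code_lengths):
    """Expected number of symbols per message for a given code."""
    return sum(p * n for p, n in zip(probabilities, code_lengths))

def entropy(probabilities):
    """Shannon entropy in bits: the lower bound on average code length."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical distribution over four messages
probs = [0.5, 0.25, 0.125, 0.125]

fixed_code = [2, 2, 2, 2]      # every message costs the same
variable_code = [1, 2, 3, 3]   # frequent messages get short codewords

print(average_length(probs, fixed_code))     # 2.0
print(average_length(probs, variable_code))  # 1.75
print(entropy(probs))                        # 1.75
```

For this distribution the variable-length code reaches the entropy bound, so no code could do better on average; the fixed-length code wastes a quarter of a bit per message.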

  12. What does prosody tell us about the message? • So far we’ve just said something about prosody being helpful in decoding and recognizing words

  13. Syntactic form • Rising and falling fundamental frequency, together with final lengthening, functions as a boundary marker (boundary tones) • For many years linguists assumed prosody mirrored syntactic structure

  14. Prosody not isomorphic with syntax • No one-to-one prosodic correlates of syntactic structure • Accepted only fairly recently • Major syntactic boundaries: • show greater F0 movement and longer segmental durations • Major syntactic boundaries may be accurately located from prosodic information alone • (Collier & ‘t Hart, 1975) • How good is Crystal? • Note break before “during”

  15. Prosody disambiguates local ambiguities Ex. • John believes Mary implicitly. • John believes Mary to be a professor. • Prosody helps online processing • Ex. • Earlier my sister took a dip/in the pool/at the club/on the hill. • Grosjean (1983): Subjects could distinguish whether the target word “dip” was followed by zero, three, or six more words. • Language specific: French listeners couldn’t do more than recognize sentence finality of English sentences

  16. Information structure • From Eady and Cooper (1986) (a version of the “Question Test” of Hajičová et al. 1995) • Ex. George has flowers for Mary. • Who has flowers for Mary? • What does George have for Mary? • Who does George have flowers for? • Depending on the question (=context), different words will receive phonetic focus.

  17. Listeners actively search for sentence focus Cutler (1976) Phoneme monitoring task (listen for a particular phoneme, e.g. /d/) • That summer four years ago I ate roast DUCK for the first time. • That summer four years ago I ate roast duck for EVERY MEAL. • “duck” edited out and replaced by neutral version • Subjects faster in recognizing target word’s phoneme in context where it would have been focused

  18. Prosodic prominence also triggers extra semantic processing • Homophones (Dutch: “gelijkklinkend woord”, literally “same-sounding word”) • hart “heart” vs. hard “hard”, (de) bal “ball” vs. (het) bal “ball (dance)” • Blutner & Sommer (1988) • If a homophone (a word with several meanings) is focused, its multiple meanings are activated • An unaccented homophone activates only the contextually correct interpretation

  19. Why deaccent and accent? • Let your hearer know what’s important! • New-given: New items receive accent, given items are deaccented • Receive accent: The stressed syllable is produced so that it coincides with an F0 maximum… • As well as longer duration, increased intensity? • Be deaccented: • Get cliticized: clitic: An unstressed word that is incapable of standing on its own and attaches in pronunciation to a stressed word, with which it forms a single accentual unit. • the pronoun 'em in I see 'em • the definite article in French l'arme, "the arm." • (modified from Free Online Dictionary)

  20. Sentence processing is sensitive to new-given • Response to comprehension task better with correct new-given prosody • (Bock & Mazzella, 1983) • Simple definition of new-given • First occurrence = NEW • Second occurrence = GIVEN

  21. Correct accenting • Mark “new information” by using the question test form (Hajičová et al. 1995) • Ex. • Who won the lottery? • It was won by a phonologist. (Target phonemes: /b/ or /f/) • Cutler & Fodor (1979): phoneme identification is faster when the word containing the target phoneme is also the focused word.

  22. Correct deaccenting • Verification of given information in pictures faster when given information deaccented • When this information was accented reaction times became longer • (Terken & Nooteboom 1987)

  23. Do speakers deaccent to distinguish given information from new, or do they deaccent because they can?

  24. New-given: how defined • Actually until now we just used a simple definition, repetition of same word form • This is also the type of data used in most testing • But surely there is more to new-given!

  25. When is something “given”?

  26. When is something given? • Threshold and scope of givenness • How does an item become given • Same word earlier? • Reference to same referent earlier? • Reference to same concept earlier? • How much earlier? Is 6 pages/20 minutes earlier too long ago? How long does something remain given?

  27. Theories: Threshold and scope • Chafe (1976) • Scope of givenness depends on number of intervening concepts, number of words. Change of topic might remove given items from consciousness • Grosz & Sidner (1981) • Local focus: items that are now in focus, stored in a stack; these are “popped” at topic change • Global focus items are always given: references to the topic of the article or conversation

  28. Experimental evidence threshold and scope • Terken & Nooteboom (1987) studied radio program speech. Mentioning a word once was enough for the item to be deaccented for the rest of the program • If deaccentuation in this situation corresponds to givenness, then givenness is established after one mention

  29. Inheritance of givenness • Can items be considered given even if the same exact surface form wasn’t used before? • Referents are given or new, not the words used to refer to them! • E.g. purse - handbag

  30. Deaccentuation of given forms or given concept? • Donselaar (1995a) • Ship-boat vs. boat-boat • Subjects asked to make true-false judgements about spoken sentences Ex. The millionaire bought a surprise for his wife. He gave her a boat/ship/mink. The wife UNEXPECTEDLY got a BOAT/boat. • BOAT: accented, or not accented. • Sentences with unaccented synonyms verified more quickly than accented synonyms • No difference for same word

  31. Chafe (1976) Inheritance patterns • Generic concept → specific instance • I don’t like Norwegians. I met a Norwegian yesterday. • I met a Norwegian yesterday. I don’t like Norwegians. • Specific concepts imply more general concepts if the distance is not more than one step • Table → furniture • Mentioning furniture does not make tables given
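
The one-step rule on this slide (a specific concept makes its immediately more general concept given, but not the reverse) can be sketched with a tiny hand-built hierarchy. The PARENT table and its entries are invented for illustration; they do not come from Chafe or from the lecture.

```python
# Hypothetical concept hierarchy: concept -> its immediately more general concept
PARENT = {
    "table": "furniture",
    "chair": "furniture",
    "furniture": "artifact",
}

def given_concepts(mentioned):
    """Return the concepts that count as given after the mentions.

    A mentioned concept is given, and so is the concept exactly one
    step above it. Nothing is inherited downward, and nothing is
    inherited across two or more steps.
    """
    given = set(mentioned)
    for concept in mentioned:
        parent = PARENT.get(concept)
        if parent is not None:
            given.add(parent)
    return given

print(given_concepts({"table"}))      # table and furniture are given
print(given_concepts({"furniture"}))  # table is not given
```

Mentioning "table" makes "furniture" given but not "artifact", which is two steps away; mentioning "furniture" leaves "table" new.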

  32. When is something new? • We just bought a new house. The roof needs repairing. • We just bought a new house. The sauna is fabulous!

  33. Summary • Experimental results show that correct prosody aids in processing • Incorrect prosody makes processing harder • Getting the prosody right should greatly increase the intelligibility and naturalness of synthetic speech

  34. Predicting prosody What do we expect to be accented or deaccented?

  35. Development of TTS • Original TTS systems: used one of two strategies • Accent all open class words, and deaccent all closed class words • This results in too many accents • Accent the last open class word in a phrase • Deaccent everything else • This sounds terrible for many languages, though is “OK” for English
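
The two original strategies on this slide can be sketched as follows. The CLOSED_CLASS word list is a small invented sample, not a real lexicon, and upper case is used here simply to mark an accented word.

```python
# Hypothetical, deliberately tiny list of closed-class (function) words
CLOSED_CLASS = {"the", "a", "an", "in", "on", "of", "to", "is", "he", "she", "it", "and"}

def is_open_class(word):
    """Treat every word outside the closed-class list as a content word."""
    return word.lower() not in CLOSED_CLASS

def accent_open_class(words):
    """Strategy 1: accent all open-class words, deaccent closed-class words."""
    return [w.upper() if is_open_class(w) else w.lower() for w in words]

def accent_last_open_class(words):
    """Strategy 2: accent only the last open-class word in the phrase."""
    out = [w.lower() for w in words]
    for i in range(len(words) - 1, -1, -1):
        if is_open_class(words[i]):
            out[i] = words[i].upper()
            break
    return out

phrase = "the fox walks to the kitchen".split()
print(" ".join(accent_open_class(phrase)))       # the FOX WALKS to the KITCHEN
print(" ".join(accent_last_open_class(phrase)))  # the fox walks to the KITCHEN
```

Even on this short phrase the first strategy produces three accents where the second produces one, which matches the slide's remark that accenting every open-class word yields too many accents.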

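  36. Vos en Haas:1 (Sylvia Vanden Heede, illustrations Thé Tjong-Khing) • Koekboek Haas is niet thuis. Vos hangt lui in de stoel. Hij heeft nergens zin in. Of toch wel. Hij heeft zin in iets lekkers. Koek of zo. Iets ZOETS. Is er nog koek? Vast wel. Vos loopt naar de keuken. Hij doet de kast open. Daar staat de koektrommel. Maar er zit bijna niks meer in. Drie kleine koekjes! En hoop kruimels. • (English gloss: “Cake book”. Hare is not home. Fox is hanging lazily in the chair. He doesn't feel like anything. Or maybe he does. He feels like something tasty. Cake or something. Something SWEET. Is there any cake left? Surely. Fox walks to the kitchen. He opens the cupboard. There is the cake tin. But there is almost nothing left in it. Three small cookies! And a heap of crumbs.)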

  37. Use other strategies • Information structure • Identify new-given information • Accent new information, deaccent given information • Identify contrasted elements • Emphasize them • Identify most important part of message • Focus this

  38. Hirschberg (1993) • Algorithm to assign pitch accent • Implemented in NewSpeak, Bell Laboratories TTS system • Input: unrestricted text, output: tagged text • Used FM Radio texts, ATIS texts and ??? to predict accent • Closed-open class word strategy gets 85% of accents right in FM Radio texts • Tendency for news readers to accent final phrase content words even though most people would not • E.g. TRIAL lawyer vs. TRIAL LAWYER

  39. Not all function words deaccented • “and” as a conjunction vs. “and” as discourse particle (Example from Hirschberg, 1992) • They left after lunch AND landed in France in time for dinner. • ?? They left after lunch. AND, they landed in France in time for dinner.

  40. NewSpeak’s treatment of closed-class items • Three categories • closed-class and frequently deaccented • Possessive pronouns, definite and indefinite articles, copulas, coordinating and subordinating conjunctions, existential “there”, have, accusative pronouns, most prepositions, positive modals, positive do, as well as certain adverbials, nominative they, nominative and accusative it, some nominal pronouns (e.g. something)

  41. Commonly accented closed class items • Negative article, negative modals, negative do, most nominal pronouns, most nominative and all reflexive pronouns, pre-quantifiers (e.g. all), post-determiners (e.g. next) nominal adverbials (e.g. here), interjections, particles, most wh-words, plus some prepositions

  42. Not all content words are accented • Complex nominals • CAMPAIGN promise • MASSACHUSETTS BAR Association • Semantico-syntactic structure maps to differences in stress assignment • Some stress to left, some to right.

  43. Identifying new-given information • Harder to “tag” for information structure than it is to construct your own examples • For each word • Identify its root • If root is already mentioned in the context, treat as given • If root isn’t mentioned in context, treat as new • Context = local context, should coincide with topics
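
The per-word procedure on this slide can be sketched as below. The suffix-stripping root() function is a crude stand-in for real morphological analysis, its suffix list is invented for illustration, and topic boundaries are assumed to be supplied by hand as separate segments.

```python
# Hypothetical suffixes; a real system would use a morphological analyzer
SUFFIXES = ("ation", "jes", "s")

def root(word):
    """Very naive root finder: strip the first matching suffix."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def tag_by_root(segments):
    """Tag each word NEW or GIVEN within its local context.

    `segments` is a list of word lists, one per topic. The set of
    roots seen so far is reset at every topic boundary, so givenness
    does not carry over from one topic to the next.
    """
    tagged = []
    for segment in segments:
        context = set()
        for word in segment:
            r = root(word.lower())
            tagged.append((word, "GIVEN" if r in context else "NEW"))
            context.add(r)
    return tagged

print(tag_by_root([["koek", "koekjes"], ["koek"]]))
```

Within the first segment "koekjes" is tagged GIVEN because it shares a root with "koek"; in the second segment "koek" is NEW again because the local context was reset.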

  44. Vos en Haas:2 (Sylvia Vanden Heede, illustrations Thé Tjong-Khing) • Koekboek Haas is niet thuis. Vos hangt lui in de stoel. Hij heeft nergens zin in. Of toch wel. Hij heeft zin in iets lekkers. Koek of zo. Iets ZOETS. Is er nog koek? Vast wel. Vos loopt naar de keuken. Hij doet de kast open. Daar staat de koektrommel. Maar er zit bijna niks meer in. Drie kleine koekjes! En hoop kruimels.

  45. Content vs. form words • Hirschberg (1992) • If a word has the same root as a word in the local focus stack, then it is treated as given • Ignores synonyms! Introduces errors because roots can’t always be identified easily • Horne et al. (1993) do the same thing but use a network of synonyms and hyponyms to identify given concepts if the referential form was different • inform and information same root! • Koek and koekjes same root!

  46. Contrastive elements • NewSpeak: contrastiveness within a complex noun identified • If part of the complex noun is given, while others are new, then the new items are contrastive • TRIAL\N lawyer\N vs. CRIMINAL\contrastive lawyer\G
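
The contrastiveness rule on this slide can be sketched in a few lines. The function name and the set-based representation of given items are assumptions made for illustration; they are not taken from NewSpeak itself.

```python
def tag_complex_nominal(parts, given_items):
    """Tag each part of a complex noun as N (new), G (given) or contrastive.

    If some parts are given while others are new, the new parts are
    relabelled as contrastive. If all parts are new, or all are given,
    the plain N/G labels are kept.
    """
    status = ["G" if part.lower() in given_items else "N" for part in parts]
    if "G" in status and "N" in status:
        status = ["contrastive" if s == "N" else s for s in status]
    return list(zip(parts, status))

# First mention: nothing is given yet
print(tag_complex_nominal(["trial", "lawyer"], set()))
# Later mention: "lawyer" is given, so "criminal" is contrastive
print(tag_complex_nominal(["criminal", "lawyer"], {"trial", "lawyer"}))
```

The two calls reproduce the slide's example: both parts of the first nominal are new, while in the second only "criminal" is new and is therefore marked contrastive.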

  47. Focused elements • Something the speaker considers particularly important • ZOETS and LEEG in Koekboek?

  48. Certain closed class words almost always get focused • Negative adverbials • Ex. Haas is niet thuis. Vos hangt lui in de stoel. Hij heeft nergens zin in. Maar er zit bijna niks meer in.
