
Speech Output


sonja


  1. Speech Output • Reading: Reiter and Dale, chap. 7

  2. Note: SimpleNLG and Protégé • The SimpleNLG Lexicaliser creates an SPhraseSpec from a Protégé instance • Based on template mapping rules encoded in Protégé

  3. Example • SPIKE: • Subject = “there” • Verb = “is” • Complement = “a spike” • Modifier = [“in [channel]”, “to [peak_value]”] • channel and peak_value are features of spikes • Results in texts such as • There is a spike in HR to 160
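The SPIKE template can be illustrated with a minimal plain-Java sketch. It does not use the SimpleNLG or Protégé APIs; the class `SpikeTemplate` and its methods are hypothetical names chosen for illustration, and the only thing taken from the slide is the template itself (subject, verb, complement, and the two modifiers with `[feature]` slots).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a template mapping rule; NOT the SimpleNLG or Protégé API.
final class SpikeTemplate {
    static final String SUBJECT = "there";
    static final String VERB = "is";
    static final String COMPLEMENT = "a spike";
    static final List<String> MODIFIERS = List.of("in [channel]", "to [peak_value]");

    // Replace each [feature] slot with the instance's value for that feature.
    static String fill(String pattern, Map<String, String> features) {
        String result = pattern;
        for (Map.Entry<String, String> e : features.entrySet()) {
            result = result.replace("[" + e.getKey() + "]", e.getValue());
        }
        return result;
    }

    // Join subject, verb, complement and filled modifiers, then capitalise the first letter.
    static String realise(Map<String, String> features) {
        StringBuilder sb = new StringBuilder();
        sb.append(SUBJECT).append(' ').append(VERB).append(' ').append(COMPLEMENT);
        for (String modifier : MODIFIERS) {
            sb.append(' ').append(fill(modifier, features));
        }
        sb.setCharAt(0, Character.toUpperCase(sb.charAt(0)));
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> spike = new LinkedHashMap<>();
        spike.put("channel", "HR");
        spike.put("peak_value", "160");
        System.out.println(realise(spike)); // prints: There is a spike in HR to 160
    }
}
```

A real lexicaliser would build a phrase specification rather than a string, so that the microplanner can still change tense or add modifiers before realisation.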

  4. Usage • Document Planner decides which instances to include in the text • Lexicaliser produces an initial SPhraseSpec from these • Microplanner modifies the SPhraseSpec • Add extra modifiers if necessary • E.g., “at 10.40” (if different from the last time mentioned) • Aggregation • Syntactic choice (passive, tense) • Referring expressions (HR, Heart Rate) • Realiser produces text

  5. SimpleNLG and Protégé • Complex, very much under development • Happy to discuss more with interested students • Prof Mellish is very interested in NLG and Semantic Web

  6. Different Modalities • Many ways to communicate data • Visualisation • Written text • Spoken text (speech) • Combinations of above

  7. Speech output • Computers can talk as well as write • Prerecorded files (eg, WAV) • Text-to-speech (TTS) • Speaks arbitrary texts • Example app: spoken weather forecasts • Output of our weather-forecast generator spoken for premium-rate telephone weather information services

  8. Simple approach • Problem: speak aloud a written text • Simple approach • Record people speaking words • Given a text, combine recordings for all the words in the text • Telephone directory enquiries
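As a rough sketch of the simple approach, the code below turns a text into a playlist of per-word recordings. The class name and the `word.wav` file-naming scheme are assumptions made for illustration; nothing here plays audio.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: one pre-recorded file per word, named <word>.wav (assumed scheme).
final class WordConcatenation {
    // Returns the recordings to play, in order, for the given text.
    static List<String> recordingsFor(String text) {
        List<String> files = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                files.add(token + ".wav");
            }
        }
        return files;
    }

    public static void main(String[] args) {
        // prints: [he.wav, lives.wav, on.wav, don.wav, st.wav]
        System.out.println(recordingsFor("He lives on Don St."));
    }
}
```

The lookup is purely word by word, so `st.wav` and `lives.wav` can each hold only one pronunciation, whatever the surrounding sentence means.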

  9. Problems • Intonation/prosody • Difficult to understand monotone intonation • Cannot determine which word is meant • He lives on Don St. • St. Louis is a great city. • Conventions • £20 is twenty pounds, not pound twenty • New words (names, technical terms)
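One way to see why "St." needs context is a small heuristic, sketched here under a loudly stated assumption: "St." followed by a capitalised word is read as "Saint", otherwise as "Street". The rule and the class `AbbreviationExpander` are illustrative only; the heuristic fails when "St." ends one sentence and the next sentence begins with a name.

```java
// Hypothetical heuristic for expanding "St."; not taken from any real TTS system.
final class AbbreviationExpander {
    static String expandSt(String sentence) {
        String[] tokens = sentence.trim().split("\\s+");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            if (token.equals("St.")) {
                boolean last = (i == tokens.length - 1);
                boolean nextIsName = !last && Character.isUpperCase(tokens[i + 1].charAt(0));
                if (nextIsName) {
                    token = "Saint";                      // "St. Louis" -> "Saint Louis"
                } else {
                    token = last ? "Street." : "Street";  // keep the sentence-final full stop
                }
            }
            if (i > 0) {
                out.append(' ');
            }
            out.append(token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expandSt("He lives on Don St."));        // He lives on Don Street.
        System.out.println(expandSt("St. Louis is a great city.")); // Saint Louis is a great city.
    }
}
```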

  10. Problems • Pronouncing symbols • £ is pound or pounds ?? • I have £1 vs I have £5 vs I ate a £5 lunch • Pronouncing numbers • Individual digits or as a whole • 01224 273443 vs 1,224,273,443 people
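The pound and digit cases can be sketched as follows. This is a minimal illustration with hypothetical names (`SymbolReader`, `pounds`, `digits`); it assumes amounts from 0 to 99 only, and it reads the digit 0 as "zero", although many UK speakers would say "oh" in a phone number.

```java
// Hypothetical sketch of symbol and number expansion for speech.
final class SymbolReader {
    private static final String[] SMALL = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"
    };
    private static final String[] TENS = {
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"
    };

    // Spell out a whole number; only 0..99 is supported in this sketch.
    static String words(int n) {
        if (n < 0 || n > 99) {
            throw new IllegalArgumentException("only 0..99 supported");
        }
        if (n < 20) {
            return SMALL[n];
        }
        return n % 10 == 0 ? TENS[n / 10] : TENS[n / 10] + "-" + SMALL[n % 10];
    }

    // "£1" -> "one pound", "£5" -> "five pounds", but "a £5 lunch" -> "five pound" (before a noun).
    static String pounds(int amount, boolean beforeNoun) {
        String unit = (amount == 1 || beforeNoun) ? "pound" : "pounds";
        return words(amount) + " " + unit;
    }

    // Read a phone number digit by digit, skipping spaces and punctuation.
    static String digits(String number) {
        StringBuilder sb = new StringBuilder();
        for (char c : number.toCharArray()) {
            if (c >= '0' && c <= '9') {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(SMALL[c - '0']);
            }
        }
        return sb.toString();
    }
}
```

For example, `SymbolReader.pounds(20, false)` gives "twenty pounds" with the unit after the number, and `SymbolReader.digits("01224 273443")` reads the phone number one digit at a time. Deciding whether a digit string is a phone number or a quantity still requires context that this sketch does not model.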

  11. Lexical Disambiguation • Which word is meant • a cat has nine lives (noun) • She lives here (verb) • I have a bow and arrow • I will not bow to her

  12. Sophisticated text-to-speech • Determine grammatical structure • parsing • statistical techniques • Use this to determine • How to pronounce symbols, numbers • Lexical disambiguation • Rhetorical structure (for intonation)

  13. Example: AT&T Natural Voices • One of several commercial TTS systems • Nice demo at • http://www.research.att.com/~ttsweb/tts/demo.php

  14. Prosodic Structure • Pitch change shows sentence type [?, ! ,.] • Hello. • Hello! • Hello? • Stress reflects importance, new information • *Mary gave John a book • Mary *gave John a book • etc

  15. Pronunciation of new words • Eg, “Inverurie” • Rule-based • Use rules describing how phonemes are said in different contexts • Maybe models of human vocal cords, mouth • Concatenative • library of acoustic units, human-spoken • merged together for new words • Problems with both approaches

  16. Markups • Speech markups (low-level) • pause • speed • volume • pitch • type (money, phone number) • Competing standards: • SAPI (Microsoft) • SSML (W3C)

  17. Example I want to go <break/> <prosody volume="loud"> home </prosody>.
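A small builder can generate markup like this example from a program. The sketch below is an assumption-laden illustration: `SsmlBuilder` and its methods are hypothetical, the `<speak>` wrapper follows the W3C SSML 1.0 root element, and the `en-GB` language tag is an arbitrary choice.

```java
// Hypothetical helper that assembles a small SSML document as a string.
final class SsmlBuilder {
    private final StringBuilder body = new StringBuilder();

    // Escape characters that would otherwise be read as markup.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    SsmlBuilder text(String s) {
        body.append(escape(s));
        return this;
    }

    SsmlBuilder pause() {
        body.append("<break/>");
        return this;
    }

    SsmlBuilder loud(String s) {
        body.append("<prosody volume=\"loud\">").append(escape(s)).append("</prosody>");
        return this;
    }

    String build() {
        return "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-GB\">"
                + body + "</speak>";
    }

    public static void main(String[] args) {
        System.out.println(new SsmlBuilder().text("I want to go ").pause().loud("home").text(".").build());
    }
}
```

Escaping the text matters because generated sentences may contain characters such as `&` or `<` that a synthesiser would otherwise try to parse as tags.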

  18. Speech Markups • Higher level markups • emphasis, deemphasis • character (eg, whisper) ?? • emotion ??? • Voice (accent, gender, age, …) ??

  19. When is speech useful? • Ideas from class?

  20. When (not) useful • Useful • Get attention (eg, urgent warning) • No screen or hands busy (eg, diver in water) • For visually impaired users • Not useful • Distracting (“you have spam”) • Long messages (text can be reread!) • Noisy environments • Deaf users

  21. Systems • FreeTTS – free Java-based text-to-speech • Low voice quality, limited functionality, easy to use • Microsoft – Speech SDK • Higher quality, more functionality than FreeTTS • Tied to Windows, stresses VB, .NET, etc • Commercial – highest quality • Natural Voices, RealSpeak, … • rVoice (Scottish software, mostly defunct)

  22. Digression: rVoice • From Rhetorical Systems • Edinburgh Uni spinout • From Festival, also source of FreeTTS (practical) • High-profile “success story” of high-tech Scotland • rVoice • Very high quality voices (best in world?) • Could imitate a real person

  23. Digression: rVoice • Not very successful as a business • Too expensive? • Some users (eg, blind people) wanted a cheap solution • When high-quality voices needed (weather info), cheaper to hire people to speak messages • Recently bought by a competitor • Essentially being closed down, customers encouraged to move to competitor's product • Sad…

  24. Speech output from Java • Set up system • Set up a voice • Call “speak” method • (some systems) wait until speech finished • Speech takes time, system can do something else while speech is happening

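  25. FreeTTS example

      import com.sun.speech.freetts.Voice;
      import com.sun.speech.freetts.VoiceManager;

      // Set up the system and look up the voice named "kevin16"
      VoiceManager voiceManager = VoiceManager.getInstance();
      Voice helloVoice = voiceManager.getVoice("kevin16");
      helloVoice.allocate();                        // load the voice's resources
      helloVoice.speak("Mary had a little lamb.");  // synthesise and play the text
      helloVoice.deallocate();                      // release the resources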
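Because speaking takes time, a program may want to start the speech and carry on with other work. The sketch below shows one way to do that with the standard `CompletableFuture` class. The `Speaker` interface is a hypothetical abstraction introduced here so the example runs without a speech library; with FreeTTS it could plausibly wrap a call to `Voice.speak`, but that wiring is an assumption and is not shown.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: run a blocking speak call on a background thread.
final class BackgroundSpeech {
    // Assumed abstraction over any TTS engine whose speak call blocks until finished.
    interface Speaker {
        void speak(String text);
    }

    // Start speaking in the background; the returned future completes when speech ends.
    static CompletableFuture<Void> speakAsync(Speaker speaker, String text) {
        return CompletableFuture.runAsync(() -> speaker.speak(text));
    }

    public static void main(String[] args) {
        // Stand-in speaker that prints instead of producing audio.
        Speaker console = text -> System.out.println("[speaking] " + text);
        CompletableFuture<Void> done = speakAsync(console, "Mary had a little lamb.");
        System.out.println("doing other work while the speech plays");
        done.join(); // wait until the speech has finished before exiting
    }
}
```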

  26. Advanced topic: concept-to-text • Currently NLG systems produce text, which is fed into a speech synthesiser • But speech quality should improve if the NLG system gave more information • Syntactic structure (for pauses) • Desired meaning of word (for pronunciation) • Importance (for emphasis) • How to integrate NLG and speech?
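One possible answer, sketched here as an assumption and not as an established design, is for the NLG system to pass importance annotations along with the words and have them rendered as SSML `<emphasis>` elements. The `Word` class and `toSsml` method are hypothetical.

```java
import java.util.List;

// Hypothetical sketch: turn importance annotations from an NLG system into SSML emphasis.
final class EmphasisMarkup {
    // Assumed annotation that an NLG system could attach to each word it generates.
    static final class Word {
        final String text;
        final boolean important;

        Word(String text, boolean important) {
            this.text = text;
            this.important = important;
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wrap important words in <emphasis>; leave the others as plain text.
    static String toSsml(List<Word> words) {
        StringBuilder sb = new StringBuilder();
        for (Word w : words) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            if (w.important) {
                sb.append("<emphasis level=\"strong\">").append(escape(w.text)).append("</emphasis>");
            } else {
                sb.append(escape(w.text));
            }
        }
        return sb.toString();
    }
}
```

With the words of "Mary gave John a book" and only "gave" marked important, `toSsml` returns `Mary <emphasis level="strong">gave</emphasis> John a book`. The same idea could carry syntactic boundaries as `<break/>` elements.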

  27. Speech Input • Talk to the computer instead of typing • Commands (select from limited list) • Like cinema information line • E.g., say the name of the movie you want to watch • Dictation • Dictate arbitrary texts • In recent versions of Office • Many errors

  28. Speech dialogue • Dialogue with the computer, just like in science fiction movies • C: your first ascent was dangerous • H: why? • C: because you came up too quickly • H: what should I have done? • C: you should have taken 5 minutes to come up instead of 3 minutes

  29. Speech dialogue • Key problems are • (a) dealing with speech input errors • Need to check unobtrusively that the input was understood correctly • (b) dealing with strange things users say • Speech allows them to say anything, and they do! • (c) interpolating from ambiguous data • Does “Aberdeen” mean “Aberdeen, UK”, “Aberdeen, Maryland”, etc.?

  30. Example • User: Hello, I want to fly to London next Thursday • System: What airport will you be flying from when you go to London, UK? • User: Aberdeen • System: What time on Thursday, 16 March, do you wish to depart from Aberdeen, Scotland? • User: mid-morning • System: BA 1305 leaves Aberdeen at 940 and arrives into London Heathrow at 1115. Should I book one seat for you on Thursday, 16 March?
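In this dialogue the system checks its understanding unobtrusively by repeating the resolved place name inside its next question. A minimal sketch of that idea follows. The gazetteer contents, the rule of preferring the first candidate, and the names `PlaceResolver`, `resolve` and `nextPrompt` are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of implicit confirmation for an ambiguous place name.
final class PlaceResolver {
    // Assumed gazetteer; the first candidate is treated as the most likely reading.
    static final Map<String, List<String>> GAZETTEER = Map.of(
            "Aberdeen", List.of("Aberdeen, UK", "Aberdeen, Maryland"),
            "London", List.of("London, UK"));

    // Returns the preferred full name, or null if the place is unknown.
    static String resolve(String spoken) {
        List<String> candidates = GAZETTEER.get(spoken);
        return (candidates == null || candidates.isEmpty()) ? null : candidates.get(0);
    }

    // Echo the resolved name in the next question so the user can correct a wrong guess.
    static String nextPrompt(String spoken) {
        String place = resolve(spoken);
        if (place == null) {
            return "Sorry, which city did you mean?";
        }
        return "What time do you wish to depart from " + place + "?";
    }

    public static void main(String[] args) {
        // prints: What time do you wish to depart from Aberdeen, UK?
        System.out.println(nextPrompt("Aberdeen"));
    }
}
```

A fuller system would rank candidates using context, such as the user's location or earlier turns, instead of a fixed order.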

  31. Conclusion • Texts can be spoken instead of (or as well as) written • Harder than it seems, but technology exists and is getting better • Useful in some situations • In longer term, speech input and dialogue
