This document explores the development of Natural Language Generation (NLG) and speech output systems, particularly using Simplenlg and Protégé. It details the process of creating SPhraseSpec from Protégé instances based on template mapping rules, leading to effective text generation for various outputs, including spoken language. The complexities of text-to-speech (TTS) systems, challenges like intonation and prosody, and the significance of lexical disambiguation are discussed. Applications in real-world scenarios, including weather forecasting services, highlight these technologies' practical utility.
Speech Output Reading: Reiter and Dale, chap 7
Note: Simplenlg and Protégé • Simplenlg Lexicaliser creates an SPhraseSpec from a Protégé instance • Based on template mapping rules encoded in Protégé
Example • SPIKE: • Subject = “there” • Verb = “is” • Complement = “a spike” • Modifier = [“in [channel]”, “to [peak_value]”] • Channel, peak_value are features of spikes • Results in texts such as • There is a spike in HR to 160
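The SPIKE template above can be sketched in plain Java. This is only an illustration of slot filling: the class and method names (SpikeTemplateSketch, fillSlots, realise) are hypothetical, and it does not use the simplenlg or Protégé APIs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a template mapping rule; not the simplenlg or Protégé API.
class SpikeTemplateSketch {

    // Replace each "[feature]" slot with the instance's value for that feature.
    static String fillSlots(String pattern, Map<String, String> features) {
        String result = pattern;
        for (Map.Entry<String, String> e : features.entrySet()) {
            result = result.replace("[" + e.getKey() + "]", e.getValue());
        }
        return result;
    }

    // Assemble subject, verb, complement and filled modifiers in that order.
    static String realise(Map<String, String> features) {
        String subject = "there";
        String verb = "is";
        String complement = "a spike";
        List<String> modifiers = List.of("in [channel]", "to [peak_value]");

        StringBuilder sb = new StringBuilder();
        sb.append(subject).append(' ').append(verb).append(' ').append(complement);
        for (String modifier : modifiers) {
            sb.append(' ').append(fillSlots(modifier, features));
        }
        // Capitalise the first letter of the sentence.
        return Character.toUpperCase(sb.charAt(0)) + sb.substring(1);
    }

    public static void main(String[] args) {
        Map<String, String> spike = new LinkedHashMap<>();
        spike.put("channel", "HR");
        spike.put("peak_value", "160");
        System.out.println(realise(spike)); // There is a spike in HR to 160
    }
}
```

A real lexicaliser would build a phrase specification rather than a string, so that the microplanner can still change tense, add modifiers or aggregate before realisation.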
Usage • Document Planner decides which instances to include in the text • Lexicaliser produces initial SPhraseSpec from these • Microplanner modifies SPhraseSpec • Add extra modifiers if necessary • Eg, “at 10.40” (if diff from last time mentioned) • Aggregation • Syntactic choice (passive, tense) • Referring exp (HR, Heart Rate) • Realiser produces text
Simplenlg and Protégé • Complex, very much under development • Happy to discuss more with interested students • Prof Mellish is very interested in NLG and Semantic Web
Different Modalities • Many ways to communicate data • Visualisation • Written text • Spoken text (speech) • Combinations of above
Speech output • Computers can talk as well as write • Prerecorded files (eg, WAV) • Text-to-speech (TTS) • Speaks arbitrary texts • Example app: spoken weather forecasts • Output of our weather-forecast generator spoken for premium-rate telephone weather information services
Simple approach • Problem: speak aloud a written text • Simple approach • Record people speaking words • Given a text, combine recordings for all the words in the text • Telephone directory enquiries
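A minimal sketch of the simple approach, assuming each known word has one prerecorded clip. The clip file names and the WordConcatenationSketch class are made up for illustration, and the sketch only builds the list of clips to play; it does no audio playback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: look up one prerecorded clip per word and play them in order.
class WordConcatenationSketch {

    // Returns the clips to play, in order, for the given text.
    static List<String> playlist(String text, Map<String, String> recordings) {
        List<String> clips = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            // Drop punctuation so "five," and "five" share a recording.
            String word = token.replaceAll("[^\\p{L}\\p{N}]", "");
            if (word.isEmpty()) {
                continue;
            }
            String clip = recordings.get(word);
            if (clip == null) {
                // A word nobody recorded cannot be spoken at all.
                throw new IllegalArgumentException("No recording for word: " + word);
            }
            clips.add(clip);
        }
        return clips;
    }

    public static void main(String[] args) {
        Map<String, String> recordings = Map.of(
                "two", "two.wav", "seven", "seven.wav", "three", "three.wav");
        System.out.println(playlist("Two, seven, three.", recordings));
    }
}
```

This works for small closed vocabularies such as digits in directory enquiries. Every word gets the same recording regardless of context, and an unrecorded word fails outright.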
Problems • Intonation/prosody • Difficult to understand monotone intonation • Cannot determine which word is meant • He lives on Don St. • St. Louis is a great city. • Conventions • £20 is twenty pounds, not pound twenty • New words (names, technical terms)
Problems • Pronouncing symbols • £ is pound or pounds ?? • I have £1 vs I have £5 vs I ate a £5 lunch • Pronouncing numbers • Individual digits or as a whole • 01224 273443 vs 1,224,273,443 people
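The £ examples above can be handled, crudely, by a pattern-based normaliser. The sketch below is a heuristic under stated assumptions, not how any particular TTS system does it: it treats "article + £N + word" as the attributive case ("a five pound lunch"), only handles amounts 0 to 99, and all names are hypothetical. The pound sign is written as \u00A3 so the source compiles under any file encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of symbol normalisation for pound amounts 0..99.
class CurrencyNormaliserSketch {
    private static final String[] SMALL = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"};
    private static final String[] TENS = {
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};

    // Optional article, pound sign, one or two digits, optional following word.
    private static final Pattern AMOUNT = Pattern.compile(
        "(?i)(\\b(?:a|an|the)\\s+)?\u00A3(\\d{1,2})(?!\\d)(\\s+\\p{L}+)?");

    static String spell(int n) {
        if (n < 0 || n > 99) {
            throw new IllegalArgumentException("Only 0..99 supported: " + n);
        }
        if (n < 20) {
            return SMALL[n];
        }
        return n % 10 == 0 ? TENS[n / 10] : TENS[n / 10] + "-" + SMALL[n % 10];
    }

    static String normalise(String text) {
        Matcher m = AMOUNT.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            int amount = Integer.parseInt(m.group(2));
            String article = m.group(1) == null ? "" : m.group(1);
            String nextWord = m.group(3) == null ? "" : m.group(3);
            // Heuristic: "a £5 lunch" uses the amount as a modifier, so stay singular.
            boolean attributive = !article.isEmpty() && !nextWord.isEmpty();
            String unit = (amount == 1 || attributive) ? "pound" : "pounds";
            String spoken = article + spell(amount) + " " + unit + nextWord;
            m.appendReplacement(out, Matcher.quoteReplacement(spoken));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalise("I have \u00A31"));        // I have one pound
        System.out.println(normalise("I have \u00A35"));        // I have five pounds
        System.out.println(normalise("I ate a \u00A35 lunch")); // I ate a five pound lunch
    }
}
```

The article heuristic misfires on sentences like "I have the £5 you lent me", which is one reason more sophisticated systems determine grammatical structure first.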
Lexical Disambiguation • Which word is meant • a cat has nine lives (noun) • She lives here (verb) • I have a bow and arrow • I will not bow to her
Sophisticated text-to-speech • Determine grammatical structure • parsing • statistical techniques • Use this to determine • How to pronounce symbols, numbers • Lexical disambiguation • Rhetorical structure (for intonation)
Example: AT&T Natural Voices • One of several commercial TTS systems • Nice demo at • http://www.research.att.com/~ttsweb/tts/demo.php
Prosodic Structure • Pitch change shows sentence type [?, ! ,.] • Hello. • Hello! • Hello? • Stress reflects importance, new information • *Mary gave John a book • Mary *gave John a book • etc
Pronunciation of new words • Eg, “Inverurie” • Rule-based • Use rules describing how phonemes are said in different contexts • Maybe models of human vocal cords, mouth • Concatenative • library of acoustic units, human-spoken • merged together for new words • Problems with both approaches
Markups • Speech markups (low-level) • pause • speed • volume • pitch • type (money, phone number) • Competing standards: • SAPI (Microsoft) • SSML (W3C)
Example I want to go <break/> <prosody volume="loud"> home </prosody>.
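Markup like the example above is usually generated by a program rather than typed by hand. The following is a minimal sketch of a string builder for it, with hypothetical helper names; it emits only the break and prosody elements shown on the slide and omits the version and namespace attributes that a complete SSML document carries on its speak element.

```java
// Hypothetical helper for emitting a small subset of SSML-style markup.
class SsmlSketch {

    // Escape characters that would otherwise be read as markup.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String pause() {
        return "<break/>";
    }

    static String loud(String text) {
        return "<prosody volume=\"loud\">" + escape(text) + "</prosody>";
    }

    // Join the parts with single spaces inside one root element.
    static String speak(String... parts) {
        return "<speak>" + String.join(" ", parts) + "</speak>";
    }

    public static void main(String[] args) {
        System.out.println(speak(escape("I want to go"), pause(), loud("home") + "."));
        // <speak>I want to go <break/> <prosody volume="loud">home</prosody>.</speak>
    }
}
```

Escaping matters because generated text may contain characters such as & or <, which would otherwise produce ill-formed markup.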
Speech Markups • Higher level markups • emphasis, deemphasis • character (eg, whisper) ?? • emotion ??? • Voice (accent, gender, age, …) ??
When is speech useful? • Ideas from class?
When (not) useful • Useful • Get attention (eg, urgent warning) • No screen or hands busy (eg, diver in water) • For visually impaired users • Not useful • Distracting (“you have spam”) • Long messages (text can be reread!) • Noisy environments • Deaf users
Systems • FreeTTS – free Java-based text-to-speech • Low voice quality, limited func, easy to use • Microsoft – Speech SDK • Higher quality, more func than FreeTTS • Tied to Windows, stresses VB, .net, etc • Commercial – highest quality • Natural Voices, RealSpeak, … • rVoice (Scottish software, mostly defunct)
Digression: rVoice • From Rhetorical Systems • Edinburgh Uni spinout • From Festival, also source of FreeTTS (practical) • High-profile “success story” of high-tech Scotland • rVoice • Very high quality voices (best in world?) • Could imitate a real person
Digression: rVoice • Not very successful as a business • Too expensive? • Some users (eg, blind people) wanted cheap soln • When high-quality voices needed (weather info), cheaper to hire people to speak messages • Recently bought by a competitor • Essentially being closed down, customers encouraged to move to competitor's product • Sad…
Speech output from Java • Set up system • Set up a voice • Call “speak” method • (some systems) wait until speech finished • Speech takes time, system can do something else while speech is happening
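FreeTTS example
import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

// Look up the "kevin16" voice by name
VoiceManager voiceManager = VoiceManager.getInstance();
Voice helloVoice = voiceManager.getVoice("kevin16");
helloVoice.allocate();                        // load the voice's resources
helloVoice.speak("Mary had a little lamb.");
helloVoice.deallocate();                      // release them when finished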
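Because speech takes time, a program may want to start speaking and carry on with other work. One way to sketch this in standard Java is to hand a blocking speak call to a background thread. The Speaker interface here is a hypothetical stand-in for whatever TTS library is in use, and the fake speaker in main just prints.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: run a blocking speak call on a background thread.
class AsyncSpeechSketch {

    // Stand-in for a TTS voice whose speak call blocks until audio has finished.
    interface Speaker {
        void speak(String text);
    }

    // Start speaking in the background; the Future completes when speech is done.
    static Future<?> speakInBackground(ExecutorService pool, Speaker speaker, String text) {
        return pool.submit(() -> speaker.speak(text));
    }

    public static void main(String[] args) throws Exception {
        // A single thread keeps utterances in order, one at a time.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Speaker fake = text -> System.out.println("[speaking] " + text);

        Future<?> done = speakInBackground(pool, fake, "Mary had a little lamb.");
        System.out.println("Doing other work while speech is in progress");
        done.get();      // wait only at the point where speech must have finished
        pool.shutdown();
    }
}
```

A single-thread executor is used so that two utterances never overlap. Whether a given TTS library already speaks asynchronously depends on the library.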
Advanced topic: concept-to-text • Currently NLG systems produce text, which is fed into speech synthesiser • But speech quality should improve if the NLG system gave more information • Syntactic structure (for pauses) • Desired meaning of word (for pronunciation) • Importance (for emphasis) • How to integrate NLG and speech?
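One possible answer, sketched under assumptions: the NLG system keeps clause boundaries and a per-word importance flag, and converts them to markup instead of flat text. The Word class and toSsml method are hypothetical, the emphasis and break elements are taken from SSML, and words are assumed to contain no XML special characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: turn NLG annotations into speech markup.
class ConceptToSpeechSketch {

    // Annotation an NLG system might attach to each word it generates.
    static final class Word {
        final String text;
        final boolean important;

        Word(String text, boolean important) {
            this.text = text;
            this.important = important;
        }
    }

    // Important words get emphasis; clause boundaries become pauses.
    static String toSsml(List<List<Word>> clauses) {
        List<String> rendered = new ArrayList<>();
        for (List<Word> clause : clauses) {
            List<String> words = new ArrayList<>();
            for (Word w : clause) {
                words.add(w.important ? "<emphasis>" + w.text + "</emphasis>" : w.text);
            }
            rendered.add(String.join(" ", words));
        }
        return "<speak>" + String.join(" <break/> ", rendered) + "</speak>";
    }

    public static void main(String[] args) {
        List<Word> clause = List.of(
                new Word("Mary", true), new Word("gave", false),
                new Word("John", false), new Word("a", false), new Word("book", false));
        System.out.println(toSsml(List.of(clause)));
        // <speak><emphasis>Mary</emphasis> gave John a book</speak>
    }
}
```

The synthesiser then no longer has to guess stress and pauses from punctuation alone, since the generator already knows which information is new.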
Speech Input • Talk to the computer instead of type • Commands (select from limited list) • Like cinema information line • Eg say name of movie you want to watch • Dictation • Dictate arbitrary texts • In recent versions of Office • Many errors
Speech dialogue • Dialogue with the computer, just like in science fiction movies • C: your first ascent was dangerous • H: why? • C: because you came up too quickly • H: what should I have done? • C: you should have taken 5 minutes to come up instead of 3 minutes
Speech dialogue • Key problems are • (a) dealing with speech input errors • Need to unobtrusively check that understood correctly • (b) dealing with strange things users say • Speech allows them to say anything, and they do! • (c) interpreting ambiguous input • Does “Aberdeen” mean “Aberdeen, UK”, “Aberdeen, Maryland”, etc
Example • User: Hello, I want to fly to London next Thursday • System: What airport will you be flying from when you go to London, UK? • User: Aberdeen • System: What time on Thursday, 16 March, do you wish to depart from Aberdeen, Scotland? • User: mid-morning • System: BA 1305 leaves Aberdeen at 940 and arrives into London Heathrow at 1115. Should I book one seat for you on Thursday, 16 March?
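In the dialogue above the system never asks "did you say Aberdeen?"; it echoes its own interpretation ("Aberdeen, Scotland") inside the next question, so the user can object if it is wrong. A minimal sketch of that implicit confirmation, assuming a tiny made-up gazetteer whose first entry is the default reading (all names are hypothetical):

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of implicit confirmation of an ambiguous place name.
class PlaceConfirmationSketch {

    // Made-up gazetteer; the first candidate is treated as the default reading.
    static final Map<String, List<String>> GAZETTEER = Map.of(
            "aberdeen", List.of("Aberdeen, Scotland", "Aberdeen, Maryland"),
            "london", List.of("London, UK", "London, Ontario"));

    // Returns the default reading of a spoken place name, or null if unknown.
    static String resolve(String spoken) {
        List<String> candidates = GAZETTEER.get(spoken.trim().toLowerCase(Locale.ROOT));
        return candidates == null ? null : candidates.get(0);
    }

    // Echo the resolved place in the next question instead of asking separately.
    static String nextPrompt(String spokenOrigin, String date) {
        String origin = resolve(spokenOrigin);
        if (origin == null) {
            return "Sorry, which airport will you be flying from?";
        }
        return "What time on " + date + ", do you wish to depart from " + origin + "?";
    }

    public static void main(String[] args) {
        System.out.println(nextPrompt("Aberdeen", "Thursday, 16 March"));
    }
}
```

A real system would pick the default from context, such as where the call comes from, and would need a way to recover when the user rejects the echoed interpretation.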
Conclusion • Texts can be spoken instead of (or as well as) written • Harder than it seems, but technology exists and is getting better • Useful in some situations • In longer term, speech input and dialogue