Speech Output



PowerPoint Slideshow about 'Speech Output' - sonja

Presentation Transcript

Speech Output

Reading: Reiter and Dale, chap 7

Note: Simplenlg and Protege
  • Simplenlg Lexicaliser creates an SPhraseSpec from a Protégé instance
    • Based on template mapping rules encoded in Protégé
  • SPIKE:
    • Subject = “there”
    • Verb = “is”
    • Complement = “a spike”
    • Modifier = [“in [channel]”, “to [peak_value]”]
    • Channel, peak_value are features of spikes
  • Results in texts such as
    • There is a spike in HR to 160
  • Document Planner decides which instances to include in the text
  • Lexicaliser produces initial SPhraseSpec from these
  • Microplanner modifies SPhraseSpec
    • Add extra modifiers if necessary
      • Eg, “at 10.40” (if different from the last time mentioned)
    • Aggregation
    • Syntactic choice (passive, tense)
    • Referring expression (HR vs Heart Rate)
  • Realiser produces text
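
The SPIKE template above can be sketched in plain Java. This is a minimal illustration only: `PhraseSpec` and `SpikeLexicaliser` are hypothetical stand-ins, not the simplenlg `SPhraseSpec` class or the Protégé mapping rules, and the realiser here simply joins the parts in order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a phrase specification; not the simplenlg class. */
class PhraseSpec {
    String subject;
    String verb;
    String complement;
    final List<String> modifiers = new ArrayList<>();

    /** Naive realisation: join the parts in order and capitalise the first letter. */
    String realise() {
        StringBuilder sb = new StringBuilder();
        sb.append(subject).append(' ').append(verb).append(' ').append(complement);
        for (String m : modifiers) {
            sb.append(' ').append(m);
        }
        return Character.toUpperCase(sb.charAt(0)) + sb.substring(1);
    }
}

class SpikeLexicaliser {
    /** Fills the SPIKE template from the features of one instance. */
    static PhraseSpec lexicalise(Map<String, String> spike) {
        PhraseSpec spec = new PhraseSpec();
        spec.subject = "there";
        spec.verb = "is";
        spec.complement = "a spike";
        spec.modifiers.add("in " + spike.get("channel"));
        spec.modifiers.add("to " + spike.get("peak_value"));
        return spec;
    }

    public static void main(String[] args) {
        Map<String, String> spike = new LinkedHashMap<>();
        spike.put("channel", "HR");
        spike.put("peak_value", "160");
        PhraseSpec spec = lexicalise(spike);
        // Microplanner step: add a time modifier when it differs from the last one mentioned
        spec.modifiers.add("at 10.40");
        System.out.println(spec.realise());
    }
}
```

Keeping the phrase specification as a structure until the last step is what lets the microplanner add modifiers or change tense before any text exists.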
Simplenlg and Protege
  • Complex, very much under development
  • Happy to discuss more with interested students
    • Prof Mellish is very interested in NLG and Semantic Web
Different Modalities
  • Many ways to communicate data
    • Visualisation
    • Written text
    • Spoken text (speech)
    • Combinations of above
Speech output
  • Computers can talk as well as write
  • Prerecorded files (eg, WAV)
  • Text-to-speech (TTS)
    • Speaks arbitrary texts
  • Example app: spoken weather forecasts
    • Output of our weather-forecast generator spoken for premium-rate telephone weather information services
Simple approach
  • Problem: speak aloud a written text
  • Simple approach
    • Record people speaking words
    • Given a text, combine recordings for all the words in the text
      • Telephone directory enquiries
  • Intonation/prosody
    • Difficult to understand monotone intonation
  • Cannot determine which word is meant
    • He lives on Don St.
    • St. Louis is a great city.
  • Conventions
    • £20 is twenty pounds, not pound twenty
  • New words (names, technical terms)
  • Pronouncing symbols
    • £ is pound or pounds ??
    • I have £1 vs I have £5 vs I ate a £5 lunch
  • Pronouncing numbers
    • Individual digits or as a whole
    • 01224 273443 vs 1,224,273,443 people
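
A few of these conventions can be sketched as text-normalisation rules applied before synthesis. The class and method names below are hypothetical, and the rules are deliberately crude: the "St." rule only looks at the next word, and the pound rule does not handle the attributive case ("a £5 lunch").

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextNormaliser {
    private static final String[] DIGITS =
        {"zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"};

    /** "St." before a capitalised word reads as "Saint", otherwise as "Street". */
    static String expandSt(String text) {
        String saint = text.replaceAll("\\bSt\\.(?=\\s+[A-Z])", "Saint");
        return saint.replaceAll("\\bSt\\.", "Street");
    }

    /** Moves the currency word after the amount: pound sign + 20 becomes "20 pounds". */
    static String expandPounds(String text) {
        // \u00A3 is the pound sign, escaped so the source compiles under any encoding
        Matcher m = Pattern.compile("\u00A3(\\d+)").matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String unit = m.group(1).equals("1") ? "pound" : "pounds";
            m.appendReplacement(out, m.group(1) + " " + unit);
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Reads a digit string one digit at a time, as for a phone number. */
    static String spellDigits(String digits) {
        StringBuilder out = new StringBuilder();
        for (char c : digits.toCharArray()) {
            if (c >= '0' && c <= '9') {
                if (out.length() > 0) out.append(' ');
                out.append(DIGITS[c - '0']);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expandSt("St. Louis is a great city."));
        System.out.println(expandPounds("I have \u00A35"));
        System.out.println(spellDigits("01224 273443"));
    }
}
```

Deciding whether a digit string is a phone number or a quantity is the hard part; this sketch assumes the caller already knows.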
Lexical Disambiguation
  • Which word is meant
    • a cat has nine lives (noun)
    • She lives here (verb)
    • I have a bow and arrow
    • I will not bow to her
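
As a toy illustration of how context can pick the reading, the sketch below guesses noun versus verb from the single word before the target. The cue list and the class name are assumptions made for this example; a real system would use part-of-speech tagging or statistical models.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class HomographGuesser {
    // Words that usually come directly before a noun (toy list, an assumption)
    private static final Set<String> NOUN_CUES = new HashSet<>(
        Arrays.asList("a", "an", "the", "nine", "many", "their"));

    /** Returns "noun" or "verb" for the target word, judged only by the word before it. */
    static String guessCategory(String sentence, String target) {
        // Lower-case and drop punctuation so "lives." still matches "lives"
        String[] words = sentence.toLowerCase().replaceAll("[^a-z\\s]", "").trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (words[i].equals(target)) {
                boolean afterCue = i > 0 && NOUN_CUES.contains(words[i - 1]);
                return afterCue ? "noun" : "verb";
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(guessCategory("A cat has nine lives", "lives"));
        System.out.println(guessCategory("She lives here", "lives"));
        System.out.println(guessCategory("I have a bow and arrow", "bow"));
        System.out.println(guessCategory("I will not bow to her", "bow"));
    }
}
```

Once the category is known, the synthesiser can look up the matching pronunciation for that reading.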
Sophisticated text-to-speech
  • Determine grammatical structure
    • parsing
    • statistical techniques
  • Use this to determine
    • How to pronounce symbols, numbers
    • Lexical disambiguation
    • Rhetorical structure (for intonation)
Example: AT&T Natural Voices
  • One of several commercial TTS systems
  • Nice demo at
    • http://www.research.att.com/~ttsweb/tts/demo.php
Prosodic Structure
  • Pitch change shows sentence type [?, !, .]
    • Hello.
    • Hello!
    • Hello?
  • Stress reflects importance, new information
    • *Mary gave John a book
    • Mary *gave John a book
    • etc
Pronunciation of new words
  • Eg, “Inverurie”
  • Rule-based
    • Use rules describing how phonemes are said in different contexts
    • Maybe models of human vocal cords, mouth
  • Concatenative
    • library of acoustic units, human-spoken
    • merged together for new words
  • Problems with both approaches
  • Speech markups (low-level)
    • pause
    • speed
    • volume
    • pitch
    • type (money, phone number)
  • Competing standards:
    • SAPI (Microsoft)
    • SSML (W3C)

I want to go
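
To make the low-level markups concrete, here is a sketch that builds an SSML string for a sentence such as "I want to go". The element names (`speak`, `break`, `prosody`, `say-as`) come from the W3C SSML specification; the helper class is hypothetical, the `interpret-as="telephone"` value is an assumption because the supported values vary by synthesiser, and no XML escaping is done.

```java
class SsmlExample {
    /** Pause of the given length. */
    static String pause(int millis) {
        return "<break time=\"" + millis + "ms\"/>";
    }

    /** Speed, volume and pitch for a stretch of text; values are passed through unchecked. */
    static String prosody(String text, String rate, String volume, String pitch) {
        return "<prosody rate=\"" + rate + "\" volume=\"" + volume
             + "\" pitch=\"" + pitch + "\">" + text + "</prosody>";
    }

    /** Type hint (for example money or phone number); accepted values depend on the engine. */
    static String sayAs(String text, String type) {
        return "<say-as interpret-as=\"" + type + "\">" + text + "</say-as>";
    }

    /** Wraps a body in the SSML root element. */
    static String speak(String body) {
        return "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\""
             + " xml:lang=\"en-GB\">" + body + "</speak>";
    }

    public static void main(String[] args) {
        String body = "I want to go" + pause(500)
            + prosody("now", "slow", "loud", "high")
            + " Call " + sayAs("01224 273443", "telephone");
        System.out.println(speak(body));
    }
}
```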



Speech Markups
  • Higher level markups
    • emphasis, deemphasis
    • character (eg, whisper) ??
    • emotion ???
    • Voice (accent, gender, age, …) ??
When is speech useful?
  • Ideas from class?
When (not) useful
  • Useful
    • Get attention (eg, urgent warning)
    • No screen or hands busy (eg, diver in water)
    • For visually impaired users
  • Not useful
    • Distracting (“you have spam”)
    • Long messages (text can be reread!)
    • Noisy environments
    • Deaf users
  • FreeTTS – free Java-based text-to-speech
    • Low voice quality, limited functionality, easy to use
  • Microsoft – Speech SDK
    • Higher quality, more functionality than FreeTTS
    • Tied to Windows, stresses VB, .net, etc
  • Commercial – highest quality
    • Natural Voices, RealSpeak, …
    • rVoice (Scottish software, mostly defunct)
Digression: rVoice
  • From Rhetorical Systems
    • Edinburgh Uni spinout
      • From Festival, also source of FreeTTS (practical)
    • High-profile “success story” of high-tech Scotland
  • rVoice
    • Very high quality voices (best in world?)
    • Could imitate a real person
Digression: rVoice
  • Not very successful as a business
    • Too expensive?
      • Some users (eg, blind people) wanted a cheap solution
      • When high-quality voices needed (weather info), cheaper to hire people to speak messages
  • Recently bought by a competitor
    • Essentially being closed down, customers encouraged to move to competitor's product
  • Sad…
Speech output from Java
  • Set up system
  • Set up a voice
  • Call “speak” method
  • (some systems) wait until speech finished
    • Speech takes time, system can do something else while speech is happening
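
One way to let the program carry on while speech is playing is to run the blocking "speak" call on a worker thread. The sketch below uses only the standard `java.util.concurrent` classes; the `Speaker` interface and `BackgroundSpeech` class are hypothetical wrappers, and the speaker in `main` prints text instead of producing audio.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical minimal interface; a real TTS voice would sit behind it. */
interface Speaker {
    void speak(String text);
}

class BackgroundSpeech {
    // A single worker thread keeps utterances in the order they were requested
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Speaker speaker;

    BackgroundSpeech(Speaker speaker) {
        this.speaker = speaker;
    }

    /** Starts speaking on the worker thread and returns immediately. */
    Future<?> speakAsync(String text) {
        return executor.submit(() -> speaker.speak(text));
    }

    /** Lets queued speech finish, then stops the worker thread. */
    void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in speaker that prints instead of producing audio
        BackgroundSpeech speech = new BackgroundSpeech(t -> System.out.println("Speaking: " + t));
        Future<?> done = speech.speakAsync("Mary had a little lamb.");
        System.out.println("Doing other work while speech is in progress");
        done.get(); // block only at the point where the speech must have finished
        speech.shutdown();
    }
}
```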
FreeTTS example

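import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

// Set up the system and look up a voice by name
VoiceManager voiceManager = VoiceManager.getInstance();
Voice helloVoice = voiceManager.getVoice("kevin16");

// Load the voice's resources before speaking
helloVoice.allocate();

// Speak the text, then release the voice's resources
helloVoice.speak("Mary had a little lamb.");
helloVoice.deallocate();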


Advanced topic: concept-to-text
  • Currently NLG systems produce text, which is fed into speech synthesiser
  • But speech quality should improve if the NLG system gave more information
    • Syntactic structure (for pauses)
    • Desired meaning of word (for pronunciation)
    • Importance (for emphasis)
  • How to integrate NLG and speech?
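
One possible shape for such an integration, offered only as a sketch: the NLG system hands over words annotated with what it already knows, and a thin layer turns those annotations into speech markup. `AnnotatedWord` and `ConceptToSpeech` are hypothetical names; the output uses the SSML `emphasis` and `break` elements.

```java
import java.util.Arrays;
import java.util.List;

/** A word plus information the NLG system already has (hypothetical structure). */
class AnnotatedWord {
    final String text;
    final boolean important; // new or key information, so worth emphasising
    final boolean phraseEnd; // a syntactic phrase ends here, so a pause fits

    AnnotatedWord(String text, boolean important, boolean phraseEnd) {
        this.text = text;
        this.important = important;
        this.phraseEnd = phraseEnd;
    }
}

class ConceptToSpeech {
    /** Renders annotated words as text with emphasis and break markup. */
    static String toMarkup(List<AnnotatedWord> words) {
        StringBuilder sb = new StringBuilder();
        for (AnnotatedWord w : words) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(w.important ? "<emphasis>" + w.text + "</emphasis>" : w.text);
            if (w.phraseEnd) sb.append("<break/>");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<AnnotatedWord> words = Arrays.asList(
            new AnnotatedWord("There", false, false),
            new AnnotatedWord("is", false, false),
            new AnnotatedWord("a", false, false),
            new AnnotatedWord("spike", true, false),
            new AnnotatedWord("in", false, false),
            new AnnotatedWord("HR", false, true),
            new AnnotatedWord("to", false, false),
            new AnnotatedWord("160", true, false));
        System.out.println(toMarkup(words));
    }
}
```

The point of the design is that the synthesiser no longer has to guess importance or phrase boundaries by parsing the text, since the generator knew them all along.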
Speech Input
  • Talk to the computer instead of typing
  • Commands (select from limited list)
    • Like cinema information line
      • Eg say name of movie you want to watch
  • Dictation
    • Dictate arbitrary texts
    • In recent versions of Office
  • Many errors
Speech dialogue
  • Dialogue with the computer, just like in science fiction movies
    • C: your first ascent was dangerous
    • H: why?
    • C: because you came up too quickly
    • H: what should I have done?
    • C: you should have taken 5 minutes to come up instead of 3 minutes
Speech dialogue
  • Key problems are
    • (a) dealing with speech input errors
      • Need to unobtrusively check that the input was understood correctly
    • (b) dealing with strange things users say
      • Speech allows them to say anything, and they do!
    • (c) interpolating from ambiguous data
      • Does “Aberdeen” mean “Aberdeen, UK”, “Aberdeen, Maryland”, etc

User: Hello, I want to fly to London next Thursday

System: What airport will you be flying from when you go to London, UK?

User: Aberdeen

System: What time on Thursday, 16 March, do you wish to depart from Aberdeen, Scotland?

User: mid-morning

System: BA 1305 leaves Aberdeen at 940 and arrives into London Heathrow at 1115. Should I book one seat for you on Thursday, 16 March?
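
The system turns in this dialogue check understanding unobtrusively by echoing what was heard inside the next question. A minimal sketch of that pattern follows; the slot names and the `FlightDialogue` class are hypothetical, and resolving "London" to "London, UK" is assumed to have happened before the slots are filled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FlightDialogue {
    /** Builds the next prompt, echoing understood values so the user can correct them. */
    static String nextPrompt(Map<String, String> slots) {
        String destination = slots.get("destination");
        String date = slots.get("date");
        String origin = slots.get("origin");
        if (destination == null) {
            return "Where would you like to fly to?";
        }
        if (origin == null) {
            // Repeats the resolved destination instead of asking "did you say London?"
            return "What airport will you be flying from when you go to " + destination + "?";
        }
        if (date == null) {
            return "What date do you wish to depart from " + origin + "?";
        }
        // Repeats the resolved date and origin while asking for the time
        return "What time on " + date + " do you wish to depart from " + origin + "?";
    }

    public static void main(String[] args) {
        Map<String, String> slots = new LinkedHashMap<>();
        slots.put("destination", "London, UK");
        slots.put("date", "Thursday, 16 March,");
        System.out.println(nextPrompt(slots));
        slots.put("origin", "Aberdeen, Scotland");
        System.out.println(nextPrompt(slots));
    }
}
```

If the user hears "Aberdeen, Scotland" and meant Aberdeen, Maryland, they can object at once, without the system spending a separate turn on confirmation.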

  • Texts can be spoken instead of (or as well as) written
    • Harder than it seems, but technology exists and is getting better
  • Useful in some situations
  • In longer term, speech input and dialogue