Text to Speech for In-car Navigation Systems


Luisa Cordano, August 8, 2006

TTS in the Navigation Domain
  • Flexibility
  • Perfect pronunciation
  • Cost savings
  • Expressivity

What does the navigation market expect from a TTS system applied in its domain?

Key aspects of a TTS system
  • Unit Selection Technique
  • Domain-dependent customization
  • Phonetic Input Interface
  • Native vs. Foreign Pronunciation
The importance of Unit Selection TTS
  • Loquendo TTS is a general-purpose (unrestricted) Text-to-Speech system based on the Unit Selection speech synthesis technique
  • Unit Selection is at present the only existing technique that enables natural-sounding speech
  • Its basic idea is to concatenate long phoneme sequences extracted from large databases of speech recorded by a single voice talent
  • Advantages: little speech processing; the natural voice timbre is preserved
  • Drawbacks: limited control; speech quality depends on the contents of the database
Unit Selection TTS

[Figure: system architecture.
Synthesis path: Input text → TEXT ANALYZER (using the Language Libraries: Italian, Spanish, English, ...) → phonemes & prosodic labels (& prosodic values) → SPEECH SYNTHESIZER (using the Vocal DataBases: Italian female voice, Italian Voice-2, Spanish male voice, ...) → speech.
Database design path: Domain-representative large text corpus → Statistical Tools for Database Design → Dense text corpus → Tools for Speech Acquisition, Phoneme Segmentation and Signal Analysis → Vocal DataBases.]

The Synthesis Algorithm
  • define a synthesis TARGET
  • match phonetic and prosodic labels
  • look for similar f0 values at unit junctions
  • preferably cut units at diphone boundaries
  • concatenate waveforms
  • adjust f0 to remove jumps (pitch scaling)

[Figure: worked example. The synthesis target is

p r `o v a d i s `i n t e s i
P P P P P P P P C C C C C W

(phonemes with one prosodic label each). Units are taken from three Vocal DataBase phrases, each carrying its own row of prosodic labels (P, C, W, S, U):

... p r `o s a e l e g `a n t e...
..r i n: `o v a d i s `o l i t o ...
...l a f o t o s `i n t e s i]
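
The steps on this slide can be illustrated with a small dynamic-programming sketch. It is not Loquendo's implementation: the `Unit` record, the cost values and the toy database are assumptions made up for illustration, and phonemes are simplified to single characters without stress or length marks. The target cost penalizes a prosodic label mismatch, the join cost penalizes an f0 difference at a junction, and units that are adjacent in the same recording join for free, which favors long contiguous sequences. Waveform concatenation and pitch scaling are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    phoneme: str   # phonetic label
    prosody: str   # prosodic label, e.g. "P" or "C"
    f0: float      # pitch in Hz (one value per unit, for simplicity)
    utt: int       # index of the recorded utterance the unit comes from
    pos: int       # position of the unit inside that utterance

def target_cost(unit, prosody):
    # Phonetic labels match by construction (see candidate lookup below);
    # a prosodic label mismatch is allowed but penalized.
    return 0.0 if unit.prosody == prosody else 1.0

def join_cost(prev, cur):
    # Adjacent units of the same recording need no junction at all.
    if prev.utt == cur.utt and cur.pos == prev.pos + 1:
        return 0.0
    # Otherwise pay a fixed junction penalty plus a term for the f0 jump.
    return 0.5 + abs(prev.f0 - cur.f0) / 100.0

def select_units(target, database):
    """target: list of (phoneme, prosody). Returns (total cost, unit path)."""
    best = {}  # candidate unit -> (cost of best path ending there, path)
    for i, (phoneme, prosody) in enumerate(target):
        candidates = [u for u in database if u.phoneme == phoneme]
        if not candidates:
            raise ValueError(f"no unit for phoneme {phoneme!r}")
        new_best = {}
        for cur in candidates:
            tc = target_cost(cur, prosody)
            if i == 0:
                new_best[cur] = (tc, [cur])
            else:
                cost, path = min(
                    ((c + join_cost(prev, cur), p) for prev, (c, p) in best.items()),
                    key=lambda item: item[0],
                )
                new_best[cur] = (cost + tc, path + [cur])
        best = new_best
    return min(best.values(), key=lambda item: item[0])

def make_utterance(utt, phonemes, prosody, f0):
    return [Unit(ph, pr, f0, utt, i)
            for i, (ph, pr) in enumerate(zip(phonemes, prosody))]

# Toy database loosely modelled on the slide's "prosa" / "rinnova" phrases.
database = (make_utterance(0, "prosa", "PPPPP", 120.0)
            + make_utterance(1, "rinova", "PPPPPP", 125.0))
target = list(zip("prova", "PPPPP"))
cost, path = select_units(target, database)
print(cost, [(u.utt, u.pos) for u in path])
```

In this toy case the selected path starts in the first recording and ends in the second, with a single junction between them.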

Domain Dependent Customization
  • Loquendo TTS can read any text
  • In order to improve accuracy and fluency on an application domain it can be customized
    • Lexical customization: adapting Text-to-Phoneme conversion via Pronunciation Lexicon, Phonetic Input
    • Vocal customization: enhancing the Vocal DataBase with application prompts and recurrent phrases
Domain Dependent Customization
  • The navigation domain:
    • A limited number of recurring phrases, e.g. “turn left”, “…”
    • A peculiar and huge lexical domain of addresses, names and toponyms: due to their historical or foreign origin, they do not necessarily follow the standard grapheme-to-phoneme rules of the language
  • Customizing Loquendo TTS for navigation:
    • Navigation Vocal Add-On: the recurring phrases can be recorded in case their synthesis does not sound perfect
    • The exact pronunciation of names and addresses can be specified via the Phonetic Input Interface or via the Pronunciation Lexicon (this does not rule out possible acoustic defects)
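
The idea of lexical customization can be sketched as a lookup that runs before the default grapheme-to-phoneme rules. This is only an illustration under stated assumptions: `naive_g2p`, `transcribe` and the dictionary format are hypothetical and do not reflect the real Loquendo Pronunciation Lexicon format. The single lexicon entry reuses the phonetic string for “Santena” given later in the presentation.

```python
def naive_g2p(word):
    # Stand-in for the language's grapheme-to-phoneme rules:
    # one symbol per letter, separated by spaces.
    return " ".join(word.lower())

def transcribe(text, lexicon):
    """Return one phoneme string per word, preferring lexicon entries."""
    result = []
    for word in text.split():
        key = word.strip(".,;:!?").lower()
        # A lexicon hit overrides the rule-based transcription.
        result.append(lexicon.get(key, naive_g2p(key)))
    return result

lexicon = {"santena": '"san$te$na'}  # hypothetical lexicon entry
print(transcribe("Turn left to Santena.", lexicon))
# prints ['t u r n', 'l e f t', 't o', '"san$te$na']
```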
Phonetic Input Interface
  • Loquendo TTS lets the user control the exact pronunciation of words
  • An escape sequence in the input text skips the grapheme-to-phoneme process driven by the Language Library…
  • …and passes the desired phoneme sequence directly to the Speech Synthesis process based on the Vocal DataBase
Phonetic Input Interface

[Figure: system architecture with phonetic input. Input text → TEXT ANALYZER (using the Language Libraries: Italian, Spanish, English, ...) → phonemes & prosodic labels (& prosodic values) → SPEECH SYNTHESIZER (using the Vocal DataBases: Italian female voice, Italian Voice-2, Spanish male voice, ...) → speech. An Input Phonetic Transcription (language-independent SAMPA alphabet) provides an alternative input that bypasses the Text Analyzer.]

Phonetic Input Interface
  • The special feature is language independence
  • The input phonemes are mapped onto the phonemes of the voice

Via the Phonetic Interface, an Italian voice can be made to pronounce

\SAMPA="san$te$na

for the Italian name “Santena”, overriding the default san"tena.

But also

\SAMPA=% le#"gRa~Z

for the French “Les Granges”, otherwise transcribed les#"grandges.

The foreign R and a~ will be mapped onto the Italian r and a.

Native vs Foreign Pronunciation
  • Each Voice has its own native language
  • Each Vocal DataBase has its own:
    • Phoneme Set (the DB does not contain foreign phonemes)
    • Coverage of Phoneme Sequences (the DB contains only sequences that are frequent in its native language)
  • The Foreign Pronunciation feature of Loquendo TTS maps foreign phonemes onto the most similar native phonemes.
  • But it cannot avoid producing phoneme sequences that are not present in the Vocal DB, which then require the concatenation of shorter speech units.
  • Foreign Pronunciation is:
          • Plausible
          • Approximated
          • Sometimes choppy
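
Why missing phoneme sequences lead to choppy output can be illustrated with a coverage check. The sketch below greedily splits a phoneme string into the longest chunks that occur contiguously in a toy database; more chunks mean more junctions. The function names and the database are hypothetical, phonemes are simplified to single characters, and greedy matching is a simplification chosen for brevity, not the selection strategy of a real unit selection engine.

```python
def longest_match(target, start, utterances):
    """Length of the longest run target[start:start+n] found in any utterance."""
    best = 0
    for utt in utterances:
        for i in range(len(utt)):
            n = 0
            while (start + n < len(target) and i + n < len(utt)
                   and utt[i + n] == target[start + n]):
                n += 1
            best = max(best, n)
    return best

def segment(target, utterances):
    """Greedily split target into the longest chunks available in the database."""
    chunks, pos = [], 0
    while pos < len(target):
        n = longest_match(target, pos, utterances)
        if n == 0:
            raise ValueError(f"phoneme {target[pos]!r} not in database")
        chunks.append(target[pos:pos + n])
        pos += n
    return chunks

utterances = ["prosa", "rinova"]        # toy "native" recordings
print(segment("prova", utterances))     # native-like sequence: ['pro', 'va']
print(segment("vrap", utterances))      # unusual sequence: ['v', 'r', 'a', 'p']
```

The native-like sequence needs one junction, while the unusual one needs a junction after every phoneme.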
Foreign Pronunciation: the Phoneme Mapping Algorithm

The algorithm centers on a Phonetic Similarity Function (PSF) that computes a similarity score for two phonemes, depending only on their phonetic-articulatory features

Such an approach overlooks:

  • Finer language specific aspects of speech perception
  • Pragmatic/cultural aspects that may affect foreign pronunciation

But… makes comparison between phonemes free from any language-specific knowledge!

Phonetic Similarity Function

This approach requires, as a first step, the definition of a vector of phonetic-articulatory features for each phoneme

Each feature has a different weight in the computation of phonetic similarity

Values of a non-binary feature are placed in a scale of perceptual distance

[Figure: example feature scales, labelled “Tongue Position”, “Tonicity”, “Non-binary features” and “Binary feature”.]
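
A weighted feature comparison of this kind might look like the following sketch. Everything in it is an assumption made for illustration: the feature names, the numeric values on the perceptual scales, the weights and the normalization are invented, and the actual PSF is not specified in the presentation. Non-binary features take values in [0, 1] on a distance scale, and binary features are 0 or 1.

```python
# Hypothetical feature vectors (SAMPA symbols as keys).
# tongue_position: 0 = front, 0.5 = central, 1 = back  (non-binary)
# height:          0 = open,  0.5 = mid,     1 = close (non-binary)
# nasal, rounded:  binary
FEATURES = {
    "a~": {"tongue_position": 0.75, "height": 0.0, "nasal": 1, "rounded": 0},
    "a":  {"tongue_position": 0.5,  "height": 0.0, "nasal": 0, "rounded": 0},
    "e":  {"tongue_position": 0.0,  "height": 0.5, "nasal": 0, "rounded": 0},
    "o":  {"tongue_position": 1.0,  "height": 0.5, "nasal": 0, "rounded": 1},
}

# Hypothetical weights: each feature contributes differently to similarity.
WEIGHTS = {"tongue_position": 2.0, "height": 2.0, "nasal": 1.0, "rounded": 1.0}

def similarity(p, q):
    """Score in [0, 1]; 1 means identical feature vectors."""
    distance = sum(w * abs(FEATURES[p][f] - FEATURES[q][f])
                   for f, w in WEIGHTS.items())
    return 1.0 - distance / sum(WEIGHTS.values())

def map_to_native(foreign, native_set):
    """Pick the native phoneme with the highest similarity score."""
    return max(native_set, key=lambda n: similarity(foreign, n))

print(map_to_native("a~", ["a", "e", "o"]))  # prints a
```

With these invented values the French nasal a~ maps onto a, which matches the “Les Granges” example given earlier in the presentation.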

Foreign Pronunciation: Demos

English Text:

“Good afternoon everybody! This is a live demo of Loquendo's lifelike text-to-speech technology, using the mixed language feature”

Katrin (German Voice)

Jorge (Spanish Voice)

Bernard (French Voice)

Interactive demos are available on http://actor.loquendo.com/actordemo/

Foreign Pronunciation: Navigation Demos

Dave (American Voice)

Kate (British Voice)

“In 250 yards, bear left, at the roundabout, take the second exit, and follow the signpost indicating, Charles de Gaulle airport.”

“In 100 yards, stay in the right lane, and take the main road. After the tunnel, turn left, and follow the signpost indicating, Fiumicino, Leonardo da Vinci airport.”

Future Work on Foreign Pronunciation

The problem of unusual phone sequences could be partially solved by:

  • Augmenting the speech databases with ad-hoc vocal material
  • Rewriting the PMM output by reducing the number of unusual phoneme sequences without modifying the plausibility of the reading
  • Adding new spectral features to better describe each speech database phone