pronouncing words in tts systems
Download
Skip this Video
Download Presentation
Pronouncing Words in TTS Systems

Loading in 2 Seconds...

play fullscreen
1 / 22

Pronouncing Words in TTS Systems - PowerPoint PPT Presentation


  • 147 Views
  • Uploaded on

Pronouncing Words in TTS Systems. Julia Hirschberg CS 4706. Today. Motivation Improve TTS intelligibility and naturalness An application: Language Learning Challenges for automatic word pronunciation Standard methods Pronunciation by rule Pronouncing dictionaries Innovative solutions

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Pronouncing Words in TTS Systems' - omer


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
pronouncing words in tts systems

Pronouncing Words in TTS Systems

Julia Hirschberg

CS 4706

today
Today
  • Motivation
    • Improve TTS intelligibility and naturalness
    • An application: Language Learning
  • Challenges for automatic word pronunciation
  • Standard methods
    • Pronunciation by rule
    • Pronouncing dictionaries
  • Innovative solutions
    • Pronunciation by language origin
    • Pronunciation by rhyming analogy
    • Expanding the lexicon using Active Learning techniques
motivation
Motivation
  • Intelligibility
  • Naturalness
  • Applications to language learning
    • Unlimited vocabulary
    • Type a word or phrase and hear it spoken in your target language
      • To imitate
      • To learn to recognize
converting text to phonemes
Converting Text to Phonemes
  • Pronouncing numbers in different contexts
  • Identifying proper names
  • Expanding abbreviations and acronyms correctly
numbers
Numbers
  • Pronouncing numbers in different contexts
    • In 1996 she sold 1995 shares and deposited $42 in her 401(k).
    • The number is 212-555-1210.
    • That cc # is Visa 4444-3607-5959, expiration 2/07.
  • Conventions:
    • Years
    • Money
    • Phone numbers
    • Money amounts
cultural dependence
Cultural Dependence
  • Russia:
    • Article 3 of the rules attached to the Moscow Telephone Network Subscribers Directory, 1916:
      • “Numbers over a hundred are to be pronounced as follows: 1.23—one twenty three, 9.72—nine seventy two, 70.09—seventy zero nine. In numbers over 10,000 every figure of a hundred should be pronounced separately, for example, 1.20.48—one twenty forty eight, 2.08.35—two zero eight thirty five, 3.35.29—three thirty five twenty nine, 4.49.52—four forty nine fifty two, 5.15.86—five fifteen eighty six etc., not one hundred and twenty forty eight, two hundred and eight thirty five etc.”
slide7
In France
      • A French phone number is 10 digits given in series of two:
        • 01-43-48-12-85
        • "Zéro un, quarante-trois, quarante-huit, douze, quatre-vingt-cinq".
      • Numbers in addresses are always pronounced as a full number:
        • Chambre 823, 240 rie Rivoli
        • Chambre huit-cent-vingt-trois. Deux-cent-quarante, rue de Rivoli
pronouncing words
Pronouncing Words
  • Part-of-speech:
    • use, close, dove, multiply, coax
  • Origin:
    • shoe (ME shoo), phoenix (Gr)
    • mole, attaches, resume
  • Morphological analysis:
    • ferryboat, ferryboats
    • Popemobile
  • Letter-to-sound correspondences:
    • oo, th, qu, e (beet, bet, bite, weigh,…)
slide9
Conventions for numbers and symbols: &c, evalu8, cu, tsp
  • Genre: email, chat, recipe, want ad, software license…
word sense ambiguity and pronunciation
Word Sense Ambiguity and Pronunciation
  • Homographs
    • bass/bass
    • Nice/nice
    • desert/desert
  • Homograph disambiguation
letter to sound rules
Letter-to-Sound Rules
  • E.g.
    • I _{C}e$  /ai/ rise
    • Else I  /ih/ rip
  • Advantages
    • Pronounces anything, seen or not
  • Disadvantages
    • Must be built by hand
    • How to encode all the exceptions:
      • E.g. ripen/risen/riser/river
slide12
Proper names:
      • Nice, Ramirez, Ribeiro, Rise, Infiniti
  • Solutions
    • More complex rules
    • Exceptions dictionary
      • Consulted first
      • But how handle morphological analysis?
        • Rise’s hat
dictionary based approaches
Dictionary-based Approaches
  • Rely on very large dictionary
  • Disadvantages
    • Hand labor to create entries
    • Redundancy
      • Cat, cats, cat’s, cats’
    • Out-of-vocabulary items
      • Proper names: covering all U.K. surnames would require >5,000,000 entries
      • New words: fax, email, mudd, …
      • Technical terms: liposuction, anova, bernaise
      • Foreign borrowings: frappe, ciao, louche
slide14
Solutions
    • Morphological preprocessing before dictionary look-up
    • Fall back to L2Sound rules if no dictionary ‘hit’
more innovative approaches
More Innovative Approaches
  • Pronouncing OOV words
    • Handling proper names
      • Inferring country of origin: Takashita, Leroy, Kirov, Lima, Infiniti
    • Pronunciation by analogy
      • Analog/dialog
      • Risible/visible
      • Proper names: Alifano/Califano
bootstrapping phonetic lexicons maskey et al 04
Bootstrapping Phonetic Lexicons (Maskey et al ’04)
  • For some languages, online pronouncing lexicons exist – but for others….e.g. Nepali
    • How to minimize effort in creating lexicons?
  • Idea
    • Given a native speaker and a large amount of online text in the language…
      • Native speaker builds small lexicon by hand for seed set of N most common words in text, e.g.
        • is: /izh/
        • the: /dhax/
slide17
Derive L2S rules from lexicon automatically, e.g.
    • is ih{zh}
    • the  {dh}ax …
  • Loop: Choose the next N most common set of words from the text and use the lexicon + L2S rules to predict pronunciations, e.g.
    • telephone -> /telaxfown/
    • He -> /hax/?
    • Rise -> /rihzhax/?
  • Assign a confidence score to each prediction by comparing each word to all words in lexicon
    • If is -> /ihzh} in lexicon and no other orthographically similar words are pronounced differently, new rule his -> /hihzh/ scores high
  • For low confidence pronunciations, Active Learning step:
    • Inspect and calculate error rate
    • Hand correct errors and add all to lexicon
slide18
Build a new set of L2S rules from augmented lexicon
      • Iterate from Loop until performance stabilizes
  • Results
    • English:
      • 94% success on test set after 23 iterations, 16K entry lexicon
      • Performance comparable to CMUDict and 1/7 the size
    • German:
      • 90% accuracy after 13 iterations, 28K lexicon
    • Nepali
improving pronunciation dictionary coverage fackrell and skut 04
Improving Pronunciation Dictionary Coverage (Fackrell and Skut ’04)
  • Idea: Many proper names have more than one spelling (e.g. More, Moore; Smith Smythe)
    • Find a mapping between OOV (Out of Vocabulary) spellings and alternate spellings – a ‘fuzzy’ match
    • Identify spelling alternations that are ‘pronunciation-neutral’ in an existing lexicon to produce rewrite rules for OOV words
how do current systems do on pronunciation
How do current systems do on pronunciation?
  • Loquendo (temporarily unavailable)
  • CEPSTRAL
  • AT&T Naturally Speaking
next class
Next Class

Accenting and information status

ad