Phonology from a computational point of view
Phonemes, dialects, letter-to-sound conversion
March 2001

Phonology: the study of the sound patterns of languages.
We will extend this to include the letter patterns of languages.
Syntax
Information Retrieval
Morphology: catch + PAST
Phonemic representation: K AO1 T
Text to speech (TTS) applications include a component which converts spelled words to sequences of phonemes (= sound representations).
E.g., sight S AY1 T
John JH AA1 N
THE DH AH0
THE(2) DH AH1
THE(3) DH IY0
THEA TH IY1 AH0
THEALL TH IY1 L
THEANO TH IY1 N OW0
THEATER TH IY1 AH0 T ER0
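The dictionary-lookup step of such a component can be sketched as follows. This is a minimal illustration, not any particular TTS system's code: the names PRONUNCIATIONS, lookup, and strip_stress are hypothetical, and the entries are copied from the listing above. The trailing digits are treated as stress marks (the listing shows 0 and 1; a 2 for secondary stress is assumed here as well).

```python
# Toy pronunciation dictionary built from the entries listed above.
# Variants are stored in order, mirroring THE, THE(2), THE(3).
PRONUNCIATIONS = {
    "SIGHT": [["S", "AY1", "T"]],
    "THE": [["DH", "AH0"], ["DH", "AH1"], ["DH", "IY0"]],
    "THEA": [["TH", "IY1", "AH0"]],
    "THEATER": [["TH", "IY1", "AH0", "T", "ER0"]],
}


def lookup(word):
    """Return every listed pronunciation of a word, or [] if it is unlisted."""
    return PRONUNCIATIONS.get(word.upper(), [])


def strip_stress(phones):
    """Remove the trailing stress digit from each vowel symbol (assumed 0, 1, or 2)."""
    return [p.rstrip("012") for p in phones]


if __name__ == "__main__":
    for variant in lookup("the"):
        print(" ".join(variant))
    print(strip_stress(lookup("theater")[0]))
```

A word that is not in the table comes back as an empty list, which is exactly the case the letter-to-sound material later in these slides is about.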
Speakers have a set of units (“phonemes”) to which they do not add (and from which they do not delete) during their adult lifetimes. There are 39 phonemes in American English.

Phoneme  Example  Transcription
AA       odd      AA D
AE       at       AE T
AH       hut      HH AH T
AO       ought    AO T
AW       cow      K AW
AY       hide     HH AY D
EH       Ed       EH D
ER       hurt     HH ER T
EY       ate      EY T
IH       it       IH T
IY       eat      IY T
OW       oat      OW T
OY       toy      T OY
UH       hood     HH UH D
UW       two      T UW
B        be       B IY
D        dee      D IY
G        green    G R IY N
P        pee      P IY
T        tea      T IY
K        key      K IY
S        sea      S IY
SH       she      SH IY
F        fee      F IY
V        vee      V IY
DH       thee     DH IY
TH       theta    TH EY T AH
Z        zee      Z IY
ZH       seizure  S IY ZH ER
HH       he       HH IY
CH       cheese   CH IY Z
JH       gee      JH IY
L        lee      L IY
M        me       M IY
N        knee     N IY
NG       ping     P IH NG
R        read     R IY D
W        we       W IY
Y        yield    Y IY L D
24 Consonants
But speech recognition systems need to be trained on this, just as people are in their youth.
Most phonemes have several different pronunciations (called their allophones), determined by nearby sounds, most usually by the following sound.
The most striking instance of such variation is in the realization of the phoneme /T/ in American English.
The syllable
h e l p
*But in the words to, tonight, today, tomorrow, the to- acts as if it were linked to the preceding word: “go [D]o bed”.
[Diagram: two syllables (σ σ), each split into onset and rhyme, over the phonemes B UH1 T ER]
This is where we get a flap in American English
Within a word:
we don't say “my tomato” as “my [D]omato”
If a word ends in a /t/ and the next word starts with a vowel, flap is normal:
at [D] all, What [D] is your name?, etc.
If a word ends in a vowel and the next word starts with a /t/, never a flap, unless the second word starts with the prefix to-!
the [t] tomato, the [t] topology of… but
go [D] to the moon, go [D] tomorrow…
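The across-word cases above can be sketched as a small rule function. This is only an illustration under stated assumptions: the function name apply_flaps, the TO_WORDS set, and the phoneme strings given for "at", "all", "go", "to", and "tomato" are chosen here for the example, and flapping inside a word is not handled at all.

```python
# Vowel symbols from the phoneme table; stress digits are stripped before lookup.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}

# Words whose initial to- behaves as if attached to the preceding word.
TO_WORDS = {"to", "tonight", "today", "tomorrow"}


def is_vowel(phone):
    """True if the symbol, minus any stress digit, is one of the vowels."""
    return phone.rstrip("012") in VOWELS


def apply_flaps(words):
    """words is a list of (spelling, phones) pairs; returns phone lists with [D] flaps."""
    out = [list(phones) for _, phones in words]
    for i in range(len(words) - 1):
        cur, nxt = out[i], out[i + 1]
        next_spelling = words[i + 1][0].lower()
        if cur[-1] == "T" and is_vowel(nxt[0]):
            # Word-final /t/ before a vowel-initial word: "at [D] all".
            cur[-1] = "[D]"
        elif is_vowel(cur[-1]) and nxt[0] == "T" and next_spelling in TO_WORDS:
            # Word-initial /t/ after a vowel flaps only in to-words: "go [D]o".
            nxt[0] = "[D]"
    return out


if __name__ == "__main__":
    print(apply_flaps([("at", ["AE1", "T"]), ("all", ["AO1", "L"])]))
    print(apply_flaps([("go", ["G", "OW1"]), ("to", ["T", "UW1"])]))
    print(apply_flaps([("the", ["DH", "AH0"]),
                       ("tomato", ["T", "AH0", "M", "EY1", "T", "OW2"])]))
```

The rule needs the spelling as well as the phonemes, because "tomato" and "tomorrow" begin with the same sounds and only the word list tells them apart.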
by (always) treating phonemes in the context of their left- and right-hand neighbors.
Need to produce an AE? Find out what neighbors it needs to be produced next to. H AE T? Find an AE that was produced after an H and before a T.
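One way to picture this lookup is an index keyed on (left neighbor, phoneme, right neighbor). The sketch below is hypothetical throughout: index_units, find_unit, the "#" boundary marker, and the two toy recordings are invented for illustration, and HH is used for the sound written H above, following the phoneme table.

```python
from collections import defaultdict


def index_units(recordings):
    """Index each phoneme of each recording by (left, phoneme, right) context.

    recordings maps a recording name to its phoneme list; "#" marks an edge.
    """
    index = defaultdict(list)
    for name, phones in recordings.items():
        padded = ["#"] + list(phones) + ["#"]
        for pos in range(1, len(padded) - 1):
            key = (padded[pos - 1], padded[pos], padded[pos + 1])
            # Store where the unit lives: recording name and position in it.
            index[key].append((name, pos - 1))
    return index


def find_unit(index, left, phone, right):
    """Return every stored unit of `phone` produced between `left` and `right`."""
    return index.get((left, phone, right), [])


if __name__ == "__main__":
    units = index_units({"hat": ["HH", "AE", "T"], "cat": ["K", "AE", "T"]})
    print(find_unit(units, "HH", "AE", "T"))
    print(find_unit(units, "B", "AE", "T"))
```

An empty result means no recording contains the phoneme in that exact context, so a real system would need some fallback for contexts it has never seen.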
William Labov is the master analyst of this material, and many papers are available at his web site:
See especially his
http://www.ling.upenn.edu/phono_atlas/ICSLP4.html …Dialect Diversity in North America
1. Loss of difference between AA (cot) and AO (caught). See also hot dog (h AA t d AO g).
Some speakers produce these vowels differently (I do). Others do not.
Labov’s group has produced the following map:
ink-pen versus baby-pin: the distinction between EH and IH before a nasal is lost in the South.
A very wide range of American speakers do NOT have the same vowels in sand and sang.
The vowels in cat and sang are the same, but in sand the vowel is much higher.
However, in the Northern Cities shift, all AE is pronounced like the last two syllables of idea – this is prevalent right here in the south Chicago area.
LTS: Letter to sound, or
In most languages, this is simple.
But in English and in French, it’s very messy.
Why? Because the spelling system in both is based on how the language used to be pronounced, and the pronunciation has since changed.
In most other languages, spelling reflects current pronunciation much more accurately.
Stress: most languages don’t mark which syllable is stressed. In some languages, there are simple principles that tell us which syllable is stressed, but when there are no such principles (e.g. English, Russian), you need to build word-lists with the stress indicated.
There are always new words being found, and most of them are new proper names (people, places, products, companies, etc.)
Third ESCA/COCOSDA Workshop on SPEECH SYNTHESIS November 1998
Damper et al. contest Liberman and Church’s 1991 statement:
“We will describe algorithms for pronunciation of English words…that reduce the error rate to only a few tenths of a percent for ordinary text, about two orders of magnitude better than the word error rates of 15% or so that were common a decade ago.”
“In this paper, we have shown that automatic pronunciation of novel words is not a solved problem in TTS synthesis. The best that can be done is about 70% words correct using PbA [Pronunciation by Analogy]…traditional rules…perform very badly – much worse than pronunciation by analogy and other data-driven approaches….”
Compare 4 approaches:
Systems typically use
Is it fair to test the backoff strategy on words in the first two sets, then?
Damper et al. propose:
This approach is a variant on decision-tree learning (an important paradigm in machine learning)….
In simplest terms, a decision-tree approach studies a problem like, “What phoneme realizes this letter in this context?” by looking at all relevant examples in the data, and considering all context data (what precedes, what follows, etc.) and deciding, first, which factor “gives the most information”:
Measure the uncertainty first: uncertainty of how this “t” should be pronounced;
Measure the uncertainty if you know what the following letter is.
Set of possibilities for realizing ‘t’:
Entropy: -1 * (0.64 * log(0.64) + 0.36 * log(0.36)) = 0.94268
realization of ‘t’:
if following letter is ‘h’ (36%)
Entropy: -1 * (.02 * log(.02) + .98 * log(.98)) = .14144 (base 2 logs!)
if following letter is anything else (64%):
Entropy: -1 * (1 * log 1 + 0 * log 0) = 0 (taking 0 * log 0 as 0)
Total entropy now: 0.36 * .14144 + 0.64 * 0 = .05092, a huge decrease from 0.94268!
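The arithmetic above can be checked with a few lines of Python. This is a sketch that reuses the slide's percentages as given; the function name entropy and the variable names are chosen here for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits (base 2 logs); a zero probability contributes nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Uncertainty about the realization of 't' before looking at any context.
before = entropy([0.64, 0.36])

# Uncertainty once the following letter is known.
after_h = entropy([0.02, 0.98])      # following letter is 'h' (36% of cases)
after_other = entropy([1.0, 0.0])    # following letter is anything else (64%)
after = 0.36 * after_h + 0.64 * after_other

# The drop in uncertainty is what "gives the most information" measures.
gain = before - after

if __name__ == "__main__":
    print(round(before, 5), round(after_h, 5), round(after, 5), round(gain, 5))
```

A decision-tree learner would run the same calculation for every candidate context feature and split first on the one with the largest drop.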
The idea is to use this method of testing to automatically determine which aspects of a letter’s neighborhood are most revealing in determining how that letter should be realized in that word.
But: 57.4% fully correct results in this experiment.
A first look at Viterbi in action
To answer that question, we have to make some specifications.
E X E C U T I O N
I N T E N T I O N
Identical letters aligned with each other are free, and there are no reduced fares for any kind of partial match for the others.
Cost: 3 substitutions + 2 hangings = 8
E X E C U T I O N
I N T E N T I O N
Cost: 1 substitution + 6 hangings = 8
Same cost – that’s how we’ve set up the problem.
E X E C U T I O N
I N T E N T I O N
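The cheapest alignment can be found by dynamic programming over all prefixes of the two strings, which is the same table-filling idea the Viterbi algorithm relies on. The sketch below is a standard minimum-edit-cost computation, not code from the slides. The costs are an inference from the arithmetic above: 3 substitutions + 2 hangings = 8 and 1 substitution + 6 hangings = 8 both hold if a substitution costs 2 and a hanging (unmatched) letter costs 1. The name min_edit_cost and its parameters are hypothetical.

```python
def min_edit_cost(source, target, sub_cost=2, hang_cost=1):
    """Cost of the cheapest alignment of two strings.

    Identical aligned letters are free, mismatched aligned letters cost
    sub_cost, and a letter left hanging (unaligned) costs hang_cost.
    """
    rows, cols = len(source) + 1, len(target) + 1
    cost = [[0] * cols for _ in range(rows)]
    # Aligning a prefix with the empty string leaves every letter hanging.
    for i in range(1, rows):
        cost[i][0] = i * hang_cost
    for j in range(1, cols):
        cost[0][j] = j * hang_cost
    for i in range(1, rows):
        for j in range(1, cols):
            pair = 0 if source[i - 1] == target[j - 1] else sub_cost
            cost[i][j] = min(
                cost[i - 1][j - 1] + pair,    # align the two letters
                cost[i - 1][j] + hang_cost,   # source letter hangs
                cost[i][j - 1] + hang_cost,   # target letter hangs
            )
    return cost[-1][-1]


if __name__ == "__main__":
    print(min_edit_cost("EXECUTION", "INTENTION"))
```

With these costs the function returns 8 for EXECUTION and INTENTION, matching both alignments above; changing sub_cost or hang_cost changes which alignments tie.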