CS 551/651:

Structure of Spoken Language

Lecture 7: Syllable Structure, Vowel Neutralization, and Coarticulation

John-Paul Hosom

Fall 2010



Syllables

  • Words are composed of phonetic clusters: syllables
  • Each syllable has a nucleus; typically the nucleus is a vowel or diphthong, sometimes a syllabic nasal or lateral (“button”, “bottle”) or a rhotacized (r-colored) vowel (“bird”)
  • The nucleus is a syllabic nasal or lateral only when it follows an alveolar consonant in the previous syllable of the word
  • Syllable boundaries are sometimes ambiguous:
      “beefeater”: beef/eater or bee/feater (Hunt, ICSLP04)
      “dolphin”: dol/phin or dolph/in (Wells, 1990)
      “tender”: ten/der or tend/er (Wells, 1990)
  • A syllable can be broken into components: a syllable contains {onset, rhyme}, and the rhyme contains {nucleus, coda}. The onset and coda are consonants; the nucleus is a vowel, syllabic nasal, or syllabic lateral


Limitations on consonant clusters: not all CCC combinations are possible in syllable-initial position. Of those that are possible, almost half are very rare:

  • possibly only one word in English: “spew”
  • only a few English words are pronounced (optionally) with /s t y/: “Stewart”, “steward”, “stew”
  • very few English words or roots with /s k l/: “sclerosis”
  • very few English words with /s k y/: “skew”, “askew”, “obscure”



  • Sonority corresponds roughly to the degree of constriction along the vocal and/or nasal tract
  • Ordering of sonority, from most to least sonorous: vowels, glides (/w/, /y/), liquids (/l/, /r/), nasals, fricatives, affricates, plosives
  • If a binary classification is used (sonorant/non-sonorant), then the sonorants consist of all vowels, glides, liquids, and nasals.
  • Fricatives, affricates, and plosives may be clustered into one category, “obstruents,” for purposes of sonority
  • Syllabification can be done according to the “sonority principle”: the sonority must rise and fall in a syllable
  • Also, there’s the Maximal Onset Principle: “Put a consonant in the onset rather than the coda when possible”
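The two principles above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the phone symbols follow the transcriptions used in these slides, the numeric sonority ranks (vowel 4 down to obstruent 0) are an arbitrary encoding of the ordering, and an onset is treated as "possible" whenever its sonority never falls toward the vowel. That last assumption is looser than real English phonotactics (it would accept /t l/, for instance), and syllabic nasals and laterals are not handled.

```python
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uh", "uw", "er",
          "ey", "ay", "oy", "aw", "ow"}
GLIDES = {"w", "y"}
LIQUIDS = {"l", "r"}
NASALS = {"m", "n", "ng"}


def sonority(phone):
    """Rank a phone: vowel 4, glide 3, liquid 2, nasal 1, obstruent 0."""
    if phone in VOWELS:
        return 4
    if phone in GLIDES:
        return 3
    if phone in LIQUIDS:
        return 2
    if phone in NASALS:
        return 1
    return 0  # every other symbol is assumed to be an obstruent


def rises_then_falls(phones):
    """True if sonority never falls before the peak and never rises after it.

    Expects a non-empty phone list for one syllable.
    """
    ranks = [sonority(p) for p in phones]
    peak = ranks.index(max(ranks))
    rising = all(a <= b for a, b in zip(ranks[:peak], ranks[1:peak + 1]))
    falling = all(a >= b for a, b in zip(ranks[peak:], ranks[peak + 1:]))
    return rising and falling


def is_rising_onset(cluster):
    """Simplified onset test: sonority must not fall toward the vowel."""
    ranks = [sonority(p) for p in cluster]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


def syllabify(phones):
    """Split at each vowel, giving the onset as many consonants as allowed."""
    nuclei = [i for i, p in enumerate(phones) if p in VOWELS]
    if not nuclei:
        return [list(phones)]
    cuts = [0]
    for prev, nxt in zip(nuclei, nuclei[1:]):
        # Try the longest candidate onset first (Maximal Onset Principle).
        for start in range(prev + 1, nxt + 1):
            if is_rising_onset(phones[start:nxt]):
                cuts.append(start)
                break
    cuts.append(len(phones))
    return [list(phones[a:b]) for a, b in zip(cuts, cuts[1:])]


print(rises_then_falls(["s", "t", "r", "iy", "k"]))  # True ("streak")
print(syllabify(["t", "eh", "n", "d", "er"]))        # ten / der
print(syllabify(["d", "aa", "l", "f", "ih", "n"]))   # dol / phin
```

Under these assumptions the sketch picks ten/der and dol/phin, because /n d/ and /l f/ fall in sonority and so cannot both go in the onset. The alternative parses (tend/er, dolph/in) need information beyond sonority.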


  • Because of the rise and fall of sonority in syllables, the following restrictions occur:
      (a) a glide (/w/, /y/) must be immediately adjacent to a vowel,
      (b) /r/ is the next closest consonant to the vowel,
      (c) /l/ is the next closest consonant to the vowel,
      (d) a nasal is next closest,
      (e) an obstruent is farthest from the vowel (but there may be more than one obstruent in the onset or coda)
  • Obstruents in a cluster must have the same voicing
  • In a series of obstruents between two vowels, voicing can change only once, at the syllable boundary.
  • English allows up to 3 consonants in syllable-initial position and up to 4 consonants in syllable-final position
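The voicing and cluster-length restrictions lend themselves to a small check. The sketch below is illustrative only: the two obstruent sets are an assumed inventory written in the phone symbols of these slides, and the function names are hypothetical.

```python
VOICED_OBSTRUENTS = {"b", "d", "g", "v", "dh", "z", "zh", "jh"}
VOICELESS_OBSTRUENTS = {"p", "t", "k", "f", "th", "s", "sh", "ch", "h"}


def obstruent_voicing(phones):
    """Voicing of each obstruent (True = voiced); sonorants are skipped."""
    voicing = []
    for p in phones:
        if p in VOICED_OBSTRUENTS:
            voicing.append(True)
        elif p in VOICELESS_OBSTRUENTS:
            voicing.append(False)
    return voicing


def cluster_agrees_in_voicing(cluster):
    """Obstruents within one onset or coda must share the same voicing."""
    return len(set(obstruent_voicing(cluster))) <= 1


def voicing_changes(obstruents):
    """Count voicing switches along a series of obstruents."""
    voicing = obstruent_voicing(obstruents)
    return sum(a != b for a, b in zip(voicing, voicing[1:]))


def within_cluster_limits(onset, coda):
    """English: at most 3 onset consonants and at most 4 coda consonants."""
    return len(onset) <= 3 and len(coda) <= 4


print(cluster_agrees_in_voicing(["k", "s", "t", "s"]))  # coda of "texts"
print(cluster_agrees_in_voicing(["l", "m", "z"]))       # coda of "helms"
print(voicing_changes(["v", "t"]))                      # "give true": v | t
```

In "give true" the single voicing change between /v/ and /t/ falls at the syllable boundary, which is what the third restriction requires.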


  • Examples: sphere /s f iy r/, streak /s t r iy k/, texts /t eh k s t s/, helms /h eh l m z/, but not /s t l iy/, /s p w iy/, /z b r ay/, etc.
  • The ordering of glides and liquids doesn’t matter for our purposes (applying to syllabification), because glides and liquids cannot occur sequentially within the same syllable in English. (However, two liquids in the same syllable are possible, e.g. “Carl”, as long as /r/ is closer to the vowel than /l/.)
  • In English, some burst-fricative pairs are represented as single phonemes (/ch/, /jh/), although in some other cases the burst and fricative remain distinct phonemes (e.g. “tsunami,” “bishops”, “six”).
  • It’s also possible to have two or more adjacent fricatives: “eleven twelfths” (note the 4 consonants after the final vowel)

Vowel Neutralization

  • When speech is uttered very quickly (or is not well enunciated), the formants tend to shift toward those of a neutral vowel:

(from Daniloff, p. 320)

(from van Bergem 1993 p. 8)
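As a rough numerical picture of this shift, the sketch below moves a vowel's (F1, F2) a chosen fraction of the way toward a neutral vowel. Everything numeric here is an assumption for illustration: the neutral point of about 500 and 1500 Hz is the textbook uniform-tube approximation, the /iy/ target is an approximate adult-male average, and the straight-line interpolation is a simplification rather than a model taken from the cited figures.

```python
NEUTRAL_F1_F2 = (500.0, 1500.0)  # assumed schwa-like neutral vowel, in Hz


def neutralize(target, amount, neutral=NEUTRAL_F1_F2):
    """Shift (F1, F2) a fraction `amount` (0 to 1) toward the neutral vowel."""
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0 and 1")
    return tuple(f + amount * (n - f) for f, n in zip(target, neutral))


iy_target = (270.0, 2290.0)          # assumed /iy/ formant targets, in Hz
print(neutralize(iy_target, 0.0))    # fully enunciated: unchanged
print(neutralize(iy_target, 0.5))    # halfway to neutral: (385.0, 1895.0)
print(neutralize(iy_target, 1.0))    # fully neutralized: (500.0, 1500.0)
```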


Vowel Neutralization

  • Target undershoot:

/m ih pc ph ih eh/


Vowel Neutralization

/m ih pc ph ih eh/

Target undershoot: /ih/ extracted and concatenated from “mip”:


Vowel Neutralization

  • However, neutralization is not always so simple; sometimes vowel formants shift away from the neutral position, depending on their context, and vowels tend toward slightly different neutral “targets”.
  • Neutralization is to some extent an artifact of averaging over speakers and contexts (van Bergem 1993)

vowels from one speaker in different phonetic contexts, and in “reduced” and “isolated” speaking




Coarticulation

  • Coarticulation is the “blending” of adjacent speech sounds, due to gradual movement of the articulators.
  • Coarticulation makes automatic speech recognition and text-to-speech synthesis difficult, but humans use coarticulation to conserve effort while speaking and to provide robustness during recognition.
  • There is Right-to-Left (RL) or “anticipatory” coarticulation, and Left-to-Right (LR) or “carry-over” coarticulation
  • Models of coarticulation and syllabification:
      Locus Theory
      Modified Locus Theory (Klatt)
      Öhman’s Theory
      Kozhevnikov-Chistovich (KC) Theory
      Wickelgren’s Theory, etc.


RL coarticulation occurs due to high-level planning of phonetic


“spoon”:                [s   p   uw  n]
rounding in isolation:   –   –   +   –
rounding in context:     +   +   +   +

RL coarticulation is more observable if neighboring sounds are not specified with respect to the potentially coarticulated feature; e.g. /s/, /p/, /n/ are not specified with respect to lip rounding

(from Daniloff, pp. 323-324)
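The “spoon” pattern can be mimicked with a simple feature-spreading rule. The sketch below is an assumption-laden illustration, not the model from Daniloff: `None` stands for "not specified for this feature", unspecified segments first copy the next specified value to their right (anticipatory, RL), and any still-unspecified segments then copy the value to their left (carry-over, LR).

```python
def spread_feature(values):
    """Fill unspecified (None) feature values from specified neighbors."""
    result = list(values)

    # Anticipatory (RL) pass: look ahead to the next specified segment.
    upcoming = None
    for i in range(len(result) - 1, -1, -1):
        if values[i] is not None:
            upcoming = values[i]
        elif upcoming is not None:
            result[i] = upcoming

    # Carry-over (LR) pass: whatever is still unspecified keeps the
    # value of the preceding segment.
    previous = None
    for i in range(len(result)):
        if result[i] is not None:
            previous = result[i]
        else:
            result[i] = previous
    return result


# "spoon" [s p uw n]: only /uw/ is specified for lip rounding.
rounding = [None, None, True, None]
print(spread_feature(rounding))  # [True, True, True, True]
```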


Coarticulation: Locus Theory

Locus Theory (Delattre, Liberman, and Cooper, 1955)

“there are, for each consonant, characteristic frequency positions, or loci, at which the formant transitions begin, or to which they may be assumed to point. On this basis, the transitions may be regarded simply as movements of the formants from their respective loci to the frequency levels appropriate for the next phone … The spectrographic patterns …, which produce /d/ before /iy/, /aa/, and /ow/, show how … these transitions seem to be pointing to a [F2] locus in the vicinity of 1800 [Hz].”

→ Each consonant has “target frequencies” independent of the neighboring vowels.

→ Formants transition from these target frequencies to the vowel target frequencies.


Coarticulation: Locus Theory

  • Locus Theory:
  • Consonants and vowels both have “targets” of articulator positions and therefore formant frequency locations
  • Given sufficient duration of a syllable, all phonemes reach their targets
  • The slope of the formants during a transition from a consonant to a vowel is relatively constant until reaching the target
  • If the syllable duration doesn’t allow enough time for the formants to reach their targets, “target undershoot” occurs and the formants change direction before fully realizing the intended vowel
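A minimal sketch of target undershoot under these assumptions: the formant leaves the consonant locus and moves toward the vowel target at a constant slope, and it stops short if the available time runs out. The 1800 Hz F2 locus for /d/ comes from the quotation above. The /iy/ target (2290 Hz), the slope (10 Hz/ms), and the durations are hypothetical values chosen only to make the arithmetic visible.

```python
def formant_reached(locus_hz, target_hz, slope_hz_per_ms, transition_ms):
    """Formant value after a constant-slope transition of limited duration."""
    if slope_hz_per_ms < 0 or transition_ms < 0:
        raise ValueError("slope and duration must be non-negative")
    distance = target_hz - locus_hz
    max_travel = slope_hz_per_ms * transition_ms
    if max_travel >= abs(distance):
        return target_hz                      # enough time: target reached
    direction = 1.0 if distance > 0 else -1.0
    return locus_hz + direction * max_travel  # stops short of the target


def undershoot_hz(locus_hz, target_hz, slope_hz_per_ms, transition_ms):
    """How far (in Hz) the formant ends from its target."""
    reached = formant_reached(locus_hz, target_hz,
                              slope_hz_per_ms, transition_ms)
    return abs(target_hz - reached)


D_F2_LOCUS = 1800.0   # from the Delattre, Liberman, and Cooper quotation
IY_F2_TARGET = 2290.0  # assumed value

print(formant_reached(D_F2_LOCUS, IY_F2_TARGET, 10.0, 60.0))  # 2290.0
print(formant_reached(D_F2_LOCUS, IY_F2_TARGET, 10.0, 30.0))  # 2100.0
print(undershoot_hz(D_F2_LOCUS, IY_F2_TARGET, 10.0, 30.0))    # 190.0
```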

Coarticulation: Locus Theory

  • Locus Theory:

(From Klatt 1987, p. 753)


Coarticulation: Modified Locus Theory

  • Problems with Locus Theory:
  • A transition may have both rapid and slow components: rapid release of the obstruction via the tongue tip, followed by slow movement of the tongue body.
  • The preceding vowel can influence the F2 onset of a CV transition (Öhman, 1966)
  • F2 may be insensitive to oral constrictions (obstruents) if the tongue position is toward the front of the mouth (as in /iy/)
  • (as reported by Fant 1973 and Klatt 1987)

Coarticulation: Modified Locus Theory

  • Modified Locus Theory:
  • Klatt hypothesized that the main effects of the vowel on the articulation of consonants are front/back position and lip rounding
  • Vowels are divided into three sets: {+front}, {+round}, {–front, –round} (because there are no rounded front vowels in English, sets 1 and 2 are mutually exclusive)
  • {+front} /iy ih eh ae/
  • {+round} /uw ao ow er/
  • {–front, –round} /uh ah aa aw/
  • Predicted Fonset from Ftarget for these 3 classes (locus theory)
  • Achieved 95% intelligibility for CVC nonsense syllables
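The three vowel sets can be encoded directly as a lookup, with a separate onset prediction per class. In the sketch below only the class membership comes from the slide. The linear form onset = locus + k × (target − locus) is one common way to write a locus equation and is assumed here, not confirmed as Klatt's exact formulation, and the (locus, k) pairs are placeholder numbers, not values from Klatt (1987).

```python
VOWEL_CLASSES = {
    "+front": {"iy", "ih", "eh", "ae"},
    "+round": {"uw", "ao", "ow", "er"},
    "-front,-round": {"uh", "ah", "aa", "aw"},
}


def vowel_class(vowel):
    """Return the name of the set that the vowel belongs to."""
    for name, members in VOWEL_CLASSES.items():
        if vowel in members:
            return name
    raise KeyError(f"vowel {vowel!r} is not in any of the three sets")


def predict_onset(f_target_hz, vowel, params):
    """Predict a formant onset from the vowel target, per vowel class.

    `params` maps class name -> (locus_hz, k); assumed linear form.
    """
    locus_hz, k = params[vowel_class(vowel)]
    return locus_hz + k * (f_target_hz - locus_hz)


# Placeholder F2 parameters for one consonant (hypothetical numbers).
HYPOTHETICAL_F2_PARAMS = {
    "+front": (1800.0, 0.5),
    "+round": (1700.0, 0.5),
    "-front,-round": (1750.0, 0.5),
}

print(vowel_class("iy"))                                     # +front
print(predict_onset(2290.0, "iy", HYPOTHETICAL_F2_PARAMS))   # 2045.0
```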

Coarticulation: Locus Theory

  • Modified Locus Theory:

× = –front, –round
° = +front
• = +round

(From Klatt 1987, p. 754)


Coarticulation: Öhman’s Theory

Öhman (1966) found that the loci of consonants are NOT independent of neighboring vowels:

and that for /g/ more than one locus is required.

Consonant “gestures” are superimposed on vowel “gestures” that are present during the consonant; even while the consonant is being uttered in a VCV sequence, both vowels affect the consonant.


Coarticulation: Öhman’s Theory

Öhman (1967) proposed a model of coarticulation based on vocal-tract shape evolving over time. It assumes that vocal-tract shapes can be mapped to formant frequencies.

For VCV utterances:

    s(x,t) = v(x,t) + k(t) [c(x) − v(x,t)] wc(x)

where s(x,t) is the vocal tract shape at position x and time t, v(x,t) is the vocal tract shape at position x for a given vowel as it varies over time from vowel 1 to vowel 2, c(x) is the vocal tract shape of the consonant, k(t) is an interpolation value (from 0 to 1), and wc(x) describes the degree to which c(x) “resists” coarticulation.
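A small numerical sketch of this superposition, taking the usual statement of the model, s = v + k · wc · (c − v), as the assumed form. The vocal tract is sampled at three positions, and all shape values are invented numbers chosen only to show how k and wc act. The function names are hypothetical.

```python
def interpolate_vowels(v1, v2, a):
    """Vowel shape part-way (a from 0 to 1) between vowel 1 and vowel 2."""
    return [(1.0 - a) * x + a * y for x, y in zip(v1, v2)]


def ohman_shape(v, c, w_c, k):
    """Superimpose consonant shape c on vowel shape v at every position x.

    k   : interpolation value, 0 (no consonant) to 1 (full consonant)
    w_c : per-position weight, 0 (vowel wins) to 1 (consonant wins)
    """
    if not (len(v) == len(c) == len(w_c)):
        raise ValueError("v, c and w_c must have the same length")
    return [vi + k * wi * (ci - vi) for vi, ci, wi in zip(v, c, w_c)]


vowel = [2.0, 2.0, 2.0]      # invented vowel shape at 3 positions
consonant = [1.0, 0.0, 1.0]  # invented consonant shape, closed in the middle
weight = [0.0, 1.0, 0.5]     # consonant dominates only at the constriction

print(ohman_shape(vowel, consonant, weight, 0.0))  # [2.0, 2.0, 2.0]
print(ohman_shape(vowel, consonant, weight, 1.0))  # [2.0, 0.0, 1.5]
```

Where the weight is 0 the vowel shape shows through even at the moment of full consonant articulation, which is how the model lets the surrounding vowels color the consonant.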



Coarticulation: Kozhevnikov-Chistovich (KC) Theory

  • (1) Syllabification using the CnV pattern: CV, CCV, CCCV, …
      phrase “give true answers”:
      g ih | v t r uw | ae | n s er | z
      S1     S2         S3   S4       S5
  • (2) Measured relative durations of words, “syllables”, and vowels:
      relative duration of vowel    = Dvow / Dsyll
      relative duration of syllable = Dsyll / Dword
      relative duration of word     = Dword / Dphrase
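The CnV segmentation and the relative-duration ratios are easy to reproduce. In this sketch the vowel set is an assumed inventory in the slides' phone symbols, and the durations passed to `relative_durations` are made-up millisecond values used only to show the ratios.

```python
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uh", "uw", "er",
          "ey", "ay", "oy", "aw", "ow"}


def cnv_units(phones):
    """Group phones into CnV units: any consonants, then one vowel.

    Consonants left over after the last vowel form a final unit.
    """
    units, current = [], []
    for phone in phones:
        current.append(phone)
        if phone in VOWELS:
            units.append(current)
            current = []
    if current:
        units.append(current)
    return units


def relative_durations(d_vowel, d_syllable, d_word, d_phrase):
    """Relative durations as defined on the slide."""
    return {
        "vowel": d_vowel / d_syllable,
        "syllable": d_syllable / d_word,
        "word": d_word / d_phrase,
    }


phrase = "g ih v t r uw ae n s er z".split()
print(cnv_units(phrase))
# [['g', 'ih'], ['v', 't', 'r', 'uw'], ['ae'], ['n', 's', 'er'], ['z']]

print(relative_durations(80, 200, 400, 1000))
```

Note that these units ignore word boundaries: /v/ from “give” is grouped with “true”.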

Coarticulation: Kozhevnikov-Chistovich (KC) Theory

They measured articulatory effects of the vowel on consonants.

They found coarticulation within a syllable but not across syllables:

C1 V1 C2 C3 V2

  • articulatory gestures for the consonant(s) and vowel begin nearly simultaneously with the onset of the initial consonant in the syllable
  • Example: lip rounding for /uw/ begins with /v/ in “give true answers”, but nasalization of /ae/ does not occur.
  • focused only on RL (anticipatory) coarticulation, the effect of a V on the previous C.
  • assumes motor programming of speech is discontinuous at the VC boundary
  • counter-examples showing LR coarticulation (Moll and Daniloff 1971; Kent, Carney, and Severeid 1974; Öhman 1966)

Coarticulation: Wickelgren’s Theory

Speech units are mentally coded as context-sensitive units: in the phonetic string /X Y Z/, Y is encoded as xYz (Y together with its left context X and its right context Z)

“By assuming (context-sensitive) allophones to be the basic unit of articulation, … it is trivial to account for how the ‘same phoneme’ in different phonemic environments can be … different in some respects at all levels of the speech process” (Wickelgren 1969, p. 11)

However, coarticulation can spread over more than one phone (up to seven phones away). Other criticisms: MacNeilage 1970, Whitaker 1970, Halwes and Jenkins 1971; “Allophonic richness may only beget strategic poverty” (Kent and Minifie 1977)

Nevertheless, Wickelgren’s is the only model currently used in ASR and concatenative text-to-speech (exceptions: Wouters 2001, Wrede 2001).
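Wickelgren-style context-sensitive units correspond to what speech systems usually call triphones. The sketch below builds them from a phone string. The "#" boundary marker and the "left-phone+right" label format are conventions assumed for this example, not notation taken from the slides.

```python
def context_sensitive_units(phones, boundary="#"):
    """Encode each phone Y in /X Y Z/ as the triple (X, Y, Z)."""
    padded = [boundary] + list(phones) + [boundary]
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]


def label(unit):
    """Render a (left, phone, right) triple as a triphone-style label."""
    left, phone, right = unit
    return f"{left}-{phone}+{right}"


units = context_sensitive_units(["s", "p", "uw", "n"])  # "spoon"
print([label(u) for u in units])
# ['#-s+p', 's-p+uw', 'p-uw+n', 'uw-n+#']
```

Each unit sees only one neighbor on each side, which is the limitation behind the criticism that coarticulation can spread over more than one phone.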


Coarticulation: Gay’s Theory

  • Gay, 1977: The syllabic unit of motor organization is the CV unit
  • Based on X-ray motion pictures of VCV utterances
  • anticipatory tongue movements for V2 in a V1CV2 sequence don’t begin until closure of C has been attained
  • movement toward V2 occurs during closure of C, having a large effect on the position and shape of the tongue during release of closure
  • V1 has little effect on the position of the tongue at the moment of closure
  • supports KC theory; conflicts with Öhman’s findings


  • Other models: MacNeilage, Henke, Benguerel and Cowan, Moll and Daniloff, Liberman, Tatham, etc.
  • Some are “feature based” in that each phonetic segment is assigned distinctive features which can then be modified in regular ways
  • Some are “hierarchical models”, with several levels of organization and complex interaction between levels
  • However, “coarticulatory patterns are not explained adequately by any … theories or models” (Kent and Minifie, 1977)
  • Conflicting evidence (Öhman and Kent & Moll vs. KC and Gay)