Speech Recognition

Speech Recognition: Speech Sounds of American English

Introduction • Speech has existed since the inception of the human race. • In contrast, writing is at most a few thousand years old. • Speech is available to everyone: people learn to speak without formal instruction and attain a comparable level of skill and fluency. • Speech is the most common and most natural manifestation of language. • Phonetics, the study of speech sounds, is the bedrock of the scientific study of language. Henderson said in 1877 that “The form of language is its sounds.” Veton Këpuska

Introduction • “Language is not a cultural artifact that we learn the way we learn to tell time or how the federal government works. Instead, it is a distinct piece of the biological makeup of our brains.” – Steven Pinker, “The Language Instinct: How the Mind Creates Language”

Introduction • Language is: • a complex, specialized skill, • which develops in the child spontaneously, • without conscious effort or formal instruction, • is deployed without awareness of its underlying logic, • is qualitatively the same in every individual, and • is distinct from more general abilities to process information or behave intelligently. • For these reasons some cognitive scientists have described language as a psychological faculty, a mental organ, a neural system, and a computational module. • Pinker prefers the term “instinct” to all of the above.

Speech Sounds of American English • There are over 40 speech sounds in American English, which can be organized by their basic manner of production.
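One way to make this organization concrete is a lookup table keyed by manner of production. The sketch below uses ARPAbet-style symbols. The exact inventory is an assumption for illustration, in particular the vowel list and the affricate and aspirant classes, and other phone sets may differ.

```python
# Illustrative grouping of American English phones by manner of production.
# Symbols are ARPAbet-style; the exact inventory is an assumption made for
# this sketch, not an official phone set.
PHONES_BY_MANNER = {
    "vowel": ["IY", "IH", "EY", "EH", "AE", "AA", "AO", "OW", "UH", "UW",
              "AH", "ER", "AY", "OY", "AW", "AX", "IX", "AXR"],
    "fricative": ["F", "V", "TH", "DH", "S", "Z", "SH", "ZH"],
    "stop": ["P", "B", "T", "D", "K", "G"],
    "nasal": ["M", "N", "NG"],
    "semivowel": ["W", "Y", "L", "R"],
    "affricate": ["CH", "JH"],
    "aspirant": ["HH"],
}

# Reverse index: phone symbol -> manner class.
MANNER_OF = {
    phone: manner
    for manner, phones in PHONES_BY_MANNER.items()
    for phone in phones
}

# Total number of phones across all classes.
TOTAL_PHONES = sum(len(phones) for phones in PHONES_BY_MANNER.values())


def manner(phone: str) -> str:
    """Return the manner class of a phone symbol (case-insensitive)."""
    return MANNER_OF[phone.upper()]


if __name__ == "__main__":
    for name, phones in PHONES_BY_MANNER.items():
        print(f"{name:10s} {len(phones):2d}  {' '.join(phones)}")
    print("total:", TOTAL_PHONES)
```

With this assumed inventory the classes sum to 42 phones, which is consistent with the "over 40" figure on the slide.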

Speech Sounds of American English • Vowels, glides, and consonants differ in degree of constriction • Sonorants (nasals, liquids, and glides) are consonants that have no pressure build-up at the constriction • Nasal consonants lower the velum, allowing airflow through the nasal cavity • Continuants are consonants that do not block airflow in the oral cavity

Phonemes of American English

Phonetic Alphabets Reference

ARPA-Bet Phone Set • SPHINX\ARPAbetExample.pdf

SPHINX Phone Set

Vocal Tract

Vowel Production • No significant constriction in the vocal tract • Usually produced with periodic excitation • Acoustic characteristics depend on the position of the jaw, tongue, and lips

Vowels of American English • There are approximately 18 vowels in American English, made up of monophthongs, diphthongs, and reduced vowels (schwas) • They are often described by the articulatory features High/Low, Front/Back, Retroflexed, Rounded, and Tense/Lax

Articulatory Features • Place of articulation: • Palatal <-> Velar • Palatal: Front • Velar: Back • Manner of articulation: • Degree of closure • Secondary articulations, e.g. lip rounding • Aperture: • Close – the narrowest constriction • Open – the widest opening of the vocal tract • Mid – an opening midway between the close and open positions • Vocal tract shape: • Raising or lowering the tongue • Advancing or retracting the body of the tongue • Raising or lowering the jaw • Rounding the lips

Vowels

Spectrograms of the Cardinal Vowels

Vowel Formant Averages • Vowels are often characterized by the lower three formants: • High/Low is correlated with the first formant, F1 • Front/Back is correlated with the second formant, F2 • Retroflexion is marked by a low third formant, F3
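The three correlations above can be sketched as a small rule-based feature extractor. The formant values are rounded, textbook-style averages for adult male speakers and the thresholds are hypothetical. Both are assumptions chosen only to illustrate the idea, not measurements from these slides.

```python
# Approximate average formants in Hz (F1, F2, F3) for a few vowels.
# Rounded illustrative values for adult male speakers (an assumption).
VOWEL_FORMANTS = {
    "iy": (270, 2290, 3010),  # as in "beet"
    "ae": (660, 1720, 2410),  # as in "bat"
    "aa": (730, 1090, 2440),  # as in "father"
    "uw": (300, 870, 2240),   # as in "boot"
    "er": (490, 1350, 1690),  # as in "bird"
}

# Hypothetical decision thresholds in Hz, chosen for this sketch only.
F1_HIGH_MAX = 400        # F1 below this: high vowel (the relation is inverse)
F1_LOW_MIN = 600         # F1 above this: low vowel
F2_FRONT_MIN = 1500      # F2 above this: front vowel
F3_RETROFLEX_MAX = 2000  # F3 below this: retroflexed vowel


def describe_vowel(f1: float, f2: float, f3: float) -> dict:
    """Map formant frequencies to coarse articulatory features."""
    if f1 < F1_HIGH_MAX:
        height = "high"
    elif f1 > F1_LOW_MIN:
        height = "low"
    else:
        height = "mid"
    backness = "front" if f2 > F2_FRONT_MIN else "back"
    return {
        "height": height,
        "backness": backness,
        "retroflex": f3 < F3_RETROFLEX_MAX,
    }


if __name__ == "__main__":
    for vowel, formants in VOWEL_FORMANTS.items():
        print(vowel, describe_vowel(*formants))
```

Note that vowel height varies inversely with F1: a high vowel such as /iy/ has a low F1. Real formant values differ across speakers and contexts, so fixed thresholds like these would need per-speaker normalization in practice.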

Vowel Durations • Each vowel has a different intrinsic duration • Schwas have distinctly shorter durations (50 ms) • /ɪ, ɛ, ʌ, ʊ/ are the shortest monophthongs • Context can greatly influence vowel duration

Happy Little Vowel Chart Robb's "So inaccurate, yet so useful."

FRICATIVES

Fricative Production • Turbulence produced at narrow constriction • Constriction position determines acoustic characteristics • Can be produced with periodic excitation

Fricatives of American English • There are 8 fricatives in American English • Four places of articulation: Labio-Dental (Labial), Interdental (Dental), Alveolar, and Palato-Alveolar (Palatal) • They are often described by the features Voiced/Unvoiced and Strident/Non-Strident (constriction behind alveolar ridge)
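The eight fricatives can be laid out as four places of articulation times two voicing values. The sketch below encodes that table with ARPAbet-style symbols. Marking the alveolar and palato-alveolar fricatives as strident is an assumption made for this sketch, and some feature systems draw the line differently.

```python
from typing import List, NamedTuple, Optional


class Fricative(NamedTuple):
    symbol: str     # ARPAbet-style symbol (illustrative)
    place: str      # place of articulation
    voiced: bool
    strident: bool  # assumed: alveolar and palato-alveolar only


FRICATIVES = [
    Fricative("f", "labio-dental", False, False),
    Fricative("v", "labio-dental", True, False),
    Fricative("th", "interdental", False, False),
    Fricative("dh", "interdental", True, False),
    Fricative("s", "alveolar", False, True),
    Fricative("z", "alveolar", True, True),
    Fricative("sh", "palato-alveolar", False, True),
    Fricative("zh", "palato-alveolar", True, True),
]


def select(place: Optional[str] = None,
           voiced: Optional[bool] = None,
           strident: Optional[bool] = None) -> List[str]:
    """Return symbols of fricatives matching every feature that is given."""
    return [
        f.symbol
        for f in FRICATIVES
        if (place is None or f.place == place)
        and (voiced is None or f.voiced == voiced)
        and (strident is None or f.strident == strident)
    ]


if __name__ == "__main__":
    print("strident:", select(strident=True))
    print("unvoiced non-strident:", select(voiced=False, strident=False))
```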

Spectrograms of Unvoiced Fricatives

Fricative Energy • Strident fricatives tend to be stronger than non-strident fricatives.

Fricative Durations • Voiced fricatives tend to be shorter than unvoiced fricatives.

Examples of Fricative Voicing Contrast

Friendly Little Consonant Chart Robb's "Somewhat more accurate, yet somewhat less useful."

What is this word?

STOPS

Stop Production • Complete closure in the vocal tract, pressure build-up • Sudden release of the constriction, turbulence noise • Can have periodic excitation during closure

Stops of American English • There are 6 stop consonants in American English • Three places of articulation: Labial, Alveolar, and Velar • Each place of articulation has a voiced and an unvoiced stop • Unvoiced stops are typically aspirated • Voiced stops usually exhibit a “voice bar” during closure • Information about formant transitions and the release is useful for classification
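Because each place of articulation has exactly one voiced and one unvoiced stop, the inventory forms a small table in which a stop's voicing counterpart can be looked up directly. The following is a minimal sketch; the helper name `voicing_counterpart` is hypothetical.

```python
# Stops keyed by (place of articulation, voiced?).
STOPS = {
    ("labial", False): "p",
    ("labial", True): "b",
    ("alveolar", False): "t",
    ("alveolar", True): "d",
    ("velar", False): "k",
    ("velar", True): "g",
}

# Reverse index: symbol -> (place, voiced?).
STOP_FEATURES = {symbol: key for key, symbol in STOPS.items()}


def voicing_counterpart(symbol: str) -> str:
    """Return the stop with the same place but the opposite voicing."""
    place, voiced = STOP_FEATURES[symbol]
    return STOPS[(place, not voiced)]


if __name__ == "__main__":
    for symbol in "ptk":
        print(symbol, "<->", voicing_counterpart(symbol))
```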

Spectrograms of Unvoiced Stops

Examples of Stop Voicing Contrast

Singleton Stop Durations

Voicing Cues for Stops

/s/-Stop Durations

Examples of Front and Back Velars

NASALS

Nasal Production • Velum lowering results in airflow through nasal cavity • Consonants produced with closure in oral cavity • Nasal murmurs have similar spectral characteristics

Nasals of American English • Three places of articulation: Labial, Alveolar, and Velar • Nasal consonants are always attached to a vowel, though they can form an entire syllable in unstressed environments ([n̩], [m̩], [ŋ̩]) • /ŋ/ is always post-vocalic in English • Place is identified by neighboring formant transitions

Spectrograms of Nasals

SEMIVOWELS

Semivowel Production • Constriction in vocal tract, no turbulence • Slower articulatory motion than other consonants • Laterals form complete closure with tongue tip, airflow via sides of constriction

Semivowels of American English • There are 4 semivowels in American English • Sometimes referred to as Liquids or Glides • Glides are a more extreme articulation of a corresponding vowel • Similar, though more extreme, formant positions • Generally weaker due to narrower constriction • Semivowels are always attached to a vowel, though /l/ can form an entire syllable in unstressed environments ([l̩])

Spectrograms of Semivowels

Acoustic Properties of Semivowels • /w/ and /l/ are the most confusable semivowels • /w/ is characterized by a very low F1, F2 • Typically a rapid spectral falloff above F2 • /l/ is characterized by a low F1 and F2 • Often presence of high frequency energy • Postvocalic /l/ characterized by minimal spectral discontinuity, gradual motion of formants • /y/ is characterized by very low F1, very high F2 • /y/ only occurs in a syllable onset position (i.e., pre-vocalic) • /r/ is characterized by a very low F3 • Prevocalic F3 < medial F3 < postvocalic F3
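The formant cues listed above suggest a simple decision rule, sketched below. All thresholds and the example formant values are hypothetical, chosen only to illustrate the ordering of the tests. A practical classifier would also use cues such as formant motion and high-frequency energy.

```python
# Hypothetical thresholds in Hz, chosen for illustration only.
F3_VERY_LOW = 2000   # /r/: very low F3
F2_VERY_HIGH = 2000  # /y/: very high F2
F2_VERY_LOW = 900    # /w/: very low F2


def guess_semivowel(f1: float, f2: float, f3: float) -> str:
    """Guess a semivowel from steady-state formants (toy rule set).

    F1 is accepted for completeness but not used: it is low for all four
    semivowels and so does not separate them in this sketch.
    """
    if f3 < F3_VERY_LOW:
        return "r"  # a very low F3 is the strongest single cue
    if f2 > F2_VERY_HIGH:
        return "y"  # very high F2
    if f2 < F2_VERY_LOW:
        return "w"  # very low F2
    return "l"      # low, but not extreme, F2


if __name__ == "__main__":
    # Illustrative (F1, F2, F3) values, not measurements.
    examples = {
        "w": (300, 700, 2300),
        "l": (400, 1100, 2600),
        "y": (280, 2200, 3000),
        "r": (350, 1100, 1600),
    }
    for label, formants in examples.items():
        print(label, "->", guess_semivowel(*formants))
```

In this toy rule set /w/ and /l/ are separated only by a single F2 threshold, which mirrors the observation that they are the most confusable pair.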

Speech Recognition