Sonorant Grab Bag

March 27, 2014

Speech Synthesis: A Basic Overview

  • Speech synthesis is the generation of speech by machine.

  • The reasons for studying synthetic speech have evolved over the years:

  • Novelty

  • To control acoustic cues in perceptual studies

  • To understand the human articulatory system

    • “Analysis by Synthesis”

  • Practical applications

    • Reading machines for the blind, navigation systems

Speech Synthesis: A Basic Overview

  • There are four basic types of synthetic speech:

  • Mechanical synthesis

  • Formant synthesis

    • Based on Source/Filter theory

  • Concatenative synthesis

    • = stringing bits and pieces of natural speech together

  • Articulatory synthesis

    • = generating speech from a model of the vocal tract.

1. Mechanical Synthesis

  • The very first attempts to produce synthetic speech were made without electricity.

    • = mechanical synthesis

  • In the late 1700s, models were produced which used:

    • reeds as a voicing source

    • differently shaped tubes for different vowels

Mechanical Synthesis, part II

  • Later, Wolfgang von Kempelen and Charles Wheatstone created a more sophisticated mechanical speech device…

    • with independently manipulable source and filter mechanisms.

Mechanical Synthesis, part III

  • An interesting historical footnote:

    • Alexander Graham Bell and his “questionable” experiments with his dog.

  • Mechanical synthesis has largely gone out of style ever since.

    • …but check out Mike Brady’s talking robot.

The Voder

  • The next big step in speech synthesis was to generate speech electronically.

  • This was most famously demonstrated at the New York World’s Fair in 1939 with the Voder.

  • The Voder was a manually controlled speech synthesizer.

    • (operated by highly trained young women)

Voder Principles

  • The Voder basically operated like a vocoder.

  • Voicing and fricative source sounds were filtered by 10 different resonators…

  • each controlled by an individual finger!

  • Only about 1 in 10 people who tried had the ability to learn how to play the Voder.

Overtone Singing

  • F0 stays the same (on a “drone”), while the singer shapes the vocal tract so that individual harmonics (“overtones”) resonate.

  • What kind of voice quality would be conducive to this?

Vowels and Sonorants

  • So far, we’ve talked a lot about the acoustics of vowels:

    • Source: periodic openings and closings of the vocal folds.

    • Filter: characteristic resonant frequencies of the vocal tract (above the glottis)

  • Today, we’ll talk about the acoustics of sonorants:

    • Nasals

    • Laterals

    • Approximants

  • The source/filter characteristics of sonorants are similar to vowels… with a few interesting complications.


  • One interesting acoustic property exhibited by (some) sonorants is damping.

  • Recall that resonance occurs when:

    • a sound wave travels through an object

    • that sound wave is reflected...

    • ...and reinforced, on a periodic basis

  • The periodic reinforcement sets up alternating patterns of high and low air pressure

    • = a standing wave

Damping, schematized

  • In a closed tube:

    • With only one pressure pulse from the loudspeaker, the wave will eventually dampen and die out.

  • Why?

    • The walls of the tube absorb some of the acoustic energy, with each reflection of the standing wave.

Damping Comparison

  • A heavily damped wave will die out more quickly...

  • ...than a lightly damped wave.
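The comparison can be sketched numerically. The following is a minimal illustration, assuming the damped wave is modeled as a sinusoid multiplied by a decaying exponential; the 100 Hz frequency and the two decay rates are arbitrary values chosen for the example, not measurements from the slides.

```python
import math

def damped_wave(freq_hz, decay_rate, duration_s=0.1, sample_rate=8000):
    """Sample exp(-decay_rate * t) * sin(2*pi*freq_hz*t) (assumed damping model)."""
    count = round(duration_s * sample_rate)
    return [math.exp(-decay_rate * i / sample_rate)
            * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(count)]

def peak(samples):
    """Largest absolute sample value."""
    return max(abs(x) for x in samples)

# Hypothetical decay rates (per second): larger means heavier damping.
heavy = damped_wave(100, decay_rate=60.0)
light = damped_wave(100, decay_rate=10.0)

# Compare the peak amplitude in the final 10 ms (one 100 Hz cycle) of each wave.
heavy_tail = peak(heavy[-80:])
light_tail = peak(light[-80:])
print(f"peak in final 10 ms: heavy = {heavy_tail:.3f}, light = {light_tail:.3f}")
```

After the same 100 ms, the heavily damped wave has lost far more of its amplitude than the lightly damped one.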

Damping Factors

  • The amount of damping in a tube is a function of:

    • The volume of the tube

    • The surface area of the tube

    • The material of which the tube is made

  • More volume, more surface area = more damping

  • Think about the resonant characteristics of:

    • a Home Depot

    • a post-modern restaurant

    • a movie theater

    • an anechoic chamber

Resonance and Recording

  • Remember: any room will reverberate at its characteristic resonant frequencies

  • Hence: high-quality sound recordings need to be made in specially designed rooms which damp any reverberation

  • Examples:

    • Classroom recording (29 dB signal-to-noise ratio)

    • “Soundproof” booth (44 dB SNR)

    • Anechoic chamber (90 dB SNR)



“soundproof” booth


anechoic chamber
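To get a feel for what the signal-to-noise figures mean, here is a small sketch that converts dB SNR into an amplitude ratio. It assumes the usual amplitude-based definition, SNR = 20·log10(signal RMS / noise RMS); the slides do not say how the values were measured, and the function names are hypothetical.

```python
import math

def snr_db(signal_rms, noise_rms):
    """SNR in dB, assuming the amplitude-based definition 20*log10(S/N)."""
    return 20 * math.log10(signal_rms / noise_rms)

def amplitude_ratio(snr_in_db):
    """Invert the definition: how many times larger the signal is than the noise."""
    return 10 ** (snr_in_db / 20)

for label, db in [("classroom", 29), ("soundproof booth", 44), ("anechoic chamber", 90)]:
    print(f"{label}: signal amplitude is about {amplitude_ratio(db):.0f}x the noise")
```

Under that assumption, the signal is roughly 28 times the noise amplitude in the classroom, roughly 158 times in the booth, and over 30,000 times in the anechoic chamber.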

Inside Your Nose

  • In nasals, air flows through the nasal cavities.

  • The resonating “filter” of nasal sounds therefore has:

    • increased volume

    • increased surface area

    • → increased damping

  • Note:

    • the exact size and shape of the nasal cavities vary widely from speaker to speaker.

Nasal Variability

  • Measurements based on MRI data (Dang et al., 1994)

Damping Effects, part 1

  • Damping by the nasal cavities decreases the overall amplitude of the sound coming out through the nose.



Damping Effects, part 2

  • How might the power spectrum of an undamped wave:

  • Compare to that of a damped wave?

  • A: Undamped waves have only one component;

    • Damped waves have a broader range of components.

Here’s Why

100 Hz sinewave


90 Hz sinewave


110 Hz sinewave

The Result

90 Hz + 100 Hz + 110 Hz

  • If the 90 Hz and 110 Hz components have less amplitude than the 100 Hz wave, there will be less damping:
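A quick way to check this is to add the three sinewaves and watch how the peak amplitude changes. The sketch below assumes all three components start in phase and that the two side components share one amplitude (`side_amplitude`, a hypothetical parameter). Note that the sum of just three components only looks damped over the first 50 ms; after that the pattern repeats.

```python
import math

SAMPLE_RATE = 8000

def three_component_wave(side_amplitude, duration_s=0.06):
    """100 Hz sinewave plus 90 Hz and 110 Hz sinewaves scaled by side_amplitude."""
    count = round(duration_s * SAMPLE_RATE)
    wave = []
    for i in range(count):
        t = i / SAMPLE_RATE
        center = math.sin(2 * math.pi * 100 * t)
        sides = math.sin(2 * math.pi * 90 * t) + math.sin(2 * math.pi * 110 * t)
        wave.append(center + side_amplitude * sides)
    return wave

def decay_ratio(side_amplitude):
    """Peak around 50 ms divided by the peak in the first 10 ms."""
    wave = three_component_wave(side_amplitude)
    early = max(abs(x) for x in wave[0:80])      # 0 to 10 ms
    late = max(abs(x) for x in wave[360:440])    # 45 to 55 ms
    return late / early

for amp in (0.0, 0.25, 0.5):
    print(f"side amplitude {amp}: late/early peak ratio = {decay_ratio(amp):.2f}")
```

A ratio near 1 means no apparent decay; the weaker the 90 Hz and 110 Hz components, the closer the ratio stays to 1, which matches the "less damping" observation.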

Damping Spectra


  • Damping increases the bandwidth of the resonating filter.

    • Bandwidth = the range of frequencies over which a filter will respond at 0.707 or more of its maximum output (i.e., within 3 dB of the peak).

  • → Nasal formants will have a larger bandwidth than vowel formants.
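The link between damping and bandwidth can be checked with a small numerical experiment. This is only a sketch: it assumes an exponentially decaying 1000 Hz sinusoid as a stand-in for a damped resonance, uses two arbitrary decay rates, and measures the width of the spectral peak at 0.707 of its maximum with a plain discrete Fourier sum (all names and values here are illustrative, not from the slides).

```python
import cmath
import math

def damped_sine(freq_hz, decay_rate, duration_s, sample_rate):
    """Sample exp(-decay_rate * t) * sin(2*pi*freq_hz*t)."""
    count = round(duration_s * sample_rate)
    return [math.exp(-decay_rate * i / sample_rate)
            * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(count)]

def magnitude_at(samples, freq_hz, sample_rate):
    """Magnitude of the discrete Fourier sum of samples at one frequency."""
    step = cmath.exp(-2j * math.pi * freq_hz / sample_rate)
    phasor, total = 1 + 0j, 0j
    for x in samples:
        total += x * phasor
        phasor *= step
    return abs(total)

def half_power_bandwidth(samples, sample_rate, freqs_hz):
    """Width (Hz) of the region at or above 0.707 of the peak magnitude."""
    mags = [magnitude_at(samples, f, sample_rate) for f in freqs_hz]
    threshold = max(mags) / math.sqrt(2)
    passband = [f for f, m in zip(freqs_hz, mags) if m >= threshold]
    return max(passband) - min(passband)

SAMPLE_RATE = 8000
FREQS = list(range(900, 1101))  # 1 Hz grid around the 1000 Hz resonance

# Hypothetical decay rates (per second); the second is four times the first.
light = damped_sine(1000, 25 * math.pi, 0.2, SAMPLE_RATE)
heavy = damped_sine(1000, 100 * math.pi, 0.2, SAMPLE_RATE)
light_bw = half_power_bandwidth(light, SAMPLE_RATE, FREQS)
heavy_bw = half_power_bandwidth(heavy, SAMPLE_RATE, FREQS)
print(f"lightly damped: about {light_bw} Hz wide; heavily damped: about {heavy_bw} Hz wide")
```

With these values the heavily damped wave has a peak roughly four times as wide as the lightly damped one.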

Bandwidth in Spectrograms

F3 of

F3 of [m]

The formants in nasals have increased bandwidth, in comparison to the formants in vowels.

Nasal Formants

  • The formant frequencies of nasal stops can be calculated with the same formula that we used to calculate formant frequencies for a tube that is open at one end:

  • $F_n = \dfrac{(2n - 1)\,c}{4L}$

  • The simplest case: uvular nasal [ɴ].

  • The length of the tube is a combination of:

    • distance from glottis to uvula (9 cm)

    • distance from uvula to nares (12.5 cm)

  • An average tube length (for adult males): 21.5 cm

The Math

  • $F_n = \dfrac{(2n - 1)\,c}{4L}$

  • $L = 21.5$ cm (9 cm + 12.5 cm)

  • $c = 35000$ cm/sec

  • $F_1 = \dfrac{35000}{86} = 407$ Hz

  • $F_2 = 1221$ Hz

  • $F_3 = 2035$ Hz
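The arithmetic can be reproduced in a few lines. This is a minimal sketch of the quarter-wavelength formula with the tube length and speed of sound given above; the function name is hypothetical.

```python
SPEED_OF_SOUND_CM_PER_S = 35000

def quarter_wave_resonances(length_cm, count=3, c=SPEED_OF_SOUND_CM_PER_S):
    """Resonances (Hz) of a tube closed at one end: (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c / (4 * length_cm) for n in range(1, count + 1)]

# Glottis to uvula (9 cm) plus uvula to nares (12.5 cm).
uvular_nasal = quarter_wave_resonances(9 + 12.5)
print([round(f) for f in uvular_nasal])  # [407, 1221, 2035]
```

Because the resonances are odd multiples of the first one, F2 and F3 are simply 3 and 5 times F1.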

The Real Thing

  • Check out Peter’s production of a uvular nasal in Praat.

    • And also Dustin’s neutral vowel!

  • Note: the higher formants are low in amplitude

  • Some reasons why:

    • Overall damping

    • “Nostril-rounding” reduces intensity

    • Resonance is lost in the side passages of the sinuses.

  • Nasal stops with more anterior places of articulation also have anti-formants.

Anti-Formants

  • For nasal stops, the occlusion in the mouth creates a side cavity.

  • This side cavity resonates at particular frequencies.

  • These resonances absorb acoustic energy in the system.

  • They form anti-formants.

Anti-Formant Math

  • Anti-formant resonances are based on the length of the side-cavity tube (the closed-off portion of the mouth).

  • For [m], this length is about 8 cm.

  • $AF_n = \dfrac{(2n - 1)\,c}{4L}$

  • $L = 8$ cm

  • $AF_1 = \dfrac{35000}{4 \times 8} = 1094$ Hz

  • $AF_2 = 3281$ Hz


Spectral Signatures

  • In a spectrogram, acoustic energy drops, or disappears completely, at the anti-formant frequencies.


Nasal Place Cues

  • At more posterior places of articulation, the “anti-resonating” tube is shorter.

    • → anti-formant frequencies will be higher.

  • for [n], L = 5.5 cm

    • AF1 = 1600 Hz

    • AF2 = 4800 Hz

  • for , L = 3.3 cm

    • AF1 = 2650 Hz

  • for , L = 2.3 cm

    • AF1 = 3700 Hz
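The place pattern falls straight out of the quarter-wavelength formula: the shorter the side cavity, the higher the anti-formants. Here is a short sketch using the side-cavity lengths listed on these slides (8 cm for [m], then 5.5, 3.3, and 2.3 cm) and c = 35000 cm/sec; the function name is hypothetical.

```python
SPEED_OF_SOUND_CM_PER_S = 35000

def anti_formants(side_cavity_cm, count=2, c=SPEED_OF_SOUND_CM_PER_S):
    """Anti-formant frequencies (Hz) for a side cavity: (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c / (4 * side_cavity_cm) for n in range(1, count + 1)]

# Side-cavity lengths from the slides, from most anterior to most posterior.
for length_cm in (8, 5.5, 3.3, 2.3):
    af1, af2 = anti_formants(length_cm)
    print(f"L = {length_cm} cm: AF1 = {af1:.0f} Hz, AF2 = {af2:.0f} Hz")
```

The values listed on the slide appear to be rounded, so the computed figures differ slightly from them (for example, about 1591 Hz rather than 1600 Hz for L = 5.5 cm).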

[m] vs. [n]

[Figure labels: AF1 (n), AF1 (m)]

  • Production of [meno], by a speaker of Tsonga

  • Tsonga is spoken in South Africa and Mozambique

Nasal Stop Acoustics: Summary

  • Here’s the general pattern of what to look for in a spectrogram for nasals:

  • Periodic voicing.

  • Overall amplitude lower than in vowels.

  • Formants (resonance).

  • Formants have broad bandwidths.

  • Low frequency first formant.

  • Less space between formants.

  • Higher formants have low amplitude.

Perceiving Nasal Place

  • Nasal “murmurs” do not provide particularly strong cues to place of articulation.

  • Can you identify the following as [m], [n] or ?

  • Repp (1986) found that listeners could distinguish between [n] and [m] only 72% of the time.

  • Transitions provide important place cues for nasals.

  • Repp (1986): 95% of nasals were identified correctly when presented with the first 10 msec of the following vowel.

  • Can you identify these nasal + transition combos?