Sonorant Grab Bag

March 27, 2014


Speech Synthesis: A Basic Overview

  • Speech synthesis is the generation of speech by machine.

  • The reasons for studying synthetic speech have evolved over the years:

  • Novelty

  • To control acoustic cues in perceptual studies

  • To understand the human articulatory system

    • “Analysis by Synthesis”

  • Practical applications

    • Reading machines for the blind, navigation systems


Speech Synthesis: A Basic Overview

  • There are four basic types of synthetic speech:

  • Mechanical synthesis

  • Formant synthesis

    • Based on Source/Filter theory

  • Concatenative synthesis

    • = stringing bits and pieces of natural speech together

  • Articulatory synthesis

    • = generating speech from a model of the vocal tract.


1. Mechanical Synthesis

  • The very first attempts to produce synthetic speech were made without electricity.

    • = mechanical synthesis

  • In the late 1700s, models were produced which used:

    • reeds as a voicing source

    • differently shaped tubes for different vowels


Mechanical Synthesis, part II

  • Later, Wolfgang von Kempelen and Charles Wheatstone created a more sophisticated mechanical speech device…

    • with independently manipulable source and filter mechanisms.


Mechanical Synthesis, part III

  • An interesting historical footnote:

    • Alexander Graham Bell and his “questionable” experiments with his dog.

  • Mechanical synthesis has largely gone out of style ever since.

    • …but check out Mike Brady’s talking robot.


The Voder

  • The next big step in speech synthesis was to generate speech electronically.

  • This was most famously demonstrated at the New York World’s Fair in 1939 with the Voder.

  • The Voder was a manually controlled speech synthesizer.

    • (operated by highly trained young women)


Voder Principles

  • The Voder basically operated like a vocoder.

  • Voicing and fricative source sounds were filtered by 10 different resonators…

  • each controlled by an individual finger!

  • Only about 1 in 10 people had the ability to learn how to play the Voder.


Overtone Singing

  • F0 stays the same (on a “drone”), while the singer shapes the vocal tract so that individual harmonics (“overtones”) resonate.

  • What kind of voice quality would be conducive to this?
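
A minimal sketch of the idea, assuming a hypothetical drone at 150 Hz and a vocal tract resonance tuned to 1200 Hz (both values are illustrative and not taken from the slides). The harmonics sit at whole-number multiples of F0, and the singer brings one of them out by moving a resonance onto it.

```python
def harmonics(f0_hz, max_hz):
    """Frequencies of the harmonics of f0_hz up to and including max_hz."""
    return [f0_hz * k for k in range(1, int(max_hz // f0_hz) + 1)]


def nearest_harmonic(f0_hz, resonance_hz):
    """Return (harmonic number, frequency) of the harmonic closest to a resonance."""
    k = max(1, round(resonance_hz / f0_hz))
    return k, k * f0_hz


# Assumed values: a 150 Hz drone and a resonance placed at 1200 Hz.
print(harmonics(150, 1500))
print(nearest_harmonic(150, 1200))
```

Changing the assumed resonance while F0 stays fixed selects a different harmonic, which is what the listener hears as the moving "overtone" melody.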


Vowels and Sonorants

  • So far, we’ve talked a lot about the acoustics of vowels:

    • Source: periodic openings and closings of the vocal folds.

    • Filter: characteristic resonant frequencies of the vocal tract (above the glottis)

  • Today, we’ll talk about the acoustics of sonorants:

    • Nasals

    • Laterals

    • Approximants

  • The source/filter characteristics of sonorants are similar to vowels… with a few interesting complications.


Damping

  • One interesting acoustic property exhibited by (some) sonorants is damping.

  • Recall that resonance occurs when:

    • a sound wave travels through an object

    • that sound wave is reflected...

    • ...and reinforced, on a periodic basis

  • The periodic reinforcement sets up alternating patterns of high and low air pressure

    • = a standing wave


Resonance in a closed tube

[Figure: resonance in a closed tube, plotted over time]


Damping, schematized

  • In a closed tube:

    • With only one pressure pulse from the loudspeaker, the wave will eventually dampen and die out.

  • Why?

    • The walls of the tube absorb some of the acoustic energy, with each reflection of the standing wave.


Damping Comparison

  • A heavily damped wave will die out more quickly than a lightly damped wave.
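
To make the comparison concrete, here is a rough sketch that treats damping as a fixed fraction of energy absorbed by the walls at each reflection. The absorbed fractions (5% and 30%) and the 1% cut-off are assumptions chosen for illustration, not measured values.

```python
def reflections_until(threshold_fraction, absorbed_fraction):
    """Count reflections until the remaining energy drops below threshold_fraction.

    absorbed_fraction is the (assumed) share of energy the walls absorb at
    each reflection; the starting energy is normalized to 1.0.
    """
    energy = 1.0
    n_reflections = 0
    while energy >= threshold_fraction:
        energy *= 1.0 - absorbed_fraction
        n_reflections += 1
    return n_reflections


light = reflections_until(0.01, absorbed_fraction=0.05)  # lightly damped tube
heavy = reflections_until(0.01, absorbed_fraction=0.30)  # heavily damped tube
print(light, heavy)
```

Under these assumptions the heavily damped wave falls below 1% of its starting energy after far fewer reflections than the lightly damped one.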


Damping Factors

  • The amount of damping in a tube is a function of:

    • The volume of the tube

    • The surface area of the tube

    • The material of which the tube is made

  • More volume, more surface area = more damping

  • Think about the resonant characteristics of:

    • a Home Depot

    • a post-modern restaurant

    • a movie theater

    • an anechoic chamber


An Anechoic Chamber


Resonance and Recording

  • Remember: any room will reverberate at its characteristic resonant frequencies

  • Hence: high quality sound recordings need to be made in specially designed rooms which damp any reverberation

  • Examples:

    • Classroom recording (29 dB signal-to-noise ratio)

    • “Soundproof” booth (44 dB SNR)

    • Anechoic chamber (90 dB SNR)
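
For a sense of scale, the sketch below converts these decibel figures into plain amplitude ratios, using the standard relation dB = 20 × log10(amplitude ratio). The function name is made up for this example.

```python
def snr_db_to_amplitude_ratio(snr_db):
    """Convert a signal-to-noise ratio in dB to a signal:noise amplitude ratio."""
    return 10 ** (snr_db / 20)


recordings = [("classroom", 29), ("soundproof booth", 44), ("anechoic chamber", 90)]
for label, snr_db in recordings:
    ratio = snr_db_to_amplitude_ratio(snr_db)
    print(f"{label}: {snr_db} dB is about {ratio:.0f}:1 in amplitude")
```

Every additional 20 dB multiplies the amplitude ratio by ten, so the gap between the booth and the anechoic chamber is much larger than the raw numbers suggest.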


Spectrograms

classroom

“soundproof” booth


Spectrograms

anechoic chamber


Inside Your Nose

  • In nasals, air flows through the nasal cavities.

  • The resonating “filter” of nasal sounds therefore has:

    • increased volume

    • increased surface area

    • and therefore increased damping

  • Note:

    • the exact size and shape of the nasal cavities varies wildly from speaker to speaker.


Nasal Variability

  • Measurements based on MRI data (Dang et al., 1994)


Damping Effects, part 1

  • Damping by the nasal cavities decreases the overall amplitude of the sound coming out through the nose.

[m]

[m]


Damping Effects, part 2

  • How might the power spectrum of an undamped wave:

  • Compare to that of a damped wave?

  • A: Undamped waves have only one component;

    • Damped waves have a broader range of components.


Here’s Why

100 Hz sinewave

+

90 Hz sinewave

+

110 Hz sinewave


The Result

90 Hz +

100 Hz +

110 Hz

  • If the 90 Hz and 110 Hz components have less amplitude than the 100 Hz wave, there will be less damping:
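
The three-sinewave sum can be checked numerically. This is only a sketch: the side-component amplitude of 0.5 is an assumed value, and the identity cos(a − b) + cos(a + b) = 2 cos(a) cos(b) is used to write the sum as a 100 Hz wave with a slowly changing envelope.

```python
import math


def three_component_wave(t, side_amplitude=0.5):
    """Sum of a 100 Hz cosine and two weaker cosines at 90 Hz and 110 Hz."""
    center = math.cos(2 * math.pi * 100 * t)
    lower = side_amplitude * math.cos(2 * math.pi * 90 * t)
    upper = side_amplitude * math.cos(2 * math.pi * 110 * t)
    return center + lower + upper


def envelope(t, side_amplitude=0.5):
    """Amplitude envelope of the sum: 1 + 2 * side_amplitude * cos(2*pi*10*t)."""
    return 1 + 2 * side_amplitude * math.cos(2 * math.pi * 10 * t)


# Envelope at the start of every 100 Hz period over the first 50 ms.
for n in range(6):
    t = n * 0.01
    print(f"{t * 1000:.0f} ms: {envelope(t):.3f}  vs  {envelope(t, 0.25):.3f}")
```

With the larger side components the envelope shrinks all the way to zero by 50 ms. With the weaker ones (0.25) it shrinks less, which matches the bullet above. With only three components the envelope rises again after 50 ms, so this approximates a damped wave only over that first stretch.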


Damping Spectra

light

medium


Damping Spectra

heavy

  • Damping increases the bandwidth of the resonating filter.

    • Bandwidth = the range of frequencies over which a filter responds with at least 0.707 of its maximum output (the −3 dB, or half-power, points).

  • Therefore, nasal formants will have a larger bandwidth than vowel formants.
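
The link between damping and bandwidth can be sketched with a simplified single-resonance model, assuming the resonance decays in time as e^(−αt). In that model the 0.707 bandwidth works out to α/π Hz. The decay rates used below are illustrative assumptions, not measurements of real vowels or nasals.

```python
import math


def resonator_gain(freq_hz, center_hz, decay_per_sec):
    """Response of one decaying resonance, normalized so the peak equals 1."""
    detune = 2 * math.pi * (freq_hz - center_hz)
    return decay_per_sec / math.sqrt(decay_per_sec ** 2 + detune ** 2)


def bandwidth_hz(decay_per_sec):
    """Width of the band where the response is at least 0.707 of the peak."""
    return decay_per_sec / math.pi


light_decay, heavy_decay = 100.0, 400.0  # assumed decay rates (per second)
for decay in (light_decay, heavy_decay):
    width = bandwidth_hz(decay)
    edge_gain = resonator_gain(500 + width / 2, 500, decay)
    print(f"decay {decay:.0f}/s: bandwidth {width:.1f} Hz, gain at edge {edge_gain:.3f}")
```

In this model, quadrupling the decay rate quadruples the bandwidth, and the gain at the band edge stays at about 0.707 in both cases.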


Bandwidth in Spectrograms

F3 of

F3 of [m]

The formants in nasals have increased bandwidth, in comparison to the formants in vowels.


Nasal Formants

  • The formant frequencies of nasal stops can be calculated with the same formula that we used for a tube that is closed at one end and open at the other.

  • $f_n = \frac{(2n - 1)\,c}{4L}$

  • The simplest case: the uvular nasal [ɴ].

  • The length of the tube is a combination of:

    • distance from glottis to uvula (9 cm)

    • distance from uvula to nares (12.5 cm)

  • An average tube length (for adult males): 21.5 cm


The Math

  • $f_n = \frac{(2n - 1)\,c}{4L}$

  • L = 9 cm + 12.5 cm = 21.5 cm

  • c = 35000 cm/sec

  • F1 = 35000 / (4 × 21.5) = 35000 / 86 ≈ 407 Hz

  • F2 = 1221 Hz

  • F3 = 2035 Hz
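
The same arithmetic as a short sketch. The function name is invented for this example; the tube lengths and the speed of sound are the values given on the slide.

```python
def quarter_wave_resonances(length_cm, n_resonances=3, speed_cm_per_sec=35000.0):
    """Resonances f_n = (2n - 1) * c / (4L) of a tube closed at one end."""
    return [
        (2 * n - 1) * speed_cm_per_sec / (4 * length_cm)
        for n in range(1, n_resonances + 1)
    ]


# Glottis to uvula (9 cm) plus uvula to nares (12.5 cm).
formants = quarter_wave_resonances(9 + 12.5)
print([round(f) for f in formants])  # [407, 1221, 2035]
```

Because the resonances fall at odd multiples of the first one, a longer tube lowers F1 and packs all the formants closer together.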


    The Real Thing

    • Check out Peter’s production of a uvular nasal in Praat.

      • And also Dustin’s neutral vowel!

    • Note: the higher formants are low in amplitude

    • Some reasons why:

      • Overall damping

      • “Nostril-rounding” reduces intensity

      • Resonance is lost in the side passages of the sinuses.

    • Nasal stops with fronter places of articulation also have anti-formants.


    Anti-Formants

    • For nasal stops, the occlusion in the mouth creates a side cavity.

    • This side cavity resonates at particular frequencies.

    • These resonances absorb acoustic energy in the system.

    • They form anti-formants


    Anti-Formant Math

    • Anti-formant frequencies depend on the length of the side cavity (the mouth, from the uvula to the occlusion).

    • For [m], this length is about 8 cm.

    • $f_n = \frac{(2n - 1)\,c}{4L}$

    • L = 8 cm

    • AF1 = 35000 / (4 × 8) ≈ 1094 Hz

    • AF2 = 3281 Hz

    • etc.


    Spectral Signatures

    • In a spectrogram, acoustic energy weakens, or drops out completely, at the anti-formant frequencies.

    anti-formants


    Nasal Place Cues

    • At more posterior places of articulation, the “anti-resonating” tube is shorter.

      • Therefore, anti-formant frequencies will be higher.

    • for [n], L = 5.5 cm

      • AF1 = 1600 Hz

      • AF2 = 4800 Hz

    • for , L = 3.3 cm

      • AF1 = 2650 Hz

    • for , L = 2.3 cm

      • AF1 = 3700 Hz
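
The relationship can also be run in reverse. As a rough sketch, assuming the side cavity behaves as an ideal quarter-wave tube with c = 35000 cm/sec, the first anti-formant read off a spectrogram gives an estimate of the cavity length, and so a hint about place of articulation. The function name is hypothetical.

```python
def side_cavity_length_cm(af1_hz, speed_cm_per_sec=35000.0):
    """Length of a quarter-wave side cavity whose first anti-formant is af1_hz."""
    return speed_cm_per_sec / (4 * af1_hz)


# First anti-formant values quoted on the slides.
for af1_hz in (1094, 1600, 2650, 3700):
    print(af1_hz, "Hz ->", round(side_cavity_length_cm(af1_hz), 2), "cm")
```

The estimates come back close to the lengths listed above, with small differences because the slide values are rounded.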


    [m] vs. [n]

    [m]

    [e]

    [n]

    [o]

    AF1 (n)

    AF1 (m)

    • Production of [meno], by a speaker of Tsonga

    • Tsonga is spoken in South Africa and Mozambique


    Nasal Stop Acoustics: Summary

    • Here’s the general pattern of what to look for in a spectrogram for nasals:

    • Periodic voicing.

    • Overall amplitude lower than in vowels.

    • Formants (resonance).

    • Formants have broad bandwidths.

    • Low frequency first formant.

    • Less space between formants.

    • Higher formants have low amplitude.


    Perceiving Nasal Place

    • Nasal “murmurs” do not provide particularly strong cues to place of articulation.

    • Can you identify the following as [m], [n] or ?

    • Repp (1986) found that listeners can distinguish between [n] and [m] only 72% of the time.

    • Transitions provide important place cues for nasals.

    • Repp (1986): 95% of nasals identified correctly when presented with the first 10 msec of the following vowel.

    • Can you identify these nasal + transition combos?

