Sonorant Grab Bag

March 27, 2014

Speech Synthesis: A Basic Overview
  • Speech synthesis is the generation of speech by machine.
  • The reasons for studying synthetic speech have evolved over the years:
  • Novelty
  • To control acoustic cues in perceptual studies
  • To understand the human articulatory system
    • “Analysis by Synthesis”
  • Practical applications
    • Reading machines for the blind, navigation systems
Speech Synthesis: A Basic Overview
  • There are four basic types of synthetic speech:
  • Mechanical synthesis
  • Formant synthesis
    • Based on Source/Filter theory
  • Concatenative synthesis
    • = stringing bits and pieces of natural speech together
  • Articulatory synthesis
    • = generating speech from a model of the vocal tract.
1. Mechanical Synthesis
  • The very first attempts to produce synthetic speech were made without electricity.
    • = mechanical synthesis
  • In the late 1700s, models were produced which used:
    • reeds as a voicing source
    • differently shaped tubes for different vowels
Mechanical Synthesis, part II
  • Later, Wolfgang von Kempelen and Charles Wheatstone created a more sophisticated mechanical speech device…
    • with independently manipulable source and filter mechanisms.
Mechanical Synthesis, part III
  • An interesting historical footnote:
    • Alexander Graham Bell and his “questionable” experiments with his dog.
  • Mechanical synthesis has largely gone out of style ever since.
    • …but check out Mike Brady’s talking robot.
The Voder
  • The next big step in speech synthesis was to generate speech electronically.
  • This was most famously demonstrated at the New York World’s Fair in 1939 with the Voder.
  • The Voder was a manually controlled speech synthesizer.
    • (operated by highly trained young women)
Voder Principles
  • The Voder basically operated like a vocoder.
  • Voicing and fricative source sounds were filtered by 10 different resonators…
  • each controlled by an individual finger!
  • Only about 1 in 10 had the ability to learn how to play the Voder.
Overtone Singing
  • F0 stays the same (on a “drone”), while singer shapes the vocal tract so that individual harmonics (“overtones”) resonate.
  • What kind of voice quality would be conducive to this?
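One way to picture the drone-plus-resonance idea is to list the harmonics of a fixed F0 and see which one lies closest to a vocal tract resonance. The following is a minimal Python sketch; the 150 Hz drone and the 1230 Hz resonance are made-up illustrative values, not measurements from the slides, and `harmonics` and `nearest_harmonic` are hypothetical helper names.

```python
def harmonics(f0, max_freq):
    """Frequencies of all harmonics of f0 up to max_freq (in Hz)."""
    return [n * f0 for n in range(1, int(max_freq // f0) + 1)]


def nearest_harmonic(f0, resonance_hz):
    """Harmonic number and frequency closest to a given resonance."""
    n = max(1, round(resonance_hz / f0))
    return n, n * f0


# Hypothetical values: a 150 Hz drone and a resonance tuned near 1230 Hz.
drone_f0 = 150
number, frequency = nearest_harmonic(drone_f0, 1230)
print(f"Harmonics up to 1500 Hz: {harmonics(drone_f0, 1500)}")
print(f"Harmonic {number} ({frequency} Hz) is closest to the resonance")
```

Moving the resonance up or down while the drone stays fixed selects a different harmonic, which is what the listener hears as the "melody" of overtones.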
Vowels and Sonorants
  • So far, we’ve talked a lot about the acoustics of vowels:
    • Source: periodic openings and closings of the vocal folds.
    • Filter: characteristic resonant frequencies of the vocal tract (above the glottis)
  • Today, we’ll talk about the acoustics of sonorants:
    • Nasals
    • Laterals
    • Approximants
  • The source/filter characteristics of sonorants are similar to vowels… with a few interesting complications.
Damping
  • One interesting acoustic property exhibited by (some) sonorants is damping.
  • Recall that resonance occurs when:
    • a sound wave travels through an object
    • that sound wave is reflected...
    • ...and reinforced, on a periodic basis
  • The periodic reinforcement sets up alternating patterns of high and low air pressure
    • = a standing wave
Damping, schematized
  • In a closed tube:
    • With only one pressure pulse from the loudspeaker, the wave will eventually dampen and die out.
  • Why?
    • The walls of the tube absorb some of the acoustic energy, with each reflection of the standing wave.
Damping Comparison
  • A heavily damped wave will die out more quickly than a lightly damped wave.
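The idea that the tube walls absorb some energy at each reflection can be sketched with a simple counting model. This is only an illustration under a stated assumption: the wave keeps a fixed fraction of its amplitude at every reflection (0.5 for "heavy" damping and 0.9 for "light" damping are invented numbers, not values from the slides), and the wave counts as having died out once it falls below 1% of its starting amplitude.

```python
def reflections_until_quiet(retained_per_reflection, threshold=0.01):
    """Count reflections until the amplitude drops below `threshold`.

    Assumes (hypothetically) that each reflection keeps a fixed fraction
    of the amplitude, so the decay is geometric.
    """
    if not 0.0 < retained_per_reflection < 1.0:
        raise ValueError("retained fraction must be between 0 and 1")
    amplitude = 1.0
    reflections = 0
    while amplitude >= threshold:
        amplitude *= retained_per_reflection
        reflections += 1
    return reflections


heavy = reflections_until_quiet(0.5)  # walls absorb half the amplitude
light = reflections_until_quiet(0.9)  # walls absorb a tenth of the amplitude
print(f"Heavily damped: dies out after {heavy} reflections")
print(f"Lightly damped: dies out after {light} reflections")
```

Under these assumptions the heavily damped wave is gone after a handful of reflections, while the lightly damped one keeps ringing several times longer.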
Damping Factors
  • The amount of damping in a tube is a function of:
    • The volume of the tube
    • The surface area of the tube
    • The material of which the tube is made
  • More volume, more surface area = more damping
  • Think about the resonant characteristics of:
    • a Home Depot
    • a post-modern restaurant
    • a movie theater
    • an anechoic chamber
Resonance and Recording
  • Remember: any room will reverberate at its characteristic resonant frequencies
  • Hence: high quality sound recordings need to be made in specially designed rooms which damp any reverberation
  • Examples:
    • Classroom recording (29 dB signal-to-noise ratio)
    • “Soundproof” booth (44 dB SNR)
    • Anechoic chamber (90 dB SNR)
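To get a feel for what these decibel figures mean, the sketch below converts each signal-to-noise ratio into a plain ratio, using the standard definitions (20·log10 for amplitude, 10·log10 for power). The function names are hypothetical; the three dB values are the ones listed above.

```python
def snr_db_to_amplitude_ratio(snr_db):
    """Signal amplitude divided by noise amplitude for a given SNR in dB."""
    return 10 ** (snr_db / 20)


def snr_db_to_power_ratio(snr_db):
    """Signal power divided by noise power for a given SNR in dB."""
    return 10 ** (snr_db / 10)


recordings = [("Classroom", 29), ("Soundproof booth", 44), ("Anechoic chamber", 90)]
for room, snr_db in recordings:
    ratio = snr_db_to_amplitude_ratio(snr_db)
    print(f"{room}: {snr_db} dB SNR = signal amplitude about {ratio:,.0f} times the noise")
```

Because the scale is logarithmic, the 15 dB gap between the classroom and the booth already corresponds to the noise being several times quieter relative to the signal.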
Spectrograms
[Figure: spectrograms of the classroom, “soundproof” booth, and anechoic chamber recordings]

Inside Your Nose
  • In nasals, air flows through the nasal cavities.
  • The resonating “filter” of nasal sounds therefore has:
    • increased volume
    • increased surface area
    • and therefore increased damping
  • Note:
    • the exact size and shape of the nasal cavities varies wildly from speaker to speaker.
Nasal Variability
  • Measurements based on MRI data (Dang et al., 1994)
Damping Effects, part 1
  • Damping by the nasal cavities decreases the overall amplitude of the sound coming out through the nose.

[Figure: two panels labeled [m]]

Damping Effects, part 2
  • How might the power spectrum of an undamped wave compare to that of a damped wave?
  • A: Undamped waves have only one component; damped waves have a broader range of components.
Here’s Why
  • 100 Hz sinewave + 90 Hz sinewave + 110 Hz sinewave

The Result
  • 90 Hz + 100 Hz + 110 Hz
  • If the 90 Hz and 110 Hz components have less amplitude than the 100 Hz wave, there will be less damping.
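The link between extra frequency components and a dying-out waveform can be checked numerically. Adding equal-amplitude 90 Hz and 110 Hz sinewaves to a 100 Hz sinewave is the same as multiplying the 100 Hz wave by a slowly varying envelope, 1 + 2a·cos(2π·10·t), where a is the amplitude of each side component. The sketch below is a minimal illustration; the side amplitudes 0.5 and 0.1 are arbitrary choices, and the function names are hypothetical.

```python
import math


def three_tone(t, side_amp, f_center=100.0, f_offset=10.0):
    """Sum of a center sinewave and two weaker neighbours at time t (seconds)."""
    return (math.sin(2 * math.pi * f_center * t)
            + side_amp * math.sin(2 * math.pi * (f_center - f_offset) * t)
            + side_amp * math.sin(2 * math.pi * (f_center + f_offset) * t))


def envelope(t, side_amp, f_offset=10.0):
    """Slow amplitude envelope that multiplies the center sinewave."""
    return 1 + 2 * side_amp * math.cos(2 * math.pi * f_offset * t)


def envelope_drop(side_amp, f_offset=10.0):
    """Fraction of the starting amplitude lost by the envelope's low point."""
    start = envelope(0.0, side_amp, f_offset)
    trough = envelope(0.5 / f_offset, side_amp, f_offset)
    return (start - trough) / start


for side_amp in (0.5, 0.1):
    lost = envelope_drop(side_amp)
    print(f"side amplitude {side_amp}: envelope loses {lost:.0%} of its amplitude")
```

With strong side components the envelope falls all the way to zero, which looks like heavy damping over the first 50 ms; with weak side components it barely dips, which looks like light damping.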
Damping Spectra
[Figure: spectra for light, medium, and heavy damping]

  • Damping increases the bandwidth of the resonating filter.
    • Bandwidth = the range of frequencies over which a filter responds at 0.707 or more of its maximum output (the half-power, or -3 dB, points).
  • As a result, nasal formants will have a larger bandwidth than vowel formants.
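The 0.707 definition of bandwidth can be tried out on a synthetic damped wave. The sketch below is an illustration under stated assumptions, not a model of a real nasal: the damped wave is taken to be a 200 Hz sinewave with an exponentially decaying amplitude, the decay rates (10 and 40 per second) are invented, and the spectrum is measured with a slow but dependency-free Fourier sum. For this kind of decay the expected half-power bandwidth is the decay rate divided by π.

```python
import math


def damped_sine(f0, decay_rate, fs, duration):
    """Samples of exp(-decay_rate * t) * sin(2*pi*f0*t) at sampling rate fs."""
    return [math.exp(-decay_rate * n / fs) * math.sin(2 * math.pi * f0 * n / fs)
            for n in range(int(fs * duration))]


def magnitude_at(signal, freq, fs):
    """Magnitude of the signal's Fourier component at a single frequency."""
    re = sum(x * math.cos(2 * math.pi * freq * n / fs) for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * n / fs) for n, x in enumerate(signal))
    return math.hypot(re, im)


def half_power_bandwidth(signal, fs, f_lo, f_hi, step):
    """Width (Hz) of the range where the response is >= 0.707 of its peak."""
    n_steps = int(round((f_hi - f_lo) / step))
    freqs = [f_lo + k * step for k in range(n_steps + 1)]
    mags = [magnitude_at(signal, f, fs) for f in freqs]
    threshold = max(mags) / math.sqrt(2)
    passband = [f for f, m in zip(freqs, mags) if m >= threshold]
    return max(passband) - min(passband)


FS = 2000.0  # sampling rate in Hz (an arbitrary choice for the sketch)
light = damped_sine(200.0, 10.0, FS, 1.0)  # lightly damped
heavy = damped_sine(200.0, 40.0, FS, 1.0)  # heavily damped
light_bw = half_power_bandwidth(light, FS, 190.0, 210.0, 0.1)
heavy_bw = half_power_bandwidth(heavy, FS, 190.0, 210.0, 0.1)
print(f"light damping: bandwidth about {light_bw:.1f} Hz")
print(f"heavy damping: bandwidth about {heavy_bw:.1f} Hz")
```

The more heavily damped wave spreads its energy over a wider band around 200 Hz, which is the same pattern the slides describe for nasal formants versus vowel formants.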
Bandwidth in Spectrograms
[Figure: spectrogram comparing F3 of a vowel with F3 of [m]]

The formants in nasals have increased bandwidth, in comparison to the formants in vowels.

Nasal Formants
  • The formant frequencies of nasal stops can be calculated with the same formula that we used to calculate formant frequencies for an open tube.
  • $f_n = \frac{(2n - 1)\,c}{4L}$
  • The simplest case: the uvular nasal [ɴ].
  • The length of the tube is a combination of:
    • distance from glottis to uvula (9 cm)
    • distance from uvula to nares (12.5 cm)
  • An average tube length (for adult males): 21.5 cm
The Math
  • $f_n = \frac{(2n - 1)\,c}{4L}$
  • L = 21.5 cm (9 cm from glottis to uvula + 12.5 cm from uvula to nares)
  • c = 35000 cm/sec
  • F1 = 35000 / 86 = 407 Hz
  • F2 = 1221 Hz
  • F3 = 2035 Hz
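The arithmetic above can be reproduced in a few lines. This is a minimal sketch of the quarter-wavelength formula as given on the slide, using the slide's values for the tube length and the speed of sound; the function name `quarter_wave_resonances` is a hypothetical one chosen for the example.

```python
def quarter_wave_resonances(length_cm, count=3, speed_of_sound=35000.0):
    """First `count` resonances (Hz) from f_n = (2n - 1) * c / (4 * L).

    length_cm is the tube length in cm; speed_of_sound is in cm/sec.
    """
    return [(2 * n - 1) * speed_of_sound / (4 * length_cm)
            for n in range(1, count + 1)]


# Glottis to uvula (9 cm) plus uvula to nares (12.5 cm).
tube_length = 9.0 + 12.5
for n, freq in enumerate(quarter_wave_resonances(tube_length), start=1):
    print(f"F{n} = {freq:.0f} Hz")
```

Changing `tube_length` shows how a longer tube lowers every formant and packs the formants closer together.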

The Real Thing
  • Check out Peter’s production of a uvular nasal in Praat.
    • And also Dustin’s neutral vowel!
  • Note: the higher formants are low in amplitude
  • Some reasons why:
    • Overall damping
    • “Nostril-rounding” reduces intensity
    • Resonance is lost in the side passages of the sinuses.
  • Nasal stops with fronter places of articulation also have anti-formants.
Anti-Formants
  • For nasal stops, the occlusion in the mouth creates a side cavity.
  • This side cavity resonates at particular frequencies.
  • These resonances absorb acoustic energy in the system.
  • They form anti-formants.
Anti-Formant Math
  • Anti-formant frequencies are based on the length of the side cavity.
  • For [m], this length is about 8 cm.
  • $f_n = \frac{(2n - 1)\,c}{4L}$
  • L = 8 cm
  • AF1 = 35000 / (4 * 8) = 1094 Hz
  • AF2 = 3281 Hz
  • etc.

Spectral Signatures
  • In a spectrogram, acoustic energy drops, or disappears completely, at the anti-formant frequencies.
[Figure: spectrogram with the anti-formants marked]

Nasal Place Cues
  • At more posterior places of articulation, the “anti-resonating” tube is shorter.
    • As a result, anti-formant frequencies will be higher.
  • for [n], L = 5.5 cm
    • AF1 = 1600 Hz
    • AF2 = 4800 Hz
  • for , L = 3.3 cm
    • AF1 = 2650 Hz
  • for , L = 2.3 cm
    • AF1 = 3700 Hz
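The same quarter-wavelength formula can be applied to each side-cavity length to see how the anti-formants climb as the tube shortens. The sketch below is a rough check using the lengths quoted on the slides (8 cm for [m] comes from the anti-formant math slide); the function name is hypothetical. Note that for L = 2.3 cm the formula gives roughly 3800 Hz, a little above the 3700 Hz listed, which may simply reflect rounding of the length.

```python
def side_cavity_antiformants(length_cm, count=2, speed_of_sound=35000.0):
    """First `count` anti-formants (Hz) from f_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * speed_of_sound / (4 * length_cm)
            for n in range(1, count + 1)]


# Side-cavity lengths in cm, from the most anterior closure to the most posterior.
for length_cm in (8.0, 5.5, 3.3, 2.3):
    af1, af2 = side_cavity_antiformants(length_cm)
    print(f"L = {length_cm} cm: AF1 = {af1:.0f} Hz, AF2 = {af2:.0f} Hz")
```

The slide values of 1600 Hz and 4800 Hz for [n] are rounded versions of what the formula returns for 5.5 cm.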
[m] vs. [n]
[Figure: spectrogram of [meno] with segments [m], [e], [n], [o] and the first anti-formants AF1 (m) and AF1 (n) marked]

  • Production of [meno], by a speaker of Tsonga
  • Tsonga is spoken in South Africa and Mozambique
Nasal Stop Acoustics: Summary
  • Here’s the general pattern of what to look for in a spectrogram for nasals:
  • Periodic voicing.
  • Overall amplitude lower than in vowels.
  • Formants (resonance).
  • Formants have broad bandwidths.
  • Low frequency first formant.
  • Less space between formants.
  • Higher formants have low amplitude.
Perceiving Nasal Place
  • Nasal “murmurs” do not provide particularly strong cues to place of articulation.
  • Can you identify the following as [m], [n] or ?
  • Repp (1986) found that listeners can only distinguish between [n] and [m] 72% of the time.
  • Transitions provide important place cues for nasals.
  • Repp (1986): 95% of nasals identified correctly when presented with the first 10 msec of the following vowel.
  • Can you identify these nasal + transition combos?