Transitions + Perception

March 26, 2009

Remainders
  • Singer’s formant spectrum.
  • Damping and spectra.
Laterals
  • Laterals are produced by constricting the sides of the tongue towards the center of the mouth.
  • Air may pass through the mouth on both sides of the tongue…
    • or on just one side of the tongue.
Lateral Acoustics
  • The central constriction traps the flow of air in a “side branch” of the vocal tract.
  • This side branch makes the acoustics of laterals similar to the acoustics of nasals.
    • In particular: acoustic energy trapped in the side branch sets up “anti-formants”
    • Also: some damping
    • …but not as much as in nasals.

[Figure: tube model of a lateral, with a 17.5 cm main tube and a 4 cm side branch]

  • Primary resonances of lateral approximants are the same as those for a vocal tract of length 17.5 cm
    • 500 Hz, 1500 Hz, 2500 Hz...
    • However, F1 is consistently low (300 - 400 Hz)
  • Anti-formant arises from a side tube of length 4 cm
    • AF1 = 2125 Hz
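
The figures above can be checked with the quarter-wavelength formula for a tube closed at one end, f_n = (2n - 1)c / 4L. This is only a rough sketch: the function name is made up here, and the speeds of sound are assumptions, since the slide does not state them. Its numbers appear to match c = 350 m/s for the 17.5 cm tube and c = 340 m/s for the 4 cm side tube.

```python
def quarter_wave_resonances(length_cm, speed_cm_per_s, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [(2 * n - 1) * speed_cm_per_s / (4 * length_cm)
            for n in range(1, count + 1)]

# Assumed speeds of sound (cm/s); the slide does not say which value it used.
main_tube = quarter_wave_resonances(17.5, 35000)             # whole vocal tract
side_branch = quarter_wave_resonances(4.0, 34000, count=1)   # side tube: anti-formant

print(main_tube)    # [500.0, 1500.0, 2500.0]
print(side_branch)  # [2125.0]
```

The observed F1 of laterals (300 - 400 Hz) is lower than the 500 Hz this model gives, so the uniform tube is only a first approximation.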
Laterals in Reality
  • Check out the Mid-Waghi and Zulu laterals in Praat

Mid-Waghi: [alala]

Velarization of [l]
  • [l] often has low F2 in English because it is velarized
    • = produced with the back of the tongue raised
    • = “dark” [l]
    • symbolized [ɫ]
  • Perturbation Theory flashback:
    • There is an anti-node for F2 in the velar region
    • constrictions there lower F2
  • Check out the video evidence.
Dark vs. Clear /l/
  • /l/ often has low F2 in English because it is velarized.


[l] vs. [n]
  • Laterals are usually more intense than nasals
    • less volume, less surface area = less damping
  • So the break between vowels and laterals is less clear

[Figure: spectrograms comparing a lateral with [n]]

[l] vs. [ɹ]
  • [l] and [ɹ] are primarily distinguished by F3
    • much lower in [ɹ]
  • Also: [l] usually has lower F2 in English

[Figure: spectrograms comparing the two sounds]

  • Glides are vowel-like sonorants which are produced…
    • with slightly more constriction than a vowel at the same place of articulation.
  • Each glide corresponds to a different high vowel.
  • Vowel   Glide   Place
  • [i]     [j]     palatal (front, unrounded)
  • [u]     [w]     labio-velar (back, rounded)
  • [y]     [ɥ]     labial-palatal (front, rounded)
  • [ɯ]     [ɰ]     velar (back, unrounded)

  • Each glide’s acoustics will be similar to those of the vowel it corresponds to.
Glide Acoustics
  • Glides look like high vowels, but…
    • are shorter than vowels
  • They also tend to lack “steady states”
    • and exhibit rapid transitions into (or from) vowels
    • hence: “glides”
  • Also: lower in intensity
    • especially in the higher formants
Vowel-Glide-Vowel

[iji] [uwu]

More Glides



  • When stops are released, they go through a transition phase in between the stop and the vowel.
  • From stop to vowel:
    • Stop closure
    • Release burst
    • (glide-like) transition
    • “steady-state” vowel
  • Vowel-to-stop works the same way, in reverse, except:
    • Release burst (if any) comes after the stop closure.
Stop Components


[Spectrogram labels: closure voicing, formant transitions, stop release burst]

  • From Armenian:


another closure

  • When the spectrogram was first invented…
    • phoneticians figured out quite quickly how to identify vowels from their spectral characteristics…
    • but they had a much harder time learning how to identify stops by their place of articulation.
  • Eventually they realized:
    • the formant transitions between vowels and stops provided a reliable cue to place of articulation.
  • Why?
Formant Transitions
  • A: the resonant frequencies of the vocal tract change as stop gestures enter or exit the closure phase.
  • Simplest case: formant frequencies usually decrease near bilabial stops
Stops vs. Glides


  • Note: formant transitions are more rapid for stops than they are for glides.


Formant Transitions: Alveolars
  • For other places of articulation, the formant transitions are more complex.
  • From front vowels into alveolars, F2 tends to slope downward.
  • From back vowels into alveolars, F2 tends to slope upward.



Formant Locus
  • Whether in a front vowel or back vowel context...
    • The formant transitions for alveolars tend to point to the same frequency value (about 1650-1700 Hz).
  • This (apparent) frequency value is known as the locus of the formant transition.
  • In the ‘50s, researchers theorized:
    • the locus frequency can be used by listeners to reliably identify place of articulation.
  • However, velars posed a problem…
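
One way to picture the locus: treat each F2 transition as a straight line and see where the lines from different vowel contexts cross. The sketch below does this for two transitions. The function names are made up, and the measurement values are invented for illustration (chosen so the lines cross near the alveolar value on the slide); they are not data from the lecture.

```python
def line_through(p, q):
    """Slope and intercept of the straight line through two (time_ms, f2_hz) points."""
    (t0, f0), (t1, f1) = p, q
    slope = (f1 - f0) / (t1 - t0)
    return slope, f0 - slope * t0

def locus_estimate(transition_a, transition_b):
    """Crossing point (time_ms, f2_hz) of two linearly extrapolated F2 transitions."""
    m_a, b_a = line_through(*transition_a)
    m_b, b_b = line_through(*transition_b)
    if m_a == m_b:
        raise ValueError("parallel transitions have no single crossing point")
    t = (b_b - b_a) / (m_a - m_b)
    return t, m_a * t + b_a

# Hypothetical measurements: time in ms relative to stop closure, F2 in Hz.
after_front_vowel = ((-50.0, 2200.0), (-10.0, 1800.0))  # F2 slopes downward
after_back_vowel = ((-50.0, 1000.0), (-10.0, 1560.0))   # F2 slopes upward

locus_time, locus_f2 = locus_estimate(after_front_vowel, after_back_vowel)
print(round(locus_f2))  # 1700
```

Real transitions are curved and noisy, so a straight-line extrapolation like this is only a cartoon of the idea.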
Velar Transitions
  • Velar formant transitions do not always have a reliable locus frequency for F2.
  • Velars exhibit a lot of coarticulation with neighboring vowels.
    • Fronter (more palatal) next to front vowels
      • Locus is high: 1950-2000 Hz
    • Backer (more velar) next to back vowels
      • Locus is lower: < 1500 Hz
  • F2 and F3 often come together in velar transitions
    • “Velar Pinch”
The Velar Pinch

[bag] [bak]

Testing the Theory
  • The earliest experiments on place perception were conducted in the 1950s, using a speech synthesizer known as the Pattern Playback.
Haskins Formant Transitions
  • Testing the perception of two-formant stimuli with varying F2 transitions led to the discovery of a phenomenon known as categorical perception.
Categorical Perception
  • Categorical perception =
    • continuous physical distinctions are perceived in discrete categories.
  • In the in-class experiment from last time:
    • There were 11 different syllable stimuli
    • They only differed in the locus of their F2 transition
    • F2 Locus range = 726 - 2217 Hz
  • Source:

Stimulus #1

Stimulus #6

Stimulus #11

Example stimuli from the in-class experiment.

  • In Categorical Perception:
    • All stimuli within a category boundary should be labeled the same.
  • Original task: ABX discrimination
  • Stimuli across category boundaries should be 100% discriminable.
  • Stimuli within category boundaries should not be discriminable at all.

In practice, categorical perception means: the discrimination function can be determined from the identification function.

Identification → Discrimination
  • Let’s consider a case where the two sounds in a discrimination pair are the same.
  • Example: the pair is stimulus 3 followed by stimulus 3
  • Identification data: stimulus 3 is identified as:
    • [b] 95% of the time
    • [d] 5% of the time
  • The discrimination pair will be perceived as:
    • [b] - [b]: .95 * .95 = .9025
    • [d] - [d]: .05 * .05 = .0025
  • Probability of same response is predicted to be:
    • (.9025 + .0025) = .905 = 90.5%
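
The arithmetic above can be written as one general rule. Multiplying the labeling probabilities assumes that the two members of a pair are labeled independently. The notation is introduced here and is not from the slides: p_A(c) is the probability that stimulus A is identified as category c.

```latex
P(\text{same} \mid A, B) = \sum_{c} p_A(c)\, p_B(c)
\qquad
\text{e.g. } A = B = \text{stimulus 3:}\quad
(0.95)(0.95) + (0.05)(0.05) = 0.905
```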
Identification → Discrimination
  • Let’s consider a case where the two sounds in a discrimination pair are different.
  • Example: the pair is stimulus 9 followed by stimulus 11
  • Identification data:
    • Stimulus 9: [d] 80% of the time, [g] 20% of the time
    • Stimulus 11: [d] 5% of the time, [g] 95% of the time
  • The discrimination pair will be perceived as:
    • [d] - [d]: .80 * .05 = .04
    • [g] - [g]: .20 * .95 = .19
  • Probability of same response is predicted to be:
    • (.04 + .19) = .23 = 23%
  • In this discrimination graph:
    • Solid line is the observed data
    • Dashed line is the predicted data (on the basis of the identification scores)

Note: the actual listeners did a little bit better than the predictions.
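
The predicted "same" probabilities worked out earlier (stimulus 3 paired with itself, and stimulus 9 paired with stimulus 11) can be reproduced with a few lines of code. This is a minimal sketch, assuming the two stimuli in a pair are labeled independently; the function and variable names are made up for illustration.

```python
def predicted_same(ident_a, ident_b):
    """Probability that two stimuli get the same label, given each one's
    identification probabilities (dicts mapping label -> probability)."""
    labels = set(ident_a) | set(ident_b)
    return sum(ident_a.get(c, 0.0) * ident_b.get(c, 0.0) for c in labels)

# Identification scores quoted in the worked examples.
stim3 = {"b": 0.95, "d": 0.05}
stim9 = {"d": 0.80, "g": 0.20}
stim11 = {"d": 0.05, "g": 0.95}

same_3_3 = predicted_same(stim3, stim3)
same_9_11 = predicted_same(stim9, stim11)
print(round(same_3_3, 4), round(same_9_11, 4))  # 0.905 0.23
```

Running the same function over every pair on the continuum would give a predicted curve of the kind shown as the dashed line.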

Categorical, Continued
  • Categorical Perception was also found for VOT distinctions.
  • And for stop/glide/vowel distinctions:

10 ms transitions: [b] percept

60 ms transitions: [w] percept

200 ms transitions: [u] percept

  • Main idea: in categorical perception, the mind translates an acoustic stimulus into a phonemic label (category).
  • The acoustic details of the stimulus are discarded in favor of an abstract representation.
  • A continuous acoustic signal is thus transformed into a series of linguistic units.
The Next Level
  • Interestingly, categorical perception is not found for non-speech stimuli.
  • Miyawaki et al.: tested perception of an F3 continuum between /r/ and /l/.
The Next Level
  • They also tested perception of the F3 transitions in isolation.
  • Listeners did not perceive these transitions categorically.
The Implications
  • Interpretation: we do not perceive speech in the same way we perceive other sounds.
  • “Speech is special”…
    • and the perception of speech is modular.
  • A module is a special processor in our minds/brains devoted to interpreting a particular kind of environmental stimulus.
Module Characteristics
  • You can think of a module as a “mental reflex”.
  • A module of the mind is defined as having the following characteristics:
    • Domain-specific
    • Automatic
    • Fast
    • Hard-wired in brain
    • Limited top-down access (you can’t “unperceive”)
  • Example: the sense of vision operates modularly.
A Modular Mind Model



[Figure labels: judgment, imagination, memory, attention; external, physical reality]

Remember this stuff?
  • Speech is a “special” kind of sound because it exhibits spectral change over time.
  • So it’s processed by the speech module, not by the auditory module.
SWS Findings
  • The uninitiated hear sinewave speech either as speech or as “whistles”, “chirps”, etc.
  • Claim: once you hear it as speech, you can’t go back.
    • The speech module takes precedence
    • (Limited top-down access)
  • Analogy: it’s impossible to not perceive real speech as speech.
    • We can’t hear the individual formants as whistles, chirps, etc.
  • Motor theory says: we don’t perceive the “sounds”, we perceive the gestures which shape the spectrum.