Adult Speech Perception

Chris Darwin

Place of articulation

                   Labial   Alveolar   Velar
Stop, voiced       b        d          g
Stop, voiceless    p        t          k
Nasal, voiced      m        n          ng

Wide and narrowband spectrograms

[Figure: waveform ("Wave") with wide-band ("Wide") and narrow-band ("Narrow") spectrograms]

Voice-Onset Time (VOT)

[Figure: "bit" (VOT 5 ms) and "pit" (VOT 40 ms)]
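The wide-band and narrow-band spectrograms named above differ mainly in the length of the analysis window: a short window gives a wide analysis bandwidth (good time resolution), and a long window gives a narrow one (good frequency resolution). The sketch below is only an illustration of that trade-off. The sampling rate, the 4 ms and 32 ms window lengths, and the synthetic 100 Hz harmonic signal are all assumptions, not values from the slides.

```python
import numpy as np

def spectrogram(signal, fs, window_ms, hop_ms=1.0):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    n_win = int(round(fs * window_ms / 1000.0))   # window length in samples
    n_hop = int(round(fs * hop_ms / 1000.0))      # hop size in samples
    window = np.hanning(n_win)
    starts = range(0, len(signal) - n_win + 1, n_hop)
    frames = np.array([signal[s:s + n_win] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))    # shape: (n_frames, n_win // 2 + 1)

def bin_spacing_hz(fs, window_ms):
    """Spacing between FFT bins: shorter windows give coarser frequency sampling."""
    n_win = int(round(fs * window_ms / 1000.0))
    return fs / n_win

fs = 16000                                        # assumed sampling rate (Hz)
t = np.arange(int(0.5 * fs)) / fs
# Assumed test signal: ten harmonics of a 100 Hz fundamental.
signal = sum(np.sin(2 * np.pi * 100 * k * t) for k in range(1, 11))

wide = spectrogram(signal, fs, window_ms=4.0)     # short window: wide-band
narrow = spectrogram(signal, fs, window_ms=32.0)  # long window: narrow-band

print(bin_spacing_hz(fs, 4.0), bin_spacing_hz(fs, 32.0))
```

With these assumed settings the long window samples frequency finely enough to separate harmonics 100 Hz apart, while the short window cannot, which is why narrow-band displays show harmonics and wide-band displays show formants and individual voicing pulses.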



Formants in a wide-band spectrogram

[Figure: wide-band spectrogram of “we go”, with labels marking F1, F2, F3, the burst and the formant transitions]

Categorical Perception - 1

1. Set up a continuum of sounds between two categories

/ba/ - /da/

1 ... 3 … 5 … 7
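As a rough sketch of step 1, a continuum can be built by varying one acoustic cue in equal steps between the two category endpoints. The code assumes, purely for illustration, that the stimuli differ only in F2 onset frequency; the endpoint values and the name `make_continuum` are hypothetical and do not come from the slides.

```python
def make_continuum(start_hz, end_hz, n_steps=7):
    """Return n_steps cue values spaced evenly from start_hz to end_hz (inclusive)."""
    if n_steps < 2:
        raise ValueError("a continuum needs at least two steps")
    step = (end_hz - start_hz) / (n_steps - 1)
    return [start_hz + i * step for i in range(n_steps)]

# Hypothetical F2 onset frequencies (Hz) for the /ba/ and /da/ endpoints.
BA_F2_ONSET_HZ = 1000.0
DA_F2_ONSET_HZ = 1600.0

continuum = make_continuum(BA_F2_ONSET_HZ, DA_F2_ONSET_HZ)
for number, f2_onset in enumerate(continuum, start=1):
    print(f"stimulus {number}: F2 onset = {f2_onset:.0f} Hz")
```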

Categorical Perception - 2

2. Run an identification experiment

[Figure: % /ba/ responses plotted against stimulus number (1 ... 3 ... 5 ... 7), showing a sharp phoneme boundary]
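One way to picture the identification result is as a steep sigmoid relating stimulus number to the proportion of /ba/ responses, with the phoneme boundary at the 50% crossover. The sketch below uses a logistic function as a stand-in for real listener data; the boundary position (4.5) and slope are invented values.

```python
import math

def p_ba(stimulus, boundary=4.5, slope=3.0):
    """Modelled proportion of /ba/ responses; falls from near 1 to near 0 across the boundary."""
    return 1.0 / (1.0 + math.exp(slope * (stimulus - boundary)))

def estimate_boundary(stimuli, proportions):
    """Linearly interpolate the stimulus value at which the /ba/ proportion crosses 0.5."""
    points = list(zip(stimuli, proportions))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 >= 0.5 > y1:
            return x0 + (y0 - 0.5) * (x1 - x0) / (y0 - y1)
    return None  # no crossover found in the data

stimuli = [1, 2, 3, 4, 5, 6, 7]
proportions = [p_ba(s) for s in stimuli]   # simulated, not measured, responses
boundary = estimate_boundary(stimuli, proportions)
print(boundary)
```

With real data, `proportions` would be the observed fraction of /ba/ responses for each stimulus.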


Categorical Perception - 3

3. Run a discrimination experiment

e.g. stimulus 1 versus stimulus 3

[Figure: % “different” responses plotted against stimulus number (1 ... 3 ... 5 ... 7), showing a discrimination peak]

Categorical Perception - 4

Defined as:

1. Sharp phoneme boundary

2. Discrimination peak at phoneme boundary

3. Discrimination predicted from identification

(only “different” if different phoneme)
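Criterion 3 can be sketched numerically. If listeners say “different” only when the two stimuli receive different phoneme labels, and each stimulus is labelled independently, then the predicted proportion of “different” responses follows directly from the identification probabilities. The logistic identification function, its boundary at stimulus 4 and its slope are assumptions made for illustration.

```python
import math

def p_ba(stimulus, boundary=4.0, slope=3.0):
    """Assumed identification function: probability of labelling a stimulus /ba/."""
    return 1.0 / (1.0 + math.exp(slope * (stimulus - boundary)))

def predicted_different(p_a, p_b):
    """P(the two stimuli get different labels), assuming independent labelling."""
    return p_a * (1.0 - p_b) + (1.0 - p_a) * p_b

# Two-step pairs along a seven-step continuum: 1 vs 3, 2 vs 4, ..., 5 vs 7.
pairs = [(i, i + 2) for i in range(1, 6)]
predictions = {
    pair: predicted_different(p_ba(pair[0]), p_ba(pair[1])) for pair in pairs
}
best_pair = max(predictions, key=predictions.get)

for pair, value in predictions.items():
    print(pair, round(value, 3))
print("predicted discrimination peak:", best_pair)
```

Under these assumptions the predicted peak falls on the pair that straddles the boundary, and pairs within a category are predicted to be barely discriminable.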

Categorical Perception - 5

• Occurs for many consonant continua

Even with “proper” discrimination paradigms

• Less evident for vowel continua

Categorical Perception - 6

  • For most “ordinary” continua, such as frequency, loudness, brightness etc., our ability to discriminate far exceeds our ability to label.

  • Continua that show Categorical Perception are different from this norm.

  • Liberman claims that CP is an indicator of a special Speech Mode of perception that is distinctively human.

Categorical Perception - 7

  • Is CP restricted to speech?

  • No, also shown by comparisons of musical intervals.

    • Burns, E. M. and Campbell, S. L. (1994). "Frequency and frequency ratio resolution by possessors of relative and absolute pitch: Examples of categorical perception?," J. Acoust. Soc. Am. 96, 2704-2719.

  • Is CP shown for speech unique to humans?

  • No. Chinchillas and quails show the same VOT boundary as humans.

    • Kuhl, P. K. and Miller, J. D. (1978). "Speech perception by the chinchilla: identification functions for synthetic VOT stimuli," J. Acoust. Soc. Am. 63, 905-917.

  • Macaques show discrimination peaks at human VOT and place-of-articulation boundaries.

    • Kuhl, P. K. and Padden, D. M. (1982). "Enhanced discriminability at the phonetic boundaries for the voicing feature in macaques," Percept. Psychophys. 32, 542-550.

    • Kuhl, P. K. and Padden, D. M. (1983). "Enhanced discriminability at the phonetic boundaries for the place feature in macaques," J. Acoust. Soc. Am. 73, 1003-1010.

  • So - human speech exploits discontinuities in the way that vertebrate auditory systems represent sound.

Natural auditory categories

[Figure: two panels, each plotted against a stimulus dimension, with regions labelled Category 1 and Category 2]

Voicing and place-of-articulation dimensions show these natural categories.

Sinex, D. G. and McDonald, L. P. (1989). "Synchronized discharge rate representation of voice-onset time in the chinchilla auditory nerve," J. Acoust. Soc. Am. 85, 1995-2004.

Sinex, D. G., McDonald, L. P. and Mott, J. B. (1991). "Neural correlates of nonmonotonic temporal acuity for voice onset time," J. Acoust. Soc. Am. 90, 2441-2449.

Is CP innate or acquired?

Yes!

Infants are born with the ability to make many speech discriminations that they subsequently can NOT make (see next lecture).

Adults (and 1-year-old infants) have lost the ability to make distinctions that their language does not use.

/r/ - /l/

  • Phone - a particular sound used by any language, e.g. the sound [r]

  • Phoneme - a sound used in contrast to another in a particular language, e.g. the category /r/ as distinct from /l/

  • Different languages make different phonemic contrasts.

/r/ - /l/ - 2

  • Phonemes in a particular language are defined by minimal pairs

  • i.e. since in English “lice” and “rice” have different meanings, they contain different phonemes: /l/ and /r/

  • But there is no such minimal pair in Japanese, so Japanese has a single phoneme /r/
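The minimal-pair test described above can be sketched as a search for words whose phoneme sequences differ in exactly one position. The toy lexicon and its rough ASCII transcriptions are illustrative assumptions only; `is_minimal_pair` and `minimal_pairs` are hypothetical helper names.

```python
from itertools import combinations

def is_minimal_pair(a, b):
    """True if two phoneme sequences have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def minimal_pairs(lexicon):
    """Return (word1, word2, phoneme1, phoneme2) for every minimal pair in the lexicon."""
    found = []
    for (w1, p1), (w2, p2) in combinations(lexicon.items(), 2):
        if is_minimal_pair(p1, p2):
            x, y = next((x, y) for x, y in zip(p1, p2) if x != y)
            found.append((w1, w2, x, y))
    return found

# Toy English lexicon; transcriptions are rough ASCII stand-ins, not IPA.
ENGLISH = {
    "lice": ("l", "ai", "s"),
    "rice": ("r", "ai", "s"),
    "bit": ("b", "i", "t"),
    "pit": ("p", "i", "t"),
}

pairs = minimal_pairs(ENGLISH)
for word1, word2, ph1, ph2 in pairs:
    print(f"{word1} / {word2}: /{ph1}/ contrasts with /{ph2}/")
```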

/r/ - /l/ - 3

  • Each language has its own distinctive set of phonemic categories

  • English distinguishes /r/ from /l/ but Japanese doesn’t

  • Tamil distinguishes a dental /t1/, an alveolar /t2/ and a retroflex /t3/. English doesn’t.

Synthetic Stimuli: /ra/-/la/



/r/ - /l/ - 4

[Figure: English identification (% /ra/), English discrimination and Japanese discrimination (% correct), plotted against stimulus number (1 ... 5 ... 10 ... 15)]

Miyawaki et al. (1975) Perception & Psychophysics 18, 331-340

Synthetic Stimuli: /ra/-/la/

[Figure: synthetic /ra/-/la/ stimuli; axes: Second Formant, Third Formant]

Iverson, P., et al. (2003). "A perceptual interference account of acquisition difficulties for non-native phonemes," Cognition 87, B47-57.

American MDS Solution

[Figure: panels showing the Physical Spacing of Stimuli and the American MDS solution; axes: Second Formant, Third Formant]

…and rated the similarity of stimulus pairs

Japanese MDS Solution

[Figure: Japanese MDS solution; axes: Second Formant, Third Formant]
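For readers unfamiliar with MDS (multidimensional scaling), the idea is to place stimuli in a low-dimensional space so that distances between points match the rated dissimilarities as closely as possible. The sketch below uses classical (Torgerson) MDS as a simple stand-in; the slides do not say which MDS variant was used. The 3 x 3 stimulus grid and the use of physical distances in place of listener ratings are assumptions.

```python
import numpy as np

def classical_mds(dissimilarity, n_dims=2):
    """Classical (Torgerson) MDS: embed points so distances approximate the input matrix."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering        # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(b)               # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]         # keep the largest n_dims
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    return np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# Hypothetical 3 x 3 grid of stimuli, in arbitrary F2 and F3 steps.
grid = np.array([(f2, f3) for f2 in range(3) for f3 in range(3)], dtype=float)
distances = pairwise_distances(grid)       # stand-in for rated dissimilarities

coords = classical_mds(distances)
recovered = pairwise_distances(coords)
print(np.allclose(recovered, distances))
```

When the input is the physical spacing itself, the solution reproduces the grid up to rotation and reflection. A listener group whose ratings stretch or shrink part of the space would give a solution that is warped relative to that grid.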

/r/ - /l/ - 4

  • Can Japanese listeners really not hear any difference?

  • Use an implicit technique (Mann, 1986, Cognition):

    • Co-articulation: /d/ and /g/ are pronounced differently after /l/ and /r/, as in /arda/-/arga/ compared with /alda/-/alga/

  • So, for English speakers, the /d/-/g/ boundary has different formant values after /l/ than after /r/.

  • Also true for Japanese listeners who can hear /r/ vs /l/

  • But ALSO true for those who CAN’T.

/r/ - /l/ - 5

Is this still something specific to speech?

NO!! QUAILS DO IT TOO!!!

Lotto, Kluender & Holt (1997) J. Acoust. Soc. Am. 102, 1134-1140

So it may reflect a general auditory contrast effect, i.e. the auditory representation of the [d] sound is different after an [r] than after an [l].

Lotto & Kluender (1998) Perc & Psychophys. 60, 602-619

/r/ - /l/ - 6

Fowler, C. A., Brown, J. M. and Mann, V. A. (2000). "Contrast effects do not underlie effects of preceding liquids on stop-consonant identification by humans," J. exp. Psychol.: Hum. Perc. & Perf. 26, 877-888.

A McGurk-effect experiment shows compensation for coarticulation by listeners when neither frequency contrast nor masking can be the source of the compensation.

McGurk effect

watch it on YouTube

/r/ - /l/: Fowler’s McGurk expt

da or ga

Vision: either ar or al

Still get shift in d/g boundary, so not auditory contrast.

Resolution

  • Holt, L. L., Stephens, J. D. and Lotto, A. J. (2005). "A critical evaluation of visually moderated phonetic context effects," Percept Psychophys 67, 1102-12.

Fowler’s effect due to McGurk effect caused by visual input concurrent with test syllable, NOT precursor.

Trading relations

  • Most phonetic distinctions have more than one acoustic cue as a result of the particular articulatory gesture that gives the distinction.

  • Perception must establish some "trade-off" between the different cues. Can this trade-off be explained by low-level auditory processes such as short-term adaptation, or does it require processes specific to speech?

  • Repp (1982) Psych Bull. 92, 81-110
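A trading relation can be pictured as a decision that sums weighted evidence from two cues, so that a change in one cue can be offset by a change in the other. The sketch below uses a two-cue logistic model as one simple way to express this; it is not a model taken from the slides. The choice of cues (VOT and F1 onset frequency, as a stop-voicing example) and all weights are invented for illustration.

```python
import math

# Hypothetical cue weights and bias; chosen only to give plausible-looking numbers.
W_VOT = 0.3      # per millisecond of voice-onset time
W_F1 = 0.01      # per Hz of F1 onset frequency
BIAS = -11.5

def p_voiceless(vot_ms, f1_onset_hz):
    """Modelled probability of a 'voiceless' response from two summed cues."""
    z = W_VOT * vot_ms + W_F1 * f1_onset_hz + BIAS
    return 1.0 / (1.0 + math.exp(-z))

def vot_boundary(f1_onset_hz):
    """VOT (ms) at which the model is at 50% 'voiceless' for a given F1 onset."""
    return -(BIAS + W_F1 * f1_onset_hz) / W_VOT

low_f1_boundary = vot_boundary(200.0)
high_f1_boundary = vot_boundary(400.0)
print(round(low_f1_boundary, 1), round(high_f1_boundary, 1))
```

In this toy model, raising the second cue moves the VOT boundary to a shorter value, which is the kind of shift a trading-relation experiment measures. Whether such a shift arises from general auditory processes or from speech-specific ones is the question the slide raises, and the model takes no position on it.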

Summary

  • Many consonantal speech sounds are perceived categorically.

  • For some, this is due to speech exploiting discontinuities in the way that auditory systems represent sound.

  • Some of it is due to cultural differences, acquired in the first year of life.

  • Some aspects of the decoding of co-articulation may be due to general contrast effects (e.g. through adaptation in the auditory nerve).

  • Others are non-auditory in nature and may be specific to human listeners.