Adult Speech Perception

Chris Darwin

Place of articulation:

                   Labial   Alveolar   Velar
Stop, voiced         b         d         g
Stop, voiceless      p         t         k
Nasal, voiced        m         n         ng

Wide- and narrow-band spectrograms: wave, wide, narrow.

Voice-Onset Time (VOT): “bit” 5 ms, “pit” 40 ms.


Place of articulation

[Table slide: stops and nasals by place of articulation]

Voice-Onset Time (VOT)

[Figure: “bit” (5 ms) and “pit” (40 ms)]

Formants in a wide-band spectrogram

[Figure: wide-band spectrogram of “we go”, with labels for F1, F2, F3, the formant transitions and the burst]

Categorical Perception - 1

1. Set up a continuum of sounds between two categories

/ba/ - /da/

1 ... 3 … 5 … 7

Categorical Perception - 2

2. Run an identification experiment

[Figure: % /ba/ responses against stimulus number (1 ... 3 ... 5 ... 7), showing a sharp phoneme boundary]

Categorical Perception - 3

3. Run a discrimination experiment (e.g. stimulus 1 versus 3)

[Figure: % “different” responses against stimulus number (1 ... 3 ... 5 ... 7), showing a discrimination peak]

Categorical Perception - 4

Defined as:

1. Sharp phoneme boundary

2. Discrimination peak at phoneme boundary

3. Discrimination predicted from identification

(only “different” if different phoneme)
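The third criterion can be illustrated with a minimal sketch. It assumes a logistic identification function (the `boundary` and `slope` values are made up for illustration, not taken from the lecture) and assumes listeners answer “different” only when they covertly give the two stimuli different labels, with the two labels chosen independently.

```python
import math

def p_ba(stimulus, boundary=4.0, slope=3.0):
    """Hypothetical identification function: probability of labelling
    a stimulus /ba/ (boundary and slope are illustrative assumptions)."""
    return 1.0 / (1.0 + math.exp(slope * (stimulus - boundary)))

def predicted_different(p_a, p_b):
    """Probability of a 'different' response if listeners say 'different'
    only when the two stimuli receive different phoneme labels."""
    return p_a * (1.0 - p_b) + (1.0 - p_a) * p_b

# Seven-step /ba/ - /da/ continuum, as on the slides
identification = {s: p_ba(s) for s in range(1, 8)}

# Two-step comparisons: 1 versus 3, 2 versus 4, ...
pairs = [(s, s + 2) for s in range(1, 6)]
predicted = {
    pair: predicted_different(identification[pair[0]], identification[pair[1]])
    for pair in pairs
}

# The pair that straddles the phoneme boundary should be the easiest to tell apart
best_pair = max(predicted, key=predicted.get)

for pair in pairs:
    print(pair, round(predicted[pair], 3))
```

Under these assumptions the predicted discrimination function peaks for the pair that straddles the identification boundary and falls towards zero within each category, which is the pattern the definition describes.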

Categorical Perception - 5

• Occurs for many consonant continua

Even with “proper” discrimination paradigms

• Less evident for vowel continua

Categorical Perception - 6
  • For most “ordinary” continua, such as frequency, loudness or brightness, our ability to discriminate far exceeds our ability to label.
  • Continua that show Categorical Perception are different from this norm.
  • Liberman claims that CP is an indicator of a special Speech Mode of perception that is distinctively human.
Categorical Perception - 7
  • Is CP restricted to speech?
  • No, also shown by comparisons of musical intervals.
    • Burns, E. M. and Campbell, S. L. (1994). "Frequency and frequency ratio resolution by possessors of relative and absolute pitch: Examples of categorical perception?," J. Acoust. Soc. Am. 96, 2704-2719.
  • Is CP shown for speech unique to humans?
  • No. Chinchillas and quails show the same VOT boundary as humans.
    • Kuhl, P. K. and Miller, J. D. (1978). "Speech perception by the chinchilla: identification functions for synthetic VOT stimuli," J. Acoust. Soc. Am. 63, 905-917.
  • Macaques show discrimination peaks at human VOT and place-of-articulation boundaries.
    • Kuhl, P. K. and Padden, D. M. (1982). "Enhanced discriminability at the phonetic boundaries for the voicing feature in macaques," Percept. Psychophys. 32, 542-550.
    • Kuhl, P. K. and Padden, D. M. (1983). "Enhanced discriminability at the phonetic boundaries for the place feature in macaques," J. Acoust. Soc. Am. 73, 1003-1010.
  • So - human speech exploits discontinuities in the way that vertebrate auditory systems represent sound.
Natural auditory categories

[Figure: two panels, each plotted against a stimulus dimension, marking Category 1 and Category 2]

Voicing and place-of-articulation dimensions show these natural categories.

Sinex, D. G. and McDonald, L. P. (1989). "Synchronized discharge rate representation of voice-onset time in the chinchilla auditory nerve," J. Acoust. Soc. Am. 85, 1995-2004.

Sinex, D. G., McDonald, L. P. and Mott, J. B. (1991). "Neural correlates of nonmonotonic temporal acuity for voice onset time," J. Acoust. Soc. Am. 90, 2441-9.

Is CP innate or acquired?

Yes !

Infants are born with the ability to make many speech discriminations that they subsequently can NOT make (see next lecture).

Adults (and 1-year-old infants) have lost the ability to make distinctions that their language does not use.

/r/ - /l/...
  • Phone - a particular sound used by any language eg the sound [r]
  • Phoneme - a sound used in contrast to another in a particular language eg the category /r/ as distinct from /l/
  • Different languages make different phonemic contrasts.
/r/ - /l/ - 2
  • Phonemes in a particular language are defined by minimal pairs
  • i.e. since in English “lice” and “rice” have a different meaning, then they contain different phonemes: /l/ and /r/
  • But there is no such minimal pair in Japanese, so Japanese has a single phoneme /r/
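The minimal-pair test above can be sketched in a few lines. The lexicon and its simplified transcriptions are hypothetical toy data, and the helper names `minimal_pairs` and `contrasts` are invented for this illustration.

```python
from itertools import combinations

# Toy lexicon: word -> phoneme sequence (hypothetical, simplified transcriptions)
lexicon = {
    "lice": ("l", "ai", "s"),
    "rice": ("r", "ai", "s"),
    "lid": ("l", "i", "d"),
    "rid": ("r", "i", "d"),
    "mice": ("m", "ai", "s"),
}

def minimal_pairs(lexicon):
    """Yield (word_a, word_b, phoneme_a, phoneme_b) for every pair of words
    of equal length that differ in exactly one segment."""
    for (w1, s1), (w2, s2) in combinations(lexicon.items(), 2):
        if len(s1) != len(s2):
            continue
        diffs = [(a, b) for a, b in zip(s1, s2) if a != b]
        if len(diffs) == 1:
            yield w1, w2, diffs[0][0], diffs[0][1]

def contrasts(lexicon, x, y):
    """True if some minimal pair in the lexicon separates sounds x and y."""
    return any({a, b} == {x, y} for _, _, a, b in minimal_pairs(lexicon))

for pair in minimal_pairs(lexicon):
    print(pair)
```

In this toy English lexicon “lice”/“rice” is found as a minimal pair, so /l/ and /r/ count as different phonemes. A lexicon with no such pair would give no evidence for the contrast.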
/r/ - /l/ - 3
  • Each language has its own distinctive set of phonemic categories
  • English distinguishes /r/ from /l/ but Japanese doesn’t
  • Tamil distinguishes a dental /t1/, an alveolar /t2/ and a retroflex /t3/. English doesn’t.
/r/ - /l/ - 4

[Figure: English identification (% /ra/), English discrimination and Japanese discrimination (% correct), against stimulus number (1 ... 5 ... 10 ... 15)]

Miyawaki et al. (1975). Perception & Psychophysics 18, 331-340.


Synthetic stimuli: /ra/-/la/

[Figure: physical spacing of stimuli, second formant against third formant]

…and rated the similarity of stimulus pairs

[Figure: American MDS solution, second formant against third formant]

[Figure: Japanese MDS solution, second formant against third formant]

Iverson, P., et al. (2003). "A perceptual interference account of acquisition difficulties for non-native phonemes," Cognition 87, B47-57.

/r/ - /l/ - 4
  • Can Japanese really not hear any difference?
  • Use implicit technique (Mann 1986 Cognition):
    • Co-articulation: /d/ and /g/ are pronounced differently after /l/ and /r/ as in /arda/-/arga/ compared with /alda/-/alga/
  • So, for English speakers /d/-/g/ boundary has different formant values after /l/ than after /r/.
  • Also true for Japanese who can hear /r/ vs /l/
  • But ALSO true for those who CAN’T.
/r/ - /l/ - 5

Is this still something specific to speech?

NO!! QUAILS DO IT TOO!!!

Lotto, Kluender & Holt (1997) J. Acoust. Soc. Am. 102, 1134-1140

So may reflect a general auditory contrast effect, i.e. the auditory representation of the [d] sound is different after an [r] than after an [l].

Lotto & Kluender (1998) Perc. & Psychophys. 60, 602-619

/r/ - /l/ - 6

Fowler, C. A., Brown, J. M. and Mann, V. A. (2000). "Contrast effects do not underlie effects of preceding liquids on stop-consonant identification by humans," J. exp. Psychol.: Hum. Perc. & Perf. 26, 877-888.

McGurk effect experiment shows compensation for coarticulation by listeners when neither frequency contrast nor masking can be the source of the compensations.

McGurk effect

watch it on YouTube

/r/ - /l/: Fowler’s McGurk expt

[Figure: test syllable da or ga. Vision: either ar or al]

Still get shift in d/g boundary, so not auditory contrast.

  • Holt, L. L., Stephens, J. D. and Lotto, A. J. (2005). "A critical evaluation of visually moderated phonetic context effects," Percept. Psychophys. 67, 1102-12.

Fowler’s effect due to McGurk effect caused by visual input concurrent with test syllable, NOT precursor.

Trading relations
  • Most phonetic distinctions have more than one acoustic cue as a result of the particular articulatory gesture that gives the distinction.
  • Perception must establish some "trade-off" between the different cues. Can this trade-off be explained by low-level auditory processes such as short-term adaptation, or does it require processes specific to speech?
  • Repp (1982) Psych Bull. 92, 81-110
  • Many consonantal speech sounds are perceived categorically.
  • For some, this is due to speech exploiting discontinuities in the way that auditory systems represent sound.
  • Some of it is due to cultural differences, acquired in the first year of life.
  • Some aspects of the decoding of co-articulation may be due to general contrast effects (e.g. through adaptation in the auditory nerve).
  • Others are non-auditory in nature and may be specific to human listeners.