Adult Speech Perception

Adult Speech Perception Chris Darwin

Place of articulation
                  Labial   Alveolar   Velar
Stop, voiced      b        d          g
Stop, voiceless   p        t          k
Nasal, voiced     m        n          ng

Wide and narrowband spectrograms
[Figure: waveform, wide-band spectrogram and narrow-band spectrogram]

Voice-Onset Time (VOT)
[Figure: “bit” with a VOT of 5 ms; “pit” with a VOT of 40 ms]

Formants in a wide-band spectrogram
[Figure: wide-band spectrogram of “we go” with F1, F2, F3, the burst and the formant transitions labelled]

Categorical Perception - 1
1. Set up a continuum of sounds between two categories, /ba/ - /da/: stimuli 1 ... 3 ... 5 ... 7

Categorical Perception - 2
2. Run an identification experiment
[Figure: % /ba/ responses (0 to 100) against stimulus number 1 ... 3 ... 5 ... 7, showing a sharp phoneme boundary]

Categorical Perception - 3
3. Run a discrimination experiment (e.g. stimulus 1 versus 3)
[Figure: % “different” responses (0 to 100) against stimulus number 1 ... 3 ... 5 ... 7, showing a discrimination peak]

Categorical Perception - 4
Defined as:
1. Sharp phoneme boundary
2. Discrimination peak at phoneme boundary
3. Discrimination predicted from identification (only “different” if different phoneme)
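
Criterion 3 can be made concrete with a small sketch. It assumes a logistic identification function (the boundary and slope values are invented) and uses the label-only prediction for an ABX task, P(correct) = 0.5 + 0.5 (p_A - p_B)^2, which is one common formalisation; the slides do not say which discrimination paradigm or formula was used.

```python
import math

def identify_ba(step, boundary=4.0, slope=3.0):
    """Hypothetical identification function: P(/ba/) for a continuum step.

    The logistic shape, boundary and slope are illustrative assumptions.
    """
    return 1.0 / (1.0 + math.exp(slope * (step - boundary)))

def predicted_abx(p_a, p_b):
    """Proportion correct in ABX if listeners can only compare phoneme labels.

    p_a and p_b are the probabilities of labelling each stimulus as /ba/.
    """
    return 0.5 + 0.5 * (p_a - p_b) ** 2

steps = range(1, 8)                              # stimuli 1 ... 7
ident = {s: identify_ba(s) for s in steps}       # identification function
pairs = [(s, s + 2) for s in range(1, 6)]        # two-step pairs, e.g. 1 versus 3
predicted = {pair: predicted_abx(ident[pair[0]], ident[pair[1]]) for pair in pairs}
best_pair = max(predicted, key=predicted.get)    # pair with the discrimination peak

for pair in pairs:
    print(pair, round(predicted[pair], 3))
print("peak at pair", best_pair)
```

With these assumed values the predicted peak falls on the pair that straddles the boundary, and pairs within a category stay near chance (0.5).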

Categorical Perception - 5
• Occurs for many consonant continua, even with “proper” discrimination paradigms
• Less evident for vowel continua

Categorical Perception - 6
• For most “ordinary” continua, such as frequency, loudness, brightness etc., our ability to discriminate far exceeds our ability to label.
• Continua that show Categorical Perception are different from this norm.
• Liberman claims that CP is an indicator of a special Speech Mode of perception that is distinctively human.

Categorical Perception - 7
• Is CP restricted to speech?
• No, it is also shown by comparisons of musical intervals.
  Burns, E. M. and Campbell, S. L. (1994). "Frequency and frequency ratio resolution by possessors of relative and absolute pitch: Examples of categorical perception?," J. Acoust. Soc. Am. 96, 2704-2719.
• Is CP shown for speech unique to humans?
• No. Chinchillas and quails show the same VOT boundary as humans.
  Kuhl, P. K. and Miller, J. D. (1978). "Speech perception by the chinchilla: identification functions for synthetic VOT stimuli," J. Acoust. Soc. Am. 63, 905-917.
• Macaques show discrimination peaks at human VOT and place-of-articulation boundaries.
  Kuhl, P. K. and Padden, D. M. (1982). "Enhanced discriminability at the phonetic boundaries for the voicing feature in macaques," Percept. Psychophys. 32, 542-550.
  Kuhl, P. K. and Padden, D. M. (1983). "Enhanced discriminability at the phonetic boundaries for the place feature in macaques," J. Acoust. Soc. Am. 73, 1003-1010.
• So human speech exploits discontinuities in the way that vertebrate auditory systems represent sound.

Natural auditory categories
[Figure: acoustic values and auditory values plotted against the stimulus dimension, with Category 1 and Category 2 marked]
Voicing and place-of-articulation dimensions show these natural categories.
Sinex, D. G. and McDonald, L. P. (1989). "Synchronized discharge rate representation of voice-onset time in the chinchilla auditory nerve," J. Acoust. Soc. Am. 85, 1995-2004.
Sinex, D. G., McDonald, L. P. and Mott, J. B. (1991). "Neural correlates of nonmonotonic temporal acuity for voice onset time," J. Acoust. Soc. Am. 90, 2441-2449.
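
The idea of a natural category can be illustrated with a toy mapping from acoustic to auditory values. The sigmoid shape, the position of the discontinuity and the arbitrary units are all assumptions for illustration; they are not taken from the Sinex et al. data.

```python
import math

def auditory_value(acoustic, discontinuity=25.0, steepness=0.4):
    """Hypothetical nonlinear auditory representation of an acoustic value.

    A sigmoid is assumed purely for illustration: it is nearly flat within
    each category and changes rapidly around the discontinuity.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (acoustic - discontinuity)))

acoustic_steps = [0, 10, 20, 30, 40, 50]          # equal acoustic spacing (arbitrary units)
auditory = [auditory_value(x) for x in acoustic_steps]

# Auditory difference between neighbouring stimuli: large only across the discontinuity.
auditory_differences = [b - a for a, b in zip(auditory, auditory[1:])]
largest = auditory_differences.index(max(auditory_differences))

for (lo, hi), diff in zip(zip(acoustic_steps, acoustic_steps[1:]), auditory_differences):
    print(f"{lo:>2} vs {hi:>2}: auditory difference {diff:.3f}")
```

Equal acoustic steps give very unequal auditory steps, so two stimuli on the same side of the discontinuity are hard to tell apart while two stimuli straddling it are easy.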

Is CP innate or acquired?
Yes! (That is, both.)
• Infants are born with the ability to make many speech discriminations that they subsequently can NOT make (see next lecture).
• Adults (and 1-year-old infants) have lost the ability to make distinctions that their language does not use.

Different languages make different regions of acoustic space distinctive

/r/ - /l/ - 1
• Phone: a particular sound used by any language, e.g. the sound [r]
• Phoneme: a sound used in contrast to another in a particular language, e.g. the category /r/ as distinct from /l/
• Different languages make different phonemic contrasts.

/r/ - /l/ - 2
• Phonemes in a particular language are defined by minimal pairs,
• i.e. since in English “lice” and “rice” have different meanings, they contain different phonemes: /l/ and /r/.
• But there is no such minimal pair in Japanese, so Japanese has a single phoneme /r/.
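
The minimal-pair test can be sketched as a search over a lexicon. The toy transcriptions below are hand-made and simplified (they are not IPA), and the function names are hypothetical.

```python
from itertools import combinations

def is_minimal_pair(word_a, word_b):
    """True if two transcriptions have equal length and differ in exactly one segment."""
    if len(word_a) != len(word_b):
        return False
    differences = sum(1 for a, b in zip(word_a, word_b) if a != b)
    return differences == 1

def contrasting_sounds(lexicon):
    """Return the pairs of sounds that distinguish at least one minimal pair.

    `lexicon` maps each meaning to a tuple of segments, so two entries always
    differ in meaning.
    """
    contrasts = set()
    for (_, word_a), (_, word_b) in combinations(lexicon.items(), 2):
        if is_minimal_pair(word_a, word_b):
            for a, b in zip(word_a, word_b):
                if a != b:
                    contrasts.add(frozenset((a, b)))
    return contrasts

# Toy English lexicon with simplified, illustrative transcriptions.
english = {
    "lice": ("l", "ai", "s"),
    "rice": ("r", "ai", "s"),
    "rise": ("r", "ai", "z"),
}

for pair in contrasting_sounds(english):
    print(sorted(pair))
```

In this toy lexicon “lice”/“rice” establishes the /l/-/r/ contrast, while “lice”/“rise” differs in two segments and so is not a minimal pair.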

/r/ - /l/ - 3
• Each language has its own distinctive set of phonemic categories.
• English distinguishes /r/ from /l/ but Japanese doesn’t.
• Tamil distinguishes a dental /t1/, an alveolar /t2/ and a retroflex /t3/. English doesn’t.

Synthetic Stimuli: /ra/-/la/
[Figure: synthetic /ra/ and /la/ stimuli]

/r/ - /l/ - 4
[Figure: English identification (% /ra/) with English and Japanese 3-oddball discrimination (% correct), y-axis 0 to 100, against stimulus number 1 ... 5 ... 10 ... 15 along the F3 continuum from /ra/ to /la/]
Miyawaki et al. (1975) Perception & Psychophysics 18, 331-340

Synthetic Stimuli: /ra/-/la/
[Figure: stimuli plotted by second formant and third formant]
Iverson, P., et al. (2003). "A perceptual interference account of acquisition difficulties for non-native phonemes," Cognition 87, B47-57.

American MDS Solution
[Figure: physical spacing of stimuli and the MDS solution, plotted by second formant and third formant, with /l/ and /r/ marked]
…and rated the similarity of stimulus pairs

Japanese MDS Solution
[Figure: MDS solution plotted by second formant and third formant]

/r/ - /l/ - 4
• Can Japanese really not hear any difference?
• Use an implicit technique (Mann 1986 Cognition):
• Co-articulation: /d/ and /g/ are pronounced differently after /l/ and /r/, as in /arda/-/arga/ compared with /alda/-/alga/.
• So, for English speakers the /d/-/g/ boundary has different formant values after /l/ than after /r/.
• Also true for Japanese who can hear /r/ vs /l/.
• But ALSO true for those who CAN’T.

/r/ - /l/ - 5
Is this still something specific to speech? NO!! QUAILS DO IT TOO!!!
Lotto, Kluender & Holt (1997) J. Acoust. Soc. Am. 102, 1134-1140
So it may reflect a general auditory contrast effect, i.e. the auditory representation of the [d] sound is different after an [r] than after an [l].
Lotto & Kluender (1998) Perc. & Psychophys. 60, 602-619

/r/ - /l/ - 6
Fowler, C. A., Brown, J. M. and Mann, V. A. (2000). "Contrast effects do not underlie effects of preceding liquids on stop-consonant identification by humans," J. exp. Psychol.: Hum. Perc. & Perf. 26, 877-888.
A McGurk-effect experiment shows compensation for coarticulation by listeners when neither frequency contrast nor masking can be the source of the compensation.

McGurk effect: watch it on YouTube

/r/ - /l/: Fowler’s McGurk expt
• Audio: ambiguous ar/l (constant), then either da or ga
• + Vision: either ar or al
• Still get a shift in the d/g boundary, so not auditory contrast.

Resolution
• Holt, L. L., Stephens, J. D. and Lotto, A. J. (2005). "A critical evaluation of visually moderated phonetic context effects," Percept. Psychophys. 67, 1102-1112.
• Fowler’s effect is due to a McGurk effect caused by visual input concurrent with the test syllable, NOT with the precursor.

Trading relations
• Most phonetic distinctions have more than one acoustic cue, as a result of the particular articulatory gesture that gives the distinction.
• Perception must establish some "trade-off" between the different cues. Can this trade-off be explained by low-level auditory processes such as short-term adaptation, or does it require processes specific to speech?
• Repp (1982) Psych. Bull. 92, 81-110
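
A trading relation can be sketched as two cues adding evidence for the same category, so that more of one cue compensates for less of the other. The additive logistic form, the cue names and the weights below are illustrative assumptions; the sketch says nothing about whether the trade-off arises in auditory or speech-specific processing.

```python
import math

def p_category_one(cue_a, cue_b, w_a=1.0, w_b=0.5, bias=-5.0):
    """Hypothetical probability of hearing category one from two acoustic cues.

    Both cues contribute additively to the evidence; the weights and bias
    are invented for illustration.
    """
    evidence = w_a * cue_a + w_b * cue_b + bias
    return 1.0 / (1.0 + math.exp(-evidence))

def boundary_on_cue_a(cue_b, w_a=1.0, w_b=0.5, bias=-5.0):
    """Value of cue A at which the two categories are equally likely, given cue B."""
    return -(bias + w_b * cue_b) / w_a

for cue_b in (0.0, 2.0, 4.0):
    print(f"cue B = {cue_b}: boundary on cue A at {boundary_on_cue_a(cue_b)}")
```

Under these assumptions, raising cue B from 0 to 2 moves the category boundary on cue A from 5 to 4: less of cue A is needed when cue B already favours the category.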

Summary
• Many consonantal speech sounds are perceived categorically.
• For some, this is due to speech exploiting discontinuities in the way that auditory systems represent sound.
• Some of it is due to cultural differences, acquired in the first year of life.
• Some aspects of the decoding of co-articulation may be due to general contrast effects (e.g. through adaptation in the auditory nerve).
• Others are non-auditory in nature and may be specific to human listeners.
