
Adult Speech Perception




  1. Adult Speech Perception. Chris Darwin.

  2. Place of articulation (labial, alveolar, velar) • Voiced stops: b, d, g • Voiceless stops: p, t, k • Voiced nasals: m, n, ng

  3. Wide- and narrowband spectrograms. [Figure panels: waveform, wide-band spectrogram, narrow-band spectrogram.]

  4. Voice-Onset Time (VOT) • “bit”: 5 ms • “pit”: 40 ms

  5. Formants in a wide-band spectrogram of “we go”. [Figure labels: F1, F2, F3, burst, formant transitions.]

  6. Categorical Perception - 1. Step 1: Set up a continuum of sounds between two categories, e.g. /ba/ to /da/, with stimuli numbered 1 to 7.

  7. Categorical Perception - 2. Step 2: Run an identification experiment. [Figure: % /ba/ responses (0 to 100) against stimulus number (1 to 7), showing a sharp phoneme boundary.]

  8. Categorical Perception - 3. Step 3: Run a discrimination experiment, e.g. stimulus 1 versus stimulus 3. [Figure: % “different” responses (0 to 100) against stimulus number (1 to 7), showing a discrimination peak.]

  9. Categorical Perception - 4. Defined as: (1) a sharp phoneme boundary; (2) a discrimination peak at the phoneme boundary; (3) discrimination predicted from identification (two stimuli are heard as “different” only if they are labelled as different phonemes).
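Criterion 3 can be made concrete with a small sketch. It assumes that listeners compare labels only and label each stimulus independently, so the chance of a “different” response is the chance that the two labels disagree. The logistic identification function and its boundary and slope values are hypothetical, chosen only for illustration; they are not data from the lecture.

```python
import math

def p_ba(stimulus, boundary=4.0, slope=3.0):
    """Hypothetical identification function: P(label = /ba/) for a stimulus number.

    The logistic shape, boundary and slope are illustrative assumptions.
    """
    return 1.0 / (1.0 + math.exp(slope * (stimulus - boundary)))

def predicted_different(p_first, p_second):
    """P("different") when only labels are compared and labelling is independent.

    The two stimuli sound different only if one is labelled /ba/ and the other /da/.
    """
    return p_first * (1.0 - p_second) + p_second * (1.0 - p_first)

# Identification probabilities for the 7-step continuum.
ident = {s: p_ba(s) for s in range(1, 8)}

# Two-step discrimination pairs: 1 vs 3, 2 vs 4, ..., 5 vs 7.
pairs = [(s, s + 2) for s in range(1, 6)]
predicted = {pair: predicted_different(ident[pair[0]], ident[pair[1]]) for pair in pairs}

for pair, p in predicted.items():
    print(f"{pair[0]} vs {pair[1]}: predicted % different = {100 * p:.0f}")
```

Under these assumptions the predicted discrimination peaks for the pair that straddles the phoneme boundary and stays near zero for pairs within a category.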

  10. Categorical Perception - 5 • Occurs for many consonant continua, even with “proper” discrimination paradigms. • Less evident for vowel continua.

  11. Categorical Perception - 6 • For most “ordinary” continua, such as frequency, loudness, brightness etc., our ability to discriminate far exceeds our ability to label. • Continua that show Categorical Perception are different from this norm. • Liberman claims that CP is an indicator of a special Speech Mode of perception that is distinctively human.

  12. Categorical Perception - 7 • Is CP restricted to speech? • No, also shown by comparisons of musical intervals. • Burns, E. M. and Campbell, S. L. (1994). "Frequency and frequency ratio resolution by possessors of relative and absolute pitch: Examples of categorical perception?," J. Acoust. Soc. Am. 96, 2704-2719. • Is CP shown for speech unique to humans? • No. Chinchillas and quails show the same VOT boundary as humans. • Kuhl, P. K. and Miller, J. D. (1978). "Speech perception by the chinchilla: identification functions for synthetic VOT stimuli," J. Acoust. Soc. Am. 63, 905-917. • Macaques show discrimination peaks at human VOT and place-of-articulation boundaries. • Kuhl, P. K. and Padden, D. M. (1982). "Enhanced discriminability at the phonetic boundaries for the voicing feature in macaques," Percept. Psychophys. 32, 542-550. • Kuhl, P. K. and Padden, D. M. (1983). "Enhanced discriminability at the phonetic boundaries for the place feature in macaques," J. Acoust. Soc. Am. 73, 1003-1010. • So - human speech exploits discontinuities in the way that vertebrate auditory systems represent sound.

  13. Natural auditory categories. [Figure: two panels, acoustic values and auditory values against a stimulus dimension, with regions labelled Category 1 and Category 2.] Voicing and place-of-articulation dimensions show these natural categories. Sinex, D. G. and McDonald, L. P. (1989). "Synchronized discharge rate representation of voice-onset time in the chinchilla auditory nerve," J. Acoust. Soc. Am. 85, 1995-2004. Sinex, D. G., McDonald, L. P. and Mott, J. B. (1991). "Neural correlates of nonmonotonic temporal acuity for voice onset time," J. Acoust. Soc. Am. 90, 2441-2449.

  14. Is CP innate or acquired? Yes! • Infants are born with the ability to make many speech discriminations that they subsequently can NOT make (see next lecture). • Adults (and 1-year-old infants) have lost the ability to make distinctions that their language does not use.

  15. Different languages make different regions of acoustic space distinctive

  16. /r/ - /l/... • Phone - a particular sound used by any language, e.g. the sound [r]. • Phoneme - a sound used in contrast to another in a particular language, e.g. the category /r/ as distinct from /l/. • Different languages make different phonemic contrasts.

  17. /r/ - /l/ …- 2 • Phonemes in a particular language are defined by minimal pairs, • i.e. since “lice” and “rice” have different meanings in English, they contain different phonemes: /l/ and /r/. • But there is no such minimal pair in Japanese, so Japanese has a single phoneme /r/.
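The minimal-pair test can be sketched as a search over a lexicon for words whose transcriptions differ in exactly one segment. The toy lexicon and its simplified phoneme symbols below are hypothetical, made up for illustration, and the sketch only handles pairs of equal length.

```python
from itertools import combinations

# Hypothetical toy lexicon: word -> list of (simplified) phoneme symbols.
LEXICON = {
    "lice": ["l", "ai", "s"],
    "rice": ["r", "ai", "s"],
    "rise": ["r", "ai", "z"],
    "lie": ["l", "ai"],
}

def minimal_pairs(lexicon):
    """Return (word1, word2, phoneme1, phoneme2) for words differing in one segment."""
    found = []
    for (w1, p1), (w2, p2) in combinations(lexicon.items(), 2):
        if len(p1) != len(p2):
            continue  # this sketch ignores pairs that differ by an added segment
        diffs = [(a, b) for a, b in zip(p1, p2) if a != b]
        if len(diffs) == 1:
            found.append((w1, w2, diffs[0][0], diffs[0][1]))
    return found

pairs = minimal_pairs(LEXICON)

# Each minimal pair is evidence that its two differing sounds are separate phonemes.
contrasts = {frozenset((a, b)) for _, _, a, b in pairs}

for w1, w2, a, b in pairs:
    print(f"{w1} / {w2}: /{a}/ contrasts with /{b}/")
```

A lexicon with no pair of words distinguished only by [r] versus [l] would yield no /r/-/l/ contrast, which is the situation the slide describes for Japanese.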

  18. /r/ - /l/ …- 3 • Each language has its own distinctive set of phonemic categories • English distinguishes /r/ from /l/ but Japanese doesn’t • Tamil distinguishes dental /t1/ from an alveolar /t2/ from a retroflex /t3/. English doesn’t.

  19. Synthetic Stimuli: /ra/-/la/. [Figure panels: /ra/ and /la/.]

  20. /r/ - /l/ - 4. [Figure: English identification (% /ra/), English discrimination and Japanese discrimination (% correct, 3-oddball task), on a 0 to 100 scale, against steps 1 to 15 of an F3 continuum from /ra/ to /la/.] Miyawaki et al. (1975) Perception & Psychophysics 18, 331-340.

  21. Synthetic Stimuli: /ra/-/la/. [Figure axes: second formant, third formant.] Iverson, P., et al. (2003). "A perceptual interference account of acquisition difficulties for non-native phonemes," Cognition 87, B47-57.

  22. American MDS Solution. [Figure: physical spacing of stimuli and MDS solution, with /l/ and /r/ regions marked; axes: second formant, third formant.] …and rated the similarity of stimulus pairs

  23. Japanese MDS Solution. [Figure axes: second formant, third formant.]

  24. /r/ - /l/ - 4 • Can Japanese listeners really not hear any difference? • Use an implicit technique (Mann 1986 Cognition): • Co-articulation: /d/ and /g/ are pronounced differently after /l/ and /r/, as in /arda/-/arga/ compared with /alda/-/alga/. • So, for English speakers the /d/-/g/ boundary has different formant values after /l/ than after /r/. • Also true for Japanese listeners who can hear /r/ vs /l/. • But ALSO true for those who CAN’T.

  25. /r/ - /l/ - 5. Is this still something specific to speech? NO!! QUAILS DO IT TOO!!! Lotto, Kluender & Holt (1997) J. Acoust. Soc. Am. 102, 1134-1140. So may reflect a general auditory contrast effect, i.e. the auditory representation of the [d] sound is different after an [r] than after an [l]. Lotto & Kluender (1998) Perc & Psychophys. 60, 602-619.

  26. /r/ - /l/ - 6 Fowler, C. A., Brown, J. M. and Mann, V. A. (2000). "Contrast effects do not underlie effects of preceding liquids on stop-consonant identification by humans," J. exp. Psychol.: Hum. Perc. & Perf. 26, 877-888. McGurk effect experiment shows compensation for coarticulation by listeners when neither frequency contrast nor masking can be the source of the compensations.

  27. McGurk effect: watch it on YouTube.

  28. /r/ - /l/: Fowler’s McGurk expt. • Audio: ambiguous ar/l (constant), followed by either da or ga. • Vision: either ar or al. • Still get a shift in the d/g boundary, so not auditory contrast.

  29. Resolution • Holt, L. L., Stephens, J. D. and Lotto, A. J. (2005). "A critical evaluation of visually moderated phonetic context effects," Percept Psychophys 67, 1102-1112. • Fowler’s effect is due to a McGurk effect caused by visual input concurrent with the test syllable, NOT with the precursor.

  30. Trading relations • Most phonetic distinctions have more than one acoustic cue, as a result of the particular articulatory gesture that gives the distinction. • Perception must establish some "trade-off" between the different cues. Can this trade-off be explained by low-level auditory processes such as short-term adaptation, or does it require processes specific to speech? • Repp (1982) Psych Bull. 92, 81-110
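One way to picture a trading relation is a sketch in which two cues add up on a logit scale, so that a change in one cue can be offset by a change in the other. Linear combination is an assumption made here for illustration only, and the cue names, weights and bias are hypothetical. The sketch describes the trade-off; it takes no position on whether the mechanism is auditory or speech-specific.

```python
import math

# Hypothetical weights: cue_a could be VOT in ms, cue_b a second cue in arbitrary units.
W_A, W_B, BIAS = 0.25, 1.0, -5.0

def p_voiceless(cue_a, cue_b):
    """P("voiceless" response) under an assumed linear combination of two cues."""
    z = BIAS + W_A * cue_a + W_B * cue_b
    return 1.0 / (1.0 + math.exp(-z))

def boundary_on_a(cue_b):
    """Value of cue_a at which responses are 50/50, for a fixed value of cue_b."""
    return -(BIAS + W_B * cue_b) / W_A

# Strengthening cue_b moves the 50% boundary on cue_a: less of cue_a is needed.
boundary_weak_b = boundary_on_a(0.0)
boundary_strong_b = boundary_on_a(1.0)
print(f"cue_a boundary with cue_b = 0: {boundary_weak_b:.1f}")
print(f"cue_a boundary with cue_b = 1: {boundary_strong_b:.1f}")
```

In this sketch, two stimuli with different cue values can be perceptually equivalent whenever their weighted sums match, which is how a trade-off between cues shows up as a boundary shift.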

  31. Summary • Many consonantal speech sounds are perceived categorically. • For some, this is due to speech exploiting discontinuities in the way that auditory systems represent sound. • Some of it is due to cultural differences, acquired in the first year of life. • Some aspects of the decoding of co-articulation may be due to general contrast effects (e.g. through adaptation in the auditory nerve). • Others are non-auditory in nature and may be specific to human listeners.
