
The Perception of Speech



  1. The Perception of Speech

  2. Speech • Speech is for rapid communication • Speech is composed of units of sound called phonemes • examples of phonemes: /b/ in bat, /p/ in pat

  3. Acoustic Properties of Speech • Speech can be characterized by a spectrogram

  4. Acoustic Properties of Speech • Spectrogram reveals differences between phonemes • The differences are in the formants and the formant transitions
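A spectrogram is a sequence of short-time spectra: the signal is cut into frames and the energy at each frequency is measured frame by frame. The following is a minimal sketch of that idea in plain Python, not a production implementation. The sample rate, frame size, and the two-tone test signal (a crude stand-in for a single formant that changes over time) are all assumptions chosen so that the result is easy to check. Real speech contains several formants at once, and this sketch only tracks the single strongest frequency in each frame.

```python
import cmath
import math


def spectrogram(samples, frame_size):
    """Cut the signal into non-overlapping frames and return, for each
    frame, the DFT magnitude of every bin from 0 Hz up to Nyquist."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(frame)
            )
            magnitudes.append(abs(coeff))
        frames.append(magnitudes)
    return frames


def peak_frequencies(samples, sample_rate, frame_size):
    """Frequency in Hz of the strongest bin in each frame."""
    bin_width = sample_rate / frame_size
    return [
        max(range(len(mags)), key=mags.__getitem__) * bin_width
        for mags in spectrogram(samples, frame_size)
    ]


# Assumed illustration values (not from the slides).
SAMPLE_RATE = 8000   # samples per second
FRAME_SIZE = 64      # gives a bin width of 8000 / 64 = 125 Hz


def tone(freq_hz, n_samples):
    """A pure sine tone, used here as a synthetic test signal."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


# Hypothetical signal: a 500 Hz tone that jumps to 1000 Hz halfway through.
signal = tone(500, 128) + tone(1000, 128)
peaks = peak_frequencies(signal, SAMPLE_RATE, FRAME_SIZE)
print(peaks)
```

The list of per-frame peaks is a one-line summary of the spectrogram: it shows where the dominant energy sits and when it moves, which is the kind of difference the slides attribute to formants and formant transitions.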

  5. Perceiving Speech • So perceiving (interpreting) speech sounds is simply a matter of matching the spectrotemporal properties (the shape of the spectrogram) of the incoming sound waves to the appropriate phoneme • right?…

  6. Perceiving Speech • So perceiving (interpreting) speech sounds is simply a matter of matching the spectrotemporal properties (the shape of the spectrogram) of the incoming sound waves to the appropriate phoneme • Then specific phonemes must correspond to specific spectrograms - a property called acoustic-phonetic invariance

  7. Perceiving Speech • Acoustic - Phonetic invariance says that phonemes should match one and only one pattern in the spectrogram • This is not the case! For example /d/ followed by different vowels:

  8. Perceiving Speech • Acoustic-phonetic invariance says that phonemes should match one and only one pattern in the spectrogram • This is not the case! For example, /d/ followed by different vowels • Clearly, the perception and understanding of speech sounds is more elaborate than simply interpreting an internal spectrogram

  9. Perceiving Speech • The phrase “Peter buttered the burnt toast” has five /t/ phonemes, but the spectrogram does not show five identical sweeps

  10. Perceiving Speech • The Segmentation Problem • Segmentation is the perception of silence between words • Often illusory

  11. Perceiving Speech • The phrase “I owe you a Yo-Yo” has no silence in it!

  12. Spoken Input • The Segmentation Problem: • The stream of acoustic input is not physically segmented into discrete phonemes, words, phrases, etc. • Silent gaps don’t always indicate (aren’t perceived as) interruptions in speech

  13. Spoken Input • The Segmentation Problem: • The stream of acoustic input is not physically segmented into discrete phonemes, words, phrases, etc. • Continuous speech stream is sometimes perceived as having gaps
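To see why silence is an unreliable cue for word boundaries, consider a naive segmenter that simply looks for stretches of low amplitude. The sketch below is a toy illustration only: the amplitude envelope, the threshold, and the position of the word boundary are all made-up values, not measurements. The dip in the envelope is placed inside the first word, as can happen during the closure of a stop consonant, while the two words themselves run together without a pause.

```python
def silent_gaps(envelope, threshold):
    """Return (start, end) frame index pairs, end exclusive, for every
    run of frames whose amplitude stays below the threshold."""
    gaps = []
    start = None
    for i, amplitude in enumerate(envelope):
        if amplitude < threshold:
            if start is None:
                start = i
        elif start is not None:
            gaps.append((start, i))
            start = None
    if start is not None:
        gaps.append((start, len(envelope)))
    return gaps


# Made-up amplitude envelope for two words spoken without a pause.
# Frames 2-3 are a silent dip INSIDE the first word (hypothetical).
envelope = [0.8, 0.7, 0.0, 0.0, 0.9, 0.8, 0.7, 0.9, 0.8]
word_boundary = 6  # hypothetical: the second word starts at frame 6

gaps = silent_gaps(envelope, threshold=0.1)
boundary_is_silent = any(start <= word_boundary < end for start, end in gaps)

print("silent gaps:", gaps)
print("word boundary falls in a silent gap:", boundary_is_silent)
```

In this toy input the only silent gap lies within a word, and the actual word boundary is not marked by any silence, which mirrors both halves of the segmentation problem described above.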

  14. Perceiving Speech • So how do you perceive speech? Some of the “strategies”: 1. reduce the data 2. use context clues 3. use vision

  15. Categorical Perception • Categorical Perception is a phenomenon in which the brain assigns a stimulus into one or another category but never into an intermediate category

  16. Categorical Perception • For example, /ba/ and /pa/ differ in their formant transitions • /ba/ is formed by stopping the flow of air from the lungs and then releasing it, with voicing beginning about 10 milliseconds after the release (this delay is called voice onset time) • /pa/ is similar except that voice onset time is about 50 ms

  17. Categorical Perception • Voice onset time can range from zero to >50 ms. For example, you could synthesize a sound with a voice onset time of 30 ms but...

  18. Categorical Perception • Voice onset time can range from zero to >50 ms. For example, you could synthesize a sound with a voice onset time of 30 ms but... • English speakers will hear either /ba/ or /pa/ but never something in between
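One common way to picture categorical perception is as a steep identification function: voice onset time varies continuously, but the reported category flips abruptly near a boundary. The sketch below models this with a logistic curve. The 10 ms and 50 ms values come from the slides; the boundary location (25 ms) and the steepness are assumptions made for illustration, not values given in the presentation.

```python
import math

BA_VOT_MS = 10.0      # typical /ba/ voice onset time (from the slides)
PA_VOT_MS = 50.0      # typical /pa/ voice onset time (from the slides)
BOUNDARY_MS = 25.0    # ASSUMED category boundary, for illustration only
SLOPE_PER_MS = 1.0    # ASSUMED steepness of the identification curve


def probability_pa(vot_ms):
    """Logistic identification function: modeled chance of reporting /pa/."""
    return 1.0 / (1.0 + math.exp(-SLOPE_PER_MS * (vot_ms - BOUNDARY_MS)))


def perceived_category(vot_ms):
    """Every stimulus is assigned to one of the two categories;
    the model has no 'in between' response."""
    return "/pa/" if probability_pa(vot_ms) >= 0.5 else "/ba/"


# Sweep voice onset time in equal 10 ms steps.
labels = [perceived_category(vot) for vot in range(0, 61, 10)]
for vot, label in zip(range(0, 61, 10), labels):
    print(f"VOT {vot:2d} ms -> {label}")
```

Although the stimuli change in equal physical steps, the labels change only once, at the assumed boundary. Under these assumptions a synthesized 30 ms stimulus is heard as /pa/, not as a blend of the two.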

  19. Categorical Perception is Part of Learning a Language • Babies can discriminate /ba/ from /pa/ and can discriminate these from phonemes with intermediate voice onset times! • By 10 to 12 months, babies (learning English) stop discriminating intermediate voice onset times

  20. Categorical Perception is Part of Learning a Language • Once category boundaries are learned it is impossible to unlearn them • Non-native speakers of any language often cannot hear certain phonemes the way native speakers do • As a consequence, they will always have at least some slight accent

  21. Categorical Perception • Another example:

  22. Perception (of all types) Makes Use of Context • The stream of information contained in speech is usually ambiguous and incomplete • Your brain makes a “best guess” based on the circumstances

  23. Perception (of all types) Makes Use of Context • Consider the following examples, in which a cough replaces the sound marked “__”: “… shoe”. “The __eel fell off the car”.

  24. Perception (of all types) Makes Use of Context • Consider the following examples, in which a cough replaces the sound marked “__”: “… shoe”. “The __eel fell off the car”. • Listeners report hearing the “appropriate” phoneme during the cough
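The “best guess” idea in the “__eel” example can be sketched as a lookup that scores each candidate word against the surrounding context. This is a toy model, not a claim about how the brain computes the result: the candidate list and the association strengths below are invented for illustration.

```python
# HYPOTHETICAL association strengths between candidate words and context
# words. The numbers are invented for illustration only.
ASSOCIATION = {
    "wheel": {"car": 0.9, "shoe": 0.05},
    "heel": {"car": 0.05, "shoe": 0.9},
    "peel": {"car": 0.05, "shoe": 0.05},
}


def restore(context_words):
    """Pick the '_eel' candidate that fits the context words best."""
    def score(candidate):
        return sum(ASSOCIATION[candidate].get(word, 0.0)
                   for word in context_words)
    return max(ASSOCIATION, key=score)


car_guess = restore("the __eel fell off the car".split())
shoe_guess = restore(["shoe"])
print(car_guess, shoe_guess)
```

The acoustic input is identical in both cases (a cough followed by “eel”); only the context differs, and that alone changes which word the model reports.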

  25. Much of Speech Perception isn’t Auditory! • Why rely on only one sensory system when there is information in two?

  26. Much of Speech Perception isn’t Auditory! • Why rely on only one sensory system when there is information in two? • The brain seamlessly integrates any information it is given; this is called cross-modal integration

  27. Cross-modal Integration • Speech perception involves the synthesis of vision and hearing • The McGurk effect demonstrates the critical role of vision in speech perception

  28. Cross-modal Integration • The McGurk Effect

  29. Cross-modal Integration • The McGurk effect suggests that visual and auditory information are combined to enhance speech perception under normal circumstances • When visual and auditory information are incongruous, the resulting perception is unpredictable and often wrong

  30. Auditory Scene Analysis • Sounds don’t happen in isolation, they happen in streams of changing frequencies • How does the system group related auditory events into streams and keep different streams separate?

  31. Auditory Scene Analysis • Solving this problem is called Auditory Scene Analysis • One important principle is proximity in pitch, time, or spatial location

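  32. Auditory Scene Analysis • Effect of timing proximity: slow vs. fast presentation

  33. Auditory Scene Analysis • Effect of timing proximity: slow vs. fast presentation • [Figure: two alternative groupings of the tones along the pitch axis, labeled “Do you hear this?” and “Or this?”]

  34. Auditory Scene Analysis • Effect of pitch proximity: far vs. close

  35. Auditory Scene Analysis • Effect of pitch proximity: far vs. close • [Figure: two alternative groupings of the tones along the pitch axis, labeled “Do you hear this?” and “Or this?”]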

  36. Auditory Scene Analysis • Effect of proximity: • auditory system groups together events that happen close together in time and frequency

  37. Auditory Scene Analysis • Effect of proximity: • auditory system groups together events that happen close together in time and frequency • This enables us to perceive meaningful streams of information when they are mixed with distraction

  38. Auditory Scene Analysis • Effect of proximity: • auditory system groups together events that happen close together in time and frequency • This enables us to perceive meaningful streams of information when they are mixed with distraction • Interestingly, the brain can disentangle mixed streams only under certain circumstances • E.g. “the picket fence illusion”: gaps of silence dramatically distort perception of a sentence, while bursts of noise do not
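The proximity principle can be illustrated with a toy grouping rule: a tone joins an existing stream only if the pitch change per second needed to reach it from that stream's last tone is small enough. This is a deliberately simplified sketch, not a model from the presentation. The rule itself, the 20 semitones-per-second limit, and the tone sequences are all assumptions.

```python
def group_into_streams(tones, max_semitones_per_second):
    """Group tones into streams by proximity in pitch and time.

    tones: list of (onset_seconds, pitch_semitones), sorted by onset,
    with strictly increasing onsets. Each tone joins the stream it can
    reach with the smallest pitch change per second, provided that rate
    is within the limit; otherwise it starts a new stream.
    """
    streams = []
    for onset, pitch in tones:
        best, best_rate = None, None
        for stream in streams:
            last_onset, last_pitch = stream[-1]
            rate = abs(pitch - last_pitch) / (onset - last_onset)
            if rate <= max_semitones_per_second and (
                best is None or rate < best_rate
            ):
                best, best_rate = stream, rate
        if best is None:
            streams.append([(onset, pitch)])
        else:
            best.append((onset, pitch))
    return streams


def alternating(low, high, interval_s, count):
    """Low-high-low-high... tone sequence with a fixed onset interval."""
    return [(i * interval_s, high if i % 2 else low) for i in range(count)]


LIMIT = 20.0  # ASSUMED limit, in semitones per second

slow_far = group_into_streams(alternating(0, 7, 0.5, 6), LIMIT)
fast_far = group_into_streams(alternating(0, 7, 0.1, 6), LIMIT)
fast_close = group_into_streams(alternating(0, 1, 0.1, 6), LIMIT)

print(len(slow_far), len(fast_far), len(fast_close))
```

Under these assumptions the same high-low alternation is heard as one stream when it is slow or when the pitches are close, and it splits into separate high and low streams when it is fast and the pitches are far apart.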

  39. Next Time: Taste, Smell, Touch, Balance
