
Theories of Speech Perception Part II

  1. Theories of Speech Perception, Part II

  2. AUDITORY THEORIES

The object of speech perception is the acoustic/auditory signal. The perceptual mechanisms responsible for speech perception are not special to speech; instead, they involve general auditory processes, aided by cognitive processes.

Auditory theories are more in conflict with Motor Theory than with Direct Realism (e.g., they share with Direct Realism the view that speech perception is not special). But the points of conflict between auditory and gestural approaches are changing over time. E.g., Lotto, Kluender & Holt (1997, JASA 102) proposed a theoretical perspective consistent with the two points above, but one that also considers the influences of coarticulatory constraints on the acoustic signal.

We turn now to a specific theory of auditory processing.

  3. AUDITORY ENHANCEMENT THEORY

R. L. Diehl & K. R. Kluender (1989) Ecological Psychology 1.
J. Kingston & R. L. Diehl (1994) Language 70.

Contrasts between the sounds in a phonological system are robust in part because systems have evolved to enhance the perceptual distinctiveness of the contrasts. Because sound systems evolve under constraints of preserving and enhancing perceptual distinctiveness, the object of speech perception must be auditory rather than articulatory events. Phonetic contrasts and categories are acquired via general processes of audition and category formation.

  4. From Diehl and Kluender (1989): According to the Auditory Enhancement Hypothesis, “speech communities tend to select components that have mutually enhancing auditory effects. The reasons for such a selection strategy should be obvious: Environments of speech communication tend to be noisy and reverberant [...] and there is considerable variation in vocal-tract size and dialect for listeners to contend with. In the face of these obstacles to successful speech communication, phonologies or sound systems of languages have doubtless evolved to be fairly robust signaling devices. A natural way to achieve such robustness is for language communities to signal phonological contrasts using an ensemble of vocal-tract gestures and gestural components that jointly and synergistically enhance the perceptual distinctiveness of segments.”

  5. Evidence cited in support of Auditory Enhancement Theory:

Cross-language patterns in phonological systems
E.g., the structure of vowel systems is consistent with a principle of maximal perceptual distinctiveness.
Recall our earlier discussion of the co-occurrence of backness and roundness in vowel systems. This co-occurrence has the acoustic effect of yielding a particularly low-frequency F2.
Another notable co-occurrence: high vowels have a higher F0 than non-high vowels, which is argued to create a low-frequency prominence. (Recall our discussion of integration of prominences falling within 3-3.5 Bark. However, the effects of vowel height on F0 are small, and it’s questionable whether these small shifts are sufficient to contribute to the F0/F1 center-of-gravity effect.)

  6. More evidence cited in support of Auditory Enhancement Theory:

Speech categorization by nonhuman animals
Animals exhibit categorical-like perception (and, more recently, evidence of “perceptual compensation”). The convergence of perceptual findings for humans and animals suggests that general auditory processes may be responsible for human speech perception.

Perception of nonspeech stimuli
Some of the perceptual phenomena reported for speech hold for nonspeech analogs as well.

Trading relations
Some acoustic properties may “trade” with each other because each enhances a particular auditory effect (an argument, as Hawkins points out, that contradicts the gesturalists’ claim of auditory independence of the relevant properties).

  7. A test case for auditory vs. gestural accounts of perception: PERCEPTUAL COMPENSATION

From 1980-2000, a series of experiments was conducted on a /da-ga/ continuum embedded in two contexts, /aɹ__/ and /al__/. Coarticulation between the stop and preceding liquid will yield more front articulations of /g/ after /l/ than after /ɹ/. The acoustic effects of this coarticulation are mainly in F3: the F3 onset of /g/ is higher after /l/ than after /ɹ/. That is, /g/ is acoustically more /d/-like after /l/. Listeners compensating for this effect will attribute it to its coarticulatory source, and will be more likely to hear relatively ambiguous stops as /g/ in the /l_/ context than in the /ɹ_/ context.

Do AE-speaking adults compensate for these coarticulatory effects? What about Japanese-speaking listeners (recall the Miyawaki et al. data)? Infants? Quail? What about audio-visual processing of these stimuli?

  8. Mann (1980, Perception & Psychophysics 28) found that (American English) listeners adjust for the coarticulatory effects of the liquid on the following stop: listeners heard more /g/ in the context of /l_/. But are these effects due largely or entirely to experience with the relevant coarticulatory patterns?

Mann (1986, Cognition 24) found that Japanese-speaking listeners who could not reliably discriminate the /ɹ-l/ distinction also adjusted for the coarticulatory effects of the liquid on the following stop.

Fowler, Best, & McRoberts (1990, Perception & Psychophysics 48) asked whether such compensation (which arguably involves disentangling the acoustic effects of coarticulatory overlap) requires experience producing coarticulated speech. They found that 4-month-old infants responded in much the same way as adults.

  9. Lotto, Kluender, & Holt (1997, JASA 102) tested whether this perceptual phenomenon is species-specific. They trained Japanese quail to peck differentially to clear cases of /da/ and /ga/. When the quail were presented with ambiguous /da-ga/ stimuli in /aɹ_/ and /al_/ contexts, they responded much as humans did. Lotto et al. concluded that apparent “perceptual compensation for coarticulation” is due instead to a more general auditory contrast effect: the F3 onset of an ambiguous stop sounds (comparatively) lower after high-F3 /l/, triggering more /g/ responses in this context.

Fowler, Brown, & Mann (2000, Journal of Exp. Psych.: Human Perc. & Perf. 26): If perceptual “compensation” is a purely auditory phenomenon, then the effect should not hold when the information for the liquid is visual rather than acoustic. Using a variant of the McGurk paradigm, listeners saw a clear /r/ or /l/, and heard a constant ambiguous [al/r] precursor followed by members of the /da-ga/ continuum. Bottom line: listeners compensated for the coarticulatory effects based on the visual stimulus. Since the precursor audio was constant, the compensation could not be an auditory effect.

  10. QUANTAL THEORY

K. N. Stevens (1972) In David & Denes (eds), Human Communication.
K. N. Stevens (1989) Journal of Phonetics 17.

Quantal Theory is a theory of articulatory-acoustic and acoustic-auditory relations. These relations are non-monotonic: there are certain regions of greater sensitivity to change and other regions of greater stability. That is, continuous variation in vocal tract configuration may result in large or small acoustic shifts. Similarly, same-sized acoustic differences may elicit large or small changes in the auditory response. These regions form the basis for a universal set of distinctive features, each feature corresponding to an invariant property that the auditory system is sensitive to.

  11. [Figure: schematic quantal relations, with regions labeled I, II, and III. Two plots: an acoustic parameter as a function of an articulatory parameter, and the auditory response as a function of an acoustic parameter.]
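The stability/sensitivity pattern can be mimicked with a toy logistic mapping (an illustration only; the function and its parameter values are invented for this sketch, not Stevens’ model). Equal-sized articulatory steps produce a small acoustic change on a plateau (regions I and III) but a large change in the transition region (II):

```python
import math

def acoustic_output(x, steepness=10.0, midpoint=0.5):
    """Toy logistic articulatory-to-acoustic mapping (illustrative only)."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))

# Equal articulatory steps, unequal acoustic consequences:
stable_shift = acoustic_output(0.2) - acoustic_output(0.1)       # plateau (region I)
sensitive_shift = acoustic_output(0.55) - acoustic_output(0.45)  # transition (region II)
```

Here the same 0.1-unit articulatory step yields roughly an eightfold larger acoustic shift in the transition region than on the plateau, which is the sense in which the relation is “quantal.”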

  12. Supporting data involve evidence of quantal effects in articulatory-acoustic and acoustic-auditory relations. E.g., some articulatory-acoustic relations in vowels:

Non-low front (unrounded) vowels: broad maximum of F2 for configurations with back-cavity length in the 6.5-9 cm range, resulting in F2-F3 proximity. So F2 and F3 of front vowels are relatively insensitive to front-back changes in tongue body position.

Non-low back rounded vowels: stable F1-F2 separation (400-500 Hz) across a range of back-cavity lengths.

Rounding permits closer proximity of two formants (F1-F2 for back vowels, F2-F3 for front vowels), creating a more prominent peak in the spectrum.

  13. E.g., some acoustic-auditory relations in vowels:

Stevens’ approach to acoustic-auditory relations in vowels builds on the findings of Chistovich and her colleagues (which we have discussed) of a center-of-gravity effect for spectral prominences that fall within 3-3.5 Bark.

Back vowels: ideally, the F2-F1 distance falls within the critical spacing (but recall that this doesn’t hold for English non-low back vowels).
Front vowels: the F3-F2 distance is expected to be within the critical distance.
High vowels: evidence suggests that the F1-F0 distance is 3.0-3.2 Bark.
Oral-nasal distinction: maximum perturbation in the F1 region (extra peak and/or decrease in F1 amplitude); listeners’ perceptual responses show crossovers when these properties are manipulated.
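To make the critical-spacing idea concrete, formant separations in Hz can be converted to Bark. A minimal sketch, using Traunmüller’s (1990) Hz-to-Bark approximation; the formant values below are illustrative (roughly an /o/-like back vowel), not measurements from the studies discussed:

```python
def hz_to_bark(f):
    """Traunmüller's (1990) approximation of the Bark scale."""
    return 26.81 * f / (1960.0 + f) - 0.53

# Illustrative back-vowel formants: F1 = 600 Hz, F2 = 1000 Hz
# (a 400 Hz separation, as in the 400-500 Hz range cited above).
f1, f2 = 600.0, 1000.0
separation = hz_to_bark(f2) - hz_to_bark(f1)
# separation comes out below the 3-3.5 Bark critical spacing,
# so the two prominences would be expected to integrate perceptually.
```

Note that the same Hz separation corresponds to fewer Bark at higher frequencies, which is why proximity on the Bark scale, not in Hz, is the relevant criterion.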

  14. DEVELOPMENTAL PERCEPTION

We do not have sufficient time here to discuss in any detail current theories of developmental perception, which include theories of how infants develop from language-general to language-specific perceivers. An empirical finding that theoretical approaches need to take into account is that the transition from language-general to language-specific (in terms of phoneme perception) takes place within the first year of life.

Werker & Tees (1984, Infant Behavior & Development 7) tested English-learning infants on a Hindi /ʈa-ta/ contrast and a Nthlakampx /k'i-q'i/ contrast. Most 6-8-month-olds were able to discriminate both contrasts, but few 10-12-month-olds were able to do so. Similar changes between 6 and 12 months have since been reported for other contrasts (with some evidence that developmental changes in vowel perception may be in place even earlier).

  15. Other evidence of developmental changes within the first year of life comes from research by Kuhl and her colleagues that focuses on the internal structure of phonemic categories.

Adult listeners can identify the “best” or “prototype” instance of a category such as /i/, but they are relatively poor at discriminating between vowels that are acoustically/auditorily close to the prototype. Thus prototypes appear to function as perceptual magnets, attracting nearby stimuli to themselves and warping the auditory space. Prototypes differ across languages, as shown by findings for English- and Swedish-speaking adults. Infants show evidence of language-specific prototypes, and accompanying magnet effects, by 6 months.

Kuhl’s Native Language Magnet Model proposes that infants are innately endowed with the ability to discriminate sounds that cross natural auditory boundaries. By 6 months of age, native-language prototypes have been acquired and, over the course of the next 6 months, infants lose their sensitivity to the natural boundaries not used in the ambient language.
