Motor Theory + Signal Detection Theory March 23, 2010
Oh Yeahs. • Nasometer labs due! • Dental vs. alveolar vs. bilabial release bursts. • Examples from Yanyuwa: • Examples from Hindi: • Creating synthetic formant transitions: KlattTalk.
KlattTalk • KlattTalk has since become the standard for formant synthesis. (DECTalk) • http://www.asel.udel.edu/speech/tutorials/synthesis/vowels.html
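The details of the Klatt synthesizer are beyond these slides, but the core idea of formant synthesis, a periodic source filtered through a cascade of resonators centered on the formant frequencies, can be sketched in a few lines. The sample rate, formant frequencies, and bandwidths below are illustrative assumptions, not DECTalk parameters:

```python
import numpy as np

def resonator(x, f, bw, fs):
    """Second-order IIR resonator: boosts energy near frequency f (Hz) with bandwidth bw (Hz)."""
    T = 1.0 / fs
    C = -np.exp(-2 * np.pi * bw * T)
    B = 2 * np.exp(-np.pi * bw * T) * np.cos(2 * np.pi * f * T)
    A = 1.0 - B - C
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = A * x[n]
        if n >= 1:
            y[n] += B * y[n - 1]
        if n >= 2:
            y[n] += C * y[n - 2]
    return y

fs = 10000                                      # sample rate in Hz (assumed)
n = np.arange(int(0.3 * fs))                    # 300 ms of samples
source = (n % (fs // 100) == 0).astype(float)   # ~100 Hz impulse train as a crude glottal source

# Pass the source through a cascade of resonators at illustrative formant
# frequencies and bandwidths for an [a]-like vowel (values are assumptions).
out = source
for formant, bandwidth in [(700, 90), (1200, 100), (2600, 150)]:
    out = resonator(out, formant, bandwidth, fs)
out = out / np.max(np.abs(out))                 # normalize to +/- 1 for playback
```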
Categorical Perception • Categorical perception = • continuous physical distinctions are perceived in discrete categories. • In the in-class perception experiment: • There were 11 different syllable stimuli • They only differed in the locus of their F2 transition • F2 Locus range = 726 - 2217 Hz • Source: http://www.ling.gu.se/~anders/KatPer/Applet/index.eng.html
(Figure: example stimuli from the in-class experiment, showing Stimulus #1, Stimulus #6, and Stimulus #11.)
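For concreteness, the eleven F2 loci can be laid out in a couple of lines of Python. Even spacing on a linear Hz scale is an assumption here, since the slides give only the endpoints (726 and 2217 Hz):

```python
import numpy as np

# 11 stimuli whose F2 transitions begin at evenly spaced loci between
# 726 Hz and 2217 Hz (even linear spacing is assumed; only the endpoints
# appear on the slide).
f2_loci = np.linspace(726, 2217, 11)
for i, locus in enumerate(f2_loci, start=1):
    print(f"Stimulus #{i:2d}: F2 locus = {locus:6.1f} Hz")
```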
Identification • In Categorical Perception: • All stimuli within a category boundary should be labeled the same.
Discrimination • Original task: ABX discrimination • Stimuli across category boundaries should be 100% discriminable. • Stimuli within category boundaries should not be discriminable at all. In practice, categorical perception means: the discrimination function can be determined from the identification function.
Identification Discrimination • Let’s consider a case where the two sounds in a discrimination pair are the same. • Example: the pair is stimulus 3 followed by stimulus 3 • Identification data--Stimulus 3 is identified as: • [b] 95% of the time • [d] 5% of the time • The discrimination pair will be perceived as: • [b] - [b] - .95 * .95 = .9025 • [d] - [d] - .05 * .05 = .0025 • Probability of same response is predicted to be: • (.9025 + .0025) = .905 = 90.5%
Identification Discrimination • Let’s consider a case where the two sounds in a discrimination pair are different. • Example: the pair is stimulus 9 followed by stimulus 11 • Identification data: • Stimulus 9: [d] 80% of the time, [g] 20% of the time • Stimulus 11: [d] 5% of the time, [g] 95% of the time • The discrimination pair will be perceived as: • [d] - [d] - .80 * .05 = .04 • [g] - [g] - .20 * .95 = .19 • Probability of same response is predicted to be: • (.04 + .19) = 23%
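The arithmetic on the two slides above generalizes: the predicted probability of a "same" response for any pair is the sum, over categories, of the product of the two stimuli's identification rates. A minimal sketch using the identification percentages from these examples:

```python
def predicted_same(ident_a, ident_b):
    """Predicted P('same' response) for a pair, from identification data.

    ident_a, ident_b: dicts mapping category label -> identification probability.
    The pair is predicted to sound 'same' when both stimuli are assigned the
    same category, i.e. the sum over categories of p_a(cat) * p_b(cat).
    """
    categories = set(ident_a) | set(ident_b)
    return sum(ident_a.get(c, 0.0) * ident_b.get(c, 0.0) for c in categories)

# Stimulus 3 vs. itself (same pair): 0.95*0.95 + 0.05*0.05 = 0.905
print(predicted_same({"b": 0.95, "d": 0.05}, {"b": 0.95, "d": 0.05}))

# Stimulus 9 vs. stimulus 11 (different pair): 0.80*0.05 + 0.20*0.95 = 0.23
print(predicted_same({"d": 0.80, "g": 0.20}, {"d": 0.05, "g": 0.95}))

# Predicted discriminability for a pair is then 1 - P('same').
```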
Discrimination • In this discrimination graph-- • Solid line is the observed data • Dashed line is the predicted data • (on the basis of the identification scores) Note: the actual listeners did a little bit better than the predictions.
Categorical, Continued • Categorical Perception was also found for stop/glide/vowel distinctions: 10 ms transitions: [b] percept 60 ms transitions: [w] percept 200 ms transitions: [u] percept
Interpretation • Main idea: in categorical perception, the mind translates an acoustic stimulus into a phonemic label (a category). • The acoustic details of the stimulus are discarded in favor of an abstract representation. • A continuous acoustic signal is thus transformed into a series of linguistic units.
The Next Level • Interestingly, categorical perception is not found for non-speech stimuli. • Miyawaki et al. tested perception of an F3 continuum between /r/ and /l/.
The Next Level • They also tested perception of the F3 transitions in isolation. • Listeners did not perceive these transitions categorically.
The Implications • Interpretation: we do not perceive speech in the same way we perceive other sounds. • “Speech is special”… • and the perception of speech is modular. • A module is a special processor in our minds/brains devoted to interpreting a particular kind of environmental stimulus.
Module Characteristics • You can think of a module as a “mental reflex”. • A module of the mind is defined as having the following characteristics: • Domain-specific • Automatic • Fast • Hard-wired in brain • Limited top-down access (you can’t “unperceive”) • Example: the sense of vision operates modularly.
A Modular Mind Model • (Diagram.) Central processes (judgment, imagination, memory, attention) sit above the modules (vision, hearing, touch, speech); the modules receive input from transducers (eyes, ears, skin, etc.), which in turn respond to external, physical reality.
Remember this stuff? • Speech is a “special” kind of sound because it exhibits spectral change over time. • It is therefore processed by the speech module, not by the auditory module.
SWS Findings • The uninitiated either hear sinewave speech as speech or as “whistles”, “chirps”, etc. • Claim: once you hear it as speech, you can’t go back. • The speech module takes precedence • (Limited top-down access) • Analogy: it’s impossible to not perceive real speech as speech. • We can’t hear the individual formants as whistles, chirps, etc. • Motor theory says: we don’t perceive the “sounds”, we perceive the gestures which shape the spectrum.
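Sinewave speech itself is simple to construct: the formants of an utterance are replaced by time-varying sinusoids that follow the formant tracks. A minimal sketch with made-up, schematic tracks (a real sinewave-speech stimulus would use F1-F3 measured from a natural utterance):

```python
import numpy as np

fs = 16000                                   # sample rate in Hz (assumed)
t = np.arange(int(0.5 * fs)) / fs            # 500 ms of samples

# Illustrative formant tracks: F2 glides upward while F1 and F3 stay flat.
f1 = np.full_like(t, 500.0)
f2 = np.linspace(900.0, 1800.0, t.size)
f3 = np.full_like(t, 2500.0)

def tone(freq_track, fs):
    """Sinusoid whose instantaneous frequency follows freq_track (Hz)."""
    phase = 2 * np.pi * np.cumsum(freq_track) / fs
    return np.sin(phase)

# Three "whistles" that, summed, can be heard as speech once you know how to listen.
sws = tone(f1, fs) + tone(f2, fs) + tone(f3, fs)
sws = sws / np.max(np.abs(sws))
```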
McGurk Effect explained • Audio [ba] + Visual [ga] → Perceived [da] • Audio [ga] + Visual [ba] → Perceived [ba] (or the combination [bga]) • Some interesting facts: • The McGurk Effect is exceedingly robust. • Adults show the McGurk Effect more than children. • Americans show the McGurk Effect more than Japanese.
Original McGurk Data • Stimulus: Auditory ba-ba + Visual ga-ga • Response types: Auditory = ba-ba; Visual = ga-ga; Fused = da-da; Combination = gabga, bagba • Responses by age group: • 3-5 years: 19% Auditory, 36% Visual, 81% Fused, 0% Combination • 7-8 years: 36% Auditory, 0% Visual, 64% Fused, 0% Combination • 18-40 years: 2% Auditory, 0% Visual, 98% Fused, 0% Combination
Original McGurk Data • Stimulus: Auditory ga-ga + Visual ba-ba • Response types: Auditory = ga-ga; Visual = ba-ba; Fused = da-da; Combination = gabga, bagba • Responses by age group: • 3-5 years: 57% Auditory, 10% Visual, 0% Fused, 19% Combination • 7-8 years: 36% Auditory, 21% Visual, 11% Fused, 32% Combination • 18-40 years: 11% Auditory, 31% Visual, 0% Fused, 54% Combination
Audio-Visual Sidebar • Visual cues affect the perception of speech in non-mismatched conditions, as well. • Scientific studies of lipreading date back to the early twentieth century. • The original goal: improve the speech perception skills of the hearing-impaired. • Note: visual speech cues often complement audio speech cues • In particular: place of articulation • However, training people to become better lipreaders has proven difficult… • Some people get it; some people don't.
Sumby & Pollack (1954) • First investigated the influence of visual information on the perception of speech by normal-hearing listeners. • Method: • Presented individual word tokens to listeners in noise, with simultaneous visual cues. • Task: identify the spoken word • Noise conditions: clear, +10 dB SNR, +5 dB SNR, 0 dB SNR
Sumby & Pollack data • (Graph: word intelligibility for auditory-only vs. audio-visual presentation across signal-to-noise ratios.) • Visual cues provide an intelligibility boost equivalent to a 12 dB increase in signal-to-noise ratio.
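As a rough sense of scale (this arithmetic is mine, not Sumby & Pollack's analysis): SNR in dB is 10·log10 of the signal-to-noise power ratio, so a 12 dB equivalent gain corresponds to roughly a 16-fold increase in signal power relative to the noise.

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels."""
    return 10 * math.log10(signal_power / noise_power)

print(10 ** (12 / 10))        # ~15.8: the power ratio equivalent to a 12 dB boost
print(snr_db(15.8, 1.0))      # ~12 dB, recovering the figure above
```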
Tadoma Method • Some deaf-blind people learn to perceive speech through the tactile modality, by using the Tadoma method.
Audio-Tactile Perception • Fowler & Dekle: tested ability of (naive) college students to perceive speech through the Tadoma method. • Presented synthetic stops auditorily • Combined with mismatched tactile information: • Ex: audio /ga/ + tactile /ba/ • Also combined with mismatched orthographic information: • Ex: audio /ga/ + orthographic /ba/ • Task: listeners reported what they “heard” • Tactile condition biased listeners more towards “ba” responses
Fowler & Dekle data • (Graphs: “ba” responses in the orthographic mismatch condition (read “ba”) vs. the tactile mismatch condition (felt “ba”).)
fMRI data • Benson et al. (2001) • Non-Speech stimuli = notes, chords, and chord progressions on a piano
fMRI data • Benson et al. (2001) • Difference in activation for natural speech stimuli versus activation for sinewave speech stimuli
Mirror Neurons • In the 1990s, researchers in Italy discovered what they called “mirror neurons” in the brains of macaques. • Macaques had been trained to make grasping motions with their hands. • Researchers recorded the activity of single neurons while the monkeys were making these motions. • Serendipity: • The same neurons fired when the monkeys saw the researchers making grasping motions. • This suggests a neurological link between perception and action. • Motor theory claim: the same links exist in the human brain for the perception of speech gestures.
Motor Theory, in a nutshell • The big idea: • We perceive speech as abstract “gestures”, not sounds. • Evidence: • The perceptual interpretation of speech differs radically from the acoustic organization of speech sounds • Speech perception is multi-modal • Direct (visual, tactile) information about gestures can influence/override indirect (acoustic) speech cues • Limited top-down access to the primary, acoustic elements of speech