Version WS 2007-8. Speech Science XIII. Speech perception is special (accompanying notes in English).
Topics: Speech perception as simple pattern matching? Evidence for and against a "speech mode" of speech perception. A bird's-eye view of the perception landscape.
P.-M., 3.2.2., part 2 + 3.2.3. pp. 162-173
Two-formant synthetic vowels which best match natural vowels (after Carlsson et al. 1975, Fig. 1)
For a 140 Hz fundamental, the same vowels are generally perceived with F1 values 80 Hz lower than for a 280 Hz F0 (after Miller 1953).
Vowels relative to preceding context: /b_t/ test words; formants of carrier relative to test word.
Ladefoged and Broadbent (1957) demonstrated that the size of the speaker producing a carrier phrase (and therefore the values of the speaker's vowel formants) affected the interpretation of the test words at the end of the carrier phrase. (The test words themselves were not produced by different speakers.)
Relation of carrier-phrase formants to test-word formant values (e.g. F1 up = higher carrier-phrase formants, therefore test word heard as less open, i.e. with a lower F1).
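The extrinsic-normalization idea behind the Ladefoged and Broadbent result can be sketched as a toy model: the same test-word F1 is labelled relative to the average F1 of the carrier phrase. All values and the boundary ratio below are invented for illustration, not taken from the 1957 study.

```python
# Toy sketch of extrinsic speaker normalization (illustrative values only):
# the same test-word F1 is judged relative to the mean F1 of the carrier
# phrase, so a higher-formant carrier makes the test word sound less open.

def classify_vowel(test_f1_hz, carrier_f1_values_hz, boundary_ratio=1.15):
    """Label a /b_t/-style test vowel as 'open' (e.g. 'bat') or 'close'
    (e.g. 'bit') by comparing its F1 with the carrier-phrase average.
    boundary_ratio is a made-up threshold for illustration."""
    carrier_mean = sum(carrier_f1_values_hz) / len(carrier_f1_values_hz)
    return "open" if test_f1_hz / carrier_mean > boundary_ratio else "close"

# Identical test word (F1 = 500 Hz), two different carrier speakers:
small_speaker_carrier = [450, 480, 460]   # high formants (short vocal tract)
large_speaker_carrier = [380, 400, 390]   # low formants (long vocal tract)

print(classify_vowel(500, small_speaker_carrier))  # heard as less open: close
print(classify_vowel(500, large_speaker_carrier))  # heard as more open: open
```

The point of the sketch is only that the percept depends on the carrier context, not on the absolute test-word formants.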
… Is this the case?
Formants rarely stay constant for long in a C_C syllabic context.
This could lead to the assumption that isolated vowels with well-defined, steady-state formants should be identified with more certainty.
But Stevens (1968) showed that steady-state isolated vowels are, in fact, less well identified than syllable-context vowels.
The importance of vowel-target info vs. vowel dynamics: Verbrugge & Rakerd (1986) investigated the contribution of the dynamic, movement information vs. the "vowel-defining" target information.
The whole syllable was clearly easiest to recognise (91.7%). But even when the central target section was missing, almost 80% were correctly identified.
We identify acoustically different stimuli as one and the same articulatorily defined speech sound.
We can only discriminate acoustic differences between stimuli that cross category boundaries, although the differences within categories are just as great.
Categorical Perception (figure): x-axis = series of acoustically equidistant stimuli; y-axes = no. of judgements for a category, and discriminability of stimulus pairs.
E.g., 1 is a typical /b/ F2-transition, 8 is a typical /d/ transition and 15 is a typical /g/ transition. Stimuli 2-7 and 9-14 are steps between these typical stimuli.
• Further experiments with many other acoustic properties which come from articulations which are not categorically separable (VOT, /l – r/, vowel categories, etc.) brought about a theoretical modification …
• Categorical perception is "acquired", and the increased distinctiveness between categories is also acquired. The low-sensitivity baseline between the category boundaries can be seen as psychoacoustically normal sensitivity.
• Normal perception in persons with disturbed articulation induced a theoretical fall-back to a position where the link between perception and production was more abstract …
The position was referred to as “the speech mode” of perception. This still made speech perception special.
• Many experiments showed that the functional goal of speech perception made it special:
• Dichotic signals (different parts played into the left and right ear) were heard as one speech sound, but the separate elements were still audible.
• Separate words played into the left and right ear were heard as one word, if the sounds of the two words could combine: e.g. "pay" + "lay" → "play". This was heard even if the /l/ started before the release of the /p/!
• Even more dramatic is the perceptual "switch" which can occur with "sine analogue speech". Some people hear it as strange music until they are asked whether they can understand what is being said. They then hear it as speech (and cannot switch back to the music mode).
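The principle behind sine analogue speech can be sketched in a few lines: each formant track is replaced by a single time-varying sinusoid, so the signal keeps the formant movement but none of the harmonic structure of a voice. The formant tracks below (steady F1 and F3, linearly rising F2) are invented example values, not measured data.

```python
import math

SR = 16000  # sample rate in Hz

def sine_track(freqs_hz):
    """Synthesise one sinusoid whose frequency follows freqs_hz
    (one value per sample), accumulating phase for continuity."""
    phase, out = 0.0, []
    for f in freqs_hz:
        phase += 2 * math.pi * f / SR
        out.append(math.sin(phase))
    return out

n = SR // 2                                       # 0.5 s of signal
f1 = [500.0] * n                                  # steady F1 analogue
f2 = [1200.0 + 600.0 * i / n for i in range(n)]   # rising F2 analogue
f3 = [2500.0] * n                                 # steady F3 analogue

# Mix the three "formant" sinusoids into one signal in the range [-1, 1]:
tracks = [sine_track(f) for f in (f1, f2, f3)]
signal = [(a + b + c) / 3 for a, b, c in zip(*tracks)]
```

Played back, such a three-tone complex sounds like whistles or strange music until the listener switches to hearing it as speech, which is the phenomenon described above.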
• The prime input in speech perception is the acoustic signal, but we can also often see the person who is speaking, and therefore have a subconscious knowledge of the visual information accompanying the acoustics.
• A laboratory mistake led to the discovery that a video clip of a spoken /ga/ together with the acoustic signal of /ba/ is often perceived as /da/. An acoustic /ga/ with a video of /ba/, on the other hand, is heard as /ba/.
• This “McGurk” effect (after the person who discovered it)has since been systematically investigated. It confirms thatwe cannot ignore visual information, but the synchronisationmust be accurate for fusion to take place.
There is an effect of almost 25% in the recognition of a real word compared with a non-word along a stimulus series that has a word or a non-word as its end stimulus:
• There are still many scientists who consider the speech-mode approach too much like "hocus pocus". They concentrate on a more direct relationship between the acoustic signal and the percept.
• Stevens' "quantal theory" of (plosive) perception rests on the fact that /t, d/ tend to have high-frequency energy, /g, k/ middle-frequency energy, and /b, p/ low-frequency energy. Therefore, the same relative acoustic information serves the distinction independent of context.
• "Feature detectors" have been another attempt to link the acoustic signal directly with the linguistic units in a more passive model of speech perception. Animals have high-level neuronal detectors linked to vital functions, so why not humans?