Visual speech speeds up the neural processing of auditory speech
van Wassenhove, V., Grant, K. W., & Poeppel, D. (2005). Proceedings of the National Academy of Sciences, 102(4), 1181-1186.

Jaimie Gilbert

Psychology 593

October 6, 2005



Audio-Visual Integration

  • Information from one modality (e.g., visual) can influence the perception of information presented in a different modality (e.g., auditory)

    • Speech in noise

    • McGurk Effect

Demonstration of McGurk Effect

Audiovisual Speech Web-Lab


Arnt Maasø University of Oslo


Unresolved questions about AV integration

  • Behavioral evidence exists for vision altering the perception of speech, but…

  • When does it occur in processing?

  • How does it occur?

ERPs can help answer the “when” question

  • EEG/MEG studies have demonstrated AV integration effects using oddball/mismatch paradigms

    • These effects occur around 150-250 ms

  • A non-speech ERP study with non-ecologically valid stimuli demonstrated earlier interaction effects (40-95 ms) (Giard & Peronnet, 1999)

  • Does AV integration for speech occur earlier than 150-250 ms?

There’s a debate about the “how” question…

  • Enhancement

    • Audio-visual integration generates activity at multi-sensory integration sites, information possibly fed back to sensory cortices

  • VS.

  • Suppression

    • Reduction of stimulus uncertainty by two corresponding sensory stimuli reduces the amount of processing required

The Experiments

  • 3 experiments were conducted

    • Each had behavioral and EEG measures

    • Behavioral: Forced choice task

    • EEG: Auditory P1/N1/P2

  • 26 participants

    • Experiment 1: 16

    • Experiment 2: 10

    • Experiment 3: 10 (of the 16 who participated in Experiment 1)

The Stimuli

  • Audio alone: /pa/, /ta/, /ka/

  • Visual alone: /pa/, /ta/, /ka/

  • Congruent AV: /pa/, /ta/, /ka/

  • Incongruent AV: Audio /pa/ + Visual /ka/

  • One female face and voice for all stimuli

  • In Exp. 1 & 2, each stimulus was presented 100 times, for a total of 1,000 trials

Experiment 1

  • Stimuli presented in blocks of audio, or blocks of visual, or blocks of AV (congruent and incongruent)

  • Participants knew before each block which stimuli were going to be presented

Experiment 2

  • Stimuli presented in randomized blocks containing all stimulus types (A, V, Congruent AV, Incongruent AV) to reduce expectancy

  • Task for both experiments: choose which stimulus was presented; for AV stimuli, choose what was heard while looking at the face
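The two presentation schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the A/V/AV block order, and the use of a single shuffled list for Experiment 2 are hypothetical, since the slides do not give those details.

```python
import random

SYLLABLES = ["pa", "ta", "ka"]


def build_stimuli():
    """Return the 10 stimulus types: 3 A, 3 V, 3 congruent AV, 1 incongruent AV."""
    stimuli = []
    for syllable in SYLLABLES:
        stimuli.append({"modality": "A", "audio": syllable, "visual": None})
        stimuli.append({"modality": "V", "audio": None, "visual": syllable})
        stimuli.append({"modality": "AV", "audio": syllable, "visual": syllable})
    # Incongruent (McGurk) item: audio /pa/ dubbed onto visual /ka/.
    stimuli.append({"modality": "AV", "audio": "pa", "visual": "ka"})
    return stimuli


def blocked_order(stimuli, reps, rng):
    """Experiment 1 style: one block per modality, shuffled within each block.

    The A, V, AV block order is an assumption made for this sketch.
    """
    trials = []
    for modality in ["A", "V", "AV"]:
        block = [s for s in stimuli if s["modality"] == modality] * reps
        rng.shuffle(block)
        trials.extend(block)
    return trials


def randomized_order(stimuli, reps, rng):
    """Experiment 2 style: all stimulus types intermixed (assumed fully shuffled)."""
    trials = stimuli * reps
    rng.shuffle(trials)
    return trials
```

With 10 stimulus types and 100 repetitions each, both schemes yield 1,000 trials, which matches the count given for Exp. 1 & 2.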

Experiment 3

  • Presented 200 Incongruent AV stimuli

  • Task: choose what syllable you saw, neglect what you heard

  • In all experiments, correct response to Incongruent AV = /ta/

Waveform Analysis

  • Retained 75-80% of recordings after Artifact Rejection and Ocular Artifact Reduction

  • Only correct responses were analyzed

  • 6 electrodes used in analysis: FC3, FC4, FCz, CPz, P7, P8

  • Reference electrodes: Linked mastoids
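As a rough illustration of two of the steps listed above, the sketch below re-references an epoch to the average of two mastoid channels and drops epochs whose absolute amplitude exceeds a threshold. The channel names `M1` and `M2`, the 100 µV threshold, and the dictionary layout are assumptions made for the example, not details from the study; ocular artifact reduction is not modeled.

```python
def rereference_to_mastoids(epoch, left="M1", right="M2"):
    """Subtract the mean of the two mastoid channels from every channel.

    `epoch` maps channel name -> list of samples in microvolts.
    """
    reference = [(l + r) / 2.0 for l, r in zip(epoch[left], epoch[right])]
    return {
        channel: [sample - ref for sample, ref in zip(samples, reference)]
        for channel, samples in epoch.items()
    }


def reject_artifacts(epochs, channels, threshold_uv=100.0):
    """Keep only epochs whose absolute amplitude stays within the threshold."""
    kept = []
    for epoch in epochs:
        peak = max(abs(sample) for channel in channels for sample in epoch[channel])
        if peak <= threshold_uv:
            kept.append(epoch)
    return kept


# Toy data: two epochs, the second contaminated by a large deflection.
clean = {"FCz": [1.0, -4.0, 3.0], "M1": [0.5, 0.5, 0.5], "M2": [1.5, 1.5, 1.5]}
noisy = {"FCz": [1.0, 250.0, 3.0], "M1": [0.0, 0.0, 0.0], "M2": [0.0, 0.0, 0.0]}

rereferenced = [rereference_to_mastoids(e) for e in (clean, noisy)]
kept = reject_artifacts(rereferenced, channels=["FCz"])
retained_fraction = len(kept) / len(rereferenced)
```

The retained fraction computed this way is the kind of figure reported on the slide (75-80% of recordings), although the toy data here are not meant to reproduce it.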


  • This study’s answer to “How”

    • Suppression/Deactivation Hypothesis

  • AV N1 & P2 amplitudes were significantly reduced compared to Auditory-alone peaks

  • A separate analysis tested whether summing the responses to the unimodal stimuli would produce the amplitude reduction present in the data. It did not; therefore the AV waveform is not a superposition of the 2 sensory waveforms, but reflects actual multisensory interaction.
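The logic of that comparison can be sketched as follows: if the AV response were a simple superposition, AV minus (A + V) would be near zero at every sample. The waveforms below are made-up numbers chosen only to illustrate the computation, and the function names are hypothetical; the slides do not describe the statistical test actually used.

```python
def summed_unimodal(auditory, visual):
    """Sample-by-sample sum of the auditory-alone and visual-alone responses."""
    return [a + v for a, v in zip(auditory, visual)]


def interaction_residual(audiovisual, auditory, visual):
    """AV - (A + V): what the superposition model fails to explain."""
    model = summed_unimodal(auditory, visual)
    return [av - m for av, m in zip(audiovisual, model)]


def peak_to_peak(waveform):
    """Difference between the most positive and most negative sample."""
    return max(waveform) - min(waveform)


# Made-up sample values (microvolts) around an N1/P2-like complex.
auditory = [0.0, -4.0, -1.0, 5.0, 1.0]
visual = [0.0, -0.5, 0.0, 0.5, 0.0]
audiovisual = [0.0, -3.0, -0.5, 3.5, 0.5]

residual = interaction_residual(audiovisual, auditory, visual)
```

In this toy case the residual is clearly non-zero and the AV peak-to-peak amplitude is smaller than that of the summed unimodal responses, which is the pattern the slide describes.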

Results: Experiment 1

  • N1/P2 Amplitude

    • AV < A (p < .0001)

  • N1/P2 Latency

    • AV < A (significant, but confounded by interaction)

    • Modality x Stimulus Identity

      • P < T < K (p < .0001)

    • Latency effect more pronounced in P2, but can occur as early as N1
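Amplitude and latency measures of this kind are commonly obtained by locating each peak within a latency window. The sketch below does that for synthetic waveforms; the window bounds, the Gaussian peak shapes, and all amplitudes and latencies are assumptions for illustration and are not values from the study.

```python
import math


def gaussian_peak(t, center, amplitude, width=15.0):
    """A single Gaussian-shaped deflection centered at `center` (ms)."""
    return amplitude * math.exp(-((t - center) ** 2) / (2.0 * width ** 2))


def synthetic_erp(times, n1_latency, n1_amp, p2_latency, p2_amp):
    """Toy waveform: a negative N1-like peak followed by a positive P2-like peak."""
    return [
        gaussian_peak(t, n1_latency, n1_amp) + gaussian_peak(t, p2_latency, p2_amp)
        for t in times
    ]


def find_peak(times, waveform, window, polarity):
    """Return (latency, amplitude) of the extreme value inside `window`.

    polarity = -1 picks the most negative sample (N1),
    polarity = +1 picks the most positive sample (P2).
    """
    start, end = window
    candidates = [
        (polarity * value, t)
        for t, value in zip(times, waveform)
        if start <= t <= end
    ]
    best_value, best_time = max(candidates)
    return best_time, polarity * best_value


times = list(range(0, 400))   # 1 ms steps, hypothetical
N1_WINDOW = (50, 150)         # assumed search windows in ms
P2_WINDOW = (150, 250)

a_wave = synthetic_erp(times, 100, -4.0, 200, 5.0)    # auditory alone (made up)
av_wave = synthetic_erp(times, 90, -3.0, 185, 3.5)    # audiovisual (made up)

n1_a = find_peak(times, a_wave, N1_WINDOW, polarity=-1)
n1_av = find_peak(times, av_wave, N1_WINDOW, polarity=-1)
p2_a = find_peak(times, a_wave, P2_WINDOW, polarity=+1)
p2_av = find_peak(times, av_wave, P2_WINDOW, polarity=+1)

n1_latency_shift = n1_av[0] - n1_a[0]   # negative means AV peaks earlier
p2_latency_shift = p2_av[0] - p2_a[0]
```

A negative latency shift corresponds to "AV < A" for latency, and a smaller absolute peak value corresponds to "AV < A" for amplitude.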

Results: Experiment 2

  • N1/P2 Amplitude

    • AV < A (p < .0001)

  • N1/P2 Latency

    • AV < A (p < .0001)

    • Modality x Stimulus Identity (p < .06)

Results: comparison of Exp. 1 & Exp. 2

  • Similar results for Exp. 1 & 2;

  • Temporal facilitation varied by Stimulus Identity but amplitude reduction did not;

  • No evidence for attention effect (i.e., for expectancy affecting waveform morphology)

Temporal facilitation depends on visual saliency/signal redundancy

  • More temporal facilitation is expected to occur if:

    • The audio and the visual signals are redundant

    • The visual cue (which naturally precedes the auditory cue) is more salient

    • (Figure 3)

Results: Experiment 3/Incongruent AV Stimuli

  • Incongruent AV stimuli in Exp. 1 & 2:

    • no temporal facilitation

    • Amplitude reduction present and equivalent to reduction seen for Congruent AV stimuli

  • Experiment 3:

    • Both temporal facilitation and amplitude reduction occurred

Visual speech effects on auditory speech

  • Perceptual ambiguity/salience of visual speech affects processing time of auditory speech

  • Incorporating visual speech with auditory speech reduces the amplitude of N1/P2 “independent of AV congruency, participant’s expectancy, and attended modality” (p. 1184)

Ecologically valid stimuli

  • Suggest that AV speech processing is different from general multisensory integration due to the ecological validity of speech

Possible explanation for amplitude reduction

  • Visemes provide information regarding place of articulation

  • If this information is salient and/or redundant with auditory place of articulation cues (e.g., 2nd and 3rd formants), the auditory cortex does not need to analyze these frequency regions, resulting in fewer firing neurons

Analysis-by-Synthesis Model of AV Speech Perception

  • Visual speech activates internal representation/prediction

  • This representation/prediction is updated as more visual information is received over time

  • Representation/prediction is compared to the incoming auditory signal

  • Residual errors from this matching process are reflected by temporal facilitation and amplitude reduction effects

  • Attended modality can influence temporal facilitation
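One way to make the idea concrete is a toy Bayesian sketch: successive visual frames sharpen a prediction over candidate syllables, and the residual error is the probability mass the prediction did not assign to the syllable that is then heard. This is an illustrative assumption about how such a model could be written down, not the authors' implementation, and the likelihood values are invented.

```python
def normalize(distribution):
    """Scale the values so they sum to 1."""
    total = sum(distribution.values())
    return {key: value / total for key, value in distribution.items()}


def update_prediction(prediction, frame_likelihood):
    """One update step: weight the current prediction by the new visual evidence."""
    return normalize(
        {key: prediction[key] * frame_likelihood[key] for key in prediction}
    )


def residual_error(prediction, heard):
    """Probability mass not assigned to the syllable that was actually heard."""
    return 1.0 - prediction[heard]


uniform = {"pa": 1 / 3, "ta": 1 / 3, "ka": 1 / 3}

# Invented likelihoods: a salient bilabial viseme vs. a more ambiguous velar one.
salient_pa_frame = {"pa": 0.90, "ta": 0.05, "ka": 0.05}
ambiguous_ka_frame = {"pa": 0.10, "ta": 0.40, "ka": 0.50}

prediction_pa = uniform
prediction_ka = uniform
for _ in range(2):  # two successive visual frames (arbitrary choice)
    prediction_pa = update_prediction(prediction_pa, salient_pa_frame)
    prediction_ka = update_prediction(prediction_ka, ambiguous_ka_frame)

residual_pa = residual_error(prediction_pa, heard="pa")
residual_ka = residual_error(prediction_ka, heard="ka")
```

In this toy setup the salient viseme leaves a much smaller residual than the ambiguous one, which mirrors the claim that more salient visual speech leaves less for the auditory system to resolve.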

Suggest 2 time scales for AV integration

  • 1: feature stage

    • 25 ms

    • Latency facilitation

    • (sub-)segmental analysis

  • 2: perceptual unit stage

    • 200 ms

    • Amplitude reduction

    • Syllable level analysis

    • Independent of feature content and attended modality


  • AV speech interaction occurs by the time N1 is elicited (50-100 ms)

  • Processing time of auditory speech varies by the saliency/ambiguity of visual speech

  • Amplitude of AV ERP reduced when compared to amplitude of A-alone ERP


  • Dynamic visual stimulus and ocular artifact

  • If effects of AV integration are influenced by attended modality, would modality dominance also influence these effects?

  • Are incongruent AV/McGurk stimuli ecologically valid?
