Cross-modal Prediction in Speech Perception


Presentation Transcript


  1. Cross-modal Prediction in Speech Perception. Carolina Sánchez, Agnès Alsius, James T. Enns & Salvador Soto-Faraco. Multisensory Research Group, Universitat Pompeu Fabra, Barcelona

  2. Background. Multisensory integration (MSI): combining visual and auditory information improves speech perception; auditory + visual performance shows an MSI enhancement.

  3. Background • Prediction within one sensory modality • Many levels of information processing • Phonological prediction: “This morning I went to the library and borrowed a … book” (DeLong, 2005; Pickering, 2007) • Visual prediction: visual search (Enns, 2008; Dambacher, 2009) • Sensorimotor prediction: forward model (Wolpert, 1997)

  4. Predictive coding (Pickering, 2007)
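  As a rough illustrative sketch of the general idea only (not the specific model in Pickering, 2007): predictive coding casts perception as minimizing the error between the incoming sensory signal and a top-down prediction,

  \[ \epsilon_t = s_t - \hat{s}_t, \qquad \hat{s}_{t+1} = \hat{s}_t + \alpha\,\epsilon_t \]

  where \(s_t\) is the sensory input at time \(t\), \(\hat{s}_t\) is the current prediction, and \(\alpha\) is a gain on the prediction error. A well-predicted input yields a small error, which is the kind of facilitation the experiments below look for in reaction times.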

  5. Hypothesis • If prediction exists within a single modality, and if predictive coding models can account for prediction at the phonological level, then predictive coding could occur across different sensory modalities too.

  6. Indirect evidence of cross-modal transfer in speech: ERPs • Amplitude reduction • Latency shortening, greater for visually salient syllables (/pa/ high visual saliency, /ka/ low visual saliency) (van Wassenhove, 2005)

  7. Our study • Visual prediction • Auditory prediction • Visual-to-auditory cross-modal prediction • Auditory-to-visual cross-modal prediction

  8. Visual prediction. Task: AV match vs. AV mismatch judgment on the target fragment. Each trial contains a context fragment followed by a target fragment in the visual (V) and auditory (A) streams; the context is either informative visual speech (with visual informative context) or non-speech (without informative context).

  9. Results (reaction times in msec for match and mismatch targets, with visual informative context vs. without informative context; significant differences marked): with previous context, participants respond faster than without it. VISUAL PREDICTION
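  The slides do not specify the statistical test behind the marked differences; as a minimal illustrative sketch only, with hypothetical reaction-time values, a within-participant comparison of the two context conditions could be run like this:

    # Illustrative sketch only: hypothetical per-participant mean reaction times (msec)
    # for the "with informative context" vs. "without informative context" conditions.
    # The test and the numbers are assumptions, not taken from the study.
    from scipy import stats

    rt_with_context = [612, 655, 590, 701, 640, 618, 665, 630]      # hypothetical means
    rt_without_context = [668, 702, 641, 755, 690, 671, 720, 684]   # hypothetical means

    # Paired (within-participant) t-test: is RT faster with an informative context?
    t_stat, p_value = stats.ttest_rel(rt_with_context, rt_without_context)
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")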

  10. Auditory prediction. Task: AV match vs. AV mismatch judgment on the target fragment. Each trial contains a context fragment followed by a target fragment in the visual (V) and auditory (A) streams; the context is either informative auditory speech (with auditory informative context) or non-speech (without informative context).

  11. Results (reaction times in msec for match and mismatch targets, with auditory informative context vs. without informative context; significant differences marked): with previous context, participants respond faster than without it. AUDITORY PREDICTION

  12. Visual vs. Auditory: side-by-side reaction-time plots (msec) for visual prediction and auditory prediction, congruent vs. incongruent targets, with vs. without informative context; the context advantage is significant in both modalities.

  13. Conclusions • Visual prediction • Auditory prediction. Is this prediction cross-modal?

  14. Predictability of Vision-to-Audition: design of the experiment (diagram of the visual and auditory streams with match and mismatch targets in three conditions: unimodal continued, cross-modal continued, and discontinued).

  15. Predictability of Vision-to-Audition: stimuli (mismatch examples for the cross-modal continued, discontinued, and unimodal continued conditions).

  16. Results (reaction times in msec for the cross-modal continued, unimodal continued, and discontinued conditions): participants were faster in the cross-modal condition than in the completely incongruent one. VISUAL-TO-AUDITORY PREDICTION

  17. Predictability of Audition-to-Vision: design of the experiment (diagram of the visual and auditory streams with match and mismatch targets in three conditions: unimodal continued, cross-modal continued, and discontinued).

  18. Results (reaction times in msec for the cross-modal continued, unimodal continued, and discontinued conditions): we did not find any difference between the mismatch conditions. NO AUDITORY-TO-VISUAL PREDICTION

  19. Conclusions • There is some kind of prediction from the visual to the auditory modality • There is no prediction from the auditory to the visual modality • Does this prediction depend on the language?

  20. Results (L1): Spanish participants with Spanish sentences and Canadian participants with English sentences (reaction times in msec for the cross-modal continued, unimodal continued, and discontinued conditions; significant differences in both groups). VISUAL-TO-AUDITORY PREDICTION IN NATIVE LANGUAGE

  21. Results (L1): Spanish participants with Spanish sentences and Canadian participants with English sentences (reaction times in msec for the cross-modal continued, unimodal continued, and discontinued conditions). No differences between the mismatch conditions: no prediction from the auditory to the visual modality in the native language.

  22. Conclusions • There is some kind of prediction from the visual to the auditory modality in L1 • There is no prediction from the auditory to the visual modality in L1 • What happens with an unknown language?

  23. Unknown language, visual to auditory: Canadian participants with Spanish sentences (reaction times in msec for the cross-modal continued, unimodal continued, and discontinued conditions). NO VISUAL-TO-AUDITORY PREDICTION IN THE OTHER LANGUAGE

  24. Unknown language, auditory to visual: Spanish participants with English sentences and Canadian participants with Spanish sentences (reaction times in msec for the cross-modal continued, unimodal continued, and discontinued conditions). No differences between the mismatch conditions: no prediction from the auditory to the visual modality in the other language.

  25. Conclusions • No visual-to-auditory cross-modal prediction in an unknown language… it seems that some level of knowledge about the articulatory phonetics of the language is required to obtain the advantage of predictive coding • No auditory-to-visual cross-modal prediction

  26. General Conclusions • Unimodal prediction: from the visual to the visual modality and from the auditory to the auditory modality • L1: ASYMMETRY • Cross-modal prediction from the visual to the auditory modality • No cross-modal prediction from the auditory to the visual modality • Unknown language: previous knowledge of the language is necessary to make the prediction • No cross-modal prediction from the visual to the auditory modality • No cross-modal prediction from the auditory to the visual modality

  27. Thanks to… • Agnès Alsius, Postdoc, Queen's University • Antonia Najas, MA / Research Assistant, Universitat Pompeu Fabra • Phil Jaekl, Postdoc, Universitat Pompeu Fabra • All the people of the Vision Lab, UBC, Vancouver. Thanks for your attention!!
