An interdisciplinary approach to the problem of visual and auditory attention




The POP project (2006-2009)

Presented by:

Helder Araujo

University of Coimbra

The POP consortium

  • Radu Horaud, INRIA, coordinator

  • Andreas Engel, University of Hamburg

  • Peter König, University of Osnabrück

  • Helder Araujo, University of Coimbra

  • Martin Cooke, University of Sheffield

The overall idea of the project is to study visual and auditory attention and to propose an explicit model that is at the cross-roads of neuroscience and engineering science.

The neuroscience perspective

  • Visual attention is strongly related to awareness and the understanding of visual attention may open a window onto the understanding of consciousness.

  • Visual attention involves both « reflex » and « willed » behaviours. The links between the visual areas, both cortical and sub-cortical, are currently poorly understood.

  • Auditory attention has barely been addressed

The computational perspective

  • Visual and auditory attention systems go far beyond the concept of a simple process that computer scientists could develop as a stand-alone module.

  • It is challenging because it must integrate theories from a large variety of engineering disciplines (control-theory, computer vision, statistics, signal processing, probability theory, etc.) into a working computer programme with its associated hardware components.

To date, there is no common agreement on either a biological or a computational model of attention.

The POP manifesto

  • Work in neurophysiology and psychophysics will allow us to make advances towards the nature of the solution.

  • Work in computer vision, computational auditory analysis, and control theory will suggest AND implement an explicit solution. The computational solution available by the end of the project will be :

    • partially in agreement with neurophysiology and psychophysical findings

    • partially in disagreement because the biological knowledge is (and will still be) incomplete

    • It may predict what to look for in the brain in order to make steps towards an explicit modelling of visual and auditory attention.

The neuro-psychophysical model (1)

  • Visual attention decomposes into two steps: preattentive processing (mostly parallel) followed by attentive processing (serial)

  • One cannot attend (track) two visual events simultaneously

  • There is no particular cortical area whose activity is responsible for attention

  • We should not assume that the higher levels of the brain could be the only ones involved in attention/awareness, or that they are more important than the lower levels.
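The two-step decomposition listed above (a mostly parallel preattentive stage followed by a serial attentive stage) can be illustrated with a toy sketch. This is only one possible engineering reading and not the POP model: the contrast-based saliency measure and the function names `preattentive_saliency` and `attentive_scan` are assumptions made for the example.

```python
def preattentive_saliency(image):
    """Parallel stage (toy assumption): each location is scored independently
    by its absolute contrast with the mean intensity of the image."""
    flat = [v for row in image for v in row]
    mean = sum(flat) / len(flat)
    return [[abs(v - mean) for v in row] for row in image]


def attentive_scan(saliency, n_fixations):
    """Serial stage: attend to one location at a time, most salient first,
    and suppress each visited location so it is not selected again."""
    sal = [row[:] for row in saliency]
    positions = [(y, x) for y in range(len(sal)) for x in range(len(sal[0]))]
    fixations = []
    for _ in range(min(n_fixations, len(positions))):
        y, x = max(positions, key=lambda p: sal[p[0]][p[1]])
        fixations.append((y, x))
        sal[y][x] = float("-inf")  # visited: never selected again
    return fixations


image = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 5]]
fixations = attentive_scan(preattentive_saliency(image), 2)
```

The serial stage returns a single location per step, which mirrors the claim that two visual events are not attended simultaneously.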

The neuro-psychophysical model (2)

  • Humans routinely perform 2D to 3D conversion. One possibility to perform this conversion is through binocular (stereo) vision. The link between attention and binocular vision has been overlooked, and there is almost nothing out there in the neuroscience literature (Rolls and Deco book, 2004, 550 pages, 4 chapters on visual attention, 2 pages on depth perception).

  • As with visual attention, there is no cortical area that is particularly specialized in binocular vision.

  • The links between sub-cortical and cortical areas and their relevance to binocular vision have been ignored

The neuro-psychophysical model (3)

  • Such a « global » view of the problem of visual attention raises the question of « binding » or how the various contributions of the numerous visual areas are correlated together to provide a « behaviour ».

  • In particular, we have to address the problem of transient binding, which must be able to combine visual features in an infinite variety of ways and to provide, within a short time interval, an organized view of the world.

  • This transient binding is relevant for visual attention in general, and for depth perception in particular and it can be viewed as a form of 3D visual awareness.

Computational and algorithmic vision

  • Explicit modelling that takes images as input and produces image descriptions as output.

  • Based on D. Marr’s explicit model that distinguishes three representation levels: 2D, 2½D, and 3D. A number of vision modules build representations from low-level to high-level.

  • These vision modules are studied as independent/separate processes (stereopsis, structure-from-motion, shape-from-shading, object recognition/categorization, etc.)

  • Some of these models rely on well-grounded mathematical theories, and/or are biologically plausible.

  • Models that propagate knowledge top-down are more difficult because of the inherent computational complexity.

Auditory scene analysis

  • Understanding a sound (speech is a particular form of sound) in the presence of competing sounds

  • It is common to talk about speech recognition and not about sound recognition – the vast majority of work has concentrated on recognition of speech in the absence of noise

  • The problem of active hearing, which is to recognise sounds in a dynamic environment (changing number of sources, moving source, moving receiver, etc.), has barely been addressed

Auditory analysis: The key concept


[Diagram: acoustic sources, acoustic mixture, auditory streams]

The POP approach

Cross-fertilization between neuro- and engineering sciences

  • Casting neuro-anatomical and neuro-physiological findings into a computational model is difficult and challenging.

  • The computational model may take some « liberty » with respect to neuro-biology: it has its own (mathematical and technological) constraints, and it may predict the existence of behaviours where the neuro-biological knowledge is still incomplete.

  • This is the same kind of collaboration that exists between theoretical and experimental physics, where the former may predict what should be looked at by the latter, and where empirical observations call for a thorough theoretical model.

  • Active stereo will be studied in detail: it needs both low-level and high-level processing, implies strong links between reflex and willed attention, and may correlate with auditory cues.

Neurophysiology and neuropsychophysics

  • We will build our approach on the temporal binding model : Neural synchrony with a precision up to the millisecond is crucial for response selection, attention, and sensory-motor integration.

  • A crucial ingredient of the model is that neural synchrony can be intrinsically generated (intrinsic signals that reflect experience, action goals, etc.) and not imposed on the system by external stimuli.

Temporal binding via correlated firing

  • It “binds” neurons transiently into functionally coherent assemblies

  • This temporal binding occurs within a few milliseconds (10-15 times faster than the firing rate of a single neuron)

  • Relevant for perceptual integration, attention, memory formation, awareness
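Correlated firing at millisecond precision, as described above, can be quantified in many ways. The sketch below uses a simple coincidence count between two spike trains; the 2 ms window, the normalisation, and the function names `coincidences` and `synchrony_index` are illustrative assumptions, not measures taken from the POP project.

```python
def coincidences(spikes_a, spikes_b, window_ms=2.0):
    """Count the spikes of train A that have a spike of train B within
    +/- window_ms. Spike times are given in milliseconds."""
    b = sorted(spikes_b)
    j = 0
    count = 0
    for t in sorted(spikes_a):
        # Skip spikes of B that are too early to match t or any later spike.
        while j < len(b) and b[j] < t - window_ms:
            j += 1
        if j < len(b) and abs(b[j] - t) <= window_ms:
            count += 1
    return count


def synchrony_index(spikes_a, spikes_b, window_ms=2.0):
    """Symmetric fraction of spikes that take part in a coincidence (0 to 1)."""
    total = len(spikes_a) + len(spikes_b)
    if total == 0:
        return 0.0
    matched = (coincidences(spikes_a, spikes_b, window_ms)
               + coincidences(spikes_b, spikes_a, window_ms))
    return matched / total


train_a = [10.0, 50.0, 90.0]
train_b = [11.0, 52.5, 200.0]
index = synchrony_index(train_a, train_b)
```

An index close to 1 means the two neurons fire almost always together within the window, which is the signature a temporal binding account would look for in an assembly.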

Experimental validation

This model needs further experimental validation with emphasis on:

  • The correlation between overt attention (visual attention combined with eye movements) and depth cues such as binocular disparity.

  • The correlation (cross-modal integration) between visual and auditory cues.

  • Experiments combining eye-tracking and EEG, fMRI measurements to identify brain regions involved in multisensory control of attention.

Computational validation

  • These models and experiments must then be cast into a computational model of neural synchrony.

  • We will address this computational modelling at an « abstract » level, i.e., not modelling exactly the interactions between neurons, but modelling a visual task, such as active estimation of depth cues.

Active binocular depth perception

  • Build an observer-centered 3D representation of the world

  • Adopt a visual attention paradigm

  • A framework that gathers reflex camera motions, willed camera motions, and computation of a dense depth map.

  • Explicit modelling of geometry, kinematics, and dynamics of eye movements
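As a small illustration of the geometric part of such a framework, the sketch below relates the vergence angle of two cameras (or eyes) to the distance of the fixated point. It assumes symmetric fixation straight ahead and a known baseline; the function names and the 6.5 cm baseline are assumptions made for the example, and kinematics and dynamics are not covered.

```python
import math


def vergence_angle(baseline_m, distance_m):
    """Angle (radians) between the two optical axes when both cameras fixate
    a point straight ahead at distance_m (symmetric vergence assumed)."""
    return 2.0 * math.atan(baseline_m / (2.0 * distance_m))


def fixation_distance(baseline_m, vergence_rad):
    """Inverse relation: distance of the fixated point for a given vergence."""
    return baseline_m / (2.0 * math.tan(vergence_rad / 2.0))


baseline = 0.065  # metres; illustrative value, roughly a human interocular distance
near = vergence_angle(baseline, 0.5)
far = vergence_angle(baseline, 5.0)
recovered = fixation_distance(baseline, vergence_angle(baseline, 1.5))
```

The vergence angle shrinks quickly with distance, so under this model vergence alone is informative mainly for near targets.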

The statistical framework

  • The depth at each visual-field location and the depth discontinuity between two adjacent visual-field locations will be viewed as hidden variables in an MRF (Markov Random Field) model.

  • The « energy » associated with such a model is analogous to the energy of a physical system of interacting spins.

  • We will use statistical optimization methods that attempt to find the global minimum of such a system. There is a strong analogy between this kind of modelling and neural computation.
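To make the energy view concrete, here is a minimal sketch of a depth-label MRF on a small grid, minimised by simulated annealing. Everything specific in it is an assumption for illustration: the squared data term, the truncated smoothness term (which stands in for explicit discontinuity variables), the cooling schedule, and the names `total_energy` and `anneal`. It is not the optimisation method used in the project.

```python
import math
import random


def pair_cost(a, b, lam, trunc):
    """Smoothness cost between neighbouring labels, truncated so that a large
    jump (a depth discontinuity) is not penalised without bound."""
    return lam * min(abs(a - b), trunc)


def total_energy(labels, observed, lam=1.0, trunc=2):
    """Data term at every site plus smoothness term over 4-neighbour pairs."""
    h, w = len(labels), len(labels[0])
    e = 0.0
    for y in range(h):
        for x in range(w):
            e += (labels[y][x] - observed[y][x]) ** 2
            if x + 1 < w:
                e += pair_cost(labels[y][x], labels[y][x + 1], lam, trunc)
            if y + 1 < h:
                e += pair_cost(labels[y][x], labels[y + 1][x], lam, trunc)
    return e


def site_energy(labels, observed, y, x, d, lam, trunc):
    """Part of the energy that depends on site (y, x) taking label d."""
    h, w = len(labels), len(labels[0])
    e = (d - observed[y][x]) ** 2
    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
        if 0 <= ny < h and 0 <= nx < w:
            e += pair_cost(d, labels[ny][nx], lam, trunc)
    return e


def anneal(observed, n_labels, lam=1.0, trunc=2,
           sweeps=200, t0=4.0, cooling=0.97, seed=0):
    """Metropolis updates with a decreasing temperature; returns the lowest
    energy labelling seen at the end of any sweep, and its energy."""
    rng = random.Random(seed)
    h, w = len(observed), len(observed[0])
    labels = [row[:] for row in observed]
    current = total_energy(labels, observed, lam, trunc)
    best, best_e = [row[:] for row in labels], current
    t = t0
    for _ in range(sweeps):
        for y in range(h):
            for x in range(w):
                cand = rng.randrange(n_labels)
                delta = (site_energy(labels, observed, y, x, cand, lam, trunc)
                         - site_energy(labels, observed, y, x, labels[y][x], lam, trunc))
                if delta <= 0 or rng.random() < math.exp(-delta / t):
                    labels[y][x] = cand
                    current += delta
        if current < best_e:
            best, best_e = [row[:] for row in labels], current
        t *= cooling
    return best, best_e


noisy = [[1, 1, 1, 5, 5],
         [1, 4, 1, 5, 5],
         [1, 1, 1, 5, 0],
         [1, 1, 1, 5, 5]]
smoothed, smoothed_energy = anneal(noisy, n_labels=6)
```

At high temperature the sampler accepts uphill moves and can escape local minima; as the temperature drops it behaves like a greedy local search. This is the sense in which such methods attempt, without guaranteeing, to reach the global minimum.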

Computational auditory analysis

We will address the problem of analysing complex auditory stimuli (several audio sources, some of them moving around) with a classical temporal analysis approach (hidden Markov chains) and a novel spatial analysis approach (binaural audition).
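One basic cue available to binaural audition is the delay between the signals reaching the two ears. The sketch below estimates that delay by cross-correlation and converts it to a direction with a simple far-field model. The function names, the 18 cm ear distance and the far-field formula are assumptions for illustration; the slides do not specify how the spatial analysis is carried out.

```python
import math


def cross_correlation(left, right, lag):
    """Sum of left[n] * right[n + lag] over the samples where both exist."""
    total = 0.0
    for n in range(len(left)):
        m = n + lag
        if 0 <= m < len(right):
            total += left[n] * right[m]
    return total


def estimate_delay(left, right, max_lag):
    """Lag in samples (positive when the right channel trails the left)
    that maximises the cross-correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlation(left, right, lag))


def azimuth_from_delay(lag, sample_rate_hz, ear_distance_m=0.18,
                       speed_of_sound=343.0):
    """Far-field approximation: sin(azimuth) = c * delay / ear distance."""
    s = speed_of_sound * (lag / sample_rate_hz) / ear_distance_m
    s = max(-1.0, min(1.0, s))  # clamp against noise or too large a lag
    return math.asin(s)


left = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]  # same pulse, three samples later
lag = estimate_delay(left, right, max_lag=5)
angle = azimuth_from_delay(lag, sample_rate_hz=44100)
```

With several simultaneous or moving sources, a single correlation peak is no longer sufficient, which is where a temporal model such as a hidden Markov chain would come in.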

Cross-modal integration

The integration of vision (depth maps) and audition (auditory maps) will be investigated within a statistical framework similar to the one just described.
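As a much simpler stand-in for the statistical framework mentioned here, the sketch below fuses one visual and one auditory estimate of the same quantity (say, the direction of a source) by weighting each with its reliability. The Gaussian assumption and the function name `fuse_cues` are illustrative; the project's own formulation operates on depth maps and auditory maps rather than on single values.

```python
def fuse_cues(mu_visual, var_visual, mu_audio, var_audio):
    """Combine two independent Gaussian estimates of one quantity.
    Each cue is weighted by its precision (the inverse of its variance)."""
    w_visual = 1.0 / var_visual
    w_audio = 1.0 / var_audio
    var = 1.0 / (w_visual + w_audio)
    mu = var * (w_visual * mu_visual + w_audio * mu_audio)
    return mu, var


# Vision is more reliable than audition in this made-up example.
mu, var = fuse_cues(mu_visual=0.0, var_visual=1.0, mu_audio=10.0, var_audio=4.0)
```

The fused estimate lies closer to the more reliable cue, and its variance is smaller than that of either cue alone.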

Modelling binocular eye motions

Binocular tracking

Stereo Vision

Depth perception

Eye movements + Depth perception = Active stereopsis

  • Eye movements: saccades, smooth pursuit, vergence (3-4 movements/sec)

  • Depth perception: MRF based algorithm

  • Active stereopsis will be viewed within a visual attention paradigm: binding low-level reflex behaviours with willed behaviours

  • Correlated firing will be the underlying physiological model

Thank you
