What is focal attention for? The What and Why of perceptual selection

What is focal attention for? The What and Why of perceptual selection • The central function of focal attention is to select • We must select because our capacity to process information is limited • We must select because we need to be able to mark certain aspects of a display and to refer to the marked tokens individually • That is what this talk is principally about, but first some background

The functions of focal attention • A central notion is that of “picking out” or selecting. The usual mechanism that is appealed to in explaining perceptual selection is attention (sometimes called focal attention or selective attention). • Why must we select anyway? • We must select because we can’t process all the information available. This is the resource-limitation reason. • But in what way (along what dimensions) is it limited? What happens to what is not selected? The “filter theory” has many problems. • We need to select because certain patterns cannot be computed without first marking certain special elements (e.g., in counting) • We need to select in order to track the identity of individual things (e.g., to solve the correspondence problem) • We need to select because of the way relevant information in the world is packaged. This leads to the Binding Problem (later)

What is selected? • Whatever the reason for selection, the selection must occur early in vision (in the visual module) and prior to conceptualization. • For resource-limitation reasons, selection must occur before the need for major resources • In the case of “marking” or individuating, the empirical facts require that vision pick out and individuate without regard for the conceptual category or properties of the individuals • In the case of property-binding, there are good reasons why selection should be based on individual things (objects) • All these reasons converge on the claim that what is selected is individuals or proto-objects

 Attention and Selection • Early research concentrated on selective attention as a filter. It assumed that we select what can be described in very low-level terms – i.e., in terms of physical “channels” or based on transducer outputs. But the filter idea was shown to be only approximate – because filters always leaked • It is important that the question of selection be placed in the context of a pre-attentive (modular, nonconceptual, cognitively-impenetrable) stage of vision – otherwise in some sense anything can be “selected” (e.g., being edible, being a genuine Rembrandt painting)

Broadbent’s Filter Theory (illustrating the resource-limited account of selection) [Figure: block diagram with the components Senses, Very Short Term Store, Filter, Limited Capacity Channel, Rehearsal loop, Store of conditional probabilities of past events (in LTM), Motor planner, Effectors] Broadbent, D. E. (1958). Perception and Communication. London: Pergamon Press.

Attention and Selection • The question of the basis for selection has been at the bottom of a lot of controversy in vision science. Some options that have been proposed include: • We select what can be described physically (i.e., by “channels”) – i.e., we select based on transducer outputs • e.g., we select by frequency, color, shape, or location • We select according to what is important to us (e.g., affordances), or according to phenomenological salience • We select what we need to treat as special (selection = “marking”) or what we need to refer to • We select aspects (properties) to which we subsequently attach concepts (this idea will be important later) • It is important that the question of selection be placed in the context of a pre-attentive (modular, nonconceptual, cognitively-impenetrable) stage of vision – otherwise in some sense anything can be “selected” (e.g., being edible, being a genuine Rembrandt painting)

What does visual attention select? (What is the basis for selection?) • The most obvious answer to what we select is places. For example, we can select places by moving our eyes so our gaze lands on different places • When places are selected, are they selected automatically? • Must we always move our eyes to change what we attend to? • Studies of Covert Attention-Movement: Posner (1980). • How does attention switch from one place to another? • How does the visual system specify where to move attention to? • If we select places, are there restrictions on those places? e.g., • Must those places be filled or can they be empty places? • Must they be specifiable in relation to landmark objects?

Covert movement of attention Example of an experiment using a cue-validity paradigm for showing that the locus of attention moves without eye movements and for estimating its speed. Posner, M. I. (1980). Orienting of Attention. Quarterly Journal of Experimental Psychology, 32, 3-25.
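The logic of a cue-validity experiment can be sketched as a trial list plus one summary measure. This is only an illustrative sketch: the function names, the two-location (left/right) layout, and the 0.8 proportion of valid cues are assumptions made for the example, not details taken from Posner (1980).

```python
import random
from statistics import mean

def make_trials(n_trials, p_valid=0.8, seed=0):
    """Build a cue-validity trial list (hypothetical left/right layout)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        cue = rng.choice(["left", "right"])
        valid = rng.random() < p_valid  # assumed proportion of valid cues
        # Valid trial: target appears at the cued location.
        # Invalid trial: target appears at the other location.
        other = "right" if cue == "left" else "left"
        target = cue if valid else other
        trials.append({"cue": cue, "target": target, "valid": valid})
    return trials

def validity_effect(trials, rts):
    """Mean RT on invalid trials minus mean RT on valid trials."""
    valid_rts = [rt for t, rt in zip(trials, rts) if t["valid"]]
    invalid_rts = [rt for t, rt in zip(trials, rts) if not t["valid"]]
    return mean(invalid_rts) - mean(valid_rts)
```

A positive validity effect, obtained while the eyes stay fixed, is the kind of evidence taken to show that attention was at the cued location before the target appeared.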

Extension of Posner’s demonstration of attention switch Does the improved detection in intermediate locations entail that the “spotlight of attention” moves continuously through empty space?

But there are empirical reasons why objects are a better basis for attentional selection than location • There is experimental evidence that attention attaches to things rather than places • When attention is exogenously summoned, the appearance of analog movement of focal attention can be explained by a punctate object-based theory of attention-allocation – Sperling & Weichselgartner (1995)

Sperling & Weichselgartner (1995) “Episodic” or Quantal Theory of Attention switching Assumes a quantal “shift” in attention in which the spotlight pointed at location -2 is extinguished and, simultaneously, the spotlight at location +2 is turned on. Because extinction and onset take a measurable amount of time, there is a brief period when the spotlights partially illuminate both locations simultaneously.
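A minimal sketch of the quantal idea follows. It assumes a linear ramp for the transition and a Gaussian spatial profile for each spotlight; both shapes, the parameter values, and all function names are assumptions chosen for illustration, not the functions fitted by Sperling & Weichselgartner (1995). The point it shows is that neither spotlight ever moves: only their relative strengths change over time.

```python
import math

def transition(t, onset=0.0, duration=0.1):
    """Fraction of the new episode that is active at time t (seconds).
    Assumed shape: a linear ramp from 0 to 1 lasting `duration`."""
    return min(1.0, max(0.0, (t - onset) / duration))

def spotlight(x, center, width=1.0):
    """Assumed Gaussian spatial profile of one stationary spotlight."""
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))

def attention(x, t, old=-2.0, new=2.0):
    """Attention at position x and time t during a quantal shift.
    The old spotlight fades out while the new one fades in;
    the centers `old` and `new` stay fixed throughout."""
    g = transition(t)
    return (1 - g) * spotlight(x, old) + g * spotlight(x, new)
```

Halfway through the transition, `attention(-2.0, 0.05)` and `attention(2.0, 0.05)` are equal, which corresponds to the brief period in which both locations are partially illuminated.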

This object-based view of attentional selection is at the heart of FINST theory • I propose that there are good reasons on both experimental and conceptual grounds for supposing that attention attaches itself to objects rather than locations

In what other ways might our visual information capacity be limited? • There are obviously limitations on the input side of vision that depend on the acuity of the sensors and the range of physical properties to which they respond. • But there is a limitation beyond that of acuity: The perceptual system is limited in what it can individuate and how many of these individuals it can deal with at one time. The capacity to individuate is different from the capacity to discriminate. • There are some reasons for thinking that individuating is a distinct process

The increasingly important role played by ‘Objects’ in studies of visual attention • There is a limitation in visual information processing that is beyond the limitation of acuity and of channel capacity: The perceptual system is limited in what it can individuate and how many of these individuals it can deal with at one time. • The capacity to individuate is different from memory capacity and discrimination capacity. • This notion of individuating and of individuals may be related to Miller’s “chunks”, but it has a special role in vision which we will explore in the next lecture • First some reasons why individuating is a distinct process

Visual Indexes (aka FINSTs) • There is evidence that individuating is a special aspect of vision and the capacity to individuate is different from memory capacity and discrimination capacity. • This notion of individuating and of individuals may be related to Miller’s “chunks”, but it has a special role in vision • In vision there appears to be a limit to how many objects (individuals) can be selected and bound to the arguments of cognitive functions at one time. • There is evidence that we can hold on to 4 objects in visual short term memory (Luck & Vogel, 1997). • There is evidence that Objects (i.e., individual things) may be the basic units of visual attention • FINST Theory (to be described later) claims that there is a mechanism for picking out and referring to (pointing to) primitive visual elements independent of any of their properties and that this mechanism is the essential bridge between nonconceptual and conceptual representation.

Pick out 3 dots and keep track of them • In a field of identical elements you can select a number of them and move your attention among them (e.g., “move one up” or “move 2 right”, etc.) so long as at no time do you have to hold on to more than 4 dots

Individuals and patterns • Vision does not recognize patterns by applying templates, since the size, shape, retinal location, orientation, and other properties must be abstracted away • A pattern is encoded over time (and often over saccades), therefore the visual system must keep track of the individual parts and merge descriptions of the same part at different times and stages of encoding • Individuating is a prerequisite for recognition of patterns and configural properties defined among a number of individual parts • An example of how we can easily detect patterns if they are defined over a small enough number of parts is subitizing • In order to recognize a pattern, the visual system must pick out individual parts and bind them to the representation being constructed • Examples include what Ullman called “visual routines” • Another area where the concept of an individual has become important is in cognitive development, where it is clear that babies are sensitive to the numerosity of individual things in a way that is distinct from their perceptual abilities but is limited in its capacity

Are there collinear items (n>3)?

Several objects must be picked out at once in making relational judgments • The same is true for other relational judgments like inside or on-the-same-contour… etc. We must pick out the relevant individual objects first. Respond: Inside-same contour? On-same contour?

Signature subitizing phenomena only appear when objects are automatically individuated and indexed [Figure: plot with two labeled segments, “counting slope” and “subitizing slope”] Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A limited capacity preattentive stage in vision. Psychological Review, 101(1), 80-102.

Encoding conjunctions of properties • Experiments showing the special difficulty that vision has in detecting conjunctions of several properties have provided a basis for understanding an important problem in visual analysis

How are conjunctions of features detected? Read the vertical line of digits in the following display Under these conditions Conjunction Errors are very frequent

Rapid visual search (Treisman) Find the following simple figure in the next slide:

This case is easy – and the time is independent of how many nontargets there are – because there is only one red item. This is called a ‘popout’ search

This case is also easy – and the time is independent of how many nontargets there are – because there is only one right-leaning item. This is also a ‘popout’ search.

Rapid visual search (conjunction) Find the following simple figure in the next slide:

Serial vs parallel search? • Finding an element that differs from all others in a scene by a single feature – called a single-feature search – is fast, error-free and almost independent of how many nontargets there are; • Finding an object that differs from all others by a conjunction of two or more features (and that shares at least one feature with each object in the scene) – called a conjunction search – is usually slow, error-prone, and is worse the more nontargets there are in the scene*. • These results suggest that in order to find a conjunction, which requires solving the binding problem, attention has to be scanned serially to all objects. * This way of putting it simplifies things. Under certain conditions the serial-parallel distinction breaks down.
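The contrast between the two kinds of search can be put as a toy response-time model. This is a sketch under stated assumptions: the conjunction case is modeled as a serial self-terminating scan, which is one standard reading of these results rather than the only one, and the function names and millisecond values are made up for illustration.

```python
def feature_search_rt(set_size, base_ms=400.0):
    """Single-feature ('popout') search: predicted RT does not
    depend on the number of items. base_ms is a made-up value."""
    return base_ms

def conjunction_search_rt(set_size, base_ms=400.0, per_item_ms=50.0):
    """Conjunction search modeled as a serial self-terminating scan
    on target-present trials. If the target is equally likely to be
    at any position in the scan order, the expected number of items
    inspected is (set_size + 1) / 2. Timing values are assumptions."""
    expected_inspections = (set_size + 1) / 2
    return base_ms + per_item_ms * expected_inspections
```

In this model the predicted time for a feature search is flat across set sizes, while the predicted time for a conjunction search grows linearly with the number of items. As the footnote above says, real data do not always divide this cleanly.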

Single-Feature vs Conjunction-feature search

What is attention for? Treisman’s Attention as Glue Hypothesis • The purpose of visual attention is to bind properties together in order to recognize objects • This is called the “binding problem” or the “many properties problem” and it is of considerable interest to philosophers as well as vision scientists • We can recognize not only the presence of “squareness” and “redness” in our field of view, but we can also distinguish between different ways they may be conjoined

The role of attention to location in Treisman’s Feature Integration Theory

The ‘attention-as-glue’ hypothesis has a corollary: In computing conjunctions of properties, attention must be directed primarily at objects since it is objects that have the conjoined properties • Instead of being like a spotlight beam that can be scanned around a scene, and can be zoomed to cover a larger or smaller area, maybe attention can only be directed towards occupied places – i.e., to visual objects

An alternative view of how we solve the binding problem • If we assume that only properties of indexed objects are encoded and stored in Object Files, then properties that belong to the same object are stored in the same Object File, so the binding problem does not arise • This is the Object-Based Attention view exemplified by FINST Theory • The assumption that only properties of indexed objects are encoded raises the problem of what happens to properties of the other (unindexed) objects or unencoded properties in a display • I will return to this conundrum later.
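The reason the binding problem does not arise on this view can be shown with a small data-structure sketch. Everything here is hypothetical and for illustration only: the class names, the method names, and the default limit of 4 indexes are assumptions, and nothing about how FINST Theory is actually implemented in vision is implied.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectFile:
    """Properties encoded for one indexed object."""
    index: int
    properties: dict = field(default_factory=dict)

class IndexedScene:
    def __init__(self, max_indexes=4):
        # Assumed capacity limit; 4 is used only as an example value.
        self.max_indexes = max_indexes
        self.files = {}

    def assign_index(self):
        """Index a new object, or return None if no index is free."""
        if len(self.files) >= self.max_indexes:
            return None
        idx = len(self.files)
        self.files[idx] = ObjectFile(idx)
        return idx

    def encode(self, idx, name, value):
        """Properties are stored only via an index, so they land
        in the file of the object they belong to."""
        self.files[idx].properties[name] = value

    def has_conjunction(self, **props):
        """True if a single object file holds all the given properties."""
        return any(
            all(f.properties.get(k) == v for k, v in props.items())
            for f in self.files.values()
        )
```

With a red square and a green circle encoded in separate files, a query for "red and square" succeeds and a query for "red and circle" fails, with no separate binding step. The sketch also makes the open question concrete: properties of objects that never receive an index are simply not stored anywhere.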

FINST Theory postulates a limited number of pointers in early vision that are elicited by causal events in the visual field and that enable vision to refer to things without doing so under a concept or a description

Evidence for attentional selection based on Objects • Single Object Advantage: pairs of judgments are faster when both apply to the same perceived object • Entire objects acquire enhanced sensitivity from focal attention to a part of the object • Single-Object advantage occurs even with generalized “objects” defined in feature space • Simultanagnosia and hemispatial neglect show object-based effects • Attention moves with Moving Objects • IOR • Object Files • MOT

Single-object superiority even when the shapes are controlled

Attention spreads over perceived objects [Figure: panels labeled “Spreads to B and not C” and “Spreads to C and not B”] Using a priming method, Egly, Driver & Rafal (1994) showed that the effect of a prime spreads to other parts of the same visual object more than to equally distant parts of different objects.

Objecthood endures over time Several studies have shown that what counts as an object (as the same object) endures over time and over changes in location. Certain forms of disappearance in time and changes in location preserve objecthood. This gives what we have been calling a “visual object” a real physical-object character and partly justifies our calling it an “object”.

Inhibition of return appears to be object-based (as well as to some extent location-based) • Inhibition-of-return is thought to help in visual search since it prevents previously visited objects from being revisited • The original study used static objects. Then Tipper, Driver & Weaver (1991) showed that IOR moves with the inhibited object

IOR appears to be object-based (it travels with the object that was attended)

Most studies showed that IOR is object-based (it travels with the object that was attended) • Some studies (Tipper, Weaver, Jerreat, & Burak, 1994) showed that attention can also be location-based, but in those cases the “location” was well marked by visible context cues – so it may be that locations such as “halfway between object X and object Y” can be attended • Clinical studies with patients who have attentional deficits show that their deficit is object-based (illustrated later)

Tracking objects not defined by distinct spatial locations and spatial trajectories Blaser, E., Pylyshyn, Z. W., & Holcombe, A. O. (2000). Tracking an object through feature-space. Nature, 408(Nov 9), 196-199.

There is also evidence from neuropsychology that is consistent with the object-based view • Neglect • Balint and simultanagnosic patients

Visual neglect syndrome is object-based When a right neglect patient is shown a dumbbell that rotates, the patient continues to neglect the object that had been on the right, even though it is now on the left (Behrmann & Tipper, 1999).

Simultanagnosic (Balint Syndrome) patients only attend to one object at a time Simultanagnosic patients cannot judge the relative length of two lines, but they can tell that a figure made by connecting the ends of the lines is not a rectangle but a trapezoid (Holmes & Horrax, 1919).

Balint patients can only attend to one object at a time even if they are overlapping Luria, 1959

The End (for now)

Multiple Object Tracking • One of the clearest cases illustrating object-based attention is Multiple Object Tracking • Keeping track of individual objects in a scene requires a mechanism for individuating, selecting, accessing and tracking the identity of individuals over time • These are the functions we have proposed are carried out by the mechanism of visual indexes (FINSTs) • We have been using a variety of methods for studying visual indexing, including subitizing, subset selection for search, and Multiple Object Tracking (MOT).

Multiple Object Tracking • In a typical experiment, 8 simple identical objects are presented on a screen and 4 of them are briefly distinguished in some visual manner – usually by flashing them on and off. • After these 4 “targets” have been briefly identified, all objects resume their identical appearance and move randomly. The subjects’ task is to keep track of which ones had earlier been designated as targets. • After a period of 5-10 seconds the motion stops and subjects must indicate, using a mouse, which objects were the targets. • People are very good at this task (80%-98% correct). The question is: How do they do it?
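The structure of a trial like the one just described can be sketched in a few lines. This is a simplified illustration under stated assumptions: motion is a clamped random walk in a unit square (real MOT displays typically use smoother trajectories), there is no display or mouse input, and the function names and parameters are hypothetical.

```python
import random

def run_mot_trial(n_objects=8, n_targets=4, n_steps=300, step=0.01, seed=0):
    """Simulate the stimulus side of one MOT trial.
    Returns final positions and the set of target indexes."""
    rng = random.Random(seed)
    # All objects are identical; only their index distinguishes them.
    positions = [[rng.random(), rng.random()] for _ in range(n_objects)]
    # The targets are the objects that would be flashed at the start.
    targets = set(rng.sample(range(n_objects), n_targets))
    for _ in range(n_steps):
        for p in positions:
            # Assumed motion model: random walk clamped to the unit square.
            p[0] = min(1.0, max(0.0, p[0] + rng.uniform(-step, step)))
            p[1] = min(1.0, max(0.0, p[1] + rng.uniform(-step, step)))
    return positions, targets

def score_response(targets, selected):
    """Proportion of targets that the subject correctly selected."""
    return len(set(targets) & set(selected)) / len(targets)
```

Note that once the motion starts, nothing in `positions` marks which objects are targets. The observer has to carry that information, which is why the task bears on how individuals are tracked.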

Keep track of the objects that flash
