Last section of lectures on visual indexes (aka FINST theory)

Summarizing FINST Theory • A FINST is a primitive reference mechanism that normally references individual visible objects in the world. There are a small number (~4-5) FINSTs available at any one time. • Objects are picked out and referred to without using any encoding of their properties, including their location. Picked out referentially, not attributively. • Referring to objects (or more accurately, being grabbed by objects) is prior to encoding any of their properties! • Indexing is nonconceptual because it does not represent (select, refer to) an individual as a member of some conceptual category.

Summarizing FINST Theory • An important function of FINST indexes is to bind arguments of visual predicates to things in the world. Only predicates with bound arguments can be evaluated. Since predicates are quintessential concepts, an index serves as a bridge from nonconceptual to conceptual representations. • Similarly FINSTs can bind arguments of motor commands, including the command to move focal attention or gaze to the indexed object: • MoveGaze(x) might be a primitive perceptual-motor operation NOTE: Not MoveGaze (x, y, z) which gives spatial coordinates of the gaze target
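The binding idea on this slide can be pictured with a small sketch. It is only an illustration under stated assumptions, not an implementation of FINST theory: the names `Thing`, `Index`, `IndexPool`, `above` and `move_gaze` are hypothetical, the capacity of 4 is picked from the ~4-5 range given earlier, and "larger y means higher up" is an arbitrary convention. The point it tries to show is that a predicate or motor command receives an index, never a set of coordinates.

```python
from dataclasses import dataclass

MAX_INDEXES = 4  # assumption: the lecture says roughly 4-5 are available


@dataclass(eq=False)
class Thing:
    """A visible thing in the world. Its coordinates belong to the world,
    not to any mental representation."""
    x: float
    y: float


class Index:
    """Hypothetical FINST: an opaque link to one thing. It stores no
    coordinates and no other property of its referent."""

    def __init__(self, referent):
        self._referent = referent


class IndexPool:
    """Hands out at most `capacity` indexes at any one time."""

    def __init__(self, capacity=MAX_INDEXES):
        self.capacity = capacity
        self.active = []

    def grab(self, thing):
        """Return a new Index bound to `thing`, or None if none is free."""
        if len(self.active) >= self.capacity:
            return None
        index = Index(thing)
        self.active.append(index)
        return index


def above(a, b):
    """Visual predicate Above(a, b). It can be evaluated only when both
    arguments are bound to things via indexes."""
    if a is None or b is None:
        raise ValueError("Above(a, b) has an unbound argument")
    # The position is read off the world at evaluation time; it was never
    # stored with the index. Larger y is taken to mean higher up.
    return a._referent.y > b._referent.y


def move_gaze(index):
    """MoveGaze(x): takes an index, not coordinates. Returns where the gaze
    lands, which is wherever the referent currently is."""
    return (index._referent.x, index._referent.y)


# Usage: the cup moves, nothing stored in the index changes, and the
# predicate still evaluates against the current state of the world.
pool = IndexPool()
cup, plate = Thing(0.0, 2.0), Thing(0.0, 1.0)
x, y = pool.grab(cup), pool.grab(plate)
print(above(x, y))  # True
cup.y = 0.0
print(above(x, y))  # False
```

In this sketch the index stays bound through the move without any stored location being updated, which is the contrast with a hypothetical MoveGaze(x, y, z) that would need current coordinates as input.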

FINSTs and Object Files are the basic mechanisms that link the world and its conceptualization [Figure labels: Information (causal) link; FINST; Demonstrative reference link] • The only thing in this picture that is conceptual is what’s in the Object Files (unless you count a reference as conceptual) • Object File contents are conceptual!

A note on terminology • A FINST provides a reference to an individual visible ‘thing’ • I sometimes call this referent a FING, by analogy with FINST, and sometimes an object, to conform with usage in psychology • A FINST does not pick out or refer to something as an object, because OBJECT is a concept. So FINGs are nonconceptual. Maybe ‘proto-object’? • I have also called it a pointer, but that erroneously suggests that it points to the location of an object, as opposed to the object itself. In a computer, a pointer is the name of a stored datum. • I have said that a FINST is a visual demonstrative like ‘this’ or ‘that’, but this too is misleading because the reference of a demonstrative depends on the context and intentions of the speaker • I have also noted that a FINST is like a proper name, but that won’t do either since a name can pick out something not in sensory contact whereas a FINST can only refer to a visible item (or one that has been only briefly out of sight).

Illustrating FINST Theory: Multiple Object Tracking

Some Multiple Object Tracking Findings • Basic finding: Most people can track at least 4 targets that move randomly among identical non-target objects (even 5-year-old children can track 3 objects) • We have now accumulated dozens of results that I will list later as they have implications for FINST theory. • How is it done? A first approximation: • Early on we showed that it is unlikely that the tracking is done by keeping a record of the targets’ locations and updating them by serially visiting the objects (Pylyshyn & Storm, 1988) • Other strategies have been proposed (e.g., tracking a single deforming pattern), but they do not explain tracking (see later) • Hypothesis: FINST Indexes get assigned to targets when they flash. At the end of the trial these indexes can be used to move attention to the targets and hence to select them and respond

A way of viewing what goes on in MOT • According to Kahneman & Treisman’s Object File theory, the appearance of a new visual object causes a new Object File to be created. Each object file is associated with its respective object – presumably through a FINST Index. • The object file may contain information about the object to which it is attached. But according to FINST Theory, keeping track of the object’s identity does not require the use of this information. The evidence suggests that in MOT, little or nothing may be stored in the object file except in some special cases (e.g., when the object suddenly changes or disappears). • What makes something the same object over time is that it remains connected to the same object-file by the same FINST. Thus, for vision to treat something as the same enduring individual does not require appeal to properties or concepts.

What role do visual properties play in MOT? • Certain properties may have to be present in order for an object to be indexed, and certain properties (probably different properties) may be required in order for the index to keep track of the object, but this does not mean that such properties are encoded, stored, or used in tracking. • Compare this with Kripke’s distinction between properties that fix the referent of a proper name and the property that the name refers to. The former only plays a role at the name’s initial “baptism.” • Is there something special about location? Do we record and track properties-at-locations? • Location in time & space may be essential for individuating objects, but locations need not be encoded or made cognitively available • The fact that an object is actually at some location or other does not mean that it is represented as such. Representing property ‘P’ (where P happens to be at location L) ≠ Representing property ‘P-at-L’.

Why is this relevant to foundational questions in the philosophy of mind? • According to Quine, Strawson, and most philosophers, you cannot pick out or track individuals without concepts (sortals) • But you also cannot pick out individuals with only concepts • Sooner or later you have to pick out individuals using non-conceptual causal connections between thoughts and things • The present proposal is that FINSTs provide the needed non-conceptual mechanism for individuating objects and for tracking their identity, which works most of the time in our kind of world. It relies on natural constraints (Marr). • FINST indexes provide the right sort of connection for predicating properties of the world by allowing the arguments of predicates to be bound to objects prior to the predicates being evaluated. So they may be the basis for early vocabulary learning.

But there must be some properties that cause indexes to be grabbed! • Of course there are properties that are causally responsible for objects to be individuated, and for indexes to be grabbed. There must also be properties (probably different ones) that make it possible for objects to be tracked; • But a core assumption of FINST theory is that these properties need not be symbolically represented and, even if they are represented, they are not used in tracking. There is an important distinction between object properties that cause individuation and index assignment and those that are represented, placed in Object Files and available to cognition.

Distinguish properties that play a causal role in individuating and indexing objects from properties that are encoded and made available to cognition • Individuating objects is a process that occurs in the encapsulated (modular) stage of vision (“early vision”) • These processes compute over such properties as the distance between features in order to cluster them • These processes also need to have access to other properties in order to compute correspondences • For example, apparent motion is computed within early vision and it must operate over vector distances between objects in order to select the right correspondence match (Dawson & Pylyshyn, 1988) • In MOT, objects are tracked even when they disappear behind occluding surfaces, and tracking works best when they reappear at the same place where they disappeared, so early vision must have a “local memory” (cf. local variables in computer functions)
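The correspondence computation mentioned on this slide can be pictured with a deliberately simple stand-in. The sketch below is not the model of Dawson & Pylyshyn (1988); it assumes the crudest possible rule, namely pairing the tokens of two successive frames so that the summed displacement is minimal, and the function name `match_tokens` is hypothetical. Its purpose is to show the division of labour: coordinates are used inside the function, like local variables, while only the pairing is returned.

```python
from itertools import permutations
from math import dist


def match_tokens(frame_a, frame_b):
    """Pair each token in frame_a with one token in frame_b.

    Returns a tuple m such that frame_a[i] corresponds to frame_b[m[i]].
    Assumption: the pairing with the smallest summed distance wins.
    Brute force over permutations, so only suitable for a few tokens.
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must contain the same number of tokens")
    n = len(frame_a)

    def total_displacement(pairing):
        # Distances are used here, inside the matching process, and are
        # not part of what the function hands back.
        return sum(dist(frame_a[i], frame_b[pairing[i]]) for i in range(n))

    return min(permutations(range(n)), key=total_displacement)


# Usage: two tokens that each move a short distance between frames.
before = [(0.0, 0.0), (10.0, 0.0)]
after = [(9.0, 1.0), (1.0, 1.0)]
print(match_tokens(before, after))  # (1, 0)
```

The caller learns which token continues which, and nothing about where either token is, which is the sense in which a process can depend on a property without that property being made available to cognition.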

Role of target properties in MOT: Evidence that they play little or no part in tracking • Changes of target properties are not recalled or even noticed during MOT (except location of disappearance) • Keeping all targets at different color, size, or shape does not improve tracking. Synchronous vs asynchronous conditions. • Observers do not use target speed or direction in tracking (e.g., by anticipating where the targets will reappear after occlusion). • Observers do appear to retain the targets’ locations at the time they disappeared since if they reappear at the location where they disappeared, tracking is not impaired. Halt vs move • More recent study showed that altering reappearance location or direction by up to 4 diameters or 60 degrees did not impair tracking. <Demo1> <Demo2> <Demo3> <Demo4>

Some open questions • We have arrived at the view that only properties of selected (indexed) objects enter into subsequent conceptualization and perception-based thought (i.e., only information in object files is made available to cognition) • So what happens to the rest of the visual information? • Visual information seems rich and fine-grained while this theory only allows for the properties of 4 or 5 objects to be encoded! {more empirical support for the number 4 than 7±2} • The present view leaves no room for representations whose content corresponds to the content of conscious experience • According to the present view, the only nonconceptual content that representations have is the demonstrative content of indexes that refer to perceptual objects • Question: Why do we need any more than that? {Irwin, 1992}

An intriguing possibility... but a problem with conscious contents • Maybe the theoretically relevant information we take in is less than (or at least different from) what we experience • This possibility has received attention recently with the discovery of various “blindnesses” (e.g., change blindness, inattentional blindness, blindsight…) as well as the discovery of independent vision systems (e.g., recognition <ventral> and motor control <dorsal>) • The qualitative content of conscious experience may not play a role in explanations of cognitive processes {This is a profound mystery!} • The unconceptualized information that enters into causal processes (e.g., motor control) may not be represented or made available to the cognitive mind – not even as a nonconceptual representation • For something to be a representation its content must figure in explanations – it must capture generalizations. It must have truth conditions and therefore allow for misrepresentation. It is an empirical question whether current proposals do (e.g., primal sketch, scenarios). cf. Devitt: Pylyshyn’s Razor

Are we the final authority on our conscious contents? • What changes between flashed displays? • Airplane • Farm • Dinner • Is the failure due to scant information intake or inability to report? Or is it due to the unreliability of our awareness of our own conscious experience? • Dretske claims that we do see and are conscious of everything, but: • “Consciously seeing something does not require noticing it. It does not require awareness of what you are seeing or that you are seeing it.” • “Failure to recognize or notice visible differences—failures so dramatically demonstrated in change blindness—tell us absolutely nothing about what one is conscious of or the nature of conscious experience. [change blindness] teaches that we are not the ultimate authority on our own conscious states”. • Eric Schwitzgebel suggests (in Perplexities of Consciousness): • Most people before 1950 said they dreamed in black-and-white while most after 1960 said they dreamed in color. The author suggests that this is not because of a change in their recall due to color TV but rather “we simply don’t know whether or not we dream in color”! • The main point is simply that when it comes to conscious content there is more going on than we understand, so we’d better remain agnostic.

Vision science has always been deeply ambivalent about the role of conscious experience • But isn’t how things appear one of the things that our theories must explain? No: There is no a priori ‘must explain’! • The content of subjective experience is a major source of evidence. But it may turn out not to be the most reliable source for inferring the relevant functional states. It competes with other types of evidence. The same is true of neuroscience. Every source makes assumptions. • How things appear cannot be taken at face value: it carries many theoretical assumptions and draws on many levels of representation • It was a serious obstacle to early theories of vision (Kepler and the inverted image) • It has been a poor guide in the case of theories of mental imagery (e.g., color mixing, image size, image distances). It is an illusion that we read properties off a mental image: Mental images do not have properties that can be seen!

(Reports of) conscious experience as data • It seems likely that vision science will use evidence of conscious experience the way linguistics uses evidence of grammatical intuitions – only as it is filtered through developing theories.

More issues surrounding visual indexing • Is the limit of 4-5 targets architectural? Does it arise from a more fundamental principle? • Maybe crowding? • Does it draw on a general attentional resource? • Can we distinguish between different stages of MOT such that one is cognitively impenetrable? • Are indexes just the usual well-studied visual focal attention, split into discrete beams? • If arguments in favor of attention-beams are inconclusive, does parsimony weigh in favor of its being attention?

Index capacity and its plasticity • Daphne Bavelier’s lab (Rochester) has shown that videogame players (VGPs) can track (2) more objects in MOT. • Non VGPs can also increase the number tracked after only 9 hrs of practice on certain kinds of (mostly violent) video games • José Rivest (York U) has shown that some athletes can track more targets than non-athletes • There are some conditions under which subjects can track more objects than the typical 4 (or 5 ± 1) • When objects move more slowly • When objects are kept further apart • When half of the objects are in the left visual hemifield and the others are in the right hemifield • When half are in one depth plane and the others are in a different depth plane (shown in stereo).

Index capacity and its plasticity • It has been claimed that the tracking limit is not architectural because it is affected by speed and other variables; but • We have shown that this is because increasing speed increases crowding averaged over time. When crowding is constant, speed is not a factor. • A number of recent studies have shown that the only determinant of the number of targets that can be tracked is the spacing between them: • Franconeri, S., Jonathan, S. J., & Scimeca, J. M. (2010). Tracking Multiple Objects Is Limited Only by Object Spacing, Not by Speed, Time, or Capacity. Psychological Science, 21, 920-925. • Franconeri, S., Lin, J., Pylyshyn, Z., Fisher, B., & Enns, J. (2008). Evidence against a speed limit in multiple-object tracking. Psychonomic Bulletin & Review, 15(4), 802-808.

Tracking without keeping track of target labels • There are many other MOT findings that place constraints on an adequate theory. For example, we have found that if subjects are asked both to track the targets and to recall identifying information initially associated with each target, they are very poor at the identifying task (e.g., Corners, Numbers). • We also showed that this was not due to there being two tasks rather than one: when subjects were given the same two tasks, but not involving the same target objects, they showed no decrement. • This has been cited (by Brian Scholl) as evidence for unmarked attention beams rather than indexes.

Stop here to consider possible ways that you tracked targets in the earlier demos

How must we track targets? Here is a formal specification of the MOT task: An object Xn(t) is a target at time t if, and only if, • Xn(t − Δt) was a target, where Δt may approach zero • Xn(0) was visibly a target • This says that an object is a target if and only if it was a target in the immediately preceding instant.
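The specification above is an inductive definition, and a short sketch may make its structure easier to see. The assumptions are mine: continuous time is replaced by discrete steps (a step standing in for the small interval that may approach zero), and the names `is_target` and `predecessor` are hypothetical. The `predecessor` argument supplies exactly the thing the specification takes for granted, namely which token one step earlier is the same object.

```python
def is_target(token, step, initial_targets, predecessor):
    """Decide target status by following the object back to step 0.

    token           -- a token observed at the given step
    step            -- non-negative integer time step
    initial_targets -- the tokens that were visibly targets at step 0
    predecessor     -- function (token, step) -> the token at step - 1
                       that is the same object (assumed to be given)
    """
    while step > 0:
        # Inductive clause: status now equals status one step earlier.
        token = predecessor(token, step)
        step -= 1
    # Base case: at step 0 the targets were visibly marked.
    return token in initial_targets


# Usage: tokens are relabelled at every step, so only the continuity
# links in `same_object_as` connect them to the original targets.
same_object_as = {
    1: {"p": "a", "q": "b"},
    2: {"r": "q", "s": "p"},
}
lookup = lambda token, step: same_object_as[step][token]
print(is_target("s", 2, {"a"}, lookup))  # True
print(is_target("r", 2, {"a"}, lookup))  # False
```

Nothing in the sketch consults a property of the token; all of the work is done by the continuity link from one instant to the next, which is the part a tracking mechanism has to provide.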

Another way to view the task requirements But there is another way to look at the task, based on the idea that we do not need to keep track of individual objects but only of sets of objects and we can then ‘flush’ the individual objects’ history. • So another task specification is that an object is a target if and only if it is a member of the set of targets, and a set is a target set if and only if it was a target set in the preceding instant. • But how do we determine that a set was a target set, without first determining that individual members of the set were targets? • In certain special cases this may be possible (e.g., if all targets had a particular property in common or were all located in a particular region of the display). But how do we track sets in general? • This question is worth some special consideration because very many (smart) people believe that we track sets and because this belief reveals an important assumption that it shares with imagery theories.

A set-tracking explanation: We do not track individuals but only sets • Version 1: We track an entire set of objects by zooming attention to cover them all. • This alternative assumes that MOT involves no special mechanism; only the usual attention beam. • This attention proposal explains why keeping track of individual identities is poorer than keeping track of whether objects are targets. The single beam is poor at recalling IDs because it has no way to distinguish individual objects! • But this does not explain how we track the individual objects, which we must do in order to keep targets distinct when they pass among the nontargets. As long as the set of targets and the set of nontargets are not spatially separated as a group, a global attention beam will not help with MOT.

A set-tracking explanation • Version 2: We track a single distorting polygon made by imagining that targets are connected by a covering convex polygon (e.g., imagining an elastic band wrapped around the targets). • That way we track only 1 thing rather than 4. (Subjects told to do it that way are reported to do better – Yantis, 1992). • What’s the problem with that proposal? Can you keep track of where the polygon vertices will be without keeping track of where individual target objects will be? • Does this assume that the targets are obligated to remain connected by the ‘mental elastic band’ the way they would in the corresponding physical situation? Do you see a parallel with how mental images behave in a ‘functional space’?
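One way to probe the first question on this slide is to write down what computing the ‘elastic band’ would involve. The sketch below uses a standard convex-hull routine (Andrew's monotone chain); choosing that algorithm, and the names `convex_hull` and `polygon_over_time`, are my assumptions for illustration and are not part of the proposal being discussed. What it shows is that the polygon in each frame is derived from the positions of the individual targets in that frame.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Vertices of the convex polygon covering `points`, in
    counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


def polygon_over_time(frames):
    """One covering polygon per frame. Each frame is a list holding the
    current position of every individual target."""
    return [convex_hull(frame) for frame in frames]


# Usage: four targets; one of them moves between the two frames.
frames = [
    [(0, 0), (4, 0), (4, 4), (0, 4)],
    [(0, 0), (4, 0), (6, 5), (0, 4)],
]
for polygon in polygon_over_time(frames):
    print(polygon)
```

In this sketch the input to every update is the list of individual target positions, so the polygon presupposes the very tracking it was meant to replace; whether a visual system could maintain the polygon some other way is the question the slide leaves open.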

Parallel with mental rotation of images • What must be assumed about the format or architecture of the representation in order for it to support mental rotation? • According to Prinz (2002, p. 118), “If visual-image rotation uses a spatial medium of the kind Kosslyn envisions, then images must traverse intermediate positions when they rotate from one position to another. The propositional [i.e., symbolic] system can be designed to represent intermediate positions during rotation, but that is not obligatory.” • This is a very important observation. But it is incomplete: it needs to answer the question, What makes it obligatory that the object must pass through intermediate positions?

Another set-tracking explanation • The most common proposal is that we track by splitting focal attention into several indistinguishable ‘beams’ (see Scholl, 2007; Cavanagh & Alvarez, 2008). • But this approach abandons one of the main reasons for Indexes – to allow arguments of predicates or motor functions to be bound to individual distal objects so that we can, for example, evaluate Collinear(x,y,z) or Above(x,y) or Inside(x,W). This requires that indexes be distinguished. • This option also assumes that tracking draws from a common resource pool – general attention. This is supported by the observation that tracking can be disrupted by some attention-demanding secondary tasks. • To deal with these we need to distinguish different stages of MOT. • We will have to do that in any case because of the role of nontargets.

Tracking objects not defined by distinct spatial locations and spatial trajectories • Blaser, E., Pylyshyn, Z. W., & Holcombe, A. O. (2000). Tracking an object through feature-space. Nature, 408(Nov 9), 196-199.

Exploring the fine structure of tracking • We have explored some spatiotemporal aspects of indexing and tracking by using probe dots. • The task requires that small probe dots be detected during MOT • A probe-dot detection task was used to ask whether targets are in any way more focally attended. • Notice that while FINST theory claims that indexes are different from attention, and so indexed objects should not be favored in tasks requiring attention, it also predicts that focal attention can be quickly moved to indexed objects if needed. • We found that probe-dot detection was not higher on targets than anywhere else. Rather, detection is low on nontargets and equally high everywhere else. So nontargets alone appear to be inhibited (presumably to keep them from interfering with tracking). • This presents a problem: How can inhibition move with moving objects without the space through which they move being inhibited as well?

Probing targets and nontargets during tracking The suggestion that nontargets are tracked raises a problem for FINST theory. • Is it possible to track very many moving objects in a way that does not allow those objects to be accessed or identified as targets (since only 4-5 targets can be identified in MOT)? Is there some weaker type of tracking? • Might there be several operations involved in tracking where only some of them can be identified with referential access? • There is a hint of a possible answer in our earlier discussion of solving the correspondence problem. It seems that the tracking involved in solving the correspondence problem may not be numerically limited. (cf. Kinetic Depth Effect)* • So maybe we can track many more than 4-5 objects but can only refer to 4 or 5 and thus can only bind 4 or 5 to arguments of predicates or motor commands. Maybe?

Another frequently argued point Must an object be located (its location determined) in order to be indexed? • There is a terminological problem here. There is a sense in which the object is ‘located’ but its location is not stored in the object file and is not conceptualized and available to cognition. • Clearly an object’s location (or at least the location of its contributing parts) is used within the vision module by processes such as figure-ground differentiation, clustering, computing correspondence between objects, etc. But the only information that an index provides to the cognitive mind is the capacity to bind the indexed object to internal symbols.

How can an object be selected without first locating it? • One might still wonder how an index allows some perceptual judgment (e.g. “x above y”), to be made with reference to an indexed object without knowing where it is. • Once again there is a terminological problem here. Being able to refer to an object in a demonstrative manner does not require that the object first be represented as an object or as having some particular property. Once it is referenced, however, it may be assigned properties (properties may be represented by being entered into the object’s Object File). • But many people think that it is just logically or physically impossible to refer to something unless the thing is represented or unless one knows its location. Why?

The problem of referring and the problem of binding • The concept of reference is connected with deep problems in philosophy. What I care about is how mental symbols can connect with things in the world for certain particular purposes without having to appeal to their properties. • I take this to be the problem of how we can refer to objects without doing so “attributively”. But if this connection lacks some of the inherent properties of reference I would still be happy if it helps answer such questions as: how can we judge that x is above y, or that w lies inside the contour z, or that there are n objects in a display and that they are collinear? • For that, the worry about the function of indexes can be reduced to the question: can we pick out and bind certain objects to certain internal symbols without first encoding the coordinates of those objects?

Describing what indexes do • When I talk to philosophers about indexes as a form of singular reference they do not like my use of the term “reference”. • John Campbell has suggested a different way of describing what Indexes do, without raising the problems of reference: “… there are other ways of characterizing what [indexes] do. For example, it seems just as reasonable to describe them as 'epistemic instruments'. That is, they are part of a mechanism one has for finding out about the objects around, and for controlling action on the objects around. I don't (yet) see why the role of visual indexes couldn't be exhaustively described in functional-role terms. Since the role is to find out about and act on objects in the surroundings, that will require causal connections to the objects around. But to say that isn't yet to say why we should be talking about reference here rather than epistemic role. …suppose we prefer referential semantics (as indeed I do)… why should we single out visual indexes as referential devices, rather than as mere tools whose role is exhausted by their functional role?… it may be that all you mean by 'x refers to y' is: 'x is causally connected to y in a way helpful for finding out about and acting on y'.”

Will Campbell’s suggestion do? • I’m not sure whether Campbell’s notion of “epistemic instrument” covers the functions I had in mind for indexes. For example: • I want indexes to bind symbols – the variable arguments in visual predicates and in motor commands – to things in the world, and to remain bound independently of any property changes the things may undergo. Is “binding” a form of epistemic role or is it the same as reference? • I want indexes to help solve the correspondence problem – to specify reliably when two visual tokens correspond to the same distal object (at two different times or on different retinas), at least in our kind of world. • I need indexes to distinguish individuals from their properties – a distinction that is central to recognizing objects and to solving the many-properties or binding problem (something that neural nets can’t do)

Information links and the problem of referring, binding and locating • The minimum we need is an information link between mental symbols and distal objects. The closest natural account there is of such a connection is one provided (or at least suggested) by Fred Dretske – it is an informational account. • An informational link requires a reliable correlation that persists over a wide range of potential (counterfactual) circumstances. • The question of whether one can have an informational link between objects in the mind and objects in the world without first spatially localizing the object in the world then becomes clearer. There are plenty of everyday examples where there is an informational connection (an open information channel) without location encoding: • Magnet and iron objects • Tuning forks • Cell phones

What next? This picture leaves many unanswered questions, but it does provide a mechanism for solving the binding problem and also explaining how mental representations could have a nonconceptual (non-attributive) connection with objects in the world (something required if mental representations are to connect with actions)
