Solving the Mind-World Connection: Understanding Perception

How does the mind connect with the world, and how does perception pick out unique individual things (tokens)?

Some topics we will cover and some terminology…
How perception connects with the world
• Coordinating “noticings” over time – one form of the Correspondence Problem (arises because perceptual representations are built incrementally over time)
• Coordinating across modalities (esp. vision and control)
• Coordinating conjunctions of properties – the Binding Problem (also the many properties problem or the qualitative bundling problem)
• All these are instances of a very general problem: the inadequacy of “satisfaction” as the sole relation between representations and what they represent – in John Perry’s terms, there is an ineliminable need for a special sort of picking out or demonstrative reference

Some background ….

Setting out the problem
• The basic assumption of cognitive science is that in order to explain/predict people’s behavior we need to appeal to what people believe and desire and to how they perceive the world around them – to the content of their mental representations, as well as to how they draw inferences from these representations.
• While these sorts of contents are necessary, they are not sufficient. We also need to appeal to a special sort of nonconceptual content that is related to the world not by the semantic relation of satisfaction, such as holds between a description and what it describes, but by a nonconceptual relation, such as holds between a demonstrative like “this” or “that” and its referent or between a name and its referent. Such a relation simply picks out the referent, but does not describe it nor refer to it under some conceptual category.

Setting out the problem
• The mind-world relation I will be discussing involves picking out individuals without using an encoding of any of their properties and without representing the individuals as falling under some conceptual category – it is therefore a nonconceptual relation. <Are these individuals what have been referred to as Objects?>
• I will be describing empirical evidence for the existence of a mechanism, called a Visual Index or FINST, that instantiates this relation.
But first: Why do we need such a relation and why do we need nonconceptual contents?

An example from personal experience
• Back in the 1970s a computer science colleague and I set ourselves the overly ambitious goal of developing a computer system that would reason about geometry by actually drawing a diagram and noticing adventitious properties of the diagram, from which it would conjecture lemmas to prove.
• We wanted the system to be as psychologically realistic as possible, so we assumed that it had a narrow field of view and noticed only limited, spatially restricted information as it examined the drawing.
• This immediately raised the problem of coordinating noticings and led us to the idea of visual indexes to keep track of previously encoded parts of the diagram.

Begin by drawing a line….

Now draw a second line….

And draw a third line….

Notice what you have so far… (noticings are local – you encode what you attend to)
There is an intersection of two lines… But which two of the lines you drew are they?
There is no way to indicate which individual things are seen again without a way to refer to individual (token) things.

Look around some more to see what is there …. Here is another intersection of two lines… Is it the same intersection as the one seen earlier? Without a special way to keep track of individuals the only way to tell would be to encode unique properties of each of the lines. Which properties should you encode?

Can we keep track of previous ‘noticings’ by encoding unique properties of individual items?
No description can pick out a unique individual when things in a scene are changing or when the representation itself is changing for any reason (how about rapid updating?).
But a visual representation is always changing, since it is always built up over time as properties are ‘noticed’.
Whether or not anything is changing, we need a way to refer to an individual qua individual (as in “it’s a bird, it’s a plane, no, it’s Superman!”).
One common way of doing this is by using direction of gaze (equivalent to the deictic reference “what I am looking at now”), but we can also pick out individuals independent of where we are looking, by using focal attention.
An observer can also pick out several individual tokens even if they are in a field of identical tokens – e.g., pick out a dot in a uniform field of identical dots.

Keeping track by encoding unique properties of individual items will not work in general
We need a mind-to-world connection that is more like that provided by a demonstrative or proper name than like that provided by a (conceptual) description.
But unlike proper names, this mechanism is only available while the referent is in view, and unlike demonstratives in language, the mechanism is part of the wired-in architecture and does not depend on the intentions of a user. (It is primarily data-driven.)
This function is very like that of a pointer or local variable in a computer program – it allows access without explicitly encoding any of the referent’s properties and may only be available inside the scope of an active function (at “run time”). (But this variable binding is interrupt-driven, as in production systems.)
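The pointer analogy can be made concrete with a small sketch. It is only an illustration of the contrast between picking out by description and picking out by reference, not a model of the FINST mechanism; the names `Dot` and `find_by_description` are hypothetical, and a Python object reference stands in for an index.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: dots are compared by identity, not by property values
class Dot:
    x: float
    y: float
    color: str = "black"


def find_by_description(scene, **props):
    """Return every item in the scene that satisfies the stored description."""
    return [d for d in scene
            if all(getattr(d, k) == v for k, v in props.items())]


# A field of three dots that differ only in location.
scene = [Dot(0, 0), Dot(1, 0), Dot(2, 0)]

# Picking out by reference: 'index' works like a pointer and encodes no properties.
index = scene[1]

# Picking out by description: store the properties noticed at encoding time.
description = {"x": 1, "y": 0, "color": "black"}
assert find_by_description(scene, **description) == [index]

# The scene changes: the indexed dot moves and changes color.
index.x, index.y, index.color = 5, 3, "red"

# The stored description no longer picks out anything ...
matches_after = find_by_description(scene, **description)  # []
# ... but the index still refers to the same individual.
still_same = index is scene[1]                             # True
```

The description fails as soon as the encoded properties change, whereas the reference is unaffected because it never depended on them.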

Descriptions and Visual Demonstratives bear a very different relation to their referents
• The sort of relation that a demonstrative bears to its referent is indispensable if thoughts are to connect with actions
• John Perry has written about the indispensable nature of all indexicals*, but the case of what I have been calling visual demonstratives is even more compelling.
*Perry, J. (1979). The problem of the essential indexical. Noûs, 13, 3-21.

The difference between a description and a demonstrative (or direct) reference, and the indispensability of the latter, is illustrated by this example from John Perry’s Essential Indexical.
• The author of the book Hiker’s Guide to the Desolation Wilderness stands in the wilderness beside Gilmore Lake, looking at the Mt. Tallac trail as it leaves the lake and climbs the mountain. He desires to leave the wilderness. He believes that the best way out from Gilmore Lake is to follow the Mt. Tallac trail up the mountain … But he doesn’t move. He is lost. He is not sure whether he is standing beside Gilmore Lake, looking at Mt. Tallac, or beside Clyde Lake, looking at the Maggie peaks. Then he begins to move along the Mt. Tallac trail. If asked, he would have to explain the crucial change in his beliefs in this way: “I came to believe that this is the Mt. Tallac trail and that is Gilmore Lake”. (Perry, 1979, p. 4)
• The person in this story recognized the identity of something that was being referred to in two different ways – by a description and by direct selection, expressed by the demonstrative “this”. These are two very different ways of picking something out.

Another example of why descriptions will not work in general and why you need demonstrative reference

Footnote about the geometry example:
• Notice that in our geometry example, it would not eliminate the need for a nonconceptual index if you labeled parts of the diagram as you drew them. Why not?
• Because to refer to the line with label L1 you would have to be able to think “This is line L1” and you could not think that unless you had a mechanism for picking out this.
• Being able to think “this” is another way to view the very problem for which indexes are postulated. You still need a mechanism for picking out and referring to an individual element qua individual, even if it is labeled!
• That is the point of John Perry’s claim about the “essential indexical”: In order to act on what you see, you need to bridge the gap from a reference (description or name) to an individual token thing, and this bridge is not conceptual.

Different types of mind-world relations
Two distinct types of mind-world connections
• The nonconceptual connection: cause (selection)
• The semantic connection: satisfaction (reference)
The problem of how we make the transition from physical cause to meaning/reference is one of the great mysteries of mind (Brentano’s Problem).
• I address a (very small) issue related to that problem by suggesting that perception must be able to preconceptually pick out individuals (i.e., without using concepts) and that the mechanism for doing this, the Visual Index or FINST, provides a first step in the mind-world relation.

Why do we need to be able to pick out individuals without concepts?
• We need to make nonconceptual contact with the world through perception in order to stop the regress of concepts being defined in terms of other concepts which are defined in terms of still other concepts… This is known as the Grounding Problem. (For more on this see Fodor’s 1998 book Concepts or his paper Revenge of the Given.)
• The question of where to stop has received different answers from different philosophical schools. But sense data (sensory transduction) by itself will not work because most concepts cannot be reduced to sense data, since they are not about how things look.
• Our candidate is individuals as the forerunner of conceptualization and predication, and Picking Out as the basic operation that brings these individuals into contact with cognition.
• Is individual = object?

The requirements for picking out individual things and keeping track of them reminded me of an early comic book character called “Plastic Man”

Imagine being able to place several of your fingers on things in the world without being able to detect their properties in this way, but being able to refer to those things so you could move your gaze or attention to them. If you could, you would possess FINgers of INSTantiation (FINSTs)!

FINST Theory postulates a limited number of pointers in early vision that are elicited by causal events in the visual field and that enable vision to refer to things without doing so under a concept or a description.

Demonstrating FINSTs with Multiple Object Tracking (MOT)
MOT has now been used in dozens of laboratories in many countries and in many different variants. A great deal is known about the conditions under which tracking is possible, and many counterintuitive findings have been demonstrated, many of which raise issues of interest to philosophy – but most of these have to be left for another occasion. Time!

Demonstrating the function of FINSTs with Multiple Object Tracking (MOT)
• In a typical experiment, 8 simple identical objects are presented on a screen and 4 of them are briefly distinguished in some visual manner – usually by flashing them on and off.
• After these 4 targets are briefly identified, all objects resume their identical appearance and move randomly. The observers’ task is to keep track of the ones designated as targets.
• After a period of 5-10 seconds the motion stops and subjects must indicate, using a mouse, which objects were the targets.
• People are very good at this task (85%-98% correct). The question is: How do they do it?
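The trial structure just described can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the software used in any actual experiment: the function names (`run_mot_trial`, `score`), the display size, the step size, and the use of independent random walks are all illustrative choices.

```python
import random


def run_mot_trial(n_objects=8, n_targets=4, n_steps=300, step_size=2.0,
                  size=400.0, seed=0):
    """Simulate one trial: cue the targets, move all objects, return the final state."""
    rng = random.Random(seed)
    positions = [[rng.uniform(0, size), rng.uniform(0, size)]
                 for _ in range(n_objects)]
    # The subset that is briefly "flashed" at the start of the trial.
    targets = set(rng.sample(range(n_objects), n_targets))
    for _ in range(n_steps):
        for p in positions:
            # Independent random walk, clamped to the display area.
            p[0] = min(size, max(0.0, p[0] + rng.uniform(-step_size, step_size)))
            p[1] = min(size, max(0.0, p[1] + rng.uniform(-step_size, step_size)))
    return positions, targets


def score(response, targets):
    """Proportion of the designated targets that the observer selected."""
    return len(set(response) & targets) / len(targets)


positions, targets = run_mot_trial()
perfect = score(targets, targets)  # 1.0: every target selected
```

In the simulation, each object's position in the list keeps its identity fixed while its location changes. That bookkeeping is exactly what the observer does not get for free, since all objects look alike once the cue is removed.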

Keep track of the objects that flash

How do we do it? What properties of individual objects do we use?

Another example: Self occlusion

Self occlusion does not seriously impair tracking: This has made it easier to design certain experiments where the trajectory patterns need to be independent.

Going behind occluding surfaces does not disrupt tracking

Not all well-defined features can be tracked:
Track endpoints of these lines
Endpoints move exactly as the squares did!

Analyzing Multiple Object Tracking
• Basic finding: Most people (even many 5-year-old children) can track at least 4 individual objects that have no unique visual properties.
• How is it done?
• We have shown that it is unlikely that the tracking is done by keeping a record of the targets’ locations (the only unique instantaneous target property) and updating it while serially visiting the objects.
• We proposed that tracking uses the primitive mechanism of Visual Indexes or FINSTs.

Summarizing FINSTs
• A FINST is a primitive reference mechanism that normally refers to individual visible objects in the world. Only a small number (~4-5) of FINSTs are available at any one time.
• Objects are picked out and referred to without using any encoding of their properties, including their location. Picking out objects is prior to encoding any properties!
• Indexing is nonconceptual because it does not represent an individual as a member of some conceptual category.
• An important function of FINST indexes is to bind arguments of visual predicates to things in the world to which they refer. Only predicates with bound arguments can be evaluated. Since predicates are quintessential concepts, an index serves as a bridge from nonconceptual to conceptual representations.
• Similarly, they can bind arguments of motor commands, including the command to move focal attention or gaze to the indexed object: e.g., MoveGaze(x)

FINSTs are a mechanism for picking out individual distal elements directly, as token sensory individuals, rather than as bearers of some known properties
Examples where such a mechanism is needed:
• Incremental construction of visual representations – the correspondence problem over time (geometry example)
• We can pick out several individuals in a field of identical elements – attentional selection is different from discrimination

Being able to pick out individual distal elements directly is essential for many visual functions
Other examples where such a mechanism is needed:
• Encoding relational predicates, e.g., Collinear(x, y, z, …), Closed(C), Inside(x, C), Above(x, y), Square(w, x, y, z), requires simultaneously binding the arguments of n-place predicates to n elements in the visual scene
• Evaluating such visual predicates requires individuating and referring to the objects over which the predicate is evaluated: i.e., the arguments in the predicate must be bound to individual elements in the scene.
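The requirement that a predicate's arguments be bound before it can be evaluated can be sketched as follows. This is a hedged illustration rather than an implementation of the theory: `IndexPool`, `Element`, and the capacity of 4 are hypothetical names and values chosen for the example, and `collinear` and `above` are simple geometric stand-ins for the visual predicates named above.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # elements are individuated by identity, not by their properties
class Element:
    x: float
    y: float


class IndexPool:
    """A small pool of indexes; the capacity of 4 is an illustrative assumption."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.bound = {}  # index name -> scene element

    def bind(self, name, element):
        if name not in self.bound and len(self.bound) >= self.capacity:
            raise RuntimeError("no free index")
        self.bound[name] = element

    def evaluate(self, predicate, *names):
        # A predicate can be evaluated only if every argument is bound.
        if any(n not in self.bound for n in names):
            raise LookupError("unbound argument")
        return predicate(*(self.bound[n] for n in names))


def collinear(a, b, c, tol=1e-9):
    """True if three elements lie on one line (zero signed-area test)."""
    area2 = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x)
    return abs(area2) <= tol


def above(a, b):
    """True if a is above b, assuming y increases upward."""
    return a.y > b.y


pool = IndexPool()
pool.bind("x", Element(0, 0))
pool.bind("y", Element(1, 1))
pool.bind("z", Element(2, 2))
is_collinear = pool.evaluate(collinear, "x", "y", "z")      # True
pool.bind("z", Element(2, 3))                               # rebind z to another element
is_collinear_now = pool.evaluate(collinear, "x", "y", "z")  # False
z_above_x = pool.evaluate(above, "z", "x")                  # True
```

Note that the element's properties are read only when the predicate is evaluated, through the index. Binding itself uses none of them.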

Pick out 3 dots and keep track of them
• In a field of identical elements you can select a number of them and move your attention among them (e.g., “move one up” or “move 2 right”, etc.) so long as at no time do you have to hold on to more than 4 dots

Picking out is different from discriminating:
Pick out the third contour from the left

Several objects must be picked out at once in making relational judgments
When we judge that certain objects are collinear, we must pick out the relevant individual objects first.

Several objects must be picked out at once in making relational judgments
• The same is true for other relational judgments like inside or on-the-same-contour… etc. We must pick out the relevant individual objects first.
Respond: Inside-same contour? On-same contour?

More functions of FINSTs
Further experimental explorations using different paradigms
• Recognizing the cardinality of small sets of things without using sortals: Subitizing vs counting
• Selecting subsets – selecting items to search through
• Selecting subsets and holding on to them during a saccade
• Application of FINST index theory to infant cardinality studies (Leslie, Carey, Spelke, etc.) and to the acquisition of words/names by ostensive definitions. These will not be discussed here.

Subitizing vs Counting
How many squares are there?
Subitizing indexed objects is fast, accurate and (relatively) independent of how many items there are. But a prerequisite for subitizing is being able to pick out the relevant individuals.
Only the squares on the right can be subitized because picking out concentric items requires serial attention.
Concentric squares cannot be subitized because individuating them requires the serial operation of curve tracing.

Signature subitizing phenomena only appear when objects are automatically individuated and indexed
[Figure labels: counting slope; subitizing slope]
Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A limited capacity preattentive stage in vision. Psychological Review, 101(1), 80-102.

Example of the operation of Visual Indexes: Subset selection for search
Burkell, J., & Pylyshyn, Z. W. (1997). Searching through subsets: A test of the visual indexing hypothesis. Spatial Vision, 11(2), 225-258.

Subset search results:
• Only properties of the subset matter – but note that properties of the entire subset are taken into account simultaneously (since that is what distinguishes a feature search from a conjunction search)
• If the subset is a single-feature search it is fast and the slope (RT vs number of items) is shallow
• If the subset is a conjunction search set, it takes longer and is more sensitive to the set size
• The distance among the targets does not matter, so observers don’t seem to be scanning the display looking for the target

The stability of the visual world entails the capacity to reidentify individuals after a saccade
• There is no problem about how tactile selection can provide a stable world when you move around while keeping your fingers on the same objects – because in that case retaining individual identity is automatic
• But with FINSTs the same can be true of vision – for a small number of visual objects
• This is compatible with the fact that it appears one retains the relative location of only about 4 elements during saccadic eye movements (Irwin, 1996)
Irwin, D. E. (1996). Integrating information across saccadic eye movements. Current Directions in Psychological Science, 5(3), 94-100.

The selective search experiment with a saccade induced between the late onset cues and start of search
Even with a saccade between selection and access, items can be accessed efficiently.

Must we encode location when we detect the presence of a property?
• Many researchers claim that detecting a feature entails detecting it as being at some particular location. The assumption is that this location information is used to detect conjunctions of properties (Nissen, 1985). This is implicit in Treisman’s Feature-Integration Theory.
• Discussions (by psychologists and by philosophers) of the question of how vision primitively selects things in the world typically confound individuals and locations.
• Experiments mostly use static items, which confounds location and individuality. When moving items are used (as in MOT) the individual-object option usually wins over the location option – i.e., we detect a property as belonging to an object rather than as being at a particular location. We have also demonstrated this using generalized objects that move through a property space without changing location.

The view that we must encode location when we detect a property is also the standard view in philosophy
• Austen Clark (in ‘A Theory of Sentience’), following the tradition of Quine and Strawson, also assumes that location is primary and that in our most primitive nonconceptual sensory contact with the world, which he calls the “level of sentience,” the only resources available are those of what Strawson called a “feature-placing language.” Our sensory system detects the presence of “Feature F at location L”.
• Clark argues that because we can distinguish conjunctions – e.g., we can distinguish a red square beside a blue circle from a blue square beside a red circle – then the earliest stages of sensation must provide this information in a way that does not merge properties and their locations, hence feature-at-location.
• But we can do the same with objects: we can evaluate and record “Pn(Oi)” for some sensory predicate Pn so long as the variable Oi is bound to the object i by an index.
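The difference between the two recording schemes can be illustrated with a toy encoding of the red-square/blue-circle example. This is a sketch under simplifying assumptions: the tuples, the coordinates, and the tokens "O1" and "O2" are illustrative, and the tokens stand for index bindings, not for descriptions or labels that vision would have to read.

```python
# Scene A: a red square beside a blue circle.
# Scene B: a blue square beside a red circle.

# 1. Unbound feature lists record which features are present, with no binding.
features_a = {"red", "blue", "square", "circle"}
features_b = {"blue", "red", "square", "circle"}
unbound_distinguishes = features_a != features_b  # False

# 2. Feature placing records "feature F at location L".
placed_a = {("red", (0, 0)), ("square", (0, 0)), ("blue", (1, 0)), ("circle", (1, 0))}
placed_b = {("blue", (0, 0)), ("square", (0, 0)), ("red", (1, 0)), ("circle", (1, 0))}
placing_distinguishes = placed_a != placed_b      # True

# 3. Object binding records "P(O_i)", with O_i bound to an object by an index.
bound_a = {("red", "O1"), ("square", "O1"), ("blue", "O2"), ("circle", "O2")}
bound_b = {("blue", "O1"), ("square", "O1"), ("red", "O2"), ("circle", "O2")}
binding_distinguishes = bound_a != bound_b        # True

# If the two objects in scene A trade places, the location-based record goes
# stale, while the object-bound record needs no change at all.
location_of = {"O1": (0, 0), "O2": (1, 0)}
location_of["O1"], location_of["O2"] = location_of["O2"], location_of["O1"]
current_placing = {(f, location_of[o]) for f, o in bound_a}
placing_still_valid = placed_a == current_placing  # False
```

Both schemes solve the conjunction problem for static scenes, which is why static displays cannot decide between them. Only when the objects move do the two come apart.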

Superimposed Gabor patches
Blaser, E., Pylyshyn, Z. W., & Holcombe, A. O. (2000). Tracking an object through feature-space. Nature, 408(Nov 9), 196-199.

Changing feature dimensions

Surfaces in feature-space
Trajectories:
• pseudo-random and independent
• frequent changes in speed and direction
• Gabors frequently "pass" each other along a dimension(s)
