
Arithmetic Done by Brains and Machines: The Ersatz Brain Project


Presentation Transcript


  1. Arithmetic Done by Brains and Machines: The Ersatz Brain Project James A. Anderson James_Anderson@brown.edu Department of Cognitive and Linguistic Sciences Brown University, Providence, RI 02912 Our Goal: We want to build a first-rate, second-rate brain.

  2. Ersatz Participants Faculty: Jim Anderson, Cognitive Science.  Gerry Guralnik, Physics. David Sheinberg, Neuroscience. Students: Socrates Dimitriadis, Cognitive Science. Brian Merritt, Cognitive Science. Private Industry: Paul Allopenna, Aptima, Inc. Andrew Duchon, Aptima, Inc. John Santini, Alion, Inc.

  3. Acknowledgements This work was supported by: A seed money grant from the Office of the Vice President for Research, Brown University. Phase I and Phase II SBIRs, “The Ersatz Brain Project,” to Aptima, Inc. (Woburn MA), Dr. Paul Allopenna, Project Manager. Funding from the Air Force Research Laboratory, Rome, NY.

  4. Comparison of Silicon Computers and Carbon Computer Digital computers are • Made from silicon • Accurate (essentially no errors) • Fast (nanoseconds) • Execute long chains of logical operations (billions) • Often irritating (because they don’t think like us).

  5. Comparison of Silicon Computers and Carbon Computer Brains are • Made from carbon • Inaccurate (low precision, noisy) • Slow (milliseconds, 10^6 times slower) • Execute short chains of parallel, alogical, associative operations (perhaps 10 operations/second) • Yet largely understandable (because they think like us).

  6. Comparison of Silicon Computers and Carbon Computer • Huge disadvantage for carbon: more than 10^12 in the product of speed and power. • But we still do better than them in many perceptual skills: speech recognition, object recognition, face recognition, information integration, motor control. • One implication: Cognitive “software” uses only a few but very powerful elementary operations.

  7. Major Point Brains and computers are very different in their underlying hardware, leading to major differences in software. Computers, as the result of 60 years of evolution, are great at modeling physics. They are not great (after 50 years of trying and largely failing) at modeling human cognition. One possible reason: inappropriate hardware leads to inappropriate software. Maybe we need something completely different: new software, new hardware, new basic operations, even new ideas about computation.

  8. So Why Build a Brain-Like Computer? 1. Engineering. Computers are all special purpose devices. Many of the most important practical computer applications of the next few decades will be cognitive in nature: • Natural language processing. • Internet search. • Cognitive data mining. • Decent human-computer interfaces. • Text understanding. We claim it will be necessary to have a cortex-like architecture (either software or hardware) to run these applications efficiently.

  9. 2. Science: Such a system, even in simulation, becomes a powerful research tool. It leads to designing software with a particular structure to match the brain-like computer. If we capture any of the essence of the cortex, writing good programs will give insight into biology and cognitive science. If we can write good software for a vaguely brain-like computer we may show we really understand something important about the brain.

  10. 3. Personal: It would be the ultimate cool gadget. A technological vision: In 2057 the personal computer you buy in Wal-Mart will have two CPUs with very different architectures: First, a traditional von Neumann machine that runs spreadsheets, does word processing, keeps your calendar straight, etc., etc. What they do now. Second, a brain-like chip to • Handle the interface with the von Neumann machine. • Give you the data that you need from the Web or your files (but didn’t think to ask for). • Be your silicon friend, guide, and confidant (because you understand each other).

  11. Ersatz Basic Assumptions

  12. The Ersatz Brain Approximation: The Network of Networks. Conventional wisdom says neurons are the basic computational units of the brain. The Ersatz Brain Project is based on a different approximation. The Network of Networks model was developed in collaboration with Jeff Sutton then at Harvard Medical School, now at NSBRI. Cerebral cortex contains intermediate level structure, between neurons and an entire cortical region. Intermediate level brain structures are hard to study experimentally because they require recording from many cells simultaneously.

  13. Network of Networks Approximation We use the Network of Networks (NofN) approximation to structure the hardware and to reduce the number of connections. We assume the basic computing units are not neurons, but small (10^4 neurons) attractor networks. Basic Network of Networks Hardware Architecture: • 2-dimensional array of modules • Locally connected to neighbors

  14. Cortical Columns: Minicolumns “The basic unit of cortical operation is the minicolumn … It contains of the order of 80-100 neurons except in the primate striate cortex, where the number is more than doubled. The minicolumn measures of the order of 40-50 µm in transverse diameter, separated from adjacent minicolumns by vertical, cell-sparse zones … The minicolumn is produced by the iterative division of a small number of progenitor cells in the neuroepithelium.” (Mountcastle, p. 2) VB Mountcastle (2003). Introduction [to a special issue of Cerebral Cortex on columns]. Cerebral Cortex, 13, 2-4. Figure: Nissl stain of cortex in planum temporale.

  15. Columns: Functional Groupings of minicolumns seem to form the physiologically observed functional columns. Best known example is orientation columns in V1. They are significantly bigger than minicolumns, typically around 0.3-0.5 mm. Mountcastle’s summation: “Cortical columns are formed by the binding together of many minicolumns by common input and short range horizontal connections. … The number of minicolumns per column varies … between 50 and 80. Long range intracortical projections link columns with similar functional properties.” (p. 3) Cells in a column ~ (80)(100) = 8000

  16. Elementary Modules The activity of the non-linear attractor networks (modules) is dominated by their attractor states. Attractor states may be built in or acquired through learning. We approximate the activity of a module as a weighted sum of attractor states. That is: the attractor states form an adequate set of basis functions. Activity of a module: x = Σ c_i a_i, where the a_i are the attractor states and the c_i their amplitudes.
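
A minimal numerical sketch of this approximation (not from the original slides; the dimensions, attractor states, and coefficients below are invented for illustration). The module's activity is built as a weighted sum of attractor-state vectors, and because those vectors form a basis here, the amplitudes c_i can be read back out by projection.

```python
import numpy as np

# Hypothetical module whose activity is approximated as a weighted sum of
# attractor states, x = sum_i c_i * a_i (illustrative values only).
rng = np.random.default_rng(0)

n_units = 16          # dimensionality of the module's state (illustrative)
n_attractors = 4      # number of attractor states (illustrative)

# Attractor states a_i: random orthonormal vectors stand in for the
# learned or built-in attractors of the module.
A = np.linalg.qr(rng.standard_normal((n_units, n_attractors)))[0]

# Amplitudes c_i of each attractor state in the current activity.
c = np.array([0.9, 0.3, -0.2, 0.0])

# Module activity as a weighted sum of attractor states.
x = A @ c

# Because the a_i form an orthonormal basis for the module's activity,
# the amplitudes can be recovered by projection.
recovered_c = A.T @ x
print(np.allclose(recovered_c, c))   # True
```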

  17. The Single Module: BSB The attractor network we use for the individual modules is the BSB network (Anderson, 1993). It can be analyzed using the eigenvectors and eigenvalues of its local connections.
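
The slide does not spell out the BSB (Brain-State-in-a-Box) dynamics; the sketch below assumes its common simple form, x(t+1) = S(x(t) + α·A·x(t)), where S clips each component to the box [-1, +1]. The connection matrix and parameters are invented for illustration; the point is that the state migrates toward a corner of the box, and which corner it reaches depends on the eigenvectors and eigenvalues of A.

```python
import numpy as np

# Brain-State-in-a-Box (BSB) style update in a simple common form:
#   x(t+1) = S( x(t) + alpha * A @ x(t) )
# where S clips each component to the "box" [-1, +1].  A is the module's
# local connection matrix; eigenvectors with large positive eigenvalues
# determine which corners of the box act as attractors.
rng = np.random.default_rng(1)

n = 8
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                    # symmetric, so eigenvalues are real

alpha = 0.2
x = 0.1 * rng.standard_normal(n)     # small initial state

for _ in range(50):
    x = np.clip(x + alpha * (A @ x), -1.0, 1.0)

print(x)   # typically saturates near a corner of the box, entries close to ±1
```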

  18. Interactions between Modules Interactions between modules are described by state interaction matrices, M. The state interaction matrix elements give the contribution of an attractor state in one module to the amplitude of an attractor state in a connected module. In the BSB linear region: x(t+1) = Σ M_i s_i + f + x(t), where Σ M_i s_i is the weighted sum from other modules, f is the input, and x(t) is the ongoing activity.
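
A sketch of one linear-region update step under the equation above (dimensions and values are invented; in a real module the s_i would be the attractor-state amplitudes of the neighboring modules).

```python
import numpy as np

# One linear-region update for a module receiving input from several
# connected modules:  x(t+1) = sum_i M_i @ s_i  +  f  +  x(t)
rng = np.random.default_rng(2)

n_states = 5                      # attractor-state amplitudes per module
n_neighbors = 3

# State interaction matrices M_i: entry [j, k] gives the contribution of
# attractor state k in neighbor i to the amplitude of state j here.
M = rng.standard_normal((n_neighbors, n_states, n_states)) * 0.1

# Current attractor-state amplitudes s_i of each connected module.
s = rng.standard_normal((n_neighbors, n_states))

f = rng.standard_normal(n_states) * 0.05   # external input
x = np.zeros(n_states)                     # ongoing activity

# Weighted sum from neighbors + input + ongoing activity.
x = np.einsum('ijk,ik->j', M, s) + f + x
print(x)
```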

  19. The Linear-Nonlinear Transition The first BSB processing stage is linear and sums influences from other modules. The second processing stage is nonlinear. This linear to nonlinear transition is a powerful computational tool for cognitive applications. It describes the processing path taken by many cognitive processes. A generalization from cognitive science: Sensory inputs → (categories, concepts, words). Cognitive processing moves from continuous values to discrete entities.

  20. Sparse Connectivity The brain is sparsely connected. (Unlike most neural nets.) A neuron in cortex may have on the order of 100,000 synapses. There are more than 10^10 neurons in the brain. Fractional connectivity is very low: 0.001%. Implications: • Connections are expensive biologically since they take up space, use energy, and are hard to wire up correctly. • Connections are valuable. • The pattern of connection is under tight control. • Short local connections are cheaper than long ones. Our approximation makes extensive use of local connections for computation.
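
The fractional-connectivity figure follows directly from the two round numbers quoted above; a quick check:

```python
# Back-of-the-envelope fractional connectivity of cortex, using the
# round numbers quoted on the slide.
synapses_per_neuron = 1e5      # ~100,000 synapses per cortical neuron
neurons = 1e10                 # > 10^10 neurons in the brain

fraction = synapses_per_neuron / neurons
print(f"{fraction:.0e}  ({fraction:.4%})")   # 1e-05  (0.0010%)
```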

  21. Biological Evidence

  22. Biological Evidence:Columnar Organization in Inferotemporal Cortex Tanaka (2003) suggests a columnar organization of different response classes in primate inferotemporal cortex. There seems to be some internal structure in these regions: for example, spatial representation of orientation of the image in the column.

  23. IT Response Clusters: Imaging Tanaka (2003) used intrinsic optical imaging of cortex: with a video camera trained on the exposed cortex, cell activity can be picked up. At least a factor of ten higher resolution than fMRI. The size of a response is around the size of functional columns seen elsewhere: 300-400 microns.

  24. Columns: Inferotemporal Cortex Responses of a region of IT to complex images involve discrete columns. The response to a picture of a fire extinguisher shows how regions of activity are determined: boundaries are drawn where the activity falls to half its peak value. Note: some spots are roughly equally spaced.

  25. Active IT Regions for a Complex Stimulus Note the large number of roughly equally spaced spots (about 2 mm apart) for a familiar complex image.

  26. Back-of-the-Envelope Engineering Considerations

  27. Engineering Hardware Considerations We feel that there is a size, connectivity, and computational power sweet spot at the level of the parameters of the Network of Networks model. If an elementary attractor network has 10^4 actual neurons, that network might have 50 attractor states. Each elementary network might connect to 50 others through state connection matrices. A brain-sized system might consist of 10^6 elementary units with about 10^11 (0.1-1 terabyte) numbers specifying the connections. If 100 to 1,000 elementary units fit on a chip, a cortex-sized system needs 1,000 to 10,000 chips, well within the upper bounds of current technology.
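
The same back-of-the-envelope numbers can be checked in a few lines. One assumption is made explicit here: each module-to-module connection is taken to be a full 50 × 50 state interaction matrix, which is what yields roughly 10^11 connection numbers.

```python
# Reproducing the hardware sweet-spot estimate with the numbers quoted above.
attractor_states = 50          # attractor states per elementary network
fan_out = 50                   # modules each module connects to
modules = 10**6                # elementary units in a brain-sized system

# Each connection between two modules is assumed to be a state interaction
# matrix of attractor_states x attractor_states numbers.
connection_numbers = modules * fan_out * attractor_states**2
print(f"{connection_numbers:.1e} numbers")              # ~1.2e+11, i.e. about 10^11

# At 1-8 bytes per number this is roughly 0.1-1 terabyte of connection data.
print(f"{connection_numbers / 1e12:.2f}-{connection_numbers * 8 / 1e12:.2f} TB")

# Chips needed if 100 to 1,000 elementary units fit on one chip.
print(modules // 1000, "to", modules // 100, "chips")   # 1,000 to 10,000
```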

  28. Modules (Ersatz Processing Units: EPUs) Function of EPU Modules: • Simulate local integration: addition of inputs from outside, from other modules. • Simulate local network dynamics. • Communications Controller: handle long range (i.e. not neighboring) interactions. Simpler approximations are possible: • “Cellular automaton”. (Ignore local dynamics.) • Approximations to local dynamics.

  29. Physical (Hardware) Module We assume only local connections for the physical hardware. Reason: Flexible, easy to build, easy to work with.

  30. Software Based Connectivity Cortical data suggest that more connections than just nearest neighbors exist. Simulate these with EPU module software, in the Communications Controller.

  31. Implications Interesting bonus from this structure: • Information transmission both local and long range can be slow. • It will take multiple steps (a long time) to move data to distant modules. • But: This is a feature, not a bug!

  32. Implications Forces us to pay attention to the temporal aspects of module behavior: • Communication times • Module temporal dynamics • Note: the details of the spatial arrangement of data affect communication times. Consistent with cortical neuroscience. Implication: we can “program” the array by manipulating these “analog” properties to control array behavior.

  33. Ersatz Programming Peculiarities How do you make this “computer” compute? Not with logic! It is like a hybrid analog-digital computer. Programming Techniques: • Spatial arrangement of data on the array • Integration of data from multiple sources • Abstraction and discrete concept formation • Control of computation using (analog) dynamical system parameters • Assemblies of interacting modules. We give one example: the performance of arithmetic by a simple Ersatz-like system.

  34. Ersatz Arithmetic

  35. Cognitive Computation: Example - Arithmetic • Brains and computers are very different in the way they do things, largely because the underlying hardware is so different. • Consider a computational task that humans and computers do frequently, but by different means: • Learning simple arithmetic facts

  36. Learning the “Right Thing” Cognition is not memory for facts (like computer data) but remembering the “right things,” even if the right things are constructed from many experiences and don’t actually exist! Most (99.9%) of sensory input data is discarded. (The essential process of “creative data destruction.”) What is kept are useful abstractions and transformations of the inputs.

  37. Arithmetic Digital computers compute the answers to a problem using well-known logic-based algorithms. Humans do it very differently. The human algorithm for elementary multiplication facts seems to look like: 1. Find a number that is the answer to some multiplication problem, and 2. is a product number that is about the right size. This is a process involving memory and estimation, not computation as traditionally understood. Next, we develop the advantages and disadvantages of doing it this way.
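
A toy sketch of this memory-plus-estimation account (the noise level and the nearest-product rule are invented for illustration; the slide itself specifies no particular mechanism): the answer is always some number known to be a table product, chosen to be of about the right magnitude.

```python
import random

# Toy "memory plus estimation" account: to answer m x n, (1) consider only
# numbers known to be products from the times table, and (2) prefer the
# candidate closest to a rough magnitude estimate of the answer.
random.seed(0)

products = sorted({a * b for a in range(2, 10) for b in range(2, 10)})

def human_like_product(m, n, noise=0.25):
    estimate = m * n * random.gauss(1.0, noise)            # rough magnitude sense
    return min(products, key=lambda p: abs(p - estimate))  # nearest known product

answers = [human_like_product(7, 8) for _ in range(10)]
print(answers)   # mostly 56, but errors are other products of similar size
```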

  38. A Problem with Arithmetic • We often congratulate ourselves on the powers of the human mind. • But why does this amazing structure have such trouble learning elementary arithmetic? • Adults doing arithmetic are slow and make many errors. • Learning the times tables takes children several years and they find it hard.

  39. Brain Software: John von Neumann Von Neumann: 1958, The Computer and the Brain The nervous system is a complex machine which manages to do its exceedingly complex work on a rather low level of precision. Von Neumann, as a numerical analyst, knew that errors would rapidly grow and the result would be meaningless if there were more than a few steps in the computation.
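
A small simulation of von Neumann's point, assuming an arbitrary 1% relative error per elementary operation: the spread of the final result grows with the number of sequential steps, roughly as the square root of the chain length.

```python
import numpy as np

# Errors compound in long chains of low-precision operations.  The 1%
# per-step noise is an arbitrary stand-in for neural imprecision.
rng = np.random.default_rng(0)

def noisy_chain(steps, relative_noise=0.01, trials=10_000):
    x = np.ones(trials)
    for _ in range(steps):
        x *= 1.0 + relative_noise * rng.standard_normal(trials)  # one noisy multiply
    return np.std(x)   # spread of the final result (exact answer is 1.0)

for steps in (1, 10, 100, 1000):
    print(steps, round(noisy_chain(steps), 3))
# The spread grows roughly as sqrt(steps): a few steps are tolerable,
# thousands of steps are not.
```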

  40. Computational Strategy Ways to avoid problem: • Use a small number of steps • Use discrete (“logic-like”) operations rather than hard (“analog”) operations. Engineering rule: Digital is easy, analog is hard. Von Neumann: … Whatever language the central nervous system is using is characterized by less logical and arithmetical depth than we are normally used to. A small number of powerful operations are strung together to form a mental computation.

  41. Teaching of Mathematics Collaborators: Prof. Kathryn Spoehr, Dr. Susan Viscuso, and Dr. David Bennett. My own interest goes back to a joint paper with Prof. Phil Davis of Brown Applied Mathematics. Point of the paper: the “Theorem-Proof” method of teaching mathematics has ruined mathematics in the 20th century.

  42. Reason for Ruination Real mathematicians do not think this way. Mathematicians use a complex blend of intuition, perception, and memory to understand complex systems. Proving theorems is the last stage, done to convince others that you are correct. The effects are very hard on the consumers of mathematics, engineers and scientists. They say, “I don’t think like this,” and lose confidence in their intuitions.

  43. Why is Arithmetic so Hard? People are much worse than they should be at elementary arithmetic. Elementary arithmetic fact learning involves making the right associative links between pairs of the 10 digits to give products, sums, etc. Only a few hundred facts to learn ... Arithmetic rules are orders of magnitude less complicated than syntax in language. But: Takes years for children to learn arithmetic.

  44. The Problem with Arithmetic At the same time children are having trouble learning arithmetic they are knowledge sponges learning • Several new words a day. • Social customs. • Many facts in other areas.

  45. Association In structure, arithmetic facts are simple associations. Example, multiplication: (Multiplicand)(Multiplicand) → Product. Simple association (S-R learning) was a popular idea in the 1920’s (Thorndike). The formation of arbitrary associations is the basic rationale behind flash cards. One can learn this way, but it is hard and not really learning with “understanding.”

  46. Multiplication • Arithmetic facts are not arbitrary associations. • They have an ambiguous structure that gives rise to associative interference. 4 x 3 = 12 4 x 4 = 16 4 x 5 = 20 • Initial ‘4’ has associations with many possible products. • Ambiguity causes difficulties for simple associative systems.
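
A minimal linear associator trained on these three overlapping facts makes the interference concrete (the vector coding of problems and answers is invented for the example): the shared “4” in the cue activates all three products, so the correct one wins only by a modest margin.

```python
import numpy as np

# Linear associator trained on the overlapping facts 4x3=12, 4x4=16, 4x5=20.
digits = 10

def one_hot(i, n=digits):
    v = np.zeros(n)
    v[i] = 1.0
    return v

def problem(a, b):
    return np.concatenate([one_hot(a), one_hot(b)])   # "a x b" as a 20-d cue

facts = [(4, 3, 12), (4, 4, 16), (4, 5, 20)]
answers = sorted({p for _, _, p in facts})

# Outer-product (Hebbian) learning of cue -> answer associations.
W = np.zeros((len(answers), 2 * digits))
for a, b, p in facts:
    W += np.outer(one_hot(answers.index(p), len(answers)), problem(a, b))

# Probe with 4 x 3: the shared "4" cue activates all three products,
# so the correct answer 12 wins only by a modest margin.
out = W @ problem(4, 3)
print(dict(zip(answers, out.tolist())))   # {12: 2.0, 16: 1.0, 20: 1.0}
```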

  47. Number Magnitude • One way to cope with ambiguity is to embed the fact in a larger context. • Numbers are much more than arbitrary abstract patterns. • Experiment: • Which is greater? 17 or 85 • Which is greater? 73 or 74

  48. Response Time Data

  49. Number Magnitude It takes much longer to compare 74 and 73. When a “distance” intrudes into what should be an abstract relationship, it is called a symbolic distance effect. A computer would be unlikely to show such an effect. (Subtract the numbers, look at the sign.)

  50. Magnitude Coding Key observation: We see a similar effect when sensory magnitudes are being compared. Deciding which of • two weights is heavier, • two lights is brighter, • two sounds is louder • two numbers is bigger displays the same reaction time pattern.
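
A toy model of this magnitude-based comparison (the noise level, threshold, and evidence-accumulation rule are all invented for illustration): each number is read as a noisy analog magnitude, and evidence is accumulated until one side clearly wins, so nearby numbers such as 73 and 74 take many more samples, i.e. longer, than distant ones such as 17 and 85.

```python
import random

# Toy symbolic distance effect: sample noisy magnitudes and accumulate the
# difference until a decision threshold is crossed; the sample count stands
# in for reaction time.
random.seed(0)

def compare_time(a, b, noise=5.0, threshold=20.0):
    evidence, steps = 0.0, 0
    while abs(evidence) < threshold:
        evidence += (a + random.gauss(0, noise)) - (b + random.gauss(0, noise))
        steps += 1
    return steps

def mean_time(a, b, trials=2000):
    return sum(compare_time(a, b) for _ in range(trials)) / trials

print(mean_time(17, 85))   # far apart: decided in very few samples
print(mean_time(73, 74))   # adjacent: many more samples, i.e. slower
```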
