Minds and Machines

Minds and Machines Summer 2011 Monday, 8/1

As you’re working on your paper
• Make sure to state your thesis and the structure of your argument in the very first paragraph.
• Help the reader (me!) by including signposts of where you are in the argument.
• Ask yourself what the point of each paragraph is and how it contributes to your argument.
• Give reasons for your claims! Don’t make unsupported assertions.

Neural Networks
• The brain can be thought of as a highly complex, non-linear, parallel computer whose structural constituents are neurons. There are billions of neurons in the brain.
• We are interested in neurons, rather than in the brain’s other, non-neuronal cells, because of their computational properties.

Neural Networks
• Consider a simple recognition task, e.g. matching an image with a stored photograph.
• To perform the task, a computer must compare the image with thousands of stored photographs.
• At the end of all the comparisons, the computer may output the photograph that best matches the image.
• If the photograph database were as large as the one in our memory, this might take several hours.
• But our brain can do this instantly!
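As a rough illustration of the serial strategy described above, the sketch below compares an image with every stored photograph in turn and keeps the closest one. It makes three simplifying assumptions that the slides do not: images are short tuples of 0/1 pixels, the pixel-mismatch count serves as the similarity score, and the function names are hypothetical.

```python
def mismatch(image, photo):
    """Number of pixels on which two equal-sized binary images differ."""
    return sum(a != b for a, b in zip(image, photo))


def best_match(image, database):
    """Serially compare `image` with every stored photo and return
    (index of the closest photo, number of comparisons made)."""
    best_index, best_score, comparisons = None, None, 0
    for index, photo in enumerate(database):
        score = mismatch(image, photo)
        comparisons += 1
        if best_score is None or score < best_score:
            best_index, best_score = index, score
    return best_index, comparisons


# Hypothetical 4-pixel "photographs".
database = [(0, 0, 1, 1), (1, 1, 0, 0), (1, 0, 1, 0)]
print(best_match((1, 1, 0, 1), database))  # (1, 3)
```

The number of comparisons always equals the size of the database. This is why a purely serial search slows down as the store of photographs grows.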

Neural Networks
• A silicon chip can perform a computation in nanoseconds (10⁻⁹ seconds).
• But neuronal computations take milliseconds (10⁻³ seconds), which is six orders of magnitude slower!
• Yet on tasks like recognition, our computational capability (processing speed) seems enormously greater than that of a typical computer.
• How is this possible?
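The size of the gap is simply the ratio of the two time scales: milliseconds for a neuronal step and nanoseconds for a chip operation.

```latex
\frac{10^{-3}\ \mathrm{s}}{10^{-9}\ \mathrm{s}} = 10^{6}
```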

Neural Networks
• The answer seems to lie in the massively parallel structure of the brain, which includes trillions of interconnections between neurons.

Artificial Neural Networks
• Inspired by the organization of the brain.
• Like the brain, they are composed of many simple processors linked in parallel.
• In the brain, the simple processors are neurons and the connections are axons and synapses.
• In connectionist theory, the simple processing elements (much simpler than neurons) are called units, and the connections are numerically weighted links between these units.
• Each unit takes inputs from a small group of neighboring units and passes outputs to a small group of neighbors.
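A single unit of the kind described above can be sketched in a few lines. It multiplies each incoming signal by the weight on its link, sums the results, and passes the sum through a squashing function. The slides do not specify how units combine their inputs, so the logistic function, the `bias` term and the weight values here are illustrative assumptions.

```python
import math


def unit_output(inputs, weights, bias=0.0):
    """One connectionist unit: multiply each input by the weight on its link,
    add the results (plus a bias), and squash the sum into (0, 1) with the
    logistic function (an assumed choice of activation)."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-net))


# Three hypothetical neighbors feed this unit; the weights are made up.
print(unit_output([1.0, 0.0, 1.0], [0.5, -1.0, -0.5]))  # 0.5
print(unit_output([1.0, 0.0, 0.0], [0.5, -1.0, -0.5]))
```

Positive weights act as excitatory links and negative weights as inhibitory ones. The unit's output depends only on its neighbors' signals and the strengths of the connections.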

NETtalk
• An artificial neural network that can be trained to pronounce English words.
• Consists of about 300 units (“neurons”) arranged in three layers: an input layer, which reads the words; an output layer, which generates speech sounds, or phonemes; and a middle, “hidden” layer, which mediates between the other two.
• The units are joined to one another by 18,000 “synapses”: adjustable connections whose strengths can be turned up or down.
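One way to see how roughly 300 units can be joined by some 18,000 adjustable connections is to assume that every unit in one layer is linked to every unit in the next. The layer sizes below (203 input, 80 hidden, 26 output units) are assumptions chosen for illustration, and bias connections are ignored. Under these assumptions the counts come out close to the figures on the slide.

```python
def count_units_and_connections(layer_sizes):
    """For a layered network in which every unit in one layer connects to
    every unit in the next, return (total units, total connections)."""
    units = sum(layer_sizes)
    connections = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    return units, connections


# Hypothetical input, hidden and output layer sizes.
print(count_units_and_connections([203, 80, 26]))  # (309, 18320)
```

With full connectivity between adjacent layers, the number of connections grows with the product of the layer sizes. A few hundred units can therefore carry tens of thousands of adjustable weights.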

NETtalk
• At first, the connection strengths (the network’s “volume controls”) are set at random, and NETtalk is a structureless, homogenized tabula rasa. Given a list of words, it babbles incomprehensibly. But some of its guesses are better than others, and these are reinforced by adjusting the strengths of the synapses according to a set of learning rules.
• After half a day of training, the pronunciations become steadily clearer, until NETtalk can pronounce some 1,000 words. Within a week, it can learn 20,000.

NETtalk
• NETtalk is not given any rules for how different letters are pronounced in different contexts. (It has been argued that “ghiti” could be pronounced “fish”: “gh” as in “enough” and “ti” as in “nation.”)
• But once the system has been trained, it acts as though it knows the rules. They become implicitly encoded in the network of connections, though no one has any idea where the rules are located or what they look like. (On the surface, there’s just “numerical spaghetti.”)

Back-Propagation
• The network begins with a set of randomly selected connection weights.
• It is then exposed to a large number of input patterns.
• For each input pattern, some (initially incorrect) output is produced.
• An automatic supervisory system monitors the output, compares it to the target output, and calculates small adjustments to the connection weights.
• This is repeated until (often) the network solves the problem and yields the desired input-output profile.
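The loop described above can be sketched for a small network. None of the following details come from the slides; they are assumptions made for the example:
- the task is XOR, a classic problem that needs a hidden layer;
- the units are logistic;
- the comparison with the target uses squared error;
- the learning rate is 0.5.

Each step of the slide is marked as a comment. The function and constant names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic squashing function (an assumed choice of activation)."""
    return 1.0 / (1.0 + math.exp(-x))


# XOR: the target output is 1 only when exactly one input is on.
PATTERNS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def train_xor(epochs=10000, rate=0.5, hidden=4, seed=0):
    """Train a 2-input, `hidden`-unit, 1-output network on XOR by
    back-propagation and return the summed squared error of each epoch."""
    rng = random.Random(seed)
    # Step 1: randomly selected connection weights. The last weight in each
    # list is a bias, fed by a constant input of 1.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(hidden + 1)]
    errors = []
    for _ in range(epochs):
        total = 0.0
        # Step 2: expose the network to the input patterns.
        for (x1, x2), target in PATTERNS:
            inputs = (x1, x2, 1.0)
            # Step 3: the forward pass produces some (initially incorrect) output.
            h = [sigmoid(sum(w * v for w, v in zip(row, inputs))) for row in w_hidden]
            h_bias = h + [1.0]
            y = sigmoid(sum(w * v for w, v in zip(w_out, h_bias)))
            # Step 4: compare with the target and compute small adjustments,
            # propagating the error back through the output weights.
            error = target - y
            total += error ** 2
            delta_out = error * y * (1.0 - y)
            delta_hidden = [delta_out * w_out[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            for j in range(hidden + 1):
                w_out[j] += rate * delta_out * h_bias[j]
            for j in range(hidden):
                for i in range(3):
                    w_hidden[j][i] += rate * delta_hidden[j] * inputs[i]
        errors.append(total)
    # Step 5: repeat for many epochs; the error history shows the progress.
    return errors


errors = train_xor()
print(round(errors[0], 3), round(errors[-1], 3))
```

Comparing the first and last entries of `errors` shows the total error falling as the weights are adjusted. As the slide's “(often)” warns, a different random start can leave such a network stuck short of a perfect solution.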

Distributed Representation
• A connectionist system’s knowledge base does not consist in a body of declarative statements written out in a formal notation.
• Rather, it inheres in the set of connection weights and the unit architecture.
• The information active during the processing of a specific input may be equated with the transient activation patterns of the hidden units.
• An item of information has a distributed representation if it is expressed by the simultaneous activity of a number of units.
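The contrast between a local and a distributed code can be made concrete with made-up activation values. All patterns, the 0.5 threshold and the function names below are illustrative assumptions. A locally coded item lights up a single unit. A distributed item is expressed by several units at once, and a single unit takes part in representing more than one item.

```python
# Hypothetical activation patterns over six hidden units; the numbers are
# invented for illustration only.
LOCAL = {
    "cat":     (1, 0, 0, 0, 0, 0),
    "panther": (0, 1, 0, 0, 0, 0),
    "car":     (0, 0, 1, 0, 0, 0),
}
DISTRIBUTED = {
    "cat":     (0.9, 0.7, 0.0, 0.6, 0.0, 0.2),
    "panther": (0.8, 0.9, 0.0, 0.5, 0.3, 0.0),
    "car":     (0.0, 0.1, 0.9, 0.0, 0.8, 0.7),
}


def active_units(code, item, threshold=0.5):
    """Indices of the units whose activation exceeds `threshold` for `item`."""
    return [i for i, a in enumerate(code[item]) if a > threshold]


def items_using_unit(code, unit, threshold=0.5):
    """Items whose pattern strongly activates the given unit."""
    return [item for item, pattern in code.items() if pattern[unit] > threshold]


print(active_units(LOCAL, "cat"), active_units(DISTRIBUTED, "cat"))  # [0] [0, 1, 3]
print(items_using_unit(LOCAL, 0), items_using_unit(DISTRIBUTED, 0))  # ['cat'] ['cat', 'panther']
```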

Superpositional Coding
• Partially overlapping use of distributed resources, where the overlap is informationally significant.
• For example, the activation pattern for a black panther may share some of the substructure of the activation pattern for a cat.
• The public-language words “cat” and “panther” display no such overlap.
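One common way to quantify overlap between activation patterns is cosine similarity; it is used here as an assumed measure, not one the slides name. The activation values are invented. On these numbers the “black panther” pattern comes out much closer to the “cat” pattern than the “car” pattern does.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two activation vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical hidden-unit activation patterns (invented numbers).
cat = (0.9, 0.7, 0.0, 0.6, 0.0, 0.2)
black_panther = (0.8, 0.9, 0.0, 0.5, 0.3, 0.0)
car = (0.0, 0.1, 0.9, 0.0, 0.8, 0.7)

print(round(cosine_similarity(cat, black_panther), 2))  # 0.95
print(round(cosine_similarity(cat, car), 2))            # 0.12
```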

“Free” Generalizations
• A benefit of connectionist architecture.
• Generalizations occur because a new input pattern that resembles an old one in some respects yields a response rooted in that partial overlap.
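The idea can be illustrated with a single trained unit. The sketch swaps in the classic perceptron learning rule as a stand-in for network training. The five input features (fur, whiskers, stripes, wheels, engine) and the training items are hypothetical. After training on a cat and a car, the unit responds to a never-seen furry, striped input much as it does to the cat.

```python
def net_input(weights, bias, x):
    """Graded response of the unit: weighted sum of its inputs plus bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_perceptron(patterns, epochs=10, rate=1):
    """Perceptron learning rule: nudge the weights toward the target
    whenever the thresholded output is wrong."""
    weights, bias = [0] * len(patterns[0][0]), 0
    for _ in range(epochs):
        for x, target in patterns:
            output = 1 if net_input(weights, bias, x) > 0 else 0
            error = target - output
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
            bias += rate * error
    return weights, bias


# Hypothetical features: (fur, whiskers, stripes, wheels, engine).
TRAINING = [((1, 1, 0, 0, 0), 1),   # cat -> "animal"
            ((0, 0, 0, 1, 1), 0)]   # car -> "not animal"

weights, bias = train_perceptron(TRAINING)
tiger = (1, 1, 1, 0, 0)       # never seen in training
motorbike = (0, 0, 0, 1, 0)   # never seen in training
print(net_input(weights, bias, tiger), net_input(weights, bias, motorbike))  # 2 -1
```

The stripes feature never appeared in training, so its weight stays at zero. The tiger's positive response therefore comes entirely from the features it shares with the cat, and the motorbike's negative response from the feature it shares with the car.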

Graceful Degradation
• Another benefit of connectionist architecture.
• The ability of the system to produce sensible responses despite some systematic damage.
• Such damage tolerance is possible because of the use of distributed, superpositional storage schemes.
• This is similar to what goes on in our brains. Compare messing with the wiring of a conventional computer, where such damage typically leads to outright failure rather than gradual decline.
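A toy comparison shows why distributed storage degrades gracefully. The sketch measures how much of an item's activity pattern survives when chosen units are silenced. The eight-unit patterns and the “surviving fraction” measure are illustrative assumptions rather than a model from the slides.

```python
def surviving_fraction(pattern, lesioned):
    """Fraction of a pattern's total activity left after silencing the
    units whose indices are in `lesioned`."""
    total = sum(pattern)
    kept = sum(a for i, a in enumerate(pattern) if i not in lesioned)
    return kept / total


# The same hypothetical item coded locally (one unit) and in a distributed
# way (eight units).
local_cat = (1, 0, 0, 0, 0, 0, 0, 0)
distributed_cat = (1, 1, 1, 1, 1, 1, 1, 1)

for damage in ({0}, {0, 1}, {0, 1, 2}):
    print(sorted(damage),
          surviving_fraction(local_cat, damage),
          surviving_fraction(distributed_cat, damage))
```

Silencing one, two or three units wipes out the locally coded item completely. The distributed version keeps 7/8, 6/8 and 5/8 of its activity.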

Subsymbolic Representation
• Physical symbol systems displayed semantic transparency: familiar words and ideas were rendered as simple inner symbols.
• Connectionist approaches introduce greater distance between everyday talk and the contents manipulated by the computational system.
• The contentful elements in a subsymbolic program do not reflect our ways of thinking about the task domain.
• The structure represented by a large pattern of unit activity may be too rich and subtle to be captured in everyday language.

Post-training Analysis
How do we figure out what knowledge and strategies the network is actually using to solve the problems in its task domain?
• Artificial lesions.
• Statistical analysis, e.g. principal component analysis (PCA) and cluster analysis.
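Cluster analysis of hidden-unit activations can be sketched as follows: record each item's hidden-layer pattern, then repeatedly merge the two closest groups. The activation values are invented. Single-linkage agglomerative clustering with Euclidean distance is one assumed choice among many, and `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations


def single_linkage(points):
    """Agglomerative clustering with single linkage.

    `points` maps a label to its activation vector. Returns the merges in
    the order they happen, each as (group_a, group_b, distance).
    """
    def distance(a, b):
        # Single linkage: the distance between the closest pair of members.
        return min(math.dist(points[p], points[q]) for p in a for q in b)

    clusters = [frozenset([label]) for label in points]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: distance(*pair))
        merges.append((sorted(a), sorted(b), distance(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return merges


# Hypothetical hidden-layer activations recorded after training.
activations = {
    "cat":     (0.9, 0.8, 0.1),
    "panther": (0.8, 0.9, 0.2),
    "car":     (0.1, 0.2, 0.9),
    "truck":   (0.2, 0.2, 0.7),
}

for group_a, group_b, d in single_linkage(activations):
    print(group_a, "+", group_b, round(d, 2))
```

On these made-up values, the first two merges pair cat with panther and car with truck. An analyst would read this kind of grouping as the network having learned an animal/vehicle distinction.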

Recurrent Neural Networks
• “Second-generation” neural networks.
• Geared toward producing patterns that are extended in time (e.g. the commands that produce a running motion) and toward recognizing temporally extended patterns (e.g. facial motions).
• Include a feedback loop that “recycles” some aspects of the network’s activity at time t1 along with the new inputs arriving at time t2.
• The preserved traces act as a short-term memory, enabling the network to generate new responses that depend both on the current input and on the network’s previous activity.
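The feedback loop can be sketched with a single recurrent unit, in the style of an Elman (simple recurrent) network. The tanh activation, the weight values and the function name are assumptions for illustration. Two input sequences that end in the same input produce different final responses, because the unit carries a trace of its activity at the previous step.

```python
import math


def run_recurrent_unit(inputs, w_in=1.0, w_rec=0.5):
    """Simple recurrent loop: the unit's activity at each step depends on the
    current input and on its own activity at the previous step (the
    recycled trace that acts as short-term memory)."""
    previous, history = 0.0, []
    for x in inputs:
        current = math.tanh(w_in * x + w_rec * previous)
        history.append(current)
        previous = current
    return history


# Same final input (0), different histories -> different final responses.
print(run_recurrent_unit([1, 0])[-1])   # about 0.36
print(run_recurrent_unit([0, 0])[-1])   # 0.0
```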

Dynamical Connectionism
• “Third-generation” connectionism.
• Puts even greater stress on dynamic and time-involving properties.
• Introduces more neurobiologically realistic features, including special-purpose units, more complex connectivity, computationally salient time delays in processing, deliberate use of noise, etc.
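Two of the features listed, processing delays and deliberate noise, can be added to a single unit without much machinery. The sketch uses a leaky-integrator unit, a common simplification of a neuron, swapped in here as an assumption. The delay, leak rate and noise level are hypothetical parameters.

```python
import random


def leaky_delayed_unit(inputs, delay=2, leak=0.5, noise=0.0, seed=0):
    """Leaky-integrator unit whose input arrives `delay` steps late.

    At each step the state decays toward zero, is pushed by the delayed
    input, and (if noise > 0) receives a small Gaussian perturbation.
    """
    rng = random.Random(seed)
    state, trace = 0.0, []
    for t in range(len(inputs)):
        delayed = inputs[t - delay] if t >= delay else 0.0
        state = (1 - leak) * state + leak * delayed
        if noise > 0:
            state += rng.gauss(0.0, noise)
        trace.append(state)
    return trace


step_input = [1, 1, 1, 1, 1]
print(leaky_delayed_unit(step_input))             # [0.0, 0.0, 0.5, 0.75, 0.875]
print(leaky_delayed_unit(step_input, noise=0.1))  # same trend, with jitter
```

The response begins only after the delay and then rises gradually rather than switching on at once. This is the sort of time-involving behavior that the third-generation models make computationally significant.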
