Presentation Transcript


  1. COM1070: Introduction to Artificial Intelligence: week 9 Yorick Wilks Computer Science Department University of Sheffield www.dcs.shef.ac.uk/~yorick

  2. Rule-system Linguists stress the importance of rules in describing human behaviour. We know the rules of language, in that we are able to speak grammatically, or even to judge whether a sentence is or is not grammatical. But this does not mean we know the rules in the way we know the rule ‘i before e except after c’: we may not be able to state them explicitly. It has nevertheless been held (e.g. Pinker, 1984, following Chomsky) that our knowledge of language is stored explicitly as rules; we simply cannot describe them verbally because they are written in a special code that only the language processing system can understand.

  3. Explicit inaccessible rule view Alternative view: there are no explicit but inaccessible rules. Our performance is characterisable by rules, but they are emergent from the system and are not explicitly represented anywhere. E.g. a honeycomb: its structure could be described by a rule, but that rule is not explicitly coded anywhere; the regular structure of the honeycomb arises from the interaction of forces that the wax balls exert on each other when compressed. The parallel distributed processing view: no explicit (albeit inaccessible) rules.

  4. Advantages of using NNs to model aspects of human behaviour. • Neurally plausible, or at least ‘brain-style’ computing. • Learned, not explicitly programmed. • No explicit rules; permits a new kind of explanation of the phenomenon. • The model both produces the behaviour and fits the data: errors emerge naturally from the operation of the model. Contrast with symbolic models in all four respects above.

  5. Rumelhart and McClelland: ‘…lawful behaviour and judgements may be produced by a mechanism in which there is no explicit representation of the rule. Instead, we suggest that the mechanisms that process language and make judgements of grammaticality are constructed in such a way that their performance is characterizable by rules, but that the rules themselves are not written in explicit form anywhere in the mechanism.’ This is an important counter-argument to linguists, who tend to think that people are applying syntactic rules. The point: we can have syntactic rules that describe language, but when we speak syntactically (as if we were following those rules) it does not follow that we literally are following rules.

  6. Many philosophers have made a similar point against the reality of explicit rules, e.g. Wittgenstein. The ANN approach provides a computational model of how that might be possible in practice: how to get the same behavioural effect as rules without there being any rules anywhere in the system. On the other hand, the standard picture in science is of many possible rule systems describing the same phenomenon; that also allows that the real rules (in a brain) could be quite different from the ones we invent to describe the phenomenon. Some computer scientists (e.g. Charniak) refuse to accept incomprehensible explanations.

  7. Specific criticisms of the model: Criticism 1: The performance of the model depends on the use of the Wickelfeature representation, which is an adaptation of standard linguistic featural analysis – i.e. it relies on a symbolic input representation (cf. phonemes in NETtalk). So what is the contribution of the architecture itself? Criticism 2: Pinker and Prince (1988) question the role of the input and the U-shaped curve. The model’s entry into Stage 2 is due to the addition of 410 medium-frequency verbs. This change is more abrupt than is the case with children; there may be no relation between this method of partitioning the training data and what happens to children.
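To make Criticism 1 concrete, here is a rough, illustrative sketch of the general idea behind context-sensitive input encodings of the Wickelphone kind: each phoneme (here, crudely, each letter) is represented together with its left and right neighbours. This is not the actual Rumelhart and McClelland feature inventory, just an invented toy version showing that the encoding itself is a symbolic, hand-designed choice:

# Illustrative sketch only: encode a word as context-sensitive triples
# ("Wickelphone"-style units), i.e. each symbol with its left and right
# neighbours. The real model maps such triples onto a hand-designed set of
# phonological "Wickelfeatures", which is not reproduced here.

def wickelphones(word: str) -> set[str]:
    """Return the set of (left, centre, right) context triples for a word.
    '#' marks the word boundary."""
    padded = f"#{word}#"
    return {padded[i - 1] + padded[i] + padded[i + 1]
            for i in range(1, len(padded) - 1)}

if __name__ == "__main__":
    print(sorted(wickelphones("came")))   # ['#ca', 'ame', 'cam', 'me#']

The point of the criticism is that even before any network is involved, this featural encoding does a lot of representational work.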

  8. But later research (Plunkett and Marchman, 1989) showed that U-shaped curves can be achieved without abrupt changes in the input. They trained on all examples together (using a backpropagation net) and presented more irregular verbs, but still found regularisation and other Stage 2 phenomena for certain verbs. Criticism 3: Nets are not simply exposed to data so that we can then examine what they learn. They are programmed in a sense: decisions have to be made about several things, including • the training algorithm to be used • the number of hidden units • how to represent the task in question

  9. • the input and output representation • the training examples, and the manner of their presentation. Criticism 4: At some point after or during learning this kind of thing, humans become able to articulate the rule, e.g. ‘regular past tenses end in -ed’. They can also control and alter these rules – e.g. a child could pretend to be a younger child and say ‘runned’ even though she knows it is incorrect (cf. some speakers use ‘learned’ and some ‘learnt’; ‘lit’ is UK usage and ‘lighted’ US). It is hard to see how this kind of behaviour would emerge from a set of interconnected neurons.

  10. Conclusions Although the past-tense model can be criticised, it is best to evaluate it in the context of the time (1986) when it was first presented. At the time, it provided a tangible demonstration that • it is possible to use a neural net to model an aspect of human learning • it is possible to capture apparently rule-governed behaviour in a neural net.
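As a minimal sketch of that second point: the toy network below is trained, not programmed, to choose a past-tense pattern for a handful of verbs. It is closer in spirit to the later backpropagation nets mentioned under Criticism 3 (Plunkett and Marchman) than to the original 1986 pattern associator, and the verb list, the letter-bigram encoding, the architecture and the hyperparameters are all invented here purely for illustration:

# Minimal sketch (not the original 1986 model): a one-hidden-layer net trained
# with backpropagation to pick a past-tense "pattern" from a toy bigram
# encoding of the stem. All data and settings are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

VERBS = [("walk", 0), ("jump", 0), ("play", 0), ("talk", 0),   # regular: +ed
         ("hit", 1),  ("put", 1),                               # no change
         ("sing", 2), ("ring", 2)]                              # vowel change
PATTERNS = ["add -ed", "no change", "vowel change"]

BIGRAMS = sorted({w[i:i + 2] for w, _ in VERBS for i in range(len(w) - 1)})

def encode(word):
    """Bag-of-bigrams vector: a crude stand-in for a featural encoding."""
    v = np.zeros(len(BIGRAMS))
    for i in range(len(word) - 1):
        if word[i:i + 2] in BIGRAMS:
            v[BIGRAMS.index(word[i:i + 2])] = 1.0
    return v

X = np.array([encode(w) for w, _ in VERBS])
Y = np.eye(3)[[c for _, c in VERBS]]

W1 = rng.normal(0, 0.5, (X.shape[1], 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 3));          b2 = np.zeros(3)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(2000):                      # plain batch backpropagation
    H = sigmoid(X @ W1 + b1)
    O = sigmoid(H @ W2 + b2)
    dO = (O - Y) * O * (1 - O)
    dH = (dO @ W2.T) * H * (1 - H)
    W2 -= 0.5 * H.T @ dO; b2 -= 0.5 * dO.sum(0)
    W1 -= 0.5 * X.T @ dH; b1 -= 0.5 * dH.sum(0)

for w, _ in VERBS + [("wall", 0)]:             # "wall" was never trained on
    out = sigmoid(sigmoid(encode(w) @ W1 + b1) @ W2 + b2)
    print(f"{w:>5} -> {PATTERNS[int(np.argmax(out))]}")

Nothing in the trained weights states a rule such as ‘add -ed unless the verb is irregular’, yet the behaviour the weights produce is describable by such a rule; the novel stem at the end shows how the net generalises from whatever regularities it has absorbed.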

  11. Contrasting Neural Computing with Symbolic Artificial Intelligence • Overview of main differences between them. • Relationship to the brain (a) Similarities between Neural Computing and the brain (b) Differences between brain and Symbolic AI – evidence that brain does not have a von Neumann architecture.

  12. Ability to provide an account of thought and cognition (a) Argument by symbolicists that only symbol system can provide an account of cognition (b) Counter-argument that neural computing (subsymbolic) can also provide an account of cognition (c) Hybrid account?

  13. Main differences between Connectionism and Symbolic AI Knowledge: knowledge represented by weights and activations versus explicit propositions. Rules: rule-like behaviour without explicit rules versus explicit rules. Learning: connectionist nets are trained versus programmed; but there are now many machine learning algorithms that are wholly symbolic, and both kinds only work in a specialised domain. Examinability: we can examine a symbolic program to ‘see how it works’; this is less easy in the case of neural computing – there are problems with its black-box nature, since a set of weights is opaque.
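The knowledge and examinability contrasts can be made concrete with a toy example, invented here for illustration: the same sort of decision expressed once as an explicit symbolic proposition and once as a thresholded weighted sum whose ‘rule’ is only implicit in its numbers and hard to read off by inspection:

# Toy contrast (invented): explicit symbolic rule vs. knowledge-as-weights.

def past_tense_symbolic(verb: str, irregulars: dict[str, str]) -> str:
    """Explicit proposition: IF the verb is listed as irregular THEN use the
    stored form ELSE append '-ed'."""
    return irregulars.get(verb, verb + "ed")

def fires(features: list[float], weights: list[float], threshold: float) -> bool:
    """'Knowledge as weights': the decision is a thresholded weighted sum;
    nothing in the numbers reads like 'add -ed unless irregular'."""
    return sum(f * w for f, w in zip(features, weights)) >= threshold

print(past_tense_symbolic("walk", {"sing": "sang"}))   # walked
print(fires([1.0, 0.0, 1.0], [0.7, -1.2, 0.6], 1.0))   # True (0.7 + 0.6 >= 1.0)

The first function can be inspected and its rule stated in a sentence; the second behaves lawfully, but its ‘rule’ lives only in the particular values of the weights.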

  14. Relationship to the brain: brain-style computing versus manipulation of symbols; different models of human abilities. Ability to provide an account of human thought: see the following discussion about the need for a symbol system to account for thought. Applicability to problems: neural computing is more suited to pattern recognition problems, symbolic computing to systems characterisable by rules. But for a different view, one that stresses the similarities between GOFAI and NN approaches, see Boden, M. (1991) ‘Horses of a Different Colour’, in Ramsey, W., Stich, S.P. and Rumelhart, D.E. (eds), Philosophy and Connectionist Theory, Lawrence Erlbaum Associates: Hillsdale, New Jersey, pp. 3-19, where she points out some of the similarities. See also YW in the Foundations of AI book, on the web course list.

  15. Fashions: there is a historical tendency to model the brain on fashionable technology. Mid 17th century: water clocks and hydraulic puppets were popular, and Descartes developed a hydraulic theory of the brain. Early 18th century: Leibniz likened the brain to a factory. Freud relied on electromagnetics and hydraulics in his descriptions of the mind. Sherrington likened the nervous system to a telegraph. The brain has also been modelled as a telephone switchboard.

  16. We use a computer to model the human brain; but is the human brain itself a computer? Differences between brains and von Neumann machines. McCulloch and Pitts gave a simplified account of neurons as on/off switches, and in the early days it seemed that neurons were like flip-flops in computers. A flip-flop can be thought of as a tiny switch that is either off or on. But it is now clear that there are differences: • the rate of firing of a neuron is important, as well as the on/off feature • a neuron has an enormous number of input and output connections, compared to a logic gate.
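Before turning to the further differences on the next slides, here is a minimal sketch of a McCulloch-Pitts style unit of the kind just described: binary inputs, fixed weights, and an output of 1 exactly when the weighted sum reaches a threshold. The particular weights and thresholds below are just one illustrative choice showing how such on/off units can behave as logic gates:

# McCulloch-Pitts style threshold unit (illustrative weights/thresholds only).

def mp_unit(inputs, weights, threshold):
    return 1 if sum(i * w for i, w in zip(inputs, weights)) >= threshold else 0

AND = lambda a, b: mp_unit((a, b), (1, 1), threshold=2)
OR  = lambda a, b: mp_unit((a, b), (1, 1), threshold=1)
NOT = lambda a:    mp_unit((a,),   (-1,),  threshold=0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("NOT:", NOT(0), NOT(1))

Real neurons, as the slide notes, differ from this idealisation: their firing rate carries information, and their fan-in and fan-out are vastly larger than a logic gate’s.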

  17. Speed: a neuron is much slower. It takes about a thousandth of a second to respond, whereas a flip-flop can shift from 0 to 1 in a thousand-millionth of a second, i.e. the brain takes roughly a million times longer per step. Thus if the brain were running an AI program, stepping through instructions, it would take at least a thousandth of a second for each instruction. The brain can extract the meaning of a sentence, or recognise a visual pattern, in about a tenth of a second. So, if this is being accomplished by stepping through a program, the program can only be about 100 instructions long. But current AI programs contain thousands of instructions! This suggests the brain operates in parallel, rather than as a sequential processor.
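The figures in this argument follow directly from the slide’s own approximate timings, as this back-of-envelope check shows:

# Back-of-envelope check of the slide's figures (approximate values only).
neuron_step = 1e-3    # ~ one thousandth of a second per neural "step"
flip_flop   = 1e-9    # ~ one thousand-millionth of a second per switch
task_time   = 0.1     # brain recognises a sentence/pattern in ~1/10 second

print("neural step is ~%.0f times slower than a flip-flop" % (neuron_step / flip_flop))
print("serial steps available for the task: ~%.0f" % (task_time / neuron_step))
# ~1,000,000 times slower, and only ~100 serial steps -- hence the argument
# that the brain must be doing massively parallel, not sequential, processing.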

  18. Symbol manipulators: most (not all) are sequential, carrying out instructions in sequence. • Human memories are content-addressable: access to a memory is via its content. E.g. we can retrieve a memory via a description: we could refer to the Turing Test either as ‘the Turing Test’ or as ‘the assessment of intelligence based on a Victorian parlour game’, and would still access the memory. But a memory in a computer has a unique address: we cannot get at it without knowing its address (at the bottom level, that is!). • Memory distribution: in a computer a string of symbol tokens exists at a specific physical location in the hardware. But our memories do not seem to function like that. E.g. Lashley and the search for the engram.
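The content-addressable contrast can be illustrated with a toy sketch (the stored items and cue below are invented): address-based lookup needs the exact slot, whereas a content-addressable store can retrieve an item from a partial description of its content:

# Toy illustration: address-based vs. content-addressable retrieval.

memories = [
    "Turing Test: assessment of intelligence based on a Victorian parlour game",
    "Lashley: searched for the engram by lesioning rat cortex",
    "Phineas Gage: survived an iron rod through the frontal lobes",
]

# Address-based: you must know the location to retrieve the item.
print(memories[0])

# Content-addressable: any sufficiently distinctive fragment of the content
# will do as the retrieval cue.
def recall(cue):
    matches = [m for m in memories if cue.lower() in m.lower()]
    return matches[0] if matches else None

print(recall("parlour game"))   # finds the Turing Test memory without its address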

  19. Lashley trained rats to learn a route through a maze to food, then destroyed different areas of the brain. As long as only 10 percent was destroyed, there was no loss of memory, regardless of which area of the brain was destroyed. Lashley (1950): ‘…There are no special cells reserved for special memories… The same neurons which retain memory traces of one experience must also participate in countless other activities…’ Conversely, a single memory must be stored in many places across a brain; there was a brief fashion for ‘the brain as a hologram’ because of the way a hologram stores information.

  20. Graceful degradation: with injury, brain performance degrades gradually, but computers crash. E.g. Phineas Gage, a railway worker: a speeding iron rod crashed through the anterior and middle left lobes of his cerebrum, but within minutes he was conscious, collected and speaking, and he lived for thirteen years afterwards. BUT we cannot conclude from these differences between the brain and the von Neumann machine that the brain is not a computer. All of the above is taken to show that the brain does not have a von Neumann architecture, but we could choose to store the same data all over a computer too.
