What’s next for Parallel Distributed Processing? Mathematical Cognition and Other New Directions

Jay McClelland, Stanford University

Core features of the PDP approach to representation and learning • The knowledge is in the connections • It’s intrinsically implicit • It is acquired by a blind automatic procedure • By performing gradient descent rather than by making explicit inferences or engaging in any kind of reasoning process • It can approximate systems of rules • Without ever having any • It captures the gradual nature of developmental change • And emphasizes the importance of the gradual accumulation of small changes [Figure: network with letter units H I N T and phoneme units /h/ /i/ /n/ /t/]

Second and Third Waves of Neural Networks • Some classical applications of PDP models: • Reading and morphology • Sentence processing • Semantic cognition • Intuitive physics • Some recent breakthroughs in machine learning: • Object classification • Speech recognition • Language processing • Surpassing human performance in Atari games

Sentiment Analysis (Socher et al, 2013)

What’s Next?

My lab’s new direction: Mathematical Cognition

Why is Math so Hard to Learn? • Late grade-school-aged kids misunderstand equations • What goes in the blank: 7 + 3 + 4 = __ + 4 • Many middle-school-aged kids misunderstand fractions • Is 19/20 closer to 1 or 21? • Most Stanford undergraduates don’t understand the rudiments of trigonometry • Which expression below has the same value as cos(-30°)? sin(30°) -sin(30°) cos(30°) -cos(30°)

Failure to attach the appropriate meaning to mathematical expressions • A fraction N/D represents a certain number N of pieces of a unit whole divided into D equal parts • An equation represents an equivalence relation between two quantities, one to the left and one to the right of the equals sign • The sine / cosine of an angle θ in degrees represents • the projection of a point on the unit circle specified by θ onto the vertical / horizontal axis through the center of the circle, • or equivalently, the coordinates of the point on the circle
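
The unit-circle reading of sine and cosine can be checked numerically. The sketch below is only an illustration, not material from the talk: the helper name `unit_circle_point` and the choice of 30° are assumptions made for the example. It shows that reflecting a point across the horizontal axis (θ to -θ) keeps the horizontal coordinate and flips the vertical one, which is why cos(-30°) has the same value as cos(30°).

```python
import math

def unit_circle_point(theta_deg):
    """Return the (x, y) coordinates of the unit-circle point at angle theta_deg.

    x is the projection onto the horizontal axis (the cosine),
    y is the projection onto the vertical axis (the sine).
    """
    theta = math.radians(theta_deg)
    return (math.cos(theta), math.sin(theta))

# Hypothetical example angle, matching the cos(-30 degrees) question
x_pos, y_pos = unit_circle_point(30)
x_neg, y_neg = unit_circle_point(-30)

# Negating the angle mirrors the point across the horizontal axis:
# the horizontal coordinate is unchanged, the vertical one changes sign.
same_cos = math.isclose(x_neg, x_pos)
flipped_sin = math.isclose(y_neg, -y_pos)
print(same_cos, flipped_sin)
```

A student who pictures the circle can read both identities off the mirror image without memorizing either.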

cos(70)

cos(–70+0)

[Figure: sin(-θ) and cos(-θ) problems, grouped by reported circle use: “A Lot”, “A Little”, or “Not at all”]

Why are these things hard to learn?

Learning Depends on the Prepared Mind • Algebra for eighth graders? • “though strong math students can benefit from taking algebra in eighth grade, it is "decidedly harmful" for weaker math students to be rushed into advanced math concepts” • Failure to appreciate what X/Y means • Setting up the appropriate encoding habits • Failure to rely on the unit circle? • Failure of a module for visuospatial cognition or failure to develop the habit of mapping numbers into a multi-faceted coordinate framework?

Habits of Mind [1] • Learning to encode expressions automatically so that their meaning is readily apparent in the mind requires gradual connection adjustments that occur incrementally over repeated opportunities to learn • This is no different in principle from learning to read words aloud • We quickly lose awareness that we are engaging in these processes. Once we understand well, meaning is a habit of mind, and we cannot readily appreciate that others do not have it. [1] Margolis, H. (1987). Patterns, Thinking and Cognition. U. of Chicago Press.

Case Study in Readiness: The Balance Scale

Balance Scale Model • Training involved more cases in which the weight varied than cases in which the distance varied • The network’s task was to activate the unit corresponding to the side that should go down, or (if the sides are in balance) to set the activation of both output units to .5
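
The target coding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the model's actual training code: the function names, the 1 to 5 value range, and the `p_equal_distance` parameter are hypothetical, and the correct side is determined here by the standard torque rule (weight times distance).

```python
import random

def balance_target(left_weight, left_dist, right_weight, right_dist):
    """Return target activations (left_down, right_down) for one problem.

    The side with the larger torque (weight * distance) should go down;
    if the torques are equal, both output units get a target of 0.5.
    """
    left_torque = left_weight * left_dist
    right_torque = right_weight * right_dist
    if left_torque > right_torque:
        return (1.0, 0.0)
    if right_torque > left_torque:
        return (0.0, 1.0)
    return (0.5, 0.5)

def sample_problem(rng, p_equal_distance=0.5, max_value=5):
    """Sample (left_weight, left_dist, right_weight, right_dist).

    Assumption for illustration: with probability p_equal_distance the two
    distances are forced to be equal, so only weight can differ. This biases
    the training set toward cases in which weight varies.
    """
    left_weight = rng.randint(1, max_value)
    right_weight = rng.randint(1, max_value)
    left_dist = rng.randint(1, max_value)
    if rng.random() < p_equal_distance:
        right_dist = left_dist
    else:
        right_dist = rng.randint(1, max_value)
    return left_weight, left_dist, right_weight, right_dist

rng = random.Random(0)
problem = sample_problem(rng)
print(problem, balance_target(*problem))
```

Raising `p_equal_distance` makes weight the more frequently informative dimension, which is one simple way to reproduce the kind of training bias the slide mentions.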

Siegler’s Readiness Experiment • Two groups of children • 5-year-old Rule 1 children • 7-8-year-old Rule 1 children • After pretest: • Children saw 15 conflict problems with feedback • 1/3: the side with greater weight would go down • 1/3: the side with greater distance would go down • 1/3: the two sides balance • Most 7-8-year-olds progressed to Rule 2 • Most 5-year-olds showed no change or reverted to guessing

[Figure panels: Rule 1 Start, Rule 1 End]

Benefit From a Brief Lesson in the Unit Circle Depends on a Prepared Mind

Applying these ideas to Mathematical Cognition • Application 1: Representation of approximate number • Stoianov & Zorzi, 2012; Zou & McClelland (poster) • Application 2: Learning to correctly solve equivalence problems • Mickey & McClelland (talk) • Application 3: Incremental improvements in strategies for adding small numbers • Hanson, McKenzie & McClelland (talk) • Application 4: Learning geometry • Me and anyone who is willing to help me!

The Approximate Number Problem • Must we really imagine that a system for representing number approximately is innate, or can the problem be solved using a generic neural network? • Can we account for the developmental improvement in acuity of the approximate number system? • Can we understand why our representations of approximate number have the properties that they do? • Specifically, why does our sensitivity to approximate numbers approximately conform to Weber’s law?
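
For readers unfamiliar with Weber’s law, here is a minimal sketch of what conforming to it means. It is a textbook-style formulation, not the neural network models discussed in this talk: it assumes each numerosity n is represented by a Gaussian estimate whose standard deviation grows in proportion to n, and the Weber fraction `w = 0.2` is a hypothetical value chosen for illustration.

```python
import math

def p_correct(n1, n2, w=0.2):
    """Probability of correctly judging which of two numerosities is larger.

    Assumes independent Gaussian estimates with mean n and standard
    deviation w * n, so the difference of the two estimates has standard
    deviation w * sqrt(n1**2 + n2**2).
    """
    z = abs(n1 - n2) / (w * math.sqrt(n1 ** 2 + n2 ** 2))
    # Standard normal cumulative distribution evaluated at z
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Same ratio (1:2) at different magnitudes gives the same accuracy
print(p_correct(8, 16), p_correct(16, 32))
# Same absolute difference (1) is harder at larger magnitudes
print(p_correct(8, 9), p_correct(16, 17))
```

Under these assumptions accuracy depends only on the ratio of the two numbers, not on their absolute difference, which is the signature the slide asks a learning model to explain.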

Why Neural Networks? • Deep (unsupervised) learning can create an invertible internal representation that is driven solely by the goal of capturing the content of its inputs • As Stoianov & Zorzi (2012) showed, this is sufficient to support human level performance in numerosity judgment. • Using ‘stochastic gradient descent’ instead of batch learning allows us to explore both the initial state and progressive refinement of representations. • Zou & McClelland explore the developmental trajectory, also explored in subsequent work by S & Z.

Errors on Equivalence Problems • Children are reliably incorrect in answering problems of the form: a = b + __ • They tend to put the sum of a and b in the blank, rather than the correct answer, which is a – b. • When given such equations in a brief presentation, and asked to reproduce them, they tend to reproduce them as a + b = __ • Children’s experiences are biased in ways that are consistent with these errors.
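
A small worked check makes the error pattern concrete. The function names and the example values 7 and 3 are hypothetical, chosen only for illustration. For an equation of the form a = b + __, the blank must be the number that makes both sides equal, whereas the typical error treats the equals sign as a prompt to add everything.

```python
def correct_blank(a, b):
    """Value that makes the equation a = b + __ true."""
    return a - b

def add_all_error(a, b):
    """The typical error: writing the sum of a and b in the blank."""
    return a + b

def is_balanced(a, b, blank):
    """Check that the left and right sides of a = b + blank are equal."""
    return a == b + blank

a, b = 7, 3  # hypothetical example: 7 = 3 + __
print(correct_blank(a, b), is_balanced(a, b, correct_blank(a, b)))
print(add_all_error(a, b), is_balanced(a, b, add_all_error(a, b)))
```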

Why a Neural Network? • Gradually learns in a way that depends on statistics of training set • Exhibits ‘pattern completion’ biases that capture both math errors and problem reconstruction errors • Gradually learns its way out of its errors, capturing patterns in the data

These two models are great, but… • When we solve mathematical problems, we often perform a sequence of operations. • These operations are not rigidly structured, so we need flexibility • And as we gain facility, we can (spontaneously) develop more efficient strategies

Strategy change in simple addition: 5 + 2 = 7 • Children appear to gradually progress through a series of alternative ‘strategies’, with strategy choice being probabilistic and with the probabilities changing gradually over age • Children can be induced to change strategies if given problems that give a clear advantage to one strategy • Children’s strategies seem constrained to be consistent with the principles of addition, even though children can’t necessarily articulate such principles.

Incremental, Hierarchical, Supervised Reinforcement Learning in a Neural Network • A strategy is a sequence of steps, and reward only comes at the end • The time it takes as well as the outcome are automatically considerations in reinforcement learning. • The use of a neural network as function approximator supports generalization. • Re-use of number skills previously acquired leads to selection of task-appropriate rather than task-inappropriate strategies. • The ‘strategy’ as a whole emerges as an assemblage of strategy chunks, each associated with a component skill relevant to addition. • A key idea is that learning is curriculum based: The culture and educational system provide early experiences in initial components that then provide the previously acquired skills.
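
One way to see why time is automatically a consideration when reward only comes at the end is through discounting. The sketch below is an illustration under assumptions, not the model from the talk: the discount factor `gamma = 0.9`, the final reward of 1.0, and the step counts for the two strategies are all hypothetical.

```python
def discounted_return(num_steps, final_reward=1.0, gamma=0.9):
    """Return, seen from the first step, of an episode rewarded only at the end.

    Each additional step before the reward multiplies its value by gamma,
    so longer strategies yield a smaller return for the same outcome.
    """
    return (gamma ** (num_steps - 1)) * final_reward

# Hypothetical step counts for two correct strategies on the same problem
long_strategy_steps = 14
short_strategy_steps = 2

print(discounted_return(long_strategy_steps))
print(discounted_return(short_strategy_steps))
```

Both strategies reach the same correct answer, but the shorter one has the higher return, so a learner that maximizes return will drift toward it without any explicit instruction to be faster.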

Intuitive Geometry Project: Motivations • Geometrical intuition as developing gradually with age, through a series of ‘levels’. • A year’s course in Geometry has no special impact on students’ ‘level’. • Lessons learned from presenting students with the Socratic Dialog uncovering the supposed prior understanding of how to create a square with twice the area of a given square. • In spite of profession of ‘understanding’ after walking through the dialog, those with many misconceptions can’t demonstrate the solution on a new square. • Geometry as grounded in Intuition but ultimately connected to proof • Carmenga, Transforming Geometric Proof with Reflections, Rotations and Translations • Henderson, Experiencing Geometry

Example: ASA (Informal) • Given: ∠A≅∠A’, AC≅A’C’, ∠C≅∠C’ • Prove: △ABC≅△A’B’C’ • Idea: • translate A to A’ • rotate △ABC until AC coincides with A’C’ • reflect over A’C’ if necessary. Then the whole triangle coincides!

Example: ASA (Rigorous) • Given: ∠A≅∠A’, AC≅A’C’, ∠C≅∠C’ • Translate △ABC so that A coincides with A’. • Rotate △ABC so that ray AC coincides with ray A’C’. Since AC≅A’C’, C coincides with C’. • If B and B’ are on different sides of line AC, reflect △ABC over line AC. • Since ∠A≅∠A’ and AC and A’C’ coincide and are on the same side of the angle, ∠A coincides with ∠A’. • Since the angles coincide, the other rays AB and A’B’ coincide. • Similarly, since ∠C≅∠C’ and AC and A’C’ coincide, ∠C coincides with ∠C’ and the other rays CB and C’B’ coincide. • Since ray AB coincides with ray A’B’ and ray CB with ray C’B’ and two lines intersect in at most one point, B coincides with B’. • Since all sides and angles coincide, △ABC≅△A’B’C’.
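
The translate, rotate, reflect sequence used in the proof can be carried out numerically on a concrete pair of triangles. This is only a sketch under assumptions: points are represented as Python complex numbers, the function name `align_triangle` and the example coordinates are hypothetical, and the code demonstrates the motions on one congruent pair rather than proving the theorem.

```python
def align_triangle(tri, target):
    """Move tri = (A, B, C) onto target = (A', B', C') by rigid motions.

    Points are complex numbers (x + y*1j). Returns the moved (A, B, C).
    """
    A, B, C = tri
    A2, B2, C2 = target

    # Step 1: translate so that A coincides with A'
    shift = A2 - A
    A, B, C = A + shift, B + shift, C + shift

    # Step 2: rotate about A' so that ray AC coincides with ray A'C'
    u = (C2 - A2) / abs(C2 - A2)   # unit direction of ray A'C'
    v = (C - A) / abs(C - A)       # unit direction of ray AC
    rot = u / v                    # unit complex number encoding the rotation
    B = A + (B - A) * rot
    C = A + (C - A) * rot

    # Step 3: reflect over line A'C' if B and B' lie on different sides.
    # Dividing by u turns line A'C' into the real axis, so the imaginary
    # part gives a signed distance from the line.
    side_b = ((B - A2) / u).imag
    side_b2 = ((B2 - A2) / u).imag
    if side_b * side_b2 < 0:
        B = A2 + u * ((B - A2) / u).conjugate()

    return A, B, C

# Hypothetical example: target is a reflected, rotated, translated copy
triangle = (0 + 0j, 1 + 2j, 3 + 0j)
target = (2 + 1j, 4 + 2j, 2 + 4j)
moved = align_triangle(triangle, target)
print(moved)
```

In this example all three steps are needed, including the reflection, and the moved vertices land on the target vertices up to floating-point error.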

How Can We Begin to Make Progress on this Ambitious Project? • Create a simulated agent that must carry out tasks in a virtual world • Similar to the DeepMind Atari project • Agent has a few actions it can perform • Change its point of view on its (2-D) world • Move, rotate and flip objects • Measure, copy, and construct objects to instruction using geometry tools • E.g., adjustable straightedge and compass • Demo some time during the conference! • Train the agent using incremental supervised learning • Initial tasks: • Find named objects, find objects that have the same shape as a given target • Translate, rotate, and flip objects to fit them through shaped holes • Learn to measure length and angle • Learn to impose alternative frames of reference to identify congruent shapes under rotations and flips

Two Relevant Ideas • Use eye movements to bring objects to the center of gaze, where they can be recognized in canonical position. • Plaut, McClelland, & Seidenberg, NCPW 1 • Mnih et al, 2014 • Use transforming autoencoders to learn effects of transformations on the way objects look. • Hinton et al, 2011

Later Stages V. Carry out Euclidean constructions to instruction - Based on given diagrams - Purely from instruction VI. Determine perimeter and area of polygons and circles in given units VII. Establish correspondence between figures A and B VIII. Solve complex geometry problems requiring several intermediate inferences and computations

One Example Problem

Challenges, Open Questions, and Broader Directions • Explicit cognition, metacognitive knowledge, and their relation to knowledge in connections • Abstract mathematics, proof, and justification • Are they, at least in part, extensions of concrete embodied reasoning? • A broader understanding of understanding in embodied terms • We don’t just map mathematical expressions onto learned conceptual structures, we do the same in general when we understand ideas expressed in language as well • These are the questions for the next ten years
