Using Backprop to Understand Aspects of Cognitive Development

Presentation Transcript


  1. Using Backprop to Understand Aspects of Cognitive Development. PDP Class, Feb 8, 2010

  2. Back propagation algorithm • Propagate activation forward • Propagate “error” backward • Calculate ‘weight error derivative’ terms $= \delta_r a_s$ • Change weights after each pattern, or after a batch of patterns. At the output level: $\delta_i = (t_i - a_i)\, f'(net_i)$; at other levels: $\delta_j = f'(net_j) \sum_i \delta_i w_{ij}$, etc.
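
The delta rules on this slide translate almost line-for-line into code. Below is a minimal sketch of one training step for a single hidden layer, assuming logistic units (so $f'(net) = a(1-a)$); the array names and learning rate are illustrative, not from the slides:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def backprop_step(x, t, W_hid, W_out, lrate=0.1):
    """One training pattern: forward pass, backward pass, weight update."""
    # Propagate activation forward
    a_hid = sigmoid(W_hid @ x)            # hidden activations
    a_out = sigmoid(W_out @ a_hid)        # output activations

    # Propagate "error" backward; for logistic units f'(net) = a * (1 - a)
    d_out = (t - a_out) * a_out * (1.0 - a_out)        # delta_i = (t_i - a_i) f'(net_i)
    d_hid = a_hid * (1.0 - a_hid) * (W_out.T @ d_out)  # delta_j = f'(net_j) sum_i delta_i w_ij

    # Weight error derivatives are delta_r * a_s; here weights change after each pattern
    W_out += lrate * np.outer(d_out, a_hid)
    W_hid += lrate * np.outer(d_hid, x)
    return a_out
```

Batch mode differs only in accumulating the outer products over patterns before applying the change, as the next slide's update rule makes explicit.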

  3. Variants/Embellishments to back propagation • We can include weight decay and momentum: $\Delta w_{rs} = \epsilon \sum_p \delta_{rp} a_{sp} - \omega w_{rs} + \alpha \Delta w_{rs}(\mathrm{prev})$ • An alternative error measure has both conceptual and practical advantages: $CE_p = -\sum_i \left[ t_{ip} \log(a_{ip}) + (1 - t_{ip}) \log(1 - a_{ip}) \right]$ • If targets are actually probabilistic, minimizing $CE_p$ causes activations to match the probability of the observed target values. • This also eliminates the ‘pinned output unit’ problem.
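
A sketch of both embellishments in code, with the slide's ε (learning rate), ω (weight decay), and α (momentum) as keyword arguments; the function and variable names are illustrative:

```python
import numpy as np

def cross_entropy(t, a, tiny=1e-12):
    """CE_p = -sum_i [t_i log(a_i) + (1 - t_i) log(1 - a_i)]."""
    a = np.clip(a, tiny, 1.0 - tiny)      # guard against log(0)
    return -np.sum(t * np.log(a) + (1.0 - t) * np.log(1.0 - a))

def batch_weight_update(W, dW_prev, deltas, acts, eps=0.1, wdecay=1e-4, alpha=0.9):
    """Delta_w_rs = eps * sum_p delta_rp a_sp - wdecay * w_rs + alpha * Delta_w_rs(prev)."""
    grad = sum(np.outer(d, a) for d, a in zip(deltas, acts))  # sum over patterns in batch
    dW = eps * grad - wdecay * W + alpha * dW_prev
    W += dW
    return dW                              # becomes dW_prev on the next batch
```

A note on why cross-entropy unpins saturated output units: with logistic outputs, the $f'(net)$ factor cancels in the output delta, leaving $\delta_i = t_i - a_i$, so a unit stuck near 0 or 1 still receives a substantial error signal.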

  4. Is backprop biologically plausible? • Neurons do not send error signals backward across their weights through a chain of neurons, as far as anyone can tell. • But we shouldn’t be too literal-minded about the actual biological implementation of the learning rule. • Some neurons appear to use error signals, and there are ways to use differences between activation signals to carry error information. (We will explore this in a later lecture.)

  5. Why is back propagation important? • Provides a procedure that allows networks to learn weights that can solve any deterministic input-output problem. • Contrary to expectation, it does not get stuck in local minima except in cases where the network is exceptionally tightly constrained. • Allows networks with multiple hidden layers to be trained, although learning tends to proceed slowly (later we will learn about procedures that can fix this). • Allows networks to learn how to represent information as well as how to use it. • Raises questions about the nature of representations and of what must be specified in order to learn them.

  6. The Time-Course of Cognitive Development • Networks trained with back-propagation address several issues in development, including: • Whether innate knowledge is necessary as a starting point for learning. • Aspects of the time course of development: • What causes changes in the pattern of responses children make at different times during development? • What allows a learner to reach the point of being ready to learn something s/he previously was not ready to learn?

  7. Two Example Models • Rumelhart’s semantic learning model • Addresses most of the issues above • Available as the “semnet” script in the bp directory • Model of child development in a ‘naïve physics’ task (Piaget’s balance scale task) • Addresses stage transitions and readiness to learn new things • We will not get to this; see readings if interested

  8. Quillian’s (1969) Hierarchical Propositional Model

  9. The Rumelhart (1990) Model

  10. The Training Data: All propositions true of items at the bottom level of the tree, e.g.: Robin can {fly, move, grow}
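
Only the “Robin can {fly, move, grow}” proposition is from the slide; a hypothetical fragment of how such a training set could be encoded, with invented items and attributes for the other entries:

```python
# Hypothetical fragment of the training environment: each (item, relation)
# input pair maps to the set of attribute units that should be on.
# Only the ("robin", "can") entry is taken from the slide; the rest are invented.
TRAIN = {
    ("robin", "can"):  {"fly", "move", "grow"},
    ("robin", "is"):   {"living", "animal", "bird"},     # invented
    ("salmon", "can"): {"swim", "move", "grow"},         # invented
    ("pine", "has"):   {"roots", "bark"},                # invented
}

def target_vector(item, relation, attributes):
    """Binary target over a fixed, ordered attribute vocabulary."""
    true_set = TRAIN[(item, relation)]
    return [1.0 if attr in true_set else 0.0 for attr in attributes]
```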

  11. The Rumelhart Model: Target output for ‘robin can’ input

  12. The Rumelhart Model

  13. [Figure: the network's internal representations at three points in training (Early, Later, Later Still), plotted as a function of Experience]

  14. Inference and Generalizationin the PDP Model • A semantic representation for a new item can be derived by error propagation from given information, using knowledge already stored in the weights.

  15. Start with a neutral representation on the representation units. Use backprop to adjust the representation to minimize the error.
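
A minimal sketch of this “backprop to representation” procedure, collapsing the model's intervening layers into a single trained weight matrix W for brevity (the names and step sizes are illustrative):

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def infer_representation(t_known, mask, W, n_steps=200, lrate=0.5):
    """Derive a representation for a new item with the weights frozen.

    t_known : target values for the attributes we were told about
    mask    : 1.0 where an attribute was given, 0.0 where it is unknown
    W       : trained representation-to-attribute weights (not updated)
    """
    rep = np.full(W.shape[1], 0.5)                 # start with a neutral representation
    for _ in range(n_steps):
        a = sigmoid(W @ rep)                       # predicted attributes
        d = mask * (t_known - a) * a * (1.0 - a)   # error only on the given attributes
        rep += lrate * (W.T @ d)                   # backprop the error into the representation
    return rep    # afterwards, sigmoid(W @ rep) fills in the unknown attributes
```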

  16. The result is a representation similar to that of the average bird…

  17. Use the representation to infer what this new thing can do.

  18. Some Phenomena in Conceptual Development • Progressive differentiation of concepts • Illusory correlations and U-shaped developmental trajectories • Domain- and property-specific constraints on generalization • Reorganization of conceptual knowledge

  19. What Drives Progressive Differentiation? • Waves of differentiation reflect sensitivity to patterns of coherent covariation of properties across items. • Patterns of coherent covariation are reflected in the principal components of the property covariance matrix. • Figure shows attribute loadings on the first three principal components: • 1. Plants vs. animals • 2. Birds vs. fish • 3. Trees vs. flowers • Same color = features that covary within a component • Different color = anti-covarying features
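
The loadings described above can be computed directly from the training patterns; a minimal sketch, where P stands in for the actual item-by-property matrix:

```python
import numpy as np

def property_components(P, k=3):
    """P: items x properties (0/1) matrix of training patterns.
    Returns attribute loadings on the first k principal components
    of the property covariance matrix."""
    C = np.cov(P, rowvar=False)           # property covariance matrix
    vals, vecs = np.linalg.eigh(C)        # eigh, since C is symmetric
    order = np.argsort(vals)[::-1]        # components in order of variance explained
    return vecs[:, order[:k]]             # columns = loadings on PCs 1..k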

  20. Coherent Covariation • The tendency of properties of objects to co-occur in clusters. • e.g. • Has wings • Can fly • Is light • Or • Has roots • Has rigid cell walls • Can grow tall

  21. Coherence Training Patterns • [Figure: a 16-item training matrix with ‘is’, ‘can’, and ‘has’ property blocks, divided into Coherent and Incoherent property sets] • No labels are provided • Each item and each property occurs with equal frequency
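
A hypothetical way to generate a pattern set with these characteristics: coherent properties are tied to an item's cluster, while incoherent ones are assigned to a random half of the items, so every property is true of the same number of items but only the coherent ones covary:

```python
import numpy as np

rng = np.random.default_rng(0)

n_items, n_coherent, n_incoherent = 16, 8, 8
cluster = np.arange(n_items) % 2                    # two item clusters

# Coherent properties: one block of properties per cluster, so they covary
coherent = np.zeros((n_items, n_coherent))
coherent[cluster == 0, : n_coherent // 2] = 1.0
coherent[cluster == 1, n_coherent // 2 :] = 1.0

# Incoherent properties: each is true of a random half of the items, so
# frequencies match the coherent properties but nothing covaries
half = [1.0] * (n_items // 2) + [0.0] * (n_items // 2)
incoherent = np.stack([rng.permutation(half) for _ in range(n_incoherent)], axis=1)

patterns = np.hstack([coherent, incoherent])        # items x properties
```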

  22. Effects of Coherence on Learning • [Figure: learning curves for Coherent Properties vs. Incoherent Properties]

  23. Effect of Coherence on Representation

  24. Effects of Coherent Variation on Learning in Connectionist Models • Attributes that vary together create the acquired concepts that populate the taxonomic hierarchy, and determine which properties are central and which are incidental to a given concept. • Labeling of these concepts or their properties is in no way necessary, but it may contribute additional ‘covarying’ information, and can affect the pattern of differentiation. • Arbitrary properties (those that do not co-vary with others) are very difficult to learn. • And it is harder to learn names for concepts that are only differentiated by such arbitrary properties.

  25. A Sensitivity to Coherence Requires Convergence

  26. Illusory Correlations • Rochel Gelman found that children think that all animals have feet. • Even animals that look like small furry balls and don’t seem to have any feet at all.

  27. [Figure: a typical property that a particular object lacks (e.g., pine has leaves) vs. an infrequent, atypical property]

  28. Domain Specificity • What constraints are required for development and elaboration of domain-specific knowledge? • Are domain specific constraints required? • Or are there general principles that allow for acquisition of conceptual knowledge of all different types?

  29. Differential Importance (Marcario, 1991) • 3-4 yr old children see a puppet and are told he likes to eat, or play with, a certain object (e.g., top object at right) • Children then must choose another one that will “be the same kind of thing to eat” or that will be “the same kind of thing to play with”. • In the first case they tend to choose the object with the same color. • In the second case they will tend to choose the object with the same shape.

  30. Can the knowledge that one kind of property is important for one type of thing, while another is important for a different type of thing, be learned? • It can be in the PDP model, since the model is sensitive to domain-specific patterns of coherent covariation.

  31. Adjustments to Training Environment • Among the plants: • All trees are large • All flowers are small • Either can be bright or dull • Among the animals: • All birds are bright • All fish are dull • Either can be small or large • In other words: • Size covaries with properties that differentiate different types of plants • Brightness covaries with properties that differentiate different types of animals
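
One hypothetical encoding of this adjusted environment (the item names are invented; the size and brightness constraints are from the slide):

```python
# Hypothetical items obeying the slide's constraints: all trees large, all
# flowers small (either brightness); all birds bright, all fish dull (either size).
ITEMS = {
    # plants: size covaries with subtype, brightness is free
    "oak":     {"type": "tree",   "size": "large", "brightness": "dull"},
    "pine":    {"type": "tree",   "size": "large", "brightness": "bright"},
    "rose":    {"type": "flower", "size": "small", "brightness": "bright"},
    "daisy":   {"type": "flower", "size": "small", "brightness": "dull"},
    # animals: brightness covaries with subtype, size is free
    "robin":   {"type": "bird",   "size": "large", "brightness": "bright"},
    "canary":  {"type": "bird",   "size": "small", "brightness": "bright"},
    "salmon":  {"type": "fish",   "size": "large", "brightness": "dull"},
    "sunfish": {"type": "fish",   "size": "small", "brightness": "dull"},
}
```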

  32. Testing Feature Importance • After partial learning, the model is shown eight test objects: • Four “Animals”: • All have skin • One is large, bright; one small, bright; one large, dull; one small, dull. • Four “Plants”: • All have roots • Same 4 combinations as above • Representations are generated by using back-propagation to representation (as in slides 14–17). • Representations are then compared to see which ‘animals’ are treated as most similar, and which ‘plants’ are treated as most similar (see the sketch below).
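
The comparison step amounts to computing pairwise similarities among the eight derived representations (e.g., from infer_representation above) and asking which test object each one is treated as closest to; a minimal sketch:

```python
import numpy as np

def most_similar(reps, labels):
    """reps: n_objects x rep_size array of derived representations.
    For each test object, report which other object it is most similar to."""
    sim = np.corrcoef(reps)                  # pairwise correlations
    np.fill_diagonal(sim, -np.inf)           # ignore self-similarity
    return {labels[i]: labels[int(np.argmax(sim[i]))]
            for i in range(len(labels))}

# e.g. labels = ["animal/large/bright", "animal/small/bright", ...,
#                "plant/small/dull"]; if brightness dominates for animals,
# the two bright animals should be each other's nearest neighbors.
```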

  33. The Rumelhart Model

  34. Similarities of Obtained Representations • Brightness is relevant for Animals • Size is relevant for Plants

  35. Additional Properties of the Model • The model is sensitive to amount and type of exposure, addressing frequency effects and expertise effects, and capturing different types of expertise. • The model’s pattern of generalization varies as a function of the type of property as well as the domain. • The model can reorganize its knowledge: • It will first learn about superficial appearance properties if these are generally available; later, it can re-organize its knowledge based on coherent covariation among properties that only occur in specific contexts.
