Bayesian and Connectionist Approaches to Learning



**Bayesian and Connectionist Approaches to Learning**
Tom Griffiths, Jay McClelland, Alison Gopnik, Mark Seidenberg

**Who Are We and What Do We Study?**
• We are: cognitive and developmental psychologists who use mathematical and computational models together with experimental studies of children and adults.
• We study: human cognitive processes ranging from object recognition, language processing, and reading to semantic cognition, naïve physics, and causal reasoning.

**Our Question**
How do probabilistic/Bayesian and connectionist/neural network models relate?

**Schedule**
• Tom Griffiths
  • Probabilistic/Bayesian Approaches
• Jay McClelland
  • Connectionist/Neural Network Approaches
• Alison Gopnik
  • Causal Reasoning
• Mark Seidenberg
  • Language Acquisition
• Open Discussion
  • Robotics, Machine Learning, Other Applications…

**Emergent Functions of Simple Systems**
J. L. McClelland
Stanford University

**Topics**
• Emergent probabilistic optimization in neural networks
• Relationship between competence/rational approaches and mechanistic (including connectionist) approaches
• Some models that bring connectionist and probabilistic approaches into proximal contact

**Connectionist Units Calculate Posteriors Based on Priors and Evidence**
Given:
• A unit representing hypothesis $h_i$, with binary inputs $j$ representing the state of various elements of evidence $e$, where, for all $j$, $p(e_j)$ is assumed conditionally independent given $h_i$
• A bias on the unit equal to $\log\big(\mathrm{prior}_i/(1-\mathrm{prior}_i)\big)$
• Weights to the unit from each input equal to $\log\big(p(e_j \mid h_i)/p(e_j \mid \lnot h_i)\big)$

If the output of the unit is computed by taking the logistic function of the net input,

$$\mathrm{net}_i = \mathrm{bias}_i + \sum_j a_j w_{ij}, \qquad a_i = \frac{1}{1+\exp(-\mathrm{net}_i)},$$

then $a_i = p(h_i \mid e)$.

A set of units for mutually exclusive alternatives can assign the posterior probability to each in a similar way, using the softmax activation function

$$a_i = \frac{\exp(\gamma\,\mathrm{net}_i)}{\sum_{i'} \exp(\gamma\,\mathrm{net}_{i'})}.$$

If $\gamma = 1$, this constitutes probability matching. As $\gamma$ increases, more and more of the activation goes to the most likely alternative(s).
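The claim that a logistic unit with log-odds bias and log-likelihood-ratio weights reports the posterior can be checked numerically. The following is a minimal sketch, not taken from the slides: the function names and all probability values are hypothetical, and it assumes that an inactive input (activation 0) simply contributes nothing, so the posterior is conditioned only on the evidence elements that are active. The softmax helper assumes each net input is a log prior plus a log likelihood, and uses the gain parameter the slide writes as g (γ).

```python
import math

def logistic_unit_posterior(prior, p_e_given_h, p_e_given_not_h, active):
    """Activation of a logistic unit with log-odds bias and log-ratio weights."""
    bias = math.log(prior / (1.0 - prior))
    weights = [math.log(ph / pn) for ph, pn in zip(p_e_given_h, p_e_given_not_h)]
    net = bias + sum(a * w for a, w in zip(active, weights))
    return 1.0 / (1.0 + math.exp(-net))

def bayes_posterior(prior, p_e_given_h, p_e_given_not_h, active):
    """Direct Bayes' rule over the active evidence elements, for comparison."""
    joint_h, joint_not_h = prior, 1.0 - prior
    for a, ph, pn in zip(active, p_e_given_h, p_e_given_not_h):
        if a:  # inactive elements are treated as unobserved (an assumption)
            joint_h *= ph
            joint_not_h *= pn
    return joint_h / (joint_h + joint_not_h)

def softmax(nets, gamma=1.0):
    """Softmax over net inputs with gain gamma (shifted by the max for stability)."""
    top = max(gamma * n for n in nets)
    exps = [math.exp(gamma * n - top) for n in nets]
    total = sum(exps)
    return [x / total for x in exps]

# Hypothetical numbers chosen only for illustration.
PRIOR = 0.2
P_E_GIVEN_H = [0.9, 0.7, 0.4]
P_E_GIVEN_NOT_H = [0.3, 0.2, 0.5]
ACTIVE = [1, 1, 0]

unit_output = logistic_unit_posterior(PRIOR, P_E_GIVEN_H, P_E_GIVEN_NOT_H, ACTIVE)
direct = bayes_posterior(PRIOR, P_E_GIVEN_H, P_E_GIVEN_NOT_H, ACTIVE)
print("logistic unit:", unit_output, "Bayes' rule:", direct)

# Three mutually exclusive hypotheses: net = log prior + log likelihood.
PRIORS = [0.5, 0.3, 0.2]
LIKELIHOODS = [0.1, 0.4, 0.2]
NETS = [math.log(p * l) for p, l in zip(PRIORS, LIKELIHOODS)]
print("gamma = 1:", softmax(NETS, gamma=1.0))
print("gamma = 4:", softmax(NETS, gamma=4.0))
```

With a gain of 1 the softmax activations are the normalized joint probabilities, that is, the posteriors; raising the gain shifts activation toward the most probable alternative.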
[Figure for "Connectionist Units Calculate Posteriors Based on Priors and Evidence": input from unit $j$ reaches unit $i$ through weight $w_{ij}$.]

**Emergent Outcomes from Local Computations (Hopfield, ’82; Hinton & Sejnowski, ’83)**
• If $w_{ij} = w_{ji}$ and units are updated asynchronously, setting $a_i = 1$ if $\mathrm{net}_i > 0$ and $a_i = 0$ otherwise, a network will settle to a state $s$ that is a local maximum of a measure Rumelhart et al. (1986) called $G$.
• $G(s) = \sum_{i<j} w_{ij} a_i a_j + \sum_i a_i(\mathrm{bias}_i + \mathrm{ext}_i)$
• If each unit sets its activation to 1 with probability $\mathrm{logistic}(\gamma\,\mathrm{net}_i)$, then $p(s) = \exp(\gamma G(s)) / \sum_{s'} \exp(\gamma G(s'))$.

**A Tweaked Connectionist Model (McClelland & Rumelhart, 1981) that is Also a Graphical Model**
• Each pool of units in the IA model is equivalent to a Dirichlet variable (c.f. Dean, 2005).
• This is enforced by using softmax to set one of the $a_i$ in each pool to 1 with probability $p_j = e^{\gamma\,\mathrm{net}_j} / \sum_{j'} e^{\gamma\,\mathrm{net}_{j'}}$.
• Weight arrays linking the variables are the equivalent of the ‘edges’ encoding conditional relationships between states of these different variables.
• Biases at word level encode the prior $p(w)$.
• Weights are bi-directional, but encode generative constraints ($p(l \mid w)$, $p(f \mid l)$).
• At equilibrium with $\gamma = 1$, the network’s probability of being in state $s$ equals $p(s \mid I)$.

We want to learn how to represent the world and constraints among its constituents from experience, using (to the fullest extent possible) a domain-general approach. In this context, the prototypical connectionist learning rules correspond to probability maximization or matching.
• Back Propagation Algorithm: $\Delta w_{ij} = \varepsilon\,\delta_i a_j$. Maximizes $p(o_i \mid I)$ for each output unit.
• Boltzmann Machine Learning Algorithm: $\Delta w_{ij} = \varepsilon\,(a_i^+ a_j^+ - a_i^- a_j^-)$. Learns to match probabilities of entire output states $o$ given the current input. That is, it minimizes $\int p(o \mid I)\,\log\big(p(o \mid I)/q(o \mid I)\big)\,do$.

But that’s not the true PDP approach to Perception/Cognition/etc…

[Figure: input I and output o.]

**Recent Developments**
Hinton’s deep belief networks are fully distributed learned connectionist models that use a restricted form of the Boltzmann machine (no intra-layer connections). They are fast and beat other machine learning methods. Adding generic constraints (sparsity, locality) allows such networks to learn efficiently and generalize very well in demanding task contexts.

Hinton, Osindero, and Teh (2006). A fast learning algorithm for deep belief networks. Neural Computation, 18, 1527-54.

**Topics**
• Emergent probabilistic optimization in neural networks
• Relationship between competence/rational approaches and mechanistic (including connectionist) approaches
• Some models that bring connectionist and probabilistic approaches into proximal contact

**Two Perspectives**
Perspective 1:
• People are rational, their behavior is optimal.
• They seek explicit internal models of the structure of the world, within which to reason.
  • Optimal structure type for each domain
  • Optimal structure instance within type

Perspective 2:
• People evolved through an optimization process, and are likely to approximate optimality/rationality within limits.
• Fundamental aspects of natural/intuitive cognition may depend largely on implicit knowledge.
• Natural structure (e.g. language) does not exactly correspond to any specific structure type.
• Culture/School encourages us to think and reason explicitly, and gives us tools for this; we do so under some circumstances.
• Many connectionist models do not directly address this kind of thinking; eventually they should be elaborated to do so.

**Two Perspectives, Cont’d**
Perspective 1:
• Resource limits and implementation constraints are unknown, and should be ignored in determining what is rational/optimal.
• Inference is still hard, and prior domain-specific constraints are therefore essential.

Perspective 2:
• Human behavior won’t be understood without considering the constraints it operates under.
• Determining what is optimal sans constraints is always useful, even so.
• Such an effort should not presuppose individual humans intend to derive an explicit model.
• Inference is hard, and domain-specific priors can help, but domain-general mechanisms subject to generic constraints deserve full exploration.
• In some cases such models may closely approximate what might be the optimal explicit model.
• But that model might only be an approximation and the domain-specific constraints might not be necessary.

**Perspectives on Development**
• A competence-level approach can ask: what is the best representation a child could have, given the data gathered to date? The entire data sample is retained, and the optimal model is re-estimated.
• The developing child is an on-line learning system; the parameters of the mind are adjusted as each new experience comes in, and the experiences themselves are rapidly lost.

**Is a Convergence Possible?**
• Yes!
• It is possible to ask what is optimal/rational within any set of constraints.
  • Time
  • Architecture
  • Algorithm
  • Reliability and dynamics of the hardware
• It is then possible to ask how close some mechanism actually comes to achieving optimality, within the specified constraints.
• It is also possible to ask how close it comes to explaining actual human performance, including performance in learning and response to experience during development.

**Topics**
• Emergent probabilistic optimization in neural networks
• Relationship between competence/rational approaches and mechanistic (including connectionist) approaches
• Some models that bring connectionist and probabilistic approaches into proximal contact

**Models that Bring Connectionist and Probabilistic Approaches into Proximal Contact**
• Graphical IA model of Context Effects in Perception
  • In progress; see Movellan & McClelland, 2001.
• Leaky Competing Accumulator Model of Decision Dynamics
  • Usher and McClelland, 2001, and the large family of related decision-making models
• Models of Unsupervised Category Learning
  • Competitive Learning, OME, TOME (Lake et al., ICDL08).
• Subjective Likelihood Model of Recognition Memory
  • McClelland and Chappell, 1998 (c.f. REM, Steyvers and Shiffrin, 1997), and a forthcoming variant using distributed item representations.
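Returning to the "Emergent Outcomes from Local Computations" slide, the relation between local unit updates and the goodness measure G(s) can be illustrated on a network small enough to enumerate. This is only a sketch under stated assumptions: the three-unit network, its weights, biases, and external inputs are hypothetical values chosen for illustration, and the function names are not from the slides. It shows asynchronous threshold updates settling to a state, the exact distribution proportional to exp(g·G(s)) for gain g (γ), and the conditional probability that a single unit is on given the others, which can be compared with logistic(g·net).

```python
import itertools
import math

# Hypothetical 3-unit network: symmetric weights with a zero diagonal.
W = [[0.0, 1.0, -2.0],
     [1.0, 0.0, 0.5],
     [-2.0, 0.5, 0.0]]
BIAS = [0.2, -0.3, 0.1]
EXT = [0.0, 0.5, 0.0]
N = len(BIAS)

def net_input(state, i):
    """Net input to unit i: weighted input from other units plus bias and external input."""
    return sum(W[i][j] * state[j] for j in range(N) if j != i) + BIAS[i] + EXT[i]

def goodness(state):
    """G(s) = sum_{i<j} w_ij a_i a_j + sum_i a_i (bias_i + ext_i)."""
    pairs = sum(W[i][j] * state[i] * state[j] for i in range(N) for j in range(i + 1, N))
    units = sum(state[i] * (BIAS[i] + EXT[i]) for i in range(N))
    return pairs + units

def settle(state):
    """Asynchronous threshold updates, one unit at a time, until nothing changes."""
    state = list(state)
    changed = True
    while changed:
        changed = False
        for i in range(N):
            new_value = 1 if net_input(state, i) > 0 else 0
            if new_value != state[i]:
                state[i] = new_value
                changed = True
    return tuple(state)

def boltzmann(gamma=1.0):
    """Exact p(s) proportional to exp(gamma * G(s)), by enumerating all states."""
    states = list(itertools.product([0, 1], repeat=N))
    scores = [math.exp(gamma * goodness(s)) for s in states]
    z = sum(scores)
    return {s: x / z for s, x in zip(states, scores)}

def conditional_on(dist, state, i):
    """p(a_i = 1 | all other units as in state), read off the joint distribution."""
    on = tuple(1 if k == i else v for k, v in enumerate(state))
    off = tuple(0 if k == i else v for k, v in enumerate(state))
    return dist[on] / (dist[on] + dist[off])

settled = settle((0, 0, 0))
dist = boltzmann(gamma=1.0)
print("settled state:", settled, "goodness:", goodness(settled))
print("most probable state:", max(dist, key=dist.get))
print("p(unit 0 on | rest):", conditional_on(dist, settled, 0))
```

Enumeration is feasible only for a handful of units; it is used here so that the distribution is exact rather than estimated by sampling.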