
Autonomous Developmental Learning



Presentation Transcript


  1. Autonomous Developmental Learning Gerhard Neumann SS 2005

  2. Outline of the talk • General Theory and Philosophy • Architecture • Sensory Mapping • Cognitive Mapping • Sensorimotor System • Experiments and Results • Reinforcement Learning • Action Chaining

  3. Developmental Robotics • Investigates models from developmental psychology and developmental neuroscience • Applies insights from studies on ontogenetic neuroscience • Many different studies exist on • Social interaction [Fong 2003] • Sensorimotor control [Weng 2004, Metta 2003] • Categorization [Pfeifer 1999] • Value systems [Pfeifer 1999, Sporns 2002] • Morphological changes and motor skill acquisition [Lungarella 2002] • Discussed in this talk: Autonomous Mental Development (AMD) [Weng 2001] as one approach to developmental robotics

  4. Machine Development Paradigms • Manual development • Given: task T and ecological conditions Ec • A human developer H understands task T and programs the agent: A = H(T, Ec) • Task-specific architecture, representation and skills are developed by human hands • Autonomous development • Given: ecological conditions Ec; the task is unknown • The internal representation cannot be predefined • The human developer H writes a task-nonspecific developmental program for the newborn agent: A(0) = H(Ec) • The task has to be understood by the agent itself • After birth, human teachers can affect the behavior of the robot by: • Supervised learning • Reinforcement learning • Communicative learning

  5. SASE Agents • Self-Aware and Self-Effecting agent: • Additionally has an internal environment (the brain) • Internal sensors and internal effectors in addition to external sensors and effectors • E.g., attention control and action release are internal actions • All conscious internal actions have corresponding internal sensors • Both the internal and external environments are used for perception and cognition

  6. Internal Representation • Symbolic representation (traditional AI): • World-centered: describes an object in the external world with a unique predefined set of attributes • Each component of the representation has a predefined meaning • Distributed representation (AMD): • Body-centered: grown from the body's sensors and effectors • Vector form: A = (v1, v2, …, vn), consisting of sensory input and motor control output (or a function of both) • The representation of an object is distributed over many cortical areas • Used by developmental programs because the task is unknown at programming time

  7. Architecture: past and future contexts • The system receives the last context as the input vector: • l(t) = <xl(t), al(t)> • xl(t), al(t): last sensation and last action • Both include internal sensors and actions.

  8. Architecture: Primed Contexts • The system predicts the primed (future) context: • p(t) = <xp(t), ap(t), Q(xp(t), ap(t))> • Predicting a single primed context is not sufficient • There might be multiple future possibilities • Reality mapping R: {p1(t), p2(t), …, pk(t)} = R(l(t)) • Value system V: selects a desirable context pi(t) based on its Q-value • A second mapping F covers the far future • R and F are developed incrementally through experience
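The selection step can be sketched in code. This is an illustrative toy, not the actual AMD implementation: the reality mapping R is assumed to have already produced the candidate primed contexts, and the value system V simply picks the candidate with the highest Q-value. All names (`PrimedContext`, `value_system_select`) are invented for this sketch.

```python
from dataclasses import dataclass

@dataclass
class PrimedContext:
    xp: tuple   # primed (predicted) sensation
    ap: str     # primed action
    q: float    # estimated value Q(xp, ap)

def value_system_select(candidates):
    """Value system V: choose the most desirable primed context by Q-value."""
    return max(candidates, key=lambda p: p.q)

# Candidates as the reality mapping R might return them for one last context l(t)
candidates = [
    PrimedContext(xp=(0.1, 0.2), ap="turn_left",  q=0.4),
    PrimedContext(xp=(0.3, 0.1), ap="turn_right", q=0.9),
    PrimedContext(xp=(0.0, 0.0), ap="stay",       q=0.1),
]
best = value_system_select(candidates)
```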

  9. Sensory Mapping: Staggered Hierarchical Mapping • High-dimensional sensory input • Visual: 100 x 100 images: 10000 dimensions • Auditory: 300 – 1000 dimensions • Appropriate (autonomous) feature extraction is needed • Inspired by the early human visual pathways • Uses incremental PCA in receptive fields • Filters are applied to the receptive fields • Each filter is given by an eigenvector from the PCA calculation

  10. Sensory Mapping: Staggered Receptive Fields • Non-overlapping: many filters for one receptive field (RF); all eigenvectors are used for one RF; low resolution • Overlapping (staggered): one filter (a specified eigenvector) per RF; tradeoff between resolution and the dimension of the feature space

  11. Sensory Mapping: Layered Structure • Neurons of each layer are organized in a 2-D array (resembling the structure of images) • For each neuron of layer k, localized connections are applied to the neurons of layer k-1 • At any position and any scale, a neuron can be found whose receptive field approximately covers the region

  12. Sensory Mapping: CCIPCA • Candid Covariance-Free Incremental Principal Component Analysis • Standard PCA is a batch method and thus not applicable to developmental learning • The input is usually very high-dimensional; the covariance matrix cannot be computed in real time • Standard PCA: • Computes the eigen-directions (directions of maximal variance) of the sample data • The eigen-directions are the eigenvectors of the covariance matrix A = E[u(t) uT(t)] • (u(t) … zero-mean sample distribution)

  13. Sensory Mapping: Incremental PCA • Calculate the 1st eigenvector • Converges to the eigenvector with the highest eigenvalue • No convergence if there are equal eigenvalues • Higher-order eigenvectors • Subtract from the data sample its projections onto the lower-order eigenvectors • Apply the same algorithm as for the first eigenvector
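A minimal sketch of this incremental scheme, assuming a plain 1/t learning rate (the amnesic averaging of the published CCIPCA algorithm is omitted for brevity): each estimate v_i is pulled toward u (u·v̂_i), and the residual handed to the next eigenvector has its projection onto v̂_i subtracted.

```python
import numpy as np

def ccipca(samples, k):
    """Incrementally estimate the top-k eigenvectors of the sample covariance.

    samples: array of shape (n, d), assumed zero-mean.
    Returns unnormalized eigenvector estimates (their norms approximate eigenvalues).
    """
    d = samples.shape[1]
    v = np.zeros((k, d))
    for t, sample in enumerate(samples, start=1):
        u = sample.astype(float).copy()
        for i in range(min(k, t)):
            if i == t - 1:
                v[i] = u                      # initialize with the current residual
            else:
                vn = v[i] / np.linalg.norm(v[i])
                # pull the estimate toward u (u . v_hat) with learning rate 1/t
                v[i] = (t - 1) / t * v[i] + (1 / t) * np.dot(u, vn) * u
            vn = v[i] / np.linalg.norm(v[i])
            u = u - np.dot(u, vn) * vn        # residual for the next eigenvector
    return v
```

With anisotropic data the first estimate aligns with the direction of largest variance, as standard PCA would find in batch mode.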

  14. Sensory Mapping: Eigengroups • An eigengroup of layer k is defined as n x n neurons • n … maximum distance between two neurons whose input regions overlap • Define the number of the eigenvector for each neuron (filter) in an eigengroup • Usually this ordering is the same for all eigengroups of a layer • Calculate the eigenvectors incrementally with CCIPCA in the predefined order and subtract the projection of the data onto the corresponding eigenvector • Inhibition of nearby neurons: detects statistically uncorrelated features • Output: product of the input vector with the eigenvector • A sigmoidal function is applied

  15. Sensory Mapping: Eigengroups • Sharing method: a single set of filters for all eigengroups • Applied to 5000 natural images • The first several filters are similar to biological receptive-field patterns

  16. Sensory Mapping: Selective Attention • Each sensory mapping unit has internal attention effectors • Layered structure: • A clear-cut attended region cannot be defined in the input space • Attended region: a 3-D ellipsoid centered at (x, y, l), where l is the layer • Experiments with occlusion • Occlude either the upper or lower half of face images • Separate SHMs were used for the different occlusions • Significantly outperforms the approach without attention control

  17. Cognitive Mapping: Incremental Hierarchical Discriminant Regression (IHDR) • Cognitive mapping: • X: space of last contexts • Y: space of primed contexts • Find discriminant features in the input space • High-dimensional input space • Classical decision trees are not applicable • Modelled by a Hierarchical Discriminant Regression tree

  18. Cognitive Mapping: IHDR Tree • Each node contains: • x-clusters and corresponding y-clusters • The y-clusters determine the virtual class labels • Define to which cluster pair an example (x, y) belongs • The x-clusters approximate the sample population in the X-space • At most q clusters of each type per node • Spawn a child node from the current node if a finer approximation is required • None of the clusters keep actual input samples; only first-order statistics are used

  19. Cognitive Mapping: HDR Tree • Build the tree for a set of samples S: • Cluster the y-vectors into p clusters • Assign each example to the nearest y-cluster • Calculate the mean and covariance matrix of each x-cluster • Reassign each example to the nearest x-cluster • If the y-labels of the examples of one cluster (S') differ significantly, create a new node and recursively build the tree (with the subset S' as input) • Retrieval: • Calculate probability-based distances to each cluster of a node • Always continue the search at the k nearest clusters
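The first build step, clustering the y-vectors and deriving virtual class labels, might look like the following toy sketch. Plain k-means stands in here for whatever clustering the actual IHDR implementation uses; `cluster_y` and its parameters are illustrative.

```python
import numpy as np

def cluster_y(Y, p, iters=10, seed=0):
    """Cluster y-vectors into p clusters; the cluster index of each sample
    serves as its virtual class label for the x-side discriminant analysis."""
    rng = np.random.default_rng(seed)
    # initialize centers with p distinct samples
    centers = Y[rng.choice(len(Y), size=p, replace=False)].astype(float)
    labels = np.zeros(len(Y), dtype=int)
    for _ in range(iters):
        # assign each y-vector to its nearest center
        dist = np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # recompute each center as the mean of its members
        for j in range(p):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels, centers
```

Given the virtual labels, each x-cluster then keeps only first-order statistics (mean, covariance) of its members, as the slide notes.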

  20. Cognitive Mapping • The deeper a node is in the tree, the smaller the variance • Gaussian distributions: a hierarchical version of a Gaussian mixture model • Calculate the distance to the clusters with: • Euclidean distance • Mahalanobis distance (single covariance matrix) • Gaussian distance (individual covariance matrices) • The choice of the distance measure is based on the number of samples provided for the corresponding cluster

  21. Cognitive Mapping: Distance Measure • Mahalanobis distance, Gaussian distance: • An estimate of the covariance matrices is needed • Impossible for high-dimensional input spaces • Computations are done in the discriminant space D, not in X • D is calculated by Fisher's linear discriminant analysis (LDA) • LDA calculates the best discriminating space for a K-label classification problem • For q clusters, we get a (q-1)-dimensional discriminant space D • Distance calculations, and hence the calculations of the covariance matrices, are done in the space D
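For illustration, the two simpler distance choices, computed in a low-dimensional space such as D; the rule that selects among them by sample count is not shown. With an identity covariance the Mahalanobis distance reduces to the Euclidean one.

```python
import numpy as np

def euclidean_distance(x, mean):
    """Euclidean distance to a cluster mean (no covariance needed)."""
    return float(np.linalg.norm(x - mean))

def mahalanobis_distance(x, mean, cov):
    """Mahalanobis distance: whitens the deviation by the covariance matrix."""
    d = x - mean
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))
```

The "Gaussian distance" with individual covariance matrices follows the same pattern, one covariance per cluster instead of a shared one.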

  22. Sensorimotor System: Level Building Element (LBE) • S: spatial sensory mapping • T: spatiotemporal sensory mapping • Each internal and external action output feeds back into the sensory input • M: motor mapping; generates concise representations for stereotyped actions and selects the primed context with the highest confidence index

  23. Sensorimotor System: Level Building Element (LBE) • A priority updating queue (PUQ) is used for the far-future predictions F • At every time instant, put the selected primed context p(t) into the PUQ and remove the oldest entry • Update each entry in the queue (beginning with the newest entry) • Inspired by the Q-learning algorithm • Information embedded in the future primed context p(t+1) is back-propagated into earlier primed contexts • F can be seen as an average future context
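A rough sketch of the PUQ backup, assuming a Q-learning-style update with illustrative parameters `alpha` and `gamma` (the slides do not give the exact update rule): value flows backwards from the newest entry to older ones.

```python
from collections import deque

def puq_update(queue, alpha=0.5, gamma=0.9):
    """Back-propagate Q-values from newer primed contexts to older ones.

    queue: deque of (context, q_value) pairs, oldest first, newest last.
    Returns an updated deque; alpha and gamma are illustrative parameters.
    """
    entries = list(queue)
    # walk from the second-newest entry backwards, pulling each Q-value
    # toward the discounted Q-value of its successor
    for i in range(len(entries) - 2, -1, -1):
        ctx, q = entries[i]
        _, q_next = entries[i + 1]
        entries[i] = (ctx, q + alpha * (gamma * q_next - q))
    return deque(entries, maxlen=queue.maxlen)
```

A new primed context p(t) would be appended on the right; the `maxlen` of the deque drops the oldest entry automatically, matching the slide's description.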

  24. Sensorimotor System: Multilevel Architecture • The low-level architecture uses fine time steps • Higher levels can become more abstract • The low-level primed context is used as input for the high-level LBE • The same architecture is used for the different levels • The levels are used for different kinds of sensory integration (vision, audio, …)

  25. AMD: Teaching the Robot • The internal representation of the robot cannot be accessed after creation • Supervised learning: • A human imposes an action via buttons or by directly manipulating the robot • The value of an imposed action is set to a high value • Reinforcement learning: • A human gives rewards for the action (good = 1, bad = -1) via two buttons • Communicative learning: • Desired action • Whether the current action is good • Rules to follow • Criteria to judge right and wrong

  26. Experiments: Used Robots • SAIL: single robot arm, wheel-driven, 13 DOF • Dav: wheel-driven base, humanoid torso, 43 DOF • Sensors: stereo cameras, microphones, laser range scanner, touch sensors

  27. Experiment: Vision-Guided Navigation • Indoor navigation task using SAIL • A human teacher navigated the robot through corridors: supervised learning • After 4 trips the robot navigated autonomously; the teacher had to push it by hand in certain situations • After 10 trips the robot managed to navigate without help • The experiment was repeated outdoors with limited success

  28. Experiment: Learning from Novelty and Rewards • Define a novelty measure • The difference between the primed sensation xp(t) and the actual sensation xl(t) • R(t): reward given by a human • Actual reward: combines novelty and the human reward • Q-values are learnt with the prototype update queue • A single-level system is used
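As an illustration, the novelty measure as a distance between the primed and the actual sensation, combined with the human reward R(t). The linear weighting `w` is an assumption for this sketch, not the formula from the paper.

```python
import numpy as np

def novelty(x_primed, x_actual):
    """Novelty: how far the actual sensation deviates from the prediction."""
    return float(np.linalg.norm(np.asarray(x_primed, float) - np.asarray(x_actual, float)))

def actual_reward(x_primed, x_actual, human_reward, w=0.5):
    """Combine novelty with the human-issued reward (weighting w is assumed)."""
    return w * novelty(x_primed, x_actual) + (1 - w) * human_reward
```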

  29. Experiment: Learning from Novelty and Rewards • 3 actions: • Stay at the current view • Look right (30°) • Look left (30°) • 7 absolute viewing positions • Sensory input: • Simulation: 100 x 100 images • SAIL: 40 x 30 x 3 x 2 images • Experiments: • Habituation effect: started with an initial positive Q-value for staying at the current scene; after a while the robot becomes bored • Integration of novelty and immediate reward: • Positive reward for turning left, otherwise negative • => always turns left • A moving toy added to the environment at viewing angle 0 • => stays at this position

  30. Experiments: Speech Learning • Uses the same developmental architecture • The auditory streams have not been segmented or labeled • During learning, the entire system must listen to everything • No grammatical syntax is involved • Word recognition • Numbers from 1 to 10 • 63 persons, 5 utterances per digit (3150 examples) • 4 layers in the sensory mapping module • Input: 13th-order Mel-frequency cepstral coefficients (MFCCs) • Supervised learning

  31. Experiments: Speech Learning • Selective attention: • Two layers in the sensory mapping module • Different temporal integration • Attention control: choose one of these layers • Learned through reinforcement learning • Attention is learned according to each word, speaker and utterance • Results after 10 epochs: • The tails of "one" and "seven" are quite similar • => take the 2nd layer

  32. Experiment: Action Chaining • Action chaining: • CC, CS1, CS2: voice commands • AS1, AS2: actions • A conditioning problem • Multi-level LBEs are used • Pure reinforcement learning does not work well due to the lack of generalization • The 2nd level gets the averaged version of the future context (the F context) as input => generalization over the current context

  33. Experiment: Action Chaining • Reinforcement learning in the two levels • The lower level proposes an action; the action is only executed if Q2 > 0, otherwise no action is executed • Experiment: • 4 primitive actions • Behavior establishment: supervised learning • Action chaining: "Start", "one", "two", "three", "four" • Success: "Start" -> execute the actions • The experiment was repeated 20 times

  34. Experiment: Range-Based Navigation • Input: laser scanner • 360 laser rays (0.5° resolution) • Programmed attention control: • If all readings are larger than a threshold T, no special attention is needed because all objects are far away • If some readings are lower than T, only these readings are passed; the other values are replaced by the average value • Simulation experiments: • Supervised learning • 16 scenarios, 1157 examples • With attention control: successfully performed a 5-minute run • The result was also tested on the Dav robot • A 15-minute run in a crowded corridor without collision
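The programmed attention rule can be sketched as below. The slides do not specify which average replaces the far readings, so the average of the far readings themselves is assumed here.

```python
import numpy as np

def attend(readings, T):
    """Programmed attention for laser readings: pass readings below the
    threshold T unchanged; replace the remaining (far) readings by their
    average. If everything is far away, no special attention is applied."""
    r = np.asarray(readings, dtype=float)
    near = r < T
    if not near.any():
        return r                  # all objects far away: pass through unchanged
    out = r.copy()
    far = ~near
    if far.any():
        out[far] = r[far].mean()  # assumed: average of the far readings
    return out
```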

  35. Summary/Conclusion • A new area of robotics where the task is not given to the developer • No human bias • Can deal with uncontrolled environments; automatically extendable • Humans can only adjust the behavior by teaching • The only way to control robots in unknown domains • Good methods for dealing with high-dimensional input • Problematic to apply to highly accurate control tasks (humanoid robots)

  36. Literature • General theory • [Lungarella 2004] "Beyond Gazing, Pointing and Reaching: A Survey of Developmental Robotics" • [Fong 2003] "A Survey of Socially Interactive Robots", Robotics and Autonomous Systems • [Metta 2000] "Babybot: A Study into Sensorimotor Development", PhD thesis • [Pfeifer 1999] "Understanding Intelligence", MIT Press • [Sporns 2002] "Embodied Cognition", MIT Handbook of Brain Theory and Neural Networks • [Lungarella 2003] "Learning to Bounce: First Lessons from a Bouncing Robot", in Proc. of the 4th Int. Conference on Simulation of Adaptive Motion in Animals and Machines • Autonomous Mental Development (papers found on http://www.cse.msu.edu/%7Eweng/research/LM.html) • People involved: J. Weng, Y. Zhang, W. Hwang • [Weng 2004] "Developmental Robotics: Theory and Experiments", International Journal of Humanoid Robotics • [Weng 2002] "A Theory for Mentally Developing Robots", in Proc. 2nd International Conference on Development and Learning • [Weng 2004] "A Theory of Developmental Architecture", in Proc. 3rd International Conference on Development and Learning (ICDL 2004)

  37. Literature • Experiments • [Zhang 2002] "Action Chaining by a Developmental Robot with a Value System", in Proc. 2nd International Conference on Development and Learning • [Huang 2002] "Novelty and Reinforcement Learning in the Value System of Developmental Robots", in Proc. Second International Workshop on Epigenetic Robotics • [Zeng 2004] "Obstacle Avoidance through Incremental Learning with Attention Selection", in Proc. IEEE Int'l Conf. on Robotics and Automation • [Zhang 2001] "Grounded Auditory Development by a Developmental Robot", in Proc. INNS/IEEE International Joint Conference on Neural Networks 2001 (IJCNN 2001) • Sensory mapping • [Zhang 2002] "A Developing Sensory Mapping for Robots", in Proc. 2nd International Conference on Development and Learning • [Weng 2003] "Candid Covariance-Free Incremental Principal Component Analysis", IEEE Trans. Pattern Analysis and Machine Intelligence • Cognitive mapping • [Hwang 2000] "Hierarchical Discriminant Regression", IEEE Trans. Pattern Analysis and Machine Intelligence • [Weng 2000] "An Incremental Learning Algorithm with Automatically Derived Discriminating Features", in Proc. Asian Conference on Computer Vision

  38. The End • Thank you!
