Autonomous Developmental Learning

Gerhard Neumann

SS 2005

Outline of the talk
  • General Theory and Philosophy
  • Architecture
    • Sensory Mapping
    • Cognitive Mapping
    • Sensorimotor System
  • Experiments and Results
    • Reinforcement Learning
    • Action Chaining
Developmental Robotics
  • Investigates models from developmental psychology and developmental neuroscience
  • Applies insights from studies of ontogenetic development
  • Many different studies about
    • Social interaction [Fong 2003]
    • Sensorimotor control [Weng 2004, Metta 2003]
    • Categorization [Pfeifer 1999]
    • Value Systems [Pfeifer 1999, Sporns 2002]
    • Morphological changes and motor skill acquisition [Lungarella 2002]
  • Discussed in this talk: Autonomous Mental Development (AMD) [Weng 2001] as one approach for Developmental Robotics
Machine Development Paradigms
  • Manual development
    • Given: Task T and ecological conditions Ec
      • Human developer H understands Task T and programs the agent: A = H(T, Ec)
    • Task specific architecture, representation and skills are developed by human hands
  • Autonomous development
    • Given: ecological conditions Ec, the task is unknown
      • Internal representation can not be predefined
      • Human developer H writes a task-non specific developmental program for the newborn agent: A(0) = H(Ec)
      • The task has to be understood by the agent itself
    • After birth, human teachers can affect the behavior of the robot by:
      • Supervised learning
      • Reinforcement learning
      • Communicative learning
SASE Agents
  • Self-Aware and Self-Effecting Agent:
    • Has additionally an internal environment (the brain)
    • Internal sensors and internal effectors in addition to external sensors and effectors
      • E.g. attention control and action release are internal actions
      • All conscious internal actions have corresponding internal sensors
    • Internal and external environments are used for perception and cognition
Internal Representation
  • Symbolic Representation (Traditional AI):
    • World-centered: describes an object in the external world with a unique predefined set of attributes.
    • Each component in the representation has a predefined meaning.
  • Distributed Representation (AMD):
    • Body-centered: grown from the body's sensors and effectors
    • Vector form: A = (v1, v2, …, vn) consisting of sensory input and motor control output (or a function of both).
    • Representation of an object is distributed over many cortical areas.
    • Used by developmental programs because the task is unknown at programming time.
Architecture: past and future contexts
  • System receives last context as the input vector:
    • l(t) = (xl(t), al(t))
    • xl(t), al(t) : last sensation and last action
    • Include internal sensors and actions.
Architecture: Primed Contexts
  • System predicts the primed (future) context:
    • p(t) = (xp(t), ap(t)): primed sensation and primed action
    • Not sufficient to predict one primed context
      • There might be multiple future possibilities
    • Reality mapping R : {p1(t), p2(t), …, pk(t)} = R(l(t))
    • Value system V : selects a desirable context pi(t), based on the Q-Value
  • Second Mapping F for the far future.
  • R and F are developed incrementally through experience
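The interplay of the reality mapping R and the value system V can be sketched as follows. This is a minimal illustration, not the AMD implementation: the `PrimedContext` class, the `toy_reality_mapping` function, and the greedy choice of the highest Q-value are assumptions made for the example; the slides only state that V selects a desirable context based on the Q-value.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class PrimedContext:
    """Hypothetical container for one primed (predicted) context."""
    sensation: Sequence[float]  # primed sensation
    action: str                 # primed action
    q_value: float              # value estimate used by the value system V


def value_system(candidates: List[PrimedContext]) -> PrimedContext:
    """V: pick the candidate with the highest Q-value (greedy choice, an assumption)."""
    return max(candidates, key=lambda p: p.q_value)


def step(last_context: Tuple[Sequence[float], str],
         reality_mapping: Callable[[Tuple[Sequence[float], str]], List[PrimedContext]]
         ) -> PrimedContext:
    """One decision step: R proposes several primed contexts, V selects one."""
    return value_system(reality_mapping(last_context))


def toy_reality_mapping(last_context):
    """Stand-in for a learned mapping R; returns two fixed candidates."""
    return [
        PrimedContext([0.1], "left", -0.2),
        PrimedContext([0.9], "right", 0.7),
    ]


chosen = step(([0.5], "stay"), toy_reality_mapping)
print(chosen.action)  # right
```

In the architecture described above, R would be learned incrementally rather than hard-coded as in this toy example.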
Sensory Mapping: Staggered Hierarchical Mapping
  • High dimensional sensory input
    • Visual: 100 x 100 images, i.e. 10,000 dimensions
    • Auditory: 300 to 1,000 dimensions
  • Appropriate (autonomous) feature extraction is needed
  • Inspired by human early visual pathways
  • Uses incremental PCA in receptive fields
  • Apply filters to the receptive fields
    • Each Filter is given by the eigenvectors of the PCA calculation
Sensory Mapping: Staggered Receptive Fields
  • Non-overlapping:
    • Many filters for one receptive field (RF)
    • All eigenvectors are used for one RF
    • Low resolution
  • Overlapping (staggered):
    • One filter (a specified eigenvector) per RF
  • Tradeoff between resolution and the dimension of the feature space
Sensory Mapping: Layered Structure
  • Neurons of each layer are organized in a 2-D array (resembling the structure of images)
  • Each neuron of layer k has localized connections to the neurons of layer k-1
  • At any position and at any scale, a neuron can be found whose receptive field approximately covers the region
Sensory Mapping: CCIPCA
  • Candid Covariance-Free Incremental Principal Component Analysis
    • Standard PCA is a batch method, not applicable for developmental learning
    • Usually very high dimensional input, can not compute covariance matrix in real time
  • Standard PCA:
    • Computes eigen-directions (direction of maximal variance) of the sample data
    • Eigen-directions are the eigenvectors of the covariance matrix A = E[u(t) u^T(t)]
      • (u(t): samples from a zero-mean distribution)
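For reference, the batch formulation can be written out as the standard eigenvalue problem below. The symbols v_i, λ_i and the dimension d are notation introduced here for illustration and do not appear on the slides.

```latex
A = E\!\left[u(t)\,u^{T}(t)\right], \qquad
A\,v_i = \lambda_i\,v_i, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0
```

The eigenvector v_1 belonging to the largest eigenvalue is the direction of maximal variance; forming A explicitly costs on the order of d^2 memory, which is what the incremental method avoids.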
Sensory Mapping: Incremental PCA
  • Calculate 1st eigenvector
    • Converges to the eigenvector with the highest eigenvalue
    • No convergence if eigenvalues are equal
  • Higher-Order Eigenvectors
    • Subtract from the data sample its projection onto the lower-order eigenvectors
    • Apply the same algorithm as for the first eigenvector
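The two steps above (estimate the first eigenvector, then deflate the sample and repeat) can be sketched in plain Python. This is a simplified sketch under stated assumptions: it uses plain running-average weights (n-1)/n and 1/n, whereas the published CCIPCA algorithm, to the best of my knowledge, additionally uses an amnesic parameter that is left out here. The function name `ccipca` and the synthetic data are illustrative only.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def ccipca(samples, k):
    """Incrementally estimate the first k eigenvectors of zero-mean samples.

    Each returned vector is unnormalized; its length approximates the eigenvalue.
    No covariance matrix is ever formed.
    """
    vs = []
    for n, u in enumerate(samples, start=1):
        residual = list(u)
        for i in range(min(k, n)):
            if i == len(vs):
                # First time this component is reached: initialize with the residual.
                vs.append(list(residual))
            else:
                length = norm(vs[i])
                if length == 0.0:
                    vs[i] = list(residual)
                else:
                    # Running average of residual * (residual . v / |v|).
                    coeff = dot(residual, vs[i]) / length
                    vs[i] = [((n - 1) * a + coeff * r) / n
                             for a, r in zip(vs[i], residual)]
            # Deflation: remove the projection onto the current estimate
            # before computing the next (higher-order) eigenvector.
            length = norm(vs[i])
            if length > 0.0:
                scale = dot(residual, vs[i]) / (length * length)
                residual = [r - scale * a for r, a in zip(residual, vs[i])]
    return vs


# Synthetic zero-mean data with most variance along the first axis.
random.seed(0)
samples = [(random.gauss(0.0, 3.0), random.gauss(0.0, 0.5)) for _ in range(2000)]
v1, v2 = ccipca(samples, k=2)
print([c / norm(v1) for c in v1])  # close to (+/-1, 0)
```

Each sample is processed once and then discarded, which is the property that makes the method usable for developmental (online) learning.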
Sensory Mapping: Eigengroups
  • Eigengroup of layer k is defined as n x n
    • n … maximum distance between two neurons to have their input region overlap
    • Define the number of the eigenvector for each neuron (filter) in an eigengroup
      • Usually this ordering is the same for all eigengroups in a layer
    • Calculate the eigenvectors incrementally with CCIPCA in the predefined order and subtract the projection of the data onto the corresponding eigenvector
    • Inhibition of nearby neurons: detect different statistically uncorrelated features
    • Output: Product of the input vector with the eigenvector
      • Apply a sigmoidal function
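The output rule in the last two bullets amounts to a dot product followed by a squashing function. A minimal sketch, assuming the logistic function as the sigmoid (the slides only say "a sigmoidal function") and using the hypothetical name `neuron_response`:

```python
import math


def sigmoid(z):
    """Logistic function; chosen here as one possible sigmoidal function."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_response(receptive_field, filter_vector):
    """Project the receptive-field input onto the neuron's eigenvector filter."""
    z = sum(x * w for x, w in zip(receptive_field, filter_vector))
    return sigmoid(z)


print(neuron_response([0.0, 0.0], [1.0, 1.0]))  # 0.5
```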
Sensory Mapping: Eigengroups
  • Eigengroups:
    • Sharing Method: a single set of filters for all eigengroups
    • Applied to 5000 natural images
    • First several filters are similar to biological receptive field patterns
Sensory Mapping: Selective Attention
  • Each sensory mapping unit has internal attention effectors
  • Layered structure:
    • A clear-cut attended region cannot be defined in the input space
  • Attended region: 3-D ellipsoid centered at (x, y, l), where l is the layer
  • Experiments with occlusion
    • Either the upper or the lower half of face images was occluded
    • Separate SHMs were used for the different occlusions
    • Significantly outperforms the approach without attention control

Cognitive Mapping: Incremental Hierarchical Discriminant Regression (IHDR)
  • Cognitive Mapping:
    • X : space of last contexts
    • Y : space of primed contexts
  • Find discriminant features in input space
  • High dimensional input space
    • Classical decision trees are not applicable
  • Modelled by a Hierarchical Discriminant Regression Tree
Cognitive Mapping: IHDR Tree
  • Each node contains:
    • x-clusters and corresponding y-clusters
      • y-clusters determine the virtual class labels
        • They define to which cluster pair the example (x, y) belongs
      • x-clusters approximate the sample population in the X-space
    • Maximal q clusters of each type per node
  • Spawn a child node from the current node if a finer approximation is required
  • None of the clusters keep actual input samples, only first order statistics are used
Cognitive Mapping: HDR Tree
  • Build tree for a set of samples S:
    • Cluster the y-vectors into p clusters
    • Assign each example to the nearest y-cluster
    • Calculate mean and covariance matrix of each x-cluster
    • Reassign each example to the nearest x-cluster
    • If the y-labels of the examples in one cluster (S') differ significantly, create a new node and recursively build the tree (with subset S' as input).
  • Retrieval:
    • Calculate probabilistic-based distances to each cluster of a node
    • Always continue the search at the k nearest clusters
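The retrieval step can be sketched as a coarse-to-fine search through the tree. This is a simplified illustration: plain Euclidean distance stands in for the probabilistic distances mentioned above, and the `Cluster` and `Node` classes, the `retrieve` function, and the toy tree are hypothetical names and data.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class Cluster:
    x_mean: Sequence[float]         # center of the x-cluster
    y_mean: Sequence[float]         # mean of the paired y-cluster
    child: Optional["Node"] = None  # finer approximation, if one was spawned


@dataclass
class Node:
    clusters: List[Cluster] = field(default_factory=list)


def distance(x, center):
    """Euclidean distance (a stand-in for the probabilistic distances)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, center)))


def retrieve(root: Node, x, k: int = 2):
    """Follow the k nearest clusters per level; return the y of the best leaf cluster."""
    frontier = list(root.clusters)
    best = None  # (distance, cluster) of the closest leaf cluster seen so far
    while frontier:
        nearest = sorted(frontier, key=lambda c: distance(x, c.x_mean))[:k]
        frontier = []
        for c in nearest:
            if c.child is not None:
                frontier.extend(c.child.clusters)  # continue the search below
            else:
                d = distance(x, c.x_mean)
                if best is None or d < best[0]:
                    best = (d, c)
    return best[1].y_mean if best else None


# Toy tree: one coarse cluster refined by a child node, one leaf cluster.
child = Node([Cluster([-1.0, 0.0], [0.0]), Cluster([1.0, 0.0], [1.0])])
root = Node([Cluster([0.0, 0.0], [0.5], child=child), Cluster([10.0, 10.0], [2.0])])

print(retrieve(root, [0.8, 0.0], k=1))  # [1.0]
print(retrieve(root, [9.0, 9.0], k=1))  # [2.0]
```

Keeping k candidates per level instead of one makes the search less sensitive to a wrong decision near the root, at the cost of more distance computations.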
Cognitive Mapping
  • The deeper a node is in the tree, the smaller is the variance
  • Gaussian distributions: hierarchical version of the mixture-of-Gaussians model
  • Calculate the distance to the clusters:
    • Euclidean Distance
    • Mahalanobis Distance (single covariance matrix)
    • Gaussian Distance (individual covariance matrices)
    • The choice of the distance measure is based on the number of samples provided for the corresponding cluster
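One common way to write the three measures is shown below. The notation is an assumption made for illustration: c_i is the center of cluster i, Γ a single covariance matrix shared by all clusters, and Γ_i the covariance matrix of cluster i. The exact normalization used in HDR may differ.

```latex
\begin{aligned}
d_E(x, c_i) &= \lVert x - c_i \rVert^2 \\
d_M(x, c_i) &= (x - c_i)^{T}\,\Gamma^{-1}\,(x - c_i) \\
d_G(x, c_i) &= (x - c_i)^{T}\,\Gamma_i^{-1}\,(x - c_i) + \ln\lvert \Gamma_i \rvert
\end{aligned}
```

The measures need increasingly many parameters to be estimated, which is consistent with choosing among them according to the number of samples available for a cluster.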
Cognitive Mapping: Distance Measure
  • Mahalanobis Distance, Gaussian Distance:
    • Estimate of the cov. matrices is needed
    • Impossible for high-dimensional input spaces
    • Computations are done in the discriminant space D, not in X
      • D calculated by Fisher's linear discriminant analysis (LDA)
      • LDA calculates the best discriminating space for a K-label classification problem
      • For q clusters, we get a q-1 dimensional discriminant space D
      • Distance calculations and hence the calculations of the covariance matrices are done in the space D
Sensorimotor System: Level Building Element (LBE)
  • S: Spatial sensory mapping
  • T: Spatiotemporal sensory mapping
  • Each internal and external action output feeds back into the sensory input
  • M: Motor mapping; generates concise representations for stereotyped actions and selects the primed context with the highest confidence index
Sensorimotor System: Level Building Element (LBE)
  • Prototype Updating Queue (PUQ) used for the far-future predictions F
    • At every time instant, put the selected primed context p(t) in the PUQ, remove the oldest entry.
    • Update each entry in the Queue (beginning with the newest entry)
      • Inspired by the Q-Learning Algorithm
      • Information embedded in the future primed context p(t+1) is back-propagated into earlier primed contexts.

        • F can be seen as average future context
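The queue update can be sketched as a backward sweep in the spirit of Q-learning. This is a simplified sketch, not the published update rule: each queue entry is reduced to a single scalar value, and the function name `puq_step`, the discount `gamma`, and the learning `rate` are assumptions introduced for the example.

```python
from collections import deque


def puq_step(queue, new_value, gamma=0.9, rate=0.5):
    """Push the newest value, then propagate it back through the older entries.

    `queue` is a deque with a fixed maxlen, so appending drops the oldest entry.
    """
    queue.append(new_value)
    # Sweep from the second-newest entry back to the oldest one.
    for i in range(len(queue) - 2, -1, -1):
        target = gamma * queue[i + 1]            # discounted value of the successor
        queue[i] += rate * (target - queue[i])   # move the entry toward the target
    return queue


queue = deque([0.0, 0.0, 0.0], maxlen=3)
puq_step(queue, 1.0, gamma=0.5, rate=1.0)
print(list(queue))  # [0.25, 0.5, 1.0]
```

Starting the sweep at the newest entry lets a single new observation influence every older entry in one pass.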
Sensorimotor System: Multilevel Architecture
  • Low-level architecture uses fine time steps
    • Higher level can become more abstract
  • Use low-level primed context as input for the high-level LBE
  • Same architecture is used for the different levels
  • Use the levels for different sensory integration (vision, audio…)
AMD: Teaching the Robot
  • The internal representation of the robot cannot be accessed after creation
  • Supervised Learning:
    • Human imposes action by buttons or directly manipulating the robot
    • Set the value of an imposed action to a high value
  • Reinforcement Learning:
    • Human gives rewards for the action (good = 1, bad = -1) by two buttons
  • Communicative Learning:
    • Desired action
    • Whether the current action is good
    • Rules to follow
    • Criteria to judge right and wrong
Experiments: Used Robots
  • Single robot arm, wheel-driven
    • 13 DOF
  • Wheel-driven base, humanoid torso
    • 43 DOF
  • Sensors: stereo cameras, microphones, laser range scanner, touch sensors
Experiment: Vision-Guided Navigation
  • Indoor navigation task using SAIL
  • Human teacher navigated robot through corridors: supervised learning
  • After 4 trips the robot navigated autonomously, but the teacher had to push it by hand in certain situations
  • After 10 trips the robot managed to navigate without help
  • Experiment was repeated outdoors with limited success
Experiment: Learning from Novelty and Rewards
  • Define Novelty Measure
    • Difference between the primed sensation xp(t) and the actual sensation xl(t)
    • R(t) : reward given from a human
    • Actual reward :
    • Q-values learned by the Prototype Updating Queue (PUQ)
  • A single-level system is used
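The novelty measure can be illustrated with a short sketch. The slides only state that novelty is the difference between the primed and the actual sensation; using the mean squared difference, and the function name `novelty`, are assumptions made here. How novelty is combined with the human reward is not reproduced, since that formula is not given above.

```python
def novelty(primed_sensation, actual_sensation):
    """Mean squared difference between the primed and the actual sensation."""
    if len(primed_sensation) != len(actual_sensation):
        raise ValueError("sensation vectors must have the same length")
    return sum((p - a) ** 2
               for p, a in zip(primed_sensation, actual_sensation)) / len(primed_sensation)


print(novelty([0.0, 0.0], [0.0, 0.0]))  # 0.0 (perfectly predicted, nothing new)
print(novelty([1.0, 0.0], [0.0, 0.0]))  # 0.5
```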

Experiment: Learning from Novelty and Rewards
  • 3 actions:
    • Stay at current view
    • Look right (30 °)
    • Look left (30 °)
    • 7 absolute viewing positions
  • Sensory Input:
    • Simulation: 100 x 100 image
    • SAIL: 40 x 30 x 3 x 2 images
  • Experiments:
    • Habituation effect: started with an initial positive Q-value for staying at the current scene; after a while the robot becomes bored
    • Integration of novelty and immediate reward:
      • Positive reward for turning left, otherwise negative
        • => always turns left
      • Moving Toy added to the environment in viewing angle 0
        • => Stay in this position
Experiments: Speech Learning
  • Uses the same developmental architecture
    • Auditory streams have not been segmented or labeled
    • During learning, the entire system must listen to everything
    • No grammatical syntax is involved
  • Word Recognition
    • Numbers from 1 to 10
      • 63 persons, 5 utterances per digit (3150 examples)
    • 4 layers in the sensory Mapping Module
    • Input: 13th order Mel-frequency Cepstral Coefficients (MFCCs)
    • Supervised learning
Experiments: Speech Learning
  • Selective Attention:
    • Two layers in the sensory mapping module
      • Different temporal integration
    • Attention Control: Choose from one of these layers
    • Learned through Reinforcement Learning
      • Attention is learned according to each word, speaker and utterance
    • Results after 10 Epochs:
      • The tails of "one" and "seven" are quite similar
        • => Take 2nd layer
Experiment: Action Chaining
  • Action Chaining:
    • CC, CS1, CS2 : Voice commands
    • AS1, AS2 : Actions
    • Conditioning Problem
  • Multi Level LBE are used
    • Pure reinforcement learning does not work well due to the lack of generalization
    • The 2nd level gets the averaged version of the future context as input (the F context) => generalization over the current context
Experiment: Action Chaining
  • Reinforcement Learning in the two levels
    • Lower Level proposes action, action is only executed if Q2 > 0, otherwise no action is executed
  • Experiment:
    • 4 primitive actions:
    • Behavior establishment: Supervised Learning
    • Action Chaining: „Start“, „one“, „two“, „three“, „four“
      • Success: „Start“ -> execute actions
    • Experiment was repeated 20 times
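The two-level gating rule (the lower level proposes, the action is executed only if Q2 > 0) can be sketched as below. The dictionaries of Q-values, the action names, and the greedy proposal by the lower level are assumptions made for illustration.

```python
def propose(q1):
    """Lower level: propose the action with the highest first-level value."""
    return max(q1, key=q1.get)


def select_action(q1, q2):
    """Execute the proposal only if its second-level value Q2 is positive."""
    action = propose(q1)
    return action if q2.get(action, 0.0) > 0 else None


q1 = {"a1": 0.2, "a2": 0.8}
executed = select_action(q1, {"a1": 0.5, "a2": 0.3})
suppressed = select_action(q1, {"a1": 0.5, "a2": -0.1})
print(executed, suppressed)  # a2 None
```

Returning `None` models the "no action is executed" case; the higher level vetoes rather than replaces the lower level's proposal.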
Experiment: Range-Based Navigation
  • Input: Laser Scanner
    • 360 laser rays (0.5° resolution)
    • Programmed Attention control:
      • If all readings are larger than the threshold T, no special attention is needed because all objects are far away
      • If some readings are lower than T, only these readings are passed; the other values are replaced by the average value
  • Simulation experiments:
    • Supervised learning
    • 16 scenarios, 1157 examples
    • With attention control: successfully performed a 5-minute run
  • Result was also tested on the Dav robot
    • 15-minute run in a crowded corridor without collision
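The programmed attention control described above can be sketched in a few lines. Two points are assumptions, since the slides do not settle them: the "average value" is taken to be the mean of the current scan, and readings exactly equal to T are treated as far. The function name `attend` is hypothetical.

```python
def attend(readings, threshold):
    """Programmed attention for a range scan.

    If every reading exceeds the threshold, the scan is passed through unchanged.
    Otherwise only the near readings (< threshold) are kept and all others are
    replaced by the mean of the scan (an assumption about "the average value").
    """
    if all(r > threshold for r in readings):
        return list(readings)
    mean = sum(readings) / len(readings)
    return [r if r < threshold else mean for r in readings]


print(attend([5.0, 6.0], 2.0))            # [5.0, 6.0]
print(attend([5.0, 5.0, 1.0, 5.0], 2.0))  # [4.0, 4.0, 1.0, 4.0]
```

Replacing the far readings removes variation that is irrelevant for obstacle avoidance, so the learner sees similar inputs for scans that differ only in distant objects.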
Summary and Conclusion
  • New area in robotics where the task is not given to the developer
    • No human bias
    • Can deal with uncontrolled environments, automatically extendable
    • Human can only adjust the behavior by teaching
  • Only way to control robots in unknown domains
  • Good methods for dealing with high dimensional input
  • Problematic to apply to highly accurate control tasks (humanoid robots)
References
  • General Theory
    • [Lungarella 2004] "Beyond Gazing, Pointing and Reaching: A Survey of Developmental Robotics"
    • [Fong 2003] "A Survey of Socially Interactive Robots", Robotics and Autonomous Systems
    • [Metta 2000] "Babybot: A Study into Sensorimotor Development", PhD Thesis
    • [Pfeifer 1999] "Understanding Intelligence", MIT Press
    • [Sporns 2002] "Embodied Cognition", MIT Handbook of Brain Theory and Neural Networks
    • [Lungarella 2003] "Learning to Bounce: First Lesson from a Bouncing Robot", in Proc. of the 4th Int. Conference on Simulation of Adaptive Motion in Animals and Machines
  • Autonomous Mental Development (Papers found on
    • People involved: J. Weng, Y. Zhang, W. Hwang
    • [Weng 2004] "Developmental Robotics: Theory and Experiments", International Journal of Humanoid Robotics
    • [Weng 2002] "A Theory for Mentally Developing Robots", in Proc. 2nd International Conference on Development and Learning
    • [Weng 2004] "A Theory of Developmental Architecture", in Proc. 3rd International Conference on Development and Learning (ICDL 2004)
  • Experiments
    • [Zhang 2002] "Action Chaining by a Developmental Robot with a Value System", in Proc. 2nd International Conference on Development and Learning
    • [Huang 2002] "Novelty and Reinforcement Learning in the Value System of Developmental Robots", in Proc. Second International Workshop on Epigenetic Robotics
    • [Zeng 2004] "Obstacle Avoidance through Incremental Learning with Attention Selection", in Proc. IEEE Int'l Conf. on Robotics and Automation
    • [Zhang 2001] "Grounded Auditory Development by a Developmental Robot", in Proc. INNS/IEEE International Joint Conference of Neural Networks 2001 (IJCNN 2001)
  • Sensory Mapping
    • [Zhang 2002] "A Developing Sensory Mapping for Robots", in Proc. 2nd International Conference on Development and Learning
    • [Weng 2003] "Candid Covariance-free Incremental Principal Component Analysis", IEEE Trans. Pattern Analysis and Machine Intelligence
  • Cognitive Mapping
    • [Hwang 2000] "Hierarchical Discriminant Regression", IEEE Trans. Pattern Analysis and Machine Intelligence
    • [Weng 2000] "An Incremental Learning Algorithm with Automatically Derived Discriminating Features", in Proc. Asian Conference on Computer Vision
The End
  • Thank You !