A Biologically Inspired Adaptive Working Memory for Robots Marjorie Skubic and James M. Keller University of Missouri-Columbia David Noelle, Mitch Wilkes and Kazuhiko Kawamura Vanderbilt University
Outline • The role of working memory in cognitive systems • Incorporating a human-inspired WM into robots • Enabling components for robotic embodiment • Central Executive • Interactive Spatial Language • SIFT Object Recognition • Pre-attentive Vision System • Conclusions • Demo available
Working Memory Working memory systems are those that actively maintain transient information that is critical for successful decision-making in the current context. A working memory system can be viewed as a relatively small cache of task relevant information that is strategically positioned to efficiently influence behavior.
Robotic Working Memory Could robot control systems benefit from the inclusion of a working memory system? • The highly limited capacity of working memory, along with its tight coupling with deliberation mechanisms, might alleviate the need for costly memory searches. • Information needed to fluently perform the current task is temporarily kept “handy” in the working memory store. Can computational neuroscience models of the working memory mechanisms of the human brain shed light on the design of a robotic working memory system?
Potential Uses • Focus attention on the most relevant features of the current task. • Guide perceptual processes by limiting the perceptual search space. • Provide a focused short-term memory to prevent the robot from being confused by occlusions. • Provide robust operation in the presence of distracting irrelevant events.
Adaptive Working Memory How does the working memory system know when a given chunk of information should be actively maintained in working memory? • Hand Coding – For relatively routine and well understood tasks, designers may hand code procedures for the identification of useful chunks. • Learning – If the robot is expected to flexibly respond in novel task situations, or even acquire new tasks, it would be beneficial to have a means to learn when to store a particular chunk in working memory. The central focus of this project is on assessing the utility of adaptive working memory mechanisms for robot control.
Adaptive Working Memory In The Brain • A number of brain regions are implicated as important components of the human working memory system. • One important region is the dorsolateral portion of prefrontal cortex. • Working memory is exhibited in delay period activity. • Cells have been found which encode locations, visual features, and association rules.
Recurrence • How are high neural firing rates sustained over a delay? • Mutual excitation of neurons. • Dense recurrent connections in prefrontal cortex. Stripe sets. • Attractor network computational models.
Controlling Updating How does the working memory system know when to actively maintain a given chunk? How does it know when to abandon a previously maintained chunk? The dynamics of recurrent attractor networks are insufficient to meet the simultaneous constraints of (1) active maintenance in the face of distraction and (2) rapid updating when needed. A dynamic gating mechanism is needed.
Temporal Difference (TD) Learning Change in expected reward is called the temporal difference (TD) error (delta). It is the value that drives learning in a powerful form of reinforcement learning called Temporal Difference (TD) Learning.
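A minimal sketch of the TD error, assuming the standard TD(0) form delta = r + gamma * V(s') - V(s). The discount `gamma`, the learning rate `alpha`, and the two state names are illustrative assumptions, not values from the project.

```python
def td_error(reward, value_next, value_current, gamma=0.9):
    """TD error: change in expected (discounted) reward between two states."""
    return reward + gamma * value_next - value_current


def td0_update(values, state, next_state, reward, alpha=0.1, gamma=0.9):
    """Apply one tabular TD(0) update in place and return the TD error."""
    delta = td_error(reward, values[next_state], values[state], gamma)
    values[state] += alpha * delta
    return delta


# Hypothetical two-state example: all value estimates start at zero.
values = {"start": 0.0, "goal": 0.0}
delta = td0_update(values, "start", "goal", reward=1.0)
```

A positive delta means the outcome was better than expected, so the value estimate of the earlier state is raised.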
The Actor-Critic Framework (Barto, Sutton, & Anderson, 1983) [Figure: block diagram with the components External Environment, Sensory System, Fixed Critic (reinforcer, signal r), Adaptive Critic (value function), Actor (policy function), and Motor System]
TD & Neural Networks TD(0) may be implemented in a connectionist framework, allowing for large continuous state and action spaces and generalization to novel states. [Figure: a critic network driven by sensory inputs, and an actor network mapping sensory inputs to actions] The delta value may be used as the error signal for an adaptive critic network learning to produce value estimates, and also as the error signal for a competitive actor network which implements the policy.
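The shared error signal can be sketched as follows. This is an illustration under stated assumptions, not the project's networks: both critic and actor are single linear layers, a softmax stands in for the competitive actor, and the class name, learning rates `alpha` and `beta`, and discount `gamma` are hypothetical.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class LinearActorCritic:
    def __init__(self, n_features, n_actions, alpha=0.1, beta=0.1, gamma=0.9):
        self.v = [0.0] * n_features                                # critic weights
        self.w = [[0.0] * n_features for _ in range(n_actions)]   # actor weights
        self.alpha, self.beta, self.gamma = alpha, beta, gamma

    def value(self, x):
        """Critic output: estimated value of the sensory input x."""
        return sum(vi * xi for vi, xi in zip(self.v, x))

    def policy(self, x):
        """Actor output: action probabilities for the sensory input x."""
        return softmax([sum(wi * xi for wi, xi in zip(row, x)) for row in self.w])

    def act(self, x):
        probs = self.policy(x)
        return random.choices(range(len(probs)), weights=probs)[0]

    def learn(self, x, action, reward, x_next):
        """Update critic and actor from the same TD error and return it."""
        delta = reward + self.gamma * self.value(x_next) - self.value(x)
        probs = self.policy(x)  # computed before any weights change
        for i, xi in enumerate(x):
            self.v[i] += self.alpha * delta * xi
        for a, row in enumerate(self.w):
            grad = (1.0 if a == action else 0.0) - probs[a]
            for i, xi in enumerate(x):
                row[i] += self.beta * delta * grad * xi
        return delta


agent = LinearActorCritic(n_features=2, n_actions=2)
x = [1.0, 0.0]
delta = agent.learn(x, action=0, reward=1.0, x_next=[0.0, 1.0])
```

After one rewarded step, the critic's estimate for `x` rises and the actor becomes more likely to repeat action 0 in that state.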
Dopamine & Working Memory • The dopamine system may be encoding a TD error signal which is useful for learning sequential behaviors. (Montague, Dayan, & Sejnowski) • If the dopamine system can be used to learn to choose overt actions, why couldn't it be used to choose covert actions, such as deciding when to close the gate on working memory contents? (Braver & Cohen) • There are extensive dopamine projections to PFC. • There is some evidence that dopamine may influence PFC neurons in a manner consistent with “gating”.
The Working Memory Toolkit • Memory traces or chunks will be pointers to arbitrary C++ data structures. • The adaptive working memory toolkit will require the user to specify: • the capacity of the working memory • a function which extracts features from chunks • a function which provides relevant features of the current system state, including candidate chunks • a function which provides instantaneous external reward information • The toolkit provides a function for examining the contents of working memory, returning chunk pointers.
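The interface listed above can be sketched in Python. Everything here is hypothetical: the actual toolkit is described as C++, and the class name, method names, binary feature names, and the simple linear TD rule are assumptions chosen only to show how the four user-supplied pieces could fit together.

```python
class WorkingMemorySketch:
    """Hypothetical Python mirror of the user-specified toolkit inputs."""

    def __init__(self, capacity, chunk_features, state_features, reward,
                 alpha=0.1, gamma=0.9):
        self.capacity = capacity              # maximum number of chunks held
        self.chunk_features = chunk_features  # chunk -> iterable of feature names
        self.state_features = state_features  # () -> iterable of feature names
        self.reward = reward                  # () -> instantaneous external reward
        self.alpha, self.gamma = alpha, gamma
        self.weights = {}                     # feature name -> learned weight
        self._contents = []
        self._last_features = None

    def contents(self):
        """Examine working memory: return the chunks currently held."""
        return list(self._contents)

    def _value(self, features):
        return sum(self.weights.get(f, 0.0) for f in features)

    def step(self, candidates):
        """Choose which chunks to keep, then learn from the latest reward."""
        pool = list(self._contents)
        for chunk in candidates:
            if chunk not in pool:
                pool.append(chunk)
        score = lambda c: self._value(self.chunk_features(c))
        # Stable sort: on ties, chunks already held are preferred.
        ranked = sorted(pool, key=score, reverse=True)
        self._contents = [c for c in ranked if score(c) >= 0.0][:self.capacity]

        features = set(self.state_features())
        for chunk in self._contents:
            features.update(self.chunk_features(chunk))

        if self._last_features is not None:
            # TD error: reward plus discounted new value minus previous value.
            delta = (self.reward() + self.gamma * self._value(features)
                     - self._value(self._last_features))
            for f in self._last_features:
                self.weights[f] = self.weights.get(f, 0.0) + self.alpha * delta
        self._last_features = features
        return self.contents()


# Hypothetical usage: chunks are plain strings and each chunk is its own feature.
rewards = iter([1.0, 1.0])
wm = WorkingMemorySketch(capacity=1,
                         chunk_features=lambda chunk: [chunk],
                         state_features=lambda: [],
                         reward=lambda: next(rewards))
wm.step(["cup"])
wm.step(["distractor"])
wm.step(["distractor"])
```

In this run the rewarded chunk "cup" gains weight and stays in the single memory slot while the distractor is never admitted.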
Critical Related Technologies • Feature extraction is critical for success! • Advances in perception systems are needed to extract appropriate high level features from experiences. • Guide attention to relevant aspects of experiences. • Identify features associated with objects or object categories. • Identify important qualitative spatial relationships. • Advances in motor control systems are needed to fully leverage the benefits of an adaptive working memory.
Enabling components for robotic embodiment • Central Executive
A Humanoid Cognitive Robot • A cognitive robot has the capacity to reflect and generalize to new situations in a complex, changing world. • Toward this goal, we have implemented numerous memory structures within an agent-based system. [Figure: ISAC]
Multiagent-based Cognitive Robot Architecture [Figure: architecture diagram highlighting the Central Executive] In this project, we concentrate on the Central Executive (CE) and the Working Memory System (WMS), which are two key elements of cognitive control.
Cognitive Control • Mechanism for intelligent behavior selection and control • Behaviors are selected based on task context and past experience Central Executive (CE) • Selects and loads candidate chunks (behaviors) into the WM • Controls task execution of loaded behaviors • Evaluates and updates criteria for selection and control Working Memory System (WMS) • Maintains task-related information • Focuses on execution of the current task
Action Selection in a Cognitive Robot [Figure: cognitive control diagram. The Central Executive (Task Relevancy, TD Learning, Behavior Selector) loads behaviors into the WMS; each behavior pairs a State Estimator with a Controller and carries a weight w1 ... wN; inputs are Sensor, Current State, and Command; the output is an Action sent to ISAC; STM and LTM supply the memory structures.] Legend: PM = Procedural Memory, DM = Declarative Memory, SES = Sensory Ego-Sphere, AN = Attention Network, TD = Temporal-Difference, WMS = Working Memory System, LTM = Long-Term Memory, STM = Short-Term Memory Working Memory • Behaviors are loaded into the WMS based on past experience. • A behavior consists of a State Estimator, which predicts the next system state, and a Controller, which issues actual motor commands. Action Selection • Behaviors are executed based on goal-related information.
Initial WM Experiment • A set of task-related behaviors is taught to ISAC. • For the task, ISAC is asked to reach to a point on the table. ISAC must select correct behaviors and combine them in order to perform the task successfully. • Later, ISAC will be asked to identify and point to an object on the table. [Figure: reaching motions toward the goal position. Blue lines denote loaded candidate behavior motions. The red dotted line denotes the final behavior motion.]
Enabling components for robotic embodiment • Central Executive • Interactive Spatial Language
Interactive Spatial Language • Cognitive models indicate that people use spatial relationships in navigation and other spatial reasoning (Previc, Schunn) • More natural interaction with robots • Spatial language can be used to: • Focus attention • “look to the left of the telephone” • Issue commands • “pick up the book on top of the desk” • Describe a high-level representation of a task • “go behind the counter, find my coffee cup on the table, and bring it back to me” • Receive feedback from the robot describing the environment • “there is a book on top of the desk to the right of the coffee cup”
Our Spatial Modeling Tool Capturing qualitative spatial information between 2 objects The histogram of gravitational forces The histogram of constant forces Matsakis et al. 1999, 2001 Features extracted from the histograms are used to generate linguistic spatial terminology
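A rough point-pair sketch of the idea, offered as a simplification rather than the Matsakis algorithm itself: each pair of points (a in A, b in B) contributes a force of weight 1/d^r in the direction from b to a, with r = 0 for constant forces and r = 2 for gravitational forces. The eight-bin layout, the direction labels, the y-up axis convention, and the function names are assumptions made for illustration.

```python
import math

# Bin i is centred on i * 45 degrees, measured counter-clockwise from +x (y up).
LABELS = ["RIGHT", "ABOVE-RIGHT", "ABOVE", "ABOVE-LEFT",
          "LEFT", "BELOW-LEFT", "BELOW", "BELOW-RIGHT"]


def force_histogram(points_a, points_b, r=0.0, bins=8):
    """Accumulate pairwise forces by the direction in which A lies from B."""
    hist = [0.0] * bins
    width = 2.0 * math.pi / bins
    for ax, ay in points_a:
        for bx, by in points_b:
            dx, dy = ax - bx, ay - by
            d = math.hypot(dx, dy)
            if d == 0.0:
                continue  # coincident points have no defined direction
            theta = math.atan2(dy, dx) % (2.0 * math.pi)
            index = int((theta + width / 2.0) // width) % bins
            hist[index] += 1.0 / d ** r
    return hist


def dominant_direction(hist):
    """Return the label of the direction with the largest accumulated force."""
    return LABELS[max(range(len(hist)), key=hist.__getitem__)]


# Hypothetical objects: A sits five units to the left of B.
tower = [(0.0, 0.0), (0.0, 1.0)]
stack = [(5.0, 0.0), (5.0, 1.0)]
hist = force_histogram(tower, stack, r=0.0)
relation = dominant_direction(hist)
```

Here all four point pairs fall in the LEFT bin. A fuller treatment would derive hedges such as "perfectly" or "loosely" from how the histogram mass spreads across neighboring directions.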
Linguistic Scene Description Example The tower is perfectly to the LEFT of the stack buildings.
Linguistic Scene Description Example The group of storehouses is loosely ABOVE-LEFT of the stack buildings.
Linguistic Scene Description Example The storehouse is ABOVE the stack buildings, but shifted to the RIGHT.
HRI Scenario: Scene 1 Human: “How many objects do you see?” Robot: “I am sensing 4 objects.” Human: “Where are they?” Robot: “There are objects in front of me. I am surrounded from the rear. The object # 4 is to the right of me.” Human: “Where is the nearest object in front of you?” Robot: “The object # 3 is in front of me but extends to the right relative to me. The object is very close.”
HRI Scenario: Scene 1, cont. Human: “Object #3 is a box.” Robot: “I now know object #3 is a box.” Human: “Go to the right of the box.”
HRI Scenario: Scene 2 The robot has moved to the right of the box. The pillar has now come into view (object #4)
JK’s Special Blend Extend to 3D by Combining the Horizontal and Vertical Planes • Look for the coffee cup on top of the desk to the right of the computer. • Continue user studies to validate the algorithms
JK’s Special Blend Use WM to Find Jim’s Coffee Cup Use the working memory toolkit to test a global spatial representation vs. a relational spatial representation
Enabling components for robotic embodiment • Central Executive • Interactive Spatial Language • SIFT Object Recognition
Scale Invariant Feature Transform (SIFT) for Object Recognition Based on the work by David Lowe • Find features that are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine transformations or 3D projection • Create Keypoints from extrema in scale space • Generate relative position features (naturally translation invariant) • Compute directional histograms that are invariant to rotation • Method of calculation also gives insensitivity to affine stretches • Normalization helps with Illumination Changes
Hunt for local extrema in space and scale: Gaussian blurring and differencing. [Figure: keypoint locations on training image] Keypoint Descriptions: Directional Histograms • Sixteen gradient histograms are created. • Major direction of gradients is determined. • Rotate gradient locations so that keypoint orientation is 0°. • Rotate individual gradient directions to be consistent with orientation.
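The orientation steps can be sketched in plain Python. This is a simplified illustration, not a full SIFT implementation: it uses central differences on a small grayscale patch, omits Gaussian weighting and histogram smoothing, and the 36-bin count and function names are assumptions.

```python
import math

TWO_PI = 2.0 * math.pi


def gradients(patch):
    """Central-difference gradients (magnitude, angle) for interior pixels."""
    out = []
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            out.append((math.hypot(dx, dy), math.atan2(dy, dx) % TWO_PI))
    return out


def dominant_orientation(grads, bins=36):
    """Major gradient direction: peak of a magnitude-weighted histogram."""
    hist = [0.0] * bins
    width = TWO_PI / bins
    for mag, ang in grads:
        # Bins are centred on multiples of the bin width.
        hist[int((ang + width / 2.0) // width) % bins] += mag
    peak = max(range(bins), key=hist.__getitem__)
    return peak * width


def normalize(grads, orientation):
    """Rotate gradient directions so the keypoint orientation becomes 0."""
    return [(mag, (ang - orientation) % TWO_PI) for mag, ang in grads]


# Two hypothetical patches: the same intensity ramp, rotated by 90 degrees.
ramp_x = [[float(x) for x in range(5)] for y in range(5)]
ramp_y = [[float(y) for x in range(5)] for y in range(5)]
theta_x = dominant_orientation(gradients(ramp_x))
theta_y = dominant_orientation(gradients(ramp_y))
```

After `normalize`, both ramps yield gradient directions near 0, which is the sense in which the directional histograms become rotation invariant.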
Recognition Examples • Top images are training; bottom images are test. • Still matches keypoints on occluded objects.
Keypoint Matching: Stereo Vision [Figure: matched keypoints in the left-eye and right-eye images]
3D Representation for Spatial Relations [Figure: the scene, with 3D keypoints projected onto the horizontal and vertical planes]
Can We Use WM to Learn Interesting Landmarks? Use Keypoint Clusters to Determine Potential Areas of Interest Must eliminate the concentration of keypoints along the skyline
Enabling components for robotic embodiment • Central Executive • Interactive Spatial Language • SIFT Object Recognition • Pre-attentive Vision System
Pre-attentive Vision System Goals • Learn broad categories of objects from experience. • Be able to explain how it makes decisions, as well as to justify any particular decision. • Detect if there are novel elements in a visual scene, and use this to trigger new learning, i.e., self-directed learning. • After making a general class identification, use other object recognition algorithms to identify a specific object.
Elements of Pre-attentive Vision System • Feature vectors consist of a color histogram of 250 colors and a measure of texture roughness, 251 features total • Fuzzy rules extracted from training data • ML estimator for classes • Perceptual memory of past experiences • Interaction interface for teaching and assessment
Novelty Detection • Train the system on the empty scene. • Add new elements to the scene. • Identify the new elements by novelty.
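One simple way to picture the three steps, assuming a nearest-neighbor distance test in place of the system's fuzzy rules and ML estimator: feature vectors from the empty scene are stored, and a new vector counts as novel when it lies farther than a threshold from every stored one. The class name, the threshold, and the three-element feature vectors are hypothetical.

```python
import math


def distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


class NoveltyDetector:
    def __init__(self, threshold):
        self.threshold = threshold
        self.memory = []  # feature vectors seen during training

    def train(self, feature_vectors):
        """Remember feature vectors extracted from the empty scene."""
        self.memory.extend(list(v) for v in feature_vectors)

    def is_novel(self, vector):
        """A vector is novel if nothing in memory lies within the threshold."""
        if not self.memory:
            return True
        return min(distance(vector, m) for m in self.memory) > self.threshold


# Hypothetical features; the real system uses 251 per region.
detector = NoveltyDetector(threshold=0.2)
detector.train([[0.9, 0.1, 0.30], [0.8, 0.2, 0.35]])
familiar = detector.is_novel([0.85, 0.15, 0.30])
new_object = detector.is_novel([0.10, 0.90, 0.80])
```

Regions flagged as novel are the ones that could trigger new, self-directed learning.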
ML Segmentation Yellow = Sidewalk, Blue = Grass, Red = Tree, Green = Artificial Landmark
WM Experiment [Figure: segmented scene with regions labeled sky, trees, and gravel] • Pre-attentive processing significantly reduces the search space for other algorithms such as SIFT. • Use WM to learn the most successful pre-attentive identifications, e.g., those which lead to the greatest success in reaching a navigational goal.