

A Cognitive Architecture for Physical Agents. Pat Langley Computational Learning Laboratory Center for the Study of Language and Information Stanford University, Stanford, California http://cll.stanford.edu/.




  1. A Cognitive Architecture for Physical Agents Pat Langley Computational Learning Laboratory Center for the Study of Language and Information Stanford University, Stanford, California http://cll.stanford.edu/ Thanks to Dongkyu Choi, Kirstin Cummings, Negin Nejati, Seth Rogers, Stephanie Sage, and Daniel Shapiro for contributions to this research.

  2. Assumptions about Cognitive Architectures A cognitive architecture specifies the infrastructure that holds constant over domains, as opposed to knowledge, which varies. A cognitive architecture focuses on functional structures and processes, not on the knowledge or implementation levels. A cognitive architecture commits to representations and organizations of knowledge and processes that operate on them. A cognitive architecture comes with a programming language for encoding knowledge and constructing intelligent systems. A cognitive architecture should demonstrate generality and flexibility rather than success on a single application domain.

  3. Examples of Cognitive Architectures Some cognitive architectures produced over the past 30 years include: ACTE through ACT-R (Anderson, 1976; Anderson, 1993) Soar (Laird, Rosenbloom, & Newell, 1984; Newell, 1990) PRODIGY (Minton & Carbonell, 1986; Veloso et al., 1995) PRS (Georgeff & Lansky, 1987) 3T (Gat, 1991; Bonasso et al., 1997) EPIC (Kieras & Meyer, 1997) APEX (Freed et al., 1998) However, these systems cover only a small region of the space of possible architectures.

  4. Goals of the ICARUS Project We are developing ICARUS, a new cognitive architecture that: focuses on physical and embodied agents; integrates perception and action with cognition; unifies reactive execution with problem solving; combines symbolic structures with numeric utilities; learns structures and utilities in a cumulative manner. In this talk, I report on our recent progress toward these goals.

  5. Some Target Phenomena We intend for ICARUS to model high-level phenomena rather than detailed performance effects. For instance, we know that, in complex domains, humans: typically continue tasks to completion but can shift to other tasks if advantageous; prefer to engage in routine behavior but can solve novel problems when required; learn new concepts and skills in a cumulative fashion that builds on the results of previous learning. Such issues have received much less attention than they deserve.

  6. Theoretical Commitments of ICARUS Our designs for ICARUS have been guided by six principles: Cognitive reality of physical objects Cognitive separation of categories and skills Primacy of categorization and skill execution Hierarchical organization of long-term memory Correspondence of long-term/short-term structures Modulation of symbolic structures with utility functions These ideas distinguish ICARUS from most other architectures.

  7. Overview of the ICARUS Architecture (without learning) [diagram: Perception fills the Perceptual Buffer from the Environment; Categorization and Inference connects Long-Term Conceptual Memory with Short-Term Conceptual Memory; Skill Retrieval and Means-Ends Analysis connect Long-Term Skill Memory with Short-Term Skill Memory; Skill Execution sends actions through the Motor Buffer to the Environment]

  8. Some Concepts from the Blocks World

(on (?block1 ?block2)
  :percepts ((block ?block1 xpos ?x1 ypos ?y1)
             (block ?block2 xpos ?x2 ypos ?y2 height ?h2))
  :tests ((equal ?x1 ?x2)
          (>= ?y1 ?y2)
          (<= ?y1 (+ ?y2 ?h2))))

(clear (?block)
  :percepts ((block ?block))
  :negatives ((on ?other ?block)))

(unstackable (?block ?from)
  :percepts ((block ?block) (block ?from))
  :positives ((on ?block ?from) (clear ?block) (hand-empty)))
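Read as an ordinary predicate, the :tests field of the on concept translates directly into attribute comparisons. The sketch below is purely illustrative: dict-based percepts are a hypothetical stand-in for ICARUS percepts, not its actual representation.

```python
# Hypothetical stand-in for the 'on' concept: each percept is a dict of
# the attributes named in the :percepts field.

def on(block1, block2):
    """Mirror of :tests ((equal ?x1 ?x2) (>= ?y1 ?y2) (<= ?y1 (+ ?y2 ?h2)))."""
    return (block1["xpos"] == block2["xpos"]
            and block2["ypos"] <= block1["ypos"] <= block2["ypos"] + block2["height"])

a = {"xpos": 10, "ypos": 20, "height": 10}   # rests at the top of b
b = {"xpos": 10, "ypos": 10, "height": 10}
print(on(a, b))   # True: a sits directly on b
```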

  9. Primitive Skills from the Blocks World

(pickup (?block ?from)
  :percepts ((block ?block xpos ?x) (table ?from height ?h))
  :start ((pickupable ?block ?from))
  :requires ( )
  :actions ((* move ?block ?x (+ ?h 10)))
  :effects ((holding ?block))
  :value 1.0)

(stack (?block ?to)
  :percepts ((block ?block) (block ?to xpos ?x ypos ?y height ?h))
  :start ((stackable ?block ?to))
  :requires ( )
  :actions ((* move ?block ?x (+ ?y ?h)))
  :effects ((on ?block ?to) (hand-empty))
  :value 1.0)

  10. A Nonprimitive Skill from the Blocks World

(puton (?block ?from ?to)
  :percepts ((block ?block) (table ?from) (block ?to))
  :start ((ontable ?block ?from) (clear ?block) (hand-empty) (clear ?to))
  :requires ( )
  :ordered ((pickup ?block ?from) (stack ?block ?to))
  :effects ((on ?block ?to))
  :value 1.0)

(puton (?block ?from ?to)
  :percepts ((block ?block) (block ?from) (block ?to))
  :start ((on ?block ?from) (clear ?block) (hand-empty) (clear ?to))
  :requires ( )
  :ordered ((unstack ?block ?from) (stack ?block ?to))
  :effects ((on ?block ?to))
  :value 1.0)

  11. Hierarchical Organization of Memory ICARUS’ long-term memories are organized into hierarchies: concepts can refer to percepts and to other concepts; skills refer to percepts, to concepts, and to other skills. Conceptual memory is similar to a Rete network, but each node represents a meaningful category. Different expansions for skills and concepts also make them similar to Horn clause programs. These hierarchies are encoded by direct reference, rather than through working-memory elements as in ACT and Soar.

  12. ICARUS’ Short-Term Memories

short-term skill memory:
(deliver-package g029)
(avoid-collisions g001)

short-term concept memory:
(ahead-right-corner g008) (ahead-left-corner g011) (behind-right-corner g017)
(approaching g001 g023) (opposite-direction g001 g023)
(parallel-to-line g001 g019) (on-cross-street g001 g029)

perceptual buffer:
(self g001 speed 32 wheel-angle -0.2 fuel-level 0.4)
(corner g008 r 15.3 theta 0.25 street-dist 12.7)
(corner g011 r 18.4 theta -0.34 street-dist 12.7)
(corner g017 r 7.9 theta 1.08 street-dist 5.2)
(lane-line g019 dist 1.63 angle -0.07)
(street g025 name campus address 1423)
(package g029 street panama cross campus address 2134)

  13. Categorization and Inference On each cycle, perception deposits object descriptions into the perceptual buffer. ICARUS matches its concepts against the contents of this buffer. Categorization proceeds in an automatic, bottom-up manner, much as in a Rete matcher. This process can be viewed as a form of monotonic inference that adds concept instances to short-term memory.
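The bottom-up character of this process can be sketched as follows. The tuple encoding and the blocks-world concepts (on, clear, hand-empty, unstackable) are borrowed from the earlier slides, but the code is an illustration, not the ICARUS matcher.

```python
# Illustrative bottom-up inference: primitive concept instances come
# straight from percepts; higher concepts match against instances
# already inferred, so short-term memory grows monotonically.

def infer(blocks, on_pairs, hand_empty):
    stm = set()
    for b1, b2 in on_pairs:                            # primitive: on
        stm.add(("on", b1, b2))
    for b in blocks:                                   # clear: nothing on top
        if not any(f[0] == "on" and f[2] == b for f in stm):
            stm.add(("clear", b))
    if hand_empty:
        stm.add(("hand-empty",))
    for b1, b2 in on_pairs:                            # builds on earlier matches
        if ("clear", b1) in stm and ("hand-empty",) in stm:
            stm.add(("unstackable", b1, b2))
    return stm

stm = infer(["A", "B", "C"], [("A", "B")], hand_empty=True)
```

With A on B and an empty hand, the sketch infers that A and C are clear, B is not, and A is unstackable from B.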

  14. Retrieving and Matching Skill Paths On each cycle, ICARUS finds all paths through its skill hierarchy that: • begin with an instance in skill STM; • have start and requires fields that match; • have effects fields that do not match. Each instantiated path produced in this way terminates in an executable action. ICARUS adds these candidate actions to its motor buffer for possible execution.

  15. Retrieving and Matching Skill Paths [diagram: skills and their skill expansions] Each path through the skill hierarchy starts at an intention and ends at a primitive skill instance.

  16. Evaluating and Executing Skills For each selected path, ICARUS computes a utility by summing the values of the skills along that path. For each path, in order of decreasing utility: • If required resources are available, execute actions; • If executed, commit the resources for this cycle. These actions alter the environment, which affects the perceptual buffer and thus conceptual memory.
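A minimal sketch of this selection step, assuming each candidate path is a list of (skill, value) pairs tagged with a single hypothetical resource it needs; ICARUS's actual resource model is not specified here.

```python
def path_utility(path):
    """Sum the :value fields of the skills along one path."""
    return sum(value for _, value in path)

def execute_cycle(candidates, resources):
    """Execute terminal actions in order of decreasing utility,
    committing each path's resource for the rest of the cycle."""
    executed = []
    ranked = sorted(candidates, key=lambda c: path_utility(c[1]), reverse=True)
    for resource, path in ranked:
        if resource in resources:
            resources.discard(resource)      # commit resource this cycle
            executed.append(path[-1][0])     # terminal primitive action
    return executed

# Two paths competing for the same hypothetical "hand" resource:
paths = [("hand", [("puton", 1.0), ("pickup", 1.0)]),
         ("hand", [("wander", 0.5)])]
print(execute_cycle(paths, {"hand"}))   # ['pickup']
```

Only the higher-utility path runs; the other is blocked once the resource is committed.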

  17. Modulating Utilities with Persistence Fully reactive behavior is not always desirable when one is completing a task that requires extended activity. In response, ICARUS retains the most recently executed path for each skill instance in short-term memory. For each matched path, it modulates the path’s utility to be

U' = U × (1 + p × (Σ_{i=1..s} k^i) / (Σ_{j=1..d} k^j)),

where U is the original utility, d is the depth of the candidate path, s is the number of steps it shares with the previous path, p is a persistence factor, and 0 < k < 1 is a decay term. The greater the persistence factor, the greater the agent’s bias toward continuing to execute skills it has already initiated.
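Transcribed literally (same symbols as in the formula), the modulation can be checked with a few numbers:

```python
# Sketch of the persistence modulation:
#   U' = U * (1 + p * sum_{i=1..s} k**i / sum_{j=1..d} k**j)

def modulated_utility(U, p, k, s, d):
    shared = sum(k**i for i in range(1, s + 1))   # steps shared with previous path
    total = sum(k**j for j in range(1, d + 1))    # all steps of candidate path
    return U * (1 + p * shared / total)

# A path sharing every step with the previous one (s == d) gets the
# maximal boost U * (1 + p); one sharing no steps (s == 0) is unchanged.
print(modulated_utility(2.0, 0.5, 0.9, 3, 3))   # 3.0
print(modulated_utility(2.0, 0.5, 0.9, 0, 3))   # 2.0
```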

  18. An Experimental Study of Persistence To evaluate the influence of persistence on agent behavior, we: • created skills for in-city driving and package delivery; • gave the agent top-level intentions to deliver three packages; • generated five distinct sets of package delivery tasks; • varied the persistence level used in pursuing these tasks. We ran the system on each task and, for each level, averaged the number of cycles ICARUS needed to deliver all the packages. We predicted that an intermediate persistence level would give the best results.

  19. Effects of Persistence on Delivery Time We also found delivery time scaled linearly with size of the city.

  20. Means-Ends Problem Solving When a selected skill cannot be executed, ICARUS invokes a variant of means-ends analysis that: • finds concepts which, if true, would let execution continue; • adds one of the unsatisfied concept instances to a goal stack; • chains off a relevant skill or off the concept’s definition; • backtracks when no further chaining is feasible. Each step takes one cycle and, unlike in most AI planning systems, execution occurs whenever possible.
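A toy backward-chaining loop in the spirit of this method (goal stack via recursion, chaining off skills whose effects match the goal); the skill encoding is a stripped-down assumption, and chaining off concept definitions and backtracking over variable bindings are omitted.

```python
# Each skill maps name -> (start conditions, effects); goals and
# conditions are plain strings in this sketch.

def means_ends(goal, state, skills, depth=10):
    """Return a list of skill names achieving goal, or None on failure."""
    if goal in state:
        return []
    if depth == 0:
        return None
    for name, (start, effects) in skills.items():
        if goal in effects:
            plan, ok = [], True
            for pre in start:                 # subgoal on unmet start conditions
                sub = means_ends(pre, state, skills, depth - 1)
                if sub is None:
                    ok = False
                    break
                plan += sub
                state = state | {pre}
            if ok:
                return plan + [name]
    return None                               # no skill chains: give up here

skills = {"pickup": ({"pickupable"}, {"holding"}),
          "stack": ({"holding"}, {"on"})}
print(means_ends("on", {"pickupable"}, skills))   # ['pickup', 'stack']
```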

  21. An Abstract Means-Ends Trace [diagram: numbered nodes 1–11 showing the order of means-ends steps]

  22. Learning Skills from Means-Ends Traces ICARUS learns hierarchical skills from traces of problem solutions. When it attempts to execute S2 to achieve concept O, but must first execute S1 to achieve the start conditions of S2, it creates a skill with S1 and S2 as subskills and O as the effect. When it attempts to achieve concept O by achieving subconcepts {O1, …, On}, which it does by executing {S1, …, Sn}, it creates a skill with S1, …, Sn as subskills and O as the effect. In both cases, ICARUS also defines new concepts to serve as start conditions for the skills that ensure they have the desired effect. This method is akin to macro learning and production compilation, but it constructs a reactive skill hierarchy rather than flat rules.
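The first composition rule above can be sketched over a hypothetical dict encoding of skills (the names and encoding here are illustrative, not ICARUS syntax):

```python
def compose_skill(name, subskills, effect):
    """Build a nonprimitive skill whose :ordered field chains the subskills
    executed in the trace and whose :effects field is the goal concept."""
    return {"name": name,
            "ordered": [s["name"] for s in subskills],
            "effects": [effect]}

# E.g., executing unstack to enable stack while achieving (on ?block ?to)
# yields a nonprimitive puton skill like the one on slide 10.
unstack, stack = {"name": "unstack"}, {"name": "stack"}
puton = compose_skill("puton", [unstack, stack], ("on", "?block", "?to"))
```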

  23.–27. Learning Skills from Means-Ends Traces [diagram sequence over five slides: the trace with nodes 1–11 is successively covered by learned skills A, B, C, D, and E]

  28. An Experimental Study of Skill Learning To evaluate ICARUS’ ability to learn hierarchical skills, we: • created primitive concepts and skills for the blocks world; • gave the agent problems in order of increasing complexity; • sampled randomly from 200 different training orders; • ran the architecture with learning turned on and off. For each condition and experience level, we counted the number of cycles needed to solve each problem and averaged the results.

  29. Effects of Skill Learning in Blocks World

  30. Utility Functions in ICARUS Skill expansions in ICARUS have associated utility functions that: describe expected utility in terms of perceived object attributes; are summed along each path through the skill hierarchy; are used to decide which paths to execute on each cycle. Taking entire paths into account produces context effects, in that an action has different utility depending on its calling skills. An earlier version of ICARUS acquired these value functions from delayed reward, using a hierarchical variant of Q learning.

  31. Extended Utility Functions We are implementing an extended version of ICARUS that: associates reward functions with individual concepts; uses skill effects and durations to compute expected values; updates probability of success on completion or abandonment; updates the expected duration of each skill upon completion. This model-based approach to learning from delayed reward should be much faster than standard methods. However, it views reward as internal to the agent, rather than as coming from the environment.

  32. Domains Studied to Date To demonstrate generality, we have developed initial ICARUS programs for a number of domains, including: In-city driving Pole balancing Tower of Hanoi Multi-column subtraction Peg solitaire Each of these connects to a simulated environment that contains objects the system can perceive and affect.

  33. Intellectual Precursors ICARUS’ design has been influenced by many previous efforts: earlier research on integrated cognitive architectures especially influenced by ACT, Soar, and Prodigy earlier work on architectures for reactive control especially universal plans and teleoreactive programs research on learning macro-operators and search-control rules decision theory and decision analysis previous versions of ICARUS (going back to 1988). However, the framework combines and extends ideas from its various predecessors in novel ways.

  34. Directions for Future Research Future work on ICARUS should introduce additional methods for: forward chaining and mental simulation of skills; learning durations and success rates to support lookahead; allocation of scarce resources like perceptual attention; probabilistic encoding and matching of Boolean concepts; flexible recognition of skills executed by other agents; extension of short-term memory to store episodic traces. Taken together, these features should make ICARUS a more general and powerful cognitive architecture.

  35. Concluding Remarks ICARUS is an integrated architecture for intelligent agents that: includes separate memories for concepts and skills; organizes both memories in a hierarchical fashion; modulates reactive execution with persistence; augments routine behavior with problem solving; and learns new skills and concepts in a cumulative manner. This constellation of concerns distinguishes ICARUS from other research on integrated architectures.

  36. End of Presentation

  37. In-City Driving: A Cognitive Task for Embodied Agents

  38. Overview of the ICARUS Architecture (without learning) [diagram: Perception of the Environment fills the Perceptual Buffer; Categorization and Inference connects Long-Term Conceptual Memory with Short-Term Conceptual Memory; Nomination and Abandonment of Skills and Means-Ends Problem Solving connect Long-Term Skill Memory with Short-Term Skill Memory; Skill Selection/Execution acts on the Environment]

  39. Examples of Long-Term Concepts

(corner-ahead-left (?corner)
  :percepts ((corner ?corner r ?r theta ?theta))
  :tests ((< ?theta 0) (>= ?theta -1.571))
  :value (+ (* 5.6 ?r) (* 3.1 ?theta)))

(in-intersection (?self)
  :percepts ((self ?self) (corner ?ncorner street-dist ?sd))
  :positives ((near-block-corner ?ncorner) (corner-straight-ahead ?scorner))
  :negatives ((far-block-corner ?fcorner))
  :tests ((< ?sd 0.0))
  :value -10.0)

  40. Examples of Long-Term Skills

(make-right-turn (?self ?corner)
  :objective ((behind-right-corner ?corner))
  :start ((in-rightmost-lane ?self) (ahead-right-corner ?corner)
          (at-turning-distance ?corner))
  :requires ((near-block-corner ?corner) (at-turning-speed ?self))
  :ordered ((begin-right-turn ?self ?corner) (end-right-turn ?self ?corner))
  :value 30.0)

(slow-for-intersection (?self)
  :percepts ((self ?self speed ?speed) (corner ?corner street-dist ?d))
  :objective ((slow-enough-intersection ?self))
  :requires ((near-block-corner ?corner))
  :actions ((* slow-down))
  :value (+ (* -5.2 ?d) (* 20.3 ?speed)))

  41. Retrieving and Matching Skill Paths

  42. Skill Nomination and Abandonment ICARUS adds skill instances to short-term skill memory that: • refer to concept instances or goals in short-term memory; • have expected utility greater than the agent’s discounted past utility. ICARUS removes a skill when its expected utility falls far below its past utility.

  43. Learning from Skill Subgoaling [diagram: a timeline of executed skills S1–S10, each annotated with a success probability P and duration D; e.g., S10 (P = 0.74, D = 50) decomposes into S7–S9, which in turn decompose into S1–S6]

  44. Intelligent Assistance for Office Planning For a recent DARPA project, we are developing systems that: assist users in planning and scheduling trips; assist users in planning and scheduling meetings; accept advice about how to accomplish these tasks; learn new skills from such interactions; infer the preferences of individual users. We hope to extend the ICARUS framework to support these new performance and learning tasks.

  45. Some Necessary Extensions Directions in which we need to extend ICARUS include: Representation: time needed to execute skills; resources required by each skill; plans (specific instances of expanded skills); episodes (past instances of executed skills). Performance: plan generation; plan revision; advice taking. Learning: new skills from advice; preferences from interactions. These additions will make ICARUS a more complete architecture for intelligent agents.
