
Soar One-hour Tutorial



Presentation Transcript


  1. Soar One-hour Tutorial John E. Laird University of Michigan March 2009 http://sitemaker.umich.edu/soar laird@umich.edu Supported in part by DARPA and ONR

  2. Tutorial Outline • Cognitive Architecture • Soar History • Overview of Soar • Details of Basic Soar Processing and Syntax • Internal decision cycle • Interaction with external environments • Subgoals and meta-reasoning • Chunking • Recent extensions to Soar • Reinforcement Learning • Semantic Memory • Episodic Memory • Visual Imagery

  3. How can we build a human-level AI? [diagram: human tasks (talking on a cell phone, shopping, calculus, Sudoku, driving, reading, learning) grounded in brain structure, neural circuits, and neurons]

  4. How can we build a human-level AI? [diagram: the same tasks, now with a parallel computer stack (programs, computer architecture, logic circuits, electrical circuits) alongside the brain stack (brain structure, neural circuits, neurons)]

  5. How can we build a human-level AI? [diagram: a cognitive architecture sits between tasks and the computer stack: symbolic long-term memories (procedural, episodic, semantic), learning mechanisms (chunking, semantic learning, episodic learning, reinforcement learning), symbolic short-term memory, a decision procedure, appraisals, imagery, and perception and action]

  6. Cognitive Architecture [diagram: knowledge and goals over the architecture, connected through a body to the task environment]
The fixed mechanisms underlying cognition:
• Memories, processing elements, control, interfaces
• Representations of knowledge
• Separation of fixed processes and variable knowledge
• Complex behavior arises from the composition of simple primitives
Purpose:
• Bring knowledge to bear to select actions to achieve goals
Not just a framework (such as BDI, neural networks, logic & probability, or rule-based systems)
Important constraints:
• Continual performance
• Real-time performance
• Incremental, on-line learning

  7. Common Structures of Many Cognitive Architectures [diagram: declarative and procedural long-term memories, each with its own learning mechanism; a short-term memory holding goals; action selection; perception and action]

  8. Different Goals of Cognitive Architecture • Biological plausibility: Does the architecture correspond to what we know about the brain? • Psychological plausibility: Does the architecture capture the details of human performance in a wide range of cognitive tasks? • Functionality: Does the architecture explain how humans achieve their high level of intellectual function? • Building Human-level AI

  9. Short History of Soar [timeline, 1980-2005: pre-Soar roots in problem spaces, production systems, and heuristic search; functionality through multi-method, multi-task problem solving, subgoaling, chunking, and integration; learning from experience, observation, and instruction; modeling; UTC; natural language; HCI; external environments; new capabilities; virtual agents with large bodies of knowledge, teamwork, and real applications]

  10. Distinctive Features of Soar
• Emphasis on functionality
• Takes engineering and scaling issues seriously
• Interfaces to real-world systems
• Very large systems can be built in Soar and exist for a long time
• Integration with perception and action
• Mental imagery and spatial reasoning
• Integrates reaction, deliberation, and meta-reasoning, dynamically switching between them
• Integrated learning: chunking, reinforcement learning, episodic & semantic
• Useful in cognitive modeling; expanding this is the emphasis of many current projects
• Easy to integrate with other systems & environments: SML efficiently supports many languages and inter-process communication

  11. System Architecture [diagram, layered]:
• Soar 9.0 Kernel (C)
• gSKI: higher-level interface (C++)
• KernelSML: encodes/decodes function calls and responses in XML (C++)
• SML: Soar Markup Language
• ClientSML: encodes/decodes function calls and responses in XML (C++)
• SWIG language layer: wrapper for Java/Tcl (not needed if the application is in C++)
• Application (any language)

  12. Soar Basics [diagram: an operator moves the agent, in a real or virtual world, into a new state]
• Operators: deliberate changes to internal/external state
• Activity is a series of operators controlled by knowledge:
• Input from environment
• Elaborate current situation: parallel rules
• Propose and evaluate operators via preferences: parallel rules
• Select operator
• Apply operator: modify internal data structures: parallel rules
• Output to motor system
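The cycle above can be sketched in miniature. The function names and the numeric-preference shortcut below are illustrative assumptions, not Soar's actual API; in Soar an empty candidate set would produce an impasse rather than a no-op.

```python
# A minimal sketch of one pass through Soar's processing cycle:
# propose operators in parallel, evaluate them via preferences,
# select one, and apply it. Illustrative only.

def decision_cycle(state, propose, evaluate, apply_op):
    candidates = propose(state)            # proposal rules fire in parallel
    if not candidates:
        return state                       # Soar would reach an impasse here
    prefs = evaluate(state, candidates)    # evaluation rules create preferences
    selected = max(candidates, key=lambda op: prefs.get(op, 0.0))
    return apply_op(state, selected)       # application rules change the state

# Toy example: an agent at x = 0 prefers moving east.
state = {"x": 0}
propose = lambda s: ["move-east", "move-west"]
evaluate = lambda s, ops: {"move-east": 1.0, "move-west": -1.0}
def apply_op(s, op):
    return {"x": s["x"] + (1 if op == "move-east" else -1)}

state = decision_cycle(state, propose, evaluate, apply_op)
print(state)   # {'x': 1}
```

Running the cycle repeatedly, with input and output phases around it, gives the agent's ongoing behavior.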

  13. Basic Soar Architecture [diagram: the decision cycle (input; elaborate state; propose operators; evaluate operators; select operator; apply operator, with operator elaboration; output) running over procedural long-term memory with chunking, symbolic short-term memory, a decision procedure, and perception and action through the body]

  14. Soar 101: Eaters [diagram: the cycle input, propose operators, evaluate operators, select operator, apply operator, output, running over production memory and working memory]
Example rules:
• If the cell in direction <d> is not a wall, propose operator move <d>.
• If operator <o1> will move to an empty cell, make <o1> worst (operator <o1> <).
• If operator <o1> will move to bonus food and operator <o2> will move to normal food, prefer <o1> (operator <o1> > <o2>).
• If an operator is selected to move <d>, create output move-direction <d>.
Resulting preferences: North > East, South > East, North = South; output: move-direction North.

  15. Example Working Memory [diagram: block A on block B on a table, and the corresponding working memory graph rooted at state S1]
(s1 ^block b1 ^block b2 ^table t1)
(b1 ^color blue ^name A ^ontop b2 ^size 1 ^type block ^weight 14)
(b2 ^color yellow ^name B ^ontop t1 ^size 1 ^type block ^under b1 ^weight 14)
(t1 ^color gray ^shape square ^type table ^under b2)
Working memory is a graph. All working memory elements must be “linked” directly or indirectly to a state.
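The listing above can be modeled as (identifier, attribute, value) triples, one per working memory element. Plain tuples are an illustrative encoding for this sketch, not Soar's internal representation.

```python
# Working memory as a graph of (identifier, ^attribute, value) triples,
# mirroring the (s1 ^block b1 ...) listing. A subset of the slide's
# attributes is enough to show the linkage rule.

wmes = [
    ("s1", "block", "b1"), ("s1", "block", "b2"), ("s1", "table", "t1"),
    ("b1", "name", "A"), ("b1", "ontop", "b2"), ("b1", "type", "block"),
    ("b2", "name", "B"), ("b2", "ontop", "t1"), ("b2", "type", "block"),
    ("t1", "type", "table"),
]

def linked_to_state(wmes, state="s1"):
    """Check that every identifier is reachable from the state."""
    reachable, frontier = {state}, [state]
    while frontier:
        ident = frontier.pop()
        for i, _, v in wmes:
            if i == ident and v not in reachable:
                reachable.add(v)
                frontier.append(v)
    return {i for i, _, _ in wmes} <= reachable

print(linked_to_state(wmes))   # True: every WME hangs off state s1
```

An element whose identifier is not reachable from any state would be removed from working memory; the check above returns False in that case.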

  16. Soar Processing Cycle [diagram: two stacked decision cycles; when the decide phase reaches an impasse, a subgoal is created with its own cycle, and rules in the substate work to resolve the impasse]

  17. TankSoar [screenshot legend: red tank's shield, borders (stone), walls (trees), health charger, missile pack, blue tank ("Ouch!"), energy charger, green tank's radar]

  18. Soar 103: Subgoals [diagram: the wander operator decomposes into move and turn operators in a substate, each level running its own input, propose, compare, select, apply, output cycle]
If the enemy is not sensed, then wander.

  19. Soar 103: Subgoals [diagram: the attack operator decomposes into a shoot operator in a substate]
If the enemy is sensed, then attack.

  20. TacAir-Soar [1997]
• Controls simulated aircraft in real-time training exercises (>3,000 entities)
• Flies all U.S. air missions
• Dynamically changes missions as appropriate
• Communicates and coordinates with computer- and human-controlled planes
• Large knowledge base (8,000 rules)
• No learning

  21. TacAir-Soar Task Decomposition
• If instructed to intercept an enemy, then propose intercept.
• If intercepting an enemy, the enemy is within range, and ROE are met, then propose employ-weapons.
• If employing weapons, a missile has been selected, the enemy is in the steering circle, and LAR has been achieved, then propose launch-missile.
• If launching a missile, it is an IR missile, and there is currently no IR lock, then propose lock-IR.
[goal hierarchy includes: Execute Mission, Intercept, Fly-Route, Fly-Wing, Ground Attack, Achieve Proximity, Employ Weapons, Search, Scram, Execute Tactic, Get Missile LAR, Select Missile, Launch Missile, Get Steering Circle, Sort Group, Lock Radar, Lock IR, Fire-Missile, Wait-for-Missile-Clear]
>250 goals, >600 operators, >8,000 rules

  22. Impasse/Substate Implications:
• A substate is really a meta-state that allows the system to reflect
• Substate = goal to resolve the impasse
• Generate operator
• Select operator (deliberate control)
• Apply operator (task decomposition)
• All basic problem-solving functions are open to reflection
• Operator creation, selection, application, state elaboration
• The substate is where knowledge to resolve the impasse can be found
• Hierarchies of substates/subgoals arise through recursive impasses

  23. Subgoals and Chunking [diagram: a tie impasse among move North, South, and East leads to evaluate-operator subgoals: evaluate-operator(North) = 10, evaluate-operator(South) = 10, evaluate-operator(East) = 5; the resulting preferences are North > East, South > East, North = South]
• Chunking creates rules that create preferences based on what was tested.
• Chunking also creates rules that apply evaluate-operator (e.g., North = 10).

  24. Chunking Analysis
• Converts deliberate reasoning/planning to reaction
• Generality of learning is based on generality of reasoning
• Leads to many different types of learning
• If reasoning is inductive, so is learning
• Soar only learns what it thinks about
• Chunking is impasse-driven
• Learning arises from a lack of knowledge

  25. Extending Soar [diagram: the architecture of slide 5, extended with episodic and semantic memories and their learning mechanisms, reinforcement learning, an appraisal detector, visual imagery, and clustering]
• Learn from internal rewards: reinforcement learning
• Learn facts (what you know): semantic memory
• Learn events (what you remember): episodic memory
• Basic drives and …: emotions, feelings, mood
• Non-symbolic reasoning: mental imagery
• Learn from regularities: spatial and temporal clusters

  26. Theoretical Commitments
Stayed the same:
• Problem Space Computational Model
• Long-term & short-term memories
• Associative procedural knowledge
• Fixed decision procedure
• Impasse-driven reasoning
• Incremental, experience-driven learning
• No task-specific modules
Changed:
• Multiple long-term memories
• Multiple learning mechanisms
• Modality-specific representations & processing
• Non-symbolic processing: symbol generation (clustering), control (numeric preferences), learning control (reinforcement learning), intrinsic reward (appraisals), aiding memory retrieval (WM activation), non-symbolic reasoning (visual imagery)

  27. Reinforcement Learning (Shelly Nason)

  28. RL in Soar [diagram: perception and reward feed the internal state; action selection draws on a value function that is updated from experience]
• Encode the value function as operator evaluation rules with numeric preferences.
• Combine all numeric preferences for an operator dynamically.
• Adjust the values of numeric preferences with experience.

  29. The Q-function in Soar
The value function is stored in rules that test the state and operator and create numeric preferences:
sp {rl-rule
   (state <s> ^operator <o> +)
   …
-->
   (<s> ^operator <o> = 0.34)}
An operator's Q-value is the sum of all its numeric preferences:
O1: {.34, .45, .02} = .81
O2: {.25, .11, .12} = .48
O3: {-.04, .14, -.05} = .05
Selection is epsilon-greedy or Boltzmann. Epsilon-greedy: with probability ε the agent selects an action at random; otherwise it takes the action with the highest expected value. [Balances exploration and exploitation]
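Summed numeric preferences and epsilon-greedy selection can be sketched directly; the operator names and preference values follow the slide, while the function names are illustrative assumptions.

```python
import random

# Each operator's Q-value is the sum of the numeric preferences
# created for it by RL rules; selection is epsilon-greedy.

prefs = {
    "O1": [0.34, 0.45, 0.02],    # Q = 0.81
    "O2": [0.25, 0.11, 0.12],    # Q = 0.48
    "O3": [-0.04, 0.14, -0.05],  # Q = 0.05
}

def q_value(op):
    return sum(prefs[op])        # Q-value = sum of numeric preferences

def select(epsilon=0.1, rng=random):
    ops = list(prefs)
    if rng.random() < epsilon:   # explore: random operator
        return rng.choice(ops)
    return max(ops, key=q_value) # exploit: highest Q-value

print(round(q_value("O1"), 2))   # 0.81
print(select(epsilon=0.0))       # O1
```

With ε = 0 the agent always exploits; raising ε trades some immediate value for exploration of the other operators.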

  30. Updating Operator Values
Sarsa update: Q(s,O1) ← Q(s,O1) + α[r + λQ(s′,O2) − Q(s,O1)]
Q(s,O1) is the sum of the numeric preferences of the rules supporting O1: R1(O1) = .20, R2(O1) = .15, R3(O1) = −.02, so Q(s,O1) = .33. Q(s′,O2) = .11 is the sum of the numeric preferences of the next selected operator, O2, and the reward is r = .2.
With λ = .9 (and taking α = 1 for this example), the update is [.2 + .9 × .11 − .33] = −.031 ≈ −.03. It is split evenly among the rules contributing to O1 (−.01 each), giving R1 = .19, R2 = .14, R3 = −.03.
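The update above, with its even split across contributing rules, can be written out as follows. The numbers come from the slide; the function and variable names are illustrative, and α is set to 1 so the total update matches the slide's −.03.

```python
# Sarsa update applied to the numeric preferences of the rules that
# supported the selected operator, split evenly among them.

def sarsa_update(rule_prefs, reward, next_q, alpha=1.0, gamma=0.9):
    q = sum(rule_prefs.values())                   # Q(s, O1)
    delta = alpha * (reward + gamma * next_q - q)  # TD error
    share = delta / len(rule_prefs)                # even split per rule
    return {r: round(v + share, 3) for r, v in rule_prefs.items()}

rules = {"R1": 0.20, "R2": 0.15, "R3": -0.02}      # Q(s, O1) = 0.33
updated = sarsa_update(rules, reward=0.2, next_q=0.11)
print(updated)   # {'R1': 0.19, 'R2': 0.14, 'R3': -0.03}
```

Because the preference values are what the rules will emit next time they fire, adjusting them directly is how Soar stores the learned value function.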

  31. Results with Eaters

  32. RL TankSoar Agent

  33. Semantic Memory (Yongjia Wang)

  34. Memory Systems [diagram: long-term memory divides into declarative (semantic memory, episodic memory) and procedural (procedural memory, perceptual representation system); short-term memory is working memory]

  35. Declarative Memory Alternatives
• Working memory: keep everything in working memory and retrieve it dynamically with rules; rules provide asymmetric access; data chunking is needed to learn it (complex)
• Separate declarative memories: semantic memory (facts) and episodic memory (events)

  36. Basic Semantic Memory Functionalities
• Encoding: What to save? When to add a new declarative chunk? How to update knowledge?
• Retrieval: How is the cue placed and matched? What are the different types of retrieval?
• Storage: What are the storage structures? How are they maintained?

  37. Semantic Memory Functionalities [diagram: structures are saved from working memory into semantic memory (including updates with complex structure), and retrieved by placing a cue that is feature-matched and expanded back into working memory]
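The save and cue-based retrieval operations can be sketched as a toy model: chunks as feature dictionaries, and retrieval succeeding only when every cue feature matches. This is an illustrative assumption, not Soar's semantic memory interface.

```python
# Toy semantic memory: save working-memory structures as feature
# dicts; retrieve the most recent chunk matching every cue feature.

smem = []

def save(chunk):
    smem.append(dict(chunk))        # store a copy of the structure

def retrieve(cue):
    for chunk in reversed(smem):    # prefer the most recent match
        if all(chunk.get(a) == v for a, v in cue.items()):
            return chunk
    return None                     # retrieval failure

save({"name": "A", "type": "block", "ontop": "b2"})
save({"name": "B", "type": "block", "ontop": "t1"})

print(retrieve({"type": "block", "ontop": "t1"}))  # the chunk named B
print(retrieve({"type": "table"}))                 # None
```

Requiring all cue features to match makes this an exact feature match; the episodic memory section below relaxes that to a closest partial match.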

  38. Episodic Memory (Andrew Nuxoll)

  39. Memory Systems [diagram repeated from slide 34: long-term memory divides into declarative (semantic memory, episodic memory) and procedural (procedural memory, perceptual representation system); short-term memory is working memory]

  40. Episodic vs. Semantic Memory • Semantic Memory • Knowledge of what we “know” • Example: what state the Grand Canyon is in • Episodic Memory • History of specific events • Example: a family vacation to the Grand Canyon

  41. Characteristics of Episodic Memory: Tulving • Architectural: • Does not compete with reasoning. • Task independent • Automatic: • Memories created without deliberate decision. • Autonoetic: • Retrieved memory is distinguished from sensing. • Autobiographical: • Episode remembered from own perspective. • Variable Duration: • The time period spanned by a memory is not fixed. • Temporally Indexed: • Rememberer has a sense of when the episode occurred.

  42. Implementation [diagram: long-term procedural memory (production rules) over working memory with input and output links; design questions: encoding initiation, storage, retrieval]
Encoding initiation: an episode is recorded when the agent takes an action.

  43. Current Implementation [diagram as before; next design question: encoding content]
Encoding content: the entire working memory is stored in the episode.

  44. Current Implementation [diagram adds episodic memory and episodic learning; next design question: episode structure]
Episode structure: episodes are stored in a separate memory.

  45. Current Implementation [diagram; next design question: retrieval initiation and cue]
Retrieval initiation/cue: the cue is placed in an architecture-specific buffer.

  46. Current Implementation [diagram; final design question: retrieval]
Retrieval: the closest partial match is retrieved.
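Closest-partial-match retrieval can be sketched by scoring each stored episode on how many cue features it satisfies and returning the best scorer, preferring more recent episodes on ties. The episode contents and names below are illustrative assumptions.

```python
# Toy episodic memory: record working-memory snapshots, then retrieve
# the episode that matches the most cue features (a partial match).

episodes = []   # one dict per recorded snapshot

def record(snapshot):
    episodes.append(dict(snapshot))

def retrieve(cue):
    def score(ep):
        return sum(1 for a, v in cue.items() if ep.get(a) == v)
    # reversed() so that, on ties, the most recent episode wins
    return max(reversed(episodes), key=score, default=None)

record({"enemy": "sensed", "health": 90, "cell": "c3"})
record({"enemy": "none", "health": 70, "cell": "c7"})

cue = {"enemy": "sensed", "health": 90, "cell": "c7"}
print(retrieve(cue))   # first episode: best partial match (2 of 3 features)
```

Unlike the exact feature match used for semantic retrieval, a partial match still returns an episode when no stored snapshot satisfies the whole cue, which is what the virtual-sensing and action-modeling capabilities on the next slides rely on.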

  47. Cognitive Capability: Virtual Sensing
• Retrieve prior perception that is relevant to the current task
• The tank recursively searches memory:
• Have I seen a charger from here?
• Have I seen a place where I can see a charger?

  48. Virtual Sensors Results

  49. Cognitive Capability: Action Modeling [diagram: episodic retrieval inside an evaluation subgoal]
• The agent attempts to choose a direction (North, South, or East); its knowledge is insufficient, so an impasse arises.
• To evaluate moving in each available direction, the agent creates a memory cue and retrieves the best matching memory.
• It then retrieves the next memory and uses the change in score to evaluate the proposed action (e.g., Move North = 10 points).

  50. Episodic Memory: Multi-Step Action Projection [Andrew Nuxoll]
• Learn tactics from prior success and failure
• Fight/flight
• Back away from enemy (and fire)
• Dodging
