Learning from Language: It takes an architecture

Kenneth D. Forbus • Qualitative Reasoning Group • Northwestern University

Overview • Why language needs cognitive architecture • Language for learning & reasoning • Companion cognitive systems • Software social organisms • Project: Explanation Agent • 2nd generation Learning by reading system • Project: Long-lived Learning Software Collaborators • Interactive learning, longevity • Wrap-up

Goal: Human-level AI • Our approach: Build a software social organism • Organisms maintain themselves, with help from their environment and friends • Organisms adapt • Social organisms learn from their culture • Accumulating lots of experience is crucial • Using language & sketch understanding for accumulating experience & interaction • Human experience ≫ 4×10⁶ to 1.5×10⁷ ground facts • Between 2x and 10x the size of ResearchCyc

Language is a blunt instrument • Ambiguity is ubiquitous • “Where is the explorer?” • Context is crucial • Sources of context • Tasks • Experience • TREC Q: Amount of folic acid a pregnant woman needs per day? A: Six tons • Watson: What is Toronto????

Cognitive Architectures Provide Context • Examples: SOAR, ACT-R, Companions, Icarus, Polyscheme, Clarion, … • Same system, many tasks • Provides framing for learning problems • Types of internal decisions provide learning problems • Ideally: Learn large bodies of knowledge • Still unsolved, e.g. reading a textbook well enough to answer most of the questions in it would be a landmark • Ideally: Work and learn over long periods of time • Still unsolved: Most architectures are used for short experiments and then rebooted

The Analogical Mind • SME compares examples or generalizations to new situations • MAC/FAC retrieves experiences and generalizations from LTM • SAGE incrementally produces generalizations • Generalizations are probabilistic, partially abstracted, without logical variables
[Figure: Essence of the Companion cognitive architecture. Components: SME, MAC/FAC, SAGE, and a knowledge base (semantic knowledge + experiences + generalizations)]
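
The two-stage retrieval described above can be illustrated with a toy sketch. It is not the Companion implementation: the MAC stage follows the MAC/FAC idea of comparing content vectors (here, predicate counts) with a dot product, while the FAC stage, which in the real system runs SME, is replaced by a hypothetical brute-force structural score. The case names, facts, and the lowercase-entity convention are all assumptions for illustration.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy memory: cases are sets of (predicate, arg, ...) facts.
# Assumed convention: lowercase arguments are entities, capitalized ones are constants.
memory = {
    "dam_case": {("isa", "d1", "Dam"), ("isa", "r1", "River"), ("on", "d1", "r1"),
                 ("causes", "d1", "e1"), ("isa", "e1", "Extinction")},
    "bridge_case": {("isa", "b1", "Bridge"), ("isa", "r2", "River"), ("on", "b1", "r2")},
    "kitchen_case": {("isa", "p1", "Pot"), ("isa", "s1", "Stove"), ("on", "p1", "s1")},
}
probe = {("isa", "d9", "Dam"), ("isa", "r9", "River"), ("on", "d9", "r9")}


def is_entity(arg):
    return arg[:1].islower()


def content_vector(case):
    """MAC representation: how often each predicate occurs in the case."""
    return Counter(fact[0] for fact in case)


def dot(u, v):
    return sum(count * v[key] for key, count in u.items())


def entities(case):
    return sorted({a for fact in case for a in fact[1:] if is_entity(a)})


def structural_score(probe, base):
    """Hypothetical stand-in for SME: the most probe facts that become identical to
    base facts under some one-to-one entity mapping (brute force, tiny cases only)."""
    probe_ents, base_ents = entities(probe), entities(base)
    targets = base_ents + [None] * len(probe_ents)  # None = leave entity unmapped
    best = 0
    for image in set(permutations(targets, len(probe_ents))):
        mapping = dict(zip(probe_ents, image))
        mapped = {(f[0],) + tuple(mapping[a] if is_entity(a) else a for a in f[1:])
                  for f in probe}
        best = max(best, len(mapped & base))
    return best


def mac_fac(probe, memory, k=2):
    pv = content_vector(probe)
    # MAC: cheap, non-structural filter keeps the k cases with the largest dot products.
    shortlist = sorted(memory, key=lambda n: dot(pv, content_vector(memory[n])),
                       reverse=True)[:k]
    # FAC: structural evaluation runs only on the survivors.
    return max(shortlist, key=lambda n: structural_score(probe, memory[n]))


print(mac_fac(probe, memory))  # -> dam_case
```

The point of the split is cost: the dot product is cheap enough to run over all of memory, so the expensive structural comparison only sees a short list.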

Companion Cognitive Architecture • Analogy is central • SME matching • MAC/FAC for retrieval • SAGE for generalization • Engineering approximations • Agent-based decomposition • Can run on cluster or desktop • Logic-based TMS for working memory • HTN planning & execution system
[Figure: Companion cognitive system. Components: Session Reasoner; Domain Reasoning & Learning; Language & Sketch Understanding; LTM (uses MAC/FAC and SAGE); Interaction Manager; Tickler; Self-monitoring & Self-modeling; Executive; Session Manager; UI and debugging tools. Initial endowment: ResearchCyc contents + other knowledge.]
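
The slide lists HTN planning without showing how decomposition works. The following is a minimal total-order HTN sketch in the general style of SHOP-like planners, not the Companion planner itself; the task names and the toy state are hypothetical.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, int]
Operator = Callable[[State], Optional[State]]


def plan(state: State, tasks: List[str],
         operators: Dict[str, Operator],
         methods: Dict[str, List[List[str]]]) -> Optional[List[str]]:
    """Total-order HTN decomposition with backtracking.
    Primitive tasks are applied via operators (None = precondition failed);
    compound tasks are replaced by the first method whose expansion succeeds."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in operators:
        new_state = operators[head](state)
        if new_state is None:
            return None
        tail = plan(new_state, rest, operators, methods)
        return None if tail is None else [head] + tail
    for subtasks in methods.get(head, []):
        result = plan(state, subtasks + rest, operators, methods)
        if result is not None:
            return result
    return None


# Hypothetical Freeciv-flavored domain, for illustration only.
operators = {
    "build_workers": lambda s: {**s, "workers": s["workers"] + 1,
                                "shields": s["shields"] - 10} if s["shields"] >= 10 else None,
    "irrigate": lambda s: {**s, "food": s["food"] + 2} if s["workers"] >= 1 else None,
}
methods = {"increase_food": [["irrigate"], ["build_workers", "irrigate"]]}

print(plan({"workers": 0, "shields": 12, "food": 0}, ["increase_food"], operators, methods))
# -> ['build_workers', 'irrigate']
```

Operators return fresh dictionaries rather than mutating the state, so a failed method leaves nothing to undo when the planner backtracks to the next alternative.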

Practical Language Approach (cf. James Allen’s Practical Dialogue approach) • Goal: Language as a practical interaction modality • Sacrifice syntactic breadth to focus on semantic breadth • Use simplified syntax • Simple English Wikipedia, BBC Simple English web site, [Many others] • Produce outputs that reasoning systems use • Contrast with the usual CL annotate-train-test cycle • Principle: Thousands for representation and reasoning, not one penny for text annotation! • Tasks: Cognitive modeling, learning by reading

Project: Explanation Agent • 2nd Generation Learning by Reading system • Goal: Fully automatic reading of article-length simplified English texts, with sketches • Sources: Hand-translated textbooks, Simple English Wikipedia • At least one book’s worth of material • Goal: Self-guided learning • Select & pursue its own learning goals • Seek new articles to read, to fill out background knowledge

Learning Reader: A 1st generation LbR system • QA performance: before reading 10%, after reading 37%, after rumination 50% • Accuracy > 99% • Forbus, Riesbeck, Birnbaum, … AAAI 2007
[Figure: Learning Reader. Components: simplified texts, parameterized questions, Online Reader (DMAP), QA Reasoner, Ruminator with question generation strategies, and a KB of assimilated knowledge of world history & politics built on a ResearchCyc initial endowment]

EA NLU • Most cognitive simulations use hand-coded representations • Problems: • Tailorability • Difficulty scaling • Consistency • Requires expertise • Solution: Make production of representations more automatic
[Figure: EA NLU components: Allen Parser, COMLEX lexicon, DRT-based Packed Semantics Builder, ResearchCyc KB contents, Query-based Abductive Semantic Interpreter; output: formal representation of story]

“Because of a dam on a river, 20 species of fish will be extinct.”
(thereExists (TheList dam19729 river19839)
  (and (isa river19839 River)
       (on-Physical dam19729 river19839)
       (isa dam19729 Dam)
       (causes-ThingProp dam19729
         (thereExists group-of-species19884
           (and (isa group-of-species19884 Set-Mathematical)
                (cardinality group-of-species19884 20)
                (forAll species19884
                  (implies (member species19884 group-of-species19884)
                           (thereExists group-of-fish19905
                             (and (isa group-of-fish19905 Set-Mathematical)
                                  (forAll fish19905
                                    (implies (member fish19905 group-of-fish19905)
                                             (isa fish19905 Fish)))
                                  (generalizes species19884 group-of-fish19905)
                                  (isa species19884 BiologicalSpecies)))))
                (willBe
                  (thereExists be19993
                    (and (isa be19993 Extinction)
                         (objectActedOn be19993 group-of-species19884)))))))))
[Figure: the same interpretation drawn as nested DRS boxes, with universes {dam19729, river19839}, {group-of-species19884}, {species19884}, {group-of-fish19905}, {fish19905}, and {be19993}]
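
The relation between DRS boxes and the flattened formula can be sketched in a few lines. This is a hypothetical encoding, not EA NLU's data structures: a box is a universe of referents plus conditions, and flattening wraps the conditions in `and` under a `thereExists`, using `TheList` when several referents are bound, as in the formula above. Boxes introduced by `implies-DrsDrs` (which become `forAll`/`implies`) are not handled here.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class DRS:
    """Hypothetical DRS encoding: discourse referents plus conditions."""
    universe: List[str]
    conditions: List["Term"] = field(default_factory=list)


# A condition is an s-expression: a tuple whose elements are symbols or nested DRSs.
Term = Union[str, tuple, DRS]


def to_formula(term) -> str:
    """Flatten a DRS (or a term containing DRSs) into thereExists/and notation."""
    if isinstance(term, DRS):
        parts = [to_formula(c) for c in term.conditions]
        body = parts[0] if len(parts) == 1 else "(and " + " ".join(parts) + ")"
        if not term.universe:
            return body
        if len(term.universe) == 1:
            bound = term.universe[0]
        else:
            bound = "(TheList " + " ".join(term.universe) + ")"
        return f"(thereExists {bound} {body})"
    if isinstance(term, tuple):
        return "(" + " ".join(to_formula(t) for t in term) + ")"
    return str(term)


# Referents and predicates below are copied from the example sentence's output.
extinction = DRS(["be19993"], [("isa", "be19993", "Extinction"),
                               ("objectActedOn", "be19993", "group-of-species19884")])
print(to_formula(("willBe", extinction)))
```

Running it reproduces the `willBe` subformula of the interpretation above, which is a quick way to see that each nested box corresponds to one existential quantifier.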

Learning by Reading w/Diagrams • A computational version of Mayer’s multimedia learning theory
Example input: “A lever has three basic parts. A fulcrum is a basic part of a lever. A force is a basic part of a lever. A weight is a basic part of a lever.” (sketchForDiscourse "kb-resource://Figure1-1.sk" (DrsCaseFn DRS-3446218074-8197)) “F is the Fulcrum. E is the force. A2 is the distance between the weight and the fulcrum. A1 is the distance between the force and the fulcrum. A1 is an arm of the lever. A2 is an arm of the lever.”
• Part of Kate Lockwood’s Ph.D. work: after reading a simplified NL version of the chapter, correctly answers 12/15 homework questions (KCAP09) • In vitro, not in vivo
[Figure: lever sketch with glyphs F, E, A1, A2; components shown: CogSketch, EA NLU, SME]

Automatic Reading Strategies • Experience-based filtering • First used in Learning Reader: statistics on joint categorization • e.g., no known military actions occur in a DNA adenine base, so that reading of “base” can be filtered out • Analogical interpretation • Store disambiguation decisions as cases, reapply by analogy • Use SAGE to generate probabilistic disambiguation rules • Conceptual triangulation • Examine how the same ambiguous word is used in multiple similar texts
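
Experience-based filtering can be sketched as follows. The slide does not give Learning Reader's actual statistics, so this assumes one simple form: count the (relation, category, category) patterns that occur among known facts, and drop candidate readings whose pattern has never been seen. All entity, category, and relation names here are hypothetical.

```python
from collections import Counter

# Hypothetical background knowledge: entity -> known categories, plus relation facts.
categories = {
    "attack1": {"MilitaryAction"}, "fort1": {"MilitaryBase"},
    "dna1": {"DNAMolecule"}, "adenine1": {"AdenineBase"},
}
facts = [("eventOccursAt", "attack1", "fort1"), ("hasPart", "dna1", "adenine1")]


def signature_counts(facts, categories):
    """Count (relation, category-of-arg1, category-of-arg2) patterns seen in the KB."""
    counts = Counter()
    for rel, a, b in facts:
        for ca in categories.get(a, ()):
            for cb in categories.get(b, ()):
                counts[(rel, ca, cb)] += 1
    return counts


def filter_readings(readings, counts):
    """Drop candidate readings whose pattern has never been observed."""
    return [r for r in readings if counts[r] > 0]


counts = signature_counts(facts, categories)
# Two candidate senses of "base" in "the attack at the base":
readings = [("eventOccursAt", "MilitaryAction", "MilitaryBase"),
            ("eventOccursAt", "MilitaryAction", "AdenineBase")]
print(filter_readings(readings, counts))
# -> [('eventOccursAt', 'MilitaryAction', 'MilitaryBase')]
```

A hard zero threshold is the simplest choice; with sparse knowledge it can also discard correct but novel readings.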

Self-Guided Learning • Formulating learning goals • Monitor Q/A performance • Use model-based diagnosis to suggest weaknesses • Set up activities to remedy these weaknesses • Learning self-models • How much it knows in various areas • Qualitative models of its own operations • Internal inconsistency detection • Optimizing use of human attention • Small question budget
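
One piece of this loop, monitoring Q/A performance and spending a small question budget, can be sketched as below. This is a simpler stand-in than the model-based diagnosis the slide mentions: it only flags topic areas whose accuracy falls below a threshold and cycles the question budget over them, worst first. The log format, topic names, and threshold are hypothetical.

```python
from collections import defaultdict


def weakest_areas(qa_log, threshold=0.5):
    """qa_log: (area, answered_correctly) pairs from Q/A monitoring.
    Returns areas with accuracy below threshold, worst first."""
    totals, correct = defaultdict(int), defaultdict(int)
    for area, ok in qa_log:
        totals[area] += 1
        correct[area] += int(ok)
    accuracy = {a: correct[a] / totals[a] for a in totals}
    return sorted((a for a in accuracy if accuracy[a] < threshold),
                  key=lambda a: accuracy[a])


def allocate_questions(areas, budget):
    """Spend a small budget of questions to humans round-robin over weak areas."""
    schedule = []
    while areas and len(schedule) < budget:
        schedule.append(areas[len(schedule) % len(areas)])
    return schedule


# Hypothetical log of past questions and whether they were answered correctly.
qa_log = [("geography", True), ("geography", True),
          ("politics", False), ("politics", True),
          ("economy", False), ("economy", False), ("economy", True)]
weak = weakest_areas(qa_log, threshold=0.6)
print(weak)                          # -> ['economy', 'politics']
print(allocate_questions(weak, 3))   # -> ['economy', 'politics', 'economy']
```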

Towards Long-lived Learning Software Collaborators Interactive Learning. Software needs to be able to learn via apprenticeship: starting as a student, moving to participation in shared work, and increased autonomy and responsibility as it learns. Longevity. Software needs to be able to learn and adapt over substantial periods of time, without intervention by people who know their internals.

Strategy games = Rich simulated environment for learning • Space • Terrain and its effects • Exploration strategies • Designing/optimizing city placement, transportation networks. • Dynamics • Coordinating physical movements of units. • Setting up and maintaining an economy • Balancing short-term versus long-term payoffs, including research • Guns versus butter tradeoffs. • Conflict & Cooperation • Military strategy and tactics. • Diplomacy • Joint play • Constrained yet broad domain

Planned Apprenticeship Trajectory • Will start as student, learning basic skills from trainers • Next will serve as aide-de-camp for players, learning from joint play • Finally, autonomous operation, playing people on network servers • Will get feedback from human mentor • Goal: Companions operating as apprentices, for at least a month at a time
[Figure: example exchanges along the trajectory: “Can you keep Boston from starving?” “Don’t we need better defenses?” “Remember when they…” “Oh my..” “I got nuked. What should I have done instead?”]

Learning Spatial Terms • Idea: Use SAGE, with multimodal inputs • Allow Companion to do active learning, by finding potential examples to label

CogSketch Examples • Can copy/paste from PowerPoint • Reduces tailorability in spatial stimuli (Andrew Lovett’s PhD thesis) • Learning spatial prepositions (Kate Lockwood’s PhD thesis)
Best Generalization: IN • Size: 3 (candle in bottle, cookie in bowl, marble in water) • DEFINITE FACTS: (rcc8-TPP figure ground) • POSSIBLE FACTS: 33% (Basin ground), 33% (Bowl-Generic ground)
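
The "definite" and "possible" facts in a generalization like the one above can be illustrated with a frequency-annotated structure that is updated one case at a time. This is a deliberate simplification of SAGE: real SAGE uses SME to align entities and a similarity threshold to decide which generalization a case joins, whereas here entities are assumed to be pre-aligned to the roles `figure` and `ground`, and the case encodings are hypothetical.

```python
from collections import Counter


class Generalization:
    """Frequency-annotated generalization, updated incrementally (simplified)."""

    def __init__(self):
        self.size = 0
        self.counts = Counter()

    def add(self, case):
        self.size += 1
        self.counts.update(set(case))  # each fact counted at most once per case

    def probability(self, fact):
        return self.counts[fact] / self.size if self.size else 0.0

    def summary(self):
        definite = sorted(f for f in self.counts if self.counts[f] == self.size)
        possible = sorted((f for f in self.counts if self.counts[f] < self.size),
                          key=lambda f: (-self.counts[f], f))
        return definite, [(round(100 * self.probability(f)), f) for f in possible]


# Hypothetical encodings of three "in" examples, entities pre-aligned by role.
gen = Generalization()
gen.add({("rcc8-TPP", "figure", "ground"), ("Bottle", "ground")})
gen.add({("rcc8-TPP", "figure", "ground"), ("Bowl-Generic", "ground")})
gen.add({("rcc8-TPP", "figure", "ground"), ("Basin", "ground")})
definite, possible = gen.summary()
print(gen.size, definite)  # -> 3 [('rcc8-TPP', 'figure', 'ground')]
print(possible)
```

Facts shared by every case become definite; the rest keep a probability instead of being discarded, which is what lets a generalization stay partially abstracted without logical variables.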

Example: “This is a choke point” • 2nd order learning goal: How should this example be encoded for SAGE?

One Strategy: Intersect with Terrain Model • Derive local terrain representations on demand
[Figure: local terrain map. Legend: light green = hills, dark green = grassland, yellow = plains, red = mountains, blue = water]
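
Deriving a local terrain representation on demand might look like the following. It is a hypothetical sketch, not the system's encoding: it takes the tiles within a radius of a location, treats every terrain type in the legend except water as passable (an assumption), and calls a tile a local choke point if removing it splits the passable tiles around it into more connected groups.

```python
from collections import deque

# Assumption for this sketch: land units can cross every terrain type except water.
PASSABLE = {"grassland", "plains", "hills", "mountains"}


def local_window(grid, r, c, radius=1):
    """Local terrain representation: tiles within `radius` of (r, c), clipped at map edges."""
    rows = range(max(0, r - radius), min(len(grid), r + radius + 1))
    cols = range(max(0, c - radius), min(len(grid[0]), c + radius + 1))
    return {(i, j): grid[i][j] for i in rows for j in cols}


def count_components(cells):
    """Number of 4-connected groups among a set of tile coordinates (BFS)."""
    cells, seen, groups = set(cells), set(), 0
    for start in cells:
        if start in seen:
            continue
        groups += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if nb in cells and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return groups


def is_local_choke_point(grid, r, c, radius=1):
    """Hypothetical test: (r, c) is passable and removing it splits nearby passable tiles."""
    window = local_window(grid, r, c, radius)
    passable = {p for p, terrain in window.items() if terrain in PASSABLE}
    if (r, c) not in passable:
        return False
    return count_components(passable - {(r, c)}) > count_components(passable)


grid = [
    ["plains", "water", "hills"],
    ["plains", "grassland", "hills"],
    ["plains", "water", "hills"],
]
print(is_local_choke_point(grid, 1, 1))  # -> True: a land bridge between two water tiles
print(is_local_choke_point(grid, 0, 0))  # -> False
```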

Learning from Stories • Tell Companion stories • e.g., the Strait of Hormuz is of strategic importance because it is a choke point for oil distribution. Dispute over the control of it led to a war between Iran and Iraq between 1984 and 1987. • Mulling over the story, it should think about • Guarding choke points in its own distribution networks • Finding choke points in enemy civs and exploiting them • Recognizing similar behaviors that arose in games it has already played

Extracting Qualitative Models from Language • Currently analyzing the Freeciv manual • Objective source of examples of information and advice • Goals of the analysis • What sorts of information and advice are communicated via natural language? • What requirements do those impose on NLU systems? • Produce simplified texts that can be used in learning experiments

Analysis Results so far • Examining three chapters • Cities (89 sentences), terrain (69 sentences), economy (62 sentences) • 17% can be parsed into qualitative causal relations • 38 original sentences, 76 simplified • Other kinds of available knowledge: • Limit points (e.g. “When population reaches zero,…”) • Direct influences (e.g. “while those producing less deplete their granary”) • Extensive/Intensive parameters (e.g. “Any leftover production points…”)
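
As a quick check on the 17% figure, assuming it is taken over all 220 sentences in the three chapters (the slide does not say), the counts above give:

```latex
\frac{38}{89 + 69 + 62} = \frac{38}{220} \approx 0.173 \approx 17\%,
\qquad
\frac{76}{38} = 2 \ \text{simplified sentences per original sentence, on average}
```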

Experiment: Verifying utility of qualitative knowledge • Objective: Build the largest civilization you can in 75 turns. • Compare performance under different conditions: • Legal player makes random choices from among legal actions. • Rational player uses a hand-coded qualitative model to make decisions that lead to the goal (not necessarily optimal). • Canned-strategies player follows hand-coded plans to maximize the goal.
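
The difference between the legal and rational players can be sketched with qualitative sign propagation. Everything here is hypothetical: the influence graph and actions are invented for illustration and are not the hand-coded model used in the experiment. The rational player picks an action whose qualitative effect, propagated through signed influences, pushes the goal quantity up; the legal player just picks at random.

```python
import random

# Hypothetical hand-coded qualitative model: (cause, effect) -> sign of influence.
INFLUENCES = {
    ("irrigated_tiles", "food_surplus"): +1,
    ("food_surplus", "city_size"): +1,
    ("units_supported", "food_surplus"): -1,
}
# Hypothetical actions: which quantity each one changes, and in which direction.
ACTIONS = {
    "irrigate": ("irrigated_tiles", +1),
    "build_settlers": ("units_supported", +1),
}


def effect_on(quantity, goal, visited=frozenset()):
    """Sign (+1/-1) of goal's response to an increase in quantity; 0 if unknown or ambiguous."""
    if quantity == goal:
        return +1
    signs = set()
    for (cause, effect), sign in INFLUENCES.items():
        if cause == quantity and effect not in visited:
            downstream = effect_on(effect, goal, visited | {quantity})
            if downstream:
                signs.add(sign * downstream)
    return signs.pop() if len(signs) == 1 else 0


def rational_choice(goal):
    """Pick the first action the qualitative model predicts will push the goal up."""
    for action, (quantity, direction) in ACTIONS.items():
        if direction * effect_on(quantity, goal) > 0:
            return action
    return None


def legal_choice(rng):
    """The 'legal player' baseline: a uniformly random legal action."""
    return rng.choice(sorted(ACTIONS))


print(rational_choice("city_size"))  # -> irrigate
```

Returning 0 when paths disagree mirrors the usual qualitative-reasoning ambiguity: with only signs, opposing influences cannot be resolved, which is one reason the rational player is described as not necessarily optimal.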

Summary • Language evolved as a communication system for intelligent social organisms • To fully participate, our software should also be social organisms • Cognitive architectures provide decision problems that frame learning goals • Potentially provide accumulation of experience • Companions cognitive architecture is being used to explore • Learning by reading • Interactive learning, longevity

What we’re looking for • Dialogue system that we can extend • Explanation Agent: Interactive Q/A • Collaborators: Q/A, advice, joint planning • Intelligent tutoring systems • Broader coverage parser • We’re reengineering our semantic interpreter this summer • Lexicon extensions • Have developed broad-coverage (but uncertain quality) lexicon ourselves

Questions? • Acknowledgements • ONR Project: David Barbella, Abhishek Sharma • AFOSR project: Tom Hinrichs, Matt McClure, Chris Blair • Other Companions hackers: Scott Friedman • Alums: Matt Klenk, Sven Kuehne, Kate Lockwood, Emmett Tomai

How Much Experience? • Upper bound = ???, lower bound easier • One type of experience: Reading books • Lockwood knowledge capture experiment • 8 assertions/simplified English sentence, on average • 388 assertions/sketched diagram, on average • 8 page chapter with lots of diagrams, ~10K assertions • 100 page book, 32K–125K assertions • Suppose a child reads 10 books/year for grades 1–12 • 4×10⁶ if text only, 1.5×10⁷ if diagram-heavy • Between 2x and 10x the size of ResearchCyc
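
The lower-bound arithmetic on this slide can be reproduced directly. The figures below are the slide's own; the only added step is making explicit that the 125K upper end scales the 8-page, diagram-heavy chapter up to 100 pages, and that 32K corresponds to 4,000 sentences at 8 assertions each.

```python
# Figures from the slide.
ASSERTIONS_PER_SENTENCE = 8
CHAPTER_PAGES, CHAPTER_ASSERTIONS = 8, 10_000   # diagram-heavy chapter
BOOK_LOW, BOOK_HIGH = 32_000, 125_000           # per 100-page book

# Diagram-heavy upper end: scale the 8-page chapter to 100 pages.
per_page = CHAPTER_ASSERTIONS / CHAPTER_PAGES   # 1,250 assertions per page
assert per_page * 100 == BOOK_HIGH

# Text-only lower end, expressed in sentences.
sentences_per_book = BOOK_LOW // ASSERTIONS_PER_SENTENCE  # 4,000 sentences

books = 10 * 12                     # 10 books/year for grades 1-12
text_only = books * BOOK_LOW        # 3,840,000, i.e. about 4x10^6
diagram_heavy = books * BOOK_HIGH   # 15,000,000, i.e. 1.5x10^7
print(text_only, diagram_heavy)
```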

Our Approach to Sketch Understanding • Sketching is about spatial relations • Continuous ink understood in qualitative terms • Provides input for spatial and conceptual reasoning • Most AI approaches focus only on object recognition • e.g., parts in a circuit schematic • Limited to specific, small domains • Requires training, errors are distracting • Ignores relationships between objects, reasoning about the contents of the sketch • Observation: Recognition is not required when people sketch with each other • We verbally label objects as we sketch

CogSketch in a nutshell • Open-domain sketch understanding system • Users draw glyphs, which are given conceptual labels • Recognition is a catalyst, not a requirement • Labels chosen from OpenCyc-derived knowledge base • CogSketch performs visual processing to understand visual and spatial relationships in the sketch • Analogical mapping, retrieval & generalization built-in • Download from: spatiallearning.org

Modeling spatial language learning • Analogical generalization can be used to learn several English and Dutch spatial prepositions from sketched input • Lockwood et al., Spatial Cognition 2008

Evaluation • Used publisher-provided homework assignment for Basic Machines • Test set of 15 multiple choice questions on the topic covered in Chapter 1 (Levers) • All questions were hand-translated to predicate calculus • Diagrams were sketched using CogSketch • System was not allowed to guess if it could not derive an answer • Correctly answered 12 out of 15 problems (p < 10⁻⁵)
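
The slide does not say how the p-value was computed or how many options each question had. One plausible reading, sketched here as an assumption, is a one-sided binomial test of 12 or more correct out of 15 against uniform random guessing, with the number of options left as a parameter.

```python
from math import comb


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct by guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumed numbers of answer options per question; the slide does not state this.
for options in (4, 5):
    print(options, binomial_tail(15, 12, 1 / options))
```

Under this model the tail probability is about 1.0×10⁻⁶ with five options and about 1.2×10⁻⁵ with four, so whether it falls below 10⁻⁵ depends on the number of options assumed.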
