
Symbolic Supercomputer for Artificial Intelligence and Cognitive Science Research

This overview discusses the use of symbolic supercomputing for AI and cognitive science research, including off-line experiments, large-scale corpus analysis, distributed experiments, and interactive cognitive architecture experiments.


Presentation Transcript


  1. Symbolic Supercomputer for Artificial Intelligence and Cognitive Science Research. Kenneth D. Forbus and Dedre Gentner, Northwestern University

  2. Overview • Why symbolic supercomputing? • Off-line experiments • Work in progress: Large-scale corpus analysis • Distributed experiments harness • Interactive Cognitive Architecture experiments • Companion Cognitive Systems (DARPA) • Explanation Agent

  3. Off-line experiments • Sensitivity Analysis • Every cognitive simulation has parameters • Analyzing how performance depends on those parameters is important for understanding models • Sensitivity analyses can be expensive • 1994: MAC/FAC simulations took weeks of CPU time • 2000: 4.8 million SME runs in SEQL sensitivity analyses took 23 days (400 MHz PII); should be ~4 days today • Corpora Analyses • Text • Sketches • Problems
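The kind of parameter sweep a sensitivity analysis requires can be sketched in a few lines. Everything here is hypothetical: the scoring function and parameter names stand in for a real cognitive simulation and its free parameters.

```python
import itertools

def run_simulation(threshold, max_matches):
    """Stand-in for one run of a cognitive simulation (hypothetical scoring function)."""
    # A toy score that varies smoothly with both parameters.
    return 1.0 / (1.0 + abs(threshold - 0.5) + abs(max_matches - 10) * 0.01)

def sensitivity_sweep(thresholds, match_limits):
    """Run the simulation at every parameter combination and record the score."""
    results = {}
    for t, m in itertools.product(thresholds, match_limits):
        results[(t, m)] = run_simulation(t, m)
    return results

results = sensitivity_sweep([0.3, 0.5, 0.7], [5, 10, 20])
best = max(results, key=results.get)  # parameter setting with the highest score
```

The grid grows multiplicatively with each added parameter, which is why the analyses above run to millions of SME runs and days of CPU time.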

  4. Larger-scale simulations • Goal: Increased use of automatically generated inputs • Reduce tailorability • Increase # of stimuli generated and used. • Processes • Analogical Encoding • Conceptual problem solving

  5. Symbolic models and parallelism • Our approach is based on Gentner’s (1983) structure-mapping theory • Assumes parallel processing both within modules and between modules • Currently emulate on serial processors • Coarse-grained parallelism could provide important benefits • Continue to simulate within-module parallelism on single CPUs • Exploit parallel processing between modules • Incrementally update retrievals during reasoning • Incrementally construct generalizations during reasoning • Reason about domain, interactions, and self in parallel

  6. Traditional Supercomputers ineffective for symbolic processing • Optimized for • Floating-point processing • Pipelined, with vector or grid model •  okay CPUs, low RAM, fast floating point • Symbolic processing • Involves many pointer operations • Some floating-point, but over irregular structures (graphs, sparse-vectors) •  fast CPUs, high RAM, okay floating point

  7. Optimizing a cluster for symbolic processing • Use the fastest CPU available. • Distribute the processing in large, functionally-organized units. • Avoid communication overhead • Data-parallel programming style poor fit for clusters • Replicate knowledge base as needed • Organize memory to be as fast as possible. • Maximize RAM, cache • Avoid virtual memory

  8. Why large memories are crucial • If a program is going to know a lot, it has to put it somewhere • Example: Subset of Cyc KB contents we use • 35,070 concepts, 8,939 relations, and 3,917 functions • 1,283,835 axioms, divided into 3,537 microtheories • Added knowledge (DARPA HPKB, CPOF, RKF) • Military tasks, units, equipment • Countries, international relationships, terrorist incidents • Qualitative models, terrain, trafficability, visual representation conventions, developed by our group • Takes roughly 495 MB of storage, due to indexing overhead • May double in size as we learn by accumulating experiences
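A quick sanity check on the storage figure above: dividing the 495 MB footprint by the axiom count gives a rough per-axiom cost including indexing overhead (back-of-the-envelope arithmetic only, not a measurement of the actual KB layout):

```python
axioms = 1_283_835       # axioms in the Cyc KB subset
storage_mb = 495         # reported storage, including indexing overhead

# Roughly 400 bytes per axiom once indexing is included.
bytes_per_axiom = storage_mb * 1024 * 1024 / axioms

# If the KB doubles as experiences accumulate, so does the footprint.
projected_mb = storage_mb * 2
```

At ~400 bytes per axiom, a node's 3 GB of RAM bounds the KB at a few million axioms, which is why maximizing RAM and avoiding virtual memory matter.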

  9. Hardware: Mk2 • Linux Networx cluster, 5-year maintenance contract • 67 nodes • Dual 3.2 GHz Xeon CPUs • 3 GB RAM/node • 80 GB disk/node • Allegro Common Lisp for Linux provides a flexible development environment

  10. Mk2 Cluster Network • Gigabit switched Ethernet • Packet filtering, trusted whitelist of hosts • One-command provisioning, P2P data distribution system

  11. Qualitative reasoning for intelligent agents (ONR AI Program) [Diagram: an Explanation Agent mediating between a model of the ongoing situation/system (situation updates, queries, estimates, warnings) and a Knowledge Base of general knowledge plus libraries of cases, extended with new examples] Objective: Create a science base for intelligent software agents that can • Reason about physical phenomena and systems in a human-like way • Extend their knowledge incrementally, by communicating with human collaborators in natural language. Technical Approach • Develop qualitative reasoning techniques for solving problems under time pressure with partial, incomplete knowledge (“back of the envelope” reasoning) • Explore the use of qualitative representations as part of the semantics for a natural language system • Develop techniques to assimilate controlled-language reports to extend an agent’s models of the physical world.

  12. QP Theory in Natural Language Semantics • Idea: Qualitative Process theory can be used as a framework for understanding NL descriptions of physical phenomena • Right level of abstraction • Consistent with human mental models • Support for compositionality • Approach • Identify syntactic patterns corresponding to QP theory concepts via corpus analysis • Recast QP theory in terms of frames • Use a controlled subset of English to simplify parsing, focus on semantics • Current status • NL system translates paragraph-sized texts about physical processes into formal representations • Tested on a dozen examples • Next steps • Expand range of texts handled • Develop knowledge assimilation techniques to construct knowledge bases by reading multiple texts • Example input: (1) A pipe connects cylinder C1 to cylinder C2. (2) Cylinder C1 contains 5 liters of water. (3) Cylinder C2 contains 2 liters of water. (4) Water flows from cylinder C1 to cylinder C2, because the pressure in cylinder C1 is greater than the pressure in cylinder C2. (5) The higher the pressure in cylinder C1, the higher the flowrate of the water. (6) When the pressure in cylinder C2 increases, the flowrate of the water decreases. • Resulting frame • Type: (isa flow3606 Translation-Flow) • Participants: (isa c1 Container) [QuantityFrame q3609], (isa c2 Container) [QuantityFrame q3603] • Conditions: (> (pressure c1) (pressure c2)) • Quantities: [QuantityFrames q3608 and q3605] • Consequences: (qprop (flowrate flow3606) (pressure c1)), (qprop- (flowrate flow3606) (pressure c2)), (I- (water c1) (flowrate flow3606)), (I+ (water c2) (flowrate flow3606))

  13. The EA natural language system • Pipeline: Input text → Parser (QRG-CE grammar, Lexicon) → Retrieval of semantic information (Facts) → Word-Sense Disambiguation (WSD Data) → Frame Construction (Frame Rules, Merge Rules) → Process Frame Construction (Process Rules, QP Theory constraints) → QP Frames • Uses a 1.2 million fact subset of the Cyc KB • Only 15 out of ~100 grammar rules are QP-specific • Patterns for QP-specific constituents • Sven Kuehne’s Ph.D. thesis

  14. Corpus Analysis (in progress) • Kuehne and Forbus (2002) used by-hand corpus analysis to identify syntactic patterns • Four chapters of an introductory science book, 216 sentences total • 43% of the material in physical explanatory text could be captured via QP theory. • Do the syntactic patterns that we found for explanatory physical texts apply to everyday texts? • If they do, what is their coverage? • How many more patterns are there?

  15. Looking for quantities • 1999 volume of the New York Times, consisting of 6.4 million sentences • First stage used a 30-word list for filtering (7.5 hours) • ~172,000 sentences output • Second stage used regular expressions (12 hours) • Derived from vocabulary and syntactic patterns from the previous corpus analysis • Result: ~19,000 sentences worth examining more closely • Third stage uses a modified version of our Explanation Agent NLU system (less than 2 days, 17 hours, on 3 nodes) • Previously used Quantity and PhysicalQuantity • Generalized to the Cyc concept ScalarInterval • Subsumes temperament, monetary values, feeling attributes, formality/politeness of speech, and others • 14,000+ quantities found • 0.2% of the sentences mention a recognizable quantity • Lexicon limitations may have a strong effect here • Expanding it via hand labor (Cycorp) plus co-training is probably necessary • e.g., “intensification of the war effort”
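The first-stage word-list filter described above is conceptually simple: keep only sentences mentioning a quantity-related word, and pass the survivors to the more expensive stages. A sketch, where both the keyword list and the sentences are made up (not the actual 30-word list):

```python
def keyword_filter(sentences, keywords):
    """First-stage filter: keep sentences that mention any word on the list.

    `keywords` stands in for the 30-word filter list; real stages two and
    three would apply regular expressions and the EA NLU system.
    """
    kw = {w.lower() for w in keywords}
    return [s for s in sentences
            if kw & {tok.strip(".,;!?").lower() for tok in s.split()}]

corpus = [
    "The temperature rose sharply overnight.",
    "The senator gave a speech.",
    "Oil prices fell for the third week.",
]
hits = keyword_filter(corpus, ["temperature", "pressure", "price", "prices", "rate"])
```

Cheap lexical filtering like this is what makes the pipeline tractable: it cut 6.4 million sentences down to ~172,000 before any parsing was attempted.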

  16. Qualitative changes in the New York Times • Starting point: corpus of 6.4 million sentences • Filter using word lists of 89 synonyms for increases and 66 for decreases (~10 hours each) • 62,117 candidate sentences mentioning decreases • 195,452 candidate sentences mentioning increases • Around 4% of the corpus • Contrast: 43% of the material in physical explanatory text could be captured via QP theory • The larger analysis only concerns qualitative proportionalities • Qualitative representations may play a smaller role in understanding political texts versus physical texts • Genre differences: newspapers versus explanatory material • E.g., “(X i.e., Y)” is common on the web, not in newspapers

  17. Dexp: Distributed Experiment Tool • Provides support for running distributed experiments • Written in Common Lisp • Uses sandbox to avoid configuration issues • Experimenter divides computation into work units • Example: For N queries, find all of the solutions to them • Provides list of work units to dexp as a file, along with a startup file and code tree to use • Gets back a set of files containing the results.
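The dexp workflow of farming work units out to nodes and collecting result files can be approximated, at small scale, with a worker pool. This is a hypothetical Python stand-in, not dexp's actual Common Lisp API: `run_work_unit` squares a number where the real system would execute a sandboxed KB query and write a result file.

```python
from concurrent.futures import ThreadPoolExecutor

def run_work_unit(query):
    """Stand-in for executing one work unit on a sandboxed node (hypothetical)."""
    return query, query * query  # the real system would run a KB query here

def run_experiment(work_units, n_nodes=4):
    """Distribute work units across a pool of workers and collect the results."""
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        # map preserves the work-unit order; results come back as (unit, answer).
        return dict(pool.map(run_work_unit, work_units))

results = run_experiment(range(10))
```

The key property dexp exploits is that work units are independent, so failures (like the heap blow-outs mentioned below) only lose their own results, and everything else comes back promptly.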

  18. dexp Architecture [Diagram: a Coordinator and Load Balancer managing a distributed experiment pool of nodes (n31, n33, n34, n15, n65, n66)] • Experiment Coordinator • Manages distribution and execution of work units • Collects results • Experiment pool nodes • Execute a work unit, return results • Execution uses a sandbox for configuration control • Load Balancer • Dynamically allocates nodes for work units • Balances demands from multiple simultaneous experiments

  19. How dexp simplifies experiments: Example • An experiment analyzing semantic translations in the ResearchCyc KB consisted of ~1,200 work units • Each consisted of a query to see how many examples in the KB satisfied the semantic patterns given for verbs • With 24 nodes, most of the experiment was completed in 34 minutes • Estimate: 11 hours on a single CPU, if no failures • Five work units churned for 12 hours and failed to finish due to heap blow-out • Most of the results were available quickly • Much easier to diagnose what was going wrong than waiting for hours to hit a failure.

  20. Companion Cognitive Systems: a new cognitive systems architecture • Robust reasoning and learning: Companions will learn about their domains, their users, and themselves • Longevity: Companions will operate continuously over weeks and months at a time • Interactivity: Companions will be capable of high-bandwidth interaction with their human partners, including taking advice; sketching is a major interaction modality • Central hypotheses • Analogical processing will enable us to create systems with human-like learning and reasoning abilities: able to handle relational information, incrementally adapt and extend their knowledge, and apply what they learn in one domain to other domains • Using a cluster can make an analogical processing architecture fast enough to be used in interactive systems; it also changes the kinds of experiments that become feasible • Hardware: Mk2 (ONR, 67 nodes), Colossus (DARPA, 5 nodes)

  21. Companions as a Structure-Mapping Architecture • Psychological bets • Ubiquitous use of structure-mapping for reasoning and learning • SME for matching • MAC/FAC for similarity-based retrieval • SEQL for generalization • Qualitative representations play a central role • Part of visual structure in spatial reasoning • Representation of causal knowledge and arguments • Engineering choices • Distributed agent architecture using KQML • Logic-based TMS for working memory • No hardwired working-memory capacity limits

  22. Companion Architecture, Year One [Diagram: the user’s Windows box runs a nuSketch system (sKEA or nuSketch Battlespace) and a Session Manager with a Relational Concept Map, connected to a cluster whose master node runs the Facilitator, alongside Session Reasoner and MAC/FAC Domain Tickler nodes] With Thomas Hinrichs, Jeff Usher, Matt Klenk, Greg Dunham, Emmett Tomai, Tom Ouyang, Hyeonkyeong Kim, and Brian Kyckelhahn

  23. Bennett Mechanical Comprehension Test • Widely used standardized exam for technicians • Used in cognitive psychology as an indicator of spatial ability • Difficulty lies in breadth of situations, not narrow technical knowledge • Best score to date: 10 correct out of a subset of 13 BMCT problems (77%) [p < 0.001] • An example describes how physical principles apply to a real-world situation • Analogies with the example provide the causal models needed for solution • Q: Which crane is more stable?

  24. Suggesting visual/conceptual relations by analogy [Diagram: MAC/FAC retrieval over the Knowledge Base (including case libraries of examples), followed by Candidate Inference Extraction and Suggestions Filtering, with candidate counts (189, 184, 109) shrinking at successive stages] • Analogical inferences are surmises, not certainties

  25. Visual/Conceptual Relations: Experimental Results • Ex1: Focused tasking • 54 sketches (18 situations drawn by three KEs) as case library for BMCT experiment • Round-robin method: for each sketch, remove it from the library, remove its VCR answers, generate suggestions via analogy • Yielded “exam” of 181 VCR questions • Score = 74.25 (p << 10⁻⁵) • Coverage = 54% • Accuracy = 87% • Ex2: Open tasking • 10 situations selected from BMCT problems, covering a larger range of phenomena (e.g., “a boat moving in water”, “a bicycle”) • Each situation sketched by two graduate students, told to illustrate the principle(s) they thought were important • Round-robin method • Yielded “exam” of 138 questions • Score = 21.75 (p < 10⁻⁷) • Coverage = 46% • Accuracy = 57%

  26. Companions Architecture as of 9/05 [Diagram: the user’s Windows box (nuSketch GUI, Interactive Explanation Interface, Session Manager, Relational Concept Map) connected to a cluster running the Facilitator, Executive, Session Reasoner, Dialogue Manager, Headless nuSketch, MAC/FAC Ticklers for the user, domain, and self models, SEQL Generalizers for the domain, self, and user models, and Offline Learning nodes]

  27. Explanation Agent Prototype • Use Companions Architecture as infrastructure • Incorporate other ONR advances • EA NLU system (Sven Kuehne) • Back of the envelope reasoning (Praveen Paritosh) • Spatial prepositions model to link language and sketches (Kate Lockwood) • Analogical Problem Solver (Tom Ouyang) • Use for cognitive simulations • Natural language, sketching for stimulus input

  28. Back of the Envelope Reasoning (Paritosh) • Goal: Develop theories that enable software to reason quantitatively in real-world situations (e.g., Is anyone still alive in there? How long to repair it? How much oxygen is left?) • Qualitative representations are essential for framing the problems and supporting comparisons • Analogical reasoning is used to find similar situations for estimation models, and to construct qualitative representations via generalization over experience

  29. Back of the Envelope Reasoning Progress • Implemented BoTE-Solver • Solves 13 problems to date • Examples: How many K-8 school teachers are in the USA? How much money is spent on newspapers in the USA per year? What is the total annual gasoline consumption by cars in the US? What is the annual cost of healthcare in the USA? How much power can an adult human generate? • Claim: There is a core collection of strategic knowledge; specifically, seven strategies capture most of back of the envelope reasoning • Source: strategies in BoTE-Solver, analysis of all problems (n=44) from Force and Pressure, Rotation and Mechanics, and Heat and Astronomy in Clifford Swartz’s Back-of-the-Envelope Physics

  30. CARVE: Using analogy to generate qualitative representations [Diagram: input cases C1…Cj pass through two stages] • S1: Dimensional partitioning for each quantity (k-means clustering), e.g., (isa Algeria (HighValueContextualizedFn Area AfricanCountries)); add these facts to the original cases • S2: Structural clustering using SEQL • S3: Output cases augmented with structural limit points and distributional partitions (L1, L2)
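CARVE's dimensional partitioning step clusters each quantity's observed values with k-means. A minimal one-dimensional sketch, using hypothetical country-area data and no claim about CARVE's actual parameter choices:

```python
import random

def kmeans_1d(values, k, iters=20, seed=0):
    """1-D k-means, as in CARVE's dimensional partitioning step (simplified).

    Returns final cluster centers and the value clusters themselves.
    """
    rng = random.Random(seed)
    centers = sorted(rng.sample(values, k))
    for _ in range(iters):
        # Assign each value to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[min(range(k), key=lambda i: abs(v - centers[i]))].append(v)
        # Recompute centers; keep the old center if a cluster emptied out.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical country areas (arbitrary units): two small, two large.
areas = [1.0, 1.2, 9.0, 9.5]
centers, clusters = kmeans_1d(areas, k=2)
```

Each resulting cluster would then be named symbolically (e.g., a HighValue partition for the large-area cluster) and the new facts added back into the cases for SEQL's structural clustering.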

  31. Analogical Estimation • Analogical estimator: makes guesses for a numeric parameter based on analogy, e.g., (GrossDomesticProduct Brazil ?x), where the value is unknown • Find an analogous case for which the value is known • Find anything in the KB which might be a basis for an estimate • Hypothesis: Representations augmented with symbolic structure will lead to more accurate estimates.

  32. Basketball Stats Domain • Quantities (e.g., points per game, rebounds per game, assists per game) • Causal relationships • Being taller helps with rebounding and blocking • Power forwards are taller and are expected to shoot, rebound, and block • Being good at 3-point field goals means one is a good shooter, so one’s free throw success rate will be higher, e.g., (qprop seasonThreePointPercent seasonFreeThrowPercent BasketballPlayers) • Case library • 15 players from different positions • 11 facts per player, e.g., (seasonThreePointsPercent JasonKidd 0.404)

  33. Results: Errors

  34. SpaceCase: Motivation • Recent research points to the role of non-geometric properties in spatial preposition use (Coventry, 1994; Coventry & Prat-Sala, 1999; Herskovitz, 1986; Feist & Gentner, 2003; Garrod et al., 1999; Coventry & Garrod, 2004; Carlson & van der Zee, 2005) • Spatial language can affect retrieval of pictures (Feist & Gentner, 2001) • Multimodal interfaces are potentially useful for military needs: language plus diagrams and other spatial displays • Software’s notion of similarity needs to be like its human partners’ • Including visual properties • Including retrieval, for shared history • Including shared language • Lockwood, K., Forbus, K., and Usher, J., SpaceCase: A Model of Spatial Preposition Use, Proceedings of CogSci-05, to appear

  35. sKEA Sketching Interface

  36. sKEA Sketching Interface [Sketch annotations: medium_curvature; firefly -> insect -> animate; functions as weak container; firefly; dish; ground_supports_figure]

  37. Sketch corpus crucial for model development • Building a corpus of sketches • Gathering library of examples from literature • Use sKEA to capture them in machine-understandable form • Estimate: ~ 200 sketches will be needed to cover the set of prepositions and phenomena to be modeled • Cluster will be used for • Regression testing • Sensitivity analyses: How does performance depend on parameter values?

  38. Problem-solving experiments • Starting point: Pisan’s (1998) Thermodynamics Problem Solver • Solved 80% of the problems typically found in the first four chapters of engineering thermodynamics textbooks • Used graphs and property tables • Produced human-like solutions • Generalize: Analogical Problem Solver • Focus on conceptual comprehension questions • Declarative strategies now include analogical processing • When/what to retrieve, what candidate inferences to use, level of effort in testing • Experiment in progress: Can strategy variations explain novice/expert differences? • Pilot results are promising; full data expected by end of summer.

  39. Questions?

  40. Technology Transfer

  41. The Whodunit Problem • Goal: Generate plausible hypotheses about who performed an event • Formal version: Given some event E whose perpetrator is unknown, construct a small set of hypotheses {Hp} about the identity of the perpetrator of E • Include explanations as to why these are the likely ones • Be able to explain on demand why others are less likely • Assumptions & Limitations • Formal inputs: structured descriptions, including relational information, expressed in CycL • Accurate inputs • One-shot operation: no incremental updates • Passive operation: doesn’t generate differential diagnosis information

  42. Method 1: Closest Exemplar [Diagram: a probe is compared against the memory pool via cheap, fast, non-structural CV matches, then SME; output = memory item + SME results] • Use MAC/FAC to retrieve events similar to E • For each similar event, remove it if it doesn’t include a candidate inference about the perpetrator • Iterate until enough hypotheses are generated • (Optional) Generate explanations and expectations by analyzing the similarities and differences between each Hp and E • MAC/FAC models similarity-based retrieval • Scales to large memories • Accounts for psychological phenomena • Memory pool = all cases concerning the 98 perpetrators, minus the test set

  43. Method 2: Closest Generalization [Diagram: SEQL assimilates new examples into generalizations built over exemplars; SME compares an incident against them] • Preprocessing • Partition the case library according to perpetrator • Use SEQL to construct generalizations for each perpetrator • Generating hypotheses • Given an incident E, pick the n closest generalizations, as determined by SME’s structural evaluation score • SEQL models generalization • Assimilates new exemplars into a generalization when they are close enough • Models psychological data; used to make successful predictions of human behavior • Recent extension: use probability to improve noise immunity
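SEQL's assimilation policy (merge an exemplar into a generalization when it is close enough, otherwise keep it separate) can be caricatured with flat fact sets. The real system matches relational structure via SME, so this is only a toy sketch with hypothetical facts and a made-up overlap threshold:

```python
def assimilate(generalization, exemplar, threshold=0.5):
    """SEQL-style assimilation sketch over flat fact sets (hypothetical).

    Merge the exemplar into the generalization when their fact overlap
    (Jaccard similarity) meets the threshold; the merged generalization
    keeps only the shared facts.
    """
    overlap = len(generalization & exemplar) / max(len(generalization | exemplar), 1)
    if overlap >= threshold:
        return generalization & exemplar, True
    return generalization, False

# Hypothetical incident descriptions for one perpetrator.
gen = {"bomb", "urban", "night"}
close = {"bomb", "urban", "day"}
far = {"kidnapping", "rural"}

gen2, merged = assimilate(gen, close)        # merges: keeps shared facts
_, merged_far = assimilate(gen, far)         # too dissimilar: stays separate
```

Dropping unshared facts is what makes the generalization progressively capture only what a perpetrator's incidents have in common.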

  44. Whodunit Experiment • Used 3,379 terrorist incidents from Cycorp’s Terrorist Knowledge Base (TKB) • Between 6 and 158 propositions per case, 20 on average • 98 perpetrators involved in at least 3 incidents in the TKB • Pick one incident at random for the test set, remove the perpetrator • Elaborate via inference • Add attributes (e.g., (CityInCountryFn Italy)) using the genls hierarchy • Three performance levels • Best bet • Top 3: best plus plausible alternatives • Top-ten list: foci for additional collection and analysis

  45. Whodunit Example

  46. Whodunit Results • Pure retrieval surprisingly good • Adding probability yielded a 5% improvement • Symbolic generalization adds value for weaker criteria

  47. Background Material

  48. Basketball Stats Estimation by Analogy • Given: an estimation problem, e.g., (seasonThreePointsPercent JasonKidd ?x), and a case library • Find the most similar player to JasonKidd in the case library for whom we know the value of seasonThreePointsPercent • Use that as an estimate for the given problem • Compare accuracy over the initial case library versus the case library enriched with representations from CARVE.
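The estimation procedure above reduces to a nearest-neighbor lookup once similarity is treated as a black box. This sketch uses a hypothetical fact-overlap similarity in place of SME's structural evaluation score, with made-up player names and numbers:

```python
def estimate_by_analogy(probe, quantity, case_library, similarity):
    """Estimate `quantity` for `probe` from the most similar case that has a value.

    `similarity` stands in for SME's structural evaluation score (hypothetical).
    """
    candidates = [c for c in case_library if quantity in c["facts"]]
    best = max(candidates, key=lambda c: similarity(probe, c))
    return best["facts"][quantity]

def overlap_sim(a, b):
    """Toy similarity: count of shared (attribute, value) facts."""
    return len(set(a["facts"].items()) & set(b["facts"].items()))

# Hypothetical case library and probe.
library = [
    {"name": "PlayerA", "facts": {"position": "guard", "threePointPct": 0.40}},
    {"name": "PlayerB", "facts": {"position": "center", "threePointPct": 0.12}},
]
probe = {"name": "PlayerC", "facts": {"position": "guard"}}
estimate = estimate_by_analogy(probe, "threePointPct", library, overlap_sim)
```

CARVE's contribution in this setup is to enrich the cases' facts (e.g., with HighValue/LowValue partition memberships), giving the similarity measure more symbolic structure to match on.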

  49. SpaceCase [Pipeline: sKEA input stimulus → ink processing routines → Evidence Rules → Bayesian updating algorithm → Spatial Preposition Label, drawing on the KB]

  50. Performance • Labeling task (Feist & Gentner, 2003) • “<figure> is in/on the <ground>” • 36 total stimuli • Figures: {firefly, coin} • Grounds: {bowl, dish, plate, slab, rock, hand} • Curvature: {low, medium, high} • Consistent on all 36 trials for the parameter values given
