Lecture 9: Introduction to the Scienceomatic Architecture CSC 599: Computational Scientific Discovery
Outline • Motivation • CSD thus far • Scienceomatic Architecture
First Trend in CSD • Data structures that are more predictive • Single simple equations BACON, late 1980s • List of mechanisms MECHEM, mid 1990s • Differential equations Lagramge, mid 1990s • Process network IPM, mid 2000's
Second Trend in CSD • Better application of domain knowledge • “Better” in the sense that: • Limits search more efficiently • Comes in a scientist-friendly format Examples: • Ad hoc BACON, late 1980s • Grammar Lagramge, mid 1990s • Domain constraints on acceptable solutions MECHEM, mid 1990s • Abstract processes IPM, mid 2000's
Emphases of CSD Predictive data structures: • More structurally complex • More embedded in knowledge scientists already have • Use domain knowledge • More “understandable” • Integrate simulation and exhaustive search • Two strengths of computers
But what do scientists do? Give reasons why! (explanations) • Templates for solving problems: • Philosophy of science • Kuhn's exemplars • Artificial Intelligence • Explanation Based Learning • Explanations need: • Assertions • Reasoning method(s) to tie them together
About Explanations Assertions come in (at least) two flavors • What Prolog would call “facts” • Measurements (thermometer1 read 20.6 C at time t0) • Fundamental properties (c = 299,792,458 m/s) • What Prolog would call “rules” • F = ma • Modern philosophers of science don't like this • They think it smells too much like logical empiricism Reasoning comes in several flavors • Deduction: A; A->B; therefore B • Abduction: B; A->B; therefore A • Analogy: f(A); g(a); relates(A,a); f(B); g(b); therefore relates(B,b) • Maybe Induction: f(1); f(2); f(3); therefore ∀n: f(n)
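A minimal Python sketch of the first two reasoning flavors over Prolog-style facts and rules (the facts, rule, and function names here are invented examples, not part of the Scienceomatic kb):

```python
# Facts and rules as plain data; each rule is a pair (A, B) meaning A -> B.
facts = {"thermometer1_reads_20.6C", "ball_dropped"}
rules = [("ball_dropped", "ball_accelerates")]

def deduce(facts, rules):
    """Deduction: from A and A->B, conclude B (to a fixpoint)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for a, b in rules:
            if a in derived and b not in derived:
                derived.add(b)
                changed = True
    return derived

def abduce(observation, rules):
    """Abduction: from B and A->B, hypothesize A (not sound, but useful)."""
    return {a for a, b in rules if b == observation}

print(deduce(facts, rules))               # facts plus 'ball_accelerates'
print(abduce("ball_accelerates", rules))  # {'ball_dropped'}
```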
Explanation-based Learning Deductive learning from one training example • Requires: • The training example • World provides proof of one legal configuration • A Goal Concept • High-level description of what to learn • An Operationality Criterion • Tells which concepts are usable • A Domain Theory • Tells the relationship between rules & actions in the domain • EBL generalizes the example to describe the goal concept and satisfy the operationality criterion • Explanation: remove unimportant details from the training example with respect to the goal concept • Generalization: generalize as much as possible while still describing the goal concept
EBL Applied to Scientific Reasoning We have • Newton's law of gravity: F = GMm/r² (domain knowledge) • (Example): • Mass of an apple • Force on apple due to gravity (i.e., its weight) • mass[earth] >> mass[apple]; r = radius[earth] • Force of weight (goal concept to learn) • Data structure outlining how (operationality criterion) weight[apple] = (GM/radius[earth]²) * mass[apple] Generalize the data structure to anything fitting the criteria: mass[earth] >> mass[X], r = radius[earth] weight[X] = GM*mass[X]/radius[earth]²
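The generalized rule can be checked numerically; a sketch using standard values for G, Earth's mass, and Earth's radius (the 0.1 kg apple mass is an assumed example value):

```python
# Generalized EBL result: weight[X] = G * M_earth * mass[X] / r_earth**2,
# valid whenever mass[earth] >> mass[X] and X sits at r = radius[earth].
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_EARTH = 5.972e24   # kg
R_EARTH = 6.371e6    # m

def weight(mass_x):
    """The rule EBL generalized from the single apple example."""
    return G * M_EARTH * mass_x / R_EARTH**2

apple_mass = 0.1  # kg (assumed example value)
print(weight(apple_mass))  # roughly 0.98 N
```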
But what else do scientists do? Remember what has been tried, and why! • Historical trajectory of scientific effort • Reason where to put effort • Human science • Funding agency • Artificial Intelligence • Reinforcement learning • Issues: • Strategy vs. tactics • Tried-and-true vs. brand new
Science under limited resources Ranking (priority queue) of operators to try Funding agencies • Limited resource = money (and time) Reinforcement learning • Limited resource = CPU time (and memory)
What is Reinforcement Learning? A type of learning • Not a particular algorithm! • Agent always acting in an environment • Gets reward at the end, or as it goes along • There can be a delay between an action and its payoff • Goal: maximize the payoff
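A minimal tabular Q-learning sketch (one standard RL algorithm; the slides do not prescribe a particular one) on a toy 3-state chain, showing reward arriving only at the goal while earlier actions are still learned:

```python
import random
random.seed(0)

# Toy environment: states 0,1,2 on a line; taking 'right' in state 2
# pays reward 1 and restarts at state 0; every other step pays 0.
states, actions = [0, 1, 2], ["left", "right"]
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

def step(s, a):
    if s == 2 and a == "right":
        return 0, 1.0  # goal reached: reward, then restart
    return (min(s + 1, 2) if a == "right" else max(s - 1, 0)), 0.0

s = 0
for _ in range(2000):
    # Epsilon-greedy action selection.
    a = random.choice(actions) if random.random() < epsilon \
        else max(actions, key=lambda act: Q[(s, act)])
    s2, r = step(s, a)
    # Q-learning update: bootstrap from the best next-state value.
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
    s = s2

print({k: round(v, 2) for k, v in Q.items()})
```

After training, 'right' dominates 'left' in every state even though only the final step is rewarded, which is the delayed-payoff point above.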
Strategy vs. Tactics (Military) (Definitions from Compact Oxford English Dictionary) Strategy: “a plan designed to achieve a particular long-term aim” • Examples: • “Destroy enemy's forces” • “Destroy enemy's economy/industrial base” • “Destroy enemy's morale” Tactics: “the art of disposing armed forces in order of battle and of organizing operations.” • Examples: • Frontal assault • Siege • Pincer • Hit and run
Strategy (Scientific Discovery) Strategy: “What should the long-term process of science be?” Topic in contemporary philosophy of science. Examples: Lakatos • Minimize the number of unpredicted phenomena • Cumulatively build upon research programmes' hard core Laudan • Maximize the number of predicted attributes • Research traditions less structured, not necessarily cumulative
Tactics (Scientific Discovery) Tactics: “What should this scientist be doing right now?” Related to inductive bias in machine learning Examples: • Information gain • Minimize cross-validation error • Maximize conditional independence
Strategy vs. Tactics: related issues Tried-and-true vs. brand new • When does the strategy switch from conventional tactics to unconventional ones? • Philosophy of science: • Kuhn: Normal science vs. revolution • Lakatos: Progressive vs. degenerate research programmes • Artificial Intelligence: • Exploration vs. exploitation • Exploration (revolution: look for the brand new) • Exploitation (normal science: get what one can from the known structure) • Issue studied in the reinforcement learning community
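The exploration/exploitation trade-off can be sketched as a two-armed bandit, with the 'conventional' arm as normal science and the 'unconventional' arm as revolution (the payoff numbers are invented for illustration):

```python
import random
random.seed(1)

# 'conventional' pays 0.5 reliably; 'unconventional' pays 1.0 but only
# 70% of the time. Pure exploitation of early estimates can lock in the
# conventional arm; a little exploration finds the better one.
def pull(arm):
    return 0.5 if arm == "conventional" else (1.0 if random.random() < 0.7 else 0.0)

def run(epsilon, steps=5000):
    est = {"conventional": 0.0, "unconventional": 0.0}  # running payoff estimates
    n = {"conventional": 0, "unconventional": 0}
    total = 0.0
    for _ in range(steps):
        arm = random.choice(list(est)) if random.random() < epsilon \
              else max(est, key=est.get)
        r = pull(arm)
        n[arm] += 1
        est[arm] += (r - est[arm]) / n[arm]  # incremental mean
        total += r
    return total / steps

print(run(epsilon=0.1))  # average payoff well above the 0.5 conventional rate
```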
Can implement each object separately Assertion data structure • Directly uses assertions Explanation data structure • Gives explanations Historical data structure • Gives historical context to justify what to do next
Assertion Usage Object Sample of important methods: • Retrieve assertion a1 • Show assertion • Edit assertion • Predict object o1's attribute attr1 • Plot these values • Compare predicted and recorded values • Justify (e.g. logical resolution) assertion a1
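A hypothetical Python sketch of this interface; the method names follow the slide, but the dict-based storage and lambda expressions are assumptions, not the Scienceomatic implementation:

```python
class AssertionUsageObject:
    """Sketch of the assertion usage object's core methods."""

    def __init__(self):
        self.assertions = {}  # name -> dict of fields

    def add(self, name, expression, defines=None):
        self.assertions[name] = {"expression": expression, "defines": defines}

    def retrieve(self, name):
        return self.assertions[name]

    def show(self, name):
        a = self.assertions[name]
        return f"{name}: defines {a['defines']}"

    def edit(self, name, **fields):
        self.assertions[name].update(fields)

    def predict(self, name, **bindings):
        """Predict an attribute by evaluating the assertion's expression."""
        return self.assertions[name]["expression"](**bindings)

kb = AssertionUsageObject()
# P = nRT/V, the ideal gas law solved for pressure.
kb.add("ideal_gas_law", lambda n, R, T, V: n * R * T / V,
       defines=("gas_ent", "pressure"))
print(kb.predict("ideal_gas_law", n=1.0, R=8.314, T=300.0, V=0.0224))
```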
Explanation Usage Object Sample of important methods: • Predict o1's attribute attr1 • Satisfy with assertion usage object • Satisfy with solved-problem library • Philosophy of science justification: • Kuhnian exemplar: what scientists do • Artificial Intelligence justification: • EBL: cheaper than de novo reasoning • Give trace of why object o1's attribute attr1 is value v1 • Give trace of how assertion a1 is justified (e.g. derived) • Refine reasoning method: “I like traces like this over traces like that because . . .”
Historical Trajectory Object Sample of important methods: • Predict o1's attribute attr1 • Show vs. edit (assertion object) • De novo vs. exemplar (explanation) • Why this trace? Previous traces? • Change strategy • Lakatos, Laudan or other? • Change tactics • Which inductive bias • When to use operator op1 • Change operator library • Add/delete/modify operators • Examine history • How well do ops work, and when? • Selectively erase history • Change priority queue • Reorder operator instances
Ontology Is-a hierarchy • Single inheritance (except for processes) Instance-of leaves • Each instance only belongs to one class Inherited properties from classes • Override-able at instance or derived class level
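A sketch of this inheritance scheme in Python (the class names and property values are invented): single-inheritance lookup walks up the parent chain, and an instance can override any inherited property:

```python
class OntClass:
    """A node in the is-a hierarchy with single inheritance."""

    def __init__(self, name, parent=None, **props):
        self.name, self.parent, self.props = name, parent, props

    def get(self, key):
        # Walk up the parent chain; the nearest definition wins,
        # which is what makes properties override-able.
        node = self
        while node is not None:
            if key in node.props:
                return node.props[key]
            node = node.parent
        raise KeyError(key)

class Instance(OntClass):
    """A leaf: each instance belongs to exactly one class."""

matter = OntClass("matter", density="unknown")
gas = OntClass("gas", parent=matter, compressible=True)
helium = Instance("helium_sample_1", parent=gas, density=0.1786)  # override

print(helium.get("compressible"))  # inherited from gas
print(helium.get("density"))       # overridden at the instance
```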
Assertion Usage Object Types of assertions “Assertions of state” • “Facts” (in the Prolog sense) • “Rules” (in the Prolog sense) • Relations (e.g. equations) • Numeric computation • Decision trees • Symbolic computation “Assertions of motion” • Process classes • Process instances • analogous to “facts” • Rules • Numeric relations for processes • Decision trees for processes
About assertions Assertions have: • Name • List of entities (things they interrelate) • Conditions (when they hold) • Expression • <entity,attribute> pair that they define (optional) • Authorship • Who is responsible for putting them in kb • When placed in? • Where they came from (Operator? User edit?) • List of assertions that they depend on (if created by operator)
Numeric Relation example Numeric relation: Ideal gas law • Name: • ideal_gas_law • List of entities (things they interrelate) • [gas_ent, container_ent, molecule_ent] • gas_ent is the gas sample being described • container_ent is the container holding the gas sample • molecule_ent is the type of molecule making up the gas sample • Conditions (when they hold), e.g.: “Gas-phase molecules are not attractive or repulsive” • Expression • PV = nRT • <entity,attribute> pair that they define (optional) • Gas's thermal energy = PV = nRT (?)
Numeric Relation Example (2) • Authorship • Of discovery • Who discovered it • person(some_scientist) • operator(bacon3) • When it was discovered • date(century19) • date(1990) • On inclusion • Who included it • person(joseph_perry_phillips) • When it was included • date(2008,may,27)
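The fields above can be collected into one record; a sketch using a Python dataclass, with the ideal-gas-law example filled in from the previous two slides (the field names are assumptions, not the Scienceomatic schema):

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Assertion:
    """Sketch of an assertion record with the fields listed on the slides."""
    name: str
    entities: list
    conditions: str
    expression: Callable
    defines: Optional[tuple] = None
    discovered_by: str = ""
    discovered_when: str = ""
    included_by: str = ""
    included_when: str = ""
    depends_on: list = field(default_factory=list)  # if created by an operator

ideal_gas_law = Assertion(
    name="ideal_gas_law",
    entities=["gas_ent", "container_ent", "molecule_ent"],
    conditions="gas-phase molecules neither attract nor repel",
    expression=lambda n, R, T: n * R * T,  # PV = nRT, returned as PV
    defines=("gas_ent", "thermal_energy"),
    discovered_by="operator(bacon3)",
    included_by="person(joseph_perry_phillips)",
    included_when="2008-05-27",
)
print(ideal_gas_law.expression(1.0, 8.314, 273.15))
```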
Processes Describe changes over time • Process classes • Langley et al. call them “abstract processes” • Whole class of similar events • Arranged hierarchically • Have assertions associated with them • Process instances • Instance of a process class • A single event • May be decomposed into finer process instances
Processes example Motion • Very abstract 1-D motion • Specifies that motion is along one dimension only • abstract means “function to be given in a derived class” 1-D uniform acceleration • Specifies uniform acceleration • abstract_const means “constant to be given in a derived class” 1-D gravitational acceleration • Gives conditions
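A sketch of this three-level hierarchy in Python: `position` plays the abstract function, `accel` the abstract constant, and the gravitational subclass supplies its value (the class and attribute names are invented):

```python
from abc import ABC, abstractmethod

class Motion(ABC):
    """Very abstract motion: the functional form is not yet given."""
    @abstractmethod
    def position(self, t):
        """abstract: function to be given in a derived class."""

class UniformAcceleration1D(Motion):
    """Fixes the form x(t) = x0 + v0*t + a*t^2/2; the constant stays open."""
    accel = None  # abstract_const: constant to be given in a derived class

    def __init__(self, x0=0.0, v0=0.0):
        self.x0, self.v0 = x0, v0

    def position(self, t):
        return self.x0 + self.v0 * t + 0.5 * self.accel * t * t

class GravitationalAcceleration1D(UniformAcceleration1D):
    """Supplies the constant; condition: valid near Earth's surface."""
    accel = -9.81  # m/s^2

fall = GravitationalAcceleration1D(x0=100.0)   # dropped from 100 m
print(fall.position(2.0))  # 100 - 0.5*9.81*4 = 80.38 m
```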
Process assertion example Decision tree to stochastically compute a child's genotype from its parents' • Non-leaves are tests • Some are random • Leaves are answers
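A sketch of such a stochastic decision tree for Mendelian inheritance: the non-leaf tests are random draws of one allele from each parent, and the leaf is the child's genotype (a standard genetics example, assumed here for illustration):

```python
import random
random.seed(42)

def child_genotype(parent1, parent2):
    """Two random tests (one allele per parent), then a leaf answer."""
    a = random.choice(parent1)      # random test: allele from parent 1
    b = random.choice(parent2)      # random test: allele from parent 2
    return "".join(sorted(a + b))   # leaf: the child's genotype, e.g. 'Aa'

# Two Aa heterozygotes: theory predicts roughly 1 AA : 2 Aa : 1 aa.
counts = {}
for _ in range(10000):
    g = child_genotype("Aa", "Aa")
    counts[g] = counts.get(g, 0) + 1
print(counts)
```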
Explanation Usage Object Returns traces of reasoning • Akin to resolution refutation traces • Given: A; B; A∧B -> C (or not(A)∨not(B)∨C) • Prove: C • Method: • Assume not(C) • Show contradiction • C must be true!
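A minimal propositional resolution-refutation sketch of exactly this trace (the clause representation, with `~` marking negation, is an assumption):

```python
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """All resolvents of two clauses (frozensets of string literals)."""
    return [(c1 - {l}) | (c2 - {negate(l)}) for l in c1 if negate(l) in c2]

def refute(clauses):
    """Saturate with resolution; True iff the empty clause (contradiction) appears."""
    clauses = set(clauses)
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                for r in resolve(c1, c2):
                    if not r:
                        return True   # empty clause: contradiction found
                    new.add(frozenset(r))
        if new <= clauses:
            return False              # nothing new: no refutation exists
        clauses |= new

# Given A, B, and A∧B -> C (i.e. ~A ∨ ~B ∨ C), assume ~C and refute.
clauses = [frozenset({"A"}), frozenset({"B"}),
           frozenset({"~A", "~B", "C"}), frozenset({"~C"})]
print(refute(clauses))  # True: assuming not(C) is contradictory, so C holds
```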
Explanation Usage Object (2) Can look up in library • If not found calls assertion usage object Works for: • Justifying single values • Justifying whole assertions Optionally allow more than deduction
History Trajectory Object Does several things: • Decides which operator to do next based on: • How successful they have been (operator id) • Type of data (data id) • Tactics • Strategy • Keeps track of what's been tried before • operator/data • success/failure • “by how much” • who/when/why/etc. • Modifiable • Learns the best operators for given data • PROGRAMMABLE?!? (Under these conditions create an operator that does this . . .)
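A hypothetical sketch of the core loop: a priority queue of operators ranked by estimated success rate, with every trial recorded so the ranking adapts (the operator names and the success-rate heuristic are invented):

```python
import heapq

class HistoryTrajectory:
    """Sketch: rank operators by success rate, remember every trial."""

    def __init__(self, operators):
        # Start optimistic (1 success / 1 trial) so each operator is tried.
        self.stats = {op: [1, 1] for op in operators}  # op -> [successes, trials]
        self.history = []                              # (op, success, who)

    def next_operator(self):
        # Highest estimated success rate first (heapq is a min-heap, so negate).
        heap = [(-s / t, op) for op, (s, t) in self.stats.items()]
        heapq.heapify(heap)
        return heapq.heappop(heap)[1]

    def record(self, op, success, who="system"):
        s, t = self.stats[op]
        self.stats[op] = [s + success, t + 1]
        self.history.append((op, success, who))

ht = HistoryTrajectory(["bacon_style_fit", "process_model_search"])
for outcome in [1, 0, 0, 1, 1]:          # assumed success/failure sequence
    op = ht.next_operator()
    ht.record(op, outcome, who="joseph_perry_phillips")
print(ht.next_operator(), len(ht.history))
```

Erasing history, reordering the queue, or adding operators would amount to editing `stats` and `history` directly, which is the "modifiable" point above.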
Next time • More detail about kb structure • A “culture” for science • Value hierarchy • States and time • Java/C++ simulators • Writing programs in Scienceomatic architecture • Dynamically configured discovery operators in history trajectory object • Scienceomatic in action • What might “normal science” look like? • What might “revolution” look like?