CSC 599: Computational Scientific Discovery

Lecture 10: The Scienceomatic Systems: Deductive Reasoning and Model Design

Outline
History Trajectory Object
  • Application: Exhaustive Search
Explanation Usage Object
  • Application: Explanation Preference
Assertion Reasoning Object
  • Application: Simulation

What Scientists Want
Tell me:
“What is the prediction?”
“How shall I improve this model?”
“How did you get that answer?”
“What shall we do next?”
“What's the precision of that value?”
User sees one interface, but:
• Assertion Reasoning Object answers “Predict value of X at time Y”
• Explanation Usage Object answers “Why is value of X at time Y equal to Z?”
• History Trajectory Object answers “What is good to try next?”

Recall

History Trajectory Object
Does several things:
• Decides which operator to do next based on:
  • How successful they have been (operator id)
  • Type of data (data id)
  • Tactics
  • Strategy
• Keeps track of what's been tried before
  • operator/data
  • success/failure
  • “by how much”
  • who/when/why/etc.
• Modifiable
  • Learns best operators for given data
  • PROGRAMMABLE?!? (Under these conditions create an operator that does this . . .)
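The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not the Scienceomatic implementation: the class name, the operator names (`fit_line`, `fit_poly`) and the smoothed success-rate rule are all assumptions made for the example.

```python
class HistoryTrajectory:
    """Hypothetical sketch: remembers operator/data outcomes and picks the next operator."""

    def __init__(self):
        # (operator id, data id) -> [successes, attempts]
        self.records = {}

    def record(self, operator, data_type, success):
        rec = self.records.setdefault((operator, data_type), [0, 0])
        rec[0] += int(success)
        rec[1] += 1

    def best_operator(self, operators, data_type):
        def rate(op):
            successes, attempts = self.records.get((op, data_type), (0, 0))
            # Laplace smoothing (an assumption): untried operators score 0.5
            return (successes + 1) / (attempts + 2)
        return max(operators, key=rate)


history = HistoryTrajectory()
history.record("fit_line", "time_series", True)
history.record("fit_line", "time_series", True)
history.record("fit_poly", "time_series", False)
print(history.best_operator(["fit_line", "fit_poly"], "time_series"))
```

Tactics, strategy and the "by how much" record would need richer entries than a success count; they are left out to keep the sketch short.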

History Trajectory Object: Application 1: Systematic Search
Exhaustive Search: simplest to more complex
• Plays to strengths of computers
• Examples:
  • MECHEM
  • Inductive Process Modeling
History Trajectory Object can do this:
• Queries Assertion Reasoning Object for assertions
• Manipulates assertions to build more complex ones

History Trajectory Object: Systematic Search (2)
Idea:
• There are 3 processes total.
• Two (leaf1 and leaf2) are primitive.
• The third (node2) can be made arbitrarily more complex by taking a type1 process as 1st parameter and a type2 process as 2nd.
• leaf1 is type1; leaf2 and node2 are type2
Program (in Prolog):
  process1(leaf1).
  process2(leaf2).
  process2(node2(P1,P2)) :- process1(P1), process2(P2).

History Trajectory Object: Systematic Search (3)
Program generates process templates
“Template” means model of same structure
• parameters have not yet been computed (by calculus, simulated annealing, etc.)
Generated from simplest -> increasingly more complex
Output:
  ?- process2(A).
  A = leaf2 ;
  A = node2(leaf1, leaf2) ;
  A = node2(leaf1, node2(leaf1, leaf2)) ;
  A = node2(leaf1, node2(leaf1, node2(leaf1, leaf2)))
  etc.
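For readers who do not use Prolog, the same enumeration can be sketched with Python generators. The function names mirror the Prolog predicates; representing node2 templates as tuples is an assumption of this sketch.

```python
from itertools import islice


def process1():
    """Type1 processes: only the primitive leaf1."""
    yield "leaf1"


def process2():
    """Type2 processes, simplest first: leaf2, then node2(type1, type2)."""
    yield "leaf2"
    # Recursing lazily on process2() yields ever deeper templates.
    for p2 in process2():
        for p1 in process1():
            yield ("node2", p1, p2)


for template in islice(process2(), 4):
    print(template)
```

The loop iterates over type2 sub-templates on the outside so that, if more type1 primitives were added, every one of them would be tried at each depth before moving to the next depth.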

History Trajectory Object: Implementation
This algorithm is fundamentally the same as MECHEM and Inductive Process Modeling (IPM)
One algorithm that generates model templates!
• Calls numeric program to do parameter fitting
• Can do both MECHEM and IPM!
Making it efficient
• Like MECHEM, rely on domain knowledge: “Reactions that form pure Carbon are unlikely”
• Like IPM, rely on object type information: “Rabbits are prey, coyotes are predators or prey”
Implement in:
Prolog? (Scienceomatic)
Lisp? (MECHEM, IPM re-write)
ML? Haskell? Anything else?

Explanation Usage Object
Sample of important methods:
• Predict object1's attribute attribute1
  • Satisfy with Assertion Reasoning Object
  • Satisfy with solved problem library
    • Philosophy of science justification
      • Kuhnian exemplar: what scientists do
    • Artificial Intelligence justification:
      • EBL: cheaper than de novo reasoning
• Give trace why object object1's attribute attribute1 is value value1.
• Give trace how assertion assertion1 is justified (e.g. derived)
• Refine reasoning method
  “I like traces like this over traces like that because . . .”

Explanation Usage Object: Application 2: Explanation Preference
Default behavior:
Favor shallowest explanation (more on details of this later)
Problem
Shallowest may not be most correct!
We've seen this before with MECHEM:
• Computer scientist's “best mechanism” means shortest syntax
• Chemist's “best mechanism” means least energetic rate determining step

Explanation Usage Object: Explanation Preference (2)
falling_without_drag: F = mg
falling_with_drag: F = mg - 0.5 * r * v^2 * A * Cd
where:
r = fluid's (e.g. air's) density
v = velocity
A = object's area
Cd = drag coefficient
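As a quick numeric check of how far the two assertions can disagree, here is a small Python sketch. The sample numbers (a 1 kg object, air density 1.2 kg/m^3, area 0.01 m^2, Cd 0.47, speed 10 m/s) are illustrative assumptions, not values from the lecture.

```python
def falling_without_drag(m, g=9.81):
    """Net force on a falling object, ignoring drag: F = mg."""
    return m * g


def falling_with_drag(m, v, r, A, Cd, g=9.81):
    """Net force with drag: F = mg - 0.5 * r * v^2 * A * Cd."""
    return m * g - 0.5 * r * v ** 2 * A * Cd


no_drag = falling_without_drag(1.0)
with_drag = falling_with_drag(1.0, v=10.0, r=1.2, A=0.01, Cd=0.47)
print(no_drag, with_drag)
```

At v = 0 the two assertions agree exactly; the gap grows with the square of the velocity.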

Explanation Usage Object: Explanation Preference (3)
Scienceomatic can compute both answers
• falling_without_drag answer might be returned first if it uses a less deep explanation tree
Can tell Scienceomatic preferences:
“Prefer falling_with_drag answer before falling_without_drag answer”
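One way to picture the default and the override is the following Python sketch. It is not the Scienceomatic code: the `Explanation` record, the `rank` function and the tree depths are hypothetical, chosen only to show shallowest-first ordering with a user-stated preference applied on top.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Explanation:
    assertion: str  # assertion that produced the answer
    depth: int      # depth of its explanation tree (made-up values below)


def rank(explanations, preferred_order=()):
    """Listed assertions come first, in the listed order; the rest shallowest-first."""
    def key(e):
        if e.assertion in preferred_order:
            position = preferred_order.index(e.assertion)
        else:
            position = len(preferred_order)
        return (position, e.depth)
    return sorted(explanations, key=key)


candidates = [Explanation("falling_with_drag", 4), Explanation("falling_without_drag", 2)]
print(rank(candidates)[0].assertion)
print(rank(candidates, ("falling_with_drag", "falling_without_drag"))[0].assertion)
```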

Explanation Usage Object: Implementation
An explanation data structure is a tree
• Lisp, ML, Haskell most general
• Prolog acceptable
  • May need Prolog's “2nd order” predicates: =.., functor, arg
  • Example:
    ?- f(a,b) =.. L.
    L = [f, a, b] ;
• C/C++/Java/C#: possible (of course) but may not be natural
• Your ideas?
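To make the tree idea concrete in a language without built-in terms, here is a hedged Python sketch. `Term`, `univ` and `depth` are names invented for this example; `univ` only imitates what Prolog's `=..` does for the query shown above.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Term:
    functor: str
    args: List[Union["Term", str]] = field(default_factory=list)


def univ(term):
    """Imitates Prolog's =.. : f(a,b) becomes ['f', 'a', 'b']."""
    return [term.functor] + list(term.args)


def depth(term):
    """Depth of an explanation tree; atoms (plain strings) count as depth 0."""
    if not isinstance(term, Term):
        return 0
    return 1 + max((depth(arg) for arg in term.args), default=0)


print(univ(Term("f", ["a", "b"])))
print(depth(Term("explains", [Term("f", ["a", "b"]), "c"])))
```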

Assertion Reasoning Object
Sample of important methods:
• Retrieve assertion assertion1
  • Show assertion
  • Edit assertion
• Predict object object1's attribute attribute1
  • Plot these values
  • Compare predicted and recorded values
• Justify (e.g. logical resolution) assertion assertion1

Large Scale Knowledge Organization
Five components
• Definitions/Expectations/Assumptions
  “Meters measure length”
  “100 cm = 1 meter”
  “Evolution is impossible because of X, Y, Z”
• Theory
  Newton's Laws of Motion and Gravitation
• Generalization
  Johannes Kepler's Laws
• Data
  Tycho Brahe's Observations
• Analytics
  How to sum & integrate, change coord. systems, etc.

Reasoning Over Components
Which components queried -> Reasoning type
theorize: d/e/a, theory, general, data, analytics
empiricize: d/e/a, data, general, theory, analytics
ab_initio: d/e/a, theory, analytics
read_data: d/e/a, data, analytics
d/e/a always first
  Enforce agreement with base assumptions
analytics always last
  Recast query to other form if all else fails
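The table above can be read as a lookup from reasoning type to query order. The Python sketch below assumes each component is a function that returns an answer or `None`; that interface, and the `answer` function itself, are assumptions for illustration only.

```python
REASONING_ORDER = {
    "theorize":   ["d/e/a", "theory", "general", "data", "analytics"],
    "empiricize": ["d/e/a", "data", "general", "theory", "analytics"],
    "ab_initio":  ["d/e/a", "theory", "analytics"],
    "read_data":  ["d/e/a", "data", "analytics"],
}


def answer(query, reasoning_type, components):
    """Ask each component in the prescribed order; return the first answer found."""
    for name in REASONING_ORDER[reasoning_type]:
        result = components[name](query)
        if result is not None:
            return name, result
    return None


# Toy components: only theory and data know anything about the query.
components = {name: (lambda q: None) for name in REASONING_ORDER["theorize"]}
components["theory"] = lambda q: "derived value"
components["data"] = lambda q: "recorded value"
print(answer("position of Mars", "theorize", components))
print(answer("position of Mars", "empiricize", components))
```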

Each Component
• Inherited Knowledge
  Birds fly, but penguins don't fly
• Dynamic Knowledge (processes)
  For falling things:
  a = F/m = g
  v = v0 + g*t
  height = h0 + v0*t + 0.5*g*t^2
• Static Knowledge
  For homogeneous gases: PV = nRT

Inherited Knowledge
Works with ontology
  is_a(bird,animal).
  inherit(bird,can_fly_attr,true).
  is_a(penguin,bird).
  inherit(penguin,can_fly_attr,false).
  instance_of(tweety,bird).
  instance_of(opus,penguin).
Deduce:
“tweety can fly”
“opus can not fly”

Inherited Knowledge (2)
Works with ontology, cont'd
Remove
  instance_of(opus,penguin).
Add
  is_a(penguin_with_pilots_license, penguin).
  inherit(penguin_with_pilots_license, can_fly_attr,true).
  instance_of(opus,penguin_with_pilots_license).
Deduce:
“opus can fly”
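The lookup rule behind both deductions is "use the value from the most specific class that defines the attribute". A minimal Python sketch of that rule follows; it uses dictionaries in place of Prolog facts, and the `lookup` function is a hypothetical helper, not part of Scienceomatic.

```python
is_a = {
    "bird": "animal",
    "penguin": "bird",
    "penguin_with_pilots_license": "penguin",
}
inherit = {
    ("bird", "can_fly_attr"): True,
    ("penguin", "can_fly_attr"): False,
    ("penguin_with_pilots_license", "can_fly_attr"): True,
}
instance_of = {"tweety": "bird", "opus": "penguin"}


def lookup(instance, attr):
    """Walk up the is_a chain; the most specific class that defines attr wins."""
    cls = instance_of[instance]
    while cls is not None:
        if (cls, attr) in inherit:
            return inherit[(cls, attr)]
        cls = is_a.get(cls)
    return None  # no class in the chain defines the attribute


print(lookup("tweety", "can_fly_attr"), lookup("opus", "can_fly_attr"))
instance_of["opus"] = "penguin_with_pilots_license"
print(lookup("opus", "can_fly_attr"))
```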

Static Knowledge
Assertions
• Modular units of knowledge
  Numeric relations (e.g. equations)
  Decision trees
• A name to uniquely identify them
  ideal_gas_law
• A typed entity list of entity names and the set or domain in which they must reside
  gas_ent (in single_compound_gas_class),
  container_ent (in fluid_container_class),
  molecule_ent (in molecule_class)

Static Knowledge
• A condition list telling when knowledge is applicable:
  • molecule_ent.is_gas_phase_mutually_attractive_attr == false
  • molecule_ent.is_gas_phase_mutually_repulsive_attr == false
  • gas_ent.total_molecular_volume_attr << container_ent.containers_volume_attr
  • gas_ent.is_gas_randomly_moving_attr == true
  • molecule_ent.is_newtonian_particle_attr == true
  • gas_ent.materials_molecule_attr == molecule_ent
  • gas_ent.fluids_container_attr == container_ent

Static Knowledge
• An expression: PV = nRT:
  gas_ent.gases_pressure_attr * container_ent.objects_internal_volume_attr ==
    value(normal(8.3145,0.00005), joules_per_mole_kelvin_domain)
    * gas_ent.materials_mole_num_attr
    * gas_ent.objects_temperature_attr
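Once the conditions hold, using the assertion is a matter of solving the expression for the unknown attribute. The Python sketch below solves for pressure; the function name and the sample state (1 mol at 273.15 K in 0.022414 m^3) are assumptions for illustration, and the uncertainty on R is ignored.

```python
R = 8.3145  # J/(mol K), the mean of the value used in the assertion


def ideal_gas_pressure(moles, temperature_k, volume_m3):
    """Solve P * V = n * R * T for P, in pascals."""
    return moles * R * temperature_k / volume_m3


pressure = ideal_gas_pressure(1.0, 273.15, 0.022414)
print(round(pressure))  # close to one standard atmosphere
```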

Dynamic (Process) Knowledge
Processes have
• Name
• Typed entity list
• Entity mappings from inherited to base process entities
• Conditions
  • If process happened we know they held
  • Sources of knowledge rather than things to check
• Subassertions
  • Numeric relations or decision trees telling what happens
• Simulation code
  • Compiles to Java source
• Test constraints
  • Instances of the class
• Serial or parallel decomposition

Example: Pendulum (1)
Entities
• process_ent: The process
• init_state_ent: init. state
• intermediate_state_ent: intermediate states
• final_state_ent: final state
• axes_configuration_ent: configuration of X and Y axes in which pendulum swings
• pendulum_ent: pendulum on end of string
• arm_ent: swinging arm
• gravitational_field_ent: gravitational field

Example: Pendulum (2)
Conditions
  arm_ent.entities_mass_attr << pendulum_ent.entities_mass_attr
  (maybe more)

Example: Pendulum (3)
Subassertions
Notation: object.attribute.descriptor
descriptors:
  .cont: continuous
  .init: initial
  .final: final
  .delta: Δattribute (change in attribute)
  .current/.next/.prev: current/next/previous states (discrete)
• Arm's length is constant:
  pendulum_ent.x.cont ^ 2 + pendulum_ent.y.cont ^ 2 == arm_ent.length ^ 2

Example: Pendulum (4)
Subassertions
• X axis forces:
  pendulum_ent.objects_mass_attr * (pendulum_ent.x.delta.delta / process_ent.time.delta.delta) ==
    arm_ent.objects_lengthwise_tension_attr.cont * pendulum_ent.x.cont / arm_ent.objects_length_attr

Example: Pendulum (5)
Subassertions, cont'd
• Y axis forces:
  pendulum_ent.objects_mass_attr * (pendulum_ent.y.delta.delta / process_ent.time.delta.delta) ==
    arm_ent.objects_lengthwise_tension_attr.cont * pendulum_ent.y.cont / arm_ent.objects_length_attr
    - pendulum_ent.objects_mass_attr * gravitational_field_ent.grav_fields_acceleration_attr

Hierarchy of Processes
Motion
• Very abstract
1-D motion
• Specifies that motion is along one dimension only
• abstract means “fnc to be given in derived class”
1-D uniform acceleration
• Specifies uniform accel.
• abstract_const means “constant to be given in derived class”
1-D gravitational accel.
• Gives conditions

Reasoning, Revisited
“Deduction”
1st Inherited, 2nd Dynamic, 3rd Static (Prolog?)
• Iterative-Deepening Depth-First Search
• Assumption: “shallowest” answer is simplest
• Specify preference with Explanation Usage Object
Probabilistic reasoning
• Bayesian Logic Program on top of Prolog
• Simulator -> Conditional Probability Tables (CPTs)
• CPTs -> Bayesian Logic Program
• Thanks Tony Garcia!
Simulation
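Iterative-deepening depth-first search is what makes the shallowest answer come back first. The generic Python sketch below shows the technique on a toy graph; the graph, the node names and the function signatures are assumptions for illustration and say nothing about how Scienceomatic stores its search space.

```python
def depth_limited(node, is_goal, children, limit):
    """Depth-first search that gives up below `limit` levels; returns a path or None."""
    if is_goal(node):
        return [node]
    if limit == 0:
        return None
    for child in children(node):
        path = depth_limited(child, is_goal, children, limit - 1)
        if path is not None:
            return [node] + path
    return None


def iddfs(start, is_goal, children, max_depth):
    """Retry with limits 0, 1, 2, ... so the shallowest solution is found first."""
    for limit in range(max_depth + 1):
        path = depth_limited(start, is_goal, children, limit)
        if path is not None:
            return path
    return None


# Toy search space: the deep route is listed first, the shallow one second.
graph = {
    "query": ["deep1", "shallow"],
    "deep1": ["deep2"],
    "deep2": ["answer"],
    "shallow": ["answer"],
    "answer": [],
}
print(iddfs("query", lambda n: n == "answer", lambda n: graph[n], max_depth=5))
```

Plain depth-first search would return the route through deep1 here, which is why the depth limit matters for the "shallowest is simplest" assumption.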

Assertion Reasoning Object: Application 3: Make a simulator
Use simulator code for each process to make Java (or C++) simulator
• Free parameters picked based on domain ranges
• Does N Monte Carlo simulation runs
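The two bullets above amount to a simple driver loop. The Python sketch below assumes uniform sampling within each domain range and uses a stand-in process (time to fall from rest); both choices, and all names and ranges, are assumptions rather than details of the generated Java or C++ simulator.

```python
import math
import random


def monte_carlo(simulate, ranges, n_runs, seed=0):
    """Run `simulate` n_runs times, drawing each free parameter from its range."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        results.append(simulate(**params))
    return results


def fall_time(height, g):
    """Stand-in process: seconds to fall `height` meters from rest, no drag."""
    return math.sqrt(2.0 * height / g)


ranges = {"height": (1.0, 10.0), "g": (9.78, 9.83)}  # hypothetical domain ranges
times = monte_carlo(fall_time, ranges, n_runs=100)
print(min(times), max(times))
```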

Biology: Intelligent Design
• Life is too complex to have arisen by chance, therefore someone must have designed it
• ≠ Creationism: Doesn't say who designer was (maybe Space Aliens)
• Selection occurs, but Evolution requires:
  • geographical isolation, AND
  • new trait mutation, AND
  • superior fitness
• For all of these to be true is improbable

Biology: Evolution
• No such restraints
• New Synthesis view of Evolution:
  • Speciation by geographic isolation and selection
• Post New Synthesis:
  • Keep geographic isolation
  • Do we need maximal selection?

Biology: Common Model
• Logistic Growth
  • dN = (growth_rate) * N * (1 - N/(regions_capacity))
  • Small population -> fast growth; Large pop. -> slow
• Mendelian Genetics
  • 2 genes: A dominates over a, B dominates over b
• Hardy-Weinberg Conditions
  • H-W proof: no change in allele freq., given assumptions
  • Relaxed assumptions: large population, no fitness diff.
  • Retained assumptions: no mutation, no immigration, no emigration, only same-generation random mating
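Both ingredients of the common model are short formulas, sketched here in Python. The function names and sample numbers are assumptions; the Hardy-Weinberg function only shows the textbook genotype proportions under all the classical assumptions, before any of them is relaxed.

```python
def logistic_step(N, growth_rate, capacity):
    """One step of logistic growth: N + growth_rate * N * (1 - N / capacity)."""
    return N + growth_rate * N * (1 - N / capacity)


def hardy_weinberg(p):
    """Genotype frequencies for one gene whose dominant allele has frequency p."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


print(logistic_step(10, 0.5, 100))   # small population: grows quickly
print(logistic_step(100, 0.5, 100))  # at capacity: no growth
genotypes = hardy_weinberg(0.3)
print(genotypes["AA"] + genotypes["Aa"] / 2)  # allele frequency in the next generation
```

The last line recovers the starting allele frequency, which is the "no change in allele freq." result the H-W proof refers to.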

Specific Model: “Best of a Bad Circumstance”
• Everyone has 2 gene_a alleles and 2 gene_b alleles
• Having at least one copy of A is best
• Having at least one copy of B is 2nd best
• Having neither is worst
• fitness(A???) >= fitness(aaB?) >= fitness(aabb)

To Test
• Intelligent Design
  • “B's frequency will never increase other than due to fitness”
  • We test “Does B ever increase?”
  • Do more detailed statistical analysis afterward
• Evolution
  • The null hypothesis: B can increase beyond what fitness alone explains

Biological Processes

Biological Free Parameters

Simulator Algorithm
• Scilog + Bio Model -> C++ simulator
  • Has random seed as parameter
• Simulation run 100,000 times
  • Index used as random seed

Simulator Results
• frac(A) always increases
• frac(B) increases in 57,038 of 100,000 trials
• I.D.: “Isn't that selection?”

Principal Component Analysis
• V4: increase in B with selection (16.62%), but . . .
• V2: increase in B without selection (23.38%)

Discussion
• Q: How can B increase without selection?
• A: The Founder Effect!
  • Random variations in the initial generation influence later generations
  • For our simulations: (init. generation size) < 20
    • This may not be real-world threshold
  • Only requirements:
    • Geographic isolation
    • Diverse initial population
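The founder effect can be seen with pure sampling and no selection at all. The Python sketch below is not the lecture's Scilog model: it only draws a small founding group from a large source population and checks how often allele B ends up more common than it was at the source. The source frequency of 0.3, the 10 founders and the 1,000 trials are assumptions; using the trial index as the random seed follows the simulator slide.

```python
import random


def founder_frequency(seed, source_freq, n_founders):
    """Frequency of allele B among 2 * n_founders alleles drawn at random."""
    rng = random.Random(seed)  # trial index doubles as the random seed
    copies = sum(rng.random() < source_freq for _ in range(2 * n_founders))
    return copies / (2 * n_founders)


def fraction_increased(n_trials, source_freq, n_founders):
    """Share of trials in which B is more common among founders than at the source."""
    increased = sum(
        founder_frequency(seed, source_freq, n_founders) > source_freq
        for seed in range(n_trials)
    )
    return increased / n_trials


print(fraction_increased(1000, source_freq=0.3, n_founders=10))
```

A sizeable share of trials shows B rising even though no genotype was favored, which is the point the discussion makes about small, isolated initial populations.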

Values
“Chicago is 185 meters above sea level, ±5 meters”
Primary value or sample distribution
  The value being held
  One value: 1.85e+2
  Explicit set: [180, 185, 190]
  Implicit set: normal(185,5)
Domain
  Metadata about the value
  Dimensions (e.g. length)
  Units (e.g. meters)
  Axis (e.g. “height above sea level, somewhere on Earth”)
  Legal values (e.g. “0 meters to 10,000 meters”)

Values (2)
State
  When the value is said to hold
  “Mean/median/mode value during all of 2007”
Subject
  Object the value describes
  chicago
Attribute
  Aspect of object being described
  height_above_sea_level_attr
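The five parts of a value map naturally onto a small record type. The Python sketch below is one possible layout, with hypothetical class and field names; it stores only the implicit-set form normal(mean, std_dev) and fills in the Chicago example from the slides.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Domain:
    dimension: str                    # e.g. length
    units: str                        # e.g. meters
    axis: str                         # what is being measured, and where
    legal_range: Tuple[float, float]  # inclusive bounds on legal values


@dataclass(frozen=True)
class Value:
    mean: float      # primary value
    std_dev: float   # spread, as in normal(mean, std_dev)
    domain: Domain
    state: str       # when the value is said to hold
    subject: str     # object the value describes
    attribute: str   # aspect of the object being described

    def is_legal(self):
        low, high = self.domain.legal_range
        return low <= self.mean <= high


height_domain = Domain("length", "meters", "height above sea level, somewhere on Earth", (0.0, 10000.0))
chicago_height = Value(185.0, 5.0, height_domain, "during all of 2007", "chicago", "height_above_sea_level_attr")
print(chicago_height.is_legal())
```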

Questions for you: Easiest language to write History Trajectory and Explanation Usage Objects in? Lisp? ML? Haskell? Prolog? Easiest to program GUI?
