Interactive Software for Computational Modeling and Discovery

Interactive Software Environments for Computational Modeling and Discovery
Pat Langley
Institute for the Study of Learning and Expertise, Palo Alto, California
and Center for the Study of Language and Information, Stanford University, Stanford, California
http://www.isle.org/~langley
langley@isle.org
Thanks to S. Bay, V. Brooks, L. Chrisman, S. Klooster, A. Pohorille, C. Potter, K. Saito, H. Spencer, J. Shrager, M. Schwabacher, and A. Torregrosa.

The Challenge of Systems Science
As a field of science matures, researchers move beyond accounts of simple, isolated phenomena to:
• develop models of complex systems with many components;
• compare these models to observational data from the systems;
• evaluate their models’ ability to fit these observations; and
• improve their models in response to detected anomalies.
Developing, testing, and revising such models is a challenging endeavor that would benefit from computational aids. Our research goal is to design, construct, evaluate, and understand such computational tools for systems science.

Lessons about Scientific Knowledge Discovery
Our research collaborations in Earth science and microbiology have suggested some important lessons:
1. Traditional notations from machine learning and data mining are not communicated easily to domain scientists.
2. Scientists often want models that move beyond description to provide explanations of their data.
3. Scientists often have initial models and background knowledge that should influence the discovery process.
4. Scientific data are often rare and difficult to obtain rather than being plentiful, making variance reduction a key issue.
5. Scientists often want computational assistance rather than automated discovery systems.
These observations suggest clear needs for additional research in computational approaches to scientific knowledge discovery.

Inductive Process Modeling
[Figure: induction takes training data and background knowledge (generic processes) and produces learned knowledge (a process model).]
Learned knowledge:
model AquaticEcosystem
  variables: nitro, phyto, zoo, nutrient_nitro, nutrient_phyto
  observables: nitro, phyto, zoo
  process phyto_exponential_growth
    equations: d[phyto,t] = 0.1 * phyto
  process zoo_logistic_growth
    equations: d[zoo,t] = 0.1 * zoo * (1 - zoo / 1.5)
  process phyto_nitro_consumption
    equations: d[nitro,t] = -1 * phyto * nutrient_nitro,
               d[phyto,t] = 1 * phyto * nutrient_nitro
  process phyto_nitro_no_saturation
    equations: nutrient_nitro = nitro
  process zoo_phyto_consumption
    equations: d[phyto,t] = -1 * zoo * nutrient_phyto,
               d[zoo,t] = 1 * zoo * nutrient_phyto
  process zoo_phyto_saturation
    equations: nutrient_phyto = phyto / (phyto + 0.5)
Background knowledge:
  process exponential_growth
    variables: P {population}
    equations: d[P,t] = [0, 1, ] * P
  process logistic_growth
    variables: P {population}
    equations: d[P,t] = [0, 1, ] * P * (1 - P / [0, 1, ])
  process constant_inflow
    variables: I {inorganic_nutrient}
    equations: d[I,t] = [0, 1, ]
  process consumption
    variables: P1 {population}, P2 {population}, nutrient_P2
    equations: d[P1,t] = [0, 1, ] * P1 * nutrient_P2,
               d[P2,t] = -[0, 1, ] * P1 * nutrient_P2
  process no_saturation
    variables: P {number}, nutrient_P {number}
    equations: nutrient_P = P
  process saturation
    variables: P {number}, nutrient_P {number}
    equations: nutrient_P = P / (P + [0, 1, ])
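To make the semantics of such a model concrete, here is a minimal Python sketch that simulates the AquaticEcosystem model with forward Euler steps. It is an illustration under stated assumptions, not the environment's simulator: the multiplication operators and minus signs are assumed where the slide text lost them (consumers gain, consumed quantities lose, and logistic growth takes its standard form), and the names `derivatives` and `simulate`, the initial values, and the step size are all hypothetical.

```python
def derivatives(state):
    """Sum every process's contribution to each variable's rate of change."""
    nitro, phyto, zoo = state["nitro"], state["phyto"], state["zoo"]

    # Algebraic processes define the intermediate nutrient terms.
    nutrient_nitro = nitro                       # phyto_nitro_no_saturation
    nutrient_phyto = phyto / (phyto + 0.5)       # zoo_phyto_saturation

    d = {"nitro": 0.0, "phyto": 0.0, "zoo": 0.0}
    d["phyto"] += 0.1 * phyto                    # phyto_exponential_growth
    d["zoo"] += 0.1 * zoo * (1 - zoo / 1.5)      # zoo_logistic_growth (assumed form)
    d["nitro"] -= 1.0 * phyto * nutrient_nitro   # phyto_nitro_consumption (assumed sign)
    d["phyto"] += 1.0 * phyto * nutrient_nitro
    d["phyto"] -= 1.0 * zoo * nutrient_phyto     # zoo_phyto_consumption (assumed sign)
    d["zoo"] += 1.0 * zoo * nutrient_phyto
    return d


def simulate(initial, dt=0.01, steps=1000):
    """Forward Euler integration; returns the list of visited states."""
    state = dict(initial)
    trajectory = [dict(state)]
    for _ in range(steps):
        d = derivatives(state)
        state = {name: value + dt * d[name] for name, value in state.items()}
        trajectory.append(dict(state))
    return trajectory


if __name__ == "__main__":
    # Hypothetical initial conditions, chosen only for illustration.
    run = simulate({"nitro": 1.0, "phyto": 0.1, "zoo": 0.1})
    print(run[-1])
```

Because each process only adds terms to the shared derivative table, a process can be removed or swapped without touching the others, which is the modularity that the process notation is meant to provide.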

Why Are Process Models Interesting?
Process models are good targets for knowledge discovery because:
• they incorporate scientific formalisms, rather than AI notations, that are easily communicable to scientists and engineers;
• they move beyond descriptive generalization to explanation, while retaining the modularity needed to support induction.
These reasons point to process models as an ideal representation for scientific and engineering knowledge. Process models are an important alternative to formalisms used currently in machine learning and data mining.
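One way to see why modularity supports induction: generic processes can be bound to typed variables, and sets of the resulting instances form candidate model structures. The Python sketch below enumerates such candidates exhaustively. The slides do not describe the actual search procedure, so this enumeration is only an assumed stand-in, and the type assignments, the restriction to single-variable generic processes, and the function names are hypothetical.

```python
from itertools import combinations, product

# Assumed variable types, following the type tags on the generic processes.
VARIABLE_TYPES = {
    "nitro": "inorganic_nutrient",
    "phyto": "population",
    "zoo": "population",
}

# Generic process name -> types of the variables it must be bound to.
GENERIC_PROCESSES = {
    "exponential_growth": ["population"],
    "logistic_growth": ["population"],
    "constant_inflow": ["inorganic_nutrient"],
}


def instantiate(generic_processes, variable_types):
    """Bind each generic process to every type-compatible tuple of variables."""
    instances = []
    for name, roles in generic_processes.items():
        pools = [
            [var for var, kind in variable_types.items() if kind == role]
            for role in roles
        ]
        for binding in product(*pools):
            if len(set(binding)) == len(binding):  # no variable used twice
                instances.append((name, binding))
    return instances


def candidate_structures(instances, max_size):
    """Yield every set of process instances with at most max_size members."""
    for size in range(1, max_size + 1):
        yield from combinations(instances, size)


if __name__ == "__main__":
    instances = instantiate(GENERIC_PROCESSES, VARIABLE_TYPES)
    for structure in candidate_structures(instances, max_size=2):
        print(structure)
```

Each candidate structure would still need its numeric parameters fitted to data before it could be compared with the others.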

Three Challenging Scientific Domains
[Figure: three panels. Gene regulation: a network of positive (+) and negative (-) links among Light, RR, NBLR, NBLA, PBS, DFR, psbA1, psbA2, cpcB, Photo, and Health. Earth ecosystem: the NPPc equations from the CASA model. Human activities: a diagram relating activity level, heart rate, heart capacity, heart activity, lung activity, lung capacity, respiration rate, and GSR.]

Challenges of Inductive Process Modeling
Process model induction differs from typical learning tasks in that:
• process models characterize behavior of dynamical systems;
• variables are mainly continuous and data are unsupervised;
• observations are not independently and identically distributed;
• process models contain unobservable processes and variables;
• multiple processes can interact to produce complex behavior.
Compensating factors include a focus on deterministic systems and the availability of background knowledge.

An Environment for Interactive Process Modeling
We plan to develop an interactive environment that lets users:
• specify process models of static and dynamic systems;
• display and edit a model’s structure and details graphically;
• utilize a model to simulate a system’s behavior over time;
• incorporate background knowledge cast as generic processes;
• indicate which processes to consider during model revision;
• invoke a revision module that improves a model’s fit to data.
Our initial implementation focuses on quantitative processes, but future versions should also support qualitative models.
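As a rough illustration of what a revision step involves, the Python sketch below revises one parameter of an exponential growth process so that the simulated trajectory fits observed data. The slides do not specify the revision algorithm, so a plain scan over candidate values stands in for it here. The data are synthetic, and `simulate_growth`, `sum_squared_error`, and `revise_rate` are hypothetical names.

```python
def simulate_growth(rate, p0, dt, steps):
    """Forward Euler simulation of d[P,t] = rate * P."""
    values = [p0]
    for _ in range(steps):
        values.append(values[-1] + dt * rate * values[-1])
    return values


def sum_squared_error(predicted, observed):
    """Squared error between two equally long trajectories."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def revise_rate(observed, p0, dt, candidates):
    """Return the candidate rate whose simulated trajectory fits the data best."""
    steps = len(observed) - 1
    return min(
        candidates,
        key=lambda rate: sum_squared_error(
            simulate_growth(rate, p0, dt, steps), observed
        ),
    )


if __name__ == "__main__":
    # Synthetic observations generated with a known rate of 0.3.
    observed = simulate_growth(0.3, 1.0, 0.1, 50)
    candidates = [i / 100 for i in range(0, 101)]
    print(revise_rate(observed, 1.0, 0.1, candidates))
```

A practical revision module would also have to handle several parameters at once, noisy observations, and changes to model structure, none of which this sketch attempts.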

A Process Model for Carbon Production
model npp;
variables NPPc, E, IPAR, T1, T2, W, Topt, tempc, eet, PET, PETTWM, ahi, A, FPARFAS, monthlySolar, SolConver, MONFASNDVI, umd_veg;
observable ahi, eet, tempc, Topt, MONFASNDVI, monthlySolar, PETTWM, umd_veg;
process CarbonProd;
  equations NPPc = E * IPAR;
process PhotoEfficiency;
  equations E = (0.389 * (T1 * (T2 * W)));
process TempStress1;
  equations T1 = (0.8 + ((0.02 * Topt) - (0.0005 * (Topt ^ 2))));
process TempStress2;
  equations T2 = ((1.1814 / (1 + (2.718281828 ^ (0.2 * (Topt - 10 - tempc))))) / (1 + (2.718281828 ^ (0.3 * (tempc - 10 - Topt)))));
process WaterStress;
  conditions PET!=0;
  equations W = (0.5 + (0.5 * (eet / PET)));
process WSNoEvapoTrans;
  conditions PET==0;
  equations W = 0.5;
process EvapoTrans;
  conditions tempc>0;
  equations PET = 1.6 * (10 * tempc / ahi) ^ A * PETTWM;
• • •

Viewing and Editing a Process Model

Initial Results on Ecosystem Model Revision
Initial model:
  E = 0.56 · T1 · T2 · W
  T2 = 1.18 / [(1 + e^(0.2 · (Topt – Tempc – 10))) · (1 + e^(0.3 · (Tempc – Topt – 10)))]
  PET = 1.6 · (10 · Tempc / AHI)^A · PET-TW-M
  SR ∈ {3.06, 4.35, 4.35, 4.05, 5.09, 3.06, 4.05, 4.05, 4.05, 5.09, 4.05}
  Cross-validated RMSE = 465.212 and r^2 = 0.799
Revised model:
  E = 0.353 · T1^0.00 · T2^0.08 · W^0.00
  T2 = 0.83 / [(1 + e^(1.0 · (Topt – Tempc – 6.34))) · (1 + e^(1.0 · (Tempc – Topt – 11.52)))]
  PET = 1.6 · (10 · Tempc / AHI)^A · PET-TW-M
  SR ∈ {0.61, 3.99, 2.44, 10.0, 2.21, 2.13, 2.04, 0.43, 1.35, 1.85, 1.61}
  Cross-validated RMSE = 397.306 and r^2 = 0.853 [15% reduction]
• • •
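The reported figures can be checked with a few lines of Python. The sketch below defines RMSE, a squared correlation coefficient, and the relative reduction in error, then applies the last of these to the two RMSE values on the slide. Treating r2 as the squared Pearson correlation is an assumption, since the slide does not define it, and the function names are hypothetical.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between predictions and observations."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def r_squared(predicted, observed):
    """Squared Pearson correlation (an assumed reading of the slide's r2)."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov ** 2 / (var_p * var_o)


def relative_reduction(before, after):
    """Fraction by which an error measure dropped."""
    return (before - after) / before


if __name__ == "__main__":
    # RMSE values taken from the slide: initial and revised model.
    reduction = relative_reduction(465.212, 397.306)
    print(f"{100 * reduction:.1f}% reduction in RMSE")
```

The computed drop in RMSE is a little under 15 percent, consistent with the rounded figure on the slide.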

A Qualitative Model of Gene Regulation
How do plants modify their photosynthetic apparatus in high light?
[Figure: network of positive (+) and negative (-) links among Light, RR, NBLR, NBLA, PBS, dspA, psbA1, psbA2, cpcB, Photo, and Health.]
This model is qualitative but relates continuous variables, much like formalisms from qualitative physics (e.g., Forbus, 1984).

Fields Contributing to the Proposed Research
[Figure: computational scientific discovery at the center, drawing on four groups of fields.]
• simulation languages, numerical analysis
• qualitative reasoning
• human-computer interaction
• biology, physiology, Earth science

Plans for Experimental Evaluation
Our plans for evaluation include a variety of methods:
• demonstrating new functionality in each of three domains;
• collecting and analyzing traces of users’ interactions;
• formulating hypotheses about the human-computer system;
• running lesion studies with synthetic data to test those hypotheses;
• revising the environment based on the results of experiments.
Taken together, these studies should uncover the design principles that produce successful modeling and discovery environments. The methodology for evaluating intelligent assistants is not yet mature, so we must develop it along the way.

Some Legitimate Reviewer Concerns
1. A general-purpose modeling environment may not be justified given the differences in the proposed application domains.
2. We should take a closer look at existing modeling environments like STELLA and link our work to them if possible.
3. The research plan for modeling human activities is vague.
4. The schedule of work follows a standard software life cycle, rather than giving detail about tasks relevant to the project.

Less Legitimate Reviewer Concerns
5. We may not need to develop a new modeling formalism, since inductive logic programming can handle most of our needs.
6. The proposed research program will not use a "cutting-edge AI approach" because it relies on the heuristic search metaphor.
7. We should not incorporate qualitative physics because it did not scale well, has made little progress, and has had little impact.
8. The proposal reads like a CYC project for scientists.
9. The work plan is sketchy and, since the main task is developing the modeling environment, one postdoc may not be enough.

Less Legitimate Reviewer Concerns
10. The proposal makes little commitment to data-mining methods and it does not offer timely advances. There is no conceptual novelty, and the framework is not "radically new".
11. No work is cited for keeping qualitative, quantitative, verbal, and visual representations consistent.
12. There is a fundamental assumption that automated discovery tools are inferior to interactive ones.
13. We should take advantage of recent advances in genetic methods and ones for learning generative models.
14. The research seems unlikely to have a big commercial impact.

Planned Collaborations
Likely collaborations with current UCC researchers include:
• using constraints to control search for models (Freuder et al.)
• learning numeric constraints from observations (Freuder et al.)
• using methods for case adaptation to revise models (Bridge)
• modeling regulation of apoptotic cell death (Cotter, Higgins)
• modeling behavior of Irish ecosystems (O’Kane)
We also plan to continue ongoing collaborations with scientists at:
• Stanford University, ISLE, and NASA Ames (USA)
• Josef Stefan Institute (Slovenia)
• NTT Communication Science Laboratories (Japan)

Proposed Research Staff
• Principal Investigator – Oversight of entire research project
• Senior Scientist – Oversight of environment design/implementation
• Postdoc – Implementing and maintaining modeling environment
• Postdocs – One for each scientific application domain
• Postdoc – Experimental evaluation of modeling environment
• PhD students – Two for each scientific application domain
• Laboratory manager – Responsible for general operations
• Computer manager – Responsible for computing environment
• Technical writer – Prepare manuals and co-author research reports

Concluding Remarks
In summary, unlike work in the data-mining paradigm, our research on computational modeling and discovery:
• moves beyond description and prediction to explanatory models;
• uses domain knowledge to initialize and constrain search for improved models;
• provides an interactive environment that lets the user specify initial models and direct the revision process;
• presents the revised knowledge in some communicable notation that is familiar to domain experts.
This approach holds great potential to aid the modeling of complex systems in science and engineering.

The NPPc Portion of CASA
NPPc = Σ_month max(E · IPAR, 0)
E = 0.56 · T1 · T2 · W
T1 = 0.8 + 0.02 · Topt – 0.0005 · Topt^2
T2 = 1.18 / [(1 + e^(0.2 · (Topt – Tempc – 10))) · (1 + e^(0.3 · (Tempc – Topt – 10)))]
W = 0.5 + 0.5 · EET / PET
PET = 1.6 · (10 · Tempc / AHI)^A · PET-TW-M if Tempc > 0
PET = 0 if Tempc < 0
A = 0.00000068 · AHI^3 – 0.000077 · AHI^2 + 0.018 · AHI + 0.49
IPAR = 0.5 · FPAR-FAS · Monthly-Solar · Sol-Conver
FPAR-FAS = min[(SR-FAS – 1.08) / SR(UMD-VEG), 0.95]
SR-FAS = (Mon-FAS-NDVI + 1000) / (Mon-FAS-NDVI – 1000)
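The upper part of this equation set translates directly into code. The Python sketch below covers NPPc, E, T1, T2, W, PET, and A, and takes IPAR as an input rather than deriving it from the satellite terms. It is a reading of the slide, not the CASA implementation. Two choices are assumptions: PET is set to 0 when Tempc is exactly 0, which the slide leaves undefined, and W falls back to 0.5 when PET is 0, as in the WSNoEvapoTrans process of the carbon production model. All function and parameter names are hypothetical.

```python
import math


def exponent_a(ahi):
    """A as a cubic polynomial in the annual heat index AHI."""
    return 0.00000068 * ahi ** 3 - 0.000077 * ahi ** 2 + 0.018 * ahi + 0.49


def pet(tempc, ahi, pet_tw_m):
    """Potential evapotranspiration; zero at or below freezing (assumed at 0)."""
    if tempc <= 0:
        return 0.0
    return 1.6 * (10 * tempc / ahi) ** exponent_a(ahi) * pet_tw_m


def t1(topt):
    """First temperature stress factor."""
    return 0.8 + 0.02 * topt - 0.0005 * topt ** 2


def t2(topt, tempc):
    """Second temperature stress factor."""
    low = 1 + math.exp(0.2 * (topt - tempc - 10))
    high = 1 + math.exp(0.3 * (tempc - topt - 10))
    return 1.18 / (low * high)


def water_stress(eet, pet_value):
    """W; falls back to 0.5 when PET is zero to avoid dividing by zero."""
    if pet_value == 0:
        return 0.5
    return 0.5 + 0.5 * eet / pet_value


def efficiency(topt, tempc, eet, pet_value):
    """Photosynthetic efficiency E."""
    return 0.56 * t1(topt) * t2(topt, tempc) * water_stress(eet, pet_value)


def nppc(monthly_e_and_ipar):
    """Sum over months of max(E * IPAR, 0); input is a list of (E, IPAR) pairs."""
    return sum(max(e * ipar, 0.0) for e, ipar in monthly_e_and_ipar)
```

For example, `nppc([(efficiency(20, 18, 40, pet(18, 60, 1.0)), 100.0)])` evaluates a single month, with all input values invented for illustration.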

The NPPc Portion of CASA
[Figure: dependency graph over the variables NPPc, E, IPAR, e_max, W, T2, T1, SOLAR, FPAR, A, PET, EET, Topt, SR, AHI, PETTWM, Tempc, NDVI, and VEG.]

History of Research on Computational Scientific Discovery
[Figure: timeline from 1979 to 2000 of discovery systems, with a legend distinguishing numeric laws, qualitative laws, structural models, and process models. Systems shown: Bacon.1–Bacon.5, Abacus, Coper, Fahrenheit, E*, Tetrad, IDSN, Hume, ARC, DST, GPN, LaGrange, SDS, SSF, RF5, AM, Glauber, NGlauber, IDSQ, Live, RL, Progol, HR, Dendral, Dalton, Stahl, Stahlp, Revolver, Gell-Mann, BR-3, Mendel, Pauli, BR-4, IE, Coast, Phineas, AbE, Kekada, Mechem, CDP, Astra, GPM.]
