Pat Langley Institute for the Study of Learning and Expertise Palo Alto, California and

Lessons for the Computational Discovery of Scientific Knowledge. Pat Langley, Institute for the Study of Learning and Expertise, Palo Alto, California, and Center for the Study of Language and Information, Stanford University, Stanford, California. http://www.isle.org/~langley

Presentation Transcript


  1. Lessons for the Computational Discovery of Scientific Knowledge. Pat Langley, Institute for the Study of Learning and Expertise, Palo Alto, California, and Center for the Study of Language and Information, Stanford University, Stanford, California. http://www.isle.org/~langley langley@isle.org Thanks to S. Bay, V. Brooks, S. Klooster, A. Pohorille, C. Potter, K. Saito, J. Shrager, M. Schwabacher, and A. Torregrosa.

  2. Outline of the Talk
     1. History of machine learning applications
     2. Traditional lessons from applied machine learning
     3. History of computational scientific discovery
     4. Two application efforts in scientific discovery
     5. Lessons from these application efforts
     6. Directions for future research

  3. History of Machine Learning Applications
     • Early 1980s: D. Michie et al. champion use of decision-tree induction on industrial problems.
     • During 1980s: Parallel application developments in neural networks and case-based learning.
     • Early 1990s: Initial reviews of machine learning applications.
     • Mid 1993: First workshops on applications of machine learning.
     • Mid 1995: CACM paper analyzes factors underlying success.
     • Mid 1995: KDD conference becomes the default meeting for papers on machine learning applications.
     • Early 1998: Special issue of Machine Learning, with editorial, on applications.

  4. Steps in the Application of Machine Learning
     • Formulating the problem
     • Engineering the representation
     • Collecting and preparing data
     • Induction process
     • Evaluating the learned knowledge
     • Gaining user acceptance

  5. Areas of Machine Learning Applications
     There exist a number of application movements within the field of machine learning:
     • data mining for classification/regression tasks
     • empirical natural language processing
     • applied reinforcement learning
     • adaptive interfaces for personalized services
     • computational scientific discovery
     These types of applications differ in the demands they make and in the issues they raise.

  6. Data Mining vs. Scientific Discovery
     There exist two computational paradigms for discovering explicit knowledge from data:
     • Data mining generates knowledge cast as decision trees, logical rules, or other notations invented by AI researchers.
     • Computational scientific discovery instead uses equations, structural models, reaction pathways, or other formalisms invented by scientists and engineers.
     Both approaches draw on heuristic search to find regularities in data, but they differ considerably in their emphases.

  7. History of Research on Computational Scientific Discovery
     [Figure: timeline from 1979 to 2000 of discovery systems, with a legend grouping them into numeric laws, qualitative laws, structural models, and process models. Systems shown: Bacon.1–Bacon.5, Abacus, Coper, Fahrenheit, E*, Tetrad, IDSN, Hume, ARC, DST, GPN, LaGrange, SDS, SSF, RF5, LaGramge, AM, Glauber, NGlauber, IDSQ, Live, RL, Progol, HR, Dendral, Dalton, Stahl, Stahlp, Revolver, Gell-Mann, BR-3, Mendel, Pauli, BR-4, IE, Coast, Phineas, AbE, Kekada, Mechem, CDP, Astra, GPM.]

  8. Successes of Computational Scientific Discovery
     Over the past decade, systems of this type have helped discover new knowledge in many scientific fields:
     • stellar taxonomies from infrared spectra (Cheeseman et al., 1989)
     • qualitative chemical factors in mutagenesis (King et al., 1996)
     • quantitative laws of metallic behavior (Sleeman et al., 1997)
     • qualitative conjectures in number theory (Colton et al., 2000)
     • temporal laws of ecological behavior (Todorovski et al., 2000)
     • reaction pathways in catalytic chemistry (Valdes-Perez, 1994, 1997)
     Each of these has led to publications in the refereed literature of the relevant scientific field (see Langley, 2000).

  9. Steps in Applying Computational Scientific Discovery
     • problem formulation
     • representation engineering
     • algorithm manipulation
     • data collection/manipulation
     • algorithm invocation
     • filtering and interpretation

  10. Two Applications for Scientific Discovery
     • Given: data on climate variables and carbon production over space and time. Find: a model of the Earth’s ecosystem that fits and explains these data.
     • Given: gene expression levels, over time, for wild and mutant organisms. Find: a model of gene regulation that fits and explains these data.

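  11. Lesson 1: Traditional notations from machine learning are not communicated easily to domain scientists.

     Ecosystem model:

     \[
     \begin{aligned}
     \mathrm{NPPc} &= \sum_{\mathrm{month}} \max(E \cdot \mathrm{IPAR},\ 0) \\
     E &= 0.56 \cdot T1 \cdot T2 \cdot W \\
     T1 &= 0.8 + 0.02 \cdot \mathrm{Topt} - 0.0005 \cdot \mathrm{Topt}^{2} \\
     T2 &= 1.18 \,/\, \left[ \left(1 + e^{0.2 \cdot (\mathrm{Topt} - \mathrm{Tempc} - 10)}\right) \cdot \left(1 + e^{0.3 \cdot (\mathrm{Tempc} - \mathrm{Topt} - 10)}\right) \right] \\
     W &= 0.5 + 0.5 \cdot \mathrm{EET} / \mathrm{PET} \\
     \mathrm{PET} &= \begin{cases} 1.6 \cdot (10 \cdot \mathrm{Tempc} / \mathrm{AHI})^{A} \cdot \text{PET-TW-M} & \text{if } \mathrm{Tempc} > 0 \\ 0 & \text{if } \mathrm{Tempc} < 0 \end{cases} \\
     A &= 0.00000068 \cdot \mathrm{AHI}^{3} - 0.000077 \cdot \mathrm{AHI}^{2} + 0.018 \cdot \mathrm{AHI} + 0.49 \\
     \mathrm{IPAR} &= 0.5 \cdot \text{FPAR-FAS} \cdot \text{Monthly-Solar} \cdot \text{Sol-Conver} \\
     \text{FPAR-FAS} &= \min\left[ (\text{SR-FAS} - 1.08) \,/\, \mathrm{SR}(\text{UMD-VEG}),\ 0.95 \right] \\
     \text{SR-FAS} &= (\text{Mon-FAS-NDVI} + 1000) \,/\, (\text{Mon-FAS-NDVI} - 1000)
     \end{aligned}
     \]

     Gene regulation model: [Figure: network with links signed + or - over the nodes Light, NBLR, NBLA, PBS, DFR, RR, psbA1, psbA2, cpcB, Health, and Photo.]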

  12. Lesson 2: Scientists often have initial models that should influence the discovery process.
     [Figure: an initial model and observations feed a discovery step that produces a revised model. Illustrated with two gene regulation networks (links signed + or -) over the nodes Light, NBLR, NBLA, PBS, DFR, RR, psbA1, psbA2, cpcB, Health, and Photo; two links are marked with ×.]

  13. Lesson 3: Scientific data are often rare and difficult to obtain rather than being plentiful.
     Ecosystem model:
     • Number of variables: 8
     • Number of equations: 11
     • Number of parameters: 20
     • Number of samples: 303
     Gene regulation model:
     • Number of variables: 9
     • Number of initial links: 11
     • Number of possible links: 70
     • Number of samples: 20

  14. Lesson 4: Scientists want models that move beyond description to provide explanations of their data.
     Ecosystem model: [Figure: dependency graph over NPPc, E, IPAR, e_max, W, T2, T1, SOLAR, FPAR, A, PET, EET, Topt, SR, AHI, PETTWM, Tempc, NDVI, and VEG.]
     Gene regulation model: [Figure: network with links signed + or - over the nodes Light, NBLR, NBLA, PBS, DFR, RR, psbA1, psbA2, cpcB, Health, and Photo.]

  15. Lesson 5: Scientists want computational assistance rather than automated discovery systems.
     [Figure: an initial model and observations feed a discovery step that produces a revised model. Illustrated with two gene regulation networks (links signed + or -) over the nodes Light, NBLR, NBLA, PBS, DFR, RR, psbA1, psbA2, cpcB, Health, and Photo; two links are marked with ×.]

  16. An Environment for Interactive Modeling
     In response, we are developing an environment that lets users:
     • specify process models of static and dynamic systems;
     • display and edit a model’s structure and details graphically;
     • utilize a model to simulate a system’s behavior over time;
     • incorporate background knowledge cast as generic processes;
     • indicate which processes to consider during model revision;
     • invoke a revision module that improves a model’s fit to data.
     The current environment focuses on quantitative processes, but future versions will also support qualitative models.
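One narrow reading of "a revision module that improves a model's fit to data" is re-estimating a single numeric parameter while the model's structure stays fixed. The Python sketch below illustrates only that idea, for a model of the form y = c * x, using the closed-form least-squares estimate. It is an assumption-laden illustration, not the environment's actual revision method: the function names `refit_scale` and `sum_squared_error`, the initial value 0.8, and the data are all hypothetical and synthetic.

```python
def sum_squared_error(c, xs, ys):
    """Sum of squared residuals for the one-parameter model y = c * x."""
    return sum((y - c * x) ** 2 for x, y in zip(xs, ys))


def refit_scale(xs, ys):
    """Closed-form least-squares estimate of c in y = c * x.

    Hypothetical helper: minimizes sum((y - c*x)^2), giving
    c = sum(x*y) / sum(x*x).
    """
    denominator = sum(x * x for x in xs)
    if denominator == 0:
        raise ValueError("cannot fit a scale factor when every x is zero")
    return sum(x * y for x, y in zip(xs, ys)) / denominator


if __name__ == "__main__":
    # Synthetic observations generated from y = 0.5 * x (not real data).
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [0.5 * x for x in xs]

    initial_c = 0.8  # hypothetical value from an initial model
    revised_c = refit_scale(xs, ys)

    print("initial error:", sum_squared_error(initial_c, xs, ys))
    print("revised c:", revised_c)
    print("revised error:", sum_squared_error(revised_c, xs, ys))
```

Revising structure (adding or removing processes, as the list above also mentions) would need a search over candidate models, which this sketch does not attempt.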

  17. A Process Model for Carbon Production

     model npp;
     variables NPPc, E, IPAR, T1, T2, W, Topt, tempc, eet, PET, PETTWM, ahi, A, FPARFAS, monthlySolar, SolConver, MONFASNDVI, umd_veg;
     observable ahi, eet, tempc, Topt, MONFASNDVI, monthlySolar, PETTWM, umd_veg;

     process CarbonProd;
       equations NPPc = E * IPAR;
     process PhotoEfficiency;
       equations E = (0.389 * (T1 * (T2 * W)));
     process TempStress1;
       equations T1 = (0.8 + ((0.02 * Topt) - (0.0005 * (Topt ^ 2))));
     process TempStress2;
       equations T2 = ((1.1814 / (1 + (2.718281828 ^ (0.2 * (Topt - 10 - tempc))))) / (1 + (2.718281828 ^ (0.3 * (tempc - 10 - Topt)))));
     process WaterStress;
       conditions PET!=0;
       equations W = (0.5 + (0.5 * (eet / PET)));
     process WSNoEvapoTrans;
       conditions PET==0;
       equations W = 0.5;
     process EvapoTrans;
       conditions tempc>0;
       equations PET = 1.6 * (10 * tempc / ahi) ^ A * PETTWM;
     • • •
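To make the conditional processes concrete, here is a minimal Python sketch that evaluates the equations listed on this slide for a single sample. It is a hand translation, not part of the modeling environment. Several things are assumptions: the Python function names are invented for illustration; `ipar` and `a` are passed in as inputs because the slide elides the processes that compute them; and the PET = 0 branch for non-positive temperatures is taken from the CASA equations on other slides rather than from this listing, which shows only the `tempc>0` case.

```python
import math


def temp_stress1(topt):
    # process TempStress1
    return 0.8 + 0.02 * topt - 0.0005 * topt ** 2


def temp_stress2(topt, tempc):
    # process TempStress2 (math.e stands in for the literal 2.718281828)
    low = 1 + math.e ** (0.2 * (topt - 10 - tempc))
    high = 1 + math.e ** (0.3 * (tempc - 10 - topt))
    return (1.1814 / low) / high


def evapo_trans(tempc, ahi, a, pettwm):
    # process EvapoTrans applies only when tempc > 0.
    # Assumption: PET = 0 otherwise, following the CASA equations elsewhere.
    if tempc > 0:
        return 1.6 * (10 * tempc / ahi) ** a * pettwm
    return 0.0


def water_stress(eet, pet):
    # process WaterStress (PET != 0) and WSNoEvapoTrans (PET == 0)
    if pet != 0:
        return 0.5 + 0.5 * (eet / pet)
    return 0.5


def carbon_prod(ipar, topt, tempc, eet, ahi, a, pettwm):
    # processes PhotoEfficiency and CarbonProd
    pet = evapo_trans(tempc, ahi, a, pettwm)
    e = 0.389 * (temp_stress1(topt) * (temp_stress2(topt, tempc) * water_stress(eet, pet)))
    return e * ipar


if __name__ == "__main__":
    # Arbitrary illustrative inputs, not measurements.
    print(carbon_prod(ipar=100.0, topt=20.0, tempc=20.0, eet=5.0,
                      ahi=100.0, a=1.0, pettwm=2.0))
```

Each `conditions` clause in the process model becomes an ordinary branch here, which is one way to see how the two water-stress processes partition the cases PET != 0 and PET == 0.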

  18. Viewing and Editing a Process Model

  19. Directions for Future Research
     These lessons suggest the field needs increased research on:
     • methods for discovering knowledge in scientific formalisms
     • techniques for revising existing scientific models
     • approaches to dealing with small data sets
     • algorithms for discovering explanatory models
     • interactive environments for scientific knowledge discovery
     Taken together, these emphases should address the needs of domain scientists and produce interesting new methods.

  20. In Memoriam
     Early last year, computational scientific discovery lost two of its founding fathers: Herbert A. Simon (1916 – 2001) and Jan M. Zytkow (1945 – 2001). Both contributed to the field in many ways: posing new problems, inventing methods, training students, and organizing meetings. Moreover, both were interdisciplinary researchers who contributed to computer science, psychology, philosophy, and statistics. Herb Simon and Jan Zytkow were excellent role models whom we should all aim to emulate.

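  21. The NPPc Portion of CASA

     \[
     \begin{aligned}
     \mathrm{NPPc} &= \sum_{\mathrm{month}} \max(E \cdot \mathrm{IPAR},\ 0) \\
     E &= 0.56 \cdot T1 \cdot T2 \cdot W \\
     T1 &= 0.8 + 0.02 \cdot \mathrm{Topt} - 0.0005 \cdot \mathrm{Topt}^{2} \\
     T2 &= 1.18 \,/\, \left[ \left(1 + e^{0.2 \cdot (\mathrm{Topt} - \mathrm{Tempc} - 10)}\right) \cdot \left(1 + e^{0.3 \cdot (\mathrm{Tempc} - \mathrm{Topt} - 10)}\right) \right] \\
     W &= 0.5 + 0.5 \cdot \mathrm{EET} / \mathrm{PET} \\
     \mathrm{PET} &= \begin{cases} 1.6 \cdot (10 \cdot \mathrm{Tempc} / \mathrm{AHI})^{A} \cdot \text{PET-TW-M} & \text{if } \mathrm{Tempc} > 0 \\ 0 & \text{if } \mathrm{Tempc} < 0 \end{cases} \\
     A &= 0.00000068 \cdot \mathrm{AHI}^{3} - 0.000077 \cdot \mathrm{AHI}^{2} + 0.018 \cdot \mathrm{AHI} + 0.49 \\
     \mathrm{IPAR} &= 0.5 \cdot \text{FPAR-FAS} \cdot \text{Monthly-Solar} \cdot \text{Sol-Conver} \\
     \text{FPAR-FAS} &= \min\left[ (\text{SR-FAS} - 1.08) \,/\, \mathrm{SR}(\text{UMD-VEG}),\ 0.95 \right] \\
     \text{SR-FAS} &= (\text{Mon-FAS-NDVI} + 1000) \,/\, (\text{Mon-FAS-NDVI} - 1000)
     \end{aligned}
     \]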

  22. The NPPc Portion of CASA
     [Figure: dependency graph over NPPc, E, IPAR, e_max, W, T2, T1, SOLAR, FPAR, A, PET, EET, Topt, SR, AHI, PETTWM, Tempc, NDVI, and VEG.]

  23. A Model of Photosynthesis Regulation
     How do plants modify their photosynthetic apparatus in high light?
     [Figure: network with links signed + or - over the nodes Light, NBLR, NBLA, PBS, DFR, RR, psbA1, psbA2, cpcB, Health, and Photo.]
