
Pat Langley, School of Computing and Informatics, Arizona State University, Tempe, Arizona

Challenges for the Computational Discovery of Scientific Knowledge. Pat Langley, School of Computing and Informatics, Arizona State University, Tempe, Arizona; Institute for the Study of Learning and Expertise, Palo Alto, California.


Presentation Transcript


  1. Challenges for the Computational Discovery of Scientific Knowledge. Pat Langley, School of Computing and Informatics, Arizona State University, Tempe, Arizona; Institute for the Study of Learning and Expertise, Palo Alto, California. Thanks to K. Arrigo, D. Billman, M. Bravo, S. Borrett, W. Bridewell, S. Dzeroski, and L. Todorovski for their contributions to this research, which is funded by a grant from the National Science Foundation.

  2. Drawbacks of Scientific Data Mining. Because it borrows from work on commercial applications, most work on scientific data mining:
     - generates models in forms inappropriate to most sciences
     - makes incorrect assumptions about the available inputs
     - focuses on convenient algorithmic issues, not scientists’ needs

     We need to redirect attention toward a broader range of discovery tasks that actually arise in scientific fields. Data-mining researchers would benefit from looking at the older literature on computational scientific discovery.

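  3. Claim 1: Scientific Notations. Traditional data-mining notations are not easily understood by or communicated to domain scientists. Most sciences state and communicate models in formalisms they have used for decades. We need more work on discovering scientific knowledge cast in communicable forms (Dzeroski & Todorovski, 2007).

     [Figure: Gene regulation model, a network of signed (+/-) links among NBLR, NBLA, PBS, DFR, RR, psbA1, psbA2, cpcB, Light, Photo, and Health.]

     Ecosystem model:

     $$
     \begin{aligned}
     \text{NPPc} &= \sum_{\text{month}} \max(E \cdot \text{IPAR},\ 0) \\
     E &= 0.56 \cdot T_1 \cdot T_2 \cdot W \\
     T_1 &= 0.8 + 0.02 \cdot \text{Topt} - 0.0005 \cdot \text{Topt}^2 \\
     T_2 &= \frac{1.18}{\left(1 + e^{0.2 \cdot (\text{Topt} - \text{Tempc} - 10)}\right) \cdot \left(1 + e^{0.3 \cdot (\text{Tempc} - \text{Topt} - 10)}\right)} \\
     W &= 0.5 + 0.5 \cdot \text{EET} / \text{PET} \\
     \text{PET} &= \begin{cases} 1.6 \cdot (10 \cdot \text{Tempc} / \text{AHI})^{A} \cdot \text{PET-TW-M} & \text{if } \text{Tempc} > 0 \\ 0 & \text{if } \text{Tempc} < 0 \end{cases} \\
     A &= 0.00000068 \cdot \text{AHI}^3 - 0.000077 \cdot \text{AHI}^2 + 0.018 \cdot \text{AHI} + 0.49 \\
     \text{IPAR} &= 0.5 \cdot \text{FPAR-FAS} \cdot \text{Monthly-Solar} \cdot \text{Sol-Conver} \\
     \text{FPAR-FAS} &= \min\left[(\text{SR-FAS} - 1.08) / \text{SR}(\text{UMD-VEG}),\ 0.95\right] \\
     \text{SR-FAS} &= (\text{Mon-FAS-NDVI} + 1000) / (\text{Mon-FAS-NDVI} - 1000)
     \end{aligned}
     $$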

  4. Claim 2: Background Knowledge. Scientists often have initial knowledge that should influence the discovery process. Ignoring this knowledge can produce models that scientists reject as nonsensical (Pazzani et al., 2001).

     [Figure: gene regulation network diagrams with the labels Initial model, Observations, Model Revision, and Revised model.]

  5. Claim 3: Small Data Sets. Most data-mining work assumes that large data sets are available. But in many scientific domains, data are rare and hard to obtain. Discovering scientific knowledge from small data sets raises an entirely different set of challenges (Lee et al., 1998). We need more research on this important aspect of discovery.

     Ecosystem model: 8 variables, 11 equations, 20 parameters, 303 samples.
     Gene regulation model: 9 variables, 11 initial links, 70 possible links, 20 samples.

  6. Claim 4: Scientific Explanation. Most work on data mining finds models that, although accurate, merely describe the observations. However, scientists often want models that explain their data using familiar concepts. Explanatory models can include theoretical entities and processes that link back to domain knowledge (Langley et al., 2002).

     [Figure: Ecosystem model, a diagram relating NPPc, E, IPAR, e_max, W, T1, T2, SOLAR, FPAR, A, PET, EET, Topt, SR, AHI, PETTWM, Tempc, NDVI, and VEG. Gene regulation model, a network of signed (+/-) links among NBLR, NBLA, PBS, DFR, RR, psbA1, psbA2, cpcB, Light, Photo, and Health.]

  7. Claim 5: Interactive Discovery. Most data-mining work focuses on entirely automated algorithms. But most scientists want computational aids rather than systems that would replace them. We need more work on interactive discovery (Bridewell et al., 2007).

     [Figure: gene regulation network diagrams with the labels Domain user, Initial model, Observations, Model Revision, and Revised model.]

  8. The PROMETHEUS System (Bridewell et al., 2007)
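The ecosystem model equations on slide 3 can be read as a small program, which may help show why a model stated in the scientists' own notation is easy to inspect and communicate. The following is a minimal Python sketch of the light-use efficiency part of that model (E, T1, T2, W, PET, A) and the NPPc sum. The function names and argument names are illustrative choices, not part of the presentation. The slide leaves PET undefined at Tempc = 0; treating that case as PET = 0 is an assumption made here. IPAR is taken as a given input because SR(UMD-VEG) is a lookup the slide does not spell out.

```python
import math


def t1(topt):
    """Temperature stress term T1 from the slide."""
    return 0.8 + 0.02 * topt - 0.0005 * topt ** 2


def t2(topt, tempc):
    """Temperature stress term T2 from the slide."""
    return 1.18 / (
        (1 + math.exp(0.2 * (topt - tempc - 10)))
        * (1 + math.exp(0.3 * (tempc - topt - 10)))
    )


def a_exponent(ahi):
    """Exponent A, a cubic polynomial in the annual heat index AHI."""
    return 0.00000068 * ahi ** 3 - 0.000077 * ahi ** 2 + 0.018 * ahi + 0.49


def pet(tempc, ahi, pet_tw_m):
    """Potential evapotranspiration PET.

    Assumption: Tempc == 0 is treated like Tempc < 0 (PET = 0),
    since the slide only defines the cases Tempc > 0 and Tempc < 0.
    """
    if tempc <= 0:
        return 0.0
    return 1.6 * (10 * tempc / ahi) ** a_exponent(ahi) * pet_tw_m


def w(eet, pet_value):
    """Water stress term W; requires pet_value != 0."""
    return 0.5 + 0.5 * eet / pet_value


def efficiency(topt, tempc, eet, ahi, pet_tw_m):
    """Light-use efficiency E = 0.56 * T1 * T2 * W."""
    return 0.56 * t1(topt) * t2(topt, tempc) * w(eet, pet(tempc, ahi, pet_tw_m))


def nppc(e_values, ipar_values):
    """NPPc: sum over months of max(E * IPAR, 0)."""
    return sum(max(e * ipar, 0) for e, ipar in zip(e_values, ipar_values))
```

Calling `nppc` with one E value and one IPAR value per month gives the yearly total; the input values themselves would have to come from the data sets the presentation refers to, which are not reproduced here.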
