Nima Asgharbeygi, Pat Langley, Stephen Bay Center for the Study of Language and Information Stanford University Kevin


### Data Mining vs. Scientific Discovery

### A Model of Ross Sea Ecosystem

### Inductive Revision of Ecosystem Models

### A Space of Ecosystem Models

### Phytoplankton Loss in Ross Sea Ecosystem

### Grazing in the Ross Sea Ecosystem

### Process Model of Ross Sea Ecosystem

### Inductive Revision of Process Models

### Generic Processes for Aquatic Ecosystems

### A Method for Process Model Revision

### Revised Model of Ross Sea Ecosystem

### Initial Results on Ross Sea Training Data

### Initial Results on Ross Sea Test Data

### Revised Results on Ross Sea Test Data

### Interfacing with Scientists

### Intellectual Influences

### Directions for Future Research

### Contributions of the Research

### The Challenge of Systems Science

### Why Are Process Models Interesting?

### Advantages of Quantitative Process Models

### Inductive Process Modeling

### Challenges of Inductive Process Modeling

### Generating Predictions and Explanations

### Generic Processes as Background Knowledge

### Estimating Parameters in Process Models

### A Process Model for an Aquatic Ecosystem

### Generic Processes for Aquatic Ecosystems

### Inductive Process Modeling

### The NPPc Portion of CASA

### Results of Revising the NPP Model

### Generic Processes for Photosynthesis Regulation

### A Process Model for Photosynthetic Regulation

Ecological Process Models

Nima Asgharbeygi, Pat Langley, Stephen Bay

Center for the Study of Language and Information

Stanford University

Kevin Arrigo

Department of Geophysics

Stanford University

Thanks to S. Dzeroski, J. Sanchez, K. Saito, J. Shrager, and L. Todorovski for their contributions to this research, which is funded by the US National Science Foundation.

There exist two computational paradigms for discovering explicit knowledge from data.

The data mining movement develops computational methods that:

- induce predictive models from large (often business) data sets;
- represent models in notations invented by AI researchers.

In contrast, computational scientific discovery focuses on:

- constructing models from (often small) scientific data sets;
- stating them in formalisms invented by scientists themselves.

This talk focuses on applications of the second framework to environmental and ecosystem modeling.

model RossSeaEcosystem

variables: phyto, zoo, nitro, residue

observables: phyto, nitro

d[phyto,t,1] = −0.307 × phyto − 0.495 × zoo + 0.411 × phyto

d[zoo,t,1] = −0.251 × zoo + 0.615 × 0.495 × zoo

d[residue,t,1] = 0.307 × phyto + 0.251 × zoo + 0.385 × 0.495 × zoo − 0.005 × residue

d[nitro,t,1] = −0.098 × 0.411 × phyto + 0.005 × residue

[Figure: the initial RossSeaEcosystem model shown above passes through a Revision step to yield a revised model.]

Model revision requires ways to constrain search through this space.


Phytoplankton loss is a process that affects two variables; no model should include one influence without the other.


We can view an ecosystem model as a set of processes that provide an alternative way to encode its assumptions.

variables: phyto, zoo, nitro, residue

observables: phyto, nitro

process phyto_loss

equations: d[phyto,t,1] = −0.307 × phyto

d[residue,t,1] = 0.307 × phyto

process zoo_loss

equations: d[zoo,t,1] = −0.251 × zoo

d[residue,t,1] = 0.251 × zoo

process zoo_phyto_grazing

equations: d[zoo,t,1] = 0.615 × 0.495 × zoo

d[residue,t,1] = 0.385 × 0.495 × zoo

d[phyto,t,1] = −0.495 × zoo

process nitro_uptake

equations: d[phyto,t,1] = 0.411 × phyto

d[nitro,t,1] = −0.098 × 0.411 × phyto

process nitro_remineralization

equations: d[nitro,t,1] = 0.005 × residue

d[residue,t,1] = −0.005 × residue
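The correspondence between the process view and the equation view can be sketched in Python. This is only an illustration: the signs are an assumption (losses negative, gains positive), the state values are hypothetical, and the names `PROCESSES` and `derivatives` are not from the original slides.

```python
# Each process maps the current state to its contributions to d[var]/dt.
# Signs are an assumption: a loss is negative, the matching gain is positive.
PROCESSES = {
    "phyto_loss": lambda s: {
        "phyto": -0.307 * s["phyto"],
        "residue": 0.307 * s["phyto"],
    },
    "zoo_loss": lambda s: {
        "zoo": -0.251 * s["zoo"],
        "residue": 0.251 * s["zoo"],
    },
    "zoo_phyto_grazing": lambda s: {
        "zoo": 0.615 * 0.495 * s["zoo"],
        "residue": 0.385 * 0.495 * s["zoo"],
        "phyto": -0.495 * s["zoo"],
    },
    "nitro_uptake": lambda s: {
        "phyto": 0.411 * s["phyto"],
        "nitro": -0.098 * 0.411 * s["phyto"],
    },
    "nitro_remineralization": lambda s: {
        "nitro": 0.005 * s["residue"],
        "residue": -0.005 * s["residue"],
    },
}


def derivatives(state, processes=PROCESSES):
    """Sum the contributions of every process to each variable's rate of change."""
    rates = {name: 0.0 for name in state}
    for contribution in processes.values():
        for var, value in contribution(state).items():
            rates[var] += value
    return rates


# Hypothetical state, chosen only to exercise the code.
state = {"phyto": 1.0, "zoo": 0.5, "nitro": 10.0, "residue": 0.0}
rates = derivatives(state)

# Dropping a process removes all of its influences at once.
reduced = {k: v for k, v in PROCESSES.items() if k != "phyto_loss"}
reduced_rates = derivatives(state, reduced)
```

Because each process carries all of its equations, deleting `phyto_loss` removes both the loss from `phyto` and the gain to `residue`, which is the constraint on the model space described earlier.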


process exponential_growth

variables: P {population}

equations: d[P,t] = [0, 1, ∞] × P

process logistic_growth

variables: P {population}

equations: d[P,t] = [0, 1, ∞] × P × (1 − P / [0, 1, ∞])

process constant_inflow

variables: I {inorganic_nutrient}

equations: d[I,t] = [0, 1, ∞]

process consumption

variables: P1 {population}, P2 {population}, nutrient_P2

equations: d[P1,t] = [0, 1, ∞] × P1 × nutrient_P2,

d[P2,t] = −[0, 1, ∞] × P1 × nutrient_P2

process no_saturation

variables: P {number}, nutrient_P {number}

equations: nutrient_P = P

process saturation

variables: P {number}, nutrient_P {number}

equations: nutrient_P = P / (P + [0, 1, ∞])

[Figure: Revision takes the initial RossSeaEcosystem model and a set of generic processes as input and produces the revised model.]

generic process exponential_loss

variables: S{species}, D{detritus}

parameters: α [0, 1]

equations: d[S,t,1] = −1 × α × S

d[D,t,1] = α × S

generic process remineralization

variables: N{nutrient}, D{detritus}

parameters: π [0, 1]

equations: d[N,t,1] = π × D

d[D,t,1] = −1 × π × D

generic process grazing

variables: S1{species}, S2{species}, D{detritus}

parameters: ρ [0, 1], γ [0, 1]

equations: d[S1,t,1] = γ × ρ × S1

d[D,t,1] = (1 − γ) × ρ × S1

d[S2,t,1] = −1 × ρ × S1

generic process constant_inflow

variables: N{nutrient}

parameters: ν [0, 1]

equations: d[N,t,1] = ν

generic process nutrient_uptake

variables: S{species}, N{nutrient}

parameters: τ [0, ∞], β [0, 1], μ [0, 1]

conditions: N > τ

equations: d[S,t,1] = μ × S

d[N,t,1] = −1 × β × μ × S

We have implemented RPM, an algorithm that revises an initial process model in four main stages:

1. Find all ways to instantiate available generic processes with specific variables, subject to type constraints;

2. Generate candidate model structures by deleting the current processes and adding new ones, subject to complexity limits;

3. For each generic model structure, carry out a search through parameter space to find good coefficients [difficult];

4. Return a list of revised models ordered by their overall scores.

The evaluation metric can be squared error or description length based on error and distance from the initial model.
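Stage 2 and the final ranking can be illustrated with a minimal Python sketch. It is not the RPM implementation: the function names, the process names, the limits, and the toy score (distance from the initial model only) are all assumptions, and parameter fitting (stage 3) is left out.

```python
from itertools import combinations


def candidate_structures(initial, pool, max_deletions=1, max_additions=1):
    """Yield process sets reachable by deleting and adding a few processes."""
    initial = frozenset(initial)
    addable = sorted(set(pool) - initial)
    for n_del in range(max_deletions + 1):
        for removed in combinations(sorted(initial), n_del):
            for n_add in range(max_additions + 1):
                for added in combinations(addable, n_add):
                    yield (initial - set(removed)) | set(added)


def revise(initial, pool, score, **limits):
    """Return candidate structures ordered by score (lower is better)."""
    return sorted(candidate_structures(initial, pool, **limits), key=score)


# Hypothetical instantiated processes, named only for illustration.
initial = {"phyto_loss", "zoo_loss", "grazing"}
pool = initial | {"nutrient_uptake", "remineralization"}


def toy_score(structure):
    # Stand-in for the real metric: counts processes added or removed.
    return len(structure ^ frozenset(initial))


ranked = revise(initial, pool, toy_score)
```

With at most one deletion and one addition, three initial processes and two new ones give (1 + 3) × (1 + 2) = 12 candidate structures, which shows how quickly the space grows and why complexity limits matter.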

model RossSeaEcosystem

variables: phyto, zoo, nitro, residue, light, G, growth_rate, nitro_rate, light_rate

observables: phyto, nitro, light

d[phyto,t,1] = −0.307 × phyto − G × zoo + growth_rate × phyto

d[zoo,t,1] = 0.615 × G × zoo

d[residue,t,1] = 0.307 × phyto + 0.385 × G × zoo − 0.083 × residue

d[nitro,t,1] = −1 × n_to_c × growth_rate × phyto + 0.083 × n_to_c × residue

G = 0.415 × (1 − exp(−1 × 0.27 × phyto))

growth_rate = r_max × min(nitro_rate, light_rate)

nitro_rate = nitro / (nitro + 4.33)

light_rate = light / (light + 11.67)

n_to_c = 0.251, r_max = 0.194, remin_rate = 0.0676

The best revised model reproduces the observations quite well.

But the model predicts nearly the same behavior for both years.

Refitting initial values for zooplankton gives better generalization.

Because few scientists want to be replaced, we are developing PROMETHEUS, an interactive environment that lets users:

specify a quantitative process model of the target system;

display and edit the model’s structure and details graphically;

simulate the model’s behavior over time and situations;

compare the model’s predicted behavior to observations;

invoke a revision module in response to detected anomalies.

The environment offers computational assistance in forming and evaluating models but lets the user retain control.

Our approach to computational discovery incorporates ideas from many traditions:

- computational scientific discovery (e.g., Langley et al., 1983);
- theory revision in machine learning (e.g., Towell, 1991);
- qualitative physics and simulation (e.g., Forbus, 1984);
- languages for scientific simulation (e.g., STELLA, MATLAB);
- interactive tools for data analysis (e.g., Shneiderman, 2001).

Our work combines ideas from machine learning, AI, programming languages, and human-computer interaction.

Despite our progress to date, we need further work in order to:

produce additional results on other ecosystem modeling tasks;

develop improved methods for fitting model parameters;

implement heuristic methods for searching the structure space;

utilize knowledge of subsystems to further constrain search;

augment the modeling environment to make it more usable.

Process modeling has great potential to aid model development in environmental science.

In summary, our work on computational discovery has produced:

a new formalism for representing scientific process models;

an encoding for background knowledge as generic processes;

an algorithm for revising process models with time-series data;

an interactive environment for model construction/utilization.

We have demonstrated this approach to model revision on both an ecosystem modeling task and an environmental domain.

The PROMETHEUS modeling/revision environment is available at:

http://www.isle.org/process.html

Disciplines like Earth science differ from traditional disciplines by:

focusing on synthesis rather than analysis in their operation;

using computer modeling as one of their central methods;

developing system-level models with many variables / relations;

evaluating models on observational, not experimental, data.

Constructing such models is a complex task that would benefit from computational aids, but existing methods are insufficient.

Process models are a crucial target for machine learning because:

they incorporate scientific formalisms rather than AI notations, which are easily communicable to scientists and engineers;

they move beyond descriptive generalization to explanation, while retaining the modularity needed to support induction.

These reasons point to process models as an ideal representation for scientific and engineering knowledge.

Process models are an important alternative to formalisms used currently in machine learning.

Process models offer scientists a promising framework because:

they embed quantitative relations within qualitative structure, referring to notations and mechanisms familiar to experts;

they provide dynamical predictions of changes over time;

they offer causal and explanatory accounts of phenomena, while retaining the modularity needed to support induction.

Quantitative process models provide an important alternative to formalisms used currently in ecosystem modeling.

Our response is to design, construct, and evaluate computational methods for inductive process modeling, which:

represent scientific models as sets of quantitative processes;

use these models to predict and explain observational data;

search a space of process models to find good candidates;

utilize background knowledge to constrain this search.

This framework has great potential to aid environmental science, but it raises new computational challenges.

Process model induction differs from typical learning tasks in that:

process models characterize behavior of dynamical systems;

variables are continuous but can have discontinuous behavior;

observations are not independently and identically distributed;

models may contain unobservable processes and variables;

multiple processes can interact to produce complex behavior.

Compensating factors include a focus on deterministic systems and the availability of background knowledge.

To utilize or evaluate a given process model, we must simulate its behavior over time:

specify initial values for input variables and time step size;

on each time step, determine which processes are active;

solve active algebraic/differential equations with known values;

propagate values and recursively solve other active equations;

when multiple processes influence the same variable, assume their effects are additive.

This performance method makes specific predictions that we can compare to observations.
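The simulation steps listed above can be sketched as a small Python loop. Forward Euler integration, the `(condition, equations)` process encoding, and all coefficients here are assumptions made for illustration. Algebraic equations are omitted to keep the sketch short.

```python
def simulate(state, processes, dt, steps):
    """Forward-Euler simulation of a process model.

    `processes` is a list of (condition, equations) pairs. On each step,
    every process whose condition holds contributes to the derivatives,
    and contributions to the same variable are added together.
    """
    trajectory = [dict(state)]
    for _ in range(steps):
        rates = {var: 0.0 for var in state}
        for condition, equations in processes:
            if condition(state):  # determine which processes are active
                for var, value in equations(state).items():
                    rates[var] += value  # additive influences
        state = {var: state[var] + dt * rates[var] for var in state}
        trajectory.append(state)
    return trajectory


# Hypothetical two-process model: a loss that is always active and an
# uptake process that is active only while nitro is available.
processes = [
    (lambda s: True, lambda s: {"phyto": -0.1 * s["phyto"]}),
    (lambda s: s["nitro"] > 0,
     lambda s: {"phyto": 0.2 * s["phyto"], "nitro": -0.05 * s["phyto"]}),
]

trajectory = simulate({"phyto": 1.0, "nitro": 0.05}, processes, dt=1.0, steps=2)
```

In this run the uptake process is active on the first step and exhausts `nitro`, so only the loss process fires on the second step. The resulting trajectory is what would be compared against observations.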

Our framework casts background knowledge as generic processes that specify:

the variables involved in a process and their types;

the parameters appearing in a process and their ranges;

the forms of conditions on the process; and

the forms of associated equations and their parameters.

Generic processes are building blocks from which one can compose a specific process model.
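As an illustration of how typed variables constrain instantiation, the following Python sketch enumerates the bindings of a generic process to concrete variables. The class and function names are hypothetical, and the assignment of types to `phyto`, `zoo`, `nitro`, and `residue` is an assumption. Parameters, conditions, and equations are left out.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class GenericProcess:
    name: str
    slots: tuple  # (slot_name, required_type) pairs


def instantiations(generic, typed_variables):
    """Yield every binding of distinct variables to slots that respects types."""
    slot_types = [required for _, required in generic.slots]
    for combo in permutations(typed_variables, len(slot_types)):
        if all(typed_variables[v] == t for v, t in zip(combo, slot_types)):
            yield {slot: var for (slot, _), var in zip(generic.slots, combo)}


# Assumed typing of the model variables.
variables = {
    "phyto": "species",
    "zoo": "species",
    "nitro": "nutrient",
    "residue": "detritus",
}

grazing = GenericProcess(
    "grazing", (("S1", "species"), ("S2", "species"), ("D", "detritus"))
)
bindings = list(instantiations(grazing, variables))
```

Here the type constraints leave only two bindings for `grazing` (zooplankton grazing on phytoplankton, or the reverse) out of the 24 ordered triples of four variables.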

To estimate the parameters for each generic model structure, the IPM algorithm:

1. Selects random initial values that fall within ranges specified in the generic processes;

2. Improves these parameters using the Levenberg-Marquardt method until it reaches a local optimum;

3. Generates new candidate values through random jumps along dimensions of the parameter vector and continues the search;

4. Restarts the search from a new random initial point if no improvement occurs after N jumps.

This multi-level method gives reasonable fits to time-series data from a number of domains, but it is computationally intensive.
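The four steps can be sketched in pure Python as below. This is a simplified stand-in, not the IPM code: it uses a small hand-written Levenberg-Marquardt loop with a finite-difference Jacobian, clips parameters to their ranges, and fits a hypothetical two-parameter decay curve rather than a simulated process model. All names and settings are assumptions.

```python
import math
import random


def sse(residuals):
    return sum(r * r for r in residuals)


def jacobian(f, p, eps=1e-6):
    """Forward-difference Jacobian; cols[j][i] approximates d r_i / d p_j."""
    base = f(p)
    cols = []
    for j in range(len(p)):
        q = list(p)
        q[j] += eps
        cols.append([(a - b) / eps for a, b in zip(f(q), base)])
    return base, cols


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def levenberg_marquardt(f, p, bounds, iters=100, lam=1e-2):
    """Damped Gauss-Newton descent to a local optimum, clipped to bounds."""
    best = sse(f(p))
    n = len(p)
    for _ in range(iters):
        r, cols = jacobian(f, p)
        jtj = [[sum(x * y for x, y in zip(cols[a], cols[b])) for b in range(n)]
               for a in range(n)]
        grad = [sum(x * y for x, y in zip(cols[a], r)) for a in range(n)]
        for a in range(n):
            jtj[a][a] += lam * (jtj[a][a] + 1e-12)  # Marquardt damping
        step = solve(jtj, [-g for g in grad])
        trial = [min(max(v + s, lo), hi)
                 for v, s, (lo, hi) in zip(p, step, bounds)]
        err = sse(f(trial))
        if err < best:
            p, best, lam = trial, err, lam / 10
        else:
            lam *= 10
    return p, best


def fit(f, bounds, restarts=5, jumps=10, seed=0):
    rng = random.Random(seed)
    best_p, best_err = None, math.inf
    for _ in range(restarts):
        p = [rng.uniform(lo, hi) for lo, hi in bounds]  # step 1
        p, err = levenberg_marquardt(f, p, bounds)      # step 2
        failures = 0
        while failures < jumps:                         # steps 3 and 4
            q = list(p)
            j = rng.randrange(len(q))
            q[j] = rng.uniform(*bounds[j])              # jump along one dimension
            q, q_err = levenberg_marquardt(f, q, bounds)
            if q_err < err - 1e-12:
                p, err, failures = q, q_err, 0
            else:
                failures += 1
        if err < best_err:
            best_p, best_err = p, err
    return best_p, best_err


# Hypothetical data from x(t) = 2.0 * exp(-0.3 * t).
times = range(10)
data = [2.0 * math.exp(-0.3 * t) for t in times]


def residuals(p):
    return [p[0] * math.exp(-p[1] * t) - y for t, y in zip(times, data)]


params, error = fit(residuals, bounds=[(0.0, 5.0), (0.0, 1.0)])
```

Even on this tiny problem the nested loops run the local optimizer dozens of times, which gives a sense of why the method is computationally intensive when each residual evaluation requires simulating a full model.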

variables: phyto, nitro, residue, light, growth_rate, effective_light, ice_factor

observables: phyto, nitro, light, ice_factor

process phyto_loss

equations: d[phyto,t,1] = −0.1 × phyto

d[residue,t,1] = 0.1 × phyto

process phyto_growth

equations: d[phyto,t,1] = growth_rate × phyto

process phyto_uptakes_nitro

conditions: nitro > 0

equations: d[nitro,t,1] = −1 × 0.204 × growth_rate × phyto

process growth_limitation

equations: growth_rate = 0.23 × min(nitrate_rate, light_rate)

process nitrate_availability

equations: nitrate_rate = nitrate / (nitrate + 5)

process light_availability

equations: light_rate = effective_light / (effective_light + 50)

process light_attenuation

equations: effective_light = light × ice_factor


process model

model AquaticEcosystem

variables: nitro, phyto, zoo, nutrient_nitro, nutrient_phyto

observables: nitro, phyto, zoo

process phyto_exponential_growth

equations: d[phyto,t] = 0.1 × phyto

process zoo_logistic_growth

equations: d[zoo,t] = 0.1 × zoo × (1 − zoo / 1.5)

process phyto_nitro_consumption

equations: d[nitro,t] = −1 × phyto × nutrient_nitro,

d[phyto,t] = 1 × phyto × nutrient_nitro

process phyto_nitro_no_saturation

equations: nutrient_nitro = nitro

process zoo_phyto_consumption

equations: d[phyto,t] = −1 × zoo × nutrient_phyto,

d[zoo,t] = 1 × zoo × nutrient_phyto

process zoo_phyto_saturation

equations: nutrient_phyto = phyto / (phyto + 0.5)

[Figure: Induction combines the generic processes listed earlier (exponential_growth, logistic_growth, constant_inflow, consumption, no_saturation, saturation) into the AquaticEcosystem process model above.]

NPPc = Σ_month max(E · IPAR, 0)

E = 0.56 · T1 · T2 · W

T1 = 0.8 + 0.02 · Topt – 0.0005 · Topt^2

T2 = 1.18 / [(1 + e^(0.2 · (Topt – Tempc – 10))) · (1 + e^(0.3 · (Tempc – Topt – 10)))]

W = 0.5 + 0.5 · EET / PET

PET = 1.6 · (10 · Tempc / AHI)^A · PET-TW-M if Tempc > 0

PET = 0 if Tempc < 0

A = 0.00000068 · AHI^3 – 0.000077 · AHI^2 + 0.018 · AHI + 0.49

IPAR = 0.5 · FPAR-FAS · Monthly-Solar · Sol-Conver

FPAR-FAS = min[(SR-FAS – 1.08) / SR(UMD-VEG), 0.95]

SR-FAS = (Mon-FAS-NDVI + 1000) / (Mon-FAS-NDVI – 1000)
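The efficiency term and the monthly sum can be written as a short Python sketch. It assumes the flattened `e0.2 · (...)` terms are exponentials and `Topt2` is a square, and the function names and input values are hypothetical. It does not handle the `PET = 0` case, where `W` would divide by zero.

```python
import math


def t1(topt):
    return 0.8 + 0.02 * topt - 0.0005 * topt ** 2


def t2(topt, tempc):
    return 1.18 / (
        (1 + math.exp(0.2 * (topt - tempc - 10)))
        * (1 + math.exp(0.3 * (tempc - topt - 10)))
    )


def w(eet, pet):
    # Undefined when pet == 0 (the Tempc < 0 case); not handled in this sketch.
    return 0.5 + 0.5 * eet / pet


def efficiency(topt, tempc, eet, pet):
    """E = 0.56 * T1 * T2 * W."""
    return 0.56 * t1(topt) * t2(topt, tempc) * w(eet, pet)


def nppc(monthly_e, monthly_ipar):
    """Sum over months of max(E * IPAR, 0)."""
    return sum(max(e * ipar, 0.0) for e, ipar in zip(monthly_e, monthly_ipar))


# Hypothetical inputs: temperature at its optimum and no water stress.
e_value = efficiency(topt=20.0, tempc=20.0, eet=1.0, pet=1.0)
```

With the temperature at its optimum and `EET` equal to `PET`, `T1` and `W` both equal 1 and `T2` is just under 1, so `E` comes out slightly below the 0.56 ceiling.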

E = 0.56 · T1 · T2 · W

T2 = 1.18 / [(1 + e^(0.2 · (Topt – Tempc – 10))) · (1 + e^(0.3 · (Tempc – Topt – 10)))]

PET = 1.6 · (10 · Tempc / AHI)^A · PET-TW-M

SR ∈ {3.06, 4.35, 4.35, 4.05, 5.09, 3.06, 4.05, 4.05, 4.05, 5.09, 4.05}

RMSE on training data = 465.212 and r^2 = 0.799

Revised model:

E = 0.353 · T1^0.00 · T2^0.08 · W^0.00

T2 = 0.83 / [(1 + e^(1.0 · (Topt – Tempc – 6.34))) · (1 + e^(1.0 · (Tempc – Topt – 11.52)))]

PET = 1.6 · (10 · Tempc / AHI)^A · PET-TW-M

SR ∈ {0.61, 3.99, 2.44, 10.0, 2.21, 2.13, 2.04, 0.43, 1.35, 1.85, 1.61}

Cross-validated RMSE = 397.306 and r^2 = 0.853 (15% reduction)
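The quoted reduction can be checked from the two RMSE values above. This small Python check assumes the 15% figure compares the initial model's training RMSE with the revised model's cross-validated RMSE, as the slide reports them.

```python
initial_rmse = 465.212   # initial model, training data
revised_rmse = 397.306   # revised model, cross-validated

# Relative reduction in RMSE from the initial to the revised model.
reduction = (initial_rmse - revised_rmse) / initial_rmse
percent = round(100 * reduction, 1)
```

The result is about 14.6%, which rounds to the 15% stated.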


generic process translation

variables: P{protein}, M{mRNA}

parameters: α [0, 1]

equations: d[P,t,1] = α × M

generic process transcription

variables: M{mRNA}, R{rate}

parameters: none

equations: d[M,t,1] = R

generic process regulate_one

variables: R{rate}, S{signal}

parameters: γ [−1, 1]

equations: R = γ × S

generic process regulate_two

variables: R{rate}, S{signal}

parameters: γ [−1, 1], δ [0, 1]

equations: R = γ × S

d[S,t,1] = −1 × δ × S

generic process automatic_degradation

variables: C{concentration}

conditions: C > 0

parameters: α [0, 1]

equations: d[C,t,1] = −1 × α × C

generic process controlled_degradation

variables: D{concentration}, E{concentration}

conditions: D > 0, E > 0

parameters: α [0, 1]

equations: d[D,t,1] = −1 × α × E

d[E,t,1] = −1 × α × E

generic process photosynthesis

variables: L{light}, P{protein}, R{redox}, S{ROS}

parameters: α [0, 1], β [0, 1]

equations: d[R,t,1] = α × L × P

d[S,t,1] = β × L × P

variables: light, mRNA, protein, ROS, redox, transcription_rate

observables: light, mRNA

process photosynthesis

equations: d[redox,t,1] = 0.0155 × light × protein

d[ROS,t,1] = 0.019 × light × protein

process protein_translation

equations: d[protein,t,1] = 7.54 × mRNA

process mRNA_transcription

equations: d[mRNA,t,1] = transcription_rate

process regulate_one_1

equations: transcription_rate = 0.99 × light

process regulate_two_2

equations: transcription_rate = 1.203 × redox

d[redox,t,1] = −0.0002 × redox

process automatic_degradation_1

conditions: protein > 0

equations: d[protein,t,1] = −1.91 × protein

process controlled_degradation_1

conditions: redox > 0, ROS > 0

equations: d[redox,t,1] = −0.0003 × ROS

d[ROS,t,1] = −0.0003 × ROS
