
Discovering Causal Models



Presentation Transcript


  1. Discovering Causal Models Lecture 20

  2. Causal Discovery: Introduction
     Causal models highlight interactions among variables, often without specifying mechanisms for those relationships. Previously, we discussed three approaches to representing, applying, and viewing causal interactions:
     • structural equation models,
     • Bayesian networks, and
     • qualitative causal models.
     In this lecture, we discuss informatics tools that learn these three kinds of models from scientific data. We also describe a system that discovers more general mathematical equations with causal interpretations.

  3. Structural Equation Models
     A structural equation model is a system of linear equations with error terms. The equations are often shown as graphs to highlight the causal relationships among the variables.
     $x_1 = 0.56\,x_4 + 0.90\,x_2 + N(0, 1.40)$
     $x_2 = N(0, 1.11)$
     $x_3 = 1.39\,x_1 + N(0, 1.22)$
     $x_4 = -0.52\,x_2 + N(0, 1.07)$
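
The four equations above can be simulated directly by drawing each variable after its parents. The following is a minimal sketch in Python with NumPy. It assumes that the second argument of N(0, ·) is a variance, which the slide does not state; the sample size and random seed are arbitrary choices.

```python
import numpy as np

def simulate_sem(n_samples=10_000, seed=0):
    """Draw samples from the four-variable linear SEM on the slide."""
    rng = np.random.default_rng(seed)

    def noise(variance):
        # Assumption: N(0, v) on the slide means mean 0 and variance v.
        return rng.normal(0.0, np.sqrt(variance), size=n_samples)

    # Generate variables in causal order: parents before children.
    x2 = noise(1.11)
    x4 = -0.52 * x2 + noise(1.07)
    x1 = 0.56 * x4 + 0.90 * x2 + noise(1.40)
    x3 = 1.39 * x1 + noise(1.22)
    return np.column_stack([x1, x2, x3, x4])

data = simulate_sem()
# Regressing x3 on its only parent x1 estimates the coefficient 1.39.
slope = np.polyfit(data[:, 0], data[:, 2], 1)[0]
print(round(slope, 2))
```

Data generated this way is the kind of input a structure-learning tool would start from when the equations are unknown.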

  4. TETRAD
     Recall that TETRAD is an informatics environment that supports structural equation modeling. In addition to creating the structure, researchers can either provide parameters or generate them probabilistically.
     [Figures: specifying the structure; specifying parameters]

  5. TETRAD: Causal Discovery
     TETRAD can also discover the form and parameters of structural equation models from scientific data. This workflow tells TETRAD to infer a model structure from a data set and to estimate the parameters.

  6. Bayesian Networks
     Bayesian networks replace the equations of structural equation models with conditional probability tables. Note the conditional probability table for the coma node: possible values for the node are presented in rows, and possible states for the node’s parents appear in columns.
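
The row and column layout described above can be sketched as a small array. The parent names (A, B) and every probability below are hypothetical, invented only to show the layout; none of them come from the slide.

```python
import numpy as np

# Hypothetical parents and probabilities, invented for illustration only.
parent_states = [("A=yes", "B=yes"), ("A=yes", "B=no"),
                 ("A=no", "B=yes"), ("A=no", "B=no")]
node_values = ["coma=present", "coma=absent"]

# Rows: values of the node. Columns: joint states of its parents.
cpt = np.array([
    [0.80, 0.80, 0.80, 0.05],   # P(coma=present | parents)
    [0.20, 0.20, 0.20, 0.95],   # P(coma=absent  | parents)
])

def prob(value, parents):
    """Look up P(node = value | parents) in the table."""
    return cpt[node_values.index(value), parent_states.index(parents)]

# Each column is a conditional distribution, so it must sum to 1.
assert np.allclose(cpt.sum(axis=0), 1.0)
print(prob("coma=present", ("A=no", "B=no")))
```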

  7. GeNIe
     GeNIe is an informatics environment that supports building, running, and learning Bayesian networks.

  8. GeNIe: Causal Discovery
     GeNIe learns the structure and parameters of Bayesian networks using techniques similar to those in TETRAD. Structural equation models are inferred from continuous data, but Bayesian networks require discrete values. GeNIe has interactive support for discretizing variables.
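
As a rough illustration of what discretizing a continuous variable involves, here is a minimal sketch of equal-width binning in Python with NumPy. Equal-width binning is one common choice and is not necessarily the method GeNIe applies; the function name and the measurements are made up.

```python
import numpy as np

def discretize_equal_width(values, n_bins=3):
    """Map continuous values to integer bin labels 0..n_bins-1."""
    values = np.asarray(values, dtype=float)
    # Interior cut points that split [min, max] into n_bins equal intervals.
    edges = np.linspace(values.min(), values.max(), n_bins + 1)[1:-1]
    return np.digitize(values, edges)

temps = [36.1, 36.8, 37.0, 38.4, 39.9, 41.0]   # made-up measurements
print(discretize_equal_width(temps))            # prints [0 0 0 1 2 2]
```

The choice of bin count and cut points changes which dependencies remain visible in the discrete data, which is one reason interactive support is useful.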

  9. Search in TETRAD and GeNIe
     TETRAD and GeNIe use similar search algorithms to discover causal models. The PC algorithm is the starting point for structural search:
     (1) start with a complete, undirected graph;
     (2) remove an edge between variables if they are independent given knowledge about other nodes; and
     (3) give each edge a direction based on the connectivity in the trimmed undirected graph.
     The systems test for conditional independence (step 2) using statistical methods for determining partial correlation. This approach lets scientists specify domain knowledge in the form of potential edges and the direction of causality.
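
The first two steps of the search described above can be sketched as follows. This is a simplified illustration, not the code used by TETRAD or GeNIe: it assumes roughly Gaussian data, tests partial correlations with a Fisher z test, and leaves out edge orientation (step 3) and domain knowledge. The function names and the threshold `z_crit` are assumptions made for this example.

```python
from itertools import combinations
import math
import numpy as np

def partial_corr(corr, i, j, cond):
    """Partial correlation of i and j given cond, via the precision matrix."""
    idx = [i, j, *cond]
    prec = np.linalg.inv(corr[np.ix_(idx, idx)])
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])

def is_independent(corr, n, i, j, cond, z_crit):
    """Fisher z test: accept independence when the statistic is below z_crit."""
    r = max(min(partial_corr(corr, i, j, cond), 0.999999), -0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r))
    return math.sqrt(n - len(cond) - 3) * abs(z) < z_crit

def pc_skeleton(data, z_crit=2.576):
    """Return the undirected edges left after conditional-independence pruning."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # Step 1: complete undirected graph.
    adj = {i: set(range(p)) - {i} for i in range(p)}
    # Step 2: remove edges between conditionally independent pairs,
    # trying conditioning sets of increasing size drawn from the neighbours.
    size = 0
    while any(len(adj[i]) - 1 >= size for i in adj):
        for i in range(p):
            for j in list(adj[i]):
                for cond in combinations(adj[i] - {j}, size):
                    if is_independent(corr, n, i, j, cond, z_crit):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        size += 1
    return {(i, j) for i in adj for j in adj[i] if i < j}

# Usage on synthetic chain data x0 -> x1 -> x2 (invented for illustration).
rng = np.random.default_rng(0)
x0 = rng.normal(size=5000)
x1 = x0 + rng.normal(size=5000)
x2 = x1 + rng.normal(size=5000)
chain = np.column_stack([x0, x1, x2])
print(pc_skeleton(chain))
```

In the chain, x0 and x2 are correlated but become independent once x1 is known, so the direct edge between them is the one a correct run removes.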

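  10. Mathematical Equation Networks
     Causality may also be encoded in one or more mathematical equations. In this case, the equations are written with a single dependent variable on the left-hand side of the equation.
     $\text{NPPc} = \sum_{\text{month}} \max(E \cdot \text{IPAR},\, 0)$
     $E = 0.39 \cdot \text{T1} \cdot \text{T2} \cdot W$
     $\text{T1} = 0.8 + 0.02 \cdot \text{Topt} - 0.0005 \cdot \text{Topt}^2$
     $\text{T2} = 1.18 / \left[ \left(1 + e^{0.2 \cdot (\text{TDIFF} - 10)}\right) \cdot \left(1 + e^{0.3 \cdot (-\text{TDIFF} - 10)}\right) \right]$
     $\text{TDIFF} = \text{Topt} - \text{Tempc}$
     $W = 0.5 + 0.5 \cdot \text{EET} / \text{PET}$
     $\text{PET} = 1.6 \cdot (10 \cdot \max(0, \text{Tempc}) / \text{AHI})^{A} \cdot \text{PET-TW-M}$
     $A = 0.00000068 \cdot \text{AHI}^3 - 0.000077 \cdot \text{AHI}^2 + 0.018 \cdot \text{AHI} + 0.49$
     $\text{IPAR} = 0.5 \cdot \text{FPAR-FAS} \cdot \text{Monthly-Solar} \cdot \text{Sol-Conver}$
     $\text{FPAR-FAS} = \min\left[ (\text{SR-FAS} - 1.08) / \text{SR}(\text{UMD-VEG}),\, 0.95 \right]$
     $\text{SR-FAS} = (1 + \text{FAS-NDVI} / 1000) / (1 - \text{FAS-NDVI} / 1000)$
     The graph reflects a causal interpretation of the system of equations shown above.
     [Figure: causal graph with nodes NPPc, E, IPAR, e_max, W, T2, T1, SOLAR, FPAR, A, PET, EET, Topt, SR, AHI, PETTWM, Tempc, NDVI, VEG]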

  11. Lagramge
     • Lagramge is a discovery system that learns mathematical equations that describe a set of continuous data.
     • Scientists provide Lagramge with data and a grammar that defines the search space.
     • For example, the grammar
         E ➞ E + F | E - F | F
         F ➞ F * T | F / T | T
         T ➞ constant | variable | (E)
       defines a space of equations in which variables and constants may be combined using arithmetic operators.
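
To make the idea of a grammar-defined search space concrete, the sketch below enumerates the expression templates that the example grammar can derive within a bounded derivation depth. It illustrates the search space only and is not Lagramge's own search procedure; the function name and the depth bound are assumptions made for this example.

```python
from itertools import product

# The grammar from the slide; "constant" and "variable" are terminals.
GRAMMAR = {
    "E": [["E", "+", "F"], ["E", "-", "F"], ["F"]],
    "F": [["F", "*", "T"], ["F", "/", "T"], ["T"]],
    "T": [["constant"], ["variable"], ["(", "E", ")"]],
}

def expand(symbol, depth):
    """All strings derivable from symbol with at most depth nested rules."""
    if symbol not in GRAMMAR:
        return {symbol}            # terminal symbol: nothing to expand
    if depth == 0:
        return set()               # derivation budget exhausted
    results = set()
    for rule in GRAMMAR[symbol]:
        parts = [expand(s, depth - 1) for s in rule]
        # An empty part makes the product empty, pruning unfinished rules.
        for combo in product(*parts):
            results.add(" ".join(combo))
    return results

for template in sorted(expand("E", 4)):
    print(template)
```

Each printed template is a candidate model structure; the space grows quickly with depth, which is why a grammar that encodes domain knowledge helps keep the search manageable.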

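  12. Lagramge: Causal Discovery
     Using the data and the grammar, Lagramge
     • creates a set of model structures allowed by the grammar;
     • estimates the numerical parameters guided by fit to the data; and
     • investigates alternative versions of the best models in that set.
     Original CASA model:
     $\text{NPPc} = \sum_{\text{month}} \max(E \cdot \text{IPAR},\, 0)$
     $E = 0.39 \cdot \text{T1} \cdot \text{T2} \cdot W$
     $\text{T1} = 0.8 + 0.02 \cdot \text{Topt} - 0.0005 \cdot \text{Topt}^2$
     $\text{T2} = 1.18 / \left[ \left(1 + e^{0.2 \cdot (\text{TDIFF} - 10)}\right) \cdot \left(1 + e^{0.3 \cdot (-\text{TDIFF} - 10)}\right) \right]$
     $\text{TDIFF} = \text{Topt} - \text{Tempc}$
     $W = 0.5 + 0.5 \cdot \text{EET} / \text{PET}$
     $\text{PET} = 1.6 \cdot (10 \cdot \max(0, \text{Tempc}) / \text{AHI})^{A} \cdot \text{PET-TW-M}$
     $A = 0.00000068 \cdot \text{AHI}^3 - 0.000077 \cdot \text{AHI}^2 + 0.018 \cdot \text{AHI} + 0.49$
     $\text{IPAR} = 0.5 \cdot \text{FPAR-FAS} \cdot \text{Monthly-Solar} \cdot \text{Sol-Conver}$
     $\text{FPAR-FAS} = \min\left[ (\text{SR-FAS} - 1.08) / \text{SR}(\text{UMD-VEG}),\, 0.95 \right]$
     $\text{SR-FAS} = (1 + \text{FAS-NDVI} / 1000) / (1 - \text{FAS-NDVI} / 1000)$
     Revised model:
     $\text{NPPc} = \sum_{\text{month}} \max(E \cdot \text{IPAR},\, 0)$
     $E = 0.402 \cdot \text{T1}^{0.624} \cdot \text{T2}^{0.215} \cdot W^{0}$
     $\text{T1} = 0.68 + 0.27 \cdot \text{Topt} - 0 \cdot \text{Topt}^2$
     $\text{T2} = 0.16 + 0.012 \cdot \text{TDIFF} + 0.02 \cdot \text{TDIFF}^2 - 0.00042 \cdot \text{TDIFF}^3 - 0.000081 \cdot \text{TDIFF}^4 + 0.00000018 \cdot \text{TDIFF}^5$
     $\text{TDIFF} = \text{Topt} - \text{Tempc}$
     $W = 0.5 + 0.5 \cdot \text{EET} / \text{PET}$
     $\text{PET} = 1.6 \cdot (10 \cdot \max(0, \text{Tempc}) / \text{AHI})^{A} \cdot \text{PET-TW-M}$
     $A = 0.00000068 \cdot \text{AHI}^3 - 0.000077 \cdot \text{AHI}^2 + 0.018 \cdot \text{AHI} + 0.49$
     $\text{IPAR} = 0.5 \cdot \text{FPAR-FAS} \cdot \text{Monthly-Solar} \cdot \text{Sol-Conver}$
     $\text{FPAR-FAS} = \min\left[ (\text{SR-FAS} - 1.08) / \text{SR}(\text{UMD-VEG}),\, 0.95 \right]$
     $\text{SR-FAS} = (1 + \text{fas\_ndvi} / 750) / (1 - \text{fas\_ndvi} / 750)$
     Lagramge’s revisions to the CASA model are in red on the original slide; they affect the equations for E, T1, T2, and SR-FAS.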
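
The generate, estimate, and compare loop listed above can be sketched in a few lines. This is an illustration under strong simplifying assumptions, not Lagramge's procedure: the candidate structures are polynomials of different degree rather than grammar-derived equations, the parameters are fitted by ordinary least squares, and the data are synthetic.

```python
import numpy as np

# Synthetic data, invented for illustration: a noisy quadratic.
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 200)
y = 0.8 + 0.02 * x - 0.5 * x**2 + rng.normal(0.0, 0.05, size=x.size)

# Even-indexed points are used for fitting, odd-indexed points for scoring.
fit_idx, score_idx = slice(0, None, 2), slice(1, None, 2)

def score_structure(degree):
    """Fit a polynomial of the given degree; return its held-out squared error."""
    coeffs = np.polyfit(x[fit_idx], y[fit_idx], degree)    # parameter estimation
    residuals = np.polyval(coeffs, x[score_idx]) - y[score_idx]
    return float(np.mean(residuals ** 2))

# Candidate structures: polynomials of degree 1, 2, and 3.
scores = {degree: score_structure(degree) for degree in (1, 2, 3)}
best = min(scores, key=scores.get)
print(scores, "best degree:", best)
```

Scoring on points that were not used for fitting is one simple guard against always preferring the most flexible structure.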

  13. Qualitative Causal Models
     Another class of causal models instead encodes qualitative influences between variables. These typically include a sign on each link that specifies the direction of influence. Such qualitative models can explain qualitative data, but they can also handle numeric observations. This framework appears to match more closely the way that many scientists think about causal models.
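
A signed-link model of this kind can be represented as a dictionary from (cause, effect) pairs to +1 or -1, and the net qualitative effect along a path is then the product of the signs. The variables and links below are hypothetical, invented only to illustrate the representation.

```python
# Hypothetical signed links, invented for illustration:
# +1 means "increases", -1 means "decreases".
links = {
    ("sunlight", "growth"): +1,
    ("growth", "nutrient"): -1,
    ("nutrient", "toxin"): -1,
}

def path_sign(path):
    """Net qualitative effect along a path: the product of its link signs."""
    sign = 1
    for cause, effect in zip(path, path[1:]):
        sign *= links[(cause, effect)]
    return sign

# Two negative links cancel, so the net effect is positive.
print(path_sign(["sunlight", "growth", "nutrient", "toxin"]))  # prints 1
```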

  14. GenePath: Model Construction
     GenePath is an interactive modeling system for qualitative model construction.
     [Figure: knowledge of relationships in the genetic network.] The network states how certain genes affect aggregation in D. discoideum. This organism transitions from unicellular to multicellular form.
     [Figure: graphical representation of the genetic network.] Red lines are inhibition. Green lines are activation. Numbers indicate confidence.

  15. GenePath: Causal Discovery
     Adding data, experimental or observational, to GenePath results in an automatic revision of the qualitative model. Scientists can give new knowledge and data to GenePath, assign confidence levels, and investigate what-if cases.
     [Figure: this model of D. discoideum aggregation synthesizes the knowledge and data.]

  16. Causal Discovery: Summary
     The systems presented in this lecture shared several features:
     • they all treated discovery as a guided search through a problem space;
     • those with numeric parameters separated structure generation from parameter estimation; and
     • they all let scientists encode domain knowledge to guide model discovery.
     GenePath encouraged interactive exploration. TETRAD and GeNIe let scientists edit the graph structure before estimating parameters. None of the systems let scientists interact directly with their search procedures.

  17. The CASA-NPPc Model
     [Figure: causal graph with nodes NPPc, E, IPAR, e_max, W, T2, T1, SOLAR, FPAR, A, PET, EET, Topt, SR, AHI, PETTWM, Tempc, NDVI, VEG]
