1 / 58

Center for Causal Discovery: Summer Workshop - 2015

Join the Center for Causal Discovery at Carnegie Mellon University for a summer workshop on causal discovery in biomedical research. Learn about graphical causal models, Tetrad, search algorithms, and applications in fMRI, lung disease, cancer, and genetic regulatory networks.

merica
Download Presentation

Center for Causal Discovery: Summer Workshop - 2015

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Center for Causal Discovery: Summer Workshop - 2015 June 8-11, 2015 Carnegie Mellon University

  2. Goals • Working knowledge of graphical causal models • Basic working knowledge of Tetrad V • Basic understanding of search algorithms • Basic understanding of several applications: • fMRI • Lung Disease • Cancer • Genetic Regulatory Networks • Form community of researchers, users, and students interested in causal discovery in biomedical research

  3. Tetrad: Complete Causal Modeling Tool

  4. Tetrad • Main website: http://www.phil.cmu.edu/projects/tetrad/ • Download: http://www.phil.cmu.edu/projects/tetrad/current.html • Previous version you downloaded: tetrad-5.1.0-6 • Newer version with several bug-fixes: tetrad-5.2.1-0 • Data files: www.phil.cmu.edu/projects/tetrad_download/download/workshop/Data/

  5. Outline • Day 1: Graphical Causal Models, Tetrad • Introduction • Overview of Graphical Causal Models • Tetrad • Representing/Modeling Causal Systems • Parametric Models • Instantiated Models • Estimation, Inference, Updating and Model fit • Tiny Case Studies: Charity, Lead and IQ

  6. Outline • Day 2: Search • D-separation • Model Equivalence • Search Basics (PC, GES) • Latent Variable Model Search • FCI • MIMbuild • Examples

  7. Outline • Day 3: Examples • Overviews • fMRI • Cancer • Lung Disease • Genetic Regulatory Networks • Extra Issues • Measurement Error • Feedback and Time Series

  8. Outline • Day 4: Breakout Sessions • Morning • fMRI • Cancer • Lung Disease • Genetic Regulatory Networks • Afternoon • Overview of Algorithm Development (Systems Group) • Group Discussion on Data and Research Problems

  9. Causationand Statistics Judea Pearl Galileo Galilei Francis Bacon Udny Yule Graphical Causal Models Charles Spearman Sewall Wright Jamie Robins Sir Ronald A. Fisher Potential Outcomes Don Rubin Jerzy Neyman 1500 1600 ….. 1900 1930 1960 1990

  10. Modern Theory of Statistical Causal Models Graphical Models Intervention & Manipulation Potential Outcome Models Testable Constraints (e.g., Independence) Counterfactuals

  11. Causal Inference Requires More than Probability Prediction from Observation ≠ Prediction from Intervention P(Lung Cancer 1960 = y | Tar-stained fingers 1950 = no) ≠ P(Lung Cancer 1960 = y | Tar-stained fingers 1950set = no) In general: P(Y=y | X=x, Z=z) ≠ P(Y=y | Xset=x, Z=z) Causal Prediction vs. Statistical Prediction: Non-experimental data (observational study) P(Y=y | X=x, Z=z) P(Y,X,Z) P(Y=y | Xset=x, Z=z) Causal Structure Background Knowledge

  12. Estimation vs. Search • Estimation (Potential Outcomes) • Causal Question: Effect of Zidovudine on Survival among HIV-positive men (Hernan, et al., 2000) • Problem: confounders (CD4 lymphocyte count) vary over time, and they are dependent on previous treatment with Zidovudine • Estimation method discussed: marginal structural models • Assumptions: • Treatment measured reliably • Measured covariates sufficient to capture major sources of confounding • Model of treatment given the past is accurate • Output: Effect estimate with confidence intervals Fundamental Problem: estimation/inference is conditional on the model

  13. Estimation vs. Search • Search (Causal Graphical Models) • Causal Question: which genes regulate flowering in Arbidopsis • Problem: over 25,000 potential genes. • Method: graphical model search • Assumptions: • RNA microarray measurement reasonable proxy for gene expression • Causal Markov assumption • Etc. • Output: Suggestions for follow-up experiments Fundamental Problem: model space grows super-exponentially with the number of variables

  14. Causal Search Causal Search: Find/compute all the causal models that are indistinguishable given background knowledge and data Represent features common to all such models Multiple Regressionis often the wrong tool for Causal Search: Example: Foreign Investment & Democracy

  15. Does Foreign Investment in 3rd World Countries inhibit Democracy? Foreign Investment Timberlake, M. and Williams, K. (1984). Dependence, political exclusion, and government repression: Some cross-national evidence. American Sociological Review 49, 141-146. N = 72 PO degree of political exclusivity CV lack of civil liberties EN energy consumption per capita (economic development) FI level of foreign investment

  16. Foreign Investment Correlations po fi en cv po 1.0 fi -.175 1.0 en -.480 0.330 1.0 cv 0.868 -.391 -.430 1.0

  17. Case Study: Foreign Investment Regression Results po = .227*fi - .176*en + .880*cv SE (.058) (.059) (.060) t 3.941 -2.99 14.6 P .0002 .0044 .0000 Interpretation: foreign investment increases political repression

  18. Case Study: Foreign Investment Alternative Models • There is no model with testable constraints (df > 0) that is not rejected by the data, in which FI has a positive effect on PO.

  19. A Few Causal Discovery Highlights

  20. fMRI (~44,000 voxels) (ROI)~10-20 Regions of Interest Clark Glymour, Joe Ramsey, Ruben Sanchez CMU Causal Discovery

  21. Autism Catherine Hanson, Rutgers ASD vs. NT Usual Approach: Search for differentialrecruitment of brain regions

  22. ASD vs. NT Causal Modeling Approach: Examine connectivity of ROIs • Face processing network • Theory of Mind network • Action understanding network

  23. Results FACE TOM ACTION

  24. What was Learned face processing: ASD NT Theory of Mind: ASD ≠NT action understanding: ASD ≠NT when faces involved

  25. Genetic Regulatory Networks Arbidopsis MarloesMaathuis ZTH (Zurich)

  26. Genetic Regulatory Networks Micro-array data ~25,000 variables Greenhouse experiments on flowering time Candidate Regulators of Flowering time Causal Discovery

  27. Genetic Regulatory Networks Which genes affect flowering time in Arabidopsis thaliana? (Stekhoven et al., Bioinformatics, 2012) • ~25,000 genes • Modification of PC (stability) • Among 25 genes in final ranking: • 5 known regulators of flowering • 20 remaining genes: • For 13 of 20, seeds available • 9 of 13 yielded replicates • 4 of 9 affected flowering time • Other techniques are little better than chance

  28. Other Applications • Educational Research: • Online Courses, • MOOCs, • Cog. Tutors • Economics: • Causes of Meat Prices, • Effects of International Trade • Lead and IQ • Stress, Depression, Religiosity • Climate Change Modeling • The Effects of Welfare Reform • Etc. !

  29. Outline • Representing/Modeling Causal Systems • Causal Graphs • Parametric Models • Bayes Nets • Structural Equation Models • Generalized SEMs

  30. Causal Graphs Causal Graph G = {V,E} Each edge X  Y represents a direct causal claim: X is a direct cause of Y relative to V Years of Education Income Years of Education Skills and Knowledge Income

  31. Causal Graphs Omitteed Causes Omitteed Common Causes Not Cause Complete Common Cause Complete Years of Education Years of Education Skills and Knowledge Skills and Knowledge Income Income

  32. Tetrad Demo & Hands-On Smoking LC YF Build and Save two acyclic causal graphs: Build the Smoking graph picture above Build your own graph with 4 variables

  33. Modeling Ideal Interventions Interventions on the Effect Post Pre-experimental System Room Temperature Sweaters On

  34. Modeling Ideal Interventions Interventions on the Cause Post Pre-experimental System Room Temperature Sweaters On

  35. Pre-intervention graph “Soft” Intervention “Hard” Intervention Interventions & Causal Graphs Model an ideal intervention by adding an “intervention” variable outside the original system as a direct cause of its target. Intervene on Income

  36. Interventions & Causal Graphs X5 X2 X1 Pre-intervention Graph X4 X3 X6 • Intervention: • hard intervention on both X1, X4 • Soft intervention on X3 X5 X2 I X1 X4 Post-Intervention Graph? S X3 X6 I

  37. Interventions & Causal Graphs X5 X2 X1 Pre-intervention Graph X4 X3 X6 • Intervention: • hard intervention on both X1, X4 • Soft intervention on X3 X5 X2 I X1 X4 Post-Intervention Graph? S X3 X6 I

  38. Interventions & Causal Graphs X5 X2 X1 Pre-intervention Graph X4 X3 X6 • Intervention: • hard intervention on X3 • Soft interventions on X6, X4 X5 X2 X1 X4 X3 Post-Intervention Graph? X6 S I S

  39. Parametric Models

  40. Instantiated Models

  41. Causal Bayes Networks The Joint Distribution Factors According to the Causal Graph, P(S,YF, L) = P(S) P(YF | S) P(LC | S)

  42. Causal Bayes Networks The Joint Distribution Factors According to the Causal Graph, P(S = 0) = 1 P(S = 1) = 1 - 1 P(YF = 0 | S = 0) = 2 P(LC = 0 | S = 0) = 4 P(YF = 1 | S = 0) = 1- 2 P(LC = 1 | S = 0) = 1- 4 P(YF = 0 | S = 1) = 3 P(LC = 0 | S = 1) = 5 P(YF = 1 | S = 1) = 1- 3 P(LC = 1 | S = 1) = 1- 5 P(S) P(YF | S) P(LC | S) = f() All variables binary [0,1]:  = {1, 2,3,4,5,}

  43. Causal Bayes Networks The Joint Distribution Factors According to the Causal Graph, P(S,YF, LC) = P(S) P(YF | S) P(LC | S) = f() All variables binary [0,1]:  = {1, 2,3,4,5,} P(S,YF, LC) = P(S) P(YF | S) P(LC | YF, S) = f() All variables binary [0,1]:  = {1, 2,3,4,5,6,7,}

  44. Causal Bayes Networks The Joint Distribution Factors According to the Causal Graph, P(S = 0) = .7 P(S = 1) = .3 P(YF = 0 | S = 0) = .99 P(LC = 0 | S = 0) = .95 P(YF = 1 | S = 0) = .01 P(LC = 1 | S = 0) = .05 P(YF = 0 | S = 1) = .20 P(LC = 0 | S = 1) = .80 P(YF = 1 | S = 1) = .80 P(LC = 1 | S = 1) = .20 P(S,YF, L) = P(S) P(YF | S) P(LC | S) P(S=1,YF=1, LC=1) =?

  45. Causal Bayes Networks The Joint Distribution Factors According to the Causal Graph, P(S = 0) = .7 P(S = 1) = .3 P(YF = 0 | S = 0) = .99 P(LC = 0 | S = 0) = .95 P(YF = 1 | S = 0) = .01 P(LC = 1 | S = 0) = .05 P(YF = 0 | S = 1) = .20 P(LC = 0 | S = 1) = .80 P(YF = 1 | S = 1) = .80 P(LC = 1 | S = 1) = .20 P(S,YF, L) = P(S) P(YF | S) P(LC | S) P(LC = 1 | S=1) P(YF=1 | S=1) P(S=1) P(S=1,YF=1, LC=1) = .20 = .048 P(S=1,YF=1, LC=1) = .3 * .80 *

  46. Calculating the effect of a hard interventions P(YF,S,L) = P(S) P(YF|S) P(L|S) P(S) P(L|S) Pm (YF,S,L) = P(YF|I)

  47. Calculating the effect of a hard intervention P(S,YF, L) = P(S) P(YF | S) P(LC | S) P(S=1,YF=1, LC=1) = .3 * .8 * .2 = .048 P(YF =1 | I ) = .5 Pm (S=1,YFset=1, LC=1) =? Pm (S=1,YFset=1, LC=1) = P(S) P(YF | I) P(LC | S) .3 * .5 * .2 = .03 Pm (S=1,YFset=1, LC=1) =

  48. Calculating the effect of a soft intervention P(YF,S,L) = P(S) P(YF|S) P(L|S) P(YF|S, Soft) Pm (YF,S,L) = P(S) P(L|S)

  49. Tetrad Demo & Hands-On Use the DAG you built for Smoking, YF, and LC Define the Bayes PM (# and values of categories for each variable) Attach a Bayes IM to the Bayes PM Fill in the Conditional Probability Tables (make the values plausible).

  50. Updating

More Related