Causal Discovery

Causal Discovery Richard Scheines Peter Spirtes, Clark Glymour, and many others Dept. of Philosophy & CALD Carnegie Mellon Nov. 13th, 2003

Outline • Motivation • Representation • Connecting Causation to Probability (Independence) • Searching for Causal Models • Improving on Regression for Causal Inference Nov. 13th, 2003

1. Motivation Non-experimental Evidence TypicalPredictiveQuestions • Can we predict aggressiveness from the amount of violent TV watched • Can we predict crime rates from abortion rates 20 years ago Causal Questions: • Does watching violent TV cause Aggression? • I.e., if we change TV watching, will the level of Aggression change? Nov. 13th, 2003

Causal Estimation Manipulated Probability P(Y | X set= x, Z=z) from Unmanipulated Probability P(Y | X = x, Z=z) When and how can we use non-experimental data to tell us about the effect of an intervention? Nov. 13th, 2003

2. Representation • Association & causal structure -qualitatively • Interventions • StatisticalCausal Models • Bayes Networks • Structural Equation Models Nov. 13th, 2003

Causation & Association X and Y are associated (X _||_ Y) iff x1  x2 P(Y | X = x1)  P(Y | X = x2) Association is symmetric: X _||_ Y  Y _||_ X X is a cause of Y iff x1  x2 P(Y | X set= x1)  P(Y | X set= x2) Causation is asymmetric: X Y  Y X Nov. 13th, 2003

Direct Causation X is a direct cause of Y relative to S, iff z,x1  x2 P(Y | X set= x1 , Zset=z)  P(Y | X set= x2 , Zset=z) where Z = S - {X,Y} Nov. 13th, 2003

Causal Graphs Causal Graph G = {V,E} Each edge X  Y represents a direct causal claim: X is a direct cause of Y relative to V Chicken Pox Nov. 13th, 2003

Causal Graphs Not Cause Complete Common Cause Complete Nov. 13th, 2003

Modeling Ideal Interventions • Ideal Interventions (on a variable X): • Completely determine the value or distribution of a variable X • Directly Target only X • (no “fat hand”) • E.g., Variables: Confidence, Athletic Performance • Intervention 1: hypnosis for confidence • Intervention 2: anti-anxiety drug (also muscle relaxer) Nov. 13th, 2003

Modeling Ideal Interventions Interventions on the Effect Post Pre-experimental System Room Temperature Sweaters On Nov. 13th, 2003

Modeling Ideal Interventions Interventions on the Cause Post Pre-experimental System Sweaters On Room Temperature Nov. 13th, 2003

Interventions & Causal Graphs • Model an ideal intervention by adding an “intervention” variable outside the original system • Erase all arrows pointing into the variable intervened upon Intervene to change Inf Post-intervention graph? Pre-intervention graph Nov. 13th, 2003

Conditioningvs. Intervening P(Y | X = x1)vs. P(Y | X set= x1)Teeth Slides Nov. 13th, 2003

Structural Equation Models 1. Structural Equations 2. Statistical Constraints Causal Graph Statistical Model Nov. 13th, 2003

Structural Equation Models • Structural Equations: One Equation for each variable V in the graph: V = f(parents(V), errorV) for SEM (linear regression) f is a linear function • Statistical Constraints: Joint Distribution over the Error terms Causal Graph Nov. 13th, 2003

Causal Graph SEM Graph (path diagram) Structural Equation Models Equations: Education = ed Income =Educationincome Longevity =EducationLongevity Statistical Constraints: (ed, Income,Income ) ~N(0,2) 2diagonal - no variance is zero Nov. 13th, 2003

3. ConnectingCausation to Probability Nov. 13th, 2003

The Markov Condition Causal Structure Statistical Predictions Causal Markov Axiom Independence X _||_ Z | Y i.e., P(X | Y) = P(X | Y, Z) Causal Graphs Nov. 13th, 2003

Causal Markov Axiom If G is a causal graph, and P a probability distribution over the variables in G, then in P: every variable V is independent of its non-effects, conditional on its immediate causes. Nov. 13th, 2003

Causal Markov Condition Two Intuitions: 1) Immediate causes make effectsindependent of remote causes (Markov). 2) Common causes make their effectsindependent (Salmon). Nov. 13th, 2003

Causal Markov Condition 1) Immediate causes make effectsindependent of remote causes (Markov). E = Exposure to Chicken Pox I = Infected S = Symptoms Markov Cond. E || S | I Nov. 13th, 2003

Causal Markov Condition 2) Effects are independent conditional on their commoncauses. YF || LC | S Markov Cond. Nov. 13th, 2003

Causal Structure Statistical Data Nov. 13th, 2003

Causal Markov Axiom In SEMs, d-separation follows from assuming independence among error terms that have no connection in the path diagram - i.e., assuming that the model is common cause complete. Nov. 13th, 2003

Causal Markov and D-Separation • In acyclic graphs: equivalent • Cyclic Linear SEMs with uncorrelated errors: • D-separation correct • Markov condition incorrect • Cyclic Discrete Variable Bayes Nets: • If equilibrium --> d-separation correct • Markov incorrect Nov. 13th, 2003

P(X3 | X2)  P(X3 | X2, X1) X3 _||_ X1 | X2 P(X3 | X2 set= ) = P(X3 | X2 set=, X1) X3 _||_ X1 | X2set= D-separation: Conditioning vs. Intervening Nov. 13th, 2003

4. Search From Statistical Datato Probability toCausation Nov. 13th, 2003

Statistical Inference Background Knowledge - X2 before X3 - no unmeasured common causes Causal DiscoveryStatistical DataCausal Structure Nov. 13th, 2003

Representations ofD-separation Equivalence Classes We want the representations to: • Characterize the Independence Relations Entailed by the Equivalence Class • Represent causal features that are shared by every member of the equivalence class Nov. 13th, 2003

Patterns & PAGs • Patterns(Verma and Pearl, 1990): graphical representation of an acyclic d-separation equivalence - no latent variables. • PAGs: (Richardson 1994) graphical representation of an equivalence class including latent variable models and sample selection bias that are d-separation equivalent over a set of measured variables X Nov. 13th, 2003

Patterns Nov. 13th, 2003

Patterns: What the Edges Mean Nov. 13th, 2003

Patterns Nov. 13th, 2003

PAGs: Partial Ancestral Graphs What PAG edges mean. Nov. 13th, 2003

PAGs: Partial Ancestral Graph Nov. 13th, 2003

Tetrad 4 Demo www.phil.cmu.edu/projects/tetrad_download/ Nov. 13th, 2003

Overview of Search Methods • Constraint Based Searches • TETRAD • Scoring Searches • Scores: BIC, AIC, etc. • Search: Hill Climb, Genetic Alg., Simulated Annealing • Very difficult to extend to latent variable models Heckerman, Meek and Cooper (1999). “A Bayesian Approach to Causal Discovery” chp. 4 in Computation, Causation, and Discovery, ed. by Glymour and Cooper, MIT Press, pp. 141-166 Nov. 13th, 2003

5. Regessionand Causal Inference Nov. 13th, 2003

Regression to estimate Causal Influence • Let V = {X,Y,T}, where - Y : measured outcome - measured regressors: X = {X1, X2, …, Xn}- latent common causes of pairs in X U Y: T = {T1, …, Tk} • Let the true causal model over V be a Structural Equation Model in which each V  V is a linear combination of its direct causes and independent, Gaussian noise. Nov. 13th, 2003

Regression to estimate Causal Influence • Consider the regression equation: Y = b0 + b1X1 + b2X2 + ..…bnXn • Let the OLS regression estimate bi be the estimatedcausal influence of Xi on Y. • That is, holding X/Xi experimentally constant, bi is an estimate of the change in E(Y) that results from an intervention that changes Xi by 1 unit. • Let the realCausalInfluence Xi Y = bi • When is the OLS estimate bi an unbiased estimate of bi ? Nov. 13th, 2003

Linear Regression Let the other regressors O = {X1, X2,....,Xi-1, Xi+1,...,Xn} bi = 0 if and only if Xi,Y.O = 0 In a multivariate normal distribuion, Xi,Y.O = 0if and only if Xi _||_ Y | O Nov. 13th, 2003

Linear Regression So in regression: bi = 0 Xi _||_ Y | O But provably : bi = 0 S  O, Xi _||_ Y | S So S  O, Xi _||_ Y | S  bi = 0 ~ S  O, Xi _||_ Y | S  don’t know(unless we’re lucky) Nov. 13th, 2003

X1 _||_ Y | X2 Regression Example b1 0 b2 = 0 X2 _||_ Y | X1 Don’t know ~S  {X2} X1 _||_ Y | S b2 = 0 S  {X1} X2 _||_ Y | {X1} Nov. 13th, 2003

X1 _||_ Y | {X2,X3} X2 _||_ Y | {X1,X3} X3 _||_ Y | {X1,X2} Regression Example b1 0 b2 0 b3 0 DK ~S  {X2,X3}, X1 _||_ Y | S b2 = 0 S  {X1,X3}, X2 _||_ Y | {X1} DK ~S  {X1,X2}, X3 _||_ Y | S Nov. 13th, 2003

X2 X1 X3 Y PAG Regression Example Nov. 13th, 2003

Regression Bias If • Xi is d-separated from Y conditional on X/Xi in the true graph after removing Xi Y, and • X contains no descendant of Y, then: biis an unbiased estimate ofbi See “Using Path Diagrams ….” Nov. 13th, 2003

Applications • Rock Classification • Spartina Grass • College Plans • Political Exclusion • Satellite Calibration • Naval Readiness • Parenting among Single, Black Mothers • Pneumonia • Photosynthesis • Lead - IQ • College Retention • Corn Exports Nov. 13th, 2003

Causation, Prediction, and Search, 2nd Edition, (2000), by P. Spirtes, C. Glymour, and R. Scheines ( MIT Press) Causality: Models, Reasoning, and Inference, (2000), Judea Pearl, Cambridge Univ. Press Computation, Causation, & Discovery (1999), edited by C. Glymour and G. Cooper, MIT Press Causality in Crisis?, (1997) V. McKim and S. Turner (eds.), Univ. of Notre Dame Press. TETRAD IV: www.phil.cmu.edu/projects/tetrad Web Course on Causal and Statistical Reasoning : www.phil.cmu.edu/projects/csr/ References Nov. 13th, 2003

Causal Discovery

Causal Discovery

Presentation Transcript

Causal inferences

Automatic Causal Discovery

Ockham’s Razor in Causal Discovery: A New Explanation

Causal Inference

Causal Reasoning

Causal essays

Feature Selection and Causal discovery

Causal inference

Causal Reasoning

Causal Discovery Algorithms based on Y Structures

Case Study: Causal Discovery Methods Using Causal Probabilistic Networks

Discovery of a putative causal mutation for Angus dwarfism

Causal discovery, Bayesian networks, and structural equation models

Alexander Statnikov, Ph.D. Director, Computational Causal Discovery Laboratory

Causal discovery of biomedical knowledge from big data

Causal Discovery

Causal Inference

Feature Selection and Causal discovery

Causal Inference