1 / 72

CAUSES AND COUNTERFACTUALS

CAUSES AND COUNTERFACTUALS. Judea Pearl University of California Los Angeles (www.cs.ucla.edu/~judea/jsm09). THEORETICAL DEVELOPMENTS IN CAUSAL INFERENCE. Judea Pearl University of California Los Angeles (www.cs.ucla.edu/~judea/jsm09). OUTLINE. Inference: Statistical vs. Causal,

diana-cole
Download Presentation

CAUSES AND COUNTERFACTUALS

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CAUSES AND COUNTERFACTUALS Judea Pearl University of California Los Angeles (www.cs.ucla.edu/~judea/jsm09)

  2. THEORETICAL DEVELOPMENTS IN CAUSAL INFERENCE Judea Pearl University of California Los Angeles (www.cs.ucla.edu/~judea/jsm09)

  3. OUTLINE • Inference: Statistical vs. Causal, • distinctions, and mental barriers • Unified conceptualization of counterfactuals, structural-equations, and graphs • Inference to three types of claims: • Effect of potential interventions • Attribution (Causes of Effects) • Direct and indirect effects • Frills

  4. P Joint Distribution Q(P) (Aspects of P) Data Inference TRADITIONAL STATISTICAL INFERENCE PARADIGM e.g., Infer whether customers who bought product A would also buy product B. Q = P(B | A)

  5. FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES Probability and statistics deal with static relations P Joint Distribution P Joint Distribution Q(P) (Aspects of P) Data change Inference What happens when P changes? e.g., Infer whether customers who bought product A would still buy Aif we were to double the price.

  6. FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES What remains invariant when Pchanges say, to satisfy P(price=2)=1 P Joint Distribution P Joint Distribution Q(P) (Aspects of P) Data change Inference Note: P(v)P (v | price = 2) P does not tell us how it ought to change e.g. Curing symptoms vs. curing diseases e.g. Analogy: mechanical deformation

  7. Causal and statistical concepts do not mix. CAUSAL Spurious correlation Randomization / Intervention Confounding / Effect Instrumental variable Strong Exogeneity Explanatory variables STATISTICAL Regression Association / Independence “Controlling for” / Conditioning Odd and risk ratios Collapsibility / Granger causality Propensity score FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES (CONT)

  8. Causal and statistical concepts do not mix. CAUSAL Spurious correlation Randomization / Intervention Confounding / Effect Instrumental variable Strong Exogeneity Explanatory variables STATISTICAL Regression Association / Independence “Controlling for” / Conditioning Odd and risk ratios Collapsibility / Granger causality Propensity score • No causes in – no causes out (Cartwright, 1989) } statistical assumptions + data causal assumptions  causal conclusions FROM STATISTICAL TO CAUSAL ANALYSIS: 2. MENTAL BARRIERS • Causal assumptions cannot be expressed in the mathematical language of standard statistics.

  9. Causal and statistical concepts do not mix. CAUSAL Spurious correlation Randomization / Intervention Confounding / Effect Instrumental variable Strong Exogeneity Explanatory variables STATISTICAL Regression Association / Independence “Controlling for” / Conditioning Odd and risk ratios Collapsibility / Granger causality Propensity score • No causes in – no causes out (Cartwright, 1989) } statistical assumptions + data causal assumptions  causal conclusions • Non-standard mathematics: • Structural equation models (Wright, 1920; Simon, 1960) • Counterfactuals (Neyman-Rubin (Yx), Lewis (xY)) FROM STATISTICAL TO CAUSAL ANALYSIS: 2. MENTAL BARRIERS • Causal assumptions cannot be expressed in the mathematical language of standard statistics.

  10. Correct notation: Y := 2X X = 1 Y = 2 The solution WHY CAUSALITY NEEDS SPECIAL MATHEMATICS Scientific Equations (e.g., Hooke’s Law) are non-algebraic e.g., Length (Y) equals a constant (2) times the weight (X) Y = 2X X = 1 Process information Had X been 3, Y would be 6. If we raise X to 3, Y would be 6. Must “wipe out” X = 1.

  11. (or) Y 2X X = 1 Y = 2 The solution WHY CAUSALITY NEEDS SPECIAL MATHEMATICS Scientific Equations (e.g., Hooke’s Law) are non-algebraic e.g., Length (Y) equals a constant (2) times the weight (X) Correct notation: X = 1 Process information Had X been 3, Y would be 6. If we raise X to 3, Y would be 6. Must “wipe out” X = 1.

  12. THE STRUCTURAL MODEL PARADIGM Joint Distribution Data Generating Model Q(M) (Aspects of M) Data M Inference • M – Invariant strategy (mechanism, recipe, law, protocol) by which Nature assigns values to variables in the analysis. “Think Nature, not experiment!”

  13. FAMILIAR CAUSAL MODEL ORACLE FOR MANIPILATION X Y Z INPUT OUTPUT

  14. e.g., STRUCTURAL CAUSAL MODELS • Definition: A structural causal model is a 4-tuple • V,U, F, P(u), where • V = {V1,...,Vn} are endogeneas variables • U={U1,...,Um} are background variables • F = {f1,...,fn} are functions determining V, • vi = fi(v, u) • P(u) is a distribution over U • P(u) and F induce a distribution P(v) over observable variables

  15. I W Q P STRUCTURAL MODELS AND CAUSAL DIAGRAMS The functions vi = fi(v,u) define a graph vi = fi(pai,ui)PAi V \ ViUi U Example: Price – Quantity equations in economics U1 U2 PAQ

  16. I W Q STRUCTURAL MODELS AND INTERVENTION Let X be a set of variables in V. The action do(x) sets X to constants x regardless of the factors which previously determined X. do(x) replaces all functions fi determining X with the constant functions X=x, to create a mutilated modelMx U1 U2 P

  17. I W Q STRUCTURAL MODELS AND INTERVENTION Let X be a set of variables in V. The action do(x) sets X to constants x regardless of the factors which previously determined X. do(x) replaces all functions fi determining X with the constant functions X=x, to create a mutilated modelMx Mp U1 U2 P P = p0

  18. The Fundamental Equation of Counterfactuals: CAUSAL MODELS AND COUNTERFACTUALS Definition: The sentence: “Y would be y (in situation u), had X beenx,” denoted Yx(u) = y, means: The solution for Y in a mutilated model Mx, (i.e., the equations for X replaced by X = x) with input U=u, is equal to y.

  19. Joint probabilities of counterfactuals: In particular: CAUSAL MODELS AND COUNTERFACTUALS Definition: The sentence: “Y would be y (in situation u), had X beenx,” denoted Yx(u) = y, means: The solution for Y in a mutilated model Mx, (i.e., the equations for X replaced by X = x) with input U=u, is equal to y.

  20. STRUCTURAL AND SIMILARITY-BASED COUNTERFACTUALS w A B • Lewis’s account (1973): The counterfactual A B • (read: “B if it were A”) is true in a world w just in case B • is true in all the closest A-worlds to w. • Structural account (1995): The counterfactual Yx(u)=y • (read: “Y=y if X were x”) is true in situation u just in case YMx(u)=y.

  21. BAYESIAN AND IMAGING CONDITIONALIZATIONS Imaging Bayes w w The do(x) operator is an imaging operator provided: Provision 1: Worlds with equal histories should be considered equally similar to any given world. Provision 2: Equally-similar worlds should receive mass in proportion to their prior probabilities (Bayesian tie-breaking) X x X x Sx(w) X =x X =x Sx(w ) w' w'

  22. AXIOMS OF STRUCTURAL COUNTERFACTUALS Yx(u)=y: Y would bey, hadXbeenx(in stateU = u) (Galles, Pearl, Halpern, 1998): • Definiteness • Uniqueness • Effectiveness • Composition (generalized consistency) • Reversibility

  23.  The mothers of all questions: Q. When would b equal a? A. When all back-door paths are blocked, (uY X) REGRESSION VS. STRUCTURAL EQUATIONS (THE CONFUSION OF THE CENTURY) Regression (claimless, nonfalsifiable): Y = ax + Y Structural (empirical, falsifiable): Y = bx + uY Claim: (regardless of distributions): E(Y | do(x)) = E(Y | do(x), do(z)) = bx Q. When is b estimable by regression methods? A. Graphical criteria available

  24. THE FOUR NECESSARY STEPS OF CAUSAL ANALYSIS Define: Assume: Identify: Estimate: Express the target quantity Q as a function Q(M) that can be computed from any model M. Formulate causal assumptions using ordinary scientific language and represent their structural part in graphical form. Determine if Q is identifiable. Estimate Q if it is identifiable; approximate it, if it is not.

  25. THE FOUR NECESSARY STEPS FOR EFFECT ESTIMATION Define: Assume: Identify: Estimate: Express the target quantity Q as a function Q(M) that can be computed from any model M. Formulate causal assumptions using ordinary scientific language and represent their structural part in graphical form. Determine if Q is identifiable. Estimate Q if it is identifiable; approximate it, if it is not.

  26. THE FOUR NECESSARY STEPS FOR EFFECT ESTIMATION Define: Assume: Identify: Estimate: Express the target quantity Q as a function Q(M) that can be computed from any model M. Formulate causal assumptions using ordinary scientific language and represent their structural part in graphical form. Determine if Q is identifiable. Estimate Q if it is identifiable; approximate it, if it is not.

  27. THE FOUR NECESSARY STEPS FOR POLICY ANALYSIS Define: Assume: Identify: Estimate: Express the target quantity Q as a function Q(M) that can be computed from any model M. Formulate causal assumptions using ordinary scientific language and represent their structural part in graphical form. Determine if Q is identifiable. Estimate Q if it is identifiable; approximate it, if it is not.

  28. THE FOUR NECESSARY STEPS FOR POLICY ANALYSIS Define: Assume: Identify: Estimate: Express the target quantity Q as a function Q(M) that can be computed from any model M. Formulate causal assumptions using ordinary scientific language and represent their structural part in graphical form. Determine if Q is identifiable. Estimate Q if it is identifiable; approximate it, if it is not.

  29. INFERRING THE EFFECT OF INTERVENTIONS • The problem:To predict the impact of a proposed intervention using data obtained prior to the intervention. • The solution (conditional): Causal Assumptions + Data   Policy Claims • Mathematical tools for communicating causal assumptions formally and transparently. • Deciding (mathematically) whether the assumptions communicated are sufficient for obtaining consistent estimates of the prediction required. • Deriving (if (2) is affirmative)a closed-form expression for the predicted impact • Suggesting (if (2) is negative)a set of measurements and experiments that, if performed, would render a consistent estimate feasible.

  30. THE FOUR NECESSARY STEPS FROM DEFINITION TO ASSUMPTIONS Define: Assume: Identify: Estimate: Express the target quantity Q as a function Q(M) that can be computed from any model M. Formulate causal assumptions using ordinary scientific language and represent their structural part in graphical form. Determine if Q is identifiable. Estimate Q if it is identifiable; approximate it, if it is not.

  31. 2. Counterfactuals: 3. Structural: X Z Y FORMULATING ASSUMPTIONS THREE LANGUAGES 1. English: Smoking (X), Cancer (Y), Tar (Z), Genotypes (U) Not too friendly: Consistent?, complete?, redundant?, arguable?

  32. IDENTIFIABILITY Definition: Let Q(M) be any quantity defined on a causal model M, andlet A be a set of assumption. Q is identifiable relative to A iff P(M1) = P(M2) ÞQ(M1) = Q(M2) for all M1, M2, that satisfy A.

  33. IDENTIFIABILITY Definition: Let Q(M) be any quantity defined on a causal model M, andlet A be a set of assumption. Q is identifiable relative to A iff P(M1) = P(M2) ÞQ(M1) = Q(M2) for all M1, M2, that satisfy A. In other words, Q can be determined uniquely from the probability distribution P(v) of the endogenous variables, V, and assumptions A. A is displayed in graph G.

  34. G THE PROBLEM OF CONFOUNDING Find the effect ofXonY, P(y|do(x)), given the causal assumptions shown inG, whereZ1,..., Zk are auxiliary variables. Z1 Z2 Z3 Z4 Z5 X Z6 Y CanP(y|do(x)) be estimated if only a subset,Z, can be measured?

  35. G Gx Moreover, P(y | do(x)) =åP(y | x,z) P(z) (“adjusting” for Z) z ELIMINATING CONFOUNDING BIAS THE BACK-DOOR CRITERION P(y | do(x)) is estimable if there is a set Z of variables such thatZd-separates X from YinGx. Z1 Z1 Z2 Z2 Z Z3 Z3 Z4 Z5 Z5 Z4 X X Z6 Y Y Z6

  36. EFFECT OF INTERVENTION BEYOND ADJUSTMENT Theorem (Tian-Pearl 2002) We can identifyP(y|do(x)) if there is no child Z of X connected to X by a confounding path. G Z1 Z2 Z3 Z4 Z5 X Y Z6

  37. Watch out! No, no! EFFECT OF WARM-UP ON INJURY (After Shrier & Platt, 2008) ??? Front Door Warm-up Exercises (X) Injury (Y)

  38. EFFECT OF INTERVENTION COMPLETE IDENTIFICATION • Complete calculus for reducing P(y|do(x), z) to expressions void of do-operators. • Complete graphical criterion for identifying causal effects (Shpitser and Pearl, 2006). • Complete graphical criterion for empirical testability of counterfactuals (Shpitser and Pearl, 2007).

  39. COUNTERFACTUALS AT WORK ETT – EFFECT OF TREATMENT ON THE TREATED • Regret: • I took a pill to fall asleep. • Perhaps I should not have? • Program evaluation: • What would terminating a program do to • those enrolled?

  40. THE FOUR NECESSARY STEPS EFFECT OF TREATMENT ON THE TREATED Define: Assume: Identify: Estimate: Express the target quantity Q as a function Q(M) that can be computed from any model M. Formulate causal assumptions using ordinary scientific language and represent their structural part in graphical form. Determine if Q is identifiable. Estimate Q if it is identifiable; approximate it, if it is not.

  41. W ETT - IDENTIFICATION Theorem (Shpitser-Pearl, 2009) ETT is identifiable in G iff P(y | do(x),w) is identifiable in G X Y Moreover, Complete graphical criterion

  42. G Gx Moreover, ETT “Standardized morbidity” ETT - THE BACK-DOOR CRITERION is identifiable in G if there is a set Z of variables such that Zd-separates X from Y in Gx. Z1 Z1 Z2 Z2 Z Z3 Z3 Z4 Z5 Z5 Z4 X X Z6 Y Y Z6

  43. FROM IDENTIFICATION TO ESTIMATION Define: Assume: Identify: Estimate: Express the target quantity Q as a function Q(M) that can be computed from any model M. Formulate causal assumptions using ordinary scientific language and represent their structural part in graphical form. Determine if Q is identifiable. Estimate Q if it is identifiable; approximate it, if it is not.

  44. L Theorem: Adjustment for L replaces Adjustment for Z PROPENSITY SCORE ESTIMATOR (Rosenbaum & Rubin, 1983) Z1 Z2 P(y | do(x)) = ? Z4 Z3 Z5 X Z6 Y

  45. WHAT PROPENSITY SCORE (PS) PRACTITIONERS NEED TO KNOW • The assymptotic bias of PS is EQUAL to that of ordinary adjustment (for same Z). • Including an additional covariate in the analysis CAN SPOIL the bias-reduction potential of others. • Choosing sufficient set for PS, requires knowledge about the model.

  46. WHICH COVARIATES MAY / SHOULD BE ADJUSTED FOR? B1 Assignment Hygiene Age Treatment Outcome M B2 Cost Follow-up Question: Which of these eight covariates may be included in the propensity score function (for matching) and which should be excluded. Answer:Must include: Must exclude: May include:   AgeB1, M, B2, Follow-up, Assignment without Age Cost, Hygiene, {Assignment + Age},   {Hygiene + Age + B1} , more . . .

  47. WHICH COVARIATES MAY / SHOULD BE ADJUSTED FOR? B1 Assignment Hygiene Age Treatment Outcome M B2 Cost Follow-up Question: Which of these eight covariates may be included in the propensity score function (for matching) and which should be excluded. Answer:Must include: Must exclude: May include:   AgeB1, M, B2, Follow-up, Assignment without Age Cost, Hygiene, {Assignment + Age},   {Hygiene + Age + B1} , more . . .

  48. WHAT PROPENSITY SCORE (PS) PRACTITIONERS NEED TO KNOW • The assymptotic bias of PS is EQUAL to that of ordinary adjustment (for same Z). • Including an additional covariate in the analysis CAN SPOIL the bias-reduction potential of others. • Choosing sufficient set for PS, requires knowledge about the model. • That any empirical test of the bias-reduction potential of PS, can only be generalized to cases where the causal relationships among covariates, observed and unobserved is the same.

  49. TWO PARADIGMS FOR CAUSAL INFERENCE Observed: P(X, Y, Z,...) Conclusions needed: P(Yx=y), P(Xy=x | Z=z)... How do we connect observables, X,Y,Z,… to counterfactuals Yx, Xz, Zy,… ? N-R model Counterfactuals are primitives, new variables Super-distribution Structural model Counterfactuals are derived quantities Subscripts modify the model and distribution

  50. inconsistency: x = 0 Yx=0 = Y Y = xY1 + (1-x) Y0 “SUPER” DISTRIBUTION IN N-R MODEL X 0 0 0 1 Y 0 1 0 0 Z 0 1 0 0 Yx=0 0 1 1 1 Yx=1 1 0 0 0 Xz=0 0 1 0 0 Xz=1 0 0 1 1 Xy=0 0 1 1 0 U u1 u2 u3 u4

More Related