
CAUSAL MODELING AND THE LOGIC OF SCIENCE



Judea Pearl

Computer Science and Statistics

UCLA

www.cs.ucla.edu/~judea/

OVERVIEW

Scope and Language in Scientific Theories

- Statistical models
- (observations, PL)

- 2.1 Stochastic causal model
- (interventions, PL + modality)
- 2.2 Functional causal models
- (counterfactuals, PL + subjunctives)

- (explicit interventions, PL)
- …

- (objects-properties, FOL-SOL ...)

OUTLINE

- Modeling: Statistical vs. Causal

- Causal models and identifiability

- Inference to three types of claims:

- Effects of potential interventions,

- Claims about attribution (responsibility)

- Claims about direct and indirect effects

- Falsifiability and Corroboration

TRADITIONAL STATISTICAL INFERENCE PARADIGM

Data → Inference → P (joint distribution) → Q(P) (aspects of P)

e.g., Infer whether customers who bought product A would also buy product B.

Q = P(B | A)
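A minimal Python sketch of such a query follows. It assumes a hypothetical list of purchase records, one set of product IDs per customer. Because Q = P(B | A) is an aspect of the joint distribution alone, a relative frequency among the buyers of A is enough to estimate it. The function name `conditional_purchase_prob` is illustrative.

```python
# Hypothetical purchase records: the set of products each customer bought.
customers = [
    {"A", "B"},
    {"A"},
    {"A", "B", "C"},
    {"B"},
    {"C"},
]


def conditional_purchase_prob(records, given, target):
    """Estimate P(target | given) as a relative frequency among buyers of `given`."""
    bought_given = [r for r in records if given in r]
    if not bought_given:
        raise ValueError(f"no customer bought {given!r}")
    both = sum(1 for r in bought_given if target in r)
    return both / len(bought_given)


q = conditional_purchase_prob(customers, "A", "B")
print(q)  # 2 of the 3 buyers of A also bought B
```

No causal assumption enters here. The price-doubling question on the next slide cannot be answered this way.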

THE CAUSAL INFERENCE PARADIGM

Data → Inference → M (data-generating model) → Q(M) (aspects of M)

Some Q(M) cannot be inferred from P.

e.g., Infer whether customers who bought product A would still buy A if we double the price.

FROM STATISTICAL TO CAUSAL ANALYSIS:

1. THE DIFFERENCES

Probability and statistics deal with static relations

Data → (Statistics) → Joint distribution → (Probability) → Inferences from passive observations


- Causal analysis deals with changes (dynamics), i.e., what remains invariant when P changes.
- P does not tell us how it ought to change.
- e.g., curing symptoms vs. curing diseases
- e.g., analogy: mechanical deformation


Causal analysis deals with changes (dynamics)

[Figure: Data, Experiments, and Causal assumptions feed a Causal Model, which answers queries about effects of interventions, causes of effects, and explanations]

- Causal and statistical concepts do not mix.

CAUSAL: Spurious correlation, Randomization, Confounding / Effect, Instrument, Holding constant, Explanatory variables

STATISTICAL: Regression, Association / Independence, “Controlling for” / Conditioning, Odds and risk ratios, Collapsibility

FROM STATISTICAL TO CAUSAL ANALYSIS:

1. THE DIFFERENCES (CONT)


- No causes in – no causes out (Cartwright, 1989)

  {statistical assumptions + data, causal assumptions} ⇒ causal conclusions


- Causal assumptions cannot be expressed in the mathematical language of standard statistics.

- Non-standard mathematics:
- Structural equation models (SEM)
- Counterfactuals (Neyman-Rubin)
- Causal Diagrams (Wright, 1920)

WHAT'S IN A CAUSAL MODEL?

Oracle that assigns truth value to causal sentences:

Action sentences: B if we do A.

Counterfactuals: B would be different if A were true.

Explanation: B occurred because of A.

Optional: with what probability?

FAMILIAR CAUSAL MODEL

ORACLE FOR MANIPULATION

[Figure: a system with INPUT and OUTPUT terminals and internal variables X, Y, Z]

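Definition: A causal model is a 3-tuple

$M = \langle V, U, F \rangle$

with a mutilation operator $do(x): M \rightarrow M_x$ where:

(i) $V = \{V_1, \dots, V_n\}$ endogenous variables,

(ii) $U = \{U_1, \dots, U_m\}$ background variables,

(iii) $F$ = set of $n$ functions, $f_i : V \setminus V_i \times U \rightarrow V_i$,

$v_i = f_i(pa_i, u_i)$, $PA_i \subseteq V \setminus V_i$, $U_i \subseteq U$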
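The definition can be sketched in Python for the recursive (acyclic) case only. The original definition also admits non-recursive systems. The class name `CausalModel`, the dictionary representation of F, and the toy equations are illustrative assumptions. Each $f_i$ reads its arguments $pa_i, u_i$ from the values computed so far, and the model is solved by evaluating the functions in a causal order.

```python
from typing import Callable, Dict, List

Values = Dict[str, float]


class CausalModel:
    """Sketch of M = <V, U, F> for a recursive model.

    `order` lists the endogenous variables V in a causal (topological) order;
    `functions[name]` plays the role of f_i and reads its arguments
    (parents pa_i and background u_i) from the dictionary of values so far.
    """

    def __init__(self, order: List[str], functions: Dict[str, Callable[[Values], float]]):
        self.order = order
        self.functions = functions

    def solve(self, u: Values) -> Values:
        values = dict(u)  # start from the background assignment U = u
        for name in self.order:
            values[name] = self.functions[name](values)  # v_i = f_i(pa_i, u_i)
        return values


# Hypothetical toy model: X = U1, Z = 2X + U2, Y = X + Z
model = CausalModel(
    ["X", "Z", "Y"],
    {
        "X": lambda v: v["U1"],
        "Z": lambda v: 2 * v["X"] + v["U2"],
        "Y": lambda v: v["X"] + v["Z"],
    },
)
solution = model.solve({"U1": 1.0, "U2": 0.5})
print(solution)  # {'U1': 1.0, 'U2': 0.5, 'X': 1.0, 'Z': 2.5, 'Y': 3.5}
```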

CAUSAL MODELS AND

CAUSAL DIAGRAMS


[Figure: causal diagram over variables I, W, Q, P with background variables U1, U2; PA_Q marks the parents of Q]


CAUSAL MODELS AND

MUTILATION

(iv) $M_x = \langle U, V, F_x \rangle$, $X \subseteq V$, $x \in X$,

where $F_x = \{f_i : V_i \notin X\} \cup \{X = x\}$

(Replace all functions $f_i$ corresponding to $X$ with the constant functions $X = x$.)
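The sketch below shows the mutilation operator, assuming a recursive model stored as a dictionary of functions. The hypothetical `do` builds $F_x$ by copying F and overwriting the function of each intervened variable with a constant. Solving the mutilated model then gives $M_x$. The function names and the toy equations are illustrative.

```python
from typing import Callable, Dict, List

Values = Dict[str, float]
Functions = Dict[str, Callable[[Values], float]]


def solve(order: List[str], functions: Functions, u: Values) -> Values:
    """Solve a recursive model by evaluating each f_i in causal order."""
    values = dict(u)
    for name in order:
        values[name] = functions[name](values)
    return values


def do(functions: Functions, intervention: Values) -> Functions:
    """Build F_x: keep f_i for every V_i not in X, replace f_X by the constant X = x."""
    mutilated = dict(functions)  # F itself is left unchanged
    for name, value in intervention.items():
        mutilated[name] = lambda _values, v=value: v  # constant function, ignores parents
    return mutilated


# Hypothetical toy model: X = U1, Z = 2X + U2, Y = X + Z
order = ["X", "Z", "Y"]
F: Functions = {
    "X": lambda v: v["U1"],
    "Z": lambda v: 2 * v["X"] + v["U2"],
    "Y": lambda v: v["X"] + v["Z"],
}
u = {"U1": 1.0, "U2": 0.5}
base = solve(order, F, u)                       # solution of M
mutilated = solve(order, do(F, {"X": 0.0}), u)  # solution of M_x with X = 0
print(base["Y"], mutilated["Y"])  # 3.5 0.5
```

Only the equation for X is replaced. The equations for Z and Y still respond to the new value of X.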


[Figure: the diagram over I, W, Q, P with U1, U2, and the mutilated model M_p in which the equation for P is replaced by P = p0]


PROBABILISTIC

CAUSAL MODELS


Definition (Probabilistic Causal Model): $\langle M, P(u) \rangle$

$P(u)$ is a probability assignment to the variables in $U$.

CAUSAL MODELS AND COUNTERFACTUALS

Definition: Potential Response

The sentence “Y would be y (in unit u), had X been x,” denoted $Y_x(u) = y$, is the solution for Y in a mutilated model $M_x$, with the equations for X replaced by X = x.

(“unit-based potential outcome”)

Joint probabilities of counterfactuals:

$$P(Y_x = y, Z_w = z) = \sum_{u \,:\, Y_x(u) = y,\; Z_w(u) = z} P(u)$$

In particular:

$$P(y \mid do(x)) \triangleq P(Y_x = y) = \sum_{u \,:\, Y_x(u) = y} P(u)$$

$$P(Y_{x'} = y' \mid x, y) = \sum_{u \,:\, Y_{x'}(u) = y'} P(u \mid x, y)$$
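The definitions above can be computed by brute force for a small hypothetical model: X = U1 and Y = X OR U2, with independent $P(U1 = 1) = 0.5$ and $P(U2 = 1) = 0.3$. Each unit u is solved in the mutilated model, and P(u) is summed over the units that satisfy the counterfactual. The model, its parameters, and the function names are assumptions made for illustration.

```python
from itertools import product

# Hypothetical probabilistic causal model <M, P(u)>, all variables binary:
#   X = U1,  Y = X OR U2,  with U1 and U2 independent.
P_U1 = {0: 0.5, 1: 0.5}
P_U2 = {0: 0.7, 1: 0.3}
UNITS = list(product((0, 1), repeat=2))


def prob_u(u1, u2):
    return P_U1[u1] * P_U2[u2]


def solve(u1, u2, do_x=None):
    """Solve M (or the mutilated M_x when do_x is given) for the unit u = (u1, u2)."""
    x = u1 if do_x is None else do_x
    y = int(x or u2)
    return x, y


def p_counterfactual(y, x):
    """P(Y_x = y): sum of P(u) over units u with Y_x(u) = y."""
    return sum(prob_u(*u) for u in UNITS if solve(*u, do_x=x)[1] == y)


def prob_necessity(x, y):
    """P(Y_{x'} = y' | x, y) for binary X, Y: weight units by P(u | x, y)."""
    evidence = [u for u in UNITS if solve(*u) == (x, y)]
    p_evidence = sum(prob_u(*u) for u in evidence)
    hits = sum(prob_u(*u) for u in evidence if solve(*u, do_x=1 - x)[1] == 1 - y)
    return hits / p_evidence


print(p_counterfactual(1, x=1), p_counterfactual(1, x=0), prob_necessity(1, 1))
```

Under these assumptions, $P(Y_1 = 1) = 1$ and $P(Y_0 = 1) = 0.3$. The probability of necessity is 0.7, because a unit with x = 1, y = 1 would have survived only when U2 = 0.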

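3-STEPS TO COMPUTING COUNTERFACTUALS

S5. If the prisoner is dead, he would still be dead if A were not to have shot: $D \Rightarrow D_{\neg A}$

[Figure: the firing-squad diagram U → C → A, B → D shown in three steps, Abduction, Action, Prediction: the evidence D = TRUE gives U = TRUE, the action sets A = FALSE, and the prediction yields D = TRUE]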
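The three steps can be sketched for the firing-squad diagram, assuming the deterministic equations C = U, A = C, B = C, D = A OR B. These encode U as the court order, C as the captain, A and B as the riflemen, and D as death. Abduction keeps the values of U that agree with the evidence. Action replaces A's equation. Prediction solves the mutilated model. The function names are hypothetical.

```python
def solve(u, do=None):
    """Solve the firing-squad model for court order u; `do` maps variables to forced values."""
    do = do or {}
    v = {"U": u}
    v["C"] = do.get("C", v["U"])            # captain signals iff the court orders
    v["A"] = do.get("A", v["C"])            # rifleman A shoots iff signalled
    v["B"] = do.get("B", v["C"])            # rifleman B shoots iff signalled
    v["D"] = do.get("D", v["A"] or v["B"])  # prisoner dies if either rifleman shoots
    return v


def counterfactual(evidence, intervention, query):
    # Step 1, abduction: keep the values of U consistent with the evidence.
    units = [u for u in (False, True)
             if all(solve(u)[k] == val for k, val in evidence.items())]
    # Steps 2 and 3, action and prediction: solve the mutilated model for each remaining u.
    return {solve(u, intervention)[query] for u in units}


print(counterfactual({"D": True}, {"A": False}, "D"))  # {True}
```

The single surviving value {True} confirms S5: with A forced not to shoot, B still shoots and the prisoner is still dead.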

COMPUTING PROBABILITIES OF COUNTERFACTUALS

P(S5). The prisoner is dead. How likely is it that he would be dead if A were not to have shot? $P(D_{\neg A} \mid D) = ?$

[Figure: the same three steps applied to distributions: Abduction updates P(u) to P(u | D), Action sets A = FALSE, Prediction computes $P(D_{\neg A} \mid D)$ under P(u | D)]
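A probabilistic version of the three steps is sketched below. It assumes the same deterministic firing-squad equations and a hypothetical prior $P(U = \text{TRUE}) = 0.7$. Abduction now replaces P(u) with P(u | D) by Bayes' rule before the action and prediction steps. Because D = TRUE forces U = TRUE in this model, the answer is 1 whatever prior is assumed.

```python
from fractions import Fraction


def solve(u, do=None):
    """Firing-squad model: C = U, A = C, B = C, D = A or B (with optional forced values)."""
    do = do or {}
    v = {"U": u}
    v["C"] = do.get("C", v["U"])
    v["A"] = do.get("A", v["C"])
    v["B"] = do.get("B", v["C"])
    v["D"] = do.get("D", v["A"] or v["B"])
    return v


def prob_counterfactual(p_u_true, evidence, intervention, query):
    prior = {True: p_u_true, False: 1 - p_u_true}
    # Abduction: condition P(u) on the evidence (Bayes' rule over the two values of U).
    consistent = {u: p for u, p in prior.items()
                  if all(solve(u)[k] == val for k, val in evidence.items())}
    total = sum(consistent.values())
    posterior = {u: p / total for u, p in consistent.items()}
    # Action and prediction: probability that the query holds in the mutilated model.
    return sum(p for u, p in posterior.items() if solve(u, intervention)[query])


print(prob_counterfactual(Fraction(7, 10), {"D": True}, {"A": False}, "D"))  # 1
```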

CAUSAL INFERENCE

MADE EASY (1985-2000)

- Inference with Nonparametric Structural Equations made possible through Graphical Analysis.

- Mathematical underpinning of counterfactuals through nonparametric structural equations

- Graphical-Counterfactuals symbiosis

IDENTIFIABILITY

Definition:

Let Q(M) be any quantity defined on a causal model M, and let A be a set of assumptions.

Q is identifiable relative to A iff

$P(M_1) = P(M_2) \Rightarrow Q(M_1) = Q(M_2)$

for all $M_1, M_2$ that satisfy A.


In other words, Q can be determined uniquely

from the probability distribution P(v) of the

endogenous variables, V, and assumptions A.
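The sketch below shows why the assumptions A matter, using two hypothetical models with U ~ Bernoulli(1/2). In M1, X = U and Y = X, so X causes Y. In M2, X = U and Y = U, so U is a common cause and X has no effect. Both induce the same P(x, y) but disagree on Q = P(Y = 1 | do(X = 1)). Without assumptions that rule one of them out, Q is therefore not identifiable.

```python
from collections import Counter
from fractions import Fraction

P_U = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # hypothetical: U ~ Bernoulli(1/2) in both models


def m1(u, do_x=None):
    """M1: X = U, Y = X (X causes Y)."""
    x = u if do_x is None else do_x
    return x, x


def m2(u, do_x=None):
    """M2: X = U, Y = U (U is a common cause; X has no effect on Y)."""
    x = u if do_x is None else do_x
    return x, u


def observational(model):
    """P(x, y) induced by the model."""
    joint = Counter()
    for u, p in P_U.items():
        joint[model(u)] += p
    return dict(joint)


def p_y1_do_x(model, x):
    """Q = P(Y = 1 | do(X = x))."""
    return sum(p for u, p in P_U.items() if model(u, do_x=x)[1] == 1)


print(observational(m1) == observational(m2))  # True: same P(x, y)
print(p_y1_do_x(m1, 1), p_y1_do_x(m2, 1))      # 1 1/2: different Q
```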


In this talk:

A: Assumptions encoded in the diagram

Q1: $P(y \mid do(x))$ Causal Effect ($= P(Y_x = y)$)

Q2: $P(Y_{x'} = y' \mid x, y)$ Probability of necessity

Q3: Direct Effect

THE FUNDAMENTAL THEOREM OF CAUSAL INFERENCE

Causal Markov Theorem:

Any distribution generated by a Markovian structural model M (recursive, with independent disturbances) can be factorized as

$$P(v_1, \dots, v_n) = \prod_i P(v_i \mid pa_i)$$

where $pa_i$ are the (values of) the parents of $V_i$ in the causal diagram associated with M.

Corollary (Truncated factorization, Manipulation Theorem):

The distribution generated by an intervention do(X = x) (in a Markovian model M) is given by the truncated factorization

$$P(v_1, \dots, v_n \mid do(x)) = \prod_{i \,:\, V_i \notin X} P(v_i \mid pa_i) \Big|_{X = x}$$
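The truncated factorization can be sketched for a hypothetical Markovian model over binary variables with diagram Z → X → Y and Z → Y. The conditional probability tables are invented for illustration. The intervention drops the factor of X and keeps only assignments that agree with X = x. For contrast, the same routine also computes the ordinary conditional P(Y = 1 | X = 1).

```python
from itertools import product

# Hypothetical Markovian model over binary variables: Z -> X -> Y and Z -> Y.
parents = {"Z": [], "X": ["Z"], "Y": ["X", "Z"]}
# p_one[V][values of PA_V] = P(V = 1 | pa_V)
p_one = {
    "Z": {(): 0.4},
    "X": {(0,): 0.2, (1,): 0.8},
    "Y": {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.7},
}
names = list(parents)


def factor(name, v):
    """P(v_i | pa_i) for one variable."""
    p1 = p_one[name][tuple(v[p] for p in parents[name])]
    return p1 if v[name] == 1 else 1 - p1


def joint(v, do=None):
    """prod_i P(v_i | pa_i); under do(X = x), drop the factors of X and
    give probability 0 to assignments that disagree with x."""
    do = do or {}
    if any(v[k] != val for k, val in do.items()):
        return 0.0
    prob = 1.0
    for name in names:
        if name not in do:
            prob *= factor(name, v)
    return prob


def marginal(query, do=None, given=None):
    """P(query | do(...), given), summing the (truncated) factorization over all assignments."""
    given = given or {}
    num = den = 0.0
    for values in product((0, 1), repeat=len(names)):
        v = dict(zip(names, values))
        if any(v[k] != val for k, val in given.items()):
            continue
        p = joint(v, do)
        den += p
        if all(v[k] == val for k, val in query.items()):
            num += p
    return num / den


print(marginal({"Y": 1}, do={"X": 1}))     # P(Y=1 | do(X=1)), about 0.58
print(marginal({"Y": 1}, given={"X": 1}))  # P(Y=1 | X=1), about 0.645
```

The two numbers differ because Z confounds X and Y. Conditioning on X = 1 also shifts the distribution of Z, while do(X = 1) leaves P(z) untouched.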


RAMIFICATIONS OF THE FUNDAMENTAL THEOREM

Given P(x,y,z), should we ban smoking?

[Figure: two diagrams over Smoking (X), Tar in Lungs (Z), Cancer (Y), with U (unobserved) affecting both X and Y; left: the original model, right: the model after the intervention X = x]

Pre-intervention:

$$P(x, y, z) = \sum_u P(u)\, P(x \mid u)\, P(z \mid x)\, P(y \mid z, u)$$

Post-intervention:

$$P(y, z \mid do(x)) = \sum_u P(u)\, P(z \mid x)\, P(y \mid z, u)$$

To compute P(y,z | do(x)), we must eliminate u (a graphical problem).

THE BACK-DOOR CRITERION

Graphical test of identification

P(y | do(x)) is identifiable in G if there is a set Z of variables, none of them a descendant of X, such that Z d-separates X from Y in $G_{\underline{X}}$.

[Figure: a graph G over X, Y, Z1, …, Z6, and the graph $G_{\underline{X}}$ obtained from G by deleting the arrows emanating from X]

Moreover, $P(y \mid do(x)) = \sum_z P(y \mid x, z)\, P(z)$ (“adjusting” for Z)
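A minimal sketch of the adjustment formula follows. It uses a hypothetical observational table P(x, y, z), generated here from Z → X → Y and Z → Y, in which {Z} satisfies the back-door criterion. `backdoor_adjust` reads only the joint table. It returns about 0.58, while the naive conditional P(y | x) gives about 0.645. All numbers and names are illustrative.

```python
from itertools import product

# Hypothetical observational joint P(x, y, z) over binary variables, generated from
# Z -> X -> Y and Z -> Y; the estimators below only read the `joint` table.
pz = {0: 0.6, 1: 0.4}
px1_z = {0: 0.2, 1: 0.8}
py1_xz = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.7}


def bern(p1, value):
    return p1 if value == 1 else 1 - p1


joint = {
    (x, y, z): pz[z] * bern(px1_z[z], x) * bern(py1_xz[(x, z)], y)
    for x, y, z in product((0, 1), repeat=3)
}


def prob(pred):
    """Probability of the event {(x, y, z) : pred(x, y, z)} under the joint table."""
    return sum(p for key, p in joint.items() if pred(*key))


def backdoor_adjust(x, y):
    """P(y | do(x)) = sum_z P(y | x, z) P(z), valid when Z satisfies the back-door criterion."""
    total = 0.0
    for z in (0, 1):
        p_xz = prob(lambda xx, yy, zz: xx == x and zz == z)
        p_z = prob(lambda xx, yy, zz: zz == z)
        total += joint[(x, y, z)] / p_xz * p_z
    return total


def naive_conditional(x, y):
    """P(y | x), which mixes the causal effect with confounding through Z."""
    return prob(lambda xx, yy, zz: xx == x and yy == y) / prob(lambda xx, yy, zz: xx == x)


print(backdoor_adjust(1, 1), naive_conditional(1, 1))
```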


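RULES OF CAUSAL CALCULUS

- Rule 1: Ignoring observations
- $P(y \mid do(x), z, w) = P(y \mid do(x), w)$ if $(Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}}}$

- Rule 2: Action/observation exchange
- $P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)$ if $(Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}\underline{Z}}}$

- Rule 3: Ignoring actions
- $P(y \mid do(x), do(z), w) = P(y \mid do(x), w)$ if $(Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}\,\overline{Z(W)}}}$

Here $G_{\overline{X}}$ deletes the arrows pointing into X, $G_{\underline{Z}}$ deletes the arrows emanating from Z, and Z(W) is the set of Z-nodes that are not ancestors of any W-node in $G_{\overline{X}}$.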
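The side conditions of these rules are d-separation tests in mutilated graphs, and they can be checked mechanically. The sketch below tests d-separation with the standard moralized-ancestral-graph method, a swapped-in textbook algorithm that the talk does not prescribe. It applies the test to the smoking, tar, and cancer graph of the derivation that follows. The Rule 2 step there, $P(c \mid do(s), t) = P(c \mid do(s), do(t))$, requires C ⊥ T | S once the arrows into Smoking and the arrows out of Tar are deleted. The helper names are hypothetical.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # node -> set of its children


def ancestors(g: Graph, nodes: Set[str]) -> Set[str]:
    """The given nodes together with all their ancestors in g."""
    parents = {n: {p for p in g if n in g[p]} for n in g}
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(g: Graph, xs: Set[str], ys: Set[str], zs: Set[str]) -> bool:
    """True iff Z d-separates X from Y: X and Y are disconnected in the moral graph
    of the ancestral set An(X, Y, Z) once the nodes of Z are deleted."""
    keep = ancestors(g, xs | ys | zs)
    nbrs = {n: set() for n in keep}
    for p in keep:                       # drop edge directions
        for c in g[p] & keep:
            nbrs[p].add(c)
            nbrs[c].add(p)
    for c in keep:                       # "marry" the parents of a common child
        pars = [p for p in keep if c in g[p]]
        for i, a in enumerate(pars):
            for b in pars[i + 1:]:
                nbrs[a].add(b)
                nbrs[b].add(a)
    seen, stack = set(xs), list(xs)
    while stack:                         # search from X, never entering Z
        for nb in nbrs[stack.pop()] - zs:
            if nb in ys:
                return False
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return True


def cut_incoming(g: Graph, targets: Set[str]) -> Graph:
    """Delete every arrow pointing into a node of `targets`."""
    return {n: {c for c in ch if c not in targets} for n, ch in g.items()}


def cut_outgoing(g: Graph, sources: Set[str]) -> Graph:
    """Delete every arrow emanating from a node of `sources`."""
    return {n: (set() if n in sources else set(ch)) for n, ch in g.items()}


G = {"Genotype": {"Smoking", "Cancer"}, "Smoking": {"Tar"}, "Tar": {"Cancer"}, "Cancer": set()}

g_rule2 = cut_outgoing(cut_incoming(G, {"Smoking"}), {"Tar"})
print(d_separated(g_rule2, {"Cancer"}, {"Tar"}, {"Smoking"}))  # True: Rule 2 applies
print(d_separated(G, {"Cancer"}, {"Tar"}, {"Smoking"}))        # False: Tar -> Cancer is open
```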

DERIVATION IN CAUSAL CALCULUS

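[Figure: Genotype (unobserved) → Smoking, Genotype → Cancer, Smoking → Tar → Cancer]

$$
\begin{aligned}
P(c \mid do(s)) &= \sum_t P(c \mid do(s), t)\, P(t \mid do(s)) && \text{Probability axioms}\\
&= \sum_t P(c \mid do(s), do(t))\, P(t \mid do(s)) && \text{Rule 2}\\
&= \sum_t P(c \mid do(s), do(t))\, P(t \mid s) && \text{Rule 2}\\
&= \sum_t P(c \mid do(t))\, P(t \mid s) && \text{Rule 3}\\
&= \sum_{s'} \sum_t P(c \mid do(t), s')\, P(s' \mid do(t))\, P(t \mid s) && \text{Probability axioms}\\
&= \sum_{s'} \sum_t P(c \mid t, s')\, P(s' \mid do(t))\, P(t \mid s) && \text{Rule 2}\\
&= \sum_{s'} \sum_t P(c \mid t, s')\, P(s')\, P(t \mid s) && \text{Rule 3}
\end{aligned}
$$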
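The final expression of the derivation, $\sum_t P(t \mid s) \sum_{s'} P(c \mid t, s') P(s')$, can be checked numerically. The parameters below are hypothetical, and genotype G is hidden: `front_door` sees only the observational table P(s, t, c). Its result is compared with the causal effect computed from the full model, and with the naive P(c | s), which differs because of the genotype confounding.

```python
from itertools import product

# Hypothetical parameters; the genotype G is hidden, only (s, t, c) is observed.
p_g1 = 0.5
p_s1_g = {0: 0.2, 1: 0.8}
p_t1_s = {0: 0.1, 1: 0.9}
p_c1_tg = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.9}


def bern(p1, v):
    return p1 if v == 1 else 1 - p1


# Observational joint P(s, t, c): the hidden G is summed out.
obs = {
    (s, t, c): sum(bern(p_g1, g) * bern(p_s1_g[g], s) * bern(p_t1_s[s], t)
                   * bern(p_c1_tg[(t, g)], c) for g in (0, 1))
    for s, t, c in product((0, 1), repeat=3)
}


def P(**fixed):
    """Marginal probability of the given s/t/c values under the observational joint."""
    return sum(p for (s, t, c), p in obs.items()
               if all(dict(s=s, t=t, c=c)[k] == v for k, v in fixed.items()))


def front_door(s, c):
    """sum_t P(t | s) sum_{s'} P(c | t, s') P(s'), computed from `obs` only."""
    return sum(
        P(s=s, t=t) / P(s=s)
        * sum(P(s=s2, t=t, c=c) / P(s=s2, t=t) * P(s=s2) for s2 in (0, 1))
        for t in (0, 1)
    )


def true_effect(s, c):
    """Ground truth from the full model: sum_g P(g) sum_t P(t | s) P(c | t, g)."""
    return sum(bern(p_g1, g) * bern(p_t1_s[s], t) * bern(p_c1_tg[(t, g)], c)
               for g in (0, 1) for t in (0, 1))


print(front_door(1, 1), true_effect(1, 1), P(s=1, c=1) / P(s=1))
```

Under these assumptions the front-door value and the true effect agree at about 0.57. The naive conditional is about 0.744.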


DETERMINING THE CAUSES OF EFFECTS

(The Attribution Problem)

- Your Honor! My client (Mr. A) died BECAUSE he used that drug.

- Court to decide if it is MORE PROBABLE THAN NOT that A would be alive BUT FOR the drug!
- P(? | A is dead, took the drug) > 0.50

THE PROBLEM

- Theoretical Problems:
- What is the meaning of PN(x,y):
- “Probability that event y would not have occurred if it were not for event x, given that x and y did in fact occur.”
- Answer: $PN(x, y) = P(Y_{x'} = y' \mid x, y)$

- Under what conditions can PN(x,y) be learned from statistical data, i.e., observational, experimental, and combined?

WHAT IS INFERABLE FROM EXPERIMENTS?

Simple Experiment: $Q = P(Y_x = y \mid z)$, Z nondescendants of X.

Compound Experiment: $Q = P(Y_{X(z)} = y \mid z)$

Multi-Stage Experiment: etc…

CAN FREQUENCY DATA DECIDE LEGAL RESPONSIBILITY?

                 Experimental         Nonexperimental
                 do(x)     do(x')     x         x'
Deaths (y)       16        14         2         28
Survivals (y')   984       986        998       972
Total            1,000     1,000      1,000     1,000

- Nonexperimental data: drug usage predicts longer life
- Experimental data: drug has negligible effect on survival

- Plaintiff: Mr. A is special.

- He actually died

- He used the drug by choice

- Court to decide (given both datasets): Is it more probable than not that A would be alive but for the drug?

TYPICAL THEOREMS

(Tian and Pearl, 2000)

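- Identifiability under monotonicity (Combined data):

  $PN = \frac{P(y \mid x) - P(y \mid x')}{P(y \mid x)} + \frac{P(y \mid x') - P(y_{x'})}{P(x, y)}$

- corrected Excess-Risk-Ratio

- Bounds given combined nonexperimental and experimental data:

  $\max\left\{0, \frac{P(y) - P(y_{x'})}{P(x, y)}\right\} \le PN \le \min\left\{1, \frac{P(y'_{x'}) - P(x', y')}{P(x, y)}\right\}$

- WITH PROBABILITY ONE $P(y'_{x'} \mid x, y) = 1$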
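The sketch below applies the Tian–Pearl bounds on the probability of necessity to the counts in the table above:

$\max\{0, [P(y) - P(y_{x'})]/P(x, y)\} \le PN \le \min\{1, [P(y'_{x'}) - P(x', y')]/P(x, y)\}$

It assumes that the nonexperimental population is the two groups of 1,000 pooled, so P(x) = 1/2. Exact fractions avoid rounding. Both bounds come out as 1, so under that assumption the data imply PN = 1 for Mr. A.

```python
from fractions import Fraction

n = 1000
# Experimental arm under do(x'): deaths per 1,000.
p_y_do_xp = Fraction(14, n)                 # P(y_{x'})
p_yp_do_xp = 1 - p_y_do_xp                  # P(y'_{x'})
# Nonexperimental arms: deaths among drug users (x) and non-users (x').
p_y_given_x = Fraction(2, n)
p_y_given_xp = Fraction(28, n)
# Assumption: the observational population is the two groups pooled, so P(x) = 1/2.
p_x = Fraction(1, 2)

p_xy = p_x * p_y_given_x                    # P(x, y)
p_xp_yp = (1 - p_x) * (1 - p_y_given_xp)    # P(x', y')
p_y = p_xy + (1 - p_x) * p_y_given_xp       # P(y)

lower = max(Fraction(0), (p_y - p_y_do_xp) / p_xy)
upper = min(Fraction(1), (p_yp_do_xp - p_xp_yp) / p_xy)
print(lower, upper)  # 1 1
```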

SOLUTION TO THE ATTRIBUTION PROBLEM (Cont)

- From population data to individual case

- Combined data tell more than each study alone


QUESTIONS ADDRESSED

- What is the semantics of direct and indirect effects?
- Can we estimate them from data? Experimental data?

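TOTAL, DIRECT, AND INDIRECT EFFECTS HAVE SIMPLE SEMANTICS IN LINEAR MODELS

[Figure: X → Z (coefficient b), Z → Y (coefficient c), X → Y (coefficient a)]

$z = bx + \varepsilon_1$

$y = ax + cz + \varepsilon_2$

Total effect: $a + bc$; direct effect: $a$; indirect effect: $bc$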
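A quick numerical check of these linear semantics uses hypothetical coefficients and one unit's fixed disturbances, which cancel in the differences. The total effect is computed by letting Z respond to X. The direct effect holds Z at its pre-change value. The indirect effect holds X at $x_0$ and lets Z move as it would under $x_1$.

```python
# Hypothetical coefficients and one unit's disturbances (they cancel in the differences).
a, b, c = 0.5, 2.0, 3.0
eps1, eps2 = 0.7, -1.2


def z_of(x):
    return b * x + eps1           # z = bx + eps1


def y_of(x, z):
    return a * x + c * z + eps2   # y = ax + cz + eps2


x0, x1 = 0.0, 1.0
total = y_of(x1, z_of(x1)) - y_of(x0, z_of(x0))     # X changes, Z follows
direct = y_of(x1, z_of(x0)) - y_of(x0, z_of(x0))    # X changes, Z held at its pre-change value
indirect = y_of(x0, z_of(x1)) - y_of(x0, z_of(x0))  # X held at x0, Z moves as under x1
print(total, direct, indirect)
```

Up to floating-point rounding this prints a + bc = 6.5, a = 0.5, and bc = 6.0. The indirect effect also equals total minus direct, as expected in a linear model.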

SEMANTICS BECOMES NONTRIVIAL IN NONLINEAR MODELS

(even when the model is completely specified)

[Figure: X → Z → Y, with X → Y]

$z = f(x, \varepsilon_1)$

$y = g(x, z, \varepsilon_2)$

Dependent on z?

Void of operational meaning?

THE OPERATIONAL MEANING OF DIRECT EFFECTS

$z = f(x, \varepsilon_1)$, $y = g(x, z, \varepsilon_2)$

“Natural” Direct Effect of X on Y:

The expected change in Y per unit change of X, when we keep Z constant at whatever value it attains before the change.

In linear models, NDE = Controlled Direct Effect

POLICY IMPLICATIONS

(Who cares?)

What is the direct effect of X on Y?

The effect of Gender on Hiring if sex discrimination is eliminated.

[Figure: GENDER (X) → QUALIFICATION (Z) → HIRING (Y) as the indirect path, plus the direct mechanism f from X to Y, marked IGNORE]

THE OPERATIONAL MEANING OF INDIRECT EFFECTS

$z = f(x, \varepsilon_1)$, $y = g(x, z, \varepsilon_2)$

“Natural” Indirect Effect of X on Y:

The expected change in Y when we keep X constant, say at $x_0$, and let Z change to whatever value it would have under a unit change in X.

In linear models, NIE = TE - DE
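A sketch of both natural effects uses a hypothetical nonlinear model with an interaction: z = x + ε1 and y = x·z. Here ε1 takes the values 0 and 1 with probability 1/2 each, and ε2 is omitted. The helper `y_nested(x, x_for_z)` computes the nested counterfactual $E[Y_{x, Z_{x^*}}]$ unit by unit. In this model the controlled direct effect depends on the level at which z is held, and TE ≠ NDE + NIE, unlike the linear case.

```python
from fractions import Fraction

# Hypothetical nonlinear model: z = f(x, e1) = x + e1,  y = g(x, z) = x * z,
# with one background variable e1 in {0, 1}, each value with probability 1/2.
UNITS = {0: Fraction(1, 2), 1: Fraction(1, 2)}


def f(x, e1):
    return x + e1


def g(x, z):
    return x * z


def expect(fn):
    """Average of fn(e1) over the background variable."""
    return sum(p * fn(e1) for e1, p in UNITS.items())


def y_nested(x, x_for_z):
    """E[Y_{x, Z_{x_for_z}}]: set X = x in g, but give Z the value it takes under X = x_for_z."""
    return expect(lambda e1: g(x, f(x_for_z, e1)))


x0, x1 = 0, 1
te = y_nested(x1, x1) - y_nested(x0, x0)            # total effect
nde = y_nested(x1, x0) - y_nested(x0, x0)           # natural direct effect
nie = y_nested(x0, x1) - y_nested(x0, x0)           # natural indirect effect
cde = {z: g(x1, z) - g(x0, z) for z in (0, 1, 2)}   # controlled direct effect at Z = z
print(te, nde, nie, cde)
```

Here TE = 3/2, NDE = 1/2, and NIE = 0. The controlled direct effect equals z itself, so it has no single value.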

LEGAL DEFINITIONS TAKE THE NATURAL CONCEPTION

(FORMALIZING DISCRIMINATION)

“The central question in any employment-discrimination case is whether the employer would have taken the same action had the employee been of different race (age, sex, religion, national origin etc.) and everything else had been the same”

[Carson versus Bethlehem Steel Corp. (70 FEP Cases 921, 7th Cir. (1996))]

x = male, x' = female

y = hire, y' = not hire

z = applicant’s qualifications

NO DIRECT EFFECT

$Y_{x', Z_x} = Y_x$, $Y_{x, Z_{x'}} = Y_{x'}$

SEMANTICS AND IDENTIFICATION OF NESTED COUNTERFACTUALS

Consider the quantity $Q = E_u[Y_{x, Z_{x^*}(u)}(u)]$

Given $\langle M, P(u) \rangle$, Q is well defined:

Given u, $Z_{x^*}(u)$ is the solution for Z in $M_{x^*}$; call it z.

$Y_{xz}(u)$ is the solution for Y in $M_{xz}$.

Can Q be estimated from data?

ANSWERS TO QUESTIONS

- Graphical conditions for estimability from
- experimental / nonexperimental data.

- Graphical conditions hold in Markovian models

- Useful in answering new types of policy questions involving mechanism blocking instead of variable fixing.

THE OVERRIDING THEME

- Define Q(M) as a counterfactual expression
- Determine conditions for the reduction
- If reduction is feasible, Q is inferable.

Q1: $P(y \mid do(x))$ Causal Effect ($= P(Y_x = y)$)

Q2: $P(Y_{x'} = y' \mid x, y)$ Probability of necessity

Q3: Direct Effect

FALSIFIABILITY and CORROBORATION

[Figure: the space P* of distributions, the subset P*(M) satisfying the constraints implied by M, and the data D]

Falsifiability: $P^*(M) \subset P^*$

Data D corroborates model M if M is (i) falsifiable and (ii) compatible with D.

Types of constraints:
1. conditional independencies
2. inequalities (for restricted domains)
3. functional
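The first type of constraint can be illustrated with a hypothetical chain model X → Y → Z. It implies X ⊥ Z | Y, that is, $P(x, y, z)\,P(y) = P(x, y)\,P(y, z)$ for all values. A distribution that violates this equality lies outside P*(M), and data exhibiting it would falsify the chain. The parameters and the second, incompatible table are invented for illustration.

```python
from itertools import product


def bern(p1, v):
    return p1 if v == 1 else 1 - p1


def chain_joint(px1, py1_x, pz1_y):
    """Joint P(x, y, z) generated by the chain X -> Y -> Z (hypothetical parameters)."""
    return {(x, y, z): bern(px1, x) * bern(py1_x[x], y) * bern(pz1_y[y], z)
            for x, y, z in product((0, 1), repeat=3)}


def satisfies_x_indep_z_given_y(joint, tol=1e-12):
    """Check the chain's constraint P(x, y, z) P(y) = P(x, y) P(y, z) for all values."""
    def marg(pred):
        return sum(p for k, p in joint.items() if pred(*k))
    for x, y, z in joint:
        p_y = marg(lambda a, b, c: b == y)
        p_xy = marg(lambda a, b, c: a == x and b == y)
        p_yz = marg(lambda a, b, c: b == y and c == z)
        if abs(joint[(x, y, z)] * p_y - p_xy * p_yz) > tol:
            return False
    return True


compatible = chain_joint(0.3, {0: 0.2, 1: 0.7}, {0: 0.4, 1: 0.9})
# A distribution in which Z copies X directly; it violates the chain's constraint.
incompatible = {(x, y, z): (0.25 if x == z else 0.0) for x, y, z in product((0, 1), repeat=3)}
print(satisfies_x_indep_z_given_y(compatible), satisfies_x_indep_z_given_y(incompatible))
```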

OTHER TESTABLE CLAIMS

Changes under interventions

For all causal models:

For all semi-Markovian models:

For Markovian models (and ):

For a given Markovian model:

FROM CORROBORATING MODELS TO CORROBORATING CLAIMS

A corroborated model can imply identifiable yet uncorroborated claims.

e.g., [Figure: example diagrams over x, y, z with paths labeled a and b]

Some claims can be more corroborated than others.

Definition:

An identifiable claim C is corroborated by data if some minimal set of assumptions in M sufficient for identifying C is corroborated by the data.

Graphical criterion: minimal submodel = maximal supergraph

