
# CAUSAL MODELING AND THE LOGIC OF SCIENCE - PowerPoint PPT Presentation

CAUSAL MODELING AND THE LOGIC OF SCIENCE. Judea Pearl, Computer Science and Statistics, UCLA, www.cs.ucla.edu/~judea/. OVERVIEW: Scope and Language in Scientific Theories. Statistical models (observations, PL). Causal models: 2.1 Stochastic causal model


## PowerPoint Slideshow about 'CAUSAL MODELING AND THE LOGIC OF SCIENCE' - zuwena

Presentation Transcript

CAUSAL MODELING AND THE LOGIC OF SCIENCE

Judea Pearl

Computer Science and Statistics

UCLA

www.cs.ucla.edu/~judea/

Scope and Language in Scientific Theories

• Statistical models

• (observations, PL)

• Causal models

• 2.1 Stochastic causal model

• (interventions, PL + modality)

• 2.2 Functional causal models

• (counterfactuals, PL + subjunctives)

• General equational models

• (explicit interventions, PL)

• • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • •

• General Scientific theories

• (objects-properties, FOL-SOL ...)

• Modeling: Statistical vs. Causal

• Causal models and identifiability

• Inference to three types of claims:

• Effects of potential interventions,

• Claims about direct and indirect effects

• Falsifiability and Corroboration

[Diagram: the joint distribution P generates Data; Inference goes from Data to Q(P) (aspects of P)]

e.g., Infer whether customers who bought product A would also buy product B.

Q = P(B|A)

[Diagram: the data-generating model M generates Data; Inference goes from Data to Q(M) (aspects of M)]

Some Q(M) cannot be inferred from P.

e.g., Infer whether customers who bought product A would still buy A if we double the price.

FROM STATISTICAL TO CAUSAL ANALYSIS:

1. THE DIFFERENCES

Probability and statistics deal with static relations

[Diagram: Data → (Statistics) → joint distribution → (Probability) → inferences from passive observations]

• Causal analysis deals with changes (dynamics)

• i.e. What remains invariant when P changes.

• P does not tell us how it ought to change

• e.g. Curing symptoms vs. curing diseases

• e.g. Analogy: mechanical deformation

[Diagram: Data, causal assumptions, and experiments feed a Causal Model, which yields effects of interventions, causes of effects, and explanations]

FROM STATISTICAL TO CAUSAL ANALYSIS:

1. THE DIFFERENCES

CAUSAL

Spurious correlation

Randomization

Confounding / Effect

Instrument

Holding constant

Explanatory variables

STATISTICAL

Regression

Association / Independence

“Controlling for” / Conditioning

Odds and risk ratios

Collapsibility

FROM STATISTICAL TO CAUSAL ANALYSIS:

1. THE DIFFERENCES (CONT)


• No causes in – no causes out (Cartwright, 1989)

statistical assumptions + data + causal assumptions ⇒ causal conclusions


FROM STATISTICAL TO CAUSAL ANALYSIS:

1. THE DIFFERENCES (CONT)

• Causal assumptions cannot be expressed in the mathematical language of standard statistics.

• Non-standard mathematics:

• Structural equation models (SEM)

• Counterfactuals (Neyman-Rubin)

• Causal Diagrams (Wright, 1920)

WHAT'S IN A CAUSAL MODEL?

Oracle that assigns truth value to causal sentences:

Action sentences: B if we do A.

Counterfactuals: B would be different if A were true.

Explanation: B occurred because of A.

Optional: with what probability?

ORACLE FOR MANIPULATION

[Diagram: oracle with INPUT and OUTPUT over the variables X, Y, Z]

CAUSAL MODELS AND CAUSAL DIAGRAMS

Definition: A causal model is a 3-tuple

$M = \langle V, U, F \rangle$

with a mutilation operator $do(x): M \to M_x$ where:

(i) $V = \{V_1, \ldots, V_n\}$ endogenous variables,

(ii) $U = \{U_1, \ldots, U_m\}$ background variables,

(iii) $F$ = set of $n$ functions, $f_i : (V \setminus V_i) \cup U \to V_i$,

$v_i = f_i(pa_i, u_i)$, $PA_i \subseteq V \setminus V_i$, $U_i \subseteq U$

[Diagram: causal diagram over I, W, Q, P with background variables U1 and U2; the label $PA_Q$ marks the parents of Q]

CAUSAL MODELS AND MUTILATION

(iv) $M_x = \langle U, V, F_x \rangle$, $X \subseteq V$, $x \in X$,

where $F_x = \{f_i : V_i \notin X\} \cup \{X = x\}$

(Replace all functions $f_i$ corresponding to $X$ with the constant functions $X = x$)

[Diagram: the mutilated model $M_p$ over I, W, Q, P with background variables U1 and U2, in which the equation for P is replaced by $P = p_0$]

PROBABILISTIC CAUSAL MODELS

Definition (Probabilistic Causal Model):

$\langle M, P(u) \rangle$

$P(u)$ is a probability assignment to the variables in $U$.
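The definitions of a causal model, its mutilation, and the probabilistic extension can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the class name `CausalModel`, the helper `probability`, the two-variable model (X = U1, Y = X or U2), and the numbers in `p_u` are all hypothetical choices made for the example.

```python
from fractions import Fraction
from itertools import product


class CausalModel:
    """Sketch of M = <V, U, F> for a recursive model (hypothetical helper)."""

    def __init__(self, functions, order):
        # functions: variable name -> f(v, u), where v holds already-solved
        # endogenous values and u is the background state.
        self.functions = dict(functions)
        self.order = list(order)  # evaluation order of the endogenous variables

    def do(self, **fixed):
        """Return the mutilated model M_x: fixed variables become constants."""
        mutilated = dict(self.functions)
        for name, value in fixed.items():
            mutilated[name] = lambda v, u, value=value: value
        return CausalModel(mutilated, self.order)

    def solve(self, u):
        """Solve the equations for one background state u."""
        v = {}
        for name in self.order:
            v[name] = self.functions[name](v, u)
        return v


def probability(model, p_u, event):
    """P(event) induced by P(u): sum P(u) over states where the event holds."""
    return sum(p for u, p in p_u.items() if event(model.solve(u)))


# Hypothetical model: X = U1, Y = X or U2, with independent U1, U2.
model = CausalModel(
    {"X": lambda v, u: u[0], "Y": lambda v, u: v["X"] | u[1]},
    order=["X", "Y"],
)
p_u = {
    (u1, u2): Fraction(1, 2) * (Fraction(1, 4) if u2 else Fraction(3, 4))
    for u1, u2 in product((0, 1), repeat=2)
}

p_obs = probability(model, p_u, lambda v: v["Y"] == 1)           # P(Y = 1)
p_do0 = probability(model.do(X=0), p_u, lambda v: v["Y"] == 1)   # P(Y = 1 | do(X = 0))
p_do1 = probability(model.do(X=1), p_u, lambda v: v["Y"] == 1)   # P(Y = 1 | do(X = 1))
```

Note that `do` leaves the original model untouched and returns a new one, which mirrors the mapping from M to M_x in the definition.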

Definition: Potential Response

The sentence: “Y would be y (in unit u), had X been x,” denoted $Y_x(u) = y$, is the solution for Y in a mutilated model $M_x$, with the equations for X replaced by X = x.

(“unit-based potential outcome”)

Joint probabilities of counterfactuals:

$P(Y_x = y, Z_w = z) = \sum_{u:\, Y_x(u) = y,\ Z_w(u) = z} P(u)$

In particular:

$P(y \mid do(x)) = P(Y_x = y) = \sum_{u:\, Y_x(u) = y} P(u)$

COUNTERFACTUALS

S5. If the prisoner is dead, he would still be dead if A were not to have shot: $D \Rightarrow D_{\neg A}$

Three steps: Abduction, Action, Prediction

[Diagram: three copies of the network over U, C, A, B, D, one per step, annotated with TRUE/FALSE values]

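PROBABILITIES OF COUNTERFACTUALS

P(S5). The prisoner is dead. How likely is it that he would be dead if A were not to have shot? $P(D_{\neg A} \mid D) = ?$

Three steps: Abduction, Action, Prediction

[Diagram: three copies of the network over U, C, A, B, D; the prior $P(u)$ is replaced by $P(u \mid D)$ in the abduction step, A is set to FALSE in the action step, and $P(D_{\neg A} \mid D)$ is read off D in the prediction step]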
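The three steps (abduction, action, prediction) can be sketched for the prisoner example as follows. The structural equations are an assumption made for illustration, since the transcript preserves only the node names: U is taken to be a court order, C a captain's signal (C = U), A and B riflemen who shoot exactly when signalled, and D the prisoner's death (D = A or B). The prior on U is hypothetical.

```python
from fractions import Fraction

# Hypothetical prior over the background variable U (court order).
P_U = {True: Fraction(7, 10), False: Fraction(3, 10)}


def solve(u, a_override=None):
    """Solve the assumed equations; a_override implements the action on A."""
    c = u                                        # captain signals iff ordered
    a = c if a_override is None else a_override  # rifleman A (possibly forced)
    b = c                                        # rifleman B
    d = a or b                                   # prisoner dies if either shoots
    return {"U": u, "C": c, "A": a, "B": b, "D": d}


# Step 1 (abduction): update P(u) to P(u | D = TRUE).
joint = {u: p for u, p in P_U.items() if solve(u)["D"]}
total = sum(joint.values())
posterior = {u: p / total for u, p in joint.items()}

# Step 2 (action): force A = FALSE.  Step 3 (prediction): evaluate D.
p_counterfactual = sum(
    p for u, p in posterior.items() if solve(u, a_override=False)["D"]
)
```

Under these assumed equations the evidence D = TRUE forces U = TRUE, so B still shoots and the counterfactual probability equals 1 regardless of the prior chosen for U.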

• Inference with nonparametric structural equations, made possible through graphical analysis.

• Mathematical underpinning of counterfactuals through nonparametric structural equations.

• Graphical-counterfactuals symbiosis

Definition:

Let Q(M) be any quantity defined on a causal model M, and let A be a set of assumptions.

Q is identifiable relative to A iff

$P(M_1) = P(M_2) \Rightarrow Q(M_1) = Q(M_2)$

for all $M_1$, $M_2$ that satisfy A.


In other words, Q can be determined uniquely

from the probability distribution P(v) of the

endogenous variables, V, and assumptions A.
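A small counterexample may make the definition concrete. The sketch below builds two hypothetical models, `M1` (X causes Y) and `M2` (U is a common cause and X has no effect on Y), that generate the same P(x, y) yet disagree on Q = P(Y = 1 | do(X = 1)). With no assumptions A to rule one of them out, Q is therefore not identifiable from P alone. The models, names, and probabilities are illustrative assumptions.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical background distribution shared by both models.
P_U = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# M1: X = U, Y = X (X causes Y).   M2: X = U, Y = U (no effect of X on Y).
M1 = {"X": lambda u, v: u, "Y": lambda u, v: v["X"]}
M2 = {"X": lambda u, v: u, "Y": lambda u, v: u}


def solve(model, u, do_x=None):
    """Solve the model for background state u, optionally under do(X = do_x)."""
    v = {}
    v["X"] = model["X"](u, v) if do_x is None else do_x
    v["Y"] = model["Y"](u, v)
    return v


def joint(model):
    """Observational distribution P(x, y) generated by the model."""
    dist = defaultdict(Fraction)
    for u, p in P_U.items():
        v = solve(model, u)
        dist[(v["X"], v["Y"])] += p
    return dict(dist)


def effect(model, x, y=1):
    """Q(M) = P(Y = y | do(X = x))."""
    return sum(p for u, p in P_U.items() if solve(model, u, do_x=x)["Y"] == y)


same_distribution = joint(M1) == joint(M2)
q1, q2 = effect(M1, 1), effect(M2, 1)
```

Here `same_distribution` is true while `q1` and `q2` differ, which is exactly the failure of the implication in the definition.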

In this talk:

A: Assumptions encoded in the diagram

Q1: $P(y \mid do(x))$ Causal Effect ($= P(Y_x = y)$)

Q2: $P(Y_{x'} = y' \mid x, y)$ Probability of necessity

Q3: Direct Effect

THE FUNDAMENTAL THEOREM OF CAUSAL INFERENCE

Causal Markov Theorem:

Any distribution generated by a Markovian structural model M (recursive, with independent disturbances) can be factorized as

$P(v_1, v_2, \ldots, v_n) = \prod_i P(v_i \mid pa_i)$

where $pa_i$ are the (values of the) parents of $V_i$ in the causal diagram associated with M.

Corollary (Truncated factorization, Manipulation Theorem):

The distribution generated by an intervention do(X = x) (in a Markovian model M) is given by the truncated factorization

$P(v_1, v_2, \ldots, v_n \mid do(x)) = \prod_{i:\, V_i \notin X} P(v_i \mid pa_i)\,\big|_{X = x}$
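As a hedged illustration of the factorization and its truncated form, the sketch below uses a hypothetical Markovian model with edges Z → X, Z → Y, and X → Y over binary variables. The conditional probability tables are invented for the example. The truncated product simply drops the factor P(x | z) and evaluates the remaining factors at the intervened value.

```python
from fractions import Fraction as F

# Hypothetical conditional probability tables (all variables binary).
P_Z = {0: F(1, 2), 1: F(1, 2)}                        # P(z)
P_X1_GIVEN_Z = {0: F(1, 5), 1: F(4, 5)}               # P(X = 1 | z)
P_Y1_GIVEN_XZ = {(0, 0): F(1, 10), (0, 1): F(1, 2),
                 (1, 0): F(3, 10), (1, 1): F(7, 10)}  # P(Y = 1 | x, z)


def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1 - p_one


def joint(z, x, y):
    """Markov factorization: P(z, x, y) = P(z) P(x | z) P(y | x, z)."""
    return P_Z[z] * bern(P_X1_GIVEN_Z[z], x) * bern(P_Y1_GIVEN_XZ[(x, z)], y)


def joint_do_x(z, y, x):
    """Truncated factorization: P(z, y | do(x)) = P(z) P(y | x, z)."""
    return P_Z[z] * bern(P_Y1_GIVEN_XZ[(x, z)], y)


# Interventional versus observational probability of Y = 1 at X = 1.
p_do = sum(joint_do_x(z, 1, 1) for z in (0, 1))
p_x1 = sum(joint(z, 1, y) for z in (0, 1) for y in (0, 1))
p_obs = sum(joint(z, 1, 1) for z in (0, 1)) / p_x1
```

In this example `p_do` and `p_obs` differ, because conditioning on X = 1 also shifts the distribution of Z while intervening on X does not.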

RAMIFICATIONS OF THE FUNDAMENTAL THEOREM

Given P(x,y,z), should we ban smoking?

[Diagram: two graphs over X (Smoking), Y (Tar in Lungs), Z (Cancer), and U (unobserved). Pre-intervention: X is free. Post-intervention: X = x.]

To compute P(y,z | do(x)), we must eliminate u (a graphical problem).

THE BACK-DOOR CRITERION

Graphical test of identification

$P(y \mid do(x))$ is identifiable in $G$ if there is a set $Z$ of variables such that $Z$ d-separates $X$ from $Y$ in $G_x$.

[Diagram: graph $G$ and graph $G_x$, each over $Z_1, \ldots, Z_6$, $X$, $Y$, with the set $Z$ marked]

Moreover, $P(y \mid do(x)) = \sum_z P(y \mid x, z)\, P(z)$
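The adjustment formula, the sum over z of P(y | x, z) P(z), uses only observational quantities, so it can be evaluated directly from a table of counts. The sketch below assumes a single binary covariate Z that satisfies the back-door criterion. The counts and the helper names `p` and `backdoor` are hypothetical.

```python
from fractions import Fraction

# Hypothetical observational counts, keyed by (x, y, z).
counts = {
    (0, 0, 0): 36, (0, 1, 0): 4, (1, 0, 0): 7, (1, 1, 0): 3,
    (0, 0, 1): 5, (0, 1, 1): 5, (1, 0, 1): 12, (1, 1, 1): 28,
}
N = sum(counts.values())
INDEX = {"x": 0, "y": 1, "z": 2}


def p(**fixed):
    """Empirical probability that the named variables take the given values."""
    hits = sum(
        n for key, n in counts.items()
        if all(key[INDEX[name]] == value for name, value in fixed.items())
    )
    return Fraction(hits, N)


def backdoor(x, y):
    """sum_z P(y | x, z) P(z), assuming Z is an admissible back-door set."""
    return sum(p(x=x, y=y, z=z) / p(x=x, z=z) * p(z=z) for z in (0, 1))


adjusted = backdoor(x=1, y=1)          # estimate of P(Y = 1 | do(X = 1))
unadjusted = p(x=1, y=1) / p(x=1)      # plain conditional P(Y = 1 | X = 1)
```

The gap between `adjusted` and `unadjusted` is the part of the association that the covariate Z accounts for in these invented counts.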

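• Rule 1: Ignoring observations

• $P(y \mid do\{x\}, z, w) = P(y \mid do\{x\}, w)$ if $(Y \perp Z \mid X, W)$ in $G_{\overline{X}}$

• Rule 2: Action/observation exchange

• $P(y \mid do\{x\}, do\{z\}, w) = P(y \mid do\{x\}, z, w)$ if $(Y \perp Z \mid X, W)$ in $G_{\overline{X}\underline{Z}}$

• Rule 3: Ignoring actions

• $P(y \mid do\{x\}, do\{z\}, w) = P(y \mid do\{x\}, w)$ if $(Y \perp Z \mid X, W)$ in $G_{\overline{X}\,\overline{Z(W)}}$

(Here $G_{\overline{X}}$ is G with the arrows into X deleted, $G_{\underline{Z}}$ is G with the arrows out of Z deleted, and Z(W) is the set of Z-nodes that are not ancestors of any W-node in $G_{\overline{X}}$.)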

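[Diagram: Smoking, Tar, Cancer, with Genotype (Unobserved)]

$P(c \mid do\{s\}) = \sum_t P(c \mid do\{s\}, t)\, P(t \mid do\{s\})$ (Probability Axioms)

$= \sum_t P(c \mid do\{s\}, do\{t\})\, P(t \mid do\{s\})$ (Rule 2)

$= \sum_t P(c \mid do\{s\}, do\{t\})\, P(t \mid s)$ (Rule 2)

$= \sum_t P(c \mid do\{t\})\, P(t \mid s)$ (Rule 3)

$= \sum_{s'} \sum_t P(c \mid do\{t\}, s')\, P(s' \mid do\{t\})\, P(t \mid s)$ (Probability Axioms)

$= \sum_{s'} \sum_t P(c \mid t, s')\, P(s' \mid do\{t\})\, P(t \mid s)$ (Rule 2)

$= \sum_{s'} \sum_t P(c \mid t, s')\, P(s')\, P(t \mid s)$ (Rule 3)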
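The final expression of the derivation can be checked numerically. The sketch below assumes a graph in which the unobserved genotype U affects both smoking S and cancer C, while S affects C only through tar T. That structure and every number in the tables are assumptions made for the check. The observational joint P(s, t, c) is computed with U summed out, the final formula is evaluated from that joint alone, and the result is compared with the interventional probability computed from the full model.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical parameters of the assumed model U -> S, S -> T, T -> C, U -> C.
P_U1 = F(1, 2)                                   # P(U = 1)
P_S1_U = {0: F(1, 5), 1: F(4, 5)}                # P(S = 1 | u)
P_T1_S = {0: F(1, 10), 1: F(9, 10)}              # P(T = 1 | s)
P_C1_TU = {(0, 0): F(1, 10), (0, 1): F(3, 10),
           (1, 0): F(1, 2), (1, 1): F(9, 10)}    # P(C = 1 | t, u)


def bern(p_one, value):
    return p_one if value == 1 else 1 - p_one


# Observational joint P(s, t, c); U is summed out because it is unobserved.
P = {
    (s, t, c): sum(
        bern(P_U1, u) * bern(P_S1_U[u], s) * bern(P_T1_S[s], t)
        * bern(P_C1_TU[(t, u)], c)
        for u in (0, 1)
    )
    for s, t, c in product((0, 1), repeat=3)
}


def marg(s=None, t=None, c=None):
    """Marginal probability of the specified values under P(s, t, c)."""
    return sum(
        p for (s_, t_, c_), p in P.items()
        if (s is None or s_ == s) and (t is None or t_ == t)
        and (c is None or c_ == c)
    )


def derived_formula(c, s):
    """sum_{s'} sum_t P(c | t, s') P(s') P(t | s), from observed data only."""
    return sum(
        marg(s=s, t=t) / marg(s=s)
        * sum(marg(s=s2, t=t, c=c) / marg(s=s2, t=t) * marg(s=s2)
              for s2 in (0, 1))
        for t in (0, 1)
    )


def truth(c, s):
    """P(c | do(s)) computed from the full model, including U."""
    return sum(
        bern(P_U1, u) * bern(P_T1_S[s], t) * bern(P_C1_TU[(t, u)], c)
        for u in (0, 1) for t in (0, 1)
    )


conditional = marg(s=1, c=1) / marg(s=1)   # P(C = 1 | S = 1), for contrast
```

With exact rational arithmetic the derived formula matches the interventional truth for both values of s, while the plain conditional probability does not.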

• Modeling: Statistical vs. Causal

• Causal models and identifiability

• Inference to three types of claims:

• Effects of potential interventions,

• Your Honor! My client (Mr. A) died BECAUSE he used that drug.

• Court to decide if it is MORE PROBABLE THAN NOT that A would be alive BUT FOR the drug!

• P(? | A is dead, took the drug) > 0.50

• Theoretical Problems:

• What is the meaning of PN(x,y): “Probability that event y would not have occurred if it were not for event x, given that x and y did in fact occur.”

• Under what conditions can PN(x,y) be learned from statistical data, i.e., observational, experimental, and combined?

Simple Experiment:

$Q = P(Y_x = y \mid z)$, Z nondescendants of X.

Compound Experiment:

$Q = P(Y_{X(z)} = y \mid z)$

Multi-Stage Experiment:

etc…

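| | Experimental do(x) | Experimental do(x′) | Nonexperimental x | Nonexperimental x′ |
|---|---|---|---|---|
| Deaths (y) | 16 | 14 | 2 | 28 |
| Survivals (y′) | 984 | 986 | 998 | 972 |
| Total | 1,000 | 1,000 | 1,000 | 1,000 |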

• Nonexperimental data: drug usage predicts longer life

• Experimental data: drug has negligible effect on survival

• Plaintiff: Mr. A is special.

• He actually died

• He used the drug by choice

• Court to decide (given both data): Is it more probable than not that A would be alive but for the drug?

(Tian and Pearl, 2000)

• Identifiability under monotonicity (Combined data)

• corrected Excess-Risk-Ratio

• Bounds given combined nonexperimental and experimental data

SOLUTION TO THE ATTRIBUTION PROBLEM (Cont)

• From population data to individual case

• Combined data tell more than each study alone
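The formulas on these slides did not survive in the transcript, so the sketch below relies on the commonly cited form of the Tian and Pearl (2000) results: the bounds max{0, [P(y) − P(y_x′)] / P(x,y)} ≤ PN ≤ min{1, [P(y′_x′) − P(x′,y′)] / P(x,y)}, and, under monotonicity, PN = [P(y|x) − P(y|x′)] / P(y|x) + [P(y|x′) − P(y_x′)] / P(x,y). It applies them to the counts in the table above. Pooling the two nonexperimental columns into one population with P(x) = 1/2 is an additional assumption not stated in the source.

```python
from fractions import Fraction as F

n = 1000  # subjects per column in the table

# Death counts taken from the table.
deaths_do_x_prime = 14   # experimental arm without the drug, do(x')
deaths_x = 2             # nonexperimental, chose the drug (x)
deaths_x_prime = 28      # nonexperimental, avoided the drug (x')

# Assumption: the nonexperimental columns form one population, P(x) = 1/2.
P_x = F(1, 2)

P_y_given_x = F(deaths_x, n)
P_y_given_xp = F(deaths_x_prime, n)
P_xy = P_y_given_x * P_x                       # P(x, y)
P_xp_yp = (1 - P_y_given_xp) * (1 - P_x)       # P(x', y')
P_y = P_xy + P_y_given_xp * (1 - P_x)          # P(y)
P_y_do_xp = F(deaths_do_x_prime, n)            # P(y_{x'}) from the experiment

# Bounds on the probability of necessity from combined data.
lower = max(F(0), (P_y - P_y_do_xp) / P_xy)
upper = min(F(1), ((1 - P_y_do_xp) - P_xp_yp) / P_xy)

# Point value under monotonicity: excess risk ratio plus a correction term.
excess_risk_ratio = (P_y_given_x - P_y_given_xp) / P_y_given_x
correction = (P_y_given_xp - P_y_do_xp) / P_xy
pn_monotone = excess_risk_ratio + correction
```

Under these assumptions the lower and upper bounds coincide, which is one way to read the claim that the combined data tell more than each study alone.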

• Modeling: Statistical vs. Causal

• Causal models and identifiability

• Inference to three types of claims:

• Effects of potential interventions,

• Claims about direct and indirect effects

• What is the semantics of direct and indirect effects?

• Can we estimate them from data? Experimental data?

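TOTAL, DIRECT, AND INDIRECT EFFECTS HAVE SIMPLE SEMANTICS IN LINEAR MODELS

[Diagram: X → Z with coefficient b, X → Y with coefficient a, Z → Y with coefficient c]

$z = bx + \varepsilon_1$

$y = ax + cz + \varepsilon_2$

Total effect: a + bc

Direct effect: a

Indirect effect: bc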
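A quick numeric check of the three path expressions, using the two linear equations with the disturbance terms set to zero (they cancel in every difference below). The coefficient values are hypothetical.

```python
def z_of(x, b):
    """z = b x, with the disturbance term set to zero."""
    return b * x


def y_of(x, z, a, c):
    """y = a x + c z, with the disturbance term set to zero."""
    return a * x + c * z


a, b, c = 0.5, 2.0, 3.0  # hypothetical path coefficients

baseline = y_of(0, z_of(0, b), a, c)

# Total effect: change X from 0 to 1 and let Z respond.
total = y_of(1, z_of(1, b), a, c) - baseline

# Direct effect: change X from 0 to 1 while holding Z at its X = 0 value.
direct = y_of(1, z_of(0, b), a, c) - baseline

# Indirect effect: hold X at 0 but give Z the value it would take under X = 1.
indirect = y_of(0, z_of(1, b), a, c) - baseline
```

For any coefficients the three quantities satisfy total = direct + indirect, which is the additivity that makes the linear case simple.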

IN NONLINEAR MODELS

(even when the model is completely specified)

[Diagram: X → Z, X → Y, Z → Y]

$z = f(x, \varepsilon_1)$

$y = g(x, z, \varepsilon_2)$

Dependent on z?

Void of operational meaning?

DIRECT EFFECTS

[Diagram: X → Z, X → Y, Z → Y]

$z = f(x, \varepsilon_1)$

$y = g(x, z, \varepsilon_2)$

“Natural” Direct Effect of X on Y:

The expected change in Y per unit change of X, when we keep Z constant at whatever value it attains before the change.

In linear models, NDE = Controlled Direct Effect

POLICY IMPLICATIONS

(Who cares?)

What is the indirect effect of X on Y?

The effect of Gender on Hiring if sex discrimination is eliminated.

[Diagram: X (Gender), Z (QUALIFICATION), Y (HIRING); the direct link from X to Y is marked IGNORE]

INDIRECT EFFECTS

[Diagram: X → Z, X → Y, Z → Y]

$z = f(x, \varepsilon_1)$

$y = g(x, z, \varepsilon_2)$

“Natural” Indirect Effect of X on Y:

The expected change in Y when we keep X constant, say at $x_0$, and let Z change to whatever value it would have under a unit change in X.

In linear models, NIE = TE - DE

(FORMALIZING DISCRIMINATION)

“The central question in any employment-discrimination case is whether the employer would have taken the same action had the employee been of different race (age, sex, religion, national origin etc.) and everything else had been the same”

[Carson versus Bethlehem Steel Corp. (70 FEP Cases 921, 7th Cir. (1996))]

x = male, x′ = female

y = hire, y′ = not hire

z = applicant’s qualifications

NO DIRECT EFFECT

$Y_{x' Z_x} = Y_x$, $Y_{x Z_{x'}} = Y_{x'}$

Consider the quantity

$Q = E_u[\,Y_{x Z_{x^*}(u)}(u)\,]$

Given $\langle M, P(u) \rangle$, Q is well defined.

Given u, $Z_{x^*}(u)$ is the solution for Z in $M_{x^*}$; call it z.

$Y_{xz}(u)$ is the solution for Y in $M_{xz}$.

Can Q be estimated from data?
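When the model and P(u) are fully specified, the two-stage recipe (solve for Z under x*, then solve for Y under x and that value of Z) can be carried out by enumeration. The sketch below reads the quantity as Q(x, x*) = E_u[Y_{x, Z_{x*}(u)}(u)], which is an assumption about the formula lost from the slide. The functions `f` and `g` and the disturbance probabilities are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical independent binary disturbances.
P_E1 = {0: F(3, 4), 1: F(1, 4)}
P_E2 = {0: F(9, 10), 1: F(1, 10)}


def f(x, e1):
    """z = f(x, e1): a hypothetical nonlinear mechanism for Z."""
    return x | e1


def g(x, z, e2):
    """y = g(x, z, e2): X and Z interact, so the model is not additive."""
    return (x & z) | e2


def Q(x, x_star):
    """E_u[Y_{x, Z_{x*}(u)}(u)], by enumeration over u = (e1, e2)."""
    expectation = F(0)
    for e1, e2 in product((0, 1), repeat=2):
        z = f(x_star, e1)   # Z_{x*}(u): solution for Z in M_{x*}
        y = g(x, z, e2)     # Y_{xz}(u): solution for Y in M_{xz}
        expectation += P_E1[e1] * P_E2[e2] * y
    return expectation


natural_direct = Q(1, 0) - Q(0, 0)     # change X, keep Z as it was under x* = 0
natural_indirect = Q(0, 1) - Q(0, 0)   # keep X, let Z respond as if X were 1
total_effect = Q(1, 1) - Q(0, 0)
```

In this invented model the natural direct and indirect effects do not add up to the total effect, in contrast to the linear case.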

• Graphical conditions for estimability from experimental / nonexperimental data.

• Graphical conditions hold in Markovian models

• Useful in answering new type of policy questions involving mechanism blocking instead of variable fixing.

• Define Q(M) as a counterfactual expression

• Determine conditions for the reduction

• If reduction is feasible, Q is inferable.

• Demonstrated on three types of queries:

• Q1: $P(y \mid do(x))$ Causal Effect ($= P(Y_x = y)$)

• Q2: $P(Y_{x'} = y' \mid x, y)$ Probability of necessity

• Q3: Direct Effect

FALSIFIABILITY and CORROBORATION

[Diagram: the set $P^*(M)$ of distributions satisfying the constraints implied by M, drawn inside the set $P^*$ of all distributions, with the data D marked]

Falsifiability: $P^*(M) \subset P^*$

Data D corroborates model M if M is (i) falsifiable and (ii) compatible with D.

Types of constraints:

1. conditional independencies

2. inequalities (for restricted domains)

3. functional

e.g.,

Changes under interventions

For all causal models:

For all semi-Markovian models:

For Markovian models (and ):

For a given Markovian model:

TO CORROBORATING CLAIMS

A corroborated model can imply identifiable yet uncorroborated claims.

[Diagram: example graphs over x, y, z with edge coefficients a and b]

Some claims can be more corroborated than others.

Definition:

An identifiable claim C is corroborated by data if some minimal set of assumptions in M sufficient for identifying C is corroborated by the data.

Graphical criterion: minimal submodel = maximal supergraph


Scope and Language in Scientific Theories

• Statistical models

• (observations, PL)

• Causal models

• 2.1 Stochastic causal model

• (interventions, PL + modality)

• 2.2 Functional causal models

• (counterfactuals, PL + subjunctives)

• General equational models

• (explicit interventions, PL)

• • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • •

• General Scientific theories

• (objects-properties, FOL-SOL ...)