CAUSAL MODELING AND THE LOGIC OF SCIENCE

Presentation Transcript
slide1

CAUSAL MODELING

AND THE

LOGIC OF SCIENCE

Judea Pearl

Computer Science and Statistics

UCLA

www.cs.ucla.edu/~judea/

slide2

OVERVIEW

Scope and Language in Scientific Theories

  • Statistical models
      • (observations, PL)
  • Causal models
    • 2.1 Stochastic causal model
    • (interventions, PL + modality)
    • 2.2 Functional causal models
    • (counterfactuals, PL + subjunctives)
  • General equational models
    • (explicit interventions, PL)
    • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • •
  • General Scientific theories
    • (objects-properties, FOL-SOL ...)
slide3

OUTLINE

  • Modeling: Statistical vs. Causal
  • Causal models and identifiability
  • Inference to three types of claims:
    • Effects of potential interventions
    • Claims about attribution (responsibility)
    • Claims about direct and indirect effects
  • Falsifiability and Corroboration
slide4

TRADITIONAL STATISTICAL INFERENCE PARADIGM

[Diagram: Data is generated by a joint distribution P; Inference targets Q(P), aspects of P.]

e.g., Infer whether customers who bought product A would also buy product B.

Q = P(B | A)

slide5

THE CAUSAL INFERENCE PARADIGM

[Diagram: Data is generated by a data-generating model M; Inference targets Q(M), aspects of M.]

Some Q(M) cannot be inferred from P.

e.g., Infer whether customers who bought product A would still buy A if we double the price.
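
The gap between the two paradigms can be illustrated with a toy model. The following is a minimal sketch, assuming a hypothetical data-generating model that is not part of the talk: an unobserved trait U drives the purchase of both A and B, while A has no causal effect on B. All names and probabilities are illustrative.

```python
from itertools import product

# Hypothetical model (an assumption, not from the talk):
# U = customer is an enthusiast, A = buys product A, B = buys product B.
P_U = {1: 0.3, 0: 0.7}
P_A_GIVEN_U = {1: 0.9, 0: 0.2}   # P(A=1 | U=u)
P_B_GIVEN_U = {1: 0.8, 0: 0.1}   # P(B=1 | U=u); B does not depend on A


def joint(u, a, b):
    """P(u, a, b) under the hypothetical model."""
    pa = P_A_GIVEN_U[u] if a else 1 - P_A_GIVEN_U[u]
    pb = P_B_GIVEN_U[u] if b else 1 - P_B_GIVEN_U[u]
    return P_U[u] * pa * pb


def p_b_given_a(a):
    """Statistical query Q(P) = P(B=1 | A=a), computed from the joint."""
    num = sum(joint(u, a, 1) for u in (0, 1))
    den = sum(joint(u, a, b) for u, b in product((0, 1), repeat=2))
    return num / den


def p_b_do_a(a):
    """Causal query Q(M) = P(B=1 | do(A=a)): A is set from outside,
    so U keeps its prior distribution and B ignores a."""
    return sum(P_U[u] * P_B_GIVEN_U[u] for u in (0, 1))


print(p_b_given_a(1), p_b_do_a(1))
```

In this sketch the conditional probability is well above the interventional one, even though both are computed from the same model; only the second requires knowing M rather than P.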

slide7

FROM STATISTICAL TO CAUSAL ANALYSIS:

1. THE DIFFERENCES

Probability and statistics deal with static relations.

[Diagram: Data → Statistics → joint distribution → Probability → inferences from passive observations]

  • Causal analysis deals with changes (dynamics),
  • i.e., what remains invariant when P changes.
  • P does not tell us how it ought to change.
  • e.g., curing symptoms vs. curing diseases
  • e.g., analogy: mechanical deformation
slide8

FROM STATISTICAL TO CAUSAL ANALYSIS:

1. THE DIFFERENCES

Probability and statistics deal with static relations.

[Diagram: Data → Statistics → joint distribution → Probability → inferences from passive observations]

Causal analysis deals with changes (dynamics).

[Diagram: Data, causal assumptions, and experiments feed a causal model, which yields:]

  • Effects of interventions
  • Causes of effects
  • Explanations

slide11

FROM STATISTICAL TO CAUSAL ANALYSIS:

1. THE DIFFERENCES (CONT)

Causal and statistical concepts do not mix.

CAUSAL

  • Spurious correlation
  • Randomization
  • Confounding / Effect
  • Instrument
  • Holding constant
  • Explanatory variables

STATISTICAL

  • Regression
  • Association / Independence
  • “Controlling for” / Conditioning
  • Odds and risk ratios
  • Collapsibility

  • No causes in – no causes out (Cartwright, 1989):
    statistical assumptions + data + causal assumptions ⇒ causal conclusions
  • Causal assumptions cannot be expressed in the mathematical language of standard statistics.
  • Non-standard mathematics:
    • Structural equation models (SEM)
    • Counterfactuals (Neyman-Rubin)
    • Causal Diagrams (Wright, 1920)
slide12

WHAT'S IN A CAUSAL MODEL?

Oracle that assigns truth value to causal sentences:

Action sentences: B if we do A.

Counterfactuals: B would be different if A were true.

Explanation: B occurred because of A.

Optional: with what probability?

slide13

FAMILIAR CAUSAL MODEL

ORACLE FOR MANIPULATION

[Diagram labels: X, Y, Z, INPUT, OUTPUT]

slide14

CAUSAL MODELS AND CAUSAL DIAGRAMS

Definition: A causal model is a 3-tuple

$M = \langle V, U, F \rangle$

with a mutilation operator $do(x): M \rightarrow M_x$, where:

(i) $V = \{V_1, \ldots, V_n\}$ endogenous variables,

(ii) $U = \{U_1, \ldots, U_m\}$ background variables,

(iii) $F$ = set of $n$ functions, $f_i : (V \setminus V_i) \times U \rightarrow V_i$,

$v_i = f_i(pa_i, u_i)$, $PA_i \subseteq V \setminus V_i$, $U_i \subseteq U$

slide15

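CAUSAL MODELS AND CAUSAL DIAGRAMS

Definition: A causal model is a 3-tuple

$M = \langle V, U, F \rangle$

with a mutilation operator $do(x): M \rightarrow M_x$, where:

(i) $V = \{V_1, \ldots, V_n\}$ endogenous variables,

(ii) $U = \{U_1, \ldots, U_m\}$ background variables,

(iii) $F$ = set of $n$ functions, $f_i : (V \setminus V_i) \times U \rightarrow V_i$,

$v_i = f_i(pa_i, u_i)$, $PA_i \subseteq V \setminus V_i$, $U_i \subseteq U$

[Causal diagram over I, W, Q, P with background variables U1 and U2; $PA_Q$ marks the parents of Q.]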

slide16

CAUSAL MODELS AND MUTILATION

Definition: A causal model is a 3-tuple

$M = \langle V, U, F \rangle$

with a mutilation operator $do(x): M \rightarrow M_x$, where:

(i) $V = \{V_1, \ldots, V_n\}$ endogenous variables,

(ii) $U = \{U_1, \ldots, U_m\}$ background variables,

(iii) $F$ = set of $n$ functions, $f_i : (V \setminus V_i) \times U \rightarrow V_i$,

$v_i = f_i(pa_i, u_i)$, $PA_i \subseteq V \setminus V_i$, $U_i \subseteq U$

(iv) $M_x = \langle U, V, F_x \rangle$, $X \subseteq V$, $x \in X$,

where $F_x = \{f_i : V_i \notin X\} \cup \{X = x\}$

(Replace all functions $f_i$ corresponding to X with the constant functions X = x.)
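
The definition can be sketched in code. This is a minimal illustration, assuming a recursive model whose functions are listed in causal order; the class name `CausalModel`, its methods, and the two-variable example are hypothetical and not taken from the talk.

```python
from typing import Callable, Dict

Values = Dict[str, float]
Mechanism = Callable[[Values, Values], float]  # f_i(v, u) -> v_i


class CausalModel:
    """M = <V, U, F>: one function f_i per endogenous variable V_i."""

    def __init__(self, functions: Dict[str, Mechanism]):
        # Assumption: insertion order is a valid causal order (recursive model).
        self.functions = dict(functions)

    def do(self, **fixed: float) -> "CausalModel":
        """Return the mutilated model M_x: each f_i for V_i in X becomes X = x."""
        mutilated = dict(self.functions)
        for name, value in fixed.items():
            mutilated[name] = lambda v, u, value=value: value
        return CausalModel(mutilated)

    def solve(self, u: Values) -> Values:
        """Solve for all endogenous variables given background values u."""
        v: Values = {}
        for name, f in self.functions.items():
            v[name] = f(v, u)
        return v


# Hypothetical example: X = U1, Y = 2X + U2.
model = CausalModel({
    "X": lambda v, u: u["U1"],
    "Y": lambda v, u: 2 * v["X"] + u["U2"],
})
background = {"U1": 1.0, "U2": 0.5}
print(model.solve(background))
print(model.do(X=3.0).solve(background))
```

Note that `do` returns a new model and leaves the original untouched, which mirrors the reading of do(x) as a map from M to $M_x$.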

slide17

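CAUSAL MODELS AND MUTILATION

Definition: A causal model is a 3-tuple

$M = \langle V, U, F \rangle$

with a mutilation operator $do(x): M \rightarrow M_x$, where:

(i) $V = \{V_1, \ldots, V_n\}$ endogenous variables,

(ii) $U = \{U_1, \ldots, U_m\}$ background variables,

(iii) $F$ = set of $n$ functions, $f_i : (V \setminus V_i) \times U \rightarrow V_i$,

$v_i = f_i(pa_i, u_i)$, $PA_i \subseteq V \setminus V_i$, $U_i \subseteq U$

(iv) [Causal diagram over I, W, Q, P with background variables U1 and U2.]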

slide18

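CAUSAL MODELS AND MUTILATION

Definition: A causal model is a 3-tuple

$M = \langle V, U, F \rangle$

with a mutilation operator $do(x): M \rightarrow M_x$, where:

(i) $V = \{V_1, \ldots, V_n\}$ endogenous variables,

(ii) $U = \{U_1, \ldots, U_m\}$ background variables,

(iii) $F$ = set of $n$ functions, $f_i : (V \setminus V_i) \times U \rightarrow V_i$,

$v_i = f_i(pa_i, u_i)$, $PA_i \subseteq V \setminus V_i$, $U_i \subseteq U$

(iv) [Causal diagram of the mutilated model $M_p$ over I, W, Q, P with background variables U1 and U2, with P held at $P = p_0$.]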

slide19

PROBABILISTIC CAUSAL MODELS

Definition: A causal model is a 3-tuple

$M = \langle V, U, F \rangle$

with a mutilation operator $do(x): M \rightarrow M_x$, where:

(i) $V = \{V_1, \ldots, V_n\}$ endogenous variables,

(ii) $U = \{U_1, \ldots, U_m\}$ background variables,

(iii) $F$ = set of $n$ functions, $f_i : (V \setminus V_i) \times U \rightarrow V_i$,

$v_i = f_i(pa_i, u_i)$, $PA_i \subseteq V \setminus V_i$, $U_i \subseteq U$

(iv) $M_x = \langle U, V, F_x \rangle$, $X \subseteq V$, $x \in X$,

where $F_x = \{f_i : V_i \notin X\} \cup \{X = x\}$

(Replace all functions $f_i$ corresponding to X with the constant functions X = x.)

Definition (Probabilistic Causal Model):

$\langle M, P(u) \rangle$

P(u) is a probability assignment to the variables in U.

slide22

CAUSAL MODELS AND COUNTERFACTUALS

Definition: Potential Response

The sentence: “Y would be y (in unit u), had X been x,” denoted $Y_x(u) = y$, is the solution for Y in a mutilated model $M_x$, with the equations for X replaced by X = x.

(“unit-based potential outcome”)

Joint probabilities of counterfactuals:

$P(Y_x = y, Z_w = z) = \sum_{u:\, Y_x(u) = y,\; Z_w(u) = z} P(u)$

In particular:

$P(y \mid do(x)) = P(Y_x = y) = \sum_{u:\, Y_x(u) = y} P(u)$
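
A probability of counterfactuals can be computed by summing P(u) over the background states in which the counterfactual sentences hold. The sketch below assumes that standard reading and a hypothetical binary model (X = U1, Y = X or U2) chosen only for illustration.

```python
from itertools import product

# Hypothetical probabilistic causal model (an assumption for illustration):
# X = U1, Y = X or U2, with independent binary background variables.
P_U1 = {0: 0.6, 1: 0.4}
P_U2 = {0: 0.9, 1: 0.1}


def potential_response_y(x, u1, u2):
    """Y_x(u): solve for Y in M_x, where the equation for X is replaced by X = x."""
    return x | u2


def prob(event):
    """Sum P(u) over all background states u = (u1, u2) where event holds."""
    return sum(P_U1[u1] * P_U2[u2]
               for u1, u2 in product((0, 1), repeat=2)
               if event(u1, u2))


# P(Y_1 = 1), which coincides with P(Y = 1 | do(X = 1)).
p_y1 = prob(lambda u1, u2: potential_response_y(1, u1, u2) == 1)

# Joint probability of two counterfactuals: P(Y_1 = 1, Y_0 = 0).
p_joint = prob(lambda u1, u2: potential_response_y(1, u1, u2) == 1
               and potential_response_y(0, u1, u2) == 0)

print(p_y1, p_joint)
```

The joint query refers to two different mutilated models at once, which is why it needs the functional model and P(u) rather than P(v) alone.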

slide23

3-STEPS TO COMPUTING COUNTERFACTUALS

S5. If the prisoner is dead, he would still be dead if A were not to have shot: $D \Rightarrow D_{\neg A}$.

[Figure: three copies of a network over U, C, A, B, D, one per step (Abduction, Action, Prediction), with nodes annotated TRUE or FALSE.]
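
The three steps can be sketched for sentence S5. The network structure used here is an assumption, since the transcript preserves only the node labels: U is a court order, C the captain's signal, A and B two riflemen, and D the prisoner's death.

```python
def solve(u, do_a=None):
    """Solve the (assumed) firing-squad model for a background state u.

    Assumed equations: C = U, A = C, B = C, D = A or B.
    Passing do_a replaces the equation for A by the constant A = do_a.
    """
    c = u
    a = c if do_a is None else do_a
    b = c
    d = a or b
    return {"U": u, "C": c, "A": a, "B": b, "D": d}


def d_had_a_been(do_a):
    """Evaluate D_{A = do_a} given the evidence D = True."""
    # Step 1 (abduction): keep the background states consistent with D = True.
    consistent_u = [u for u in (False, True) if solve(u)["D"]]
    # Step 2 (action): mutilate the model by fixing A.
    # Step 3 (prediction): re-solve for D in each surviving state.
    return [solve(u, do_a=do_a)["D"] for u in consistent_u]


print(d_had_a_been(False))
```

Under these assumed equations the evidence pins down a single background state, so the counterfactual has a definite truth value.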

slide24

COMPUTING PROBABILITIES OF COUNTERFACTUALS

P(S5). The prisoner is dead. How likely is it that he would be dead if A were not to have shot? $P(D_{\neg A} \mid D) = ?$

[Figure: three copies of the network over U, C, A, B, D, one per step (Abduction, Action, Prediction). Abduction updates P(u) to P(u | D), Action sets A to FALSE, and Prediction yields $P(D_{\neg A} \mid D)$.]
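
The probabilistic version follows the same three steps, with abduction updating P(u) to P(u | D). This sketch assumes a hypothetical extension of the firing-squad story in which rifleman A may also shoot out of nervousness (background variable W); the structure and the numbers are illustrative, not from the talk.

```python
from fractions import Fraction
from itertools import product

P_ORDER = Fraction(7, 10)    # hypothetical P(U = True): court orders execution
P_NERVOUS = Fraction(1, 10)  # hypothetical P(W = True): A shoots from nervousness


def death(u, w, do_a=None):
    """Assumed equations: C = U, A = C or W, B = C, D = A or B."""
    c = u
    a = (c or w) if do_a is None else do_a
    b = c
    return a or b


def prior(u, w):
    pu = P_ORDER if u else 1 - P_ORDER
    pw = P_NERVOUS if w else 1 - P_NERVOUS
    return pu * pw


def prob_dead_had_a_not_shot():
    states = list(product((False, True), repeat=2))
    # Abduction: P(u, w | D) for the states compatible with the evidence D.
    p_d = sum(prior(u, w) for u, w in states if death(u, w))
    posterior = {(u, w): prior(u, w) / p_d for u, w in states if death(u, w)}
    # Action and prediction: fix A = False and sum the posterior where D holds.
    return sum(p for (u, w), p in posterior.items() if death(u, w, do_a=False))


print(prob_dead_had_a_not_shot())
```

Exact fractions are used so the result is not blurred by floating-point rounding.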

slide25

CAUSAL INFERENCE

MADE EASY (1985-2000)

  • Inference with Nonparametric Structural Equations
    • made possible through Graphical Analysis.
  • Mathematical underpinning of counterfactuals
    • through nonparametric structural equations
  • Graphical-Counterfactuals symbiosis
slide27

IDENTIFIABILITY

Definition:

Let Q(M) be any quantity defined on a causal model M, and let A be a set of assumptions.

Q is identifiable relative to A iff

$P(M_1) = P(M_2) \Rightarrow Q(M_1) = Q(M_2)$

for all $M_1$, $M_2$ that satisfy A.

In other words, Q can be determined uniquely from the probability distribution P(v) of the endogenous variables, V, and assumptions A.
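
A failure of identifiability can be shown with two models that agree on P(v) but disagree on Q. The sketch assumes an empty assumption set A and two hypothetical binary models; neither appears in the talk.

```python
from fractions import Fraction

P_U = {0: Fraction(1, 2), 1: Fraction(1, 2)}


def model_1(u, do_x=None):
    """X = U, Y = X: X causes Y, no confounding."""
    x = u if do_x is None else do_x
    return x, x


def model_2(u, do_x=None):
    """X = U, Y = U: U drives both, X has no effect on Y."""
    x = u if do_x is None else do_x
    return x, u


def observational(model):
    """P(x, y) induced by the model and P(u)."""
    dist = {}
    for u, p in P_U.items():
        xy = model(u)
        dist[xy] = dist.get(xy, 0) + p
    return dist


def causal_effect(model, x, y=1):
    """Q(M) = P(Y = y | do(X = x))."""
    return sum(p for u, p in P_U.items() if model(u, do_x=x)[1] == y)


print(observational(model_1) == observational(model_2))
print(causal_effect(model_1, 1), causal_effect(model_2, 1))
```

Here $P(M_1) = P(M_2)$ while $Q(M_1) \neq Q(M_2)$, so Q is not identifiable unless the assumptions A exclude one of the two models.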

slide28

IDENTIFIABILITY

Definition:

Let Q(M) be any quantity defined on a causal model M, and let A be a set of assumptions.

Q is identifiable relative to A iff

$P(M_1) = P(M_2) \Rightarrow Q(M_1) = Q(M_2)$

for all $M_1$, $M_2$ that satisfy A.

In this talk:

A: Assumptions encoded in the diagram

Q1: $P(y \mid do(x))$ Causal Effect ($= P(Y_x = y)$)

Q2: $P(Y_{x'} = y' \mid x, y)$ Probability of necessity

Q3: Direct Effect

slide30

THE FUNDAMENTAL THEOREM OF CAUSAL INFERENCE

Causal Markov Theorem:

Any distribution generated by a Markovian structural model M (recursive, with independent disturbances) can be factorized as

$P(v_1, v_2, \ldots, v_n) = \prod_i P(v_i \mid pa_i)$

where $pa_i$ are the (values of) the parents of $V_i$ in the causal diagram associated with M.

Corollary: (Truncated factorization, Manipulation Theorem)

The distribution generated by an intervention do(X=x) (in a Markovian model M) is given by the truncated factorization

$P(v_1, v_2, \ldots, v_n \mid do(x)) = \prod_{i \mid V_i \notin X} P(v_i \mid pa_i)$, for all v consistent with X = x.
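
The truncated factorization can be sketched on a small Markovian model. The graph (Z → X, Z → Y, X → Y) and the conditional probabilities below are hypothetical values chosen for illustration.

```python
from itertools import product

# Hypothetical Markovian model over binary variables: Z -> X, Z -> Y, X -> Y.
PARENTS = {"Z": (), "X": ("Z",), "Y": ("X", "Z")}
# CPT[var][parent values] = P(var = 1 | parents)
CPT = {
    "Z": {(): 0.5},
    "X": {(0,): 0.2, (1,): 0.8},
    "Y": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}


def factor(var, v):
    """P(v_i | pa_i) for the assignment v."""
    p1 = CPT[var][tuple(v[p] for p in PARENTS[var])]
    return p1 if v[var] == 1 else 1 - p1


def joint(v):
    """Causal Markov factorization: product of P(v_i | pa_i) over all i."""
    result = 1.0
    for var in PARENTS:
        result *= factor(var, v)
    return result


def joint_do(v, intervened):
    """Truncated factorization: drop the factors of the intervened variables."""
    if any(v[var] != val for var, val in intervened.items()):
        return 0.0  # v is inconsistent with the intervention
    result = 1.0
    for var in PARENTS:
        if var not in intervened:
            result *= factor(var, v)
    return result


def p_y_do_x(x):
    """P(Y = 1 | do(X = x)), obtained by summing out Z."""
    return sum(joint_do({"Z": z, "X": x, "Y": 1}, {"X": x}) for z in (0, 1))


print(p_y_do_x(0), p_y_do_x(1))
```

The only difference between `joint` and `joint_do` is the omitted factor for X, which is the whole content of the corollary.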

slide33

RAMIFICATIONS OF THE FUNDAMENTAL THEOREM

Given P(x,y,z), should we ban smoking?

[Diagram, pre-intervention: Smoking, Tar in Lungs, Cancer (variables X, Y, Z), with U (unobserved).]

[Diagram, post-intervention: the same variables with smoking held at X = x.]

To compute P(y,z|do(x)), we must eliminate u (a graphical problem).

slide35

THE BACK-DOOR CRITERION

Graphical test of identification

P(y | do(x)) is identifiable in G if there is a set Z of variables such that Z d-separates X from Y in $G_x$.

Moreover, $P(y \mid do(x)) = \sum_z P(y \mid x, z)\, P(z)$ (“adjusting” for Z)

[Figure: two graphs, G and $G_x$, over Z1, ..., Z6, X, Y, with the adjustment set Z marked.]
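
The adjustment formula can be checked numerically. The sketch assumes a hypothetical observed distribution P(z, x, y) generated from a graph Z → X, Z → Y, X → Y, in which Z satisfies the back-door criterion; the numbers are illustrative.

```python
from itertools import product


def build_joint():
    """Hypothetical observed P(z, x, y), generated from Z -> X, Z -> Y, X -> Y."""
    p_z1 = 0.5
    p_x1 = {0: 0.2, 1: 0.8}                                        # P(X=1 | z)
    p_y1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}    # P(Y=1 | x, z)
    joint = {}
    for z, x, y in product((0, 1), repeat=3):
        pz = p_z1 if z else 1 - p_z1
        px = p_x1[z] if x else 1 - p_x1[z]
        py = p_y1[(x, z)] if y else 1 - p_y1[(x, z)]
        joint[(z, x, y)] = pz * px * py
    return joint


JOINT = build_joint()
INDEX = {"z": 0, "x": 1, "y": 2}


def marginal(**fixed):
    """Probability that the named variables take the given values."""
    return sum(p for key, p in JOINT.items()
               if all(key[INDEX[name]] == val for name, val in fixed.items()))


def backdoor_adjust(x, y=1):
    """sum_z P(y | x, z) P(z): adjusting for Z."""
    return sum(marginal(x=x, y=y, z=z) / marginal(x=x, z=z) * marginal(z=z)
               for z in (0, 1))


def naive(x, y=1):
    """Unadjusted P(y | x), for comparison."""
    return marginal(x=x, y=y) / marginal(x=x)


print(backdoor_adjust(1), naive(1))
```

In this example the unadjusted conditional overstates the effect because Z raises both X and Y.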

slide36

RULES OF CAUSAL CALCULUS

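  • Rule 1: Ignoring observations
    • $P(y \mid do\{x\}, z, w) = P(y \mid do\{x\}, w)$ if $(Y \perp Z \mid X, W)$ in $G_{\overline{X}}$
  • Rule 2: Action/observation exchange
    • $P(y \mid do\{x\}, do\{z\}, w) = P(y \mid do\{x\}, z, w)$ if $(Y \perp Z \mid X, W)$ in $G_{\overline{X}\,\underline{Z}}$
  • Rule 3: Ignoring actions
    • $P(y \mid do\{x\}, do\{z\}, w) = P(y \mid do\{x\}, w)$ if $(Y \perp Z \mid X, W)$ in $G_{\overline{X}\,\overline{Z(W)}}$
  • Notation: an overline deletes the arrows pointing into a set, an underline deletes the arrows emerging from it, and Z(W) is the set of Z-nodes that are not ancestors of any W-node in $G_{\overline{X}}$.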
slide37

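DERIVATION IN CAUSAL CALCULUS

[Diagram: Smoking, Tar, Cancer, with Genotype (Unobserved).]

$P(c \mid do\{s\}) = \sum_t P(c \mid do\{s\}, t)\, P(t \mid do\{s\})$   (Probability Axioms)

$= \sum_t P(c \mid do\{s\}, do\{t\})\, P(t \mid do\{s\})$   (Rule 2)

$= \sum_t P(c \mid do\{s\}, do\{t\})\, P(t \mid s)$   (Rule 2)

$= \sum_t P(c \mid do\{t\})\, P(t \mid s)$   (Rule 3)

$= \sum_{s'} \sum_t P(c \mid do\{t\}, s')\, P(s' \mid do\{t\})\, P(t \mid s)$   (Probability Axioms)

$= \sum_{s'} \sum_t P(c \mid t, s')\, P(s' \mid do\{t\})\, P(t \mid s)$   (Rule 2)

$= \sum_{s'} \sum_t P(c \mid t, s')\, P(s')\, P(t \mid s)$   (Rule 3)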
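
The final expression uses only observed quantities, which can be checked against a model where the hidden genotype is known. The sketch assumes a hypothetical parameterization of the graph (U → S, U → C, S → T, T → C); the numbers are illustrative.

```python
from itertools import product

# Hypothetical parameters: U = genotype (hidden), S = smoking, T = tar, C = cancer.
P_U1 = 0.5
P_S1 = {0: 0.2, 1: 0.8}                                       # P(S=1 | u)
P_T1 = {0: 0.1, 1: 0.9}                                       # P(T=1 | s)
P_C1 = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.7}   # P(C=1 | t, u)


def bern(p1, value):
    return p1 if value == 1 else 1 - p1


def full_joint(u, s, t, c):
    return (bern(P_U1, u) * bern(P_S1[u], s)
            * bern(P_T1[s], t) * bern(P_C1[(t, u)], c))


# Observed distribution P(s, t, c): the genotype U is summed out.
OBSERVED = {(s, t, c): sum(full_joint(u, s, t, c) for u in (0, 1))
            for s, t, c in product((0, 1), repeat=3)}


def p(s=None, t=None, c=None):
    """Marginal of the observed distribution; None means 'summed out'."""
    want = (s, t, c)
    return sum(prob for key, prob in OBSERVED.items()
               if all(w is None or w == k for w, k in zip(want, key)))


def derived_estimate(s, c=1):
    """sum_t P(t | s) sum_{s'} P(c | t, s') P(s'), from observed data only."""
    total = 0.0
    for t in (0, 1):
        inner = sum(p(s=s2, t=t, c=c) / p(s=s2, t=t) * p(s=s2) for s2 in (0, 1))
        total += p(s=s, t=t) / p(s=s) * inner
    return total


def ground_truth(s, c=1):
    """P(c | do(s)) computed with access to the hidden U."""
    return sum(bern(P_T1[s], t)
               * sum(bern(P_U1, u) * bern(P_C1[(t, u)], c) for u in (0, 1))
               for t in (0, 1))


print(derived_estimate(1), ground_truth(1))
```

The two functions agree even though `derived_estimate` never touches U, which is what the derivation promises for this graph.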

slide38

OUTLINE

  • Modeling: Statistical vs. Causal
  • Causal models and identifiability
  • Inference to three types of claims:
    • Effects of potential interventions
    • Claims about attribution (responsibility)
slide40

DETERMINING THE CAUSES OF EFFECTS

(The Attribution Problem)

  • Your Honor! My client (Mr. A) died BECAUSE
    • he used that drug.
  • Court to decide if it is MORE PROBABLE THAN
    • NOT that A would be alive BUT FOR the drug!
    • P(? | A is dead, took the drug) > 0.50
slide42

THE PROBLEM

  • Theoretical Problems:
  • What is the meaning of PN(x,y):
  • “Probability that event y would not have occurred if it were not for event x, given that x and y did in fact occur.”
  • Answer: $PN(x,y) = P(Y_{x'} = y' \mid x, y)$
slide43

THE PROBLEM

  • Theoretical Problems:
  • What is the meaning of PN(x,y):
  • “Probability that event y would not have occurred if it were not for event x, given that x and y did in fact occur.”
  • Under what conditions can PN(x,y) be learned from statistical data (observational, experimental, or combined)?
slide44

WHAT IS INFERABLE FROM EXPERIMENTS?

Simple Experiment:

$Q = P(Y_x = y \mid z)$, where Z are nondescendants of X.

Compound Experiment:

$Q = P(Y_{x(z)} = y \mid z)$

Multi-Stage Experiment:

etc…

slide45

CAN FREQUENCY DATA DECIDE LEGAL RESPONSIBILITY?

                   Experimental         Nonexperimental
                   do(x)     do(x')     x         x'
Deaths (y)            16        14         2        28
Survivals (y')       984       986       998       972
Total              1,000     1,000     1,000     1,000

  • Nonexperimental data: drug usage predicts longer life
  • Experimental data: drug has negligible effect on survival
  • Plaintiff: Mr. A is special.
    • He actually died
    • He used the drug by choice
  • Court to decide (given both data):
    • Is it more probable than not that A would be alive but for the drug?
slide46

TYPICAL THEOREMS

(Tian and Pearl, 2000)

  • Identifiability under monotonicity (Combined data)
  • corrected Excess-Risk-Ratio
  • Bounds given combined nonexperimental and experimental data
slide47

SOLUTION TO THE ATTRIBUTION PROBLEM (Cont)

WITH PROBABILITY ONE: $P(y'_{x'} \mid x, y) = 1$

  • From population data to individual case
  • Combined data tell more than each study alone
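
The result can be reproduced from the frequency table in the drug example. The bounds used below are the commonly cited form of the Tian and Pearl result, max{0, [P(y) - P(y_{x'})] / P(x,y)} ≤ PN ≤ min{1, [P(y'_{x'}) - P(x',y')] / P(x,y)}; the formula is not preserved in this transcript, so it should be read as an assumption, as should the pooling of the two nonexperimental groups into one population of 2,000.

```python
from fractions import Fraction

# Nonexperimental data, pooled (assumption): 1,000 users (x) and 1,000 non-users (x').
N = 2000
p_x_y = Fraction(2, N)          # P(x, y): took the drug and died
p_x1_y = Fraction(28, N)        # P(x', y): did not take the drug and died
p_x1_y1 = Fraction(972, N)      # P(x', y'): did not take the drug and survived
p_y = p_x_y + p_x1_y            # P(y)

# Experimental data under do(x'): 14 deaths out of 1,000.
p_y_do_x1 = Fraction(14, 1000)  # P(y_{x'})
p_y1_do_x1 = 1 - p_y_do_x1      # P(y'_{x'})

# Bounds on the probability of necessity PN = P(y'_{x'} | x, y).
lower = max(Fraction(0), (p_y - p_y_do_x1) / p_x_y)
upper = min(Fraction(1), (p_y1_do_x1 - p_x1_y1) / p_x_y)

print(lower, upper)
```

Neither data set alone gives this answer: the experimental columns show a negligible average effect, and the nonexperimental columns alone cannot separate the drug's effect from the choice to take it.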
slide48

OUTLINE

  • Modeling: Statistical vs. Causal
  • Causal models and identifiability
  • Inference to three types of claims:
    • Effects of potential interventions
    • Claims about attribution (responsibility)
    • Claims about direct and indirect effects
slide49

QUESTIONS ADDRESSED

  • What is the semantics of direct and indirect effects?
  • Can we estimate them from data? Experimental data?
slide50

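TOTAL, DIRECT, AND INDIRECT EFFECTS HAVE SIMPLE SEMANTICS IN LINEAR MODELS

$z = bx + \varepsilon_1$

$y = ax + cz + \varepsilon_2$

[Diagram: X → Z with coefficient b, X → Y with coefficient a, Z → Y with coefficient c.]

Total effect: a + bc
Direct effect: a
Indirect effect: bc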
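
The three quantities can be read off the linear equations by comparing interventions. This is a minimal sketch with the disturbances set to zero; the function names and the coefficient values in the usage line are illustrative.

```python
def z_of(x, b, e1=0.0):
    """z = b*x + e1"""
    return b * x + e1


def y_of(x, z, a, c, e2=0.0):
    """y = a*x + c*z + e2"""
    return a * x + c * z + e2


def effects(a, b, c, x0=0.0):
    """Effects of a unit change in X from x0, with disturbances held at zero."""
    x1 = x0 + 1.0
    baseline = y_of(x0, z_of(x0, b), a, c)
    # Total: change X and let Z respond.
    total = y_of(x1, z_of(x1, b), a, c) - baseline
    # Direct: change X while Z stays at its value under x0.
    direct = y_of(x1, z_of(x0, b), a, c) - baseline
    # Indirect: keep X at x0 while Z responds as if X had changed.
    indirect = y_of(x0, z_of(x1, b), a, c) - baseline
    return total, direct, indirect


print(effects(a=2.0, b=3.0, c=0.5))
```

With these coefficients the total effect equals the sum of the direct and indirect effects, which is the additivity that linear models guarantee.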

slide51

SEMANTICS BECOMES NONTRIVIAL IN NONLINEAR MODELS

(even when the model is completely specified)

$z = f(x, \varepsilon_1)$

$y = g(x, z, \varepsilon_2)$

[Diagram: X, Z, Y.]

Direct effect: dependent on z?
Indirect effect: void of operational meaning?

slide52

THE OPERATIONAL MEANING OF DIRECT EFFECTS

$z = f(x, \varepsilon_1)$

$y = g(x, z, \varepsilon_2)$

[Diagram: X, Z, Y.]

“Natural” Direct Effect of X on Y:

The expected change in Y per unit change of X, when we keep Z constant at whatever value it attains before the change.

In linear models, NDE = Controlled Direct Effect

slide53

POLICY IMPLICATIONS

(Who cares?)

What is the indirect effect of X on Y?

The effect of Gender on Hiring if sex discrimination is eliminated.

[Diagram: GENDER (X), QUALIFICATION (Z), HIRING (Y); the direct link f from X to Y is marked IGNORE.]

slide54

THE OPERATIONAL MEANING OF INDIRECT EFFECTS

$z = f(x, \varepsilon_1)$

$y = g(x, z, \varepsilon_2)$

[Diagram: X, Z, Y.]

“Natural” Indirect Effect of X on Y:

The expected change in Y when we keep X constant, say at $x_0$, and let Z change to whatever value it would have under a unit change in X.

In linear models, NIE = TE - DE
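
The natural effects can be computed directly once f, g, and the disturbances are specified. The sketch assumes a hypothetical nonlinear model with an X·Z interaction term and a two-valued disturbance on Z; none of these choices come from the talk.

```python
from statistics import mean

A, B, C, D = 1, 2, 3, 4   # hypothetical coefficients
NOISE_1 = (-1, 1)         # equally likely values of the disturbance on Z


def f(x, e1):
    """z = f(x, e1)"""
    return B * x + e1


def g(x, z):
    """y = g(x, z, e2) with the second disturbance fixed at 0."""
    return A * x + C * z + D * x * z


def expected_y(x, x_for_z):
    """E[Y] when X = x enters g directly while Z responds as if X = x_for_z."""
    return mean(g(x, f(x_for_z, e1)) for e1 in NOISE_1)


x0, x1 = 0, 1
baseline = expected_y(x0, x0)
nde = expected_y(x1, x0) - baseline   # Z kept at its pre-change value
nie = expected_y(x0, x1) - baseline   # X kept at x0, Z changes
te = expected_y(x1, x1) - baseline    # both change

print(nde, nie, te)
```

Because of the interaction term, the total effect is no longer the sum of the natural direct and indirect effects, so the linear identity NIE = TE - DE does not carry over to this model.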

slide55

LEGAL DEFINITIONS TAKE THE NATURAL CONCEPTION

(FORMALIZING DISCRIMINATION)

“The central question in any employment-discrimination case is whether the employer would have taken the same action had the employee been of different race (age, sex, religion, national origin etc.) and everything else had been the same”

[Carson versus Bethlehem Steel Corp. (70 FEP Cases 921, 7th Cir. (1996))]

x = male, x' = female

y = hire, y' = not hire

z = applicant’s qualifications

NO DIRECT EFFECT

$Y_{x' Z_x} = Y_x$, $Y_{x Z_{x'}} = Y_{x'}$

slide56

SEMANTICS AND IDENTIFICATION OF NESTED COUNTERFACTUALS

Consider the quantity

$Q = E_u[Y_{x Z_{x^*}(u)}(u)]$

Given $\langle M, P(u) \rangle$, Q is well defined:

Given u, $Z_{x^*}(u)$ is the solution for Z in $M_{x^*}$; call it z.

$Y_{xz}(u)$ is the solution for Y in $M_{xz}$.

Can Q be estimated from data?

slide58

ANSWERS TO QUESTIONS

  • Graphical conditions for estimability from
    • experimental / nonexperimental data.
  • Graphical conditions hold in Markovian models
  • Useful in answering new type of policy questions
    • involving mechanism blocking instead of variable fixing.
slide59

THE OVERRIDING THEME

    • Define Q(M) as a counterfactual expression
    • Determine conditions for the reduction
    • If reduction is feasible, Q is inferable.
  • Demonstrated on three types of queries:

Q1: $P(y \mid do(x))$ Causal Effect ($= P(Y_x = y)$)

Q2: $P(Y_{x'} = y' \mid x, y)$ Probability of necessity

Q3: Direct Effect

slide60

FALSIFIABILITY and CORROBORATION

[Figure: the space P* of distributions, the subset P*(M) satisfying the constraints implied by M, and the data D.]

Falsifiability: $P^*(M) \subset P^*$

Data D corroborates model M if M is (i) falsifiable and (ii) compatible with D.

Types of constraints:
1. conditional independencies
2. inequalities (for restricted domains)
3. functional

e.g., [Diagram over w, x, y, z.]

slide61

OTHER TESTABLE CLAIMS

Changes under interventions

For all causal models:

For all semi-Markovian models:

For Markovian models (and ):

For a given Markovian model:

slide62

FROM CORROBORATING MODELS TO CORROBORATING CLAIMS

A corroborated model can imply identifiable yet uncorroborated claims.

e.g., [Diagrams: three graphs over x, y, z, with edge labels a and b.]

Some claims can be more corroborated than others.

Definition:

An identifiable claim C is corroborated by data if some minimal set of assumptions in M sufficient for identifying C is corroborated by the data.

Graphical criterion: minimal submodel = maximal supergraph

slide64

OVERVIEW

Scope and Language in Scientific Theories

  • Statistical models
      • (observations, PL)
  • Causal models
    • 2.1 Stochastic causal model
    • (interventions, PL + modality)
    • 2.2 Functional causal models
    • (counterfactuals, PL + subjunctives)
  • General equational models
    • (explicit interventions, PL)
    • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • •
  • General Scientific theories
    • (objects-properties, FOL-SOL ...)