Tutorial: Causal Model Search

Richard Scheines

Carnegie Mellon University

Peter Spirtes, Clark Glymour, Joe Ramsey, others

Goals

Convey rudiments of graphical causal models

Basic working knowledge of Tetrad IV

Tetrad
  • Main website: http://www.phil.cmu.edu/projects/tetrad/
  • Download: http://www.phil.cmu.edu/projects/tetrad/current.html
  • Data files: www.phil.cmu.edu/projects/tetrad_download/download/workshop/Data/
  • Download from Data directory:
    • tw.txt
    • Charity.txt
    • Optional:
      • estimation1.tet, estimation2.tet
      • search1.tet, search2.tet, search3.tet
Outline

Motivation

Representing/Modeling Causal Systems

Estimation and Model fit

Causal Model Search

Statistical Causal Models: Goals
  • Policy, Law, and Science: How can we use data to answer
    • subjunctive questions (effects of future policy interventions), or
    • counterfactual questions (what would have happened had things been done differently? (law)), or
    • scientific questions (what mechanisms run the world)?
  • Rumsfeld Problem: Do we know what we do and don’t know? Can we tell when there is or is not enough information in the data to answer causal questions?
Causal Inference Requires More than Probability

Prediction from Observation ≠ Prediction from Intervention

P(Lung Cancer 1960 = y | Tar-stained fingers 1950 = no)

≠

P(Lung Cancer 1960 = y | Tar-stained fingers 1950_set = no)

In general: P(Y=y | X=x, Z=z) ≠ P(Y=y | X_set=x, Z=z)

Causal Prediction vs. Statistical Prediction:

[Diagram. Statistical prediction: non-experimental data (observational study), P(Y,X,Z), P(Y=y | X=x, Z=z). Causal prediction: non-experimental data plus Background Knowledge, Causal Structure, P(Y=y | X_set=x, Z=z).]
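The gap between predicting from observation and predicting from intervention can be made concrete with a tiny model. This is a minimal sketch with made-up numbers, assuming (beyond what the slide states) that Smoking is a common cause of tar-stained fingers and lung cancer and that stained fingers have no effect of their own.

```python
# Hypothetical parameters; Smoking as a common cause is an assumption for illustration.
P_S = {1: 0.3, 0: 0.7}            # P(Smoking = s)
P_T_GIVEN_S = {1: 0.9, 0: 0.05}   # P(Tar-stained fingers = 1 | Smoking = s)
P_C_GIVEN_S = {1: 0.2, 0: 0.02}   # P(Lung cancer = 1 | Smoking = s)


def p_cancer_given_fingers_observed(t):
    """P(C = 1 | T = t): observing T changes what we believe about Smoking."""
    num = den = 0.0
    for s, ps in P_S.items():
        pt = P_T_GIVEN_S[s] if t == 1 else 1 - P_T_GIVEN_S[s]
        num += ps * pt * P_C_GIVEN_S[s]
        den += ps * pt
    return num / den


def p_cancer_given_fingers_set(t):
    """P(C = 1 | T set to t): setting T cuts Smoking -> T, so Smoking keeps its prior."""
    return sum(ps * P_C_GIVEN_S[s] for s, ps in P_S.items())


observed = p_cancer_given_fingers_observed(0)
intervened = p_cancer_given_fingers_set(0)
print(f"observe fingers = no: {observed:.4f}")
print(f"set fingers = no:     {intervened:.4f}")
```

In this toy model, seeing clean fingers lowers the predicted cancer risk, while cleaning them by intervention leaves it unchanged.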

Causal Search

Causal Search:

Find/compute all the causal models that are indistinguishable given background knowledge and data

Represent features common to all such models

Multiple Regression is often the wrong tool for Causal Search:

Example: Foreign Investment & Democracy

Foreign Investment

Does Foreign Investment in 3rd World Countries inhibit Democracy?

Timberlake, M. and Williams, K. (1984). Dependence, political exclusion, and government repression: Some cross-national evidence. American Sociological Review 49, 141-146.

N = 72

PO: degree of political exclusivity

CV: lack of civil liberties

EN: energy consumption per capita (economic development)

FI: level of foreign investment

Foreign Investment

Correlations

        po       fi       en       cv
po     1.0
fi    -.175     1.0
en    -.480     0.330    1.0
cv     0.868   -.391    -.430     1.0

Case Study: Foreign Investment

Regression Results

po =  .227*fi  -  .176*en  +  .880*cv
SE   (.058)      (.059)      (.060)
t     3.941      -2.99       14.6

Interpretation: foreign investment increases political repression
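The regression coefficients can be checked against the correlation table. This is a minimal sketch, assuming the reported coefficients are standardized OLS estimates computed from the correlations with N = 72 (the slides do not say how they were obtained). The `invert` helper is an illustrative standard-library stand-in for a linear algebra package.

```python
import math

# Correlations among the predictors (fi, en, cv) and with the response po,
# copied from the correlation table.
NAMES = ["fi", "en", "cv"]
R_XX = [[1.0, 0.330, -0.391],
        [0.330, 1.0, -0.430],
        [-0.391, -0.430, 1.0]]
R_XY = [-0.175, -0.480, 0.868]
N = 72


def invert(m):
    """Gauss-Jordan inverse of a small, non-singular square matrix."""
    n = len(m)
    a = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


R_inv = invert(R_XX)
k = len(NAMES)
# Standardized coefficients: beta = R_xx^{-1} r_xy
beta = [sum(R_inv[i][j] * R_XY[j] for j in range(k)) for i in range(k)]
r_squared = sum(b * r for b, r in zip(beta, R_XY))
# Assumed residual degrees of freedom: N - k - 1
resid_var = (1 - r_squared) / (N - k - 1)
se = [math.sqrt(resid_var * R_inv[i][i]) for i in range(k)]

for name, b, s in zip(NAMES, beta, se):
    print(f"{name}: coefficient {b:+.3f}, SE {s:.3f}, t {b / s:+.2f}")
```

Under these assumptions the coefficients agree with the slide to about two decimals, and the standard errors and t values come out close to the reported ones.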


Case Study: Foreign Investment

Alternatives
  • There is no model with testable constraints (df > 0) that is not rejected by the data, in which FI has a positive effect on PO.
Tetrad Demo
  • Load tw.txt data
  • Estimate regression
  • Search for alternatives
  • Estimate alternative
Tetrad Hands-On
  • Load tw.txt data
  • Estimate regression
Outline
  • Motivation
  • Representing/Modeling Causal Systems
    • Causal Graphs
    • Standard Parametric Models
      • Bayes Nets
      • Structural Equation Models
    • Other Parametric Models
      • Generalized SEMs
      • Time Lag models
Causal Graphs

Causal Graph G = {V,E}

Each edge X → Y represents a direct causal claim:

X is a direct cause of Y relative to V

[Figure: two example graphs, one over Years of Education and Income, the other over Years of Education, Skills and Knowledge, and Income]

Causal Graphs

[Figure: two graphs over Years of Education, Skills and Knowledge, and Income. Labels: "Omitted Causes", "Omitted Common Causes", "Not Cause Complete", "Common Cause Complete"]

Modeling Ideal Interventions

Interventions on the Effect

[Figure: "Pre-experimental System" and "Post" graphs over Room Temperature and Sweaters On]

Modeling Ideal Interventions

Interventions on the Cause

[Figure: "Pre-experimental System" and "Post" graphs over Room Temperature and Sweaters On]

Interventions & Causal Graphs

Model an ideal intervention by adding an “intervention” variable outside the original system as a direct cause of its target.

[Figure: Pre-intervention graph, “Soft” Intervention, and “Hard” Intervention]

Intervene on Income
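Adding an intervention variable can be sketched as a small graph operation. The representation (a dict from each node to its set of direct causes), the function name `intervene`, and the `I_<target>` naming are illustrative assumptions. The hard/soft distinction follows the usual reading, which the slide does not spell out: a hard intervention removes the target's other direct causes, and a soft one leaves them in place.

```python
def intervene(parents, target, kind="hard", name=None):
    """Return a post-intervention copy of a DAG given as {node: set of direct causes}."""
    if kind not in ("hard", "soft"):
        raise ValueError("kind must be 'hard' or 'soft'")
    name = name or f"I_{target}"
    post = {node: set(causes) for node, causes in parents.items()}
    if kind == "hard":
        post[target] = set()   # the intervention becomes the only direct cause
    post[target].add(name)     # intervention variable is a direct cause of its target
    post[name] = set()         # and sits outside the original system (no causes)
    return post


pre = {"Education": set(), "Income": {"Education"}}
print(intervene(pre, "Income", kind="hard"))
print(intervene(pre, "Income", kind="soft"))
```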

Tetrad Demo & Hands-On

Build and Save an acyclic causal graph:

with 3 measured variables, no latents

with 5 variables, and at least 1 latent

Causal Bayes Networks

The Joint Distribution Factors According to the Causal Graph,

P(S, YF, LC) = P(S) P(YF | S) P(LC | S)

Causal Bayes Networks

The Joint Distribution Factors According to the Causal Graph,

P(S = 0) = θ1
P(S = 1) = 1 - θ1

P(YF = 0 | S = 0) = θ2        P(LC = 0 | S = 0) = θ4
P(YF = 1 | S = 0) = 1 - θ2    P(LC = 1 | S = 0) = 1 - θ4
P(YF = 0 | S = 1) = θ3        P(LC = 0 | S = 1) = θ5
P(YF = 1 | S = 1) = 1 - θ3    P(LC = 1 | S = 1) = 1 - θ5

P(S) P(YF | S) P(LC | S) = f(θ)

All variables binary [0,1]: θ = {θ1, θ2, θ3, θ4, θ5}
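With three binary variables, five parameters are enough to fix the whole joint distribution. The sketch below builds P(S, YF, LC) from the factorization P(S) P(YF | S) P(LC | S). The parameter values are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical values for the five free parameters (each is a P(... = 0) entry).
theta = {1: 0.7, 2: 0.9, 3: 0.2, 4: 0.95, 5: 0.6}


def p_s(s):
    return theta[1] if s == 0 else 1 - theta[1]


def p_yf(yf, s):
    p0 = theta[2] if s == 0 else theta[3]   # P(YF = 0 | S = s)
    return p0 if yf == 0 else 1 - p0


def p_lc(lc, s):
    p0 = theta[4] if s == 0 else theta[5]   # P(LC = 0 | S = s)
    return p0 if lc == 0 else 1 - p0


# Joint distribution from the factorization P(S) P(YF | S) P(LC | S).
joint = {(s, yf, lc): p_s(s) * p_yf(yf, s) * p_lc(lc, s)
         for s, yf, lc in product((0, 1), repeat=3)}

for cell, prob in sorted(joint.items()):
    print(cell, round(prob, 4))
```

Because YF and LC share only S as a cause, they come out independent conditional on S for any choice of parameter values.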

Tetrad Demo & Hands-On

Attach a Bayes PM to your 3-variable graph

Define the Bayes PM (# and values of categories for each variable)

Attach an IM to the Bayes PM

Fill in the Conditional Probability Tables.

Structural Equation Models

Causal Graph

  • Structural Equations

For each variable X ∈ V, an assignment equation:

X := f_X(immediate-causes(X), ε_X)

  • Exogenous Distribution: Joint distribution over the exogenous vars: P(ε)
Linear Structural Equation Models

[Figure: Causal Graph and Path diagram over Education, Income, and Longevity]

Equations:

Education := ε_Education

Income := β1 Education + ε_Income

Longevity := β2 Education + ε_Longevity

Exogenous Distribution:

P(ε_ed, ε_Income, ε_Longevity)

- ∀ i≠j: ε_i _||_ ε_j (pairwise independence)

- no variance is zero

E.g.

(ε_ed, ε_Income, ε_Longevity) ~ N(0, Σ²)

Σ² diagonal,

- no variance is zero

Structural Equation Model:

V = BV + E
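Data can be simulated from a linear SEM by drawing independent error terms and applying the assignment equations in causal order. This is a minimal sketch for the Education, Income, Longevity example. The coefficient values and the unit-variance Gaussian errors are assumptions made for illustration.

```python
import random


def simulate(n, beta_income=0.8, beta_longevity=0.5, seed=0):
    """Draw n cases from the three-variable linear SEM with independent N(0, 1) errors."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        education = rng.gauss(0, 1)                                 # Education := error
        income = beta_income * education + rng.gauss(0, 1)          # Income := b * Education + error
        longevity = beta_longevity * education + rng.gauss(0, 1)    # Longevity := b * Education + error
        rows.append((education, income, longevity))
    return rows


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


education, income, longevity = zip(*simulate(20000))
print("slope of Income on Education:   ", round(ols_slope(education, income), 3))
print("slope of Longevity on Education:", round(ols_slope(education, longevity), 3))
```

With a large sample, each estimated slope should land near the coefficient used to generate the data.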

Tetrad Demo & Hands-On

Attach a SEM PM to your 3-variable graph

Attach a SEM IM to the SEM PM

Change the coefficient values.

Simulate Data from both your SEM IM and your Bayes IM

Outline

Motivation

Representing/Modeling Causal Systems

Estimation and Model fit

Model Search

Tetrad Demo and Hands-on

Select Template: “Estimate from Simulated Data”

Build the SEM shown below – all error standard deviations = 1.0 (go into the Tabular Editor)

Generate simulated data N=1000

Estimate model.

Save session as “Estimate1”

Coefficient Inference vs. Model Fit

Coefficient Inference: Null: coefficient = 0

p-value = p(Estimated value β_X1→X3 ≥ .4788 | β_X1→X3 = 0 & rest of model correct)

Reject null (coefficient is “significant”) when p-value < α, α usually = .05

Model fit: Null: Model is correctly specified (constraints true in population)

p-value = p(f(Deviation(Σ_ML, S)) ≥ 5.7137 | Model correctly specified)

Tetrad Demo and Hands-on
  • Create two DAGs with the same variables – each with one edge flipped, and attach a SEM PM to each new graph (copy and paste by selecting nodes, Ctrl-C to copy, and then Ctrl-V to paste)
  • Estimate each new model on the data produced by original graph
  • Check p-values of:
    • Edge coefficients
    • Model fit
  • Save session as: “session2”
Charitable Giving

What influences giving? Sympathy? Impact?

"The Donor is in the Details", Organizational Behavior and Human Decision Processes, Issue 1, 15-23, with G. Loewenstein, R. Scheines.

N = 94

TangibilityCondition [1,0]: Randomly assigned experimental condition

Imaginability [1..7]: How concrete scenario I

Sympathy [1..7]: How much sympathy for target

Impact [1..7]: How much impact will my donation have

AmountDonated [0..5]: How much actually donated

Tetrad Demo and Hands-on

Load charity.txt (tabular – not covariance data)

Build graph of theoretical hypothesis

Build SEM PM from graph

Estimate PM, check results

Outline
  • Motivation
  • Representing/Modeling Causal Systems
  • Estimation and Model fit
  • Model Search
    • Bridge Principles (Causal Graphs → Probability Constraints):
      • Markov assumption
      • Faithfulness assumption
      • D-separation
    • Equivalence classes
    • Search
Constraint Based Search

[Figure: Constraint Based Search, with inputs labeled Statistical Inference and Background Knowledge (e.g., X2 prior in time to X3)]

X1 _||_ X2 means: P(X1, X2) = P(X1) P(X2)

X1 _||_ X2 | X3 means: P(X1, X2 | X3) = P(X1 | X3) P(X2 | X3)
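In practice the independence statements have to be judged from data. For linear Gaussian models a common choice is to test whether a partial correlation is zero with Fisher's z transform. The sketch below assumes that test; the slide does not say which test is used. The correlations in the example are hypothetical.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y given Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: (partial) correlation = 0.

    r: sample (partial) correlation, n: sample size, k: number of conditioning variables.
    """
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - k - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))   # P(|N(0, 1)| > stat)


# Hypothetical correlations: r(X1, X2) = 0.24, r(X1, X3) = 0.6, r(X2, X3) = 0.4
p_marginal = fisher_z_pvalue(0.24, n=1000, k=0)
p_given_x3 = fisher_z_pvalue(partial_corr(0.24, 0.6, 0.4), n=1000, k=1)
print("X1 _||_ X2 ?      p =", p_marginal)
print("X1 _||_ X2 | X3 ? p =", p_given_x3)
```

Here the marginal independence is rejected at α = .05, while independence conditional on X3 is not.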

Score Based Search

[Figure: Score Based Search, with inputs labeled Model Score and Background Knowledge (e.g., X2 prior in time to X3)]

Independence Equivalence Classes: Patterns & PAGs
  • Patterns (Verma and Pearl, 1990): graphical representation of d-separation equivalence among models with no latent common causes
  • PAGs (Richardson 1994): graphical representation of a d-separation equivalence class that includes models with latent common causes and sample selection bias that are d-separation equivalent over a set of measured variables X
Tetrad Demo and Hands-on

Go to “session2”

Add Search node (from Data1) - Choose and execute one of the “Pattern searches”

Add a “Graph Manipulation” node to the search result: “choose Dag in Pattern”

Add a PM to GraphManip

Estimate the PM on the data

Compare model fit to model fit for true model

Graphical Characterization of Model Equivalence

Why do some changes to the true model result in an equivalent model, but some do not?

d-separation/Independence Equivalence

D-separation Equivalence Theorem (Verma and Pearl, 1988)

Two acyclic graphs over the same set of variables are d-separation equivalent iff they have:

  • the same adjacencies
  • the same unshielded colliders
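The two conditions of the theorem are easy to check mechanically. This is a minimal sketch, assuming each DAG is given as a list of directed edges `(cause, effect)` over the same variables. The function names are illustrative.

```python
from itertools import combinations


def adjacencies(edges):
    """Set of unordered adjacent pairs."""
    return {frozenset(e) for e in edges}


def unshielded_colliders(edges):
    """Triples (x, y, z) with x -> y <- z and x, z not adjacent."""
    adj = adjacencies(edges)
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)
    found = set()
    for child, causes in parents.items():
        for x, z in combinations(sorted(causes), 2):
            if frozenset((x, z)) not in adj:
                found.add((x, child, z))
    return found


def d_separation_equivalent(edges1, edges2):
    return (adjacencies(edges1) == adjacencies(edges2)
            and unshielded_colliders(edges1) == unshielded_colliders(edges2))


chain = [("X", "Y"), ("Y", "Z")]
reversed_chain = [("Z", "Y"), ("Y", "X")]
fork = [("Y", "X"), ("Y", "Z")]
collider = [("X", "Y"), ("Z", "Y")]

print(d_separation_equivalent(chain, reversed_chain))  # same adjacencies, no colliders
print(d_separation_equivalent(chain, fork))            # same adjacencies, no colliders
print(d_separation_equivalent(chain, collider))        # collider at Y only in the second graph
```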
Colliders

[Figure: graphs over X, Y, and Z showing Y as a Non-Collider and Y as a Collider, with Shielded and Unshielded variants]

Background Knowledge: Tetrad Demo and Hands-on

Create new session

Select “Search from Simulated Data” from Template menu

Build graph below, PM, IM, and generate sample data N=1,000.

Execute PC search, α = .05

Background Knowledge: Tetrad Demo and Hands-on

Add “Knowledge” node – as below

Create “Tiers” as shown below.

Execute PC search again, α = .05

Compare results (Search2) to previous search (Search1)

Background Knowledge: Direct and Indirect Consequences

[Figure: True Graph; PC Output with No Background Knowledge; PC Output with Background Knowledge]

Background Knowledge: Direct and Indirect Consequences

[Figure: True Graph; PC Output with No Background Knowledge; PC Output with Background Knowledge, marking a Direct Consequence of Background Knowledge and an Indirect Consequence of Background Knowledge]

Interesting Cases

[Figure: three models, M1, M2, and M3, with node labels L, L1, L2, X, X1, X2, Y, Y1, Y2, Z, Z1, Z2]

PAGs: Partial Ancestral Graphs

What PAG edges mean.

Tetrad Demo and Hands-on

Create new session

Select “Search from Simulated Data” from Template menu

Build graph below, SEM PM, IM, and generate sample data N=1,000.

Execute PC search, α = .05

Execute FCI search, α = .05

Estimate multiple regression, Y as response, Z1, X, Z2 as predictors

Search Methods
  • Constraint Based Searches
    • PC, FCI
    • Very fast – capable of handling >5,000 variables
    • Pointwise, but not uniformly consistent
  • Scoring Searches
    • Scores: BIC, AIC, etc.
    • Search: Hill Climb, Genetic Alg., Simulated Annealing
    • Difficult to extend to latent variable models
    • Meek and Chickering Greedy Equivalence Class (GES)
    • Slower than constraint based searches – but now capable of 1,000 vars
    • Pointwise, but not uniformly consistent
  • Latent Variable Psychometric Model Search
    • BPC, MIMbuild, etc.
  • Linear non-Gaussian models (Lingam)
  • Models with cycles
  • And more!!!
Tetrad Demo and Hands-on

Load charity.txt (tabular – not covariance data)

Build graph of theoretical hypothesis

Build SEM PM from graph

Estimate PM, check results

Tetrad Demo and Hands-on

Create background knowledge: Tangibility exogenous (uncaused)

Search for models

Estimate one model from the output of search

Check model fit, check parameter estimates, esp. their sign

Constraint-based Search

1) Adjacency

2) Orientation

Constraint-based Search: Adjacency

X and Y are adjacent if they are dependent conditional on all subsets that don’t include them

X and Y are not adjacent if they are independent conditional on any subset that doesn’t include them
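The adjacency rule translates directly into a brute-force search over conditioning sets. This is a sketch under stated assumptions: `independent(x, y, given)` is a hypothetical independence oracle, the example oracle encodes a made-up true graph X1 → X3 ← X2, X3 → X4, and no attempt is made at the efficiency tricks a real PC implementation uses.

```python
from itertools import combinations


def adjacency_search(variables, independent):
    """Return (adjacent pairs, separating sets) using an independence oracle."""
    adjacent, sepset = set(), {}
    for x, y in combinations(variables, 2):
        others = [v for v in variables if v not in (x, y)]
        for size in range(len(others) + 1):
            found = next((set(s) for s in combinations(others, size)
                          if independent(x, y, set(s))), None)
            if found is not None:
                sepset[frozenset((x, y))] = found   # some subset separates x and y
                break
        else:
            adjacent.add(frozenset((x, y)))         # dependent given every subset
    return adjacent, sepset


# Hypothetical oracle for the graph X1 -> X3 <- X2, X3 -> X4.
FACTS = {
    (frozenset(("X1", "X2")), frozenset()),
    (frozenset(("X1", "X4")), frozenset({"X3"})),
    (frozenset(("X1", "X4")), frozenset({"X2", "X3"})),
    (frozenset(("X2", "X4")), frozenset({"X3"})),
    (frozenset(("X2", "X4")), frozenset({"X1", "X3"})),
}


def oracle(x, y, given):
    return (frozenset((x, y)), frozenset(given)) in FACTS


adjacent, sepset = adjacency_search(["X1", "X2", "X3", "X4"], oracle)
print(sorted(sorted(pair) for pair in adjacent))
```

The separating sets are kept because the orientation phase needs them to tell colliders from non-colliders.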

Search: Orientation

Patterns

Y Unshielded

[Figure: unshielded triple X, Y, Z. Test X _||_ Z | Y: if independent, Y is a Non-Collider; if not independent, Y is a Collider. The resulting graphs over X, Y, Z are shown for each case]

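Search: Orientation

PAGs

Y Unshielded

[Figure: unshielded triple X, Y, Z. Test X _||_ Z | Y: if independent, Y is a Non-Collider; if not independent, Y is a Collider. The resulting PAG edges over X, Y, Z are shown for each case]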

Search: Orientation

Away from Collider

Search: Orientation

After Orientation Phase

X1 _||_ X2

X1 _||_ X4 | X3

X2 _||_ X4 | X3
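The orientation phase can be sketched from those three independence facts. The assumptions here are the adjacencies X1 - X3, X2 - X3, X3 - X4 with the separating sets implied by the facts, the standard collider rule, and the standard "away from collider" rule (if X → Y - Z and X, Z are not adjacent, orient Y → Z). The function name `orient` is illustrative.

```python
from itertools import combinations


def orient(adjacent, sepset):
    """Orient unshielded colliders, then propagate 'away from collider'."""
    nodes = sorted({v for pair in adjacent for v in pair})
    nbrs = {v: {w for pair in adjacent if v in pair for w in pair if w != v}
            for v in nodes}
    directed = set()
    # Collider rule: X - Y - Z unshielded and Y not in sepset(X, Z)  =>  X -> Y <- Z
    for y in nodes:
        for x, z in combinations(sorted(nbrs[y]), 2):
            if frozenset((x, z)) not in adjacent and y not in sepset[frozenset((x, z))]:
                directed |= {(x, y), (z, y)}
    # Away from collider: X -> Y - Z with X, Z not adjacent  =>  Y -> Z
    changed = True
    while changed:
        changed = False
        for x, y in list(directed):
            for z in nbrs[y] - {x}:
                if (frozenset((x, z)) not in adjacent
                        and (y, z) not in directed and (z, y) not in directed):
                    directed.add((y, z))
                    changed = True
    undirected = {pair for pair in adjacent
                  if not any(frozenset(edge) == pair for edge in directed)}
    return directed, undirected


adjacent = {frozenset(("X1", "X3")), frozenset(("X2", "X3")), frozenset(("X3", "X4"))}
sepset = {
    frozenset(("X1", "X2")): set(),     # X1 _||_ X2
    frozenset(("X1", "X4")): {"X3"},    # X1 _||_ X4 | X3
    frozenset(("X2", "X4")): {"X3"},    # X2 _||_ X4 | X3
}
directed, undirected = orient(adjacent, sepset)
print(sorted(directed), sorted(undirected))
```

X3 is not in the set separating X1 and X2, so it is oriented as a collider, and the remaining edge is then directed away from it.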

Bridge Principles: Acyclic Causal Graph over V → Constraints on P(V)

  • Weak Causal Markov Assumption: V1, V2 causally disconnected ⇒ V1 _||_ V2
  • V1, V2 causally disconnected ⇔
    • V1 not a cause of V2, and
    • V2 not a cause of V1, and
    • No common cause Z of V1 and V2
  • V1 _||_ V2 ⇔ P(V1,V2) = P(V1)P(V2)

Bridge Principles: Acyclic Causal Graph over V → Constraints on P(V)

  • Determinism (Structural Equations)
  • Weak Causal Markov Assumption: V1, V2 causally disconnected ⇒ V1 _||_ V2
  • Causal Markov Axiom: If G is a causal graph, and P a probability distribution over the variables in G, then <G,P> satisfies the Markov Axiom iff:
    • every variable V is independent of its non-effects, conditional on its immediate causes.
Bridge Principles: Acyclic Causal Graph over V → Constraints on P(V)

  • Causal Markov Axiom
  • Acyclicity
  • d-separation criterion

[Figure: a Causal Graph over Z, X, Y1, Y2 feeding an Independence Oracle]

  • Independence Oracle:
    • Z _||_ Y1 | X
    • Z _||_ Y2 | X
    • Z _||_ Y1 | X,Y2
    • Z _||_ Y2 | X,Y1
    • Y1 _||_ Y2 | X
    • Y1 _||_ Y2 | X,Z
Faithfulness

Constraints on a probability distribution P generated by a causal structure G hold for all parameterizations of G.

[Figure: path diagram with Tax Rate → Economy (β3), Tax Rate → Tax Revenues (β1), and Economy → Tax Revenues (β2)]

Revenues := β1 Rate + β2 Economy + ε_Rev

Economy := β3 Rate + ε_Econ

Faithfulness:

β1 ≠ -β3β2

β2 ≠ -β3β1
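A short derivation shows why the first condition matters. It is a sketch that assumes the two structural equations of the tax example, with error terms independent of Rate, and writes the three path coefficients as β1, β2, β3. Substituting the Economy equation into the Revenues equation gives the total dependence of Revenues on Rate:

```latex
\begin{aligned}
\text{Economy}  &:= \beta_3\,\text{Rate} + \varepsilon_{Econ} \\
\text{Revenues} &:= \beta_1\,\text{Rate} + \beta_2\,\text{Economy} + \varepsilon_{Rev}
                 = (\beta_1 + \beta_2\beta_3)\,\text{Rate} + \beta_2\,\varepsilon_{Econ} + \varepsilon_{Rev} \\
\operatorname{Cov}(\text{Rate},\ \text{Revenues}) &= (\beta_1 + \beta_2\beta_3)\,\operatorname{Var}(\text{Rate})
\end{aligned}
```

If β1 = -β3β2, the direct and indirect paths cancel exactly and Rate and Revenues are uncorrelated even though Rate causes Revenues. Faithfulness rules out such parameter values, so an observed independence can be read as a feature of the graph.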

Non-Colliders screen off Association

[Figure: graph over Exp [y,n], Infection [y,n], and Symptoms [live, dead], with Infection as a non-collider]

Exp not _||_ Symptoms

Exp _||_ Symptoms | Infection

Colliders induce Association

[Figure: graph over Gas [y,n], Battery [live, dead], and Car Starts [y,n], with Car Starts as a collider]

Gas _||_ Battery

Gas not _||_ Battery | Car starts = no

D-separation

X is d-separated from Y by Z in G iff

  • Every undirected path between X and Y in G is inactive relative to Z

An undirected path is inactive relative to Z iff

any node on the path is inactive relative to Z

A node N (on a path) is active relative to Z iff

a) N is a non-collider not in Z, or

b) N is a collider that is in Z, or has a descendant in Z

A node N (on a path) is inactive relative to Z iff

a) N is a non-collider in Z, or

b) N is a collider that is not in Z, and has no descendant in Z

[Figure: graph over V, W, X, Y, Z1, Z2]

Undirected Paths between X, Y:

  • 1) X --> Z1 <-- W --> Y
  • 2) X <-- V --> Y

D-separation

X is d-separated from Y by Z in G iff

  • Every undirected path between X and Y in G is inactive relative to Z

An undirected path is inactive relative to Z iff

any node on the path is inactive relative to Z

A node N is inactive relative to Z iff

a) N is a non-collider in Z, or

b) N is a collider that is not in Z, and has no descendant in Z

[Figure: graph over V, W, X, Y, Z1, Z2]

Undirected Paths between X, Y:

  • 1) X --> Z1 <-- W --> Y
  • 2) X <-- V --> Y

  • X d-sep Y relative to Z = ∅ ? No
  • X d-sep Y relative to Z = {V} ? Yes
  • X d-sep Y relative to Z = {V, Z1} ? No
  • X d-sep Y relative to Z = {W, Z2} ? No (path 2 stays active because V is not in Z)
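The definition can be turned into a direct check by enumerating undirected paths. This is a minimal sketch: the edges for the two listed paths come from the slide, while the edge Z1 → Z2 is an assumption made only to give the collider Z1 a descendant. Path enumeration is fine for small graphs but does not scale.

```python
def simple_paths(adj, start, goal, path=None):
    """Yield every undirected path from start to goal that visits no node twice."""
    path = path or [start]
    if start == goal:
        yield path
        return
    for nxt in adj[start]:
        if nxt not in path:
            yield from simple_paths(adj, nxt, goal, path + [nxt])


def d_separated(edges, x, y, z):
    """True iff every undirected path between x and y is inactive relative to z."""
    z = set(z)
    edge_set = set(edges)
    nodes = {v for e in edges for v in e}
    children = {v: {b for a, b in edges if a == v} for v in nodes}
    adj = {v: children[v] | {a for a, b in edges if b == v} for v in nodes}

    def descendants(v):
        seen, stack = set(), [v]
        while stack:
            for c in children[stack.pop()]:
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return seen

    def node_active(prev, n, nxt):
        collider = (prev, n) in edge_set and (nxt, n) in edge_set
        if collider:
            return n in z or bool(descendants(n) & z)
        return n not in z

    for path in simple_paths(adj, x, y):
        inner = range(1, len(path) - 1)
        if all(node_active(path[i - 1], path[i], path[i + 1]) for i in inner):
            return False   # found an active path
    return True


# Edges (cause, effect); Z1 -> Z2 is an assumed edge.
EDGES = [("X", "Z1"), ("W", "Z1"), ("W", "Y"), ("V", "X"), ("V", "Y"), ("Z1", "Z2")]

print(d_separated(EDGES, "X", "Y", set()))        # path 2 is active
print(d_separated(EDGES, "X", "Y", {"V"}))        # V blocks path 2, collider Z1 blocks path 1
print(d_separated(EDGES, "X", "Y", {"V", "Z1"}))  # conditioning on Z1 activates path 1
```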
D-separation

[Figure: two graphs over X1, X2, X3]

X3 and X1 d-sep by X2?

Yes: X3 _||_ X1 | X2

X3 and X1 d-sep by X2?

No: X3 not _||_ X1 | X2

Statistical Control ≠ Experimental Control

Statistically control for X2: X3 not _||_ X1 | X2

Experimentally control for X2: X3 _||_ X1 | X2(set)

Statistical Control ≠ Experimental Control

[Figure: graph over Exp. Condition, Behavior, Disposition, and Learning Gain]

Exp → Learning is Mediated by Behavior

Exp. Cond _||_ Learning Gain | Behavior, Disposition

Exp → Learning is Mediated by Behavior

Exp. Cond _||_ Learning Gain | Behavior set

Exp → Learning is not Mediated by Behavior, or Unmeasured Confounder

Exp. Cond not _||_ Learning Gain | Behavior observed

Exp. Cond _||_ Learning Gain

Regression & Causal Inference
  • Typical (non-experimental) strategy:
    • 1. Establish a prima facie case (X associated with Y)
    • But, omitted variable bias
    • 2. So, identify and measure potential confounders Z:
      • prior to X,
      • associated with X,
      • associated with Y
    • 3. Statistically adjust for Z (multiple regression)
Regression & Causal Inference
  • Strategy threatened by measurement error – ignore this for now
  • Multiple regression is provably unreliable for causal inference unless:
    • X prior to Y
    • X, Z, and Y are causally sufficient (no confounding)
[Figure: three cases comparing the Truth, the Regression (Y: outcome; X, Z: explanatory), and an Alternative?. Regression results by case: (1) βX = 0, βZ ≠ 0; (2) βX ≠ 0, βZ ≠ 0; (3) βX ≠ 0, βZ1 ≠ 0, βZ2 ≠ 0]

Better Methods Exist
  • Causal Model Search (since 1988):
  • Provably Reliable
  • Provably Rumsfeld

Tetrad Demo