Outline - PowerPoint PPT Presentation

Presentation Transcript
Outline

Motivation

Representing/Modeling Causal Systems

Estimation and Updating

Model Search

Linear Latent Variable Models

Case Study: fMRI

Outline
  • Search I: Causal Bayes Nets
    • Bridge Principles: Causal Structure → Testable Statistical Constraints
    • Equivalence Classes
    • Pattern Search
    • PAG Search
    • Variants
    • Simulation Studies on the Tetrad workbench
Bridge Principles: Acyclic Causal Graph over V → Constraints on P(V)

  • Weak Causal Markov Assumption: V1, V2 causally disconnected ⇒ V1 _||_ V2

V1 _||_ V2 ⇔ ∀ v1, v2: P(V1 = v1 | V2 = v2) = P(V1 = v1)

Bridge Principles: Acyclic Causal Graph over V → Constraints on P(V)

  • Determinism (Structural Equations)
  • Weak Causal Markov Assumption: V1, V2 causally disconnected ⇒ V1 _||_ V2
  • Causal Markov Axiom: if G is a causal graph and P a probability distribution over the variables in G, then <G, P> satisfies the Markov Axiom iff every variable V is independent of its non-effects, conditional on its immediate causes.
Bridge Principles: Acyclic Causal Graph over V → Constraints on P(V)

  • Causal Markov Axiom
  • Acyclicity
  • d-separation criterion
  • Causal Graph → Independence Oracle:

Z _||_ Y1 | X        Z _||_ Y2 | X
Z _||_ Y1 | X, Y2    Z _||_ Y2 | X, Y1
Y1 _||_ Y2 | X       Y1 _||_ Y2 | X, Z

[Figure: causal graph over Z, X, Y1, Y2]
Faithfulness

Constraints on a probability distribution P generated by a causal structure G hold for all parameterizations of G.

[Figure: Tax Rate → Economy (b3), Tax Rate → Tax Revenues (b1), Economy → Tax Revenues (b2)]

Economy := b3·Rate + eEcon
Revenues := b1·Rate + b2·Economy + eRev

Faithfulness:

b1 ≠ −b3·b2
b2 ≠ −b3·b1
Equivalence Classes
  • Equivalence:
  • Independence Equivalence: M1 ╞ (X _||_ Y | Z) ⇔ M2 ╞ (X _||_ Y | Z)
  • Distribution Equivalence: ∀q1 ∃q2 such that M1(q1) = M2(q2), and vice versa
  • Independence (d-separation equivalence)
    • DAGs : Patterns
    • PAGs : Partial Ancestral Graphs
    • Intervention Equivalence Classes
  • Measurement Model Equivalence Classes
  • Linear Non-Gaussian Model Equivalence Classes
  • Etc.
d-separation/Independence Equivalence

D-separation Equivalence Theorem (Verma and Pearl, 1988)

Two acyclic graphs over the same set of variables are d-separation equivalent iff they have:

  • the same adjacencies
  • the same unshielded colliders
Colliders

Y is a collider on a path iff both edges on the path point into Y: X → Y ← Z.
Y is a non-collider otherwise: X → Y → Z, X ← Y ← Z, or X ← Y → Z.

A collider X → Y ← Z is shielded if X and Z are adjacent, and unshielded if they are not.

D-separation

X is d-separated from Y by Z in G iff every undirected path between X and Y in G is inactive relative to Z.

An undirected path is inactive relative to Z iff any node on the path is inactive relative to Z.

A node N is inactive relative to Z iff

a) N is a non-collider in Z, or

b) N is a collider that is not in Z and has no descendant in Z

[Figure: graph over X, Y, V, W, Z1, Z2]

Undirected paths between X, Y:

1) X --> Z1 <-- W --> Y
2) X <-- V --> Y
D-separation

[Figure: graph over X, Y, V, W, Z1, Z2]

Undirected paths between X, Y:

1) X --> Z1 <-- W --> Y
2) X <-- V --> Y

X d-sep Y relative to Z = ∅ ?  No
X d-sep Y relative to Z = {V} ?  Yes
X d-sep Y relative to Z = {V, Z1} ?  No
X d-sep Y relative to Z = {W, Z2} ?  Yes
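Queries like these can be checked mechanically with the standard moralization test for d-separation: restrict the graph to X, Y, Z and their ancestors, moralize (marry co-parents, drop directions), delete Z, and test connectivity. A sketch, assuming the figure's graph is V → X, V → Y, X → Z1, W → Z1, W → Y, with Z1 → Z2 (reconstructed from the listed paths; the Z2 edge is an assumption):

```python
from itertools import combinations

def d_separated(edges, x, y, z):
    """Moralization test: x d-sep y given z in the DAG given by (parent, child) edges."""
    z = set(z)
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    # 1. Restrict to x, y, z and their ancestors.
    relevant = {x, y} | z
    frontier = list(relevant)
    while frontier:
        for p in parents.get(frontier.pop(), ()):
            if p not in relevant:
                relevant.add(p)
                frontier.append(p)
    # 2. Moralize: keep the skeleton and marry co-parents of each retained child.
    adj = {n: set() for n in relevant}
    for a, b in edges:
        if a in relevant and b in relevant:
            adj[a].add(b)
            adj[b].add(a)
    for child in relevant:
        for p, q in combinations(parents.get(child, set()) & relevant, 2):
            adj[p].add(q)
            adj[q].add(p)
    # 3. Delete z; x and y are d-separated iff they are now disconnected.
    seen, stack = {x}, [x]
    while stack:
        for m in adj[stack.pop()]:
            if m == y:
                return False
            if m not in seen and m not in z:
                seen.add(m)
                stack.append(m)
    return True

# Graph reconstructed from the listed paths (Z1 -> Z2 assumed for the extra node Z2):
EDGES = [("V", "X"), ("V", "Y"), ("X", "Z1"), ("W", "Z1"), ("W", "Y"), ("Z1", "Z2")]
print(d_separated(EDGES, "X", "Y", set()))         # False: X <-- V --> Y is active
print(d_separated(EDGES, "X", "Y", {"V"}))         # True
print(d_separated(EDGES, "X", "Y", {"V", "Z1"}))   # False: conditioning on the collider Z1
```

For DAGs this moralization test is equivalent to the path-by-path definition above.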
D-separation

[Figure: two graphs over X1, X2, X3]

X3 and X1 d-separated by X2?  Yes: X3 _||_ X1 | X2
X3 and X1 d-separated by X2?  No: X3 _||_ X1 | X2 does not hold
Independence Equivalence Classes:Patterns & PAGs
  • Patterns (Verma and Pearl, 1990): graphical representation of d-separation equivalence among models with no latent common causes (i.e., causally sufficient models)
  • PAGs (Richardson, 1994): graphical representation of a d-separation equivalence class that includes models with latent common causes and sample selection bias that are Markov equivalent over a set of measured variables X
Tetrad Demo

Load Session: patterns1.tet

Change Graph 3 minimally so as to maximally reduce the number of equivalent DAGs

Compute the DAGs that are equivalent to your original 3 variable DAG

Constraint-Based Search

Inputs: statistical inference and background knowledge (e.g., X2 prior in time to X3).
Score-Based Search

Inputs: model score and background knowledge (e.g., X2 prior in time to X3).
Overview of Search Methods

Constraint Based Searches

  • TETRAD (PC, FCI)
  • Very fast – capable of handling 1,000 variables
  • Pointwise, but not uniformly consistent

Scoring Searches

  • Scores: BIC, AIC, etc.
  • Search: Hill Climb, Genetic Alg., Simulated Annealing
  • Difficult to extend to latent variable models
  • Greedy Equivalence Search (GES; Meek, Chickering)
  • Very slow – max N ~ 30-40
  • Pointwise, but not uniformly consistent
Tetrad Demo

Open new session

Template: Search from Simulated Data

Create Graph, parameterize, instantiate, generate data N=50

Choose PC search, execute

Attach new search node, choose GES, execute

Experiment with sample size, parameters, alpha value, etc.

Tetrad Demo
  • Open new session
  • Load Charity.txt
  • Create Knowledge:
    • Tangibility is exogenous
    • AmountDonate is Last
    • Tangibility is a direct cause of Imaginability
  • Perform Search
  • Estimate output
PAGs: Partial Ancestral Graphs

What PAG edges mean. [Figure: PAG edge types and their interpretations]

Constraint-based Search

1) Adjacency

2) Orientation

Constraint-based Search: Adjacency

X and Y are adjacent if they are dependent conditional on every subset of the remaining variables.

X and Y are not adjacent if they are independent conditional on some subset of the remaining variables.
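The two rules above translate directly into a brute-force adjacency search given an independence oracle. A sketch (the oracle below encodes the hypothetical chain X → Y → Z; real PC restricts the subsets it tests for efficiency):

```python
from itertools import combinations

def skeleton(nodes, indep):
    """Brute-force adjacency search: X - Y survives iff X and Y are dependent
    conditional on every subset of the remaining variables.
    indep(x, y, s) is an independence oracle: True iff x _||_ y | s."""
    adj = {frozenset(p) for p in combinations(nodes, 2)}
    for x, y in combinations(nodes, 2):
        others = [n for n in nodes if n not in (x, y)]
        for k in range(len(others) + 1):
            if any(indep(x, y, set(s)) for s in combinations(others, k)):
                adj.discard(frozenset((x, y)))   # some subset separates them
                break
    return adj

# Hypothetical oracle for the chain X -> Y -> Z: only X _||_ Z | {Y} holds.
def indep(a, b, s):
    return {a, b} == {"X", "Z"} and "Y" in s

print(sorted(sorted(e) for e in skeleton(["X", "Y", "Z"], indep)))
# [['X', 'Y'], ['Y', 'Z']]
```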

Search: Orientation

Patterns: for each unshielded triple X — Y — Z:

  • If X _||_ Z | Y: Y is a non-collider; the triple is left unoriented in the pattern (consistent with X → Y → Z, X ← Y ← Z, and X ← Y → Z).
  • If X is not _||_ Z | Y: Y is a collider; orient X → Y ← Z.
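The collider rule can be sketched as code: for every unshielded triple X — Y — Z, orient a collider exactly when Y is absent from the set that separated X and Z (function and data names below are illustrative):

```python
def orient_unshielded_triples(adjacent, sepset):
    """Orient X -> Y <- Z for each unshielded triple X - Y - Z whose
    separating set for (X, Z) does not contain Y.
    adjacent: node -> set of neighbours; sepset: pair -> separating set."""
    colliders = []
    for y in sorted(adjacent):
        nbrs = sorted(adjacent[y])
        for i, x in enumerate(nbrs):
            for z in nbrs[i + 1:]:
                if z not in adjacent[x]:          # triple is unshielded
                    s = sepset.get((x, z), sepset.get((z, x), set()))
                    if y not in s:                # X not _||_ Z given the separating set
                        colliders.append((x, y, z))
    return colliders

# Skeleton X - Y - Z (X, Z not adjacent), with hypothetical separating sets:
adj = {"X": {"Y"}, "Y": {"X", "Z"}, "Z": {"Y"}}
print(orient_unshielded_triples(adj, {("X", "Z"): {"Y"}}))  # []  (Y is a non-collider)
print(orient_unshielded_triples(adj, {("X", "Z"): set()}))  # [('X', 'Y', 'Z')]
```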

Search: Orientation

PAGs: for each unshielded triple X — Y — Z:

  • If X _||_ Z | Y: Y is marked as a non-collider.
  • If X is not _||_ Z | Y: Y is a collider; orient X o→ Y ←o Z.

Search: Orientation

Away from Collider: if X → Y is oriented, Y — Z is unoriented, and X and Z are not adjacent, orient Y → Z (otherwise X → Y ← Z would have been found as an unshielded collider).

Search: Orientation

After the orientation phase:

X1 _||_ X2
X1 _||_ X4 | X3
X2 _||_ X4 | X3

Interesting Cases

[Figure: three models M1, M2, M3 with latent variables (L, L1, L2) over measured variables X, Y, Z, X1, X2, Y1, Y2, Z1, Z2]
Tetrad Demo

Open new session

Create graph for M1, M2, M3 on previous slide

Search with PC and FCI on each graph, compare results

Tetrad Demo

Open new session

Load data: regression_data

X is “putative cause”, Y is putative effect, Z1,Z2 prior to both (potential confounders)

Use regression to estimate effect of X on Y

Apply FCI search to data

Variants

CPC, CFCI

LiNGAM

LiNGAM
  • Most of the algorithms included in Tetrad (other than KPC) assume causal graphs are to be inferred from conditional independence tests.
  • Usually tests that assume linearity and Gaussianity.
  • LiNGAM uses a different approach.
  • Assumes linearity and non-Gaussianity.
  • Runs Independent Components Analysis (ICA) to estimate the coefficient matrix.
  • Rearranges the coefficient matrix to get a causal order.
  • Prunes weak coefficients by setting them to zero.
ICA
  • Although complicated, the basic idea is very simple.
      • a11 X1 + ... + a1n Xn = e1
      • ...
      • an1 X1 + ... + ann Xn = en
  • Assume e1, ..., en are i.i.d.
  • Try to maximize the non-Gaussianity of w1 X1 + ... + wn Xn.
  • There are n ways to do it, up to symmetry! (Cf. Central Limit Theorem; Hyvärinen et al., 2002)
    • You can use the coefficients for e1, or for e2, or for...
    • All other linear combinations of e1,...,en are more Gaussian.
ICA
  • This equation system is usually denoted Wx = s
      • But also x = Bx + s, where B is the coefficient matrix
      • So Wx = (I − B)x = s
    • s is the vector of independent components (the error terms)
    • x is the vector of variables
  • Just showed that under strong conditions we can estimate W.
    • So we can estimate B! (But with unknown row order)
    • Using assumptions of linearity and non-Gaussianity (of all but one variable) alone.
  • More sophisticated analyses allow errors to be non-i.i.d.
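The algebra Wx = (I − B)x = s can be verified numerically; a minimal sketch with an assumed two-variable model x0 → x1:

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed model x = Bx + e with the single edge x0 -> x1 (coefficient 2.0):
B = np.array([[0.0, 0.0],
              [2.0, 0.0]])
e = rng.uniform(-1, 1, size=(2, 1000))   # non-Gaussian independent errors
x = np.linalg.solve(np.eye(2) - B, e)    # generate data: x = (I - B)^{-1} e
W = np.eye(2) - B
recovered = W @ x                        # Wx = (I - B)x = e
print(np.allclose(recovered, e))         # True
```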
LiNGAM
  • LiNGAM runs ICA to estimate the coefficient matrix B.
  • The order of the errors is not fixed by ICA, so some rearranging of the B matrix needs to be done.
  • Rows of the B matrix are swapped so that it is lower triangular.
    • a[i][j] should be non-zero (representing an edge) just in case Xj is a direct cause of Xi.
    • Typically, a cutoff is used to determine if a matrix element is zero.
    • The rearranged matrix corresponds to the idea of a causal order.
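The rearrangement step can be sketched as a search for a permutation of the variables that makes B strictly lower triangular (brute force, fine for small n; the coefficient matrix below is hypothetical):

```python
import numpy as np
from itertools import permutations

def causal_order(B, tol=1e-8):
    """Find a variable ordering that makes B strictly lower triangular.
    B[i][j] != 0 means x_j is a direct cause of x_i (in x = Bx + e)."""
    n = B.shape[0]
    for perm in permutations(range(n)):
        P = B[np.ix_(perm, perm)]                 # permute rows and columns together
        if np.all(np.abs(np.triu(P)) < tol):      # upper triangle (incl. diagonal) ~ 0
            return list(perm)
    return None

# Hypothetical coefficient matrix for x2 -> x0 -> x1 (variables listed as 0, 1, 2):
B = np.array([[0.0, 0.0, 1.5],
              [0.8, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
print(causal_order(B))  # [2, 0, 1]: x2 first, then x0, then x1
```

ICA also leaves row scaling undetermined; LiNGAM normalizes the diagonal of the estimated W before this step.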
LiNGAM
  • Once you know which nodes are adjacent in the graph and what the causal order is, you can infer a complete DAG.
  • Review:
      • Use data from a linear non-Gaussian model (all but one variable non-Gaussian)
      • Infer a complete DAG (more than a pattern!)
Hands On
  • Attach a Generalized SEM IM.
  • Attach a data set, simulate 1000 points.
  • Attach a Search box and run LiNGAM.
  • Attach another search box to Data and run PC.
  • Compare PC to LiNGAM.
Special Variants of Algorithms
  • PC Pattern
    • PC Pattern enforces the requirement that the output of the algorithm will be a pattern.
  • PCD
    • PCD adds corrective code to PC for the case where some variables stand in deterministic relationships.
    • This results in fewer edges being removed from the graph.
      • For example, if X _||_ Y | Z but Z determines Y, X---Y is not taken out.
Special Variants of Algorithms
  • CPC
    • The PC algorithm may jump too quickly to the conclusion that a triple should be oriented as a collider, X → Y ← Z, or as a noncollider, X — Y — Z.
    • The CPC algorithm uses a much more conservative test for colliders and noncolliders, double- and triple-checking the orientation against conditioning sets drawn from different adjacents of X and of Z.
    • The result is a graph with fewer but more accurate orientations.
Hands On
  • Simulate data from a “complicated” DAG using a SEM IM.
    • Choose the Search from Simulated Data item from the Templates menu.
    • Make a random 20 node 20 edge DAG.
    • Parameterize as a linear SEM, accepting defaults.
    • Run CPC.
    • Attach another search box to data.
    • Run PC.
    • Lay out the PC graph using Fruchterman-Reingold.
    • Copy the layout to the CPC graph.
    • Open PC and CPC simultaneously and note the differences.
Special Variants of Algorithms
  • CFCI
    • Same idea as for CPC but for FCI instead.
  • KPC
    • The PC algorithm typically uses independence tests that assume linearity.
    • The KPC algorithm makes two changes:
      • It uses a non-parametric independence test.
      • It adds some steps to orient edges that are unoriented in the PC pattern.
Special Variants of Algorithms
  • PcLiNGAM
    • Applies when some of the variables (possibly more than one) are Gaussian and the rest are non-Gaussian.
    • Runs PC, then orients the unoriented edges (if possible) using non-Gaussianity.
  • LiNG
    • Extends LiNGAM to orient cycles using non-Gaussianity
Special Variants of Algorithms
  • JCPC
    • Uses a Markov blanket style test to add/remove individual edges, using CPC style orientation.
    • Allows individual adjacencies in the graph to be revised from the initial estimate using the PC adjacency search.