Discovering cyclic causal models by independent components analysis

Presentation Transcript



Discovering Cyclic Causal Models by Independent Components Analysis

Gustavo Lacerda

Peter Spirtes

Joseph Ramsey

Patrik O. Hoyer


Structural Equation Models (SEMs)

[Figure: two graphs over x1, x2, x3, x4, showing the original model M and the manipulated model M(do(x3 = k))]

  • Graphical models that represent causal relationships.

  • Manipulating x3 to a fixed value…

M:              x3 = f3(x1, x2),  x4 = f4(x3)
M(do(x3 = k)):  x3 = k,           x4 = f4(x3)


Structural Equation Models (SEMs)

[Figure: two graphs over x1, x2, x3, x4, one acyclic and one cyclic]

  • Can be acyclic

  • …or cyclic

  • The data produced by cyclic models can be interpreted as equilibrium points of dynamical systems


Linear Structural Equation Models (SEMs) (deterministic example)

[Figure: a graph over x1, x2, x3, x4 with edge weights 1.2 (x1 → x3), 0.9 (x2 → x3), and -5 (x3 → x4)]

  • The structural equations are linear

  • e.g.: x3 = 1.2 x1 + 0.9 x2 - 3;  x4 = -5 x3 + 1

  • Each edge weight tells us the corresponding coefficient


Linear Structural Equation Models (SEMs) (with randomness)

[Figure: the same graph, now with an error term e1, …, e4 feeding into each of x1, …, x4]

  • Now, each variable has an additive noise term with non-zero variance.

  • x1 = e1
    x2 = e2
    x3 = 1.2 x1 + 0.9 x2 - 3 + e3
    x4 = -5 x3 + 1 + e4

  • x = B x + e


Linear Structural Equation Models (SEMs) (with randomness)

[Figure: the same graph as on the previous slide]

  • x = B x + e

  • Solving for x, we get: x = (I - B)^(-1) e

  • Let A = (I - B)^(-1); then x = A e

  • A is called the “mixing matrix”.
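A minimal NumPy sketch of this algebra (the coefficients are from the running example; the constant offsets are dropped, since they do not affect the mixing matrix):

import numpy as np

# B[i, j] is the coefficient of x_j in the structural equation for x_i.
B = np.array([
    [0.0, 0.0,  0.0, 0.0],   # x1 = e1
    [0.0, 0.0,  0.0, 0.0],   # x2 = e2
    [1.2, 0.9,  0.0, 0.0],   # x3 = 1.2 x1 + 0.9 x2 + e3
    [0.0, 0.0, -5.0, 0.0],   # x4 = -5 x3 + e4
])

I = np.eye(4)
A = np.linalg.inv(I - B)                      # the mixing matrix: x = A e
print(A[3])                                   # [-6, -4.5, -5, 1]: how noise reaches x4
assert np.allclose(I - np.linalg.inv(A), B)   # and B = I - A^(-1)

# Simulating the SEM: draw independent (non-Gaussian) errors, then mix.
rng = np.random.default_rng(0)
e = rng.laplace(size=(4, 1000))
x = A @ e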


Linear Structural Equation Models (SEMs) (with randomness)

  • The “mixing matrix” shows how the noise propagates:

[Figure: the same graph, with the noise-propagation paths highlighted]


Linear Structural Equation Models (SEMs) (with randomness)

[Figure: the graph redrawn with an explicit weight of 1 on each edge e_i → x_i]

  • The “mixing matrix” shows how the noise propagates:

  • Done.

Let’s make it:

[Figure: the mixing graph: e1 → x1, …, e4 → x4 each with weight 1, plus propagated edges e1 → x3 (1.2), e2 → x3 (0.9), e3 → x4 (-5), e1 → x4 (-6), and e2 → x4 (-4.5)]


What can we learn from observational data alone?

[Figure: two two-variable graphs over x1 and x2, labeled M1 and M2]

  • Until recently, the best we could do was identify the d-separation equivalence class

  • We couldn’t tell the difference between:



Why not?

[Figure: the same two graphs, M1 and M2]

  • Because it was assumed that the error terms are Gaussian

  • …and when they are Gaussian, these two graphs are distribution-equivalent



Independent Components Analysis (ICA)

[Figure: source signals e1, e2 mixing into observed signals x1, x2]

  • Cocktail party problem

  • You want to get back the original signals, but all you have are the mixtures. What can you do?

x = A e


Independent Components Analysis (ICA)

[Figure: the same mixing diagram]

  • Cocktail party problem

  • This equation has infinitely many solutions! For any invertible A, there is a solution!

  • But if you assume that the signals are independent, it is possible to estimate A and e from just x.

  • How?

x = A e


Independent Components Analysis (ICA)

[Figure: the same mixing diagram]

  • Cocktail party problem

  • Any choice of A implies a list of samples of e

  • Each list of implied samples of e has a degree of independence

  • We want the A for which the implied e’s are maximally independent

  • e’s maximally independent ↔ e’s maximally non-Gaussian

  • Intuition: Central Limit Theorem

x = A e
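A minimal sketch of this recipe, assuming scikit-learn’s FastICA (the variable names are mine): mix two non-Gaussian sources with an arbitrary invertible matrix, then recover both the sources and the mixing matrix, up to permutation and scaling.

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n = 5000

# Two independent, non-Gaussian sources: the "voices" at the party.
e = np.column_stack([rng.laplace(size=n), rng.uniform(-1, 1, size=n)])

# All we observe are the mixtures x = A e (written row-wise below).
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
x = e @ A.T

ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
sources = ica.fit_transform(x)   # estimated e, up to permutation/scale
A_hat = ica.mixing_              # estimated A, with the same ambiguities
print(A_hat)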



Independent Components Analysis (ICA)

  • We don’t know which source signal is which, i.e. which is Alex and which is Bob

  • Scaling: when used with SEMs, the variance of each error term is confounded with its coefficients on each x.


The LiNGAM approach (Shimizu et al., 2006)

[Figure: a four-variable DAG over x1, …, x4 with error terms e1, …, e4 and edge coefficients 1.1, 1.5, and -2]

  • What happens if we generate data from this linear SEM

  • … and then run ICA?


The LiNGAM approach

[Figure: the expected mixing graph: e1 → x1, …, e4 → x4 each with weight 1, plus propagated edges with coefficients 1.1, 1.5, -2, and -3]

  • We would expect to see:

  • Except that ICA doesn’t know the scaling



The LiNGAM approach

  • So we should expect to see something like:

  • …and we’d need to normalize by dividing all children of e1 by 2

[Figure: the same mixing graph with e1’s edges scaled by 2; the coefficients shown include 2, 2.2, 3, -6, and -2]


The LiNGAM approach

[Figure: the normalized mixing graph: weight 1 on each e_i → x_i, plus coefficients 1.1, 1.5, -2, and -3]

  • getting us:

  • Except that ICA doesn’t know the order of the e’s, i.e. which e’s go with which x’s…



The LiNGAM approach

[Figure: the raw ICA output, rows in arbitrary order and scale with unidentified labels e…, alongside the correctly permuted and scaled version matching e1, …, e4 to x1, …, x4]

  • Really, ICA gives us something like:

  • So first we need to find the right permutation of the e’s

  • And then do the scaling (see the sketch after this list)

  • Note that, since the model is a DAG, there is exactly one valid way to permute the error terms.
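A sketch of those two steps, assuming the LiNGAM convention W = A^(-1) (the example numbers are hypothetical): brute-force the row permutation that leaves no zero on the diagonal, rescale each row so its diagonal entry is 1, and read off B = I - W.

import itertools
import numpy as np

def lingam_postprocess(W):
    n = W.shape[0]
    # Step 1: find the row permutation with a zeroless diagonal. We pick
    # the permutation maximizing sum(log |W_ii|), which is robust to
    # estimated entries that are merely close to zero.
    best = max(itertools.permutations(range(n)),
               key=lambda p: np.sum(np.log(np.abs(W[list(p), np.arange(n)]) + 1e-12)))
    W = W[list(best)]
    # Step 2: rescale each row so its diagonal entry is 1.
    W = W / np.diag(W)[:, None]
    # Step 3: read off the coefficient matrix.
    return np.eye(n) - W

# Hypothetical DAG: x1 -> x3 (1.1), x2 -> x3 (1.5), x3 -> x4 (-2).
B_true = np.zeros((4, 4))
B_true[2, 0], B_true[2, 1], B_true[3, 2] = 1.1, 1.5, -2.0
W_true = np.eye(4) - B_true                    # the ideal unmixing matrix
perm, scale = [2, 0, 3, 1], np.diag([2.0, 1.0, -1.0, 0.5])
W_ica = scale @ W_true[perm]                   # what ICA actually hands back
print(np.round(lingam_postprocess(W_ica), 3))  # recovers B_true

For a DAG the brute-force search can be replaced by the Hungarian algorithm (see Appendix 2), which is what makes this step practical for larger graphs.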


The LiNGAM approach

[Figure: the recovered DAG, identical to the generating model shown earlier]

  • After some matrix magic, we get back:

B = I - A^(-1)


The LiNGAM approach

[Figure: the two-variable graphs M1 and M2 from before]

  • Discovers the full structure of the DAG

  • … by assuming causal sufficiency (i.e. independence of the error terms)

    • “causal sufficiency”: no latent variable is a cause of more than one observed variable

    • In the linear case, causal sufficiency ↔ independence of the error terms

  • In particular, now M1 and M2 can be distinguished!



The LiNGAM approach

[Figure: two panels contrasting Gaussian and uniform error terms]

Images by Patrik Hoyer et al., used with permission from “Estimation of causal effects using linear non-Gaussian causal models with hidden variables”


The LiNGAM approach

[Figure: the estimated structure over e1, …, e4 and x1, …, x4, with one spurious left-pointing edge marked in red]

  • Note that, once the valid permutation was found, there were no left-pointing arrows. This is because:

    • the generating model was a DAG.

    • we wrote down the x’s in an order compatible with it

  • But it is possible for ICA to return a matrix that does not satisfy the acyclicity assumption

  • LiNGAM will pretend the red edge is not there



The LiNGAM approach

  • LiNGAM cannot discover cyclic models…

  • because:

    • since it assumes the data was generated by a DAG,

    • it searches for a single valid permutation

  • If we search for any number of valid permutations…

  • then we can discover cyclic models too.

  • That’s exactly what we did!



The LiNG-DG approach

  • When the data looks acyclic, it works just like LiNGAM, and returns a single model.

  • When the data looks cyclic, more than one permutation is considered valid. Thus, it returns a distribution-equivalent set containing more than one model.

  • “distribution-equivalent” means you can’t do better, at least without experimental data or further assumptions.


The LiNG-DG approach

[Figure: a five-variable cyclic model over x1, …, x5 with error terms e1, …, e5 and edge coefficients 1.2, -0.3, -1, 2, and 3]

  • Let’s simulate using this model (a sketch follows below):

  • Error terms are generated by sampling from a Gaussian and squaring

  • 15000 data points

  • We test which ICA coefficients are zero by using bootstrap sampling followed by a quantile test

  • Ready?
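A minimal simulation sketch in the same spirit (the edge placements below are hypothetical, since the transcript only preserves the coefficient values): draw squared-Gaussian errors and solve for the equilibria x = (I - B)^(-1) e.

import numpy as np

rng = np.random.default_rng(0)
n_vars, n_samples = 5, 15000

# Hypothetical cyclic coefficient matrix; B[i, j] is the effect of x_j on x_i.
# The cycle x2 -> x3 -> x2 has product 2 * (-0.3) = -0.6, so it is stable.
B = np.zeros((n_vars, n_vars))
B[1, 0] = 1.2     # x1 -> x2
B[2, 1] = 2.0     # x2 -> x3
B[1, 2] = -0.3    # x3 -> x2  (closes the cycle)
B[3, 2] = -1.0    # x3 -> x4
B[4, 3] = 3.0     # x4 -> x5

# Non-Gaussian errors: sample from a Gaussian, square, and center.
e = rng.standard_normal((n_vars, n_samples)) ** 2 - 1.0

# The data are equilibrium points, i.e. x = (I - B)^(-1) e.
x = np.linalg.solve(np.eye(n_vars) - B, e)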


The LiNG-DG approach

LiNG-DG returns a set with 2 models:

[Figure: the two distribution-equivalent cyclic models, #1 and #2]



LiNG-DG + the stability assumption

  • Note that only one of these models is stable.

  • If our data is a set of equilibria, then the true model must be stable.

  • Under what conditions are we guaranteed to have a unique stable model?



LiNG-DG + the stability assumption

  • Theorem: if the true model’s cycles don’t intersect, then only one model is stable.

  • For simple cycle models, cycle-products are inverted: c1 = 1/c2.

  • So at least one cycle will be > 1 (in modulus) and thus unstable.

  • Each cycle works independently, and any valid permutation* will invert at least one cycle, creating an unstable model (a stability-check sketch follows).

*except for the identity permutation
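A quick way to apply the stability assumption in practice, as a sketch using the standard spectral-radius criterion for linear systems (the example matrices are mine): keep only the candidate models whose B has all eigenvalues strictly inside the unit circle.

import numpy as np

def is_stable(B):
    # x = Bx + e is a stable equilibrium of the corresponding
    # dynamical system iff the spectral radius of B is < 1.
    return np.max(np.abs(np.linalg.eigvals(B))) < 1.0

# A two-variable cycle and its distribution-equivalent twin: the cycle
# products are inverted (0.4 vs 1/0.4 = 2.5), so exactly one is stable.
B1 = np.array([[0.0, 0.5], [0.8, 0.0]])
B2 = np.array([[0.0, 2.0], [1.25, 0.0]])
print(is_stable(B1), is_stable(B2))   # True False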


What should one use?

Check out Hoyer, Hyvärinen, Glymour, Spirtes, Scheines, Ramsey, Lacerda, Shimizu (submitted).

  • Constraint-based methods (e.g. PC, CPC, SGS; or Geiger and Heckerman 1994 for a Bayesian alternative): the d-separation equivalence class
  • LiNGAM: a unique model
  • LiNG-DG: 2 cases (a unique model when the data looks acyclic; a distribution-equivalent set otherwise)
  • Richardson’s CCD: a very large class, not even covariance equivalent



UAI is due soon!

Please send me your comments: [email protected]



Appendix 1: self-loops

  • Equilibrium equations usually correspond with the dynamical equations.

  • EXCEPT if a self-loop has coefficient 1, we will get the wrong structure, and the predicted results of intervention will be wrong!

  • Self-loop coefficients are underdetermined.

  • Our stability results only hold if we assume no self-loops.



Appendix 2: search and pruning

  • Testing zeros: local vs non-local methods

  • To estimate the variance of the estimated coefficients, we use bootstrap sampling, carefully.

  • How to find row-permutations of W that have a zeroless diagonal:

    • Acyclic: Hungarian algorithm (see the sketch below)

    • General: k-best linear assignments, or constrained n-Rooks (put rooks on the non-zero entries)
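For the acyclic case, a sketch of that assignment step, assuming SciPy’s linear_sum_assignment (the example W is mine): finding the row permutation of W whose diagonal stays as far from zero as possible is a linear assignment problem, solvable by the Hungarian algorithm.

import numpy as np
from scipy.optimize import linear_sum_assignment

def zeroless_diagonal_permutation(W, eps=1e-12):
    # Minimize sum_i -log |W[p(i), i]|, i.e. maximize the product of
    # the diagonal magnitudes after permuting the rows of W.
    cost = -np.log(np.abs(W) + eps)
    row_ind, col_ind = linear_sum_assignment(cost)
    # Row i is assigned to diagonal position col_ind[i]; invert that
    # assignment to get the row order.
    perm = np.argsort(col_ind)
    return W[perm]

W = np.array([[0.0, 1.0, 0.2],
              [0.9, 0.0, 0.0],
              [0.1, 0.3, 1.1]])
print(zeroless_diagonal_permutation(W))
# Rows reordered so the diagonal (0.9, 1.0, 1.1) avoids zeros.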

