Discovering Cyclic Causal Models by Independent Components Analysis

Gustavo Lacerda

Peter Spirtes

Joseph Ramsey

Patrik O. Hoyer

[Figure: two causal graphs over the variables x1, x2, x3, x4]

- Graphical models that represent causal relationships.
- Manipulating x3 to a fixed value…

M: x3 = f3(x1, x2), x4 = f4(x3)

M(do(x3 = k)): x3 = k, x4 = f4(x3)

- Can be acyclic
- …or cyclic
- The data produced by cyclic models can be interpreted as equilibrium points of dynamical systems
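To illustrate the equilibrium reading, here is a minimal sketch that iterates a hypothetical two-variable linear system until it settles. The coefficients, the noise values and the linear update rule x(t+1) = B x(t) + e are all assumptions made for illustration; the slide only states that the data can be interpreted as equilibria.

```python
def iterate_to_equilibrium(B, e, steps=200):
    """Repeatedly apply x <- B x + e, starting from x = 0."""
    n = len(e)
    x = [0.0] * n
    for _ in range(steps):
        x = [sum(B[i][j] * x[j] for j in range(n)) + e[i] for i in range(n)]
    return x

# Hypothetical 2-cycle: x1 <- 0.5 * x2 and x2 <- -0.4 * x1
# (cycle product -0.2, so the iteration converges)
B = [[0.0, 0.5],
     [-0.4, 0.0]]
e = [1.0, 2.0]

x_eq = iterate_to_equilibrium(B, e)

# At equilibrium the point satisfies the structural equations x = B x + e
residual = [x_eq[i] - (sum(B[i][j] * x_eq[j] for j in range(2)) + e[i])
            for i in range(2)]
print(x_eq, residual)
```

With a cycle product larger than 1 in modulus the same loop would diverge, so no equilibrium would be observed.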

[Figure: graph with edges x1 → x3 (weight 1.2), x2 → x3 (weight 0.9), x3 → x4 (weight -5)]

- The structural equations are linear
- e.g.: x3 = 1.2 x1 + 0.9 x2 - 3 and x4 = -5 x3 + 1
- Each edge weight tells us the corresponding coefficient

[Figure: the same graph, with a noise term e1, ..., e4 attached to each of x1, ..., x4]

- Now, each variable has an additive noise term with non-zero variance.
- x1 = e1; x2 = e2; x3 = 1.2 x1 + 0.9 x2 - 3 + e3; x4 = -5 x3 + 1 + e4
- In matrix form (dropping the constant terms): x = B x + e


- x = B x + e
- Solving for x, we get: x = (I - B)^{-1} e
- Let A = (I - B)^{-1}; then x = A e
- A is called the “mixing matrix”.
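As a worked check, the sketch below computes the mixing matrix for the example coefficients 1.2, 0.9 and -5. It uses the fact that for an acyclic model B is nilpotent, so (I - B)^{-1} equals the finite sum I + B + B^2 + ... + B^(n-1); the helper names are made up for this illustration, and a cyclic B would need a general matrix inverse instead.

```python
def matmul(P, Q):
    """Product of two square matrices given as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mixing_matrix_acyclic(B):
    """A = (I - B)^{-1} = I + B + ... + B^(n-1); valid only when B is acyclic."""
    n = len(B)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    A = [row[:] for row in identity]
    power = [row[:] for row in identity]
    for _ in range(n - 1):
        power = matmul(power, B)  # next power of B: paths one edge longer
        A = [[A[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return A

# B[i][j] is the coefficient of x_(j+1) in the equation for x_(i+1)
B = [[0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0],
     [1.2, 0.9, 0.0, 0.0],
     [0.0, 0.0, -5.0, 0.0]]

A = mixing_matrix_acyclic(B)
for row in A:
    print(row)
```

Up to floating-point rounding, the last row is (-6, -4.5, -5, 1): the effect of e1 on x4 is the path product 1.2 × -5, and that of e2 is 0.9 × -5.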

- The “mixing matrix” shows how the noise propagates:

[Figure: each noise term reaches its own variable with weight 1 (e1 → x1, e2 → x2, e3 → x3, e4 → x4) and the downstream variables through path products: e1 → x3 (1.2), e2 → x3 (0.9), e3 → x4 (-5), e1 → x4 (-6), e2 → x4 (-4.5)]


- Until recently, the best we could do was identify the d-separation equivalence class
- We couldn’t tell the difference between:

[Figure: M1 and M2, two graphs over x1 and x2]

- Because it was assumed that the error terms are Gaussian
- …and when they are Gaussian, these two graphs are distribution-equivalent
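A short worked derivation of why the Gaussian case cannot be resolved. It assumes M1 is x1 → x2 and M2 is x2 → x1 (the figure is not reproduced here, so this labeling is an assumption), with zero-mean errors; the symbols b, b', σ1 and σ2 are introduced only for this illustration.

```latex
\begin{aligned}
M_1:\quad & x_1 = e_1, \qquad x_2 = b\,x_1 + e_2, \qquad
  e_1 \sim N(0,\sigma_1^2),\; e_2 \sim N(0,\sigma_2^2) \\
& \operatorname{Var}(x_1) = \sigma_1^2, \qquad
  \operatorname{Cov}(x_1,x_2) = b\,\sigma_1^2, \qquad
  \operatorname{Var}(x_2) = b^2\sigma_1^2 + \sigma_2^2 \\
M_2:\quad & x_2 = e_2', \qquad x_1 = b'\,x_2 + e_1' \\
& \operatorname{Var}(e_2') = b^2\sigma_1^2 + \sigma_2^2, \qquad
  b' = \frac{b\,\sigma_1^2}{b^2\sigma_1^2 + \sigma_2^2}, \qquad
  \operatorname{Var}(e_1') = \frac{\sigma_1^2\,\sigma_2^2}{b^2\sigma_1^2 + \sigma_2^2}
\end{aligned}
```

With these choices M2 reproduces the same variances and covariance as M1. A zero-mean Gaussian is fully determined by its covariance matrix, so both models imply the same distribution over (x1, x2).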

[Figure: source signals e1, e2 and observed mixtures x1, x2]

- Cocktail party problem
- You want to get back the original signals, but all you have are the mixtures. What can you do?

x = A e

- This equation has infinitely many solutions! For any invertible A, there is a solution!
- But if you assume that the signals are independent, it is possible to estimate A and e from just x.
- How?

- Any choice of A implies a list of samples of e
- Each list of implied samples of e has a degree of independence
- We want the A for which the implied e’s are maximally independent
- e’s maximally independent ↔ e’s maximally non-Gaussian
- Intuition: Central Limit Theorem
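The Central Limit Theorem intuition can be checked numerically: a mixture of independent non-Gaussian sources is closer to Gaussian than the sources themselves. The sketch below uses excess kurtosis as one simple measure of non-Gaussianity; the uniform sources, the mixing weights 0.6 and 0.8, and the choice of kurtosis are assumptions for illustration, not the contrast function of any particular ICA implementation.

```python
import random

def excess_kurtosis(samples):
    """Sample excess kurtosis: 0 for a Gaussian, about -1.2 for a uniform."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m4 / (m2 ** 2) - 3.0

rng = random.Random(0)
n = 50_000

# Two independent, non-Gaussian (uniform) sources
e1 = [rng.uniform(-1, 1) for _ in range(n)]
e2 = [rng.uniform(-1, 1) for _ in range(n)]

# One observed mixture of the two sources
x1 = [0.6 * a + 0.8 * b for a, b in zip(e1, e2)]

k_source = excess_kurtosis(e1)
k_mixture = excess_kurtosis(x1)
print(k_source, k_mixture)
```

The mixture's kurtosis is closer to zero than the source's, which is why searching for the unmixing that makes the implied e's as non-Gaussian as possible tends to undo the mixing.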

x = A e

- We don’t know which source signal is which, i.e. which is Alex and which is Bob
- Scaling: when used with SEMs, the variance of each error term is confounded with its coefficients on each x.

[Figure: a linear SEM over x1, ..., x4 with noise terms e1, ..., e4 and edge weights 1.1, 1.5 and -2]

- What happens if we generate data from this linear SEM
- … and then run ICA?

- We would expect to see:

[Figure: e1, ..., e4 drawn above x1, ..., x4; each e_i → x_i has weight 1, and the remaining weights are -2, 1.1, 1.5 and -3]

- Except that ICA doesn’t know the scaling

- So we should expect to see something like:

[Figure: the same graph with every weight out of e1 doubled: weights -2, 2, 2.2, 3 and -6]

- …and we’d need to normalize by dividing the weights of all edges out of e1 by 2

- getting us:

[Figure: the normalized graph, with weights -2, 1, 1.1, 1.5 and -3]

- Except that ICA doesn’t know the order of the e’s, i.e. which e’s go with which x’s…

[Figure: the same graph with the noise terms left unlabeled]

- Really, ICA gives us something like:

[Figure: the mixing graph with the e’s in arbitrary order and at arbitrary scale]
- So first we need to find the right permutation of the e’s
- And then do the scaling
- Note that, since the model is a DAG, there is exactly one valid way to permute the error terms.

- After some matrix magic, we get back:

[Figure: the original SEM, with edge weights 1.1, 1.5 and -2]

B = I - A^{-1}
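The permutation and scaling steps can be sketched as follows. ICA is taken to return the unmixing matrix W = A^{-1} with its rows in arbitrary order and at arbitrary scale. The structure used here, x1 → x2 (1.1), x1 → x3 (1.5), x3 → x4 (-2), is an assumption consistent with the weights quoted on the slides; the scrambled matrix `W_ica` and both function names are hypothetical. Exact zeros are also assumed, whereas estimated coefficients would need a statistical test.

```python
from itertools import permutations

def zeroless_diagonal_permutations(W, tol=1e-9):
    """Yield row orders perm such that W[perm[i]][i] is non-zero for every i."""
    n = len(W)
    for perm in permutations(range(n)):
        if all(abs(W[perm[i]][i]) > tol for i in range(n)):
            yield perm

def recover_B(W, perm):
    """Permute rows, rescale each row to a unit diagonal, and return B = I - W'."""
    n = len(W)
    B = []
    for i in range(n):
        row = W[perm[i]]
        scaled = [v / row[i] for v in row]
        B.append([(1.0 if i == j else 0.0) - scaled[j] for j in range(n)])
    return B

# Rows of I - B for the assumed SEM, shuffled and rescaled (hypothetical ICA output)
W_ica = [[-3.0, 0.0, 2.0, 0.0],
         [-1.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.5],
         [-3.3, 3.0, 0.0, 0.0]]

valid = list(zeroless_diagonal_permutations(W_ica))
B = recover_B(W_ica, valid[0])
print(valid)
for row in B:
    print(row)
```

Brute-force enumeration over all n! row orders is only practical for small n; it is used here to keep the sketch short.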


- LiNGAM discovers the full structure of the DAG
- … by assuming causal sufficiency (i.e. independence of the error terms)
- “Causal sufficiency”: no latent variable is a cause of more than one observed variable
- In the linear case, causal sufficiency ↔ independence of the error terms

- In particular, now M1 and M2 can be distinguished!

[Figure: two panels, titled “Gaussian” and “Uniform”]

Images by Patrik Hoyer et al., used with permission, from “Estimation of causal effects using linear non-Gaussian causal models with hidden variables”

[Figure: noise terms e1, ..., e4 and variables x1, ..., x4]

- Note that, once the valid permutation was found, there were no left-pointing arrows. This is because:
- the generating model was a DAG.
- we wrote down the x’s in an order compatible with it

- But it is possible for ICA to return a matrix that does not satisfy the acyclicity assumption
- LiNGAM will pretend the red edge is not there

- LiNGAM cannot discover cyclic models…
- because:
- since it assumes the data was generated by a DAG,
- it searches for a single valid permutation

- If we search for any number of valid permutations…
- then we can discover cyclic models too.
- That’s exactly what we did!

- When the data looks acyclic, LiNG-DG works just like LiNGAM, and returns a single model.
- When the data looks cyclic, more than one permutation is considered valid. Thus, it returns a distribution-equivalent set containing more than one model.
- “distribution-equivalent” means you can’t do better, at least without experimental data or further assumptions.
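A minimal sketch of what "more than one valid permutation" means in practice, using a hypothetical two-variable cycle (the coefficients 0.5 and -0.4 and the function name are assumptions for illustration). Every row permutation of W = I - B with a zeroless diagonal yields one candidate model.

```python
from itertools import permutations

def candidate_models(W, tol=1e-9):
    """Return one coefficient matrix B per row permutation of W with a zeroless diagonal."""
    n = len(W)
    models = []
    for perm in permutations(range(n)):
        rows = [W[p] for p in perm]
        if any(abs(rows[i][i]) <= tol for i in range(n)):
            continue  # a zero on the diagonal: not a valid permutation
        B = [[(1.0 if i == j else 0.0) - rows[i][j] / rows[i][i] for j in range(n)]
             for i in range(n)]
        models.append(B)
    return models

# Hypothetical 2-cycle: x1 = 0.5 x2 + e1, x2 = -0.4 x1 + e2, so W = I - B is
W = [[1.0, -0.5],
     [0.4, 1.0]]

models = candidate_models(W)
cycle_products = [B[0][1] * B[1][0] for B in models]
for B, c in zip(models, cycle_products):
    print(B, c)
```

Both permutations are valid here, so two models come back: the original one, and one whose cycle product is the reciprocal of the original.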

- Let’s simulate using this model:

[Figure: a graph over x1, ..., x5 with noise terms e1, ..., e5 and edge weights 1.2, -0.3, -1, 2 and 3]

- Error terms are generated by sampling from a Gaussian and squaring
- 15000 data points
- We test which ICA coefficients are zero by using bootstrap sampling followed by a quantile test
- Ready?
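The sketch below illustrates two ingredients of this setup: squared-Gaussian error terms, and a bootstrap quantile test for whether a coefficient is zero. To stay self-contained it swaps in a least-squares slope as a stand-in for an estimated ICA coefficient; that substitution, the three-variable model, the sample size, and the function names are all assumptions, not the procedure used for the results on the slides.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs (a stand-in for one estimated ICA coefficient)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def looks_zero(xs, ys, rng, n_boot=200, alpha=0.05):
    """Bootstrap the estimate; report whether the central quantile interval covers 0."""
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        estimates.append(slope([xs[i] for i in idx], [ys[i] for i in idx]))
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo <= 0.0 <= hi

rng = random.Random(1)
n = 2000

# Error terms: sample from a Gaussian and square, which makes them non-Gaussian
e1 = [rng.gauss(0, 1) ** 2 for _ in range(n)]
e2 = [rng.gauss(0, 1) ** 2 for _ in range(n)]
e3 = [rng.gauss(0, 1) ** 2 for _ in range(n)]

x1 = e1
x2 = e2                                      # no edge from x1: true coefficient 0
x3 = [2.0 * a + c for a, c in zip(x1, e3)]   # edge x1 -> x3 with coefficient 2

zero_12 = looks_zero(x1, x2, rng)
zero_13 = looks_zero(x1, x3, rng)
print(zero_12, zero_13)
```

The test for x1 → x3 should reject zero, since the true coefficient is 2. The test for x1 → x2 accepts zero in most runs, but like any test at level alpha it can occasionally reject.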

LiNG-DG returns a set with 2 models:

[Figure: the two returned models, #1 and #2]

- Note that only one of these models is stable.
- If our data is a set of equilibria, then the true model must be stable.
- Under what conditions are we guaranteed to have a unique stable model?

- Theorem: if the true model’s cycles don’t intersect, then only one model is stable.
- For simple cycle models, cycle-products are inverted: c1 = 1/c2.
- So at least one cycle-product will be > 1 (in modulus), and thus unstable.
- Each cycle works independently, and any valid permutation* will invert at least one cycle, creating an unstable model.

*except for the identity permutation
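For a single two-variable cycle, the inversion of the cycle-product can be checked by hand. The coefficient names b12 and b21 are introduced here for illustration, and the derivation assumes both are non-zero and that there are no self-loops.

```latex
\begin{aligned}
B &= \begin{pmatrix} 0 & b_{12} \\ b_{21} & 0 \end{pmatrix}, &
W &= I - B = \begin{pmatrix} 1 & -b_{12} \\ -b_{21} & 1 \end{pmatrix}, &
c_1 &= b_{12}\,b_{21} \\
B' &= \begin{pmatrix} 0 & 1/b_{21} \\ 1/b_{12} & 0 \end{pmatrix}, &
W' &= I - B' = \begin{pmatrix} 1 & -1/b_{21} \\ -1/b_{12} & 1 \end{pmatrix}, &
c_2 &= \frac{1}{b_{12}\,b_{21}} = \frac{1}{c_1}
\end{aligned}
```

Here W' is W with its two rows swapped and each row divided by its new diagonal entry, which is the only non-identity permutation with a zeroless diagonal. Since c2 = 1/c1, the two cycle-products cannot both be smaller than 1 in modulus.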

Check out: Hoyer, Hyvärinen, Glymour, Spirtes, Scheines, Ramsey, Lacerda, Shimizu (submitted)

Constraint-based methods, e.g. PC, CPC, SGS (or Geiger and Heckerman 1994 for a Bayesian alternative)

LiNGAM

unique model

d-separation equivalence class

LiNG-DG

2 cases

Richardson’s CCD

?

very large class: not even covariance equivalent

Please send me your comments: gusl@cs.cmu.edu

- Equilibrium equations usually correspond to the dynamical equations.
- EXCEPT if a self-loop has coefficient 1: then we will get the wrong structure, and the predicted results of intervention will be wrong!
- Self-loop coefficients are underdetermined.
- Our stability results only hold if we assume no self-loops.

- Testing zeros: local vs non-local methods
- To estimate the variance of the estimated coefficients, we use bootstrap sampling, carefully.
- How to find row-permutations of W that have a zeroless diagonal:
- Acyclic: Hungarian algorithm
- General: k-best linear assignments, or constrained n-Rooks (put rooks on the non-zero entries)
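The constrained n-Rooks view can be sketched with plain backtracking: place one rook per column on a non-zero entry of W, never reusing a row. This is a simple enumeration, not the Hungarian algorithm or a k-best assignment solver; the 3x3 matrix and the function name are hypothetical, and exact zeros are assumed where estimated coefficients would first need a zero test.

```python
def zeroless_row_permutations(W, tol=1e-9):
    """Enumerate all placements of n rooks on non-zero entries of W.

    Each result maps position i to the row of W that should be moved there,
    so that the permuted matrix has a zeroless diagonal.
    """
    n = len(W)
    results = []
    assignment = [None] * n
    used_rows = [False] * n

    def place(col):
        if col == n:
            results.append(tuple(assignment))
            return
        for row in range(n):
            if not used_rows[row] and abs(W[row][col]) > tol:
                used_rows[row] = True
                assignment[col] = row
                place(col + 1)
                used_rows[row] = False  # backtrack and try the next row

    place(0)
    return results

# Hypothetical pattern: the first two variables form a cycle, the third is a sink
W = [[1.0, -0.5, 0.0],
     [0.4, 1.0, 0.0],
     [0.0, -2.0, 1.0]]

perms = zeroless_row_permutations(W)
print(perms)
```

Backtracking prunes a branch as soon as a column has no free non-zero entry, so it visits far fewer candidates than checking all n! permutations when W is sparse.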