## PowerPoint Slideshow about ' Course files' - edda


Course files

http://www.andrew.cmu.edu/~ddanks/NASSLLI/

Fundamental problem

- Why is this slogan correct?
- Causal hypotheses make implicit claims about the effects of intervening on (manipulating) one or more variables
- Hypotheses about association or correlation make no such claims
- Correlation or probabilistic dependence can be produced in many ways

Fundamental problem

- Some of the possible reasons why X and Y might be associated are:
- Sheer chance
- X causes Y
- Y causes X
- Some third variable Z influences X and Y
- The value of X (or a cause of X) and the value of Y (or a cause of Y) can be causes/reasons for whether an individual is in the sample (sample selection bias)

Fundamental problem

- Fundamental problem of causal search:
- For any particular set of data, there are often many different causal structures that could have produced that data
- Causation → Association map is many → one
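To make the many → one map concrete, here is a minimal simulation sketch. The linear-Gaussian mechanisms, the sample size, and the names `pearson` and `simulate` are illustrative assumptions, not part of the slides. Three different causal structures all yield a clearly positive X–Y correlation:

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(structure, n=5000, seed=0):
    """Draw n samples from a toy linear model and return corr(X, Y)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        if structure == "X->Y":
            x = rng.gauss(0, 1)
            y = x + rng.gauss(0, 1)
        elif structure == "Y->X":
            y = rng.gauss(0, 1)
            x = y + rng.gauss(0, 1)
        elif structure == "X<-Z->Y":
            z = rng.gauss(0, 1)  # common cause of X and Y
            x = z + rng.gauss(0, 1)
            y = z + rng.gauss(0, 1)
        else:
            raise ValueError(structure)
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)

correlations = {s: simulate(s) for s in ("X->Y", "Y->X", "X<-Z->Y")}
for structure, r in correlations.items():
    print(f"{structure}: corr(X, Y) = {r:.2f}")
```

The correlation alone does not say which of the three structures generated the data.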

Fundamental problem

- Okay, so what can we do about this?
- Use the data to figure out as much as possible (though it usually won’t be everything)
- Requires developing search procedures
- And then try to narrow the possibilities
- Use other knowledge (e.g., time order, interventions)
- Get better / different data (e.g., run an experiment)

Always remember…

Even if we cannot discover the whole truth,

we might be able to find some of the truth!

Markov equivalence

- Formally, we say that:
- Two causal graphs are members of the same Markov Equivalence Class iff they imply the exact same (un)conditional independence relations among the observed variables
- By the Markov and Faithfulness assumptions
- Remember that d-separation gives a purely graphical criterion for determining all of the (un)conditional independencies
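As a reminder of how that graphical criterion can be checked mechanically, here is a small sketch. It uses the moralized ancestral graph test, a standard equivalent formulation of d-separation rather than anything taken from the slides. The edge-list representation and the function names are assumptions made for illustration, and `x` and `y` are assumed not to be in the conditioning set.

```python
from itertools import combinations

def ancestors(edges, nodes):
    """Return `nodes` together with all of their ancestors."""
    result = set(nodes)
    frontier = list(nodes)
    while frontier:
        node = frontier.pop()
        for parent, child in edges:
            if child == node and parent not in result:
                result.add(parent)
                frontier.append(parent)
    return result

def d_separated(edges, x, y, given=()):
    """True if x and y are d-separated by `given` in the DAG `edges`.

    `edges` is a collection of (parent, child) pairs.
    """
    given = set(given)
    # 1. Restrict to x, y, the conditioning set, and their ancestors.
    keep = ancestors(edges, {x, y} | given)
    sub = [(p, c) for p, c in edges if p in keep and c in keep]
    # 2. Moralize: drop directions and link parents that share a child.
    adj = {n: set() for n in keep}
    for p, c in sub:
        adj[p].add(c)
        adj[c].add(p)
    for node in keep:
        parents = [p for p, c in sub if c == node]
        for a, b in combinations(parents, 2):
            adj[a].add(b)
            adj[b].add(a)
    # 3. Look for an x-y path that avoids the conditioning set.
    seen, frontier = {x}, [x]
    while frontier:
        node = frontier.pop()
        for nbr in adj[node]:
            if nbr == y:
                return False
            if nbr not in seen and nbr not in given:
                seen.add(nbr)
                frontier.append(nbr)
    return True

chain = [("X", "Y"), ("Y", "Z")]     # X -> Y -> Z
collider = [("X", "Y"), ("Z", "Y")]  # X -> Y <- Z
print(d_separated(chain, "X", "Z", {"Y"}))     # conditioning blocks the chain
print(d_separated(collider, "X", "Z", {"Y"}))  # conditioning opens the collider
```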

Markov equivalence

- The “Fundamental Problem of Causal Inference” can be restated as:
- For some sets of independence relations, the Markov equivalence class is not a singleton
- Markov equivalence classes give a precise characterization of what can be inferred from independencies alone

Markov equivalence

- Two more examples:
- Are these graphs Markov equivalent?
- Are these two graphs?

[Figure: two pairs of causal graphs over X, Y, Z; the edges are not recoverable from the transcript]

Shared structure

- What is shared by all of the graphs in a Markov equivalence class?
- Same “skeleton”
- I.e., they all have the same adjacency relations
- Same “unshielded colliders”
- I.e., X → Y ← Z with no edge between X and Z
- Sometimes, other edges have same direction
- In these last two cases, we can infer that the true graph contains the shared directed edges.
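For acyclic graphs without latent variables, a standard result says the converse also holds: two graphs with the same skeleton and the same unshielded colliders are Markov equivalent. A minimal sketch of that check follows; the (parent, child) edge-list representation and the function names are illustrative assumptions.

```python
def skeleton(edges):
    """Adjacencies of a DAG, ignoring edge direction."""
    return {frozenset(edge) for edge in edges}

def unshielded_colliders(edges):
    """All a -> mid <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    found = set()
    for a, mid in edges:
        for b, other_mid in edges:
            if other_mid == mid and a != b and frozenset((a, b)) not in skel:
                found.add((frozenset((a, b)), mid))
    return found

def markov_equivalent(g1, g2):
    """Same skeleton and same unshielded colliders."""
    return (skeleton(g1) == skeleton(g2)
            and unshielded_colliders(g1) == unshielded_colliders(g2))

chain = [("X", "Y"), ("Y", "Z")]     # X -> Y -> Z
fork = [("Y", "X"), ("Y", "Z")]      # X <- Y -> Z
collider = [("X", "Y"), ("Z", "Y")]  # X -> Y <- Z

print(markov_equivalent(chain, fork))      # same class
print(markov_equivalent(chain, collider))  # different classes
```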

Shared structure as patterns

- Since every Markov equivalent graph has the same adjacencies, we can represent the whole class using a pattern
- A pattern is itself a graph, but the edges represent edges in other graphs

Shared structure as patterns

- A pattern can have directed and undirected edges
- It represents all graphs that can be created by adding arrowheads to the undirected edges without creating either: (i) a cycle; or (ii) an unshielded collider
- Let’s try some examples…

Shared structure as patterns

Nitrogen — PlantGrowth — Bees

Nitrogen → PlantGrowth → Bees

Nitrogen ← PlantGrowth → Bees

Nitrogen ← PlantGrowth ← Bees
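The rule for reading a pattern can be sketched as a brute-force enumeration: try every orientation of the undirected edges and keep those that create neither a cycle nor a new unshielded collider. The function names and the edge-list representation below are assumptions for illustration, and the approach is only practical for small patterns.

```python
from itertools import product

def has_cycle(edges):
    """Detect a directed cycle by repeatedly removing source nodes."""
    nodes = {n for edge in edges for n in edge}
    remaining = set(edges)
    while nodes:
        sources = {n for n in nodes if not any(c == n for _, c in remaining)}
        if not sources:
            return True
        nodes -= sources
        remaining = {(p, c) for p, c in remaining if p not in sources}
    return False

def unshielded_colliders(directed, adjacent):
    """All a -> mid <- b with a, b not in the adjacency set."""
    return {(frozenset((a, b)), mid)
            for a, mid in directed
            for b, other_mid in directed
            if other_mid == mid and a != b and frozenset((a, b)) not in adjacent}

def graphs_in_pattern(directed, undirected):
    """Enumerate the DAGs represented by a pattern."""
    adjacent = {frozenset(e) for e in list(directed) + list(undirected)}
    allowed = unshielded_colliders(directed, adjacent)
    graphs = []
    for flips in product((False, True), repeat=len(undirected)):
        oriented = [(b, a) if flip else (a, b)
                    for (a, b), flip in zip(undirected, flips)]
        edges = set(directed) | set(oriented)
        if has_cycle(edges):
            continue
        if unshielded_colliders(edges, adjacent) != allowed:
            continue  # this orientation would add an unshielded collider
        graphs.append(edges)
    return graphs

pattern_undirected = [("Nitrogen", "PlantGrowth"), ("PlantGrowth", "Bees")]
graphs = graphs_in_pattern([], pattern_undirected)
for g in graphs:
    print(sorted(g))
```

For this pattern the enumeration keeps the two chains and the fork, and rejects Nitrogen → PlantGrowth ← Bees.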

Formal problem of search

- Given some dataset D, find:
- Markov equivalence class, represented as a pattern P, that predicts exactly the independence relations found in the data
- More colloquially, find the causal graphs that could have produced data like this

Hard to find a pattern

- “Gee, how hard could this be? Just test all of the associations, find the Markov equivalence class, then write down the pattern for it. Voila! We’re doing causal learning!”
- Big problem: # of independencies to test grows exponentially in # of variables (each of the C(n, 2) pairs can be tested given any subset of the other n − 2 variables):
- 2 variables ⇒ 1 test
- 3 variables ⇒ 6 tests
- 4 variables ⇒ 24 tests
- 5 variables ⇒ 80 tests
- 6 variables ⇒ 240 tests
- and so on…
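The counts on this slide match the formula C(n, 2) · 2^(n − 2): one test for each pair of variables and each subset of the remaining n − 2 variables. That reading is inferred from the numbers rather than stated in the slides, and the function name is hypothetical.

```python
from math import comb

def num_independence_tests(n):
    """Pairs of variables times possible conditioning sets."""
    return comb(n, 2) * 2 ** (n - 2)

counts = [num_independence_tests(n) for n in range(2, 7)]
print(counts)  # [1, 6, 24, 80, 240]
```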

General features of causal search

- Huge model and parameter spaces
- Even when we (necessarily) use prior information about the family of probability distributions.
- Relevant statistics must be rapidly computed
- But substantive knowledge about the domain may restrict the space of alternative models
- Time order of variables
- Required cause/effect relationships
- Existence or non-existence of latent variables

Three schemata for search

- Bayesian / score-based
- Find the graph(s) with highest P(graph | data)
- Constraint-based
- Find the graph(s) that predict exactly the observed associations and independencies
- Combined
- Get “close” with constraint-based, and then find the best graph using score-based

Bayesian / score-based

- Informally:
- Give each model an initial score using “prior beliefs”
- Update each score based on the likelihood of the data if the model were true
- Output the highest-scoring model
- Formally:
- Specify P(M, v) for all models M and possible parameter values v of M
- For any data D, P(D | M, v) can easily be calculated
- P(M | D) ∝ ∫ P(D | M, v) P(M, v) dv, integrating over the parameter values v
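A toy numerical version of this formula, assuming a discrete grid of parameter values so that the integral becomes a sum. The two coin models, the prior weights, and the function names are invented for illustration and are not causal graphs; the point is only the mechanics of scoring models by marginalizing over parameters.

```python
def posterior_over_models(prior, likelihood, data):
    """Normalized P(M | D) from a joint prior P(M, v) on a parameter grid.

    prior: {model: {v: P(M, v)}}
    likelihood(data, model, v) -> P(D | M, v)
    """
    scores = {
        model: sum(likelihood(data, model, v) * p for v, p in params.items())
        for model, params in prior.items()
    }
    total = sum(scores.values())
    return {model: score / total for model, score in scores.items()}

def coin_likelihood(data, model, v):
    """P(D | M, v) for independent flips with heads probability v."""
    heads = sum(data)
    tails = len(data) - heads
    return v ** heads * (1 - v) ** tails

# Hypothetical prior: half the mass on a fair coin, half split between
# two biased alternatives.
prior = {
    "fair": {0.5: 0.5},
    "biased": {0.1: 0.25, 0.9: 0.25},
}
data = [1] * 8 + [0] * 2  # 8 heads, 2 tails

posterior = posterior_over_models(prior, coin_likelihood, data)
print(posterior)
```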

Bayesian / score-based

- In practice, this strategy is completely computationally intractable
- There are too many graphs to check them all
- So, we use a greedy search strategy
- Start with an initial graph
- Iteratively compare the current graph’s score (∝ posterior probability) with that of each 1- or 2-step modification of that graph
- By edge addition, deletion or reversal

Bayesian / score-based

- Problem #1: Local maxima
- Often, greedy searches get stuck
- Solution:
- Greedy search over Markov equivalence classes, rather than graphs (Meek)
- Has a proof of correctness and convergence (Chickering)
- But it gets to the right answer slowly

Bayesian / score-based

- Problem #2: Unobserved variables
- Huge number of graphs
- Huge number of different parameterizations
- No fast, general way to compute likelihoods from latent variable models
- Partial solution:
- Focus on a small, “plausible” set of models for which we can compute scores

Constraint-based

- Implementation of the earlier idea
- “Build” the Markov equivalence class that predicts the pattern of association actually found in the data
- Compatible with a variety of statistical techniques
- Note that we might have to introduce a latent variable to explain the pattern of statistics
- Important constraints on search:
- Minimize the number of statistical tests
- Minimize the size of the conditioning sets (Why?)

Constraint-based

- Algorithm step #1: Discover the adjacencies
- Create the complete graph with undirected edges
- Test all pairs X, Y for unconditional independence
- Remove X—Y edge if they are independent
- Test all adjacent X, Y for independence given a single neighbor N
- Remove X—Y edge if they are independent
- Test adjacent pairs given two neighbors
- …

Constraint-based

- Algorithm step #2: (Try to) Orient edges
- “Unshielded triple”: X — C — Y, but X, Y not adjacent
- If X & Y independent given S containing C, then C must be a non-collider
- Since we have to condition on it to achieve d-separation
- If X & Y independent given S not containing C, then C must be a collider
- Since the path is not active when not conditioning on C
- And then do further orientations to ensure acyclicity and nodes being non-colliders

Constraint-based example

- Variables are {X, Y, Z, W}
- Only independencies are:
- X ⫫ Y
- X ⫫ W | Z
- Y ⫫ W | Z

[Figure: graph over X, Y, Z, W; the edges are not recoverable from the transcript]

Constraint-based example

- Step 2: For each pair of variables, remove the edge between them if they’re unconditionally independent
- X ⫫ Y ⇒ remove X — Y

Constraint-based example

- Step 3: For each adjacent pair, remove the edge if they’re independent conditional on some variable adjacent to one of them
- {X, Y} ⫫ W | Z ⇒ remove X — W and Y — W

Constraint-based example

- Step 4: Continue removing edges, checking independence conditional on 2 (or 3, or 4, or…) variables

Constraint-based example

- Step 5: Orientation
- For X — Z — Y, since X ⫫ Y without conditioning on Z, make Z a collider: X → Z ← Y
- Since Z is a non-collider between X and W, though, we must orient Z — W away from Z: Z → W
Constraint-based output

- Searches that allow for latent variables can also have edges of the form X o→ Y
- This indicates one of three possibilities:
- X → Y
- At least one unobserved common cause of X and Y
- Both of these

Interventions to the rescue?

- Interventions helped us solve an earlier equivalence class problem
- Randomization meant that: Treatment-Effect association ⇒ T → E
- Interventions alter equivalence classes, but don’t make them all into singletons
- The fundamental problem of search remains

Search with interventions

- Search with interventions is the same as search with observations, except
- We adjust the graphs in the search space to account for the intervention
- For multiple experiments, we search for graphs in every output equivalence class
- More complicated than this in the real world due to sampling variation

Looking ahead…

- Have:
- Basic formal representation for causation
- Fundamental causal asymmetry (of intervention)
- Inference & reasoning methods
- Search & causal discovery principles
- Need:
- Search & causal discovery methods that work in the real world
