
Structure Learning Using Causation Rules

Structure Learning Using Causation Rules. Raanan Yehezkel, PAML Lab Journal Club, March 13, 2003.




Presentation Transcript


  1. Structure Learning Using Causation Rules Raanan Yehezkel, PAML Lab Journal Club, March 13, 2003

  2. Main References • Pearl, J., Verma, T., A Theory of Inferred Causation, Proceedings of the Second International Conference on Principles of Knowledge Representation and Reasoning, San Francisco, 1991. • Spirtes, P., Glymour, C., Scheines, R., Causation, Prediction, and Search, second edition, MIT Press, 2000.

  3. Simpson’s “Paradox” The sure-thing principle (Savage, 1954): Let a, b be two alternative acts of any sort, and let G be any event. If you would definitely prefer b to a, either knowing that the event G obtained, or knowing that the event G did not obtain, then you should definitely prefer b to a even when you do not know whether G obtained. Taken from Judea Pearl's web site.

  4. Simpson’s “Paradox”
  Local success rate:
           G = male patients     G' = female patients
    Old    5% (50/1000)          50% (5000/10000)
    New    10% (1000/10000)      95% (95/100)
  Global success rate (all patients):
    Old    46% (5050/11000)
    New    11% (1095/10100)
  The new treatment is preferred for the male group (G). The new treatment is preferred for the female group (G'). => The new treatment is preferred. Taken from Judea Pearl's web site.

  5. Simpson’s “Paradox” • The intuitive way of thinking (DAG: G → S ← T, with G and T marginally independent): P(S,G,T) = P(G) · P(T) · P(S | G,T). P(S=1 | T=new) = 0.51, P(S=1 | T=old) = 0.27.

  6. Simpson’s “Paradox” • The faithful DAG (G → T, G → S, T → S): P(S,G,T) = P(G) · P(T | G) · P(S | G,T). P(S=1 | T=new) = 0.11, P(S=1 | T=old) = 0.46.
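The reversal on slides 4-6 is plain arithmetic and can be checked directly. The sketch below is my own illustration; the counts are those quoted in the slide 4 table:

```python
# A minimal sanity check of the table on slide 4 (counts quoted from the
# slides; the code itself is an illustration, not part of the talk).
counts = {  # (successes, trials) indexed by (gender, treatment)
    ("male", "old"): (50, 1000),
    ("female", "old"): (5000, 10000),
    ("male", "new"): (1000, 10000),
    ("female", "new"): (95, 100),
}

def rate(gender, treatment):
    s, n = counts[(gender, treatment)]
    return s / n

def pooled(treatment):
    # Collapse the gender split, as in the "global success rate" rows.
    s = sum(counts[(g, treatment)][0] for g in ("male", "female"))
    n = sum(counts[(g, treatment)][1] for g in ("male", "female"))
    return s / n

# Within each gender the new treatment looks better ...
assert rate("male", "new") > rate("male", "old")      # 0.10 > 0.05
assert rate("female", "new") > rate("female", "old")  # 0.95 > 0.50

# ... yet pooling the two groups reverses the preference.
print(round(pooled("old"), 2), round(pooled("new"), 2))  # 0.46 0.11
```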

  7. Assumptions: • Directed acyclic graphs (Bayesian networks). • All variables are observable. • No errors in conditional independence (CI) test results.

  8. Identifying cause and effect relations • Statistical data. • Statistical data and temporal information.

  9. Identifying cause and effect relations • Potential Cause • Genuine Cause • Spurious Association

  10. Intransitive Triplet • I(C1, C2) • ~I(C1, E) • ~I(C2, E) [Diagrams: C1 → E ← C2, plus two variants in which hidden variables H1, H2 mediate the links to E]

  11. Potential Cause X has a potential causal influence on Y if there exist a variable Z and a context S such that: • X and Y are dependent in every context. • ~I(Z, Y | S_context) • I(X, Z | S_context)

  12. Genuine Cause X has a genuine causal influence on Y if there is a variable Z such that: • Z is a potential cause of X. • ~I(Z, Y | S_context)  (given context S) • I(Z, Y | X, S_context)  (given X and context S)

  13. Spurious Association X and Y are spuriously associated if there exist variables Z1, Z2 and a context S such that: 1. ~I(X, Y | S_context) 2. ~I(Z1, X | S_context) 3. ~I(Z2, Y | S_context) 4. I(Z1, Y | S_context) 5. I(Z2, X | S_context) (Conditions 1, 2, 4 rule out X as a cause of Y; conditions 1, 3, 5 rule out Y as a cause of X.)

  14. Genuine Cause with temporal information X has a genuine causal influence on Y if there is a variable Z such that: • Z and the context S precede X. • ~I(Z, Y | S_context) • I(Z, Y | X, S_context)

  15. Spurious Association with temporal information X and Y are spuriously associated if: 1. ~I(X, Y | S_context) 2. X precedes Y. 3. I(Z, Y | S_context) 4. ~I(Z, X | S_context) [Diagrams: one structure follows from conditions 1, 2; the other from conditions 1, 3, 4]

  16. Algorithms • Inductive Causation (IC). • PC. • Other.

  17. Inductive Causation (IC) • For each pair (X, Y), search for a set of nodes S_XY such that I(X, Y | S_XY). If no such set exists, place an undirected link between X and Y. • For each pair of non-adjacent nodes (X, Y) with a common neighbor C: if C is not in S_XY, add arrowheads at C: X → C ← Y. Pearl and Verma, 1991

  18. Inductive Causation (IC) • Recursively: 1. If X - Y and there is a strictly directed path from X to Y, then add an arrowhead at Y. 2. If X and Y aren't adjacent but X → C and there is Y - C, then direct the link C → Y. • Mark uni-directed links X → Y if there is some link with an arrowhead at X. Pearl and Verma, 1991
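The two recursive orientation rules can be sketched as a small fixed-point loop. This is an illustrative implementation of slide 18's rules, not the authors' code; the graph representation (tuples for directed edges, frozensets for undirected links) and the 4-node example are my own choices:

```python
# Illustrative sketch of IC's two recursive orientation rules (slide 18).
# Directed edges are (tail, head) tuples; undirected links are frozensets.
def orient(directed, undirected):
    def has_directed_path(a, b):
        # Depth-first search along directed edges only.
        stack, seen = [a], set()
        while stack:
            node = stack.pop()
            if node == b:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(head for tail, head in directed if tail == node)
        return False

    def adjacent(a, b):
        return ((a, b) in directed or (b, a) in directed
                or frozenset({a, b}) in undirected)

    changed = True
    while changed:
        changed = False
        for link in list(undirected):
            a, b = sorted(link)
            for x, y in ((a, b), (b, a)):
                # Rule 1: X - Y and a strictly directed path X -> ... -> Y.
                rule1 = has_directed_path(x, y)
                # Rule 2: some Z -> X with Z and Y non-adjacent (X plays
                # the role of the common neighbor C on the slide).
                rule2 = any(z != y and not adjacent(z, y)
                            for z, head in directed if head == x)
                if rule1 or rule2:
                    undirected.discard(link)
                    directed.add((x, y))
                    changed = True
                    break

# Hypothetical 4-node input: collider A -> C <- B plus undirected links
# C - D and A - D.  Rule 2 orients C -> D, then rule 1 orients A -> D.
directed = {("A", "C"), ("B", "C")}
undirected = {frozenset({"C", "D"}), frozenset({"A", "D"})}
orient(directed, undirected)
print(sorted(directed))  # [('A', 'C'), ('A', 'D'), ('B', 'C'), ('C', 'D')]
```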

  19. X1 X2 X3 X4 X5 Example (IC) True graph

  20. Example (IC) For each pair (X, Y), search for a set of nodes S_XY such that I(X, Y | S_XY). If no such set exists, place an undirected link between X and Y. X1 X2 X3 X4 X5

  21. Example (IC) For each pair of non-adjacent nodes (X, Y) with a common neighbor C, if C is not in S_XY then add arrowheads at C: X → C ← Y. X1 X2 X3 X4 X5

  22. Example (IC) Recursively: 1. If X - Y and there is a strictly directed path from X to Y, then add an arrowhead at Y. 2. If X and Y aren't adjacent but X → C and there is Y - C, then direct the link C → Y. X1 X2 X3 X4 X5

  23. Example (IC) Mark uni-directed links X → Y if there is some link with an arrowhead at X. X1 X2 X3 X4 X5

  24. PC • Form a complete undirected graph C on vertex set V. Spirtes and Glymour, 1993

  25. PC
  • n = 0
  • Repeat
    • Repeat
      • Select an ordered pair (X, Y) such that |Adj(C, X) \ {Y}| ≥ n, and a subset S such that S ⊆ Adj(C, X) \ {Y}, |S| = n.
      • If I(X, Y | S) = true, then delete edge (X, Y) and record S_XY = S.
    • Until all possible sets have been tested; n = n + 1.
  • Until ∀ X, Y: |Adj(C, X) \ {Y}| < n.
  Spirtes and Glymour, 1993
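With an idealized CI oracle in place of statistical tests, the nested loop above can be sketched as follows. This is my own illustration, not the reference implementation; the oracle hard-codes the independencies reported for the five-variable example on slides 30-32:

```python
# Sketch of PC's edge-removal phase with a perfect CI oracle.
from itertools import combinations

# Conditional independencies listed on slides 30-32 (symmetric in X, Y).
INDEP = {
    (frozenset({"X1", "X3"}), frozenset({"X2"})),
    (frozenset({"X1", "X4"}), frozenset({"X2"})),
    (frozenset({"X1", "X5"}), frozenset({"X2"})),
    (frozenset({"X3", "X4"}), frozenset({"X2"})),
    (frozenset({"X2", "X5"}), frozenset({"X3", "X4"})),
}

def ci(x, y, s):  # oracle answering I(X, Y | S)
    return (frozenset({x, y}), frozenset(s)) in INDEP

V = ["X1", "X2", "X3", "X4", "X5"]
adj = {v: set(V) - {v} for v in V}  # start from the complete graph
sepset = {}

n = 0
while any(len(adj[x] - {y}) >= n for x in V for y in adj[x]):
    for x in V:
        for y in sorted(adj[x]):
            if y not in adj[x] or len(adj[x] - {y}) < n:
                continue
            # Try every conditioning set S of size n drawn from Adj(X)\{Y}.
            for s in combinations(sorted(adj[x] - {y}), n):
                if ci(x, y, s):
                    adj[x].discard(y)
                    adj[y].discard(x)
                    sepset[frozenset({x, y})] = set(s)
                    break
    n += 1

# The surviving edges form the skeleton of the true graph on slide 28.
edges = {frozenset({x, y}) for x in V for y in adj[x]}
print(sorted(sorted(e) for e in edges))
```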

  26. PC • For each triple of vertices X, Y, Z such that edge(X, Z) and edge(Y, Z): orient X → Z ← Y if and only if Z ∉ S_XY. Spirtes and Glymour, 1993
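The collider-orientation step can be sketched against the skeleton and d-separation sets recovered in the five-variable example (slides 28-33). The data structures below are my own illustration of that example:

```python
# Sketch of the collider-orientation rule, run against the skeleton and
# separating sets of the slides' 5-variable PC example.
from itertools import combinations

adj = {"X1": {"X2"}, "X2": {"X1", "X3", "X4"}, "X3": {"X2", "X5"},
       "X4": {"X2", "X5"}, "X5": {"X3", "X4"}}
sepset = {frozenset({"X1", "X3"}): {"X2"},
          frozenset({"X1", "X4"}): {"X2"},
          frozenset({"X1", "X5"}): {"X2"},
          frozenset({"X3", "X4"}): {"X2"},
          frozenset({"X2", "X5"}): {"X3", "X4"}}

colliders = set()
for z in adj:
    for x, y in combinations(sorted(adj[z]), 2):
        # X and Y non-adjacent, with common neighbor Z outside S_XY:
        if y not in adj[x] and z not in sepset[frozenset({x, y})]:
            colliders.add((x, z, y))  # read as X -> Z <- Y

print(colliders)  # {('X3', 'X5', 'X4')}: only X3 -> X5 <- X4 is oriented
```

X2 is never oriented as a collider because it appears in every separating set of its non-adjacent neighbor pairs.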

  27. Use Inductive Causation (IC) Recursively: 1. If X - Y and there is a strictly directed path from X to Y, then add an arrowhead at Y. 2. If X and Y aren't adjacent but X → C and there is Y - C, then direct the link C → Y. Mark uni-directed links X → Y if there is some link with an arrowhead at X. Pearl and Verma, 1991

  28. X1 X3 X4 X2 X5 Example (PC) True graph Spirtes, Glymour and Scheines. 2000.

  29. X1 X3 X4 X2 X5 Example (PC) Form a complete undirected graph C on vertex set V.

  30. X1 X3 X4 X2 X5 Example (PC) n = 0; |S_XY| = n Independencies: None

  31. X1 X3 X4 X2 X5 Example (PC) n = 1; |S_XY| = n Independencies: I(X1,X3|X2) I(X1,X4|X2) I(X1,X5|X2) I(X3,X4|X2)

  32. X1 X3 X4 X2 X5 Example (PC) n = 2; |S_XY| = n Independencies: I(X2,X5|X3,X4)

  33. Example (PC) For each triple of vertices X, Y, Z such that edge(X, Z) and edge(Y, Z), orient X → Z ← Y if and only if Z ∉ S_XY. D-separation sets: S_1,3 = {X2}, S_3,4 = {X2}. X3 X1 X2 X5 X4

  34. Possible PC improvements (2) • PC* - tests conditional independence between X and Y given a subset S, where S ⊆ [Adj(X) ∪ Adj(Y)] ∩ path(X, Y), i.e. nodes adjacent to X or Y that lie on a path between X and Y. • CI test prioritization: for a given variable X, first test those variables Y that are least dependent on X, conditional on those subsets of variables that are most dependent on X.

  35. Markov Equivalence [Diagrams: collider X → Z ← Y; fork X ← Z → Y; chain X ← Z ← Y] Collider: P = P(X) · P(Y) · P(Z | X, Y). Fork / chain: P = P(Z) · P(X | Z) · P(Y | Z) = P(Y) · P(X | Z) · P(Z | Y). • (Verma and Pearl, 1990). Two causal models are equivalent if and only if their DAGs have the same links and the same set of uncoupled head-to-head nodes (colliders).
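The equality of the fork and chain factorizations is just the chain rule, P(Z) · P(Y | Z) = P(Y, Z) = P(Y) · P(Z | Y). A small numeric check, using illustrative probabilities of my own choosing:

```python
# Check that the fork factorization P(Z)P(X|Z)P(Y|Z) and the chain
# factorization P(Y)P(Z|Y)P(X|Z) define the same joint distribution.
from itertools import product

p_z = {0: 0.3, 1: 0.7}
p_x_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}  # [z][x]
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}  # [z][y]

# Joint built from the fork X <- Z -> Y.
joint = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
         for x, y, z in product((0, 1), repeat=3)}

# Derive P(Y) and P(Z|Y) from that joint, then re-factorize as the chain.
p_y = {y: sum(joint[x, y, z] for x in (0, 1) for z in (0, 1))
       for y in (0, 1)}
p_z_given_y = {(z, y): sum(joint[x, y, z] for x in (0, 1)) / p_y[y]
               for z in (0, 1) for y in (0, 1)}

for x, y, z in product((0, 1), repeat=3):
    chain = p_y[y] * p_z_given_y[z, y] * p_x_given_z[z][x]
    assert abs(joint[x, y, z] - chain) < 1e-12
print("fork and chain factorizations agree")
```

The collider factorization P(X)P(Y)P(Z|X,Y) is not, in general, reproducible this way, which is why colliders separate equivalence classes.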

  36. Summary • Algorithms such as PC and IC produce a partially directed graph, which represents a family of Markov-equivalent graphs. • The remaining undirected arcs can be oriented arbitrarily (under DAG restrictions) in order to construct a classifier. • The main flaw of the IC and PC algorithms is that they might be unstable in a noisy environment: an error in one CI test for an arc might lead to errors in other arcs, and one erroneous orientation might lead to further erroneous orientations.
