Learning Equivalence Classes of Bayesian-Network Structures

Presentation Transcript

  1. Learning Equivalence Classes of Bayesian-Network Structures • David M. Chickering • Presented by Dmitry Zinenko

  2. Heuristic Search • We are looking for the best state in the search space. Naively: • state = a particular DAG • search space = all possible DAGs over our variables • Move between related states using search operators. Naively: • Edge addition/removal/reversal
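The naive DAG-level operators above can be sketched as a neighbor generator. This is a minimal illustration, not code from the presentation: the names `is_acyclic` and `dag_neighbours` and the edge representation (a set of `(parent, child)` tuples) are assumptions made for this example.

```python
from itertools import permutations

def is_acyclic(nodes, edges):
    """Kahn-style check: repeatedly peel off nodes with no incoming edge."""
    nodes, edges = set(nodes), set(edges)
    while nodes:
        sources = {n for n in nodes if all(v != n for _, v in edges)}
        if not sources:
            return False  # every remaining node has a parent, so a cycle exists
        nodes -= sources
        edges = {(u, v) for u, v in edges if u not in sources}
    return True

def dag_neighbours(nodes, edges):
    """All DAGs reachable by one edge addition, removal, or reversal."""
    edges = frozenset(edges)
    result = []
    for u, v in permutations(sorted(nodes), 2):
        if (u, v) in edges:
            result.append(("remove", edges - {(u, v)}))
            flipped = (edges - {(u, v)}) | {(v, u)}
            if is_acyclic(nodes, flipped):
                result.append(("reverse", flipped))
        elif (v, u) not in edges:
            added = edges | {(u, v)}
            if is_acyclic(nodes, added):
                result.append(("add", added))
    return result

# Hypothetical example: three variables, one edge X -> Y.
for name, dag in dag_neighbours({"X", "Y", "Z"}, {("X", "Y")}):
    print(name, sorted(dag))
```

A greedy search would score each neighbor and move to the best one; several of these neighbors can fall in the same equivalence class, which is the redundancy the later slides address.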

  3. Heuristic Search Challenges • Search space graph should be well-connected • To reach good states quickly • To avoid local maxima • Search space graph should not be too dense • Computationally efficient scoring and transformations

  4. Equivalence • G1 and G2 are equivalent if the set of distributions that can be represented by them is identical • Equivalence is an equivalence relation! [Figure: example structures over X and Y]

  5. Score Equivalence • If all we care about is the probability distribution, all we need is the equivalence class • The scoring function should give equal scores to structures from the same class • Such a scoring function is called score equivalent • Why prefer one representation of the class to another?

  6. Equivalence Classes Are Good For You • We are ultimately looking for a probability representation, not a particular DAG • Searching individual DAGs is bad: • Some operators lead to the same class • Efficiency • Bad state connectivity for greedy search

  7. Theorem 1 (Verma & Pearl 1990) • Two DAGs are equivalent if and only if they have the same skeleton and the same v-structures [Figure: example DAGs over X, Y, Z]
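Theorem 1 turns the equivalence test into a purely graphical check. The sketch below assumes both DAGs are given over the same node set as sets of `(parent, child)` tuples; the function names are chosen for this example only.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a set of directed edges (u, v), meaning u -> v."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """All triples (a, c, b) with a -> c <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    found = set()
    for child, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def equivalent(edges1, edges2):
    """Theorem 1: same skeleton and same v-structures."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))

chain = {("X", "Y"), ("Y", "Z")}            # X -> Y -> Z
reversed_chain = {("Z", "Y"), ("Y", "X")}   # X <- Y <- Z
collider = {("X", "Y"), ("Z", "Y")}         # X -> Y <- Z

print(equivalent(chain, reversed_chain))  # True
print(equivalent(chain, collider))        # False
```

The two chains share a skeleton and have no v-structures, so they are equivalent; the collider has the same skeleton but adds the v-structure at Y.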

  8. Partially Directed Acyclic Graph • A directed edge is called compelled in G if, for every G’ equivalent to G, that edge has the same direction • Otherwise we call it reversible • Partially Directed Acyclic Graph (PDAG) • Contains both directed and undirected edges • Does not contain any directed cycles • Theorem 1 extends naturally to PDAGs • A DAG is also a PDAG

  9. CPDAG and Consistent Extension • The completed PDAG (CPDAG) for Class(G) contains • directed edges for the compelled edges of G • undirected edges for the reversible edges of G • G is a consistent extension of P if • G has the same skeleton and v-structures as P • Every directed edge in P has the same orientation in G [Figure: example graphs over X, Y, Z, W]
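The definitions of compelled and reversible edges can be checked directly on small graphs by enumerating the whole equivalence class. The sketch below is a brute-force illustration that is exponential in the number of edges; it is not the DAG-to-CPDAG procedure from the presentation, and all function names are assumptions of this example.

```python
from itertools import combinations, product

def v_structures(edges):
    """Triples (a, c, b) with a -> c <- b and a, b non-adjacent."""
    skel = {frozenset(e) for e in edges}
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    return {(a, c, b)
            for c, pa in parents.items()
            for a, b in combinations(sorted(pa), 2)
            if frozenset((a, b)) not in skel}

def is_acyclic(edges):
    """Kahn-style check on the nodes that appear in `edges`."""
    remaining = set(edges)
    nodes = {n for e in edges for n in e}
    while nodes:
        sources = {n for n in nodes if all(v != n for _, v in remaining)}
        if not sources:
            return False
        nodes -= sources
        remaining = {(u, v) for u, v in remaining if u not in sources}
    return True

def equivalence_class(dag):
    """Every acyclic orientation of the skeleton with the same v-structures."""
    skel = sorted(tuple(sorted(e)) for e in {frozenset(e) for e in dag})
    target = v_structures(dag)
    members = []
    for flips in product((False, True), repeat=len(skel)):
        cand = {(b, a) if f else (a, b) for (a, b), f in zip(skel, flips)}
        if is_acyclic(cand) and v_structures(cand) == target:
            members.append(cand)
    return members

def cpdag(dag):
    """Return (compelled directed edges, reversible edges as frozensets)."""
    members = equivalence_class(dag)
    compelled = set.intersection(*members)  # same direction in every member
    skel = {frozenset(e) for e in dag}
    reversible = skel - {frozenset(e) for e in compelled}
    return compelled, reversible

print(cpdag({("X", "Z"), ("Y", "Z"), ("Z", "W")}))  # all three edges compelled
print(cpdag({("X", "Y"), ("Y", "Z")}))              # both edges reversible
```

In the first example Z -> W is compelled even though it is not part of the v-structure: reversing it would create a new v-structure at Z.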

  10. CPDAGs And Equivalence • Every consistent extension of P is in Class(P) • If Pc is a completed PDAG, then every DAG G in Class(Pc) is a consistent extension of Pc • If P1 and P2 are completed PDAGs that admit a consistent extension, then P1=P2 if and only if Class(P1)=Class(P2) • A completed PDAG uniquely represents its equivalence class

  11. DAG to CPDAG (Meek 1995) • Undirect all edges except those that take part in v-structures • Direct (mark as compelled) undirected edges that match particular patterns [Figure: orientation patterns over X, Y, Z, W]

  12. Constructing Consistent Extension (I) • “Theorem 26”: The undirected components of a CPDAG are chordal • In any chordless cycle of length >3 in the skeleton of a DAG, there must be a v-structure! • Let {Ki} be the set of undirected components of a completed PDAG Pc. Let {Gi} be consistent extensions of {Ki} • A graph G that results from replacing each reversible edge in Ki with the directed edge from the corresponding Gi is a consistent extension of Pc

  13. Constructing Consistent Extension (II) • Use decreasing maximum cardinality search to direct edges in each one of the chordal components • Property of dMCS: Every path between any pair of non-adjacent x, y contains a node numbered higher than x or y • Resulting graph is a consistent extension of Pc • Works only on completed PDAGs
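A maximum cardinality search over one chordal component can be sketched as follows. This is a minimal illustration under stated assumptions: the graph is an adjacency dict of undirected neighbors, ties are broken alphabetically, and edges are oriented from the earlier-visited node to the later-visited one. The "decreasing" numbering used in the presentation may follow a different convention, so treat the direction chosen here as an assumption.

```python
def mcs_orient(adj):
    """Orient a chordal undirected graph using maximum cardinality search.

    adj maps each node to the set of its undirected neighbours.
    Returns a set of directed edges (u, v) meaning u -> v.
    """
    visited = []
    weight = {n: 0 for n in adj}  # number of already-visited neighbours
    while weight:
        # Pick the unvisited node with the most visited neighbours.
        node = max(sorted(weight), key=weight.get)
        del weight[node]
        visited.append(node)
        for nb in adj[node]:
            if nb in weight:
                weight[nb] += 1
    rank = {n: i for i, n in enumerate(visited)}
    return {(u, v) for u in adj for v in adj[u] if rank[u] < rank[v]}

# Hypothetical chordal component: the square A-B-C-D with chord A-C.
component = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"A", "C"},
}
print(sorted(mcs_orient(component)))
```

On a chordal graph, the already-visited neighbors of each newly visited node form a clique, so every pair of parents is adjacent and the orientation introduces no v-structure. Following a single visit order also rules out directed cycles.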

  14. PDAG-to-DAG (Dor & Tarsi 1992) • Select a node x in P s.t. • x has no outgoing directed edges • Every undirected neighbor of x is adjacent to all other vertices adjacent to x • Direct all edges (x―y) toward x • x becomes a sink • Remove x from P and repeat • Works on any PDAG that admits a consistent extension
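The sink-removal loop above can be sketched in a few lines. The function name `pdag_to_dag` and the representation (directed edges as tuples, undirected edges as pairs) are assumptions of this example. The neighbor test is implemented as: each undirected neighbor of x must be adjacent to every other vertex adjacent to x.

```python
def pdag_to_dag(nodes, directed, undirected):
    """Return the edge set of a consistent extension, or None if there is none."""
    nodes = set(nodes)
    directed = set(directed)
    undirected = {frozenset(e) for e in undirected}
    result = set(directed)

    def adjacent(a, b):
        return (frozenset((a, b)) in undirected
                or (a, b) in directed or (b, a) in directed)

    while nodes:
        for x in sorted(nodes):
            if any(u == x for u, _ in directed):
                continue  # x has an outgoing directed edge, so it is not a sink
            nbrs = {n for e in undirected if x in e for n in e if n != x}
            around = nbrs | {u for u, v in directed if v == x}
            if all(adjacent(y, z) for y in nbrs for z in around if y != z):
                break  # x qualifies
        else:
            return None  # no node qualifies: no consistent extension exists
        result |= {(y, x) for y in nbrs}  # direct every x-y edge toward x
        directed = {(u, v) for u, v in directed if x not in (u, v)}
        undirected = {e for e in undirected if x not in e}
        nodes.remove(x)
    return result

# Undirected chain X - Y - Z.
print(pdag_to_dag({"X", "Y", "Z"}, set(), {("X", "Y"), ("Y", "Z")}))
# Chordless undirected 4-cycle: admits no consistent extension.
print(pdag_to_dag({"A", "B", "C", "D"}, set(),
                  {("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")}))  # None
```

The second call returns `None` because any acyclic orientation of a chordless 4-cycle must create a v-structure that the PDAG does not contain.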

  15. Applying the Operators

  16. Operators • The set of operators should: • Ensure global connectivity (completeness) and good connectivity in general • Be easy to check for applicability (validity) • Avoid redundancy • Allow for efficient scoring • Local scoring: local changes in G cause “local” changes in score(G)

  17. Score Decomposability • A scoring function S is decomposable if it is a product (or sum) of factors s, each depending only on one node and its parents • For example: [Figure: example structures over X, Y, Z]
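Written out in the sum form, decomposability looks as follows. The graph used here (X -> Y <- Z) is a hypothetical example chosen for illustration, not necessarily the one pictured on the slide, and the notation Pa for parent sets is an assumption of this sketch.

```latex
\begin{align*}
S(G) &= \sum_{i=1}^{n} s\bigl(x_i, \mathrm{Pa}^{G}_{i}\bigr) \\
% Hypothetical example: G is X -> Y <- Z
S(G) &= s(X, \emptyset) + s(Z, \emptyset) + s(Y, \{X, Z\}) \\
% G' is G with the edge X -> Y removed; only the term for Y changes
S(G') &= s(X, \emptyset) + s(Z, \emptyset) + s(Y, \{Z\}) \\
S(G') - S(G) &= s(Y, \{Z\}) - s(Y, \{X, Z\})
\end{align*}
```

Only the factor of the node whose parent set changed has to be recomputed, which is what makes local scoring of operators possible.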

  18. Used Operators

  19. Operator Scoring • Chickering 1996a • Apply the operator and score the consistent extension (DAG) • Drawbacks: • Need to apply PDAG-to-DAG for every operator • Local operators may cause non-local changes when applied to a CPDAG • Cannot benefit from local scoring

  20. Local Operator Scoring

  21. InsertU Operator – “Theorem 34” • Let Pc be any completed PDAG for which nodes x and y are not adjacent. • If, after adding an edge between x and y, Pc admits a consistent extension, then • The edge x―y is reversible if and only if x and y have exactly the same parents in the original PDAG

  22. InsertU Operator – “Theorem 6” • The insertion of the undirected edge x―y in a CPDAG Pc is valid if and only if: • x and y have the same parents in Pc • every undirected path between x and y contains at least one of their common neighbors • Only if (+Theorem 34): • Take the shortest undirected path from x to y in Pc that does not include any common neighbor of x and y • It has length at least 3 and no chord • After adding x―y it becomes a chordless cycle of length at least 4, so the undirected component is no longer chordal
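The two validity conditions of Theorem 6 translate into a parent comparison plus a reachability search. The sketch below assumes x and y are not adjacent and that the CPDAG is given as a set of directed `(parent, child)` tuples and a set of undirected pairs. The name `insert_u_valid` is hypothetical.

```python
from collections import deque

def insert_u_valid(x, y, directed, undirected):
    """Check whether inserting the undirected edge x - y is valid (Theorem 6)."""
    undirected = {frozenset(e) for e in undirected}

    def parents(n):
        return {u for u, v in directed if v == n}

    def neighbours(n):
        return {m for e in undirected if n in e for m in e if m != n}

    # Condition 1: x and y have the same parents.
    if parents(x) != parents(y):
        return False

    # Condition 2: every undirected path from x to y passes through a common
    # neighbour. Equivalently, y is unreachable from x over undirected edges
    # once the common neighbours are removed.
    blocked = neighbours(x) & neighbours(y)
    seen, queue = {x}, deque([x])
    while queue:
        n = queue.popleft()
        for m in neighbours(n) - blocked - seen:
            if m == y:
                return False
            seen.add(m)
            queue.append(m)
    return True

# X - W - Y: the only path goes through the common neighbour W.
print(insert_u_valid("X", "Y", set(), {("X", "W"), ("W", "Y")}))              # True
# X - A - B - Y: adding X - Y would close a chordless 4-cycle.
print(insert_u_valid("X", "Y", set(), {("X", "A"), ("A", "B"), ("B", "Y")}))  # False
```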

  23. InsertU Operator – “Lemma 32” • Let Pc be any completed PDAG, and let x and y be any pair of nodes that are not adjacent. • There exists a consistent extension of Pc in which • all the reversible edges adjacent to x are directed away from x • all the reversible edges between y and the common neighbors of x and y are directed toward y • all the other reversible edges adjacent to y are directed away from y • If and only if every undirected path between x and y passes through a common neighbor of x and y

  24. InsertU Operator – Theorem 6: “If” proof outline • Use the consistent extension from Lemma 32 as G • Add a directed edge x→y to G to get G’ (the other direction is symmetric) • Show that G’ is a consistent extension of P’ (P with the addition of the undirected edge x―y) • G’ is acyclic • Same skeleton • Same v-structures

  25. InsertU Operator – Theorem 6: G’ is a DAG • Assume by contradiction that there is a directed path from y to x in G • All the reversible edges are directed away from x, so the last edge in that path, w→x, is compelled • Then w is a parent of x in P, and it must also be a parent of y • So in G there is a cycle y→…→w→y, a contradiction [Figure: nodes W, X, Y]

  26. InsertU Operator – “Lemma 24” • Let Pc be a completed PDAG, and let P’ denote a PDAG that results from adding a single edge between x and y to Pc • Consider any consistent extension G of Pc, and G’ that results by inserting a directed edge between x and y in G • Then any v-structure in G’ but not in P’, or any v-structure in P’ but not in G’ must include the edge between x and y

  27. InsertU Operator – Theorem 6: G’ is a consistent extension of P’ • By Lemma 24, any v-structure different between G’ and P’ must include the edge x―y • The v-structure must be in G’, because in P’ this edge is undirected • The other edge in the v-structure cannot be reversible in G’ • x does not have reversible parents • y’s reversible parents are adjacent to x • But any compelled parent of x or y is a parent of both. Q.E.D.

  28. Local Operator Evaluation • Since the only difference between G and G’ is the edge x→y, we can use score decomposability to compute the score of P’ in O(1) time • $s(P') = s(P_c) + s(y, N_{x,y} \cup \{x\} \cup \Pi_y) - s(y, N_{x,y} \cup \Pi_y)$, where $N_{x,y}$ is the set of common neighbors of x and y and $\Pi_y$ is the set of parents of y • In general we do not need to transform the CPDAG to compute neighbor scores: • Calculate scores for all the neighbor states (locally!) • Check operator validity (efficiently!) starting from the highest score
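The local update can be illustrated with any decomposable score. The sketch below uses a maximized log-likelihood as the local term, a choice made only for this example (the presentation does not fix a particular score), together with a tiny made-up data set. The variable `w` stands in for the set of common neighbors and parents that y already has as parents in the consistent extension.

```python
from collections import Counter
from math import log, isclose

def local_score(data, node, parents):
    """Maximised log-likelihood of `node` given `parents` (illustrative score)."""
    parents = sorted(parents)
    joint = Counter((tuple(r[p] for p in parents), r[node]) for r in data)
    marginal = Counter(tuple(r[p] for p in parents) for r in data)
    return sum(n * log(n / marginal[cfg]) for (cfg, _), n in joint.items())

def total_score(data, parent_sets):
    """Decomposable score: a sum of one local term per node."""
    return sum(local_score(data, n, pa) for n, pa in parent_sets.items())

# Made-up binary observations, for illustration only.
data = [
    {"x": 0, "w": 0, "y": 0},
    {"x": 0, "w": 1, "y": 1},
    {"x": 1, "w": 0, "y": 1},
    {"x": 1, "w": 1, "y": 1},
    {"x": 0, "w": 0, "y": 0},
    {"x": 1, "w": 0, "y": 0},
]

before = {"x": set(), "w": set(), "y": {"w"}}       # G
after = {"x": set(), "w": set(), "y": {"w", "x"}}   # G' = G plus x -> y

# Only y's term changes, so two local evaluations give the new total.
delta = local_score(data, "y", {"w", "x"}) - local_score(data, "y", {"w"})
print(isclose(total_score(data, after), total_score(data, before) + delta))  # True
```

The cost of the update depends on the local terms for y alone, not on the size of the rest of the graph.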