Knowledge Representation & Reasoning Lecture #5


  1. Knowledge Representation & Reasoning, Lecture #5 UIUC CS 498: Section EA Professor: Eyal Amir Fall Semester 2005 (Based on slides by Lise Getoor and Alvaro Cardenas (UMD), in turn based on slides by Nir Friedman (Hebrew U))

  2. So Far and Today • Probabilistic graphical models • Bayes Networks (Directed GMs) • Markov Fields (Undirected GMs) • Treewidth methods: • Variable elimination • Clique tree algorithm • Applications du jour: Sensor Networks

  3. Markov Assumption • We now make this independence assumption more precise for directed acyclic graphs (DAGs) • Each random variable X is independent of its non-descendants, given its parents Pa(X) • Formally, I (X, NonDesc(X) | Pa(X)) [Figure: a DAG showing an ancestor, parents Y1 and Y2 of X, a descendant, and a non-descendant of X]

  4. Markov Assumption Example [Figure: the network Earthquake → Radio, Burglary → Alarm ← Earthquake, Alarm → Call] • In this example: • I ( E, B ) • I ( B, {E, R} ) • I ( R, {A, B, C} | E ) • I ( A, R | B, E ) • I ( C, {B, E, R} | A )

  5. I-Maps • A DAG G is an I-Map of a distribution P if all Markov assumptions implied by G are satisfied by P (assuming G and P range over the same set of random variables) • Examples: [Figure: two small example graphs over X and Y, each paired with a distribution]

  6. Factorization Theorem Thm: if G is an I-Map of P, then P(X1,…,Xp) = Πi P(Xi | Pa(Xi)) Proof: • By the chain rule: P(X1,…,Xp) = Πi P(Xi | X1,…,Xi-1) • wlog. X1,…,Xp is an ordering consistent with G • From this assumption, {X1,…,Xi-1} contains Pa(Xi) and no descendant of Xi • Since G is an I-Map, I (Xi, NonDesc(Xi) | Pa(Xi)) • We conclude, P(Xi | X1,…,Xi-1) = P(Xi | Pa(Xi)) • Hence, P(X1,…,Xp) = Πi P(Xi | Pa(Xi))

  7. Factorization Example [Figure: the network Earthquake → Radio, Burglary → Alarm ← Earthquake, Alarm → Call] P(C,A,R,E,B) = P(B)P(E|B)P(R|E,B)P(A|R,B,E)P(C|A,R,B,E) versus P(C,A,R,E,B) = P(B) P(E) P(R|E) P(A|B,E) P(C|A)
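
The second factorization can be evaluated directly as a product of local conditional probabilities. The sketch below does this for one assignment; the CPT numbers are made up for illustration, and only the structure P(B) P(E) P(R|E) P(A|B,E) P(C|A) comes from the slide.

    # Hypothetical CPTs for the Burglary/Earthquake network (numbers are illustrative only).
    P_B = {True: 0.01, False: 0.99}
    P_E = {True: 0.02, False: 0.98}
    P_R_given_E = {True: {True: 0.9, False: 0.1},
                   False: {True: 0.0001, False: 0.9999}}            # P(R | E)
    P_A_given_BE = {(True, True): 0.95, (True, False): 0.94,
                    (False, True): 0.29, (False, False): 0.001}     # P(A=True | B, E)
    P_C_given_A = {True: 0.9, False: 0.05}                          # P(C=True | A)

    def joint(c, a, r, e, b):
        """P(C,A,R,E,B) = P(B) P(E) P(R|E) P(A|B,E) P(C|A), as in the factorization above."""
        p_a = P_A_given_BE[(b, e)] if a else 1 - P_A_given_BE[(b, e)]
        p_c = P_C_given_A[a] if c else 1 - P_C_given_A[a]
        return P_B[b] * P_E[e] * P_R_given_E[e][r] * p_a * p_c

    print(joint(c=True, a=True, r=False, e=False, b=True))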

  8. Consequences • We can write P in terms of "local" conditional probabilities • If G is sparse, that is, |Pa(Xi)| < k, then each conditional probability can be specified compactly • e.g. for binary variables, these require O(2^k) params each • so the representation of P is compact: linear in the number of variables
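
As a quick sanity check of the parameter count, the snippet below compares an explicit joint table with a Bayes net in which every variable has at most k parents; the values of n and k are chosen arbitrarily for illustration.

    # Parameter count for n binary variables, each with at most k parents (illustrative numbers).
    n, k = 20, 3
    full_joint_params = 2**n - 1        # an explicit joint table over n binary variables
    bayes_net_params = n * 2**k         # at most 2^k CPT rows per variable, one free parameter each
    print(full_joint_params, bayes_net_params)   # 1048575 vs. 160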

  9. Summary We defined the following concepts • The Markov Independencies of a DAG G: I (Xi , NonDesc(Xi) | Pai ) • G is an I-Map of a distribution P if P satisfies the Markov independencies implied by G • We proved the factorization theorem: if G is an I-Map of P, then P(X1,…,Xn) = Πi P(Xi | Pa(Xi))

  10. Conditional Independencies • Let Markov(G) be the set of Markov Independencies implied by G • The factorization theorem shows: G is an I-Map of P ⇒ P(X1,…,Xn) = Πi P(Xi | Pa(Xi)) • We can also show the opposite: Thm: P(X1,…,Xn) = Πi P(Xi | Pa(Xi)) ⇒ G is an I-Map of P

  11. Proof (Outline) Example: [Figure: a small example graph over X, Y, and Z]

  12. Implied Independencies • Does a graph G imply additional independencies as a consequence of Markov(G)? • We can define a logic of independence statements • Some axioms: • Symmetry: I( X ; Y | Z ) ⇒ I( Y ; X | Z ) • Decomposition: I( X ; Y1, Y2 | Z ) ⇒ I( X ; Y1 | Z )

  13. d-separation • A procedure d-sep(X; Y | Z, G) that given a DAG G, and sets X, Y, and Z returns either yes or no • Goal: d-sep(X; Y | Z, G) = yes iff I(X;Y|Z) follows from Markov(G)

  14. Paths [Figure: the network Earthquake → Radio, Burglary → Alarm ← Earthquake, Alarm → Call] • Intuition: dependency must "flow" along paths in the graph • A path is a sequence of neighboring variables Examples: • R ← E → A ← B • C ← A ← E → R

  15. Paths • We want to know when a path is • active -- creates a dependency between the end nodes • blocked -- cannot create a dependency between the end nodes • We want to classify situations in which paths are active.

  16. Path Blockage Three cases: • Common cause [Figure: E is a common cause of R and A (R ← E → A); the path is blocked when E is given and active (unblocked) when E is not given]

  17. Path Blockage Three cases: • Common cause • Intermediate cause [Figure: A is an intermediate cause on the path E → A → C; the path is blocked when A is given and active when A is not given]

  18. Path Blockage Three cases: • Common cause • Intermediate cause • Common Effect [Figure: A is a common effect of E and B (E → A ← B) with descendant C; the path is blocked when neither A nor C is given, and active when A or C is given]

  19. Path Blockage -- General Case A path is active, given evidence Z, if • Whenever we have the configuration A → B ← C, B or one of its descendants is in Z • No other nodes in the path are in Z A path is blocked, given evidence Z, if it is not active.

  20. Example • d-sep(R,B)? [Figure: the network E → R, B → A ← E, A → C]

  21. Example • d-sep(R,B) = yes • d-sep(R,B|A)? [Figure: the network E → R, B → A ← E, A → C]

  22. Example • d-sep(R,B) = yes • d-sep(R,B|A) = no • d-sep(R,B|E,A)? [Figure: the network E → R, B → A ← E, A → C]

  23. d-Separation • X is d-separated from Y, given Z, if all paths from a node in X to a node in Y are blocked, given Z. • Checking d-separation can be done efficiently (linear time in number of edges) • Bottom-up phase: Mark all nodes whose descendants are in Z • X to Y phase: Traverse (BFS) all edges on paths from X to Y and check if they are blocked
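
The two-phase check described above can be sketched in Python as follows. This is a minimal sketch of the standard reachability ("Bayes ball") procedure, not code from the lecture; it assumes the DAG is given as a dict mapping every node (including leaves) to the list of its children.

    from collections import deque

    def d_separated(children, X, Y, Z):
        """Return True iff every node in X is d-separated from every node in Y given Z."""
        parents = {v: set() for v in children}
        for u in children:
            for c in children[u]:
                parents[c].add(u)

        # Bottom-up phase: mark Z and every node that has a descendant in Z.
        marked, stack = set(), list(Z)
        while stack:
            v = stack.pop()
            if v not in marked:
                marked.add(v)
                stack.extend(parents[v])

        # Traversal phase: BFS over (direction, node) states along active trails.
        # 'up' means we reached the node from one of its children, 'down' from a parent.
        visited, queue = set(), deque(('up', x) for x in X)
        while queue:
            direction, v = queue.popleft()
            if (direction, v) in visited:
                continue
            visited.add((direction, v))
            if v not in Z and v in Y:
                return False                       # an active trail reaches Y
            if direction == 'up' and v not in Z:   # non-collider: continue in both directions
                queue.extend(('up', p) for p in parents[v])
                queue.extend(('down', c) for c in children[v])
            elif direction == 'down':
                if v not in Z:                     # chain/fork through v stays active
                    queue.extend(('down', c) for c in children[v])
                if v in marked:                    # collider with (a descendant of) evidence
                    queue.extend(('up', p) for p in parents[v])
        return True

    # The example network from the previous slides:
    g = {"E": ["R", "A"], "B": ["A"], "R": [], "A": ["C"], "C": []}
    print(d_separated(g, {"R"}, {"B"}, set()))    # True:  d-sep(R,B) = yes
    print(d_separated(g, {"R"}, {"B"}, {"A"}))    # False: d-sep(R,B|A) = no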

  24. Soundness Thm: If • G is an I-Map of P • d-sep( X; Y | Z, G ) = yes • then • P satisfies I( X; Y | Z ) Informally: Any independence reported by d-separation is satisfied by the underlying distribution

  25. Completeness Thm: If d-sep( X; Y | Z, G ) = no • then there is a distribution P such that • G is an I-Map of P • P does not satisfy I( X; Y | Z ) Informally: Any independence not reported by d-separation might be violated by the underlying distribution • We cannot determine this by examining the graph structure alone

  26. Summary: Structure • We explored DAGs as a representation of conditional independencies: • Markov independencies of a DAG • Tight correspondence between Markov(G) and the factorization defined by G • d-separation, a sound & complete procedure for computing the consequences of the independencies • Notion of minimal I-Map • P-Maps • This theory is the basis for defining Bayesian networks

  27. Complexity of variable elimination • Suppose in one elimination step we compute fX(y1,…,yk) = Σx f1 ⋅ … ⋅ fm, where each fi mentions x and some of the yj This requires • m ⋅ |Val(X)| ⋅ Πi |Val(Yi)| multiplications: for each value of x, y1, …, yk, we do m multiplications • |Val(X)| ⋅ Πi |Val(Yi)| additions: for each value of y1, …, yk, we do |Val(X)| additions • Complexity is exponential in the number of variables in the intermediate factor
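
A minimal sketch of one such elimination step over tabular factors, so that the multiplication and addition counts above can be read off the loops. The factor representation (a pair of a variable tuple and a table keyed by value tuples) is an assumption made for this sketch, not notation from the lecture.

    import itertools

    def eliminate(factors, x, domains):
        """One variable-elimination step: multiply all factors that mention x, then sum x out.

        factors: list of (vars, table) pairs, where table maps value tuples (ordered like
        vars) to floats.  x: the variable to eliminate.  domains: dict var -> list of values.
        """
        relevant = [f for f in factors if x in f[0]]
        rest = [f for f in factors if x not in f[0]]

        # Scope of the intermediate factor: union of the relevant scopes, minus x.
        scope = sorted({v for vars_, _ in relevant for v in vars_ if v != x})

        new_table = {}
        for assignment in itertools.product(*(domains[v] for v in scope)):
            ctx = dict(zip(scope, assignment))
            total = 0.0
            for xval in domains[x]:               # |Val(X)| additions per value of y1,…,yk
                ctx[x] = xval
                prod = 1.0
                for vars_, table in relevant:     # m multiplications per value of x, y1,…,yk
                    prod *= table[tuple(ctx[v] for v in vars_)]
                total += prod
            new_table[assignment] = total
        return rest + [(tuple(scope), new_table)]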

  28. Undirected graph representation • At each stage of the procedure, we have an algebraic term that we need to evaluate • In general this term is of the form Σx1 … Σxk Πi fi(Zi), where the Zi are sets of variables • We now draw a graph with an undirected edge X--Y if X and Y are arguments of some factor • that is, if X and Y are in some Zi • Note: this is the Markov network that describes the probability distribution over the variables we have not yet eliminated

  29. Chordal Graphs • elimination ordering ⇒ undirected chordal graph [Figure: a directed network over V, S, T, L, B, A, X, D and the corresponding undirected chordal graph] • Maximal cliques are factors in elimination • Factors in elimination are cliques in the graph • Complexity is exponential in the size of the largest clique in the graph

  30. Induced Width • The size of the largest clique in the induced graph is thus an indicator for the complexity of variable elimination • This quantity is called the induced width of a graph according to the specified ordering • Finding a good ordering for a graph is equivalent to finding the minimal induced width of the graph
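
A small sketch of how the induced width for a given ordering can be computed by simulating elimination on the undirected graph; the edge-list and ordering formats are assumptions of this sketch.

    def induced_width(edges, ordering):
        """Induced width of an undirected graph for a given elimination ordering.

        edges: iterable of (u, v) pairs.  ordering: list of all nodes, eliminated left to right.
        Eliminating a node connects all of its not-yet-eliminated neighbors; the induced
        width is the size of the largest such neighbor set encountered.
        """
        adj = {v: set() for v in ordering}
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)

        width, eliminated = 0, set()
        for x in ordering:
            nbrs = {n for n in adj[x] if n not in eliminated}
            width = max(width, len(nbrs))
            for u in nbrs:                 # add fill-in edges among the remaining neighbors
                for v in nbrs:
                    if u != v:
                        adj[u].add(v)
            eliminated.add(x)
        return width

    print(induced_width([("A", "B"), ("B", "C"), ("A", "C")], ["A", "B", "C"]))   # 2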

  31. PolyTrees [Figure: an example polytree over A, B, C, D, E, F, G, H] • A polytree is a network where there is at most one path from one variable to another Thm: • Inference in a polytree is linear in the representation size of the network • This assumes a tabular CPT representation

  32. Today • Probabilistic graphical models • Treewidth methods: • Variable elimination • Clique tree algorithm • Applications du jour: Sensor Networks

  33. Junction Tree • Why junction tree? • More efficient for some tasks than variable elimination • We can avoid cycles if we turn highly-interconnected subsets of the nodes into "supernodes" ⇒ clusters • Objective: compute P(V = v | E = e), where v is a value of a variable V and e is the evidence for a set of variables E

  34. Properties of Junction Tree [Figure: a junction tree fragment ABD -- AD -- ADE -- DE -- DEF, with cluster ABD and sepset DE highlighted] • An undirected tree • Each node is a cluster (nonempty set) of variables • Running intersection property: given two clusters X and Y, all clusters on the path between X and Y contain X ∩ Y • Separator sets (sepsets): the intersection of the adjacent clusters

  35. Potentials • Potentials: denoted by φX, a function mapping each instantiation of a set of variables X into a nonnegative real number • Marginalization: φX = ΣY\X φY, the marginalization of φY into X ⊆ Y • Multiplication: φXY = φX φY, the multiplication of φX and φY

  36. Properties of Junction Tree • Belief potentials: map each instantiation of clusters or sepsets into a real number • Constraints: • Consistency: for each cluster X and neighboring sepset S, ΣX\S φX = φS • The joint distribution: P(U) = Π clusters φX / Π sepsets φS

  37. Properties of Junction Tree • If a junction tree satisfies the properties, it follows that: • For each cluster (or sepset) X, φX = P(X) • The probability distribution of any variable V can be computed from any cluster (or sepset) X that contains V: P(V) = ΣX\{V} φX

  38. Building Junction Trees DAG → Moral Graph → Triangulated Graph → Identifying Cliques → Junction Tree

  39. Constructing the Moral Graph [Figure: an example DAG over A, B, C, D, E, F, G, H]

  40. Constructing the Moral Graph • Add undirected edges between all co-parents which are not currently joined -- marrying parents [Figure: the example graph with the marrying edges added]

  41. Constructing the Moral Graph • Add undirected edges between all co-parents which are not currently joined -- marrying parents • Drop the directions of the arcs [Figure: the resulting moral graph over A, B, C, D, E, F, G, H]
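
A minimal sketch of moralization, assuming the DAG is given as a dict mapping each node to the list of its parents (the representation is an assumption of the sketch):

    def moral_graph(parents):
        """Moralize a DAG: drop arc directions and 'marry' every pair of co-parents.

        parents: dict node -> list of parents.  Returns a set of undirected edges,
        each represented as a frozenset of two nodes.
        """
        edges = set()
        for child, pars in parents.items():
            for p in pars:
                edges.add(frozenset((p, child)))     # original arc, direction dropped
            for i, p in enumerate(pars):             # marry co-parents that share this child
                for q in pars[i + 1:]:
                    edges.add(frozenset((p, q)))
        return edges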

  42. Triangulating • An undirected graph is triangulated iff every cycle of length > 3 contains an edge that connects two nonadjacent nodes of the cycle [Figure: the moral graph over A, B, C, D, E, F, G, H with fill-in edges added]

  43. Identifying Cliques • A clique is a subgraph of an undirected graph that is complete and maximal [Figure: the triangulated graph over A, B, C, D, E, F, G, H with its maximal cliques ABD, ADE, ACE, DEF, CEG, EGH]
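
Triangulation and clique identification can both be carried out by node elimination, as in the sketch below: eliminating a node adds fill-in edges among its remaining neighbors, and the eliminated node together with those neighbors is a candidate clique. The input format and function name are assumptions of this sketch.

    def elimination_cliques(edges, ordering):
        """Triangulate an undirected graph by elimination and return its maximal cliques.

        edges: iterable of (u, v) pairs (e.g. the moral graph).
        ordering: list of all nodes, eliminated left to right.
        """
        adj = {v: set() for v in ordering}
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)

        eliminated, candidates = set(), []
        for x in ordering:
            nbrs = {n for n in adj[x] if n not in eliminated}
            candidates.append(frozenset(nbrs | {x}))   # candidate clique for x
            for u in nbrs:                             # fill-in edges keep the graph chordal
                for v in nbrs:
                    if u != v:
                        adj[u].add(v)
            eliminated.add(x)

        # Keep only candidates that are not strictly contained in another candidate.
        return [c for c in candidates if not any(c < d for d in candidates)]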

  44. Junction Tree • A junction tree is a subgraph of the clique graph that • is a tree • contains all the cliques • satisfies the running intersection property [Figure: the junction tree over cliques ABD, ADE, ACE, DEF, CEG, EGH with sepsets AD, AE, CE, DE, EG]
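
One standard way to obtain such a tree is to take a maximum-weight spanning tree of the clique graph, weighting each edge by the size of the sepset; for the cliques of a triangulated graph this satisfies the running intersection property. The sketch below uses a simple Kruskal-style selection; the function name and input format are assumptions.

    def junction_tree(cliques):
        """Connect cliques (frozensets) into a junction tree via a max-weight spanning tree."""
        candidate_edges = sorted(
            ((i, j, cliques[i] & cliques[j])
             for i in range(len(cliques)) for j in range(i + 1, len(cliques))),
            key=lambda e: len(e[2]), reverse=True)

        parent = list(range(len(cliques)))          # union-find over clique indices
        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        tree = []                                   # edges (clique_i, clique_j, sepset)
        for i, j, sep in candidate_edges:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                tree.append((cliques[i], cliques[j], sep))
        return tree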

  45. Principle of Inference DAG → Junction Tree → (Initialization) → Inconsistent Junction Tree → (Propagation) → Consistent Junction Tree → (Marginalization)

  46. Example: Create Join Tree HMM with 2 time steps: [Figure: X1 → X2, X1 → Y1, X2 → Y2] Junction Tree: [Figure: clusters (X1,Y1) -- X1 -- (X1,X2) -- X2 -- (X2,Y2)]
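
As a usage example, the helper sketches introduced above (moral_graph, elimination_cliques, junction_tree, all hypothetical names from this write-up, not from the lecture) reproduce this join tree for the two-step HMM:

    parents = {"X1": [], "X2": ["X1"], "Y1": ["X1"], "Y2": ["X2"]}        # the 2-step HMM
    edges = moral_graph(parents)                                          # no marriages needed here
    cliques = elimination_cliques([tuple(e) for e in edges], ["Y1", "Y2", "X1", "X2"])
    print(cliques)                  # the cliques {X1,Y1}, {X2,Y2}, {X1,X2} (display order may vary)
    print(junction_tree(cliques))   # two tree edges, with sepsets {X1} and {X2}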

  47. Example: Initialization • Assign each CPT to one cluster that contains its variables, and set each cluster potential to the product of the CPTs assigned to it; initialize all sepset potentials to 1 • e.g. φ(X1,Y1) = P(X1) P(Y1|X1), φ(X1,X2) = P(X2|X1), φ(X2,Y2) = P(Y2|X2), φ(X1) = φ(X2) = 1

  48. Example: Collect Evidence • Choose an arbitrary clique, e.g. (X1,X2), where all potential functions will be collected. • Call neighboring cliques recursively for messages: • 1. Call (X1,Y1): • Projection: φ(X1) = ΣY1 φ(X1,Y1) • Absorption: φ(X1,X2) ← φ(X1,X2) ⋅ φ(X1) / φold(X1)

  49. Example: Collect Evidence (cont.) • 2. Call (X2,Y2): • Projection: φ(X2) = ΣY2 φ(X2,Y2) • Absorption: φ(X1,X2) ← φ(X1,X2) ⋅ φ(X2) / φold(X2)

  50. Example: Distribute Evidence • Pass messages recursively to neighboring nodes • Pass message from (X1,X2) to (X1,Y1): • Projection: φ(X1) = ΣX2 φ(X1,X2) • Absorption: φ(X1,Y1) ← φ(X1,Y1) ⋅ φ(X1) / φold(X1)
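
The projection and absorption steps used in both the collect and distribute passes can be sketched generically as below; the potential representation (a pair of a variable tuple and a table keyed by value tuples) is an assumption of this sketch, not notation from the lecture.

    def pass_message(src, sep, dst):
        """One message pass src -> sepset -> dst in a junction tree.

        Each potential is a pair (vars, table), where table maps value tuples (ordered
        like vars) to floats.  Projection: the new sepset potential is the marginal of
        src onto the sepset variables.  Absorption: dst is scaled by new/old sepset
        (with 0/0 treated as 0).  Returns the updated sepset and dst potentials.
        """
        src_vars, src_table = src
        sep_vars, old_sep = sep
        dst_vars, dst_table = dst

        # Projection: marginalize the source cluster onto the sepset variables.
        proj_idx = [src_vars.index(v) for v in sep_vars]
        new_sep = {}
        for assignment, value in src_table.items():
            key = tuple(assignment[i] for i in proj_idx)
            new_sep[key] = new_sep.get(key, 0.0) + value

        # Absorption: multiply dst by the ratio of the new to the old sepset potential.
        abs_idx = [dst_vars.index(v) for v in sep_vars]
        for assignment in dst_table:
            key = tuple(assignment[i] for i in abs_idx)
            old = old_sep.get(key, 0.0)
            dst_table[assignment] *= (new_sep.get(key, 0.0) / old) if old else 0.0
        return (sep_vars, new_sep), (dst_vars, dst_table)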
