1 / 43

Lecture 22: Inference in Graphical Models

Lecture 22: Inference in Graphical Models. Machine Learning CUNY Graduate Center. Today. Graphical Models Representing conditional dependence graphically Inference Junction Tree Algorithm. Undirected Graphical Models. A. D. C. B.

dermot
Download Presentation

Lecture 22: Inference in Graphical Models

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 22: Inference in Graphical Models Machine Learning CUNY Graduate Center

  2. Today • Graphical Models • Representing conditional dependence graphically • Inference • Junction Tree Algorithm

  3. Undirected Graphical Models A D C B • In an undirected graphical model, there is no trigger/response relationship. • Represent slightly different conditional independence relationships • Conditional independence determined by graph separability.

  4. Undirected Graphical Models Different relationships can be described with directed and undirected graphs. Cannot represent:

  5. Undirected Graphical Models Different relationships can be described with directed and undirected graphs.

  6. Probabilities in Undirected Graphs • Clique: a set of nodes such that there is an edge between every pair of nodes that are members of the set • We will defined the joint probability as a relationship between functions defined over cliques in the graphical model

  7. Probabilities in Undirected Graphs Guanrantees a sum of 1 • Potential Functions: positive functions over groups of connected variables (represented by maximal cliques of graphical model nodes) • Maximal cliques: if a clique of nodes A are not a proper subset of a clique B, then A is a maximal clique.

  8. Logical Inference NOT AND XOR • In logical inference, nodes are binary, edges represent gates. • AND, OR, XOR, NAND, NOR, NOT, etc. • Inference: given observed variables, predict others • Problems: uncertainty, conflicts, inconsistency

  9. Probabilistic Inference NOT AND XOR NOT • Rather than a logic network, use a BayesianNetwork • Probabilistic Inference: given observed variables, calculate marginals over others. • Logic networks are generalized by Bayesian Networks

  10. Probabilistic Inference NOT AND XOR NOT-ish • Rather than a logic network, use a BayesianNetwork • Probabilistic Inference: given observed variables, calculate marginals over others. • Logic networks are generalized by Bayesian Networks

  11. Inference in Graphical Models General Problem: Given a graphical model, for any subsets of observed and expected variables find Direct approach can be quite inefficient if there are many irrelevant variables

  12. Marginal Computation Graphical models provide efficient storage by decomposing p(x) into conditional probabilities and a simple MLE result. Now look for efficient calculation of marginals which will lead to efficient inference.

  13. Brute Force Marginal Calculation • First approach: have CPTs and graphical model. We can compute arbitrary joints. • Assume 6 variables

  14. Computation of Marginals • Pass messages (small tables) around the graph • The messages are small functions that propagate potentials around and undirected graphical model. • The inference technique is the Junction Tree Algorithm

  15. Junction Tree Algorithm • Efficient Message Passing for Undirected Graphs. • For Directed Graphs, first convert to undirected. • Goal: Efficient Inference in Graphical Models

  16. Junction Tree Algorithm Moralization Introduce Evidence Triangulate Construct Junction Tree Propagate Probabilities

  17. Moralization • Converts a directed graph to an undirected graph. • Moralization “marries” the parents. • Insert an undirected edge between every pair of nodes that have a child in common. • Replace all directed edges with undirected edges.

  18. Moralization Examples

  19. Moralization Examples

  20. Junction Tree Algorithm Moralization Introduce Evidence Triangulate Construct Junction Tree Propagate Probabilities

  21. Introduce Evidence • Given a moral graph, identify the observed variables. • Reduce probability functions since we know some are fixed. • Only keep probability functions over remaining nodes.

  22. Slices Differentiate potential functions from slices Potential Functions are related to joint probabilities over groups of nodes, but aren’t necessarily correctly normalized, and can even be initialized to conditionals. A slice of a potential function is a row or column of the underlying table (in the discrete case) or unnormalized marginal (in the continuous case)

  23. Separation from Introducing Evidence Observing nodes separates conditionally independent sets of variables Normalization Calculation. Don’t bother until the end when we want to determine an individual marginal.

  24. Junction Trees • Construction of junction trees. • Each node represents a clique of variables. • Edges connect cliques • There is a unique path from node to root • Between each clique node is a separator node. • Separators contain intersections of variables

  25. Triangulation A A ABD B D B D B D BC DE E C E C E C CE • Constructing a junction tree. • Need to guarantee that a Junction Graph, made up of cliques and separators of an undirected graph is a Tree. • Eliminate any chordless cycles of four or more nodes.

  26. Junction Tree Algorithm Moralization Introduce Evidence Triangulate Construct Junction Tree Propagate Probabilities

  27. Triangulation When eliminating cycles there may be many choices about which edge to add. Want to keep the largest clique size small – small potential functions Triangulation that minimizes the largest clique size is NP-complete. Suboptimal triangulation is acceptable (poly-time) and doesn’t introduce many extra dimensions.

  28. Triangulation When eliminating cycles there may be many choices about which edge to add. Want to keep the largest clique size small – small potential functions Triangulation that minimizes the largest clique size is NP-complete. Suboptimal triangulation is acceptable (poly-time) and doesn’t introduce many extra dimensions.

  29. Triangulation Examples A A A B D A C E A A F

  30. Junction Tree Algorithm Moralization Introduce Evidence Triangulate Construct Junction Tree Propagate Probabilities

  31. Constructing Junction Trees A ABD ABD BD B D D BCD BCD CD CD C E CDE CDE • Junction trees must satisfy the Running Intersection Property: • All nodes on a path between a clique node V and clique node W must include all nodes in V ∩ W • Junction trees will have maximal separator cardinality.

  32. Forming a Junction Tree • Given a set of cliques, connect the nodes, s.t. the Running Intersection Property holds. • Maximize the cardinality of the separators. • Maximum Spanning Tree (Kruskal’s algorithm) • Initialize a tree with no edges. • Calculate the size of separators between all pairs • O(N2) • Connect two cliques with the largest separator cardinality without creating a loop. • Repeat until all nodes are connected.

  33. Junction Tree Algorithm Moralization Introduce Evidence Triangulate Construct Junction Tree Propagate Probabilities

  34. Propagating Probabilities • We have a valid junction tree. • What can we do with it? • Probabilities in Junction Trees: • De-absorb smaller cliques from maximal cliques. • Doesn’t change anything, but is a less compact description.

  35. Conversion from Directed Graph X1 X2 X3 X4 X1 X2 X3 X4 X1 X2 X2 X2 X3 X3 X3 X4 Example conversion. Represent CPTs as potential and separator functions (with a normalizer)

  36. Junction Tree Algorithm • Goal: Make marginalsconsistent • Junction Tree Algorithm sends messages between cliques and separators until this consistency is reached.

  37. Message passing A B B B C If they agree, finished. Otherwise, iterate. • Send a message from a clique to a separator • The message is what the clique thinks the marginal should be. • Normalize the clique by each message from the separators s.t. agreement is reached

  38. Message Passing A B B B C

  39. Junction Tree Algorithm When convergence is reached – clique potentials are marginals and separator potentials are submarginals. p(x) is consistent across all of the message passing. This implies that, so long as p(x) is correctly represented in the potential functions, the JTA can be used to make each potential function correspond to an appropriate marginal without impacting the overall probability function.

  40. Converting a DAG to a Junction Tree X3 X4 X1 X2 X3 X4 X2 X3 X5 X3 X5 X1 X2 X6 X7 X5 X6 X5 X7 • Initialize separators to 1 and clique tables to CPTs • Run JTA to convert potential functions (CPTs) to marginals.

  41. Evidence in a Junction Tree Conditional Initialize as usual. Update with a slice rather that the whole table

  42. Efficiency of the Junction Tree Algorithm • Construct CPTs • Polynomial in # of data points • Moralization • Polynomial in # of nodes • Introduce Evidence • Polynomial in # of nodes • Triangulate • Suboptimal = polynomial. Optimal = NP • Construct Junction Tree • Polynomial in the number of cliques • Identifying cliques = polynomial in the number of nodes • Propagate Probabilities • Polynomial in the number of cliques • Exponential in the size of cliques

  43. Hidden Markov Models Q1 Q2 Q3 Q4 X1 X2 X3 X4 Powerful graphical model to describe sequential information.

More Related