
Dynamic Programming & Hidden Markov Models.



  1. Dynamic Programming & Hidden Markov Models. Alan Yuille, Dept. Statistics, UCLA

  2. Goal of this Talk • This talk introduces one of the major algorithms: dynamic programming (DP). • Then it describes how DP can be used in conjunction with EM for learning.

  3. Dynamic Programming • Dynamic Programming exploits the graphical structure of the probability distribution. It can be applied to any structure without closed loops. • Consider the two-headed coin example given in Tom Griffiths' talk (Monday).

  4. Probabilistic Grammars • By the Markov Condition, the joint distribution factorizes over the graph (a reconstruction of the slide's equations is sketched below). • Hence we can exploit the graphical structure for efficient computation: the structure means that the sum over x2 drops out, and we need only sum over x1 and x3. Only four operations instead of eight.
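The equations on this slide did not survive into the transcript. A plausible reconstruction, assuming the three-node chain x_1 -> x_2 -> x_3 with binary variables: by the Markov Condition,

    P(x_1, x_2, x_3) = P(x_1) P(x_2 | x_1) P(x_3 | x_2),

so the marginal of the middle node (no sum over x_2 needed) splits into two independent sums:

    P(x_2) = \left[ \sum_{x_1} P(x_1) P(x_2 | x_1) \right] \left[ \sum_{x_3} P(x_3 | x_2) \right].

Each bracket is a sum of two terms, four operations in total, whereas naively marginalizing the joint table touches all 2^3 = 8 configurations. (The second bracket sums to one, an additional saving.)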

  5. Dynamic Programming Intuition • Suppose you wish to travel to Boston from Los Angeles by car. • To determine the cost of going via Chicago – you only need to calculate the shortest cost from Los Angeles to Chicago and then, independently, the shortest cost from Chicago to Boston. • Decomposing the route in this way gives an efficient algorithm which is polynomial in the number of nodes and feasible for computation.

  6. Dynamic Programming Diamond • Compute the shortest cost from A to B.
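The diamond figure is not reproduced in the transcript. Below is a minimal sketch of the shortest-cost computation in Python; the graph, node names, and edge costs are invented for illustration and are not from the slides.

    def shortest_cost(graph, order, source, target):
        """graph[u] lists (v, cost) edges; order is a topological order of the DAG."""
        INF = float("inf")
        best = {node: INF for node in order}   # best known cost from the source
        best[source] = 0.0
        for u in order:                        # process each node exactly once
            if best[u] == INF:
                continue                       # unreachable so far
            for v, cost in graph.get(u, []):
                best[v] = min(best[v], best[u] + cost)   # DP recursion
        return best[target]

    # Diamond: A splits into C and D, which rejoin at B.
    diamond = {"A": [("C", 1.0), ("D", 4.0)], "C": [("B", 5.0)], "D": [("B", 1.0)]}
    print(shortest_cost(diamond, ["A", "C", "D", "B"], "A", "B"))   # -> 5.0

Each node's best cost is settled once all its predecessors are done, mirroring the Chicago decomposition on the previous slide.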

  7. Application to a 1-dim chain. • Consider a distribution defined on a 1-dim chain. • Important property: directed and undirected graphs are equivalent (for a 1-dim chain). • P(A,B) = P(A|B) P(B) or P(A,B) = P(B|A) P(A) • For these simple graphs with two nodes – you cannot distinguish causation from correlation without intervention (Wu’s lecture Friday). • For this lecture – we will treat a simple one-dimensional chain, covering directed and undirected models simultaneously. (Translating between directed and undirected is generally possible for graphs without closed loops – but has subtleties).

  8. Probability distribution on 1-D chain
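The formula on this slide is not in the transcript. The standard undirected form for a chain, assuming pairwise potentials \psi_i, is

    P(x_1, \ldots, x_N) = \frac{1}{Z} \prod_{i=1}^{N-1} \psi_i(x_i, x_{i+1}),
    \qquad Z = \sum_{x_1, \ldots, x_N} \prod_{i=1}^{N-1} \psi_i(x_i, x_{i+1}),

where the normalization constant Z is the partition function.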

  9. 1-D Chain.
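The recursion on this slide is not in the transcript; a standard reconstruction in the notation above defines forward quantities

    m_1(x_1) = 1, \qquad m_{i+1}(x_{i+1}) = \sum_{x_i} m_i(x_i) \, \psi_i(x_i, x_{i+1}),

so that, by induction, m_i(x_i) = \sum_{x_1, \ldots, x_{i-1}} \prod_{j<i} \psi_j(x_j, x_{j+1}) and Z = \sum_{x_N} m_N(x_N). Each step sums out a single variable, so the total cost is O(N K^2) for K states per node, rather than the O(K^N) brute-force sum.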

  10. 1-Dim Chain: • (Proof by induction).

  11. 1-Dim Chain • We can also use DP to compute other properties: e.g. to convert the distribution from undirected form: • To directed form:
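The conversion formulas are not in the transcript. A standard reconstruction uses backward quantities (again assuming the pairwise potentials \psi_i):

    \beta_N(x_N) = 1, \qquad \beta_i(x_i) = \sum_{x_{i+1}} \psi_i(x_i, x_{i+1}) \, \beta_{i+1}(x_{i+1}),
    P(x_1) = \beta_1(x_1) / Z, \qquad P(x_{i+1} | x_i) = \psi_i(x_i, x_{i+1}) \, \beta_{i+1}(x_{i+1}) / \beta_i(x_i),

with Z = \sum_{x_1} \beta_1(x_1). Multiplying the factors, the \beta terms telescope and recover the undirected form: P(x_1) \prod_i P(x_{i+1} | x_i) = (1/Z) \prod_i \psi_i(x_i, x_{i+1}).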

  12. 1-Dim Chain
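Putting the forward and backward passes together, here is a minimal runnable sketch in Python (the potential values are invented for illustration; this is not code from the talk):

    import numpy as np

    def chain_dp(psis):
        """DP on a 1-D chain with pairwise potentials psis[i] (K x K arrays).
        Returns the partition function Z and the node marginals P(x_i)."""
        K = psis[0].shape[0]
        fwd = [np.ones(K)]                 # m_1(x_1) = 1
        for psi in psis:                   # m_{i+1} = m_i @ psi_i
            fwd.append(fwd[-1] @ psi)
        bwd = [np.ones(K)]                 # beta_N(x_N) = 1
        for psi in reversed(psis):         # beta_i = psi_i @ beta_{i+1}
            bwd.append(psi @ bwd[-1])
        bwd.reverse()
        Z = fwd[-1].sum()                  # Z = sum_x m_N(x)
        marginals = [f * b / Z for f, b in zip(fwd, bwd)]
        return Z, marginals

    # Three binary nodes: DP does a handful of operations instead of
    # enumerating all 2**3 = 8 joint configurations.
    psi = np.array([[2.0, 1.0], [1.0, 3.0]])
    Z, marg = chain_dp([psi, psi])
    print(Z, marg[1])                      # Z = 25.0, P(x_2) = [0.36, 0.64]

A useful consistency check: sum_x m_i(x) beta_i(x) equals Z at every node i.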

  13. Special Case: 1-D Ising Spin Model
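The slide's content is not in the transcript. For the 1-D Ising model, the standard specialization is spins x_i in {-1, +1} with coupling J and inverse temperature \beta:

    P(x) = \frac{1}{Z} \exp\left( \beta J \sum_{i=1}^{N-1} x_i x_{i+1} \right),
    \qquad \psi(x_i, x_{i+1}) = e^{\beta J x_i x_{i+1}}.

The DP recursion is then repeated multiplication by the 2 x 2 transfer matrix T with entries T_{s s'} = e^{\beta J s s'}, which for free boundary conditions gives the closed form

    Z = \mathbf{1}^T T^{N-1} \mathbf{1} = 2 \, (2 \cosh \beta J)^{N-1}.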

  14. Dynamic Programming Summary • Dynamic Programming can be applied to perform inference on all graphical models defined on trees – the key insight is that, for trees, we can define an order on the nodes (not necessarily unique) and process nodes in sequence (never needing to return to a node that has already been processed).

  15. Extensions of Dynamic Programming: • What to do if you have a graph with closed loops? • There are a variety of advanced ways to exploit the graphical structure and obtain efficient exact algorithms. • Prof. Adnan Darwiche (CS, UCLA) is an expert on this topic. There will be an introduction to his SamIam code. • Approximate methods such as belief propagation (BP) can also be used.

  16. Junction Trees. • It is also possible to take a probability distribution defined on a graph with closed loops and reformulate it as a distribution on a new graph without closed loops (Lauritzen and Spiegelhalter 1990). • This leads to a family of algorithms generally known as junction trees. • This is not a universal solution – the resulting new graphs may have too many nodes to be practical. • Google “junction trees” to find nice tutorials.

  17. Graph Conversion • Convert graph by a set of transformations.

  18. Triangles & Augmented Variables • From triangles to ordered triangles. • Original variables: loops. Augmented variables: no loops.

  19. Summary of Dynamic Programming. • Dynamic Programming can be used to efficiently compute properties of distributions defined on trees. • Directed graphs on trees can be reformulated as undirected graphs on trees, and vice versa. • DP can be extended to apply to graphs with closed loops by restructuring the graphs (junction trees). • It is an active research area to determine efficient inference algorithms which exploit the graphical structure of these models. • Relationship between DP and reinforcement learning (week 2). • DP and A*. DP and pruning.

  20. HMM’s: Learning and Inference • So far we have considered inference only. • This assumes that the model is known. • How can we learn the model? • For 1D models -- this uses DP and EM.

  21. A simple HMM for Coin Tossing • Two coins, one biased and the other fair, with the coins switched occasionally. • The observation 0,1 is whether the coin shows heads or tails. • The hidden state A,B is which coin is used. • There are unknown transition probabilities between the hidden states A and B, and unknown probabilities for the observations conditioned on the hidden states. • The learning task is to estimate these probabilities from a sequence of measurements (see the sketch below).
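A minimal sketch of this two-coin HMM in Python, using the forward algorithm (a DP recursion) to compute the likelihood of an observation sequence. All numerical values are invented for illustration; they are not from the talk.

    import numpy as np

    # Hidden states: 0 = coin A (fair), 1 = coin B (biased).
    # Observations: 0 = tails, 1 = heads.
    T = np.array([[0.9, 0.1],       # P(next state | current = A)
                  [0.1, 0.9]])      # P(next state | current = B)
    E = np.array([[0.5, 0.5],       # coin A: P(tails), P(heads)
                  [0.2, 0.8]])      # coin B: P(tails), P(heads)
    pi = np.array([0.5, 0.5])       # initial state distribution

    def forward(obs):
        """Forward DP: alpha_t(k) = P(o_1..o_t, s_t = k); returns P(o_1..o_T)."""
        alpha = pi * E[:, obs[0]]
        for o in obs[1:]:
            alpha = (alpha @ T) * E[:, o]    # one DP step per observation
        return alpha.sum()

    print(forward([1, 1, 0, 1, 1, 1, 0, 1]))  # likelihood of a heads/tails run

In EM (Baum-Welch), forward and backward passes like this one give the expected state occupancies and transitions, from which T, E, and pi are re-estimated; the two steps alternate until convergence.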

  22. HMM for Speech

  23. HMM for Speech

  24. HMM Summary • HMMs define a class of Markov models with hidden variables, used for speech recognition and many other applications. • Tasks involving HMMs include learning, inference, and model selection. • These can often be performed by algorithms based on EM and DP.
