
Incomplete Graphical Models


Nan Hu

- Motivation
- K-means clustering
- Coordinate descent algorithm

- Density estimation
- EM on unconditional mixture

- Regression and classification
- EM on conditional mixture

- A general formulation of EM Algorithm

Problem: Given a set of observations $\{x_1, \dots, x_N\}$, how do we group them into K clusters, supposing the value of K is given?

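- First phase: with the cluster means $\mu_i$ fixed, assign each observation to its nearest mean
$$z_n^i = \begin{cases} 1 & \text{if } i = \arg\min_j \|x_n - \mu_j\|^2 \\ 0 & \text{otherwise} \end{cases}$$
- Second phase: with the assignments $z_n^i$ fixed, recompute each mean
$$\mu_i = \frac{\sum_n z_n^i \, x_n}{\sum_n z_n^i}$$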

[Figure: K-means on a sample data set. Panels: original set, first iteration, second iteration, third iteration.]

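- Coordinate descent algorithm
- The algorithm minimizes the distortion measure
$$J = \sum_{n=1}^{N} \sum_{i=1}^{K} z_n^i \, \|x_n - \mu_i\|^2$$
where $z_n^i = 1$ if $x_n$ is assigned to cluster $i$ and 0 otherwise. It alternates between the two sets of variables: J is minimized over the assignments with the means fixed, then over the means $\mu_i$ by setting the partial derivatives $\partial J / \partial \mu_i$ to zero.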
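The two alternating phases can be sketched in a few lines of Python. This is a minimal illustration rather than the presenter's implementation: it assumes one-dimensional data, initializes the means at randomly chosen data points, and the names `kmeans_1d` and `distortion` are hypothetical.

```python
import random

def kmeans_1d(xs, k, n_iter=100, seed=0):
    """Coordinate descent on the distortion J for 1-D data (illustrative sketch)."""
    rng = random.Random(seed)
    means = rng.sample(xs, k)   # assumption: start from k randomly chosen data points
    assign = None
    for _ in range(n_iter):
        # First phase: means fixed, assign each point to its nearest mean.
        new_assign = [min(range(k), key=lambda i: (x - means[i]) ** 2) for x in xs]
        # Second phase: assignments fixed, set each mean to the average of its points.
        for i in range(k):
            members = [x for x, a in zip(xs, new_assign) if a == i]
            if members:         # an empty cluster keeps its previous mean
                means[i] = sum(members) / len(members)
        if new_assign == assign:
            break               # assignments unchanged, so J can no longer decrease
        assign = new_assign
    return means, assign

def distortion(xs, means, assign):
    """J = sum of squared distances from each point to its assigned mean."""
    return sum((x - means[a]) ** 2 for x, a in zip(xs, assign))

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
means, assign = kmeans_1d(data, k=2)
```

Each phase can only lower J or leave it unchanged, which is why the loop stops once the assignments no longer change.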

Problem: If the sample data exhibit a multimodal density, how do we estimate the true density?

Fitting a single density to this bimodal case:

Although the algorithm converges, the result bears little relationship to the truth.
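A small numerical illustration of this failure, assuming a synthetic bimodal sample (two Gaussian subpopulations centred at -3 and 3, chosen here only for the example) and a maximum-likelihood fit of a single Gaussian:

```python
import random

rng = random.Random(0)
# Assumed synthetic bimodal sample: two well-separated subpopulations.
sample = ([rng.gauss(-3.0, 0.5) for _ in range(500)]
          + [rng.gauss(3.0, 0.5) for _ in range(500)])

# Maximum-likelihood fit of one Gaussian: sample mean and (biased) sample variance.
mu_hat = sum(sample) / len(sample)
var_hat = sum((x - mu_hat) ** 2 for x in sample) / len(sample)

# Count the observations that lie near the peak of the fitted density.
near_peak = sum(1 for x in sample if abs(x - mu_hat) < 1.0)
```

The fitted mean lands between the two modes, where almost no observations lie, and the fitted variance is inflated to cover both subpopulations.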

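- A “divide-and-conquer” approach to this problem
- Introduce a latent variable Z

[Figure: graphical model Z → X. Z is a multinomial node taking on one of K values; X is the observed variable.]

Assign a density model to each subpopulation; the overall density is
$$p(x \mid \theta) = \sum_{i=1}^{K} \pi_i \, f_i(x \mid \theta_i)$$
where $\pi_i = p(z^i = 1)$ is the mixing proportion of subpopulation $i$.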
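The latent-variable view of a mixture can be sketched as follows: draw Z first, then draw X from the component that Z selects. The Gaussian component densities and all parameter values below are assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, pis, mus, sigmas):
    """Overall density: sum over components of pi_i * f_i(x)."""
    return sum(p * gauss_pdf(x, m, s) for p, m, s in zip(pis, mus, sigmas))

def sample_mixture(n, pis, mus, sigmas, seed=0):
    """Draw the latent Z first, then X from the selected component."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        z = rng.choices(range(len(pis)), weights=pis)[0]
        draws.append((z, rng.gauss(mus[z], sigmas[z])))
    return draws

pis, mus, sigmas = [0.3, 0.7], [-2.0, 2.0], [0.5, 1.0]   # illustrative values
draws = sample_mixture(2000, pis, mus, sigmas)
```

Discarding the `z` entry of each pair gives a sample from the marginal density of X, which is exactly the mixture density.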

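- Gaussian Mixture Models
- In this model, the mixture components are Gaussian distributions with parameters $(\mu_i, \Sigma_i)$

- Probability model for a Gaussian mixture
$$p(x \mid \theta) = \sum_{i=1}^{K} \pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i)$$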

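- Posterior probability of the latent variable Z:
$$\tau^i = p(z^i = 1 \mid x, \theta) = \frac{\pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i)}{\sum_{j} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}$$
- Log likelihood:
$$l(\theta \mid D) = \sum_{n=1}^{N} \log \sum_{i=1}^{K} \pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i)$$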

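- Partial derivative of $l$ with respect to $\pi_i$, using a Lagrange multiplier for the constraint $\sum_i \pi_i = 1$
- Solving, we have
$$\pi_i = \frac{1}{N} \sum_{n=1}^{N} \tau_n^i$$
where $\tau_n^i = p(z_n^i = 1 \mid x_n, \theta)$.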
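The Lagrange-multiplier step can be written out as follows. This is a sketch of the standard derivation, with $l$ the log likelihood of the Gaussian mixture, $\tau_n^i$ the posterior probability of component $i$ for observation $x_n$, and $\lambda$ the multiplier (a symbol introduced here).

```latex
\begin{align*}
\frac{\partial}{\partial \pi_i}
  \Big[\, l(\theta \mid D) + \lambda \Big(1 - \sum_j \pi_j\Big) \Big]
  &= \sum_{n=1}^{N}
     \frac{\mathcal{N}(x_n \mid \mu_i, \Sigma_i)}
          {\sum_j \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} - \lambda = 0 \\
\text{multiply by } \pi_i: \qquad
  \sum_{n=1}^{N} \tau_n^i &= \lambda \, \pi_i \\
\text{sum over } i: \qquad
  N &= \lambda \\
\Rightarrow \qquad
  \pi_i &= \frac{1}{N} \sum_{n=1}^{N} \tau_n^i
\end{align*}
```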

- Partial derivative of $l$ with respect to $\mu_i$
- Setting it to zero, we have
$$\mu_i = \frac{\sum_n \tau_n^i \, x_n}{\sum_n \tau_n^i}$$

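- Partial derivative of $l$ with respect to $\Sigma_i$
- Setting it to zero, we have
$$\Sigma_i = \frac{\sum_n \tau_n^i \, (x_n - \mu_i)(x_n - \mu_i)^T}{\sum_n \tau_n^i}$$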

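- The EM Algorithm
- First phase (E step): compute the posterior probabilities under the current parameters
$$\tau_n^{i(t)} = \frac{\pi_i^{(t)} \, \mathcal{N}(x_n \mid \mu_i^{(t)}, \Sigma_i^{(t)})}{\sum_j \pi_j^{(t)} \, \mathcal{N}(x_n \mid \mu_j^{(t)}, \Sigma_j^{(t)})}$$
- Second phase (M step): update the parameters
$$\pi_i^{(t+1)} = \frac{1}{N} \sum_n \tau_n^{i(t)}, \qquad \mu_i^{(t+1)} = \frac{\sum_n \tau_n^{i(t)} x_n}{\sum_n \tau_n^{i(t)}}, \qquad \Sigma_i^{(t+1)} = \frac{\sum_n \tau_n^{i(t)} (x_n - \mu_i^{(t+1)})(x_n - \mu_i^{(t+1)})^T}{\sum_n \tau_n^{i(t)}}$$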
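The two phases can be sketched for a one-dimensional Gaussian mixture as below. This is an illustrative sketch, not the presenter's code: the scalar case replaces each covariance matrix with a variance, the synthetic data and starting values are assumptions, and the name `em_gmm_1d` is hypothetical.

```python
import math
import random

def gauss_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def log_likelihood(xs, pis, mus, variances):
    return sum(math.log(sum(p * gauss_pdf(x, m, v)
                            for p, m, v in zip(pis, mus, variances))) for x in xs)

def em_gmm_1d(xs, pis, mus, variances, n_iter=50):
    k, n = len(pis), len(xs)
    history = [log_likelihood(xs, pis, mus, variances)]
    for _ in range(n_iter):
        # E step: posterior probability tau[n][i] of component i for point n.
        taus = []
        for x in xs:
            w = [pis[i] * gauss_pdf(x, mus[i], variances[i]) for i in range(k)]
            total = sum(w)
            taus.append([wi / total for wi in w])
        # M step: re-estimate the parameters with the taus as soft counts.
        nk = [sum(t[i] for t in taus) for i in range(k)]
        pis = [nk[i] / n for i in range(k)]
        mus = [sum(t[i] * x for t, x in zip(taus, xs)) / nk[i] for i in range(k)]
        variances = [sum(t[i] * (x - mus[i]) ** 2 for t, x in zip(taus, xs)) / nk[i]
                     for i in range(k)]
        history.append(log_likelihood(xs, pis, mus, variances))
    return pis, mus, variances, history

rng = random.Random(0)
# Assumed synthetic data: two subpopulations centred at -3 and 3.
data = ([rng.gauss(-3.0, 0.5) for _ in range(300)]
        + [rng.gauss(3.0, 1.0) for _ in range(300)])
pis, mus, variances, history = em_gmm_1d(data, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

The `history` list records the log likelihood after every iteration; it should never decrease, which is a convenient sanity check on any EM implementation.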


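- EM algorithm from the expected complete log likelihood point of view
Suppose we observed the latent variables $z_n$; the data set would then be completely observed, and the likelihood is defined as the complete log likelihood
$$l_c(\theta \mid D_c) = \sum_n \log p(x_n, z_n \mid \theta) = \sum_n \sum_i z_n^i \log \left[ \pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i) \right]$$

We treat the $z_n^i$ as random variables and take expectations conditioned on X and $\theta^{(t)}$.

Note that the $z_n^i$ are binary random variables, so
$$E[z_n^i \mid x_n, \theta^{(t)}] = p(z_n^i = 1 \mid x_n, \theta^{(t)}) = \tau_n^{i(t)}$$

Using this as the “best guess” for $z_n^i$, we have the expected complete log likelihood
$$\langle l_c(\theta \mid D_c) \rangle = \sum_n \sum_i \tau_n^{i(t)} \log \left[ \pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i) \right]$$

- Maximizing the expected complete log likelihood by setting the derivatives to zero, we have
$$\pi_i^{(t+1)} = \frac{1}{N} \sum_n \tau_n^{i(t)}, \qquad \mu_i^{(t+1)} = \frac{\sum_n \tau_n^{i(t)} x_n}{\sum_n \tau_n^{i(t)}}, \qquad \Sigma_i^{(t+1)} = \frac{\sum_n \tau_n^{i(t)} (x_n - \mu_i^{(t+1)})(x_n - \mu_i^{(t+1)})^T}{\sum_n \tau_n^{i(t)}}$$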

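- Graphical Model

For regression and classification

[Figure: graphical model with nodes X, Z, and Y and edges X → Z, X → Y, and Z → Y. X is the input, Y is the output, and Z is a latent multinomial node taking on one of K values.]

The relationship between X and Z can be modeled discriminatively, e.g. with a softmax function:
$$p(z^i = 1 \mid x, \xi) = \pi_i(x, \xi) = \frac{e^{\xi_i^T x}}{\sum_j e^{\xi_j^T x}}$$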

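- By marginalizing over Z,
$$p(y \mid x, \theta) = \sum_i p(z^i = 1 \mid x, \xi) \, p(y \mid z^i = 1, x, \theta_i)$$
- X is taken to be always observed. The posterior probability is defined as
$$\tau^i(x, y, \theta) = p(z^i = 1 \mid x, y, \theta) = \frac{p(z^i = 1 \mid x, \xi) \, p(y \mid z^i = 1, x, \theta_i)}{\sum_j p(z^j = 1 \mid x, \xi) \, p(y \mid z^j = 1, x, \theta_j)}$$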

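- Some specific choices of mixture components
- Gaussian components
$$p(y \mid x, \theta_i) = \mathcal{N}(y \mid \beta_i^T x, \Sigma_i)$$
- Logistic components
$$p(y \mid x, \theta_i) = \mu(\theta_i^T x)^{y} \, \left(1 - \mu(\theta_i^T x)\right)^{1 - y}$$
where $\mu(\cdot)$ is the logistic function:
$$\mu(a) = \frac{1}{1 + e^{-a}}$$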
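As a rough sketch of how these pieces combine, the code below evaluates the conditional mixture density for scalar x and y, with a softmax over per-component scores for the mixing proportions. The scalar setting, the parameter names (`xis`, `betas`, `thetas`), and the function names are assumptions made for illustration.

```python
import math

def softmax(scores):
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))

def gauss_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def cond_mixture_gaussian(y, x, xis, betas, variances):
    """p(y | x) = sum_i pi_i(x) * N(y | beta_i * x, var_i), real-valued y."""
    gates = softmax([xi * x for xi in xis])
    return sum(g * gauss_pdf(y, b * x, v) for g, b, v in zip(gates, betas, variances))

def cond_mixture_logistic(y, x, xis, thetas):
    """p(y | x) for binary y, each component a Bernoulli with mean logistic(theta_i * x)."""
    gates = softmax([xi * x for xi in xis])
    return sum(g * (logistic(t * x) if y == 1 else 1.0 - logistic(t * x))
               for g, t in zip(gates, thetas))

p_reg = cond_mixture_gaussian(1.0, 0.5, [0.0, 2.0], [-1.0, 2.0], [0.25, 0.25])
p_cls = cond_mixture_logistic(1, 0.5, [0.0, 2.0], [-3.0, 3.0])
```

The first function corresponds to regression (Gaussian components), the second to binary classification (logistic components); in both, the mixing proportions depend on the input x.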

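- Parameter estimation via EM
Complete log likelihood:
$$l_c(\theta \mid D_c) = \sum_n \sum_i z_n^i \log \left[ \pi_i(x_n, \xi) \, p(y_n \mid x_n, \theta_i) \right]$$
where $\pi_i(x_n, \xi) = p(z_n^i = 1 \mid x_n, \xi)$.

Using the expectation as the “best guess” for $z_n^i$, we have
$$E[z_n^i \mid x_n, y_n, \theta^{(t)}] = p(z_n^i = 1 \mid x_n, y_n, \theta^{(t)}) = \tau_n^{i(t)}$$

- The expected complete log likelihood can then be written as
$$\langle l_c(\theta \mid D_c) \rangle = \sum_n \sum_i \tau_n^{i(t)} \log \pi_i(x_n, \xi) + \sum_n \sum_i \tau_n^{i(t)} \log p(y_n \mid x_n, \theta_i)$$
- Taking partial derivatives and setting them to zero gives the update formulas for EM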

Summary of EM algorithm for conditional mixture

- (E step): Calculate the posterior probabilities $\tau_n^{i(t)}$
- (M step): Use the IRLS algorithm to update the parameters $\xi$, based on the data pairs $(x_n, \tau_n^{(t)})$.
- (M step): Use the weighted IRLS algorithm to update the parameters $\theta_i$, based on the data points $(x_n, y_n)$, with weights $\tau_n^{i(t)}$.
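The three steps can be sketched for a deliberately small case: two Gaussian components with scalar x and y, no intercepts, and a logistic gate `p(z = 1 | x) = logistic(xi * x)`. These simplifications are assumptions made for illustration. The gate update takes a single IRLS (Newton) step per iteration rather than running IRLS to convergence, and for Gaussian components the weighted IRLS update reduces to one weighted least-squares solve. The function name and the synthetic data are hypothetical.

```python
import math
import random

def logistic(a):
    if a >= 0:                       # numerically stable for either sign
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def gauss_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def em_conditional_mixture(xs, ys, xi, betas, variances, n_iter=50):
    for _ in range(n_iter):
        # E step: posterior probability tau_n that point n came from component 1.
        taus = []
        for x, y in zip(xs, ys):
            g1 = logistic(xi * x)
            w0 = (1.0 - g1) * gauss_pdf(y, betas[0] * x, variances[0])
            w1 = g1 * gauss_pdf(y, betas[1] * x, variances[1])
            taus.append(w1 / (w0 + w1))
        # M step (gate): one IRLS/Newton step on the pairs (x_n, tau_n).
        grad = sum(x * (t - logistic(xi * x)) for x, t in zip(xs, taus))
        hess = sum(x * x * logistic(xi * x) * (1.0 - logistic(xi * x)) for x in xs)
        xi += grad / hess
        # M step (components): weighted least squares with weights tau_n^i.
        for i in (0, 1):
            w = [t if i == 1 else 1.0 - t for t in taus]
            sxx = sum(wi * x * x for wi, x in zip(w, xs))
            sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
            betas[i] = sxy / sxx
            resid = sum(wi * (y - betas[i] * x) ** 2 for wi, x, y in zip(w, xs, ys))
            variances[i] = resid / sum(w)
    return xi, betas, variances

rng = random.Random(0)
xs, ys = [], []
for _ in range(400):                 # assumed synthetic data drawn from the model itself
    x = rng.uniform(-2.0, 2.0)
    z = 1 if rng.random() < logistic(2.0 * x) else 0
    xs.append(x)
    ys.append((2.0 if z == 1 else -1.0) * x + rng.gauss(0.0, 0.2))

xi, betas, variances = em_conditional_mixture(xs, ys, 0.0, [-0.5, 1.0], [1.0, 1.0])
```

With more than two components or vector-valued inputs, the gate update becomes a multiclass IRLS problem and the component update a matrix-valued weighted least-squares problem; the structure of the loop stays the same.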


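- $X$ - all observable variables
- $Z$ - all latent variables
- $\theta$ - all parameters
Suppose $Z$ is observed; the ML estimate is
$$\hat{\theta} = \arg\max_\theta \, l_c(\theta; x, z) = \arg\max_\theta \, \log p(x, z \mid \theta)$$

However, $Z$ is in fact not observed.

Complete log likelihood
$$l_c(\theta; x, z) = \log p(x, z \mid \theta)$$

Incomplete log likelihood
$$l(\theta; x) = \log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta)$$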

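- Suppose $p(x, z \mid \theta)$ factors in some way; the complete log likelihood then becomes a sum of the logs of the factors.
- Since $Z$ is unknown, it is not clear how to solve this ML estimation. However, we can average over the random variable $Z$ using an averaging distribution $q(z \mid x)$:
$$\langle l_c(\theta; x, z) \rangle_q = \sum_z q(z \mid x) \log p(x, z \mid \theta)$$

- Using $\langle l_c(\theta; x, z) \rangle_q$ as an estimate of $l_c(\theta; x, z)$, the complete log likelihood becomes the expected complete log likelihood.
- The expected complete log likelihood is solvable, and the hope is that maximizing it also improves the incomplete log likelihood in some way. (The basic idea behind EM.)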

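- EM maximizes the incomplete log likelihood

Jensen’s Inequality
$$l(\theta; x) = \log \sum_z p(x, z \mid \theta) = \log \sum_z q(z \mid x) \frac{p(x, z \mid \theta)}{q(z \mid x)} \ge \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)}$$

Auxiliary Function
$$L(q, \theta) = \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)}$$

- Given $q$, maximizing $L(q, \theta)$ over $\theta$ is equivalent to maximizing the expected complete log likelihood, because
$$L(q, \theta) = \langle l_c(\theta; x, z) \rangle_q - \sum_z q(z \mid x) \log q(z \mid x)$$
and the second term does not depend on $\theta$.

- Given $\theta^{(t)}$, the choice $q^{(t+1)}(z \mid x) = p(z \mid x, \theta^{(t)})$ yields the maximum of $L(q, \theta^{(t)})$.

Note: $l(\theta^{(t)}; x)$ is an upper bound on $L(q, \theta^{(t)})$, and this choice of $q$ attains it.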
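The bound can be checked numerically on a toy model. The sketch below assumes a two-component Gaussian mixture with unit variance and a single observation; the parameter values and function names are illustrative only.

```python
import math

def joint(x, pis, mus):
    """p(x, z = i | theta) for a unit-variance Gaussian mixture."""
    return [p * math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2 * math.pi)
            for p, m in zip(pis, mus)]

def incomplete_ll(pxz):
    """l(theta; x) = log sum_z p(x, z | theta)."""
    return math.log(sum(pxz))

def auxiliary(q, pxz):
    """L(q, theta) = sum_z q(z) log(p(x, z | theta) / q(z))."""
    return sum(qz * math.log(pz / qz) for qz, pz in zip(q, pxz) if qz > 0)

pxz = joint(0.5, [0.4, 0.6], [-1.0, 2.0])
ll = incomplete_ll(pxz)
posterior = [p / sum(pxz) for p in pxz]

# L(q, theta) for several averaging distributions q, and for the posterior itself.
bounds = [auxiliary([a, 1.0 - a], pxz) for a in (0.1, 0.3, 0.5, 0.7, 0.9)]
tight = auxiliary(posterior, pxz)
```

Every entry of `bounds` lies at or below `ll`, while `tight` matches `ll` up to rounding error, which is the E-step choice of $q$.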

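- From the above, at every step of EM we maximize $L(q, \theta)$.
- However, how do we know whether the finally maximized $L(q, \theta)$ also maximizes the incomplete log likelihood $l(\theta; x)$?

- The difference between $l(\theta; x)$ and $L(q, \theta)$ is a KL divergence:
$$l(\theta; x) - L(q, \theta) = D\left( q(z \mid x) \,\|\, p(z \mid x, \theta) \right)$$
which is non-negative and uniquely minimized (equal to zero) at $q(z \mid x) = p(z \mid x, \theta)$. After each E step the bound is therefore tight, so the increase in $L$ achieved by the M step cannot decrease $l$.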
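The KL form of the gap follows in a few lines from the definitions of the incomplete log likelihood $l(\theta; x) = \log p(x \mid \theta)$ and the auxiliary function $L(q, \theta)$. A sketch of the derivation, using $\sum_z q(z \mid x) = 1$ in the first step:

```latex
\begin{align*}
l(\theta; x) - L(q, \theta)
  &= \sum_z q(z \mid x) \log p(x \mid \theta)
   - \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)} \\
  &= \sum_z q(z \mid x) \log \frac{q(z \mid x) \, p(x \mid \theta)}{p(x, z \mid \theta)} \\
  &= \sum_z q(z \mid x) \log \frac{q(z \mid x)}{p(z \mid x, \theta)} \\
  &= D\big( q(z \mid x) \,\|\, p(z \mid x, \theta) \big)
\end{align*}
```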

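- EM and alternating minimization
- Recall that maximizing the likelihood is exactly the same as minimizing the KL divergence between the empirical distribution and the model.
- Including the latent variable $Z$, the KL divergence becomes a “complete KL divergence” between joint distributions on $(X, Z)$:
$$D(q \,\|\, \theta) = \sum_x \sum_z \tilde{p}(x) \, q(z \mid x) \log \frac{\tilde{p}(x) \, q(z \mid x)}{p(x, z \mid \theta)}$$
where $\tilde{p}(x)$ is the empirical distribution.

- Reformulated EM algorithm
- (E step) $q^{(t+1)} = \arg\min_q D(q \,\|\, \theta^{(t)})$
- (M step) $\theta^{(t+1)} = \arg\min_\theta D(q^{(t+1)} \,\|\, \theta)$

Alternating minimization algorithm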
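To see why the two minimizations reproduce the usual E and M steps, the complete KL divergence can be expanded in terms of the auxiliary function. In this sketch, $\tilde{p}(x)$ denotes the empirical distribution, $q(z \mid x)$ the averaging distribution, and $L(q, \theta; x) = \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)}$ the auxiliary function for a single observation $x$.

```latex
\begin{align*}
D(q \,\|\, \theta)
  &= \sum_x \tilde{p}(x) \sum_z q(z \mid x)
     \log \frac{\tilde{p}(x) \, q(z \mid x)}{p(x, z \mid \theta)} \\
  &= \sum_x \tilde{p}(x) \log \tilde{p}(x)
   - \sum_x \tilde{p}(x) \sum_z q(z \mid x)
     \log \frac{p(x, z \mid \theta)}{q(z \mid x)} \\
  &= \sum_x \tilde{p}(x) \log \tilde{p}(x)
   - \sum_x \tilde{p}(x) \, L(q, \theta; x)
\end{align*}
```

The first term depends on neither $q$ nor $\theta$, so minimizing $D(q \,\|\, \theta)$ over $q$ or over $\theta$ is the same as maximizing the averaged auxiliary function over that argument.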

- Unconditional Mixture
- Graphical model
- EM algorithm

- Conditional Mixture
- Graphical model
- EM algorithm

- A general formulation of EM algorithm
- Maximizing auxiliary function
- Minimizing “complete KL divergence”

Thank You!