
# Incomplete Graphical Models - PowerPoint PPT Presentation

Incomplete Graphical Models. Nan Hu. Outline . Motivation K-means clustering Coordinate Descending algorithm Density estimation EM on unconditional mixture Regression and classification EM on conditional mixture A general formulation of EM Algorithm. K-means clustering.


### Incomplete Graphical Models

Nan Hu

• Motivation

• K-means clustering

• Coordinate descent algorithm

• Density estimation

• EM on unconditional mixture

• Regression and classification

• EM on conditional mixture

• A general formulation of EM Algorithm

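Problem: Given a set of observations $\{x_1, \ldots, x_N\}$, how do we group them into $K$ clusters, supposing the value of $K$ is given?

• First phase: assign each point to its nearest cluster mean,
$$r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \lVert x_n - \mu_j \rVert^2 \\ 0 & \text{otherwise} \end{cases}$$

• Second phase: recompute each mean from the points assigned to it,
$$\mu_k = \frac{\sum_n r_{nk}\, x_n}{\sum_n r_{nk}}$$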

[Figure: K-means on an example data set; panels show the original set and the first, second, and third iterations.]

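• Coordinate descent algorithm

• The algorithm tries to minimize the distortion measure
$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk}\, \lVert x_n - \mu_k \rVert^2$$

by setting the partial derivatives to zero: the first phase minimizes $J$ over the $r_{nk}$ with the $\mu_k$ fixed, and the second phase minimizes $J$ over the $\mu_k$ with the $r_{nk}$ fixed, since $\partial J / \partial \mu_k = -2 \sum_n r_{nk} (x_n - \mu_k) = 0$.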
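A minimal sketch of the two phases as coordinate descent on $J$, assuming scalar data and initial means drawn at random from the data (the function name `kmeans_1d`, the seed, and the initialization are illustrative choices, not from the slides). It records $J$ after each iteration, so one can see that it never increases.

```python
import random


def kmeans_1d(xs, k, iters=20, seed=0):
    """K-means on scalar data as coordinate descent on the distortion J.

    Returns (means, assignments, history), where history[t] is J after iteration t.
    """
    rng = random.Random(seed)
    means = rng.sample(xs, k)  # hypothetical init: k distinct data points
    assign = [0] * len(xs)
    history = []
    for _ in range(iters):
        # First phase: minimize J over r_nk with the means fixed (nearest mean wins).
        assign = [min(range(k), key=lambda j: (x - means[j]) ** 2) for x in xs]
        # Second phase: minimize J over mu_k with the assignments fixed (cluster average).
        for j in range(k):
            members = [x for x, a in zip(xs, assign) if a == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = sum(members) / len(members)
        history.append(sum((x - means[a]) ** 2 for x, a in zip(xs, assign)))
    return means, assign, history


data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
means, assign, history = kmeans_1d(data, 2)
print(sorted(round(m, 3) for m in means), assign)
print(history)
```

Each phase can only lower $J$ or leave it unchanged, which is why the recorded history is non-increasing; it says nothing about reaching the global minimum, which depends on the initialization.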

Problem: If the given sample data exhibit a multimodal density, how do we estimate the true density?

Fitting a single density to this bimodal case:

Although the algorithm converges, the result bears little relationship to the truth.

• A “divide-and-conquer” way to solve this problem

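• Introducing a latent variable $Z$: a multinomial node taking on one of $K$ values

[Figure: graphical model $Z \rightarrow X$]

Assign a density model to each subpopulation; the overall density is
$$p(x \mid \theta) = \sum_{k=1}^{K} p(z^k = 1 \mid \pi)\, p(x \mid z^k = 1, \theta_k) = \sum_{k=1}^{K} \pi_k\, f_k(x \mid \theta_k)$$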
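To make the bimodal problem concrete, here is a small sketch with hypothetical parameters (two equally weighted Gaussian subpopulations at $\pm 3$ with standard deviation $0.5$; these values are assumptions for illustration). It evaluates the mixture density $\sum_k \pi_k f_k(x)$ and compares it with a maximum-likelihood fit of a single Gaussian, which puts its peak near $x = 0$, where the true density is almost zero.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Univariate Gaussian density N(x | mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def mixture_pdf(x, weights, mus, sigmas):
    """Overall density p(x) = sum_k pi_k f_k(x) with Gaussian components f_k."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))


# Hypothetical bimodal ground truth: two equally weighted subpopulations.
true_w, true_mu, true_sd = [0.5, 0.5], [-3.0, 3.0], [0.5, 0.5]
rng = random.Random(1)
zs = [rng.randrange(2) for _ in range(2000)]             # latent subpopulation labels
data = [rng.gauss(true_mu[z], true_sd[z]) for z in zs]

# Maximum-likelihood fit of a single Gaussian: sample mean and (biased) sample std.
m = sum(data) / len(data)
s = math.sqrt(sum((x - m) ** 2 for x in data) / len(data))

print(f"single Gaussian fit: mean={m:.2f}, sd={s:.2f}")
print(f"density at x=0: single fit {normal_pdf(0.0, m, s):.3f}, "
      f"true mixture {mixture_pdf(0.0, true_w, true_mu, true_sd):.1e}")
```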

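• Gaussian Mixture Models

• In this model, the mixture components are Gaussian distributions with parameters $\theta_k = (\mu_k, \Sigma_k)$

• Probability model for a Gaussian mixture
$$p(x \mid \theta) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

• Posterior probability of the latent variable $Z$:
$$\tau_n^k \equiv p(z_n^k = 1 \mid x_n, \theta) = \frac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$$

• Log likelihood:
$$\ell(\theta \mid \mathcal{D}) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)$$

• Partial derivative of $\ell$ with respect to $\pi_k$, using a Lagrange multiplier $\lambda$ for the constraint $\sum_k \pi_k = 1$:
$$\frac{\partial}{\partial \pi_k}\Big[\ell + \lambda\Big(1 - \sum_{j} \pi_j\Big)\Big] = \sum_{n=1}^{N} \frac{\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} - \lambda = 0$$

• Solving it (multiply by $\pi_k$ to get $\sum_n \tau_n^k = \lambda \pi_k$, then sum over $k$ to get $\lambda = N$), we have
$$\pi_k = \frac{1}{N} \sum_{n=1}^{N} \tau_n^k$$

• Partial derivative of $\ell$ with respect to $\mu_k$:
$$\frac{\partial \ell}{\partial \mu_k} = \sum_{n=1}^{N} \tau_n^k\, \Sigma_k^{-1} (x_n - \mu_k)$$

• Setting it to zero, we have
$$\mu_k = \frac{\sum_{n} \tau_n^k\, x_n}{\sum_{n} \tau_n^k}$$

• Partial derivative of $\ell$ with respect to $\Sigma_k^{-1}$:
$$\frac{\partial \ell}{\partial \Sigma_k^{-1}} = \frac{1}{2} \sum_{n=1}^{N} \tau_n^k \left[\Sigma_k - (x_n - \mu_k)(x_n - \mu_k)^\top\right]$$

• Setting it to zero, we have
$$\Sigma_k = \frac{\sum_{n} \tau_n^k\, (x_n - \mu_k)(x_n - \mu_k)^\top}{\sum_{n} \tau_n^k}$$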

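• The EM Algorithm

The update equations above are coupled, because $\tau_n^k$ itself depends on $\pi$, $\mu$ and $\Sigma$, so they are solved iteratively:

• First phase (E step): compute the posteriors $\tau_n^{k(t)}$ from the current parameters $\theta^{(t)}$

• Second phase (M step): recompute $\pi_k^{(t+1)}$, $\mu_k^{(t+1)}$ and $\Sigma_k^{(t+1)}$ from the update equations, holding $\tau_n^{k(t)}$ fixed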
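A minimal sketch of the two phases for a one-dimensional mixture, so each $\Sigma_k$ is a scalar variance. The quantile-based initialization of the means and the small variance floor are our assumptions, not part of the derivation. The returned history holds the log likelihood $\ell$ evaluated at each E step, which EM should never decrease.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Univariate Gaussian density N(x | mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, k, iters=100):
    """EM for a 1-D Gaussian mixture. Returns (pi, mu, var, loglik_history)."""
    n = len(xs)
    srt = sorted(xs)
    pi = [1.0 / k] * k
    mu = [srt[int((j + 0.5) * n / k)] for j in range(k)]  # assumption: quantile init
    mean = sum(xs) / n
    var = [sum((x - mean) ** 2 for x in xs) / n] * k      # start at the data variance
    history = []
    for _ in range(iters):
        # E step: tau[i][j] = p(z_i = j | x_i, theta), plus the current log likelihood.
        tau, loglik = [], 0.0
        for x in xs:
            joint = [pi[j] * normal_pdf(x, mu[j], var[j]) for j in range(k)]
            total = sum(joint)
            loglik += math.log(total)
            tau.append([p / total for p in joint])
        history.append(loglik)
        # M step: closed-form updates for pi_k, mu_k and the (scalar) variance.
        for j in range(k):
            nj = sum(t[j] for t in tau)
            pi[j] = nj / n
            mu[j] = sum(t[j] * x for t, x in zip(tau, xs)) / nj
            var[j] = sum(t[j] * (x - mu[j]) ** 2 for t, x in zip(tau, xs)) / nj
            var[j] = max(var[j], 1e-6)  # assumption: floor guards against a collapsing component
    return pi, mu, var, history


rng = random.Random(0)
data = [rng.gauss(-3.0, 0.5) if rng.random() < 0.5 else rng.gauss(3.0, 0.5) for _ in range(500)]
pi, mu, var, history = em_gmm_1d(data, 2)
print([round(p, 2) for p in pi], [round(m, 2) for m in mu], [round(v, 2) for v in var])
```

Initializing the means at spread-out quantiles is a pragmatic choice: starting both components in the same cluster can leave EM near a symmetric saddle point for many iterations.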


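• The EM algorithm from the expected complete log likelihood point of view

Suppose we observed the latent variables $z_n$.

Then the data set would be completely observed, and the log likelihood, called the complete log likelihood, would be
$$\ell_c(\theta \mid \mathcal{D}) = \sum_{n=1}^{N} \log p(x_n, z_n \mid \theta) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_n^k \left[\log \pi_k + \log \mathcal{N}(x_n \mid \mu_k, \Sigma_k)\right]$$

We treat the $z_n^k$ as random variables and take expectations conditioned on $X$ and $\theta^{(t)}$.

Note that the $z_n^k$ are binary random variables, where
$$\mathbb{E}\left[z_n^k \mid x_n, \theta^{(t)}\right] = p(z_n^k = 1 \mid x_n, \theta^{(t)}) = \tau_n^{k(t)}$$

Using this as the “best guess” for $z_n^k$, we have the

expected complete log likelihood
$$\langle \ell_c(\theta \mid \mathcal{D}) \rangle = \sum_{n=1}^{N} \sum_{k=1}^{K} \tau_n^{k(t)} \left[\log \pi_k + \log \mathcal{N}(x_n \mid \mu_k, \Sigma_k)\right]$$

• Maximizing the expected complete log likelihood by setting its derivatives to zero (with the constraint $\sum_k \pi_k = 1$), we have
$$\pi_k^{(t+1)} = \frac{1}{N}\sum_{n} \tau_n^{k(t)}, \qquad \mu_k^{(t+1)} = \frac{\sum_{n} \tau_n^{k(t)} x_n}{\sum_{n} \tau_n^{k(t)}}, \qquad \Sigma_k^{(t+1)} = \frac{\sum_{n} \tau_n^{k(t)} (x_n - \mu_k^{(t+1)})(x_n - \mu_k^{(t+1)})^\top}{\sum_{n} \tau_n^{k(t)}}$$

which are the same updates as in the EM algorithm derived from the log likelihood.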
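As a numerical sanity check, the closed-form updates should give a larger expected complete log likelihood than any nearby parameter setting. This sketch uses hypothetical data and responsibilities (one-dimensional components, $K = 2$); the helper names are illustrative.

```python
import math


def log_normal(x, mu, var):
    """log N(x | mu, var) for a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mu) ** 2 / var


def expected_complete_ll(xs, tau, pi, mu, var):
    """<l_c> = sum_n sum_k tau_nk [log pi_k + log N(x_n | mu_k, var_k)]."""
    return sum(t[k] * (math.log(pi[k]) + log_normal(x, mu[k], var[k]))
               for x, t in zip(xs, tau) for k in range(len(pi)))


def m_step(xs, tau):
    """Closed-form maximizer of <l_c> for fixed responsibilities tau."""
    n, K = len(xs), len(tau[0])
    nk = [sum(t[k] for t in tau) for k in range(K)]
    pi = [nk[k] / n for k in range(K)]
    mu = [sum(t[k] * x for x, t in zip(xs, tau)) / nk[k] for k in range(K)]
    var = [sum(t[k] * (x - mu[k]) ** 2 for x, t in zip(xs, tau)) / nk[k] for k in range(K)]
    return pi, mu, var


# Hypothetical data and responsibilities (each row of tau sums to 1).
xs = [-2.1, -1.7, -2.4, 0.3, 1.9, 2.2, 2.6]
tau = [[0.9, 0.1], [0.95, 0.05], [0.85, 0.15], [0.5, 0.5], [0.1, 0.9], [0.05, 0.95], [0.2, 0.8]]

pi, mu, var = m_step(xs, tau)
best = expected_complete_ll(xs, tau, pi, mu, var)

# Nudge each kind of parameter away from the closed-form solution (pi stays normalized).
perturbed = []
for eps in (0.05, -0.05):
    perturbed.append(expected_complete_ll(xs, tau, [pi[0] + eps, pi[1] - eps], mu, var))
    perturbed.append(expected_complete_ll(xs, tau, pi, [mu[0] + eps, mu[1]], var))
    perturbed.append(expected_complete_ll(xs, tau, pi, mu, [var[0], var[1] + eps]))
print(f"<l_c> at the M-step solution: {best:.4f}; best perturbed value: {max(perturbed):.4f}")
```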

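• Graphical model for regression and classification

[Figure: graphical model with edges $X \rightarrow Z$, $X \rightarrow Y$, and $Z \rightarrow Y$]

Latent variable $Z$: a multinomial node taking on one of $K$ values.

The relationship between $X$ and $Z$ can be modeled discriminatively, e.g., with a softmax function:
$$p(z^k = 1 \mid x, \xi) = \frac{e^{\xi_k^\top x}}{\sum_{j=1}^{K} e^{\xi_j^\top x}}$$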

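• By marginalizing over $Z$,
$$p(y \mid x, \theta) = \sum_{k=1}^{K} p(z^k = 1 \mid x, \xi)\, p(y \mid z^k = 1, x, \theta_k)$$

• $X$ is taken to be always observed. The posterior probability is defined as
$$\tau^k \equiv p(z^k = 1 \mid x, y, \theta) = \frac{p(z^k = 1 \mid x, \xi)\, p(y \mid z^k = 1, x, \theta_k)}{\sum_{j} p(z^j = 1 \mid x, \xi)\, p(y \mid z^j = 1, x, \theta_j)}$$

• Some specific choices of mixture components

• Gaussian components
$$p(y \mid z^k = 1, x, \theta_k) = \mathcal{N}(y \mid \beta_k^\top x, \sigma_k^2)$$

• Logistic components
$$p(y \mid z^k = 1, x, \theta_k) = \mu(\beta_k^\top x)^{y} \left[1 - \mu(\beta_k^\top x)\right]^{1 - y}$$

where $\mu(\cdot)$ is the logistic function: $\mu(u) = \dfrac{1}{1 + e^{-u}}$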

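• Parameter estimation via EM

Complete log likelihood:
$$\ell_c(\theta \mid \mathcal{D}) = \sum_{n} \sum_{k} z_n^k \left[\log p(z_n^k = 1 \mid x_n, \xi) + \log p(y_n \mid z_n^k = 1, x_n, \theta_k)\right]$$

Using the expectation as the “best guess”, we have
$$\mathbb{E}\left[z_n^k \mid x_n, y_n, \theta^{(t)}\right] = p(z_n^k = 1 \mid x_n, y_n, \theta^{(t)}) = \tau_n^{k(t)}$$

• The expected complete log likelihood can then be written as
$$\langle \ell_c(\theta \mid \mathcal{D}) \rangle = \sum_{n} \sum_{k} \tau_n^{k(t)} \log p(z_n^k = 1 \mid x_n, \xi) + \sum_{n} \sum_{k} \tau_n^{k(t)} \log p(y_n \mid z_n^k = 1, x_n, \theta_k)$$

• Taking partial derivatives and setting them to zero gives the EM update formulas. The gating parameters $\xi$ appear only in the first sum and the component parameters $\theta_k$ only in the second, so the two sets can be updated separately.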

Summary of EM algorithm for conditional mixture

• (E step): Calculate the posterior probabilities $\tau_n^{k(t)}$.

• (M step): Use the IRLS algorithm to update the gating parameters $\xi$, based on the data pairs $(x_n, \tau_n^{(t)})$.

• (M step): Use the weighted IRLS algorithm to update the component parameters $\theta_k$, based on the data points $(x_n, y_n)$, with weights $\tau_n^{k(t)}$.
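A minimal sketch of this EM loop under simplifying assumptions: $K = 2$ experts, scalar input with features $(1, x)$, Gaussian (linear-regression) components, and a logistic gate, which is what the softmax reduces to for two classes. For Gaussian components the weighted IRLS step reduces to a single weighted least-squares solve, because the Gaussian log likelihood is quadratic in $\beta_k$. The function names, initial values, synthetic data, and the tiny ridge term added to the gate's Hessian are all illustrative assumptions.

```python
import math
import random


def sigmoid(u):
    """Numerically safe logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def normal_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def solve2(a, b):
    """Solve the 2x2 linear system a @ w = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def dot(w, phi):
    return w[0] * phi[0] + w[1] * phi[1]


def em_two_experts(xs, ys, iters=50):
    """EM for a conditional mixture of 2 linear-Gaussian experts with a logistic gate.

    Features phi(x) = (1, x); gate p(z=1 | x) = sigmoid(xi . phi);
    expert k: p(y | z=k, x) = N(y | beta_k . phi, var_k).
    """
    phis = [(1.0, x) for x in xs]
    xi = [0.0, 1.0]                    # hypothetical initial gate
    beta = [[0.0, -1.0], [0.0, 1.0]]   # hypothetical initial experts
    var = [1.0, 1.0]
    for _ in range(iters):
        # E step: tau_n = p(z_n = 1 | x_n, y_n), the posterior weight of expert 1.
        tau = []
        for phi, y in zip(phis, ys):
            g = sigmoid(dot(xi, phi))
            p1 = g * normal_pdf(y, dot(beta[1], phi), var[1])
            p0 = (1.0 - g) * normal_pdf(y, dot(beta[0], phi), var[0])
            tau.append(p1 / (p1 + p0))
        # M step for the gate: IRLS (Newton) steps for logistic regression on soft targets tau.
        for _ in range(5):
            grad = [0.0, 0.0]
            hess = [[1e-6, 0.0], [0.0, 1e-6]]  # assumption: tiny ridge keeps it invertible
            for phi, t in zip(phis, tau):
                g = sigmoid(dot(xi, phi))
                for i in range(2):
                    grad[i] += (t - g) * phi[i]
                    for j in range(2):
                        hess[i][j] += g * (1.0 - g) * phi[i] * phi[j]
            step = solve2(hess, grad)
            xi = [xi[0] + step[0], xi[1] + step[1]]
        # M step for the experts: weighted least squares with weights tau or 1 - tau.
        for k in range(2):
            ws = [t if k == 1 else 1.0 - t for t in tau]
            lhs = [[sum(w * phi[i] * phi[j] for w, phi in zip(ws, phis)) for j in range(2)]
                   for i in range(2)]
            rhs = [sum(w * phi[i] * y for w, phi, y in zip(ws, phis, ys)) for i in range(2)]
            beta[k] = solve2(lhs, rhs)
            resid2 = [(y - dot(beta[k], phi)) ** 2 for phi, y in zip(phis, ys)]
            var[k] = max(sum(w * r for w, r in zip(ws, resid2)) / sum(ws), 1e-6)
    return xi, beta, var


rng = random.Random(0)
xs = [rng.uniform(-3.0, 3.0) for _ in range(400)]
ys = []
for x in xs:
    if rng.random() < sigmoid(2.0 * x):                  # hypothetical true gate
        ys.append(1.0 + 2.0 * x + rng.gauss(0, 0.3))    # expert 1
    else:
        ys.append(-1.0 - 2.0 * x + rng.gauss(0, 0.3))   # expert 0
xi, beta, var = em_two_experts(xs, ys)
print("gate:", [round(v, 2) for v in xi])
print("experts:", [[round(v, 2) for v in bk] for bk in beta], [round(v, 3) for v in var])
```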


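• $X$: all observable variables

• $Z$: all latent variables

• $\theta$: all parameters

Suppose $Z$ were observed; the ML estimate would be
$$\hat{\theta}_{ML} = \arg\max_{\theta} \ell_c(\theta; x, z) = \arg\max_{\theta} \log p(x, z \mid \theta)$$

However, $Z$ is in fact not observed.

Complete log likelihood
$$\ell_c(\theta; x, z) = \log p(x, z \mid \theta)$$

Incomplete log likelihood
$$\ell(\theta; x) = \log p(x \mid \theta) = \log \sum_{z} p(x, z \mid \theta)$$

• Suppose $p(x, z \mid \theta)$ factors in some way, e.g., $p(x, z \mid \theta) = p(z \mid \theta_z)\, p(x \mid z, \theta_x)$; then the complete log likelihood decouples into
$$\ell_c(\theta; x, z) = \log p(z \mid \theta_z) + \log p(x \mid z, \theta_x)$$
whereas the incomplete log likelihood does not, because the logarithm sits outside the sum over $z$.

• Since $z$ is unknown, it is not clear how to solve this ML estimation. However, we can average over the random variable $z$ with an averaging distribution $q(z \mid x)$.

• Using $q(z \mid x)$ as an estimate of $z$, the complete log likelihood becomes the expected complete log likelihood
$$\langle \ell_c(\theta; x, z) \rangle_q = \sum_{z} q(z \mid x) \log p(x, z \mid \theta)$$

• This expected complete log likelihood is solvable, and hopefully maximizing it will also improve the incomplete log likelihood $\ell(\theta; x)$. (This is the basic idea behind EM.)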

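• EM maximizes the incomplete log likelihood

Jensen’s inequality gives, for any averaging distribution $q(z \mid x)$,
$$\ell(\theta; x) = \log \sum_{z} q(z \mid x) \frac{p(x, z \mid \theta)}{q(z \mid x)} \ge \sum_{z} q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)} \equiv \mathcal{L}(q, \theta)$$

Auxiliary function: $\mathcal{L}(q, \theta)$. EM performs coordinate ascent on it:
$$\text{(E step)}\ \ q^{(t+1)} = \arg\max_{q} \mathcal{L}(q, \theta^{(t)}), \qquad \text{(M step)}\ \ \theta^{(t+1)} = \arg\max_{\theta} \mathcal{L}(q^{(t+1)}, \theta)$$

• Given $q$, maximizing $\mathcal{L}(q, \theta)$ over $\theta$ is equivalent to maximizing the expected complete log likelihood, since
$$\mathcal{L}(q, \theta) = \sum_{z} q(z \mid x) \log p(x, z \mid \theta) - \sum_{z} q(z \mid x) \log q(z \mid x)$$
and the second term does not depend on $\theta$.

• Given $\theta^{(t)}$, the choice $q^{(t+1)}(z \mid x) = p(z \mid x, \theta^{(t)})$ yields the maximum of $\mathcal{L}(q, \theta^{(t)})$.

Note: $\ell(\theta^{(t)}; x)$ is an upper bound of $\mathcal{L}(q, \theta^{(t)})$, and this choice of $q$ attains it: $\mathcal{L}(p(z \mid x, \theta^{(t)}), \theta^{(t)}) = \ell(\theta^{(t)}; x)$.

• From the above, at every step of EM we maximize $\mathcal{L}(q, \theta)$.

• However, how do we know whether maximizing $\mathcal{L}(q, \theta)$ also maximizes the incomplete log likelihood $\ell(\theta; x)$?

• The difference between $\ell(\theta; x)$ and $\mathcal{L}(q, \theta)$ is
$$\ell(\theta; x) - \mathcal{L}(q, \theta) = \sum_{z} q(z \mid x) \log \frac{q(z \mid x)}{p(z \mid x, \theta)} = D\big(q(z \mid x) \,\|\, p(z \mid x, \theta)\big)$$

This is a KL divergence: it is non-negative and uniquely minimized (at zero) when $q(z \mid x) = p(z \mid x, \theta)$. Hence $\ell(\theta^{(t+1)}; x) \ge \mathcal{L}(q^{(t+1)}, \theta^{(t+1)}) \ge \mathcal{L}(q^{(t+1)}, \theta^{(t)}) = \ell(\theta^{(t)}; x)$, so no EM iteration decreases the incomplete log likelihood.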
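The decomposition $\ell(\theta; x) = \mathcal{L}(q, \theta) + D(q \,\|\, p(z \mid x, \theta))$ can be checked numerically. This sketch uses a single observation and a hypothetical two-component Gaussian mixture (all parameter values are made up): for any $q$ the two terms add up to the log likelihood, and with $q$ equal to the posterior the KL term vanishes, so the bound is tight.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


# Hypothetical parameters theta of a two-component mixture, and one observation x.
pi, mu, var = [0.3, 0.7], [-1.0, 2.0], [1.0, 0.5]
x = 0.4

joint = [pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]  # p(x, z = k | theta)
loglik = math.log(sum(joint))                                     # l(theta; x)
posterior = [j / sum(joint) for j in joint]                       # p(z = k | x, theta)


def aux(q):
    """Auxiliary function L(q, theta) = sum_z q(z) log(p(x, z | theta) / q(z))."""
    return sum(qk * math.log(jk / qk) for qk, jk in zip(q, joint) if qk > 0)


def kl(q, p):
    """KL divergence D(q || p) between two discrete distributions."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p) if qk > 0)


for q in ([0.5, 0.5], [0.9, 0.1], posterior):
    print(f"q={[round(v, 3) for v in q]}: L={aux(q):.6f}, KL={kl(q, posterior):.6f}, "
          f"L+KL={aux(q) + kl(q, posterior):.6f}, loglik={loglik:.6f}")
```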

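• EM and alternating minimization

• Recall that maximizing the likelihood is equivalent to minimizing the KL divergence between the empirical distribution $\tilde{p}(x)$ and the model $p(x \mid \theta)$.

• Including the latent variable $z$, the KL divergence becomes a “complete KL divergence” between joint distributions on $(x, z)$:
$$D\big(\tilde{p}(x)\, q(z \mid x) \,\|\, p(x, z \mid \theta)\big) = \sum_{x} \tilde{p}(x) \sum_{z} q(z \mid x) \log \frac{\tilde{p}(x)\, q(z \mid x)}{p(x, z \mid \theta)}$$

• Reformulated EM algorithm

• (E step)
$$q^{(t+1)} = \arg\min_{q} D\big(\tilde{p}(x)\, q(z \mid x) \,\|\, p(x, z \mid \theta^{(t)})\big)$$

• (M step)
$$\theta^{(t+1)} = \arg\min_{\theta} D\big(\tilde{p}(x)\, q^{(t+1)}(z \mid x) \,\|\, p(x, z \mid \theta)\big)$$

This is an alternating minimization algorithm.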

• Unconditional Mixture

• Graphical model

• EM algorithm

• Conditional Mixture

• Graphical model

• EM algorithm

• A general formulation of EM algorithm

• Maximizing auxiliary function

• Minimizing “complete KL divergence”

Thank You!