Incomplete graphical models
Nan Hu

Outline


  • Motivation

  • K-means clustering

    • Coordinate descent algorithm

  • Density estimation

    • EM on unconditional mixture

  • Regression and classification

    • EM on conditional mixture

  • A general formulation of EM Algorithm

K-means clustering

Problem: Given a set of observations x_1, ..., x_N, how do we group them into K clusters, supposing the value of K is given?

  • First phase: assign each observation to the cluster with the nearest mean

  • Second phase: recompute each cluster mean from the observations assigned to it

K-means clustering

[Figure: the original data set and the cluster assignments after the first, second, and third iterations]

K-means clustering

  • Coordinate descent algorithm

  • The algorithm is trying to minimize the distortion measure

    J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \| x_n - \mu_k \|^2

    (where r_{nk} = 1 if x_n is assigned to cluster k, else 0) by alternately setting the partial derivatives with respect to the assignments r_{nk} and the means \mu_k to zero
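The two coordinate-descent phases above can be sketched in a few lines of NumPy. This is a minimal illustration, not the presentation's own code; the initialization scheme and convergence test are simple choices made for the sketch:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimize the distortion J = sum_n sum_k r_nk ||x_n - mu_k||^2
    by alternating the two coordinate-descent phases."""
    rng = np.random.default_rng(seed)
    # Initialize the K means with randomly chosen observations.
    mu = X[rng.choice(len(X), K, replace=False)]
    for _ in range(n_iters):
        # First phase: assign each point to its nearest mean.
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        r = d.argmin(axis=1)
        # Second phase: recompute each mean from its assigned points
        # (keep the old mean if a cluster becomes empty).
        new_mu = np.array([X[r == k].mean(axis=0) if (r == k).any() else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu, r
```

Each phase can only decrease J, so the algorithm converges to a local minimum of the distortion.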

Unconditional Mixture

Problem: If the given sample data exhibit a multimodal density, how do we estimate the true density?

Fitting a single Gaussian density to this bimodal case: although the algorithm converges, the result bears little relationship to the truth.

Unconditional Mixture

  • A “divide-and-conquer” way to solve this problem

  • Introduce a latent variable Z, a multinomial node taking on one of K values

  • Assign a density model to each subpopulation; the overall density is the mixture

    p(x \mid \theta) = \sum_{k=1}^{K} \pi_k \, p(x \mid z^k = 1, \theta_k), \qquad \pi_k = p(z^k = 1)

Unconditional Mixture

  • Gaussian Mixture Models

    • In this model, the mixture components are Gaussian distributions with parameters \mu_k, \Sigma_k

  • Probability model for a Gaussian mixture:

    p(x \mid \theta) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)
Unconditional Mixture

  • Posterior probability of the latent variable Z (the "responsibility" of component k for observation x_n):

    \tau_k^n \equiv p(z_k^n = 1 \mid x_n, \theta) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}

  • Log likelihood:

    \ell(\theta \mid x) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)

Unconditional Mixture

  • Take the partial derivative of \ell with respect to \pi_k, using a Lagrange multiplier to enforce \sum_k \pi_k = 1

  • Solving it, we have

    \hat{\pi}_k = \frac{1}{N} \sum_{n=1}^{N} \tau_k^n

Unconditional Mixture

  • Take the partial derivative of \ell with respect to \mu_k

  • Setting it to zero, we have

    \hat{\mu}_k = \frac{\sum_{n=1}^{N} \tau_k^n x_n}{\sum_{n=1}^{N} \tau_k^n}

Unconditional Mixture

  • Take the partial derivative of \ell with respect to \Sigma_k

  • Setting it to zero, we have

    \hat{\Sigma}_k = \frac{\sum_{n=1}^{N} \tau_k^n (x_n - \hat{\mu}_k)(x_n - \hat{\mu}_k)^T}{\sum_{n=1}^{N} \tau_k^n}

Unconditional Mixture

  • The EM Algorithm

  • First phase (E step): compute the posterior probabilities \tau_k^n from the current parameter values

  • Second phase (M step): re-estimate \pi_k, \mu_k, \Sigma_k using the update formulas derived above
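The two phases can be assembled into a complete EM loop for a one-dimensional Gaussian mixture. A minimal sketch, not the presentation's own code; the quantile-based initialization and fixed iteration count are choices made for the illustration:

```python
import numpy as np

def em_gmm(x, K, n_iters=200):
    """EM for a 1-D Gaussian mixture.
    E step: tau[n, k] proportional to pi_k N(x_n | mu_k, var_k).
    M step: pi_k, mu_k, var_k from the tau-weighted averages."""
    N = len(x)
    pi = np.full(K, 1.0 / K)
    # Spread the initial means across the data using quantiles.
    mu = np.quantile(x, np.linspace(0.05, 0.95, K))
    var = np.full(K, x.var())
    for _ in range(n_iters):
        # E step: responsibilities tau_k^n (posterior of z given x),
        # computed in log space for numerical stability.
        logp = (-0.5 * (x[:, None] - mu) ** 2 / var
                - 0.5 * np.log(2 * np.pi * var) + np.log(pi))
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        tau = p / p.sum(axis=1, keepdims=True)
        # M step: maximize the expected complete log likelihood.
        Nk = tau.sum(axis=0)
        pi = Nk / N
        mu = (tau * x[:, None]).sum(axis=0) / Nk
        var = (tau * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    return pi, mu, var
```

On bimodal data of the kind shown earlier, the two recovered means settle near the two modes, unlike a single-Gaussian fit.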


Unconditional Mixture

  • EM algorithm from expected complete log likelihood point of view

    Suppose we observed the latent variables z^n;

    the data set would then be completely observed, and its likelihood defines the complete log likelihood

    \ell_c(\theta \mid x, z) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_k^n \left[ \log \pi_k + \log \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right]

Unconditional Mixture

We treat the z^n as random variables and take expectations conditioned on X and \theta.

Note that the z_k^n are binary r.v., where

E[z_k^n \mid x_n, \theta] = p(z_k^n = 1 \mid x_n, \theta) = \tau_k^n

Using this as the "best guess" for z_k^n, we have the expected complete log likelihood

\langle \ell_c(\theta \mid x, z) \rangle = \sum_{n=1}^{N} \sum_{k=1}^{K} \tau_k^n \left[ \log \pi_k + \log \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right]

Unconditional Mixture

  • Maximizing the expected complete log likelihood by setting the derivatives to zero recovers the same updates for \pi_k, \mu_k, and \Sigma_k as before

Conditional Mixture

  • Graphical Model

For regression and classification


Latent variable Z: a multinomial node taking on one of K values

The relationship between X and Z can be modeled in a discriminative classification way, e.g. with a softmax function:

p(z^k = 1 \mid x, \xi) = \frac{e^{\xi_k^T x}}{\sum_{j=1}^{K} e^{\xi_j^T x}}

Conditional Mixture

  • By marginalizing over Z,

    p(y \mid x, \theta) = \sum_{k=1}^{K} p(z^k = 1 \mid x, \xi) \, p(y \mid z^k = 1, x, \theta_k)

  • X is taken to be always observed. The posterior probability of the latent variable is defined as

    \tau_k^n = p(z^k = 1 \mid x_n, y_n, \theta) = \frac{p(z^k = 1 \mid x_n, \xi) \, p(y_n \mid z^k = 1, x_n, \theta_k)}{\sum_{j=1}^{K} p(z^j = 1 \mid x_n, \xi) \, p(y_n \mid z^j = 1, x_n, \theta_j)}

Conditional Mixture

  • Some specific choices of mixture components

    • Gaussian components (for regression):

      p(y \mid z^k = 1, x, \theta_k) = \mathcal{N}(y \mid \beta_k^T x, \sigma_k^2)

    • Logistic components (for binary classification):

      p(y \mid z^k = 1, x, \theta_k) = \mu(\theta_k^T x)^y \left[ 1 - \mu(\theta_k^T x) \right]^{1-y}

      where \mu(\cdot) is the logistic function \mu(z) = 1 / (1 + e^{-z})
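Putting the softmax gate and Gaussian expert components together, the conditional density p(y | x) can be evaluated directly. A small sketch with made-up parameters for two linear-regression experts (all weights below are illustrative, not from the slides):

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax of a vector."""
    e = np.exp(a - a.max())
    return e / e.sum()

def gauss_pdf(y, mean, var):
    return np.exp(-(y - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def mixture_of_experts_pdf(y, x, xi, beta, var):
    """p(y | x) = sum_k p(z^k=1 | x, xi) N(y | beta_k^T x, var_k)."""
    gate = softmax(xi @ x)    # softmax gating probabilities
    means = beta @ x          # each expert's linear prediction
    return sum(g * gauss_pdf(y, m, v) for g, m, v in zip(gate, means, var))

# Illustrative parameters for K = 2 experts on 2-D inputs.
xi = np.array([[1.0, 0.0], [-1.0, 0.0]])      # gating weights
beta = np.array([[2.0, 0.0], [-2.0, 0.0]])    # expert regression weights
var = np.array([0.5, 0.5])
```

For a given input x, the gate concentrates mass on one expert, so the conditional density peaks near that expert's prediction.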

Conditional Mixture

  • Parameter estimation via EM

    Complete log likelihood:

    \ell_c(\theta \mid x, y, z) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_k^n \left[ \log p(z^k = 1 \mid x_n, \xi) + \log p(y_n \mid z^k = 1, x_n, \theta_k) \right]

    Using the expectation \tau_k^n = E[z_k^n \mid x_n, y_n, \theta] as the "best guess" for z_k^n, we have

Conditional Mixture

  • The expected complete log likelihood can then be written as

    \langle \ell_c \rangle = \sum_{n=1}^{N} \sum_{k=1}^{K} \tau_k^n \left[ \log p(z^k = 1 \mid x_n, \xi) + \log p(y_n \mid z^k = 1, x_n, \theta_k) \right]

  • Taking partial derivatives and setting them to zero gives the update formulas for EM

Conditional Mixture

Summary of EM algorithm for conditional mixture

  • (E step): Calculate the posterior probabilities \tau_k^n

  • (M step): Use the IRLS algorithm to update the gating parameters \xi, based on the data pairs (x_n, \tau^n)

  • (M step): Use the weighted IRLS algorithm to update the expert parameters \theta_k, based on the data points (x_n, y_n), with weights \tau_k^n


General Formulation

  • X: all observable variables

  • Z: all latent variables

  • \theta: all parameters

    If Z were observed, the ML estimate would be

    \hat{\theta}_{ML} = \arg\max_\theta \log p(x, z \mid \theta)

    However, Z is in fact not observed.

Complete log likelihood: \ell_c(\theta; x, z) = \log p(x, z \mid \theta)

Incomplete log likelihood: \ell(\theta; x) = \log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta)

General Formulation

  • Suppose p(x, z \mid \theta) factors nicely; the complete log likelihood then decomposes into terms that can each be maximized separately

  • Since z is unknown, it is not clear how to carry out this ML estimation directly. However, we can average over the randomness of z

General Formulation

  • Using the posterior p(z \mid x, \theta) as an estimate of the distribution of z, the complete log likelihood becomes the expected complete log likelihood

    \langle \ell_c(\theta; x, z) \rangle = \sum_z p(z \mid x, \theta) \log p(x, z \mid \theta)

  • This expected complete log likelihood is solvable, and, hopefully, maximizing it will also improve the incomplete log likelihood in some way. (This is the basic idea behind EM.)

General Formulation

  • EM maximizes the incomplete log likelihood via a lower bound. By Jensen's inequality, for any distribution q(z \mid x),

    \ell(\theta; x) = \log \sum_z p(x, z \mid \theta) = \log \sum_z q(z \mid x) \, \frac{p(x, z \mid \theta)}{q(z \mid x)} \ge \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)} \equiv \mathcal{L}(q, \theta)

    \mathcal{L}(q, \theta) is called the auxiliary function.

General Formulation

  • Given q, maximizing \mathcal{L}(q, \theta) over \theta is equivalent to maximizing the expected complete log likelihood, since

    \mathcal{L}(q, \theta) = \sum_z q(z \mid x) \log p(x, z \mid \theta) - \sum_z q(z \mid x) \log q(z \mid x)

    and the second term does not depend on \theta

General Formulation

  • Given \theta, the choice q(z \mid x) = p(z \mid x, \theta) yields the maximum of \mathcal{L}(q, \theta).

Note: \ell(\theta; x) is an upper bound of \mathcal{L}(q, \theta), and this choice of q attains it

General Formulation

  • From the above, at every step of EM we maximize \mathcal{L}(q, \theta).

  • However, how do we know that maximizing \mathcal{L}(q, \theta) also maximizes the incomplete log likelihood \ell(\theta; x)?

General Formulation

  • The difference between \ell(\theta; x) and \mathcal{L}(q, \theta):

    \ell(\theta; x) - \mathcal{L}(q, \theta) = \sum_z q(z \mid x) \log \frac{q(z \mid x)}{p(z \mid x, \theta)} = D\big(q(z \mid x) \,\|\, p(z \mid x, \theta)\big)

    This is a KL divergence: non-negative, and uniquely minimized at q(z \mid x) = p(z \mid x, \theta)
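This identity can be checked numerically on a tiny discrete model. A sketch with a made-up joint distribution p(x, z) over one observed x and three latent values (all numbers below are illustrative):

```python
import numpy as np

# Made-up joint p(x, z = k) for a single observed x, k = 0..2.
# The incomplete likelihood p(x) is the sum over z.
p_xz = np.array([0.10, 0.25, 0.05])
ell = np.log(p_xz.sum())            # incomplete log likelihood log p(x)

def lower_bound(q):
    """Auxiliary function L(q, theta) = sum_z q(z) log [p(x, z) / q(z)]."""
    return np.sum(q * np.log(p_xz / q))

def kl(q, p):
    """KL divergence D(q || p) for discrete distributions."""
    return np.sum(q * np.log(q / p))

posterior = p_xz / p_xz.sum()       # p(z | x, theta)
q = np.array([0.5, 0.3, 0.2])       # an arbitrary distribution over z
```

For any q, the bound holds with a gap of exactly KL(q || posterior), and choosing q equal to the posterior makes the bound tight.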

General Formulation

  • EM and alternating minimization

    • Recall that maximizing the likelihood is exactly the same as minimizing the KL divergence between the empirical distribution and the model.

    • Including the latent variable z, the KL divergence becomes a "complete KL divergence" between joint distributions on (x, z).

General Formulation

  • Reformulated EM algorithm

    • (E step): q^{(t+1)} = \arg\min_q D\big(q \,\|\, \theta^{(t)}\big)

    • (M step): \theta^{(t+1)} = \arg\min_\theta D\big(q^{(t+1)} \,\|\, \theta\big)

This is an alternating minimization algorithm.


Summary

  • Unconditional Mixture

    • Graphical model

    • EM algorithm

  • Conditional Mixture

    • Graphical model

    • EM algorithm

  • A general formulation of EM algorithm

    • Maximizing auxiliary function

    • Minimizing “complete KL divergence”