Incomplete Graphical Models - PowerPoint PPT Presentation



Incomplete Graphical Models

Nan Hu

Outline

  • Motivation

  • K-means clustering

    • Coordinate descent algorithm

  • Density estimation

    • EM on unconditional mixture

  • Regression and classification

    • EM on conditional mixture

  • A general formulation of EM Algorithm


K-means clustering

Problem: Given a set of observations {x_1, …, x_N}, how do we group them into K clusters, assuming the value of K is given?

  • First phase: with the means fixed, assign each observation to its nearest cluster mean.

  • Second phase: with the assignments fixed, recompute each mean as the average of the observations assigned to it.


K-means clustering

[Figures: the original data set and the cluster assignments after the first, second, and third iterations.]


K-means clustering

  • Coordinate descent algorithm

  • The algorithm minimizes the distortion measure

      J = Σ_n Σ_k r_nk ‖x_n − μ_k‖²

    by alternately setting the partial derivatives with respect to the assignments r_nk and the means μ_k to zero.
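The two phases above can be sketched as a short coordinate-descent loop (a minimal illustration, not code from the slides; the function name and interface are assumptions):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Coordinate descent on the distortion J = sum_n ||x_n - mu_{c(n)}||^2."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]  # initial means
    for _ in range(n_iter):
        # First phase: means fixed, minimize J over the assignments.
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # (N, K) squared distances
        c = d.argmin(axis=1)
        # Second phase: assignments fixed, minimize J over the means.
        new_mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu, c
```

On two well-separated clusters this recovers the two centers after a few iterations.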


Unconditional Mixture

Problem: If the given sample data demonstrate a multimodal density, how do we estimate the true density?

Fitting a single density to this bimodal case: although the algorithm converges, the result bears little relationship to the truth.


Unconditional Mixture

  • A “divide-and-conquer” way to solve this problem

  • Introduce a latent variable Z: a multinomial node taking on one of K values, with P(Z = k) = π_k.

    Assign a density model f_k(x | θ_k) to each subpopulation; the overall density is

      p(x | θ) = Σ_k π_k f_k(x | θ_k)




Unconditional Mixture

  • Gaussian Mixture Models

    • In this model, the mixture components are Gaussian distributions with parameters μ_k and Σ_k.

  • Probability model for a Gaussian mixture:

      p(x | θ) = Σ_k π_k N(x | μ_k, Σ_k)
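As a small concrete illustration (1-D with scalar variances rather than full covariance matrices; a sketch, not code from the presentation), the mixture density can be evaluated stably in log space:

```python
import numpy as np

def log_gauss(x, mu, var):
    """Log density of a 1-D Gaussian N(x | mu, var)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def log_mixture(x, pi, mu, var):
    """log p(x | theta) = log sum_k pi_k N(x | mu_k, var_k), via log-sum-exp."""
    logs = np.log(pi) + log_gauss(x, mu, var)  # one term per mixture component
    m = logs.max()
    return m + np.log(np.exp(logs - m).sum())
```

The log-sum-exp trick avoids underflow when individual component densities are tiny.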


Unconditional Mixture

  • Posterior probability of the latent variable Z (the responsibility):

      τ_nk = P(Z_n = k | x_n, θ) = π_k N(x_n | μ_k, Σ_k) / Σ_j π_j N(x_n | μ_j, Σ_j)

  • Log likelihood:

      ℓ(θ | D) = Σ_n log Σ_k π_k N(x_n | μ_k, Σ_k)


Unconditional Mixture

  • Partial derivative of ℓ over π_k, using a Lagrange multiplier for the constraint Σ_k π_k = 1

  • Solving it, we have

      π_k = (1/N) Σ_n τ_nk


Unconditional Mixture

  • Partial derivative of ℓ over μ_k

  • Setting it to zero, we have

      μ_k = Σ_n τ_nk x_n / Σ_n τ_nk


Unconditional Mixture

  • Partial derivative of ℓ over Σ_k

  • Setting it to zero, we have

      Σ_k = Σ_n τ_nk (x_n − μ_k)(x_n − μ_k)ᵀ / Σ_n τ_nk


Unconditional Mixture

  • The EM Algorithm

  • First phase (E step): compute the posterior probabilities τ_nk from the current parameters.

  • Second phase (M step): update π_k, μ_k and Σ_k using the closed-form solutions above.
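The two phases can be sketched for a 1-D Gaussian mixture (an illustrative implementation assuming scalar variances and a simple quantile initialization, not code from the slides):

```python
import numpy as np

def em_gmm_1d(x, K, n_iter=200):
    """EM for a 1-D Gaussian mixture with K components."""
    pi = np.full(K, 1.0 / K)
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)  # spread initial means over the data
    var = np.full(K, x.var())
    for _ in range(n_iter):
        # E step: responsibilities tau_nk, shape (N, K).
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        tau = pi * dens
        tau /= tau.sum(axis=1, keepdims=True)
        # M step: the closed-form updates derived above.
        Nk = tau.sum(axis=0)
        pi = Nk / len(x)
        mu = (tau * x[:, None]).sum(axis=0) / Nk
        var = (tau * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    return pi, mu, var
```

Unlike K-means, each point contributes to every component, weighted by its responsibility.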



Unconditional Mixture

  • The EM algorithm from the expected complete log likelihood point of view

    Suppose we observed the latent variables z_n. The data set would then be completely observed, and the likelihood is the complete log likelihood

      ℓ_c(θ | X, Z) = Σ_n Σ_k z_nk [log π_k + log N(x_n | μ_k, Σ_k)]


Unconditional Mixture

We treat the z_n as random variables and take expectations conditioned on X and θ.

Note the z_nk are binary r.v., so

  E[z_nk | x_n, θ] = P(z_nk = 1 | x_n, θ) = τ_nk

Using this as the “best guess” for z_nk, we have the expected complete log likelihood

  ⟨ℓ_c(θ | X, Z)⟩ = Σ_n Σ_k τ_nk [log π_k + log N(x_n | μ_k, Σ_k)]


Unconditional Mixture

  • Maximizing the expected complete log likelihood by setting the derivatives to zero, we recover the same updates for π_k, μ_k and Σ_k as before.


Conditional Mixture

  • Graphical Model

    For regression and classification.

    The relationship between X and Z can be modeled in a discriminative classification way, e.g. with a softmax function:

      P(Z = k | x, ξ) = exp(ξ_kᵀ x) / Σ_j exp(ξ_jᵀ x)

    Latent variable Z: a multinomial node taking on one of K values.



Conditional Mixture

  • By marginalizing over Z,

      p(y | x, θ) = Σ_k P(Z = k | x, ξ) p(y | Z = k, x, θ_k)

  • X is taken to be always observed. The posterior probability of Z is defined as

      τ_nk = P(Z = k | x_n, y_n, θ) = P(Z = k | x_n, ξ) p(y_n | Z = k, x_n, θ_k) / Σ_j P(Z = j | x_n, ξ) p(y_n | Z = j, x_n, θ_j)


Conditional Mixture

  • Some specific choices of mixture components

    • Gaussian components (for regression):

        p(y | Z = k, x, θ_k) = N(y | β_kᵀ x, σ_k²)

    • Logistic components (for binary classification):

        p(y | Z = k, x, θ_k) = μ(θ_kᵀ x)ʸ (1 − μ(θ_kᵀ x))^(1−y)

      where μ(·) is the logistic function: μ(z) = 1 / (1 + e^(−z))


Conditional Mixture

  • Parameter estimation via EM

    Complete log likelihood:

      ℓ_c(θ | X, Y, Z) = Σ_n Σ_k z_nk [log P(Z = k | x_n, ξ) + log p(y_n | Z = k, x_n, θ_k)]

    Using the expectation E[z_nk | x_n, y_n, θ] = τ_nk as the “best guess” for z_nk, we have


Conditional Mixture

  • The expected complete log likelihood can then be written as

      ⟨ℓ_c⟩ = Σ_n Σ_k τ_nk [log P(Z = k | x_n, ξ) + log p(y_n | Z = k, x_n, θ_k)]

  • Taking partial derivatives and setting them to zero gives the update formulas for EM.


Conditional Mixture

Summary of the EM algorithm for conditional mixtures

  • (E step): Calculate the posterior probabilities τ_nk.

  • (M step): Use the IRLS algorithm to update the gating parameters ξ, based on the data pairs (x_n, τ_n).

  • (M step): Use the weighted IRLS algorithm to update the component parameters θ_k, based on the data points (x_n, y_n), with weights τ_nk.
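The E step can be sketched for Gaussian experts with a softmax gate (a minimal illustration with hypothetical parameter names, ξ → `Xi` and β_k → `Beta`; the IRLS M step itself is omitted):

```python
import numpy as np

def softmax(A):
    """Row-wise softmax: gate probabilities P(Z = k | x_n) from activations A[n, k]."""
    A = A - A.max(axis=1, keepdims=True)
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)

def e_step(X, y, Xi, Beta, var):
    """Posterior responsibilities tau_nk for a conditional mixture with a softmax
    gate P(Z = k | x) and Gaussian experts N(y | beta_k^T x, var_k).
    The tau_nk returned here are the weights fed to weighted IRLS in the M step."""
    gate = softmax(X @ Xi)                 # (N, K) gate probabilities
    resid = y[:, None] - X @ Beta          # (N, K) residual under each expert
    lik = np.exp(-0.5 * resid ** 2 / var) / np.sqrt(2 * np.pi * var)
    tau = gate * lik
    return tau / tau.sum(axis=1, keepdims=True)
```

Each row of the result sums to one, and experts that predict a point's output well receive most of its weight.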



General Formulation

  • X — all observable variables

  • Z — all latent variables

  • θ — all parameters

    Suppose Z were observed; the ML estimate would be

      θ̂_ML = argmax_θ log p(X, Z | θ)

    However, Z is in fact not observed.

Complete log likelihood: ℓ_c(θ; X, Z) = log p(X, Z | θ)

Incomplete log likelihood: ℓ(θ; X) = log p(X | θ) = log Σ_Z p(X, Z | θ)


General Formulation

  • Suppose p(X, Z | θ) factors in some way; the complete log likelihood then decomposes into a sum of terms, one per factor.

  • Since Z is unknown, it is not clear how to solve this ML estimation problem directly. However, we can average over Z using some distribution q(Z | X).


General Formulation

  • Using q(Z | X) as an estimate of the distribution of Z, the complete log likelihood becomes the expected complete log likelihood

      ⟨ℓ_c(θ; X, Z)⟩_q = Σ_Z q(Z | X) log p(X, Z | θ)

  • This expected complete log likelihood is solvable, and, hopefully, maximizing it will also improve the incomplete log likelihood in some way. (This is the basic idea behind EM.)


General Formulation

  • EM maximizes the incomplete log likelihood via a lower bound. By Jensen’s inequality,

      ℓ(θ; X) = log Σ_Z p(X, Z | θ) = log Σ_Z q(Z | X) [p(X, Z | θ) / q(Z | X)] ≥ Σ_Z q(Z | X) log [p(X, Z | θ) / q(Z | X)] ≜ L(q, θ)

    L(q, θ) is called the auxiliary function.


General Formulation

  • Given q, maximizing L(q, θ) over θ is equivalent to maximizing the expected complete log likelihood, since

      L(q, θ) = Σ_Z q(Z | X) log p(X, Z | θ) − Σ_Z q(Z | X) log q(Z | X)

    and the second term does not depend on θ.


General Formulation

  • Given θ, the choice q(Z | X) = p(Z | X, θ) yields the maximum of L(q, θ).

    Note: ℓ(θ; X) is an upper bound of L(q, θ), and this choice attains it.


General Formulation

  • From the above, at every step of EM we maximize L(q, θ).

  • However, how do we know that the finally maximized L(q, θ) also maximizes the incomplete log likelihood ℓ(θ; X)?


General Formulation

  • The difference between ℓ(θ; X) and L(q, θ) is a KL divergence:

      ℓ(θ; X) − L(q, θ) = Σ_Z q(Z | X) log [q(Z | X) / p(Z | X, θ)] = D(q ‖ p(Z | X, θ))

    which is non-negative and uniquely minimized (equal to zero) at q(Z | X) = p(Z | X, θ).
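The decomposition ℓ(θ; X) = L(q, θ) + D(q ‖ p(Z | X, θ)) can be checked numerically for a single observation from a two-component mixture (illustrative numbers, not from the slides):

```python
import numpy as np

# Model: p(x, z = k | theta) = pi_k N(x | mu_k, 1), for one observation x.
pi = np.array([0.3, 0.7])
mu = np.array([-1.0, 2.0])
x = 0.5

joint = pi * np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)  # p(x, z | theta)
incomplete = np.log(joint.sum())                                # l(theta; x)
post = joint / joint.sum()                                      # p(z | x, theta)

q = np.array([0.5, 0.5])                  # an arbitrary averaging distribution
L = (q * np.log(joint / q)).sum()         # auxiliary function (Jensen lower bound)
KL = (q * np.log(q / post)).sum()         # D(q || p(z | x, theta))
```

Here `incomplete` equals `L + KL` up to floating point, `L` never exceeds `incomplete` (Jensen), and the bound is tight exactly when q equals the posterior.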


General Formulation

  • EM and alternating minimization

    • Recall that maximization of the likelihood is exactly the same as minimization of the KL divergence between the empirical distribution and the model.

    • Including the latent variable Z, the KL divergence becomes a “complete KL divergence” between joint distributions on (X, Z).


General Formulation

  • Reformulated EM algorithm

    • (E step): q^(t+1) = argmin_q D(q ‖ p_θ^(t))

    • (M step): θ^(t+1) = argmin_θ D(q^(t+1) ‖ p_θ)

    where D is the complete KL divergence. This is an alternating minimization algorithm.



Summary

  • Unconditional Mixture

    • Graphical model

    • EM algorithm

  • Conditional Mixture

    • Graphical model

    • EM algorithm

  • A general formulation of EM algorithm

    • Maximizing auxiliary function

    • Minimizing “complete KL divergence”


Incomplete Graphical Models

Thank You!
