EM Algorithm and Mixture of Gaussians


### EM Algorithm and Mixture of Gaussians

Collard Fabien - 20046056

김진식 (Kim Jinsik) - 20043152

주찬혜 (Joo Chanhye) - 20043595

Summary
• Hidden Factors
• EM Algorithm
  • Principles
  • Formalization
• Mixture of Gaussians
  • Generalities
  • Processing
  • Formalization
• Other Issues
  • Bayesian networks with hidden variables
  • Hidden Markov models
  • Bayes net structures with hidden variables

Hidden Factors - The Problem
• Unobservable / latent / hidden factors
• Model them as variables
• This keeps the model simple

Hidden Factors - Simplicity details (Graph 1)

[Figure: without the hidden variable, Smoking, Diet and Exercise (2 independent priors each) point directly at Symptom 1, Symptom 2 and Symptom 3, whose conditional tables need 54, 162 and 486 parameters: 708 priors in total.]

Hidden Factors - Simplicity details (Graph 2)

[Figure: with a hidden Heart Disease node between the three causes (2 priors each) and the three symptoms, the tables need 54 parameters for Heart Disease and 6 per symptom: 78 priors in total.]
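
The two parameter counts can be checked with a few lines of arithmetic. A sketch in Python, assuming the textbook convention that every variable in this example takes 3 values (the "2" per cause is then its number of independent priors):

```python
# Arithmetic behind the prior counts, assuming every variable is
# 3-valued, so a table with k parent configurations needs
# k * (3 - 1) independent parameters.

def n_params(n_parents, arity=3):
    """Independent parameters of one conditional table."""
    return (arity ** n_parents) * (arity - 1)

causes = 3 * n_params(0)        # Smoking, Diet, Exercise: 2 priors each

# Graph 1: no hidden node; the symptoms must be chained to each other
# to stay fully general, so their tables grow quickly.
graph1 = causes + n_params(3) + n_params(4) + n_params(5)  # 54 + 162 + 486

# Graph 2: the hidden Heart Disease node mediates all three symptoms.
graph2 = causes + n_params(3) + 3 * n_params(1)            # 54 + 3 * 6

print(graph1, graph2)   # 708 78
```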

EM Algorithm - Principles: Generalities
• Given:
  • Causes (factors / components)
  • Evidence
• Compute:
  • The probabilities in the conditional tables

EM Algorithm - Principles: The two steps
• Parameters:
  • P(effects | causes)
  • P(causes)
• E-step: for each piece of evidence E, use the parameters to compute a probability distribution over its causes, the weighted evidence P(causes | evidence)
• M-step: update the estimates of the parameters, based on the weighted evidence
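
The alternation of the two steps can be sketched on the simplest hidden-cause model: a hypothetical mixture of two biased coins, where the coin identity is the hidden cause and the number of heads in ten flips is the evidence. The bias values 0.3 and 0.8 and all variable names are illustrative:

```python
import numpy as np

# Hypothetical data: each observation is the number of heads in 10
# flips of one of two biased coins, chosen at random.
rng = np.random.default_rng(0)
flips = 10
data = np.concatenate([rng.binomial(flips, 0.3, 50),
                       rng.binomial(flips, 0.8, 50)])

theta = np.array([0.4, 0.6])      # initial guesses for P(heads | coin)
prior = np.array([0.5, 0.5])      # initial guess for P(coin)

for _ in range(50):
    # E-step: weighted evidence P(cause | evidence) for every observation
    like = (theta[None, :] ** data[:, None]
            * (1 - theta[None, :]) ** (flips - data[:, None]))
    resp = prior[None, :] * like
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate P(effects | causes) and P(causes)
    theta = (resp * data[:, None]).sum(axis=0) / (resp.sum(axis=0) * flips)
    prior = resp.mean(axis=0)
```

After the loop, `theta` should sit near the true biases 0.3 and 0.8, even though no observation is labeled with its coin.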

EM Algorithm - Principles: the E-Step
• Perception step
• For each piece of evidence and each cause:
  • Compute the probabilities
  • Find the probable relationships

EM Algorithm - Principles: the M-Step
• Learning step
• Recompute the probability of each cause event given each evidence event
• Sum over all evidence events
• Maximize the log likelihood by modifying the model parameters

EM Algorithm - Formulae: Notations
• Terms:
  • θ : underlying probability distribution
  • x : observed data
  • z : unobserved data
  • h : current hypothesis of θ
  • h' : revised hypothesis
  • q : a distribution over the hidden variables
• Task: estimate θ from x
• E-step: maximize the auxiliary function A(q, h) over q, with h fixed
• M-step: maximize A(q, h) over h, with q fixed

EM Algorithm - Formulae: the Log Likelihood
• L(h) measures how well the parameters h fit the observed data x, with the hidden variables z summed out:
  L(h) = ln p(x | h) = ln Σ_z p(x, z | h)
• Jensen's inequality, for any distribution q(z) over the hidden states:
  L(h) = ln Σ_z q(z) p(x, z | h) / q(z) ≥ Σ_z q(z) ln [ p(x, z | h) / q(z) ]
• The right-hand side defines the auxiliary function A(q, h):
  • A lower bound on the log likelihood
  • What we want to optimize
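
The Jensen bound can be checked numerically. A sketch for a toy model with three hidden states, where the joint values p(x, z | h) are arbitrary positive numbers:

```python
import numpy as np

# Numeric check of the Jensen bound: for any distribution q(z),
#   ln sum_z p(x, z | h)  >=  sum_z q(z) * ln( p(x, z | h) / q(z) ).
rng = np.random.default_rng(1)
p_xz = rng.random(3)                  # joint p(x, z | h) for a fixed x
L = np.log(p_xz.sum())                # log likelihood

for _ in range(1000):
    q = rng.random(3)
    q /= q.sum()                      # arbitrary distribution over z
    A = np.sum(q * np.log(p_xz / q))  # auxiliary function A(q, h)
    assert A <= L + 1e-12             # the bound holds every time
```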

EM Algorithm - Formulae: the E-step
• The lower bound on the log likelihood splits into
  A(q, h) = Σ_z q(z) ln p(x, z | h) + H(q)
  where H(q) is the entropy of q(z)
• Optimize A(q, h) over q by distributing the data over the hidden variables: the maximizer is q(z) = p(z | x, h)

EM Algorithm - Formulae: the M-step
• Maximize A(q, h) over h by choosing the optimal parameters:
  h' = argmax_h Σ_z q(z) ln p(x, z | h)
• With q fixed, this is equivalent to optimizing the likelihood

EM Algorithm - Formulae: Convergence (1/2)
• EM increases the log likelihood of the data at every iteration
• The gap is the Kullback-Leibler (KL) divergence:
  L(h) - A(q, h) = KL( q(z) || p(z | x, h) )
  • Non-negative
  • Equals 0 iff q(z) = p(z | x, h)
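
The identity between the gap and the KL divergence can also be verified numerically, on a toy model with four hidden states:

```python
import numpy as np

# Numeric check that L(h) - A(q, h) = KL(q || p(z | x, h)), and that
# the bound is tight exactly when q is the posterior.
rng = np.random.default_rng(2)
p_xz = rng.random(4)                       # joint p(x, z | h)
L = np.log(p_xz.sum())                     # log likelihood
posterior = p_xz / p_xz.sum()              # p(z | x, h)

q = rng.random(4)
q /= q.sum()                               # arbitrary distribution over z
A = np.sum(q * np.log(p_xz / q))           # auxiliary function
kl = np.sum(q * np.log(q / posterior))     # KL(q || posterior)
assert abs((L - A) - kl) < 1e-12 and kl >= 0

# The E-step choice q = posterior closes the gap entirely:
A_best = np.sum(posterior * np.log(p_xz / posterior))
assert abs(A_best - L) < 1e-12
```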

EM Algorithm - Formulae: Convergence (2/2)
• The likelihood increases (or stays the same) at each iteration
• In practice, EM converges to a local optimum of L

EM Algorithm - The problem of the likelihood
• It can be a high-dimensional integral
• Latent variables → additional dimensions
• The likelihood term itself can be complicated

Mixture of Gaussians - The Issue
• Unsupervised clustering:
  • A set of data points (evidence)
  • Data generated from a mixture distribution
  • Continuous data: mixture of Gaussians
• Not easy to handle:
  • The number of parameters per component grows with the square of the dimension

Mixture of Gaussians - Gaussian Mixture Model (2/2)
• Distribution
• Likelihood of a single Gaussian:
  p(x | μ, Σ) = (2π)^(-d/2) |Σ|^(-1/2) exp( -(x - μ)^T Σ^(-1) (x - μ) / 2 )
• Likelihood given a GMM:
  p(x) = Σ_{i=1..N} w_i p(x | μ_i, Σ_i)
  • N : number of Gaussians
  • w_i : the weight of Gaussian i
  • All weights positive, total weight = 1
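
The mixture likelihood above is straightforward to compute. A minimal NumPy sketch for the 1-D case (function name and signature are illustrative):

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Total log likelihood of 1-D data under a mixture of Gaussians.

    weights must be positive and sum to 1; component i is the normal
    distribution N(means[i], variances[i])."""
    x = np.asarray(x, float)[:, None]               # shape (n, 1)
    w = np.asarray(weights, float)[None, :]         # shape (1, k)
    mu = np.asarray(means, float)[None, :]
    var = np.asarray(variances, float)[None, :]
    # Weighted density of every component at every point, then summed
    # over components before taking the log.
    comp = w * np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return float(np.sum(np.log(comp.sum(axis=1))))
```

With a single component of weight 1 this reduces to the ordinary Gaussian log likelihood.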

EM for the Gaussian Mixture Model
• What for? Find the parameters:
  • Weights: w_i = P(C = i)
  • Means: μ_i
  • Covariances: Σ_i
• How?
  • Guess the prior distribution
  • Guess the components (classes, or causes)
  • Guess the distribution function

Mixture of Gaussians - Processing: EM Initialization
• Initialization: assign random values to the parameters

Mixture of Gaussians - Processing: the E-Step (1/2)
• Expectation:
  • Pretend the parameters are known
  • Assign each data point to the components

Mixture of Gaussians - Processing: the E-Step (2/2)
• Competition of hypotheses:
  • Compute the expected values p_ij of the hidden indicator variables
  • Each component gives a membership weight to every data point
  • Normalize, so each weight is the relative likelihood of class membership
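
The competition step can be sketched directly from this description, here for a 1-D mixture (function name is illustrative; rows are data points, columns are components):

```python
import numpy as np

def e_step(x, w, mu, var):
    """Responsibilities p_ij = P(component | point) for a 1-D GMM.

    Prior weight times component density at each point, normalized
    over components so every row sums to 1."""
    x = np.asarray(x, float)[:, None]
    w, mu, var = (np.asarray(a, float) for a in (w, mu, var))
    dens = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    resp = w * dens
    return resp / resp.sum(axis=1, keepdims=True)
```

A point sitting on one component's mean and far from the other receives nearly all of its membership weight from the first component.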

Mixture of Gaussians - Processing: the M-Step (1/2)
• Maximization: fit each component's parameters to its weighted set of points

Mixture of Gaussians - Processing: the M-Step (2/2)
• For each hypothesis (component):
  • Find the new parameter values that maximize the log likelihood
  • Based on the weights of the points in the class and their locations
• The hypotheses are pulled toward the data
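
The re-fitting step has a closed form; a 1-D sketch taking the responsibilities from the E-step (function name is illustrative):

```python
import numpy as np

def m_step(x, resp):
    """Closed-form 1-D GMM parameter updates from responsibilities.

    Each component's weight, mean and variance are re-fitted to the
    data, with every point counted by its membership weight."""
    x = np.asarray(x, float)[:, None]
    nk = resp.sum(axis=0)                         # effective points per class
    w = nk / nk.sum()                             # new weights
    mu = (resp * x).sum(axis=0) / nk              # weighted means
    var = (resp * (x - mu) ** 2).sum(axis=0) / nk # weighted variances
    return w, mu, var
```

With hard (0/1) responsibilities this reduces to fitting each Gaussian to its own cluster, which is why EM for a GMM is often described as a soft version of k-means.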

Mixture of Gaussians - Applied formulae: the E-Step
• Find, for every data point, the probability that each Gaussian generated it
• Use Bayes' rule:
  p_ij = P(C = i | x_j) = w_i p(x_j | μ_i, σ_i²) / Σ_k w_k p(x_j | μ_k, σ_k²)

Mixture of Gaussians - Applied formulae: the M-Step
• Maximize A
• For each parameter of h, search for the zero of the derivative
• Results:
  • μ_i* = Σ_j p_ij x_j / Σ_j p_ij
  • σ_i²* = Σ_j p_ij (x_j - μ_i*)² / Σ_j p_ij
  • w_i* = Σ_j p_ij / n

Mixture of Gaussians - Eventual problems
• A Gaussian component can shrink:
  • Variance → 0
  • Likelihood → infinite
• Gaussian components can merge:
  • Same parameter values
  • Sharing the same data points
• A solution: reasonable prior values
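
Putting the pieces together, a full 1-D EM loop can be sketched, with a small floor on the variances as one concrete form of "reasonable prior values" against the shrinking-component problem. The data, seed, and floor value are illustrative; the loop also checks the convergence property from the earlier slides, that the log likelihood never decreases:

```python
import numpy as np

# Hypothetical data: two well-separated 1-D Gaussian clusters.
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])[:, None]

w = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])
prev_ll = -np.inf

for _ in range(100):
    dens = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    resp = w * dens
    ll = np.log(resp.sum(axis=1)).sum()      # current log likelihood
    assert ll >= prev_ll - 1e-9              # EM: never decreases
    prev_ll = ll
    resp /= resp.sum(axis=1, keepdims=True)  # E-step: responsibilities
    nk = resp.sum(axis=0)                    # M-step: closed-form updates
    w = nk / nk.sum()
    mu = (resp * x).sum(axis=0) / nk
    var = (resp * (x - mu) ** 2).sum(axis=0) / nk
    var = np.maximum(var, 1e-6)              # floor against collapse
```

The means should converge near the true cluster centers -2 and 3, starting from the deliberately poor guesses -1 and 1.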

Other Issues - Hidden Markov models
• Forward-backward algorithm
• Smooths rather than filters

Other Issues - Bayes nets with hidden variables
• Pretend that the data is complete
• Or invent a new hidden variable
  • It has no predefined label or meaning

Conclusion
• Widely applicable:
  • Diagnosis
  • Classification
  • Distribution discovery
• Does not work well for complex models:
  • High dimension → Structural EM