
# Hierarchical Mixture of Experts - PowerPoint PPT Presentation






### Hierarchical Mixture of Experts

Presented by Qi An

Machine learning reading group

Duke University

07/15/2005

### Outline

• Background

• Hierarchical tree structure

• Gating networks

• Expert networks

• E-M algorithm

• Experimental results

• Conclusions

### Background

• The idea of mixture of experts

• First presented by Jacobs and Hinton in 1988

• Hierarchical mixture of experts

• Proposed by Jordan and Jacobs in 1994

• Difference from previous mixture model

• Mixing weights depend on both the input and the output

### Hierarchical tree structure

[Figure: a two-level tree. The input x feeds every node. Each nonterminal node holds a gating network mapping x to weights g1, g2, g3 (a linear, i.e. softmax, gating function; an ellipsoidal gating function is a variant). Each leaf holds an expert network mapping x to an output μ1, μ2, μ3; the gating weights blend the expert outputs bottom-up into the overall output μ.]
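The bottom-up blending in the figure can be sketched in a few lines of NumPy. This is a minimal one-level sketch; the weight matrices below are random placeholders, not values from the paper.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d = 3                           # input dimension (assumed)
x = rng.normal(size=d)

V = rng.normal(size=(3, d))     # gating-network weights (placeholder)
U = rng.normal(size=(3, d))     # expert-network weights (placeholder)

g = softmax(V @ x)              # gating weights g1, g2, g3 (sum to 1)
mu_experts = U @ x              # expert outputs mu1, mu2, mu3
mu = g @ mu_experts             # blended output: mu = sum_i g_i * mu_i
```

Because the gating weights sum to one, the blended output is a convex combination of the expert outputs.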

### Expert networks

• At the leaves of the tree, each expert (i, j) forms a linear predictor

  $\xi_{ij} = U_{ij} x$

  and passes it through a link function $f$ to produce the output of the expert

  $\mu_{ij} = f(\xi_{ij})$

• For example, $f$ is the logistic function for binary classification
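A single expert's computation is just a linear predictor followed by the link function. A sketch with made-up weights (the vector `u_ij` below is hypothetical, not from the paper):

```python
import numpy as np

def logistic(xi):
    """Logistic link function for binary classification."""
    return 1.0 / (1.0 + np.exp(-xi))

u_ij = np.array([0.5, -1.0, 2.0])   # hypothetical weights of expert (i, j)
x = np.array([1.0, 0.2, 0.3])       # example input

xi_ij = u_ij @ x                    # linear predictor xi_ij = U_ij x
mu_ij = logistic(xi_ij)             # expert output mu_ij = f(xi_ij), in (0, 1)
```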

### Gating networks

• At the nonterminal nodes of the tree, each gating network forms linear predictors of the input

  top layer: $\xi_i = v_i^{T} x$  other layers: $\xi_{ij} = v_{ij}^{T} x$

• The gating outputs at the non-leaf nodes are softmax functions of these predictors

  top node: $g_i = \frac{e^{\xi_i}}{\sum_k e^{\xi_k}}$  other nodes: $g_{j|i} = \frac{e^{\xi_{ij}}}{\sum_k e^{\xi_{ik}}}$
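The two levels of softmax gating compose into a joint prior over the leaves that sums to one. A sketch for a 2×2 tree with placeholder weight values:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

x = np.array([1.0, -0.5])
V_top = np.array([[0.2, 0.1], [-0.3, 0.4]])       # top-node weights v_i
V_sub = np.array([[[0.1, 0.0], [0.0, 0.2]],       # v_ij under branch i=1
                  [[0.3, -0.1], [-0.2, 0.5]]])    # v_ij under branch i=2

g_top = softmax(V_top @ x)                         # g_i
g_sub = np.array([softmax(V @ x) for V in V_sub])  # g_{j|i}, each row sums to 1

# joint prior of reaching leaf (i, j): g_i * g_{j|i}
prior = g_top[:, None] * g_sub
```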

• For each expert, assume the true output y is chosen from a distribution P with mean μij

• Therefore, the total probability of generating y from x is given by

  $P(y \mid x, \theta) = \sum_i g_i \sum_j g_{j|i} \, P(y \mid x, \theta_{ij})$

• Since $g_{j|i}$ and $g_i$ are computed based only on the input x, we refer to them as prior probabilities.

• With knowledge of both the input x and the output y, we can define posterior probabilities using Bayes' rule:

  $h_i = \frac{g_i \sum_j g_{j|i} P_{ij}(y)}{\sum_i g_i \sum_j g_{j|i} P_{ij}(y)} \qquad h_{j|i} = \frac{g_{j|i} P_{ij}(y)}{\sum_j g_{j|i} P_{ij}(y)}$
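The Bayes'-rule computation above is a pair of normalizations. A sketch for a 2×2 tree; the prior and likelihood numbers are assumed for illustration:

```python
import numpy as np

g = np.array([0.6, 0.4])           # priors g_i (assumed values)
g_cond = np.array([[0.7, 0.3],     # priors g_{j|i} (assumed values)
                   [0.5, 0.5]])
P = np.array([[0.2, 0.9],          # expert likelihoods P_ij(y | x) (assumed)
              [0.1, 0.4]])

joint = g[:, None] * g_cond * P    # g_i * g_{j|i} * P_ij(y)
h = joint.sum(axis=1) / joint.sum()                  # posteriors h_i
h_cond = (g_cond * P) / (g_cond * P).sum(axis=1, keepdims=True)  # h_{j|i}
```

Note how the branch whose experts explain y well (here the first) receives posterior probability above its prior.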

• Introduce auxiliary variables $z_{ij}$, which have an interpretation as the labels that correspond to the experts.

• The probability model can be simplified with the knowledge of auxiliary variables

• Complete-data likelihood:

  $l_c(\theta; \mathcal{Y}) = \sum_t \sum_i \sum_j z_{ij}^{(t)} \left[ \ln g_i^{(t)} + \ln g_{j|i}^{(t)} + \ln P_{ij}(y^{(t)}) \right]$

### E-M algorithm

• The E-step: compute the posterior probabilities $h_i$ and $h_{j|i}$, the expected values of the indicators $z_{ij}$, given the current parameters

• The M-step: maximize the expected complete-data likelihood, which decouples into separate weighted maximum-likelihood problems for the expert networks and the gating networks
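The two steps can be sketched for a simplified one-level mixture of linear-Gaussian experts with a softmax gate. This is an illustration under assumptions, not the paper's full hierarchical procedure: the gating update (which the paper also solves by IRLS) is omitted, and the noise variance is fixed.

```python
import numpy as np

def softmax_rows(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def em_step(X, y, U, V, sigma2=0.25):
    """One EM iteration for a one-level mixture of linear-Gaussian experts.

    E-step: responsibilities h from gating priors g and expert likelihoods.
    M-step: weighted least squares for each expert (gate update omitted)."""
    G = softmax_rows(X @ V.T)                       # priors g_i(x)
    Mu = X @ U.T                                    # expert means
    lik = np.exp(-(y[:, None] - Mu) ** 2 / (2 * sigma2))
    H = G * lik
    H = H / H.sum(axis=1, keepdims=True)            # E-step: posteriors h_i
    # M-step: each expert solves a posterior-weighted least-squares problem
    U_new = np.vstack([
        np.linalg.lstsq(X * np.sqrt(H[:, [i]]),
                        y * np.sqrt(H[:, i]), rcond=None)[0]
        for i in range(U.shape[0])])
    return U_new, H
```

Iterating `em_step` on data drawn from two linear regimes drives each expert toward one regime's slope.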

• Iteratively reweighted least squares (IRLS) algorithm

• An iterative algorithm for computing the maximum likelihood estimates of the parameters of a generalized linear model

• A special case of the Fisher scoring method
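A minimal IRLS sketch for the logistic-regression case: each Newton step is a weighted least-squares solve on a "working response". This is a generic textbook form, not the paper's exact inner loop.

```python
import numpy as np

def irls_logistic(X, y, n_iter=25):
    """Fit logistic regression by iteratively reweighted least squares."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))      # current predictions
        W = p * (1.0 - p) + 1e-8                # IRLS weights (jitter added)
        z = X @ w + (y - p) / W                 # working response
        # weighted least squares: (X^T W X) w = X^T W z
        w = np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (W * z))
    return w
```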

• By performing the E-step and the M-step one data point at a time, this algorithm can be used for on-line regression

• For the expert networks: the weights are updated by a recursive weighted least-squares rule, where $R_{ij}$ is the inverse covariance matrix for EN(i, j)

• For the gating networks: the weights are updated analogously, where $S_i$ is the inverse covariance matrix at the top node and $S_{ij}$ is the inverse covariance matrix at the lower nodes
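The transcript drops the update equations themselves, so here is a generic posterior-weighted recursive least-squares update of the same flavor, assumed rather than copied from the paper: the matrix `R` plays the inverse-covariance role and is updated rank-one via the Sherman-Morrison identity.

```python
import numpy as np

def rls_update(u, R, x, y, h=1.0):
    """One posterior-weighted recursive least-squares update for a linear
    expert (a generic sketch, not the paper's exact equations)."""
    Rx = R @ x
    k = h * Rx / (1.0 + h * (x @ Rx))   # gain vector
    u_new = u + k * (y - u @ x)         # correct by the prediction error
    R_new = R - np.outer(k, Rx)         # Sherman-Morrison inverse-cov update
    return u_new, R_new
```

Feeding in a stream of (x, y) pairs from a fixed linear model drives `u` toward the true weights without storing past data.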

### Experimental results

• Simulated data of a four-joint robot arm moving in three-dimensional space

### Conclusions

• Introduces a tree-structured architecture for supervised learning

• Trains much faster than the traditional back-propagation algorithm

• Can be used for on-line learning

Questions?