Hierarchical Mixture of Experts

Presented by Qi An
Machine learning reading group, Duke University
07/15/2005

Outline
• Background
• Hierarchical tree structure
• Gating networks
• Expert networks
• E-M algorithm
• Experimental results
• Conclusions
Background
• The idea of mixture of experts
• First presented by Jacobs and Hinton in 1988
• Hierarchical mixture of experts
• Proposed by Jordan and Jacobs in 1994
• Difference from previous mixture models
• The mixing weights depend on the input (and, through the posterior weights used in training, on the output), rather than being fixed constants
One-layer structure
• [Figure: a gating network with an ellipsoidal gating function maps the input x to weights g1, g2, g3; three expert networks map x to outputs μ1, μ2, μ3; the overall output μ is the gating-weighted combination of the expert outputs.]
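The one-layer structure can be sketched as follows. This is an illustrative reconstruction, not the authors' code; a softmax gating network stands in for the ellipsoidal gating function shown on the slide, and all names and dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, n_experts = 4, 2, 3
V = rng.normal(size=(n_experts, d_in))          # gating parameters (one row per expert)
U = rng.normal(size=(n_experts, d_out, d_in))   # expert parameters

def moe_forward(x):
    """Forward pass of a one-layer mixture of experts."""
    xi = V @ x                                  # gating activations
    g = np.exp(xi - xi.max())                   # softmax (numerically stabilized)
    g /= g.sum()
    mu_k = np.einsum("kij,j->ki", U, x)         # each expert's linear output
    mu = g @ mu_k                               # gating-weighted combination
    return mu, g, mu_k

x = rng.normal(size=d_in)
mu, g, mu_k = moe_forward(x)
```

The gating weights always form a probability distribution over the experts, so the overall output is a convex combination of the expert outputs.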

Hierarchical tree structure
• [Figure: a tree with gating networks (linear gating functions) at the internal nodes and expert networks at the leaves; each gating network splits the input space, and the experts' outputs are blended up the tree.]

Expert network
• At the leaves of the tree
• For each expert (i,j):
• linear predictor: ξij = Uij x
• output of the expert: μij = f(ξij), where f is a fixed link function
• For example: f is the logistic function for binary classification
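A minimal sketch of a single expert, assuming a logistic link for binary classification and the identity link for regression; the parameter values U and x are hypothetical.

```python
import numpy as np

def logistic(z):
    """Logistic link: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def expert_output(U, x, task="classification"):
    xi = U @ x                        # linear predictor xi_ij = U_ij x
    return logistic(xi) if task == "classification" else xi

# Hypothetical parameters for illustration
U = np.array([[0.5, -1.0, 2.0]])
x = np.array([1.0, 0.0, 0.5])
p = expert_output(U, x)               # class probability in (0, 1)
```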

Gating network
• At the nonterminal nodes of the tree
• top layer: gi = exp(ξi) / Σk exp(ξk), with ξi = viᵀx
• other layers: gj|i = exp(ξij) / Σk exp(ξik), with ξij = vijᵀx

Output
• At the non-leaf nodes
• top node: μ = Σi gi μi
• other nodes: μi = Σj gj|i μij
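The two recursions can be combined into a single forward pass. A sketch for a two-level tree with softmax gating at both levels; all shapes and names are assumptions.

```python
import numpy as np

def softmax(xi):
    e = np.exp(xi - xi.max())
    return e / e.sum()

def hme_output(x, V_top, V_low, U):
    """Two-level HME output. Shapes: V_top (I, d), V_low (I, J, d), U (I, J, d_out, d)."""
    g = softmax(V_top @ x)                          # top-level gating g_i
    mu = np.zeros(U.shape[2])
    for i in range(U.shape[0]):
        g_ji = softmax(V_low[i] @ x)                # lower-level gating g_{j|i}
        mu_i = sum(g_ji[j] * (U[i, j] @ x) for j in range(U.shape[1]))  # mu_i = sum_j g_{j|i} mu_ij
        mu += g[i] * mu_i                           # mu = sum_i g_i mu_i
    return mu
```

With a single branch and a single expert, the gating weights collapse to 1 and the output reduces to that expert's linear prediction.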

Probability model
• For each expert (i,j), assume the true output y is drawn from a distribution P(y | x, θij) with mean μij
• Therefore, the total probability of generating y from x is given by
P(y | x, θ) = Σi gi Σj gj|i P(y | x, θij)
Posterior probabilities
• Since gj|i and gi are computed based only on the input x, we refer to them as prior probabilities.
• With knowledge of both the input x and the output y, the posterior probabilities follow from Bayes’ rule:
hi = gi Σj gj|i Pij(y) / Σk gk Σj gj|k Pkj(y)
hj|i = gj|i Pij(y) / Σj gj|i Pij(y)
• The joint posterior of expert (i,j) is hij = hi hj|i.
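A sketch of the posterior computation via Bayes' rule, assuming unit-variance Gaussian experts so that Pij(y) is a Gaussian density centered at μij; array shapes and names are illustrative.

```python
import numpy as np

def gaussian_lik(y, mu):
    """Unnormalized unit-variance Gaussian density P_ij(y) centered at mu."""
    return np.exp(-0.5 * np.sum((y - mu) ** 2))

def posteriors(g, g_ji, mu_ij, y):
    """g: (I,) top priors; g_ji: (I, J) lower priors; mu_ij: (I, J, d_out); y: (d_out,)."""
    I, J = g_ji.shape
    P = np.array([[gaussian_lik(y, mu_ij[i, j]) for j in range(J)] for i in range(I)])
    branch = (g_ji * P).sum(axis=1)               # sum_j g_{j|i} P_ij(y)
    h_i = g * branch / (g * branch).sum()         # posterior over top-level branches
    h_ji = (g_ji * P) / branch[:, None]           # posterior within each branch
    return h_i, h_ji
```

The posteriors re-weight the priors by how well each expert explains the observed y, so the branch whose expert is closest to y gains probability mass.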
E-M algorithm
• Introduce auxiliary indicator variables zij, which have an interpretation as the (unobserved) labels that correspond to the experts: zij = 1 when expert (i,j) generated the data point.
• The probability model simplifies given the auxiliary variables:
P(y, z | x, θ) = Πi Πj [ gi gj|i P(y | x, θij) ]^zij
E-M algorithm
• Complete-data log-likelihood:
lc(θ) = Σt Σi Σj zij(t) [ log gi(t) + log gj|i(t) + log P(y(t) | x(t), θij) ]
• The E-step replaces each zij(t) by its expected value, the posterior hij(t).
E-M algorithm
• The M-step solves a set of decoupled weighted maximum-likelihood problems: each expert is fit with weights hij(t), and each gating network is fit to the posterior targets hi(t) and hj|i(t); both are solved with IRLS.
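A compact sketch of the resulting E-M loop for the one-level case with Gaussian experts (the hierarchical case repeats the same pattern at each node). The gating M-step here is a single gradient step toward the posteriors rather than the full IRLS fit the paper uses; data and names are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, K = 200, 3, 2
X = rng.normal(size=(N, d))
true_W = rng.normal(size=(K, d))
z = rng.integers(K, size=N)                       # which expert generated each point
y = np.einsum("nd,nd->n", X, true_W[z]) + 0.1 * rng.normal(size=N)

W = rng.normal(size=(K, d))                       # expert parameters
V = np.zeros((K, d))                              # gating parameters

for _ in range(20):
    # E-step: posteriors h from gating priors g and Gaussian likelihoods
    g = np.exp(X @ V.T)
    g /= g.sum(axis=1, keepdims=True)
    lik = np.exp(-0.5 * (y[:, None] - X @ W.T) ** 2)
    h = g * lik + 1e-300                          # guard against all-zero rows
    h /= h.sum(axis=1, keepdims=True)
    # M-step: weighted least squares for each expert (Gaussian case)
    for k in range(K):
        Xh = X * h[:, k:k + 1]
        W[k] = np.linalg.solve(Xh.T @ X + 1e-6 * np.eye(d), Xh.T @ y)
    # M-step for the gating: one gradient step toward the posteriors
    V += 0.5 * (h - g).T @ X / N

mu = (g * (X @ W.T)).sum(axis=1)                  # mixture prediction
```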
IRLS
• Iteratively reweighted least squares algorithm
• An iterative algorithm for computing the maximum likelihood estimates of the parameters of a generalized linear model
• A special case of the Fisher scoring method: each iteration solves a weighted least-squares problem
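An illustrative IRLS for logistic regression, one generalized linear model the expert and gating fits reduce to; the small ridge term is an added numerical stabilizer, not part of the textbook algorithm.

```python
import numpy as np

def irls_logistic(X, t, n_iter=10):
    """Fit logistic-regression weights by iteratively reweighted least squares.

    Each iteration solves the weighted least-squares problem
    beta = (X^T W X)^-1 X^T W z, which coincides with the Fisher-scoring step.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # current predicted probabilities
        w = p * (1.0 - p) + 1e-9              # IRLS weights (Bernoulli variance)
        z = X @ beta + (t - p) / w            # working response
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX + 1e-8 * np.eye(X.shape[1]), WX.T @ z)
    return beta
```

On data where the label depends on the first feature, the fitted coefficient for that feature should dominate.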
Algorithm
• E-step: with the current parameters, compute the posteriors hi(t), hj|i(t), and hij(t) for every training pair (x(t), y(t)).
• M-step: update the expert parameters and the gating parameters by solving the resulting weighted IRLS problems, then repeat until convergence.
Online algorithm
• This algorithm can be used for online regression
• For the expert network, the parameters Uij are updated with a posterior-weighted recursive least-squares step,
where Rij is the inverse covariance matrix for EN(i,j), itself updated recursively as data arrive

Online algorithm
• For the gating network, the top-level parameters are updated analogously,
where Si is the inverse covariance matrix for the top-level gating network,
and the lower-level parameters likewise,
where Sij is the inverse covariance matrix for the corresponding lower-level gating network
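The shape of these updates can be sketched as a posterior-weighted recursive least-squares step. This is a simplified illustration of the general pattern only; the exact update equations are in Jordan & Jacobs (1994), and the forgetting factor lam and all names here are assumptions.

```python
import numpy as np

def online_rls_step(u, R, x, err, h, lam=0.99):
    """One weighted RLS update of parameter vector u with inverse covariance R.

    The posterior weight h scales the effective step size; with h = 1 and
    lam = 1 this reduces to standard recursive least squares.
    """
    Rx = R @ x
    k = Rx / (lam / max(h, 1e-12) + x @ Rx)     # gain, scaled by posterior h
    u = u + err * k                             # move u toward reducing the error
    R = (R - np.outer(k, Rx)) / lam             # recursive inverse-covariance update
    return u, R
```

Streaming noiseless samples from a fixed linear target with h = 1 drives the parameters to the target weights.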

Results
• Simulated data of a four-joint robot arm moving in three-dimensional space
Conclusions
• Introduces a tree-structured architecture for supervised learning
• Trains much faster than the traditional back-propagation algorithm
• Can be used for on-line learning

Questions?