
Hierarchical Mixture of Experts



Presented by Qi An

Machine learning reading group

Duke University

07/15/2005

Outline

- Background
- Hierarchical tree structure
- Gating networks
- Expert networks
- E-M algorithm
- Experimental results
- Conclusions

Background

- The idea of mixture of experts
  - First presented by Jacobs and Hinton in 1988
- Hierarchical mixture of experts
  - Proposed by Jordan and Jacobs in 1994
- Difference from previous mixture models
  - The mixing weights are functions of the input x; during learning, the posterior weights depend on both the input and the output

One-layer structure

[Figure: one-layer mixture of experts. The input x feeds a gating network, which produces weights g1, g2, g3 (ellipsoidal gating function), and three expert networks, which produce outputs μ1, μ2, μ3; the overall output is μ = Σᵢ gᵢ μᵢ.]
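The one-layer structure above can be sketched numerically. A minimal sketch with softmax gating and linear experts; the weight matrices `V` and `U` and all shapes are illustrative, not from the slides:

```python
import numpy as np

def softmax(z):
    z = z - z.max()   # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

def moe_forward(x, V, U):
    """One-layer mixture of experts.
    V: (K, d) gating parameters; U: (K, m, d) linear expert parameters.
    Returns the gates g_i and the blended output mu = sum_i g_i * mu_i."""
    g = softmax(V @ x)                     # gating network: g1..gK
    mus = np.array([Ui @ x for Ui in U])   # expert outputs mu_1..mu_K
    return g, g @ mus

rng = np.random.default_rng(0)
x = rng.normal(size=3)
g, mu = moe_forward(x, rng.normal(size=(3, 3)), rng.normal(size=(3, 2, 3)))
```

The gates form a proper distribution over experts, so the output is a convex combination of the expert outputs.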

Hierarchical tree structure

[Figure: a two-level tree. Gating networks with linear gating functions sit at the non-terminal nodes and route the input x to the expert networks at the leaves.]

- At the leaves of the tree, each expert $(i,j)$ forms a linear predictor from the input and passes it through a link function $f(\cdot)$ to produce the output of the expert:

  $$\mu_{ij} = f(U_{ij} x)$$

  For example, $f$ is the logistic function for binary classification.

- At the non-terminal nodes of the tree, the gating networks are linear functions followed by a softmax:
  - top node: $g_i = \dfrac{e^{\xi_i}}{\sum_k e^{\xi_k}}$, with $\xi_i = v_i^T x$
  - other nodes: $g_{j|i} = \dfrac{e^{\xi_{ij}}}{\sum_k e^{\xi_{ik}}}$, with $\xi_{ij} = v_{ij}^T x$
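The two levels of softmax gating can be sketched directly; parameter shapes below are illustrative. The joint weights $g_i \, g_{j|i}$ form a distribution over the leaves:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hme_gates(x, v_top, v_low):
    """Two-level linear/softmax gating.
    v_top: (I, d) for the top node; v_low: (I, J, d) for the second level.
    Returns g[i] = g_i and gj[i, j] = g_{j|i}."""
    g = softmax(v_top @ x)
    gj = np.stack([softmax(vi @ x) for vi in v_low])
    return g, gj

rng = np.random.default_rng(1)
x = rng.normal(size=4)
g, gj = hme_gates(x, rng.normal(size=(2, 4)), rng.normal(size=(2, 3, 4)))
joint = g[:, None] * gj   # g_i * g_{j|i}: a distribution over the 6 leaves
```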

Probability model

- For each expert $(i,j)$, assume the true output $y$ is drawn from a distribution $P_{ij}(y \mid x)$ with mean $\mu_{ij}$.
- Therefore, the total probability of generating $y$ from $x$ is given by

  $$P(y \mid x) = \sum_i g_i \sum_j g_{j|i} \, P_{ij}(y \mid x)$$
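Taking the expert distributions to be unit-variance Gaussians (an assumption for illustration; any density with mean $\mu_{ij}$ works), the mixture likelihood can be evaluated directly:

```python
import numpy as np

def gauss(y, mu, sigma=1.0):
    """Scalar Gaussian density with mean mu."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def total_prob(y, g, gj, mus):
    """P(y|x) = sum_i g_i sum_j g_{j|i} P_ij(y|x) for Gaussian experts.
    g: (I,) top gates, gj: (I, J) conditional gates, mus: (I, J) expert means."""
    return float(np.sum(g[:, None] * gj * gauss(y, mus)))

# illustrative gate values and expert means
g = np.array([0.6, 0.4])
gj = np.array([[0.5, 0.5], [0.2, 0.8]])
mus = np.array([[0.0, 1.0], [2.0, 3.0]])
p = total_prob(1.0, g, gj, mus)
```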

Posterior probabilities

- Since $g_i$ and $g_{j|i}$ are computed from the input $x$ alone, we refer to them as prior probabilities.
- With knowledge of both the input $x$ and the output $y$, we can define posterior probabilities using Bayes' rule:

  $$h_i = \frac{g_i \sum_j g_{j|i} P_{ij}(y)}{\sum_k g_k \sum_l g_{l|k} P_{kl}(y)}, \qquad h_{j|i} = \frac{g_{j|i} P_{ij}(y)}{\sum_l g_{l|i} P_{il}(y)}$$
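The Bayes-rule computation is a few lines of code; the gate values and expert likelihoods below are illustrative numbers:

```python
import numpy as np

def posteriors(g, gj, P):
    """Posterior node probabilities from prior gates and expert likelihoods.
    g: (I,) top gates, gj: (I, J) conditional gates, P[i, j] = P_ij(y).
    Returns h_i and h_{j|i}."""
    joint = g[:, None] * gj * P                       # g_i g_{j|i} P_ij(y)
    h_i = joint.sum(axis=1) / joint.sum()             # normalize over the tree
    h_ji = (gj * P) / (gj * P).sum(axis=1, keepdims=True)
    return h_i, h_ji

g = np.array([0.7, 0.3])
gj = np.array([[0.4, 0.6], [0.5, 0.5]])
P = np.array([[0.1, 0.9], [0.3, 0.2]])   # expert likelihoods of the observed y
h_i, h_ji = posteriors(g, gj, P)
```

Note that both sets of posteriors normalize to one, as distributions must.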

E-M algorithm

- Introduce auxiliary indicator variables $z_{ij} \in \{0, 1\}$, which have an interpretation as labels that correspond to the experts: $z_{ij} = 1$ iff expert $(i,j)$ generated the data point.
- With knowledge of the auxiliary variables, the probability model simplifies to

  $$P(y, z \mid x) = \prod_i \prod_j \left[ g_i \, g_{j|i} \, P_{ij}(y \mid x) \right]^{z_{ij}}$$

E-M algorithm

- Complete-data log-likelihood:

  $$l_c(\theta) = \sum_t \sum_i \sum_j z_{ij}^{(t)} \left[ \ln g_i^{(t)} + \ln g_{j|i}^{(t)} + \ln P_{ij}(y^{(t)}) \right]$$

- The E-step replaces each $z_{ij}^{(t)}$ by its expected value given the data: the joint posterior $h_{ij}^{(t)} = h_i^{(t)} h_{j|i}^{(t)}$.

E-M algorithm

- The M-step decouples into separate weighted maximum-likelihood problems:

  $$U_{ij} \leftarrow \arg\max \sum_t h_{ij}^{(t)} \ln P_{ij}(y^{(t)}), \quad
    v_i \leftarrow \arg\max \sum_t \sum_k h_k^{(t)} \ln g_k^{(t)}, \quad
    v_{ij} \leftarrow \arg\max \sum_t h_i^{(t)} \sum_l h_{l|i}^{(t)} \ln g_{l|i}^{(t)}$$

  Each subproblem is a weighted fit of a generalized linear model.
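For Gaussian experts, the expert M-step reduces to weighted least squares with the joint posteriors $h_{ij}$ as weights. A minimal sketch on synthetic data (all names illustrative):

```python
import numpy as np

def weighted_ls(X, Y, h):
    """Solve min_U sum_t h_t * ||y_t - U x_t||^2 (expert M-step, Gaussian case).
    X: (n, d) inputs, Y: (n, m) targets, h: (n,) posterior weights.
    Returns U: (m, d) via the weighted normal equations."""
    XtWX = X.T @ (h[:, None] * X)
    XtWY = X.T @ (h[:, None] * Y)
    return np.linalg.solve(XtWX, XtWY).T

rng = np.random.default_rng(2)
U_true = rng.normal(size=(2, 3))
X = rng.normal(size=(50, 3))
Y = X @ U_true.T                          # exactly linear targets
h = rng.uniform(0.1, 1.0, size=50)        # arbitrary positive weights
U = weighted_ls(X, Y, h)
```

With noiseless linear data the weighted fit recovers the true matrix regardless of the weights, which is a convenient sanity check.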

IRLS

- Iteratively reweighted least squares algorithm
- An iterative algorithm for computing the maximum-likelihood estimates of the parameters of a generalized linear model
- A special case of the Fisher scoring method
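A minimal IRLS sketch for logistic regression, the binary-classification case mentioned earlier; the data here is synthetic and the parameter names are illustrative:

```python
import numpy as np

def irls_logistic(X, y, iters=25):
    """Iteratively reweighted least squares for logistic regression.
    Newton/Fisher step: beta += (X^T W X)^{-1} X^T (y - p), W = diag(p(1-p))."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))     # current predicted probabilities
        W = p * (1.0 - p)                       # GLM working weights
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return beta

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
beta_true = np.array([1.0, -1.0])
y = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
beta = irls_logistic(X, y)
```

At convergence the score equations $X^T(y - p) = 0$ hold, which is what the assertion below checks.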

Online algorithm

- This algorithm can also be used for online regression.
- For the expert networks, the update has the recursive (weighted) least-squares form

  $$U_{ij}^{(t+1)} = U_{ij}^{(t)} + h_{ij}^{(t)} \left( y^{(t)} - \mu_{ij}^{(t)} \right) (x^{(t)})^T R_{ij}^{(t)}$$

  where $R_{ij}$ is the inverse covariance matrix for EN$(i,j)$, itself updated recursively via the matrix inversion lemma.

Online algorithm

- For the gating networks, the updates take the same recursive form:

  $$v_i^{(t+1)} = v_i^{(t)} + S_i^{(t)} x^{(t)} \left( h_i^{(t)} - g_i^{(t)} \right)$$

  where $S_i$ is the inverse covariance matrix for the top-level gating network, and

  $$v_{ij}^{(t+1)} = v_{ij}^{(t)} + h_i^{(t)} S_{ij}^{(t)} x^{(t)} \left( h_{j|i}^{(t)} - g_{j|i}^{(t)} \right)$$

  where $S_{ij}$ is the inverse covariance matrix for the lower-level gating networks.
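The inverse-covariance recursions follow the standard recursive least squares pattern. A sketch of one weighted step, where the weight `h` plays the role the posterior plays above (names and the driver loop are illustrative, not from the slides):

```python
import numpy as np

def rls_step(R, w, x, y, h=1.0):
    """One weighted recursive-least-squares update.
    R is the running inverse covariance, updated by the matrix inversion
    lemma; h down-weights the sample, as a posterior probability would."""
    Rx = R @ x
    R = R - h * np.outer(Rx, Rx) / (1.0 + h * (x @ Rx))
    w = w + h * (y - w @ x) * (R @ x)   # gain K = R_new @ x
    return R, w

rng = np.random.default_rng(4)
w_true = np.array([0.5, -1.0, 2.0])
R = np.eye(3) * 1e6                      # large initial R ~ weak prior
w = np.zeros(3)
for _ in range(50):
    x = rng.normal(size=3)
    R, w = rls_step(R, w, x, w_true @ x)  # noiseless linear targets
```

On noiseless linear data the recursion converges to the true parameter vector after a handful of samples.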

Results

- Simulated data from a four-joint robot arm moving in three-dimensional space

Conclusions

- Introduces a tree-structured architecture for supervised learning
- Training is much faster than the traditional back-propagation algorithm
- Can be used for on-line learning

Questions?
