Hierarchical Mixture of Experts

Presented by Qi An

Machine learning reading group

Duke University

07/15/2005

Outline
  • Background
  • Hierarchical tree structure
  • Gating networks
  • Expert networks
  • E-M algorithm
  • Experimental results
  • Conclusions
Background
  • The idea of mixture of experts
    • First presented by Jacobs and Hinton in 1988
  • Hierarchical mixture of experts
    • Proposed by Jordan and Jacobs in 1994
  • Difference from previous mixture models
    • The mixing weights depend on both the input and the output
One-layer structure

[Figure: a one-layer mixture of experts. The input x feeds a gating network (here with an ellipsoidal gating function) producing weights g1, g2, g3, and three expert networks producing outputs μ1, μ2, μ3; the gates blend the expert outputs into the overall output μ.]

Hierarchical tree structure

[Figure: a two-level tree. Gating networks with linear (softmax) gating functions sit at the internal nodes and recursively split the input space; expert networks sit at the leaves.]
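To make the later slides concrete, the architecture can be held in a few nested NumPy arrays. The following sketch sets up a hypothetical 2×2 tree (all names, dimensions, and the random initialization are illustrative, not from the slides); the snippets on the next slides reuse these containers.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 3, 1   # input dimension (incl. a constant bias term) and output dimension
n_top, n_low = 2, 2  # branching factor at each level: a 2x2 tree with 4 experts

V_top = rng.normal(size=(n_top, d_in))                          # top-level gating weights v_i
V_low = [rng.normal(size=(n_low, d_in)) for _ in range(n_top)]  # nested gating weights v_ij
U = [[rng.normal(size=(d_out, d_in)) for _ in range(n_low)]     # expert weights U_ij
     for _ in range(n_top)]
```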

Expert network
  • At the leaves of the tree
  • Each expert (i, j) forms a linear predictor and passes it through a link function f:

    ξ_ij = U_ij x          (linear predictor)
    μ_ij = f(ξ_ij)         (output of the expert; f is the link function)

  • For example: f is the logistic function for binary classification (and the identity for regression)
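As a sketch, the expert computation in NumPy (the helper name and the link argument are mine):

```python
def expert_output(U_ij, x, link="identity"):
    """mu_ij = f(U_ij x): a linear predictor followed by the link function f."""
    xi = U_ij @ x                          # linear predictor xi_ij = U_ij x
    if link == "identity":                 # regression
        return xi
    if link == "logistic":                 # binary classification
        return 1.0 / (1.0 + np.exp(-xi))
    raise ValueError("unknown link")
```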

Gating network
  • At the nonterminal nodes of the tree
  • Top layer:

    g_i = exp(ξ_i) / Σ_k exp(ξ_k),   with ξ_i = v_iᵀ x

  • Other layers:

    g_j|i = exp(ξ_ij) / Σ_k exp(ξ_ik),   with ξ_ij = v_ijᵀ x
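The softmax gate is short in NumPy; a minimal sketch (subtracting the max is a standard numerical-stability trick, not something the slides mention):

```python
def gating_probs(V, x):
    """Softmax gating: g_i = exp(v_i^T x) / sum_k exp(v_k^T x)."""
    xi = V @ x
    e = np.exp(xi - xi.max())              # subtract the max for numerical stability
    return e / e.sum()
```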

Output
  • At the non-leaf nodes, the children's outputs are blended by the gating probabilities
  • Top node:  μ = Σ_i g_i μ_i
  • Other nodes:  μ_i = Σ_j g_j|i μ_ij
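Combining the two previous helpers gives the full forward pass of the two-level tree, a minimal sketch:

```python
def hme_output(V_top, V_low, U, x):
    """Blended output of a two-level HME: mu = sum_i g_i sum_j g_{j|i} mu_ij."""
    g = gating_probs(V_top, x)                       # top-level gates g_i
    mu = np.zeros(U[0][0].shape[0])
    for i, g_i in enumerate(g):
        g_nested = gating_probs(V_low[i], x)         # nested gates g_{j|i}
        mu_i = sum(g_ji * expert_output(U[i][j], x)  # mu_i = sum_j g_{j|i} mu_ij
                   for j, g_ji in enumerate(g_nested))
        mu += g_i * mu_i
    return mu
```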

Probability model
  • For each expert, assume the true output y is drawn from a distribution P with mean μ_ij
  • Therefore, the total probability of generating y from x is given by

    P(y | x, θ) = Σ_i g_i Σ_j g_j|i P(y | x, θ_ij)
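The mixture likelihood follows directly; the sketch below assumes Gaussian experts with a shared variance (the slides leave the distribution P generic):

```python
def hme_likelihood(V_top, V_low, U, x, y, sigma=1.0):
    """P(y | x, theta) = sum_i g_i sum_j g_{j|i} P_ij(y), with Gaussian P_ij."""
    g = gating_probs(V_top, x)
    total = 0.0
    for i, g_i in enumerate(g):
        g_nested = gating_probs(V_low[i], x)
        for j, g_ji in enumerate(g_nested):
            d = y - expert_output(U[i][j], x)
            p_ij = (np.exp(-0.5 * d @ d / sigma**2)
                    / (np.sqrt(2 * np.pi) * sigma) ** d.size)  # Gaussian density
            total += g_i * g_ji * p_ij
    return total
```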
Posterior probabilities
  • Since g_i and g_j|i are computed based only on the input x, we refer to them as prior probabilities
  • With knowledge of both the input x and the output y, Bayes' rule gives the posterior probabilities

    h_i = g_i Σ_j g_j|i P_ij(y) / Σ_k g_k Σ_l g_l|k P_kl(y)
    h_j|i = g_j|i P_ij(y) / Σ_l g_l|i P_il(y)
    h_ij = h_i h_j|i
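In code, Bayes' rule is a reweighting of the prior g_i g_j|i by the expert likelihoods, followed by normalization; a sketch (again assuming Gaussian experts, whose shared normalizing constant cancels):

```python
def hme_posteriors(V_top, V_low, U, x, y, sigma=1.0):
    """Joint posteriors h_ij = h_i * h_{j|i} for a single pair (x, y)."""
    h = np.zeros((len(V_top), len(V_low[0])))
    g = gating_probs(V_top, x)
    for i, g_i in enumerate(g):
        g_nested = gating_probs(V_low[i], x)
        for j, g_ji in enumerate(g_nested):
            d = y - expert_output(U[i][j], x)
            p_ij = np.exp(-0.5 * d @ d / sigma**2)  # unnormalised Gaussian P_ij(y)
            h[i, j] = g_i * g_ji * p_ij             # prior times likelihood
    return h / h.sum()                              # Bayes' rule: normalise
```

The marginal h_i is then h.sum(axis=1), and h_j|i is h[i] divided by h_i.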
E-M algorithm
  • Introduce auxiliary indicator variables z_ij, which can be interpreted as the (unobserved) labels of the experts: z_ij = 1 iff expert (i, j) generated the data point
  • With knowledge of the auxiliary variables the probability model simplifies to

    P(y, z | x, θ) = Π_i Π_j [g_i g_j|i P_ij(y)]^(z_ij)
E-M algorithm
  • Complete-data log-likelihood:

    l_c(θ) = Σ_t Σ_i Σ_j z_ij^(t) [ln g_i^(t) + ln g_j|i^(t) + ln P_ij(y^(t))]

  • The E-step replaces each z_ij^(t) by its posterior expectation E[z_ij^(t) | x^(t), y^(t)] = h_ij^(t), yielding the expected complete-data log-likelihood Q(θ, θ^(k))
E-M algorithm
  • The M-step maximizes Q(θ, θ^(k)), which decouples into separate weighted maximum-likelihood problems:

    U_ij^(k+1) = argmax_U Σ_t h_ij^(t) ln P_ij(y^(t))                    (one per expert)
    v_i^(k+1)  = argmax_v Σ_t Σ_i h_i^(t) ln g_i^(t)                     (top gating network)
    v_ij^(k+1) = argmax_v Σ_t Σ_i h_i^(t) Σ_j h_j|i^(t) ln g_j|i^(t)     (nested gating networks)

  • The gating problems are cross-entropy fits of generalized linear models, solved by IRLS (next slide)
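For Gaussian regression experts, the expert half of the M-step reduces to a posterior-weighted least-squares problem with a closed-form solution; a sketch (the small ridge term is a numerical safeguard I added):

```python
def m_step_expert(X, Y, h_ij):
    """U_ij = argmax sum_t h_ij^(t) ln P_ij(y^(t)): weighted least squares.

    X : (T, d_in) inputs, Y : (T, d_out) targets, h_ij : (T,) posterior weights.
    """
    A = X.T @ (h_ij[:, None] * X)                    # weighted normal equations
    B = X.T @ (h_ij[:, None] * Y)
    return np.linalg.solve(A + 1e-8 * np.eye(X.shape[1]), B).T
```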
IRLS
  • Iteratively reweighted least squares algorithm
    • An iterative algorithm for computing the maximum-likelihood estimates of the parameters of a generalized linear model
    • A special case of the Fisher scoring method
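A minimal IRLS sketch for the binary-logistic case, with optional per-example weights as the HME M-step requires (the implementation details, e.g. taking full Newton steps and the ridge term, are my own choices):

```python
def irls_logistic(X, t, w=None, n_iter=20):
    """Maximum-likelihood weights of a (weighted) logistic regression via IRLS."""
    n, d = X.shape
    w = np.ones(n) if w is None else w
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))       # current predicted probabilities
        r = w * p * (1.0 - p)                     # IRLS weights (GLM variance terms)
        H = X.T @ (r[:, None] * X)                # Fisher information
        grad = X.T @ (w * (t - p))                # gradient of the log-likelihood
        beta += np.linalg.solve(H + 1e-8 * np.eye(d), grad)  # Newton/Fisher step
    return beta
```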
Algorithm
  • E-step: with the current parameters, compute the posteriors h_i, h_j|i, and h_ij for every training pair (x, y)
  • M-step: refit each expert by posterior-weighted maximum likelihood and each gating network by IRLS
  • Iterate until the likelihood converges; a sketch of one full iteration follows
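Putting the helpers together, one batch EM iteration might look like the sketch below. It is a simplification: the expert M-step is exact (weighted least squares), while a single gradient ascent step on the gating cross-entropy stands in for the full IRLS inner loop.

```python
def em_iteration(V_top, V_low, U, X, Y, lr=0.5):
    """One simplified EM iteration over a dataset X (T, d_in), Y (T, d_out)."""
    T = len(X)
    # E-step: posteriors h_ij^(t) for every training pair.
    H = np.array([hme_posteriors(V_top, V_low, U, X[t], Y[t]) for t in range(T)])
    h_top = H.sum(axis=2)                                     # marginals h_i^(t)
    # M-step for the experts: one weighted least-squares fit per leaf.
    for i in range(len(V_top)):
        for j in range(len(V_low[i])):
            U[i][j] = m_step_expert(X, Y, H[:, i, j])
    # (Partial) M-step for the gates: ascend the cross-entropy objectives.
    for t in range(T):
        g = gating_probs(V_top, X[t])
        V_top += lr / T * np.outer(h_top[t] - g, X[t])        # grad of sum_i h_i ln g_i
        for i in range(len(V_low)):
            g_nested = gating_probs(V_low[i], X[t])
            h_nested = H[t, i] / max(h_top[t, i], 1e-12)      # h_{j|i}
            V_low[i] += lr / T * h_top[t, i] * np.outer(h_nested - g_nested, X[t])
    return V_top, V_low, U
```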

Online algorithm
  • The algorithm can also be run online, e.g. for online regression
  • For the expert networks:

    U_ij^(t+1) = U_ij^(t) + h_ij^(t) (y^(t) − μ_ij^(t)) (x^(t))ᵀ R_ij^(t)

    where R_ij is the inverse covariance matrix for expert network EN(i, j), updated recursively as in recursive least squares

Online algorithm
  • For the gating networks:

    v_i^(t+1) = v_i^(t) + S_i^(t) x^(t) (h_i^(t) − g_i^(t))

    where S_i is the inverse covariance matrix for the top-level gating network, and

    v_ij^(t+1) = v_ij^(t) + S_ij^(t) x^(t) h_i^(t) (h_j|i^(t) − g_j|i^(t))

    where S_ij is the inverse covariance matrix for gating network (i, j)
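A sketch of the expert half of the online update, written as recursive least squares with posterior-weighted steps (the forgetting factor lam and the ordering of the two updates are my assumptions; the gating updates with S_i and S_ij follow the same recursive pattern):

```python
def online_expert_update(U_ij, R_ij, h_ij, x, y, lam=0.99):
    """One RLS-style online step for a single expert network."""
    mu = U_ij @ x                                    # current expert prediction
    # Recursive update of the inverse covariance R_ij (matrix inversion lemma).
    Rx = R_ij @ x
    R_ij = (R_ij - np.outer(Rx, Rx) / (lam / h_ij + x @ Rx)) / lam
    # Posterior-weighted correction of the expert weights.
    U_ij = U_ij + h_ij * np.outer(y - mu, R_ij @ x)
    return U_ij, R_ij
```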

Results
  • Simulated data from a four-joint robot arm moving in three-dimensional space
Conclusions
  • Introduces a tree-structured architecture for supervised learning
  • Trains much faster than the traditional back-propagation algorithm
  • Can be used for on-line learning

Thank you

Questions?
