
4. Function Approximation Theory


Presentation Transcript


  1. (1) Universal Approximation Theorem (existence) [Ref. Haykin]. Cybenko 88, Funahashi, Hornik 89: one hidden layer is sufficient. Let φ(·) be a nonconstant, bounded, monotone-increasing, continuous function (e.g. a sigmoid; the classical results allow nondecreasing, continuous or discrete squashing functions). Let I_p denote the p-dimensional unit hypercube and C(I_p) the space of continuous functions on I_p. Given any f ∈ C(I_p) and ε > 0, there exist an integer M and constants α_i, b_i, w_ij such that F(x_1, …, x_p) = Σ_{i=1}^{M} α_i φ(Σ_{j=1}^{p} w_ij x_j + b_i) satisfies |F(x_1, …, x_p) − f(x_1, …, x_p)| < ε for all (x_1, …, x_p) ∈ I_p. Example: an MLP with one hidden layer of sigmoid units.
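The theorem only asserts existence, but the form F(x) = Σ_i α_i φ(Σ_j w_ij x_j + b_i) is easy to exercise numerically. Below is a minimal sketch (not from the slides) that fits a sum of sigmoid units to a continuous target on the unit interval; the target f, the choice M = 50, and the random-feature least-squares fit are illustrative assumptions rather than the construction used in the proofs.

    import numpy as np

    # Sketch: F(x) = sum_i alpha_i * sigmoid(w_i * x + b_i) approximating a
    # continuous target f on the unit interval (p = 1). The target, M, and the
    # random hidden parameters are illustrative assumptions.
    f = lambda x: np.sin(2 * np.pi * x) + 0.5 * x
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    M = 50                                    # number of hidden neurons
    x = np.linspace(0.0, 1.0, 200)[:, None]   # samples from the unit hypercube

    W = rng.normal(0.0, 10.0, (1, M))         # hidden weights w_ij, fixed at random
    b = rng.normal(0.0, 10.0, M)              # hidden biases b_i
    H = sigmoid(x @ W + b)                    # hidden activations, shape (200, M)
    alpha, *_ = np.linalg.lstsq(H, f(x), rcond=None)   # output weights alpha_i

    F = H @ alpha
    print("max |F(x) - f(x)| =", float(np.abs(F - f(x)).max()))

Increasing M (and refitting) drives the maximum error down, which is exactly the kind of approximation the theorem guarantees is achievable.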

  2. (2) Mean-Squared Error (Barron). Notation: R = mean-squared error, a measure of the ability to generalize; N = number of training examples; M = number of hidden neurons; p = input dimension; C_f = a measure of the regularity or smoothness of f. Barron's bound on the mean-squared error of the MLP estimate is R ≤ O(C_f² / M) + O((M p / N) log N). Large M is needed for the best approximation, while a small M/N gives a better empirical fit: a conflict. M may be small if C_f is small. Scalability: the MLP error is independent of the size of the input space and scales as the inverse of the number of hidden nodes for large training sets, so MLPs are suited to problems with large input dimension.
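The conflict between the two terms can be made concrete with a few lines of arithmetic. The sketch below (not from the slides) evaluates the bound with the hidden constants set to 1 and with assumed values of C_f, p, and N, and reports the M that minimizes it.

    import numpy as np

    # Illustrative evaluation of Barron's two error terms, constants set to 1.
    # C_f, p, and N are assumed values for the example.
    Cf, p, N = 1.0, 10, 10_000
    M = np.arange(1, 501)

    approx_err = Cf**2 / M                # best-approximation term: favors large M
    estim_err = (M * p / N) * np.log(N)   # empirical-fit term: favors small M / N
    total = approx_err + estim_err

    print("M minimizing the bound:", int(M[np.argmin(total)]))

Balancing the two terms gives M ≈ C_f · sqrt(N / (p log N)), so the best M grows with the training-set size, shrinks with the input dimension, and is small when C_f is small, in line with the remarks above.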

  3. (3) Practical Considerations. Funahashi: two hidden layers are more manageable; with a single hidden layer, the interaction between neurons makes learning difficult (a hidden neuron acts as a salient-feature detector). Sontag: two hidden layers are sufficient to find f⁻¹ from any f (the inverse problem): if the forward dynamics are x(n) = f(x(n−1), u(n)), the inverse model is u(n) = f⁻¹(x(n−1), x(n)). A sketch of how training data for such an inverse model can be generated follows below.
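A hedged sketch of the data-generation step referenced above: it simulates an assumed toy forward dynamics f and packages (x(n−1), x(n)) → u(n) pairs as training data for a network meant to approximate f⁻¹. The dynamics, the input distribution, and the trajectory length are all illustrative choices, not taken from the slides.

    import numpy as np

    # Sketch: training data for the inverse model u(n) = f^-1(x(n-1), x(n)),
    # generated by simulating an assumed toy forward dynamics
    # x(n) = f(x(n-1), u(n)) under random control inputs.
    def f(x_prev, u):
        return 0.8 * x_prev + np.tanh(u)        # hypothetical forward dynamics

    rng = np.random.default_rng(0)
    N = 1000                                    # trajectory length (assumed)
    u = rng.uniform(-1.0, 1.0, N)               # random control inputs u(n)
    x = np.zeros(N + 1)
    for n in range(N):
        x[n + 1] = f(x[n], u[n])

    # Inputs (x(n-1), x(n)) and targets u(n) for a two-hidden-layer MLP
    # trained to approximate the inverse map f^-1.
    inputs = np.stack([x[:-1], x[1:]], axis=1)  # shape (N, 2)
    targets = u                                 # shape (N,)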

  4. Example: robot dynamics and kinematics. Dynamics: x = state, u = joint torque. Kinematics: x = Cartesian position, u = joint position. [Slide figure: the inverse-model network takes x(n−1) and x(n) as inputs and produces u(n).] (4) Network Jacobian ∂y_i/∂x_i: if the NN has learned the function, then it has learned its Jacobian, too; a finite-difference check is sketched below.
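One way to inspect the Jacobian a trained network has implicitly learned is by finite differences on its forward pass, as in the generic sketch below; `net` stands for any trained feedforward model's forward function and is an assumption, not something defined in the slides.

    import numpy as np

    # Finite-difference estimate of the network Jacobian J[i, j] = dy_i / dx_j
    # around an operating point x0; `net` is the trained model's forward pass.
    def numerical_jacobian(net, x0, eps=1e-5):
        y0 = np.atleast_1d(net(x0))
        J = np.zeros((y0.size, x0.size))
        for j in range(x0.size):
            dx = np.zeros(x0.size)              # perturb one input at a time
            dx[j] = eps
            J[:, j] = (net(x0 + dx) - net(x0 - dx)) / (2.0 * eps)
        return J

For a network that has learned the forward kinematics, this Jacobian plays the role of the usual manipulator Jacobian relating joint velocities to Cartesian velocities.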

  5. Students’ Questions
  1. Can the FF net learning take place concurrently (or synchronously)?
  2. Is M = C_f?
  3. What is the mathematical meaning of the step size η?
  4. Why do we use the mean-square error instead of a first-order, third-order, etc. error?
  5. What if more than 2 hidden layers are used? Any problem then?

  6. (Students’ Questions, continued)
  6. BP becomes simple through the Jacobian, right?
  7. What do you mean by dividing the input space into local regions?
  8. Where else is the Jacobian used, other than in kinematics?
  9. The error can be defined for the global feature by comparing against the desired output. Does each local feature also need a desired output?

  7. Computing the Network Jacobian with BackProp: in BackProp we compute the gradient of the pattern error E_p with respect to the weights; for the Network Jacobian we change the error function from E_p to E_j (the output y_j itself) and apply BP to x_i as if it were a weight fed by a constant input of 1, which yields ∂y_j/∂x_i. [Slide figure: a constant input 1, weight x_i, network NN, output y_j.]
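For a one-hidden-layer network the recipe above reduces to a single backward pass. The sketch below assumes a concrete architecture y = V·sigmoid(W·x + b) (an assumption, not the slides' notation): the quantity sent backward is the output y_j itself rather than the pattern error E_p, and the final step treats each input x_i as a weight attached to a constant input of 1, so the resulting "weight gradient" is the Jacobian row ∂y_j/∂x.

    import numpy as np

    # Backprop-style Jacobian row dy_j/dx for y = V @ sigmoid(W @ x + b).
    # The "error" propagated backward is y_j itself (E_p replaced by E_j), and
    # the last step treats each input x_i as a weight fed by a constant 1.
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    def jacobian_row(W, b, V, x, j):
        h = sigmoid(W @ x + b)                   # forward pass: hidden activations
        delta_out = np.zeros(V.shape[0])
        delta_out[j] = 1.0                       # unit "error" on output y_j
        delta_hidden = (V.T @ delta_out) * h * (1.0 - h)   # backprop through sigmoid
        return W.T @ delta_hidden                # gradient of y_j w.r.t. the inputs

jacobian_row(W, b, V, x, j)[i] is ∂y_j/∂x_i, the same quantity the finite-difference sketch for slide 4 estimates numerically.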
