
Neural Networks


Presentation Transcript


  1. Neural Networks The Elements of Statistical Learning, Chapter 12 Presented by Nick Rizzolo

  2. Modeling the Human Brain • Input builds up on receptors (dendrites) • The cell has an input threshold • When the threshold is breached, an activation is fired down the axon

  3. “Magical” Secrets Revealed • Linear features are derived from inputs • Target concept(s) are non-linear functions of features

  4. Outline • Projection Pursuit Regression • Neural Networks proper • Fitting Neural Networks • Issues in Training • Examples

  5. Projection Pursuit Regression • Generalization of 2-layer regression NN • Universal approximator • Good for prediction • Not good for deriving interpretable models of data

  6. Projection Pursuit Regression [Diagram: inputs are projected onto unit vectors, passed through ridge functions, and summed to produce the output]
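For reference, the model the diagram depicts (as given in ESL) is

    f(X) = Σ_{m=1}^{M} g_m(ω_m^T X),

where each ω_m is a unit vector of parameters and each g_m is an unspecified ridge function estimated from the data.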

  7. PPR: Derived Features • Derived feature V_m = ω_m^T X • The dot product ω_m^T X is the projection of X onto the unit vector ω_m • The ridge function g_m(ω_m^T X) varies only in the direction defined by ω_m

  8. PPR: Training • Minimize the squared error Σ_i [y_i − Σ_m g_m(ω_m^T x_i)]² • Consider the single-term case (M = 1) • Given ω, we compute the derived features v_i = ω^T x_i and smooth y against them to estimate g • Given g, we minimize over ω with Newton's Method • Iterate those two steps to convergence

  9. PPR: Newton's Method • Use the derivative of g to iteratively improve the estimate of ω • Each step reduces to a weighted least squares regression on the inputs, with weights g′(ω_old^T x_i)² and adjusted targets ω_old^T x_i + (y_i − g(ω_old^T x_i)) / g′(ω_old^T x_i)
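To make the alternation on slides 8 and 9 concrete, here is a minimal NumPy sketch of fitting a single PPR term. The polynomial smoother standing in for local regression / smoothing splines, the fixed iteration count, and the small ridge term added for numerical stability are assumptions made for this example, not details from the slides.

    import numpy as np

    def fit_ppr_term(X, y, n_iter=10, degree=5):
        """Fit one projection pursuit term y ~ g(omega^T x) by alternation."""
        n, p = X.shape
        omega = np.ones(p) / np.sqrt(p)            # initial unit direction

        for _ in range(n_iter):
            v = X @ omega                          # derived features v_i = omega^T x_i

            # Given omega: smooth y against v to estimate the ridge function g
            # (a polynomial fit stands in for a proper smoother here).
            g = np.poly1d(np.polyfit(v, y, degree))
            gp = g.deriv()                         # g'

            # Given g: one Newton-style step for omega, i.e. a weighted least
            # squares regression of the adjusted target on the inputs.
            d = np.where(np.abs(gp(v)) < 1e-8, 1e-8, gp(v))
            w = d ** 2                             # weights g'(v_i)^2
            b = v + (y - g(v)) / d                 # adjusted targets
            A = X.T @ (X * w[:, None]) + 1e-8 * np.eye(p)
            omega = np.linalg.solve(A, X.T @ (w * b))
            omega = omega / np.linalg.norm(omega)  # keep omega a unit vector

        return omega, g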

  10. PPR: Implementation Details • Suggested smoothing methods: • Local regression • Smoothing splines • The g_m's can be readjusted with backfitting • The ω_m's are usually not readjusted • (ω_m, g_m) pairs are added in a forward stage-wise manner

  11. Outline • Projection Pursuit Regression • Neural Networks proper • Fitting Neural Networks • Issues in Training • Examples

  12. Neural Networks
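For reference (the transcript of this slide contains only its title), the single-hidden-layer model from ESL that the following slides build on is

    Z_m = σ(α_0m + α_m^T X),   m = 1, …, M
    T_k = β_0k + β_k^T Z,      k = 1, …, K
    f_k(X) = g_k(T),           k = 1, …, K

where σ is the hidden-unit activation (typically the sigmoid) and g_k is the output function: the identity for regression, the softmax for classification.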

  13. NNs: Sigmoid and Softmax • Transforming activations into probabilities • Sigmoid: σ(v) = 1 / (1 + e^{−v}) • Softmax: g_k(T) = e^{T_k} / Σ_{l=1}^{K} e^{T_l} • Just like the multilogit model
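The two functions in NumPy (the max subtraction in softmax is a standard numerical-stability trick, not something from the slides):

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def softmax(t):
        t = t - np.max(t)      # subtract the max before exponentiating, for stability
        e = np.exp(t)
        return e / e.sum()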

  14. NNs: Training • We need an error function to minimize • Regression: sum squared error • Classification: cross-entropy • Generic approach: Gradient Descent (a.k.a. back propagation) • Error functions are differentiable • Forward pass to evaluate activations, backward pass to update weights
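Written out as in ESL, the two error functions over the parameters θ are

    Regression (sum of squared errors):   R(θ) = Σ_{k=1}^{K} Σ_{i=1}^{N} (y_ik − f_k(x_i))²
    Classification (cross-entropy):       R(θ) = − Σ_{i=1}^{N} Σ_{k=1}^{K} y_ik log f_k(x_i)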

  15. NNs: Back Propagation • Update rules (gradient descent with learning rate γ_r at iteration r): β_km ← β_km − γ_r Σ_i ∂R_i/∂β_km and α_ml ← α_ml − γ_r Σ_i ∂R_i/∂α_ml • Back propagation equations: ∂R_i/∂β_km = δ_ki z_mi and ∂R_i/∂α_ml = s_mi x_il, where the hidden-layer errors satisfy s_mi = σ′(α_m^T x_i) Σ_k β_km δ_ki
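A NumPy sketch of those regression equations for one batch gradient-descent step; the identity output activation, omitted bias terms, and fixed learning rate are simplifying assumptions for illustration.

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def backprop_step(X, Y, alpha, beta, gamma=0.01):
        """X: (N, p) inputs, Y: (N, K) targets,
        alpha: (M, p) input-to-hidden weights, beta: (K, M) hidden-to-output weights."""
        # Forward pass: evaluate activations.
        Z = sigmoid(X @ alpha.T)            # (N, M) hidden activations z_mi
        F = Z @ beta.T                      # (N, K) outputs f_k(x_i)

        # Backward pass: output errors delta_ki, hidden errors s_mi.
        delta = -2.0 * (Y - F)              # (N, K), identity output units
        S = (delta @ beta) * Z * (1 - Z)    # (N, M): sigma'(a) * sum_k beta_km delta_ki

        # Gradients summed over observations, then the update rules.
        grad_beta = delta.T @ Z             # (K, M): sum_i delta_ki z_mi
        grad_alpha = S.T @ X                # (M, p): sum_i s_mi x_il
        return alpha - gamma * grad_alpha, beta - gamma * grad_beta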

  16. NNs: Back Propagation Details • Those were the regression equations; the classification equations are similar • Updates can be batch or online • Online learning rates can be decreased during training to ensure convergence • Weights are usually initialized to small random values • Back propagation is sometimes impractically slow

  17. Outline • Projection Pursuit Regression • Neural Networks proper • Fitting Neural Networks • Issues in Training • Examples

  18. Issues in Training: Overfitting • Problem: training all the way to the global minimum of the error R(θ) typically overfits • Proposed solutions: • Early stopping: limit training by monitoring performance on a held-out validation set • Weight decay: penalizing large weights
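Weight decay written out (as in ESL): minimize the penalized error

    R(θ) + λ J(θ),   where J(θ) = Σ_{km} β_km² + Σ_{ml} α_ml²

and λ ≥ 0 is a tuning parameter, typically chosen by cross-validation; larger values of λ shrink the weights toward zero.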

  19. A Closer Look at Weight Decay • The less complicated hypothesis (the network trained with weight decay) achieves the lower test error rate

  20. Outline • Projection Pursuit Regression • Neural Networks proper • Fitting Neural Networks • Issues in Training • Examples

  21. Example #1: Synthetic Data • More hidden nodes -> overfitting • Multiple initial weight settings should be tried • The radial function is learned poorly

  22. Example #1: Synthetic Data • Two parameters to tune: weight decay and the number of hidden units • Suggested training strategy: fix one parameter at a value where the model is least constrained, and cross-validate the other (see the sketch below)
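One way to carry out that strategy with scikit-learn (the library choice and parameter values are illustrative assumptions; the presentation does not prescribe any particular tool): fix a generous number of hidden units, then cross-validate the weight-decay parameter, called alpha in MLPRegressor.

    from sklearn.model_selection import GridSearchCV
    from sklearn.neural_network import MLPRegressor

    net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000)
    search = GridSearchCV(net, param_grid={"alpha": [1e-4, 1e-3, 1e-2, 1e-1]}, cv=5)
    # search.fit(X_train, y_train)     # X_train, y_train: your training data
    # print(search.best_params_)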

  23. Example #2: ZIP Code Data • Based on Yann LeCun's handwritten digit work • NNs can be structurally tailored to suit the data • Weight sharing: multiple units in a given layer apply the same weights to different parts of the input (sketched below)
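A small NumPy sketch of local connectivity plus weight sharing: each hidden unit looks at a small patch of the image and every unit reuses the same weights, so the layer amounts to a convolution. The patch and image sizes here are arbitrary illustrations, not the dimensions of the ZIP-code networks.

    import numpy as np

    def shared_weight_layer(image, kernel):
        """Hidden activations for a locally connected layer with shared weights."""
        H, W = image.shape
        h, w = kernel.shape
        out = np.zeros((H - h + 1, W - w + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                patch = image[i:i + h, j:j + w]      # local receptive field
                out[i, j] = np.sum(patch * kernel)   # the same weights for every unit
        return 1.0 / (1.0 + np.exp(-out))            # sigmoid activations

    activations = shared_weight_layer(np.random.rand(16, 16), np.random.rand(3, 3))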

  24. Example #2: 5 Networks • Net 1: No hidden layer • Net 2: One hidden layer • Net 3: 2 hidden layers • Local connectivity • Net 4: 2 hidden layers • Local connectivity • 1 layer weight sharing • Net 5: 2 hidden layers • Local connectivity • 2 layer weight sharing

  25. Example #2: Results • Net 5 does best • A small number of features is identifiable throughout the image

  26. Conclusions • Neural Networks are a very general approach to both regression and classification • They are an effective learning tool when: • The signal-to-noise ratio is high • Prediction is desired • An interpretable description of the problem's solution is not required • Targets are naturally distinguished by direction rather than by distance
