Neural networks


Presentation Transcript


  1. Neural networks Introduction Fitting neural networks Going beyond single hidden layer Brief discussion of deep learning

  2. Neural network K-class classification: K nodes in top layer Continuous outcome: Single node in top layer

  3. Neural network K-class classification. The Zm are created from linear combinations of the inputs, and Yk is modeled as a function of linear combinations of the Zm. For classification, one can use the simple output function gk(T) = Tk.
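The formula images from this slide are not preserved in the transcript. As a sketch, the standard single-hidden-layer formulation consistent with the slide's description (the sigmoid hidden units and the softmax alternative are assumptions where the slide does not specify) is:

```latex
Z_m = \sigma(\alpha_{0m} + \alpha_m^{T} X), \quad m = 1, \dots, M, \qquad
T_k = \beta_{0k} + \beta_k^{T} Z, \quad k = 1, \dots, K, \qquad
f_k(X) = g_k(T),
```

where σ(v) = 1/(1 + e^{−v}) is the sigmoid activation and g_k is either the identity g_k(T) = T_k or the softmax g_k(T) = e^{T_k} / Σ_l e^{T_l}.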

  4. Neural network

  5. Neural network A simple network with linear functions. “bias”: intercept. y1: x1 + x2 + 0.5 ≥ 0; y2: x1 + x2 − 1.5 ≥ 0; z1 = +1 if and only if both y1 and y2 have value +1.
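The figure for this slide is not shown; the following minimal Python sketch (hypothetical, using ±1 threshold units and the inequalities exactly as listed above) illustrates how z1 is computed from y1 and y2.

```python
def step(v):
    """Threshold unit: +1 when its linear input is nonnegative, otherwise -1."""
    return 1 if v >= 0 else -1

def simple_network(x1, x2):
    # Hidden units implement the inequalities from the slide
    # (the +0.5 and -1.5 terms are the "bias"/intercept weights).
    y1 = step(x1 + x2 + 0.5)
    y2 = step(x1 + x2 - 1.5)
    # Output unit fires only when both hidden units fire: an AND of
    # threshold units, realized here as step(y1 + y2 - 1.5).
    z1 = step(y1 + y2 - 1.5)
    return z1

print(simple_network(1.0, 1.0))   # both inequalities hold -> +1
print(simple_network(0.0, 0.0))   # second inequality fails -> -1
```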

  6. Neural network

  7. Neural network

  8. Fitting Neural Networks Set of parameters (weights). Objective function: squared-error loss for regression; cross-entropy (deviance) for classification.
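The parameter-set and objective-function formulas are missing from the transcript; a sketch of the standard forms they presumably correspond to, in the usual textbook notation, is:

```latex
\theta = \{\alpha_{0m}, \alpha_m : m = 1, \dots, M\} \cup \{\beta_{0k}, \beta_k : k = 1, \dots, K\},
\qquad
R(\theta) = \sum_{i=1}^{N} \sum_{k=1}^{K} \bigl(y_{ik} - f_k(x_i)\bigr)^2 \ \text{(regression)},
\qquad
R(\theta) = -\sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log f_k(x_i) \ \text{(classification, cross-entropy)}.
```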

  9. Fitting Neural Networks R(θ) is minimized by gradient descent, called “back-propagation”. Middle-layer values are computed for each data point. We use the squared-error loss for demonstration.
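The middle-layer and loss formulas are likewise missing; under the model sketched above they would read:

```latex
z_{mi} = \sigma(\alpha_{0m} + \alpha_m^{T} x_i), \qquad z_i = (z_{1i}, \dots, z_{Mi}), \qquad
R(\theta) = \sum_{i=1}^{N} R_i = \sum_{i=1}^{N} \sum_{k=1}^{K} \bigl(y_{ik} - f_k(x_i)\bigr)^2 .
```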

  10. Fitting Neural Networks Derivatives of R(θ) with respect to the weights, and descent along the gradient. Indices: k for output units, m for hidden units, l for input coordinates, i for observations; γ_r is the learning rate.
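The update equations are not preserved; a sketch of the standard gradient-descent step at iteration r, with learning rate γ_r, is:

```latex
\beta_{km}^{(r+1)} = \beta_{km}^{(r)} - \gamma_r \sum_{i=1}^{N} \frac{\partial R_i}{\partial \beta_{km}^{(r)}}, \qquad
\alpha_{ml}^{(r+1)} = \alpha_{ml}^{(r)} - \gamma_r \sum_{i=1}^{N} \frac{\partial R_i}{\partial \alpha_{ml}^{(r)}} .
```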

  11. Fitting Neural Networks By definition
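The definitions shown on this slide are missing from the transcript; with the squared-error loss above, the derivatives factor into output-layer errors δ_ki and hidden-layer errors s_mi (a sketch of the standard back-propagation equations):

```latex
\frac{\partial R_i}{\partial \beta_{km}} = \delta_{ki}\, z_{mi}, \qquad
\frac{\partial R_i}{\partial \alpha_{ml}} = s_{mi}\, x_{il}, \qquad
\delta_{ki} = -2\bigl(y_{ik} - f_k(x_i)\bigr)\, g_k'(\beta_k^{T} z_i), \qquad
s_{mi} = \sigma'(\alpha_m^{T} x_i) \sum_{k=1}^{K} \beta_{km}\, \delta_{ki} .
```

The last relation is what gives back-propagation its name: the output errors δ_ki are propagated backward to form the hidden-layer errors s_mi.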

  12. Fitting Neural Networks General workflow of back-propagation: Forward pass: fix the weights and compute the fitted values. Backward pass: compute the output-layer errors, back-propagate them to compute the hidden-layer errors, use both to compute the gradients for the updates, and update the weights.
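A minimal NumPy sketch of one such two-pass update (hypothetical shapes and names; it assumes sigmoid hidden units, identity output units, and the squared-error loss used in the demonstration above):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(X, Y, alpha, beta, lr=0.01):
    """One forward/backward pass of gradient descent for a single-hidden-layer net.

    X: (N, p) inputs with a leading column of 1s for the bias ("intercept").
    Y: (N, K) targets.
    alpha: (p, M) input-to-hidden weights; beta: (M + 1, K) hidden-to-output weights.
    """
    # Forward pass: fix the weights and compute the fitted values.
    Z = sigmoid(X @ alpha)                         # (N, M) hidden-layer values z_mi
    Z1 = np.hstack([np.ones((Z.shape[0], 1)), Z])  # add hidden-layer bias term
    F = Z1 @ beta                                  # (N, K) outputs, identity g_k

    # Backward pass: output errors delta, back-propagated hidden errors s.
    delta = -2.0 * (Y - F)                         # (N, K), since g_k is the identity
    s = (Z * (1.0 - Z)) * (delta @ beta[1:].T)     # (N, M); sigma'(v) = sigma(v)(1 - sigma(v))

    # Gradients of R(theta) and the weight updates.
    grad_beta = Z1.T @ delta                       # (M + 1, K)
    grad_alpha = X.T @ s                           # (p, M)
    beta = beta - lr * grad_beta
    alpha = alpha - lr * grad_alpha
    return alpha, beta
```

In batch training this step would be iterated until the stopping rule is met; for online training the same update is applied one observation (or mini-batch) at a time.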

  13. Fitting Neural Networks Can use parallel computing – each hidden unit passes and receives information only to and from units that share a connection. Online training: the fitting scheme allows the network to handle very large training sets, and also to update the weights as new observations come in. Training a neural network is an “art” – the model is generally overparametrized, and the optimization problem is nonconvex and unstable. A neural network model is a black box and hard to interpret directly.

  14. Fitting Neural Networks • Initialization • When the weight vectors are close to length zero, all Z values are close to zero, the sigmoid curve is close to linear, and the overall model is close to linear – a relatively simple model. (This can be seen as a regularized solution.) • Start with very small weights and let the neural network learn the necessary nonlinear relations from the data (see the sketch below). • Starting with large weights often leads to poor solutions.
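The “close to linear” point can be made explicit with the first-order expansion of the sigmoid around zero (a small sketch, not from the slides):

```latex
\sigma(v) = \frac{1}{1 + e^{-v}} \approx \frac{1}{2} + \frac{v}{4} \quad \text{for } v \approx 0,
```

so when the weights are near zero the linear inputs α_{0m} + α_m^T x are near zero, each hidden unit acts approximately linearly, and the network as a whole behaves like a linear model.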

  15. Fitting Neural Networks Overfitting: the model is too flexible, involving too many parameters, and may easily overfit the data. Early stopping – do not let the algorithm converge; because the model starts out close to linear, this is a regularized solution (shrunk towards linear). Explicit regularization (“weight decay”) – minimize a penalized objective (see the sketch below); the penalty tends to shrink smaller weights more. Cross-validation is used to estimate λ.
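The penalty formulas are not preserved; a sketch of the standard penalized criterion, with the usual weight-decay penalty and the weight-elimination variant (the latter is the form that shrinks smaller weights more), is:

```latex
\min_{\theta}\; R(\theta) + \lambda J(\theta), \qquad
J(\theta) = \sum_{k,m} \beta_{km}^2 + \sum_{m,l} \alpha_{ml}^2
\quad \text{or} \quad
J(\theta) = \sum_{k,m} \frac{\beta_{km}^2}{1 + \beta_{km}^2} + \sum_{m,l} \frac{\alpha_{ml}^2}{1 + \alpha_{ml}^2},
```

with λ ≥ 0 chosen by cross-validation.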

  16. Fitting Neural Networks

  17. Fitting Neural Networks

  18. Fitting Neural Networks Number of Hidden Units and Layers Too few – might not have enough flexibility to capture the nonlinearities in the data. Too many – overly flexible, BUT the extra weights can be shrunk toward zero if appropriate regularization is used. ✔ Typical range: 5–100. Cross-validation can be used, though it may not be necessary if cross-validation is already used to tune the regularization parameter.

  19. Examples “A radial function is in a sense the most difficult for the neural net, as it is spherically symmetric and with no preferred directions.”

  20. Examples

  21. Examples

  22. Going beyond single hidden layer A benchmark problem: classification of handwritten numerals.

  23. Going beyond single hidden layer 5×5 → 1 and 3×3 → 1, no weight sharing. 5×5 → 1 with weights shared: each of the units in a single 8 × 8 feature map shares the same set of nine weights (but has its own bias parameter). 3×3 → 1: the same operation applied to different parts of the image (see the sketch below).
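A minimal NumPy sketch of the shared-weight idea (hypothetical sizes; the exact input size, stride, and padding used in the slide's figure are not shown): every unit in the 8 × 8 feature map applies the same nine 3 × 3 weights to its own patch of the input, but keeps its own bias.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def shared_weight_feature_map(image, kernel, bias, stride=2):
    """Compute one feature map with a shared 3x3 kernel.

    image:  2-D input array (17x17 here so that stride 2 yields an 8x8 map).
    kernel: (3, 3) weights shared by every unit in the feature map.
    bias:   (8, 8) biases -- one per unit, not shared.
    """
    out = np.empty_like(bias)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i * stride : i * stride + 3, j * stride : j * stride + 3]
            out[i, j] = sigmoid(np.sum(kernel * patch) + bias[i, j])
    return out

# Only 9 shared weights (plus 64 biases) instead of 64 * 9 separate weights.
rng = np.random.default_rng(0)
fmap = shared_weight_feature_map(rng.normal(size=(17, 17)),
                                 rng.normal(size=(3, 3)),
                                 np.zeros((8, 8)))
print(fmap.shape)  # (8, 8)
```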

  24. Going beyond single hidden layer

  25. Going beyond single hidden layer

  26. Deep learning Data → Features → Model • Finding the correct features is critical to success: kernels in SVM, hidden-layer nodes in a neural network, predictor combinations in RF. • A successful machine-learning technology needs to be able to extract useful features (data representations) on its own. • Deep learning methods: composition of multiple non-linear transformations of the data (see the sketch below). Goal: more abstract – and ultimately more useful – representations. IEEE Trans Pattern Anal Mach Intell. 2013 Aug;35(8):1798-828
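A toy sketch of “composition of multiple non-linear transformations” (hypothetical names and sizes; tanh is an arbitrary choice of nonlinearity):

```python
import numpy as np

def deep_representation(x, layers):
    """Pass x through a stack of (W, b) layers: each layer is a non-linear
    transformation of the previous layer's representation."""
    h = x
    for W, b in layers:
        h = np.tanh(W @ h + b)   # one non-linear transformation
    return h                     # the final, most abstract representation

# Three stacked transformations of a 10-dimensional input.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(8, 10)), np.zeros(8)),
          (rng.normal(size=(6, 8)),  np.zeros(6)),
          (rng.normal(size=(4, 6)),  np.zeros(4))]
print(deep_representation(rng.normal(size=10), layers).shape)  # (4,)
```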

  27. Deep learning IEEE Trans Pattern Anal Mach Intell. 2013 Aug;35(8):1798-828

  28. Deep learning Has to learn high-level abstract concepts from data. Ex: the wheels of a car; the eyes, nose, etc. of a face. Must be very resistant to irrelevant information. Ex: a car's orientation. Nature 505, 146–148 (09 January 2014)

  29. Deep learning • Major areas of application: Speech Recognition and Signal Processing, Object Recognition, Natural Language Processing, …… • So far in bioinformatics: the training data size (number of subjects) is still too small compared to the number of variables (the N << p issue). Could be applied when human selection of variables is done first. Biological knowledge, in the form of existing networks, is already explicitly used instead of being learned from data; such approaches are hard to beat with a limited amount of data. IEEE Trans Pattern Anal Mach Intell. 2013 Aug;35(8):1798-828
