
Sieci neuronowe – bezmodelowa analiza danych? (Neural networks – model-free data analysis?)


Presentation Transcript


  1. Sieci neuronowe – bezmodelowa analiza danych? K. M. Graczyk IFT, Uniwersytet Wrocławski Poland

  2. Why Neural Networks? • Inspired by C. Giunti (Torino) • PDFs by Neural Networks • Papers of Forte et al. (JHEP 0205:062,2002; JHEP 0503:080,2005; JHEP 0703:039,2007; Nucl.Phys.B809:1-63,2009) • A kind of model-independent way of fitting data and computing the associated uncertainty • Learn, Implement, Publish (LIP rule) • Cooperation with R. Sulej (IPJ, Warszawa) and P. Płoński (Politechnika Warszawska) • NetMaker • GrANNet ;) my own C++ library

  3. Road map • Artificial Neural Networks (NN) – the idea • Feed-forward NN • PDFs by NN • Bayesian statistics • Bayesian approach to NN • GrANNet

  4. Inspired by Nature The human brain consists of around 10^11 neurons, which are highly interconnected with around 10^15 connections

  5. Applications • Function approximation, or regression analysis, including time series prediction, fitness approximation and modeling. • Classification, including pattern and sequence recognition, novelty detection and sequential decision making. • Data processing, including filtering, clustering, blind source separation and compression. • Robotics, including directing manipulators, Computer numerical control.

  6. Feed Forward Artificial Neural Network – the simplest example: input layer → hidden layer → output (compared with the target). With linear activation functions the whole network reduces to a single matrix.

  7. The i-th perceptron: the inputs are weighted and summed, a threshold is applied, and the activation function produces the output, y_i = f( Σ_j w_ij x_j - θ_i ); see the sketch below.
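A minimal C++ sketch of this computation (not taken from NetMaker or GrANNet; the function name and the choice of a sigmoid activation are illustrative):

```cpp
#include <cmath>
#include <vector>

// One perceptron: weighted sum of the inputs plus a bias (the threshold),
// passed through a sigmoid activation function.
double perceptron(const std::vector<double>& inputs,
                  const std::vector<double>& weights,
                  double bias)
{
    double sum = bias;                        // a bias neuron replaces an explicit threshold
    for (std::size_t i = 0; i < inputs.size(); ++i)
        sum += weights[i] * inputs[i];
    return 1.0 / (1.0 + std::exp(-sum));      // sigmoid activation
}
```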

  8. Activation functions • Heaviside step function θ(x) → a 0 or 1 signal • sigmoid function • tanh(x) • linear. Depending on the function and the size of its argument the signal is either amplified or weakened; see the sketch below.
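A short C++ sketch of the activation functions listed above (names are illustrative):

```cpp
#include <cmath>

// Activation functions from the slide.
double heaviside(double x) { return x >= 0.0 ? 1.0 : 0.0; }       // 0 or 1 signal
double sigmoid  (double x) { return 1.0 / (1.0 + std::exp(-x)); } // saturates at 0 and 1
double tanh_act (double x) { return std::tanh(x); }               // symmetric sigmoid, range (-1, 1)
double linear   (double x) { return x; }                          // signal passed unchanged
```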

  9. Architecture • a 3-layer network with two hidden layers: 1:2:1:1 • 5 weights + 4 bias weights → #par = 9 • bias neurons are used instead of explicit thresholds • the network maps an input x to an output F(x); the hidden units use a symmetric sigmoid, the output unit a linear function (a parameter-counting helper is sketched below)
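A small helper illustrating the parameter count for a fully connected feed-forward network with bias neurons; for the 1:2:1:1 architecture it gives 9, as quoted on the slide (a sketch, not GrANNet code):

```cpp
#include <vector>

// Number of parameters of a fully connected feed-forward network with bias
// neurons: for layer sizes n_0 : n_1 : ... : n_L,
//   #par = sum_l ( n_{l-1} * n_l + n_l )   (weights + biases)
// Example: 1:2:1:1 gives (1*2+2) + (2*1+1) + (1*1+1) = 4 + 3 + 2 = 9.
int count_parameters(const std::vector<int>& layers)
{
    int npar = 0;
    for (std::size_t l = 1; l < layers.size(); ++l)
        npar += layers[l - 1] * layers[l] + layers[l];
    return npar;
}
```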

  10. Neural Networks – Function Approximation • The universal approximation theorem for neural networks states that every continuous function that maps intervals of real numbers to some output interval of real numbers can be approximated arbitrarily closely by a multi-layer perceptron with just one hidden layer. This result holds only for restricted classes of activation functions, e.g. for the sigmoidal functions. (Wikipedia.org)

  11. A map from one vector space to another, e.g. (x, Q²) → F₂(x, Q²).

  12. Supervised Learning • Propose the error function • in principle any continuous function which has a global minimum • motivated by statistics: the standard error function, χ², etc. • Consider a set of data • Train the given NN by showing it the data → minimize the error function • back-propagation algorithms • an iterative procedure which fixes the weights (a χ²-type error function is sketched below)
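A sketch of a standard χ²-type error function of the kind mentioned above, assuming uncorrelated experimental errors σ_n (illustrative, not the NetMaker/GrANNet implementation):

```cpp
#include <vector>

// chi^2-type error over a data set: network predictions y_n versus targets
// t_n with uncertainties sigma_n,  E = 1/2 sum_n (y_n - t_n)^2 / sigma_n^2.
double chi2_error(const std::vector<double>& y,
                  const std::vector<double>& t,
                  const std::vector<double>& sigma)
{
    double E = 0.0;
    for (std::size_t n = 0; n < y.size(); ++n) {
        const double r = (y[n] - t[n]) / sigma[n];
        E += 0.5 * r * r;
    }
    return E;
}
```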

  13. Learning Algorithms • gradient algorithms: • gradient descent • RPROP (Riedmiller & Braun) • conjugate gradients • algorithms that look at the curvature: • QuickProp (Fahlman) • Levenberg-Marquardt (Hessian) • Newton's method (Hessian) • Monte Carlo algorithms (based on Markov chains); the simplest update is sketched below
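For orientation, a sketch of the simplest entry in the list, the plain gradient-descent weight update; the learning-rate value is purely illustrative:

```cpp
#include <vector>

// Plain gradient descent: each weight moves against the gradient of the
// error function with respect to that weight; eta is the learning rate.
void gradient_descent_step(std::vector<double>& weights,
                           const std::vector<double>& gradient,   // dE/dw from back-propagation
                           double eta = 0.01)
{
    for (std::size_t i = 0; i < weights.size(); ++i)
        weights[i] -= eta * gradient[i];
}
```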

  14. Overfitting • More complex models describe the data better but lose generality • bias-variance trade-off • overfitting → large values of the weights • compare with a test set (must be twice as large as the original set) • regularization → an additional penalty term in the error function, E = E_D + α E_W, where α is the decay rate (see the sketch below)
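A sketch of the regularized error with a quadratic weight-decay penalty, E = E_D + α E_W with E_W = ½ Σ_i w_i²; this standard form is an assumption consistent with slide 27, not necessarily the exact choice used in the talk:

```cpp
#include <vector>

// Regularized error: data term plus weight-decay penalty,
//   E = E_D + alpha * E_W,   E_W = 1/2 * sum_i w_i^2,
// where alpha is the decay rate.
double regularized_error(double E_data,
                         const std::vector<double>& weights,
                         double alpha)
{
    double E_W = 0.0;
    for (double w : weights) E_W += 0.5 * w * w;
    return E_data + alpha * E_W;
}
```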

  15. Data still more precise than theory • Nature is probed through observation and measurements: the physics is given directly by the data • most models (QED, nonperturbative QCD, …) contain free parameters, and a fully nonparametric description raises the question of what happens to the physics • idea: a model-independent analysis → build a statistical model, subject only to some general constraints: data → predictions and their uncertainty (e.g. PDFs)

  16. Fitting data with Artificial Neural Networks 'The goal of the network training is not to learn an exact representation of the training data itself, but rather to build a statistical model of the process which generates the data' C. Bishop, 'Neural Networks for Pattern Recognition'

  17. Parton distribution functions with NN: a fit of F₂(x, Q²). Some method is needed, but…

  18. Parton Distribution Functions S. Forte, L. Garrido, J. I. Latorre and A. Piccione, JHEP 0205 (2002) 062 • A kind of model-independent analysis of the data • Construction of the probability density P[G(Q²)] in the space of the structure functions • But in reality Forte et al. did this in practice with only one neural-network architecture → a probability density in the space of parameters of one particular NN

  19. Generating Monte Carlo pseudo-data The idea comes from W. T. Giele and S. Keller. Train Nrep neural networks, one for each set of Ndat pseudo-data points. The Nrep trained neural networks → provide a representation of the probability measure in the space of the structure functions (a replica-generation sketch follows below)
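A sketch of Giele-Keller style replica generation, assuming uncorrelated Gaussian errors (correlated systematics would require the full covariance matrix); function and parameter names are illustrative:

```cpp
#include <random>
#include <vector>

// Generate N_rep Monte Carlo replicas of a data set: each pseudo-data point
// is the measured value smeared with a Gaussian of width equal to its
// experimental error (correlations between points are neglected here).
std::vector<std::vector<double>>
make_replicas(const std::vector<double>& data,
              const std::vector<double>& sigma,
              int n_rep, unsigned seed = 12345)
{
    std::mt19937 gen(seed);
    std::normal_distribution<double> gauss(0.0, 1.0);

    std::vector<std::vector<double>> replicas(n_rep, data);
    for (auto& rep : replicas)
        for (std::size_t n = 0; n < rep.size(); ++n)
            rep[n] += sigma[n] * gauss(gen);
    return replicas;
}
```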

  20. Uncertainty and correlation of the predictions, estimated from the ensemble of trained replicas (see the sketch below).
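A sketch of how the ensemble of Nrep trained networks yields a central value, an uncertainty, and the correlation between predictions at two points (illustrative names, simple moments over the replicas):

```cpp
#include <cmath>
#include <vector>

// Ensemble estimates over N_rep trained networks evaluated at two points:
// means, standard deviations, and the correlation between the two predictions.
struct EnsembleStats { double mean1, mean2, sig1, sig2, corr; };

EnsembleStats ensemble_stats(const std::vector<double>& f1,   // replica predictions at point 1
                             const std::vector<double>& f2)   // replica predictions at point 2
{
    const double N = static_cast<double>(f1.size());
    double m1 = 0.0, m2 = 0.0, v1 = 0.0, v2 = 0.0, cov = 0.0;
    for (std::size_t k = 0; k < f1.size(); ++k) { m1 += f1[k]; m2 += f2[k]; }
    m1 /= N;  m2 /= N;
    for (std::size_t k = 0; k < f1.size(); ++k) {
        v1  += (f1[k] - m1) * (f1[k] - m1);
        v2  += (f2[k] - m2) * (f2[k] - m2);
        cov += (f1[k] - m1) * (f2[k] - m2);
    }
    v1 /= N;  v2 /= N;  cov /= N;
    return { m1, m2, std::sqrt(v1), std::sqrt(v2), cov / (std::sqrt(v1) * std::sqrt(v2)) };
}
```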

  21. 10, 100 and 1000 replicas

  22. Training length on 30 data points: too short, long enough, too long → overfitting.

  23. My criticism • Does the simultaneous use of artificial data and the χ² error function overestimate the uncertainty? • Other NN architectures are not discussed • Problems with overfitting (a test set is needed) • A relatively simple approach compared with present techniques in NN computing • The uncertainty of the model predictions should be generated by the probability distribution obtained for the model rather than by the data itself

  24. GrANNet – Why? • I stole some ideas from FANN • a C++ library, easy to use • user-defined error function (any you wish) • easy access to units and their weights • several ways of initializing a network of a given architecture • Bayesian learning • Main objects: • classes: NeuralNetwork, Unit • learning algorithms: so far QuickProp, Rprop+, Rprop-, iRprop-, iRprop+, … • network response uncertainty (based on the Hessian) • some simple restarting and stopping solutions

  25. Structure of GrANNet • Libraries: • Unit class • Neural_Network class • Activation (activation and error function structures) • learning algorithms: RProp+, RProp-, iRProp+, iRProp-, QuickProp, BackProp • generatormt • TNT inverse matrix package

  26. Bayesian Approach ‘common sense reduced to calculations’

  27. Bayesian Framework for BackProp NN (MacKay, Bishop, …) • objective criteria for comparing alternative network solutions, in particular with different architectures • objective criteria for setting the decay rate α • objective choice of the regularizing function E_W • comparison with test data is not required.

  28. Notation and Conventions • a data point: input vector x, target vector t • network response y(x, w) • data set D • number of data points N • number of weights W

  29. Model Classification • A collection of models, H1, H2, …, Hk • We believe the models are ranked by prior probabilities P(H1), P(H2), …, P(Hk) (summing to 1) • After observing the data D → Bayes' rule: P(Hi|D) = P(D|Hi) P(Hi) / P(D), where P(D|Hi) is the probability of D given Hi and P(D) is the normalizing constant • Usually at the beginning P(H1) = P(H2) = … = P(Hk) (a small numerical sketch follows below)
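A small sketch of this classification step: normalize evidence times prior over the collection of models (illustrative names):

```cpp
#include <vector>

// Bayes' rule for model classification:
//   P(H_i | D) = P(D | H_i) * P(H_i) / sum_k P(D | H_k) * P(H_k)
std::vector<double> posterior_models(const std::vector<double>& evidence,  // P(D | H_i)
                                     const std::vector<double>& prior)     // P(H_i), summing to 1
{
    double norm = 0.0;                                    // the normalizing constant P(D)
    for (std::size_t i = 0; i < evidence.size(); ++i)
        norm += evidence[i] * prior[i];

    std::vector<double> posterior(evidence.size());
    for (std::size_t i = 0; i < evidence.size(); ++i)
        posterior[i] = evidence[i] * prior[i] / norm;
    return posterior;
}
```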

  30. Single Model Statistics • Assume that model Hi is the correct one • The neural network A with weights w is considered • Task 1: assuming some prior probability of w, construct the posterior after including the data • Task 2: consider the space of hypotheses and construct the evidence for them

  31. Hierarchy

  32. Constructing the prior and posterior functions: a probability distribution over the weights!!! Posterior probability ∝ likelihood × prior; the prior is centred at w0 and the posterior peaks at wMP.

  33. Computing the posterior: a Gaussian approximation around wMP with the Hessian of the error function; the covariance matrix of the weights is the inverse Hessian (see the sketch below).
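A sketch of how the inverse Hessian propagates to an uncertainty on the network response, σ_y² = gᵀ A⁻¹ g with g_i = ∂y/∂w_i at wMP; it assumes the inverse Hessian is already available (e.g. from the TNT package listed on slide 25):

```cpp
#include <cmath>
#include <vector>

// Gaussian (Laplace) approximation around w_MP: the covariance of the weights
// is the inverse Hessian A^{-1} of the error function.  The uncertainty of the
// network response y(x) is then  sigma_y^2 = g^T A^{-1} g,  with g_i = dy/dw_i.
double output_sigma(const std::vector<double>& g,                   // gradient dy/dw at w_MP
                    const std::vector<std::vector<double>>& A_inv)  // inverse Hessian
{
    double var = 0.0;
    for (std::size_t i = 0; i < g.size(); ++i)
        for (std::size_t j = 0; j < g.size(); ++j)
            var += g[i] * A_inv[i][j] * g[j];
    return std::sqrt(var);
}
```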

  34. How to fix the proper α? • Two ideas: • evidence approximation (MacKay): find wMP, then find αMP • hierarchical approach: perform the integrals over α analytically • the two agree if the evidence is sharply peaked!!!

  35. Getting αMP: γ = Σ_i λ_i/(λ_i + α) is the effective number of well-determined parameters (λ_i are the eigenvalues of the data-term Hessian), and α is re-estimated as αMP = γ/(2 E_W) in an iterative procedure during training (see the sketch below).
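A sketch of one iteration of this evidence-approximation update, assuming the eigenvalues λ_i of the data-term Hessian at wMP are available (illustrative, not the GrANNet implementation):

```cpp
#include <vector>

// One step of the evidence-approximation update for the decay rate alpha:
//   gamma     = sum_i lambda_i / (lambda_i + alpha)   (well-determined parameters)
//   alpha_new = gamma / (2 * E_W)
// where lambda_i are the eigenvalues of the Hessian of the data term at w_MP
// and E_W = 1/2 * sum_i w_i^2 is evaluated at w_MP.
double update_alpha(const std::vector<double>& lambda,
                    double alpha, double E_W)
{
    double gamma = 0.0;
    for (double l : lambda) gamma += l / (l + alpha);
    return gamma / (2.0 * E_W);
}
```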

  36. Bayesian Model Comparison – Occam Factor • evidence ≈ best-fit likelihood × Occam factor • the log of the Occam factor → the amount of information we gain after the data have arrived • large Occam factor → complex models: larger accessible phase space (large prior range relative to the posterior) • small Occam factor → simple models: small accessible phase space

  37. Occam Factor – Penalty Term: the evidence combines the misfit of the interpolant to the data (here a fit of F2(x, Q²)) with the Occam factor and a symmetry factor (tanh(·) networks are unchanged under flipping the sign of the weights); see the sketch below.
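A schematic sketch of the log evidence with the pieces named on this slide: misfit term, Occam factor, and a symmetry factor, taken here as 2^h·h! for h tanh hidden units (the permutation part h! is an assumption beyond the sign flips mentioned on the slide; the overall form follows MacKay's Gaussian approximation):

```cpp
#include <cmath>

// Schematic log evidence in the Gaussian approximation:
//   ln Ev ~ -beta*E_D(w_MP)                                      (misfit of the interpolant)
//           - alpha*E_W(w_MP) - 0.5*ln det A
//           + (W/2) ln alpha + (N/2) ln beta - (N/2) ln(2*pi)    (Occam factor)
//           + ln(2^h * h!)                                       (symmetry factor)
double log_evidence(double E_D, double E_W, double log_det_A,
                    double alpha, double beta,
                    int W, int N, int h)
{
    const double PI = 3.141592653589793;
    double sym = h * std::log(2.0);                              // sign flips: 2^h ...
    for (int k = 2; k <= h; ++k) sym += std::log(double(k));     // ... times h! permutations
    return -beta * E_D - alpha * E_W - 0.5 * log_det_A
           + 0.5 * W * std::log(alpha) + 0.5 * N * std::log(beta)
           - 0.5 * N * std::log(2.0 * PI)
           + sym;
}
```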

  38. The 1-2-1 network is preferred by the data (the 'Occam hill').

  39. The 1-3-1 network is preferred by the data.

  40. The 1-3-1 network seems to be preferred by the data.
