
Sieci neuronowe – bezmodelowa analiza danych? (Neural networks – model-free data analysis?)



  1. Sieci neuronowe – bezmodelowa analiza danych? K. M. Graczyk IFT, Uniwersytet Wrocławski Poland

  2. Abstract • In this seminar I will discuss the application of feed-forward neural networks to the analysis of experimental data. In particular, I will focus on the Bayesian approach, which allows one to classify and select the best research hypothesis. The method has a naturally built-in "Occam's razor" criterion, which prefers models of lower complexity. An additional advantage of the approach is that no test set is required to verify the learning process. • In the second part of the seminar I will discuss my own implementation of a neural network, which includes Bayesian learning methods. Finally, I will show my first applications to the analysis of scattering data.

  3. Why Neural Networks? • Look at electromagnetic form factor data • Simple • Straightforward • Then attack more serious problems • Inspired by C. Giunti (Torino) • Papers by Forte et al. (JHEP 0205:062,2002; JHEP 0503:080,2005; JHEP 0703:039,2007; Nucl.Phys.B809:1-63,2009) • A kind of model-independent way of fitting data and computing the associated uncertainty • Cooperation with R. Sulej (IPJ, Warszawa) and P. Płoński (Politechnika Warszawska) • NetMaker • GrANNet ;) my own C++ library

  4. Road map • Artificial Neural Networks (NN) – idea • Feed-forward NN • Bayesian statistics • Bayesian approach to NN • PDFs by NN • GrANNet • Form Factors by NN

  5. Inspired by Nature

  6. Applications, general list • Function approximation, or regression analysis, including time series prediction, fitness approximation and modeling. • Classification, including pattern and sequence recognition, novelty detection and sequential decision making. • Data processing, including filtering, clustering, blind source separation and compression. • Robotics, including directing manipulators and computer numerical control.

  7. Artificial Neural Network • Input layer • Hidden layer • Output (target)

  8. The i-th perceptron: the inputs are multiplied by weights and summed, the sum is shifted by a threshold, and the activation function produces the output.
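
A minimal C++ sketch of the single perceptron described above (illustrative only; the function name and signature are hypothetical and not taken from GrANNet or NetMaker):

```cpp
#include <cmath>
#include <vector>

// i-th perceptron: weighted sum of the inputs, shifted by a threshold,
// passed through a sigmoid activation function.
double perceptron(const std::vector<double>& input,
                  const std::vector<double>& weights,
                  double threshold) {
    double sum = -threshold;                  // start from minus the threshold
    for (std::size_t k = 0; k < input.size(); ++k)
        sum += weights[k] * input[k];         // weighted sum of the inputs
    return 1.0 / (1.0 + std::exp(-sum));      // sigmoid activation -> output
}
```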

  9. A map from one vector space to another (e.g. Q2 → GM, (x, Q2) → F2, (Q2, ε) → σ).

  10. Neural Networks • The universal approximation theorem for neural networks states that every continuous function that maps intervals of real numbers to some output interval of real numbers can be approximated arbitrarily closely by a multi-layer perceptron with just one hidden layer. This result holds only for restricted classes of activation functions, e.g. for the sigmoidal functions. (Wikipedia.org)

  11. Feed-forward network activation functions • Heaviside step function θ(x) → 0 or 1 signal • Sigmoid function • tanh(x)
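
For concreteness, the three activation functions listed above in standard notation:

```latex
\theta(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases},
\qquad
\sigma(x) = \frac{1}{1 + e^{-x}},
\qquad
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} .
```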

  12. Architecture • 3-layer network, two hidden layers: 1:2:1:1 • 2+2+1 weights + 1+2+1 biases: #par = 9 • Bias neurons instead of thresholds • Input Q2 → output G(Q2) • Activation functions: symmetric sigmoid and linear
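
The parameter count quoted above follows from the general formula for a fully connected network with bias neurons, applied to the 1:2:1:1 architecture (my own check, not from the slide):

```latex
\#\mathrm{par} = \sum_{l} \left(n_{l-1} + 1\right) n_{l}
= (1+1)\cdot 2 + (2+1)\cdot 1 + (1+1)\cdot 1 = 9 .
```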

  13. Supervised Learning • Propose the error function (standard error function, chi2, etc. – any continuous function which has a global minimum) • Consider a set of data • Train the given network on the data → minimize the error function • Back-propagation algorithms • An iterative procedure which determines the weights
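
One common choice of such an error function is the chi2-like sum over the N data points (x_n, t_n) with uncertainties σ_n:

```latex
E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N}
\left( \frac{t_n - y(\mathbf{x}_n, \mathbf{w})}{\sigma_n} \right)^{2} .
```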

  14. Learning • Gradient algorithms • Gradient descent • QuickProp (Fahlman) • RPROP (Riedmiller & Braun) • Conjugate gradients • Levenberg-Marquardt (Hessian) • Newton's method (Hessian) • Monte Carlo algorithms (based on Markov chain methods)
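
For orientation, the simplest member of the list above, plain gradient descent, updates the weights as (η is the learning rate):

```latex
\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta\, \nabla E\!\left(\mathbf{w}^{(\tau)}\right) .
```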

  15. Overfitting • More complex models describe the data better, but lose generality • Bias–variance trade-off • After fitting one needs to compare with a test set (which must be twice as large as the original) • Overfitting → large values of the weights • Regularization → an additional penalty term in the error function
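
A standard way to write the penalized error function referred to above (the same decomposition reappears in the Bayesian part):

```latex
E(\mathbf{w}) = \beta E_D(\mathbf{w}) + \alpha E_W(\mathbf{w}),
\qquad
E_W(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{W} w_i^{2} .
```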

  16. Fitting data with Artificial Neural Networks • 'The goal of the network training is not to learn an exact representation of the training data itself, but rather to build a statistical model of the process which generates the data' – C. Bishop, Neural Networks for Pattern Recognition

  17. Parton Distribution Functions with NN (x, Q2 → F2) • Some method, but…

  18. Parton Distribution Functions • S. Forte, L. Garrido, J. I. Latorre and A. Piccione, JHEP 0205 (2002) 062 • A kind of model-independent analysis of the data • Construction of the probability density P[G(Q2)] in the space of the structure functions • In practice only one neural-network architecture • But in reality what Forte et al. did is a probability density in the space of parameters of one particular NN

  19. Generating Monte Carlo pseudo-data • The idea comes from W. T. Giele and S. Keller • Train Nrep neural networks, one for each set of Ndat pseudo-data points • The Nrep trained neural networks provide a representation of the probability measure in the space of the structure functions
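
A minimal C++ sketch of this pseudo-data generation: each replica is the measured value smeared by a Gaussian with the quoted uncertainty (illustrative assumptions: uncorrelated errors, hypothetical names `DataPoint` and `makeReplicas`; not the actual code used by Forte et al. or GrANNet):

```cpp
#include <random>
#include <vector>

// One experimental point: central value and (uncorrelated) uncertainty.
struct DataPoint { double value; double sigma; };

// Generate Nrep replicas of the Ndat measured points;
// each replica is later fitted by its own neural network.
std::vector<std::vector<double>>
makeReplicas(const std::vector<DataPoint>& data, int Nrep, unsigned seed = 1) {
    std::mt19937 gen(seed);
    std::vector<std::vector<double>> replicas(Nrep);
    for (auto& rep : replicas) {
        rep.reserve(data.size());
        for (const auto& d : data) {
            std::normal_distribution<double> smear(d.value, d.sigma);
            rep.push_back(smear(gen));   // Gaussian-smeared pseudo-datum
        }
    }
    return replicas;
}
```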

  20. Uncertainty and correlation (estimated from the ensemble of trained networks)
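
The ensemble estimators implied here, reconstructed from the Forte et al. replica prescription (F and G stand for any two quantities computed from the k-th trained network):

```latex
\langle F \rangle = \frac{1}{N_{\mathrm{rep}}} \sum_{k=1}^{N_{\mathrm{rep}}} F^{(k)},
\qquad
\sigma_F^{2} = \frac{1}{N_{\mathrm{rep}}} \sum_{k=1}^{N_{\mathrm{rep}}}
\left( F^{(k)} - \langle F \rangle \right)^{2},
\qquad
\rho_{FG} = \frac{\langle F G \rangle - \langle F \rangle \langle G \rangle}{\sigma_F\, \sigma_G} .
```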

  21. 10, 100 and 1000 replicas

  22. Training length: short / long enough / too long (30 data points, overfitting)

  23. My criticism • Artificial data and the chi2 error function → an overestimated error function? • Other architectures are not discussed? • Problems with overfitting?

  24. Form Factors with NN, done with the FANN library, applying the Forte et al. approach

  25. How to apply NN to the ep data • First stage: checking whether the NNs are able to work at a reasonable level • GE, GM and the Ratio separately • Input Q2 → output form factor • The standard error function • GE: 200 points • GM: 86 points • Ratio: 152 points • Combination of GE, GM, and the Ratio • Input Q2 → output GM and GE • The standard error function: a sum of three functions • GE+GM+Ratio: around 260 points • One needs to constrain the fits by adding some artificial points with GE(0) = GM(0)/μp = 1
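
Schematically, the "sum of three functions" mentioned above can be written as (my notation; R is assumed here to be the measured form-factor ratio, typically μp GE/GM):

```latex
E_D = E_{G_E} + E_{G_M} + E_{R},
\qquad
E_X = \frac{1}{2} \sum_{n=1}^{N_X}
\left( \frac{X_n - y_X(Q^2_n, \mathbf{w})}{\sigma_n} \right)^{2} .
```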

  26. GMp

  27. GMp

  28. GMp Neural Networks Fit with TPE (our work)

  29. GEp

  30. GEp

  31. Ratio

  32. GEn

  33. GEn

  34. GMn

  35. GMn

  36. Bayesian Approach ‘common sense reduced to calculations’

  37. Bayesian Framework for BackProp NN (MacKay, Bishop, …) • Objective criteria for comparing alternative network solutions, in particular with different architectures • Objective criteria for setting the decay rate α • Objective choice of the regularising function Ew • Comparison with test data is not required.

  38. Notation and Conventions • Data point (vector) • Input (vector) • Network response • Data set • Number of data points • Number of weights

  39. Model Classification • A collection of models H1, H2, …, Hk • We believe that the models are classified by priors P(H1), P(H2), …, P(Hk) (summing to 1) • After observing the data D → Bayes' rule, with P(D|Hi) the probability of D given Hi and a normalizing constant in the denominator • Usually at the beginning P(H1) = P(H2) = … = P(Hk)
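
Written out, Bayes' rule for the model posteriors (standard form):

```latex
P(H_i \mid D) = \frac{P(D \mid H_i)\, P(H_i)}{P(D)},
\qquad
P(D) = \sum_{k} P(D \mid H_k)\, P(H_k) .
```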

  40. Single Model Statistics • Assume that model Hi is the correct one • Consider the neural network A with weights w • Task 1: assuming some prior probability of w, construct the posterior after including the data

  41. Hierarchy

  42. Constructing the prior and the posterior over the weights • Posterior probability ∝ likelihood × prior • The prior weight distribution is centred around w0; the posterior peaks at wMP
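
In MacKay's standard notation, consistent with the error functions E_D and E_W above (a reference summary rather than the slide's own formulas), the prior, likelihood and posterior read:

```latex
P(\mathbf{w} \mid \alpha) \propto e^{-\alpha E_W(\mathbf{w})},
\qquad
P(D \mid \mathbf{w}, \beta) \propto e^{-\beta E_D(\mathbf{w})},
\qquad
P(\mathbf{w} \mid D, \alpha, \beta)
= \frac{P(D \mid \mathbf{w}, \beta)\, P(\mathbf{w} \mid \alpha)}{P(D \mid \alpha, \beta)}
\propto e^{-\beta E_D(\mathbf{w}) - \alpha E_W(\mathbf{w})} .
```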

  43. Computing the posterior • Gaussian approximation around wMP • The Hessian of the error function determines the covariance matrix
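
The standard Gaussian (Laplace) approximation behind the Hessian/covariance statement above:

```latex
E(\mathbf{w}) \approx E(\mathbf{w}_{\mathrm{MP}})
+ \tfrac{1}{2}\, \Delta\mathbf{w}^{T} \mathbf{A}\, \Delta\mathbf{w},
\qquad
\mathbf{A} = \nabla\nabla E(\mathbf{w}_{\mathrm{MP}})
= \beta \nabla\nabla E_D + \alpha \mathbf{I},
\qquad
\mathrm{Cov}(\mathbf{w}) \approx \mathbf{A}^{-1} .
```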

  44. How to fix the proper α • Two ideas: • Evidence approximation (MacKay): find wMP, then find αMP • Hierarchical: perform the integrals over α analytically • The evidence approximation holds if P(α|D) is sharply peaked!!!
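
Schematically, the approximation being made is (MacKay's evidence approximation; valid when P(α|D) is sharply peaked at αMP):

```latex
P(\mathbf{w} \mid D)
= \int P(\mathbf{w} \mid D, \alpha)\, P(\alpha \mid D)\, d\alpha
\;\approx\; P(\mathbf{w} \mid D, \alpha_{\mathrm{MP}}) .
```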

  45. Getting αMP • The effective number of well-determined parameters • Iterative procedure during training
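
In the evidence framework these quantities are given by the standard MacKay formulas (λi are the eigenvalues of β∇∇E_D at wMP, W the number of weights):

```latex
\gamma = \sum_{i=1}^{W} \frac{\lambda_i}{\lambda_i + \alpha},
\qquad
\alpha_{\mathrm{MP}} = \frac{\gamma}{2\, E_W(\mathbf{w}_{\mathrm{MP}})} .
```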

  46. Bayesian Model Comparison – Occam Factor • Evidence ≈ best-fit likelihood × Occam factor • The log of the Occam factor → the amount of information we gain after the data have arrived • Complex models → larger accessible prior phase space → smaller Occam factor • Simple models → smaller accessible prior phase space → larger Occam factor
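
For a single parameter with a flat prior of width σw, the evidence factorizes as (MacKay's schematic form; σw|D is the posterior width):

```latex
P(D \mid H_i) = \int P(D \mid w, H_i)\, P(w \mid H_i)\, dw
\;\approx\;
\underbrace{P(D \mid w_{\mathrm{MP}}, H_i)}_{\text{best-fit likelihood}}
\times
\underbrace{\frac{\sigma_{w \mid D}}{\sigma_{w}}}_{\text{Occam factor}} .
```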

  47. Evidence • Misfit of the interpolant to the data • Occam factor – penalty term • Symmetry factor: tanh(·) is odd, so flipping the signs of a hidden unit's weights gives an equivalent network • (Illustrated on an F2(x, Q2) fit)

  48. What about cross sections? • GE and GM simultaneously • Input Q2 and ε → cross sections • Standard error function: the chi2-like function, with the covariance matrix obtained from the Rosenbluth separation • Possibilities: • The set of neural networks becomes a natural distribution of the differential cross sections • One can produce artificial data over a wide range of ε and perform the Rosenbluth separation, searching for nonlinearities of σR in the ε dependence.
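
For reference, the reduced cross section entering the Rosenbluth separation, in one common convention (τ = Q²/4M², with M the proton mass; not taken from the slides):

```latex
\sigma_R(Q^2, \varepsilon) = \tau\, G_M^2(Q^2) + \varepsilon\, G_E^2(Q^2),
\qquad
\tau = \frac{Q^2}{4 M^2} .
```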

  49. What about TPE? • Q2, ε → GE, GM and TPE? • In the perfect case the change of ε should not affect GE and GM • Training the NN on series of artificial cross-section data with fixed ε? • Collecting the data in ε bins and Q2 bins, then showing the network the set of data with a particular ε over a wide range of Q2.
