
Defeating the Black Box – Neural Networks in HEP Data Analysis

Jan Therhaag (University of Bonn) – TMVA Workshop @ CERN, January 21st, 2011. TMVA on the web: http://tmva.sourceforge.net/


Presentation Transcript


  1. Defeating the Black Box – Neural Networks in HEP Data Analysis. Jan Therhaag (University of Bonn). TMVA Workshop @ CERN, January 21st, 2011. TMVA on the web: http://tmva.sourceforge.net/

  2. The Problem …

  3. The single neuron as a classifier

  4. A simple approach:
  • Code the classes as a binary variable (here: blue = 0, orange = 1)
  • Perform a linear fit to this discrete function
  • Define the decision boundary by the points where the fitted value crosses 0.5 (see the sketch below)

  //######################################################################################
  // TMVA code
  //######################################################################################
  // create Factory
  TMVA::Factory *factory = new TMVA::Factory("TMVAClassification", outputfile, "AnalysisType=Classification");
  factory->AddVariable("x1", 'F');
  factory->AddVariable("x2", 'F');
  // book linear discriminant classifier (LD)
  factory->BookMethod(TMVA::Types::kLD, "LD");
  factory->TrainAllMethods();
  factory->TestAllMethods();
  factory->EvaluateAllMethods();
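  A sketch of the standard construction behind these bullets; the notation (β̂ for the fitted coefficients) is an assumption, following Hastie, Tibshirani & Friedman, whose figures the talk reuses:

  \[ \hat{y}(x) = \hat\beta_0 + x^{\top}\hat\beta, \qquad \text{decision boundary: } \{\, x : \hat{y}(x) = 0.5 \,\}, \]

  with a point classified as orange if ŷ(x) > 0.5 and as blue otherwise.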

  5. Now consider the sigmoid transformation σ(a) (see below): • σ(a) has values in [0,1] and can be interpreted as the probability p(orange | x); then obviously p(blue | x) = 1 − p(orange | x) = σ(−a)
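  The standard form of the transformation referred to here, with a denoting the linear combination of the inputs (the symbol a is an assumption, not transcribed from the slide):

  \[ \sigma(a) = \frac{1}{1 + e^{-a}}, \qquad p(\text{orange} \mid x) = \sigma(a), \qquad p(\text{blue} \mid x) = 1 - \sigma(a) = \sigma(-a). \]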

  6. We have just invented the neuron! • The output y = σ(a) is called the activity of the neuron, while the weighted input sum a is called the activation (see the sketch below)
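  A minimal C++ sketch of a single sigmoid neuron; the names (Neuron, weights, bias, activation, activity) are illustrative and not TMVA code:

  #include <cmath>
  #include <vector>

  // Logistic sigmoid: maps the activation onto the interval (0,1).
  double sigmoid(double a) { return 1.0 / (1.0 + std::exp(-a)); }

  // A single neuron: the activation is the weighted sum of the inputs,
  // the activity is the sigmoid of the activation.
  struct Neuron {
    std::vector<double> weights;
    double bias = 0.0;

    double activation(const std::vector<double>& x) const {
      double a = bias;
      for (size_t i = 0; i < weights.size(); ++i) a += weights[i] * x[i];
      return a;
    }

    double activity(const std::vector<double>& x) const {
      return sigmoid(activation(x));   // interpreted as p(orange | x)
    }
  };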

  7. The idea of neuron training – searching the weight space

  8. The training proceeds via minimization of the error function • The neuron learns via gradient descent* • Examples may be learned one-by-one (online learning) or all at once (batch learning) • Overtraining may occur! *more sophisticated techniques may be used
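  A sketch of one online (example-by-example) gradient-descent step, reusing the Neuron struct from the sketch above and assuming the cross-entropy error E = −[t ln y + (1 − t) ln(1 − y)]; the learning rate eta and the function name are illustrative, not TMVA code:

  // One online gradient-descent step: nudge the weights against the gradient
  // of the per-example error. For the cross-entropy error with a sigmoid
  // neuron, dE/dw_i = (y - t) * x_i and dE/db = (y - t).
  void trainOneExample(Neuron& n, const std::vector<double>& x, double t, double eta) {
    const double y = n.activity(x);   // current prediction in (0,1)
    const double delta = y - t;       // error signal
    for (size_t i = 0; i < n.weights.size(); ++i)
      n.weights[i] -= eta * delta * x[i];
    n.bias -= eta * delta;
  }

  Batch learning would accumulate these gradients over all training examples before updating; iterating too long on a limited sample is where overtraining sets in.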

  9. Network training and regularization

  10. Networks of the kind used for regression and classification tasks are called feedforward networks
  • Neurons are organized in layers
  • The output of a neuron in one layer becomes the input for the neurons in the next layer

  //######################################################################################
  // TMVA code
  //######################################################################################
  // create Factory
  TMVA::Factory *factory = new TMVA::Factory("TMVAClassification", outputfile, "AnalysisType=Classification");
  factory->AddVariable("x1", 'F');
  factory->AddVariable("x2", 'F');
  // book Multi-Layer Perceptron (MLP) network and define network architecture
  factory->BookMethod(TMVA::Types::kMLP, "MLP", "NeuronType=sigmoid:HiddenLayers=N+5,N");
  factory->TrainAllMethods();
  factory->TestAllMethods();
  factory->EvaluateAllMethods();
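  A minimal sketch of the feedforward idea itself, independent of TMVA: each layer's activities become the next layer's inputs (the Layer and forwardPass names are illustrative):

  #include <cmath>
  #include <vector>

  // A fully connected layer of sigmoid neurons.
  struct Layer {
    std::vector<std::vector<double>> weights;  // weights[j][i]: input i -> neuron j
    std::vector<double> biases;                // one bias per neuron

    std::vector<double> forward(const std::vector<double>& in) const {
      std::vector<double> out(biases.size());
      for (size_t j = 0; j < biases.size(); ++j) {
        double a = biases[j];
        for (size_t i = 0; i < in.size(); ++i) a += weights[j][i] * in[i];
        out[j] = 1.0 / (1.0 + std::exp(-a));   // activity of neuron j
      }
      return out;
    }
  };

  // Information always flows forward: layer by layer, from input to output.
  std::vector<double> forwardPass(const std::vector<Layer>& net, std::vector<double> x) {
    for (const Layer& layer : net) x = layer.forward(x);
    return x;
  }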

  11. Feedforward networks are universal approximators • Any continuous function can be approximated with arbitrary precision • The complexity of the output function is determined by the number of hidden units and the characteristic magnitude of the weights

  12. From neuron training to network training – backpropagation • In order to find the optimal set of weights w, we have to calculate the derivatives of the error with respect to every weight • Recall the single neuron: its derivative is an error signal times the input • It turns out that the same structure holds for every weight in the network, with the error signal given directly by the output error for output neurons and by a weighted sum of the error signals of the following layer otherwise (see the relations below) • While input information is always propagated forward, errors are propagated backwards!
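  The relations behind this slide, written in the standard backpropagation notation (the symbols a_j, z_i, δ_j are an assumption, not transcribed from the slide), for the usual pairing of output activation and error function (e.g. sigmoid output with cross-entropy error):

  \[ \frac{\partial E}{\partial w_{ji}} = \delta_j\, z_i, \qquad \delta_j = y_j - t_j \ \ \text{for output neurons}, \qquad \delta_j = \sigma'(a_j) \sum_k w_{kj}\, \delta_k \ \ \text{otherwise}, \]

  where z_i is the activity feeding weight w_{ji} and the sum runs over the neurons k of the following layer.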

  13. Some issues in network training
  • The error function has several minima; the result of the minimization typically depends on the starting values of the weights
  • The scaling of the inputs has an effect on the final solution (see the normalization sketch below)

  //######################################################################################
  // TMVA code
  //######################################################################################
  // create Factory
  TMVA::Factory *factory = new TMVA::Factory("TMVAClassification", outputfile, "AnalysisType=Classification");
  factory->AddVariable("x1", 'F');
  factory->AddVariable("x2", 'F');
  // book Multi-Layer Perceptron (MLP) network with normalized input distributions
  factory->BookMethod(TMVA::Types::kMLP, "MLP", "RandomSeed=1:VarTransform=N");
  factory->TrainAllMethods();
  factory->TestAllMethods();
  factory->EvaluateAllMethods();

  • Overtraining: bad generalization and overconfident predictions (figure: NN with 10 hidden units)
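  A sketch of what an input normalization such as TMVA's VarTransform=N does conceptually: map each variable linearly onto a fixed interval (here [-1, 1]) using the range seen in the training sample. The function and its name are illustrative, not TMVA internals:

  #include <algorithm>
  #include <vector>

  // Linearly map one input variable onto [-1, 1] given its training-sample range.
  std::vector<double> normalizeToUnitRange(const std::vector<double>& values) {
    const double lo = *std::min_element(values.begin(), values.end());
    const double hi = *std::max_element(values.begin(), values.end());
    std::vector<double> out;
    out.reserve(values.size());
    for (double v : values)
      out.push_back(hi > lo ? 2.0 * (v - lo) / (hi - lo) - 1.0 : 0.0);
    return out;
  }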

  14. Regularization and early stopping
  • Early stopping: stop the training before the minimum of E(w) is reached • a validation data set is needed • convergence is monitored in TMVA
  • Weight decay: penalize large weights explicitly (see the regularized error function below)

  //######################################################################################
  // TMVA code
  //######################################################################################
  // create Factory
  TMVA::Factory *factory = new TMVA::Factory("TMVAClassification", outputfile, "AnalysisType=Classification");
  factory->AddVariable("x1", 'F');
  factory->AddVariable("x2", 'F');
  // book Multi-Layer Perceptron (MLP) network with regularization
  factory->BookMethod(TMVA::Types::kMLP, "MLP", "NCycles=500:UseRegulator");
  factory->TrainAllMethods();
  factory->TestAllMethods();
  factory->EvaluateAllMethods();

  (figure: NN with 10 hidden units and λ = 0.02)
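  The weight-decay penalty referred to here, in its usual form (the symbol E_D for the data error is an assumption):

  \[ \tilde{E}(w) = E_D(w) + \frac{\lambda}{2} \sum_i w_i^2, \qquad \nabla_w \tilde{E} = \nabla_w E_D + \lambda\, w, \]

  so each gradient-descent step additionally shrinks every weight towards zero by an amount proportional to λ.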

  15. Network complexity vs. regularization • Unless prohibited by computing power, a large number of hidden units H is to be preferred • no ad hoc limitation of the model • In the limit of large H, network complexity is entirely determined by the typical size of the weights

  16. Advanced Topics: Network learning as inference and Bayesian neural networks

  17. Network training as inference • Reminder: Given the network output y(x; w), the error function is just minus the log likelihood of the training data D • Similarly, we can interpret the weight decay term as the log of a prior probability distribution for w • Obviously, there is a close connection between the regularized error function and Bayesian inference for the network parameters: posterior ∝ likelihood × prior / normalization (see below)
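  The connection spelled out in standard notation (the symbols E_D and w_MP are assumptions, consistent with the weight-decay form above):

  \[ p(w \mid D, \lambda) = \frac{p(D \mid w)\; p(w \mid \lambda)}{p(D \mid \lambda)}, \qquad -\ln p(w \mid D, \lambda) = E_D(w) + \frac{\lambda}{2}\sum_i w_i^2 + \text{const}, \]

  so minimizing the regularized error is the same as finding the most probable weights w_MP given the data and the prior; the denominator is the normalization (the evidence).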

  18. Predictions and confidence • Minimizing the error corresponds to finding the most probable value w_MP, which is then used to make predictions • Problem: Predictions for points in regions less populated by the training data may be too confident • Can we do better?

  19. Using the posterior to make predictions • Instead of using w_MP alone, we can also exploit the full information in the posterior

  20. Using the posterior to make predictions • Instead of using w_MP alone, we can also exploit the full information in the posterior (see the predictive distribution below) • See Jiahang’s talk this afternoon for details of the Bayesian approach to NN in the TMVA framework!
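  The predictive distribution this refers to, in its standard form (notation is an assumption): average the network's prediction over the posterior rather than plugging in the single point w_MP:

  \[ p(t \mid x, D) = \int p(t \mid x, w)\; p(w \mid D, \lambda)\; \mathrm{d}w, \]

  which automatically widens the predictive uncertainty in regions where the training data constrain w only weakly.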

  21. A full Bayesian treatment • In a full Bayesian framework, the hyperparameter(s) λ are estimated from the data by maximizing the evidence (see below) • no test data set is needed • the neural network tunes itself • the relevance of input variables can be tested (automatic relevance determination, ARD) • Simultaneous optimization of parameters and hyperparameters is technically challenging • TMVA uses a clever approximation
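  The evidence being maximized, in its standard form (notation assumed):

  \[ p(D \mid \lambda) = \int p(D \mid w)\; p(w \mid \lambda)\; \mathrm{d}w, \]

  the same quantity that appeared above as the normalization of the posterior; maximizing it with respect to λ lets the data themselves choose the amount of regularization.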

  22. Summary (1)
  * A neuron can be understood as an extension of a linear classifier
  * A neural net consists of layers of neurons; input information always propagates forward, errors propagate backwards
  * Feedforward networks are universal approximators
  * The model complexity is governed by the typical weight size, which can be controlled by weight decay or early stopping
  * In the Bayesian framework, error minimization corresponds to inference and regularization corresponds to the choice of a prior for the parameters
  * The Bayesian approach makes use of the full posterior and gives better predictive power
  * The amount of regularization can be learned from the data by maximizing the evidence

  23. Summary (2)
  Current features of the TMVA MLP:
  * Support for regression, binary and multiclass classification (new in 4.1.0!)
  * Efficient optional preprocessing (Gaussianization, normalization) of the input distributions
  * Optional regularization to prevent overtraining
    + efficient approximation of the posterior distribution of the network weights
    + self-adapting regulator
    + error estimation
  Future development in TMVA:
  * Automatic relevance determination for input variables
  * Extended automatic model (network architecture) comparison
  Thank you!

  24. References
  Figures taken from:
  David MacKay: “Information Theory, Inference and Learning Algorithms”, Cambridge University Press, 2003
  Christopher Bishop: “Pattern Recognition and Machine Learning”, Springer, 2006
  Hastie, Tibshirani, Friedman: “The Elements of Statistical Learning”, 2nd Ed., Springer, 2009
  These books are also recommended for further reading on neural networks.
