
Statistical Learning Methods in HEAP



  1. Statistical Learning Methods in HEAP. Jens Zimmermann, Christian Kiesling. Max-Planck-Institut für Physik, München; MPI für extraterrestrische Physik, München; Forschungszentrum Jülich GmbH. Outline: Statistical Learning, introduction with a simple example; Occam's Razor; Decision Trees; Local Density Estimators; Methods Based on Linear Separation; Examples: Triggers in HEP and Astrophysics; Conclusion. C. Kiesling, MPI for Physics, Munich - ACAT03 Workshop, KEK, Japan, Dec. 2003

  2. Statistical Learning. • Does not use prior knowledge: "no theory required". • Learns only from examples: "trial and error", "learning by reinforcement". • Two classes of statistical learning: discrete output 0/1 ("classification") and continuous output ("regression"). • Applications in high-energy and astrophysics: background suppression and purification of event samples; estimation of parameters not directly measured.

  3. A Simple Example: Preparing a Talk. [Scatter plot: # formulas vs. # slides (axes 0 to 60), experimentalists and theorists marked separately.] Data base established by Jens during the Young Scientists Meeting at MPI.

  4. Discriminating Theorists from Experimentalists: A First Analysis. [Projections of the # formulas and # slides distributions for experimentalists and theorists; panels for the first talks handed in and for talks a week before the meeting.]

  5. First Problems. A simple "model" gives no complete separation; complete separation is possible, but only via a complicated boundary. New talk by Ludger: 28 formulas on 31 slides. At this point we cannot know which feature is "real"! Use a train/test split or cross-validation, as sketched below.
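
A minimal train/test split in Python. The toy data and labels are illustrative, not taken from the talk's data base:

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.3, seed=0):
    """Randomly split (X, y) into a training set and a test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_fraction)
    return X[idx[n_test:]], y[idx[n_test:]], X[idx[:n_test]], y[idx[:n_test]]

# Toy data in the spirit of the talk: (#formulas, #slides) per talk,
# label 1 = theorist, 0 = experimentalist (values are made up).
X = np.array([[28., 31.], [12., 50.], [61., 25.], [8., 42.], [55., 30.], [15., 48.]])
y = np.array([1, 0, 1, 0, 1, 0])

X_train, y_train, X_test, y_test = train_test_split(X, y)
# Fit on the training set only; the error on the unseen test set tells
# us whether a feature of the decision boundary is "real" or noise.
```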

  6. See Overtraining, Want Generalization, Need Regularization. [Plot: error E on the training set keeps falling with the training epochs, while the error on the test set reaches a minimum and rises again: overtraining.] We want to tune the parameters of the learning algorithm depending on the overtraining seen; a sketch of such early stopping follows below.
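
One common way to act on the observed overtraining is early stopping: halt training where the test error is minimal. A minimal sketch; the model interface (one_epoch, error) is illustrative, not the API of any particular library:

```python
def train_with_early_stopping(model, X_train, y_train, X_test, y_test,
                              max_epochs=1000, patience=20):
    """Stop training once the test error has not improved for `patience`
    epochs -- i.e. where overtraining sets in. `model` is assumed to
    expose one_epoch() and error() (an illustrative interface)."""
    best_error, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        model.one_epoch(X_train, y_train)     # one gradient-descent pass
        test_error = model.error(X_test, y_test)
        if test_error < best_error:
            best_error, best_epoch = test_error, epoch
        elif epoch - best_epoch > patience:
            break                             # test error keeps rising: stop
    return best_epoch, best_error
```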

  7. See Overtraining, Want Generalization, Need Regularization (continued). Regularization ensures adequate generalization performance by limiting the complexity of the model (quantified e.g. via VC dimensions). "Factor 10" rule ("Uncle Bernie's Rule #2"): use roughly ten training examples per free parameter of the model; see the sketch below.
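
A minimal budget check for the factor-10 rule, assuming it is read as "about ten training examples per weight"; the network architecture used in the example is illustrative:

```python
def n_weights(n_in, n_hidden, n_out=1):
    """Free parameters of a one-hidden-layer network, biases included."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

def min_training_examples(n_in, n_hidden, factor=10):
    """'Factor 10' rule: roughly `factor` examples per free parameter."""
    return factor * n_weights(n_in, n_hidden)

# Two inputs (#formulas, #slides) and two hidden neurons, as on slide 15:
print(min_training_examples(2, 2))   # -> 90 training examples
```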

  8. Philosophy: Occam's Razor (14th century). • Pluralitas non est ponenda sine necessitate. • Do not make assumptions unless they are really necessary. • From theories which describe the same phenomenon equally well, choose the one which contains the least number of assumptions. First razor: given two models with the same generalization error, the simpler one should be preferred because simplicity is desirable in itself. (Yes! But not of much use.) Second razor: given two models with the same training-set error, the simpler one should be preferred because it is likely to have lower generalization error. (No! "No free lunch" theorem, Wolpert 1996.)

  9. Decision Trees. Start with all events and apply the most discriminating cut: #formulas < 20 → experimentalist; #formulas > 60 → theorist. The remaining subset (20 < #formulas < 60) is split again: #slides > 40 → experimentalist, #slides < 40 → theorist. Classify Ringaile: 31 formulas on 32 slides falls into 20 < #formulas < 60 with #slides < 40 → theorist. Regularization: pruning. A sketch of this tree follows below.
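
The tree from the slide, written out by hand as a minimal sketch:

```python
def classify(n_formulas, n_slides):
    """Decision tree from the slide: returns 'exp' (experimentalist)
    or 'th' (theorist)."""
    if n_formulas < 20:
        return "exp"
    if n_formulas > 60:
        return "th"
    # Remaining subset 20 < #formulas < 60: split on #slides.
    return "exp" if n_slides > 40 else "th"

print(classify(31, 32))  # Ringaile: 31 formulas on 32 slides -> 'th'
```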

  10. Local Density Estimators. Search for similar events already classified within a specified region, and count the members of the two classes in that region; a generic sketch follows below. [Scatter plots: the counting region around the query point in the # formulas vs. # slides plane.]
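
A minimal sketch of the generic idea; the radius and the Euclidean metric are illustrative choices, not prescribed by the talk:

```python
import numpy as np

def local_density_output(x, X_train, y_train, radius=10.0):
    """Count already-classified events of each class within `radius`
    of the query point x and return the class-1 fraction."""
    distances = np.linalg.norm(X_train - x, axis=1)
    neighbours = y_train[distances < radius]
    return 0.5 if len(neighbours) == 0 else neighbours.mean()
```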

  11. Maximum Likelihood. Project the training data onto each input variable and bin the resulting one-dimensional distributions; the output for a new event (e.g. Ringaile's 31 formulas on 32 slides) is built from the product of the per-variable likelihoods, as sketched below. Regularization: binning. Caveat: the correlation between the variables gets lost completely by the projection!
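
A minimal sketch of the projected-likelihood output. The bin edges are illustrative, and the standard combination out = L_th / (L_th + L_exp) of the two class likelihoods is assumed:

```python
import numpy as np

def binned_pdfs(X, bins):
    """One normalized histogram per input variable (the 1D projections)."""
    return [np.histogram(X[:, i], bins=bins, density=True)[0]
            for i in range(X.shape[1])]

def likelihood(x, pdfs, bins):
    """Product of the per-variable binned likelihoods for event x."""
    out = 1.0
    for xi, pdf in zip(x, pdfs):
        j = np.clip(np.digitize(xi, bins) - 1, 0, len(pdf) - 1)
        out *= pdf[j]
    return out

# With per-class pdfs the output is the likelihood ratio:
# bins = np.linspace(0, 60, 7)                      # illustrative binning
# L_th  = likelihood(x, binned_pdfs(X_th,  bins), bins)
# L_exp = likelihood(x, binned_pdfs(X_exp, bins), bins)
# out = L_th / (L_th + L_exp)
```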

  12. k-Nearest-Neighbour. [Outputs at the example point for k = 1, 2, 3, 4, 5.] Regularization: the parameter k. Drawback: for every evaluation position the distances to all training positions need to be determined! A sketch follows below.
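
A minimal k-NN sketch in the same spirit as the other estimators:

```python
import numpy as np

def knn_output(x, X_train, y_train, k=3):
    """k-nearest-neighbour output: the fraction of the k closest
    training events that belong to class 1. Note the cost: the
    distance to every training point is computed for each query."""
    distances = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(distances)[:k]
    return y_train[nearest].mean()
```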

  13. Range Search. Store the training events in a k-d tree whose nodes split alternately on x and y; to classify a query point, count the training events of each class inside a box around it. Small box: only points 1, 2, 4 and 9 need to be checked; large box: all points are checked. Regularization: the box size. The tree needs to be traversed only partially if the box size is small enough! A sketch follows below.
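
A minimal k-d-tree range count, assuming two numeric inputs and 0/1 class labels; the median-split construction is one common choice, not necessarily the talk's:

```python
import numpy as np

def build_kdtree(points, labels, depth=0):
    """Minimal k-d tree: nodes split alternately on the input axes."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]
    order = np.argsort(points[:, axis])
    points, labels = points[order], labels[order]
    mid = len(points) // 2
    return {"point": points[mid], "label": int(labels[mid]), "axis": axis,
            "left":  build_kdtree(points[:mid], labels[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], labels[mid + 1:], depth + 1)}

def box_count(node, lo, hi, counts):
    """Count training events of each class inside the box [lo, hi],
    descending only into subtrees that can overlap the box."""
    if node is None:
        return
    p, ax = node["point"], node["axis"]
    if np.all(lo <= p) and np.all(p <= hi):
        counts[node["label"]] += 1
    if lo[ax] <= p[ax]:   # box reaches into the left half-space
        box_count(node["left"], lo, hi, counts)
    if hi[ax] >= p[ax]:   # box reaches into the right half-space
        box_count(node["right"], lo, hi, counts)

# counts = {0: 0, 1: 0}; box_count(tree, lo, hi, counts)
# out = counts[1] / max(1, counts[0] + counts[1])
```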

  14. Methods Based on Linear Separation. Divide the input space into regions separated by one or more hyperplanes. Note: extrapolation is done! Simplest case: LDA (the Fisher discriminant), sketched below. [Scatter plots: separating hyperplane in the # formulas vs. # slides plane.]
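
A minimal Fisher-discriminant sketch; the class split into signal and background arrays is assumed:

```python
import numpy as np

def fisher_lda(X_sig, X_bkg):
    """Fisher discriminant: the direction w that maximizes the distance
    of the projected class means relative to the within-class scatter."""
    mu_s, mu_b = X_sig.mean(axis=0), X_bkg.mean(axis=0)
    S_w = np.cov(X_sig.T) + np.cov(X_bkg.T)   # within-class scatter
    return np.linalg.solve(S_w, mu_s - mu_b)

# Project every event onto w and cut on the projection:
# out = X @ w; larger values are more signal-like.
```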

  15. Neural Networks. A network with two hidden neurons, trained by gradient descent, separates the classes with two hyperplanes; with arbitrary inputs and hidden neurons, arbitrarily complex boundaries can be formed. [Network diagram with the trained weights and the resulting decision boundary in the # formulas vs. # slides plane.] Regularization: the number of hidden neurons, weight decay. A sketch of the forward pass follows below.
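
A minimal forward pass for such a two-hidden-neuron network. The weights below are placeholders, not the trained values from the slide:

```python
import numpy as np

def s(z):
    """Sigmoid transfer function."""
    return 1.0 / (1.0 + np.exp(-z))

def net_output(x, W1, b1, w2, b2):
    """Forward pass of a net with one hidden layer of two neurons:
    each hidden neuron implements one hyperplane, the output neuron
    combines the two half-spaces into the decision region."""
    h = s(W1 @ x + b1)        # two hidden activations
    return s(w2 @ h + b2)     # scalar output in (0, 1)

# Placeholder weights (NOT the trained values from the slide):
W1 = np.array([[0.5, -0.5], [-0.5, 0.5]]); b1 = np.array([0.1, 0.1])
w2 = np.array([1.0, 1.0]); b2 = -0.5
print(net_output(np.array([3.1, 3.2]), W1, b1, w2, b2))
# Weight decay adds lambda * sum(w**2) to the error minimized by
# gradient descent, shrinking superfluous weights toward zero.
```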

  16. Support Vector Machines. Separating hyperplane with maximum distance to each data point: the maximum margin classifier. It is found by setting up the condition for correct classification, y_i (w . x_i + b) >= 1, and minimizing ||w||^2 / 2, which leads to the Lagrangian L = ||w||^2 / 2 - sum_i alpha_i [y_i (w . x_i + b) - 1]. A necessary condition for a minimum is w = sum_i alpha_i y_i x_i, and the output becomes out(x) = sign(sum_i alpha_i y_i (x_i . x) + b). Only linear separation? No! Replace the dot products x_i . x by a kernel K(x_i, x): the mapping to feature space is hidden in the kernel. The non-separable case is handled with slack variables (soft margin).
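
A minimal sketch of the kernelized decision function. The alpha_i and b are assumed to come from solving the dual problem with an off-the-shelf optimizer; the RBF kernel and its gamma are illustrative choices:

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.1):
    """Gaussian (RBF) kernel: a dot product in an implicit feature space."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def svm_output(x, X_sv, y_sv, alpha, b, kernel=rbf_kernel):
    """Kernelized SVM decision function:
    sign(sum_i alpha_i y_i K(x_i, x) + b), summed over support vectors."""
    total = sum(a_i * y_i * kernel(x_i, x)
                for a_i, y_i, x_i in zip(alpha, y_sv, X_sv))
    return np.sign(total + b)
```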

  17. Physics Applications: Neural Network Trigger at HERA (H1). Goal: keep the physics, reject the background. [H1 event displays: a physics event to keep, a background event to reject.]

  18. Trigger for J/ψ Events (H1). Efficiency at 95% background rejection: NN 99.6%, SVM 98.3%, k-NN 97.7%, RS 97.5%, C4.5 97.5%, ML 91.2%, LDA 82%.

  19. Triggering Charged Current Events. Efficiency at 80% background rejection: NN 74%, SVM 73%, C4.5 72%, RS 72%, k-NN 71%, LDA 68%, ML 65%. [Distributions for signal and background.]

  20. Astrophysics: MAGIC, Gamma/Hadron Separation. [Shower images: photon vs. hadron.] Training with data and MC, evaluation with data. s = signal (photon) enhancement factor: Random Forest s = 93.3, Neural Net s = 96.5.

  21. Future Experiment XEUS: Position of X-ray Photons (an application of statistical learning to regression problems). [Detector sketch: transfer direction, ~10 µm and ~300 µm scales, electron potential.] σ of the position reconstruction in µm: NN 3.6, SVM 3.6, k-NN 3.7, RS 3.7, ETA 3.9, CCOM 4.0.

  22. Conclusion. • Statistical learning theory is full of subtle details (models, statistics). • Widely used statistical learning methods were studied: decision trees; local density estimators (ML, k-NN, RS); linear separation (LDA, neural nets, SVMs). • Neural networks were found superior in the HEP and astrophysics applications (classification and regression) studied so far. • Further applications (trigger, offline analyses) are under study.

  23. From Classification to Regression. [Comparison on a toy one-dimensional problem: k-NN and range-search outputs, a Gaussian fit, and a neural network.] The network solves it with two hidden neurons: a = s(-2.1x - 1), b = s(+2.1x - 1), out = s(-12.7a - 12.7b + 9.4), with s the sigmoid transfer function; a sketch follows below.
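
A minimal sketch evaluating exactly the network quoted on the slide; the evaluation grid is illustrative:

```python
import numpy as np

def s(z):
    """Sigmoid transfer function."""
    return 1.0 / (1.0 + np.exp(-z))

def regression_net(x):
    """The two-hidden-neuron network from the slide: a smooth output
    that approximates the target function directly (regression)."""
    a = s(-2.1 * x - 1.0)
    b = s(+2.1 * x - 1.0)
    return s(-12.7 * a - 12.7 * b + 9.4)

xs = np.linspace(-3, 3, 7)
print(regression_net(xs))  # smooth, bump-shaped response around x = 0
```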
