
Talking Statistics: Impressions from the ATLAS Statistics WS, Jan 2007



  1. Talking Statistics: Impressions from the ATLAS Statistics WS, Jan 2007. Andreas Hoecker (CERN), CAT Physics meeting, Feb 9, 2007

  2. Preliminary Remarks
  • Main statistical topics of importance for HEP data analysis
  • Avoid biases (statistics is a science – its correct use is not a question of taste!)
  • Choose optimised approaches (under all aspects, i.e., including systematics)

  3. P r e l i m i n a r i e s

  4. Significance (G. Cowan, Introduction)
  • p-value: P(data as extreme or more extreme than observed | H0), where H0 is the null hypothesis
  • Probability of getting a value of the test statistic more signal-like than that observed, if the null hypothesis is true
  • In frequentist statistics one cannot talk about P(H0), unless H0 is a repeatable observation
  • The p-value equals the smallest significance level at which the null hypothesis would be rejected: the p-value is compared with the chosen significance level and, if it is smaller, the result is significant
  • Define beforehand what leads to an exclusion of the null hypothesis
  • One-sided p-value: e.g., only N > N[H0] leads to exclusion
  • Two-sided p-value: e.g., any deviation from N[H0] leads to exclusion
  • For a Gaussian test statistic: p_one-sided = 0.5 × p_two-sided (sketch below)
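
  As a small illustration of the last bullet, a ROOT/C++ sketch of the Gaussian relation between p-values and significance; it uses only standard TMath calls, and the factor of two between one- and two-sided p-values is explicit:

    // p_significance.C -- relation between p-values and Gaussian significance Z
    #include "TMath.h"
    #include <cstdio>

    void p_significance(double Z = 5.0)
    {
       // one-sided p-value: probability of a fluctuation beyond +Z sigma
       double pOne = 0.5 * TMath::Erfc(Z / TMath::Sqrt2());
       // two-sided p-value: any deviation beyond |Z| sigma counts
       double pTwo = TMath::Erfc(Z / TMath::Sqrt2());
       printf("Z = %.1f: p(one-sided) = %.3g = 0.5 * p(two-sided) = 0.5 * %.3g\n",
              Z, pOne, pTwo);
       // inverting: recover Z from the one-sided p-value
       printf("Z recovered from p: %.3f\n", TMath::Sqrt2() * TMath::ErfcInverse(2.0 * pOne));
    }

  Running p_significance(5.0) reproduces the familiar one-sided p-value of about 2.9e-7 for a 5-sigma effect.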

  5. Kinds of Errors in Statistical Interpretation (G. Cowan, Introduction; S. Caron, Search strategies)
  • Type-I error: reject the null hypothesis though it is true
  • Frequency of Type-I errors rises with the number of trials
  • The naive p-value must be corrected; for n independent trials: p_corr = 1 – (1 – p)^n, the "look-elsewhere effect" (sketch below)
  • Frequency of Type-I errors is independent of analysis optimisation (unless prior information can be exploited)
  • Type-II error: accept the null hypothesis though it is not true
  • The frequency of Type-II errors depends on analysis optimisation
  • It may decrease with the number of trials
  • Goal: minimise the frequency of Type-I and Type-II errors
  • Focus the search using prior information (e.g., SUSY may have large ET,miss)
  • Optimise the sensitivity of the analysis using prior information
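
  The trials correction is a one-liner; a minimal sketch, assuming the n trials are truly independent (neighbouring search bins rarely are, so this is only a first approximation):

    // correct a local p-value for n independent trials ("look-elsewhere effect")
    #include "TMath.h"

    double pCorrected(double p, int n)
    {
       // P(at least one of n trials fluctuates at level p) = 1 - (1-p)^n
       return 1.0 - TMath::Power(1.0 - p, n);
    }
    // e.g. a local p = 2.9e-7 (5 sigma, one-sided) scanned over n = 100
    // independent mass bins gives pCorrected ~ 2.9e-5, i.e. only ~4 sigma globally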

  6. Frequentist versus Subjective (Bayesian) (G. Cowan, Introduction)
  • Frequentist probability defines an event's probability as the limit of its relative frequency in a large number of trials
  • The true outcome of an event is fixed but not known, and cannot be known
  • The tools of frequentist statistics tell us what to expect, under the assumption of certain probabilities, about hypothetical repeated observations
  • Frequentist confidence levels (CLs) are straightforwardly obtained from toy MC samples
  • The nuisance parameters in these toys must be set such that the lowest CLs are obtained
  • Confidence levels determine exclusion probabilities: if in the presence of nuisance parameters a measurement gave x ± σ, this does not mean that x is the most probable value!
  • Subjective Bayesian statistics gives the probability of x to take some value
  • It is the result of a convolution of input PDFs for all observables and nuisance parameters (written out below)
  • The "posterior" result is subjective w.r.t. the arbitrary prior PDFs, bounds and parameterisations used
  • It is extremely difficult to reproduce a Bayesian result without having all the subjective details
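
  For reference, the Bayesian recipe alluded to above, in generic notation (θ = parameter of interest, ν = nuisance parameters, π = prior PDFs); the explicit appearance of the priors is what makes the posterior subjective:

    p(θ | x)  ∝  π(θ) ∫ L(x | θ, ν) π(ν) dν

  Reporting a Bayesian result therefore requires reporting π(θ), π(ν), their bounds and their parameterisations along with it.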

  7. A Frequentist Analysis (G. Cowan, Introduction)
  • The principles of a frequentist analysis are simple:
  • Define a test statistic, e.g.:
    • a likelihood estimator
    • a multivariate analyser output
    • your age
  • Throw toy experiments and determine the p-value to achieve a value as extreme as, or more extreme than, the one found in the data
  • Examples (see the sketch below):
    • exclusion analysis, Nobs events observed for Nexp expected: determine the fraction of toy experiments generated under the null hypothesis that are at least as extreme as Nobs
    • measurement x0 ± σ: throw toys with true value x0 – σ, and determine the fraction of experiments with x0,toy ≥ x0 (≈16% for correct coverage at 68% CL); same for the positive error
  • If one wants to be smart, one can compute the first example by hand: that's elegant, but there is no law that requires elegance…
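
  A minimal ROOT/C++ sketch of the toy recipe for the first example (counting experiment, no systematics). Here "as extreme or more extreme" is taken to mean a downward fluctuation, as in an exclusion analysis; flip the comparison for a discovery-style upward test:

    // toy-MC p-value for a counting experiment: nObs observed, nExp expected
    #include "TRandom3.h"

    double pValueCounting(int nObs, double nExp, int nToys = 1000000)
    {
       TRandom3 rng(0);                    // seed 0: unique seed from TUUID
       int nExtreme = 0;
       for (int i = 0; i < nToys; ++i) {
          if (rng.Poisson(nExp) <= nObs) ++nExtreme;   // as extreme or more extreme
       }
       return double(nExtreme) / nToys;
    }

  For this simple case the loop just reproduces the analytic Poisson sum over n ≤ Nobs – the "by hand" computation the slide alludes to.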

  8. More Complicated (T. Eifert, AH)
  • When the model gets more realistic, elegant solutions are not always straightforward…
  • Example: exclusion analysis, Nobs events observed for Nexp expected – but Nexp has an uncertainty σ!
  • If one wants to be smart, one can compute the p-value by hand; the result involves confluent hypergeometric functions 1F1 of the first kind
  • Is this in any way elegant? → toys perform the numerical integration and are super simple (see the sketch below); just use:
    • TRandom::Gaus(Nexp, σ) to smear the expectation
    • TRandom::Poisson(μ) for the Poisson fluctuation
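
  The same toy, now folding in the Gaussian uncertainty σ on Nexp exactly as prescribed above (note the actual ROOT signatures: TRandom::Gaus(mean, sigma) and TRandom::Poisson(mean)); the truncation of negative means is itself a modelling choice:

    // toy-MC p-value with Gaussian-smeared expectation nExp +- sigma
    #include "TRandom3.h"

    double pValueSmeared(int nObs, double nExp, double sigma, int nToys = 1000000)
    {
       TRandom3 rng(0);
       int nExtreme = 0;
       for (int i = 0; i < nToys; ++i) {
          double mu = rng.Gaus(nExp, sigma);   // draw the (nuisance) expectation
          if (mu < 0) mu = 0;                  // truncate unphysical negative means
          if (rng.Poisson(mu) <= nObs) ++nExtreme;
       }
       return double(nExtreme) / nToys;
    }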

  9. Categorisation of Systematic Errors (K. Cranmer, taken from P. Sinervo's PhyStat03 talk)
  • Class-I: the Good
    • Can be taken from auxiliary measurements
    • Well behaved statistically; improve with luminosity
  • Class-II: the Bad
    • Arise from poorly understood analysis features or model assumptions
    • The size of the effects can be controlled
    • No strict statistical meaning; may be modelled by Gaussians (giving Bayesian credibility intervals), appealing to the central limit theorem
  • Class-III: the Evil
    • Arise from theoretical assumptions or uncontrolled model uncertainties
    • The size of the effect cannot reasonably be controlled
    • No statistical meaning, no reasonable prior modelling

  10. Coverage (K. Cranmer)
  A statistical method has "coverage" (1–α) if, in infinitely many repeated experiments, the resulting CLs include (cover) the true value in a fraction (1–α) of all cases, irrespective of what the true value is.
  • Treat the test-statistic interpretation (limit-setting procedure, errors) as a black box
  • Fix its nuisance parameters (e.g., to values found in a maximum-likelihood fit)
  • Generate toy MC samples
  • Determine the true Type-I error rate for this setup
  • Compare with the initial statistical interpretation: test its "coverage" (see the sketch below)
  • Coverage calibrates the statistical apparatus
  • Different statistical methods may have different coverage!
  [Figure: coverage versus nuisance parameters, G. Punzi, PhyStat'05]
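
  A sketch of a coverage test for the simplest possible black box – the textbook x̂ ± σ interval on a Gaussian mean, which should cover at the nominal 68.3%; any other interval recipe can be swapped in at the marked line:

    // coverage test: fraction of toy experiments whose interval contains the truth
    #include "TRandom3.h"
    #include <cstdio>

    void coverageTest(double muTrue = 3.0, double sigma = 1.0, int nToys = 100000)
    {
       TRandom3 rng(0);
       int nCover = 0;
       for (int i = 0; i < nToys; ++i) {
          double muHat = rng.Gaus(muTrue, sigma);          // toy measurement
          // black-box interval recipe: here the naive +-1 sigma band
          if (muTrue >= muHat - sigma && muTrue <= muHat + sigma) ++nCover;
       }
       printf("coverage = %.4f (nominal 0.6827)\n", double(nCover) / nToys);
    }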

  11. A p p l i c a t i o n s

  12. Example: Higgs Searches at LEP (A. Read, Lessons from LEP)
  • Higgs searches at LEP: the likelihood includes shape information
  • Test statistic: the likelihood ratio (LR) Q
  • Determine the PDFs of –2 ln Q with MC
  • Define a further statistic, "CLs", to obtain the lower bound on mH (definitions written out below)
  • Other test statistics have been tried: similar sensitivity for exclusion and discovery, but none performed better
  • The P's are obtained by integrating the Q PDFs
  • CLs is not gauged with MC any more, but used directly: if CLs(mH) ≤ 0.05, mH is excluded at ≥ 95% CL
  • This interpretation leads to overcoverage, i.e., to a conservative (too low) limit
  • Ouf! (why so complicated?)
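
  Spelling out the LEP construction referred to above (s+b = signal-plus-background hypothesis, b = background only):

    Q = L(data | s+b) / L(data | b)              (test statistic: –2 ln Q)
    CL_s+b = P_s+b(Q ≤ Q_obs),   CL_b = P_b(Q ≤ Q_obs)
    CL_s = CL_s+b / CL_b         (exclude mH at ≥ 95% CL if CL_s(mH) ≤ 0.05)

  The division by CL_b protects against excluding a signal where the search has no sensitivity, and it is precisely the source of the overcoverage noted above.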

  13. Example: Lessons from the Tevatron (T. Junk, Lessons from Tevatron)
  Tom Junk gave an interesting talk about lessons from the Tevatron, with many concrete examples of statistics use cases and pitfalls (some touched upon in this résumé). Too rich to summarise here – have a look yourself!

  14. Example: ATLAS Higgs Searches (W. Quayle, Higgs searches)
  • "You can't do discovery physics at the LHC without at least a little bit of statistical analysis" … my god!
  • Use statistical arguments that are as straightforward as possible (LEP missed that one) and as rigorous as possible
  • Points out the danger of Type-I errors when scanning the mH range [Guillaume et al.'s note, '06]
  • Bill advertises performing a fit of mH instead [EPJ C45, 659 (2006)]; the toy MC must model the entire hypothesis test! (AH: cannot believe it makes a difference whether one scans or fits mH)
  • "Many analyses evolve towards background extraction from ML fits":
    • H → γγ (use categories in rapidity and more variables, fit nuisance parameters)
    • ttH (H → bb) (fit mH and the signal and background yields)
    • H → WW → ℓνqq (uncertainty in BG, signal can be near the BG peak, W + jets control samples)
    • and others …
  • Combined limit/discovery: combine test statistics (e.g., likelihoods)? → requires a combined toy analysis (see below)! Combine confidence levels? → not unambiguous! A hot topic, I guess!
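
  On the combination question: for independent channels, likelihood-ratio test statistics combine multiplicatively, so the combined statistic is simply the sum of the per-channel log-ratios; its PDFs under both hypotheses must again come from a combined toy analysis:

    Q_comb = ∏_i Q_i   ⇔   –2 ln Q_comb = ∑_i (–2 ln Q_i)

  By contrast, there is no unique prescription for combining per-channel confidence levels directly, which is the ambiguity flagged above.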

  15. Example: ATLAS SUSY Searches (T. Lari, Statistical issues in SUSY searches)
  • The MSSM has 105 parameters → use constrained models for the signal MC (e.g., mSUGRA with 4.5 parameters)
  • Searches driven by signature: hard jets, LSP (ET,miss), large Meff, maybe leptons
  • Optimise (and finalise) the analysis before looking into the signal region!
  • Statistics challenges:
    • Optimise the analysis at a single mSUGRA point? (small Type-I error, but maybe large Type-II error)
    • Optimise and test the full mSUGRA grid? (large Type-I error, maybe smaller Type-II error)
    • Apply a "general search strategy" (S. Caron)? (huge Type-I error, maybe smaller Type-II error)
  • We can compute the rate of Type-I errors, but know nothing about the Type-II error rate!
  • Optimisation should include systematics!
  • Other challenges, potentially more important for early discovery:
    • Need to control backgrounds (from data?) and systematic errors
    • Can we extrapolate the background from "sidebands" into the signal region?

  16. A n a l y s i s   O p t i m i s a t i o n

  17. Data Mining: Event Classification (G. Cowan, Introduction)
  [Figure: three scatter plots of H0 and H1 events in the (x1, x2) plane, illustrating rectangular-cut, linear and nonlinear decision boundaries]
  • Suppose a data sample with two types of events: H0, H1
  • We have found discriminating input variables x1, x2, …
  • What decision boundary should we use to select events of type H1? Rectangular cuts? A linear boundary? A nonlinear one?
  • How can we decide this in an optimal way? → Let the machine learn it!

  18. Multivariate Analysis (MVA) (G. Cowan, Introduction; J. Stelzer, TMVA; W. Verkerke, RooFit)
  • Create a test statistic that compresses the input information into a scalar quantity y, with, e.g., y(H0) → 0, y(H1) → 1 (see the sketch below)
  • Large variety of MVA methods, ranging from cuts and likelihood estimators, over linear and non-linear discriminants, to rule-based approaches like Boosted Decision Trees and RuleFit
  • If correlations among the xi are negligible, one can perform a maximum-likelihood fit (same principle, see later)
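
  A hand-rolled toy illustration of the simplest such scalar test statistic, a linear (Fisher-type) discriminant in two variables; all names and numbers here are invented for illustration, and TMVA automates this and far more powerful methods. For equal, isotropic class covariances the Fisher direction reduces to the difference of the class means (in general w = C⁻¹(μ1 – μ0), with C the common covariance matrix):

    // fisher2d.C -- compress (x1, x2) into a scalar y that separates H0 from H1
    #include "TRandom3.h"
    #include <cstdio>

    void fisher2d()
    {
       TRandom3 rng(0);
       const int N = 10000;
       const double mu0[2] = {0.0, 0.0};   // H0 (background) means in (x1, x2)
       const double mu1[2] = {1.0, 1.5};   // H1 (signal) means, unit-width Gaussians
       const double w[2]   = {mu1[0] - mu0[0], mu1[1] - mu0[1]};  // Fisher direction

       // project toy events of each class onto w and compare the y distributions;
       // an affine rescaling of y then maps y(H0) -> 0, y(H1) -> 1
       double sum0 = 0., sum1 = 0.;
       for (int i = 0; i < N; ++i) {
          sum0 += w[0]*rng.Gaus(mu0[0], 1.) + w[1]*rng.Gaus(mu0[1], 1.);
          sum1 += w[0]*rng.Gaus(mu1[0], 1.) + w[1]*rng.Gaus(mu1[1], 1.);
       }
       printf("<y|H0> = %.2f   <y|H1> = %.2f\n", sum0/N, sum1/N);
    }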

  19. Example: DØ Single Top Search (I) (B. Vachon, Statistical methods for the single top search, hep-ex/0612052)
  • Electroweak top-quark production:
    • s-channel @ Tevatron, tb vertex: σ ≈ 0.9 pb
    • t-channel @ Tevatron, tqb vertex: σ ≈ 2 pb
  • Event signature: isolated leptons, 2–4 jets, 1 b-jet, ET,miss
  • Dominant backgrounds: W + jets, multi-jets, tt-bar
  • Three MVA discrimination methods are used:
    • Boosted Decision Trees: 36 signal classes (s/t-channel, e/μ, #jets, #b-tags), 49 input variables
    • Bayesian (~ average of many) neural network: 24 input variables, 40 hidden nodes
    • Matrix element: ratio of signal and background PDFs from the approximate matrix element of the event

  20. Example: DØ Single Top Search (II) (B. Vachon, Statistical methods for the single top search, hep-ex/0612052)
  [Figure: preliminary DØ results (0.9 fb–1) from the BDT and ME analyses]
  • Extracts from Brigitte's comments:
    • Physicists should keep an open mind w.r.t. new data-analysis techniques
    • The collaboration should have an "official" set of software tools; […] develop them now!
    • Let's stop calling [MVAs] "black boxes" and let's learn how they work and behave
    • It can be good to use different [MVAs] as cross-checks and to ensure maximal sensitivity
    • The most important thing is the understanding of the data/background modelling, not the MVA you use

  21. B 0h+h’- signalcandidates It would be naïve to believe that all analyses at the LHC could be done that way, but we should keep in mind Amir’s main message: draw as much information on the nuisance parameters from the data as possible, and do this simultaneously with the fit of the signal component Control sample Optimised Analysis Strategy – BABAR Example A. Farbin, Practical experience from BABAR (figure modified) • Comprise measurement, validation and evaluation of systematics in single analysis step  the ML fit Also see: ML fits by B-phys group (E. Kneringer) Kinematic variables Signal CP Parameters (blind) MVA Signal/Background Yields PIDvariables Maximum Likelihood Fit Background PDF parameters Flavour Tagging Signal PDF parameters Same variables for control sample 119 free parameters in fit; weak (but not negligible) correlations External input: PDF parameters from MC or other control samples not in fit A. Hoecker: Statistical Issues

  22. Comment on Goodness-of-Fit Validation (AH, and a remark by G. Cowan)
  • It is often said that unbinned ML fits cannot easily be validated
  • However, there exists a straightforward way to visualise exactly what the fit does, and to determine its goodness quantitatively:
    • Form the likelihood ratio R(x) after the fit, from the tuple with the fit results (a plausible form is given below)
    • Compute R(x) for all events entering the ML fit and plot them
    • Produce a high-statistics toy MC for all fit components using the likelihood model
    • Plot the toy events normalised to the relative abundances used/found in the fit → compare!
  [Figure: example from a BABAR B → … analysis, AH, 2003]
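
  The likelihood-ratio formula itself did not survive the transcript; a plausible reconstruction, consistent with the description above (my assumption, not necessarily the form shown on the slide), is the per-event signal probability after the fit:

    R(x_i) = f_S P_S(x_i) / [ f_S P_S(x_i) + (1 – f_S) P_B(x_i) ]

  with P_S, P_B the fitted signal and background PDFs (products over all fit observables) and f_S the fitted signal fraction; comparing the R distribution of the data with that of the high-statistics toys then provides the quantitative goodness-of-fit test.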

  23. S t a t i s t i c s   T o o l k i t s

  24. Tools – Summary from the Workshop
  • ROOT (http://root.cern.ch/)
    • Large number of utilities needed for statistical analysis, including minimisation, random generation and statistical tests; also TRolke, TLimit, … (and a poor man's TFeldmanCousins)
  • RooFit (http://roofit.sf.net/) – W. Verkerke
    • ROOT data-modelling toolkit for unbinned maximum-likelihood fits and toy MC analysis
  • TMVA (http://tmva.sf.net/) – J. Stelzer
    • ROOT multivariate-analysis toolkit for parallel discrimination analysis and data mining
  • sPlots (ROOT's TSPlot class, physics/0402083) – T. Petersen
    • Optimised visualisation of maximum-likelihood fit results (not (yet) RooFit based)
  • RooStats (under development, prototype for PhyStat'07) – K. Cranmer
    • ROOT- and RooFit-based statistical interpretation with horizontal comparison of methods
  • If you can't wait and need a p-value for a counting analysis that takes background systematics into account, you may check out catsusy/StatTools (see also the TLimit sketch below)
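
  A sketch of how the ROOT pieces above fit together for a histogram-based limit, assuming TLimit's documented interface (histograms of signal, background and data in; a TConfidenceLevel carrying CLs, CLs+b and CLb out):

    // tlimit_sketch.C -- CLs limit from histograms with ROOT's TLimit
    #include "TH1D.h"
    #include "TLimit.h"
    #include "TConfidenceLevel.h"
    #include <cstdio>

    void tlimit_sketch(TH1D* sig, TH1D* bkg, TH1D* data)
    {
       // 50000 toy experiments to build the -2lnQ distributions
       TConfidenceLevel* cl = TLimit::ComputeLimit(sig, bkg, data, 50000);
       printf("CLs = %g  CLs+b = %g  CLb = %g\n", cl->CLs(), cl->CLsb(), cl->CLb());
       delete cl;
    }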

  25. B l i n d   A n a l y s i s

  26. Blind Analysis (I) (AH, Blind Analysis)
  • Most modern HEP experiments apply blind techniques

  27. Blind Analysis (II) (AH, Blind Analysis)
  • Hiding the result formalises our way of doing data analysis
  • Besides the obvious advantages, blind analysis channels competition and improves internal review; it strengthens the role of the physics groups
  • One cannot lay down a general blinding rule, but one can suggest considering and discussing blinding, and the technique to use, for each analysis individually
  • Most serious objection: an obvious new-physics signal could be missed or delayed
    • Could be dealt with by weakening the validation requirements for search analyses …
    • … and once these are ok: unblind the data – and if there is no clear-cut signal, re-hide the signal box and finalise
  • My impression from the discussions after the talk: ATLAS is not yet mature for the adventure of blind analysis … ;-)

  28. C o n c l u s i o n s (D. Froidevaux's "motherhood statements")
  • Daniel doesn't adhere to frequentist-vs-Bayesian discussions, and he doesn't seem to like nuisance parameters either ;-)
  • "Today we have acquired far more sophisticated tools than twenty years ago, but we do not always write them ourselves, which often entails the risk that we do not test them adequately" → MC generators, analysis tools!
  • We need to restrain our statistics enthusiasm and first understand:
    • … the detector response
    • … the basic QCD processes and other backgrounds
    • … and validate the MC simulation
  • Fortunately, an optimised analysis also helps to reduce background systematics! (AH)
  • Statistics working group contacts with CMS are being prepared (by whom?)
  • Inter-experiment combination should not wait years to be started (A. Read)
