The small-n problem in High Energy Physics

Glen Cowan
Department of Physics, Royal Holloway, University of London
g.cowan@rhul.ac.uk
www.pp.rhul.ac.uk/~cowan
Statistical Challenges in Modern Astronomy IV (SCMA4), June 12 - 15, 2006

Outline
I. High Energy Physics (HEP) overview: Theory, Experiments, Data
II. The small-n problem, etc.: Making a discovery, Setting limits, Systematic uncertainties
III. Conclusions

The current picture in particle physics
Matter... + force carriers... photon (γ), W±, Z, gluon (g) + relativity + quantum mechanics + symmetries... = “The Standard Model”
• almost certainly incomplete
• 25 free parameters (masses, coupling strengths,...)
• should include Higgs boson (not yet seen)
• no gravity yet
• agrees with all experimental observations so far

Experiments in High Energy Physics
HEP mainly studies particle collisions in accelerators, e.g., the Large Electron-Positron (LEP) Collider at CERN, 1989-2000.
4 detectors, each collaboration ~400 physicists.

More HEP experiments
The LEP tunnel is now used for the Large Hadron Collider (LHC): proton-proton collisions, Ecm = 14 TeV, very high luminosity.
Two general purpose detectors: ATLAS and CMS.
Each detector collaboration has ~2000 physicists.
Data taking to start 2007.
[Figure: the ATLAS detector]

HEP data
Basic unit of data: an ‘event’.
Ideally, an event is a list of momentum vectors and particle types.
In practice, particles are ‘reconstructed’ as tracks, clusters of energy deposited in calorimeters, etc.
Resolution, angular coverage, particle id, etc. are imperfect.
[Figure: an event from the ALEPH detector at LEP]

Data samples
At LEP, event rates typically ~Hz or less: ~10^6 Z boson events in 5 years for each of 4 experiments.
At LHC, ~10^9 events/sec(!!!), mostly uninteresting; do quick sifting, record ~200 events/sec; single event ~ 1 Mbyte.
1 ‘year’ ≈ 10^7 s: 10^16 pp collisions per year, 2 billion / year recorded (~2 Pbyte / year).
For new/rare processes, rates at LHC can be vanishingly small: Higgs bosons detectable per year could be e.g. ~10^3 → ‘needle in a haystack’.
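
The quoted yearly totals follow from the per-second rates by simple multiplication. A minimal sketch of that arithmetic, using only the round numbers on the slide (the variable names are illustrative, not from the talk):

```python
# Round numbers taken from the slide; variable names are illustrative only.
collision_rate_hz = 1e9    # ~10^9 pp collisions per second at the LHC
recorded_rate_hz = 200.0   # events kept after the quick sifting
event_size_bytes = 1e6     # ~1 Mbyte per recorded event
seconds_per_year = 1e7     # one accelerator 'year'

collisions_per_year = collision_rate_hz * seconds_per_year
recorded_per_year = recorded_rate_hz * seconds_per_year
bytes_per_year = recorded_per_year * event_size_bytes
kept_fraction = recorded_rate_hz / collision_rate_hz

print(f"collisions per year: {collisions_per_year:.1e}")
print(f"recorded per year:   {recorded_per_year:.1e}")
print(f"storage per year:    {bytes_per_year / 1e15:.1f} Pbyte")
print(f"fraction kept:       {kept_fraction:.1e}")
```

Only about two events in every ten million collisions are kept, which is why the selection has to be fast and why rare signals end up as small event counts.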

HEP game plan
Goals include:
Fill in the gaps in the Standard Model (e.g. find the Higgs).
Find something beyond the Standard Model (New Physics).
Example of an extension to SM: Supersymmetry (SUSY).
For every SM particle → SUSY partner (none yet seen!).
Minimal SUSY has 105 free parameters, constrained models ~5 parameters (plus the 25 from SM).
Provides dark matter candidate (neutralino), unification of gauge couplings, solution to hierarchy problem,...
Lightest SUSY particle can be stable (effectively invisible).

Simulated HEP data
Monte Carlo event generators are available for essentially all Standard Model processes, and also for many possible extensions to the SM (supersymmetric models, extra dimensions, etc.).
SM predictions rely on a variety of approximations (perturbation theory to limited order, phenomenological modeling of non-perturbative effects, etc.).
Monte Carlo programs are also used to simulate detector response.
[Figure: simulated event for ATLAS]

A simulated event
[Figure: PYTHIA Monte Carlo, pp → gluino-gluino]

The data stream
Experiment records events of different types, with different numbers of particles, kinematic properties, ...

Selecting events
To search for events of a given type (H0: ‘signal’), need discriminating variable(s) distributed as differently as possible relative to unwanted event types (H1: ‘background’).
Count number of events in acceptance region defined by ‘cuts’.
Expected number of signal events: $s = \sigma_s \varepsilon_s L$
Expected number of background events: $b = \sigma_b \varepsilon_b L$
$\sigma_s$, $\sigma_b$ = cross section for signal, background
‘Efficiencies’: $\varepsilon_s = P(\mathrm{accept} \mid s)$, $\varepsilon_b = P(\mathrm{accept} \mid b)$
L = integrated luminosity (related to beam intensity, data taking time)

Poisson data with background
Count n events, e.g., in fixed time or integrated luminosity.
s = expected number of signal events
b = expected number of background events
n ~ Poisson(s+b): $P(n; s, b) = \frac{(s+b)^n}{n!} e^{-(s+b)}$
Sometimes b is known, other times it is in some way uncertain.
Goals: (i) convince people that s ≠ 0 (discovery); (ii) measure or place limits on s, taking into consideration the uncertainty in b.
Widely discussed in HEP community, see e.g. proceedings of PHYSTAT meetings, Durham, Fermilab, CERN workshops...

Making a discovery
Often compute p-value of the ‘background only’ hypothesis H0 using a test variable related to a characteristic of the signal.
p-value = probability to see data as incompatible with H0, or more so, relative to the data observed.
Requires definition of ‘incompatible with H0’.
HEP folklore: claim discovery if p-value is equivalent to a 5σ fluctuation of a Gaussian variable (one-sided).
Actual p-value at which discovery becomes believable will depend on signal in question (subjective).
Why not do Bayesian analysis? Usually don’t know how to assign meaningful prior probabilities.
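
The 5σ convention can be turned into a number using the one-sided Gaussian tail probability. A minimal sketch with the Python standard library (the function names are illustrative, and the bisection inverse is only meant for p-values below 0.5):

```python
import math

def one_sided_p_value(z):
    """Upper-tail probability of a standard Gaussian beyond z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def significance(p, lo=0.0, hi=40.0):
    """Invert one_sided_p_value by bisection: Z such that the tail equals p."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if one_sided_p_value(mid) > p:
            lo = mid      # tail still too large, move to larger Z
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(f"p-value for 5 sigma: {one_sided_p_value(5.0):.2e}")
print(f"Z for p = 1e-3:      {significance(1e-3):.2f}")
```

The one-sided 5σ threshold corresponds to a p-value of about 2.9 × 10^-7.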

Computing p-values
For n ~ Poisson(s+b) we compute the p-value of H0: s = 0,
$p = P(n \ge n_{\rm obs}; b) = \sum_{n=n_{\rm obs}}^{\infty} \frac{b^n}{n!} e^{-b}$
Often we don’t simply count events but also measure one or more quantities for each event:
number of events observed n is replaced by numbers of events (n1, ..., nN) in a histogram.
Goodness-of-fit variable could be e.g. Pearson’s χ².

Example: search for the Higgs boson at LEP
Several usable signal modes.
Mass of jet pair = mass of Higgs boson; b jets contain tracks not from interaction point.
Important background from e+e- → ZZ: the b-jet pair of a virtual Z can mimic the Higgs.

A candidate Higgs event
17 ‘Higgs like’ candidates seen but no claim of discovery: p-value of s = 0 (background only) hypothesis ≈ 0.09.

Setting limits
Frequentist intervals (limits) for a parameter s can be found by defining a test of the hypothesized value s (do this for all s):
Specify values of the data n that are ‘disfavoured’ by s (critical region) such that P(n in critical region) ≤ γ for a prespecified γ, e.g., 0.05 or 0.1. (Because of discrete data, need inequality here.)
If n is observed in the critical region, reject the value s.
Now invert the test to define a confidence interval as: set of s values that would not be rejected in a test of size γ (confidence level is 1 - γ).
The interval will cover the true value of s with probability ≥ 1 - γ.

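Setting limits: ‘classical method’
E.g. for upper limit on s, take critical region to be low values of n; the limit s_up at confidence level 1 - β is thus found from
$\beta = P(n \le n_{\rm obs}; s_{\rm up}, b) = \sum_{n=0}^{n_{\rm obs}} \frac{(s_{\rm up}+b)^n}{n!} e^{-(s_{\rm up}+b)}$
Similarly for lower limit at confidence level 1 - α,
$\alpha = P(n \ge n_{\rm obs}; s_{\rm lo}, b) = \sum_{n=n_{\rm obs}}^{\infty} \frac{(s_{\rm lo}+b)^n}{n!} e^{-(s_{\rm lo}+b)}$
Sometimes choose α = β = γ/2 → central confidence interval.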

Calculating classical limits
To solve for s_lo, s_up, can exploit relation to χ² distribution:
$s_{\rm lo} = \frac{1}{2} F_{\chi^2}^{-1}(\alpha; 2n) - b$
$s_{\rm up} = \frac{1}{2} F_{\chi^2}^{-1}(1 - \beta; 2(n+1)) - b$
where $F_{\chi^2}^{-1}$ is the quantile of the χ² distribution.
For low fluctuation of n this can give negative result for s_up; i.e. confidence interval is empty.
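
The classical upper limit can also be obtained without χ² tables by solving the defining condition P(n ≤ n_obs; s_up + b) = β numerically. A minimal sketch using bisection and only the standard library (function names are illustrative):

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for N ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total

def classical_upper_limit(n_obs, b, beta=0.05):
    """Solve P(n <= n_obs; s_up + b) = beta for s_up by bisection on mu = s + b.

    The result is negative when n_obs fluctuates far enough below b.
    """
    lo, hi = 0.0, 1.0
    while poisson_cdf(n_obs, hi) > beta:   # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) - b

print(classical_upper_limit(0, 0.0))   # equals -ln(0.05) for n_obs = 0, b = 0
print(classical_upper_limit(0, 5.0))   # negative: low fluctuation with b = 5
```

With n_obs = 0 and b = 5 the limit comes out negative, which is the empty-interval case mentioned on the slide.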

Likelihood ratio limits (Feldman-Cousins)
Define likelihood ratio for hypothesized parameter value s:
$l(s) = \frac{L(n \mid s, b)}{L(n \mid \hat{s}, b)}$
Here $\hat{s}$ is the ML estimator; note $0 \le l(s) \le 1$.
Critical region defined by low values of likelihood ratio.
Resulting intervals can be one- or two-sided (depending on n).
(Re)discovered for HEP by Feldman and Cousins, Phys. Rev. D 57 (1998) 3873.
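
The likelihood-ratio ordering can be sketched for the Poisson counting case as follows. This is a simplified illustration on a grid of s values, not the authors' code: the grid step, the cut-offs `s_max` and `n_max`, and the function names are assumptions, and the ML estimator is taken as max(0, n - b) so that s stays non-negative.

```python
import math

def poisson_pmf(n, mu):
    """P(N = n) for N ~ Poisson(mu), with the mu = 0 case handled explicitly."""
    if mu <= 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))

def fc_acceptance(s, b, cl, n_max):
    """Values of n accepted for hypothesized s, added in order of decreasing ratio."""
    ranked = []
    for n in range(n_max + 1):
        s_hat = max(0.0, n - b)                       # ML estimator with s >= 0
        prob = poisson_pmf(n, s + b)
        ratio = prob / poisson_pmf(n, s_hat + b)
        ranked.append((ratio, n, prob))
    ranked.sort(reverse=True)
    accepted, total = set(), 0.0
    for ratio, n, prob in ranked:
        accepted.add(n)
        total += prob
        if total >= cl:                               # stop once coverage is reached
            break
    return accepted

def fc_interval(n_obs, b, cl=0.90, s_max=15.0, step=0.01, n_max=60):
    """Smallest and largest grid value of s whose acceptance set contains n_obs."""
    n_steps = int(round(s_max / step))
    kept = [i * step for i in range(n_steps + 1)
            if n_obs in fc_acceptance(i * step, b, cl, n_max)]
    return min(kept), max(kept)

print(fc_interval(0, 0.0))   # one-sided: lower edge at s = 0
print(fc_interval(8, 1.0))   # two-sided for a clear excess
```

The same construction returns an upper limit for small n and a two-sided interval for large n, which is the transition described on the slide.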

Coverage probability of confidence intervals
Because of discreteness of Poisson data, probability for interval to include true value is in general > confidence level (‘over-coverage’).
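
Over-coverage can be checked directly for the classical upper limit. The true s lies below the limit exactly when s is not rejected, i.e. when P(N ≤ n; s + b) > β, so the coverage is the total probability of those n. A minimal sketch under that observation (the function name and the cut-off `n_max` are assumptions):

```python
import math

def upper_limit_coverage(s, b, beta=0.05, n_max=200):
    """Probability that the classical upper limit at CL = 1 - beta is >= the true s."""
    mu = s + b
    pmf = math.exp(-mu)      # P(N = 0)
    cdf = pmf                # P(N <= 0)
    covered = 0.0
    for n in range(n_max + 1):
        if cdf > beta:       # s is not rejected for this n, so the limit covers s
            covered += pmf
        pmf *= mu / (n + 1)  # advance to P(N = n + 1)
        cdf += pmf
    return covered

for s in (0.0, 2.0, 5.0, 10.0):
    print(s, round(upper_limit_coverage(s, b=5.0), 4))
```

The coverage is never below 1 - β and jumps discontinuously as s varies, because the set of accepted n changes one integer at a time.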

More on intervals from LR test (Feldman-Cousins)
Caveat with coverage: suppose we find n >> b. Usually one then quotes a measurement, $\hat{s} = n - b$ with estimated standard deviation $\sqrt{n}$.
If, however, n isn’t large enough to claim discovery, one sets a limit on s.
FC pointed out that if this decision is made based on n, then the actual coverage probability of the interval can be less than the stated confidence level (‘flip-flopping’).
FC intervals remove this, providing a smooth transition from 1- to 2-sided intervals, depending on n.
But, suppose FC gives e.g. 0.1 < s < 5 at 90% CL; the p-value of s = 0 is still substantial. Part of upper limit ‘wasted’?

Properties of upper limits
Example: take b = 5.0, 1 - γ = 0.95.
[Figures: upper limit s_up vs. n; mean upper limit vs. s]

Upper limit versus b
[Figure: upper limit versus b, from Feldman & Cousins, PRD 57 (1998) 3873]
If n = 0 observed, should upper limit depend on b?
Classical: yes
Bayesian: no
FC: yes

Nuisance parameters and limits
In general we don’t know the background b perfectly.
Suppose we have a measurement of b, e.g., $b_{\rm meas} \sim N(b, \sigma_b)$.
So the data are really: n events and the value b_meas.
In principle the confidence interval recipe can be generalized to two measurements and two parameters.
Difficult and not usually attempted, but see e.g. talks by K. Cranmer at PHYSTAT03, G. Punzi at PHYSTAT05.
[Figure: G. Punzi, PHYSTAT05]

Bayesian limits with uncertainty on b
Uncertainty on b goes into the prior, e.g., $\pi(s, b) = \pi_s(s)\, \pi_b(b)$.
Put this into Bayes’ theorem, $p(s, b \mid n) \propto P(n \mid s, b)\, \pi(s, b)$.
Marginalize over b, $p(s \mid n) = \int p(s, b \mid n)\, db$, then use p(s|n) to find intervals for s with any desired probability content.
For b = 0, σ_b = 0, π(s) = const. (s > 0), Bayesian upper limit coincides with Classical one.
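
For the simplest case, with b known exactly (σ_b = 0) and a flat prior for s > 0, the posterior tail has a closed form: the probability that s exceeds s_up is P(N ≤ n; s_up + b) / P(N ≤ n; b). A minimal sketch of the resulting upper limit (this covers only the known-b case, not the marginalization over b; function names are illustrative):

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for N ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total

def bayesian_upper_limit(n_obs, b, beta=0.05):
    """Upper limit with posterior probability 1 - beta, flat prior on s > 0, b known."""
    norm = poisson_cdf(n_obs, b)

    def posterior_tail(s):
        # Posterior probability that the signal exceeds s
        return poisson_cdf(n_obs, s + b) / norm

    lo, hi = 0.0, 1.0
    while posterior_tail(hi) > beta:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if posterior_tail(mid) > beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(bayesian_upper_limit(0, 0.0))   # same as the classical limit for b = 0
print(bayesian_upper_limit(0, 5.0))   # unchanged: no dependence on b when n = 0
```

For n = 0 the ratio reduces to exp(-s_up), so the limit does not depend on b, and for b = 0 the condition is identical to the classical one.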

Cousins-Highland method
Regard b as random, characterized by pdf π_b(b).
Makes sense in Bayesian approach, but in frequentist model b is constant (although unknown).
A measurement b_meas is random but this is not the mean number of background events, rather, b is.
Compute anyway
$P(n; s) = \int P(n; s, b)\, \pi_b(b)\, db$
This would be the probability for n if Nature were to generate a new value of b upon repetition of the experiment with π_b(b).
Now e.g. use this P(n;s) in the classical recipe for upper limit at CL = 1 - β:
$\beta = \sum_{n=0}^{n_{\rm obs}} P(n; s_{\rm up})$
Widely used method in HEP.
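
A numerical sketch of this recipe: the Poisson probability is averaged over b with a trapezoidal sum, and the smeared probability is then used in the classical upper-limit condition. The choice of π_b(b) as a Gaussian of mean `b0` and width `sigma_b`, truncated at b = 0 and at ±5σ, is an assumption made for illustration, as are the function names and the grid size.

```python
import math

def poisson_pmf(n, mu):
    """P(N = n) for N ~ Poisson(mu), with the mu = 0 case handled explicitly."""
    if mu <= 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))

def smeared_cdf(n_obs, s, b0, sigma_b, n_grid=400):
    """Sum over n <= n_obs of the b-averaged Poisson probability P(n; s)."""
    lo = max(0.0, b0 - 5.0 * sigma_b)
    hi = b0 + 5.0 * sigma_b
    h = (hi - lo) / n_grid
    num = den = 0.0
    for i in range(n_grid + 1):
        b = lo + i * h
        weight = math.exp(-0.5 * ((b - b0) / sigma_b) ** 2)   # assumed Gaussian pi_b
        if i in (0, n_grid):
            weight *= 0.5                                     # trapezoid end points
        num += weight * sum(poisson_pmf(n, s + b) for n in range(n_obs + 1))
        den += weight                                         # numerical normalization
    return num / den

def cousins_highland_upper_limit(n_obs, b0, sigma_b, beta=0.05):
    """Solve smeared_cdf(n_obs, s_up) = beta for s_up >= 0 (0.0 if s = 0 is rejected)."""
    if smeared_cdf(n_obs, 0.0, b0, sigma_b) <= beta:
        return 0.0
    lo, hi = 0.0, 1.0
    while smeared_cdf(n_obs, hi, b0, sigma_b) > beta:
        hi *= 2.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if smeared_cdf(n_obs, mid, b0, sigma_b) > beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(cousins_highland_upper_limit(3, 3.0, 1e-3))  # close to the classical limit
print(cousins_highland_upper_limit(3, 3.0, 0.5))   # uncertainty on b loosens the limit
```

As the width of π_b(b) shrinks, the result should approach the classical limit for a known background.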

‘Integrated likelihoods’
Consider again signal s and background b, and suppose we have uncertainty in b characterized by a prior pdf π_b(b).
Define integrated likelihood as
$L'(s) = \int L(s, b)\, \pi_b(b)\, db$
also called modified profile likelihood, in any case not a real likelihood.
Now use this to construct likelihood-ratio test and invert to obtain confidence intervals.
Feldman-Cousins & Cousins-Highland (FHC²), see e.g. J. Conrad et al., Phys. Rev. D67 (2003) 012002 and Conrad/Tegenfeldt PHYSTAT05 talk.
Calculators available (Conrad, Tegenfeldt, Barlow).

Digression: tangent plane method
Consider least-squares fit with parameter of interest θ0 and nuisance parameter θ1, i.e., minimize $\chi^2(\theta_0, \theta_1)$.
Standard deviations from tangent lines to contour $\chi^2(\theta_0, \theta_1) = \chi^2_{\min} + 1$.
Correlation between $\hat{\theta}_0$ and $\hat{\theta}_1$ causes errors to increase.

The profile likelihood
The ‘tangent plane’ method is a special case of using the profile likelihood:
$L'(\theta_0) = L(\theta_0, \hat{\hat{\theta}}_1)$
$\hat{\hat{\theta}}_1$ is found by maximizing L(θ0, θ1) for each θ0.
Equivalently use $\chi'^2(\theta_0) = \chi^2(\theta_0, \hat{\hat{\theta}}_1)$.
The interval obtained from $\chi'^2(\theta_0) = \chi'^2_{\min} + 1$ is the same as what is obtained from the tangents to $\chi^2(\theta_0, \theta_1) = \chi^2_{\min} + 1$.
Well known in HEP as the ‘MINOS’ method in MINUIT.
See e.g. talks by Reid, Cranmer, Rolke at PHYSTAT05.

Interval from inverting profile LR test
Suppose we have a measurement b_meas of b.
Build the likelihood ratio test with profile likelihood:
$\lambda(s) = \frac{L(s, \hat{\hat{b}})}{L(\hat{s}, \hat{b})}$
and use this to construct confidence intervals.
Not widely used in HEP but recommended in e.g. Kendall & Stuart; see also PHYSTAT05 talks by Cranmer, Feldman, Cousins, Reid.

Wrapping up,
Frequentist methods have been most widely used but for many questions (particularly related to systematics), Bayesian methods are getting more notice.
Frequentist properties such as coverage probability of confidence intervals are seen as very important (overly so?).
Bayesian methods remain problematic in cases where it is difficult to enumerate alternative hypotheses and assign meaningful prior probabilities.
Tools widely applied at LEP; some work needed to extend these to LHC analyses (ongoing).

Finally,
The LEP programme was dominated by limit setting: Standard Model confirmed, No New Physics.
The Tevatron discovered the top quark and Bs mixing (both parts of the SM) and also set many limits (but NNP).
By ~2012 either we’ll have discovered something new and interesting beyond the Standard Model, or we’ll still be setting limits and HEP should think seriously about a new approach!

Extra slides

A recent discovery: Bs oscillations
Recently the D0 experiment (Fermilab) announced the discovery of Bs mixing: Moriond talk by Brendan Casey, also hep-ex/0603029.
Produce a Bq meson at time t = 0; there is a time-dependent probability for it to decay as an anti-Bq (q = d or s).
|Vts| ≫ |Vtd| and so Bs oscillates quickly compared to decay rate.
Sought but not seen at LEP; early on predicted to be visible at Tevatron.
Discovery quickly confirmed by the CDF experiment.

Confidence interval from likelihood function
In the large sample limit it can be shown for ML estimators:
$\hat{\theta} \sim N(\theta, V)$ (n-dimensional Gaussian, covariance V)
$Q(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^T V^{-1} (\hat{\theta} - \theta) = Q_\gamma$ defines a hyper-ellipsoidal confidence region.
If $\hat{\theta} \sim N(\theta, V)$ then $Q(\hat{\theta}, \theta) \sim \chi^2_n$.

Approximate confidence regions from L(θ)
So the recipe to find the confidence region with CL = 1 - γ is:
$\ln L(\theta) = \ln L_{\max} - \frac{Q_\gamma}{2}$, where $Q_\gamma = F_{\chi^2}^{-1}(1 - \gamma; n)$.
For finite samples, these are approximate confidence regions.
Coverage probability not guaranteed to be equal to 1 - γ; no simple theorem to say by how far off it will be (use MC).
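
As a one-parameter illustration of this recipe, consider a single Poisson count n with mean ν, for which ln L(ν) = n ln ν - ν up to a constant and the ML estimate is ν = n. The sketch below finds where ln L drops by Q/2 from its maximum; Q = 1 corresponds to 68.3% CL for one parameter. The example model and function names are assumptions chosen for illustration.

```python
import math

def log_likelihood(nu, n_obs):
    """Poisson log-likelihood for mean nu, dropping the constant -ln(n_obs!)."""
    if n_obs == 0:
        return -nu
    return n_obs * math.log(nu) - nu

def likelihood_interval(n_obs, q=1.0):
    """Values of nu where ln L = ln L_max - q/2 (approximate confidence interval)."""
    nu_hat = float(n_obs)
    target = log_likelihood(nu_hat, n_obs) - 0.5 * q if n_obs > 0 else -0.5 * q

    def solve(lo, hi, rising):
        # Bisection on a monotonic branch of ln L
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            below = log_likelihood(mid, n_obs) < target
            if below == rising:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    if n_obs == 0:
        lower = 0.0                       # ln L is maximal at the boundary nu = 0
    else:
        lower = solve(1e-12, nu_hat, rising=True)
    upper_bracket = nu_hat + 1.0
    while log_likelihood(upper_bracket, n_obs) > target:
        upper_bracket *= 2.0
    upper = solve(nu_hat, upper_bracket, rising=False)
    return lower, upper

print(likelihood_interval(1))    # asymmetric interval around nu_hat = 1
print(likelihood_interval(100))  # nearly symmetric, width close to 2 * sqrt(100)
```

For small n the interval is visibly asymmetric, one sign that the large-sample Gaussian approximation is only rough there.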

Upper limit from test of hypothesized Δm_s
Base test on likelihood ratio l (here θ = Δm_s).
Observed value is l_obs, sampling distribution is g(l; θ) (from MC).
θ is excluded at a given confidence level CL if its p-value, computed from g(l; θ) and l_obs, is below 1 - CL.
D0 shows the distribution of ln l for Δm_s = 25 ps^-1: equivalent to 2.1σ effect.
[Figure: 95% CL upper limit]

The significance of an observed signal
Suppose b = 0.5, and we observe n_obs = 5. The p-value of the s = 0 hypothesis is
$p = P(n \ge 5; b = 0.5) = 1 - \sum_{n=0}^{4} \frac{b^n}{n!} e^{-b} \approx 1.7 \times 10^{-4}$
Often, however, b has some uncertainty; this can have significant impact on the p-value, e.g. if b = 0.8, p-value = 1.4 × 10^-3.
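
The sensitivity of the p-value to b is easy to reproduce. A minimal sketch of the Poisson tail probability P(n ≥ n_obs; b), using only the standard library (the function name is illustrative):

```python
import math

def poisson_p_value(n_obs, b):
    """P(n >= n_obs) for n ~ Poisson(b): p-value of the background-only hypothesis."""
    term = math.exp(-b)       # P(n = 0)
    below = 0.0
    for k in range(n_obs):
        below += term         # accumulate P(n = k) for k < n_obs
        term *= b / (k + 1)
    return 1.0 - below

for b in (0.5, 0.8):
    print(f"b = {b}: p-value = {poisson_p_value(5, b):.2e}")
```

Raising b from 0.5 to 0.8 increases the p-value by nearly an order of magnitude, so the uncertainty on b cannot be ignored when quoting a significance.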

The significance of a peak
Suppose we measure a value x for each event and find:
[Figure: histogram of x with a peak; expected bin means shown as dashed lines]
Each bin (observed) is a Poisson r.v., means are given by dashed lines.
In the two bins with the peak, 11 entries found with b = 3.2.
We are tempted to compute the p-value for the s = 0 hypothesis as:
$P(n \ge 11; b = 3.2, s = 0) = 5.0 \times 10^{-4}$

The significance of a peak (2)
But... did we know where to look for the peak? → give P(n ≥ 11) in any 2 adjacent bins
Is the observed width consistent with the expected x resolution? → take x window several times the expected resolution
How many bins × distributions have we looked at? → look at a thousand of them, you’ll find a 10^-3 effect
Did we adjust the cuts to ‘enhance’ the peak? → freeze cuts, repeat analysis with new data
How about the bins to the sides of the peak... (too low!)
Should we publish????
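
The remark about looking at a thousand distributions can be made quantitative under the simplifying assumption that the looks are statistically independent (real bins and distributions are usually correlated, so this is only a rough guide). A minimal sketch with an illustrative function name:

```python
def global_p_value(p_local, n_trials):
    """Probability that at least one of n_trials independent looks gives
    a p-value <= p_local when only background is present."""
    return 1.0 - (1.0 - p_local) ** n_trials

print(f"{global_p_value(1e-3, 1000):.3f}")   # a 10^-3 effect somewhere is more likely than not
print(f"{global_p_value(5.0e-4, 20):.4f}")   # e.g. 20 possible positions for the peak
```

With a thousand independent looks, the chance of seeing at least one 10^-3 fluctuation is about 63%.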

Statistical vs. systematic errors
Statistical errors: How much would the result fluctuate upon repetition of the measurement? Implies some set of assumptions to define probability of outcome of the measurement.
Systematic errors: What is the uncertainty in my result due to uncertainty in my assumptions, e.g., model (theoretical) uncertainty; modeling of measurement apparatus.
The sources of error do not vary upon repetition of the measurement. Often result from uncertain value of, e.g., calibration constants, efficiencies, etc.

Systematic errors and nuisance parameters
Response of measurement apparatus is never modeled perfectly:
[Figure: y (measured value) vs. x (true value), comparing model and truth]
Model can be made to approximate better the truth by including more free parameters.
systematic uncertainty ↔ nuisance parameters
