
Statistics Forum Follow-up info for Physics Coordination 27 June, 2011




Presentation Transcript


  1. Statistics Forum Follow-up info for Physics Coordination 27 June, 2011 Glen Cowan, Eilam Gross and the Third Man Kyle Cranmer Follow-up from the Statistics Forum / CERN, 24 June 2011

  2. Main questions What do we see as the main way forward with CMS? What do we recommend in the short term (summer 2011)? What do we recommend after summer 2011?

  3. Interactions with the CMS Statistics Group Interaction between the ATLAS and CMS statistics groups began several years ago in the context of the Higgs combination; this effort continues successfully in the separate HCG (with CLs as the method, using the ATLAS one-sided test statistic and the ATLAS treatment of nuisance parameters). In addition, meetings between the ATLAS and CMS Statistics Groups have increased this year with the goal of agreeing on statistical tools and practice to facilitate comparison and eventual combination of results. ATLAS: G. Cowan, E. Gross, K. Cranmer, O. Vitells, W. Murray. CMS: R. Cousins, L. Lyons, L. Demortier, T. Dorigo. Report from the Statistics Forum / CERN, 23 June 2011

  4. The way forward with CMS We met again with CMS on the evening of 23 June 2011 (ATLAS: Cowan, Gross, Murray, Read, Cranmer; CMS: Cousins, Lyons, Dorigo, Demortier). Cousins more or less ruled out supporting either CLs or PCL as a long-term recommendation for CMS. We tried to clarify whether this was his view alone or that of CMS. He believes that his own view, namely to use Feldman-Cousins unified (two-sided) intervals, would be followed in CMS. We replied that the prevailing view in ATLAS has been to quote a one-sided upper limit, and it is difficult to envisage adopting F-C in its place. So at present there is no single frequentist method that would have long-term support from both ATLAS and CMS.

  5. ATLAS/CMS discussions on one-sided limits Some prefer to report one-sided frequentist upper limits (CLs, PCL); others prefer unified (Feldman-Cousins) limits, where the lower edge may or may not exclude zero. The prevailing view in the ATLAS Statistics Forum has been that in searches for new phenomena, one wants to know whether a cross section is excluded on the basis that its predicted rate is too high relative to the observation, not excluded on some other grounds (e.g., a mixture of too high or too low). Among statisticians there is support for both approaches.
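The one-sided vs. unified contrast can be made concrete in the standard Gaussian toy problem, x ~ Gauss(μ, σ = 1) with physical boundary μ ≥ 0. The sketch below is purely illustrative (the function names are ours, not from any ATLAS or CMS tool): it constructs Feldman-Cousins unified intervals numerically by likelihood-ratio ordering. For an observation well above background the resulting interval excludes μ = 0, while a small observation still gives an interval reaching down to zero.

```python
import math

def phi_density(x, mu):
    """Gaussian density with unit sigma."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def fc_interval(x_obs, cl=0.95, mu_max=8.0, n_mu=200, x_lo=-8.0, x_hi=12.0, n_x=1000):
    """Feldman-Cousins unified interval for x ~ Gauss(mu, 1) with mu >= 0,
    built numerically by likelihood-ratio ordering on a grid."""
    dx = (x_hi - x_lo) / n_x
    xs = [x_lo + (i + 0.5) * dx for i in range(n_x)]
    x_near = min(xs, key=lambda x: abs(x - x_obs))   # grid point closest to the observation
    accepted = []
    for j in range(n_mu + 1):
        mu = mu_max * j / n_mu
        # ordering ratio R = L(x|mu)/L(x|mu_hat), with mu_hat = max(0, x)
        ranked = sorted(
            ((-0.5 * (x - mu) ** 2 + 0.5 * (x - max(0.0, x)) ** 2, x) for x in xs),
            reverse=True)
        total, region = 0.0, set()
        for _, x in ranked:                          # fill acceptance region up to the CL
            if total >= cl:
                break
            region.add(x)
            total += phi_density(x, mu) * dx
        if x_near in region:                         # mu is in the interval for this x_obs
            accepted.append(mu)
    return min(accepted), max(accepted)
```

For example, fc_interval(3.0) gives an interval whose lower edge lies above zero, the situation that may "require an apologetic explanation" to the reader, whereas fc_interval(0.3) returns an interval that includes μ = 0.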

  6. ATLAS/CMS discussions on one-sided limits Using F-C is almost sure to produce intervals which exclude μ = 0 at the 95% CL; these might require an apologetic explanation to the reader. Using a two-sided construction to produce limits would enable exclusion of the Higgs boson when the data fluctuate upwards with respect to the expected signal. We prefer to stick to the traditional one-sided approach, where we state clearly that we are interested in a limit and produce a confidence interval which always includes zero.

  7. Discussions concerning flip-flopping One-sided limits (CLs, PCL) can suffer from “flip-flopping”, i.e., violation of coverage probability if one decides, based on the data, whether to report an upper limit or a measurement with error bars (two-sided interval). This can be avoided by “always” reporting: (1) an upper limit based on a one-sided test; (2) the discovery significance (equivalent to the p-value of the background-only hypothesis with the q0 test statistic). In practice, “always” can mean “for every analysis carried out as a search”, i.e., until the existence of the process is well established (e.g., 5σ). That is, we only require what is done in practice to map approximately onto the idealized infinite ensemble.
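The "always report both" prescription is trivial to state in the Gaussian toy model (x ~ Gauss(μ, σ), μ ≥ 0). The helper below is a hypothetical illustration, not official code: it returns the one-sided upper limit together with the background-only p-value and its equivalent Gaussian significance.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi_inv(p):
    """Inverse standard normal CDF by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def search_report(x_obs, sigma=1.0, cl=0.95):
    """Report both numbers for every search: a one-sided upper limit on mu
    and the discovery significance (p-value of the background-only model)."""
    z = phi_inv(cl)                          # ~1.645 for CL = 95%
    upper_limit = x_obs + z * sigma          # one-sided frequentist upper limit
    p0 = 1.0 - phi(x_obs / sigma)            # background-only p-value
    significance = max(0.0, x_obs / sigma)   # q0 convention: 0 for downward fluctuations
    return upper_limit, p0, significance
```

Because both numbers are reported unconditionally, nothing is decided based on the data, and the flip-flopping coverage violation does not arise.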

  8. Discussions on CLs and F-C CLs has been criticized as a method for preventing spurious exclusion because it leads to significant overcoverage that is in practice not communicated to the reader. This was the motivation behind PCL. We have also not supported using the upper edge of a Feldman-Cousins interval as a substitute for a one-sided upper limit, since when used in this way F-C has lower power. Furthermore, F-C unified intervals protect against small (or null) intervals by counting the probability of upward data fluctuations, which are not relevant if the goal is to establish an upper limit.
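In the Gaussian toy model the overcoverage of CLs relative to the plain one-sided (CLs+b) limit can be checked directly. The sketch below is illustrative only (function names are ours); it solves p_μ = α and CLs(μ) = p_μ / (1 − p_b) = α by bisection.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def clsb_limit(x_obs, sigma=1.0, alpha=0.05):
    """Plain one-sided (CLs+b) upper limit: solve p_mu = alpha."""
    lo, hi = x_obs, x_obs + 10.0 * sigma
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi((x_obs - mid) / sigma) > alpha:   # p_mu = P(x <= x_obs | mu)
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cls_limit(x_obs, sigma=1.0, alpha=0.05):
    """CLs upper limit: solve CLs(mu) = p_mu / (1 - p_b) = alpha."""
    one_minus_pb = phi(x_obs / sigma)            # P(x <= x_obs | background only)
    lo, hi = x_obs, x_obs + 10.0 * sigma
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi((x_obs - mid) / sigma) / one_minus_pb > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Note that for x_obs = 0, the median background-only outcome, the CLs limit comes out at 1.96σ, i.e. the classical one-sided limit at 97.5% CL, while the CLs+b limit is 1.645σ: this is the intrinsic overcoverage mentioned on slide 16.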

  9. The way forward with CMS (2) In the short term, there is support for CLs in both collaborations as an interim solution to allow for comparison of limits. Bayesian methods emerged as a solution with support from both sides. This had always been viewed as a useful complement to the frequentist limit. Furthermore, one can study and report the frequentist properties of Bayesian intervals (i.e., the fraction of times they would cover the true parameter value), and in many examples this turns out to be very good. Both sides agreed to consider Bayesian methods, with priors chosen to have good frequentist properties, as a common method.

  10. The way forward with CMS (3) At a more detailed level it will take some more time to agree on and implement the procedures, so in the short term this is not a realistic solution for analyses where Bayesian methods have not already been developed. We have already started to discuss the Bayesian implementation in ATLAS. A twiki has been created:

  11. https://twiki.cern.ch/twiki/bin/view/AtlasProtected/BayesianLimitRecommendationImplementation

  12. Recommendation on minimum power for PCL from 16% to 50% For summer 2011 (and beyond), we recommend quoting PCL limits with a minimum power of 50%. The reasons for moving the minimum power to 50% are both theoretical and practical: 50% avoids the possibility of having a conservative treatment of systematics lead to a stronger limit. Some computational issues related to low-count analyses are less problematic with 50%. There is a slight reduction in the burden on the analyst, since the 50% quantile (median) needed for the power constraint is easier to find than the 16% quantile (the -1 sigma error band).

  13. Recommendation on minimum power for PCL from 16% to 50% (2) 50% minimum power gives a slight reduction in the “psychological burden” on conference speakers, in that one would see a sizable difference between PCL and CLs less often, and then only in cases where a strong downward fluctuation leads to a stronger CLs limit (see graph on next page, and recall that under the background-only model the estimator μ̂ lies between -1σ and +1σ 68% of the time). Owing to the short notice before EPS, it may be desirable to leave the minimum power at 16% for the short term. This should depend on whether groups feel they need more time to shift from 16% to 50%. In practice this step should not take any more time, and in some cases will save time.
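The effect of the power constraint is easy to see in the Gaussian toy model: the constraint acts as a floor on the quoted limit, placed at the limit one would obtain at the chosen quantile of the background-only distribution of the measurement. The helper below is a hypothetical sketch under that toy model, not official code; with 50% minimum power the floor is the median expected limit, 1.645σ, while with 16% it sits near 0.65σ.

```python
import math

def phi_inv(p):
    """Inverse standard normal CDF by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def pcl_limit(x_obs, sigma=1.0, cl=0.95, min_power=0.50):
    """Power-constrained limit for x ~ Gauss(mu, sigma): the unconstrained
    one-sided upper limit, but never below the limit obtained at the
    min_power quantile of the background-only distribution of x."""
    z = phi_inv(cl)                              # ~1.645 for CL = 95%
    unconstrained = x_obs + z * sigma
    floor = (phi_inv(min_power) + z) * sigma     # 1.645*sigma for 50%, ~0.65*sigma for 16%
    return max(unconstrained, floor)
```

With min_power = 0.50 the floor is simply the median expected limit, which is why the analyst no longer needs the 16% quantile of the expected-limit band.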

  14. Upper limits for Gaussian problem [figure: unknown true value vs. measurement]

  15. [figure]

  16. Changing the constraining power to 50% looks natural and puts PCL psychologically on the same footing as CLs. The expected PCL is better because it covers at exactly 95%, while CLs has an intrinsic overcoverage (its expectation is at the 97.5% CL).

  17. Recommendations PCL solves the problem of “spurious exclusion” by separating the parameter space into regions in which one has/hasn't sufficient sensitivity, as given by the probability to reject μ if the background-only model is true. Recommendations for ATLAS: Report the unconstrained limit. Report the power-constrained limit (with power M0(μ) ≥ 0.5). Report the p-value of the background-only hypothesis. Also report CLs. In problems with low background, there has been a recent improvement to the software implementation related to the treatment of nuisance parameters. ATLAS also has an ongoing effort to establish recommendations for Bayesian limits (Georgios Choudalakis, Diego Casadei).

  18. https://twiki.cern.ch/twiki/bin/view/AtlasProtected/StatisticsTools

  19. New frequentist limit document https://twiki.cern.ch/twiki/pub/AtlasProtected/StatisticsTools/Frequentist_Limit_Recommendation.pdf

  20. https://twiki.cern.ch/twiki/bin/view/AtlasProtected/FrequentistLimitRecommendationImplementation

  21. Low Counts [figure]

  22. Intermediate [figure]

  23. Asymptotic [figure]

  24. Conclusions We recommend using PCL with a minimum power of 50% as the primary result. For the short term, we also support reporting CLs to allow for comparison with CMS. In the longer term, the Bayesian approach appears to have common support in both ATLAS and CMS. This will take some time to implement for many analyses; for others it is already available. Search analyses should also report the discovery significance (p-value of the background-only hypothesis). Documentation and code exist, with a twiki walkthrough: https://twiki.cern.ch/twiki/bin/view/AtlasProtected/FrequentistLimitRecommendationImplementation

  25. Extra material (repeated from 23 June talk)

  26. Discussions concerning PCL PCL has been criticized because it does not obviously map onto a Bayesian result for some choice of prior (CLs = Bayesian for special cases, e.g., x ~ Gauss(μ, σ) with a constant prior for μ ≥ 0). We are not convinced of the need for this. The frequentist properties of PCL are well defined, and as with all frequentist limits one should not interpret them as representing Bayesian credible intervals. Further criticism of PCL relates to the unconstrained limit, which could exclude all values of μ; a remnant of this problem could survive after application of the power constraint (cf. “negatively biased relevant subsets”). PCL does not have negatively biased relevant subsets (nor does our unconstrained limit, as it never excludes μ = 0). On both points, the debate is still ongoing.
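The special case quoted above can be verified numerically: for x ~ Gauss(μ, σ = 1) with a constant prior on μ ≥ 0, the CLs upper limit coincides with the Bayesian credible upper limit from the truncated-Gaussian posterior. The sketch below is for illustration only (function names are ours).

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cls_limit(x_obs, alpha=0.05):
    """CLs upper limit for x ~ Gauss(mu, 1), solved by bisection."""
    lo, hi = 0.0, x_obs + 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(x_obs - mid) / phi(x_obs) > alpha:   # CLs = p_mu / (1 - p_b)
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bayes_limit(x_obs, alpha=0.05):
    """Bayesian credible upper limit with a constant prior on mu >= 0:
    the posterior is a Gaussian centred on x_obs, truncated at mu = 0."""
    norm = phi(x_obs)                    # posterior mass surviving the mu >= 0 cut
    lo, hi = 0.0, x_obs + 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        credibility = (phi(mid - x_obs) - phi(-x_obs)) / norm
        if credibility < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The two bisections solve the same equation, Φ(μ_up − x) = 1 − α·Φ(x), so the limits agree to numerical precision for any observed x.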
