
PHYSTAT05 Highlights: Statistical Problems in Particle Physics, Astrophysics and Cosmology



Presentation Transcript



PHYSTAT05 Highlights: Statistical Problems in Particle Physics, Astrophysics and Cosmology

Phystat05 Highlights

University College London

03/11/2006

Müge Karagöz Ünel

Oxford University

MKU



Outline

  • Conference Information and History

  • Introduction to statistics

  • Selection of hot topics

  • Available tools

  • Astrophysics and cosmology

  • Conclusions


PHYSTAT History


Poster


Chronology of PHYSTAT05


PHYSTAT05 Programme

7 Invited talks by Statisticians

9 Invited talks by Physicists

38 Contributed talks

8 Posters

Panel Discussion

3 Conference Summaries

90 participants


Invited Talks by Statisticians

David Cox: Keynote Address (Bayesian, Frequentists & Physicists)

Steffen Lauritzen: Goodness of Fit

Jerry Friedman: Machine Learning

Susan Holmes: Visualisation

Peter Clifford: Time Series

Mike Titterington: Deconvolution

Nancy Reid: Conference Summary (Statistics)


Invited Talks by (Astro+)Physicists

Bob Cousins: Nuisance Parameters for Limits

Kyle Cranmer: LHC discovery

Alex Szalay: Astrophysics + Terabytes

Jean-Luc Starck: Multiscale geometry

Jim Linnemann: Statistical Software for Particle Physics

Bob Nichol: Statistical Software for Astrophysics

Stephen Johnson: Historical Transits of Venus

Andrew Jaffe: Conference Summary (Astrophysics)

Gary Feldman: Conference Summary (Particles)


Contents of the Proceedings

Bayes/Frequentist: 5 talks

Goodness of Fit: 5

Likelihood/parameter estimation: 6

Nuisance parameters/limits/discovery: 10

Machine learning: 7

Software: 8

Visualisation: 1

Astrophysics: 5

Time series: 1

Deconvolution: 3


Statistics in (A/P)Physics


Statistics in (Particle) Physics

An experiment goes through the following stages:

  • Prepare conditions for taking data for a particle X (if theory driven)

  • Record events that might be X and reconstruct the measurables

  • Select events that could contain X by applying criteria (cuts)

  • Generate histograms of variables and ask the questions:

    Is there any evidence for new things, or is the null hypothesis unrefuted? If there is evidence, what are the estimates for the parameters of X? (Confrontation of theory with experiment, or vice versa.)

  • The answers can come via your favorite statistical technique (depending on how you ask the question)


(yet another) Chronology (from S. Andreon’s web page)

  • Homo apriorius establishes the probability of a hypothesis, no matter what the data tell.

  • Homo pragmaticus establishes that it is interested in the data only.

  • Homo frequentistus measures probability of the data given the hypothesis.

  • Homo sapiens measures probability of the data and of the hypothesis.

  • Homo bayesianis measures probability of the hypothesis, given the data.


Bayesian vs Frequentist

We need to make a statement about Parameters, given Data

Bayes 1763; Frequentism 1937

Both analyse data (x) → statement about parameters (µ)

Both use Prob (x; µ), e.g. Prob (µ_l ≤ µ ≤ µ_u) = 90%

but very different interpretation

Bayesian: Probability (parameter, given data)

Frequentist: Probability (data, given parameter)

“Bayesians address the question everyone is interested in, by using assumptions no-one believes”

“Frequentists use impeccable logic to deal with an issue of no interest to anyone”



Goodness of Fit

Lauritzen: Invited talk - GoF

Yabsley: GoF and sparse multi-D data

Ianni: GoF and sparse multi-D data

Raja: GoF and L

Gagunashvili: χ² and weighting

Pia: Software Toolkit for Data Analysis

Block: Rejecting outliers

Bruckman: Alignment

Blobel: Tracking


Goodness of Fit

  • We would like to know whether a given distribution is of a specified type, to test the validity of a postulated model, ...

  • A few GoF tests are widely used in practice:

    • χ² test: the most widely used; typically applied to 1D or 2D fits to data

    • G² (the likelihood ratio statistic) test: the general version of the χ² test (Lauritzen’s personal choice)

    • Kolmogorov-Smirnov test: robust, but prone to mislead; it can be used to check, say, whether two distributions (histograms) are the same, by calculating a p-value from the largest difference between their cumulative distributions.

    • Other new methods, like Aslan & Zech’s energy test, exist…
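As an illustration of the Kolmogorov-Smirnov item above, here is a minimal sketch of the two-sample test using only the Python standard library. The function name `ks_two_sample` and the example samples are assumptions made for this illustration, and the p-value uses the usual asymptotic Kolmogorov series, so it should only be trusted for reasonably large samples.

```python
import math

def ks_two_sample(a, b):
    """Return (D, p) for the two-sample Kolmogorov-Smirnov test.

    D is the largest distance between the two empirical CDFs; p is the
    asymptotic p-value under the hypothesis that both samples come from
    the same distribution.
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # step both empirical CDFs past every entry equal to x
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # effective sample size with the common finite-sample correction
    ne = math.sqrt(n * m / (n + m))
    lam = (ne + 0.12 + 0.11 / ne) * d
    if lam < 0.3:
        return d, 1.0  # the series below converges poorly here; p is ~1
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Hypothetical samples: two identical sets and two well separated sets
same = ks_two_sample(range(50), range(50))
apart = ks_two_sample(range(50), range(100, 150))
```

A small p-value argues against the two samples sharing a parent distribution; a large one does not prove that they do, which is one way the test can mislead.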

An example from ATLAS (Bruckman)

Direct Least-Squares solution to the Silicon Tracker alignment problem

[Figure labels: track, hit, residual, intrinsic measurement error + MCS]

The method consists of minimizing the giant χ² resulting from a simultaneous fit of all particle trajectories and alignment parameters:

[equation not preserved in the transcript]

Let us consequently use the linear expansion (we assume all second-order derivatives are negligible). The track fit is solved by:

[equation not preserved in the transcript]

while the alignment parameters are given by:

[equation not preserved in the transcript; one relation is marked “Key relation!” on the slide]

Systems are large: inherent computational challenges

Equivalent to Millepede approach from V. Blobel
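The slide's own formulas are not preserved in this transcript. As a hedged sketch only, a generic linearised least-squares formulation of such a simultaneous fit can be written as below. All symbols are assumptions of this sketch rather than the slide's notation: r is the vector of hit residuals of one track, V its covariance (intrinsic measurement error plus multiple Coulomb scattering), π the track parameters and a the alignment parameters.

```latex
\begin{align}
\chi^2 &= \sum_{\text{tracks}} r^{T} V^{-1} r ,
  \qquad r = r(\pi, a) \\
% track fit at fixed alignment, with E the residual derivative matrix
\delta\pi &= -\left(E^{T} V^{-1} E\right)^{-1} E^{T} V^{-1}\, r(\pi_0, a) ,
  \qquad E \equiv \frac{\partial r}{\partial \pi} \\
% total derivative of the residuals: the fitted track moves with the alignment
\frac{dr}{da} &= \frac{\partial r}{\partial a} + E\,\frac{d\pi}{da} ,
  \qquad \frac{d\pi}{da} = -\left(E^{T} V^{-1} E\right)^{-1} E^{T} V^{-1}\,
  \frac{\partial r}{\partial a} \\
% alignment corrections from all tracks together
\delta a &= -\left(\sum_{\text{tracks}} \left(\frac{dr}{da}\right)^{T} V^{-1}
  \frac{dr}{da}\right)^{-1}
  \sum_{\text{tracks}} \left(\frac{dr}{da}\right)^{T} V^{-1}\, r(\pi_0, a_0)
\end{align}
```

In this form the matrix to invert has one row and column per alignment parameter, which is where the computational challenge for a large tracker comes from.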



Nuisance Parameters/Limits/Discovery

Cousins: Limits and Nuisance Params

Reid: Respondent

Punzi: Frequentist multi-dimensional ordering rule

Tegenfeldt: Feldman-Cousins + Cousins-Highland

Rolke: Limits

Heinrich: Bayes + limits

Bityukov: Poisson situations

Hill: Limits v Discovery (see Punzi @ PHYSTAT2003)

Cranmer: LHC discovery and nuisance parameters


Systematics

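Note: Systematic errors (HEP) <-> nuisance params (statistician)

An example: [equation not preserved in the transcript; its annotations follow]

  • Physics parameter: the quantity to be extracted

  • Observed: the measured quantity, the source of the statistical errors

  • Other inputs: we need to know these, probably from other measurements (and/or theory)

  • Uncertainties in these inputs → error in the physics parameter

  • Some are arguably statistical errors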



Nuisance Parameters

  • Nuisance parameters are parameters with unknown true values. They may be:

    • statistical, such as number of background events in a sideband used for estimating the background under a peak.

    • systematic, such as the shape of the background under the peak, or the error caused by the uncertainty of the hadronic fragmentation model in the Monte Carlo.

    • Most experiments have a large number of systematic uncertainties.

    • If the experimenter is blind to these uncertainties, they become a bigger nuisance!


Issues with LHC

  • The LHC will collide bunches 40 million times per second and collect petabytes of data. pp collisions at 14 TeV will generate events much more complicated than those at LEP or the Tevatron.

  • Kyle Cranmer has pointed out that systematic issues will be even more important at the LHC.

    • If the statistical error is O(1) and the systematic error is O(0.1), it does not matter much how you treat the latter.

    • However, at the LHC we may have processes with 100 background events and 10% systematic errors; this is not negligible.

    • Even more critically, we want 5σ for a discovery.
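The arithmetic behind the 100-event example can be sketched as follows. This is a rough Gaussian illustration that adds statistical and systematic errors in quadrature, which is an assumption of this sketch and not a method recommended in the talk; the function name `significance` and the signal yield of 50 events are hypothetical.

```python
import math

def significance(s, b, rel_syst):
    """Approximate significance s / sqrt(b + (rel_syst * b)^2).

    s: expected signal events, b: expected background events,
    rel_syst: relative systematic uncertainty on b (e.g. 0.10 for 10%).
    """
    stat = math.sqrt(b)      # Poisson fluctuation of the background
    syst = rel_syst * b      # systematic uncertainty on the background
    return s / math.sqrt(stat ** 2 + syst ** 2)

b = 100.0
stat_err = math.sqrt(b)                     # 10 events
syst_err = 0.10 * b                         # also 10 events
total_err = math.hypot(stat_err, syst_err)  # about 14 events

z_no_syst = significance(50.0, b, 0.0)      # hypothetical signal of 50 events
z_with_syst = significance(50.0, b, 0.10)
```

With 100 background events, the 10% systematic error is as large as the statistical one, so a naive 5σ excess shrinks to roughly 3.5σ once it is included.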


Why 5σ? (Feldman+Cranmer)

  • LHC searches: 500 searches, each of which has 100 resolution elements (mass, angle bins, ...) = 5 × 10^4 chances to find something.

  • One experiment: false positive rate at 5σ is (5 × 10^4) (3 × 10^-7) = 0.015. OK.

  • Two experiments:

    • Assume allowable false positive rate: 10.

    • 2 (5 × 10^4) (1 × 10^-4) = 10 → 3.7σ required.

    • Require verification by the other experiment, assume rate 0.01: (1 × 10^-3) (10) = 0.01 → 3.1σ required.

  • Caveats: Is the significance real? Are there common systematic errors?
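The numbers on this slide can be checked with a short sketch that converts between a number of standard deviations and a one-sided Gaussian tail probability. The helper names `tail_prob` and `z_from_tail` are assumptions of this sketch, and the one-sided convention is assumed because it reproduces the quoted 3 × 10^-7 at 5σ.

```python
import math

def tail_prob(z):
    """One-sided Gaussian tail probability beyond z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def z_from_tail(p, lo=0.0, hi=10.0):
    """Invert tail_prob by bisection (tail_prob decreases as z grows)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if tail_prob(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

chances = 500 * 100                   # searches x resolution elements
one_exp = chances * tail_prob(5.0)    # expected false positives at 5 sigma

p_two = 10 / (2 * chances)            # per-chance rate for 10 false positives
z_two = z_from_tail(p_two)            # in two experiments

p_verify = 0.01 / 10                  # rate needed from the verifying experiment
z_verify = z_from_tail(p_verify)
```

The exact 5σ tail probability is slightly below the rounded 3 × 10^-7 used on the slide, so `one_exp` comes out a little under the quoted 0.015.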


Confidence Intervals

  • Various techniques were discussed during the conference. Most concerns were summarized by Feldman.

    • Bayesian: a good method, but Heinrich showed that flat priors in multi-D may lead to undesirable results (undercoverage).

    • Frequentist-Bayesian hybrids: Bayesian for priors and frequentist to extract the range. Cranmer considered this for the LHC (it was also used in Higgs searches).

    • Profile likelihood: shown by Punzi to have issues when the distribution is Poisson-like.

    • Full Neyman construction: Cranmer and Punzi attempted this, but it is not feasible for a large number of nuisance parameters.

  • The Banff workshop this summer was found useful for comparing the various methods. The real suggestions for the LHC will likely come from the 2007 workshop on LHC issues.
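To make the Bayesian item concrete, here is a minimal sketch of the simplest case: a one-dimensional counting experiment with a known background b and a flat prior on the signal mean s ≥ 0. It is only the textbook single-channel case, not the multi-dimensional situation Heinrich warned about, and the function names are assumptions of this sketch. For a flat prior, the posterior probability that s exceeds the limit equals the ratio of two Poisson cumulative probabilities, which is solved here by bisection.

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson distribution with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total

def bayes_upper_limit(n_obs, b, cl=0.90):
    """Upper limit on the signal mean s, flat prior on s >= 0, known b.

    Solves poisson_cdf(n_obs, s + b) / poisson_cdf(n_obs, b) = 1 - cl.
    """
    target = (1.0 - cl) * poisson_cdf(n_obs, b)
    lo, hi = 0.0, 50.0 + 10.0 * n_obs   # generous search bracket
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + b) > target:
            lo = mid      # posterior tail still too large: raise the limit
        else:
            hi = mid
    return 0.5 * (lo + hi)

limit_zero = bayes_upper_limit(0, 0.0)    # no events observed, no background
limit_three = bayes_upper_limit(3, 0.0)   # three events observed
```

With zero observed events the limit does not depend on b, because the ratio reduces to exp(-s).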


Event Classification

  • The problem: Given a measurement of an event X, find F(X) which returns 1 if the event is signal (s) and 0 if the event is background (b), so as to optimize a figure of merit, say s/√b for discovery and s/√(s+b) for an established signal.

  • Theoretical solution: Use MC to calculate the likelihood ratio Ls(X)/Lb(X) and derive F(X) from it. Unfortunately, this does not work because in a high-dimensional space even the largest data set is sparse. (Feldman)

  • In recent years, physicists have turned to machine learning: give the computer samples of s and b events and let it figure out what F(X) is.
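A one-dimensional toy shows how the choice of figure of merit changes F(X). In this sketch signal and background are unit-width Gaussians in a single variable X, so the likelihood ratio is monotonic in X and F(X) reduces to a cut. The shapes, yields and function names are all hypothetical choices made for the illustration.

```python
import math

def gauss_tail(cut, mean, sigma):
    """Fraction of a Gaussian(mean, sigma) lying above `cut`."""
    return 0.5 * math.erfc((cut - mean) / (sigma * math.sqrt(2.0)))

def discovery(s, b):
    """Figure of merit for discovery: s / sqrt(b)."""
    return s / math.sqrt(b)

def established(s, b):
    """Figure of merit for an established signal: s / sqrt(s + b)."""
    return s / math.sqrt(s + b)

def best_cut(n_sig, n_bkg, merit, cuts):
    """Return (cut, value) maximising merit(s, b) for F(X) = 1 if X > cut."""
    best = None
    for c in cuts:
        s = n_sig * gauss_tail(c, 2.0, 1.0)   # hypothetical signal shape
        b = n_bkg * gauss_tail(c, 0.0, 1.0)   # hypothetical background shape
        if b <= 0.0:
            continue
        value = merit(s, b)
        if best is None or value > best[1]:
            best = (c, value)
    return best

cuts = [i * 0.05 for i in range(-40, 101)]    # scan from -2 to 5
cut_disc = best_cut(50, 1000, discovery, cuts)
cut_est = best_cut(50, 1000, established, cuts)
```

The discovery figure of merit prefers a harder cut than s/√(s+b), since it does not penalise the loss of signal statistics.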


Multivariate Analysis

Friedman: Machine learning

Prosper: Respondent

Narsky: Bagging

Roe: Boosting (MiniBooNE)

Gray: Bayes optimal classification

Bhat: Bayesian networks

Sarda: Signal enhancement


Multivariates and Machine Learning

Various methods exist to classify, train and test events.

  • Artificial neural networks (ANN): currently the most widely used (examples from Prosper, …)

  • Decision trees: a differentiating variable is used to split the sample into branches until a leaf with a preset number of signal and background events is reached.

  • Trees with rules: combining a series of trees to increase the power of a single decision tree (Friedman)

  • Bagging (Bootstrap AGGregatING) trees: build a collection of trees, each trained on a resampled subset of the training data (Narsky)

  • Boosted trees: a robust method that gives events misclassified by one tree a higher weight in the generation of the next tree

    Comparisons of significance were performed, but not all of them were controlled experiments, so conclusions may be deceptive until further tests are done.

Ex: Boosted Decision Trees (Roe)

[Figure panels: Decision tree; Boosting the tree]

  • A nice example from MiniBooNE

  • Create M trees and take the final score for signal and background as a weighted sum of the individual trees
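The weighted-sum idea can be sketched with discrete AdaBoost on one-cut trees (stumps). This is a textbook variant chosen for brevity, not the MiniBooNE configuration; the data set and all function names are hypothetical. Labels are +1 for signal and -1 for background.

```python
import math

def train_stump(xs, ys, ws):
    """Best single-cut tree: returns (weighted_error, feature, cut, sign)."""
    best = None
    for f in range(len(xs[0])):
        for cut in sorted({x[f] for x in xs}):
            for sign in (+1, -1):
                # weighted fraction of events this stump gets wrong
                err = sum(w for x, y, w in zip(xs, ys, ws)
                          if (sign if x[f] > cut else -sign) != y)
                if best is None or err < best[0]:
                    best = (err, f, cut, sign)
    return best

def stump_predict(stump, x):
    _, f, cut, sign = stump
    return sign if x[f] > cut else -sign

def boost(xs, ys, n_trees):
    """Train n_trees stumps, reweighting the events after each one."""
    n = len(xs)
    ws = [1.0 / n] * n
    trees = []
    for _ in range(n_trees):
        stump = train_stump(xs, ys, ws)
        err = min(max(stump[0], 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)   # weight of this tree
        trees.append((alpha, stump))
        # misclassified events gain weight, correct ones lose it
        ws = [w * math.exp(-alpha * y * stump_predict(stump, x))
              for x, y, w in zip(xs, ys, ws)]
        total = sum(ws)
        ws = [w / total for w in ws]
    return trees

def score(trees, x):
    """Final score: weighted sum of the individual tree outputs."""
    return sum(alpha * stump_predict(stump, x) for alpha, stump in trees)

# Hypothetical 1-D sample: signal sits in the middle, so no single cut works
xs = [(float(v),) for v in range(10)]
ys = [+1 if 3 <= v <= 6 else -1 for v in range(10)]
trees = boost(xs, ys, n_trees=3)
```

No single cut separates this sample, but the sign of the three-tree score does, which is the point of combining weak trees.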


Punzi effect (getting L wrong)

Giovanni Punzi @ PHYSTAT2003

“Comments on L fits with variable resolution”

Separate two close signals (A and B), when the resolution σ varies event by event and is different for the two signals

e.g. mass M: different numbers of tracks → different σ_M

Avoiding Punzi bias

  • Include p(σ|A) and p(σ|B) in the fit, OR

  • Fit each range of σ_i separately, and add (N_A)_i → (N_A)_total, and similarly for B

    Beware of event-by-event variables and construct likelihoods accordingly

    (Talk by Catastini)
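The first remedy can be written out explicitly. In this sketch the notation is assumed: m_i and σ_i are the measured mass and its resolution for event i, and f_A is the fitted fraction of signal A. The biased likelihood uses only the conditional densities in m, while the correct one also carries the distributions of σ for each signal:

```latex
\begin{align}
L_{\text{biased}}(f_A) &= \prod_i \Big[ f_A\, p(m_i \mid \sigma_i, A)
  + (1 - f_A)\, p(m_i \mid \sigma_i, B) \Big] \\
L_{\text{correct}}(f_A) &= \prod_i \Big[ f_A\, p(m_i \mid \sigma_i, A)\, p(\sigma_i \mid A)
  + (1 - f_A)\, p(m_i \mid \sigma_i, B)\, p(\sigma_i \mid B) \Big]
\end{align}
```

The two agree only when p(σ|A) = p(σ|B), in which case the extra factor is common to both terms and drops out of the fit.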


Blind Analyses

Potential problem: Experimenters’ bias

Original suggestion? Luis Alvarez

Methods of blinding:

  • Keep signal region box closed

  • Add random numbers to data

  • Keep Monte Carlo parameters blind

  • Use part of data to define procedure

    A number of analyses in experiments doing blind searches

    Don’t modify result after unblinding, in general..

    Question: Will LHC experiments choose to be blind? In which analysis?


Astrophysics + Cosmology Highlights

Astro/Cosmo General Issues

[Figure labels: Cosmologists, “Astronomers”, Particle Physicists; Bayesians, Frequentists]

‘“There is only one universe” and some experiments can never be rerun’ – A. Jaffe (concluding talk)

→ Astro+cosmo tend to be more Bayesian, by nature.

  • Virtual Observatories: all astro data available from the desktop

  • Data volume is doubling every year; most data are on the web (Szalay)

    • Bad: computing & storage issues

    • Good (?): systematic errors become more significant than statistical errors

  • Nichol discussed using grid techniques.



Astrophysics: Various Hot Points

  • Flat priors have been used commonly, but are dangerous (Cox, Le Diberder, Cousins): would Ω be the best quantity to use, or is it Ωh²?

  • Issues with the non-Gaussian distribution of noise taken into account in the spectrum: a few methods were discussed by Starck, Digel, ...

  • Blind analyses are rare (not so good at a priori modeling!)

  • Lots of good software in astrophysics, with repositories more advanced than in PP.

  • Jaffe’s talk made nice use of the CMB as a case study for statistical methods in astrophysics, starting from first principles of Bayesian inference.


Software and Available Tools


Talks Given on Software

Linnemann: Software for Particles

Nichol: Software for Astro (and Grid)

Le Diberder: sPlot

Paterno: R

Kreschuk: ROOT

Verkerke: RooFit

Pia: Goodness of Fit

Buckley: CEDAR

Narsky: StatPatternRecognition


Available Tools

  • A good deal of software has become increasingly available (good news for LHC!)

  • PP and astro use somewhat different software (e.g. IDL and IRAF in astro)

  • 2004 Phystat workshop at MSU on statistical software (mainly on R & ROOT) by Linnemann

  • Statisticians have a repository of standard source codes (StatLib): http://lib.stat.cmu.edu/

  • One good output of the conference was a recommendation for a Statistical Software Repository at FNAL

  • Linnemann has a web page of collections: http://www.pa.msu.edu/people/linnemann/stat_resources.html


CDF Statistics Committee resources

  • Documentation about statistics and a repository: http://www-cdf.fnal.gov/physics/statistics/statistics_home.html


Sample Repository Page


CEDAR & CEPA


Summary & Conclusions

  • Very useful physicists/statisticians interaction

  • e.g. Confidence intervals with nuisance parameters,

  • Multivariate techniques, etc..

  • Lots of things learnt from

    • ourselves (by having to present own stuff!)

    • each other (various different approaches..)

    • statisticians (update on techniques..)

  • A step towards common tools/Software repositories: http://www.phystat.org (Linnemann)

  • Programme, transparencies, papers, etc: http://www.physics.ox.ac.uk/phystat05 (with useful links such as recommended readings)

  • Proceedings published by Imperial College Press (Spring ’06)


What is Next?

  • A few workshops/schools have taken place since October 2005

    • e.g. Manchester (Nov 2005), SAMSI Duke (April 2006), Banff (July 2006), Spanish Summer School (July 2006)

  • No PHYSTAT Conference in summer 2007

  • ATLAS Workshop on Statistical Methods, 18-19 Jan 2007

  • PHYSTAT Workshop at CERN, 27-29 June 2007, on “Statistical issues for LHC Physics analyses”.

    (Both workshops will likely aim at discovery significance. Please attend!)

  • Suggestions/enquiries to: [email protected]

  • LHC will take data soon. We do not wish to say

    “The experiment was inconclusive, so we had to use statistics”

    (inside cover of “the Good Book” by L. Lyons)

  • rather say

    “We used statistics, and so we are sure that we’ve discovered X”

    (well… with some confidence level!)


Some Final Notes

  • Tried to give you a collage of PHYSTAT05 topics.

  • My deepest thanks to Louis for giving me the chance & introducing me to the PHYSTAT experience!

  • Apologies to those whose talks I have not been able to cover…

  • Thank you for the invitation!


Backup

  • Bayes

  • Frequentist

  • Cousins-Highland

  • Higgs Saga at CERN


Bayesian Approach

Bayes’ Theorem

P(param | data) ∝ P(data | param) × P(param)

posterior ∝ likelihood × prior

Problems:

  • P(param): True or False; “Degree of belief”

  • Prior: What functional form? Flat? Which variable?


Frequentist Approach

Neyman Construction

[Figure: axes x and µ, with the observed value x0 marked]

µ = Theoretical parameter

x = Observation

NO PRIOR


Frequentist Approach

[Formulas not preserved in the transcript: an interval quoted at 90% confidence, with its Frequentist and Bayesian interpretations]

A Method

Method: Mixed Frequentist - Bayesian

Full frequentist method hard to apply in several dimensions

Bayesian for nuisance parameters and

Frequentist to extract range

Philosophical/aesthetic problems?

Highland and Cousins

NIM A320 (1992) 331
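In the spirit of this mixed approach, the sketch below computes a frequentist tail probability for a counting experiment after averaging over a Bayesian prior for the background. It is an illustration under stated assumptions, not the procedure of the cited paper: the prior is taken to be a Gaussian truncated at zero, the integral uses a simple midpoint rule, and the function names and the example numbers (150 observed events on a background of 100 ± 10) are hypothetical.

```python
import math

def poisson_tail(n_obs, mu):
    """P(N >= n_obs) for a Poisson mean mu, summing terms in log space."""
    if mu <= 0.0:
        return 1.0 if n_obs <= 0 else 0.0
    total = 0.0
    k = n_obs
    while True:
        term = math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))
        total += term
        # stop once past the mean and the terms no longer matter
        if k > mu and term < 1e-17 * max(total, 1e-300):
            break
        k += 1
    return min(total, 1.0)

def hybrid_p_value(n_obs, b0, sigma_b, steps=400, width=5.0):
    """Average poisson_tail over a Gaussian prior on b, truncated at b > 0."""
    if sigma_b <= 0.0:
        return poisson_tail(n_obs, b0)
    lo = max(0.0, b0 - width * sigma_b)
    hi = b0 + width * sigma_b
    h = (hi - lo) / steps
    num = den = 0.0
    for i in range(steps):
        b = lo + (i + 0.5) * h                       # midpoint rule
        w = math.exp(-0.5 * ((b - b0) / sigma_b) ** 2)
        num += w * poisson_tail(n_obs, b)
        den += w
    return num / den

p_plain = poisson_tail(150, 100.0)            # background known exactly
p_hybrid = hybrid_p_value(150, 100.0, 10.0)   # 10% uncertainty on background
```

Smearing the background raises the tail probability, so the same excess is less significant once the nuisance parameter is accounted for.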


Higgs Saga

P (Data; Theory) ≠ P (Theory; Data)

Is data consistent with Standard Model?

or with Standard Model + Higgs?

End of Sept 2000: Data not very consistent with S.M.

Prob (Data ; S.M.) < 1% valid frequentist statement

Turned by the press into: Prob (S.M. ; Data) < 1%

and therefore Prob (Higgs ; Data) > 99%

i.e. “It is almost certain that the Higgs has been seen”
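A small Bayes' theorem calculation shows why the inversion made by the press does not follow. All numbers below are hypothetical, and the quoted 1% is treated as if it were a likelihood purely for illustration (a p-value is a tail probability, not a likelihood); the function name is likewise an assumption of this sketch.

```python
def posterior_sm(p_data_sm, p_data_higgs, prior_sm):
    """P(S.M. | data) from Bayes' theorem for two exclusive hypotheses."""
    prior_higgs = 1.0 - prior_sm
    evidence = p_data_sm * prior_sm + p_data_higgs * prior_higgs
    return p_data_sm * prior_sm / evidence

# Hypothetical inputs: the data are unlikely under the S.M. (1%) but only
# moderately likely under S.M. + Higgs (10%).
post_even = posterior_sm(0.01, 0.10, prior_sm=0.5)       # equal priors
post_sceptic = posterior_sm(0.01, 0.10, prior_sm=0.9)    # sceptical prior
```

With these inputs the probability of the Standard Model given the data is about 9% for equal priors and about 47% for the sceptical prior, nowhere near "below 1%".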
