
Statistics



- We collect a sample of data. What do we do with it?
- Estimate parameters (possibly of some model)
- Test whether a particular theory is consistent with our data (hypothesis testing)

- Statistics is a set of tools that allows us to achieve these goals

- Preliminaries

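- Some common estimators are those for the mean, $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$, and the variance, $s^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2$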
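The estimators above can be sketched in C++, the language of the ROOT example later in these notes. This is a minimal sketch, not a library API. The function names are illustrative, and the variance uses the unbiased $1/(N-1)$ normalization. The standard deviation of the mean, $s/\sqrt{N}$, is included as well.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sample mean: xbar = (1/N) * sum_i x_i
double sample_mean(const std::vector<double>& x) {
    assert(!x.empty());
    double sum = 0.0;
    for (double xi : x) sum += xi;
    return sum / static_cast<double>(x.size());
}

// Unbiased sample variance: s^2 = 1/(N-1) * sum_i (x_i - xbar)^2
double sample_variance(const std::vector<double>& x) {
    assert(x.size() >= 2);
    const double mean = sample_mean(x);
    double ss = 0.0;
    for (double xi : x) ss += (xi - mean) * (xi - mean);
    return ss / static_cast<double>(x.size() - 1);
}

// Standard deviation of the mean: s / sqrt(N)
double standard_error_of_mean(const std::vector<double>& x) {
    return std::sqrt(sample_variance(x) / static_cast<double>(x.size()));
}
```

For example, the data {2, 4, 4, 4, 5, 5, 7, 9} give $\bar{x} = 5$ and $s^2 = 32/7$.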

- A common situation is that you have a set of measurements $x_i$ and you know the true value $x_i^t$ of each
- How good are our measurements?

- Similarly, you may be comparing a histogram of data with another that contains expectation values under some hypothesis
- How well do the data agree with this hypothesis?

- Or, if the parameters of a function were estimated using the method of least squares, a minimum value of $\chi^2$ was obtained
- How good was the fit?

- Assuming
- The measurements are independent of each other
- The measurements come from a Gaussian distribution

- One can use the “goodness-of-fit” statistic $\chi^2 = \sum_{i=1}^{N} \frac{(x_i - x_i^t)^2}{\sigma_i^2}$ to answer these questions
- In the case of Poisson-distributed numbers, $\sigma_i^2 = x_i^t$, and this is called Pearson’s $\chi^2$ statistic

- Chi-square distribution

- The integrals (cumulative distributions) between arbitrary points for both the Gaussian and $\chi^2$ distributions generally cannot be written in closed form; they are looked up in tables or computed numerically
- What is the probability of getting $\chi^2 > 10$ with 4 degrees of freedom?
- This number tells you the probability that random (chance) fluctuations in the data would give a value of $\chi^2 > 10$

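- Note the p-value is defined as $p = \int_{\chi^2_{\rm obs}}^{\infty} f(z; n_d)\,dz$, where $f(z; n_d)$ is the $\chi^2$ pdf for $n_d$ degrees of freedom
- We’ll come back to p-values in a moment

- i.e. p is 1 minus the cumulative $\chi^2$ distribution; for $\chi^2_{\rm obs} = 10$ and $n_d = 4$ this gives $p \approx 0.040$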
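The p-value can also be computed numerically rather than looked up. For $n_d$ degrees of freedom, 1 minus the cumulative $\chi^2$ distribution equals the regularized upper incomplete gamma function $Q(n_d/2, \chi^2/2)$.

The sketch below evaluates it with the standard power series for the lower function. That is adequate for moderate $\chi^2$ but loses precision for very small p-values. In practice a library routine such as ROOT's TMath::Prob would normally be used; the function names here are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Regularized lower incomplete gamma P(a, x) from its power series:
// P(a, x) = x^a e^{-x} / Gamma(a) * sum_k x^k / (a (a+1) ... (a+k))
double gamma_p_series(double a, double x) {
    assert(a > 0.0 && x >= 0.0);
    if (x == 0.0) return 0.0;
    double term = 1.0 / a;
    double sum = term;
    for (int n = 1; n < 10000; ++n) {
        term *= x / (a + n);
        sum += term;
        if (term < sum * 1e-16) break;  // converged to double precision
    }
    return sum * std::exp(-x + a * std::log(x) - std::lgamma(a));
}

// p-value = 1 - cumulative chi2 distribution = P(chi2 >= chi2_obs; ndof)
double chi2_pvalue(double chi2_obs, int ndof) {
    assert(ndof > 0);
    return 1.0 - gamma_p_series(0.5 * ndof, 0.5 * chi2_obs);
}

// Reduced chi2 = chi2 / ndof
double reduced_chi2(double chi2, int ndof) {
    assert(ndof > 0);
    return chi2 / ndof;
}
```

For the question above, chi2_pvalue(10.0, 4) agrees with the closed form valid for 4 degrees of freedom, $p = 6e^{-5} \approx 0.0404$.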

- Often one uses the reduced $\chi^2 = \chi^2/n_d$; since $E[\chi^2] = n_d$, values near 1 indicate a reasonable fit

- Hypothesis tests provide a rule for accepting or rejecting hypotheses depending on the outcome of a measurement

- Normally we define regions in x-space where the data are compatible with H and regions where they are not

- Let’s say there is just one hypothesis H
- We can define some test statistic t whose value in some way reflects the level of agreement between the data and the hypothesis
- We can quantify the goodness-of-fit by specifying a p-value given an observed $t_{\rm obs}$ in the experiment: $p = \int_{t_{\rm obs}}^{\infty} g(t|H)\,dt$
- This assumes t is defined such that large values correspond to poor agreement with the hypothesis
- $g(t|H)$ is the pdf of t under H

- Notes
- p is not the significance level of the test
- p is not the confidence level of a confidence interval
- p is not the probability that H is true
- That’s Bayesian speak

- p is the probability, under the assumption of H, of obtaining data (x or t(x)) having equal or lesser compatibility with H than the observed $x_{\rm obs}$

- Example: flipping coins
- Hypothesis H: the coin is fair, so $p_h = p_t = 0.5$
- We could take $t = |n_h - N/2|$

- Toss the coin $N = 20$ times and observe $n_h = 17$
- Is H false?
- Don’t know
- We can say that the probability, assuming H, of a result at least this extreme ($t \ge 7$, i.e. 17 or more heads or 17 or more tails) is $p = 0.0026$
- p is the probability of observing such a result “by chance”
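The p-value above can be checked by summing binomial probabilities over all outcomes with $t \ge t_{\rm obs}$. Here is a minimal sketch; the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Binomial probability P(n_h = k | N, p), computed via log-gamma
double binomial_pmf(int k, int N, double p) {
    const double log_c = std::lgamma(N + 1.0) - std::lgamma(k + 1.0)
                       - std::lgamma(N - k + 1.0);
    return std::exp(log_c + k * std::log(p) + (N - k) * std::log1p(-p));
}

// p-value for the fair-coin hypothesis with t = |n_h - N/2|:
// sum of P(n_h) over every outcome whose t is at least t_obs
double coin_pvalue(int N, int nh_obs) {
    assert(N > 0 && nh_obs >= 0 && nh_obs <= N);
    const double t_obs = std::fabs(nh_obs - N / 2.0);
    double p = 0.0;
    for (int k = 0; k <= N; ++k)
        if (std::fabs(k - N / 2.0) >= t_obs) p += binomial_pmf(k, N, 0.5);
    return p;
}
```

coin_pvalue(20, 17) returns $2702/2^{20} \approx 0.0026$. That is twice the one-sided probability of 17 or more heads, because $t = |n_h - N/2|$ counts deviations in both directions.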

- The Kolmogorov-Smirnov (K-S) test is an alternative to the $\chi^2$ test when the data sample is small
- It is also often more powerful than the $\chi^2$ test since it does not rely on binning, though it is commonly applied to binned data (histograms) as well
- A common use is to quantify how well data and Monte Carlo distributions agree

- It is also distribution-free: the distribution of the test statistic under H does not depend on the (continuous) cumulative distribution function being tested

- Data – Monte Carlo comparison

- The K-S test is based on the empirical cumulative distribution function (ECDF) $F_n(x)$
- For n ordered data points $y_1 \le y_2 \le \dots \le y_n$: $F_n(x) = \frac{1}{n}\,\#\{i : y_i \le x\}$

- This is a step function that increases by $1/n$ at the value of each ordered data point

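- The K-S statistic is given by $D = \max_x |F_n(x) - F(x)|$, where $F(x)$ is the hypothesized CDF (or, for a two-sample comparison such as data vs Monte Carlo, the ECDF of the second sample)
- If D exceeds a critical value obtained from tables, the hypothesis (that the data and theory distributions agree) is rejected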
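Below is a sketch of the two-sample statistic, such as a data vs Monte Carlo comparison, computed from unbinned samples.

Instead of a table lookup, it uses the large-sample Kolmogorov distribution $Q(\lambda) = 2\sum_{k\ge1}(-1)^{k-1}e^{-2k^2\lambda^2}$ with $\lambda = \sqrt{nm/(n+m)}\,D$. This asymptotic approximation is an assumption of this sketch. The function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Two-sample K-S statistic D = max_x |F_n(x) - G_m(x)|, where F_n and G_m
// are the ECDFs of the two samples
double ks_statistic(std::vector<double> a, std::vector<double> b) {
    assert(!a.empty() && !b.empty());
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    const double n = static_cast<double>(a.size());
    const double m = static_cast<double>(b.size());
    std::size_t i = 0, j = 0;
    double d = 0.0;
    while (i < a.size() && j < b.size()) {
        const double x = std::min(a[i], b[j]);
        // Step both ECDFs past every point equal to x (handles ties)
        while (i < a.size() && a[i] <= x) ++i;
        while (j < b.size() && b[j] <= x) ++j;
        d = std::max(d, std::fabs(i / n - j / m));
    }
    return d;  // once one sample is exhausted the difference can only shrink
}

// Asymptotic Kolmogorov tail probability Q(lambda), valid for large samples
double kolmogorov_q(double lambda) {
    if (lambda < 0.2) return 1.0;  // Q is essentially 1 here; the series converges slowly
    double sum = 0.0;
    for (int k = 1; k <= 100; ++k) {
        const double term = std::exp(-2.0 * k * k * lambda * lambda);
        sum += (k % 2 == 1) ? term : -term;
        if (term < 1e-12) break;
    }
    return std::min(1.0, std::max(0.0, 2.0 * sum));
}

// Approximate p-value for samples of sizes n and m with observed statistic d
double ks_pvalue(double d, std::size_t n, std::size_t m) {
    const double n_eff = static_cast<double>(n) * m / (n + m);
    return kolmogorov_q(std::sqrt(n_eff) * d);
}
```

For example, $Q(1.358) \approx 0.05$, the familiar large-sample 5% critical value.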

- Suppose N independent measurements $x_i$ are drawn from a pdf $f(x;\theta)$
- We want to estimate the parameters θ
- The most important method for doing this is the method of maximum likelihood
- A related method is the method of least squares

- Example
- Properties of some selected events
- Hypothesis H: these are top quark events

- Working in x-space is hard, so usually one constructs a test statistic t instead, whose value reflects the compatibility between the data vector x and H
- Low t – data more compatible with H
- High t – data less compatible with H

- Since $f(x|H)$ is known, $g(t|H)$ can be determined


- Since p is a function of the random variable x, p itself is a random variable
- If H is true (and t is continuous), p is uniformly distributed in [0,1]
- If H is not true, the distribution of p is peaked toward 0
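The uniformity of p under H can be checked with a toy Monte Carlo. For illustration, this sketch assumes a test statistic that is standard Gaussian under H, with a one-sided p-value. The seed, the number of experiments and the function names are arbitrary choices.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// One-sided p-value for t ~ N(0,1) under H (large t = poor agreement):
// p = P(t' >= t_obs) = erfc(t_obs / sqrt(2)) / 2
double gaussian_pvalue(double t_obs) {
    return 0.5 * std::erfc(t_obs / std::sqrt(2.0));
}

// Simulate experiments in which H is true and return the fraction with p < cut.
// If p is uniform on [0,1], this fraction should be close to cut.
double fraction_below(double cut, int n_experiments, unsigned seed) {
    assert(n_experiments > 0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> t_under_h(0.0, 1.0);
    int count = 0;
    for (int i = 0; i < n_experiments; ++i)
        if (gaussian_pvalue(t_under_h(rng)) < cut) ++count;
    return static_cast<double>(count) / n_experiments;
}
```

With 100000 toy experiments, the fraction with p < 0.05 comes out close to 0.05, and the fraction with p < 0.5 comes out close to 0.5, as expected for a uniform p.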

- Suppose we observe $n_{\rm obs} = n_s + n_b$ events
- $n_s$, $n_b$ are Poisson random variables with means $\nu_s$, $\nu_b$
- $n_{\rm obs} = n_s + n_b$ is then a Poisson random variable with mean $\nu = \nu_s + \nu_b$

- Suppose $\nu_b = 0.5$ and we observe $n_{\rm obs} = 5$
- Publish/NY Times headline or not?

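- Often we take H to be the null hypothesis – assume it’s a random fluctuation of the background
- Assume $\nu_s = 0$
- Then $p = P(n \ge 5;\, \nu_b = 0.5) = 1 - \sum_{n=0}^{4} \frac{\nu_b^n}{n!} e^{-\nu_b} \approx 1.7 \times 10^{-4}$ is the probability of observing 5 or more events resulting from chance fluctuations of the background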
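The Poisson tail probability above can be evaluated directly. Here is a minimal sketch; the function name is illustrative.

```cpp
#include <cassert>
#include <cmath>

// p = P(n >= n_obs; nu) = 1 - sum_{k=0}^{n_obs-1} e^{-nu} nu^k / k!
double poisson_pvalue(int n_obs, double nu) {
    assert(n_obs >= 0 && nu > 0.0);
    double term = std::exp(-nu);  // P(0)
    double cdf = 0.0;
    for (int k = 0; k < n_obs; ++k) {
        cdf += term;
        term *= nu / (k + 1);     // recursion P(k+1) = P(k) * nu / (k+1)
    }
    return 1.0 - cdf;
}
```

poisson_pvalue(5, 0.5) returns about $1.72 \times 10^{-4}$.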

- Another problem: instead of counting events, say we measure some variable x
- Publish/NY Times headline or not?

- Again take H to be the null hypothesis – assume it’s a random fluctuation of the background
- Assume $\nu_s = 0$

- Again, p is the probability of observing 11 or more events resulting from chance fluctuations of the background
- How did we know where to look / how to bin?
- Is the observed width consistent with the resolution in x?
- Would a slightly different analysis still show a peak?
- What about the fact that the bins on either side of the peak are low?

- Another approach is to compare a histogram with a hypothesis that provides expectation values
- In this case we’d compare a vector of Poisson-distributed numbers $n_i$ (the histogram) with their expectation values $\nu_i = E[n_i]$
- This is called Pearson’s statistic: $\chi^2 = \sum_{i=1}^{N} \frac{(n_i - \nu_i)^2}{\nu_i}$
- If the $\nu_i$ are not too small (e.g. $\nu_i > 5$), the observed $\chi^2$ will follow the chi-square pdf for N degrees of freedom
- Or, more generally, for N minus the number of fitted parameters
- The same holds for N independent, Gaussian-distributed measurements $y_i$

- We can calculate the p-value as $p = \int_{\chi^2_{\rm obs}}^{\infty} f(z; n_d)\,dz$
- In our example this gives $p = 0.073$

- In our example, though, we have many bins with a small number of counts or 0
- We can still use Pearson’s test, but we need to determine the pdf $f(\chi^2)$ by Monte Carlo
- Generate $n_i$ from a Poisson distribution with mean $\nu_i$ in each bin
- Compute $\chi^2$ and record it in a histogram
- Repeat a large number of times
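The Monte Carlo procedure just described can be sketched as follows. Instead of filling a histogram, the sketch directly counts the fraction of toy $\chi^2$ values at or above the observed one, which is the p-value. The seed, number of trials and function names are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Pearson's statistic chi2 = sum_i (n_i - nu_i)^2 / nu_i
double pearson_chi2(const std::vector<int>& n, const std::vector<double>& nu) {
    double chi2 = 0.0;
    for (std::size_t i = 0; i < n.size(); ++i) {
        const double d = n[i] - nu[i];
        chi2 += d * d / nu[i];
    }
    return chi2;
}

// Monte Carlo p-value: generate n_i ~ Poisson(nu_i) in each bin, compute chi2,
// and return the fraction of trials with chi2 >= chi2_obs
double pearson_pvalue_mc(double chi2_obs, const std::vector<double>& nu,
                         int n_trials, unsigned seed) {
    assert(n_trials > 0);
    for (double v : nu) assert(v > 0.0);  // Pearson's chi2 needs nu_i > 0
    std::mt19937 rng(seed);
    std::vector<int> n(nu.size());
    int n_above = 0;
    for (int t = 0; t < n_trials; ++t) {
        for (std::size_t i = 0; i < nu.size(); ++i) {
            std::poisson_distribution<int> pois(nu[i]);
            n[i] = pois(rng);
        }
        if (pearson_chi2(n, nu) >= chi2_obs) ++n_above;
    }
    return static_cast<double>(n_above) / n_trials;
}
```

Unlike the $\chi^2$-pdf lookup, this remains valid when many bins have expectations well below 5. The cost is statistical noise in p of order $\sqrt{p(1-p)/n_{\rm trials}}$.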

- Using the Monte Carlo pdf gives $p = 0.11$ rather than $p = 0.073$
- In either case, we won’t publish

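- Usage in ROOT (the file names are placeholders)
- TFile* data = TFile::Open("data.root");
- TFile* MC = TFile::Open("mc.root");
- TH1F* jet_pt = dynamic_cast<TH1F*>(data->Get("h_jet_pt"));
- TH1F* MCjet_pt = dynamic_cast<TH1F*>(MC->Get("h_jet_pt"));
- Double_t KS = MCjet_pt->KolmogorovTest(jet_pt);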

- Notes
- The returned value is the probability (p-value) of the test
- A value ≪ 1 means the two histograms are not compatible

- The returned value is not the maximum K-S distance, though you can return this with option “M”

- Also available in the MATLAB Statistics Toolbox

- Cumulative distribution functions (CDFs) of the binomial, Poisson and Gaussian distributions

- A patient is treated for a disease. What is the probability of an individual surviving or remaining disease-free?
- Usually patients will be followed for various lengths of time after treatment
- Some will survive or remain disease-free while others will not. Some will leave the study.
- A nonparametric estimate is given by the Kaplan-Meier curve, also presented as a life table or survival curve


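- Calculate a conditional probability $P(t_k)$ of surviving past each event time $t_k$, given survival up to $t_k$
- $S(t_N) = P(t_1) \times P(t_2) \times P(t_3) \times \cdots \times P(t_N)$
- The survival function is the complement of the distribution function, $S(t) = 1 - F(t)$; without censoring, the Kaplan-Meier estimate equals one minus the empirical distribution function

- We can write this as $\hat{S}(t_N) = \prod_{k=1}^{N} \left(1 - \frac{d_k}{n_k}\right)$, where $d_k$ is the number of events at $t_k$ and $n_k$ is the number of patients still at risk just before $t_k$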


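- The square root of the variance of $\hat{S}(t)$ can be calculated with Greenwood’s formula, $\sigma_S = \hat{S}(t)\sqrt{\sum_{t_k \le t} \frac{d_k}{n_k(n_k - d_k)}}$
- Assuming the $p_k = P(t_k)$ follow a Gaussian (normal) distribution, the 95% CL interval will be $\hat{S}(t) \pm 1.96\,\sigma_S$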
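The product-limit estimate and its Greenwood standard error can be sketched as follows. The input format and the names are hypothetical: one record per patient, holding the time of the event or of leaving the study, plus a flag for whether the event was observed. Patients censored at an event time are counted as still at risk at that time, the usual convention.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Subject {
    double time;  // time of the event or of leaving the study
    bool event;   // true = event observed; false = censored (left the study)
};

struct SurvivalPoint {
    double time;  // distinct event time t_k
    double S;     // Kaplan-Meier estimate S(t_k)
    double se;    // Greenwood standard error of S(t_k)
};

// S(t) = prod_{t_k <= t} (1 - d_k / n_k);  Var[S] = S^2 sum d_k / (n_k (n_k - d_k))
std::vector<SurvivalPoint> kaplan_meier(std::vector<Subject> subjects) {
    std::sort(subjects.begin(), subjects.end(),
              [](const Subject& a, const Subject& b) { return a.time < b.time; });
    std::vector<SurvivalPoint> curve;
    double S = 1.0, greenwood_sum = 0.0;
    std::size_t i = 0;
    while (i < subjects.size()) {
        const double t = subjects[i].time;
        const double at_risk = static_cast<double>(subjects.size() - i);  // n_k
        int d = 0;                                                         // d_k
        std::size_t j = i;
        for (; j < subjects.size() && subjects[j].time == t; ++j)
            if (subjects[j].event) ++d;
        if (d > 0) {
            S *= 1.0 - d / at_risk;
            // Greenwood's term is undefined once every subject at risk has had the event
            if (d < at_risk) greenwood_sum += d / (at_risk * (at_risk - d));
            curve.push_back({t, S, S * std::sqrt(greenwood_sum)});
        }
        i = j;  // everyone with time t leaves the risk set afterwards
    }
    return curve;
}
```

Consider four patients with events at t = 1, 3, 4 and one patient censored at t = 2. The sketch gives $\hat{S} = 0.75, 0.375, 0$. The Greenwood error at t = 3 is $\sqrt{0.082}$, so the 95% CL band there is roughly $0.375 \pm 1.96 \times 0.286$.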


- Some useful properties of the Gaussian distribution are
- P(x in range $\mu \pm \sigma$) = 0.683
- P(x in range $\mu \pm 2\sigma$) = 0.9545
- P(x in range $\mu \pm 3\sigma$) = 0.9973
- P(x outside range $\mu \pm 3\sigma$) = 0.0027
- P(x outside range $\mu \pm 5\sigma$) = $5.7 \times 10^{-7}$
- P(x in range $\mu \pm 0.6745\sigma$) = 0.5
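These numbers follow from the error function, $P(|x - \mu| < k\sigma) = \mathrm{erf}(k/\sqrt{2})$. The sketch below (function names illustrative) reproduces them with the standard library's std::erf and std::erfc. The tail uses erfc directly to avoid cancellation at large k.

```cpp
#include <cassert>
#include <cmath>

// P(|x - mu| < k sigma) for a Gaussian variable
double prob_within(double k) {
    assert(k >= 0.0);
    return std::erf(k / std::sqrt(2.0));
}

// P(|x - mu| > k sigma), computed with erfc to keep precision in the tails
double prob_outside(double k) {
    assert(k >= 0.0);
    return std::erfc(k / std::sqrt(2.0));
}
```

prob_within(2.0) gives 0.9545, and prob_outside(5.0) gives about $5.73 \times 10^{-7}$.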

- Suppose you have a bag of black and white marbles and wish to determine the fraction f that are white. How confident are you of the initial composition? How does your confidence change after extracting n black marbles?
- Suppose you are tested for a disease. The test is 100% accurate if you have the disease. The test gives a 0.2% false-positive rate if you do not. The test comes back positive. What is the probability that you have the disease?

- Suppose you are searching for the Higgs and have a well-known expected background of 3 events. What 90% confidence limit can you set on the Higgs cross section
- if you observe 0 events?
- if you observe 3 events?
- if you observe 10 events?

- The ability to set confidence limits (or claim discovery) is an important part of frontier physics
- How to do this the “correct” way is somewhat/very controversial

- Questions
- What is the mass of the top quark?
- What is the mass of the tau neutrino?
- What is the mass of the Higgs?

- Answers
- $M_t = 172.5 \pm 2.3$ GeV
- $M_{\nu_\tau} < 18.2$ MeV
- $M_H > 114.3$ GeV

- More correct answers
- $M_t = 172.5 \pm 2.3$ GeV with CL = 0.683
- $0 < M_{\nu_\tau} < 18.2$ MeV with CL = 0.95
- $114.3\ \text{GeV} < M_H < \infty$ with CL = 0.95

- A confidence interval reflects the statistical precision of the experiment and quantifies the reliability of a measurement
- For a sufficiently large data sample, the mean and the standard deviation of the mean provide a good interval
- What if the pdf isn’t Gaussian?
- What if there are physical boundaries?
- What if the data sample is small?

- Here we run into problems

- A dog has a 50% probability of being within 100 m of its master
- You observe the dog, what can you say about its master?
- With 50% probability, the master is within 100m of the dog
- But this assumes
- The master can be anywhere around the dog
- The dog has no preferred direction of travel

- You observe the dog, what can you say about its master?

- Neyman’s construction
- Consider a pdf $f(x;\theta) = P(x|\theta)$
- For each value of θ, we construct a horizontal line segment $[x_1, x_2]$ such that $P(x \in [x_1, x_2]|\theta) = 1 - \alpha$
- The union of such intervals for all values of θ is called the confidence belt

- Neyman’s construction
- After performing an experiment to measure x, a vertical line is drawn through the experimentally measured value x0
- The confidence interval for θ is the set of all values of θ for which the corresponding line segment $[x_1, x_2]$ is intercepted by the vertical line

- Notes
- The coverage condition is not unique
- $P(x < x_1|\theta) = P(x > x_2|\theta) = \alpha/2$
- Called central confidence intervals

- $P(x < x_1|\theta) = \alpha$
- Called upper confidence limits

- $P(x > x_2|\theta) = \alpha$
- Called lower confidence limits

- We previously mentioned that the number of events produced in a reaction with cross section σ and fixed luminosity L follows a Poisson distribution with mean $\nu = \sigma \int L\,dt$
- $P(n;\nu) = e^{-\nu}\nu^n / n!$
- If the variables are discrete, by convention one constructs the confidence belt by requiring $P(x_1 \le x \le x_2|\theta) \ge 1 - \alpha$

- Example: measuring the Higgs production cross section assuming no background (Poisson distribution)
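For a counting experiment, the upper-limit version of the belt can be inverted numerically. The classical $1-\alpha$ upper limit $\nu_{\rm up}$ is the mean for which $P(n \le n_{\rm obs}|\nu_{\rm up}) = \alpha$. The bisection sketch below uses illustrative function names. Dividing $\nu_{\rm up}$ by $\int L\,dt$ converts it into a cross-section limit.

```cpp
#include <cassert>
#include <cmath>

// Poisson cumulative probability P(n <= n_obs | nu)
double poisson_cdf(int n_obs, double nu) {
    double term = std::exp(-nu), cdf = 0.0;
    for (int k = 0; k <= n_obs; ++k) {
        cdf += term;
        term *= nu / (k + 1);
    }
    return cdf;
}

// Classical upper limit at CL 1 - alpha: the nu_up with P(n <= n_obs | nu_up) = alpha.
// Bisection works because P(n <= n_obs | nu) decreases monotonically with nu.
double poisson_upper_limit(int n_obs, double alpha) {
    assert(n_obs >= 0 && alpha > 0.0 && alpha < 1.0);
    double lo = 0.0, hi = 1.0;
    while (poisson_cdf(n_obs, hi) > alpha) hi *= 2.0;  // bracket the solution
    for (int it = 0; it < 200; ++it) {
        const double mid = 0.5 * (lo + hi);
        if (poisson_cdf(n_obs, mid) > alpha) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

With $\alpha = 0.1$, observing 0 events gives $\nu_{\rm up} = \ln 10 \approx 2.30$, and observing 3 events gives $\nu_{\rm up} \approx 6.68$.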

- Now assume a signal s and a background b, so that $\nu = s + b$

- Sometimes, though, confidence intervals
- Are empty
- Shrink when the background estimate increases
- Are smaller for a poorer experiment
- Exclude parameter values for which the experiment is insensitive

- Example
- We know that $P(n = 0|\nu = 2.3) = 0.1$
- $\nu < 2.3$ at 90% CL
- If the number of background events b is 3, then since $\nu = s + b$, the number of signal events is $s < -0.7$ at 90% CL?
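With $n_{\rm obs} = 0$ the limit has the closed form $\nu_{\rm up} = -\ln\alpha$. The naive background subtraction in this example can therefore be reproduced in a couple of lines; the function name is illustrative.

```cpp
#include <cassert>
#include <cmath>

// n_obs = 0: P(n = 0 | nu) = exp(-nu) = alpha  =>  nu_up = -ln(alpha).
// Subtracting a known background b (nu = s + b) gives the naive limit s_up = nu_up - b,
// which is negative, i.e. the interval for s >= 0 is empty, whenever b > nu_up.
double naive_signal_upper_limit_zero_events(double alpha, double b) {
    assert(alpha > 0.0 && alpha < 1.0 && b >= 0.0);
    return -std::log(alpha) - b;
}
```

For $\alpha = 0.1$ and $b = 3$ this returns about $-0.70$, the unphysical result quoted above.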

- Experiment X uses a fit to extract the neutrino mass
- $M_\nu = -4 \pm 2$ eV
- $\Rightarrow P(M_\nu < 0\ \text{eV}) = 0.98$?

- What is probability?
- Frequentist approach
- Developed by Venn, Fisher, Neyman, von Mises
- The relative frequency with which something happens
- number of successes / number of trials
- in the limit of infinitely many trials (Venn limit)

- Assumes that successes occurred in the past and will occur in the future with the same probability

- S = “It will rain tomorrow in Tucson”, with P(S) = 0.01
- Frequentist reading: the relative frequency with which it rains on Mondays in April is 0.01

- What is probability?
- Bayesian approach
- Developed by Bayes, Laplace, Gauss, Jeffreys, de Finetti
- The degree of belief or confidence in a statement or measurement
- Closer to what is used in everyday life
- e.g. Is the Standard Model correct?

- Similar to betting odds
- Not “scientific”?

- S = “It will rain tomorrow in Tucson”, with P(S) = 0.01
- Bayesian reading: the plausibility of the above statement is 0.01 (i.e. the same as drawing a white ball out of a container of 100 balls, 1 of which is white)

- Usually
- Confidence interval == frequentist confidence interval
- Credible interval == Bayesian posterior probability interval
- But you’ll also hear Bayesian confidence interval

- Probability
- $P = 1 - \alpha$
- $\alpha = 0.05 \Rightarrow P = 95\%$

- Suppose you wish to determine a parameter θ whose true value $\theta_t$ is unknown
- Assume we make a single measurement of an observable x whose pdf $P(x|\theta)$ depends on θ
- Recall this is the probability of obtaining x given θ

- Say we measure $x_0$; then we obtain $P(x_0|\theta)$
- Frequentist
- Makes statements about $P(x|\theta)$

- Bayesian
- Makes statements about $P(\theta_t|x_0)$
- $P(\theta_t|x_0) = P(x_0|\theta_t)\,P(\theta_t) / P(x_0)$

- We’ll stick with the frequentist approach for the moment

- (Frequentist) confidence intervals are constructed to include the true value of the parameter ($\theta_t$) with a probability of $1-\alpha$
- In fact this is true for any value of θ

- A confidence interval $[\theta_1, \theta_2]$ is a member of a set, such that the set has the property that $P(\theta \in [\theta_1, \theta_2]) = 1-\alpha$
- Perform an ensemble of experiments with fixed θ
- The interval $[\theta_1, \theta_2]$ will vary and cover the fixed value θ in a fraction $1-\alpha$ of the experiments

- Presumably when we make a measurement we are selecting it at random from the ensemble that contains the true value of θ, $\theta_t$
- Note we haven’t said anything about the probability of $\theta_t$ being in the interval $[\theta_1, \theta_2]$, as a Bayesian would

- If $P(\theta \in [\theta_1, \theta_2]) = 1-\alpha$ is true, we say the intervals “cover” θ at the stated confidence
- If there are values of θ for which $P(\theta \in [\theta_1, \theta_2]) < 1-\alpha$, we say the intervals “undercover” for that θ
- If there are values of θ for which $P(\theta \in [\theta_1, \theta_2]) > 1-\alpha$, we say the intervals “overcover” for that θ
- Undercoverage is bad
- Overcoverage is conservative
- Confidence intervals obtained from Neyman’s construction have a confidence level of $1-\alpha$
- By construction, $P(\theta \in [\theta_1, \theta_2]) \ge 1-\alpha$ is satisfied for all θ, including $\theta_t$
- Another method is to consider a test of the hypothesis that the parameter’s true value is θ

- Examples
- Data consisting of a single random variable x that follows a Gaussian distribution
- Counting experiments
