Statistics

# Statistics - PowerPoint PPT Presentation

## PowerPoint Slideshow about 'Statistics' - elias

Presentation Transcript
Statistics
• We collect a sample of data. What do we do with it?
• Estimate parameters (possibly of some model)
• Test whether a particular theory is consistent with our data (hypothesis testing)
• Statistics is a set of tools that allows us to achieve these goals
Statistics
• Preliminaries
Statistics
• Some common estimators are for the mean and variance
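The estimator formulas themselves did not survive in this transcript. As a minimal sketch, assuming the usual sample mean and the unbiased sample variance (dividing by N − 1), they could be computed as follows. The function names and the example numbers are illustrative and do not come from the slides.

```python
import math

def sample_mean(xs):
    """Estimator of the mean: arithmetic average of the sample."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Unbiased estimator of the variance (divides by N - 1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = sample_mean(xs)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

# Hypothetical measurements, for illustration only.
data = [4.8, 5.1, 5.0, 4.9, 5.2]
mean = sample_mean(data)
var = sample_variance(data)
# Standard deviation of the mean: sqrt(variance / N).
err_on_mean = math.sqrt(var / len(data))
print(mean, var, err_on_mean)
```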
χ² Distribution
• A common situation is that you have a set of measurements $x_i$ and you know the true value $x_i^t$ of each
• How good are our measurements?
• Similarly you may be comparing a histogram of data with another that contains expectation values under some hypothesis
• How well do the data agree with this hypothesis?
• Or if parameters of a function were estimated using the method of least squares, a minimum value of χ² was obtained
• How good was the fit?
χ² Distribution
• Assuming
• The measurements are independent of each other
• The measurements come from a Gaussian distribution
• One can use the “goodness-of-fit” statistic $\chi^2 = \sum_{i=1}^{N} (x_i - x_i^t)^2 / \sigma_i^2$ to answer these questions, where $\sigma_i$ is the standard deviation of measurement $x_i$
• In the case of Poisson distributed numbers, $\sigma_i^2 = x_i^t$, and this is called Pearson’s χ² statistic
χ² Distribution
• Chi-square distribution
χ² Distribution
• The integrals (or cumulative distributions) between arbitrary points for both the Gaussian and χ² distributions generally cannot be written in terms of elementary functions, so they are looked up in tables or evaluated numerically
• What is the probability of getting a χ² > 10 with 4 degrees of freedom?
• This number tells you the probability that random (chance) fluctuations in the data would give a value of χ² > 10
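For an integer number of degrees of freedom the tail probability can also be computed directly rather than looked up. The sketch below is one possible way to do it with the Python standard library: it starts from the closed forms for 1 and 2 degrees of freedom and steps up two degrees at a time. The function name `chi2_sf` is an assumption made for this example.

```python
import math

def chi2_sf(x, ndof):
    """P(chi2 > x) for a positive integer number of degrees of freedom.

    Starts from the closed forms for 1 dof (erfc) and 2 dof (exp) and uses
    Q(n + 2) = Q(n) + (x/2)^(n/2) * exp(-x/2) / Gamma(n/2 + 1).
    """
    if ndof < 1 or int(ndof) != ndof:
        raise ValueError("ndof must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if ndof % 2 == 0:
        n, q = 2, math.exp(-half)
    else:
        n, q = 1, math.erfc(math.sqrt(half))
    while n < ndof:
        q += half ** (n / 2.0) * math.exp(-half) / math.gamma(n / 2.0 + 1.0)
        n += 2
    return q

p = chi2_sf(10.0, 4)
print(f"P(chi2 > 10 | 4 dof) = {p:.4f}")
```

For the question asked on the slide this gives a probability of about 0.04.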
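χ² Distribution
• Note the p-value is defined as $p = \int_{\chi^2_{\rm obs}}^{\infty} f(z; n)\,dz$, where $f(z; n)$ is the chi-square pdf for n degrees of freedom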
• We’ll come back to p-values in a moment
χ² Distribution
• 1 − cumulative χ² distribution
χ² Distribution
• Often one uses the reduced χ² = χ²/n, where n is the number of degrees of freedom
Hypothesis Testing
• Hypothesis tests provide a rule for accepting or rejecting hypotheses depending on the outcome of a measurement
Hypothesis Testing
• Normally we define regions in x-space that define where the data is compatible with H or not
Hypothesis Testing
• Let’s say there is just one hypothesis H
• We can define some test statistic t whose value in some way reflects the level of agreement between the data and the hypothesis
• We can quantify the goodness-of-fit by specifying a p-value given an observed tobs in the experiment: $p = \int_{t_{\rm obs}}^{\infty} g(t|H)\,dt$
• This assumes t is defined such that large values correspond to poor agreement with the hypothesis
• g is the pdf for t
Hypothesis Testing
• Notes
• p is not the significance level of the test
• p is not the confidence level of a confidence interval
• p is not the probability that H is true
• That’s Bayesian speak
• p is the probability, under the assumption of H, of obtaining data (x or t(x)) whose compatibility with H is equal to or less than that of xobs
Hypothesis Testing
• Flip coins
• Hypothesis H is that the coin is fair (random), so ph = pt = 0.5
• We could take t = |nh − N/2|
• Toss the coin N = 20 times and observe nh = 17
• Is H false?
• Don’t know
• We can say that the probability of observing t ≥ 7 (17 or more heads, or 17 or more tails) assuming H is 0.0026
• p is the probability of observing this result “by chance”
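The coin example can be checked by summing binomial probabilities over all outcomes that are at least as far from N/2 as the observed one. This is a small sketch using only the standard library; the function name is an assumption made for the example.

```python
from math import comb

def two_sided_p_value(n_heads, n_tosses):
    """P(|n_h - N/2| >= observed deviation) for a fair coin."""
    t_obs = abs(n_heads - n_tosses / 2)
    favourable = sum(
        comb(n_tosses, k)
        for k in range(n_tosses + 1)
        if abs(k - n_tosses / 2) >= t_obs
    )
    # Every sequence of tosses has probability 2^-N under H.
    return favourable / 2 ** n_tosses

p = two_sided_p_value(17, 20)
print(f"p = {p:.4f}")
```

Counting only the 17-or-more-heads outcomes would give half of this value, since the test statistic t = |nh − N/2| treats an excess of heads and an excess of tails alike.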
Kolmogorov-Smirnov (K-S) Test
• The K-S test is an alternative to the χ² test when the data sample is small
• It is also more powerful than the χ² test since it does not rely on bins, though one commonly uses it on binned data
• A common use is to quantify how well data and Monte Carlo distributions agree
• The distribution of the K-S statistic also does not depend on the underlying cumulative distribution function being tested
K-S Test
• Data – Monte Carlo comparison
K-S Test
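• The K-S test is based on the empirical cumulative distribution function (ECDF) Fn(x)
• For n ordered data points yi, $F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(y_i \le x)$, i.e. the fraction of data points with yi ≤ x
• This is a step function that increases by 1/n at the value of each ordered data point
K-S Test
• The K-S statistic is given by $D = \max_x |F_n(x) - F(x)|$, where F(x) is the cumulative distribution function under the hypothesis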
• If D > some critical value obtained from tables, the hypothesis (data and theory distributions agree) is rejected
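As a rough sketch of how the statistic might be computed for unbinned data, the Python below evaluates the largest distance between the ECDF of a sample and a hypothesised cumulative distribution function. The function names, the uniform test distribution and the three sample values are assumptions chosen for illustration; comparing D with a critical value is left out.

```python
def ks_statistic(sample, cdf):
    """Maximum distance between the ECDF of `sample` and `cdf`."""
    ys = sorted(sample)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys, start=1):
        f = cdf(y)
        # The ECDF jumps from (i - 1)/n to i/n at y, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

# Hypothetical sample, for illustration only.
d = ks_statistic([0.1, 0.4, 0.7], uniform_cdf)
print(d)
```

Because the ECDF is a step function, the maximum can only occur at a data point, which is why the loop checks the distance just below and just above each step.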
Statistics
• Suppose N independent measurements xi are drawn from a pdf f(x;θ)
• We want to estimate the parameters θ
• The most important method for doing this is the method of maximum likelihood
• A related method is the method of least squares
Hypothesis Testing
• Example
• Properties of some selected events
• Hypothesis H is that these are top quark events
• Working in x-space is hard so usually one constructs a test statistic t instead whose value reflects the compatibility between the data vector x and H
• Low t – data more compatible with H
• High t – data less compatible with H
• Since f(x|H) is known, g(t|H) can be determined
Hypothesis Testing
• Notes
• Since p is a function of the random variable (r.v.) x, p is itself a r.v.
• If H is true, p is uniform in [0,1]
• If H is not true, p is peaked closer to 0
Hypothesis Testing
• Suppose we observe nobs=ns+nb events
• ns, nb are Poisson r.v.’s with means νs, νb
• nobs = ns + nb is a Poisson r.v. with mean ν = νs + νb
Hypothesis Testing
• Suppose νb = 0.5 and we observe nobs = 5
• Publish/NY Times headline or not?
• Often we take H to be the null hypothesis – assume it’s a random fluctuation of the background
• Assume νs = 0
• The p-value P(n ≥ 5; νs = 0, νb = 0.5) is the probability of observing 5 or more events resulting from chance fluctuations of the background
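The background-only probability in this example is a Poisson tail sum, which can be sketched in a few lines. The function names are assumptions made for the example; the inputs (5 observed events, expected background of 0.5) are the ones quoted above.

```python
import math

def poisson_pmf(n, mean):
    """P(n; mean) = exp(-mean) * mean^n / n!"""
    return math.exp(-mean) * mean ** n / math.factorial(n)

def poisson_p_value(n_obs, mean):
    """P(n >= n_obs) for a Poisson variable with the given mean."""
    return 1.0 - sum(poisson_pmf(n, mean) for n in range(n_obs))

# Background-only hypothesis: signal mean 0, background mean 0.5.
p = poisson_p_value(5, 0.5)
print(f"p = {p:.2e}")
```

The result is of order 10⁻⁴, so a fluctuation of the background alone to 5 or more events would be rare.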
Hypothesis Testing
• Another problem: instead of counting events, say we measure some variable x
• Publish/NY Times headline or not?
Hypothesis Testing
• Again take H to be the null hypothesis – assume it’s random fluctuation of background
• Assume νs = 0
• Again p is the probability of observing 11 or more events resulting from chance fluctuations of the background
• How did we know where to look / how to bin?
• Is the observed width consistent with the resolution in x?
• Would a slightly different analysis still show a peak?
• What about the fact that the bins on either side of the peak are low?
Least Squares
• Another approach is to compare a histogram with a hypothesis that provides expectation values
• In this case we’d compare a vector of Poisson distributed numbers (the histogram) with their expectation values νi = E[ni]
• This is called Pearson’s χ² statistic: $\chi^2 = \sum_{i=1}^{N} (n_i - \nu_i)^2 / \nu_i$
• If the ni are not too small (e.g. ni > 5) then the observed χ² will follow the chi-square pdf for N dof
• Or more generally for N – number of fitted parameters
• Same will hold true for N independent measurements yi that are Gaussian distributed
Least Squares
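• We can calculate the p-value as $p = \int_{\chi^2}^{\infty} f(z; n_d)\,dz$, where $f(z; n_d)$ is the chi-square pdf for $n_d$ degrees of freedom
• In our example this gives p = 0.073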
Least Squares
• In our example, though, we have many bins with a small number of counts or 0
• We can still use Pearson’s test but we need to determine the pdf f(χ²) by Monte Carlo
• Generate ni from a Poisson distribution with mean νi in each bin
• Compute χ² and record it in a histogram
• Repeat for a large number of times (see next slide)
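The Monte Carlo recipe above can be sketched as follows. The bin expectation values and observed counts are hypothetical (the histogram from the slides is not reproduced in this transcript), and the Poisson generator is a simple textbook method written out because the Python standard library does not provide one.

```python
import math
import random

def poisson_sample(mean, rng):
    """Draw one Poisson-distributed integer (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def pearson_chi2(counts, means):
    """Pearson's chi2: sum over bins of (n_i - nu_i)^2 / nu_i."""
    return sum((n - nu) ** 2 / nu for n, nu in zip(counts, means))

def toy_chi2_values(means, n_toys, seed=12345):
    """Generate toy histograms from the expectation values and return their chi2."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_toys):
        toy = [poisson_sample(nu, rng) for nu in means]
        values.append(pearson_chi2(toy, means))
    return values

def mc_p_value(chi2_obs, toys):
    """Fraction of toy experiments with chi2 at least as large as observed."""
    return sum(1 for c in toys if c >= chi2_obs) / len(toys)

# Hypothetical expectation values and observed counts, for illustration only.
means = [0.3, 0.5, 1.0, 2.0]
observed = [1, 0, 3, 2]
toys = toy_chi2_values(means, n_toys=20000)
p = mc_p_value(pearson_chi2(observed, means), toys)
print(p)
```

The list `toys` plays the role of the histogram of χ² values: the p-value is read off as the fraction of toy experiments at or above the observed χ².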
Least Squares
• Using the modified pdf would give p=0.11 rather than p=0.073
• In either case, we won’t publish
K-S Test
• Usage in ROOT
• TFile *data; // assumed to point to the open data file
• TFile *MC; // assumed to point to the open Monte Carlo file
• TH1F *jet_pt = (TH1F*) data->Get("h_jet_pt");
• TH1F *MCjet_pt = (TH1F*) MC->Get("h_jet_pt");
• Double_t KS = MCjet_pt->KolmogorovTest(jet_pt);
• Notes
• The returned value is the probability of the test
• A value << 1 means the two histograms are not compatible
• The returned value is not the maximum K-S distance, though you can return this with option “M”
• Also available in the Statistics Toolbox in MATLAB
Limiting Cases

Binomial

Poisson

Gaussian

Kaplan-Meier Curve
• A patient is treated for a disease. What is the probability of an individual surviving or remaining disease-free?
• Usually patients will be followed for various lengths of time after treatment
• Some will survive or remain disease-free while others will not. Some will leave the study.
• A nonparametric method can be found using
• Kaplan-Meier curve
• Life table
• Survival curve

Kaplan-Meier Curve
• Calculate a conditional probability
• S(tN) = P(t1) × P(t2) × P(t3) × … × P(tN)
• When no patients leave the study, the survival function S(t) is the complement of the empirical distribution function F(t), i.e. S(t) = 1 − F(t)
• We can write this as

Kaplan-Meier Curve
• The square root of the variance of S(t) can be calculated as
• Assuming the pk follow a Gaussian (normal) distribution, then the 95% CL will be
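The formulas for the curve and its uncertainty are missing from this transcript. The sketch below assumes the standard product-limit form, in which each factor is pk = 1 − dk/nk (dk events among nk patients still at risk at time tk), together with Greenwood's formula for the standard error. The data layout, the function name and the five patient records are hypothetical.

```python
import math

def kaplan_meier(observations):
    """Return [(t_k, S(t_k), standard error)] at each event time.

    observations: list of (time, event) pairs; event is True if the patient
    died or relapsed at `time`, False if the patient left the study then.
    """
    event_times = sorted({t for t, event in observations if event})
    survival = 1.0
    greenwood_sum = 0.0
    curve = []
    for t_k in event_times:
        n_k = sum(1 for t, _ in observations if t >= t_k)        # at risk
        d_k = sum(1 for t, e in observations if e and t == t_k)  # events
        survival *= 1.0 - d_k / n_k    # p_k: survive t_k given at risk
        if n_k > d_k:
            greenwood_sum += d_k / (n_k * (n_k - d_k))
            std_err = survival * math.sqrt(greenwood_sum)
        else:
            # Everyone at risk had the event; S = 0 and Greenwood's formula
            # is undefined, so 0 is reported here as a convention.
            std_err = 0.0
        curve.append((t_k, survival, std_err))
    return curve

# Hypothetical follow-up data: (time in months, event observed?).
patients = [(1, True), (2, True), (3, False), (4, True), (5, False)]
curve = kaplan_meier(patients)
for t, s, err in curve:
    # Gaussian approximation for the 95% interval: S +/- 1.96 * std. error.
    print(t, round(s, 3), round(s - 1.96 * err, 3), round(s + 1.96 * err, 3))
```

With so few patients the Gaussian interval can extend outside [0, 1], which is a reminder that the approximation is only reasonable for larger samples.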

Gaussian Distribution
• Some useful properties of the Gaussian distribution are
• P(x in range μ ± σ) = 0.683
• P(x in range μ ± 2σ) = 0.9545
• P(x in range μ ± 3σ) = 0.9973
• P(x outside range μ ± 3σ) = 0.0027
• P(x outside range μ ± 5σ) = 5.7×10⁻⁷
• P(x in range μ ± 0.6745σ) = 0.5
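These Gaussian probabilities follow from the error function, so they are easy to recompute. A minimal sketch, with function names chosen for this example:

```python
import math

def prob_within(k):
    """P(|x - mu| < k * sigma) for a Gaussian distribution."""
    return math.erf(k / math.sqrt(2.0))

def prob_outside(k):
    """P(|x - mu| > k * sigma) for a Gaussian distribution."""
    return math.erfc(k / math.sqrt(2.0))

for k in (0.6745, 1, 2, 3):
    print(f"P(within {k} sigma) = {prob_within(k):.4f}")
for k in (3, 5):
    print(f"P(outside {k} sigma) = {prob_outside(k):.2e}")
```

Using `erfc` for the tails avoids the loss of precision that 1 − erf would suffer at 5σ.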
Confidence Intervals
• Suppose you have a bag of black and white marbles and wish to determine the fraction f that are white. How confident are you of the initial composition? How does your confidence change after extracting n black marbles?
• Suppose you are tested for a disease. The test is 100% accurate if you have the disease. The test has a 0.2% false-positive rate if you do not. The test comes back positive. What is the probability that you have the disease?
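The disease question cannot be answered from the test properties alone: Bayes' theorem also needs the prior probability of having the disease, which the slide does not give. The sketch below therefore scans a few hypothetical prior values; only the 100% sensitivity and the 0.2% false-positive rate come from the text.

```python
def posterior_disease(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) from Bayes' theorem."""
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive

# Sensitivity (100%) and false-positive rate (0.2%) are from the text.
# The prior (prevalence) values are hypothetical, for illustration only.
for prior in (0.001, 0.01, 0.1):
    post = posterior_disease(prior, 1.0, 0.002)
    print(f"prior = {prior}: P(disease | positive) = {post:.3f}")
```

For a rare disease the posterior probability stays well below 1 even with such an accurate test, because false positives from the large healthy population compete with the true positives.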
Confidence Intervals
• Suppose you are searching for the Higgs and have a well-known expected background of 3 events. What 90% confidence limit can you set on the Higgs cross section
• if you observe 0 events?
• if you observe 3 events?
• if you observe 10 events?
• The ability to set confidence limits (or claim discovery) is an important part of frontier physics
• How to do this the “correct” way is somewhat/very controversial
Confidence Intervals
• Questions
• What is the mass of the top quark?
• What is the mass of the tau neutrino?
• What is the mass of the Higgs?
• Mt = 172.5 ± 2.3 GeV
• Mν < 18.2 MeV
• MH > 114.3 GeV
• Mt = 172.5 ± 2.3 GeV with CL = 0.683
• 0 < Mν < 18.2 MeV with CL = 0.95
• ∞ > MH > 114.3 GeV with CL = 0.95
Confidence Interval
• A confidence interval reflects the statistical precision of the experiment and quantifies the reliability of a measurement
• For a sufficiently large data sample, the mean and standard deviation of the mean provide a good interval
• What if the pdf isn’t Gaussian?
• What if there are physical boundaries?
• What if the data sample is small?
• Here we run into problems
Confidence Interval
• A dog has a 50% probability of being within 100 m of its master
• You observe the dog, what can you say about its master?
• With 50% probability, the master is within 100m of the dog
• But this assumes
• The master can be anywhere around the dog
• The dog has no preferred direction of travel
Confidence Intervals
• Neyman’s construction
• Consider a pdf f(x;θ) = P(x|θ)
• For each value of θ, we construct a horizontal line segment [x1,x2] such that P(x ∈ [x1,x2]|θ) = 1-α
• The union of such intervals for all values of θ is called the confidence belt
Confidence Intervals
• Neyman’s construction
• After performing an experiment to measure x, a vertical line is drawn through the experimentally measured value x0
• The confidence interval for θ is the set of all values of θ for which the corresponding line segment [x1,x2] is intersected by the vertical line
Confidence Interval
• Notes
• The coverage condition is not unique
• P(x<x1|θ) = P(x>x2|θ) = α/2
• Called central confidence intervals
• P(x<x1|θ) = α
• Called upper confidence limits
• P(x>x2|θ) = α
• Called lower confidence limits
Poisson Confidence Interval
• We previously mentioned that the number of events produced in a reaction with cross section σ and fixed luminosity L follows a Poisson distribution with mean ν = σ∫L dt
• $P(n;\nu) = e^{-\nu}\,\nu^{n} / n!$
• If the variables are discrete, by convention one constructs the confidence belt by requiring P(x1<x<x2|θ) ≥ 1-α
• Example: Measuring the Higgs production cross section assuming no background
Poisson Confidence Interval

Poisson Distribution

Poisson Confidence Interval
• Assume signal s and background b
Confidence Intervals
• Sometimes, though, confidence intervals
• Are empty
• Reduce in size when the background estimate increases
• Are smaller for a poorer experiment
• Exclude parameters for which the experiment is insensitive
• Example
• We know that P(x=0|ν=2.3) = 0.1
• ν < 2.3 @ 90% CL
• If the number of background events b is 3, then since ν = s + b, number of signal events s < -0.7 at 90% CL?
Confidence Interval
• Experiment X uses a fit to extract the neutrino mass
• Mν = -4 ± 2 eV
• => P(Mν < 0 eV) = 0.98?
Confidence Interval
• What is probability?
• Frequentist approach
• Developed by Venn, Fisher, Neyman, von Mises
• The relative frequency with which something happens
• number of successes / number of trials
• Venn limit (n trials to infinity)
• Assumes success appeared in the past and will occur in the future with the same probability
• It will rain tomorrow in Tucson and P(S) = 0.01
• The relative frequency it rains on Mondays in April is 0.01
Confidence Interval
• What is probability?
• Bayesian approach
• Developed by Bayes, Laplace, Gauss, Jeffreys, de Finetti
• The degree of belief or confidence of a statement or measurement
• Closer to what is used in everyday life
• Is the Standard Model correct?
• Similar to betting odds
• Not “scientific”?
• It will rain tomorrow in Tucson and P(S) = 0.01
• The plausibility of the above statement is 0.01 (i.e. the same as if I were to draw a white ball out of a container of 100 balls, 1 of which is white)
Confidence Interval
• Usually
• Confidence interval == frequentist confidence interval
• Credible interval == Bayesian posterior probability interval
• But you’ll also hear Bayesian confidence interval
• Probability
• P = 1 – α
• α = 0.05 => P = 95%
Confidence Interval
• Suppose you wish to determine a parameter θ whose true value θt is unknown
• Assume we make a single measurement of an observable x whose pdf P(x|θ) depends on θ
• Recall this is the probability of obtaining x given θ
• Say we measure x0, then we obtain P(x0|θ)
• Frequentist
• Bayesian
• P(θt|x0) = P(x0|θt) P(θt) / P(x0)
• We’ll stick with the frequentist approach for the moment
Confidence Interval
• (Frequentist) confidence intervals are constructed to include the true value of the parameter (θt) with a probability of 1-α
• In fact this is true for any value of θ
• A confidence interval [θ1,θ2] is a member of a set, such that the set has the property that P(θ ∈ [θ1,θ2]) = 1-α
• Perform an ensemble of experiments with fixed θ
• The interval [θ1,θ2] will vary and cover the fixed value θ in a fraction of 1-α of the experiments
• Presumably when we make a measurement we are selecting it at random from the ensemble that contains the true value of θ, θt
• Note we haven’t said anything about the probability of θt being in the interval [θ1,θ2] as a Bayesian would
Confidence Interval
• If P(θ ∈ [θ1,θ2]) = 1-α is true we say the intervals “cover” θ at the stated confidence
• If there are values of θ for which P(θ ∈ [θ1,θ2]) < 1-α we say the intervals “undercover” for that θ
• If there are values of θ for which P(θ ∈ [θ1,θ2]) > 1-α we say the intervals “overcover” for that θ
• Overcoverage is conservative
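Coverage can be checked numerically for a simple case. The sketch below uses 90% CL upper limits for a background-free Poisson counting experiment, found by bisection, and sums the Poisson probabilities of all outcomes whose interval [0, upper limit] contains a chosen true mean. The function names and the scanned true means are assumptions made for the example.

```python
import math

def poisson_cdf(n, mean):
    """P(k <= n) for a Poisson variable with the given mean."""
    term = total = math.exp(-mean)
    for k in range(1, n + 1):
        term *= mean / k
        total += term
    return total

def poisson_upper_limit(n_obs, cl=0.90):
    """Solve P(k <= n_obs | nu) = 1 - cl for nu by bisection."""
    alpha = 1.0 - cl
    lo, hi = 0.0, 10.0
    while poisson_cdf(n_obs, hi) > alpha:   # expand until the root is bracketed
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def coverage(true_mean, cl=0.90, n_max=100):
    """Probability that the interval [0, upper limit] contains true_mean."""
    covered = 0.0
    pmf = math.exp(-true_mean)              # P(n = 0)
    for n in range(n_max):
        if poisson_upper_limit(n, cl) >= true_mean:
            covered += pmf
        pmf *= true_mean / (n + 1)          # P(n + 1) from P(n)
    return covered

# Hypothetical true means, for illustration only.
for nu in (1.0, 3.0, 5.0):
    print(nu, round(coverage(nu), 4))
```

Because the observed count is discrete, the coverage in this sketch comes out above the nominal 90% for each of the scanned means, which is the overcoverage described above.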
Confidence Intervals
• These confidence intervals have a confidence level = 1-α
• By construction, P(θ ∈ [θ1,θ2]) ≥ 1-α is satisfied for all θ including θt
• Another method is to consider a test of the hypothesis that the parameter’s true value is θ
• If the variables are discrete, by convention one constructs the confidence belt by requiring P(x1<x<x2|θ) ≥ 1-α
Examples
• Data consisting of a single random variable x that follows a Gaussian distribution
• Counting experiments