Probability and Statistics


  1. Probability and Statistics • What is probability? • What is statistics?

  2. Probability and Statistics • Probability • Formally defined using a set of axioms • Seeks to determine the likelihood that a given event, observation, or measurement will happen or has happened • What is the probability of throwing a 7 using two dice? • Statistics • Used to analyze the frequency of past events • Uses a given sample of data to assess a probabilistic model’s validity or determine the values of its parameters • After observing several throws of two dice, can I determine whether or not they are loaded? • Also depends on what we mean by probability
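
The two-dice question on this slide can be answered by counting outcomes. The sketch below assumes two fair, independent six-sided dice; the function name `prob_of_sum` is illustrative and not part of the original slides.

```python
from fractions import Fraction
from itertools import product


def prob_of_sum(target, sides=6):
    """Exact probability that two fair dice with `sides` faces sum to `target`."""
    # Enumerate every equally likely ordered pair (first die, second die).
    outcomes = list(product(range(1, sides + 1), repeat=2))
    favorable = sum(1 for a, b in outcomes if a + b == target)
    return Fraction(favorable, len(outcomes))


p_seven = prob_of_sum(7)
print(p_seven)  # 1/6
```

Six of the 36 ordered pairs sum to 7, so the probability is 1/6. The statistics question (are the dice loaded?) runs the other way: it starts from observed throws and asks whether this model is plausible.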

  3. Probability and Statistics • We perform an experiment to collect a number of top quarks • How do we extract the best value for the top quark mass? • What is the uncertainty of our best value? • Is our experiment internally consistent? • Is this value consistent with a given theory, which itself may contain uncertainties? • Is this value consistent with other measurements of the top quark mass?

  4. Probability and Statistics • CDF “discovery” announced 4/11/2011

  5. Probability and Statistics

  6. Probability and Statistics • Pentaquark search - how can this occur? • 2003: 6.8σ effect • 2005: no effect

  7. Probability • Let the sample space S be the space of all possible outcomes of an experiment • Let x be a possible outcome • Then P(x found in [x,x+dx]) = f(x)dx • f(x) is called the probability density function (pdf) • It may be written f(x;θ) since the pdf could depend on one or more parameters θ • Often we will want to determine θ from a set of measurements • Of course x must be somewhere, so ∫ f(x) dx = 1 when integrated over S

  8. Probability • Definitions of mean and variance are given in terms of expectation values
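
The formulas on this slide did not survive in the transcript. The standard definitions for a pdf f(x) on sample space S are sketched below; the symbols μ and σ² are the conventional ones and are assumed here rather than taken from the slide.

```latex
E[x] = \int_S x \, f(x) \, dx = \mu ,
\qquad
V[x] = E\big[(x-\mu)^2\big] = \int_S (x-\mu)^2 f(x) \, dx = E[x^2] - \mu^2 = \sigma^2
```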

  9. Probability • Definitions of covariance and correlation coefficient
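
The formulas on this slide are also missing from the transcript. For two random variables x and y with means μ_x, μ_y and standard deviations σ_x, σ_y (conventional notation, assumed here), the standard definitions are:

```latex
\mathrm{cov}[x,y] = E\big[(x-\mu_x)(y-\mu_y)\big] = E[xy] - \mu_x \mu_y ,
\qquad
\rho_{xy} = \frac{\mathrm{cov}[x,y]}{\sigma_x \, \sigma_y} ,
\quad -1 \le \rho_{xy} \le 1
```

Independent variables have zero covariance, although zero covariance does not by itself imply independence.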

  10. Probability • Error propagation

  11. Probability • This gives the familiar error propagation formulas for sums (or differences) and products (or ratios)
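
A minimal sketch of the familiar formulas, assuming x and y are uncorrelated and the uncertainties are small enough for first-order propagation to hold. Absolute errors add in quadrature for sums and differences; relative errors add in quadrature for products and ratios. The function names are illustrative only.

```python
import math


def sigma_sum(sx, sy):
    """Uncertainty on x + y or x - y for uncorrelated x and y."""
    return math.hypot(sx, sy)


def sigma_product(x, sx, y, sy):
    """First-order uncertainty on x * y for uncorrelated, nonzero x and y."""
    return abs(x * y) * math.hypot(sx / x, sy / y)


def sigma_ratio(x, sx, y, sy):
    """First-order uncertainty on x / y for uncorrelated, nonzero x and y."""
    return abs(x / y) * math.hypot(sx / x, sy / y)


print(sigma_sum(3.0, 4.0))                 # 5.0
print(sigma_product(10.0, 0.3, 20.0, 0.8))  # about 10, i.e. 5% of 200
```

If x and y are correlated, a covariance term must be added, and these expressions no longer apply as written.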

  12. Uniform Distribution • Let • What is the position resolution of a silicon or multiwire proportional chamber with detection elements of spacing x?
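
A sketch of the usual answer, assuming the true hit position is uniformly distributed across one detection element and the readout only records which element fired. A uniform pdf of width w has standard deviation w/√12. The 50 µm pitch is a hypothetical value chosen for illustration.

```python
import math


def uniform_sigma(width):
    """Standard deviation of a uniform distribution of the given width."""
    return width / math.sqrt(12.0)


def uniform_sigma_numeric(width, steps=100_000):
    """Cross-check: integrate x^2 f(x) over [-width/2, width/2] by the midpoint rule."""
    dx = width / steps
    variance = 0.0
    for i in range(steps):
        x = -width / 2.0 + (i + 0.5) * dx
        variance += x * x * (1.0 / width) * dx  # f(x) = 1/width, mean is 0
    return math.sqrt(variance)


pitch_um = 50.0  # hypothetical strip pitch in micrometres
print(uniform_sigma(pitch_um))          # about 14.4
print(uniform_sigma_numeric(pitch_um))  # agrees with the analytic value
```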

  13. Binomial Distribution • Consider N independent experiments (Bernoulli trials) • Let the outcome of each be pass or fail • Let the probability of pass = p

  14. Permutations • Quick review

  15. Binomial Distribution • For the mean and variance we obtain (using small tricks) • And note with the binomial theorem that

  16. Binomial Distribution • Binomial pdf
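
The formula itself is missing from the transcript. The standard binomial probability of n passes in N trials is f(n; N, p) = N! / (n! (N-n)!) · pⁿ (1-p)^(N-n), with mean Np and variance Np(1-p). The sketch below checks these numerically; the function names and the example values N = 10, p = 0.3 are illustrative.

```python
from math import comb, isclose


def binomial_pmf(n, N, p):
    """Probability of exactly n passes in N independent trials with pass probability p."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)


def binomial_moments(N, p):
    """Return (normalization, mean, variance) computed directly from the pmf."""
    probs = [binomial_pmf(n, N, p) for n in range(N + 1)]
    total = sum(probs)
    mean = sum(n * q for n, q in enumerate(probs))
    variance = sum((n - mean) ** 2 * q for n, q in enumerate(probs))
    return total, mean, variance


total, mean, variance = binomial_moments(10, 0.3)
print(total, mean, variance)  # close to 1, N*p = 3, N*p*(1-p) = 2.1
```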

  17. Binomial Distribution • Examples • Coin flip (p=1/2) • Dice throw (p=1/6) • Branching ratio of nuclear and particle decays (p=Br) • Detector or trigger efficiencies (pass or not pass) • Blood group B or not blood group B

  18. Binomial Distribution • It’s baseball season! What is the probability of a 0.300 hitter getting 4 hits in one game?
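
One way to attack the baseball question is to treat each at-bat as an independent Bernoulli trial with p = 0.300. The slide does not say how many at-bats the hitter gets, so the values 4 and 5 below are assumptions, and `prob_hits` is an illustrative name.

```python
from math import comb


def prob_hits(hits, at_bats, average):
    """Binomial probability of exactly `hits` hits in `at_bats` independent at-bats."""
    return comb(at_bats, hits) * average**hits * (1 - average) ** (at_bats - hits)


print(prob_hits(4, 4, 0.300))  # about 0.0081 with exactly 4 at-bats
print(prob_hits(4, 5, 0.300))  # about 0.028 with 5 at-bats
```

Under the 4 at-bat assumption the answer is 0.3⁴, a bit under 1%. Real at-bats are not strictly independent or identically distributed, so this is only a first estimate.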

  19. Poisson Distribution • Consider when

  20. Poisson Distribution

  21. Poisson Distribution • Poisson pdf
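
The formula is missing from the transcript. The standard Poisson probability is f(n; ν) = νⁿ e^(-ν) / n!, obtained from the binomial in the limit N → ∞, p → 0 with ν = Np held fixed. The symbol ν and the function names are assumptions made for this sketch, which checks the limit numerically.

```python
from math import comb, exp, factorial


def poisson_pmf(n, nu):
    """Poisson probability of observing n events when nu are expected."""
    return nu**n * exp(-nu) / factorial(n)


def binomial_pmf(n, N, p):
    """Binomial probability of n passes in N trials with pass probability p."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)


def max_abs_difference(N, nu, n_max=10):
    """Largest pmf difference between binomial(N, nu/N) and Poisson(nu) for n <= n_max."""
    p = nu / N
    return max(
        abs(binomial_pmf(n, N, p) - poisson_pmf(n, nu)) for n in range(n_max + 1)
    )


for N in (100, 1000, 10000):
    print(N, max_abs_difference(N, 2.0))  # the difference shrinks as N grows
```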

  22. Poisson Distribution • Examples • Particles detected from radioactive decays • Sum of two Poisson processes is a Poisson process • Particles detected from scattering of a beam on target with cross section σ • Cosmic rays observed in a time interval t • Number of entries in a histogram bin when data is accumulated over a fixed time interval • Number of Prussian soldiers kicked to death by horses • Infant mortality • QC/failure rate predictions

  23. Poisson Distribution • Let

  24. Gaussian Distribution • Gaussian distribution • Important because of the central limit theorem • For N independent variables x1, x2, …, xN distributed according to any pdf with finite variance, the sum y=∑xi has a pdf that approaches a Gaussian for large N • Examples are almost any measurement error (energy resolution, position resolution, …)

  25. Gaussian Distribution • The familiar Gaussian pdf is

  26. Gaussian Distribution • Some useful properties of the Gaussian distribution are • P(x in range μ±σ) = 0.683 • P(x in range μ±2σ) = 0.9545 • P(x in range μ±3σ) = 0.9973 • P(x outside range μ±3σ) = 0.0027 • P(x outside range μ±5σ) = 5.7×10⁻⁷ • P(x in range μ±0.6745σ) = 0.5
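
These coverage probabilities follow from the error function: for a Gaussian, P(|x - μ| < kσ) = erf(k/√2). A short sketch using only the Python standard library (the function names are illustrative):

```python
from math import erf, erfc, sqrt


def prob_within(k):
    """P(|x - mu| < k*sigma) for a Gaussian-distributed x."""
    return erf(k / sqrt(2.0))


def prob_outside(k):
    """P(|x - mu| > k*sigma); erfc avoids cancellation for large k."""
    return erfc(k / sqrt(2.0))


for k in (1, 2, 3, 5, 0.6745):
    print(k, prob_within(k), prob_outside(k))
```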

  27. χ² Distribution • Chi-square distribution

  28. χ² Distribution

  29. Probability

  30. Probability • Probability can be defined in terms of Kolmogorov axioms • The probability is a real-valued function defined on subsets A,B,… in sample space S • This means the probability is a measure in which the measure of the entire sample space is 1

  31. Probability • We further define the conditional probability P(A|B), read "the probability of A given B" • Bayes’ theorem

  32. Probability • For disjoint Ai • Usually one treats the Ai as outcomes of a repeatable experiment
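
The equations for the conditional-probability and disjoint-A_i slides are missing from the transcript. The standard statements are sketched below, assuming the disjoint A_i together cover the sample space S:

```latex
P(A \mid B) = \frac{P(A \cap B)}{P(B)} ,
\qquad
P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)} ,
\qquad
P(B) = \sum_i P(B \mid A_i) \, P(A_i)
```

The middle expression is Bayes' theorem; substituting the last expression into its denominator gives the form used in the worked examples.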

  33. Probability • Usually one treats the Ai as outcomes of a repeatable experiment • Then P(A) is usually assigned a value equal to the limiting frequency of occurrence of A • Called frequentist statistics • But Ai could also be interpreted as hypotheses, each of which is true or false • Then P(A) represents the degree of belief that hypothesis A is true • Called Bayesian statistics

  34. Bayes’ Theorem • Suppose in the general population • P(disease) = 0.001 • P(no disease) = 0.999 • Suppose there is a test to check for the disease • P(+ | disease) = 0.98 • P(- | disease) = 0.02 • But also • P(+ | no disease) = 0.03 • P(- | no disease) = 0.97 • You are tested for the disease and it comes back +. Should you be worried?

  35. Bayes’ Theorem • Apply Bayes’ theorem • 3.2% of people testing positive have the disease • Your degree of belief about having the disease is 3.2%
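
The 3.2% figure can be reproduced step by step, reading the test probabilities on the previous slide as conditional on disease status:

```latex
P(\text{disease} \mid +)
= \frac{P(+ \mid \text{disease}) \, P(\text{disease})}
       {P(+ \mid \text{disease}) \, P(\text{disease}) + P(+ \mid \text{no disease}) \, P(\text{no disease})}
= \frac{0.98 \times 0.001}{0.98 \times 0.001 + 0.03 \times 0.999}
= \frac{0.00098}{0.03095}
\approx 0.032
```

The false positives from the large healthy population (0.02997) swamp the true positives (0.00098), which is why the posterior stays small despite a 98% sensitive test.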

  36. Bayes’ Theorem • Is athlete A guilty of drug doping? • Assume a population of athletes in this sport • P(drug) = 0.005 • P(no drug) = 0.995 • Suppose there is a test to check for the drug • P(+ | drug) = 0.99 • P(- | drug) = 0.01 • But also • P(+ | no drug) = 0.004 • P(- | no drug) = 0.996 • The athlete tests positive. Is he/she involved in drug doping?

  37. Bayes’ Theorem • Apply Bayes’ theorem • ???
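
The slide leaves the answer open. One way to work it out is a small helper for the two-hypothesis form of Bayes' theorem, sketched below. The function name `posterior` and its argument names are illustrative, and the test probabilities are read as conditional on drug use.

```python
def posterior(prior, p_pos_given_true, p_pos_given_false):
    """P(hypothesis | positive test) from the prior and the two positive-test rates."""
    true_positives = p_pos_given_true * prior
    false_positives = p_pos_given_false * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Athlete example: P(drug) = 0.005, P(+|drug) = 0.99, P(+|no drug) = 0.004
p_doping = posterior(0.005, 0.99, 0.004)
print(p_doping)  # about 0.55
```

With these inputs the probability that a positive-testing athlete is doping comes out near 55%, so a single positive test is far from conclusive. The same helper gives about 0.032 for the disease example.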

  38. Binomial Distribution • Calculating efficiencies • Usually use ε instead of p

  39. Binomial Distribution • But there is a problem • If n=0, δ(ε’) = 0 • If n=N, δ(ε’) = 0 • Actually we went wrong in assuming the best estimate for ε is n/N • We should really have used the most probable value of ε given n and N • A proper treatment uses Bayes’ theorem but lucky for us (in HEP) the solution is implemented in ROOT • h_num->Sumw2() • h_den->Sumw2() • h_eff->Divide(h_num,h_den,1.0,1.0,"B")
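
The problem at the edges can be seen numerically. The sketch below compares the naive binomial error with one possible Bayesian treatment that assumes a flat prior on ε between 0 and 1, which makes the posterior a Beta(n+1, N-n+1) distribution. The flat prior and the function names are assumptions for illustration; this is not a description of what the ROOT call computes.

```python
from math import sqrt


def binomial_efficiency(n, N):
    """Naive estimate eff = n/N with error sqrt(eff * (1 - eff) / N)."""
    eff = n / N
    return eff, sqrt(eff * (1.0 - eff) / N)


def bayes_flat_prior_efficiency(n, N):
    """Mode, mean and standard deviation of the Beta(n+1, N-n+1) posterior."""
    mode = n / N
    mean = (n + 1) / (N + 2)
    variance = (n + 1) * (N - n + 1) / ((N + 2) ** 2 * (N + 3))
    return mode, mean, sqrt(variance)


for n in (0, 5, 10):
    print(n, binomial_efficiency(n, 10), bayes_flat_prior_efficiency(n, 10))
```

For n = 0 and n = N the naive error is exactly zero, while the posterior width stays finite, which is the behavior the slide asks for.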
