Statistical Data Analysis and Simulation

Statistical Data Analysis and Simulation. João R. T. de Mello Neto. Jorge Andre Swieca School, Campos do Jordão, January 2003. Questions. What is probability? How does one quantify it? What is the probability that something happens? What is the value of a given parameter?


Presentation Transcript


  1. Statistical Data Analysis and Simulation. João R. T. de Mello Neto. Jorge Andre Swieca School, Campos do Jordão, January 2003

  2. Questions • What is probability? How does one quantify it? • What is the probability that something happens? • What is the value of a given parameter? • What is the uncertainty in a given parameter? • Is this fit acceptable? • What is the likelihood that a given signal is physics and not background? • How does one separate signal from background?

  3. Chance. "The conception of chance enters into the very first steps of scientific activity, in virtue of the fact that no observation is absolutely correct." Max Born, Natural Philosophy of Cause and Chance, p. 47. "Chance is a devil and a god at the same time." Machado de Assis

  4. Lectures • Basics: random variables, probability, distributions • Random numbers, minimization techniques • Maximum likelihood and chi-square methods • Goodness of fit, limits • Applications: pattern recognition in the LHCb muon system, sigma particle fitting in E791, Bayesian coin, …

  5. First lecture. Basics: random variables, probabilities and distributions. Jorge Andre Swieca School, Campos do Jordão, January 2003

  6. References • G. Cowan, Statistical Data Analysis, Oxford, 1998 • R. Barlow, Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, J. Wiley & Sons, 1989 • W. L. Martinez and A. R. Martinez, Computational Statistics Handbook with MATLAB, Chapman & Hall, 2002

  7. Random Variables • Random experiment: the outcome cannot be predicted with certainty • Sources of randomness: errors in the measuring process, fundamental unpredictability • Statistics: model and analyze the outcomes • Sample space S = set of all possible outcomes • Die: X = {1, 2, 3, 4, 5, 6} (discrete random variable) • Period of a pendulum (continuous random variable)

  8. Probability • Quantifies the degree of randomness • Definition in terms of set theory: the sample space S has subsets A, B, … (events) • P(A) is a real number that satisfies three axioms: (1) for every A, P(A) ≥ 0; (2) if A∩B = Ø (disjoint), P(A∪B) = P(A) + P(B); (3) P(S) = 1 • Consequences: P(Ā) = 1 − P(A); P(A∪Ā) = 1; P(Ø) = 0; 0 ≤ P(A) ≤ 1; if A ⊂ B, P(A) ≤ P(B); P(A∪B) = P(A) + P(B) − P(A∩B)
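
The consequences of the axioms can be checked on a small sample space. The sketch below assumes a fair die (equally likely outcomes); the events A and B are hypothetical choices made only for illustration.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # sample space of a fair die (assumed equally likely)


def prob(event):
    """P(event) = favourable outcomes / all outcomes."""
    return Fraction(len(event & S), len(S))


A = frozenset({1, 2})     # hypothetical event: "1 or 2"
B = frozenset({2, 4, 6})  # hypothetical event: "even number"

p_union = prob(A | B)                             # P(A∪B)
p_incl_excl = prob(A) + prob(B) - prob(A & B)     # P(A) + P(B) − P(A∩B)
p_complement = prob(S - A)                        # P(Ā)

print(p_union, p_incl_excl, p_complement)
```

Exact fractions are used so that the identities hold without rounding error.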

  9. Conditional probability (intuitive approach). P(A|B): probability of event A given B. [Figure: Venn diagram of a sample space S with 10 events and two overlapping sets A and B] P(A|B) = P(A∩B) / P(B) = (events in A and B / total) / (events in B / total) = (events in A and B) / (events in B) = 2/3. Likewise P(B|A) = P(B∩A) / P(A), so P(A∩B) = P(B|A) P(A) = 2/3 × 3/10 = 2/10 = P(A|B) P(B)
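
The counting argument can be reproduced with sets. This is a minimal sketch: the event labels are arbitrary, and the sizes (10 events in S, 3 in A, 3 in B, 2 shared) are inferred from the numbers quoted on the slide, since the diagram itself is not available.

```python
from fractions import Fraction

S = set(range(10))  # 10 equally likely events (labels are arbitrary)
A = {0, 1, 2}       # 3 events in A
B = {1, 2, 3}       # 3 events in B, 2 of them shared with A (inferred)


def prob(event):
    return Fraction(len(event), len(S))


def cond(x, given):
    """P(x | given) = P(x ∩ given) / P(given)."""
    return prob(x & given) / prob(given)


p_a_given_b = cond(A, B)
p_b_given_a = cond(B, A)
p_joint = prob(A & B)

print(p_a_given_b, p_b_given_a, p_joint)
```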

  10. Independent probabilities (intuitive approach). A and B are independent if P(A|B) = P(A). [Figure: two Venn diagrams of S with sets A and B, one labelled "not independent", the other "independent"]

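  11. Bayes theorem. From P(A∩B) = P(A|B) P(B) = P(B|A) P(A): $P(A|B) = P(B|A)\,P(A) / P(B)$. If S is divided into disjoint subsets $A_i$, then $P(B) = \sum_i P(B|A_i)\,P(A_i)$ and $P(A|B) = P(B|A)\,P(A) / \sum_i P(B|A_i)\,P(A_i)$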

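  12. Example: a beam of 90% π and 10% K passes through a Cherenkov counter that signals π with 95% efficiency and gives 6% false signals for K. Bayes theorem gives P(π|signal) = 99.3%, P(K|signal) = 0.7%, P(K|no signal) = 67.6%, P(π|no signal) = 32.4%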
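
The four percentages can be reproduced with Bayes' theorem. The sketch below assumes the reading that the counter fires for 95% of pions and for 6% of kaons; the function and dictionary names are hypothetical.

```python
def posterior(prior, likelihood):
    """Bayes' theorem over disjoint hypotheses (dicts keyed by hypothesis)."""
    evidence = sum(prior[h] * likelihood[h] for h in prior)
    return {h: prior[h] * likelihood[h] / evidence for h in prior}


prior = {"pi": 0.90, "K": 0.10}     # beam composition
p_signal = {"pi": 0.95, "K": 0.06}  # assumed: P(signal | particle)
p_no_signal = {h: 1.0 - p for h, p in p_signal.items()}

given_signal = posterior(prior, p_signal)
given_no_signal = posterior(prior, p_no_signal)

for h in prior:
    print(h, f"{given_signal[h]:.1%}", f"{given_no_signal[h]:.1%}")
```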

  13. AIDS positive. "About 0.01 percent of men with no known risk behaviour are infected with HIV (base rate). If such a man has the virus, there is a 99.9 percent chance that the test result will be positive (sensitivity). If a man is not infected, there is a 99.99 percent chance that the test result will be negative (specificity)." What is the chance that a man who tests positive actually has the virus? P(HIV|positive) = 0.5. G. Gigerenzer, Reckoning with Risk, 2002

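  14. AIDS positive: natural frequencies (men with no known risk behaviour). Of 10000 men, 1 has HIV and 9999 do not. HIV: 1 positive, 0 negative. No HIV: 1 positive, 9998 negative. So 1 of the 2 positive results comes from an infected man. Many examples: mammography screening, 1 out of 10 positives! Gigerenzer, 2002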
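
The same answer follows from Bayes' theorem applied directly to the quoted rates. This is a minimal sketch; the variable names are hypothetical.

```python
base_rate = 0.0001    # 0.01 percent of men are infected
sensitivity = 0.999   # P(positive | HIV)
specificity = 0.9999  # P(negative | no HIV)

p_pos_and_hiv = base_rate * sensitivity
p_pos_and_healthy = (1.0 - base_rate) * (1.0 - specificity)

# P(HIV | positive) = P(positive and HIV) / P(positive)
p_hiv_given_pos = p_pos_and_hiv / (p_pos_and_hiv + p_pos_and_healthy)

print(f"P(HIV | positive) = {p_hiv_given_pos:.3f}")
```

Although the test is very accurate, the false positives from the large uninfected group are about as numerous as the true positives from the small infected group.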

  15. Probability: what is the meaning of P(A)? Frequentist: limit of relative frequencies. S: possible outcomes of a (repeatable) experiment. A: occurrence of a given outcome (event). P(A) = lim (n→∞) [number of occurrences of A in n measurements] / n • consistent with the probability axioms • usual interpretation in standard textbooks • appropriate to particle physics (many repeatable events) • more problematic for unique phenomena: the big bang, rain tomorrow
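
The frequentist limit can be illustrated with a simulated repeatable experiment. The sketch below assumes a fair die and takes "a six is rolled" as the event A; the seed and sample sizes are arbitrary choices.

```python
import random


def relative_frequency(n, seed=12345):
    """Fraction of n simulated fair-die rolls that show a six."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(1 for _ in range(n) if rng.randint(1, 6) == 6)
    return hits / n


for n in (100, 10_000, 200_000):
    print(n, relative_frequency(n))
```

As n grows, the printed values should settle near 1/6, with fluctuations that shrink roughly like 1/√n.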

  16. Probability: Bayesian (subjective). Elements of S: hypotheses or propositions (true or false). P(A) = degree of belief that hypothesis A is true. Subjective probabilities include the frequentist interpretation: take as hypothesis "a measurement will yield a given outcome a certain fraction of the time". A statement such as "m1 ≤ m_e ≤ m2 with P = 95%" is a Bayesian interpretation! Bayesian statistics: interpretation of Bayes theorem

  17. Probability. A: a given theory is correct; B: data will yield a particular result. P(theory|data) = P(data|theory) P(theory) / P(data), where P(data|theory) is the likelihood, P(theory) the a priori probability and P(theory|data) the a posteriori probability

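  18. Distributions. x: continuous random variable. f(x): probability density function (p.d.f.); probability to observe x in the interval [x, x + dx] = f(x) dx, with $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Cumulative distribution function: $F(x) = \int_{-\infty}^{x} f(x')\,dx'$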

  19. Distributions: joint p.d.f. f(x, y). P(A∩B) = probability of x in [x, x + dx] and y in [y, y + dy] = f(x, y) dx dy

  20. Distributions

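  21. Distributions • expectation value: $E[x] = \int x\,f(x)\,dx = \mu$ • population variance: $V[x] = E[(x-\mu)^2] = E[x^2] - \mu^2 = \sigma^2$ • covariance: $\mathrm{cov}[x,y] = E[(x-\mu_x)(y-\mu_y)] = E[xy] - \mu_x\mu_y$ • correlation coefficient: $\rho_{xy} = \mathrm{cov}[x,y] / (\sigma_x\sigma_y)$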
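
These quantities can be computed exactly for a small discrete example, with sums taking the place of the integrals used for a p.d.f. The sketch below assumes two fair dice and takes x as the first die and s as the sum of both; this example is not from the slides.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

outcomes = list(product(range(1, 7), repeat=2))  # all (die1, die2) pairs
p = Fraction(1, len(outcomes))                   # each pair equally likely


def expect(g):
    """Expectation value E[g] over the joint distribution."""
    return sum(p * g(d1, d2) for d1, d2 in outcomes)


def x(d1, d2):
    return d1       # first die


def s(d1, d2):
    return d1 + d2  # sum of both dice


mu_x, mu_s = expect(x), expect(s)
var_x = expect(lambda a, b: (x(a, b) - mu_x) ** 2)
var_s = expect(lambda a, b: (s(a, b) - mu_s) ** 2)
cov_xs = expect(lambda a, b: (x(a, b) - mu_x) * (s(a, b) - mu_s))
rho = float(cov_xs) / sqrt(float(var_x) * float(var_s))

print(mu_x, var_x, cov_xs, rho)
```

Because s contains x plus an independent term, the covariance equals V[x] and the correlation coefficient is 1/√2.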

  22. Distributions

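  23. Binomial • process with a given number N of identical trials, each with two possible outcomes: success (probability p) or failure (probability 1 − p) • what is the probability of n successes (and N − n failures)? • probability for a particular sequence: $p^n (1-p)^{N-n}$ • order does not matter; number of sequences: $N! / [n!\,(N-n)!]$ • $f(n; N, p) = \frac{N!}{n!\,(N-n)!}\,p^n (1-p)^{N-n}$ (a probability, not a probability density)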

  24. binomial

  25. binomial

  26. Binomial: chambers C1 to C5, individual efficiency 0.95; a track requires at least 3 points. 3 chambers: f(3;3,0.95) = 0.95³ = 0.857. 4 chambers: f(3;4,0.95) + f(4;4,0.95) = 0.171 + 0.815 = 0.986. 5 chambers: f(3;5,0.95) + f(4;5,0.95) + f(5;5,0.95) = 0.021 + 0.204 + 0.774 = 0.999
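
The chamber efficiencies can be recomputed from the binomial probability f(n; N, p). This is a minimal sketch; the function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binom_pmf(n, N, p):
    """f(n; N, p): probability of n successes in N trials."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)


def track_efficiency(n_chambers, eff=0.95, min_hits=3):
    """Probability that at least min_hits of n_chambers chambers fire."""
    return sum(binom_pmf(n, n_chambers, eff)
               for n in range(min_hits, n_chambers + 1))


for n_chambers in (3, 4, 5):
    print(n_chambers, round(track_efficiency(n_chambers), 3))
```

Adding redundant chambers raises the track efficiency well above the single-chamber value, which is the point of the slide.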

  27. Poisson • limit of the binomial: N large, p very small, Np → ν • particular events are observed, but there is no idea of the number of trials • sharp events occurring in a continuum • examples: Geiger counter near a radioactive source; number of flashes of lightning in a storm

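  28. Poisson: proof. Expect ν events in some interval and split the interval into N sections. Probability that a given section contains an event: $p = \nu/N$. Probability of n events in N sections: $f(n; N, p) = \frac{N!}{n!\,(N-n)!}\left(\frac{\nu}{N}\right)^n \left(1-\frac{\nu}{N}\right)^{N-n}$. For N→∞ with n finite: $P(n, \nu) = \frac{\nu^n}{n!}\,e^{-\nu}$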
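
The limit can also be checked numerically: with p = ν/N, the binomial probability approaches the Poisson probability as N grows. The values ν = 2 and n = 3 below are arbitrary choices for illustration.

```python
from math import comb, exp, factorial


def binom_pmf(n, N, p):
    """Binomial probability of n successes in N trials."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)


def poisson_pmf(n, nu):
    """Poisson probability of n events when nu are expected."""
    return nu**n * exp(-nu) / factorial(n)


nu, n = 2.0, 3  # illustrative values, not from the slides
for N in (10, 100, 10_000):
    print(N, binom_pmf(n, N, nu / N), poisson_pmf(n, nu))
```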

  29. Poisson

  30. Poisson: fatal horse kicks. Number of Prussian soldiers kicked to death by horses: in ten different army corps, over 20 years, there were 122 deaths, so ν = (number of deaths) / (corps × years) = 122/200 = 0.610. No deaths: P(0, 0.61) = 0.5434; number of corps × years with no deaths: 200 × 0.5434 = 108.7. One death: P(1, 0.61) = 0.3315; number of corps × years with one death: 200 × 0.3315 = 66.3.

     deaths per corps × year   actual number   Poisson
     0                         109             108.7
     1                         65              66.3
     2                         22              20.2
     3                         3               4.1
     4                         1               0.6
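
The Poisson column can be regenerated from the observed counts. This is a minimal sketch; the dictionary layout and names are hypothetical, and the observed numbers are those quoted on the slide.

```python
from math import exp, factorial

# deaths per corps-year -> number of corps-years in which that count was observed
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}

corps_years = sum(observed.values())              # 10 corps x 20 years
deaths = sum(k * v for k, v in observed.items())  # total number of deaths
nu = deaths / corps_years                         # mean deaths per corps-year


def poisson_pmf(n, nu):
    return nu**n * exp(-nu) / factorial(n)


expected = {k: corps_years * poisson_pmf(k, nu) for k in observed}

for k in observed:
    print(k, observed[k], round(expected[k], 1))
```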

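  31. Gaussian: $f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$. Standard Gaussian ($\mu = 0$, $\sigma = 1$): $\varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$. Cumulative: $\Phi(x) = \int_{-\infty}^{x} \varphi(x')\,dx'$, evaluated numerically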
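
One common way to evaluate the Gaussian cumulative distribution numerically is through the error function, using Φ(x) = ½[1 + erf(x/√2)]. The sketch below uses Python's standard `math.erf`; it is not necessarily the method used in the lecture.

```python
from math import erf, exp, pi, sqrt


def gauss_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian probability density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * pi))


def gauss_cdf(x, mu=0.0, sigma=1.0):
    """Gaussian cumulative distribution, expressed through erf."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def prob_within(k):
    """Probability that a standard Gaussian variable lies within ±k."""
    return gauss_cdf(k) - gauss_cdf(-k)


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```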

  32. Gaussian

  33. Gaussian

  34. Gaussian

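  35. Gaussian in N dimensions: $f(\mathbf{x}; \boldsymbol{\mu}, V) = \frac{1}{(2\pi)^{N/2}\,|V|^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T V^{-1} (\mathbf{x}-\boldsymbol{\mu})\right]$, with $\mathbf{x}$, $\boldsymbol{\mu}$ column vectors and V a symmetric N×N matrix. In 2 dimensions: $f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2 - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right)\right]\right\}$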
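
For N = 2 the matrix form and the explicit form in terms of σ1, σ2 and the correlation coefficient ρ should agree when V = [[σ1², ρσ1σ2], [ρσ1σ2, σ2²]]. The sketch below checks this at one point using the standard textbook expressions; the function names and numerical values are hypothetical.

```python
from math import exp, pi, sqrt


def gauss2d_rho(x1, x2, mu1, mu2, s1, s2, rho):
    """Two-dimensional Gaussian written with sigma_1, sigma_2 and rho."""
    z1 = (x1 - mu1) / s1
    z2 = (x2 - mu2) / s2
    q = (z1 * z1 + z2 * z2 - 2.0 * rho * z1 * z2) / (1.0 - rho * rho)
    return exp(-0.5 * q) / (2.0 * pi * s1 * s2 * sqrt(1.0 - rho * rho))


def gauss2d_matrix(x1, x2, mu1, mu2, V):
    """Same density from the covariance matrix V (2x2, symmetric)."""
    (a, b), (c, d) = V
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))  # inverse of a 2x2 matrix
    d1, d2 = x1 - mu1, x2 - mu2
    q = (d1 * (inv[0][0] * d1 + inv[0][1] * d2)
         + d2 * (inv[1][0] * d1 + inv[1][1] * d2))
    return exp(-0.5 * q) / (2.0 * pi * sqrt(det))


s1, s2, rho = 1.5, 0.8, 0.4  # illustrative values
V = ((s1 * s1, rho * s1 * s2), (rho * s1 * s2, s2 * s2))

print(gauss2d_rho(0.3, -0.2, 0.0, 0.0, s1, s2, rho))
print(gauss2d_matrix(0.3, -0.2, 0.0, 0.0, V))
```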

  36. Gaussian

  37. Central limit theorem: the sum of N independent continuous random variables $x_i$ with means $\mu_i$ and variances $\sigma_i^2$ becomes, for N→∞, a Gaussian random variable with mean $\mu = \sum_i \mu_i$ and variance $\sigma^2 = \sum_i \sigma_i^2$, regardless of the form of the individual p.d.f.s of the $x_i$. This is the formal justification for treating measurement errors as Gaussian random variables: the total error is the sum of a large number of small contributions
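
The theorem can be illustrated by summing uniform random numbers. The sketch below assumes 12 terms drawn uniformly from [0, 1), each with mean 1/2 and variance 1/12, so the sum should have mean 6 and variance 1; the seed and sample size are arbitrary, and this is only an illustration, not the algorithm used in the lecture.

```python
import random
from statistics import fmean, pvariance


def sum_of_uniforms(n_terms, rng):
    """One sample of the sum of n_terms uniform [0, 1) variables."""
    return sum(rng.random() for _ in range(n_terms))


def sample_sums(n_terms=12, n_samples=50_000, seed=2003):
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [sum_of_uniforms(n_terms, rng) for _ in range(n_samples)]


sums = sample_sums()
mean = fmean(sums)
var = pvariance(sums)
# Fraction of samples within one standard deviation of the expected mean;
# for a Gaussian this fraction is about 0.68.
frac_1sigma = sum(abs(s - 6.0) < 1.0 for s in sums) / len(sums)

print(mean, var, frac_1sigma)
```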

  38. Central limit theorem. Actually used: algorithm R632, CERN library
