
Statistical Toolkit


Presentation Transcript


  1. Statistical Toolkit. S. Donadio, B. Mascialino. July 2nd, 2003

  2. Status of algorithms • Chi2 (binned distributions) • Chi2 (curves – sets of points) • Kolmogorov-Smirnov-Goodman • Kolmogorov-Smirnov • Cramer-von Mises (binned) • Cramer-von Mises (unbinned) • Anderson-Darling (binned) • Anderson-Darling (unbinned) • Kuiper

  3. Status of Quality Checkers • Chi2 • Kolmogorov-Smirnov-Goodman • Kolmogorov-Smirnov • Cramer-von Mises • Anderson-Darling • Kuiper

  4. Last algorithm (still to be added) The Lilliefors test is similar to the Kolmogorov-Smirnov test, but is based on the null hypothesis that the continuous random variable is distributed as a normal N(μ, σ²) with μ and σ² unknown. In practice, since the parameters are unknown, the researcher must estimate them from the sample itself (x1, x2, ..., xn), and in this way it becomes possible to study the standardized sample (z1, z2, ..., zn). The test is performed by comparing the empirical distribution function F_O of (z1, z2, ..., zn) with the standard normal distribution function Φ(z): D* = sup_z |F_O(z) − Φ(z)|
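
A minimal sketch of this computation in Python; the function names and the use of math.erf are our illustration, not part of the toolkit:

    import math

    def normal_cdf(z):
        # Phi(z): CDF of the standard normal N(0,1)
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    def lilliefors_statistic(sample):
        # mu and sigma^2 are unknown, so estimate them from the sample itself
        n = len(sample)
        mu = sum(sample) / n
        sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / (n - 1))
        z = sorted((x - mu) / sigma for x in sample)
        # D* = sup_z |F_O(z) - Phi(z)|, evaluated on both sides of each step
        d = 0.0
        for i, zi in enumerate(z):
            phi = normal_cdf(zi)
            d = max(d, abs((i + 1) / n - phi), abs(i / n - phi))
        return d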

  5. Lilliefors needs a theoretical function in input [Diagram: the toolkit accepts as input binned distributions, unbinned distributions and theoretical distributions; a theoretical function is required e.g. for tests of normality.]

  6. New algorithm: Cramér-von Mises (Tiku) It approximates the Cramér-von Mises test statistic with a χ². It uses the χ² quality checker. Tiku M.L., Chi-squared approximation for the distributions of goodness-of-fit statistics U_N² and W_N², Biometrika 52 (1965b), 630.
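
For reference, a sketch of the two-sample Cramér-von Mises statistic itself, written for binned input; this illustrates the statistic that Tiku's χ² approximates and is not the toolkit's implementation (names are ours):

    def cvm_binned(h1, h2):
        # h1, h2: bin contents of two histograms with identical binning
        n, m = sum(h1), sum(h2)
        c1 = c2 = 0.0
        t = 0.0
        for k in range(len(h1)):
            c1 += h1[k]
            c2 += h2[k]
            # squared difference of the cumulative fractions at the bin's
            # upper edge, weighted by the pooled entries falling in the bin
            t += (h1[k] + h2[k]) * (c1 / n - c2 / m) ** 2
        return n * m / (n + m) ** 2 * t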

  7. New algorithm: Kolmogorov-Smirnov (binned) It allows the calculation of the Kolmogorov-Smirnov test statistic in the case of binned distributions. It uses a different quality checker (see Conover (1971), Gibbons and Chakraborti (1992)). We must find it!
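
The statistic itself is straightforward; what is missing is the quality checker, i.e. the distribution of the statistic for binned data. A sketch of the statistic (illustrative names):

    def ks_binned(h1, h2):
        # maximum distance between the two cumulative distributions
        n1, n2 = sum(h1), sum(h2)
        c1 = c2 = 0.0
        d = 0.0
        for a, b in zip(h1, h2):
            c1 += a
            c2 += b
            d = max(d, abs(c1 / n1 - c2 / n2))
        return d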

  8. Uncertainties treatment We must decide how to treat errors inside the statistical toolkit. Distributions are entered as a pair of DataPointSets: Data and Weight. The handling of Data and Weight in the computation of the test statistic differs between distributions on one side and curves or sets of points on the other.

  9. An example χ² = Σi (y1i − y2i)² / (σ1i² + σ2i²) In the case of two distributions, χ² is computed using only “Weights”. In the case of two curves or sets of points, the numerator involves “Data” and the denominator uses “Weights”. THIS COULD BE MISLEADING!
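
In code the formula is the same in both cases; only the meaning of the inputs changes. A sketch with hypothetical names:

    def chi2_test_statistic(y1, y2, sigma1, sigma2):
        # chi2 = sum_i (y1_i - y2_i)^2 / (sigma1_i^2 + sigma2_i^2);
        # for distributions, y and sigma both come from the "Weights";
        # for curves/sets of points, y comes from "Data", sigma from "Weights"
        return sum((a - b) ** 2 / (s1 ** 2 + s2 ** 2)
                   for a, b, s1, s2 in zip(y1, y2, sigma1, sigma2))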

  10. Data, Weights, Errors So, in order to have a coherent language for all the algorithms, we should have: • Data • Weights • Errors. Whenever errors are not necessary for the computation of the test statistic, we could fill them with a null vector.
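
A sketch of what such a coherent input could look like (a plain container for illustration only; the toolkit actually works with AIDA DataPointSets, so the shape below is an assumption):

    class AlgorithmInput:
        # one common language for all algorithms: Data, Weights, Errors
        def __init__(self, data, weights, errors=None):
            self.data = list(data)
            self.weights = list(weights)
            # when errors are not needed for the test statistic,
            # fill them with a null vector as proposed above
            self.errors = list(errors) if errors is not None \
                else [0.0] * len(self.data)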

  11. Selecting data 1 Elimination of data points if n ≥ 30. CRITERION OF 3-SIGMA: if a point is 3 standard deviations away from the mean of the data points, there is about a 0.001 probability of obtaining in a single measurement a value that far from the mean. We can then choose to eliminate this data point.
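
A minimal sketch of the criterion (function name illustrative):

    def three_sigma_filter(points):
        # keep only the points within 3 standard deviations of the mean
        n = len(points)
        mu = sum(points) / n
        sigma = (sum((x - mu) ** 2 for x in points) / (n - 1)) ** 0.5
        return [x for x in points if abs(x - mu) <= 3.0 * sigma]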

  12. Selecting data 2 Elimination of data points if n ≤ 10. CHAUVENET’S CRITERION: given n sample observations from a Gaussian distribution, we should expect n′ of them to be in error by Δ or more, where, by symmetry, P(z ≤ −Δ/σ) = P(z ≥ Δ/σ) and P(−Δ/σ ≤ z ≤ Δ/σ) = 1 − n′/n. If n′ < 0.5, this means that even one observation with this amount of error is unlikely: we can discard a data point if we expect less than half an event to be further from the mean than the suspect data point.
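
A sketch of the criterion in Python (illustrative names; it uses the standard-normal identity P(|z| ≥ Δ/σ) = erfc(Δ/(σ√2))):

    import math

    def chauvenet_keeps(points, x_sus):
        # expected number of observations at least as far from the mean
        # as the suspect point: n' = n * P(|z| >= delta / sigma)
        n = len(points)
        mu = sum(points) / n
        sigma = (sum((x - mu) ** 2 for x in points) / (n - 1)) ** 0.5
        n_expected = n * math.erfc(abs(x_sus - mu) / (sigma * math.sqrt(2.0)))
        # discard the point when fewer than half an event is expected
        return n_expected >= 0.5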
