This presentation by Stu Nagourney from NJDEP dives into essential statistical concepts such as precision, accuracy, and bias in measurements. It explores the nature of systematic and random errors, highlighting how they affect data collection. Key statistical principles are discussed, including properties of data distributions, normal distribution, and estimation of standard deviation. The presentation emphasizes the importance of confidence intervals, criteria for rejecting observations, and the significance of measurement variability in scientific analysis.
DATA & STATISTICS 101 Presented by Stu Nagourney NJDEP, OQA
Precision, Accuracy and Bias • Precision: Degree of agreement between a series of measured values under the same conditions • Accuracy: Degree of agreement between the measured and the true value • Bias: Error caused by some aspect of the measurement system
Sources of Error • Systematic Errors: Bias always in the same direction, and constant no matter how many measurements are made • Random Errors: Vary in sign and are unpredictable. Average to 0 if enough measurements are made • Blunders: The occasional mistake that produces erroneous results; can be minimized but never eliminated
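The difference between systematic and random error can be illustrated with a small simulation. This is a minimal sketch, not part of the original presentation: the function name `simulate_measurements`, the true value, the bias, and the noise level are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_measurements(true_value, bias, noise_sd, n, seed=0):
    """Return n simulated readings: true value + constant bias + random noise.

    Hypothetical model for illustration only: the bias is the systematic
    error (same sign and size every time); the Gaussian noise is the
    random error (varies in sign, centered on zero).
    """
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


true_value = 10.0  # assumed "true" value of the measured quantity

# Random error only: the mean error shrinks toward 0 as n grows.
unbiased = simulate_measurements(true_value, bias=0.0, noise_sd=0.2, n=10_000)
unbiased_error = statistics.fmean(unbiased) - true_value

# Random error plus a constant bias: averaging does not remove the bias.
biased = simulate_measurements(true_value, bias=0.5, noise_sd=0.2, n=10_000)
biased_error = statistics.fmean(biased) - true_value

print(f"mean error, random only:   {unbiased_error:+.3f}")
print(f"mean error, with bias 0.5: {biased_error:+.3f}")
```

Under these assumptions, the random-only mean error lands close to 0, while the biased series stays offset by roughly the bias no matter how many readings are averaged.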
Applying Statistics • One cannot sample every entity of an entire system or population. Statistics provides estimates of the behavior of an entire system or population, provided that: • Measurement system is stable • Individual measurements are all independent • Individual measurements are random representatives of the system or population
Distributions • Data generated by a measurement process generally have the following properties: • Results spread symmetrically around a central value • Small deviations from the central value occur more often than large deviations • The frequency distribution of a large amount of data approximates a bell-shaped curve • The mean of even a small set of data represents the overall population better than individual values do
Issues with Distributions • For large amounts of data, distributions are easy to define. For smaller data sets, it is harder to define a distribution. • Deviations from “normal” distributions: • Outliers that are not representative of the population • Shifts in operational characteristics that skew the distribution • Large point-to-point variations that cause broadening
Estimation of Standard Deviation • The basic parameters that characterize a population are • Mean ($\mu$) • Standard Deviation ($\sigma$) • Unless the entire population is examined, $\mu$ and $\sigma$ cannot be known. They can only be estimated from a representative sample by • Sample Mean ($\bar{X}$) • Estimate of Standard Deviation ($s$)
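Measures of Central Tendency & Variability • Central Tendency: the value about which the individual results tend to “cluster” • Mean: $\bar{X} = (X_1 + X_2 + X_3 + \dots + X_n) / n$ • Median: Middle value of an odd number of results when listed in order (for an even number of results, the average of the two middle values) • Estimate of Standard Deviation: $s = \left[ \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1) \right]^{1/2}$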
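The formulas for the sample mean and the estimate of standard deviation can be written out directly. The sketch below uses a made-up set of five results; the helper names `sample_mean` and `sample_sd` are hypothetical, and Python's standard `statistics` module is used only for the median and as a cross-check.

```python
import math
import statistics


def sample_mean(xs):
    """Sample mean: sum of the results divided by n."""
    return sum(xs) / len(xs)


def sample_sd(xs):
    """Estimate of standard deviation with n - 1 in the denominator."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two results to estimate s")
    mean = sample_mean(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


# Illustrative data only; not taken from the presentation.
results = [10.1, 9.8, 10.3, 10.0, 9.9]

print(f"mean   = {sample_mean(results):.3f}")
print(f"median = {statistics.median(results):.3f}")
print(f"s      = {sample_sd(results):.4f}")
# statistics.stdev uses the same n - 1 definition, so the two should agree.
print(f"check  = {statistics.stdev(results):.4f}")
```

Dividing by n - 1 rather than n reflects that one degree of freedom is used up in estimating the mean from the same sample.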
Statistics • If you make several sets of measurements from a normal distribution, you will get different means and standard deviations • Even the best scientist and/or laboratory will have measurement differences when examining the same sample (system) • What needs to be defined is the confidence in measurement data and the significance of any differences
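A quick simulation shows how repeated sets of measurements drawn from the same normal distribution produce different means and standard deviations. This is a sketch under assumed values: the population mean of 100, standard deviation of 5, set size of 10, and the helper name `draw_set` are all hypothetical.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable


def draw_set(n, mu=100.0, sigma=5.0):
    """Draw n independent values from one normal distribution (assumed parameters)."""
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Five sets of ten measurements, all from the same population.
summaries = []
for _ in range(5):
    data = draw_set(10)
    summaries.append((statistics.fmean(data), statistics.stdev(data)))

for i, (mean, sd) in enumerate(summaries, start=1):
    print(f"set {i}: mean = {mean:7.2f}, s = {sd:5.2f}")
```

Every set comes from the same population, yet no two summaries match, which is why a statement of confidence is needed before treating a difference as significant.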
Does a Measured Value Differ from an Expected Value? • Confidence Interval of the Mean (CI): The range around the sample mean within which the population mean is expected to lie at a stated level of confidence • $CI = \bar{X} \pm t \, s / \sqrt{n}$, where the value of $t$ depends upon the level of confidence desired and the number of degrees of freedom ($n - 1$)
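The confidence interval formula can be applied as follows. This is a minimal sketch assuming a 95% confidence level; the data, the expected value of 10.0, and the function name `confidence_interval_95` are hypothetical. The `T_95` table holds standard two-sided 95% Student t values for 1 to 9 degrees of freedom; for other confidence levels or sample sizes, a full t table would be needed.

```python
import math
import statistics

# Two-sided 95% Student t critical values, keyed by degrees of freedom (n - 1).
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def confidence_interval_95(xs):
    """Return (low, high) for the 95% confidence interval of the mean."""
    n = len(xs)
    df = n - 1
    if df not in T_95:
        raise ValueError("this sketch only tabulates t for 2 to 10 results")
    mean = statistics.fmean(xs)
    half_width = T_95[df] * statistics.stdev(xs) / math.sqrt(n)
    return mean - half_width, mean + half_width


# Illustrative data and expected value; not taken from the presentation.
results = [10.1, 9.8, 10.3, 10.0, 9.9]
expected = 10.0

low, high = confidence_interval_95(results)
differs = not (low <= expected <= high)
print(f"95% CI: {low:.3f} to {high:.3f}")
print(f"differs from expected value {expected}: {differs}")
```

If the expected value falls inside the interval, the data give no evidence at that confidence level that the measured mean differs from it.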
Criteria for Rejecting an Observation • One can always reject a data point if there is an assignable cause • If not, evaluate using statistical techniques • Common Outlier Tests • Dixon (Q) Test • Grubbs Test • Youden Test • Student t Test
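As one example of these tests, the Dixon (Q) test compares the gap between a suspect value and its nearest neighbor to the total range of the data. The sketch below is an assumption-laden illustration, not the presenter's procedure: the function name `dixon_q_test` and the data are hypothetical, and the critical values are the commonly tabulated 95% confidence figures for 3 to 10 results, which should be checked against a reference table before real use.

```python
# Commonly tabulated Dixon Q critical values at 95% confidence, keyed by n.
# Assumption: verify against a published table before relying on them.
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
             7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def dixon_q_test(xs):
    """Test the most extreme value; return (suspect, q, reject)."""
    n = len(xs)
    if n not in Q_CRIT_95:
        raise ValueError("this sketch only covers 3 to 10 results")
    s = sorted(xs)
    data_range = s[-1] - s[0]
    if data_range == 0:
        return s[0], 0.0, False  # all values identical, nothing to reject
    gap_low = s[1] - s[0]      # gap between lowest value and its neighbor
    gap_high = s[-1] - s[-2]   # gap between highest value and its neighbor
    if gap_high >= gap_low:
        suspect, gap = s[-1], gap_high
    else:
        suspect, gap = s[0], gap_low
    q = gap / data_range
    return suspect, q, q > Q_CRIT_95[n]


# Illustrative data only: 12.0 is the suspect value.
suspect, q, reject = dixon_q_test([10.1, 9.8, 10.3, 10.0, 12.0])
print(f"suspect = {suspect}, Q = {q:.3f}, reject = {reject}")
```

A statistical rejection like this is only appropriate after checking for an assignable cause, and the test assumes the remaining data are approximately normally distributed.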