Practical Applications of Statistical Methods in the Clinical Laboratory Roger L. Bertholf, Ph.D., DABCC Associate Professor of Pathology Director of Clinical Chemistry & Toxicology UF Health Science Center/Jacksonville
“[Statistics are] the only tools by which an opening can be cut through the formidable thicket of difficulties that bars the path of those who pursue the Science of Man.” [Sir] Francis Galton (1822-1911)
“There are three kinds of lies: Lies, damned lies, and statistics” Benjamin Disraeli (1804-1881)
What are statistics, and what are they used for? • Descriptive statistics are used to characterize data • Statistical analysis is used to distinguish between random and meaningful variations • In the laboratory, we use statistics to monitor and verify method performance, and interpret the results of clinical laboratory tests
“Do not worry about your difficulties in mathematics, I assure you that mine are greater” Albert Einstein (1879-1955)
“I don't believe in mathematics” Albert Einstein
The Mean (average) The mean is a measure of the centrality of a set of data. It is the sum of the values divided by the number of values.
Use of the Geometric mean: The geometric mean is primarily used to average ratios or rates of change.
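As a rough illustration of why the geometric mean suits rates of change, the sketch below averages two growth factors. The numbers are hypothetical (a quantity that doubles and then halves) and are not taken from the slides.

```python
import math
import statistics

# Hypothetical growth factors: the quantity doubles, then halves.
growth_factors = [2.0, 0.5]

# The arithmetic mean suggests 25% growth per period, which is misleading.
arithmetic = statistics.mean(growth_factors)

# The geometric mean is the constant factor that, applied in every period,
# gives the same overall change as the observed factors.
geometric = statistics.geometric_mean(growth_factors)

# Overall change actually observed: 2.0 * 0.5 = 1.0 (no net change).
overall = math.prod(growth_factors)

print(arithmetic, geometric, overall)
```

Compounding the geometric mean over both periods reproduces the overall change; compounding the arithmetic mean would not.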
Example of the use of Harmonic mean: Suppose you spend $6 on pills costing 30 cents per dozen, and $6 on pills costing 20 cents per dozen. What was the average price of the pills you bought?
Example of the use of Harmonic mean: You spent $12 on 50 dozen pills (20 dozen at 30 cents and 30 dozen at 20 cents), so the average cost is 12/50 = 0.24, or 24 cents per dozen. This also happens to be the harmonic mean of 20 and 30: 2/(1/20 + 1/30) = 24.
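The pill example can be checked numerically. This is a minimal sketch using the prices and amounts from the example; the variable names are illustrative.

```python
import statistics

spend_per_price = 6.00          # dollars spent at each price (from the example)
prices = [0.30, 0.20]           # dollars per dozen

# Dozens bought at each price: about 20 and 30.
dozens = [spend_per_price / price for price in prices]

# Average price actually paid: total dollars divided by total dozens.
average_price = (spend_per_price * len(prices)) / sum(dozens)

# Because equal money was spent at each price, this equals the harmonic mean.
harmonic = statistics.harmonic_mean(prices)

# The arithmetic mean of the two prices overstates what was paid.
arithmetic = statistics.mean(prices)

print(round(average_price, 4), round(harmonic, 4), round(arithmetic, 4))
```

The harmonic mean applies here only because the same amount of money was spent at each price; with equal quantities instead, the arithmetic mean would be the right average.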
For the data set: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10:
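The values that followed on this slide were not preserved. Assuming it compared the three means discussed so far, they can be computed for this data set with Python's standard `statistics` module:

```python
import statistics

data = list(range(1, 11))   # 1, 2, ..., 10

arithmetic = statistics.mean(data)            # 5.5
geometric = statistics.geometric_mean(data)   # about 4.53
harmonic = statistics.harmonic_mean(data)     # about 3.41

print(arithmetic, round(geometric, 2), round(harmonic, 2))
```

For positive data that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean.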
Other measures of centrality • Mode
The Mode The mode is the value that occurs most often
Other measures of centrality • Mode • Midrange
The Midrange The midrange is the mean of the highest and lowest values
Other measures of centrality • Mode • Midrange • Median
The Median The median is the value for which half of the remaining values are above and half are below it. For example, in an ordered array of 15 values, the 8th value is the median. If the array has 16 values, the median is the mean of the 8th and 9th values.
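The three alternative measures of centrality can be compared side by side. The data below are hypothetical and include one high outlier to show how differently the measures respond.

```python
import statistics

# Hypothetical ordered data with one high outlier (9 values).
data = [3, 5, 5, 6, 7, 8, 9, 10, 30]

mode = statistics.mode(data)               # most frequent value: 5
midrange = (min(data) + max(data)) / 2     # mean of lowest and highest: 16.5
median = statistics.median(data)           # 5th of 9 ordered values: 7

# With an even number of values, the median is the mean of the two middle values.
median_even = statistics.median([1, 2, 3, 4])   # (2 + 3) / 2 = 2.5

print(mode, midrange, median, median_even)
```

The midrange depends only on the two extreme values, so the single outlier moves it far more than it moves the median.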
Example of the use of median vs. mean: Suppose you’re thinking about building a house in a certain neighborhood, and the real estate agent tells you that the average (mean) size house in that area is 2,500 sq. ft. Astutely, you ask “What’s the median size?” The agent replies “1,800 sq. ft.” What does this tell you about the sizes of the houses in the neighborhood?
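One way to see what the agent's two numbers imply is to construct a neighborhood that produces them. The house sizes below are invented for illustration; they are only one of many sets consistent with a mean of 2,500 and a median of 1,800 sq. ft.

```python
import statistics

# Hypothetical house sizes in sq. ft.: mostly modest homes plus one very large one.
house_sizes = [1500, 1600, 1800, 1800, 1800, 2000, 7000]

mean_size = statistics.mean(house_sizes)       # pulled upward by the 7,000 sq. ft. house
median_size = statistics.median(house_sizes)   # the middle (4th) value

print(mean_size, median_size)
```

A mean well above the median suggests a distribution skewed toward large values: most houses are smaller than the "average" the agent quoted.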
Measuring variance Two sets of data may have similar means, but otherwise be very dissimilar. For example, males and females have similar baseline LH concentrations, but there is much wider variation in females. How do we express quantitatively the amount of variation in a data set?
The Variance The variance is the mean of the squared differences between individual data points and the mean of the array. Or, after simplifying, the mean of the squares minus the squared mean.
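The two descriptions of the variance can be checked against each other numerically. This sketch reuses the data set 1 through 10 and treats it as a complete population (dividing by n).

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n = len(data)
mean = sum(data) / n

# Definition: mean of the squared differences from the mean.
var_definition = sum((x - mean) ** 2 for x in data) / n

# Simplified form: mean of the squares minus the squared mean.
var_shortcut = sum(x * x for x in data) / n - mean ** 2

# Cross-check against the standard library's population variance.
var_library = statistics.pvariance(data)

print(var_definition, var_shortcut, var_library)
```

The simplified form needs only running totals of x and x squared, but it can lose precision in floating point when the mean is large relative to the spread.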
The Variance In what units is the variance? Is that a problem?
The Standard Deviation The standard deviation is the square root of the variance. Standard deviation is not the mean difference between individual data points and the mean of the array.
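To make the distinction concrete, the sketch below computes both the standard deviation and the mean absolute difference from the mean for the data set 1 through 10, treated as a population. The two quantities differ.

```python
import math

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n = len(data)
mean = sum(data) / n

# Standard deviation: square root of the mean squared difference.
std_dev = math.sqrt(sum((x - mean) ** 2 for x in data) / n)

# Mean absolute difference from the mean: a different, smaller quantity here.
mean_abs_dev = sum(abs(x - mean) for x in data) / n

print(round(std_dev, 4), mean_abs_dev)
```

Squaring weights large differences more heavily, so the standard deviation is never smaller than the mean absolute difference.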
The Standard Deviation In what units is the standard deviation? Is that a problem?
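The Coefficient of Variation* The coefficient of variation (CV) is the standard deviation expressed as a percentage of the mean: CV = (standard deviation / mean) × 100%. *Sometimes called the Relative Standard Deviation (RSD or %RSD)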
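A short sketch of the CV calculation follows. The replicate results are hypothetical control values, and the sample standard deviation (dividing by n − 1) is assumed because the replicates are a sample of the assay's performance.

```python
import statistics

# Hypothetical replicate results for a control material (units arbitrary).
replicates = [98, 102, 100, 97, 103]

mean = statistics.mean(replicates)
std_dev = statistics.stdev(replicates)    # sample SD (n - 1 in the denominator)

# CV: standard deviation as a percentage of the mean.
cv_percent = std_dev / mean * 100

print(mean, round(std_dev, 2), round(cv_percent, 2))
```

Because the CV is a ratio, it is unitless and allows precision to be compared across assays that report in different units or at different concentrations.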
Standard Deviation (or Error) of the Mean The standard deviation of an average equals the standard deviation of the individual measurements divided by the square root of the number of data points used to calculate the average.
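The square-root relationship can be observed in a small simulation. This is a sketch under stated assumptions: measurements are drawn from a normal distribution with SD 1, and the function name and trial count are illustrative.

```python
import random
import statistics

random.seed(0)   # fixed seed so the sketch is repeatable

def sd_of_means(n, trials=5000):
    """SD of the averages of n simulated measurements (each with true SD 1)."""
    means = [
        statistics.fmean(random.gauss(0, 1) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.stdev(means)

sd_single = sd_of_means(1)    # expected to be near 1
sd_of_four = sd_of_means(4)   # expected to be near 1 / sqrt(4) = 0.5

print(round(sd_single, 3), round(sd_of_four, 3))
```

The simulated values only approximate the theoretical ones, and the agreement tightens as the number of trials grows.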
Exercises How many measurements must we average to improve our precision by a factor of 2?
Answer To improve precision by a factor of 2: the standard deviation of the mean is σ/√n, so we need √n = 2, or n = 4 measurements.
Exercises • How many measurements must we average to improve our precision by a factor of 2? • How many to improve our precision by a factor of 10?
Answer To improve precision by a factor of 10: the standard deviation of the mean is σ/√n, so we need √n = 10, or n = 100 measurements.
Exercises • How many measurements must we average to improve our precision by a factor of 2? • How many to improve our precision by a factor of 10? • If an assay has a CV of 7%, and we decide to run samples in duplicate and average the measurements, what should the resulting CV be?
Answer Improvement in CV by running duplicates: averaging n = 2 measurements divides the CV by √2, so the resulting CV is 7%/√2 ≈ 4.95%.
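All three exercises follow from the square-root rule for the standard deviation of a mean. A minimal sketch, with illustrative function names:

```python
import math

def replicates_needed(improvement_factor):
    """Measurements to average so that the SD of the mean shrinks by this factor."""
    # SD of the mean is sd / sqrt(n), so sqrt(n) must equal the factor.
    return improvement_factor ** 2

def cv_of_mean(cv_single, n):
    """CV of the average of n measurements, given the CV of a single measurement."""
    return cv_single / math.sqrt(n)

n_for_2x = replicates_needed(2)       # 4 measurements
n_for_10x = replicates_needed(10)     # 100 measurements
cv_duplicates = cv_of_mean(7.0, 2)    # about 4.95 (percent)

print(n_for_2x, n_for_10x, round(cv_duplicates, 2))
```

The quadratic cost is the practical point: duplicates give a modest gain, while a tenfold gain in precision would require 100 replicates.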
Population vs. Sample standard deviation • When we speak of a population, we’re referring to the entire data set, which will have a mean μ
Population vs. Sample standard deviation • When we speak of a population, we’re referring to the entire data set, which will have a mean μ • When we speak of a sample, we’re referring to a subset of the population, which will have a mean customarily designated “x-bar” • Which is used to calculate the standard deviation?
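The slide poses this as a question. As background not taken from the slides, the usual convention is that a population standard deviation uses deviations from μ and divides by N, while a sample standard deviation uses deviations from x-bar and divides by n − 1. Python's `statistics` module provides both:

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Treating the data as the entire population: divide by N.
population_sd = statistics.pstdev(data)

# Treating the data as a sample from a larger population: divide by n - 1.
sample_sd = statistics.stdev(data)

print(round(population_sd, 4), round(sample_sd, 4))
```

The sample value is always the larger of the two, and the difference becomes negligible as the number of data points grows.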
“Sir, I have found you an argument. I am not obliged to find you an understanding.” Samuel Johnson (1709-1784)
Distributions • Definition
Statistical (probability) Distribution • A statistical distribution is a mathematically-derived probability function that can be used to predict the characteristics of certain applicable real populations • Statistical methods based on probability distributions are parametric, since certain assumptions are made about the data
Distributions • Definition • Examples
Binomial distribution The binomial distribution applies to events that have two possible outcomes. The probability of r successes in n attempts, when the probability of success in any individual attempt is p, is given by: P(r) = n!/[r!(n − r)!] × p^r × (1 − p)^(n − r)
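The binomial probability can be computed directly with the standard library. This is a minimal sketch; the function name and the coin-toss example are illustrative and not from the slides.

```python
from math import comb

def binomial_probability(r, n, p):
    """Probability of exactly r successes in n independent attempts."""
    # comb(n, r) counts the ways to choose which r of the n attempts succeed.
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Hypothetical example: exactly 2 heads in 4 tosses of a fair coin.
p_two_of_four = binomial_probability(2, 4, 0.5)   # 6 * 0.25 * 0.25 = 0.375

# Probabilities over all possible values of r must sum to 1.
total = sum(binomial_probability(r, 4, 0.5) for r in range(5))

print(p_two_of_four, total)
```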