Basic statistics: a survival guide

Basic statistics: a survival guide Tom Sensky

HOW TO USE THIS POWERPOINT PRESENTATION • The presentation covers the basic statistics you need to have some understanding of. • After the introductory slides, you’ll find two slides listing topics. • When you view the presentation in ‘Slide show’ mode, clicking on any topic in these lists gets you to slides covering that topic. • Clicking on the symbol (in the top right corner of each slide – still in ‘slide show’ mode) gets you back to the list of topics.

HOW TO USE THIS POWERPOINT PRESENTATION • You can either go through the slide show sequentially from the start (some topics follow on from those before) or review specific topics when you encounter them in your reading. • A number of the examples in the presentation are taken from PDQ Statistics, which is one of three basic books I would recommend (see next page).

RECOMMENDED RESOURCES • The books below explain statistics simply, without excessive mathematical or logical language, and are available as inexpensive paperbacks. • Geoffrey Norman and David Streiner. PDQ Statistics (3rd Edition). BC Decker, 2003 (PDQ stands for ‘Pretty Darn Quick’, a series of publications) • David Bowers, Allan House, David Owens. Understanding Clinical Papers (2nd Edition). Wiley, 2006 • Douglas Altman et al. Statistics with Confidence (2nd Edition). BMJ Books, 2000

AIM OF THIS PRESENTATION • The main aim has been to present the information in such a way as to allow you to understand the statistics involved rather than having to rely on rote learning. • Thus formulae have been kept to a minimum – they are included where they help to explain the statistical test, and (very occasionally) for convenience. • You may have to go through parts of the presentation several times in order to understand some of the points

BASIC STATISTICS • Types of data • Normal distribution • Describing data • Boxplots • Standard deviations • Skewed distributions • Parametric vs Non-parametric • Sample size • Statistical errors • Power calculations • Clinical vs statistical significance • Two-sample t test • Problem of multiple tests • Subgroup analyses • Paired t test • Chi-square test • ANOVA • Repeated measures ANOVA • Non-parametric tests • Mann-Whitney U test • Summary of common tests • Summaries of proportions • Odds and Odds Ratio • Absolute and Relative Risks • Number Needed to Treat (NNT) • Confidence intervals (CIs) • CI (diff between two proportions) • Correlation • Regression • Logistic regression • Mortality statistics • Survival analysis

TYPES OF DATA • VARIABLES are either QUANTITATIVE or QUALITATIVE • QUANTITATIVE: RATIO (eg pulse rate, height) or INTERVAL (eg temperature, 36°C-38°C) • QUALITATIVE: ORDINAL (eg social class) or NOMINAL (eg gender, ethnicity)

NORMAL DISTRIBUTION • Cases are distributed symmetrically about the mean • The extent of the ‘spread’ of data around the mean is measured by the standard deviation • [Figure: normal curve with the mean marked and the area beyond two standard deviations above the mean shaded]

DESCRIBING DATA • In a normal distribution, the mean and the median are the same • If the median and the mean differ markedly, this indicates that the data are not normally distributed • The mode is of little if any practical use

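BOXPLOT (BOX AND WHISKER PLOT) • Upper whisker: 97.5th centile • Top of box: 75th centile • Line within box: MEDIAN (50th centile) • Bottom of box: 25th centile • Lower whisker: 2.5th centile • The box spans the inter-quartile range (25th to 75th centile)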

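STANDARD DEVIATION: MEASURE OF THE SPREAD OF VALUES OF A SAMPLE AROUND THE MEAN • The square of the SD is known as the variance • The SD is smaller when the values are less spread out about the mean • A larger number of values does not systematically reduce the SD; what it reduces is the standard error of the mean (the SD divided by the square root of the sample size) • In a normal distribution, approximately 95% of the values will lie within 2 SDs of the mean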

STANDARD DEVIATION AND SAMPLE SIZE As sample size increases (n=10, n=50, n=150), the sample SD settles towards the population SD, while the standard error of the mean decreases

SKEWED DISTRIBUTION • MEDIAN: 50% of values will lie on either side of the median • [Figure: skewed distribution with the mean and the median marked at different points]

DOES A VARIABLE FOLLOW A NORMAL DISTRIBUTION? • Important because parametric statistics assume normal distributions • Statistics packages can test normality • Distribution unlikely to be normal if: • Mean is very different from the median • Two SDs below the mean give an impossible answer (eg height <0 cm)
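The two rules of thumb above can be sketched in a few lines of Python. This is only an illustration: the function name, the data and the threshold for a ‘very different’ mean and median (0.2 SD) are assumptions made for the example, not part of the original slides, and a formal normality test in a statistics package remains the better option.

```python
import statistics

def normality_warnings(values, lowest_possible=0.0, gap_in_sd=0.2):
    """Apply two informal checks; gap_in_sd is an arbitrary, assumed threshold."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)
    warnings = []
    # Check 1: the mean is very different from the median
    if abs(mean - median) > gap_in_sd * sd:
        warnings.append("mean is very different from the median")
    # Check 2: two SDs below the mean gives an impossible value
    if mean - 2 * sd < lowest_possible:
        warnings.append("mean - 2 SD is below the lowest possible value")
    return warnings

# Hypothetical data: length of stay in days (skewed) and heights in cm
stay_days = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 10, 21]
heights_cm = [158, 162, 165, 168, 170, 171, 173, 175, 178, 182]

print(normality_warnings(stay_days))   # both checks are triggered
print(normality_warnings(heights_cm))  # no warnings
```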

DISTRIBUTIONS: EXAMPLES

DISTRIBUTIONS AND STATISTICAL TESTS • Many common statistical tests rely on the variables being tested having a normal distribution • These are known as parametric tests • Where parametric tests cannot be used, other, non-parametric tests are applied which do not require normally distributed variables • Sometimes, a skewed distribution can be made sufficiently normal to apply parametric statistics by transforming the variable (by taking its square root, squaring it, taking its log, etc)
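As a rough illustration of transforming a variable, the sketch below takes the log of a right-skewed set of values and compares how far the mean sits from the median (in SD units) before and after. The data and the helper name are hypothetical, and the gap between mean and median is only an informal indicator of skew.

```python
import math
import statistics

def mean_median_gap(values):
    """Distance between mean and median, expressed in SD units."""
    return (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)

# Hypothetical right-skewed measurements (all positive, so the log is defined)
raw = [1, 2, 2, 3, 4, 5, 8, 12, 20, 45, 110]
logged = [math.log(v) for v in raw]

print(f"gap before transform: {mean_median_gap(raw):.2f} SD")
print(f"gap after log transform: {mean_median_gap(logged):.2f} SD")
```

In this example the gap shrinks after the transformation, which is the effect the slide describes; whether the result is ‘sufficiently normal’ still needs checking.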

EXAMPLE: IQ Say that you have tested a sample of people on a validated IQ test. The IQ test has been carefully standardized on a large sample to have a mean of 100 and an SD of 15.

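EXAMPLE: IQ Say you now administer the test to repeated samples of 25 people. The expected random variation of these sample means equals the Standard Error (here 15/√25 = 3). [Figure: distribution of sample means, axis marked at 94, 97, 100, 103 and 106, ie the mean ± 1 and 2 SE]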

STANDARD DEVIATION vs STANDARD ERROR • Standard Deviation is a measure of variability of scores in a particular sample • Standard Error of the Mean is an estimate of the variability of estimated population means taken from repeated samples of that population (in other words, it gives an estimate of the precision of the sample mean) See Douglas G. Altman and J. Martin Bland. Standard deviations and standard errors. BMJ 331 (7521):903, 2005.
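The distinction can be made concrete with a short sketch. It draws simulated samples of different sizes from a population with mean 100 and SD 15 (the sample sizes, the seed and the function name are assumptions chosen for illustration) and reports both statistics for each.

```python
import random
from math import sqrt
from statistics import stdev

def sd_and_se(values):
    """Return (sample SD, standard error of the mean) for a list of values."""
    sd = stdev(values)           # variability of the individual scores
    se = sd / sqrt(len(values))  # precision of the sample mean
    return sd, se

random.seed(42)  # fixed seed so the sketch is repeatable
results = {}
for n in (10, 50, 150):
    sample = [random.gauss(100, 15) for _ in range(n)]
    results[n] = sd_and_se(sample)
    print(f"n={n:3d}  SD={results[n][0]:5.1f}  SE={results[n][1]:4.1f}")
```

The SE shrinks as n grows, while the SD is expected to stay in the region of the population value of 15.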

EXAMPLE: IQ One sample of 25 people yields a mean IQ score of 107.5. What are the chances of obtaining a mean IQ of 107.5 or more in a sample of 25 people from the same population as that on which the test was standardized?

EXAMPLE: IQ How far out the sample mean lies in the distribution of sample means is expressed as a ratio: z = (sample mean - population mean) / SE = (107.5 - 100) / (15 / √25) = 7.5 / 3 = 2.5. This ratio tells us how far out on the standard normal distribution we are: the higher the number, the further we are from the population mean. The probability of a result at least this extreme is the area under the curve to the right of the sample mean.

EXAMPLE: IQ Look up this figure (2.5) in a table of values of the normal distribution. From the table, the area in the tail to the right of our sample mean is 0.006 (approximately 1 in 160). This means that, if our sample came from the population on which the IQ test was standardized, there would be only a 1 in 160 chance of obtaining a sample mean this high or higher.

EXAMPLE: IQ This is commonly referred to as p=0.006. By convention, we accept as significantly different a sample mean that would arise by chance 1 time in 20 (or less) if the sample came from the population in which the test was standardized (commonly referred to as p=0.05). Thus our sample had a significantly greater IQ than the reference population (p<0.05).
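The arithmetic of the IQ example can be reproduced without a printed table. The sketch below uses Python's standard library in place of the table lookup; the variable names are chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist

population_mean = 100.0
population_sd = 15.0
sample_mean = 107.5
n = 25

standard_error = population_sd / sqrt(n)              # 15 / 5 = 3
z = (sample_mean - population_mean) / standard_error  # 7.5 / 3 = 2.5
p_one_sided = 1.0 - NormalDist().cdf(z)               # area in the upper tail

print(f"SE = {standard_error:.1f}, z = {z:.2f}, p = {p_one_sided:.4f}")
# prints: SE = 3.0, z = 2.50, p = 0.0062
```

Note that this is a one-sided tail area, matching the slides; a two-sided test would double it.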

EXAMPLE: IQ If we move the sample mean (green) closer to the population mean (red), the area of the distribution to the right of the sample mean increases. Even by inspection, this sample is more consistent than our previous one with having come from the original population.

COMPARING TWO SAMPLES In this case, there is very little overlap between the two distributions, so they are likely to be different. [Figure: distributions of Sample A and Sample B, with each sample mean marked]

COMPARING TWO SAMPLES Returning to the IQ example, let’s say that we know that the sample we tested (IQ=107.5) actually came from a population with a mean IQ of 110. [Figure: two population curves centred on 100 and 110, with the sample mean of 107.5 marked]

SAMPLES AND POPULATIONS Repeatedly measuring small samples from the same population will give an approximately normal distribution of means. The spread of these small sample means about the population mean is given by the Standard Error, SE.
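A small simulation can illustrate this. The sketch below repeatedly draws samples of 25 from a simulated population with mean 100 and SD 15 (the values from the IQ example; the number of repeats and the seed are arbitrary assumptions) and compares the spread of the sample means with SD/√n.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable
POP_MEAN, POP_SD, N, REPEATS = 100, 15, 25, 5000

# Mean of each of the repeated samples of N people
sample_means = [
    statistics.mean([random.gauss(POP_MEAN, POP_SD) for _ in range(N)])
    for _ in range(REPEATS)
]

observed_se = statistics.stdev(sample_means)  # spread of the sample means
theoretical_se = POP_SD / N ** 0.5            # SD / sqrt(n) = 3

print(f"mean of sample means: {statistics.mean(sample_means):.1f}")
print(f"observed SE: {observed_se:.2f}  theoretical SE: {theoretical_se:.2f}")
```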

COMPARING TWO SAMPLES We start by assuming that our sample came from the original population. Our null hypothesis (to be tested) is that the sample mean of 107.5 differs from the population mean of 100 only by chance.

COMPARING TWO SAMPLES The area under the ‘standard population’ curve to the right of our sample IQ of 107.5 represents the probability of observing a sample mean of 107.5 or more by chance under the null hypothesis (ie that the sample is from the ‘standard population’). The threshold for this area is known as the α level and is normally set at 0.05: if the sample comes from the standard population, we expect to find a mean beyond the α cut-off in 1 out of 20 samples.

COMPARING TWO SAMPLES It is perhaps easier to conceptualise α by seeing what happens if we move the sample mean • Sample mean is closer to the ‘red’ population mean • Area under the curve to the right of the sample mean (α) is bigger • The larger this area, the more plausible it is that the sample comes from the ‘red’ population

COMPARING TWO SAMPLES The α level represents the probability of finding a significant difference between the two means when none exists. This is known as a Type I error.

COMPARING TWO SAMPLES The area under the ‘other population’ curve (blue) to the left of our sample IQ of 107.5 represents the likelihood of observing this sample mean of 107.5 by chance under the alternative hypothesis (that the sample is from the ‘other population’). This is known as the β level and is normally set at 0.20.

COMPARING TWO SAMPLES The β level represents the probability of not finding a significant difference between the two means when one exists. This is known as a Type II error (usually due to inadequate sample size).

COMPARING TWO SAMPLES Note that if the sample sizes are reduced, the standard error increases, and so does β (hence also the probability of failing to find a significant difference between the two means). This increases the likelihood of a Type II error: inadequate sample size is the most common cause of Type II errors.

STATISTICAL ERRORS: SUMMARY Remember that power (1 - β) is related to sample size: a larger sample has a smaller SE, so there is less overlap between the curves.
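The link between sample size, SE and power can be sketched numerically using the two populations from the IQ example (means 100 and 110, SD 15). The function below is an illustrative assumption, not the presentation's own method: it places a one-sided cut-off at α=0.05 on the ‘standard population’ curve and takes β as the area of the ‘other population’ curve below that cut-off.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided(mu0, mu1, sd, n, alpha=0.05):
    """Power of a one-sided z test of mean mu0 against a true mean mu1 > mu0."""
    se = sd / sqrt(n)
    # Sample means above this cut-off are declared significant
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # beta: chance that a sample from the mu1 population falls below the cut-off
    beta = NormalDist(mu1, se).cdf(cutoff)
    return 1 - beta

power_n25 = power_one_sided(100, 110, 15, 25)
power_n9 = power_one_sided(100, 110, 15, 9)
print(f"n=25: power = {power_n25:.2f}")  # about 0.95
print(f"n=9:  power = {power_n9:.2f}")   # about 0.64
```

With the smaller sample the SE rises from 3 to 5, the curves overlap more, and power falls.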

SAMPLE SIZE: POWER CALCULATIONS Using the standard α=0.05 and β=0.20, and having estimates for the standard deviation and the difference in sample means, the smallest sample size needed to keep the risk of a Type II error at β can be calculated with a formula

POWER CALCULATIONS • Intended to estimate the sample size required to keep the risk of a Type II error acceptably low • For simplest study designs, can apply a standard formula • Essential requirements: • A research hypothesis • A measure (or estimate) of variability for the outcome measure • The difference (between intervention and control groups) that would be considered clinically important
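For the simplest design (comparing the means of two equal-sized groups), one widely used formula is n per group = 2 × (z for α + z for β)² × SD² / difference². The sketch below implements that textbook formula; it is offered as an assumption about which ‘standard formula’ is meant, with a two-sided α, and the function name and example figures are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(sd, difference, alpha=0.05, beta=0.20):
    """Approximate n per group for comparing two means (two-sided alpha)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(1 - beta)        # 0.84 for beta = 0.20
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / difference ** 2
    return ceil(n)  # round up to a whole participant

# Hypothetical example: SD of 15, clinically important difference of 10 or 5 points
print(sample_size_per_group(sd=15, difference=10))  # 36 per group
print(sample_size_per_group(sd=15, difference=5))   # 142 per group
```

Halving the difference considered clinically important roughly quadruples the required sample size.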

STATISTICAL SIGNIFICANCE IS NOT NECESSARILY CLINICAL SIGNIFICANCE

CLINICALLY SIGNIFICANT IMPROVEMENT

MEASURES OF CLINICALLY SIGNIFICANT IMPROVEMENT • First possible cut-off (a): outside the range of the dysfunctional population, ie in the area beyond two standard deviations above the mean of the dysfunctional sample • [Figure: distribution of the dysfunctional (‘abnormal’) sample with cut-off a marked]

MEASURES OF CLINICALLY SIGNIFICANT IMPROVEMENT • Second possible cut-off (b): within the range of the normal population • Third possible cut-off (c): more within the normal than the abnormal range • [Figure: overlapping distributions of the abnormal population and the functional (‘normal’) sample, with cut-offs b, c and a marked]

UNPAIRED OR INDEPENDENT-SAMPLE t-TEST: PRINCIPLE • Where the two distributions are widely separated, their means are clearly different • Where the distributions overlap, it is unclear whether the samples come from the same population • In essence, the t-test gives a measure of the difference between the sample means in relation to the overall spread

UNPAIRED OR INDEPENDENT-SAMPLE t-TEST: PRINCIPLE With smaller sample sizes, the SE increases, as does the overlap between the two curves, so the value of t decreases

THE PREVIOUS IQ EXAMPLE • In the previous IQ example, we were assessing whether a particular sample was likely to have come from a particular population • If we had two samples (rather than sample plus population), we would compare these two samples using an independent-sample t-test
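The principle of the independent-sample t-test (difference between the sample means in relation to the overall spread) can be sketched as follows. The version shown is the pooled-variance form, which assumes similar variances in the two groups; the function name and both sets of scores are hypothetical. Converting t to a p value needs the t distribution, which a statistics package provides.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two samples."""
    na, nb = len(a), len(b)
    # Overall spread: pooled variance of the two samples
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se_difference = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se_difference, na + nb - 2

# Hypothetical scores for two independent groups
group_a = [102.0, 98.0, 110.0, 105.0, 95.0, 108.0]
group_b = [112.0, 109.0, 120.0, 115.0, 104.0, 118.0]

t, df = independent_t(group_a, group_b)
print(f"t = {t:.2f} on {df} degrees of freedom")  # t = -2.95 on 10 degrees of freedom
```

For 10 degrees of freedom the two-sided critical value at p=0.05 is 2.228, so a difference of this size would be reported as significant.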

MULTIPLE TESTS AND TYPE I ERRORS • The risk of observing by chance a difference between two means (even if there isn’t one) is α • This risk is termed a Type I error • By convention, α is set at 0.05 • For an individual test, this becomes the familiar p<0.05 (the probability of finding this difference by chance is <0.05 or less than 1 in 20) • However, as the number of tests rises, the actual probability of finding a difference by chance rises markedly
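How quickly the risk rises can be shown with a short calculation. The sketch assumes the tests are independent of one another, in which case the chance of at least one false positive among k tests is 1 - (1 - α)^k; the Bonferroni threshold (α divided by the number of tests) is included as one common, conservative correction. Function names are illustrative.

```python
def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps the overall risk near alpha."""
    return alpha / n_tests

for k in (1, 5, 10, 20):
    print(f"{k:2d} tests: risk of a chance finding = {familywise_error(k):.2f}, "
          f"Bonferroni threshold = {bonferroni_threshold(k):.4f}")
# 10 tests give a risk of about 0.40; 20 tests give about 0.64
```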

SUBGROUP ANALYSIS • Papers sometimes report analyses of subgroups of their total dataset • Criteria for subgroup analysis: • Must have large sample • Must have a priori hypothesis • Must adjust for baseline differences between subgroups • Must retest analyses in an independent sample

TORTURED DATA - SIGNS • Did the reported findings result from testing a primary hypothesis of the study? If not, was the secondary hypothesis generated before the data were analyzed? • What was the rationale for excluding various subjects from the analysis? • Were the following determined before looking at the data: definition of exposure, definition of an outcome, subgroups to be analyzed, and cutoff points for a positive result? Mills JL. Data torturing. NEJM 329:1196-1199, 1993.

TORTURED DATA - SIGNS • How many statistical tests were performed, and was the effect of multiple comparisons dealt with appropriately? • Are both P values and confidence intervals reported? • And have the data been reported for all subgroups and at all follow-up points? Mills JL. Data torturing. NEJM 329:1196-1199, 1993.

COMPARING TWO MEANS FROM THE SAME SAMPLE: THE PAIRED t TEST • Assume that A and B represent measures on the same subject (eg at two time points) • Note that the variation between subjects is much wider than that within subjects ie the variance in the columns swamps the variance in the rows • Treating A and B as entirely separate, t=-0.17, p=0.89 • Treating the values as paired, t=3.81, p=0.03
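The effect of pairing can be reproduced with made-up numbers. The data below are hypothetical (they are not the values behind the t and p figures quoted above) but are constructed the same way: subjects differ widely from one another, while each subject changes only slightly between A and B. Function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev, variance

def unpaired_t(a, b):
    """Independent-sample t (pooled variance), ignoring the pairing."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled * (1 / na + 1 / nb))

def paired_t(a, b):
    """Paired t: a one-sample t test on the within-subject differences."""
    differences = [y - x for x, y in zip(a, b)]
    return mean(differences) / (stdev(differences) / sqrt(len(differences)))

# Hypothetical measurements on five subjects at two time points
a = [10.0, 30.0, 50.0, 70.0, 90.0]
b = [12.0, 33.0, 51.0, 74.0, 92.0]

print(f"treated as separate samples: t = {unpaired_t(a, b):.2f}")  # t = 0.12
print(f"treated as paired:           t = {paired_t(a, b):.2f}")    # t = 4.71
```

Pairing removes the large between-subject variance from the comparison, so the same small but consistent change gives a much larger t.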

SUMMARY THUS FAR …
