Can I Believe It? Understanding Statistics in Published Literature

Can I Believe It? Understanding Statistics in Published Literature • Keira Robinson – MOH Biostatistics Trainee • David Schmidt – HETI Rural and Remote Portfolio

Agenda • Welcome • Understanding the context • Data types • Presenting data • Common tests • Tricks and hints • Practice • Wrap up

Understanding statistics • Never consider statistics in isolation • Consider the rest of the article • Who was studied • What was measured • Why was that measure used • Where was the study completed • When was it done • It is the author’s role to convince you that their results can be believed!

Types of Data

Examples of data – Table 1, Diamond et al. 2006

Types of data • Numeric • Continuous (height, cholesterol) • Discrete (number of floors in a building) • Categorical • Binary (yes/no, e.g. born in Australia?) • Nominal (unordered categories, e.g. cancer type) • Ordinal (ordered categories, e.g. cancer stage)

Histograms • Represent continuous variables • Areas of the bars represent the frequency (count) or percent • Indicate the distribution of the data

Measures of association

Stem and leaf plot – heights
6* 11
6* 2
6* 3333333
6* 44444444444
6* 555555555555
6* 66666666666666666666666
6* 777777777777777777777777777777
6* 8888888888888888
6* 99999999999999999999999999999999
7* 0000000000000000000000000
7* 1111111111111111111
7* 222222222222
7* 333333
7* 44
7* 55

Skewed Data

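Salient features – the mean • The average value: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (the sum of the observations divided by the number of observations)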

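Salient features – the median • The observation in the middle of the ordered data • Example: newborn birth weights • 3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650 g • With an even number of observations, average the two middle values: (3300 + 3400)/2 = 3350 g • Not affected by extreme values • Wastes information: only the middle of the ordered data is used, not the size of every value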
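The median example above can be checked with a minimal Python sketch that uses only the standard library. The extreme value of 9000 g is a made-up figure, added only to show how differently the mean and the median react to it.

```python
from statistics import mean, median

# Birth weights in grams, taken from the slide above.
weights = [3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650]

print(median(weights))  # 3350.0, the average of the two middle values
print(mean(weights))    # 3356.25

# Hypothetical: replace the largest weight with an extreme value.
with_extreme = weights[:-1] + [9000]
print(median(with_extreme))  # still 3350.0
print(mean(with_extreme))    # 4025, pulled up by the single extreme value
```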

Salient features- the mean and median

Mean and Median • The mean is preferable when it represents the data well • Symmetric distributions • Mean ≈ median • Present the mean • Skewed distributions • Mean is pulled toward the ‘tail’ • Present the median

Mean and Median

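Variability – Standard deviation and variance • Standard deviation: roughly the average distance between the observations and the mean • $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, where $\bar{x}$ is the mean • In the original units, e.g. 0.3% • Variance = $s^2$, the standard deviation squared • In the original units squared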
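As a rough illustration, the sample standard deviation and variance can be computed with Python's standard library. The data are the eight newborn birth weights from the median slide, and the sketch assumes the sample (n − 1) form of the formulas.

```python
from statistics import mean, stdev, variance

# Birth weights in grams (the eight newborn weights from the median slide).
weights = [3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650]

sd = stdev(weights)      # sample standard deviation, in grams
var = variance(weights)  # sample variance, in grams squared

print(f"mean     = {mean(weights):.1f} g")
print(f"SD       = {sd:.1f} g")
print(f"variance = {var:.1f} g^2")
```

The variance is simply the standard deviation squared, which is why its units are the original units squared.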

Range • Example: infant birth weights • 3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650, 3800 g • Range = 3100 to 3800 grams, a spread of 700 grams • Interquartile range: the range between the first and third quartiles (Q1 and Q3) • 3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650, 3800 g • IQR = 3200 to 3600 grams, a spread of 400 grams
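The range and interquartile range above can be reproduced with a short Python sketch. Quartiles have several competing definitions; the `method="inclusive"` option of `statistics.quantiles` is chosen here because it matches the values on the slide, and other conventions can give slightly different quartiles.

```python
from statistics import quantiles

# Infant birth weights in grams, from the slide above.
weights = [3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650, 3800]

data_range = max(weights) - min(weights)

# Cut points that split the ordered data into four equal parts.
q1, q2, q3 = quantiles(weights, n=4, method="inclusive")
iqr = q3 - q1

print(f"range: {min(weights)} to {max(weights)} g, spread {data_range} g")
print(f"IQR:   {q1:.0f} to {q3:.0f} g, spread {iqr:.0f} g")
```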

Presenting variability • Present standard deviation if the mean is used • Present Interquartile range if the median is used

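Graphics for Continuous Variables • Boxplot (labels on the figure, from top to bottom) • Outlier • Maximum (top whisker) • 75th percentile (Q3) • Median • 25th percentile (Q1) • Minimum (bottom whisker) • The box from Q1 to Q3 spans the IQR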

Categorical Variables- table summaries

Bar charts • Relative frequency for a categorical or discrete variable

Bar chart vs Histogram • Histogram • For continuous variables • The area represents the frequency • Bars join together • Bar chart • For categorical variables • The height represents the frequency • The bars don’t join together

Pie chart • Areas of “slices” represent the frequency

Precision

Presenting statistics • Tables should need no further explanation • Means • No more than one decimal place more than the original data • Standard deviations may need an extra decimal place • Percentages • No more than one decimal place (sometimes none) • If sample size < 100, decimal places are not necessary • If sample size < 20, may need to report actual numbers

Statistical Inference

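Sampling and inference (diagram) • Sampling: from the population to the sample • Inference: from the sample back to the population

Sampling, cont’d • A sample statistic is used as an estimate of the population parameter • Example: average parity • The sample mean estimates the population mean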

Confidence intervals • We are confident the true mean lies within a range of values • 95% confidence interval: we are 95% confident that the true mean lies within the range of values • If a study were repeated numerous times, the confidence interval would contain the true mean in 95% of the repetitions • How does the confidence interval change as the sample size increases?
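One way to explore the sample-size question is a small Python sketch. It assumes a normal approximation (mean ± z × SD/√n) and a hypothetical standard deviation of 500 g; neither figure comes from the slides.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sd, n, level=0.95):
    """Half-width of a normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return z * sd / sqrt(n)


SD = 500.0  # hypothetical standard deviation of birth weight, in grams

widths = {n: ci_half_width(SD, n) for n in (25, 100, 400)}
for n, w in widths.items():
    print(f"n = {n:>3}: mean ± {w:.0f} g")
```

Under these assumptions the interval narrows as the sample grows: quadrupling the sample size halves the width.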

Confidence intervals cont’d

Hypothesis testing • Is our sample of babies consistent with the Australian population, which has a known mean birth weight of 3500 grams? • Sample mean = 3800 grams, 95% CI of 3650 to 3950 grams • The population mean of 3500 grams lies outside this confidence interval, indicating that our sample mean is higher than the Australian population mean

Hypothesis testing • State a null hypothesis • There is no difference between the sample mean and the true mean: H0: μ = 3500 grams • Calculate a test statistic from the data: t = 2.65 • Report the p-value: p = 0.012
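The test statistic for a one-sample t test is the distance between the sample mean and the hypothesised mean, measured in standard errors. The sketch below uses a hypothetical sample of eight birth weights chosen to have a mean of 3800 g; the slides do not give the underlying data, so it will not reproduce t = 2.65 or p = 0.012.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t statistic for the null hypothesis that the population mean is mu0."""
    standard_error = stdev(sample) / sqrt(len(sample))
    return (mean(sample) - mu0) / standard_error


# Hypothetical birth weights in grams (not the authors' data).
sample = [3650, 3900, 3750, 4100, 3600, 3850, 3700, 3850]

t_stat = one_sample_t(sample, mu0=3500)
print(f"sample mean = {mean(sample)} g, t = {t_stat:.2f}")
```

The p-value would then be read from a t distribution with n − 1 degrees of freedom, which a statistics package normally does for you.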

What is a p-value? • The probability of obtaining data at least as extreme as those observed, e.g. a mean weight of 3800 grams or greater, if the null hypothesis is true • The smaller the p-value, the stronger the evidence against the null hypothesis • p < 0.05 – evidence to reject the null hypothesis (statistically significant difference) • p > 0.05 – insufficient evidence to reject the null hypothesis (not statistically significant); this does not prove that the null hypothesis is true

Summary – Confidence intervals and p-values • P-value: indicates statistical significance • Confidence interval: the range of values within which we are 95% confident the true value lies • Recommended to present confidence intervals where possible

Analysing Continuous Outcomes

T tests • What are they used for? • Analyse means • Provide estimate of the difference in means between the two groups and the 95% confidence interval of this difference • P-value – a measure of the evidence against the null hypothesis of no difference between the two groups

T tests- paired vs independent • Paired: • Outcome is measured on the same individual • Eg: before and after, cross-over trial • Pairs may be two different individuals who are matched on factors like age, sex etc.

Paired T-tests • Calculate the difference for each of the pairs • The mean weight at baseline was 93 kg and the mean weight at 3 months was 88 kg • The weight at 3 months was 5 kg less than the baseline weight (95% CI: -3 to 12)

Paired T-tests • There was no evidence of a change in weight after 3 months (p-value = 0.19) • Assumptions • The differences follow a bell-shaped curve with no outliers • Assess the shape by graphing the differences • Use a histogram or stem and leaf plot
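A paired t test is a one-sample test applied to the within-pair differences. The sketch below works through that calculation on hypothetical weights for eight people; the numbers are invented for illustration and are not the data behind the slides. The critical value 2.365 is the standard two-sided 95% t-table value for 7 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical weights in kg for the same eight people (not the authors' data).
baseline = [95, 102, 88, 91, 99, 85, 93, 91]
month_3 = [90, 99, 89, 84, 95, 86, 85, 88]

# Step 1: one difference per pair (baseline minus 3 months).
diffs = [b - m for b, m in zip(baseline, month_3)]

# Step 2: treat the differences as a single sample.
n = len(diffs)
mean_diff = mean(diffs)
standard_error = stdev(diffs) / sqrt(n)
t_stat = mean_diff / standard_error

# Step 3: 95% confidence interval for the mean difference.
T_CRIT_DF7 = 2.365  # two-sided 95% critical value for n - 1 = 7 df (t tables)
ci = (mean_diff - T_CRIT_DF7 * standard_error,
      mean_diff + T_CRIT_DF7 * standard_error)

print(f"mean difference = {mean_diff} kg, t = {t_stat:.2f}")
print(f"95% CI: {ci[0]:.1f} to {ci[1]:.1f} kg")
```

If the interval contains 0, as the slide's interval of -3 to 12 does, the data are compatible with no change in weight.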

Independent T tests • Two groups that are unrelated • Eg: weights of different groups of people

Independent samples t-tests • Same assumptions as for paired t tests, plus the assumptions of independence and equal variance

Interpretation – independent t tests • The mean weight in NW Public was 62 kg and the mean weight in SW Public was 61 kg • The mean difference in weight between the two schools was 1 kg (95% CI: -22 to 24) • There was no evidence of a difference in weight between the two schools (p = 0.92)
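The equal-variance assumption is what allows the two groups' variances to be pooled into one estimate. A minimal sketch of the pooled t statistic follows, using hypothetical student weights chosen so that the group means match the 62 kg and 61 kg on the slide; the individual values and the function name `pooled_t` are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Independent-samples t statistic, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical weights in kg (not the authors' data).
nw_public = [58, 64, 61, 66, 60, 63]
sw_public = [59, 62, 57, 65, 61, 62]

t_stat = pooled_t(nw_public, sw_public)
print(f"difference in means = {mean(nw_public) - mean(sw_public)} kg")
print(f"t = {t_stat:.2f} on {len(nw_public) + len(sw_public) - 2} df")
```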

One-way Analysis of Variance (ANOVA) • What happens when there are more than two groups to compare? • Null hypothesis: the means of all groups are equal • A single difference in means cannot summarise more than two groups, so the variance between the groups is analysed instead • The variance between groups is compared with the variance within groups
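The comparison of between-group and within-group variance is summarised by the F statistic. The sketch below computes it from first principles; the four groups of student weights are hypothetical, and the function name `one_way_anova_f` is chosen for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical student weights in kg for four schools (not the authors' data).
schools = [
    [58, 64, 61, 66, 60],
    [59, 62, 57, 65, 61],
    [66, 70, 68, 72, 69],
    [55, 60, 58, 62, 57],
]
print(f"F = {one_way_anova_f(schools):.2f}")
```

A large F means the group means differ by more than the spread within the groups would suggest; the p-value comes from an F distribution with k − 1 and N − k degrees of freedom.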

One-way ANOVA • Comparing multiple groups

Interpretations – One-way ANOVA • If p < 0.05: there was evidence of a difference in average student weight between the four schools • If p > 0.05: there was no evidence of a difference in average student weight between the four schools • Comparing every pair of means against each other is not advised: the more tests that are done, the greater the chance of finding at least one significant result by chance alone

Assumptions ANOVA • Normality – observations in all groups are normally distributed • Equal variance – the variance is the same in all groups • Independence – all groups are independent of each other

Extensions of one-way ANOVA • Two-way ANOVA • Multiple factors are considered, e.g. school and type of school (public/private) • ANCOVA – Analysis of Covariance • Tests group differences while adjusting for continuous variables (e.g. age) as well as categorical variables

Linear Regression • Measures the association between two continuous variables (e.g. weight and height) • Or between one continuous outcome and several continuous variables (multiple linear regression) • What is the relationship between height and weight?
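The warning about comparing all means against each other can be made concrete. Assuming each test uses a 5% significance level and, as a simplification, that the tests are independent, the chance of at least one false positive across k tests is 1 − 0.95^k. The function name `familywise_error` is chosen for illustration.

```python
from math import comb


def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


# Four schools give 6 possible pairwise comparisons.
n_pairs = comb(4, 2)

print(f"pairwise comparisons: {n_pairs}")
print(f"chance of at least one false positive: {familywise_error(n_pairs):.2f}")
```

With four schools the six pairwise tests carry roughly a 26% chance of at least one spurious significant result, compared with 5% for a single test.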

Scatter plot of weight and height • Correlation between height and weight = 0.75
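Both the correlation coefficient and the fitted regression line come from the same sums of squares. The sketch below computes them from first principles on hypothetical heights and weights; the data are invented for illustration and will not reproduce the correlation of 0.75 on the slide.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


# Hypothetical measurements (not the authors' data).
heights_cm = [150, 160, 165, 170, 175, 180, 185]
weights_kg = [52, 58, 66, 64, 75, 72, 86]

r = pearson_r(heights_cm, weights_kg)
slope, intercept = fit_line(heights_cm, weights_kg)
print(f"r = {r:.2f}")
print(f"weight ≈ {intercept:.1f} + {slope:.2f} × height")
```

The slope estimates how many kilograms heavier a person is, on average, for each extra centimetre of height, while r measures how tightly the points cluster around that line.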
