Can I Believe It? Understanding Statistics in Published Literature

Keira Robinson – MOH Biostatistics Trainee

David Schmidt – HETI Rural and Remote Portfolio

agenda
Agenda
  • Welcome
  • Understanding the context
  • Data types
  • Presenting data
  • Common tests
  • Tricks and hints
  • Practice
  • Wrap up
understanding statistics
Understanding statistics
  • Never consider statistics in isolation
  • Consider the rest of the article
    • Who was studied
    • What was measured
    • Why was that measure used
    • Where was the study completed
    • When was it done
  • It is the author’s role to convince you that their results can be believed!
types of data1
Types of data
  • Numeric
    • Continuous (height, cholesterol)
    • Discrete (number of floors in a building)
  • Categorical
    • Binary (yes/no, e.g. born in Australia?)
    • Categorical (cancer type)
    • Ordinal categorical (cancer stage)
histograms
Histograms
  • Represents continuous variables
      • Areas of the bars represent the frequency (count) or percent
  • Indicates the distribution of the data
stem and leaf plot heights
Stem and leaf plot – heights

6* 11
6* 2
6* 3333333
6* 44444444444
6* 555555555555
6* 66666666666666666666666
6* 777777777777777777777777777777
6* 8888888888888888
6* 99999999999999999999999999999999
7* 0000000000000000000000000
7* 1111111111111111111
7* 222222222222
7* 333333
7* 44
7* 55

salient features the mean
Salient features- the mean
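  • The average value: the sum of the observations divided by the number of observations, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$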
salient features the median
Salient features- the median
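  • The observation in the middle of the ordered data
    • Example: newborn birth weights
    • 3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650 g
          • With an even number of observations, average the middle two: (3300 + 3400)/2 = 3350 g
    • Not affected by extreme values
    • Wastes information: only the order of the observations is used, not their magnitudes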
mean and median
Mean and Median
  • Mean is preferable when it describes the data well, because it uses every observation
  • Symmetric distributions: mean ≈ median
      • Present the mean
  • Skewed distributions
    • Mean is pulled toward the ‘tail’
      • Present the median
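A minimal sketch of why the median is preferred for skewed data. The lengths of stay below are hypothetical values invented for illustration; a single long stay forms the tail and drags the mean well above the median.

```python
from statistics import mean, median

# Hypothetical lengths of hospital stay in days; one long stay forms the tail.
stays = [2, 3, 3, 4, 5, 30]

stay_mean = mean(stays)      # pulled toward the long stay
stay_median = median(stays)  # middle of the ordered values

print(f"mean = {stay_mean:.1f} days, median = {stay_median} days")
```

Dropping the 30-day stay brings the mean back close to the median, which is the behaviour expected of a roughly symmetric distribution.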
variability standard deviation and variance
Variability – Standard deviation and variance
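  • Roughly, the typical distance between the observations and the mean
  • Standard deviation: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, where $\bar{x}$ is the sample mean and $n$ the number of observations
    • In the original units, e.g. 0.3 %
  • Variance = $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
    • In the original units squared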
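As a rough illustration, the standard deviation and variance can be computed with Python's standard `statistics` module. The data are the eight newborn birth weights from the median slide; `stdev` and `variance` use the sample (n - 1) versions, which is an assumption about the definition the slide intended.

```python
from statistics import mean, stdev, variance

# Newborn birth weights in grams, taken from the median example.
weights = [3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650]

w_mean = mean(weights)
w_sd = stdev(weights)      # sample SD (divides by n - 1), in grams
w_var = variance(weights)  # sample variance, in grams squared

print(f"mean = {w_mean:.1f} g, SD = {w_sd:.1f} g, variance = {w_var:.1f} g^2")
```

The standard deviation is usually the one reported because it is on the same scale as the measurements themselves.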
range
Range
  • Example: infant birth weight
  • 3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650, 3800 g
    • Range = 3100 to 3800 grams, or 700 grams
  • Interquartile range: the range between the first and third quartiles (Q1 and Q3)
    • 3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650, 3800 g
      • IQR = 3200 to 3600 grams, or 400 grams
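The range and interquartile range for the nine birth weights can be checked with a short sketch. Quartile conventions differ between textbooks and software; the `method="inclusive"` option used here is one choice that happens to reproduce the slide's Q1 and Q3, and is an assumption rather than the presenters' stated method.

```python
from statistics import quantiles

# Infant birth weights in grams from the range example.
weights = [3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650, 3800]

value_range = max(weights) - min(weights)

# Other quartile conventions can give slightly different cut points.
q1, q2, q3 = quantiles(weights, n=4, method="inclusive")
iqr = q3 - q1

print(f"range = {value_range} g")
print(f"Q1 = {q1} g, median = {q2} g, Q3 = {q3} g, IQR = {iqr} g")
```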
presenting variability
Presenting variability
  • Present standard deviation if the mean is used
  • Present Interquartile range if the median is used
graphics for continuous variables
Graphics for Continuous Variables
  • Boxplot:

[Figure: boxplot with labels for an outlier, the maximum, the 75th percentile (Q3), the median, the 25th percentile (Q1), the minimum, and the IQR spanning Q1 to Q3]

bar charts
Bar charts
  • Relative frequency for a categorical or discrete variable
bar chart vs histogram
Bar chart vs Histogram
  • Histogram
    • For continuous variables
    • The area represents the frequency
    • Bars join together
  • Bar chart
    • For categorical variables
    • The height represents the frequency
    • The bars don’t join together
pie chart
Pie chart
  • Areas of “slices” represent the frequency
presenting statistics
Presenting statistics
  • Tables should need no further explanation
  • Means
    • No more than one decimal place more than the original data
    • Standard deviations may need an extra decimal place
  • Percentages
    • Not more than one decimal place (sometimes no decimal place)
    • Sample size <100, decimal places are not necessary
    • If sample size <20, may need to report actual numbers
sampling
Sampling

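[Figure: sampling takes a sample from the population; inference draws conclusions about the population from the sample]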

sampling cont d
Sampling, cont’d
  • A sample statistic is used as an estimate of the population parameter.
  • Example: average parity

[Figure: the sample mean is used to estimate the population mean]

confidence intervals
Confidence intervals
  • We are confident the true mean lies within a range of values
  • 95% confidence interval: we are 95% confident that the true mean lies within the range of values
  • If a study is repeated numerous times, about 95% of the resulting confidence intervals would contain the true mean
  • How does the confidence interval change as the sample size increases?
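One way to explore the sample-size question is a small sketch like the one below. It assumes a normal approximation (critical value 1.96) and a hypothetical summary of mean 3400 g and SD 500 g; for small samples a t critical value would be used instead.

```python
from math import sqrt

def ci95(sample_mean, sd, n):
    """Approximate 95% CI for a mean using the normal critical value 1.96."""
    margin = 1.96 * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical birth-weight summary: mean 3400 g, SD 500 g.
for n in (25, 100, 400):
    low, high = ci95(3400, 500, n)
    print(f"n = {n:3d}: 95% CI {low:.0f} to {high:.0f} g (width {high - low:.0f} g)")
```

Because the margin depends on 1/sqrt(n), quadrupling the sample size halves the width of the interval.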
hypothesis testing
Hypothesis testing
  • Is our sample of babies consistent with the Australian population with a known mean birth weight of 3500 grams?
  • Sample mean = 3800 grams, 95% CI of 3650 to 3950 grams
  • 3500 lies outside this confidence interval, indicating our sample mean is higher than the mean of the Australian population
hypothesis testing1
Hypothesis testing
  • State a null hypothesis:
    • There is no difference between the sample mean and the true mean: H0: μ = 3500 grams
    • Calculate a test statistic from the data: t = 2.65
    • Report the p-value: p = 0.012
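The test statistic for a one-sample t-test can be sketched as follows. The six birth weights are hypothetical and are not the data behind the slide, so the result will not match t = 2.65; the p-value would then be read from a t distribution with n - 1 degrees of freedom, for example with a statistics package.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: mean = mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    return (mean(sample) - mu0) / standard_error, n - 1

# Hypothetical birth weights in grams (not the data behind the slide).
sample = [3900, 3700, 3850, 3600, 3950, 3800]
t_stat, df = one_sample_t(sample, mu0=3500)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```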
what is a p value
What is a p-value?
  • The probability of obtaining data at least as extreme as those observed, i.e. a mean weight of 3800 grams or greater, if the null hypothesis is true
  • The smaller the p-value, the more evidence against the null hypothesis
    • p ≤ 0.05 (down to < 0.0001) – evidence to reject the null hypothesis (statistically significant difference)
    • p > 0.05 – insufficient evidence to reject the null hypothesis (not statistically significant); this is not evidence that the null hypothesis is true
summary confidence intervals and p values
Summary – Confidence intervals and p values
  • P-value: indicates statistical significance
  • Confidence interval: range of values within which we are 95% confident the true value lies
  • Recommended to present confidence intervals where possible
t tests
T tests
  • What are they used for?
    • Analyse means
    • Provide estimate of the difference in means between the two groups and the 95% confidence interval of this difference
    • P-value – a measure of the evidence against the null hypothesis of no difference between the two groups
t tests paired vs independent
T tests- paired vs independent
  • Paired:
    • Outcome is measured on the same individual
      • Eg: before and after, cross-over trial
      • Pairs may be two different individuals who are matched on factors like age, sex etc.
paired t tests
Paired T-tests
  • Calculate the difference for each of the pairs
  • The mean weight at baseline was 93 kg and the mean weight at 3 months was 88 kg. The mean weight at 3 months was 5 kg lower than at baseline (95% CI -3 to 12 kg)
paired t tests1
Paired T-tests
  • There was no evidence of a significant change in weight after 3 months (p = 0.19)
  • Assumptions
    • The differences follow a bell-shaped (approximately normal) curve with no outliers
    • Assess shape by graphing the differences
      • Use a histogram or stem and leaf plot
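A minimal sketch of the paired calculation: the test is carried out on the within-person differences, not on the two sets of weights separately. The weights below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical weights in kg for the same five people at two time points.
baseline = [95, 102, 88, 91, 89]
month_3 = [90, 101, 80, 92, 87]

# A paired analysis works on the within-person differences only.
differences = [b - m for b, m in zip(baseline, month_3)]

n = len(differences)
mean_diff = mean(differences)
t_stat = mean_diff / (stdev(differences) / sqrt(n))

print(f"differences = {differences}")
print(f"mean difference = {mean_diff} kg, t = {t_stat:.2f} on {n - 1} df")
```

It is the list of differences that would be plotted in a histogram or stem and leaf plot to check the bell-shape assumption.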
independent t tests
Independent T tests
  • Two groups that are unrelated
  • Eg: weights of different groups of people
independent samples t tests
Independent samples t-tests
  • Same assumptions as for paired t-tests, plus the assumptions that the two groups are independent and have equal variance
interpretation independent t tests
Interpretation –independent t tests
  • The mean weight in NW Public was 62 kg and the mean weight in SW Public was 61 kg
  • The mean difference in weight between the two schools was 1 kg (95% CI -22 to 24 kg)
  • There was no evidence of a significant difference in weight between the two schools (p = 0.92)
one way analysis of variance anova
One-way Analysis of Variance (ANOVA)
  • What happens when there are more than two groups to compare?
  • Null hypothesis: the means of all groups are equal
  • A single difference in means cannot describe more than two groups, so the variance between the groups is analysed instead
  • Variance within each group is measured as well as variance between groups, and the test compares the two
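The comparison of between-group and within-group variance can be sketched as an F statistic. The function name `one_way_anova_f` and the school weights are hypothetical, introduced only for illustration; a large F suggests the group means differ by more than the within-group scatter would explain.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])

    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)   # variance between groups
    ms_within = ss_within / (n - k)     # variance within groups
    return ms_between / ms_within

# Hypothetical student weights in kg at three schools.
schools = [[60, 62, 64], [61, 63, 65], [62, 64, 66]]
print(f"F = {one_way_anova_f(schools):.2f}")
```

The p-value would come from an F distribution with k - 1 and n - k degrees of freedom, which is left to a statistics package here.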
one way anova
One-way ANOVA
  • Comparing multiple groups
interpretations one way anova
Interpretations – One-way ANOVA
  • p < 0.05: there was evidence of a difference in average student weight among the four schools
  • p > 0.05: there was no evidence of a difference in average student weight among the four schools
  • Comparing every mean against every other is not advised: the more tests that are done, the greater the chance of finding at least one significant result by chance alone
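The inflation from repeated testing can be illustrated with a short calculation. It assumes every null hypothesis is true and, as a simplification, that the pairwise tests are independent, which real pairwise comparisons are not exactly.

```python
from math import comb

alpha = 0.05
n_groups = 4

# Number of distinct pairwise comparisons among the four schools.
n_tests = comb(n_groups, 2)

# Chance of at least one false positive if every null hypothesis is true,
# assuming (as a simplification) that the tests are independent.
p_any_false_positive = 1 - (1 - alpha) ** n_tests

print(f"{n_tests} pairwise tests -> {p_any_false_positive:.0%} chance of a spurious result")
```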
assumptions anova
Assumptions ANOVA
  • Normality – observations in each group are normally distributed
  • Equal variance – the variance is the same in all groups
  • Independence – all groups are independent of each other
extensions of one way anova
Extensions of one-way ANOVA
  • Two-way ANOVA:
    • Multiple factors to be considered, e.g. school and type of school (public/private)
  • ANCOVA – Analysis of Covariance
    • Tests group differences while adjusting for continuous variables (e.g. age) and categorical variables
linear regression
Linear Regression
  • Measures the association between two continuous variables (weight and height)
  • Or between one continuous variable and several continuous variables (multiple linear regression)
  • What is the relationship between height and weight?
scatter plot of weight and height
Scatter plot of weight and height
  • Correlation between height and weight = 0.75
scatter plot of body fat and height
Scatter plot of body fat and height
  • Correlation between body fat and height = -0.23
linear regression1
Linear regression
  • Fits a straight line to describe the relationship
  • Assumes
    • Independence for each measure (each person)
    • Linearity (check with scatter plots)
    • Normality (check residuals with a graph)
      • Residuals are the difference between the data point and the regression line
    • Homoscedasticity
      • Variability in weight does not change as height changes
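A least-squares line, the correlation coefficient and the residuals can be sketched in plain Python as below. The heights and weights are hypothetical, and `fit_line` is an illustrative helper, not a library function.

```python
from math import sqrt
from statistics import mean

def fit_line(x, y):
    """Least-squares slope, intercept and Pearson correlation r."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept, sxy / sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg).
heights = [150, 160, 165, 170, 180, 185]
weights = [52, 58, 63, 66, 77, 80]

slope, intercept, r = fit_line(heights, weights)

# Residual = observed weight minus the weight predicted by the line.
residuals = [w - (intercept + slope * h) for h, w in zip(heights, weights)]

print(f"weight = {intercept:.1f} + {slope:.2f} * height, r = {r:.2f}")
print("residuals:", [round(e, 1) for e in residuals])
```

Plotting the residuals against height is a common way to check both the linearity and the constant-variability assumptions.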
multiple linear regression
Multiple Linear Regression
  • Extends the simple linear regression
  • Adjusts for confounding variables
  • Example: Does smoking while pregnant affect infant birth weight?
    • Outcome variable: infant birth weight
    • Exposure variable: maternal smoking
    • Covariates (other variables of interest):
      • Sex of the baby, gestational age
confounding variables
Confounding variables
  • A variable (factor) associated with both the outcome and exposure variables
  • Gestational age is associated with both smoking (exposure) and the outcome (birth weight)
  • Confounders can be assessed by checking the correlation between the variable of interest and the outcome variable
  • Correlation coefficient: -1.0 ≤ r ≤ 1.0
  • Rule of thumb: a variable with r > 0.5 or r < -0.5 should be considered a confounder
summary for continuous outcomes
Summary for continuous outcomes
  • Comparing means from two groups
    • Use t-tests (paired for same-person comparisons, independent for comparisons of independent groups)
  • Comparing means for more than two groups
    • One-way ANOVA
  • Comparing means for two or more groups and adjusting for other variables (ANCOVA)
summary for continuous outcomes1
Summary for continuous outcomes
  • Assessing the relationship between two continuous variables
    • Simple linear regression
  • Assessing the relationship between an outcome and two or more other variables
    • Multiple linear regression
chi square tests

Chi-square tests

What can a chi-square test answer?

chi square tests1
Chi-Square tests
  • 2x2 tables:
chi square tests2
Chi-square tests
  • Can be used for paired (same person under two different conditions) or independent samples (unrelated people in different groups)
  • Used often in case-control studies where the outcome is categorical (or dichotomous)
  • Tests the null hypothesis of no association between row and column factors
    • Smoking and low birth weight association
  • The study design defines the appropriate measure of effect
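As a sketch of how the test works on a 2x2 table, the code below compares observed counts with the counts expected under no association. The counts are illustrative, `chi_square_2x2` is a hypothetical helper, and no continuity correction is applied.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))

# Illustrative counts: rows = smokers, non-smokers;
# columns = low birth weight, normal birth weight.
table = [[25, 75], [5, 100]]
chi2, p = chi_square_2x2(table)
print(f"chi-square = {chi2:.1f}, p = {p:.5f}")
```

The expected counts computed inside the loop are the same quantities that guide the choice between the chi-square test and an exact test for small samples.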
cohort studies
Cohort studies
  • Exposure is determined by
    • Randomisation to different groups
    • followed over time
  • Outcome is determined at the end of follow up
  • Rate of outcome can be estimated
cohort studies continued
Cohort studies continued
  • E.g. rate of low birth weight in:
    • Smokers: rate = 25/100 = 0.25 = 25%
    • Non-smokers: rate = 5/105 ≈ 0.05 = 5%
  • Relative risk (RR) = 25/5 = 5: the risk of low birth weight is 5 times higher in smokers than in non-smokers
  • Risk difference (RD) = 25 - 5 = 20 percentage points
  • No relative difference in the rate of low birth weight between smokers and non-smokers: RR = 1.0
  • No absolute difference in the rate of low birth weight between smokers and non-smokers: RD = 0
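The relative risk and risk difference can be reproduced from the raw counts with a short sketch. Using the unrounded non-smoker rate (5/105, about 4.8%) gives an RR slightly above the slide's value of 5, which is based on rates rounded to whole percentages.

```python
def risk(events, total):
    """Proportion of a group that experienced the outcome."""
    return events / total

# Counts from the slide: low birth weight babies / mothers in each group.
risk_smokers = risk(25, 100)
risk_non_smokers = risk(5, 105)

relative_risk = risk_smokers / risk_non_smokers
risk_difference = risk_smokers - risk_non_smokers

print(f"risk in smokers     = {risk_smokers:.1%}")
print(f"risk in non-smokers = {risk_non_smokers:.1%}")
print(f"RR = {relative_risk:.2f}, RD = {risk_difference:.1%}")
```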
cross sectional studies
Cross-Sectional Studies
  • People observed at one point in time (questionnaire)
  • Exposure and outcome are measured at the same time
  • Causal associations cannot be deduced
  • Rate ratio (RR) = 25/5 = 5: the rate of low birth weight is 5 times higher in smokers than in non-smokers
  • Rate difference (RD) = 25 - 5 = 20 percentage points
  • No relative difference in the rate of low birth weight between smokers and non-smokers: RR = 1.0
  • No absolute difference in the rate of low birth weight between smokers and non-smokers: RD = 0
case control studies
Case-control studies
  • Use for rare outcomes (example: child prodigies)
  • Children are selected based on being a prodigy
    • Eg. 100 child prodigies and 100 children with normal intelligence
  • Determine exposure retrospectively
  • Cannot obtain a rate
  • Must obtain the odds of the outcome and compare using an odds ratio
case control studies2
Case-control studies
  • Odds of being a prodigy:
    • In exposed: 70/50 = 1.4
    • In unexposed: 30/50 = 0.6
    • Odds ratio:
      • 1.4/0.6 = 2.3
      • The odds of being a child prodigy are 2.3 times higher if maternal fish oil supplements were taken during pregnancy
    • Null hypothesis
      • No association between the exposure and the outcome
      • Odds Ratio = 1
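The odds ratio arithmetic can be sketched as follows. The exposed counts (70 prodigies, 50 comparison children) come from the slide; the unexposed counts (30 and 50) are inferred from the stated group sizes of 100 each and should be read as an assumption.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds of being a case among the exposed divided by the odds among the unexposed."""
    odds_exposed = exposed_cases / exposed_controls
    odds_unexposed = unexposed_cases / unexposed_controls
    return odds_exposed / odds_unexposed

# 100 prodigies and 100 comparison children; 70 and 50 of them were exposed.
# The unexposed counts (30 and 50) are inferred from those totals.
or_fish_oil = odds_ratio(70, 50, 30, 50)
print(f"OR = {or_fish_oil:.1f}")
```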
summary of rr and or
Summary of RR and OR
  • Both compare the relative likelihood of an outcome between 2 groups
  • RR=1 or OR = 1
    • The outcome is equally likely in the exposed and unexposed groups
  • RR>1 or OR >1
    • The outcome is more likely in the exposed group compared to the unexposed group
    • The exposure is a risk factor
summary of rr and or1
Summary of RR and OR
  • RR<1 or OR<1
    • The outcome is less likely in the exposed group compared to the unexposed group
    • The exposure is protective
  • RR cannot be calculated for a case-control study
  • OR ≈ RR when the outcome is rare
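A small numerical sketch of the rare-outcome approximation. Both sets of counts are hypothetical and describe the same doubling of risk, once for a rare outcome and once for a common one; `rr_and_or` is an illustrative helper.

```python
def rr_and_or(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Return (relative risk, odds ratio) from cohort-style counts."""
    rr = (exposed_events / exposed_total) / (unexposed_events / unexposed_total)
    odds_exposed = exposed_events / (exposed_total - exposed_events)
    odds_unexposed = unexposed_events / (unexposed_total - unexposed_events)
    return rr, odds_exposed / odds_unexposed

# Hypothetical counts: the same doubling of risk, rare versus common outcome.
rare = rr_and_or(2, 1000, 1, 1000)
common = rr_and_or(50, 100, 25, 100)

print(f"rare outcome:   RR = {rare[0]:.2f}, OR = {rare[1]:.2f}")
print(f"common outcome: RR = {common[0]:.2f}, OR = {common[1]:.2f}")
```

For the rare outcome the two measures nearly coincide, while for the common outcome the odds ratio overstates the relative risk.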
extensions of chi square
Extensions of Chi-square
  • Small sample sizes
    • Fisher’s exact test
      • Recommended when n < 20, or when 20 < n < 40 and the smallest expected cell count is < 5
  • Paired data
    • Exact binomial test for small sample sizes
    • McNemar’s test
  • Multiple regression:
    • Logistic regression
fact or fiction
Fact or Fiction
  • Vaccines and autism?
  • Cell phones and brain tumours?
common errors
Common errors
  • 60.182 kg or 60 kg?
    • Reporting measurements with unnecessary precision
  • Age divided into 20-44 years, 45-59 years, 60-74 years, 75+ years
    • Dividing continuous data without explaining why or how
    • Certain boundaries may be chosen to favour certain results
  • Presenting Means and SD for non-normal data
    • What should be presented instead?
common errors1
Common Errors
  • “The effect of more exercise was significant”
  • “The effect of 40 minutes of exercise per day was statistically significant for decreasing weight (p<0.05)”
  • “40 minutes of exercise per day lowered the mean weight of the group from 95 kg to 89 kg (95% CI = 75-105 kg, p = 0.03)”
  • Not checking the distribution of the data to determine the appropriate statistical test
    • Using parametric tests when data is not normal
    • Using tests for independent data when the data is paired
common errors2
Common Errors
  • Using linear regression without confirming linearity
  • Not reporting what happened to all patients
    • Leads to bias of the results
  • Data dredging
    • Multiple statistical comparisons until a significant result is found
  • Not accounting for the denominator or adjusting for baseline
common errors3
Common Errors
  • Selection Bias
    • Sampling from a bag of candy where the larger candies are more likely to be chosen
    • On November 13, 2000, Newsweek published the following poll results:
common errors4
Common Errors
  • Other biases (measurement bias, intervention bias)
  • Using cross sectional studies to infer causality
    • More likely to have a c-section if attending a private hospital instead of a public hospital
practical example
Practical example
  • Working in groups, quickly read the article provided
  • Summarise
    • What data they used
    • What test
    • Do you believe their findings?
    • Can you explain why?
summary
Summary
  • Statistics must be understood in the context of the whole article
  • Statistical tests must fit the data type
  • Findings should be presented appropriately
  • Beware flashy stats!
  • It’s the author’s job to justify their choices
  • If you don’t believe it- can you base your practice on it?