Interactive learning modules and videos of abridged lectures can be accessed by smart phone as well. http://sph.bu.edu/otlt/lamorte/EP713/. Rounding errors. When calculating ratios, don’t round off the numerator and denominator too much; it can cause the result to vary quite a bit.
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Interactive learning modules and videos of abridged lectures can be accessed by smart phone as well.
http://sph.bu.edu/otlt/lamorte/EP713/
When calculating ratios, don’t round off the numerator and denominator too much; it can cause the result to vary quite a bit.
Keep 34 decimal places in the numerator & denominator. Round off the final result.
0.1199 rounded to
0.1449
0.1 = 1.0
0.1
0.1199 = 0.827 = 0.83
0.1449
Using a reference group for comparison:
You can do this for both cohort studies and casecontrol studies.
100,000 PYrs
(incidence)
wgt kg
hgt m2
# MIs
(nonfatal)
Personyears
of observation
Relative
Risk
BMI:
41
177,356
23.1
1.0
<21
57
194,243
29.3
1.3
2123
56
155,717
36.0
1.6
2325
67
148,541
45.1
2.0
2529
85
99,573
85.4
3.7
>29
Using a Reference Group in a CaseControl Study
Cancer
Cancer
Case Control
Case Control
some
complete
Tooth
Loss
Tooth
Loss
none
none
Testing Whether a Factor Is Associated With Disease
Target
Population
Inference
Sample
Study
population
Is there an association?
When selection of one subject doesn’t affect the chance of other subjects being chosen, i.e. selection is be independent (uninfluenced by other factors). To achieve this we should select by a process such that selection is independent of other factors… an independent selection process.
In reality, we usually take convenience samples… we take the subjects that are available. Example: BWHS recruiting via subscribers to a magazine.
Is H1N1 vaccine effective?
Suppose you were reading two studies: one that addressed the first question and another that addressed the second question… what would be the study design of each?
What was being estimated in each?
Frequencies
Comparing two or more groups
Do smokers weigh less than nonsmokers?
[measurement
difference]
Increased risk of wound infection?
[relative risk]
Estimates in one group.
Casefatality rate from bird flu (% of cases that die)?
[proportion]
Mean body weight of a population?
[measurement]
1
2+
Are the groups different?
How precise is measure of association?
How Precise?
Frequencies
Comparing two or more groups
Do smokers weigh less than nonsmokers?
[measurement
difference]
Increased risk of wound infection?
[relative risk]
Estimates in one group.
Casefatality rate from bird flu (% of cases that die)?
[proportion]
Mean body weight of a population?
[measurement]
1
2+
Are the groups different?
How precise is measure of association?
A sample provides an estimate of population mean weight.
Distribution of Body Weights in a PopulationTotal population
(True Mean)
Number of people with each body weight
100 120 140 160 180 200 220 240 260 280 300 320
Body Weight
Estimates may vary from sample to sample.
The estimates will be more precise with larger samples, i.e., more consistent, and they are more likely to be close to the true mean.
Confidence Interval for a Mean
The estimated mean was 202. With 95% confidence, the true mean lies in the range of 184220.
184 202 220
The range within which the true value lies with a specified degree of confidence.
Frequencies
Comparing two or more groups
Do smokers weigh less than nonsmokers?
[measurement
difference]
Increased risk of wound infection?
[relative risk]
Estimates in one group.
Casefatality rate from bird flu (% of cases that die)?
[proportion]
Mean body weight of a population?
[measurement]
1
2+
Are the groups different?
How precise is measure of association?
What is the casefatality rate?
No comparisons; just a single estimate in a group of cases.
What type of measure of disease frequency are we estimating?
See Epi_Tools.XLS : Worksheet “CI – One Group”
Interpretation: The estimated casefatality rate was 50% (4/8). With 95% confidence the true CFR lies in the range 21.578.5%.
95% Confidence Interval For A Proportion
The range within which the true value lies, with 95% confidence.
95% Confidence Interval
0.22
0.78
0.0
1.0
4/8 = 0.5 = point estimate
It is very unlikely (less than 5% probability) that the true value is outside this range.
… the narrower the confidence interval (and the more precise the estimate).
N
N
N
N
N
Precision Depends on Sample Size
Weymouth, MA conducted an extensive health survey several years ago. One question asked, “Have you experienced violence because you or someone else was drinking alcohol?”
What type of measure of disease frequency are we estimating?
Comparing Two or More Groups
Frequencies
Estimates in one group.
Mean body weight of a population?
[measurement]
Casefatality rate from bird flu (% of cases that die)?
[proportion]
1
2+
Comparing two or more groups
Do smokers weigh less than nonsmokers?
[measurement
difference]
Increased risk of wound infection?
[relative risk]
Are the groups different?
How precise is measure of association?
Null Hypothesis (H0): the groups are not different
Alternative Hypothesis (HA): the groups are different
HA: Mean Wgt. Smokers = Mean Wgt. Nonsmokers
True population
distribution
Note that, here, smokers & nonsmokers have the same distribution of body weights and the mean weights are the same.
Random
Samples
100 120 140 160 180 200 220 240 260 280 300 320
Random sample of nonsmokers
Random sample of smokers
The sample means may differ just by chance (random error).
True means
Population
of non=smokers
Population
of smokers
100 120 140 160 180 200 220 240 260 280 300 320
sample of smokers
sample of nonsmokers
Even if they are really different, there may be some overlap.
205 lbs.
Smokers Nonsmokers
Statistical TestingHow do we decide if they are really different or the differences were just due to chance?
If the null hypothesis were true, what would be the probability of getting a difference this great (or greater) just by chance (random error)?
205 lbs.
Smokers Nonsmokers
P– Values“p” is for probability.
Definition of a pvalue:
The probability of seeing a difference this big or bigger, if the groups were not different, i.e. just by chance.
Pvalues are generated by performing a statistical test. The particular statistical test used depends on study type, type of measurement, etc.
pvalueProbability of occurring by chance
.55 55%
.25 25%
.12 12%
.05 5%
.01 1 in 100 (1% )
The differences may
be due to chance.
p < or = .05
Chance is an
unlikely explanation
If “p” < 0.05 it is unlikely that the difference is due to chance.
If so, we reject the null hypothesis
in favor of the alternative hypothesis,
i.e. the groups probably are different.
If we wanted to compare two groups on measurement data (e.g. Is the mean weight of smokers less than that of nonsmokers?), we might use a statistical test known as a “ttest” to determine whether there was a “statistically significant” difference.
Epi_Tools.XLS Spreadsheet Describes TTests
With Categorical Data
Frequencies
Estimates in one group.
Mean body weight of a population?
[measurement]
Casefatality rate from bird flu (% of cases that die)?
[proportion]
1
2+
Comparing two or more groups
Do smokers weigh less than nonsmokers?
[measurement]
Increased risk of wound infection?
[relative risk]
Are the groups different?
How precise is measure of association?
Is the risk of wound infection really different, or were the observed differences just due to “the luck of the draw”?
Wound Infection
Cumulative
Incidence
Yes No
5.3%
Yes
7 124 131
(7/131)
Had Incidental
Appendectomy
1.3%
No
1 78 79
(1/79)
8 202 210 Subjects
Do incidental appendectomy?
RR = 7/131 = 5.3 = 4.2
1/79 1.3
What would the null hypothesis be for the study on wound infections after incidental appendectomy?
How confident are we that the groups are different?
How accurate is this estimate?
The “null” value; i.e., no difference.
Estimated RR = 4.2
0 1.0 10
Possible values of RR or OR
Null Hypothesis (H0): the groups are not different
Alternative Hypothesis (HA): the groups are different
5.3%
No appy. Appendectomy
Statistical TestingHow do we decide if they are really different or the differences were just due to chance?
If the null hypothesis were true, what would be the probability of getting a difference this great (or greater) just by chance (random error)?
(62% of 8)
126
3
(38% of 8)
76
If the Null Hypothesis Were True, What Would You Have Expected?
Expected if no association (Null Hypothesis)
Observed Data:
Infection
No Inf.
Infection
No Inf.
Appy
Appy
7
124
131 (62%)
No
Appy
No
Appy
1
78
79 (38%)
8
202
210 total
(62% of 8)
126
3
(38% of 8)
76
If the Null Hypothesis Were True, What Would You Have Expected?
Expected if no association:
Observed Data:
Infection
No Inf.
Infection
No Inf.
Appy
Appy
7
124
131 (62%)
No
Appy
No
Appy
1
78
79 (38%)
8
202
210 total
Expected with
null hypothesis
c2 = S (OE)2
E
= 2.24
If the null hypothesis were true, there would be a 13% probability of obtaining a difference this big (or bigger) just by chance (“luck of the draw”).
p = 0.13
(“pvalue”)
Data:
PersonYears
of Observation
Death
Yes
No
+
30

699

36

1,399
66
2,098 total
Chi Squared Test with PersonYears of Observation
c2 = S (OE)2
E
Yes
No

+
30



36

c2 = 82 + 82 =2.91 + 1.45 = 4.36
22 44
(p < 0.05)
Chi Squared Test with PersonYears of Observation
Data Expected
by Chance:
Observed
Data:
33% of 66 = 22
Death
Yes
No
Expected with
null hypothesis
22
+
699 (33%)

44
1,399 (67%)
67% of 66 = 44
66
2,098 total
c2 = S (OE)2
E
E
= 2.24
p = 0.13
Caution: c2 Test Exaggerates Significance with Small Samples
Expected with null hypothesis:
Infection
No Inf.
Appy
5
126
No
Appy
3
76
Fisher’s Exact Test (for small samples, i.e. if expected number in any cell is <5):
Here, p = 0.26 with Fisher’s.
Fisher’s requires an iterative calculation that is tough for a spreadsheet. See http://www.langsrud.com/fisher.htm for a 2x2 table that performs Fisher’s Exact Test.
There is a tendency for pvalues to devolve into a conclusion of "significant" or not based on the pvalue.
If an effect is small and clinically unimportant, the pvalue can be "significant" if the sample size is large. Conversely, an effect can be large, but fail to meet the p<0.05 criterion if the sample size is small.
When many possible associations are examined using a criterion of p< 0.05, the probability of finding at least one that meets the “critical point” increases in proportion to the number of associations that are tested.
The pvalue does not represent the probability that the null hypothesis is true. The pvalue is the probability that the data could deviate from the null hypothesis as much as they did or more.
Statistical significance does not take into account the evaluation of bias and confounding.
How precise is my estimate of the risk ratio?
Wound Infection
Cumulative
Incidence
Yes No
5.3%
Yes
7 124 131
(7/131)
Had Incidental
Appendectomy
1.3%
No
1 78 79
(1/79)
8 202 210 Subjects
Do incidental appendectomy?
RR = 7/131 = 5.3 = 4.2
1/79 1.3
RR or OR
95% CI for a Proportion (one group)
95% CI for Risk Ratioor an Odds Ratio
For RR or OR:
For A Proportion:
95% CI =
95% CI =
RR exp(1 +1.96 )
p + 1.96
p(1  p)
n
c
0.6 4.2 28
.6 .7 .8
Estimated
relative risk
or odds ratio
Estimated
proportion
or frequency
95% CI for a Risk Ratio or an Odds Ratio
The range in which the true RR or OR lies, with 95% confidence.
We’re 95% confident that the true value is in this range, so there is <5% probability that the true value is outside that range.
Here, the null value is outside the interval, so there is <5% chance that the null is the true value. The null is excluded, so p<0.05.
Estimated RR
The true RR
probably isn’t 1.
(probability <5%)
Relative Risk
0 1.0 10
Protective effect No Difference Increased risk
95% CI for RR or OR (continued)
Estimated RR
The true RR
could be 1.
Here, the null value is inside the interval, so there is >5% chance that the null is the true value. The null could be true, so p>0.05.
Relative Risk
0 1.0 10
Protective effect No Difference Increased risk
The 95% confidence interval for relative risk gives the range of RR values that are compatible with your sample.
If the 95% CI includes the null value, then the findings are compatible with RR=1.0, so there is no significant difference).
But, if the 95% CI does NOT include the null value, RR=1.0 is unlikely (p<0.05).
Relative Risk
0 1.0 10
(protective effect) No Difference (increased risk)
The 95% Confidence Interval for a RR or OR Tells You:
RR= 1.0 means
no difference, i.e. no association.
Significant protective effect (p<.05)
Significant increase in risk (p<.05)
No significant difference (p>.05)
A 95% Confidence Interval Puts
Nonsignificant Results in Perspective
Bothof these results are nonsignificant.
Do you view them differently?
Relative Risk
0 1.0 10
Having a larger sample size does not ensure that you will get a “statistically significant” difference.
Use the Cohort Study sheet in “EpiTools” to calculate c2, the pvalue, and the 95% confidence interval.
Wound Infection
Cumulative
Incidence
Yes No
5.3%
Yes
7 124 131
(7/131)
Had Incidental
Appendectomy
1.3%
No
1 78 79
(1/79)
8 202 210 Subjects
Do incidental appendectomy?
RR = 7/131 = 5.3 = 4.2
1/79 1.3
.53 – 36.5
95% Confidence Interval for Estimated Risk Ratio in the Appendectomy Study
Estimated RR= 4.2
Relative Risk
1.0 10
(protective effect) (increased risk)
You can use the Sample Size sheet (Part II) in “Epi_Tools.XLS” to calculate how many subjects would have been needed, assuming equal numbers in each group.
It’s possible that incidental appendectomy really does carry an increased risk of wound infection, but the sample size was too small to demonstrate a significant difference.
If the true difference were really 5.3% vs. 1.3% how many subjects would it have taken to demonstrate a significant difference (with 90% power)?
What if the true difference were really 10% vs. 1.3% how many subjects would it have taken to demonstrate a significant difference (with 90% power)?
Among the first 10 people who developed bird flu 2 died ( casefatality rate= 20%).
Among 210 people who developed bird flu to date 122 died (58% casefatality rate).
How frequently is bird flu fatal?
How precise are these estimates?
An association between DES exposure and breast cancer?
Results: RR=1.4 pvalue = 0.006
the differences were due to chance.
If “p” < or = 0.05 it is unlikely that the difference is due to chance.
Are Glucosamine + Chondroitin Associated with Pain Relief?
Results: RR=1.1 pvalue = 0.10
If “p” < or = 0.05 it is unlikely that the difference is due to chance.
A study compared 35 diabetics to a comparable group of 35 nondiabetics and found that the diabetics had a higher incidence of femoral arterial occlusion.
The relative risk was 7.0, and the 95% CI was 1.2 to 14.3.
How would you interpret these findings?
Suppose you did a larger study and found a RR = 3.0
and a 95% confidence interval from 2.0 4.0.
How would you interpret these findings?
A study compared 35 diabetics to a comparable group of 35 nondiabetics and found that the diabetics had a higher incidence of femoral arterial occlusion. The relative risk was 7.0, and the 95% CI was 1.2 to 14.3.
A) The association is statistically significant because p <0.05.
B) The association is statistically significant becausethe 95% confidence interval does not contain the “null value.”
C) A 95% confidence interval for an OR is the range within which the true OR lies with 95% confidence.
D) All of the above.
For Lenny's bakery: OR=3.46
P= 0.0033; 95% CI = (1.498.06)
A) The association is statistically significant because p <0.05.
B) The association is not statistically significant because p <0.05 & the null is excluded from the 95% confidence interval.
C) The association is not statistically significant because the 95% confidence interval contains the estimated OR.
26. Please list the names and locations of the restaurants, delis, or cafeterias where you
[your child] ate lettuce during those 7 days.
Name: _________________ Location: _____________ Date: _______
Name: _________________ Location: _____________ Date: _______
Name: _________________ Location: _____________ Date: _______
27. Did you [your child] eat sprouts, such as alfalfa or bean sprouts, in a restaurant, deli
or cafeteria? These may have been eaten as part of a salad or as part of other food
items such as sandwiches, tacos, etc.
Yes No Don't know
If yes: Did you [your child] eat alfalfa sprouts? (These are the threadlike sprouts
with green tops the size of the head of a pin.)
Yes No Don't know
28. Please list the names of the restaurants, delis, cafeterias, etc. where you [your child]
ate alfalfa sprouts in those 7 days?
Name: _________________ Location: _____________ Date: _______
Name: _________________ Location: _____________ Date: _______
Name: _________________ Location: _____________ Date: _______
Example:
The Excel file “Hepatitis Outbreak” illustrates the use of Excel to record & store data from an outbreak investigation.
External Validity
Causality