sph.bu/otlt/lamorte/EP713/

1 / 79

# sph.bu/otlt/lamorte/EP713/ - PowerPoint PPT Presentation

Interactive learning modules and videos of abridged lectures can be accessed by smart phone as well. http://sph.bu.edu/otlt/lamorte/EP713/. Rounding errors. When calculating ratios, don’t round off the numerator and denominator too much; it can cause the result to vary quite a bit.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about 'sph.bu/otlt/lamorte/EP713/' - halle

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Interactive learning modules and videos of abridged lectures can be accessed by smart phone as well.

http://sph.bu.edu/otlt/lamorte/EP713/

Rounding errors

When calculating ratios, don’t round off the numerator and denominator too much; it can cause the result to vary quite a bit.

Keep 3-4 decimal places in the numerator & denominator. Round off the final result.

0.1199 rounded to

0.1449

0.1 = 1.0

0.1

0.1199 = 0.827 = 0.83

0.1449

Using a reference group for comparison:

You can do this for both cohort studies and case-control studies.

Rate of MI per

100,000 P-Yrs

(incidence)

wgt kg

hgt m2

# MIs

(non-fatal)

Person-years

of observation

Relative

Risk

BMI:

41

177,356

23.1

1.0

<21

57

194,243

29.3

1.3

21-23

56

155,717

36.0

1.6

23-25

67

148,541

45.1

2.0

25-29

85

99,573

85.4

3.7

>29

Using a Reference Group in a Case-Control Study

Cancer

Cancer

Case Control

Case Control

some

complete

Tooth

Loss

Tooth

Loss

none

none

Testing Whether a Factor Is Associated With Disease

Target

Population

Inference

Sample

• Are the results valid?
• Chance (Random Error)
• Bias
• Confounding

Study

population

• Collect data
• Make comparisons

Is there an association?

What is a “random sample”?

When selection of one subject doesn’t affect the chance of other subjects being chosen, i.e. selection is be independent (uninfluenced by other factors). To achieve this we should select by a process such that selection is independent of other factors… an independent selection process.

In reality, we usually take convenience samples… we take the subjects that are available. Example: BWHS recruiting via subscribers to a magazine.

Types of Data
• 1) Discrete Variables: Variables that assume only a finite number of values:
• Dichotomous variables (male/female, alive/died)
• Categorical variables (or nominal variables, e.g., race)
• Ordinal variables: ordered categories (Excellent/Good/Fair/Poor)
• 2) Continuous Variables: Can take on any value within a range of plausible values, e.g., serum cholesterol level, weight & systolic blood pressure. (Also called quantitative or measurement variables.)

Case fatality rate for H1N1?

Is H1N1 vaccine effective?

Suppose you were reading two studies: one that addressed the first question and another that addressed the second question… what would be the study design of each?

What was being estimated in each?

Goals

• One Group:
• Estimating a prevalence, risk, or rate in one group
• (e.g., case-fatality rate with bird flu)
• Estimating mean (e.g. average body weight)
• (How precise are the estimates?)
• Comparing 2 or more groups:
• Comparing risks, rates
• Comparing means (body weigh in smokers versus non-smokers)
• (Are differences just due to sampling error?)
• (If we calculate a measure of association,
• how precise is it?)

Measurements

Frequencies

Comparing two or more groups

Do smokers weigh less than non-smokers?

[measurement

difference]

Increased risk of wound infection?

[relative risk]

Estimates in one group.

Case-fatality rate from bird flu (% of cases that die)?

[proportion]

Mean body weight of a population?

[measurement]

1

2+

Are the groups different?

How precise is measure of association?

Measurements

Frequencies

Comparing two or more groups

Do smokers weigh less than non-smokers?

[measurement

difference]

Increased risk of wound infection?

[relative risk]

Estimates in one group.

Case-fatality rate from bird flu (% of cases that die)?

[proportion]

Mean body weight of a population?

[measurement]

1

2+

Are the groups different?

How precise is measure of association?

True

mean

100 120 140 160 180 200 220 240 260 280 300 320

Estimated Mean = 205

True

mean

100 120 140 160 180 200 220 240 260 280 300 320

Estimated Mean = 140

True

mean

100 120 140 160 180 200 220 240 260 280 300 320

Estimated Mean = 265

A sample provides an estimate of population mean weight.

Distribution of Body Weights in a Population

Total population

(True Mean)

Number of people with each body weight

100 120 140 160 180 200 220 240 260 280 300 320

Body Weight

Estimates may vary from sample to sample.

The estimates will be more precise with larger samples, i.e., more consistent, and they are more likely to be close to the true mean.

Confidence Interval for a Mean

The estimated mean was 202. With 95% confidence, the true mean lies in the range of 184-220.

184 202 220

The range within which the true value lies with a specified degree of confidence.

Measurements

Frequencies

Comparing two or more groups

Do smokers weigh less than non-smokers?

[measurement

difference]

Increased risk of wound infection?

[relative risk]

Estimates in one group.

Case-fatality rate from bird flu (% of cases that die)?

[proportion]

Mean body weight of a population?

[measurement]

1

2+

Are the groups different?

How precise is measure of association?

Humans Who Got Bird Flu

What is the case-fatality rate?

No comparisons; just a single estimate in a group of cases.

What type of measure of disease frequency are we estimating?

See Epi_Tools.XLS : Worksheet “CI – One Group”

Interpretation: The estimated case-fatality rate was 50% (4/8). With 95% confidence the true CFR lies in the range 21.5-78.5%.

95% Confidence Interval For A Proportion

The range within which the true value lies, with 95% confidence.

95% Confidence Interval

0.22

0.78

0.0

1.0

4/8 = 0.5 = point estimate

It is very unlikely (less than 5% probability) that the true value is outside this range.

Suppose it had been 32 deaths out of 64 cases, i.e., same proportion but larger sample? The confidence interval would be….?
• Narrower
• Wider
• Same width
• Can’t say

The greater the sample size…

… the narrower the confidence interval (and the more precise the estimate).

N

N

N

N

N

Precision Depends on Sample Size

Weymouth, MA conducted an extensive health survey several years ago. One question asked, “Have you experienced violence because you or someone else was drinking alcohol?”

What type of measure of disease frequency are we estimating?

Hypothesis Testing:

Comparing Two or More Groups

Measurements

Frequencies

Estimates in one group.

Mean body weight of a population?

[measurement]

Case-fatality rate from bird flu (% of cases that die)?

[proportion]

1

2+

Comparing two or more groups

Do smokers weigh less than non-smokers?

[measurement

difference]

Increased risk of wound infection?

[relative risk]

Are the groups different?

How precise is measure of association?

Null Hypothesis (H0): the groups are not different

• H0: Mean Wgt. smokers = Mean Wgt. Non-smokers

Alternative Hypothesis (HA): the groups are different

HA: Mean Wgt. Smokers = Mean Wgt. Non-smokers

Null Hypothesis: Smokers & Non-Smokers Do Not Differ in Weight on Average

True population

distribution

Note that, here, smokers & non-smokers have the same distribution of body weights and the mean weights are the same.

Random

Samples

100 120 140 160 180 200 220 240 260 280 300 320

Random sample of non-smokers

Random sample of smokers

The sample means may differ just by chance (random error).

Alternative Hypothesis: They Differ

True means

Population

of non=smokers

Population

of smokers

100 120 140 160 180 200 220 240 260 280 300 320

sample of smokers

sample of non-smokers

Even if they are really different, there may be some overlap.

190 lbs.

205 lbs.

Smokers Non-smokers

Statistical Testing

How do we decide if they are really different or the differences were just due to chance?

If the null hypothesis were true, what would be the probability of getting a difference this great (or greater) just by chance (random error)?

190 lbs.

205 lbs.

Smokers Non-smokers

P– Values

“p” is for probability.

Definition of a p-value:

The probability of seeing a difference this big or bigger, if the groups were not different, i.e. just by chance.

P-values are generated by performing a statistical test. The particular statistical test used depends on study type, type of measurement, etc.

P-Values

• Range between 0 and 1.
• Small p-values(< 0.05) mean a low probability that the observed difference occurred just by chance (so they’re likely to be different).

p-valueProbability of occurring by chance

.55 55%

.25 25%

.12 12%

.05 5%

.01 1 in 100 (1% )

The differences may

be due to chance.

p < or = .05

Chance is an

unlikely explanation

Statistical Significance

If “p” < 0.05 it is unlikely that the difference is due to chance.

If so, we reject the null hypothesis

in favor of the alternative hypothesis,

i.e. the groups probably are different.

• This criterion for significance (< 0.05) is arbitrary.
• The p-value is a probability, not a certainty.
• P-values are influenced by:
• The magnitude of difference between groups.
• The sample size

If we wanted to compare two groups on measurement data (e.g. Is the mean weight of smokers less than that of non-smokers?), we might use a statistical test known as a “t-test” to determine whether there was a “statistically significant” difference.

Evaluating the Role of Chance

With Categorical Data

Measurements

Frequencies

Estimates in one group.

Mean body weight of a population?

[measurement]

Case-fatality rate from bird flu (% of cases that die)?

[proportion]

1

2+

Comparing two or more groups

Do smokers weigh less than non-smokers?

[measurement]

Increased risk of wound infection?

[relative risk]

Are the groups different?

How precise is measure of association?

Is the risk of wound infection really different, or were the observed differences just due to “the luck of the draw”?

Wound Infection

Cumulative

Incidence

Yes No

5.3%

Yes

7 124 131

(7/131)

Appendectomy

1.3%

No

1 78 79

(1/79)

8 202 210 Subjects

Do incidental appendectomy?

RR = 7/131 = 5.3 = 4.2

1/79 1.3

What would the null hypothesis be for the study on wound infections after incidental appendectomy?

How confident are we that the groups are different?

How accurate is this estimate?

The “null” value; i.e., no difference.

Estimated RR = 4.2

0 1.0 10

Possible values of RR or OR

Null Hypothesis (H0): the groups are not different

• H0: Incid. Infectionexposed = Incid. infection not exposed
• or H0: RR = 1.0 (or OR = 1)
• or H0: RD = 0
• or H0: Attrib. fraction = 0

Alternative Hypothesis (HA): the groups are different

• HA: RR or OR = 1.0
• or HA: RD = 0
• or HA: Attrib. fraction = 0

1.3%

5.3%

No appy. Appendectomy

Statistical Testing

How do we decide if they are really different or the differences were just due to chance?

If the null hypothesis were true, what would be the probability of getting a difference this great (or greater) just by chance (random error)?

5

(62% of 8)

126

3

(38% of 8)

76

If the Null Hypothesis Were True, What Would You Have Expected?

Expected if no association (Null Hypothesis)

Observed Data:

Infection

No Inf.

Infection

No Inf.

Appy

Appy

7

124

131 (62%)

No

Appy

No

Appy

1

78

79 (38%)

8

202

210 total

• How big a difference was there between what you expected under the null hypothesis and what was observed?
• If the null hypothesis were true, what is the probability of seeming a difference this big just due to sampling variability (the luck of the draw)?

5

(62% of 8)

126

3

(38% of 8)

76

If the Null Hypothesis Were True, What Would You Have Expected?

Expected if no association:

Observed Data:

Infection

No Inf.

Infection

No Inf.

Appy

Appy

7

124

131 (62%)

No

Appy

No

Appy

1

78

79 (38%)

8

202

210 total

Expected with

null hypothesis

c2 = S (O-E)2

E

= 2.24

If the null hypothesis were true, there would be a 13% probability of obtaining a difference this big (or bigger) just by chance (“luck of the draw”).

p = 0.13

(“p-value”)

Observed

Data:

Person-Years

of Observation

Death

Yes

No

+

30

-

699

-

36

-

1,399

66

2,098 total

Chi Squared Test with Person-Years of Observation

c2 = S (O-E)2

E

Death

Yes

No

-

+

30

-

-

-

36

-

c2 = 82 + 82 =2.91 + 1.45 = 4.36

22 44

(p < 0.05)

Chi Squared Test with Person-Years of Observation

Data Expected

by Chance:

Observed

Data:

33% of 66 = 22

Death

Yes

No

Expected with

null hypothesis

22

+

699 (33%)

-

44

1,399 (67%)

67% of 66 = 44

66

2,098 total

c2 = S (O-E)2

E

c2 = S (O-E)2

E

= 2.24

p = 0.13

Caution: c2 Test Exaggerates Significance with Small Samples

Expected with null hypothesis:

Infection

No Inf.

Appy

5

126

No

Appy

3

76

Fisher’s Exact Test (for small samples, i.e. if expected number in any cell is <5):

Here, p = 0.26 with Fisher’s.

Fisher’s requires an iterative calculation that is tough for a spreadsheet. See http://www.langsrud.com/fisher.htm for a 2x2 table that performs Fisher’s Exact Test.

The Limitations of p-Values

There is a tendency for p-values to devolve into a conclusion of "significant" or not based on the p-value.

If an effect is small and clinically unimportant, the p-value can be "significant" if the sample size is large. Conversely, an effect can be large, but fail to meet the p<0.05 criterion if the sample size is small.

When many possible associations are examined using a criterion of p< 0.05, the probability of finding at least one that meets the “critical point” increases in proportion to the number of associations that are tested.

The p-value does not represent the probability that the null hypothesis is true. The p-value is the probability that the data could deviate from the null hypothesis as much as they did or more.

Statistical significance does not take into account the evaluation of bias and confounding.

How precise is my estimate of the risk ratio?

Wound Infection

Cumulative

Incidence

Yes No

5.3%

Yes

7 124 131

(7/131)

Appendectomy

1.3%

No

1 78 79

(1/79)

8 202 210 Subjects

Do incidental appendectomy?

RR = 7/131 = 5.3 = 4.2

1/79 1.3

p

RR or OR

95% CI for a Proportion (one group)

95% CI for Risk Ratioor an Odds Ratio

For RR or OR:

For A Proportion:

95% CI =

95% CI =

RR exp(1 +1.96 )

p + 1.96

p(1 - p)

n

c

0.6 4.2 28

.6 .7 .8

Estimated

relative risk

or odds ratio

Estimated

proportion

or frequency

95% CI for a Risk Ratio or an Odds Ratio

The range in which the true RR or OR lies, with 95% confidence.

We’re 95% confident that the true value is in this range, so there is <5% probability that the true value is outside that range.

Here, the null value is outside the interval, so there is <5% chance that the null is the true value. The null is excluded, so p<0.05.

Estimated RR

The true RR

probably isn’t 1.

(probability <5%)

Relative Risk

0 1.0 10

Protective effect No Difference Increased risk

95% CI for RR or OR (continued)

Estimated RR

The true RR

could be 1.

Here, the null value is inside the interval, so there is >5% chance that the null is the true value. The null could be true, so p>0.05.

Relative Risk

0 1.0 10

Protective effect No Difference Increased risk

The 95% confidence interval for relative risk gives the range of RR values that are compatible with your sample.

If the 95% CI includes the null value, then the findings are compatible with RR=1.0, so there is no significant difference).

But, if the 95% CI does NOT include the null value, RR=1.0 is unlikely (p<0.05).

Relative Risk

0 1.0 10

(protective effect) No Difference (increased risk)

The 95% Confidence Interval for a RR or OR Tells You:

• Whether the groups being compared differ
• significantly (p<.05)
• 2) How precise was the estimate of relative risk.

RR= 1.0 means

no difference, i.e. no association.

Significant protective effect (p<.05)

Significant increase in risk (p<.05)

No significant difference (p>.05)

A 95% Confidence Interval Puts

Non-significant Results in Perspective

Bothof these results are non-significant.

Do you view them differently?

Relative Risk

0 1.0 10

Having a larger sample size does not ensure that you will get a “statistically significant” difference.

• It does mean:
• The estimate of RR or OR will be more precise.
• If there really is a difference between groups, you will have a better chance of detecting it (better statistical power).

How precise was the RR?

Use the Cohort Study sheet in “EpiTools” to calculate c2, the p-value, and the 95% confidence interval.

Wound Infection

Cumulative

Incidence

Yes No

5.3%

Yes

7 124 131

(7/131)

Appendectomy

1.3%

No

1 78 79

(1/79)

8 202 210 Subjects

Do incidental appendectomy?

RR = 7/131 = 5.3 = 4.2

1/79 1.3

95% CI:

.53 – 36.5

95% Confidence Interval for Estimated Risk Ratio in the Appendectomy Study

Estimated RR= 4.2

Relative Risk

1.0 10

(protective effect) (increased risk)

Sample Size Calculations

You can use the Sample Size sheet (Part II) in “Epi_Tools.XLS” to calculate how many subjects would have been needed, assuming equal numbers in each group.

It’s possible that incidental appendectomy really does carry an increased risk of wound infection, but the sample size was too small to demonstrate a significant difference.

If the true difference were really 5.3% vs. 1.3% how many subjects would it have taken to demonstrate a significant difference (with 90% power)?

What if the true difference were really 10% vs. 1.3% how many subjects would it have taken to demonstrate a significant difference (with 90% power)?

Among the first 10 people who developed bird flu 2 died ( case-fatality rate= 20%).

Among 210 people who developed bird flu to date 122 died (58% case-fatality rate).

How frequently is bird flu fatal?

How precise are these estimates?

A cohort study examined the association between DES and breast cancer & found RR=1.4 (p=0.006).
• 40% of women on DES got breast cancer but this was not ‘statistically significant.’
• 40% of women on DES got breast cancer & this was ‘statistically significant.’
• Women on DES had 1.4 times the risk of getting breast cancer, but this was not statistically significant because the p-value was less than 0.05.
• Women on DES had 1.4 times the risk of developing breast cancer, and this was statistically significant because the p-value was less than 0.05.

An association between DES exposure and breast cancer?

Results: RR=1.4 p-value = 0.006

• The RR suggests an increased risk with DES.
• The small p-value indicates it is unlikely that

the differences were due to chance.

• Therefore, difference is statistically significant.

If “p” < or = 0.05 it is unlikely that the difference is due to chance.

A RCT compared % pain relief from glucosame + chondroitin vs. placebo in osteoarthritis patients. RR=1.1 (p=0.10).
• Those on G+C were 10% more likely to report pain relief, but it was not ‘statistically significant.’
• Those on G+C were 10% more likely to report pain relief, and it was ‘statistically significant.’
• Those on G+C had 1.1 times the risk of pain, but it was not statistically significant.
• Those on G+C had 1.1 times the risk of pain, & it was statistically significant.

Are Glucosamine + Chondroitin Associated with Pain Relief?

Results: RR=1.1 p-value = 0.10

• The RR suggests a small increase in pain relief.
• But the p-value indicates a 10% probability that the difference was just due to sampling error.
• Therefore, NOT statistically significant.

If “p” < or = 0.05 it is unlikely that the difference is due to chance.

A study compared 35 diabetics to a comparable group of 35 non-diabetics and found that the diabetics had a higher incidence of femoral arterial occlusion.

The relative risk was 7.0, and the 95% CI was 1.2 to 14.3.

How would you interpret these findings?

Suppose you did a larger study and found a RR = 3.0

and a 95% confidence interval from 2.0- 4.0.

How would you interpret these findings?

A study compared 35 diabetics to a comparable group of 35 non-diabetics and found that the diabetics had a higher incidence of femoral arterial occlusion. The relative risk was 7.0, and the 95% CI was 1.2 to 14.3.

• Not statistically significant since the RR was inside the conf. int.
• Not statistically significant since the null was outside the conf. int.
• Was statistically significant since the RR was inside the conf. int.
• Was statistically significant since the null was outside the conf. int.

B) The association is statistically significant becausethe 95% confidence interval does not contain the “null value.”

C) A 95% confidence interval for an OR is the range within which the true OR lies with 95% confidence.

D) All of the above.

For Lenny's bakery: OR=3.46

P= 0.0033; 95% CI = (1.49-8.06)

For Lenny's bakery: OR=3.46, and P= 0.0033; 95% CI = (1.49-8.06)

A) The association is statistically significant because p <0.05.

B) The association is not statistically significant because p <0.05 & the null is excluded from the 95% confidence interval.

C) The association is not statistically significant because the 95% confidence interval contains the estimated OR.

Data Collection

26. Please list the names and locations of the restaurants, delis, or cafeterias where you

[your child] ate lettuce during those 7 days.

Name: _________________ Location: _____________ Date: _______

Name: _________________ Location: _____________ Date: _______

Name: _________________ Location: _____________ Date: _______

27. Did you [your child] eat sprouts, such as alfalfa or bean sprouts, in a restaurant, deli

or cafeteria? These may have been eaten as part of a salad or as part of other food

items such as sandwiches, tacos, etc.

Yes No Don't know

If yes: Did you [your child] eat alfalfa sprouts? (These are the thread-like sprouts

with green tops the size of the head of a pin.)

Yes No Don't know

28. Please list the names of the restaurants, delis, cafeterias, etc. where you [your child]

ate alfalfa sprouts in those 7 days?

Name: _________________ Location: _____________ Date: _______

Name: _________________ Location: _____________ Date: _______

Name: _________________ Location: _____________ Date: _______

Creating a Database

Example:

The Excel file “Hepatitis Outbreak” illustrates the use of Excel to record & store data from an outbreak investigation.

• Excel provides a simple, user-friendly platform that many people are familiar with.
• Also facilitates analysis
• Sorting
• Counts, averages
• Statistical tests (Chi square, t-tests, etc.)
Internal Validity
• The observed estimate is true (unbiased & precise).
• The observed estimate was misleading due to:
• Chance or sampling variability
• Bias
• Confounding

External Validity

• The findings can be generalized.

Causality

• A judgment based on a body of evidence.