
EVALUATING THE ROLE OF RANDOM ERROR



Presentation Transcript


    1. Random Error 1 EVALUATING THE ROLE OF RANDOM ERROR

    2. Random Error 2 Introduction Most epidemiological studies measure disease frequency in two (or more) groups that differ only on the exposure of interest. The two measures of disease frequency are combined into a single measure of association – risk or rate ratio, odds ratio, risk or rate difference

    3. Random Error 3 Introduction The next step is to evaluate whether the result observed in the data is true, or whether it is false and an alternative explanation accounts for it. This is the process of assessing the validity of a study result.

    4. Random Error 4 Three alternate explanations are: 1) Random Error. Not a systematic error, but chance or the luck of the draw 2) Bias. A systematic error in the design or conduct of a study that leads to error in the estimation of the association 3) Confounding. A third variable that distorts the association

    5. Random Error 5 Only after deciding that an observed association is valid is it proper to form an opinion on whether the exposure really causes the disease -- or whether results are generalizable or applicable to a larger population.

    6. Random Error 6 Assessing causality involves making a judgment. The results of one study almost never establish causality. Knowledgeable minds can differ. Many believe that Hill's guidelines are a useful way to assess causality.

    7. Random Error 7 Statistical Inference The goal of any epidemiological study is to learn about the true relation between an exposure and disease based on data from a sample. This is inference. The actual study result will vary depending on who is actually in the study sample. This is known as sampling variability.

    8. Random Error 8 Statistical Inference Basic statistics: One urn holds 50 white and 50 red balls. You draw a sample of 4 to estimate the proportion of red balls.

    9. Random Error 9 Statistical Inference Basic statistics (cont’d): You could draw 2 red and 2 white, in which case your inference that 50% of the balls are red is correct! Or you could draw 4 red balls, in which case your inference that 100% of the balls are red is dead wrong, just by chance.

    10. Random Error 10 Statistical Inference Basic statistics (cont’d): You will make different inferences about the true proportion of red balls in the urn based on various possible samples that are drawn. Again, this is known as sampling variability.
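
The urn experiment above is easy to simulate. Below is a minimal sketch (the urn contents of 50 red and 50 white balls are taken from the slides; everything else is my own illustration) that repeats the 4-ball draw many times and tallies the different estimates that come out:

```python
import random
from collections import Counter

random.seed(1)
URN = ["red"] * 50 + ["white"] * 50  # the 50/50 urn from the slides

def sample_proportion_red(n):
    """Draw n balls without replacement and return the proportion that are red."""
    draw = random.sample(URN, n)
    return draw.count("red") / n

# Repeat the 4-ball draw many times: the estimates scatter around 0.5,
# and samples of all-red (1.0) or all-white (0.0) do occur by chance.
estimates = [sample_proportion_red(4) for _ in range(10_000)]
print(Counter(estimates))
```

The tally shows every possible estimate (0%, 25%, 50%, 75%, 100% red) occurring across repeated samples, which is exactly the sampling variability the slides describe.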

    11. Random Error 11 Lessons from the balls and urns Given some hypothesis (in this case, the hypothesis is that the chance of drawing a red ball on any one try is 50%) and information about sample size, you can figure the probability of drawing a bad sample, or observing any given result from a set of sample data.

    12. Random Error 12 Lessons from the balls and urns As sample size increases, the sampling variability decreases and the chance of drawing a really unrepresentative sample decreases.
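
This effect of sample size can be checked with a quick simulation. The sketch below is hypothetical (it assumes a 50% chance of red on each independent draw) and compares the spread of the estimates as the sample size grows:

```python
import random
import statistics

random.seed(2)

def estimates(n, trials=5000, p=0.5):
    """Proportion of 'red' in each of many samples of size n,
    where each ball is red with probability p."""
    return [sum(random.random() < p for _ in range(n)) / n for _ in range(trials)]

for n in (4, 40, 400):
    sd = statistics.stdev(estimates(n))
    print(f"n={n:4d}  spread (SD) of the estimates = {sd:.3f}")
# The spread shrinks as n grows: larger samples make a badly
# unrepresentative sample less and less likely.
```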

    13. Random Error 13 Back to epidemiology… The goal of a study is to estimate a measure of association based on a sample, and you may draw a bad (unrepresentative) sample due to chance alone. That is, the measure of association you observe in your data may differ from the true measure of association by chance alone. You can calculate the probability that the measure of association you observed was due to chance.

    14. Random Error 14 Hypothesis Testing Hypothesis testing means that you are performing a statistical test in order to get a P value. A statistical test quantifies the degree to which sampling variability or chance may explain the observed association.

    15. Random Error 15 Hypothesis Testing The assumption made about the result before you start the test is the null hypothesis (H0): RR=1, OR=1, RD=0. You are assuming that the H0 is true, NOT some alternative hypothesis (HA). The H0 is assessed by a statistical test that gives you a P value. The P value tells you how likely it is that the observed result would occur if the null hypothesis is really the truth.

    16. Random Error 16 Hypothesis Testing Definition of P value: Given that H0 is true, the P value is the probability of seeing the observed result, or results more extreme, by chance alone.
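
Under this definition, the P value for the urn example can be computed exactly from the binomial distribution. A sketch (the function name is my own, not from the slides):

```python
from math import comb

def binomial_two_sided_p(k, n, p0=0.5):
    """P value: the probability, under H0, of a result at least as far
    from the expected count (n * p0) as the observed count k."""
    expected = n * p0
    def pmf(x):
        return comb(n, x) * p0**x * (1 - p0)**(n - x)
    return sum(pmf(x) for x in range(n + 1)
               if abs(x - expected) >= abs(k - expected))

# Drawing 4 red balls in 4 tries when H0 says 50% of balls are red:
# the only results that extreme are 0 red or 4 red, each with
# probability (1/2)^4 = 1/16, so p = 2/16 = 0.125.
print(binomial_two_sided_p(4, 4))  # 0.125
```

So even the "dead wrong" all-red sample from slide 9 is not that improbable under the null hypothesis, which is why small samples are so vulnerable to chance.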

    17. Random Error 17 Hypothesis Testing P value ranges from 0 to 1. The particular statistical test that is used depends on type of study, type of measurement, etc.

    18. Random Error 18 Hypothesis Testing Small P values indicate a low degree of compatibility between H0 and the observed data, because of the low probability that a result as extreme (or more so) would have been generated if H0 were true. A small P value implies that the alternative hypothesis is a better explanation for the data. Small P values indicate that chance is an unlikely explanation for the result.

    19. Random Error 19 Statistical Conventions p<=.05 is an arbitrary cutoff for statistical significance If p<= .05, we say results are unlikely to be due to chance, and we reject H0 in favor of HA.

    20. Random Error 20 Statistical Conventions If p > .05, we say that chance is a likely explanation for the finding and we do not reject H0. However, you cannot exclude chance no matter how small the P value is. Likewise, you cannot establish chance as the explanation no matter how large the P value is.

    21. Random Error 21 More on the P value…. P values reflect two things: the magnitude of the association and the sample size (sampling variability). It is possible to have a huge sample where even a trivial risk increase or decrease is statistically significant. It is possible to have a small sample where even a large risk increase or decrease is not statistically significant.
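
A quick illustration of this point, using a standard 1-df chi-square test on 2x2 tables with the same risk ratio but different sample sizes (the counts are invented for illustration, not taken from any study):

```python
from math import erfc, sqrt

def chi2_p_value(a, b, c, d):
    """P value from a 1-df Pearson chi-square test on the 2x2 table
    [[a, b], [c, d]]: exposed cases/non-cases, unexposed cases/non-cases."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return erfc(sqrt(chi2 / 2))  # upper-tail area of chi-square with 1 df

# The same risk ratio (20% vs 10% risk, RR = 2.0) at two sample sizes:
print(chi2_p_value(20, 80, 10, 90))      # 200 subjects: p is borderline (~.05)
print(chi2_p_value(200, 800, 100, 900))  # 2000 subjects: p is far below .05
```

The association is identical in both tables; only the sample size changes, and with it the P value.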

    22. Random Error 22 P Value EX: In Utero DES Exposure and the Risk of Breast Cancer RR = 1.4 P value = .10 These results indicate that the best estimate of the increased breast cancer risk associated with DES is 1.4. The P value indicates a moderate degree of compatibility of these data with the null hypothesis. Since the P value is not less than .05, these results are not considered "statistically significant."

    23. Random Error 23 Confidence Intervals Another approach to quantifying sampling variability is confidence intervals The actual measure of association given by the data is the point estimate. The point estimate has variability that can be expressed mathematically, just as a mean has a variance and standard deviation.

    24. Random Error 24 Confidence Intervals (cont’d) Given sampling variability, it is important to indicate the precision of the point estimate, i.e., give some indication of sampling variability This is indicated by the confidence interval.

    25. Random Error 25 Confidence Intervals One definition of a confidence interval: Range within which the true magnitude of effect lies with a stated probability, or a certain degree of assurance (usually 95%)

    26. Random Error 26 Confidence Intervals The strict statistical definition: If you did the study 100 times and got 100 point estimates and 100 CIs, then in about 95 of the 100 results the true measure of association would lie within the given interval. In the other 5 instances, the true value would not lie within the given interval. Note that the point estimate is the RR, OR, or RD.
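
This "repeat the study 100 times" definition can be demonstrated by simulation. The sketch below uses a simple Wald confidence interval for a single risk (the true risk of 0.3 and study size of 500 are invented for the demonstration) and counts how often the interval covers the truth:

```python
import random
from math import sqrt

random.seed(3)
TRUE_P, N, Z = 0.3, 500, 1.96  # assumed true risk, study size, 95% multiplier

covered = 0
trials = 2000
for _ in range(trials):
    # Run one "study": count cases among N subjects, each at risk TRUE_P.
    cases = sum(random.random() < TRUE_P for _ in range(N))
    p_hat = cases / N
    half = Z * sqrt(p_hat * (1 - p_hat) / N)  # Wald standard-error limits
    if p_hat - half <= TRUE_P <= p_hat + half:
        covered += 1

print(f"{100 * covered / trials:.1f}% of the intervals contain the true risk")
```

Roughly 95% of the simulated intervals contain the true value, matching the stated definition; no individual interval tells you whether it is one of the unlucky 5%.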

    27. Random Error 27 Confidence Intervals Another way to think of a confidence interval: A range of possible values for the measure of association that are compatible with the observed data within specified limits (usually 95%)

    28. Random Error 28 Confidence Intervals Width of confidence interval indicates amount of sampling variability in the data. Width is determined by variability in the data and an arbitrary "certainty factor" (usually 95%, but you can choose any % you want)

    29. Random Error 29 Confidence Intervals The P value tells you the extent to which the null hypothesis is compatible with the data. The CI tells you much more: the range of hypotheses that are compatible with the data.

    30. Random Error 30 Confidence Intervals EX: DES and breast cancer RR = 1.4 95% CI = 0.7 – 2.6 Again, the results indicate that the best estimate of the increased breast cancer risk associated with DES is 1.4. However, we are 95% confident that the true RR lies between 0.7 and 2.6. That is, the data are also consistent with RRs ranging from 0.7 to 2.6.
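
The kind of calculation behind such a CI can be sketched with the standard log-transform method for a risk ratio. The cohort counts below are hypothetical, not the actual DES data; they are chosen only so the output lands near the slide's numbers:

```python
from math import exp, log, sqrt

# Hypothetical cohort counts (invented, not from the DES study):
a, n1 = 21, 1000   # exposed: cases, total
c, n0 = 15, 1000   # unexposed: cases, total

rr = (a / n1) / (c / n0)
se_log_rr = sqrt(1/a - 1/n1 + 1/c - 1/n0)  # SE of ln(RR) for cumulative data
lo = exp(log(rr) - 1.96 * se_log_rr)
hi = exp(log(rr) + 1.96 * se_log_rr)
print(f"RR = {rr:.1f}, 95% CI = {lo:.1f} - {hi:.1f}")
# prints: RR = 1.4, 95% CI = 0.7 - 2.7
```

The interval is built symmetrically around ln(RR) and then exponentiated, which is why RR confidence intervals are asymmetric on the original scale.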

    31. Random Error 31 Confidence Intervals Since the CI contains the null value (RR=1.0), these data are compatible with the null hypothesis. This means that the P value associated with the result is greater than .05 and that the results are not statistically significant.

    32. Random Error 32 Confidence Intervals P values and CIs tell you nothing about the other possible explanations for an observed result: bias and confounding. P values and CIs tell you nothing about biological, clinical or public health significance.

    33. Random Error 33 Horse Race Analogy It is useful to think about statistical significance in the context of a horse race. When you go to the race track to bet on a race you need to make two decisions: which horse to bet on and how much money to bet. The point estimate should be used to guide your decision about which horse to bet on. For example, in the DES and breast cancer study, the best bet for a winning RR is 1.4. Don’t bet on any other number.

    34. Random Error 34 Horse Race Analogy (cont’d) The P value and confidence interval now can be used to guide your decision about how much money to bet. Since the P value is relatively large and the confidence interval is wide, you should not bet a lot of money on this "horse." If the P value were small and the confidence interval were narrow, you should bet more money on this "horse."

    35. Random Error 35 Practice Exercise for interpreting P values and confidence intervals Five studies were conducted on the same exposure-disease relationship. Assume that there is no bias or confounding in these studies. The following results were seen (on the next slide).

    36. Random Error 36 Practice Exercise
