
Critical review of significance testing


Presentation Transcript


  1. Critical review of significance testing F. D’Ancona, from Alain Moren’s lecture, 2006

  2. Botulism outbreak in Italy “The relative risk of illness was higher among diners who ate home-preserved green olives (RR = 2.9)” Is it statistically significant?

  3. Tests of statistical significance • Many of them concern differences between means or proportions • These tests help to establish whether an observed difference is real (i.e., not due to chance alone)

  4. The two hypotheses! When you perform a test of statistical significance you either reject or fail to reject the null hypothesis (H0)

  5. Hypothesis testing and the null hypothesis • If the data provide evidence against the null hypothesis, it can be rejected in favour of some alternative hypothesis H1 (the objective of our study). • If you do not reject the null hypothesis, you can never say that it is true. You can only reject it or fail to reject it.

  6. Significance testing: H0 rejected using the reported p value. p = probability of observing a result (for example a difference between proportions, or an RR) as extreme as, or more extreme than, the one found, by chance alone (i.e., if H0 were true). Small p value = low compatibility between H0 and the observed data: you reject H0, the test is significant. Large p value = high compatibility between H0 and the observed data: you do not reject H0, the test is not significant. We can never reduce to zero the probability that our result was observed by chance alone.
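
A concrete illustration of this definition, as a minimal sketch in Python (scipy assumed; the counts are purely hypothetical and not from the lecture):

    from scipy.stats import chi2_contingency

    # Hypothetical 2x2 table: rows = exposed / unexposed, columns = ill / well
    table = [[30, 70],
             [15, 85]]

    # p = probability of a difference at least this large if H0 were true
    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
    # small p -> low compatibility with H0 -> reject H0 (significant)
    # large p -> high compatibility with H0 -> do not reject (not significant)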

  7. Levels of significance We need a cut-off! 0.01 0.05 0.10 • p value > 0.05 = H0 not rejected (not significant) • p value ≤ 0.05 = H0 rejected (significant) Authors often avoid submitting for publication if p > 0.05, and referees have commonly relied on tests of significance.

  8. p = 0.05 and its errors • Level of significance, usually p = 0.05 • The p value is used for decision making, but two errors remain possible • H0 should not have been rejected, but it was (Type I or alpha error, “false positive”) • H0 should have been rejected, but it was not (Type II or beta error, “false negative”)

  9. Types of errors

  Test result        Truth: H0 true       Truth: H0 false
  H0 rejected        Type I (α) error     correct
  H0 not rejected    correct              Type II (β) error

  • H0 is “true” but rejected: Type I or α error • H0 is “false” but not rejected: Type II or β error The p value level is the level of α error that we accept (usually 5%).
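
To see the α error in action, a small simulation sketch (Python with numpy and scipy assumed; all parameters are invented for illustration) in which H0 is true by construction, so about 5% of tests still come out “significant” at the 0.05 cut-off:

    import numpy as np
    from scipy.stats import chi2_contingency

    rng = np.random.default_rng(42)
    n, true_risk, n_sims, alpha = 200, 0.3, 2000, 0.05

    false_positives = 0
    for _ in range(n_sims):
        a = rng.binomial(n, true_risk)  # cases in group A
        b = rng.binomial(n, true_risk)  # cases in group B (same true risk: H0 holds)
        table = [[a, n - a], [b, n - b]]
        _, p, _, _ = chi2_contingency(table, correction=False)
        if p <= alpha:
            false_positives += 1  # H0 true but rejected: a Type I error

    print(f"false positive rate: {false_positives / n_sims:.3f} (expected ~{alpha})")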

  10. “p > 0.05” and “p = 0.06”: different ways to write the same concept, the second with more information. Hypothetical data from a clinical trial of a new treatment:

  Treatment   Successful   Unsuccessful   Total
  B           14           8              22
  A           7            13             20

  Treatment B, success = 64%. Treatment A, success = 35%. χ² = 3.44, p = NS
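
The slide’s figures can be checked with a short sketch (Python, scipy assumed); the uncorrected chi-square test reproduces χ² = 3.44 and the more informative p ≈ 0.06:

    from scipy.stats import chi2_contingency

    table = [[14, 8],   # treatment B: successful, unsuccessful
             [7, 13]]   # treatment A: successful, unsuccessful

    chi2, p, dof, _ = chi2_contingency(table, correction=False)
    print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # chi2 = 3.44, p = 0.064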

  11. The epidemiologist needs measurements rather than probabilities • χ² is a test of association; OR and RR are measures of association on a continuous scale (an infinite number of possible values) • The best estimate = the point estimate • The range of values allowing for random variability = the confidence interval (the precision of the point estimate)

  12. Width of the confidence interval depends on… • the amount of variability in the data • the size of the sample • the (arbitrary) level of confidence (usually 90%, 95%, 99%) One way to use a confidence interval: if 1 is included in the CI, then NON SIGNIFICANT; if 1 is not included in the CI, then SIGNIFICANT
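
A minimal sketch of these dependencies (Python with scipy; the observed proportion is invented), using the normal-approximation (Wald) interval for a proportion: larger samples narrow the interval, higher confidence levels widen it:

    from math import sqrt
    from scipy.stats import norm

    p_hat = 0.30  # observed proportion (hypothetical)

    for n in (50, 500, 5000):            # larger sample -> narrower CI
        for conf in (0.90, 0.95, 0.99):  # higher confidence -> wider CI
            z = norm.ppf(1 - (1 - conf) / 2)
            half = z * sqrt(p_hat * (1 - p_hat) / n)
            print(f"n={n:5d}  conf={conf:.2f}  "
                  f"CI = ({p_hat - half:.3f}, {p_hat + half:.3f})")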

  13. Confidence intervals provide more information than the p value • the magnitude of the effect (strength of association) • the direction of the effect (RR > or < 1) • the precision around the point estimate of the effect (variability) The p value cannot provide these!

  14. Level of confidence of the interval at 95% • If the data collection and analysis could be replicated many times, the CI should include the TRUE value of the measure 95% of the time • The only thing that should bring variability is chance alone!
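
This can be illustrated by simulation, as a sketch with invented parameters (Python with numpy and scipy assumed): draw many samples from a population with a known TRUE proportion and count how often the computed 95% CI contains it:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    true_p, n, n_sims = 0.20, 300, 5000
    z = norm.ppf(0.975)  # ~1.96 for a 95% interval

    covered = 0
    for _ in range(n_sims):
        x = rng.binomial(n, true_p)
        p_hat = x / n
        half = z * np.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half <= true_p <= p_hat + half:
            covered += 1  # this replication's CI captured the TRUE value

    print(f"coverage: {covered / n_sims:.3f} (should be close to 0.95)")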

  15. Hypothetical data from a clinical trial of a new treatment:

  Treatment   Successful   Unsuccessful   Total
  B           14           8              22
  A           7            13             20

  Treatment B, success = 64%. Treatment A, success = 35%. RR = 1.82, 95% CI (0.93 - 3.57), p = 0.06. “p > 0.05” and “p = 0.06” are different ways to write the same concept, but the second carries more information.
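
The RR and its 95% CI can be reproduced with the standard log-scale formula for the confidence interval of a relative risk (a sketch in plain Python; the slide itself does not show the computation):

    from math import exp, log, sqrt

    a, b = 14, 8   # treatment B: successful, unsuccessful
    c, d = 7, 13   # treatment A: successful, unsuccessful

    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
    lo = exp(log(rr) - 1.96 * se_log_rr)
    hi = exp(log(rr) + 1.96 * se_log_rr)
    print(f"RR = {rr:.2f}, 95% CI ({lo:.2f} - {hi:.2f})")
    # prints: RR = 1.82, 95% CI (0.93 - 3.57)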

  16. More studies: better or worse? [Figure: forest plot of 20 studies with different results, on an RR scale with a reference line at RR = 1] • Decisions based on results from a collection of studies are not helped when each study is reduced to a YES or NO verdict. • You have to look at the CI and the point estimate • But also consider the clinical or biological significance

  17. Looking at the CI [Figure: the confidence intervals of studies A and B on an RR scale, from RR = 1 to a large RR] • Study A: large sample, precise results, narrow CI - SIGNIFICANT • Study B: small size, wide CI - NON SIGNIFICANT • Study A: effect close to NO EFFECT • Study B: no information about the absence of a large effect

  18. What we have to evaluate in a study • χ² = a test of association; it depends on sample size • p value = probability that equal (or more extreme) results would be observed by chance alone • OR, RR = direction and strength of the association: if > 1, a risk factor; if < 1, a protective factor (independently of sample size) • CI = magnitude and precision of the effect. Remember that these values do not provide any information on the possibility that the observed association is due to a bias or to confounding. This possibility should be investigated.

  19. χ² and relative risk

          Cases   Non-cases   Total        χ² = 1.3
  E       9       51          60           p = 0.13
  NE      5       55          60           RR = 1.8
  Total   14      106         120          95% CI [0.6 - 4.9]

          Cases   Non-cases   Total        χ² = 12
  E       90      510         600          p = 0.0002
  NE      50      550         600          RR = 1.8
  Total   140     1060        1200         95% CI [1.3 - 2.5]

          Cases   Non-cases   Total        χ² = 12
  E       600     1400        2000         p = 0.0002
  NE      500     1500        2000         RR = 1.2
  Total   1100    2900        4000         95% CI [1.1 - 1.3]
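
A sketch recomputing the three tables (Python with scipy assumed; small discrepancies from the slide’s p values and CI limits may come from rounding or one- versus two-sided conventions). The pattern is the point: the same RR of 1.8 turns significant as the sample grows, and a very large sample makes even RR = 1.2 highly significant with a narrow CI:

    from math import exp, log, sqrt
    from scipy.stats import chi2_contingency

    # (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases)
    tables = [(9, 51, 5, 55), (90, 510, 50, 550), (600, 1400, 500, 1500)]

    for a, b, c, d in tables:
        chi2, p, _, _ = chi2_contingency([[a, b], [c, d]], correction=False)
        rr = (a / (a + b)) / (c / (c + d))
        se = sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
        lo, hi = exp(log(rr) - 1.96 * se), exp(log(rr) + 1.96 * se)
        print(f"chi2 = {chi2:4.1f}  p = {p:.4f}  RR = {rr:.1f}  "
              f"95% CI [{lo:.1f} - {hi:.1f}]")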

  20. Common source outbreak suspected

  Exposure   Cases   Non-cases   AR%
  Yes        15      20          42.8%
  No         50      200         20.0%
  Total      65      220

  χ² = 9.1, p = 0.002, RR = 2.1, 95% CI 1.4 - 3.4. Remember that these values do not provide any information on the possibility that the observed association is due to a bias or to confounding. HOW COULD YOU EXPLAIN THAT ONLY 23% OF CASES WERE EXPOSED?
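
Recomputing the slide’s figures in plain Python also makes the closing question concrete: only 15 of the 65 cases (about 23%) were exposed to the suspected source:

    exp_cases, exp_noncases = 15, 20
    unexp_cases, unexp_noncases = 50, 200

    ar_exposed = exp_cases / (exp_cases + exp_noncases)          # 15/35  = 42.9%
    ar_unexposed = unexp_cases / (unexp_cases + unexp_noncases)  # 50/250 = 20.0%
    rr = ar_exposed / ar_unexposed                               # ~2.1
    cases_exposed = exp_cases / (exp_cases + unexp_cases)        # 15/65  = ~23%

    print(f"AR exposed = {ar_exposed:.1%}, AR unexposed = {ar_unexposed:.1%}")
    print(f"RR = {rr:.1f}, proportion of cases exposed = {cases_exposed:.0%}")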

  21. Recommendations • Hypothesis testing and CIs evaluate only the role of chance as an alternative explanation of the association. • Interpret with caution every association that achieves statistical significance. • Be doubly cautious if that statistical significance was not expected.

  22. “P < 0.05”: it is not a good description of the information in the data (Rothman)
