Section IV

Section IV. Sampling distributions, confidence intervals, hypothesis testing and p values. Population and sample. We wish to make inferences (generalizations) about an entire target population (i.e., generalize to “everyone”) even though we only study one sample.

Presentation Transcript


  1. Section IV: Sampling distributions, confidence intervals, hypothesis testing and p values

  2. Population and sample. We wish to make inferences (generalizations) about an entire target population (i.e., generalize to “everyone”) even though we only study one sample. Population parameters = summary values for the entire population (e.g., μ, σ, ρ, β). Sample statistics = summary values for a sample (e.g., Ȳ, S, r, b).

  3. Samples drawn from a population. The sample is drawn “at random”: everyone in the target population is eligible for sampling. (Figure: a sample drawn from the population.)

  4. Population distribution of Y (individuals): not Gaussian. Mean of Y = μ = 2.5, SD = σ = 1.12.

  5. Distribution of the sample means (Ȳs), the sampling distribution: each observation is a SAMPLE statistic, Ȳ. Mean of Ȳ = 2.5, SEM = 0.56, n = 4. SEM = SD/√n = 1.12/√4 = 0.56 (the square root n law).
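
The square root n law can be checked by brute force. The sketch below assumes a population of the four equally likely values 1, 2, 3, 4 (an assumption chosen because it gives μ = 2.5 and σ ≈ 1.12; the slides do not state the actual population) and enumerates every possible sample of size 4 drawn with replacement.

```python
from itertools import product
from statistics import mean, pstdev

# Assumed population (not given on the slide): 1..4, equally likely.
population = [1, 2, 3, 4]
n = 4  # sample size used on the slide

mu = mean(population)
sigma = pstdev(population)  # population SD

# Every equally likely ordered sample of size n (with replacement);
# the list of their means is the exact sampling distribution of the mean.
sample_means = [mean(s) for s in product(population, repeat=n)]

mean_of_means = mean(sample_means)
sem_enumerated = pstdev(sample_means)  # SD of the sample means
sem_formula = sigma / n ** 0.5         # square root n law: SD / sqrt(n)

print(f"mu = {mu}, sigma = {sigma:.2f}")
print(f"mean of sample means = {mean_of_means}")
print(f"SEM enumerated = {sem_enumerated:.2f}, SEM formula = {sem_formula:.2f}")
```

The two SEM values agree because, for independent draws, the variance of a mean of n observations is the population variance divided by n.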

  6. Central Limit Theorem. For a large enough n, the distribution of most common sample statistics (mean, mean difference, OR, RR, hazard, correlation coefficient, regression coefficient, proportion…) from sample to sample is approximately Gaussian (“Normal”), centered at the true population value. The standard error is proportional to 1/√n. (Rule of thumb: n > 30 is usually enough. Nonparametric methods may be needed for small n.)

  7. Funnel plot: true difference is δ = 5. Each point is one study (meta-analysis).

  8. Confidence interval (for μ). We do not know μ from a sample. For a sample mean Ȳ and standard error SE, a confidence interval for the population mean μ is formed by (Ȳ − Z·SE, Ȳ + Z·SE); the sample statistic is in the middle. For a 95% confidence interval, we use Z = 1.96 (Why?) and compute lower bound = Ȳ − 1.96 SE, upper bound = Ȳ + 1.96 SE.
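
One way to answer the “Why?” is to look up the Gaussian percentile directly: 1.96 is the 97.5th percentile of the standard Gaussian, which leaves 2.5% in each tail. The sketch below uses Python's standard library; the function name `z_confidence_interval` and the example mean and SE are illustrative assumptions, not part of the slides.

```python
from statistics import NormalDist

def z_confidence_interval(sample_mean, se, level=0.95):
    """Return (lower, upper) = sample_mean -/+ Z * SE for the given level."""
    # For level = 0.95 this asks for the 0.975 percentile of the standard Gaussian.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return sample_mean - z * se, sample_mean + z * se

z95 = NormalDist().inv_cdf(0.975)
print(f"Z for a 95% interval = {z95:.2f}")

# Hypothetical inputs: a sample mean of 2.5 with SE 0.56
lower, upper = z_confidence_interval(2.5, 0.56)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Changing `level` to 0.90 or 0.99 gives the corresponding Z multipliers without a table.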

  9. Confidence intervals (CI) and sampling distribution of Ȳ (figure marks at −1.96(σ/√n) and +1.96(σ/√n)). 95% CI: Ȳ ± 1.96(σ/√n)

  10. 95% confidence intervals: 95% of the intervals will contain the true population value. But which ones?

  11. Z vs t (technical note) Confidence intervals made with Z assume that the population σ is known. Since σ is usually not known and is estimated with the sample SD, the Gaussian table areas need to be adjusted. The adjusted tables are called “t” tables instead of Gaussian tables (t distribution). For n > 30, they are about the same.

  12. Z distribution vs t distribution, about the same for n > 30

  13. t vs Gaussian Z percentiles What did the z distribution say to the t distribution? You may look like me but you're not normal.

  14. Confidence intervals. Sample statistic ± Z_tabled × SE (using known variance). Sample statistic ± t_tabled × SE (using estimate of variance). Example: CI for the difference between two means: (Ȳ1 − Ȳ2) ± t_tabled × SEd. Tabled t uses degrees of freedom, df = n1 + n2 − 2.

  15. CI for a proportion: “law” of small numbers. n = 10, proportion = 3/10 = 30%. What do you think are the 95% confidence bounds? Is it likely that the “real” proportion is more than 50%?

  16. CI for a proportion: “law” of small numbers. n = 10, proportion = 3/10 = 30%. What do you think are the 95% confidence bounds? Is it likely that the “real” proportion is more than 50%? Answer: 95% CI: 6.7% to 65.2% (exact binomial interval)
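
Bounds this wide are what an exact binomial (Clopper-Pearson) interval gives for 3 successes in 10. The slides do not say which method was used, so treating it as Clopper-Pearson is an assumption; the helper names `binom_cdf`, `bisect` and `exact_ci` are illustrative. A minimal sketch:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, level=0.95):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion."""
    alpha = 1 - level
    # Lower bound: the p at which P(X >= x) equals alpha/2 (0 when x == 0).
    lower = 0.0 if x == 0 else bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # Upper bound: the p at which P(X <= x) equals alpha/2 (1 when x == n).
    upper = 1.0 if x == n else bisect(
        lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper

lower, upper = exact_ci(3, 10)
print(f"95% CI: {100 * lower:.1f}% to {100 * upper:.1f}%")
```

Because the upper bound sits well above 50%, a sample of 10 cannot rule out a “real” proportion of more than one half.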

  17. Standard error for the difference between two means. Ȳ1 has mean μ1 and SE = √(σ1²/n1) = SE1. Ȳ2 has mean μ2 and SE = √(σ2²/n2) = SE2. For the difference between two means (δ = μ1 − μ2): SEδ = √(σ1²/n1 + σ2²/n2), estimated by SEd = √(SE1² + SE2²).

  18. Statistics for IOP improvement (intraocular pressure in mm Hg). Mean difference = d̄ = 5.5 − 4.0 = 1.5 mm Hg. Std error of mean difference = SEd = √[0.49² + 0.50²] = 0.70 mm Hg. Using t{df=23} = 2.069 for the 95% confidence interval: CI: 1.5 ± 2.069(0.70), or (0.05 mm Hg, 2.95 mm Hg).
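
The IOP arithmetic can be reproduced in a few lines. This sketch takes the two standard errors (0.49 and 0.50), the group means and the tabled t value straight from the slide; the variable names are illustrative.

```python
from math import sqrt

se_1, se_2 = 0.49, 0.50   # standard errors of the two group means (mm Hg)
d = 5.5 - 4.0             # observed mean difference (mm Hg)
t_tabled = 2.069          # tabled t, df = 23, two sided 95% (value from the slide)

se_d = sqrt(se_1**2 + se_2**2)   # SE of the difference between two means
lower = d - t_tabled * se_d
upper = d + t_tabled * se_d

print(f"SEd = {se_d:.2f} mm Hg")
print(f"95% CI: ({lower:.2f}, {upper:.2f}) mm Hg")
```

The standard errors add on the squared scale, which is why SEd (0.70) is larger than either group's SE but smaller than their sum.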

  19. Null hypothesis & p values. Null hypothesis: assume that, in the population, the two treatments give the same average improvement in IOP, so the average difference is δ = 0. Under this assumption, how likely is it to observe a sample mean difference of d = 1.5 mm Hg (or more extreme) in any one sample? This probability is called the (one sided) p value. The p value is only defined for a given null hypothesis.

  20. Hypothesis testing for a mean difference, d. d = sample mean IOP difference, Treatment B − Treatment A. d̄ = 1.5 mm Hg, SEd = 0.70 mm Hg. 95% CI for true mean difference = (0.05 mm Hg, 2.95 mm Hg). But, under the null hypothesis, the true mean difference (δ) should be zero. How “far” is the observed 1.5 mm Hg mean difference from zero (in SE units)? tobs = (mean difference − hypothesized difference) / SE. tobs = (1.5 − 0) / 0.70 = 2.14 SEs. p value: probability of observing t = 2.14 or larger if the null hypothesis is true. p = 0.0216 (one sided t with df = 23), p = 0.0432 (two sided).
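
The p values come from the upper tail area of the t distribution with 23 degrees of freedom. Python's standard library has no t distribution, so the sketch below integrates the t density numerically with Simpson's rule; `t_pdf` and `t_upper_tail` are illustrative helpers, and a statistics package would normally be used instead.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0: 0.5 minus the Simpson's-rule area from 0 to t."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

d, null_value, se_d, df = 1.5, 0.0, 0.70, 23
t_obs = (d - null_value) / se_d      # distance from the null in SE units
p_one_sided = t_upper_tail(t_obs, df)
p_two_sided = 2 * p_one_sided

print(f"t_obs = {t_obs:.2f}")
print(f"one sided p = {p_one_sided:.4f}, two sided p = {p_two_sided:.4f}")
```

The results should land within rounding of the slide's p values; small differences arise from rounding t to 2.14 before the table lookup.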

  21. Hypothesis test statistics. Zobs = (sample statistic − null value) / standard error. Z (or t) = 2.14

  22. Difference & non-inferiority (equivalence) hypothesis testing. Difference testing: null hyp: A = B (or A − B = 0), alternative: A ≠ B. Zobs = (observed stat − 0) / SE. Non-inferiority (within δ) testing: null hyp: A > B + δ, alternative: A ≤ B + δ. Zeq = (observed stat − δ) / SE. Must specify δ for non-inferiority testing.

  23. Non-inferiority testing: IOP data. For IOP data, assume we declare non-inferiority if the true mean difference is δ = 2 mm Hg or less. The observed mean difference is d = 1.5, which is less than 2. However, the null hypothesis is that the true difference is 2 or more versus the alternative of 2 or less. So Zeq = (1.5 − 2)/0.70 = −0.714, p = 0.237 (one sided). We cannot reject the “null hyp” that the true δ is larger than 2. Our 95% confidence interval of (0.05, 2.95) also does NOT exclude 2, even though it excludes zero.
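
The non-inferiority statistic is the same calculation as the difference test, with the margin in place of zero. A minimal sketch using the slide's numbers and the Gaussian (Z) tail area; the variable names are illustrative.

```python
from statistics import NormalDist

d = 1.5        # observed mean difference (mm Hg)
se_d = 0.70    # standard error of the difference (mm Hg)
margin = 2.0   # non-inferiority margin, delta (mm Hg)

z_eq = (d - margin) / se_d
# One sided p value: chance of a Z this small or smaller
# if the true difference sits exactly at the margin.
p_one_sided = NormalDist().cdf(z_eq)

print(f"Z_eq = {z_eq:.3f}, one sided p = {p_one_sided:.3f}")
```

Since p is far above 0.05, the data are compatible with a true difference at or beyond the margin, matching the confidence interval's failure to exclude 2.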

  24. Confidence intervals versus hypothesis testing. Equivalence is demonstrated only when the 95% confidence interval lies within −D to +D. In the figure, the intervals for studies 1-8 are drawn along the true-difference axis (−D, 0, +D), running from the far right (study 1) to the far left (study 7); study 8 has a very wide interval.
      Study  Stat sig  Conclusion
      1      Yes       not equivalent
      2      Yes       uncertain
      3      Yes       equivalent
      4      No        equivalent
      5      Yes       equivalent
      6      Yes       uncertain
      7      Yes       not equivalent
      8      No        uncertain
      Ref: Cleophas, Zwinderman, Cleophas. Statistics Applied to Clinical Trials. Kluwer Academic Pub, 2000, page 35.

  25. Non-inferiority: JAMA 2006, Piaggio et al, p 1152-1160

  26. Paired mean comparisons. Serum cholesterol in mmol/L: difference between baseline and end of 4 weeks.
      Subject  chol(baseline)  chol(4 wks)  difference(di)
      1        9.0             6.5          2.5
      2        7.1             6.3          0.8
      3        6.9             5.9          1.0
      4        6.9             4.9          2.0
      5        5.9             4.0          1.9
      6        5.4             4.9          0.5
      mean     6.87            5.42         1.45
      SD       1.24            0.97         0.79
      SE       0.51            0.40         0.32
      Difference (baseline − 4 weeks) = amount lowered: d̄ = 1.45 mmol/L, SD = 0.79 mmol/L, SEd = 0.79/√6 = 0.323 mmol/L, df = 6 − 1 = 5, t0.975 = 2.571. 95% CI: 1.45 ± 2.571(0.323) = 1.45 ± 0.830, or (0.62 mmol/L, 2.28 mmol/L). tobs = 1.45 / 0.323 = 4.49, p value < 0.01 (two sided, df = 5).
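
The paired analysis works entirely on the within-subject differences. The sketch below recomputes the summary values from the six subjects' data; the tabled t value (2.571 for df = 5) is taken from the slide rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

baseline = [9.0, 7.1, 6.9, 6.9, 5.9, 5.4]   # mmol/L
week_4 = [6.5, 6.3, 5.9, 4.9, 4.0, 4.9]     # mmol/L

# Paired design: reduce each subject to a single difference.
diffs = [b - w for b, w in zip(baseline, week_4)]

n = len(diffs)
d_bar = mean(diffs)
sd = stdev(diffs)          # sample SD of the differences (n - 1 denominator)
se = sd / sqrt(n)
t_tabled = 2.571           # tabled t, df = n - 1 = 5, two sided 95% (from the slide)

lower = d_bar - t_tabled * se
upper = d_bar + t_tabled * se
t_obs = d_bar / se         # null value for the mean difference is 0

print(f"mean diff = {d_bar:.2f}, SD = {sd:.2f}, SE = {se:.3f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f}) mmol/L, t_obs = {t_obs:.2f}")
```

Note that the SE of the mean difference (0.32) is smaller than the SE of either column's mean, which is the gain from pairing.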

  27. Confidence intervals and hypothesis tests. Confidence intervals are of the form: sample statistic ± (Zpercentile*)(standard error). Lower bound = sample statistic − (Zpercentile)(standard error). Upper bound = sample statistic + (Zpercentile)(standard error). Hypothesis test statistics (Zobs*) are of the form: Zobs = (sample statistic − null value) / standard error. (* t percentile or tobs for continuous data when n is small.)

  28. Sample statistics and their SEs
      Sample statistic            Symbol          Standard error (SE)
      Mean                        Ȳ               S/√n = √[S²/n] = SEM
      Mean difference             Ȳ1 − Ȳ2 = d̄     √[S1²/n1 + S2²/n2] = SEd
      Proportion                  P               √[P(1−P)/n]
      Proportion difference       P1 − P2         √[P1(1−P1)/n1 + P2(1−P2)/n2]
      Log odds ratio*             loge OR         √[1/a + 1/b + 1/c + 1/d]
      Log risk ratio*             loge RR         √[1/a − 1/(a+c) + 1/b − 1/(b+d)]
      Slope (rate)                b               Serror / (Sx √(n−1))
      Hazard rate (survival)      h               h/√[number dead]
      Correlation coefficient r*  z = ½ loge[(1+r)/(1−r)]   SE(z) = 1/√(n−3); back-transform: r = (e^(2z) − 1)/(e^(2z) + 1)
      * Form CI bounds on the transformed scale, then take the anti-transform.
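
The “transform, form the bounds, then anti-transform” recipe can be sketched for the correlation coefficient. The z transform ½ loge[(1+r)/(1−r)] is `atanh(r)` and its inverse is `tanh(z)`. The values r = 0.60 and n = 30 below are hypothetical, and `correlation_ci` is an illustrative name.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

def correlation_ci(r, n, level=0.95):
    """CI for a correlation via the z transform, assuming n > 3."""
    z = atanh(r)                 # z = 0.5 * ln((1 + r) / (1 - r))
    se_z = 1 / sqrt(n - 3)       # SE on the transformed scale
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Bounds are formed on the z scale, then anti-transformed back to r:
    # tanh(z) = (e^(2z) - 1) / (e^(2z) + 1)
    return tanh(z - z_crit * se_z), tanh(z + z_crit * se_z)

r, n = 0.60, 30   # hypothetical sample correlation and sample size
lower, upper = correlation_ci(r, n)
print(f"r = {r}, 95% CI: ({lower:.2f}, {upper:.2f})")
```

The interval is not symmetric around r, which is expected: symmetry holds only on the transformed scale.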

  29. Handy Guide to Testing

  30. Nomenclature for testing
      Delta (δ) = true difference or size of effect.
      Alpha (α) = Type I error = false positive = probability of rejecting the null hypothesis when it is true. (Usually α is set to 0.05.)
      Beta (β) = Type II error = false negative = probability of not rejecting the null hypothesis when delta is not zero (there is a real difference in the population).
      Power = 1 − β = probability of getting a p value less than α (i.e., declaring statistical significance) when, in fact, there really is a non-zero delta.
      We want small alpha levels and high power.

  31. Statistical hypothesis testing
      Statistic/type of comparison     Test/analysis procedure
      Mean comparison, unpaired        t test (2 groups), ANOVA (3+ groups)
      Mean comparison, paired          paired t test, repeated measures ANOVA
      Median comparison, unpaired      Wilcoxon rank sum test, Kruskal-Wallis test*
      Median comparison, paired        Wilcoxon signed rank test on differences*
      Proportion comparison, unpaired  chi-square test (or Fisher's test)
      Proportion comparison, paired    McNemar's chi-square test
      Odds ratio                       chi-square test, Fisher test
      Risk ratio                       chi-square test, Fisher test
      Correlation, slope               regression, t statistic
      Survival curves, hazard rates    log rank test*
      ANOVA = analysis of variance
      * nonparametric: Gaussian distribution theory is not used to get the p value

  32. Parametric vs nonparametric. Nonparametric methods compute p values using ranks of the data and do not assume the statistics follow a Gaussian distribution.
      Parametric                          Nonparametric
      2 indep means: t test               2 indep medians: Wilcoxon rank sum test (= Mann-Whitney)
      3+ indep means: ANOVA F test        3+ indep medians: Kruskal-Wallis test
      Paired means: paired t test         Paired medians: Wilcoxon signed rank test
      Pearson correlation                 Spearman correlation

  33. Frequentist vs Bayesian. The “frequentist” approach gives a p value. p value = probability of observing the data statistic (d), or one more extreme, assuming the underlying population parameter has a null value (μd = 0). p value = prob(data or more extreme | Hyp). Bayesian: what is the “probability” that μd = 0 given the data (given d)? This is the Bayesian “posterior” probability = prob(Hyp | data), the probability that the hypothesis is true given the data.

  34. Frequentists regard the unknown parameter (μd) as fixed (but unknown) and the observed data (d) as random (a random sample). Bayesians regard the unknown parameter (μd) as random and the observed data (d) as fixed. For both, μd = 0 (null) or μd = δ (alternative). Both approaches have advantages and disadvantages.

  35. In the IOP example, the observed mean difference in pressure between the two eye drops was d = 1.5 mm Hg. The p = 0.0216 is the probability that a sample d is 1.5 mm Hg or larger given the true population mean difference (μd) is zero. A Bayesian wants the probability that μd is zero given d = 1.5, not the probability that d = 1.5 or more given μd = 0.

  36. Bayes rule applied to hypothesis testing. H0 = “μd = 0” = null hypothesis. Ha = “μd = δ” = alternative hypothesis. P(H0) + P(Ha) = 1, so P(Ha) = 1 − P(H0). P(H0|data) = P(data|H0) P(H0) / P(data) = P(data|H0) P(H0) / [P(data|H0) P(H0) + P(data|Ha) P(Ha)]

  37. Probabilities in the above are replaced by density functions. Frequentist and Bayesian results can be very different, but for large n the two answers generally agree since the data terms dominate. For small n, they may or may not agree depending on the prior probabilities.

  38. Without getting into the computational details, Bayesians compute a probability distribution for μd, called the posterior density for μd. Similar to frequentist confidence intervals, Bayesians compute highest posterior density regions (HPDR) from this posterior density. For large sample sizes, the data “swamps” the prior information and therefore Bayesian results usually are about the same as the frequentist results. For example, the CI and the HPDR will be about the same for large n but may differ substantially for small n.

  39. Mean IOP comparison (n = 12 + 13 = 25). μd = true mean difference in mm Hg. Frequentist p value = 0.0432 given μd = 0. Frequentist confidence interval for true μd is 1.5 ± 2.07(0.70), or (0.05, 2.95). Bayesian posterior probability that μd = 0 given d = 1.5 is 0.08 if the prior probability P(H0) = 0.5. Bayesian HPDR is (−2.2, 5.2). Less certain since high prior belief that μd = 0.

  40. Alternate version of Bayes formula. P(H0|data) / P(Ha|data) = [P(data|H0) / P(data|Ha)] × [P(H0) / P(Ha)]. Posterior odds = LR × prior odds. LR = data likelihood ratio = P(data|H0) / P(data|Ha)
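
Bayes rule for two hypotheses and its odds form are the same calculation written two ways, which a small sketch can confirm. The likelihood values 0.04 and 0.30 below are hypothetical placeholders (the slides do not give the likelihoods behind the IOP posterior), and the function names are illustrative.

```python
def posterior_h0(p_data_h0, p_data_ha, prior_h0):
    """P(H0 | data) from Bayes rule with two hypotheses, H0 and Ha."""
    prior_ha = 1 - prior_h0
    numerator = p_data_h0 * prior_h0
    return numerator / (numerator + p_data_ha * prior_ha)

def posterior_h0_via_odds(p_data_h0, p_data_ha, prior_h0):
    """Same posterior through: posterior odds = LR x prior odds."""
    lr = p_data_h0 / p_data_ha              # data likelihood ratio
    prior_odds = prior_h0 / (1 - prior_h0)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1 + posterior_odds)  # odds back to probability

# Hypothetical likelihoods of the observed data under each hypothesis
p_data_h0, p_data_ha = 0.04, 0.30
prior_h0 = 0.5

direct = posterior_h0(p_data_h0, p_data_ha, prior_h0)
via_odds = posterior_h0_via_odds(p_data_h0, p_data_ha, prior_h0)
print(f"P(H0 | data): direct = {direct:.4f}, via odds = {via_odds:.4f}")
```

With a prior of 0.5 the prior odds are 1, so the posterior odds equal the likelihood ratio; raising `prior_h0` pulls the posterior toward the null.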
