
Statistics

Statistics. March 19, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility. Sampling Variability and Standard Error.


Presentation Transcript


  1. Statistics March 19, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility Nemours Biomedical Research

  2. Sampling Variability and Standard Error • If we repeat an experiment or measurement on the same number of subjects, the statistic varies from sample to sample. This variability is known as sampling variability. • Standard error (SE) measures the sampling variability, or precision, of an estimate. • It indicates how precisely one can estimate a population value from a given sample. • For a large sample, the sample estimate will fall within one SE of the population value approximately 68% of the time.
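The 68% statement on this slide can be checked with a small simulation. The sketch below uses only Python's standard library; the population mean and SD, the sample size, and the number of repetitions are hypothetical values chosen for illustration, not figures from the presentation.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD = 100.0, 15.0   # hypothetical population values
N, REPLICATES = 50, 2000         # subjects per sample, number of repeated samples

def sample_mean(n):
    """Mean of one simulated sample of size n from the hypothetical population."""
    return statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(n))

# Repeat the "experiment" many times and keep each sample mean.
means = [sample_mean(N) for _ in range(REPLICATES)]

theoretical_se = POP_SD / N ** 0.5        # SE of a mean: sigma / sqrt(n)
empirical_se = statistics.stdev(means)    # observed spread of the estimates
within_one_se = sum(abs(m - POP_MEAN) <= theoretical_se for m in means) / REPLICATES

print(f"theoretical SE: {theoretical_se:.3f}")
print(f"empirical SE:   {empirical_se:.3f}")
print(f"share of estimates within one SE: {within_one_se:.3f}")
```

The empirical spread of the sample means should land close to the theoretical SE, and the share of estimates within one SE should be near 0.68.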

  3. Statistical Inference • Statistical inference is the process by which we acquire information about populations from samples. • Two types of estimates for making inferences: • Point estimation: Calculates a statistic from the sample and uses it as a single best guess of the population parameter. • Interval estimation: Calculates an interval from the sample data within which the parameter is expected to lie with a stated level of confidence. (Diagram: inference from sample to population.)

  4. Point Estimation • A point estimator draws inferences about a population by estimating the value of an unknown parameter using a single value, or point. (Figure labels: population distribution with unknown parameter; sample distribution with point estimator.)

  5. Interval Estimation • An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval. (Figure labels: population distribution with parameter; sample distribution with interval estimator.)

  6. Hypothesis Testing (Quantitative)

  7. Hypothesis Testing • Hypothesis • A statement or belief about some characteristic of a population, such as its mean or standard deviation • Types of Hypotheses • Null Hypothesis (H0): The hypothesis that is assumed to be true unless contradicted by the data observed in a sample • Alternative Hypothesis (H1): The hypothesis one accepts if H0 is rejected • H0 & H1 are complementary • Normally we want to reject H0 in favor of H1

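  8. Test of hypothesis • Goal: Keep α, β reasonably small (α is the probability of a type I error, rejecting H0 when it is true; β is the probability of a type II error, failing to reject H0 when it is false)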

  9. Test of hypothesis • Level of Significance (α): The probability of a type I error is known as the level of significance. • Power (1-β): The probability of rejecting H0 when the alternative hypothesis is true, at a fixed level of significance (α). That is, the probability of rejecting the null hypothesis for a specified value of the parameter under the alternative hypothesis. • The power of a test is a function of the sample size and the parameter of interest. • We calculate power for a particular value of the parameter under the alternative hypothesis. • An increased sample size increases the power of a test.

  10. Test of hypothesis • We aim to make inferences while controlling both type I and type II errors. • For a fixed sample size, reducing one increases the other. • The consequence of a type I error is usually regarded as more severe than that of a type II error. • That is why we choose a test that minimizes type II error (maximizes the power of the test) while keeping type I error at a fixed low level (say 0.05).

  11. Test of hypothesis • Reasons for estimating power prior to conducting a research study: • To determine the sample size required for sufficient power to correctly detect a significant difference.

  12. Test of hypothesis • Increasing Power: If the power of a test is too small, then the following steps can increase it: • Increase the sample size • Increase the significance level (α) • Reduce the variability • Enlarge the effect size
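The four levers listed on this slide can be illustrated numerically. The sketch below is a rough normal approximation for a two-sided comparison of two group means, written with Python's standard library; the function name `approx_power` and every input value are assumptions made for illustration, and the approximation is not the t-based calculation that R's power.t.test performs.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def approx_power(n, delta, sd, alpha=0.05):
    """Approximate power of a two-sided comparison of two group means.

    n is the number of subjects per group; a normal approximation is used.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta / (sd * (2 / n) ** 0.5)  # true difference in standard-error units
    # Probability that the test statistic falls beyond either critical value.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

baseline = approx_power(n=20, delta=0.5, sd=1.0)
scenarios = {
    "baseline": baseline,
    "larger sample (n=40)": approx_power(n=40, delta=0.5, sd=1.0),
    "larger alpha (0.10)": approx_power(n=20, delta=0.5, sd=1.0, alpha=0.10),
    "less variability (sd=0.8)": approx_power(n=20, delta=0.5, sd=0.8),
    "larger effect (delta=0.8)": approx_power(n=20, delta=0.8, sd=1.0),
}
for label, power in scenarios.items():
    print(f"{label:28s} power = {power:.3f}")
```

Each of the four changes raises the approximate power above the baseline. Raising α does so at the cost of more type I errors.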

  13. P-Value The P-value is the probability, assuming H0 is true, that the test statistic would take a value as extreme as or more extreme than the one actually observed. The P-value is not the probability that the null hypothesis is true; it measures how compatible the observed data are with H0. The smaller the P-value, the stronger the evidence against H0 provided by the data. Rejecting H0 whenever the P-value is below 0.05 means that, when H0 is actually true, there is a 1 in 20 chance of a Type I error, i.e., rejecting H0 when it is actually true.
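The link between a 0.05 cutoff and a 1 in 20 type I error rate can be seen by simulating many studies in which H0 is true. The sketch below uses Python's standard library and a z test with a known standard deviation; the null mean, SD, sample size, and number of simulated studies are hypothetical choices, not values from the presentation.

```python
import random
from statistics import NormalDist, fmean

random.seed(2)  # fixed seed so the sketch is repeatable
STD_NORMAL = NormalDist()

MU0, SIGMA, N = 0.0, 1.0, 30       # hypothetical null mean, known SD, sample size
ALPHA, EXPERIMENTS = 0.05, 4000    # cutoff and number of simulated studies

def two_sided_p_value(sample):
    """P-value of a two-sided z test of H0: mean = MU0, with SIGMA known."""
    z = (fmean(sample) - MU0) / (SIGMA / len(sample) ** 0.5)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

# Every sample is drawn with H0 true, so each rejection is a type I error.
p_values = [
    two_sided_p_value([random.gauss(MU0, SIGMA) for _ in range(N)])
    for _ in range(EXPERIMENTS)
]
false_positive_rate = sum(p < ALPHA for p in p_values) / EXPERIMENTS
print(f"share of true-H0 studies with p < {ALPHA}: {false_positive_rate:.3f}")
```

The printed share should be close to 0.05, the type I error rate fixed by the cutoff.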

  14. Confidence Interval • A confidence interval (CI) is an interval, calculated from the sample, that is expected to contain the value of the parameter with a specified level of confidence • A CI measures the precision of an estimate • Large sampling variability leads to a wide interval, reflecting the uncertainty of the estimate • A 95% CI implies that if one repeats a study 100 times, about 95 of the 100 resulting intervals are expected to contain the true measure of association. • If the parameter value stated in H0 does not lie within the 95% CI, the result is significant at the 5% level of significance
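The repeated-study reading of a 95% CI can be illustrated by simulation. In the sketch below (Python standard library), the true mean, the known SD, the sample size, and the number of simulated studies are hypothetical, and each interval is the simple z interval around a sample mean.

```python
import random
from statistics import NormalDist, fmean

random.seed(3)  # fixed seed so the sketch is repeatable

TRUE_MEAN, SIGMA, N = 50.0, 10.0, 40   # hypothetical population and sample size
STUDIES = 2000                         # number of repeated studies
Z_95 = NormalDist().inv_cdf(0.975)     # multiplier for a 95% interval
SE = SIGMA / N ** 0.5                  # standard error of the sample mean

def interval_covers_truth():
    """Run one simulated study and report whether its 95% CI contains TRUE_MEAN."""
    sample_mean = fmean(random.gauss(TRUE_MEAN, SIGMA) for _ in range(N))
    return sample_mean - Z_95 * SE <= TRUE_MEAN <= sample_mean + Z_95 * SE

coverage = sum(interval_covers_truth() for _ in range(STUDIES)) / STUDIES
print(f"share of 95% intervals containing the true mean: {coverage:.3f}")
```

The share should be close to 0.95. Any single interval either contains the true mean or does not; the 95% describes the long-run behaviour of the procedure.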

  15. Confidence Interval (CI) • CI = point estimate ± (measure of how confident we want to be) × (standard error) • What effect does a larger sample size have on the confidence interval? • It reduces the standard error and makes the CI narrower, indicating a more precise estimate
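The effect of sample size on the interval width follows directly from the formula. The sketch below (Python standard library) takes the "measure of how confident we want to be" to be a standard normal quantile, which is an assumption that suits large samples; the estimate of 90 and SD of 30 are hypothetical inputs.

```python
from statistics import NormalDist

def confidence_interval(estimate, sd, n, level=0.95):
    """Normal-approximation CI: estimate +/- z * (sd / sqrt(n))."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # confidence multiplier
    se = sd / n ** 0.5                             # standard error of the mean
    return estimate - z * se, estimate + z * se

for n in (25, 100, 400):
    low, high = confidence_interval(estimate=90.0, sd=30.0, n=n)
    print(f"n = {n:3d}: CI = ({low:.2f}, {high:.2f}), width = {high - low:.2f}")
```

Because the standard error shrinks with the square root of n, quadrupling the sample size halves the width of the interval.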

  16. Steps in hypothesis testing • Hypothesis testing steps: 1. Specify null (H0) and alternative (H1) hypotheses 2. Select significance level (alpha), say 0.05 or 0.01 3. Calculate test statistic, e.g. t, F, Chi-square 4. Calculate probability value (p-value) or confidence interval (CI) 5. Describe the result and statistic in an understandable way.
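The five steps can be walked through on a small worked case. The sketch below (Python standard library) uses a z statistic with an assumed known standard deviation so that it needs no statistics package; the null mean and the sample summary are hypothetical numbers, not data from the presentation.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Step 1: hypotheses. H0: mean = 50, H1: mean != 50 (hypothetical values).
mu0 = 50.0
# Step 2: significance level.
alpha = 0.05
# Hypothetical sample summary; sigma is assumed known so a z statistic applies.
sample_mean, sigma, n = 52.0, 5.0, 25

# Step 3: test statistic.
se = sigma / n ** 0.5
z = (sample_mean - mu0) / se
# Step 4: two-sided p-value and the matching confidence interval.
p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)
# Step 5: describe the result in plain terms.
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.4f}, "
      f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}): {decision}")
```

With these inputs the p-value is below 0.05 and the interval excludes the null value of 50, so both routes in step 4 lead to the same decision.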

  17. Procedures for sample size calculation • Select primary variables of interest and formulate hypotheses • Determine/estimate standard deviation (if numeric) or proportion (if categorical) • Decide on a tolerable level of significance (α) • Determine the test statistic to use • Determine desired power or confidence level • Determine effect size: a scientifically or clinically meaningful difference

  18. Factors affecting sample size • The larger the standard deviation, the larger the sample size needs to be • The smaller the acceptable likelihood of a Type I error, the larger the sample size must be • The greater the amount of power desired, the larger the sample size must be • The smaller the expected effect, the larger the sample size must be

  19. Example: Sample size calculation • Aim of the study: To test the hypothesis that kids with a parental history of obesity are at a higher risk of being obese themselves. • Suppose the prevalence of obesity is 2% among US kids, whereas it is 5% among US kids with a parental history. • How many kids should we recruit to detect this difference (.05 - .02 = .03) with 90% power using a two-sided test with α = .05? • Calculation shows that 340.2 ≈ 341 kids are needed for this study.
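The slide does not say which formula produced 340.2. The sketch below (Python standard library) assumes a common normal-approximation formula for a one-sample test of a proportion, with 2% as the null value and 5% as the alternative; the function name is hypothetical. Under that assumption the unrounded result is slightly different from the slide's 340.2, but it rounds up to the same 341.

```python
from math import ceil, sqrt
from statistics import NormalDist

def one_sample_proportion_n(p0, p1, alpha=0.05, power=0.90):
    """Approximate n for a two-sided one-sample test of H0: p = p0 against p = p1."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = z(power)            # quantile matching the desired power
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return (numerator / (p1 - p0)) ** 2

n = one_sample_proportion_n(p0=0.02, p1=0.05)
print(f"unrounded n = {n:.1f}, recruit {ceil(n)} kids")
```

Sample sizes are rounded up rather than to the nearest integer, so that the achieved power is not below the target.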

  20. Sample size calculation in R • power.t.test: For a hypothesis test of the mean of a single group or of two groups. • E.g. power.t.test(power=0.87, delta=1, sd=1, sig.level=0.05) (in the one-sample case delta is the difference between the means under the null and alternative hypotheses; in the two-sample case delta is the true difference between the means of the two groups; the default is the two-sample test, with n counted per group). Calculated n ≈ 21. • power.prop.test: For a hypothesis test comparing two proportions. • E.g. power.prop.test(power=.80, p1=.5, p2=.75, sig.level=0.05) (p1 and p2 are the proportions in the two groups; n is counted per group). Calculated n ≈ 58.
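The proportion example can be reproduced by hand with a normal-approximation formula for comparing two proportions. The sketch below uses Python's standard library; the function name is hypothetical, and the formula is offered as an illustration that happens to agree with the n ≈ 58 quoted on the slide, not as a description of how R computes it.

```python
from math import ceil, sqrt
from statistics import NormalDist

def two_proportion_n(p1, p2, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided comparison of two proportions."""
    z = NormalDist().inv_cdf
    p_bar = (p1 + p2) / 2        # average of the two proportions
    z_alpha = z(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = z(power)            # quantile matching the desired power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return (numerator / abs(p1 - p2)) ** 2

n_per_group = two_proportion_n(p1=0.5, p2=0.75)
print(f"{n_per_group:.2f} per group, rounded up to {ceil(n_per_group)}")
```

The result is a per-group count, so the total number of subjects is twice that figure.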

  21. Useful links for sample size calculation • http://hedwig.mgh.harvard.edu/sample_size/size.html • http://www.stat.uiowa.edu/~rlenth/Power/index.html • http://cct.jhsph.edu/javamarc/index.htm • http://stat.ubc.ca/~rollin/stats/ssize/index.html • http://statpages.org/#Power

  22. Power calculation in R • Some R functions for power calculation (the same functions can be used for sample size calculation; R function names are case-sensitive and start with a lowercase p): • power.t.test: For a hypothesis test of the mean of a single group or of two groups. • E.g. power.t.test(n=20, delta=1, sd=1, sig.level=0.05) (in the one-sample case delta is the difference between the means under the null and alternative hypotheses; in the two-sample case delta is the true difference between the means of the two groups; the default is the two-sample test, with n counted per group). Calculated power is about 0.87. • power.prop.test: For a hypothesis test comparing two proportions. • E.g. power.prop.test(n=50, p1=.5, p2=.75, sig.level=0.05) (p1 and p2 are the proportions in the two groups; n is counted per group). Calculated power is about 0.74.
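The power of about 0.74 for the proportion example can be checked with a normal approximation. The sketch below uses Python's standard library; the function name is hypothetical, and the formula ignores the negligible chance of rejecting in the wrong direction. It is an illustration that agrees with the slide's figure, not a statement of R's internals.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def two_proportion_power(n, p1, p2, alpha=0.05):
    """Approximate power of a two-sided comparison of two proportions, n per group."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # Distance between the true difference and the rejection threshold,
    # measured in units of the standard deviation under the alternative.
    signal = sqrt(n) * abs(p1 - p2) - z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    return STD_NORMAL.cdf(signal / sqrt(p1 * (1 - p1) + p2 * (1 - p2)))

power = two_proportion_power(n=50, p1=0.5, p2=0.75)
print(f"approximate power with 50 per group: {power:.2f}")
```

Raising n in the call shows how quickly power climbs toward the usual 0.80 or 0.90 targets.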

  23. Rcmdr Demo: p-value and Confidence Interval (CI) calculation • Let us calculate the p-value and 95% CI for the mean of the variable age in our default data set • Load the csv data in R as well as in Rcmdr • Select the ‘statistics’ menu in Rcmdr, then select ‘single.sample t-test’ from the drop-down menu; in the ‘single.sample t-test’ window, specify the null hypothesis (say mean=80) and the confidence level (.95), and select the alternative hypothesis. The resulting R output is pasted below:
     data: Dataset$age
     t = 2.6691, df = 59, p-value = 0.009807
     alternative hypothesis: true mean is not equal to 80
     95 percent confidence interval:
      82.60748 98.22585
     sample estimates:
     mean of x
      90.41667

  24. Rcmdr Demo: p-value and Confidence Interval (CI) calculation • Let us calculate the 95% CI and p-value for the proportion of female or male kids in the default data set. • Load the csv data in R as well as in Rcmdr • Select the ‘statistics’ menu in Rcmdr, then select ‘proportions’ from the drop-down menu; in the ‘single sample proportion’ window, specify the null hypothesis and the confidence level, and select the alternative hypothesis and test type. The R output is pasted below:
     null probability 0.6
     X-squared = 2.5, df = 1, p-value = 0.1138
     alternative hypothesis: true p is not equal to 0.6
     95 percent confidence interval:
      0.3773502 0.6226498
     sample estimates:
       p
     0.5
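The numbers in this output can be reproduced by hand. The sketch below (Python standard library) assumes a sample of 60 kids with 30 in the category of interest: the slide does not state these counts, and they are inferred here from the sample proportion of 0.5, the X-squared value of 2.5, and the df of 59 in the previous demo. It computes a chi-square statistic without continuity correction and a Wilson score interval; the function name is hypothetical.

```python
from math import erfc, sqrt
from statistics import NormalDist

def one_sample_proportion_test(successes, n, p0, level=0.95):
    """Chi-square test (no continuity correction) and Wilson interval for one proportion."""
    p_hat = successes / n
    # Chi-square statistic with 1 degree of freedom.
    chi_sq = (p_hat - p0) ** 2 / (p0 * (1 - p0) / n)
    # Upper-tail probability of a 1-df chi-square, via the complementary error function.
    p_value = erfc(sqrt(chi_sq / 2))
    # Wilson score interval for the proportion.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    centre = (p_hat + z * z / (2 * n)) / (1 + z * z / n)
    half_width = (z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
                  / (1 + z * z / n))
    return chi_sq, p_value, (centre - half_width, centre + half_width)

# Assumed counts: 30 of 60, inferred from the output rather than stated on the slide.
chi_sq, p_value, ci = one_sample_proportion_test(successes=30, n=60, p0=0.6)
print(f"X-squared = {chi_sq:.1f}, p-value = {p_value:.4f}")
print(f"95% CI = ({ci[0]:.7f}, {ci[1]:.7f})")
```

Under these assumed counts the statistic, p-value, and interval agree with the pasted output, and the interval contains the null value 0.6, consistent with a p-value above 0.05.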
