
Advanced Statistics I

Advanced Statistics I. Xiayu (Stacy) Huang Bioinformatics Shared Resource Sanford | Burnham Medical Research Institute. Outline. Summary of Previous Talk Descriptive & inferential statistics T test Common statistical tests and Applications (Continue) One-way ANOVA


Presentation Transcript


  1. Advanced Statistics I Xiayu (Stacy) Huang Bioinformatics Shared Resource Sanford | Burnham Medical Research Institute

  2. Outline • Summary of Previous Talk • Descriptive & inferential statistics • T test • Common statistical tests and Applications (Continued) • One-way ANOVA • Post hoc/multiple comparison tests • One-way ANOVA power analysis • Common data transformation methods

  3. Why is statistics important to biologists? • Designing experiments (how many replicates do I need for my microarray experiment? No replicates = no statistics?) • Analyzing biological data and understanding analysis results (identifying outliers, normalization/transformation, statistical tests, differentially expressed genes (DEGs), etc.) • Preparing manuscripts and grant applications

  4. Summary of previous talk • Descriptive statistics • Measures of central tendency (mean, median, mode) • Measures of dispersion (standard deviation, range, etc.) • The previous slides can be downloaded from http://bsrweb.burnham.org
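As a quick refresher on the descriptive statistics listed on this slide, here is a minimal sketch using Python's standard `statistics` module. The `values` list is hypothetical illustration data, not data from the talk.

```python
import statistics

# Hypothetical measurements (illustrative values only, not from the talk)
values = [4.0, 5.0, 5.0, 6.0, 10.0]

# Measures of central tendency
mean = statistics.mean(values)      # arithmetic mean
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequent value

# Measures of dispersion
sd = statistics.stdev(values)             # sample standard deviation (divides by n - 1)
value_range = max(values) - min(values)   # range: largest minus smallest value

print(mean, median, mode, round(sd, 3), value_range)
```

Note that the mean is pulled toward the single large value while the median and mode are not, which is why the median is often reported for skewed data.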

  5. Summary of previous talk • Inferential statistics • Null hypothesis (H0) and alternative hypothesis (Ha), e.g. H0: new drug effect = old drug effect; Ha: new drug effect ≠ or > old drug effect • Type I and type II errors (a.k.a. false positives and false negatives) • P-value (statistically significant if less than the cutoff α, typically 0.05) • Power (probability of detecting a true difference, typically 0.80) • T test • 3 types of t tests • T tests can be performed in Excel and GraphPad Prism • Power analysis of the t test in G*Power

  6. Analysis of variance (ANOVA) Overview • History of ANOVA • Ronald Fisher proposed ANOVA in 1918 • His first application of ANOVA was published in 1921 • What does ANOVA do? • Comparing the means of 3 or more populations without knowing the exact difference between population means • Types of ANOVA • One-way and multi-way ANOVA (# of factors of interest) • Repeated measures ANOVA (repeated measurements on the same subject) • MANOVA (# of different measurements)

  7. ANOVA Overview • How to decide which ANOVA to use? • How many different measurements? => ANOVA or MANOVA • How many factors of interest? => One-way or multi-way • Are there any repeated measures? => Repeated or not • Examples • 1: How does tumor size change among three different mouse genotypes within 7 weeks? (tumor volume measured every week) Measurements: tumor size => ANOVA; Factors: genotype and time => two-way; Repeated measures: repeated; Decision: two-way repeated ANOVA • 2: How do tumor size and body weight change among three different mouse genotypes within 7 weeks? Measurements: tumor size and body weight => MANOVA; Factors: genotype and time => two-way; Repeated measures: repeated; Decision: two-way repeated MANOVA

  8. ANOVA applications in biology • One-way ANOVA • Two-way ANOVA • One-way MANOVA • Two-way repeated ANOVA

  9. One-way ANOVA • One measurement, no repeats, one factor with multiple levels (“groups”), e.g. comparing the effect of three different drug treatments (control, drug A and drug B) on body weight • Measurement: body weight; Factor: treatment; Levels: control, drug A and drug B • Hypotheses • H0: $\mu_1 = \mu_2 = \dots = \mu_k$ (all population means are equal; no treatment effect) • Ha: at least one population mean is different (there is a treatment effect)

  10. One-way ANOVA • Basic idea • Compares 2 types of variation to test equality of population means • Sources of variation (error) • Total variation: SS(Total) • Variation among treatments: SST • Variation within treatments: SSE • SS(Total) = SST + SSE

  11. One-way ANOVA • Sources of variation (figure: three panels, each showing group1, group2 and group3, illustrating the total variation, the variation among treatments, and the variation within treatments) • $\bar{x}_1$: mean of group1; $n_1$: sample size in group1 • $\bar{x}_2$: mean of group2; $n_2$: sample size in group2 • $\bar{x}_3$: mean of group3; $n_3$: sample size in group3 • $\bar{x}$: overall mean of all the samples

  12. One-way ANOVA • F test statistic • The F statistic is a ratio of two variance estimates: $F = \frac{SST/(k-1)}{SSE/(n-k)}$ • Numerator: variation among treatments/(k-1) • Denominator: variation within treatments/(n-k) • k = number of populations, treatment groups, or levels • n = total sample size • A large F value indicates that the variation among treatments is greater than the variation within treatments, i.e. that there is a treatment effect • The p-value is computed from the F statistic
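The sums of squares and the F statistic from slides 10 to 12 can be sketched in plain Python as follows. The `groups` data and the function name `one_way_anova` are hypothetical, chosen only for illustration. The sketch stops at F, because converting F to a p-value requires the F distribution, which the standard library does not provide.

```python
# Hypothetical data: three treatment groups (illustrative values only)
groups = {
    "control": [1.0, 2.0, 3.0],
    "drug_A": [2.0, 3.0, 4.0],
    "drug_B": [5.0, 6.0, 7.0],
}


def one_way_anova(groups):
    """Return (SST, SSE, SS_total, F) for a dict mapping group name -> observations."""
    all_values = [x for obs in groups.values() for x in obs]
    n = len(all_values)   # total sample size
    k = len(groups)       # number of groups (levels)
    grand_mean = sum(all_values) / n

    sst = 0.0  # variation among treatments
    sse = 0.0  # variation within treatments
    for obs in groups.values():
        group_mean = sum(obs) / len(obs)
        sst += len(obs) * (group_mean - grand_mean) ** 2
        sse += sum((x - group_mean) ** 2 for x in obs)

    # Total variation, computed directly so the decomposition can be checked
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    # F = (SST / (k - 1)) / (SSE / (n - k))
    f_stat = (sst / (k - 1)) / (sse / (n - k))
    return sst, sse, ss_total, f_stat


sst, sse, ss_total, f_stat = one_way_anova(groups)
print(round(sst, 3), round(sse, 3), round(ss_total, 3), round(f_stat, 3))
```

Computing SS(Total) independently rather than as SST + SSE makes it easy to confirm the identity SS(Total) = SST + SSE on any data set.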

  13. One-way ANOVA • Assumptions • Sampling should be independent and randomized. • Homogeneity of variance • Populations (for each condition) have equal variances • Check by Bartlett’s or Levene’s test • Normality • Populations (for each condition) are normally distributed • Check by normality test (such as Kolmogorov-Smirnov test) • Normal distribution=Gaussian distribution=>“bell-shaped” curve Note: equal sample size is preferred

  14. One-way ANOVA example • Goal: determine whether mouse group has a significant effect on rotarod performance • Measurement: number of seconds a mouse stays on the rotarod

  15. Data summarization using descriptive statistics

  16. Statistical test decision tree (figure labels: relationship between variables; two-sample comparison; one measurement; multiple-sample comparison)

  17. Statistical test decision tree (figure labels: number of measurements; number of factors)

  18. Normality check in GraphPad Prism

  19. Normality check in GraphPad Prism

  20. Variance check and ANOVA analysis in GraphPad Prism

  21. Variance check and ANOVA analysis in GraphPad Prism (output labels: SST, SSE, SS(Total))

  22. ANOVA analysis of example data in Excel

  23. ANOVA analysis results of example data • Conclusion • At least one group of mice performs differently on the rotarod from the other groups • Next • What are the exact differences between the groups? • Are groups 1 and 2 significantly different? Are groups 1 and 3 significantly different?

  24. Post hoc/multiple comparison tests introduction • ANOVA does not tell which groups differ from each other • Problem with doing multiple t tests • High combined false positive error rate • 6 t tests give a 26% chance of at least one false positive • Multiple comparison tests • They are based on the t test or F test, but take the number of comparisons into account and adjust the p-value obtained from the regular t test • Pairwise comparisons • A pairwise comparison is the difference between any two population means. For k populations or groups, there are k(k-1)/2 possible pairwise comparisons
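To make the k(k-1)/2 count concrete, the sketch below enumerates every pairwise comparison for four groups with `itertools.combinations`. The group labels are hypothetical placeholders.

```python
from itertools import combinations

# Hypothetical group labels (placeholders for four groups)
groups = ["group1", "group2", "group3", "group4"]
k = len(groups)

# Every unordered pair of distinct groups is one pairwise comparison
pairs = list(combinations(groups, 2))
n_pairs = len(pairs)

# The enumeration agrees with the closed-form count k(k-1)/2
assert n_pairs == k * (k - 1) // 2

for first, second in pairs:
    print(f"{first} vs {second}")
```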

  25. Post hoc/multiple comparison tests introduction • Error rate • Familywise error rate (FWER) • Probability of making at least one false positive among all the comparisons • Example: 4 groups, number of all possible pairwise comparisons = 4(4-1)/2 = 6 • If the allowed false positive rate for each comparison is 0.05, the probability of making at least one false positive is $1-(1-0.05)^6 \approx 0.26$ • Control the FWER at a given α level such as 0.05 • False discovery rate (FDR) • Used for large data sets such as microarray data • Expected proportion of results declared significant that are actually false positives • Example: if 1000 genes are declared statistically significant and FDR = 0.05, then 50 genes would be expected to be false positives • Control the FDR at a given α level such as 0.05
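The two worked examples on this slide can be reproduced with a few lines of arithmetic. This is a sketch that assumes the comparisons are independent, which is the assumption behind the 1-(1-α)^m formula; the function name is made up for illustration.

```python
def familywise_error_rate(alpha, n_comparisons):
    """Probability of at least one false positive among independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons


# FWER example: 6 pairwise comparisons, each tested at alpha = 0.05
fwer = familywise_error_rate(0.05, 6)
print(round(fwer, 2))

# FDR example: 1000 genes declared significant at FDR = 0.05
n_declared = 1000
fdr = 0.05
expected_false_positives = n_declared * fdr
print(round(expected_false_positives))
```

The FWER grows quickly with the number of comparisons, which is why uncorrected repeated t tests are discouraged.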

  26. Common post hoc/multiple comparison tests • Fisher’s protected least significant difference (Fisher’s LSD) • The first post hoc test, developed by Fisher to study pairwise comparisons • Assumptions: equal variances, normality, and a significant F test result • Does not control the FWER • Tukey’s test • The most popular post hoc test, suitable for all pairwise comparisons (6 or more) • Assumptions: equal variances and normality • Controls the FWER and is suitable for both equal and unequal sample sizes • Bonferroni test • Suitable for any set of preplanned comparisons • α per comparison = α / total number of comparisons • Controls the FWER, but is too conservative in many situations
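The Bonferroni rule above (α per comparison = α / total number of comparisons) is simple enough to sketch directly. The p-values below are hypothetical, standing in for the results of six pairwise t tests; the function name is illustrative only.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value as significant at the Bonferroni-adjusted threshold."""
    threshold = alpha / len(p_values)  # alpha per comparison
    return [p <= threshold for p in p_values]


# Hypothetical p-values from 6 pairwise comparisons (illustrative only)
p_values = [0.001, 0.004, 0.010, 0.020, 0.030, 0.500]

flags = bonferroni_significant(p_values, alpha=0.05)
print(flags)
```

With six comparisons the per-comparison threshold drops to about 0.0083, so p-values such as 0.010 or 0.020 that would pass an unadjusted 0.05 cutoff are no longer declared significant. This is the conservativeness the slide refers to.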

  27. Common post hoc/multiple comparison tests • Benjamini-Hochberg false discovery rate (FDR) • Proposed by Benjamini and Hochberg in 1995 • Four-step procedure • Conduct m separate t tests, one per gene, each at a common significance level α (e.g. 0.05) • Order the p-values of the m genes from smallest to largest • Find the largest K such that Pk ≤ (k/m)*α • Declare genes 1…K statistically significantly different between the two groups being compared • Less conservative than the Bonferroni test and more powerful than methods that control the familywise error rate
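The four-step Benjamini-Hochberg procedure can be sketched as below, starting from p-values that have already been computed (step 1). The function name and the p-values are hypothetical and are not the gene data from the talk.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices (into p_values) of tests declared significant."""
    m = len(p_values)

    # Step 2: order the tests by p-value, smallest first
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step 3: find the largest rank k with P(k) <= (k / m) * alpha
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank

    # Step 4: declare the tests ranked 1..K significant
    return sorted(order[:largest_k])


# Hypothetical p-values for 6 genes, in arbitrary order (illustrative only)
p_values = [0.30, 0.001, 0.04, 0.012, 0.02, 0.60]

significant = benjamini_hochberg(p_values, alpha=0.05)
print(significant)
```

The loop deliberately scans all ranks instead of stopping at the first failure, because the procedure uses the largest qualifying rank: every test ranked at or below K is declared significant, even if some individual rank below K missed its own threshold.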

  28. Post hoc/multiple comparison tests example • Goal: determine whether there is an overall significant difference in performance among the four mouse groups, and how the groups differ from each other • Measurement: number of seconds a mouse stays on the rotarod • Number of possible pairwise comparisons: 4×(4-1)/2 = 6 • Multiple comparison test: Tukey’s test (6 or more comparisons)

  29. Post hoc/multiple comparison tests in GraphPad Prism

  30. Comparing different multiple comparison tests

  31. Post hoc/multiple comparison tests example • Goal: determine whether there is a statistically significant difference between WT and KO for a set of m = 12 genes at FDR (Q) = 0.05 • Order the genes by p-value in increasing order • Find the largest K such that Pk ≤ (k/m)Q • Conclusion: two of the 12 genes tested (SLU7 and LGI1) are statistically significantly different between WT and KO at FDR (Q) = 0.05

  32. Outline • Summary of Previous Talk • Descriptive & inferential statistics • T test • Common statistical tests and Applications (Continued) • One-way ANOVA • Post hoc/multiple comparison tests • One-way ANOVA power analysis • Common data transformation methods

  33. Power analysis • Power depends on: • Sample size ($n$) • Standard deviation ($\sigma$ or $s$) • Minimal detectable difference • False positive rate ($\alpha$) • The standard deviation and the minimal detectable difference together determine the effect size • What you can do with power analysis • Find the minimal sample size required • Find the minimal detectable difference or effect size • Find the power of the test

  34. Power analysis software/packages • G*Power (free!!!) • Optimal design (free!!!) • SPSS sample power • PASS • SAS proc power, Stata sampsi, etc • Mplus for more advanced/complicated analysis • Many free on-line programs • http://www.stat.uiowa.edu/~rlenth/Power/

  35. One-way ANOVA power analysis in G*Power • Test family • F test • Statistical test • ANOVA, fixed effect, omnibus, one-way • Type of power analysis • Compute sample size, given α, power, and effect size • Input parameters • Effect size ($f$) • False positive rate ($\alpha$) • usually 0.05 • Minimum power ($1-\beta$) • usually 0.80 • Number of groups • Output parameters • Noncentrality parameter ($\lambda$) • Critical F • Degrees of freedom • Total sample size • Actual power
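For readers wondering where the effect size f comes from, the sketch below computes it from group means and a common within-group standard deviation, using Cohen's definition for equal group sizes (standard deviation of the group means divided by the within-group standard deviation), and then the noncentrality parameter as f² times the total sample size. The means, standard deviation, and sample size are hypothetical, and the function names are illustrative. The sketch does not compute power or sample size itself, since that needs the noncentral F distribution that G*Power evaluates internally.

```python
import math


def cohens_f(group_means, within_sd):
    """Effect size f for equal group sizes: SD of group means / within-group SD."""
    k = len(group_means)
    grand_mean = sum(group_means) / k
    # Population-style SD of the group means (divide by k, not k - 1)
    sigma_means = math.sqrt(sum((m - grand_mean) ** 2 for m in group_means) / k)
    return sigma_means / within_sd


def noncentrality(f, total_n):
    """Noncentrality parameter lambda = f^2 * N for the one-way ANOVA F test."""
    return f ** 2 * total_n


# Hypothetical inputs (illustrative only): three group means, common SD of 4
f = cohens_f([10.0, 12.0, 14.0], within_sd=4.0)
lam = noncentrality(f, total_n=30)
print(round(f, 3), round(lam, 3))
```

An f of roughly 0.4 is conventionally described as a large effect in Cohen's terminology, which gives a feel for the scale of the input G*Power asks for.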

  36. One-way ANOVA sample size calculation • Goal: how many mice should I use to have an 80% probability of detecting the observed difference at the 0.05 level of significance?

  37. One-way ANOVA sample size calculation in G*Power

  38. One-way ANOVA sample size calculation in G*Power ANOVA analysis results

  39. One-way ANOVA sample size calculation in G*Power

  40. Outline • Summary of Previous Talk • Descriptive & inferential statistics • T test • Common statistical tests and Applications (Continued) • One-way ANOVA • Post hoc/multiple comparison tests • One-way ANOVA power analysis • Common data transformation methods

  41. Data Transformation • Why? • Many biological variables do not follow a normal distribution • How? • Apply a mathematical function to each observation • Perform statistical tests on the transformed data • Interpret results using the back transformation • Common data transformation methods in biology • Log transformation • Square root transformation • Arcsine transformation • Reciprocal transformation

  42. Log transformation • Usage • Converts a right-skewed distribution into a symmetrical one • Applicable when there are unequal variances and the standard deviations are proportional to the means • Mathematical function: $X' = \log_b(X)$ • Logarithms in any base $b$ are satisfactory • Back transformation: $X = b^{X'}$

  43. Square root transformation • Usage • Applicable when the group variances are proportional to the means • Samples taken from a Poisson distribution, such as count data • Mathematical function: $X' = \sqrt{X}$ • Back transformation: $X = (X')^2$

  44. Arcsine transformation • Usage • Applicable when the data (proportions or percentages) were taken from a binomial distribution • Mathematical function: $p' = \arcsin\sqrt{p}$ • Back transformation: $p = \sin^2(p')$ • Shortcoming • Not good at the ends of the range (near 0 and 100%) • Adjustment needed when p is near 0 or 100%
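The log, square root, and arcsine transformations from slides 42 to 44, together with their back transformations, can be sketched with Python's `math` module as follows. Base 2 is an arbitrary choice for the logarithm here, the function names are illustrative, and the `weights` values are hypothetical.

```python
import math


def log_transform(x, base=2):
    return math.log(x, base)


def log_back(y, base=2):
    return base ** y


def sqrt_transform(x):
    return math.sqrt(x)


def sqrt_back(y):
    return y ** 2


def arcsine_transform(p):
    """p is a proportion between 0 and 1; the result is in radians."""
    return math.asin(math.sqrt(p))


def arcsine_back(y):
    return math.sin(y) ** 2


# Hypothetical right-skewed measurements (illustrative only)
weights = [2.0, 8.0]

# Analyze on the log scale, then back-transform the result for interpretation
log_weights = [log_transform(w) for w in weights]
mean_log = sum(log_weights) / len(log_weights)
back_transformed_mean = log_back(mean_log)
print(round(back_transformed_mean, 3))
```

The back-transformed mean of the logged values is the geometric mean of the original data, not the arithmetic mean, which is worth remembering when reporting results on the original scale.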

  45. Choosing transformation methods based on data distribution
      Shape                 Figure   Transformation
      Reverse J             A        1/X
      Severe skew right     B        Log(X)
      Moderate skew right   C        sqrt(X)

  46. Choosing transformation based on data distribution
      Shape                 Figure   Transformation
      Moderate skew left    D        1/sqrt(X)
      Severe skew left      E        -1/Log(X)
      J shape               F        -1/X

  47. Data transformation example (figure: histogram of frequency vs. weight)

  48. Data transformation example

  49. Choosing transformation based on data distribution
      Shape                 Figure   Transformation
      Reverse J             A        1/X
      Severe skew right     B        Log(X)
      Moderate skew right   C        sqrt(X)

  50. Data transformation example (figure: two histograms, frequency vs. weight before transformation and frequency vs. log2_weight after transformation)
