1 / 36

Laboratory & Professional skills for Bioscientists Term 2: Data Analysis in R

Laboratory & Professional skills for Bioscientists Term 2: Data Analysis in R. Two sample tests: two-sample t -test, two-sample Wilcoxon (also one-sample Wilcoxon) Practical: one and two sample tests examples, different data formats ggplot. Overview of topics. Also in previous lecture.

cybill
Download Presentation

Laboratory & Professional skills for Bioscientists Term 2: Data Analysis in R

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Laboratory & Professional skills for BioscientistsTerm 2: Data Analysis in R Two sample tests: two-sample t-test, two-sample Wilcoxon (also one-sample Wilcoxon) Practical: one and two sample tests examples, different data formats ggplot.

  2. Overview of topics Also in previous lecture Foundation

  3. Summary • Tests for one-, two-, and paired-samples (t-tests and non-parametric equivalents) • Today • two-sample t-test for independent samples • Wilcoxon for one- and two-samples when t-test assumptions are not met • Choosing appropriate tests: type of question, type of data • Second of two lectures; single workshop

  4. Learning objectives Also in previous lecture By actively following the lecture and practical and carrying out the independent study the successful student will be able to: • Explain dependent and independent samples (MLO 2) • Select, appropriately, t-tests and their non-parametric equivalents (MLO 2) • Apply, interpret and evaluate the legitimacy of the tests in R (MLO 3 and 4) • Summarise and illustrate with appropriate R figures test results scientifically (MLO 3 and 4)

  5. Types of t-test Also in previous lecture • One-sample • Compares the mean of sample to a particular value (compares the response to a reference) • Includes paired-sample test for dependent samples (i.e., two linked measures) • Two-sample • Compares two (independent) means to each other

  6. Two-sample t-tests NEW! • Is there a difference between two independent means • Independent – values in one group not related to values in the other group • Example: is there a significant difference between the masses of male and female chaffinches? NOT LINKED Fringilla coelebs

  7. t-tests Also in previous lecture • Standard formula for all t-tests To aid understanding, not for remembering • d.f.= n1 + n2 - 2 NEW!

  8. t-tests in generalAssumptions Also in previous lecture • All t-tests assume the ‘data’ are normally distributed • 0ne-sample – the data themselves • Paired sample – the differences between the pairs • Two-sample – each variable should be n.d • Additionally the two-sample t-test assumes equal variances

  9. t-tests in general: assumptionsChecking normality Also in previous lecture • Common sense • Data should be continuous • There should not be lots of values the same • figure • Difficult to tell for a small sample • Using a test in R • shapiro.test()

  10. Two-sample t-tests: assumptionsChecking equality of variance NEW! • Common sense • Compare the two values • figures • Difficult to tell for small samples • Using a test in R • var.test()

  11. t-tests in general: assumptionsWhen data are not normally distributed • Transform (normality) • E.g. Log to remove skew, arcsinsquareroot on proportions • Welch’s t if variances non-equal • Use a non-parametric test • Fewer assumptions • Generally less powerful Extension of previous lect

  12. t-testsTwo-sample t-test example NEW! • Example: is there a significant difference between the masses of male and female chaffinches? Fringilla coelebs

  13. t-testsTwo-sample t-test example NEW! Note: these data are ‘tidy’ All the responses in one column with other variables indicating the group chaff <- read.table("../data/chaff.txt", header = T) Organise your data this way

  14. Tidy data Extension of previous lect • Each variable you measure should be in one column. • Each different observation of that variable should be in a different row. • There should be one table for each "kind" of variable. • If you have multiple tables, they should include a column in the table that allows them to be linked. Independent study: Wickham, H (2013). Tidy Data. Journal of Statistical Software. https://www.jstatsoft.org/article/view/v059i10

  15. t-testsTwo-sample t-test example Extension of previous lect • Checking the normality assumption - histograms tapply(chaff$mass, chaff$sex, hist)

  16. t-testsTwo-sample t-test example Extension of previous lect • Checking the normality assumption - testing • tapply(chaff$mass, chaff$sex, shapiro.test) • $females • Shapiro-Wilk normality test • data: X[[i]] • W = 0.96783, p-value = 0.7085 • $males • Shapiro-Wilk normality test • data: X[[i]] • W = 0.96261, p-value = 0.5973

  17. t-testsTwo-sample t-test example NEW! • Checking the equal variance assumption - testing • var.test(chaff$mass ~ chaff$sex) • F test to compare two variances • data: chaff$mass by chaff$sex • F = 0.98788, numdf = 19, denomdf = 19, p-value = 0.9791 • alternative hypothesis: true ratio of variances is not equal to 1 • 95 percent confidence interval: • 0.3910141 2.4958251 • sample estimates: • ratio of variances • 0.9878779

  18. t-testsTwo-sample t-test example Extension of previous lect Run the t-test Manual page: t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95, ...) • t.test(chaff$mass ~ chaff$sex, paired = FALSE, var.equal = TRUE) • Two Sample t-test • data: chaff$mass by chaff$sex • t = -2.6471, df = 38, p-value = 0.01175 • alternative hypothesis: true difference in means is not equal to 0 • 95 percent confidence interval: • -3.167734 -0.422266 • sample estimates: • mean in group females mean in group males • 20.480 22.275

  19. t-testsTwo-sample t-test example Extension of previous lect Reporting the result: “significance of effect, direction of effect, magnitude of effect” Male chaffinches () are significantly heavier than females () (t = 2.65; d.f. = 38; p = 0.012). See figure 1.

  20. t-testsTwo-sample t-test: figure NEW! Parametric tests: use mean and S.E (C.I. or s.d.)

  21. t-testsTwo-sample t-test: figure NEW! What is in the figure? You do not say Graph to show..

  22. t-testsTwo-sample t-test: figure NEW! Error bars of some kind Say what kind!

  23. t-testsTwo-sample t-test: figure NEW! Label with units Always refer to figure in the text

  24. NEW!

  25. NEW!

  26. ggplot • ‘grammar of graphics’ makes it easy to do many routine tasks (like drawing legends, adding errorbars, setting axis limits etc). • On your own pc you need to install ggplot2 using: install.packages("ggplot2") • You only need to install a package once but you need to load it in to your library each R session. This is done with: library(ggplot2)

  27. ggplot ggplot(data = dataframe, aes(x = yourxvar, y = youryvar) ) + geom_point() • ggplot(data = chaff, aes(x = sex, y = mass)) + • geom_point() • chaffsummary <- summarySE(data = chaff, measurevar = "mass", groupvars = "sex") • ggplot(data = chaffsummary, aes(x = sex, y = mass)) + • geom_bar(stat = "identity")

  28. NEW! When the t-test assumptions are not met: non- parametric tests • Non-parametric tests make fewer assumptions • Based on the ranks rather than the actual data • Null hypotheses are about the mean rank (not the mean)

  29. NEW! Non-parametric testst-test equivalents i,.e., the type of question is the same but the response variable is not normally distributed • one – sample t-test and paired-sample t-test: the one-sample Wilcoxon • Two-sample t-test (next lecture): two-sample Wilcoxon aka Mann-Whitney

  30. NEW! Non-parametric testsone/paired-sample Wilcoxon Let’s suppose that we decided to do a non-parametric test on the marks data > wilcox.test(data = marks, mark ~ subject, paired = TRUE) Wilcoxon signed rank test with continuity correction data: mark by subject V = 48.5, p-value = 0.03641 alternative hypothesis: true location shift is not equal to 0 Warning message: In wilcox.test.default(x = c(97L, 58L, 65L, 65L, 80L, 48L, 85L, : cannot compute exact p-value with ties No need to worry!

  31. Non-parametric testsone/paired-sample Wilcoxon Extension of previous lect Individual students score significantly higher in maths than in statistics (Wilcoxon: V = 48.5; n = 10; p = 0.036). Reporting the result: “significance of effect, direction of effect, magnitude of effect”

  32. NEW! Non-parametric teststwo-sample Wilcoxon (Mann-Whitney) Example: comparing the number of leaves on 8 mutant and 7 wild type plants (small samples, counts)

  33. NEW! Non-parametric teststwo-sample Wilcoxon (M-W): example • Carrying out the test two-sample Wilcoxon • wilcox.test(data = plants, leaves ~ type, paired = FALSE) • Wilcoxon rank sum test with continuity correction • data: leaves by type • W = 5, p-value = 0.008664 • alternative hypothesis: true location shift is not equal to 0 • Warning message: • In wilcox.test.default(x = c(3, 5, 6, 7, 3, 4, 5, 8), y = c(8, 9, : • cannot compute exact p-value with ties No need to worry!

  34. Non-parametric teststwo-sample Wilcoxon (M-W): example There are significantly more leaves on wild-type (median = 8) than mutant (median = 5) plants (Mann-Whitney: W=5, n1=7, n2=8, p = 0.009) Reporting the result: “significance of effect, direction of effect, magnitude of effect” 34

  35. Non-parametric teststwo-sample Wilcoxon (M-W): example Label with units Non-parametric tests: use median+IQR Measure of dispersion - IQR What is the figure? Always refer to figure in the text

  36. Learning objectives for the week By attending the lectures and practical the successful student will be able to • Explain dependent and independent samples (MLO 2) • Select, appropriately, t-tests and their non-parametric equivalents (MLO 2) • Apply, interpret and evaluate the legitimacy of the tests in R (MLO 3 and 4) • Summarise and illustrate with appropriate R figures test results scientifically (MLO 3 and 4)

More Related