Data Analysis in R: Introduction to Module and RStudio

This module introduces data analysis and statistical testing using the R programming language. Students will learn how to choose appropriate statistical tests, perform analyses, interpret results, and present findings graphically.

djack

Presentation Transcript


  1. Laboratory & professional skills for Bioscientists • Term 2: Data Analysis in R • Week 3: Introduction to module and RStudio

  2. Module Learning Outcomes (MLO) The successful student will be able to: • Explain the purpose of data analysis • Name, identify and choose classical univariate statistical tests (and some non-parametric equivalents) appropriate to a given scenario and recognise when these are not suitable • Use R to perform these analyses on data in a variety of formats • Interpret, report and graphically present the results of covered tests Meeting the learning outcomes will enable you to: • Write up your laboratory report • Design and analyse experiments, including those for projects in stages 2, 3, and 4 and year-away • Evaluate and interpret the data analysis in papers • Perform well in assessments • Improve your employability!

  3. Interlinked delivery http://www-users.york.ac.uk/~er13/17C%20-%202017.yrk/BIO00017C.html Weekly Workshop • Preparatory independent study – Friday email • Lecture and slides – introduction to concepts • Workshop – apply concepts • Follow-up independent study – consolidate understanding Warnings! • These do not stand alone • All are needed • Progressive

  4. Learning Outcomes The successful student will be able to: • Explain the purpose of data analysis • Name, identify and choose classical univariate statistical tests (and some non-parametric equivalents) appropriate to a given scenario and recognise when these are not suitable • Use R to perform these analyses on data in a variety of formats • Interpret, report and graphically present the results of covered tests

  5. Overview of topics

  6. Other sources of information • Your previous work!!! • The help pages • Online – many! • DataCamp https://www.datacamp.com/ • Winston Chang’s Cookbook for R http://www.cookbook-r.com/ • http://www.rstudio.com/resources/training/online-learning/ • Google • Practice!! • Books? • Dytham. Choosing and Using Statistics: A Biologist's Guide • Beckerman & Petchey. Getting Started with R: An Introduction for Biologists • Crawley. Statistics: An Introduction Using R • “Young man, in mathematics you don't understand things. You just get used to them.” (attributed to John von Neumann)

  7. Summary of this week • We explain why we do statistical tests and perform our first significance test • Using DataCamp and RStudio we learn how to use the command line, basic functions and arguments; navigate the panes; and what the workspace, scripts and history are

  8. Learning objectives for the week By actively following the lecture and practical and carrying out the independent study, the successful student will be able to: • explain why we need statistical tests and the logic of hypothesis testing (MLO 1) • use the R command line as a calculator and assign variables (MLO 3) • create and use the basic data types in R (MLO 3) • find their way around the RStudio windows (MLO 3) • create, use and save a script file to run R commands (MLO 3) • search and understand manual pages (MLO 3)
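
As a rough illustration of the "basic data types" and "manual pages" objectives, the sketch below uses only base R. The vector `weights` and its values are made up for the example and are not from the module's data.

```r
# class() reports the type of a value
class(3.2)       # "numeric"
class(2L)        # "integer" (the L suffix marks a whole number)
class("liver")   # "character"
class(TRUE)      # "logical"

# A vector holds several values of the same type (made-up numbers)
weights <- c(3100, 2950, 3400)
is.numeric(weights)
length(weights)

# Manual pages: only opened when R is running interactively (e.g. in RStudio)
if (interactive()) {
  help("mean")                        # same as typing ?mean
  help.search("standard deviation")   # same as ??"standard deviation"
}
```

In RStudio the manual page opens in the Help pane; reading its "Arguments" and "Examples" sections first is usually the quickest way in.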

  9. Overview • ‘Experiments’ • Something we measure: dependent variables, response variables, the ‘y’s • Some things we control, choose or set: independent variables, predictor / explanatory variables, the ‘x’s • Inferences made

  10. Why do we need statistics? • Responses vary; ‘coincidence’ can be common • Example: Birthweight • National average is 3300 grams • A sample of 25 mothers who live in poverty has a mean of 3000 grams • Is that really different or just different by chance?

  11. The logic of ‘hypothesis’ testing • Do our results (3000 g) seem unusual? • That is, unusual compared to ‘normal’ where there is no effect • 3300 g (the National average) is the ‘no effect hypothesis’

  12. No effect or ‘null’ hypothesis, H0 • What you expect to happen if nothing interesting biologically is occurring. • E.g., average weight is 3300 g. • This does not mean every baby is 3300 g or every sample of babies has a mean of 3300 g. • It means we wouldn’t expect a sample mean to be “too far” from 3300 g.
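
A small simulation can make this point concrete: even when the null hypothesis is true, sample means scatter around 3300 g. This is only a sketch. The slides do not give a standard deviation for birthweight, so the 900 g used here is an assumption, as is the choice of a normal distribution.

```r
set.seed(1)            # make the random numbers reproducible

null_mean  <- 3300     # national average from the slides (g)
assumed_sd <- 900      # ASSUMPTION: not given in the slides
n          <- 25       # sample size from the slides

# Draw 1000 samples of 25 babies under H0 and keep each sample's mean
sample_means <- replicate(1000, mean(rnorm(n, mean = null_mean, sd = assumed_sd)))

range(sample_means)    # individual sample means differ from 3300
mean(sample_means)     # but they centre on roughly 3300
sd(sample_means)       # spread is roughly assumed_sd / sqrt(n) = 180
```

The spread of these sample means is what the question "how far is too far?" is measured against.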

  13. How far is too far? Distributions • We calculate the probability of a sample mean of 3000 g if we expect 3300 g on average • What is the probability of 3000 g or lower from a distribution with mean 3300 g? • Don’t worry how p was calculated for the moment • p = 0.048
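
The slide deliberately skips how p was calculated. For the curious, one way such a lower-tail probability could be obtained is from a normal distribution of sample means. The sketch below assumes a population standard deviation of 900 g, which is not stated in the slides, so it should not be read as the method actually used.

```r
null_mean   <- 3300   # national average (g), from the slides
sample_mean <- 3000   # observed sample mean (g), from the slides
n           <- 25     # sample size, from the slides
assumed_sd  <- 900    # ASSUMPTION: population SD is not given in the slides

# Standard error: the standard deviation of sample means for samples of size n
se <- assumed_sd / sqrt(n)

# Probability of a sample mean of 3000 g or lower if the true mean is 3300 g
p_lower <- pnorm(sample_mean, mean = null_mean, sd = se)
round(p_lower, 3)     # close to the slide's 0.048 under this assumption
```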

  14. But more appropriately • What is the probability of 3000 g, or a mean as unlikely or more unlikely, from a distribution with mean 3300 g? • Values > 3600 g are just as unlikely • p = 0.096
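
The two-tailed version adds the matching upper tail (3600 g or higher) to the lower tail (3000 g or lower). The sketch below is illustrative only: the 900 g standard deviation is an assumption, not a value from the slides, and a normal distribution of sample means is assumed.

```r
null_mean  <- 3300    # national average (g), from the slides
n          <- 25      # sample size, from the slides
assumed_sd <- 900     # ASSUMPTION: not given in the slides
se <- assumed_sd / sqrt(n)

# Lower tail: sample mean of 3000 g or less
p_low  <- pnorm(3000, mean = null_mean, sd = se)

# Upper tail: sample mean of 3600 g or more (equally far from 3300)
p_high <- pnorm(3600, mean = null_mean, sd = se, lower.tail = FALSE)

p_two_tailed <- p_low + p_high
round(p_two_tailed, 3)   # close to the slide's 0.096 under this assumption
```

Because the normal distribution is symmetric, the two tails are equal, so the two-tailed p is simply double the one-tailed p.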

  15. 0.096 is the probability of a sample at least this extreme if this group of women have babies with birthweights in line with the national average • How far is too far? i.e., is that a big or small probability? • Use 0.05 (1 in 20)

  16. 0.05 is the (arbitrary) significance level used • If probability of our result or one as extreme or more extreme is ≤ 0.05 we REJECT our null hypothesis. • If probability of our result or one as extreme or more extreme is > 0.05 we DO NOT REJECT our null hypothesis. p ≤ 0.05, test is significant p > 0.05, test is not significant
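
The decision rule can be written as a few lines of R. The function name `decide` is made up for this illustration; the two p-values are the ones quoted on the earlier slides.

```r
# Hypothetical helper: apply the significance-level rule to a p-value
decide <- function(p, alpha = 0.05) {
  if (p <= alpha) {
    "reject H0 (significant)"
  } else {
    "do not reject H0 (not significant)"
  }
}

decide(0.048)   # one-tailed p from slide 13: reject H0
decide(0.096)   # two-tailed p from slide 14: do not reject H0
```

Note that the same data lead to different conclusions depending on whether one tail or both are counted, which is why the choice should be made before looking at the result.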

  17. Learning objectives for the week By actively following the lecture and practical and carrying out the independent study, the successful student will be able to: • explain why we need statistical tests and the logic of hypothesis testing (MLO 1) • use the R command line as a calculator and assign variables (MLO 3) • create and use the basic data types in R (MLO 3) • find their way around the RStudio windows (MLO 3) • create, use and save a script file to run R commands (MLO 3) • search and understand manual pages (MLO 3)

  18. R: What and Why • What is R? • A free, open source language for statistical computing and graphics. • R provides • data handling and storage, • many methods for standard calculations, • many tools for formatting, analysing and graphing data, • a programming language for user-defined methods. • R is often run in RStudio

  19. Command basics
      Command line (> is the prompt)
      > 2 + 3
      [1] 5                # output
      > x <- 7             # assignment
      > x                  # view contents of x
      [1] 7
      > x + 2              # variables can be used in calculations
      [1] 9
      > z <- x + 2         # results of calculations can be assigned
      > z
      [1] 9
      > num <- c(4,5,6)    # vector of values
      > num
      [1] 4 5 6
      > mean(num)          # functions have brackets; arguments go inside the brackets
      [1] 5
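
Functions often take more than one argument, and arguments can be given by name. The short sketch below uses only base R functions; the values are made up for the example.

```r
num <- c(4, 5, NA)           # NA marks a missing value

mean(num)                    # NA: a missing value makes the mean unknown
mean(num, na.rm = TRUE)      # 4.5: the named argument na.rm drops the NA first

round(3.14159, digits = 2)   # 3.14: digits sets the number of decimal places
```

Typing `?mean` or `?round` at the prompt lists the arguments each function accepts and their default values.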

  20. Workshop • DataCamp Tutorial: create a free account.

  21. 1. Information: read • 2. Instructions: read • 3. Code: read and type • # comments

  22. Workshop • RStudio: program • Suggest taking a look: Introduction to module and RStudio