Basic R Programming for Life Science Undergraduate Students Introductory Workshop (Session 1)

livvy

Presentation Transcript


  1. Basic R Programming for Life Science Undergraduate Students Introductory Workshop (Session 1)

  2. Scope of Introductory Workshop on R • Vector manipulations and referencing • Matrices – declaration and manipulation (rows/columns) – rbind • Data frames – import from xls/csv/txt files and statistical manipulation • Introducing data categorisation using the R datatype Factor • Simple graph plotting • More statistical analysis • Simple example of linear regression • Quick revision • Future classes on R • How to install the R platform on your machine • How to install R packages and dependencies • How to get help and instructions • How to use a library • Variables and assigning values to variables • Data types which R accepts • Arithmetic manipulations of variables (+ - * / %% ^ etc) • Browsing and managing your variables (ls, rm) • Assigning vectors - the c() command

  3. What is R? R = software and programming language. R is mainly used for statistical analysis and for graphics generation. • Free • Simple and intuitive (?) • Available across different platforms (Mac, Unix/Linux, Windows)

  4. Starting with R: Installation (administrator rights required). http://www.r-project.org/ Tip: install the latest version (or the last stable version)

  5. Starting with R: Installation. http://cran.bic.nus.edu.sg

  6. Starting with R: Installation

  7. Your very first interface: the default prompt in R (>)

  8. Starting with R: Packages. Packages provide additional functions that are not included within the "base package". Installation (additional packages) → install.packages("package name"). To use a package, type library(package name).
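
The install-then-load sequence from the slide can be sketched as follows. The package name "ggplot2" is only an example chosen for illustration, and the install line is commented out because it needs internet access; "tools" is used for the loading step because it ships with every R installation.

```r
# One-time installation from CRAN (needs internet access), shown commented out.
# "ggplot2" is only an example package name, not one named in the workshop.
# install.packages("ggplot2")

# Load a package for the current session.
# "tools" ships with every R installation, so this line always works.
library(tools)

# Functions from the loaded package are now available by name.
file_ext("Sampledata-1.txt")
```

Installation is needed once per machine, whereas library() must be called again in every new R session.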

  9. Starting with R: Confused about an R command? Get help. On the GUI → ?function_name or ??keyword. Via WWW → http://cran.r-project.org or http://www.rseek.org/
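
A few ways of looking things up from inside R, as a minimal sketch. The functions mean and sd are used only as familiar examples.

```r
# help("mean") is the long form of typing ?mean at the prompt.
# Storing the result avoids opening the help viewer; print(h) would display it.
h <- help("mean")

# help.search("keyword") is the long form of ??keyword (can be slow the first time).
# help.search("standard deviation")

# List every visible object whose name contains a pattern.
matches <- apropos("mean")
head(matches)

# Show the arguments of a function without opening its help page.
args(sd)
```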

  10. Fundamentals of Programming Simple data input and manipulation. Declaration of object (variable) • Take note that object names • are case sensitive (i.e. x is different from X) • must not contain spaces or symbols other than . and _, and must not start with a number • should be comprehensible
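
The naming rules can be tried out directly. This is a small sketch; the object names below are made up for illustration.

```r
# Assignment uses the arrow operator <-
x <- 5
X <- 10

# Names are case sensitive: x and X are two separate objects.
x == X            # FALSE

# A descriptive name is easier to understand later than a single letter.
student_height <- 172.5

# Digits are allowed inside a name, just not as the first character.
var1 <- 3
```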

  11. Data types Rich set of datatypes in R. Commonly encountered datatypes in R • Scalars • Vectors (numerical, character and logical) • Matrices (2D) • Arrays (can have more than 2 dimensions) • Data frames • Lists • Factors. See for example http://www.statmethods.net/input/datatypes.html for more details
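
One object of each listed type, as a minimal sketch. All names and values here are invented for illustration.

```r
a_scalar <- 3.14                               # a single value (a vector of length 1)
num_vec  <- c(1.5, 2.5, 4)                     # numerical vector
chr_vec  <- c("wild type", "mutant")           # character vector
lgl_vec  <- c(TRUE, FALSE, TRUE)               # logical vector
a_matrix <- matrix(1:6, nrow = 2)              # 2 rows x 3 columns
an_array <- array(1:24, dim = c(2, 3, 4))      # 3 dimensions
a_frame  <- data.frame(id = 1:2, genotype = chr_vec)  # columns may differ in type
a_list   <- list(n = 2, labels = chr_vec)      # elements may differ in type and length
a_factor <- factor(c("M", "F", "F"))           # categorical variable

# class() reports what kind of object you are holding.
class(a_frame)
class(a_factor)
```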

  12. Fundamentals of Programming Perform simple manipulations, e.g. arithmetic calculations. For more built-in R arithmetic functions, visit http://ww2.coastal.edu/kingw/statistics/R-tutorials/arithmetic.html
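
A short sketch of the arithmetic operators, using two made-up variables a and b. Note that the remainder operator in R is written %% and integer division is %/%.

```r
a <- 17
b <- 5

a + b      # 22
a - b      # 12
a * b      # 85
a / b      # 3.4
a ^ 2      # 289 (exponentiation)
a %% b     # 2   (remainder after division)
a %/% b    # 3   (integer division)
sqrt(16)   # 4   (built-in function)
```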

  13. Fundamentals of Programming Removing variables when they are no longer required. Use ls() to check whether a declared object is still kept in memory. To remove an object x from memory, use rm(x).
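
The check-then-remove cycle looks like this in practice. The object name tmp_value is hypothetical.

```r
tmp_value <- 42

# ls() returns the names of the objects currently in your workspace.
"tmp_value" %in% ls()    # TRUE

# rm() deletes the object.
rm(tmp_value)
"tmp_value" %in% ls()    # FALSE
```

rm(list = ls()) clears the whole workspace at once, so use it with care.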

  14. Fundamentals of Programming More complex data inputs: data vectors → an ordered collection of values held in one object. [Figure: X as a single object versus X as a vector holding the values 1 2 3 4 5]

  15. Fundamentals of Programming Assigning a data vector: x <- c(1,2,3,4,5) [Figure: the vector x holding the values 1 2 3 4 5]
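
Once a vector is assigned, its elements can be referenced by position. A minimal sketch using the same vector x:

```r
x <- c(1, 2, 3, 4, 5)

length(x)      # 5
x[2]           # 2        (second element; R counts from 1)
x[c(1, 5)]     # 1 5      (first and fifth elements)
x[-1]          # 2 3 4 5  (everything except the first element)
x * 2          # 2 4 6 8 10 (arithmetic is applied to each element)

# The colon operator is a shortcut for a run of consecutive integers.
y <- 1:5
```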

  16. Experiment for yourself (see http://www.statmethods.net/input/datatypes.html) • Define a vector var1 with values 1,2,3 • Define a vector var2 with values 4,5,6 • What value is var2[4]? • What is the sum of var1? • What is the R code to assign the first element of var1 to an object subsetvar1? • What is the product of var1 and var2?

  17. Experiment for yourself (answers) • Define a vector var1 with values 1,2,3 → var1 <- c(1,2,3) • Define a vector var2 with values 4,5,6 → var2 <- c(4,5,6) • What value is var2[4]? → NA • What is the sum of var1? → 6 • What is the R code to assign the first element of var1 to an object subsetvar1? → subsetvar1 <- var1[1] • What is the product of var1 and var2? → 4 10 18

  18. Fundamentals of Programming More complex data structures → Matrices

  19. Fundamentals of Programming Declaring a matrix
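
The matrix shown on the original slide did not survive in this transcript. The sketch below declares a hypothetical 3 x 3 matrix y whose values were chosen only so that they agree with the results quoted on the next slide; they are not the workshop's data.

```r
# matrix() fills column by column unless byrow = TRUE is given.
y <- matrix(c(1, 3, 8,
              9, 6, 5,
              2, 1, 7),
            nrow = 3, ncol = 3, byrow = TRUE)

y          # print the matrix
dim(y)     # 3 3 (rows, columns)
```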

  20. Fundamentals of Programming Simple manipulations of data matrix • y[1,] returns row 1: 1 3 8 • y[,3] returns column 3: 8 5 7 • Simple arithmetic manipulations • mean(y) returns 4.666667 • sum(y[2,]) returns 20 • Modify and add values • y[4,] <- c(6,2,2) (works only if y already has a fourth row) • y <- rbind(y, c(3,9,8)) • Tip: Think of rbind as "row combine"
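
The manipulations above can be run end to end as sketched here. The matrix values are hypothetical, picked to reproduce the quoted results. Because assigning to a row that does not exist yet raises a "subscript out of bounds" error, this sketch adds the fourth row with rbind first and only then overwrites it.

```r
# Hypothetical matrix consistent with the outputs quoted on the slide.
y <- matrix(c(1, 3, 8,
              9, 6, 5,
              2, 1, 7),
            nrow = 3, byrow = TRUE)

y[1, ]         # 1 3 8     (row 1)
y[, 3]         # 8 5 7     (column 3)
mean(y)        # 4.666667  (mean of all nine values)
sum(y[2, ])    # 20        (sum of row 2)

# Add a fourth row ("row combine"), then overwrite it in place.
y <- rbind(y, c(3, 9, 8))
y[4, ] <- c(6, 2, 2)
dim(y)         # 4 3
```

cbind works the same way for adding columns.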

  21. Fundamentals of Programming More complex data structures → Data frames

  22. Fundamentals of Programming Data frames
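
The slide's own example is missing from the transcript, so here is a minimal sketch of a data frame. The table "students" and its contents are invented for illustration.

```r
# Each column is a vector; columns can have different types
# but must all have the same length.
students <- data.frame(
  Name   = c("Ann", "Ben", "Cal"),
  Gender = c("F", "M", "M"),
  Height = c(160, 175, 168),
  stringsAsFactors = FALSE
)

str(students)                        # compact overview of columns and types
students$Height                      # one column, accessed by name
students[students$Height > 165, ]    # rows that satisfy a condition
nrow(students)                       # 3
ncol(students)                       # 3
```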

  23. Fundamentals of Programming Reading in from input files
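
A self-contained sketch of reading a delimited text file into a data frame. It first writes a small temporary CSV file with made-up contents so that the example runs anywhere; the object name hfile follows the next slide. Base R has no reader for xls files, so one option is to export the spreadsheet to csv or txt first.

```r
# Create a small example file (hypothetical data) in a temporary location.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Name,Height,Weight",
             "Ann,160,52",
             "Ben,175,70"), tmp)

# Comma-separated file with a header row.
hfile <- read.csv(tmp, header = TRUE)

# For a tab-delimited .txt file the equivalent call would be:
# hfile <- read.table("path/to/file.txt", sep = "\t", header = TRUE)

hfile

# Remove the temporary file again.
unlink(tmp)
```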

  24. Fundamentals of Programming Simple manipulations with data frames • head(hfile,1) • summary(hfile) • Create subsets • new <- hfile[1,]

  25. Fundamentals of Programming Simple statistics with R. Load the file "Sampledata-1.txt" into R: studentprofile <- read.table("B://Users/bchhuyng/Desktop/Sampledata-1.txt", sep="\t", header=TRUE) View the data loaded into R: studentprofile or head(studentprofile) How many categories are there in the field "Gender"? factor(studentprofile$Gender)

  26. Fundamentals of Programming The "factor" function in R → stores values as a categorical variable. Example output: M F F M M M F F M M M F F M F F F M M M M F M F F M M F F M M F F M M M F F F F F M M M F F M
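
A small sketch of what factor() does, using a short made-up gender vector rather than the workshop data:

```r
gender <- c("M", "F", "F", "M", "M")

# Convert the character vector into a categorical variable.
gender_f <- factor(gender)

levels(gender_f)     # "F" "M" (the distinct categories, sorted)
nlevels(gender_f)    # 2       (how many categories there are)
table(gender_f)      # counts per category: F = 2, M = 3
```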

  27. Fundamentals of Programming Usage of factor in plotting graphs (Hu et al., 2013)

  28. Fundamentals of Programming Usage of factor in plotting graphs
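
The figures from these two slides are not part of the transcript. As one possible illustration of a factor driving a plot, the sketch below draws one box per category; the data are invented and this is not the figure from the workshop.

```r
# Hypothetical data: a categorical column and a numeric column.
d <- data.frame(
  Gender = factor(c("M", "F", "F", "M", "F", "M")),
  Height = c(175, 160, 158, 180, 165, 172)
)

# The factor on the right of ~ splits the numeric values into groups,
# and boxplot() draws one box per level.
boxplot(Height ~ Gender, data = d,
        xlab = "Gender", ylab = "Height (cm)",
        main = "Height by gender")

# The same grouping works for summaries.
tapply(d$Height, d$Gender, mean)
```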

  29. Fundamentals of Programming Calculate the mean and the standard deviation of the height and weight of the students. E.g. mean(studentprofile$Weight), median(studentprofile$Weight), sd(studentprofile$Weight)
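
A runnable sketch of the exercise. Since "Sampledata-1.txt" is not available here, a tiny stand-in data frame with invented values takes the place of studentprofile.

```r
# Hypothetical stand-in for the workshop data.
studentprofile <- data.frame(
  Height = c(160, 175, 168, 182, 158),
  Weight = c(52, 70, 61, 80, 49)
)

mean(studentprofile$Height)      # 168.6
sd(studentprofile$Height)        # sample standard deviation
mean(studentprofile$Weight)      # 62.4
sd(studentprofile$Weight)

# sapply() applies the same function to every column in one call.
sapply(studentprofile, mean)
sapply(studentprofile, sd)
```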

  30. Fundamentals of Programming Simple graph plotting with R. View the distribution of height and weight of the 100 students (data from "Sampledata-1.txt"): plot(studentprofile$Weight, studentprofile$Height, main="Distribution of Height and Weight of students", xlab="Weight (kg)", ylab="Height (cm)", pch=19, cex=0.5)

  31. Fundamentals of Programming

  32. Fundamentals of Programming What is the distribution of height and weight amongst students? hist(studentprofile$Weight,xlab="Weight (Kg)", main = "Distributional Frequency of student weight", ylim=c(0,8), xlim=c(40,90), breaks = 51)

  33. Fundamentals of Programming What is the distribution of height and weight amongst students? hist(studentprofile$Height, xlab="Height (cm)", main="Distributional Frequency of student height", ylim=c(0,8), xlim=c(140,190), breaks=51)

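  34. Fundamentals of Programming Are the height and weight of the sampled students normally distributed? ks.test(studentprofile$Height, "pnorm", mean=mean(studentprofile$Height), sd=sd(studentprofile$Height)); ks.test(studentprofile$Weight, "pnorm", mean=mean(studentprofile$Weight), sd=sd(studentprofile$Weight)) (without mean and sd, the comparison is made against the standard normal distribution with mean 0 and sd 1) H0: The data follow the specified distribution. H1: The data do not follow the specified distribution. p-value ≤ 0.05 → Reject H0. p-value > 0.05 → Do not reject H0. CAVEAT: see http://www.r-bloggers.com/normality-tests-don%E2%80%99t-do-what-you-think-they-do/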
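
The effect of the distribution parameters on the Kolmogorov-Smirnov test can be seen with simulated data. This is a sketch under stated assumptions: the heights are random numbers drawn from a normal distribution, not the workshop data. Note also that estimating the mean and sd from the same sample makes the KS p-value only approximate; shapiro.test is shown as a commonly used alternative.

```r
set.seed(1)
# Simulated heights in cm, drawn from a normal distribution on purpose.
height <- rnorm(100, mean = 168, sd = 8)

# Compared against the STANDARD normal (mean 0, sd 1): rejected,
# simply because heights in cm are nowhere near 0.
ks_std <- ks.test(height, "pnorm")

# Compared against a normal with the sample's own mean and sd.
ks_fit <- ks.test(height, "pnorm", mean = mean(height), sd = sd(height))

# Shapiro-Wilk test: needs no distribution parameters.
sw <- shapiro.test(height)

ks_std$p.value
ks_fit$p.value
sw$p.value
```

A histogram or a normal Q-Q plot (qqnorm followed by qqline) is a useful visual check alongside any formal test.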

  35. Fundamentals of Programming Are the height and weight of students linearly correlated? reg1 <- lm(studentprofile$Height ~ studentprofile$Weight)

  36. Fundamentals of Programming Are the height and weight of students linearly correlated?
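
The output shown on this slide is not in the transcript. The sketch below shows how a fitted model can be inspected, using simulated height and weight values (an assumption, not the workshop data) and the data= form of lm, which is equivalent to the call on the previous slide.

```r
set.seed(42)
# Simulated data with a built-in linear relationship plus noise.
Weight <- runif(50, min = 45, max = 85)
Height <- 120 + 0.7 * Weight + rnorm(50, sd = 4)
studentprofile <- data.frame(Height, Weight)

# Fit Height as a linear function of Weight.
reg1 <- lm(Height ~ Weight, data = studentprofile)

coef(reg1)                  # intercept and slope
summary(reg1)$r.squared     # proportion of variance explained
summary(reg1)               # full table with standard errors and p-values

# Pearson correlation with a significance test.
cor.test(studentprofile$Height, studentprofile$Weight)
```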

  37. Fundamentals of Programming plot(studentprofile$Weight, studentprofile$Height, main="Distribution of Height and Weight of students", xlab="Weight (kg)", ylab="Height (cm)", pch=19, cex=0.5); reg1 <- lm(studentprofile$Height ~ studentprofile$Weight); abline(reg1, col=2)

  38. Intro checklist: what have you learnt today? • Assigning vectors - the c() command • Vector manipulations and referencing • Matrices – declaration and manipulation (rows/columns) – rbind • Data frames – import from xls/csv/txt files and statistical manipulation • Introducing data categorization using the R datatype Factor • Simple graph plotting • More statistical analysis • Simple example of linear regression • How to install the R platform on your machine • How to install R packages and dependencies • How to get help and instructions • How to use a library • Variables and assigning values to variables • Data types which R accepts • Arithmetic manipulations of variables (+ - * / %% ^ etc) • Browsing and managing your variables (ls, rm)

  39. References • Crawley, M.J. (2007) The R Book. • Maindonald, J., and Braun, W.J. (2010) Data Analysis and Graphics Using R: An Example-Based Approach. • Kabacoff, R.I. (2012) Quick-R: Data types. http://www.statmethods.net/input/datatypes.html Accessed on 7/1/2014. • King, W.B. (2010) Doing Arithmetic in R. http://ww2.coastal.edu/kingw/statistics/R-tutorials/arithmetic.html Accessed on 7/1/2014. • Ian (2011) Normality tests don't do what you think they do. http://www.r-bloggers.com/normality-tests-don%E2%80%99t-do-what-you-think-they-do/ Accessed on 7/1/2014. • Joris Meys and Andrie de Vries. How to Test Data Normality in a Formal Way in R. http://www.dummies.com/how-to/content/how-to-test-data-normality-in-a-formal-way-in-r.html Accessed on 7/1/2014.

  40. Future classes on R and packages • R has a very rich repertoire of packages • Statistical analysis • Microarray analysis • NGS • etc.
