1 / 52

Introduction to R Workshop By Stephen O. Opiyo

Introduction to R Workshop By Stephen O. Opiyo. MCIC-Computational Biology Laboratory Day 1. Outline. R workshop outline (http://kh-tps.osu.edu/ Routline.pdf) Introduction to R (Day 1) Graphics (data visualization) in R (Day 2) Basic Statistics using R (Day 2). Getting Started.

evan
Download Presentation

Introduction to R Workshop By Stephen O. Opiyo

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Introduction to R WorkshopBy Stephen O. Opiyo MCIC-Computational Biology Laboratory Day 1

  2. Outline • R workshop outline (http://kh-tps.osu.edu/Routline.pdf) • Introduction to R (Day 1) • Graphics (data visualization) in R (Day 2) • Basic Statistics using R (Day 2)

  3. Getting Started • Assume that R for Windows and Macs already installed on your laptop. (Instructions for installations given at the webpage below) http://kh-tps.osu.edu/RWindows.pdf http://kh-tps.osu.edu/RMac.pdf Open R on your Windows or Mac

  4. R on Windows

  5. R on MACs

  6. History of R • Idea of R came from S developed at Bell Labs since 1976 • S intended to support research and data analysis projects • S to S-Plus licensed to Insightful (“S-Plus”) • R: Open source platform similar to S developed by Robert Gentleman and Ross Ihaka (U of Auckland, NZ) during the 1990s. Since 1997: international “R-core” developing team • Updated versions available every two monthshttp://www.r-project.org/

  7. What is R? • Data handling and storage: numeric, textual • Matrix algebra • Tables and regular expressions • Graphics • Data analysis

  8. R is not • R is not • a database • a collection of “black boxes” • a spreadsheet software package • commercially supported

  9. Useful reading materials • R for Beginners • http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf • An Introduction to R” by Longhow Lam • http://cran.r-project.org/doc/contrib/Lam-IntroductionToR_LHL.pdf • Practical Regression and Anova using R • http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf • An R companion to ‘Experimental Design • http://cran.r-project.org/doc/contrib/Vikneswaran-ED_companion.pdf • The R Guide • http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf • R for Biologists • http://cran.r-project.org/doc/contrib/Martinez-RforBiologistv1.1.pdf

  10. Useful reading materials • Multilevel Modeling in R • http://cran.r-project.org/doc/contrib/Bliese_Multilevel.pdf • R reference cards • http://cran.r-project.org/doc/contrib/refcard.pdf • http://cran.r-project.org/doc/contrib/Short-refcard.pdf • http://cran.r-project.org/doc/contrib/Baggott-refcard-v2.pdf • R reference card data mining • http://cran.r-project.org/doc/contrib/Short-refcard.pdf • RStudio - Documentation • http://www.rstudio.com/ide/docs/

  11. Useful books • Learning Rstudio for R Statistical Computing by Mark Van Der Loo, Edwin De Jonge Paperback, 126 pages Published December 25th 2012 by Packt Publishing ISBN 1782160604 (ISBN13: 9781782160601) • Getting Started with RStudio By: John Verzani Publisher: O'Reilly Media, Inc. Pub. Date: September 22, 2011 Print ISBN-13: 978-1-4493-0903-9 • R Graphics Cookbook by Winston Chang (Jan 3, 2013) • R For Dummies by Meys, Joris, de Vries • The R Book by Michael J. Crawley

  12. RStudio

  13. RStudio • RStudiois a free open source integrated development environment forR (http://www.rstudio.com/ide/) • Assume RStudiofor Windows and Macs already installed on your laptop. (Instructions for installations given at the webpage below) http://kh-tps.osu.edu/RSWindows.pdf http://kh-tps.osu.edu/RSMaC.pdf

  14. Using RStudio Script editor View variables in workspace and history file View help, plots & files; manage packages R console

  15. Open RStudio and set up a working directory

  16. Datasets and R scripts • Save all the files to R_workshop directory - Day 1 R files (D1_Example_1.R to D1_Example_12.R ) = 12 files - Day 1 data files (D1_Data_1.csv or txt to D1_Data_2.csv or txt) = 4 files - Day 2 R files (D2_Example_1.R to D2_Example_3.R ) = 3files - Day 2 data files (D1_Data_1.csv to D1_Data_3.csv) = 3 files

  17. R: Session management

  18. R: session management • You can enter a command at the command prompt in a console (>). • To quit R, use >q(). • Result is stored in an object using the assignment operator: (<-) or the equal character (=). Test<-2 and Test=2 • Test is an object with a value of 2 • To print (show) the object just enter the name of the object • Test

  19. Naming object in R • Object names cannot contain `strange' symbols like !, +, -, #. • A dot (.) and an underscore (_) are allowed, also a name starting with a dot. • Object names can contain a number but cannot start with a number.(E.g., Example_1, not 1Example_1) • R is case sensitive, X and x are two different objects, as well as temp.1 and temP.1

  20. R_Example_1 • Example1<-c(1,2,3,4,5) Example1 • 1Example<-c(1,2,3,4,5) • Example2<-c("Second", "Example") Example2 • 2Example<-c("Second", "Example") • Temp.1<-c("Second", "Example") • Temp.1 • Temp.1

  21. R: session management • Your R objects are stored in a workspace • Workspace is not saved on disk unless you tell R to do so. • Objects are lost when you close R and not save the objects. • R session are saved in a file .RData.

  22. R: session management • Just checking what the current working directory type getwd() • To list the objects in your workspace: ls() • To remove objects you no longer need: rm() • To remove ALL objects in your workspace: rm(list=ls()) or useRemove all objects • To save your workspace to a file, you may type save.image() or use Save Workspace… in the File menu • The default workspace file is called .RData

  23. Basic data types

  24. Basic data types • Basic data types: Logical, numeric, and character, etc.

  25. Basic data type (Example 2) • Logical (True or False) • Numerical (relating to numbers) • Characters (letters, numerical digits, common punctuation marks (such as "." or "-”), etc.

  26. Vectorsand Matrices • A vector • Ordered collection of data of the same type • Example: last names of all students in this workshop • In R, single number is a vector of length 1 • A matrix • Rectangular table of data of the same type • Example: Mean intensities of all genes measured during a microarray experiment

  27. Vectors (Example 3) • Vector: Ordered collection of data of the same data type > x <- c(5.2, 1.7, 6.3) > log(x) [1] 1.6486586 0.5306283 1.8405496 > y <- 1:5 > z <- seq(1, 1.4, by = 0.1) > yz<-y + z [1] 2.0 3.1 4.2 5.3 6.4 > length(y) [1] 5 > M<-mean(y + z) [1] 4.2

  28. Operation on vector elements (Example 4) > Mydata [1] 2 3.5 -0.2 > x4<-Mydata > 0 [1] TRUE TRUE FALSE > x5<-Mydata[Mydata>0] [1] 2 3.5 > x6<-Mydata[-c(1,3)] [1] 3.5 • Test on the elements • Extract the positive elements • Remove elements 1 and 3 • Mydata <- c(2,3.5,-0.2)Vector (c=“concatenate”)

  29. Operation on vector elements (Example 4) > Colors <- c("Red","Green","Red") Character vector > Colors[2] [1] "Green" > x1 <- 25:30 > x1 [1] 25 26 27 28 29 30Number sequences > x2<-x1[3:5] [1] 27 28 29 Various elements 3 to 5 >x3<-x1[c(2,6)] Elements 2 and 6

  30. Operation on vector elements (Example 5) > x <- c(5,-2,3,-7) > y <- c(1,2,3,4)*10Operation on all the elements > y [1] 10 20 30 40 > sort(x)Sorting a vector in ascending order [1] -7 -2 3 5 > sort(x, decreasing=TRUE) Sort in descending order [1] 5 3 -2 -7 > order(x) [1] 4 2 3 1Element order for sorting > y[order(x)] [1] 40 20 30 10Operation on all the components > rev(x)Reverse a vector [1] -7 3 -2 5

  31. Matrices (Example 6) • Matrix: Rectangular table of data of the same type > m <- matrix(1:12) Create matrix using the “matrix function” m [,1] [1,] 1 [2,] 2 [3,] 3 [4,] 4 [5,] 5 [6,] 6 [7,] 7 [8,] 8 [9,] 9 [10,] 10 [11,] 11 [12,] 12 V<-1:12 V1<-c(1,2,3,4,5,6,7,8,9,10,11,12)

  32. Matrices (Example 6) • Matrix: Rectangular table of data of the same type > mr <- matrix(1:12, 4) four rows mr [,1] [,2] [,3] [1,] 1 5 9 [2,] 2 6 10 [3,] 3 7 11 [4,] 4 8 12

  33. Matrices (Example 6) • Matrix: Rectangular table of data of the same type > mm <- matrix(1:12, 4, byrow = T); m By row creation [,1] [,2] [,3] [1,] 1 2 3 [2,] 4 5 6 [3,] 7 8 9 [4,] 10 11 12 > tmm<-t(mm) t is transpose [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 > dim(mm) dim is dimension [1] 4 3 four rows and three columns > dim(t(mm)) [1] 3 4 three rows and four columns

  34. Matrices (Example 6) Matrix: Rectangular table of data of the same type • > x <- c(3,-1,2,0,-3,6) • > x.mat <- matrix(x,ncol=2) Matrix with 2 cols > x.mat [,1] [,2] • [1,] 3 0 • [2,] -1 -3 • [3,] 2 6 • x.matr <- matrix(x,ncol=2, byrow=T) By row creation • > x.matr • [,1] [,2] • [1,] 3 -1 • [2,] 2 0 • [3,] -3 6

  35. 0peration on matrices (Example 6) > x.matr[,2]2ndcol [1] -1 0 6 > x.matr[c(1,3),]1st and 3rd lines [,1] [,2] [1,] 3 -1 [2,] -3 6 > x.mat[-2,] remove second row (No 2nd line) [,1] [,2] [1,] 3 -1 [2,] -3 6

  36. R as a calculator (Example 7) In R, plus (+), minus (-), division(/), and multiplication (*) > 5 + (6 + 7) * pi^2 [1] 133.3049 > log(exp(1)) [1] 1 > sin(pi/3)^2 + cos(pi/3)^2 [1] 1 > Sin(pi/3)^2 + cos(pi/3)^2 Error: couldn't find function "Sin” Take few minutes to do any calculations

  37. Listsand data frames

  38. Lists • A list is an ordered collection of data of arbitrary types. • Lists don’t need to be of the same mode or type and they can be a numeric vector, a logical value and a function and so on. • A component of a list can be referred as aa[[I]] or aa$x, where aa is the name of the list and x is a name of a component of aa.

  39. Lists (Example 8) • aa[[1]] is the first component of aa, while aa[1] is the sublist consisting of the first component of aa only. • list: an ordered collection of data of arbitrary types. > doe = list(name="john", age=28, married=F) • doe[[1]] • [1] "john" • > doe[1] • $name • [1] "john” • > doe$name • [1] "john“ • > doe$age • [1] 28 • Typically, vector elements are accessed by their index (an integer), list elements by their name (a character string). But both types support both access methods.

  40. Lists are very flexible (Example 8) >my.list <- list(c(5,4,-1),c("X1","X2","X3")) >my.list [[1]]: [1] 5 4 -1 [[2]]: [1] "X1" "X2" "X3" >my.list[[1]] [1] 5 4 -1 > my.list1 <- list(c1=c(5,4,-1),c2=c("X1","X2","X3")) >my.list1$c2[2:3] [1] "X2" "X3"

  41. Lists: Session (Example 8) Empl <- list(employee=“Anna”, spouse=“Fred”, children=3, child.ages=c(4,7,9)) Empl[[4]] [1] 4 7 9 Empl$child.ages [1] 4 7 9 Empl[4] # a sublist consisting of the 4th component of Empl $child.ages [1] 4 7 9 #letters is a function for a to z #toupper(letters) for capital A to Z #LETTERS for capital A to Z names(Empl) <- letters[1:4] names(Empl) [1] "a" "b" "c" "d" Empl1 <- c(Empl, service=8) $a [1] "Anna" $b [1] "Fred" $c [1] 3 $d [1] 4 7 9 $service [1] 8 unl<-unlist(Empl) # converts it to a vector. Mixed types will be converted to character, giving a character vector . unl a b c d1 d2 d3 service "Anna" "Fred" "3" "4" "7" "9" "8"

  42. More lists (Example 8) > x <- c(3,-1,2,0,-3,6) >x.mat <- matrix(x,ncol=2) Matrix with 2 cols > x.mat [,1] [,2] [1,] 3 -1 [2,] 2 0 [3,] -3 6 > dimnames(x.mat) <- list(c("L1","L2","L3"), c("R1","R2")) > x.mat R1 R2 L1 3 -1 L2 2 0 L3 -3 6

  43. Data frame Data frame: Rectangulartable with rows and columns; data within each column has the same type (e.g. number, text, logical), but different columns may have different types. Exampleof a data frame with 10 rows and 3 columns

  44. Importing and exporting data in R

  45. Importing Data (Exercise 9) • The easiest way to enter data in R is to work with a text file, in which the columns are separated by tabs; or comma-separated values (csv) files . Example of importing data are provided below. • mydata <- read.table("D1_Data_1.csv", header=TRUE) • mydata_correct <- read.table("D1_Data_1.csv", sep=”, ", header=TRUE) • mydata <- read.csv("D1_Data_1.csv", header=TRUE) • mydatab<- read.table("D1_Data_1.txt", header=TRUE) • Mydatab_correct<- read.table("D1_Data_1.txt", , sep=”,\t", header=TRUE) • mydatab<- read.delim("D1_Data_1.txt", header=TRUE) • Import from Web URL • Data_URL<-read.table(file=http://kh-tps.osu.edu/D1_Data_1.csv, header=TRUE, sep=“,”) • Importing data in Rstudio using (Import Dataset) on the Workspace Importing data from Excel, SAS, SPSS see the link below http://www.statmethods.net/input/importingdata.html

  46. Keyboard Input (Exercise 10) • # create a data frame from scratch • age <- c(25, 30, 56, 49, 12, 16, 60, 34, 45, 22) • gender <- c("male", "female", "male","male", "female", "male","male", "female", "male","male") • weight <- c(160, 110, 220, 100, 65, 120, 179, 134, 165, 153) • mydata<- data.frame(age,gender,weight)

  47. Viewing Data (Exercise 10) There are a number of functions for listing the contents of an object or dataset. # list the variables in mydatanames(mydata) # list the structure of mydatastr(mydata) # dimensions of an objectdim(mydata)

  48. Viewing Data (Example 10) # class of an object (numeric, matrix, dataframe, etc)class(mydata) # print mydata mydata # print first 6rows of mydatahead(mydata) # print first 2 rows of mydatahead(mydata, n=2) print last 6 rows of mydatatail(mydata) # print last 2 rows of mydatatail(mydata, n=2)

  49. Missing Data (Example 11) In R, missing values are represented by the symbol NA (not available) . Impossible values (e.g., dividing by zero) are represented by the symbol NaN (not a number). Testing for Missing Values is.na(x) # returns TRUE of x is missing y <- c(1,2,3,NA) is.na(y) # returns a vector (F F F T) Excluding missing values from analyses Arithmetic functions on missing values yield missing values. x <- c(1,2,NA,3) mean(x)          # returns NA mean(x, na.rm=TRUE) # returns 2

  50. Exporting Data (Example 12) • To A csv File • write.table(mydata, "c: mydata.csv", sep=", ") • To To A Tab Delimited Text File • write.table(mydata, "c: mydata.csv”, , sep=“\t ") • Exporting R objects into other formats . For SPSS, SAS and Stata. you will need to load the foreign packages. • For Excel, you will need the xlsReadWrite package.

More Related