BIO503: Lecture 2
Jess Mar
Department of Biostatistics
jmar@hsph.harvard.edu
Harvard School of Public Health
Wintersession 2009
Announcements
• I've corrected the typos that some of you picked up on in my slides (thank you!).
• The code which appears in the slides is posted on the website – lecture1_code.R, lecture2_code.R.
Recap of Lecture 1
R has different methods of storing data. These are called R objects.
Vectors:
> weight <- c(115, 190, 160, 148)
We can make vectors bigger by adding values to the end:
> weight <- c(weight, 120)
Or extract individual values:
> weight[1]
Or several values:
> weight[1:3]
Matrices & Data Frames
Matrices are 2D extensions of vectors:
> height <- c(150, 175, 188, 170, 169)
> bio.info <- cbind(weight, height)
We can extract columns, a bit like specifying (x,y) coordinates (the row index comes first, then the column index):
> bio.info[,1]
And rows:
> bio.info[1,]
Data frames can store different types of data in the same object.
> names <- c("jess", "lingling", "john", "eric", "sara")
> bio.df <- data.frame(id = names, h = height)
Lists
Lists can have multiple components, and there are no restrictions on what these components can be.
> myList <- list(bio = bio.info, randomNumber = 2, threeCities = c("baltimore", "philadelphia", "pittsburg"))
We can extract components using indices:
> whereToGo <- myList[[3]]
Or using the component's name:
> names(myList)
> whereToGo <- myList$threeCities
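The difference between single and double brackets on a list is easy to miss. The following is a minimal sketch, using a cut-down version of myList (without the bio component) so that it runs on its own:

```r
# Cut-down, illustrative version of the slide's myList.
myList <- list(randomNumber = 2,
               threeCities = c("baltimore", "philadelphia", "pittsburg"))

# Single brackets keep the list wrapper: the result is a list of length 1.
asList <- myList["threeCities"]

# Double brackets (or $) return the component itself.
asVector <- myList[["threeCities"]]

class(asList)     # "list"
class(asVector)   # "character"
length(asVector)  # 3
```

Use double brackets or $ when you want to work with the component's contents, and single brackets when you want a smaller list.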
Workspace & Functions
We can keep track of the objects we've created in R:
> ls()
Objects can be queried to find out what they are:
> class(height)
> is.matrix(bio.info)
Functions follow the standard structure:
> funcName <- function(x){
    # code
  }
To use the function:
> funcName(z)
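As a concrete instance of the function template, here is a small sketch. The function name calcBMI, its argument names, and the units (kilograms and centimetres) are assumptions made for illustration; they are not taken from the lecture code.

```r
# Hypothetical example function: body mass index from weight and height.
# Assumes weight in kilograms and height in centimetres.
calcBMI <- function(weightKg, heightCm) {
  heightM <- heightCm / 100   # convert centimetres to metres
  weightKg / heightM^2        # the last expression is the return value
}

# Functions built from arithmetic work on whole vectors at once.
calcBMI(70, 175)
calcBMI(c(60, 80), c(160, 180))
```

The value of the last evaluated expression is returned, so an explicit return() call is optional here.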
For loops
A for loop runs for a predetermined number of iterations.
Consider the numbers 1 to 10.
> a <- 1:5
> b <- 6:10
> x <- cbind(a,b)
Note we can also create this matrix x using:
> x <- matrix(1:10, ncol=2, nrow=5, byrow=F)
Calculate the median for each row of x.
> x.med <- NULL
> for( i in 1:5 ){
    m <- median(x[i,], na.rm=T)
    x.med <- c(x.med, m)
  }
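Growing x.med with c() inside the loop works, but for long loops it is common to create the result vector at its final size first. A minimal sketch of that variant, using the same matrix x:

```r
x <- matrix(1:10, ncol = 2, nrow = 5, byrow = FALSE)

# Preallocate one slot per row instead of growing the vector each iteration.
x.med <- numeric(nrow(x))

# seq_len(nrow(x)) gives 1, 2, ..., 5 and adapts if x changes size.
for (i in seq_len(nrow(x))) {
  x.med[i] <- median(x[i, ], na.rm = TRUE)
}

x.med  # 3.5 4.5 5.5 6.5 7.5
```

The result is identical to the version on the slide; only the bookkeeping differs.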
Making (.R) History
Keeping a copy of the R code you've written, even while you're just learning, is valuable.
R automatically keeps track of all the commands you've entered during the session.
> history()
Note: ?history tells us that to see all the commands, use
> history(max.show=Inf)
We can save these commands as a .Rhistory file.
> savehistory(file=".Rhistory")
In a new session, we can load an existing .Rhistory file.
> loadhistory(file=".Rhistory")
Saving Your R Session
You can also save the R objects that you generated during an R session.
> save.image(file=".RData")
In a new R session, you can load these objects back in.
> load(".RData")
These .RData files can get quite large, so it might be more sensible to save only a few key objects (you probably don't need everything anyway).
> save(list=c("bmi", "height", "weight", "names"), file="bmi.RData")
To use these objects in a different R session,
> load("bmi.RData")
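To see that save and load really round-trip an object, here is a small self-contained sketch. It writes to a temporary file (via tempfile) rather than bmi.RData, purely so that it does not leave files behind; that choice is an assumption of the example, not part of the lecture.

```r
height <- c(150, 175, 188, 170, 169)

# Save one object to a temporary file.
tmpFile <- tempfile(fileext = ".RData")
save(list = c("height"), file = tmpFile)

# Remove the object, then restore it from the file.
rm(height)
loaded <- load(tmpFile)   # load() returns the names of the restored objects

loaded   # "height"
height   # the original vector is back
```

Note that load restores objects under their original names, so it can silently overwrite an object of the same name in the current session.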
Tutorial
Download the tutorial from the course website.
Open up a new file in Notepad. Write all your R code in here.
Copy and paste over to R, line by line.
Start comments with #.
Getting Data Into R
Data comes in all shapes and sizes, so no single piece of advice on importing data works in every case.
For rectangular data (flat files), use read.table.
> fileName <- "yeastData.txt"
> yeastDat <- read.table(fileName, header=T, sep="\t")
header=T: the file has column labels.
sep: denotes the delimiter. Use "\t" for tab-delimited files, "," for comma-separated.
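Since yeastData.txt may not be at hand, the following sketch writes a tiny tab-delimited file to a temporary location and reads it back. The gene labels and values are made up for illustration and are not the course data.

```r
# Write a small, made-up tab-delimited file to a temporary location.
tmpFile <- tempfile(fileext = ".txt")
writeLines(c("gene\texp1\texp2",
             "geneA\t0.5\t1.2",
             "geneB\t-0.3\tNA"),
           tmpFile)

# header = TRUE uses the first line as column names; sep = "\t" splits on tabs.
toyDat <- read.table(tmpFile, header = TRUE, sep = "\t")

dim(toyDat)   # 2 rows, 3 columns
str(toyDat)   # check the type that was inferred for each column
```

By default read.table treats the string NA as a missing value, which is why exp2 is still read as a numeric column.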
Getting Data into R
For data files that aren't rectangular, or that are simpler than tables, use the scan function.
> myG <- scan("GAPDH.fasta", what="txt", sep="\n")
Both scan and read.table have a lot of input arguments. This means even if your data is quirky, you can probably tweak the arguments to read your data into R.
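As a rough illustration of scan on a FASTA-style file, the sketch below writes a made-up two-line sequence to a temporary file and reads it line by line. The header text and the sequence are invented; this is not the GAPDH file from the course.

```r
# Write a tiny, made-up FASTA-style file.
tmpFile <- tempfile(fileext = ".fasta")
writeLines(c(">toy_sequence made-up header",
             "ATGGTTAGAG",
             "TTGCTATTAA"),
           tmpFile)

# what = "character" reads text; sep = "\n" makes each line one element.
fa <- scan(tmpFile, what = "character", sep = "\n", quiet = TRUE)

headerLine <- fa[1]                        # the line starting with ">"
dnaSeq <- paste(fa[-1], collapse = "")     # join the remaining lines

nchar(dnaSeq)  # 20
```

The what argument is an example of the type to read, so any character string (such as "txt" on the slide) asks scan for character data.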
Implicit Loops - apply
The implicit loops apply a function to each element of an object without writing an explicit for loop. They are more compact than regular loops, and often (though not always) faster.
The apply function works on matrices (a data frame such as yeastDat is converted to a matrix first, so its columns should all be numeric here).
The second argument lets you choose to apply the function to rows (1) or columns (2).
> geneAvg <- apply(yeastDat, 1, mean, na.rm=T)
> expAvg <- apply(yeastDat, 2, mean, na.rm=T)
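The row and column modes of apply are easiest to check on a matrix small enough to verify by hand. A minimal sketch with a made-up 2 x 3 matrix containing one missing value:

```r
# Made-up matrix: row 1 is (1, 2, NA), row 2 is (4, 5, 6).
toyMat <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 2, byrow = TRUE)

# Extra arguments (here na.rm = TRUE) are passed on to mean.
rowAvg <- apply(toyMat, 1, mean, na.rm = TRUE)  # one value per row
colAvg <- apply(toyMat, 2, mean, na.rm = TRUE)  # one value per column

rowAvg  # 1.5 5.0
colAvg  # 2.5 3.5 6.0

# For means specifically, base R also provides rowMeans and colMeans.
all.equal(rowAvg, rowMeans(toyMat, na.rm = TRUE))
```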
Implicit Loops - lapply and sapply
The lapply and sapply functions work on list objects.
> fruitList <- list(red=c("apple", "tomato"), yellow=c("banana"), orange=c("orange", "carrot", "tangerine"))
> l.out <- lapply(fruitList, length)
> s.out <- sapply(fruitList, length)
What's the difference between the two output objects?
> class(l.out)
> class(s.out)
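Running the example answers the question: lapply always returns a list, while sapply tries to simplify the result, here to a named integer vector. A short sketch of what to expect:

```r
fruitList <- list(red = c("apple", "tomato"),
                  yellow = c("banana"),
                  orange = c("orange", "carrot", "tangerine"))

l.out <- lapply(fruitList, length)  # a list with one component per colour
s.out <- sapply(fruitList, length)  # simplified to a named vector

class(l.out)      # "list"
class(s.out)      # "integer"

# Both keep the component names, but they are indexed differently.
l.out[["orange"]] # 3
s.out["orange"]   # named element with value 3
```

If the function returned results of different lengths, sapply could not simplify and would return a list as well.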
Making Sequences
Rather than always building vectors from the ground up, we can also generate these with other functions.
One example: the seq function.
> seq(from=1, to=20, by=2)
> seq(from=1, to=20, length.out=9)
> seq(50)
Note: instead of the last example, you could also use 1:50.
These constructs come in handy for plotting curves and graphics. They are also useful when we want to build vectors of indices and extract these from other objects, as we'll soon see.
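A brief sketch of the two uses mentioned above: checking what seq produces, and using a sequence as a vector of indices. The built-in letters vector stands in for "another object" purely for illustration.

```r
# by fixes the step size; the sequence stops before exceeding 'to'.
odds <- seq(from = 1, to = 20, by = 2)
length(odds)   # 10
max(odds)      # 19

# length.out fixes the number of points; R works out the step.
grid <- seq(from = 1, to = 20, length.out = 9)
diff(grid)[1]  # 2.375

# A sequence used as indices: every fourth letter starting from the first.
idx <- seq(from = 1, to = 9, by = 4)
letters[idx]   # "a" "e" "i"
```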
Replication - the rep Function
We can generate repeated values using rep.
> x <- 1:4
> rep(x, 2)
> rep(x, each=2)
A simple function can scale up to be quite complex. Try these examples out:
> rep(x, c(2,1,2,1))
> rep(x, each = 2, len = 4)
> rep(x, each = 2, times = 3)
The input object x can be any vector or list.
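For reference, here is what those calls produce, with the abbreviated len argument written out in full as length.out:

```r
x <- 1:4

rep(x, 2)                        # 1 2 3 4 1 2 3 4  (whole vector twice)
rep(x, each = 2)                 # 1 1 2 2 3 3 4 4  (each element twice)

# A vector of counts repeats each element a different number of times.
rep(x, c(2, 1, 2, 1))            # 1 1 2 3 3 4

# length.out truncates (or recycles) the result to the requested length.
rep(x, each = 2, length.out = 4) # 1 1 2 2

# each is applied first, then the whole result is repeated 'times' times.
length(rep(x, each = 2, times = 3))  # 24
```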
Sorting Values
Suppose 5 animals had a race. Here are their race times:
> times <- c(10, 9.5, 11.2, 30.4, 21.5)
> racers <- c("wallaby", "kangaroo", "emu", "koala", "wombat")
Our task is to order the animals, from fastest to slowest.
We can use the functions sort and order to do this.
> sort(times)
> order(times)
> racers[order(times)]
Say we wanted the animals from slowest to fastest instead?
> rev(order(times))
> racers[rev(order(times))]
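It helps to see that sort returns the sorted values while order returns the positions that would sort them. The sketch below also uses the decreasing argument of order as an alternative to rev; with no tied times the two give the same answer.

```r
times <- c(10, 9.5, 11.2, 30.4, 21.5)
racers <- c("wallaby", "kangaroo", "emu", "koala", "wombat")

sort(times)    # the values:    9.5 10.0 11.2 21.5 30.4
order(times)   # the positions: 2 1 3 5 4

# Fastest to slowest.
racers[order(times)]
# "kangaroo" "wallaby" "emu" "wombat" "koala"

# Slowest to fastest, without rev.
racers[order(times, decreasing = TRUE)]
# "koala" "wombat" "emu" "wallaby" "kangaroo"
```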
Adding Names to Vectors
We can also attach labels to the elements of a vector.
> names(times) <- racers
An alternative to the code on slide 17 then, is:
> sort(times)
In this way, we don't have to use order to get back the names of the animals.
Thanks Lingling for the terrific suggestion!
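A short sketch of what named vectors allow, using the same race data: the names travel with the values when sorting, and elements can be looked up by name.

```r
times <- c(10, 9.5, 11.2, 30.4, 21.5)
racers <- c("wallaby", "kangaroo", "emu", "koala", "wombat")
names(times) <- racers

# Sorting keeps each name attached to its value.
sorted <- sort(times)
names(sorted)
# "kangaroo" "wallaby" "emu" "wombat" "koala"

# Look up a single racer's time by name.
times["emu"]

# which.min returns the (named) position of the smallest value.
names(which.min(times))  # "kangaroo"
```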
Conditional Statements - if else
These allow us to control the flow of computations within our code.
Examples: if else, while, repeat, break
Consider the following toy example:
> sky <- "blue"
> if( sky == "blue" ){
    now <- "day"
  } else{
    now <- "night"
  }
Alternatively, the same can be achieved more succinctly:
> now <- ifelse(sky == "blue", "day", "night")
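One practical difference between the two forms: if tests a single condition, whereas ifelse works element by element on a whole vector. A minimal sketch, with a made-up vector of sky colours:

```r
# Made-up observations; ifelse tests each element in turn.
sky <- c("blue", "black", "blue")
now <- ifelse(sky == "blue", "day", "night")
now  # "day" "night" "day"

# With a single value, if/else and ifelse agree.
oneSky <- "black"
if (oneSky == "blue") {
  nowSingle <- "day"
} else {
  nowSingle <- "night"
}
nowSingle  # "night"
```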
Conditional Statements - while
while(condition) expression
The expression continues to be executed as long as the condition holds true.
For example: Newton's method for calculating the square root of y.
> y <- 12345
> x <- y/2
> while( abs(x*x-y) > 1e-10 ) x <- (x + y/x)/2
> x^2
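The same Newton iteration can be wrapped in a function to make it reusable and to count the iterations. This is only a sketch: the function name newtonSqrt, the tol and maxIter arguments, and the iteration cap (a guard against loops that never meet the tolerance, for example with a negative y) are all additions for illustration.

```r
# Hypothetical wrapper around the slide's while loop.
newtonSqrt <- function(y, tol = 1e-10, maxIter = 100) {
  x <- y / 2      # same starting guess as on the slide
  iter <- 0
  # Stop when x*x is close enough to y, or after maxIter updates.
  while (abs(x * x - y) > tol && iter < maxIter) {
    x <- (x + y / x) / 2
    iter <- iter + 1
  }
  list(root = x, iterations = iter)
}

res <- newtonSqrt(12345)
res$root          # compare with sqrt(12345)
res$iterations    # number of updates that were needed
```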
Conditional Statements - repeat
Alternatively we can use repeat and break to achieve the same result.
> x <- y/2
> repeat{
    x <- (x + y/x)/2
    if( abs(x*x-y) < 1e-10 ) break
  }
> x^2